document random reproducibility policy #33350

Merged: 5 commits, Sep 23, 2019

stevengj
Member

@stevengj stevengj commented Sep 21, 2019

See, e.g., the discussion in #30494 and on Discourse.

See also https://numpy.org/neps/nep-0019-rng-policy.html for a similar discussion by the NumPy developers.

@stevengj added the domain:docs and domain:randomness labels Sep 21, 2019
@tpapp
Contributor

tpapp commented Sep 21, 2019

Thanks for writing this up.

Do you think we can/should make a commitment to reproducibility across OSs/architectures? The native Julia code should have this property (though we should be careful about endianness if the issue arises), but I don't know if the dSFMT code can guarantee this.

@stevengj
Member Author

I would hope that the MT code would guarantee portability, since it is supposed to be following a specific number-theoretic sequence...

@stevengj
Member Author

Should be good to merge? CI failure (cp: cannot stat 'dist-extras/7z.*': No such file or directory) is unrelated, obviously.

@KristofferC KristofferC merged commit cc6ae96 into master Sep 23, 2019
@delete-merged-branch delete-merged-branch bot deleted the randdoc branch September 23, 2019 12:08
@KristofferC
Sponsor Member

cc @staticfloat for the win failure

@GregPlowman
Contributor

> I would hope that the MT code would guarantee portability, since it is supposed to be following a specific number-theoretic sequence...

Yes, but perhaps higher-level rand functions might be architecture-specific.
- 32-bit vs. 64-bit
- Alternative/future RNGs (other than MT)
- Secure/crypto rands with hardware support

Would it be better to explicitly state the policy w.r.t. architecture?
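
One concrete way the 32-bit vs. 64-bit concern can show up, as a minimal sketch: `Int` is an alias for `Int32` or `Int64` depending on the platform word size, so a call such as `rand(rng, Int)` requests a different type on each. The seed `1234` and the choice of `MersenneTwister` are arbitrary and only for illustration. Whether the resulting streams agree across architectures is exactly the open policy question, and this snippet does not settle it.

```julia
using Random

# Arbitrary seed, chosen only for illustration.
rng = MersenneTwister(1234)

# `Int` follows the platform word size (Int32 on 32-bit, Int64 on 64-bit),
# so this call asks for a different type depending on the architecture.
native = rand(rng, Int)

# Spelling out the width requests the same type on every platform.
fixed = rand(rng, Int64)

println(typeof(native), " on a ", Sys.WORD_SIZE, "-bit system; explicit request gives ", typeof(fixed))
```

Code that needs to behave the same on every platform can at least avoid this particular difference by naming the integer width explicitly.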

#29240 (comment)
#29240 (comment)
#29240 (comment)


Software tests that rely on *specific* "random" data should also generally save the data or embed it into the test code. On the other hand, tests that should pass for *most* random data (e.g. testing `A \ (A*x) ≈ x` for a random matrix `A = randn(n,n)`) can use an RNG with a fixed seed to ensure that simply running the test many times does not encounter a failure due to very improbable data (e.g. an extremely ill-conditioned matrix).
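
The two testing styles described above might look like the following sketch. The seed `1234`, the size `n = 10`, and the embedded vector are arbitrary illustrative values, not taken from the Julia test suite. `MersenneTwister` is used only because it is the generator discussed in this PR.

```julia
using Random, LinearAlgebra, Test

# Style 1: the test depends on *specific* data, so the data is embedded
# directly rather than regenerated from a seed (illustrative values).
v = [0.25, 0.5, 0.75]
@test sum(v) == 1.5

# Style 2: the test should pass for *most* random data, so a fixed seed
# just makes every run see the same matrix and vector.
rng = MersenneTwister(1234)   # arbitrary seed
n = 10
A = randn(rng, n, n)
x = randn(rng, n)
@test A \ (A * x) ≈ x
```

With the fixed seed, repeated runs on the same Julia version see the same data, so a failure is unlikely to be an unlucky draw. The test does not assume any particular numerical values, so it should keep passing even if the generated numbers differ in a later release.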

The statistical *distribution* from which random samples are drawn *is* guaranteed to be the same across any minor Julia releases.
Member

Would this rule out the possibility of changing the distribution of `rand(::Type{<:AbstractFloat})` from uniform in [0,1) to uniform in (0,1), which is currently being discussed? Or is this a "minor" enough change?

Sponsor Member

In measure-theoretic terms, those are technically the same distribution 😬
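
To spell out the quip as a short worked statement (assuming an ideal real-valued uniform variable, not a floating-point one): a single point has Lebesgue measure zero, so including or excluding the endpoint 0 does not change the probability of any event.

```latex
X \sim \mathrm{U}[0,1): \qquad
P(X = 0) = \lambda(\{0\}) = 0
\quad\Longrightarrow\quad
P(X \in B) = \lambda\bigl(B \cap [0,1)\bigr) = \lambda\bigl(B \cap (0,1)\bigr)
\quad \text{for every Borel set } B \subseteq \mathbb{R}.
```

A floating-point generator has only finitely many possible outputs, so an exact `0.0` has a small but nonzero probability, which is why the question still matters in practice.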
