Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use SHA512 when seeding MersenneTwister #37766

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

rfourquet
Copy link
Member

This hashes the provided seeds before feeding dSFMT initialization routine.
The passed seeds are still stored as-is in the MersenneTwister.seed field.

Goals:

  • possibly lead to more independance between two streams corresponding to two
    user-provided seeds
  • by feeding VERSION.minor and VERSION.major to the hashing process, this
    guarantees that random streams change between minor releases, helping users
    to not believe in stream stability

Closes #37165.

Performance

Random.seed!(rng, 0) takes, on my machine,

  • 12.655 μs on master
  • 13.460 μs on this branch but using SHA1 (6% slowdown)
  • 14.320 μs on this branch, with SHA2_512 (13% slowdown)

As seeding is usually not a time-critical operation, the slowdown here seems acceptable (it will probably be a different story when Xoshiro lands as the default RNG...)

Handling our tests

Many of our tests use Random.seed!(), and these would have to be changed for all minor releases. Alternatives include:

  • having an escape hatch, e.g. seed!(seed, hash=false), or Random.seednohash(seed)
  • implement a simple RNG with stable streams in our tests...

This hashes the provided seeds before feeding dSFMT initialization routine.
The passed seeds are still stored as-is in the `MersenneTwister.seed` field.

Goals:
* possibly lead to more independance between two streams corresponding to two
  user-provided seeds
* by feeding `VERSION.minor` and `VERSION.major` to the hashing process, this
  guarantees that random streams change between minor releases, helping users
  to not believe in stream stability
@rfourquet rfourquet added the domain:randomness Random number generation and the Random stdlib label Sep 26, 2020
@@ -14,6 +14,8 @@ using .DSFMT
using Base.GMP.MPZ
using Base.GMP: Limb

import SHA
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As SHA is accessed, I guess yes? I prefer using qualifed names like SHA.update! here. Or did I misunderstand the question?

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see: it's used in RNG.jl but not in this file. Might be clearer to move the import SHA into the file that uses it, but this is valid as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok I understand. it's just that Random.jl already happens to have all the import/using directives necessary for the other files it includes.

@StefanKarpinski
Copy link
Sponsor Member

Aside from my question (and CI), I approve very much.

@rfourquet
Copy link
Member Author

and CI

Yes, I'm not sure what to do here. We don't want to update the seeds in tests for each minor release (finding a working seed is very tedious). I think I will go with a the hack seed!(rng, hash=false) for now.

@rfourquet
Copy link
Member Author

Also, many jldoctests will stop working, and there we don't want to expose seeding with hash=false. I would just remove the jldoctest part, but wanted to gather suggestions before going this path. The obvious alternative is to update the examples for each release, which would be less annoying than updating all the tests (in LinearAlgebra in particular), but still tedious.

@StefanKarpinski
Copy link
Sponsor Member

StefanKarpinski commented Sep 30, 2020

If it's painful for us, then perhaps it's too painful to inflict on everyone else too. Finding seeds that work shouldn't really be that hard: if the tests are valid, then most seeds should work. The trouble, I guess, is that while most seeds should work for each test, the chance that a given global seed will allow all tests to pass is slim. It seems like each stochastic test should get its own seed then and you just try new random seeds for any test that fails and most choices should work.

@rfourquet
Copy link
Member Author

For doctests, I don't have a strong opinion what should be done. But for tests, I think it's easier than in the past to find fitting seeds in Base, probably because indeed seeding is done at a more granular level. I don't mind at all doing it for this PR, but the simple PR of bumping the minor version of Julia will likely not pass tests.

If it's painful for us, then perhaps it's too painful to inflict on everyone else too

So for tests, that's kinda the point of the PR, to motivate people to not rely on MersenneTwister for this kind of tests. But in Base we don't really have an alternative.

@rfourquet
Copy link
Member Author

If this is to be done, consider putting the hashing logic within make_seed, cf. #41558 (comment)

rfourquet added a commit that referenced this pull request Sep 23, 2023
This addresses a part of #37165:

> It's common that sequential seeds for RNGs are not as independent as one might like.

This clears out this problem for `MersenneTwister`, and makes it easy to
add the same feature to other RNGs via a new `hash_seed` function, which
replaces `make_seed`.

This is an alternative to #37766.
rfourquet added a commit that referenced this pull request Sep 29, 2023
This addresses a part of #37165:

> It's common that sequential seeds for RNGs are not as independent as one might like.

This clears out this problem for `MersenneTwister`, and makes it easy to
add the same feature to other RNGs via a new `hash_seed` function, which
replaces `make_seed`.

This is an alternative to #37766.
rfourquet added a commit that referenced this pull request Sep 29, 2023
This addresses a part of #37165:

> It's common that sequential seeds for RNGs are not as independent as
one might like.

This clears out this problem for `MersenneTwister`, and makes it easy to
add the same feature to other RNGs via a new `hash_seed` function, which
replaces `make_seed`.

This is an alternative to #37766.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:randomness Random number generation and the Random stdlib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

default RNG seeding: use SHA2(minor, seed) to improve seeding
2 participants