Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Force the random number generator to be seeded #1502

Merged
merged 5 commits into from Nov 16, 2023

Conversation

PeterLombaers
Copy link
Member

This pull request tries to improve how we deal with random number generation and random seeds.

At the moment, we use a random number generator, and optionally you can choose to set a seed. I propose to make the random seed obligatory. This will make simulations or systematic reviews that use ASReview more reproducible.

The way I force random seeds to be used is by introducing an new SeededRandomState class, and a constructor function get_random_state. The SeededRandomState class is exactly the same as np.random.RandomState, but has the added attribute seed, which is always set. The way to use this class in the code is by using the get_random_state function, which will make sure that the random seed is always set correctly. This follows the pattern from numpy where they have the np.random.Generator class and a constructor function np.random.default_rng() which is the preferred way to instantiate the class.

Remarks:

  • I use the legacy class np.random.RandomState, and not the newer np.random.Generator class. The reason is that with the newer class results of simulations would change, even when the random seed is the same. The downside is that this legacy class is frozen and will not be improved anymore by numpy. So if in the future someone needs random number generating capabilities not in np.random.RandomState then we would need to switch to the newer class. I have some comments in the code about how it should look you we want to use the newer class.
  • I have not yet the new random seed to the state file, so this is not yet stored. The main reason is that I think some more changes are needed to the settings object before we can do this properly. I think it's useful to make the change in this pull request as a first step.
  • There are always some caveats with how reproducible random number generation is. See also here. In general, I don't think we use very complicated algorithms, so I would expect that in our case this won't really matter.
  • This pull request is created by extracting the useful parts from New settings object #1424

Use the legacy np.random.RandomState class instead of the newer np.random.Generator class. With the newer class the results of simulations would change, even when the seed would be the same.
@J535D165 J535D165 added this to the v2.0 milestone Nov 6, 2023
Copy link
Member

@J535D165 J535D165 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready to merge in v2 branch.

Comment on lines +294 to +297
# # For the newer np.random.Generator class, the seed setting would be as follows:
# # https://numpy.org/doc/stable/reference/random/index.html#quick-start
# if seed is None:
# seed = secrets.randbits(128)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's bump numpy and use this.

@PeterLombaers
Copy link
Member Author

I tried to use the newer np.random.Generator class but stumbled onto a problem: scikit-learn/scikit-learn#16988
Scikitlearn doesn't actually support the new RNG yet. It's probably smarter to wait until they support it before making the change. So I would propose to merge this pull requests as is.

@J535D165
Copy link
Member

Thanks for the update! Let's keep it like this then.

@J535D165 J535D165 removed this from the v2.0 milestone Nov 16, 2023
@J535D165 J535D165 merged commit e8d5754 into asreview:master Nov 16, 2023
10 checks passed
@J535D165
Copy link
Member

!! Great!

cskaandorp pushed a commit to cskaandorp/asreview that referenced this pull request Apr 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants