Force the random number generator to be seeded #1502
Use the legacy `np.random.RandomState` class instead of the newer `np.random.Generator` class. With the newer class, simulation results would change even when the seed is the same.
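A quick way to check this point is sketched below: seed the legacy `np.random.RandomState` and a `np.random.Generator` (created with `np.random.default_rng`) with the same integer and compare the draws. The seed value `42` is arbitrary and only for illustration.

```python
import numpy as np

SEED = 42  # arbitrary seed, for illustration only

# Legacy API: Mersenne Twister (MT19937) with legacy seeding
legacy_draws = np.random.RandomState(SEED).random_sample(5)

# Newer API: default_rng uses the PCG64 bit generator by default
new_draws = np.random.default_rng(SEED).random(5)

# Each API is reproducible on its own for a fixed seed...
assert np.array_equal(legacy_draws, np.random.RandomState(SEED).random_sample(5))
assert np.array_equal(new_draws, np.random.default_rng(SEED).random(5))

# ...but the two streams differ, so switching APIs changes the results
print("legacy:", legacy_draws)
print("new:   ", new_draws)
print("identical:", np.array_equal(legacy_draws, new_draws))
```

Because the underlying bit generators differ, moving to `np.random.Generator` would change every simulation result previously obtained with a given seed.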
Ready to merge in v2 branch.
```python
# # For the newer np.random.Generator class, the seed setting would be as follows:
# # https://numpy.org/doc/stable/reference/random/index.html#quick-start
# if seed is None:
#     seed = secrets.randbits(128)
```
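For reference, a runnable version of the commented-out snippet above, following the NumPy quick-start pattern: when no seed is given, draw a 128-bit seed with `secrets.randbits` and pass it to `np.random.default_rng`. The helper name `make_seeded_generator` is hypothetical and not part of ASReview.

```python
import secrets

import numpy as np


def make_seeded_generator(seed=None):
    """Hypothetical helper: return a Generator together with the seed used.

    If no seed is given, a fresh 128-bit seed is drawn so it can be
    recorded and the run reproduced later.
    """
    if seed is None:
        seed = secrets.randbits(128)
    return np.random.default_rng(seed), seed


rng, seed = make_seeded_generator()
print("seed:", seed)

# Re-creating the generator from the recorded seed reproduces the draws
rng_again, _ = make_seeded_generator(seed)
assert np.array_equal(rng.random(3), rng_again.random(3))
```

Returning the seed alongside the generator keeps it available for logging, which is what makes a run without an explicit seed reproducible afterwards.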
Let's bump numpy and use this.
I tried to use the newer
Thanks for the update! Let's keep it like this then.
Great!
This pull request tries to improve how we deal with random number generation and random seeds.
At the moment, we use a random number generator, and optionally you can choose to set a seed. I propose to make the random seed obligatory. This will make simulations or systematic reviews that use ASReview more reproducible.
The way I force random seeds to be used is by introducing a new `SeededRandomState` class and a constructor function `get_random_state`. The `SeededRandomState` class is exactly the same as `np.random.RandomState`, but has the added attribute `seed`, which is always set. In the code, this class should be created through the `get_random_state` function, which makes sure that the random seed is always set correctly. This follows the pattern from numpy, where the `np.random.Generator` class comes with a constructor function `np.random.default_rng()`, which is the preferred way to instantiate the class.

Remarks:
- `SeededRandomState` builds on the legacy `np.random.RandomState` class, not on the newer `np.random.Generator` class. The reason is that with the newer class, simulation results would change even when the random seed is the same. The downside is that the legacy class is frozen and will no longer be improved by numpy. So if in the future someone needs random number generation capabilities that are not in `np.random.RandomState`, we would need to switch to the newer class. I have left some comments in the code about how it should look if we want to use the newer class.
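The `SeededRandomState` / `get_random_state` pattern described above could look roughly like the sketch below. It is an illustration, not the code from this PR: returning an existing `SeededRandomState` unchanged, and drawing a 32-bit seed with `secrets.randbits` when `seed` is `None`, are assumptions. The 32-bit range matches what `np.random.RandomState` accepts for integer seeds.

```python
import secrets

import numpy as np


class SeededRandomState(np.random.RandomState):
    """np.random.RandomState that remembers the seed it was created with.

    Note: the instance attribute `seed` shadows RandomState's legacy
    `seed()` reseeding method on instances of this class.
    """

    def __init__(self, seed):
        super().__init__(seed)
        self.seed = seed


def get_random_state(seed=None):
    """Sketch of a constructor that always returns a SeededRandomState."""
    # Reuse an existing seeded instance as-is (assumed behaviour)
    if isinstance(seed, SeededRandomState):
        return seed
    # No seed given: draw one (0 <= seed < 2**32) so it can be logged
    if seed is None:
        seed = secrets.randbits(32)
    return SeededRandomState(seed)


rs = get_random_state(535)
print(rs.seed)  # 535

# Same seed, same stream as the plain legacy class
assert np.array_equal(
    rs.random_sample(3), np.random.RandomState(535).random_sample(3)
)

# Without a seed, one is still chosen and recorded
rs_auto = get_random_state()
print("generated seed:", rs_auto.seed)
```

Because the subclass only adds an attribute, code that accepts an `np.random.RandomState` keeps working, while the seed is always available for logging.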