Encore uses weak random numbers #1452
In ap.c the normal
import numbers

import numpy as np


# copy-pasted from scikit-learn utils/validation.py
def check_random_state(seed):
    """Turn seed into a np.random.RandomState instance

    If seed is None (or np.random), return the RandomState singleton used
    by np.random.
    If seed is an int, return a new RandomState instance seeded with seed.
    If seed is already a RandomState instance, return it.
    Otherwise raise ValueError.
    """
    if seed is None or seed is np.random:
        return np.random.mtrand._rand
    if isinstance(seed, (numbers.Integral, np.integer)):
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        return seed
    raise ValueError('%r cannot be used to seed a numpy.random.RandomState'
                     ' instance' % seed)
The noise addition in
def run(...., noise=False, seed=None):
    if noise:
        rng = check_random_state(seed)
        matndarray += matndarray * rng.uniform(0, 1e-16, size=matndarray.shape)
This makes it possible to pass a separate seed to each Python process, ensuring that the random numbers are not correlated. Additionally, NumPy's `RandomState` uses the Mersenne Twister, a widely accepted RNG for scientific applications.
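As a rough illustration of the per-process seeding idea, here is a minimal sketch. The helper `make_worker_rngs` and the `base_seed + i` scheme are assumptions made up for this example; they are not part of Encore or of the proposed patch.

```python
import numpy as np


def make_worker_rngs(base_seed, n_workers):
    """Hypothetical helper: one RandomState per worker process.

    Each worker gets its own integer seed (base_seed + worker index),
    so no two workers replay the same Mersenne Twister stream.
    """
    return [np.random.RandomState(base_seed + i) for i in range(n_workers)]


# Three workers, each drawing noise factors in the range used above.
rngs = make_worker_rngs(1234, 3)
draws = [rng.uniform(0, 1e-16, size=4) for rng in rngs]

# Re-creating a worker's generator from the same seed reproduces its draws.
replay = np.random.RandomState(1234).uniform(0, 1e-16, size=4)
```

Here `draws[0]` and `replay` are identical because they share a seed, while the three workers each see a different sequence.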
@mtiberti if it doesn't matter that the added noise might be correlated, it would still be nice if you could add a comment in the C code.
Thanks for this. Since noise is added to remove degeneracies from the similarity matrix, it shouldn't be a problem if it's slightly correlated. A potentially problematic case would be when the same noise is added to elements of the similarity matrix that are identical, which I think is unlikely. However, since the change you're proposing is not very time-consuming and still adds to the robustness of the code, I'm going to implement it. The only problem is that I'm about to leave for holidays; I'll be back on the 26th of July and am unlikely to be able to work on it before then.