-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose set_seed
method
#40
Comments
Yes, could do if we do it carefully. For R extensions we often prefer to use R's RNGs to be reproducible which is why I had not gone there yet -- for (Rcpp)Annoy it is arguably as important to play nice with the Python (and other languages) behaviour. The code is a wee bit byzantine as we use Rcpp Modules in a templated fashion--let me know if you get lost. Sending a single scalar down should be pretty straightforward. |
I just had ten minutes and a cup of tea at hand -- can you check the branch I just pushed? I will probably make it a PR in due course. |
Thanks for looking into it with such alacrity. Looks good to me. I made the same changes to There's a test/seed_test.py in Annoy, but it doesn't do much except test that keeping the seed the same doesn't affect the results. Other than that, I was thinking of modifying the vignette to note that the seed can be modified. I can make that a separate PR if desired. |
Sounds good on all fronts. Will merge now -- thanks for checking -- and had pondered a unit test too. Validating that two two different seeds yield different vectors seems good enough. Ditto with a brief mention in the vignette, so sync with what the repo will look like in a second and ship a follow-up PR. |
The underlying C++ classes (and the Python wrapper) expose a
set_seed
method which seeds the internalKissRandom
rng. Could it be added to RcppAnnoy?I have an issue where if an Annoy index gets written to disk and is larger than 2GB in size, Annoy is unable to read it back in (at least on my machine, the
off_t
type used internally by Annoy is 32-bit). A possible work-around would be to contruct multiple smaller indexes, and then merge the results. But this only works if the seed can be altered for each build.I would be happy to work on a pr for this if there is a chance of it being accepted.
The text was updated successfully, but these errors were encountered: