Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose set_seed method #40

Closed
jlmelville opened this issue Apr 10, 2019 · 4 comments
Closed

Expose set_seed method #40

jlmelville opened this issue Apr 10, 2019 · 4 comments

Comments

@jlmelville
Copy link
Contributor

The underlying C++ classes (and the Python wrapper) expose a set_seed method which seeds the internal KissRandom rng. Could it be added to RcppAnnoy?

I have an issue where if an Annoy index gets written to disk and is larger than 2GB in size, Annoy is unable to read it back in (at least on my machine, the off_t type used internally by Annoy is 32-bit). A possible work-around would be to contruct multiple smaller indexes, and then merge the results. But this only works if the seed can be altered for each build.

I would be happy to work on a pr for this if there is a chance of it being accepted.

@eddelbuettel
Copy link
Owner

Yes, could do if we do it carefully. For R extensions we often prefer to use R's RNGs to be reproducible which is why I had not gone there yet -- for (Rcpp)Annoy it is arguably as important to play nice with the Python (and other languages) behaviour.

The code is a wee bit byzantine as we use Rcpp Modules in a templated fashion--let me know if you get lost. Sending a single scalar down should be pretty straightforward.

@eddelbuettel
Copy link
Owner

I just had ten minutes and a cup of tea at hand -- can you check the branch I just pushed?

I will probably make it a PR in due course.

@jlmelville
Copy link
Contributor Author

Thanks for looking into it with such alacrity. Looks good to me. I made the same changes to annoy.cpp (but was not quick enough to submit the PR), and I am getting different results with different seeds, so it's working.

There's a test/seed_test.py in Annoy, but it doesn't do much except test that keeping the seed the same doesn't affect the results.

Other than that, I was thinking of modifying the vignette to note that the seed can be modified. I can make that a separate PR if desired.

@eddelbuettel
Copy link
Owner

Sounds good on all fronts. Will merge now -- thanks for checking -- and had pondered a unit test too. Validating that two two different seeds yield different vectors seems good enough. Ditto with a brief mention in the vignette, so sync with what the repo will look like in a second and ship a follow-up PR.

jlmelville added a commit to jlmelville/rcppannoy that referenced this issue Apr 11, 2019
eddelbuettel pushed a commit that referenced this issue Apr 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants