Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update MinHash constructor API in Rust to use scaled instead of max_hash #1134

Closed
ctb opened this issue Jul 30, 2020 · 3 comments · Fixed by #1139
Closed

update MinHash constructor API in Rust to use scaled instead of max_hash #1134

ctb opened this issue Jul 30, 2020 · 3 comments · Fixed by #1139

Comments

@ctb
Copy link
Contributor

ctb commented Jul 30, 2020

In #1128, we are deprecating the old max_hash parameter in favor of the new scaled parameter. They can be interconverted easily with no additional information (see functions _get_max_hash_for_scaled and _get_scaled_for_max_hash).

However, it is hard to fully remove max_hash from the Python API because it is used in the Rust API as well!
See the call to lib.kmerminhash_new in sourmash/minhash.py:MinHash.__init__ (currently in #1128 only).

So the goal of this issue is to swap out the max_hash parameter with the scaled parameter in the Rust API.

There are two places to do this in Rust - one is in the kmerminhash_new function, and the other is in the KmerMinHash::new code. I would suggest starting with the former and doing the conversion there, so that
we can get the API bridge between Python and Rust working first, but YMMV :)

Some suggestions for getting started -

  • get a full Python and Rust development environment up and running per the docs
  • dig into the implementation of kmerminhash_newin src/core/src/ffi/minhash.rs
  • add the scaled_to_max_hash and max_hash_to_scaled conversion functions into Rust
  • swizzle the mx parameter over to scaled in kmerminhash_new and then convert it to max_hash before passing it into the ::new method.
@ctb
Copy link
Contributor Author

ctb commented Jul 30, 2020

(please indicate if you are planning to work on this but don't yet have a PR up - thanks! we suggest creating a PR early on in the process, but if you don't want to, please "claim" this issue so that others don't work on it in tandem!)

@xmnlab
Copy link
Contributor

xmnlab commented Jul 30, 2020

@ctb thanks for the detailed information. I am starting to work on that. thanks :)

@luizirber
Copy link
Member

Note that we want to keep max_hash internally (in Rust), because it's easier to use in .add_hash. scaled_for_max_hash and max_hash_for_scaled are already implemented

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants