Rust Bindings #73

Closed
michaelgrigoryan25 opened this issue Jan 25, 2024 · 15 comments
@michaelgrigoryan25
Contributor

Continuation of #66.

@michaelgrigoryan25
Contributor Author

michaelgrigoryan25 commented Jan 25, 2024

@ashvardanian regarding the "fingerprints" in the table that you've shared in the PR, is it the same as sz_hash?

@ashvardanian
Owner

Not the same, but related. Fingerprints are rolling hashes, which are used to populate a bitset.
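
To illustrate the idea of a rolling hash populating a bitset, here is a minimal sketch in plain Rust. It is not StringZilla's implementation and does not reproduce the signature of any library function: the function name `rolling_fingerprint`, the multiplier, and the 64-bit fingerprint width are all assumptions chosen for illustration.

```rust
/// Hypothetical helper, not part of StringZilla. Slides a window of `window`
/// bytes over `text`, hashes every window with a polynomial rolling hash,
/// and sets one bit per window in a 64-bit fingerprint.
fn rolling_fingerprint(text: &[u8], window: usize) -> u64 {
    const BASE: u64 = 257; // arbitrary multiplier, assumed for this sketch

    if window == 0 || text.len() < window {
        return 0;
    }

    // BASE^(window - 1), needed to remove the byte that leaves the window.
    let mut high = 1u64;
    for _ in 1..window {
        high = high.wrapping_mul(BASE);
    }

    // Hash of the first window, computed directly.
    let mut hash = 0u64;
    for &byte in &text[..window] {
        hash = hash.wrapping_mul(BASE).wrapping_add(byte as u64);
    }
    let mut bits = 1u64 << (hash % 64);

    // Roll the window one byte at a time: drop the outgoing byte,
    // shift the remaining ones, and append the incoming byte.
    for i in window..text.len() {
        let outgoing = (text[i - window] as u64).wrapping_mul(high);
        hash = hash
            .wrapping_sub(outgoing)
            .wrapping_mul(BASE)
            .wrapping_add(text[i] as u64);
        bits |= 1u64 << (hash % 64);
    }
    bits
}
```

Two texts that share a substring of `window` bytes will share at least one set bit, which is what makes such a bitset usable as a cheap similarity filter.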

@michaelgrigoryan25
Contributor Author

In that case, which function generates fingerprints in StringZilla?

@ashvardanian
Owner

ashvardanian commented Jan 25, 2024

@michaelgrigoryan25, it's called sz_fingerprint_rolling 🤗

I am not sure what the best Rust interface for it should look like, so let's leave it for the end.

@michaelgrigoryan25
Contributor Author

These are the most commonly used string types in Rust:

  • &str
  • String
  • &String
  • Cow<'_, str>
  • Cow<'_, String>

I can write a macro that implements a common trait for all of these types, so that methods like sz_find become available directly once the trait is imported with use.
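
A minimal sketch of what such a macro could look like follows. The trait name `StringZillaExt`, the macro name, and the method signature are hypothetical, and `str::find` stands in for the real call into the C library.

```rust
use std::borrow::Cow;

/// Hypothetical extension trait; the name and signature are illustrative only.
trait StringZillaExt {
    /// Returns the byte offset of the first occurrence of `needle`, if any.
    fn sz_find(&self, needle: &str) -> Option<usize>;
}

/// Implements the trait for every listed type by viewing it as `&str`.
/// `str::find` is a stand-in for the actual FFI call.
macro_rules! impl_stringzilla_ext {
    ($($ty:ty),* $(,)?) => {
        $(
            impl StringZillaExt for $ty {
                fn sz_find(&self, needle: &str) -> Option<usize> {
                    let haystack: &str = self.as_ref();
                    haystack.find(needle)
                }
            }
        )*
    };
}

// `&str` and `&String` need no separate entries: method calls on references
// resolve to the implementations for `str` and `String`.
impl_stringzilla_ext!(str, String, Cow<'_, str>);
```

With the trait in scope, `"hello world".sz_find("world")` and `String::from("hello").sz_find("l")` both compile without any wrapper type.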

@ashvardanian
Owner

Sure. How about the AsRef<[u8]> I currently use?
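
For comparison, a blanket implementation over `AsRef<[u8]>` covers all of the listed string types, plus byte slices and `Vec<u8>`, without a macro. The sketch below is an assumption about how that could look, not the code from the repository: the trait name `ByteSearch` and the method `find_bytes` are hypothetical, and a naive window scan stands in for the real search routine.

```rust
/// Hypothetical trait; the name and method are illustrative only.
trait ByteSearch {
    /// Returns the byte offset of the first occurrence of `needle`, if any.
    fn find_bytes<N: AsRef<[u8]>>(&self, needle: N) -> Option<usize>;
}

/// One blanket implementation for every type that can be viewed as bytes.
impl<T: AsRef<[u8]> + ?Sized> ByteSearch for T {
    fn find_bytes<N: AsRef<[u8]>>(&self, needle: N) -> Option<usize> {
        let haystack: &[u8] = self.as_ref();
        let needle: &[u8] = needle.as_ref();
        if needle.is_empty() {
            // Convention assumed here: an empty needle matches at offset 0.
            return Some(0);
        }
        // Naive scan, standing in for the actual search routine.
        haystack
            .windows(needle.len())
            .position(|candidate| candidate == needle)
    }
}
```

The trade-off is that a blanket implementation applies to every `AsRef<[u8]>` type at once, so it cannot be specialized per type the way macro-generated implementations can.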

@michaelgrigoryan25
Contributor Author

That would work.

@michaelgrigoryan25
Contributor Author

@ashvardanian
Owner

@michaelgrigoryan25 this looks good! Want to open a PR or want to add a few more things before that?

@michaelgrigoryan25
Contributor Author

Sure, let's do it right now.

@ashvardanian
Owner

Thanks a lot, great patches, @michaelgrigoryan25! In C++ I've implemented lazily evaluated convenience functions, like find_all, rfind_all, split_all, rsplit_all, and so on. It took around 400 lines of code. I think it might be a great idea to implement them in Rust as well. What do you think? Would you be interested in adding those and the Levenshtein / Needleman-Wunsch alignment scores?
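
In Rust, lazy evaluation of this kind maps naturally onto the `Iterator` trait. The sketch below shows one possible shape for `find_all`; the struct name `FindAll`, the choice to report non-overlapping matches, and the naive inner search are all assumptions, not a port of the C++ code.

```rust
/// Hypothetical lazy iterator over the offsets of non-overlapping matches.
struct FindAll<'h, 'n> {
    haystack: &'h [u8],
    needle: &'n [u8],
    position: usize, // offset at which the next search starts
}

/// Creates the iterator without performing any search yet.
fn find_all<'h, 'n>(haystack: &'h [u8], needle: &'n [u8]) -> FindAll<'h, 'n> {
    FindAll { haystack, needle, position: 0 }
}

impl<'h, 'n> Iterator for FindAll<'h, 'n> {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        let needle = self.needle;
        if needle.is_empty() || self.position > self.haystack.len() {
            return None;
        }
        // Search only the unvisited tail; a naive scan stands in for
        // the actual substring search routine.
        let rest = &self.haystack[self.position..];
        let offset = rest
            .windows(needle.len())
            .position(|candidate| candidate == needle)?;
        let found = self.position + offset;
        // Resume after the match, so reported matches do not overlap.
        self.position = found + needle.len();
        Some(found)
    }
}
```

Because each match is located only when `next` is called, a caller that needs just the first few occurrences, for example via `.take(2)`, never scans the rest of the haystack.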

@michaelgrigoryan25
Contributor Author

@ashvardanian definitely, let's do it!

@ashvardanian
Owner

As mentioned in #79, I am not sure about the right course of action here. Other operations, like #82 or random string generation, might be more relevant. We should also benchmark against memchr and other native Rust string projects.

@ashvardanian
Owner

Benchmarks are ready.
