-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Jaccard index? #2
Comments
Hey @luizirber, I think it would be a bit difficult to integrate Jaccard distance into the current codebase, since it is mainly adapted for the Levenshtein distance calculation. It would likely be easier to make a separate crate for Jaccard index. I considered this feature before, but I thought it would be better to only focus on edit distance. Additionally, I'm current too busy with other projects to implement it. Sorry. Eventually, I would likely implement SIMD popcount in Rust in a separate project, since I don't believe anyone has done it yet and its very useful. If you have time to look into it, then I think it shouldn't be too hard to get a Rust port, as the code is not too long. The dense macros are just to make it work on multiple platforms. |
No problem =] |
Hey @luizirber, I am interested in implementing an SSE/AVX popcount routine in another project: https://github.com/natir/nuc2bit. Perhaps you can use the code (when I am done) to implement a Jaccard distance function? |
Sounds great, and happy to see you working with Pierre =] |
Hey, I wrote up a quick popcount implementation here using the shuffle-based algorithm. For Jaccard distance, you are going to have to modify the code to AND and OR the vectors before applying I haven't thoroughly benchmarked my implementation since I just finished it, but it should be pretty fast. |
Hi @Daniel-Liu-c0deb0t!
I didn't dig down too much into the codebase, but I wonder if there is interest in adding a Jaccard index function (as described in Faster Population Counts Using AVX2 Instructions.
The implementation from the paper is at https://github.com/CountOnes/hamming_weight/blob/1dd7554c0fc39e01c9d7fa54372fd4eccf458875/src/avx_jaccard_index.c
and while it is a bit heavy on the macros I think it could be translated using the infra you already have in triple_accel?
(This would be incredibly useful for me, so I can start looking deeper into it if you're also interested =])
The text was updated successfully, but these errors were encountered: