Jaccard #1461
Thanks for the PR! Looks good. One minor comment:
return map_of_chars;
}

static inline map<char, idx_t> TabulateCharacters(map<char, idx_t> str, map<char, idx_t> txt) {
You are making copies of the maps here; is that intended?
This function also appears to be used in only one place. Perhaps it would be better to inline it?
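The reviewer's point about copies can be illustrated with a hypothetical helper. CountSharedCharacters is not from the PR, and idx_t is aliased to a 64-bit unsigned integer here as an assumption standing in for DuckDB's index type. Taking the maps as const references instead of by value avoids copying both maps on every call.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Assumption: stand-in for DuckDB's idx_t (a 64-bit unsigned index type).
using idx_t = uint64_t;

// Hypothetical helper, not the PR's code: counts the distinct characters that
// appear in both tabulations. Both maps are taken by const reference, so no
// copies are made (passing std::map by value copies every entry).
static inline idx_t CountSharedCharacters(const std::map<char, idx_t> &str,
                                          const std::map<char, idx_t> &txt) {
	idx_t shared = 0;
	for (const auto &entry : str) {
		// find() on a const map only looks up the key; unlike operator[],
		// it never inserts, so it can be used through a const reference.
		if (txt.find(entry.first) != txt.end()) {
			shared++;
		}
	}
	return shared;
}
```

Inlining such a helper at its single call site, as the reviewer suggests, would remove the copy question entirely.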
You are right!
Thanks, this looks good now.
This PR adds a character-level Jaccard similarity function, which compares the sets of distinct characters in two strings: jaccard('ab', 'aaabbbb') = 1.0.
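A minimal C++ sketch of the behavior described above, assuming set semantics (each string reduced to its distinct characters), which is consistent with jaccard('ab', 'aaabbbb') = 1.0. The function name CharacterJaccard and the convention that two empty strings score 1.0 are hypothetical choices, not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Minimal sketch, not the PR's implementation: character-level Jaccard
// similarity J(A, B) = |A intersect B| / |A union B|, where A and B are the
// sets of distinct characters (bytes) in each string.
double CharacterJaccard(const std::string &lhs, const std::string &rhs) {
	std::set<char> a(lhs.begin(), lhs.end());
	std::set<char> b(rhs.begin(), rhs.end());
	if (a.empty() && b.empty()) {
		// Assumption: two empty strings are treated as identical.
		return 1.0;
	}
	std::size_t intersection = 0;
	for (char c : a) {
		if (b.count(c) > 0) {
			intersection++;
		}
	}
	// |A union B| = |A| + |B| - |A intersect B|
	std::size_t union_size = a.size() + b.size() - intersection;
	return static_cast<double>(intersection) / static_cast<double>(union_size);
}
```

With this sketch, CharacterJaccard("abc", "bcd") compares {a, b, c} with {b, c, d}, giving 2 / 4 = 0.5. It works on bytes, so multi-byte UTF-8 characters would be split into separate elements; how the PR handles non-ASCII input is not stated here.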