Question about version string_grouper group_similar_strings #80
Comments
Hi @dariswan The latest version is supposed to be much faster than older versions as your dataset size increases. I would be interested to see how Thanks.
group_similar_strings does not fail outright, but when I check the output by eye, it sometimes gives inaccurate results for a single term.
To the human eye, those 3 emails are supposed to be in one group.
Hi @dariswan For such a small set of strings the default similarity threshold (80%) is too high. Try 60%:
import pandas as pd
from string_grouper import group_similar_strings
emails = pd.Series(['messi1@gmail.com', 'messi12@gmail.com', 'messi21@gmail.com'])
email_df = emails.to_frame()
email_df[['group_id', 'group_rep']] = group_similar_strings(emails, min_similarity=0.64)
email_df
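To get a feel for why an 80% threshold can split these three addresses, here is a rough sketch that scores them with plain character-trigram cosine similarity. This is a simplified stand-in, not string_grouper's actual computation: string_grouper weights n-grams with TF-IDF and cleans the strings first, so its scores will differ (and tend to be lower when most n-grams are shared by every string). The helper names `char_ngrams` and `ngram_cosine` are made up for this illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count the overlapping character n-grams of `text`."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_cosine(a: str, b: str, n: int = 3) -> float:
    """Cosine similarity between the raw n-gram count vectors of two strings.

    Simplified illustration only: no TF-IDF weighting, no string cleaning.
    """
    va, vb = char_ngrams(a, n), char_ngrams(b, n)
    dot = sum(count * vb[gram] for gram, count in va.items())
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


emails = ['messi1@gmail.com', 'messi12@gmail.com', 'messi21@gmail.com']

# Print the score of every pair so it can be compared against a threshold.
for i in range(len(emails)):
    for j in range(i + 1, len(emails)):
        score = ngram_cosine(emails[i], emails[j])
        print(emails[i], emails[j], round(score, 3))
```

Even with this unweighted measure, the pair 'messi12@gmail.com' / 'messi21@gmail.com' scores below 0.8, because swapping two characters changes several trigrams at once. A few changed trigrams matter a lot in strings this short.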
Yes, I agree with you. My threshold right now is 70%. Thank you for the answer.
Dear developer,
Could you explain the differences between the versions of string_grouper?
I only use one function, group_similar_strings. I am currently using version 0.1.1, but the latest version is now 0.6.1.
This library is very helpful, but when I use group_similar_strings with a custom similarity threshold, the result sometimes misses a group that I would expect when checking by eye.
Is it worth upgrading to the latest version? What are the improvements?
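For reference, a small sketch of how the installed version could be confirmed before and after an upgrade. It uses only the standard library (Python 3.8+). The helper name `installed_version` is made up for this example, and the distribution name passed to it is an assumption about how the package is registered in your environment.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Assumed distribution name; adjust if your environment lists it differently.
print(installed_version("string_grouper"))
```

Running this before and after upgrading makes it easy to tell which release produced a given grouping when comparing results.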