Fix failing unit test for market name #24

Open
mre opened this issue Jun 2, 2019 · 3 comments

Comments

Member

mre commented Jun 2, 2019

In #23 (comment), @kiwita88 found the likely reason why our unit test for the market name 'p e n ny' fails. We should fix that.


rayrrr commented Sep 7, 2019

This buggy feature currently relies on difflib. The matching algorithm behind difflib is based on Ratcliff-Obershelp and seems to be generic with regard to data type (binary data, strings, etc.). There are algorithms better suited to fuzzy string similarity, such as Levenshtein distance. I believe switching algorithms is the best solution here.

What do you think, @mre? I could be convinced to write up a PR if no other contributor can. If you're comfortable adding a dependency, it might make sense to lean on https://github.com/seatgeek/fuzzywuzzy for this too.
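For reference, here is a minimal sketch of how a difflib-based similarity score can be computed for the failing input. The helper name `ratio` and the candidate market names other than 'penny' are assumptions for illustration; the project's actual matching code may look different.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> float:
    """Return difflib's similarity score for two strings, in [0.0, 1.0]."""
    # The first argument is the "junk" predicate; None disables it.
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical candidate list; only 'penny' comes from the issue itself.
candidates = ["penny", "rewe", "aldi"]

for name in candidates:
    # Print each score so the ranking difflib produces can be inspected.
    print(name, round(ratio("p e n ny", name), 3))
```

Printing the scores this way makes it easier to see whether the extra spaces in 'p e n ny' push the score below whatever threshold the feature applies.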


rayrrr commented Sep 7, 2019

Update: the existing packages I mentioned are GPLv2-licensed, which may not be desired, so perhaps a direct implementation of the Levenshtein algorithm could be added for this feature instead. Plenty of inspiration is available.
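A direct implementation could look roughly like the sketch below. It uses the standard two-row dynamic-programming formulation of Levenshtein distance; the function names `levenshtein` and `similarity`, and the normalization by the longer string's length, are assumptions for illustration and not part of the project.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop to limit row size.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Map the distance to a score in [0.0, 1.0]; 1.0 means identical.
    Dividing by the longer length is one common choice, assumed here."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("p e n ny", "penny"))  # the three spaces must be deleted
print(similarity("p e n ny", "penny"))
```

Whether a score like this fixes the failing test depends on the threshold the feature uses, so it would need to be checked against the existing unit tests.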

Member Author

mre commented Sep 8, 2019

Hey @rayrrr,
thanks for your input. Yes, switching to Levenshtein would be worth a try. Whether we use a library or not doesn't matter to me. Also GPLv2 is fine in my book.
So if you like and you find the time, please go ahead and whip up a PR for this. 👍
