Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Damerau-Levenshtein #152

Open
Entomy opened this issue Aug 23, 2020 · 1 comment
Open

Damerau-Levenshtein #152

Entomy opened this issue Aug 23, 2020 · 1 comment
Labels
🛠 Enhancement New feature or request 👨🏻‍🎓 Good First Issue Good for newcomers 🆘 Help Wanted Extra attention is needed
Milestone

Comments

@Entomy
Copy link
Owner

Entomy commented Aug 23, 2020

Damerau-Levenshtein is an extension of the Levenshtein edit-distance algorithm to additionally support basic transposition. As such, it gives noticeably better results, and should be implemented and preferentially used.

@Entomy Entomy transferred this issue from another repository Aug 29, 2020
@Entomy Entomy changed the title Damerau-Levenshtein Metrics: Damerau-Levenshtein Aug 29, 2020
@Entomy Entomy transferred this issue from Entomy/LibLangly Sep 28, 2020
@Entomy Entomy changed the title Metrics: Damerau-Levenshtein Damerau-Levenshtein Sep 28, 2020
Repository owner deleted a comment from issue-label-bot bot Sep 28, 2020
Repository owner deleted a comment from issue-label-bot bot Sep 28, 2020
Repository owner deleted a comment from issue-label-bot bot Sep 28, 2020
@Entomy Entomy transferred this issue from another repository Nov 4, 2020
@Entomy Entomy added 🆘 Help Wanted Extra attention is needed 👨🏻‍🎓 Good First Issue Good for newcomers 🛠 Enhancement New feature or request labels Nov 4, 2020
@Entomy
Copy link
Owner Author

Entomy commented Nov 4, 2020

I'm copying these over from #153

Damerau-Levenshtein edit-distance is probably the best compromise between accuracy and performance for general purpose use (other edit-distance algorithms should still be implemented). However there's two pitfalls when finding a good implementation for this. As a result, lifting an existing implementation isn't likely.

  1. The overwhelming majority of implementations actually implement Optimal String Alignment, which doesn't return the same results. OSA is not the same algorithm, although it is very similar.

  2. The few that implement DL implement it incorrectly. I've emailed the authors about this, and have received no response within a week, so I'm not particularly hopeful about that getting fixed.

So pretty par-for-the-norm with Comp. Sci. stuff.

These may be useful as far as a good implementation goes.

Hyyrö, H. A Bit-Vector Algorithm for Computing Levenshtein and Damerau Edit Distances. Nord. J. Comput. 2002.

Balhaf, K. Alsmirat, M. et al. Accelerating Levenshtein and Damerau edit distance algorithms using GPU with unified memory. ICICS. 2017.

@Entomy Entomy added this to the v5.3 milestone Aug 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🛠 Enhancement New feature or request 👨🏻‍🎓 Good First Issue Good for newcomers 🆘 Help Wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

1 participant