Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TrueDamerauLevenshtein giving wrong results #3

Open
stapelberg opened this issue Apr 2, 2020 · 6 comments
Open

TrueDamerauLevenshtein giving wrong results #3

stapelberg opened this issue Apr 2, 2020 · 6 comments

Comments

@stapelberg
Copy link

I noticed that these 2 Go libraries differ in results for the same input:

(stringdist.TrueDamerauLevenshtein).Calculate(4XHYWD, YLKTW9) = 6

tdl.Distance(4XHYWD, YLKTW9) = 5

Looking at the Perl implementation Text::Levenshtein::Damerau, both implementations (pure perl and XS) return 5:

% perl -MText::Levenshtein::Damerau -E 'say Text::Levenshtein::Damerau::edistance("4XHYWD", "YLKTW9")'
5

…hence, I think the issue lies within your implementation. Could you take a look please?

PS: the edge case where s == t does not return 0 in your implementation either. Not sure if it’s the same bug or a separate issue.

@alextanhongpin
Copy link
Owner

Yup, the implementation was wrong, I've corrected it and added the case above in the tests too.

@stapelberg
Copy link
Author

Thanks for the fix! Unfortunately, the algorithm still doesn’t seem entirely correct (tested with v0.0.2-0.20200403031649-7456955b3f48). Here are a few more samples for input where it spits out the wrong result:

params: (3AWPLE, 9C9TEL), stringdist = 6, tdl = 5
params: (3C494W, CD3L93), stringdist = 6, tdl = 5
params: (3C494W, H4CEHY), stringdist = 6, tdl = 5
params: (3KYPL4, 4EDC4L), stringdist = 6, tdl = 5
params: (3TEALX, 9ET397), stringdist = 6, tdl = 5
params: (3XEKCD, X3EHPT), stringdist = 5, tdl = 4
params: (3YH7KD, 9XEK7C), stringdist = 6, tdl = 5
params: (3YPXC4, CPYXAP), stringdist = 5, tdl = 4
params: (4EDC4L, H4CEHY), stringdist = 6, tdl = 5
params: (4EDC4L, XD7EE4), stringdist = 6, tdl = 5
params: (4EDC4L, YH3WL4), stringdist = 6, tdl = 5
params: (73YPWE, TTPEAW), stringdist = 6, tdl = 5
params: (79XL7H, 97K4YY), stringdist = 6, tdl = 5
params: (79XL7H, EKLXCW), stringdist = 6, tdl = 5
params: (79XL7H, PP47LT), stringdist = 6, tdl = 5
params: (7WWEDC, 9L7CCD), stringdist = 6, tdl = 5
params: (7YHX3E, Y7HLCX), stringdist = 5, tdl = 4
params: (93APH4, X3EHPT), stringdist = 5, tdl = 4
params: (93APH4, YE9A3P), stringdist = 6, tdl = 5
params: (97D79H, T79977), stringdist = 5, tdl = 4
params: (9CXY4L, 97K4YY), stringdist = 5, tdl = 4
params: (9CXY4L, YH3WL4), stringdist = 6, tdl = 5
params: (9L7CCD, L9YTHX), stringdist = 6, tdl = 5
params: (APCL4W, XCPKKH), stringdist = 6, tdl = 5
params: (AX47P3, YE9A3P), stringdist = 6, tdl = 5
params: (CD3L93, 3C494W), stringdist = 6, tdl = 5
params: (CLYALY, CYLHKE), stringdist = 5, tdl = 4
params: (CPKAEX, WADYXE), stringdist = 6, tdl = 5
params: (DCKYYH, CD3L93), stringdist = 6, tdl = 5
params: (E3DL49, CD3L93), stringdist = 5, tdl = 4
params: (ECAYTL, PP47LT), stringdist = 6, tdl = 5
params: (EK7WWK, 97K4YY), stringdist = 6, tdl = 5
params: (H34LL4, 4EDC4L), stringdist = 6, tdl = 5
params: (H4CEHY, 3C494W), stringdist = 6, tdl = 5
params: (H4CEHY, 4EDC4L), stringdist = 6, tdl = 5
params: (H4TW4D, 4XHYWD), stringdist = 5, tdl = 4
params: (H4TW4D, 7CTDD4), stringdist = 5, tdl = 4
params: (KH37YW, HKA447), stringdist = 6, tdl = 5
params: (KWCDWX, 4EDC4L), stringdist = 6, tdl = 5
params: (L3C7LP, 3LL3LK), stringdist = 5, tdl = 4
params: (L9YTHX, 9L7CCD), stringdist = 6, tdl = 5
params: (LH7AWK, Y7HLCX), stringdist = 6, tdl = 5
params: (LLPY7K, CYLHKE), stringdist = 6, tdl = 5
params: (LTE7PD, 9ET397), stringdist = 6, tdl = 5
params: (LTE7PD, XD7EE4), stringdist = 6, tdl = 5
params: (PDDH9W, YLKTW9), stringdist = 6, tdl = 5
params: (PLKHL9, EKLXCW), stringdist = 6, tdl = 5
params: (PPCWXC, XCPKKH), stringdist = 6, tdl = 5
params: (PPCWXC, Y7HLCX), stringdist = 6, tdl = 5
params: (T7LE94, 9L7CCD), stringdist = 6, tdl = 5
params: (TCD74H, 93APH4), stringdist = 6, tdl = 5
params: (W7LXKK, 9L7CCD), stringdist = 6, tdl = 5
params: (WEHETK, 9C9TEL), stringdist = 6, tdl = 5
params: (WEHETK, CYLHKE), stringdist = 6, tdl = 5
params: (WT9Y3H, 9ET397), stringdist = 6, tdl = 5
params: (WXPWY7, 4XHYWD), stringdist = 5, tdl = 4
params: (X3EHPT, 93APH4), stringdist = 5, tdl = 4
params: (X9E7CE, 9XEK7C), stringdist = 4, tdl = 3
params: (X9E7CE, YE9A3P), stringdist = 6, tdl = 5
params: (XD7EE4, 4EDC4L), stringdist = 6, tdl = 5
params: (Y3EE7X, 9ET397), stringdist = 6, tdl = 5
params: (Y7HLCX, LH7AWK), stringdist = 6, tdl = 5
params: (YC4KPT, 97K4YY), stringdist = 6, tdl = 5
params: (YC4KPT, CPYXAP), stringdist = 6, tdl = 5
params: (YC4KPT, H4CEHY), stringdist = 6, tdl = 5
params: (YE9A3P, 93APH4), stringdist = 6, tdl = 5
params: (YEP9L4, 4EDC4L), stringdist = 5, tdl = 4
params: (YEP9L4, CD3L93), stringdist = 6, tdl = 5
params: (YH3WL4, 4EDC4L), stringdist = 6, tdl = 5
params: (YPCAE3, XCPKKH), stringdist = 6, tdl = 5
params: (XD7EE4, 4EDC4L), stringdist = 6, tdl = 5

@alextanhongpin
Copy link
Owner

Out of curiosity, I tried finding other calculators in the wild to check the results, and it seems to match mine...

https://planetcalc.com/1721/
https://www.dcode.fr/levenshtein-distance

Take for example

params: (3AWPLE, 9C9TEL), stringdist = 6, tdl = 5

Both links above returned 6 too.

@alextanhongpin
Copy link
Owner

I believe the difference lies in this line:

https://github.com/lmas/Damerau-Levenshtein/blob/f2fb5f81ec65f73ee836c2adc53e7fa9a3e46bb9/damerau-levenshtein.go#L69

It seems like the author forgot to add 1 for each index. Before:

d[k][l]+(i-k-1)+1+(j-l-1), // transposition

After:

d[k][l]+(i-k)+1+(j-l), // transposition

@stapelberg
Copy link
Author

However:

% perl -MText::Levenshtein::Damerau -E 'say Text::Levenshtein::Damerau::edistance("3AWPLE", "9C9TEL")'        
5

Are you saying the Perl implementation is also incorrect?

@lmas
Copy link

lmas commented Jun 7, 2020

https://planetcalc.com/1721/
https://www.dcode.fr/levenshtein-distance

These links only calculates the levenshtein distance and won't handle transpositions.

For fun I ran your code against my tests:

--- FAIL: TestSimpelWordGroups (0.00s)
    damerau-levenshtein_test.go:24: expected score == 1, got 2 (a='azrety', b='azerty')
    damerau-levenshtein_test.go:24: expected score == 4, got 5 (a='beak', b='water')
    damerau-levenshtein_test.go:24: expected score == 1, got 2 (a='teh', b='the')
    damerau-levenshtein_test.go:24: expected score == 1, got 2 (a='tets', b='test')
    damerau-levenshtein_test.go:24: expected score == 1, got 2 (a='fuor', b='four')

I'm fairly sure your true_damerau_levenshtein implementation isn't handling transpositions correctly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants