string_distance('method',str1, str2) as a generic distance function #7324

Closed
simonaubertbd opened this issue Sep 14, 2022 · 5 comments

@simonaubertbd

Is your feature request related to a problem? Please describe.
I would like to use several different methods to calculate the distance between two strings:
- Levenshtein (already exists)
- Hamming
- Jaro-Winkler
- Damerau-Levenshtein
- ...

Describe the solution you'd like
A generic function string_distance('method', str1, str2) that reports a useful error message about the supported methods when the given method is invalid.

Describe alternatives you've considered
Several separate functions: that seems like too much for simple usage.

Best regards,

Simon

@mvdvm
Contributor

mvdvm commented Sep 15, 2022

To extend MonetDB with more string distance functions, you can create a user-defined function (in SQL, C, or Python). See https://www.monetdb.org/documentation/user-guide/sql-programming/function-definitions/
This is also possible for the proposed generic function string_distance('method', str1, str2).

If you require this feature to be implemented natively in the MonetDB source code (like levenshtein()), please contact MonetDB Solutions.
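
As a rough illustration of the generic function proposed in this issue, here is a minimal plain-Python sketch of a dispatcher. It is not MonetDB code and not a MonetDB UDF definition; the helper names `hamming`, `levenshtein`, and `_METHODS` are hypothetical, and only two of the requested methods are included to keep it short. Logic along these lines could serve as the body of a user-defined function.

```python
from typing import Callable, Dict


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("hamming distance requires strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


# Hypothetical registry: method name -> implementation.
_METHODS: Dict[str, Callable[[str, str], int]] = {
    "hamming": hamming,
    "levenshtein": levenshtein,
}


def string_distance(method: str, str1: str, str2: str) -> int:
    """Dispatch on the method name; list the valid names on error."""
    try:
        fn = _METHODS[method.lower()]
    except KeyError:
        valid = ", ".join(sorted(_METHODS))
        raise ValueError(
            f"unknown method {method!r}; expected one of: {valid}"
        ) from None
    return fn(str1, str2)


print(string_distance("levenshtein", "kitten", "sitting"))  # 3
print(string_distance("hamming", "karolin", "kathrin"))     # 3
```

An unknown method name such as `'soundex'` raises a `ValueError` whose message lists the registered methods, which is the kind of "useful error message" the request asks for.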

@mvdvm mvdvm added the enhancement New feature or request label Sep 15, 2022
@sjoerdmullender
Member

Some or all of these (or something closely related) are now in the Jun2023 releases. See the release notes at https://www.monetdb.org/release-notes/jun2023/

@njnes njnes closed this as completed Oct 28, 2023
@simonaubertbd
Author

Hello @njnes and @sjoerdmullender. First of all, thanks, it's great to see you take user feedback into account. However, I'm a little surprised by what I read in the release notes:
"Renamed previous Levenshtein distance to Damerau-Levenshtein distance."

Levenshtein and Damerau-Levenshtein are two different things. Does this mean the old levenshtein distance function was actually a Damerau-Levenshtein implementation? I'm ashamed I didn't even notice.

Best regards,

Simon

@njnes
Contributor

njnes commented Oct 29, 2023

When I look back at the older branches, I do see that the levenshtein function we had at that time already included transpositions, which I think is the main difference between Levenshtein and Damerau-Levenshtein. So yes, we used to have the incorrect (or more advanced) implementation, and the rename is correct.
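
To make the transposition point concrete, here is a small Python sketch that compares the two distances. It is an illustration only, not MonetDB's implementation. For the Damerau-Levenshtein side it uses the optimal string alignment variant (sometimes called restricted Damerau-Levenshtein), which is an assumption, since the thread does not say which variant MonetDB uses. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Insertions, deletions, and substitutions only."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment: Levenshtein plus adjacent transpositions."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Swapping two adjacent characters counts as a single edit.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(levenshtein("form", "from"))   # 2: two substitutions
print(osa_distance("form", "from"))  # 1: one transposition
```

The two functions agree whenever no adjacent swap helps (for example, "kitten" and "sitting" give 3 under both), so a difference only shows up on inputs containing transposed neighbors. That may explain how the earlier naming went unnoticed.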

@sjoerdmullender sjoerdmullender added this to the NEXTRELEASE milestone Nov 3, 2023
@simonaubertbd
Author

@njnes Thanks for the clarification!
