This repository has been archived by the owner on May 1, 2023. It is now read-only.

This is not the Dice coefficient #114

Open
vibl opened this issue Jun 26, 2021 · 1 comment

Comments

vibl commented Jun 26, 2021

Your algorithm is not the Dice coefficient. It counts every occurrence of a repeated bigram, whereas the Dice coefficient, as defined on Wikipedia, counts only distinct bigrams.
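
To make the difference concrete, here is a minimal sketch of the two ways of counting. The function names (`bigrams`, `diceDistinct`, `diceWithDuplicates`) are hypothetical and not part of this library's API, the empty-input return value of 0 is an assumption, and the duplicate-counting variant only approximates the behavior described above rather than reproducing the library's code.

```javascript
// Sketch only: helper names are hypothetical, not part of string-similarity's API.

// Collect every overlapping two-character substring, in order.
function bigrams(str) {
  const out = [];
  for (let i = 0; i < str.length - 1; i++) {
    out.push(str.substring(i, i + 2));
  }
  return out;
}

// Set-based Dice: each distinct bigram counts once per string.
function diceDistinct(a, b) {
  const setA = new Set(bigrams(a));
  const setB = new Set(bigrams(b));
  // Assumption: strings with no bigrams at all score 0.
  if (setA.size + setB.size === 0) return 0;
  let shared = 0;
  for (const g of setA) {
    if (setB.has(g)) shared++;
  }
  return (2 * shared) / (setA.size + setB.size);
}

// Multiset variant: a repeated bigram is matched once per occurrence.
function diceWithDuplicates(a, b) {
  const listA = bigrams(a);
  const listB = bigrams(b);
  // Assumption: strings with no bigrams at all score 0.
  if (listA.length + listB.length === 0) return 0;
  const counts = new Map();
  for (const g of listA) counts.set(g, (counts.get(g) || 0) + 1);
  let shared = 0;
  for (const g of listB) {
    const remaining = counts.get(g) || 0;
    if (remaining > 0) {
      counts.set(g, remaining - 1); // consume one occurrence from the first string
      shared++;
    }
  }
  return (2 * shared) / (listA.length + listB.length);
}
```

With these definitions, `diceDistinct("aaaa", "aa")` is 1 because both strings contain only the bigram `aa`, while `diceWithDuplicates("aaaa", "aa")` is 0.5 because the first string contributes three occurrences of `aa` to the denominator. For inputs without repeated bigrams, such as `"night"` and `"nacht"`, both functions agree (0.25).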

As an example, let's compare two versions of the main file of this repo: an earlier one (https://github.com/aceakash/string-similarity/blob/2718c82bbbf5190ebb8e9c54d4cbae6d1259527a/compare-strings.js) and the latest (https://github.com/aceakash/string-similarity/blob/eaeec5d74c98a6f6fcb1b06fad44ad7f3d8c2965/src/index.js). Their Dice coefficient is 0.90, but string-similarity outputs 0.74 when comparing these two files.

Please have a look at the implementations in Talisman and NLTK, or at the implementations in many languages at https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Dice%27s_coefficient

@aimeeaidanu

frr bruh like "dollar" and "money" return a match of 0 :((( like dawg I want semantic similarity who needs string similarity anyways 🤷
