Skip to content

darwinagain/fuzzy-match

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Fuzzy-Match

Fuzzy string matching in Python. By default it uses Trigrams to calculate a similarity score and find matches by splitting strings into ngrams with a length of 3. The length of the ngram can be altered if desired. Cosine, Levenshtein Distance, and Jaro-Winkler Distance algorithims are also available as alternatives.

Usage

>>> from fuzzy_match import match
>>> from fuzzy_match import algorithims

Trigram

>>> algorithims.trigram("this is a test string", "this is another test string")
    0.703704

Cosine

>>> algorithims.cosine("this is a test string", "this is another test string")
    0.7999999999999998

Levenshtein

>>> algorithims.levenshtein("this is a test string", "this is another test string")
    0.7777777777777778

Jaro-Winkler

>>> algorithims.jaro_winkler("this is a test string", "this is another test string")
    0.798941798941799

Match

>>> choices = ["simple strings", "strings are simple", "sim string", "string to match", "matching simple strings", "matching strings again"]
>>> match.extract("simple string", choices, limit=2)
    [('simple strings', 0.8), ('sim string', 0.642857)]
>>> match.extractOne("simple string", choices)
    ('simple strings', 0.8)

You can also pass additional arguments to extract and extractOne to set a score cutoff value or use one of the other algorithims mentioned above. Here is an example:

>>> match.extract("simple string", choices, match_type='levenshtein', score_cutoff=0.7)
    [('simple strings', 0.9285714285714286), ('sim string', 0.7692307692307693)]

match_type options include trigram, cosine, levenshtein, jaro_winkler

About

Fuzzy string matching in Python

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages