# Fuzzy Wuzzy in Python

FuzzyWuzzy is a Python library, developed and open-sourced by SeatGeek, that uses Levenshtein distance to measure the differences between strings.

To install the library, you can use pip:
pip install fuzzywuzzy
pip install python-Levenshtein

#### First, we import the fuzzywuzzy modules:

In [2]:
from fuzzywuzzy import fuzz 
from fuzzywuzzy import process

Now we can get the similarity score of two strings using one of two methods, ratio() or partial_ratio():

In [21]:
fuzz.ratio("1st Advantage bank","1st Advantage")

84

In [22]:
fuzz.partial_ratio("1st Advantage bank","1st Advantage")

100

We see that fuzz.partial_ratio() gives 100 while fuzz.ratio() gives only 84. This is because fuzz.ratio() compares the two complete strings, so the extra " bank" in the first string counts against the score. It computes a difflib-style similarity ratio, 2*M/T, where M is the number of matching characters and T is the combined length of both strings: here 2*13/31, which rounds to 84. fuzz.partial_ratio() instead takes the shorter string, which in this case is "1st Advantage" (length 13), and compares it against substrings of the same length in "1st Advantage bank". One of those substrings is "1st Advantage" itself, which gives 100. You can play around with the strings until you get the gist.
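To make the difference concrete, here is a minimal sketch of the two ideas using only the standard-library difflib module. The helper names `simple_ratio` and `sliding_partial_ratio` are made up for this illustration, and the brute-force sliding window is a simplification rather than FuzzyWuzzy's actual implementation; it only shows why comparing whole strings scores lower than the best same-length substring.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """Similarity of two whole strings on a 0-100 scale (2*M/T, rounded)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def sliding_partial_ratio(a: str, b: str) -> int:
    """Best score of the shorter string against every same-length
    window of the longer string (hypothetical helper, brute force)."""
    shorter, longer = sorted((a, b), key=len)
    n = len(shorter)
    return max(
        simple_ratio(shorter, longer[i:i + n])
        for i in range(len(longer) - n + 1)
    )


print(simple_ratio("1st Advantage bank", "1st Advantage"))           # 84
print(sliding_partial_ratio("1st Advantage bank", "1st Advantage"))  # 100
```

The first window of "1st Advantage bank" is exactly "1st Advantage", so the sliding comparison finds a perfect match even though the full strings differ.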

What if we switch the order of the words in one string? In the following example, I’ve changed “1st Advantage” to “Advantage 1st”. Let’s see the scores:

In [14]:
fuzz.ratio("1st Advantage Bank", "Advantage 1st")

65

In [15]:
fuzz.partial_ratio("1st Advantage Bank", "Advantage 1st")

77

We see that both methods give low scores. This can be rectified by using the token_sort_ratio() method, which attempts to account for similar strings whose words are out of order. For example, if we compare the above strings again with token_sort_ratio(), we get the following:

In [25]:
fuzz.token_sort_ratio("1st Advantage Bank", "Advantage 1st")

84
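The idea behind token sorting can be sketched with the standard library alone. This is an illustration under simplifying assumptions, not FuzzyWuzzy's code: the helper names `sort_tokens` and `token_sort_sketch` are hypothetical, and the preprocessing here only lowercases and splits on whitespace, whereas the library also cleans up punctuation.

```python
from difflib import SequenceMatcher


def sort_tokens(s: str) -> str:
    """Lowercase, split on whitespace, and rejoin the words in sorted order."""
    return " ".join(sorted(s.lower().split()))


def token_sort_sketch(a: str, b: str) -> int:
    """Compare two strings after putting their words in a canonical order."""
    matcher = SequenceMatcher(None, sort_tokens(a), sort_tokens(b))
    return round(100 * matcher.ratio())


print(sort_tokens("Advantage 1st"))                              # 1st advantage
print(token_sort_sketch("1st Advantage Bank", "Advantage 1st"))  # 84
```

Once both strings are sorted, "Advantage 1st" becomes "1st advantage", so word order no longer lowers the score; only the extra word "bank" does.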

But what happens if one of the strings contains a repeated word?

In [30]:
fuzz.token_sort_ratio("1st Advantage bank","Advantage 1st Advantage")

78

To handle that, there is one more method, fuzz.token_set_ratio(), which treats repeated words as the same. Let's see:

In [31]:
fuzz.token_set_ratio("1st Advantage Bank", "Advantage 1st Advantage")

100
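A rough sketch of the set-based approach, again using only difflib, may help explain the score of 100. The names `ratio` and `token_set_sketch` are hypothetical, and the preprocessing (lowercase, whitespace split) is a simplifying assumption; the general idea is to split the words into those shared by both strings and those unique to each, then take the best of a few pairwise comparisons.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> int:
    """Similarity of two strings on a 0-100 scale."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_set_sketch(a: str, b: str) -> int:
    """Set-based comparison: duplicates collapse because tokens go into sets."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())

    # Words shared by both strings, and words unique to each one.
    common = " ".join(sorted(tokens_a & tokens_b))
    only_a = " ".join(sorted(tokens_a - tokens_b))
    only_b = " ".join(sorted(tokens_b - tokens_a))

    # Shared words followed by each string's leftovers.
    combined_a = f"{common} {only_a}".strip()
    combined_b = f"{common} {only_b}".strip()

    # Keep the best of the three pairwise comparisons.
    return max(
        ratio(common, combined_a),
        ratio(common, combined_b),
        ratio(combined_a, combined_b),
    )


print(token_set_sketch("1st Advantage Bank", "Advantage 1st Advantage"))  # 100
```

Here the second string reduces to the set {"1st", "advantage"}, which is exactly the shared part, so one of the comparisons is a perfect match.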

The FuzzyWuzzy library is built on top of the difflib library, and python-Levenshtein is used for speed. That makes it a convenient option for fuzzy string matching in Python. Depending on the use case, we can pick whichever method works best for the scenario.