# String matching

In this notebook we use the popular string matching library [fuzzywuzzy](https://github.com/seatgeek/fuzzywuzzy). For more information on the different methods available and their differences, see the blog post [FuzzyWuzzy: Fuzzy String Matching in Python](https://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/).

## 1. Import

In [1]:
from fuzzywuzzy import fuzz, process

## 2. Example

In [2]:
berlin = ['Berlin, Germany', 
          'Berlin, Deutschland', 
          'Berlin', 
          'Berlin, DE']

## String similarity

The similarity score of the first two strings, `'Berlin, Germany'` and `'Berlin, Deutschland'`, seems low:

In [3]:
fuzz.ratio(berlin[0], berlin[1])

65
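
To see where this score comes from, here is a minimal sketch that uses only the standard library. It assumes fuzzywuzzy's pure-Python fallback, in which `fuzz.ratio` is built on `difflib.SequenceMatcher`; with the optional `python-Levenshtein` backend installed, scores can differ slightly. The helper name `simple_ratio` is hypothetical and not part of fuzzywuzzy.

```python
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    """Return a 0-100 similarity score (hypothetical helper, not fuzzywuzzy API)."""
    # SequenceMatcher.ratio() is 2 * M / T, where M is the number of matching
    # characters and T is the combined length of both strings.
    matcher = SequenceMatcher(None, s1, s2)
    return int(round(100 * matcher.ratio()))


print(simple_ratio('Berlin, Germany', 'Berlin, Deutschland'))
```

The two strings share `'Berlin, '` plus a few scattered characters, 11 matching characters out of 34 in total, so the ratio is 2 · 11 / 34 ≈ 0.65.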

## Partial string similarity

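Inconsistent substrings are a common problem: one string may contain the other plus extra text. To get around this, fuzzywuzzy uses a heuristic called _best partial_, which scores the shorter string against the best-matching substring of the same length in the longer string. It does not help for this pair, because the two strings differ in the country name rather than in length: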

In [4]:
fuzz.partial_ratio(berlin[0], berlin[1])

60
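
The idea behind _best partial_ can be sketched as a sliding window. This is a simplified illustration, not fuzzywuzzy's implementation: it tries every window instead of only those suggested by matching blocks, and the name `simple_partial_ratio` as well as the choice to return 0 for an empty string are assumptions.

```python
from difflib import SequenceMatcher


def simple_partial_ratio(s1: str, s2: str) -> int:
    """Best score of the shorter string against any equally long slice of the longer one."""
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    if not shorter:
        return 0  # assumption: treat an empty input as "no match"
    best = 0.0
    # Slide a window of len(shorter) characters across the longer string.
    for start in range(len(longer) - len(shorter) + 1):
        window = longer[start:start + len(shorter)]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return int(round(100 * best))


print(simple_partial_ratio('Berlin', 'Berlin, Deutschland'))  # exact substring: 100
print(simple_partial_ratio('Berlin, Germany', 'Berlin, Deutschland'))
```

A partial score pays off when one string is contained in the other, as in the first call; for two strings of similar length it stays close to the plain ratio.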

## Token sorting

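With token sorting, each string is split into tokens (words), the tokens are sorted alphabetically and then rejoined into a single string before the comparison, so word order no longer matters. The example below uses the related `token_set_ratio`, which also compares the set of tokens that both strings share, so `'Berlin'` fully matches `'Berlin, Deutschland'`: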

In [5]:
fuzz.ratio(berlin[1], berlin[2])

48

In [6]:
fuzz.token_set_ratio(berlin[1], berlin[2])

100
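
Both token-based approaches can be sketched with the standard library. This is an approximation under stated assumptions: the preprocessing (lowercasing, treating non-word characters as separators) only mimics fuzzywuzzy's default processor, and all function names here are hypothetical.

```python
import re
from difflib import SequenceMatcher


def _ratio(s1: str, s2: str) -> int:
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


def _tokens(text: str) -> list:
    # Approximate preprocessing: non-word characters become separators, then lowercase.
    return re.sub(r"\W+", " ", text).lower().split()


def simple_token_sort_ratio(s1: str, s2: str) -> int:
    # Sort the tokens of each string, rejoin them, then compare as usual.
    return _ratio(" ".join(sorted(_tokens(s1))), " ".join(sorted(_tokens(s2))))


def simple_token_set_ratio(s1: str, s2: str) -> int:
    t1, t2 = set(_tokens(s1)), set(_tokens(s2))
    common = " ".join(sorted(t1 & t2))
    # Shared tokens followed by the tokens unique to each string.
    combined_1 = (common + " " + " ".join(sorted(t1 - t2))).strip()
    combined_2 = (common + " " + " ".join(sorted(t2 - t1))).strip()
    return max(_ratio(common, combined_1),
               _ratio(common, combined_2),
               _ratio(combined_1, combined_2))


print(simple_token_sort_ratio('Germany, Berlin', 'Berlin Germany'))  # same tokens: 100
print(simple_token_set_ratio('Berlin, Deutschland', 'Berlin'))       # shared token set: 100
```

In the second call the only shared token is `berlin`, which is also the whole second string, so one of the three comparisons is an exact match.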

## Additional Information

In IPython or Jupyter, appending `?` to a name displays its signature and docstring:

In [7]:
fuzz.ratio?

## Extract from a list

In [8]:
choices = ['Germany',
           'Deutschland',
           'France', 
           'United Kingdom',
           'Great Britain', 
           'United States']

In [9]:
process.extract('DE', choices, limit=2)

[('Deutschland', 90), ('Germany', 45)]

In [10]:
process.extract('Vereinigtes Königreich', choices)

[('United Kingdom', 51),
 ('United States', 41),
 ('Germany', 39),
 ('Great Britain', 35),
 ('Deutschland', 31)]

In [11]:
process.extractOne('frankreich', choices)

('France', 62)

In [12]:
process.extractOne('U.S.', choices)

('United States', 86)
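
Conceptually, `process.extract` scores the query against every choice and returns the best results. The sketch below illustrates that ranking with a plain case-insensitive ratio. It is not equivalent to fuzzywuzzy, whose default scorer is the more elaborate `fuzz.WRatio`, so the scores will generally differ from the outputs above; the names `simple_score`, `simple_extract` and `simple_extract_one` are hypothetical.

```python
from difflib import SequenceMatcher


def simple_score(query: str, choice: str) -> int:
    # Case-insensitive plain ratio (assumption: a stand-in for fuzzywuzzy's default scorer).
    matcher = SequenceMatcher(None, query.lower(), choice.lower())
    return int(round(100 * matcher.ratio()))


def simple_extract(query, choices, limit=5):
    """Return up to `limit` (choice, score) pairs, best score first."""
    scored = [(choice, simple_score(query, choice)) for choice in choices]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


def simple_extract_one(query, choices):
    """Return the single best (choice, score) pair, or None if there are no choices."""
    best = simple_extract(query, choices, limit=1)
    return best[0] if best else None


choices = ['Germany', 'Deutschland', 'France',
           'United Kingdom', 'Great Britain', 'United States']
print(simple_extract_one('frankreich', choices))
```

With such a simple scorer, abbreviations like `'DE'` or `'U.S.'` rank poorly, which is one reason to prefer the library's own scorers in practice.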

## Known ports

FuzzyWuzzy has also been ported to other languages. Here are some known ports:

* Java: [xpresso](https://github.com/WantedTechnologies/xpresso)
* Java: [xdrop fuzzywuzzy](https://github.com/xdrop/fuzzywuzzy)
* Rust: [fuzzyrusty](https://github.com/logannc/fuzzyrusty)
* JavaScript: [fuzzball.js](https://github.com/nol13/fuzzball.js)
* C++: [tmplt fuzzywuzzy](https://github.com/Tmplt/fuzzywuzzy)
* C#: [FuzzySharp](https://github.com/BoomTownRoi/BoomTown.FuzzySharp)
* Go: [go-fuzzywuzzy](https://github.com/paul-mannino/go-fuzzywuzzy)
* Pascal: [FuzzyWuzzy.pas](https://github.com/DavidMoraisFerreira/FuzzyWuzzy.pas)
* Kotlin: [FuzzyWuzzy-Kotlin](https://github.com/willowtreeapps/fuzzywuzzy-kotlin)
* R: [fuzzywuzzyR](https://github.com/mlampros/fuzzywuzzyR)