# What is Fuzzy Matching
Fuzzy matching is a technique for comparing and matching two strings that are not exactly identical. The naive approach, checking two strings for equality, gives only a binary answer and says nothing about the degree to which the strings match. Fuzzy matching algorithms instead measure how similar two strings are, for example by counting the characters or words they share, or by looking for common patterns or sequences of characters. This more nuanced comparison makes it possible to identify matches even when the data contains slight differences or variations.

In [10]:
from thefuzz import fuzz # Python library for fuzzy string matching

string1 = "Hello World"
string2 = "hello world"

print(string1 == string2)  # True or False equality
print(f'The strings have a {fuzz.partial_ratio(string1,string2)}% similarity') # Degree to which we believe the strings match

False
The strings have a 82% similarity


The code block above shows the value of fuzzy matching. The strict equality check returns False because the strings differ only in capitalization, so on its own it would have discarded a pair of strings that are significantly similar.

# Use cases of Fuzzy Matching
- Clean Customer Data: Companies use fuzzy matching to find and merge duplicate customer records. This helps to keep the customer database accurate and up-to-date, which is important for sales and marketing efforts.

- Match Medical Records: Healthcare providers use fuzzy matching to match patient records across different hospitals or clinics. This helps to ensure that patients receive consistent and accurate medical care.

- Detect Fraud: Fuzzy matching can help to identify suspicious patterns in financial data, such as transactions that are similar but not identical. This can be used to detect potential fraud.

- Personalize Product Recommendations: E-commerce websites use fuzzy matching to recommend products to users based on their search history or past purchases. This can help to improve the user experience and increase sales.

- Verify Addresses: Postal services and shipping companies use fuzzy matching to verify and correct address information. This helps to ensure that packages are delivered to the correct address and reduces the risk of misdeliveries or lost packages.

# Approaches to computing similarity of two Strings
Let's consider two strings, String 1 and String 2. Their similarity can be estimated from:

- Number of edits required to transition from String 1 to String 2
    - Edits would consist of the number of character replacements, deletions or insertions.
- Common word counts/tallies
- Frequency of letters and phrases
- Longest common Substrings
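
The last three approaches in the list can be sketched with the Python standard library alone. This is only an illustration, not how any particular fuzzy matching library works internally: the function names `word_overlap`, `shared_letter_count`, and `longest_common_substring` are made up for this example, and the choice of Jaccard similarity for word overlap is an assumption.

```python
from collections import Counter
from difflib import SequenceMatcher


def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity of the lowercase word sets of a and b."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty strings are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


def shared_letter_count(a: str, b: str) -> int:
    """Number of letters the two strings have in common, counting repeats."""
    counts_a = Counter(c for c in a.lower() if c.isalpha())
    counts_b = Counter(c for c in b.lower() if c.isalpha())
    # Counter intersection keeps the minimum count of each letter
    return sum((counts_a & counts_b).values())


def longest_common_substring(a: str, b: str) -> str:
    """Longest contiguous run of characters that appears in both strings."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


print(word_overlap("New York City", "new york"))
print(shared_letter_count("listen", "silent"))
print(repr(longest_common_substring("fuzzy matching", "fuzzy searching")))
```

Each function captures a different notion of similarity, so in practice they tend to disagree: "listen" and "silent" share all six letters but have no words in common.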

## Overview of core Fuzzy Matching Algorithms 

### Levenshtein distance
The Levenshtein distance measures how different two words or strings are from each other. It counts the minimum number of single-character changes needed to turn one string into the other, where a change is deleting a letter, adding a letter, or replacing a letter. The fewer changes needed, the more similar the strings are. For example, turning "kitten" into "sitting" takes three changes: replace "k" with "s", replace "e" with "i", and add "g".
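
The counting described above can be sketched with a small dynamic programming routine. This is a minimal pure-Python version for illustration; the names `levenshtein` and `similarity` are chosen for this example, and the normalization by the longer string's length is one possible convention, not necessarily the formula used by `thefuzz`.

```python
def levenshtein(s1: str, s2: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn s1 into s2."""
    if len(s1) < len(s2):
        s1, s2 = s2, s1  # keep the shorter string in the inner loop

    # previous[j] = distance between the prefix of s1 seen so far and s2[:j]
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]  # turning s1[:i] into "" takes i deletions
        for j, c2 in enumerate(s2, start=1):
            insert = current[j - 1] + 1
            delete = previous[j] + 1
            substitute = previous[j - 1] + (c1 != c2)
            current.append(min(insert, delete, substitute))
        previous = current
    return previous[-1]


def similarity(s1: str, s2: str) -> float:
    """Distance rescaled to 0..1 (assumed convention: divide by the longer length)."""
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(s1, s2) / longest


print(levenshtein("kitten", "sitting"))            # 3
print(levenshtein("Hello World", "hello world"))   # 2
```

Only two rows of the table are kept at a time, so memory use grows with the length of the shorter string rather than with the product of both lengths.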

### Soundex Algorithm
Soundex is an algorithm that indexes words by their sound, so that similar-sounding words can be matched even if they are spelled differently. In its commonly used form, it keeps the first letter of a word, drops the remaining vowels (along with h, w, and y), and replaces each remaining consonant with a digit based on its sound. Consonants with similar sounds get the same digit, and adjacent letters with the same digit are coded only once. The result is padded or truncated to a letter followed by three digits, and two words are considered a match when their codes are equal. For example, "Robert" and "Rupert" both become R163.
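
A minimal sketch of the commonly described American Soundex rules follows. It is meant as an illustration rather than a reference implementation: the function name `soundex` and the decision to ignore non-letter characters are assumptions made for this example.

```python
# Consonant groups that share a Soundex digit
_GROUPS = (
    ("bfpv", "1"),
    ("cgjkqsxz", "2"),
    ("dt", "3"),
    ("l", "4"),
    ("mn", "5"),
    ("r", "6"),
)
_CODES = {ch: digit for letters, digit in _GROUPS for ch in letters}


def soundex(word: str) -> str:
    """Return the four-character Soundex code of word ("" if it has no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""

    first = letters[0]
    result = first.upper()
    prev_code = _CODES.get(first, "")

    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not break a run of same-coded letters
        code = _CODES.get(ch, "")  # vowels and y have no code
        if code and code != prev_code:
            result += code
        prev_code = code  # a vowel resets the run, so a repeated digit is allowed

    # Pad with zeros or truncate to one letter plus three digits
    return (result + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("Smith"), soundex("Smyth"))    # S530 S530
```

Because the code depends only on the first letter and the consonant groups that follow, Soundex works best on names and tends to be too coarse for general text.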