# Fuzzy matching with regex patterns

In [1]:
import regex
import random
import string

All you need to do is add a little syntax to your regex pattern to get fuzzy matching working (this uses the third-party `regex` module, not the built-in `re`):
- After a group, add `{e<2}` to allow matches with 0 or 1 errors (e.g. `(cat){e<2}` will match `bat`)
- There are different kinds of edits you can specify
   - i: insertion
   - d: deletion
   - s: substitution
   - e: any of the above
- You can also combine limits on several error types and even weight them, such as `{i<=2,d<=2,e<=3}` or `{2i+2d+1s<=4}`.
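
As a quick check of the constraint syntax listed above, the sketch below runs a few patterns through `regex.fullmatch`, so the whole string has to be accounted for. The test words (`bat`, `cart`, `ct`, `cats`, `bats`) and the cost budget of 2 are illustrative choices, not part of the original example.

```python
import regex

# Each entry: (pattern, text). fullmatch forces the whole text to be consumed,
# so any extra, missing, or changed character must be paid for as an error.
cases = [
    (r"(cat){e<2}", "bat"),           # one substitution: allowed
    (r"(cat){s<=1}", "bat"),          # substitutions only: allowed
    (r"(cat){i<=1}", "bat"),          # insertions only: a substitution is not permitted
    (r"(cat){i<=1}", "cart"),         # one inserted character: allowed
    (r"(cat){d<=1}", "ct"),           # one deleted character: allowed
    (r"(cat){2i+2d+1s<=2}", "cats"),  # one insertion costs 2: within budget
    (r"(cat){2i+2d+1s<=2}", "bats"),  # substitution + insertion costs 3: over budget
]

results = {}
for pattern, text in cases:
    m = regex.fullmatch(pattern, text)
    results[(pattern, text)] = m
    print(f"{pattern!r:28} {text!r:8} -> {'match' if m else 'no match'}")
```

Naming a single error type, as in `{i<=1}`, permits only that type, which is why `bat` is rejected there even though it is just one edit away from `cat`.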

Let's see it in action:

In [2]:
correct_string = "before 2018033101Vte0000007 after"
regex_pattern = r"(?b)((20[0-9]{2})(0[1-9]|1[0-2])(0[0-9]|1[0-9]|2[0-9]|3[0-1])([a-zA-Z0-9]{12})){e<4}"

# (?b) in the pattern already enables BESTMATCH, so no extra flags are needed
regex.search(regex_pattern, correct_string)

<regex.Match object; span=(7, 27), match='2018033101Vte0000007'>
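
The `(?b)` at the start of the pattern is the inline form of the `regex` module's `BESTMATCH` flag, which asks for the match with the fewest errors rather than the first candidate that fits within the limit. A small sketch of the difference, using a toy `dog` pattern that is not part of the original example:

```python
import regex

text = "cat and dog"

# Without BESTMATCH the engine may settle for an earlier candidate
# that still needs an error to fit.
first = regex.search(r"(dog){e<=1}", text)

# With (?b) it keeps looking for the candidate with the fewest errors.
best = regex.search(r"(?b)(dog){e<=1}", text)

print(repr(first[1]), first.fuzzy_counts)
print(repr(best[1]), best.fuzzy_counts)
```

With the flag, the exact `dog` at the end of the string is preferred. Without it, the returned span depends on the engine's search order and may include a stray character.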

In [24]:
regex.match(r"(\d){e<4}", "34")

<regex.Match object; span=(0, 2), match='34'>

In [20]:
import re
std_pattern = r"((20[0-9]{2})(0[1-9]|1[0-2])(0[0-9]|1[0-9]|2[0-9]|3[0-1])([a-zA-Z0-9]{12}))"
# re.match anchors at the start of the string, so test the bare ID here
re.match(std_pattern, "2018033101Vte0000007")

<_sre.SRE_Match object; span=(0, 20), match='2018033101Vte0000007'>

## More Rigorous Test

To test this more thoroughly, we'll apply some random edits to our string and see how the fuzzy matching holds up.

In [88]:
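def insert_space(word, loc):
    """Insert a space before position `loc` (0 = start, len(word) = end)."""
    return word[:loc] + " " + word[loc:]

def change_char(word, loc):
    """Replace the `loc`-th character (1-based); loc == 0 leaves the word unchanged."""
    if loc == 0:
        return word
    new_char = random.choice(string.digits + string.ascii_letters)
    return word[:loc - 1] + new_char + word[loc:]

def delete_char(word, loc):
    """Delete the `loc`-th character (1-based); loc == 0 leaves the word unchanged."""
    if loc == 0:
        # Without this guard, word[:-1] + word[0:] would nearly double the string
        return word
    return word[:loc - 1] + word[loc:]

edit_funcs = [insert_space, change_char, delete_char]

def messup_string(word, edits=2):
    """Apply `edits` random edits to `word`, logging each one."""
    for _ in range(edits):
        editor = random.choice(edit_funcs)
        # randint is inclusive on both ends, so edit_idx ranges over 0..len(word)
        edit_idx = random.randint(0, len(word))
        print(f'Running "{editor.__name__}" on char {edit_idx}')
        word = editor(word, edit_idx)
    return word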

## Match (3 Random Edits)

In [96]:
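# Edit the bare ID: regex.match anchors at the start, so the "before " prefix would block a match
edited_string = messup_string("2018033101Vte0000007", 3)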
regex.match(regex_pattern, edited_string)

Running "change_char" on char 7
Running "change_char" on char 12
Running "delete_char" on char 13


<regex.Match object; span=(0, 19), match='201803N101Vz0000007', fuzzy_counts=(1, 0, 1)>
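
The match reports only two errors (one substitution, one deletion) even though three edits were applied. The counts are relative to the pattern, not to the original string: the changed character `z` still satisfies `[a-zA-Z0-9]`, so it costs nothing. To compare against the edits actually applied, a plain Levenshtein distance can be computed with the standard library. The `levenshtein` helper below is a hypothetical addition for illustration, not something the `regex` module provides.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]

original = "2018033101Vte0000007"
matched = "201803N101Vz0000007"   # the match text shown in the output above
distance = levenshtein(original, matched)
print(distance)  # 3
```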

## No Match (too many edits)

In [97]:
# Edit the bare ID again, this time with one edit more than {e<4} allows
edited_string = messup_string("2018033101Vte0000007", 4)
regex.match(regex_pattern, edited_string)

Running "insert_space" on char 16
Running "change_char" on char 2
Running "insert_space" on char 15
Running "change_char" on char 19


In [7]:
bad_string = "foobizz"

pattern = r"(?b)(foobar){e<4}"

regex.fullmatch(pattern, bad_string)

<regex.Match object; span=(0, 7), match='foobizz', fuzzy_counts=(2, 1, 0)>
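
The `fuzzy_counts` tuple is ordered (substitutions, insertions, deletions), so `(2, 1, 0)` here means two substitutions and one insertion: three errors, which is the most that `{e<4}` allows. A small sketch that labels the counts and probes the boundary; the helper name `describe` and the extra test string `foobizzz` are illustrative assumptions:

```python
import regex

def describe(match):
    """Return the fuzzy error counts of a match as a labeled dict."""
    subs, ins, dels = match.fuzzy_counts
    return {"substitutions": subs, "insertions": ins, "deletions": dels}

pattern = r"(?b)(foobar){e<4}"   # e<4 means at most 3 errors in total

three_errors = regex.fullmatch(pattern, "foobizz")
four_errors = regex.fullmatch(pattern, "foobizzz")  # needs a fourth error

print(describe(three_errors))
print(four_errors)
```

Changing the limit to `{e<=4}` would be one way to let the longer string through, at the price of accepting looser matches everywhere else.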