# Levenshtein Distance
In a speech-to-text (STT) scenario, three different kinds of errors may occur:

1) Insertion (kitten -> kiatten)

2) Deletion (kitten -> kiten)

3) Substitution (kitten -> kisten)

The Levenshtein distance measures how different two strings are: it is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other.
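
The definition above can be written as a recurrence. The notation is an assumption chosen for illustration and does not come from the original text: $a$ and $b$ are the two strings, $a_i$ is the $i$-th character of $a$, and $D(i, j)$ is the distance between the first $i$ characters of $a$ and the first $j$ characters of $b$.

$$
D(i, 0) = i, \qquad D(0, j) = j
$$

$$
D(i, j) = \min
\begin{cases}
D(i-1, j) + 1 & \text{(delete } a_i \text{)} \\
D(i, j-1) + 1 & \text{(insert } b_j \text{)} \\
D(i-1, j-1) + [a_i \neq b_j] & \text{(substitute, or keep a match)}
\end{cases}
$$

Here $[a_i \neq b_j]$ is 1 if the two characters differ and 0 otherwise. The distance between the complete strings is $D(|a|, |b|)$.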

A detailed description of the Levenshtein distance can be found at:

https://en.wikipedia.org/wiki/Levenshtein_distance

In [1]:
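def levenshtein_distance(s1, s2):
    """Return the Levenshtein distance between the strings s1 and s2."""
    # Make s1 the longer string so that each row is only as long as the shorter one.
    if len(s1) < len(s2):
        return levenshtein_distance(s2, s1)

    # Turning a string into the empty string takes one deletion per character.
    if len(s2) == 0:
        return len(s1)

    # Row 0: distance from the empty prefix of s1 to each prefix of s2.
    previous_row = list(range(len(s2) + 1))

    for i, c1 in enumerate(s1):
        # First entry: distance from the first i + 1 characters of s1 to the empty string.
        current_row = [i + 1]

        for j, c2 in enumerate(s2):
            insertions = previous_row[j + 1] + 1
            deletions = current_row[j] + 1
            # A substitution costs nothing when the characters already match.
            substitutions = previous_row[j] + (c1 != c2)
            current_row.append(min(insertions, deletions, substitutions))

        previous_row = current_row

    # The last entry is the distance between the complete strings.
    return previous_row[-1]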

# Example
word1 = "kitten"
word2 = "sitting"
distance = levenshtein_distance(word1, word2)
print(f"The Levenshtein distance between '{word1}' and '{word2}' is {distance}.")

The Levenshtein distance between 'kitten' and 'sitting' is 3.
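
As a sanity check, the three error types listed at the top should each give a distance of 1. The following is a minimal sketch under stated assumptions, not part of the original notebook: `levenshtein_matrix` is a hypothetical helper that keeps the full table of prefix distances instead of only two rows, which uses more memory but makes the intermediate values easy to inspect.

```python
def levenshtein_matrix(source, target):
    """Return the full table of distances between all prefixes (hypothetical helper)."""
    rows, cols = len(source) + 1, len(target) + 1
    table = [[0] * cols for _ in range(rows)]

    # Distance from a prefix to the empty string is the prefix length.
    for i in range(rows):
        table[i][0] = i
    for j in range(cols):
        table[0][j] = j

    for i in range(1, rows):
        for j in range(1, cols):
            # Substitution is free when the two characters match.
            cost = 0 if source[i - 1] == target[j - 1] else 1
            table[i][j] = min(
                table[i - 1][j] + 1,         # drop a character of source
                table[i][j - 1] + 1,         # add a character of target
                table[i - 1][j - 1] + cost,  # substitute or keep
            )
    return table


# The three STT error examples from the list at the top.
examples = {
    "insertion": ("kitten", "kiatten"),
    "deletion": ("kitten", "kiten"),
    "substitution": ("kitten", "kisten"),
}

for kind, (reference, hypothesis) in examples.items():
    distance = levenshtein_matrix(reference, hypothesis)[-1][-1]
    print(f"{kind}: '{reference}' -> '{hypothesis}' has distance {distance}")
```

The bottom-right cell of the table holds the distance between the complete strings, so it should agree with the value returned by `levenshtein_distance` for the same pair of inputs.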
