<h1>Building a Levenshtein Distance Calculator</h1>

<h3>What is Levenshtein Distance?</h3>

In information theory, linguistics, and computer science, the Levenshtein distance is a string metric for measuring the difference between two sequences. 

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/d1/Levenshtein_distance_animation.gif/600px-Levenshtein_distance_animation.gif" alt="lev_dist_wiki">

Levenshtein distance is a lexical similarity measure that quantifies the distance between a pair of strings. It does so by counting the minimum number of single-character insertions, deletions, or substitutions needed to turn string 1 into string 2.
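
As a small illustration, the classic pair "kitten" and "sitting" (the example used on the Wikipedia page listed under Resources) can be transformed with three single-character edits. The snippet below is only a sketch that applies those edits by hand; the variable names `word` and `edits` are illustrative.

```python
word = "kitten"
edits = 0

word = "s" + word[1:]              # substitute 'k' -> 's': "sitten"
edits += 1
word = word[:4] + "i" + word[5:]   # substitute 'e' -> 'i': "sittin"
edits += 1
word = word + "g"                  # insert 'g' at the end: "sitting"
edits += 1

print(word, edits)  # sitting 3
```

No shorter sequence of edits exists for this pair, so its Levenshtein distance is 3.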

<h3>What is Lexical Similarity?</h3>

In linguistics, lexical similarity is a measure of the degree to which the word sets of two given languages are similar. A lexical similarity of 1 (or 100%) would mean a total overlap between vocabularies, whereas 0 means there are no common words.

<h3>Resources</h3>
<ol>
    <li><a href="https://en.wikipedia.org/wiki/Levenshtein_distance">https://en.wikipedia.org/wiki/Levenshtein_distance</a></li>
    <li><a href="https://en.wikipedia.org/wiki/Lexical_similarity">https://en.wikipedia.org/wiki/Lexical_similarity</a></li>
    <li><a href="https://en.wikipedia.org/wiki/Vladimir_Levenshtein">https://en.wikipedia.org/wiki/Vladimir_Levenshtein</a></li>
</ol>


In [1]:
import numpy as np

Building <code>max</code> function for calculating length

In [2]:
def max(a, b):
    if a >= b:
        return a
    else:
        return b

In [3]:
max(4, 6)

6

Building <code>min</code> function for Levenshtein distance

In [4]:
def min(a, b, c):
    if a<=b and a<=c:
        return a
    elif b<=a and b<=c:
        return b
    else:
        return c

In [5]:
min(3, 1, 0)

0

Building <code>normalize</code> function to pad the shorter string with spaces to a common length

In [6]:
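def normalize(X, size):
    # Pad X on the right with spaces until it is `size` characters long.
    # Strings that are already long enough are returned unchanged.
    if len(X) < size:
        dif = size - len(X)
        for i in range(dif):
            X = X + " "

    return X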
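
The same right-padding is available through Python's built-in `str.ljust`. The sketch below shows the equivalent call; the helper name `pad_right` is hypothetical and not part of the notebook.

```python
def pad_right(text: str, size: int) -> str:
    # str.ljust pads with spaces on the right and returns the string
    # unchanged when it is already at least `size` characters long.
    return text.ljust(size)

print(repr(pad_right("abc", 5)))     # 'abc  '
print(repr(pad_right("abcdef", 3)))  # 'abcdef'
```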

Building <code>LevenshteinDistance</code> function for finding the difference between two sentences.

<img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/6224efffbe9a4e01afbddeeb900bfd1b3350b335" alt="lev_dist_eq">

<br>

The Levenshtein distance between two strings a and b (of lengths |a| and |b|, respectively) is given by lev(a, b), where the tail of a string x is the string of all but the first character of x, and x[n] is the nth character of x, counting from 0.
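
The recursive definition can be transcribed almost literally. The following is a minimal sketch, assuming short inputs (deep recursion on long strings would hit Python's recursion limit); the name `lev_recursive` is illustrative, and memoization via `functools.lru_cache` is added only to avoid recomputing the same tails.

```python
from functools import lru_cache

def lev_recursive(a: str, b: str) -> int:
    @lru_cache(maxsize=None)
    def lev(i: int, j: int) -> int:
        # i and j mark where the remaining tails a[i:] and b[j:] start.
        if i == len(a):            # a is exhausted: insert the rest of b
            return len(b) - j
        if j == len(b):            # b is exhausted: delete the rest of a
            return len(a) - i
        if a[i] == b[j]:           # equal heads: compare the tails
            return lev(i + 1, j + 1)
        return 1 + min(
            lev(i + 1, j),         # delete a[i]
            lev(i, j + 1),         # insert b[j]
            lev(i + 1, j + 1),     # substitute a[i] with b[j]
        )

    return lev(0, 0)

print(lev_recursive("kitten", "sitting"))  # 3
```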

In [7]:
def LevenshteinDistance(A, B):

    A_len = len(A)
    B_len = len(B)

    # K[i][j] holds the distance between the first i characters of A
    # and the first j characters of B.
    K = np.zeros((A_len + 1, B_len + 1))

    # Turning a prefix into the empty string (or the reverse) costs
    # one edit per character, so the first column and row count up.
    for i in range(A_len + 1):
        K[i][0] = i
    for j in range(B_len + 1):
        K[0][j] = j

    for i in range(1, A_len + 1):
        for j in range(1, B_len + 1):
            if A[i-1] == B[j-1]:
                # Matching characters cost nothing.
                K[i][j] = K[i-1][j-1]
            else:
                remove = K[i-1][j] + 1             # delete A[i-1]
                add = K[i][j-1] + 1                # insert B[j-1]
                change_position = K[i-1][j-1] + 1  # substitute
                K[i][j] = min(remove, add, change_position)

    # The distance between the full strings is in the bottom-right cell.
    return K[A_len][B_len]

In [8]:
sentence_1 = input("Sentence 1: ").lower()
sentence_2 = input("Sentence 2: ").lower()

Sentence 1: Levenshtein
Sentence 2: Levenshtein Distance


In [9]:
max_length = max(len(sentence_1), len(sentence_2))
max_length

20

Normalize inputs

In [10]:
sentence_1 = normalize(sentence_1, max_length)
sentence_2 = normalize(sentence_2, max_length)

In [11]:
sentence_1

'levenshtein         '

In [12]:
sentence_2

'levenshtein distance'

In [13]:
distance = LevenshteinDistance(sentence_1, sentence_2)
print('Distance between sentence_1 and sentence_2 is %f.' % distance)

Distance between sentence_1 and sentence_2 is 8.000000.


In [14]:
similarity = (max_length - distance) / max_length
print('Similarity between sentence_1 and sentence_2 is %f.' % similarity)

Similarity between sentence_1 and sentence_2 is 0.600000.
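
The similarity formula above can also be wrapped in a helper that works directly on unpadded strings. This is a sketch under stated assumptions rather than the notebook's own method: `levenshtein` and `similarity` are hypothetical names, the distance uses a two-row variant of the table to save memory, and Python's built-in `min` and `max` are used instead of the functions defined earlier.

```python
def levenshtein(a: str, b: str) -> int:
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return (longest - levenshtein(a, b)) / longest

print(levenshtein("levenshtein", "levenshtein distance"))  # 9
print(similarity("levenshtein", "levenshtein distance"))   # 0.55
```

Without padding, the nine characters of " distance" are counted as nine insertions, so the result differs from the padded comparison, where the space after "levenshtein" already matches.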


<hr>

## Contact Me

If you have something to say, please contact me:

 - Twitter: [Doguilmak](https://twitter.com/Doguilmak)  
 - Mail address: doguilmak@gmail.com