# Task 1

Write a Python program that takes two strings as input and calculates the
Levenshtein (edit) distance between them.

Explanation:

The Levenshtein (edit) distance measures how similar two strings or sequences are. Formally, it is the minimum number of single-character edits required to transform one string into the other.
The single-character edits are:
-	Insertion
-	Deletion
-	Substitution
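
The definition above can be sketched directly as a recurrence. The following is a minimal illustration, not the implementation used in this notebook; the name `edit_distance_recursive` and the use of `functools.lru_cache` for memoization are assumptions made for the example.

```python
from functools import lru_cache


def edit_distance_recursive(source: str, target: str) -> int:
    """Levenshtein distance via the textbook recurrence (illustrative sketch)."""

    @lru_cache(maxsize=None)
    def dist(i: int, j: int) -> int:
        # dist(i, j): edits needed to turn source[:i] into target[:j]
        if i == 0:
            return j  # insert all j characters of target[:j]
        if j == 0:
            return i  # delete all i characters of source[:i]
        if source[i - 1] == target[j - 1]:
            return dist(i - 1, j - 1)  # last characters match: no edit
        return 1 + min(
            dist(i, j - 1),      # insertion
            dist(i - 1, j),      # deletion
            dist(i - 1, j - 1),  # substitution
        )

    return dist(len(source), len(target))


print(edit_distance_recursive("kitten", "sitting"))  # 3
```

Each branch of `min` corresponds to one of the three edit types listed above, which is the same choice the table-based version makes cell by cell.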



In [1]:
import numpy as np

## Function to implement Levenshtein Distance logic

In [2]:
def levenshtein_distance(source_str_list, target_str_list, show_processed_array=True):
    # Prepend a placeholder so that row 0 and column 0 represent the empty prefix.
    # Note: this modifies both lists in place, so callers should pass copies.
    source_str_list.insert(0, '/')
    target_str_list.insert(0, '/')

    # Create an array with one row per target element and one column per source element
    arr = np.zeros((len(target_str_list), len(source_str_list)), dtype='int')

    row_count = arr.shape[0]
    col_count = arr.shape[1]

    # Initialize the first column and first row: reaching a prefix of length
    # i (or j) from the empty prefix takes i (or j) edits
    for i in range(row_count):
        arr[i][0] = i

    for j in range(col_count):
        arr[0][j] = j

    # Fill in the remaining cells of the table
    for i in range(1, row_count):
        for j in range(1, col_count):
            if source_str_list[j] == target_str_list[i]:
                # Elements match: no additional edit is needed
                arr[i][j] = arr[i-1][j-1]
            else:
                # Cheapest of the three neighbouring cells, plus one edit
                min_num = min(arr[i][j-1], arr[i-1][j], arr[i-1][j-1])
                arr[i][j] = min_num + 1

    if show_processed_array:
        print("\nProcessed Array: \n", arr)
    
    return arr[-1,-1]
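
The function above keeps the full table, which is useful for display. When only the final distance is needed, the same recurrence can be computed with two rows at a time. The sketch below is an alternative, not part of the original notebook; `levenshtein_two_rows` is a hypothetical name, and it leaves its inputs unmodified, so no `.copy()` is required.

```python
def levenshtein_two_rows(source, target):
    """Edit distance keeping only two rows of the table (hypothetical helper)."""
    # previous[j]: distance between source[:j] and the empty target prefix
    previous = list(range(len(source) + 1))

    for i, target_item in enumerate(target, start=1):
        # Building target[:i] from an empty source prefix costs i insertions
        current = [i]
        for j, source_item in enumerate(source, start=1):
            if source_item == target_item:
                current.append(previous[j - 1])
            else:
                current.append(1 + min(current[j - 1], previous[j], previous[j - 1]))
        previous = current

    return previous[-1]


print(levenshtein_two_rows("kitten", "sitting"))  # 3
```

Because it only indexes and compares elements, the same function accepts strings (character level) or lists of words (word level).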


## Taking Input

In [3]:
# input_str = 'kitten'
# output_str = 'sitting'

In [4]:
input_str = input('Enter string 1 (source): ')
output_str = input('Enter string 2 (target): ')

Enter string 1 (source): kitten
Enter string 2 (target): sitting


In [5]:
input_str_list = list(input_str)
output_str_list = list(output_str)

## Call the function and display results

In [6]:
print('String 1 (source) is: ' + input_str)
print('String 2 (target) is: ' + output_str)

levenshtein_distance_result = levenshtein_distance(input_str_list.copy(), output_str_list.copy())

print("\nLevenshtein Distance: ",levenshtein_distance_result)

String 1 (source) is: kitten
String 2 (target) is: sitting

Processed Array: 
 [[0 1 2 3 4 5 6]
 [1 1 2 3 4 5 6]
 [2 2 1 2 3 4 5]
 [3 3 2 1 2 3 4]
 [4 4 3 2 1 2 3]
 [5 5 4 3 2 2 3]
 [6 6 5 4 3 3 2]
 [7 7 6 5 4 4 3]]

Levenshtein Distance:  3
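
The processed array also encodes which edits were made: walking back from the bottom-right cell to the top-left recovers one minimal sequence of operations. The following is a rough sketch using plain Python lists rather than NumPy; `edit_operations` is a hypothetical helper, and when several minimal sequences exist it returns just one of them.

```python
def edit_operations(source, target):
    """Return one minimal list of edits turning source into target (sketch)."""
    rows, cols = len(target) + 1, len(source) + 1

    # table[i][j]: distance between source[:j] and target[:i]
    table = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        table[i][0] = i
    for j in range(cols):
        table[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            if source[j - 1] == target[i - 1]:
                table[i][j] = table[i - 1][j - 1]
            else:
                table[i][j] = 1 + min(
                    table[i][j - 1], table[i - 1][j], table[i - 1][j - 1]
                )

    # Walk back from the bottom-right corner, recording each edit
    ops = []
    i, j = rows - 1, cols - 1
    while i > 0 or j > 0:
        if i > 0 and j > 0 and source[j - 1] == target[i - 1]:
            i, j = i - 1, j - 1  # match: nothing to record
        elif i > 0 and j > 0 and table[i][j] == table[i - 1][j - 1] + 1:
            ops.append(f"substitute {source[j - 1]!r} -> {target[i - 1]!r}")
            i, j = i - 1, j - 1
        elif i > 0 and table[i][j] == table[i - 1][j] + 1:
            ops.append(f"insert {target[i - 1]!r}")
            i -= 1
        else:
            ops.append(f"delete {source[j - 1]!r}")
            j -= 1

    return ops[::-1]


for op in edit_operations("kitten", "sitting"):
    print(op)
```

The number of operations returned equals the distance in the bottom-right cell of the table.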


# Task 2
Now modify the program above so that it reads two text files, reference.txt and hypothesis.txt, each containing a single-line, lowercase English sentence, and writes the Levenshtein distance between them to result.txt. The distance should be computed at word level, not character level.

In [346]:
# Read both files using context managers so they are closed after reading
with open('reference.txt', 'r') as reference_file:
    reference_data = reference_file.read()

with open('hypothesis.txt', 'r') as hypothesis_file:
    hypothesis_data = hypothesis_file.read()

In [348]:
reference_tokens = reference_data.split(' ')
hypothesis_tokens = hypothesis_data.split(' ')
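
Splitting on a single space works for cleanly formatted input, but it is sensitive to stray whitespace. The short comparison below uses a made-up sample line (not the contents of the actual files) to show how `split(' ')` and `split()` differ; whether the input files contain such whitespace is an assumption.

```python
# Hypothetical sample line with a double space and a trailing newline
line = "this is  a text\n"

tokens_single_space = line.split(' ')
tokens_whitespace = line.split()

print(tokens_single_space)  # ['this', 'is', '', 'a', 'text\n']
print(tokens_whitespace)    # ['this', 'is', 'a', 'text']
```

With `split(' ')`, the empty string and `'text\n'` would each count as distinct words in the distance calculation, so `split()` with no argument may be the safer choice if the files could contain extra spaces or a final newline.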

In [349]:
print('Reference file data (source):\n', reference_data)
print('\nHypothesis file data (target):\n', hypothesis_data)

levenshtein_distance_result = levenshtein_distance(reference_tokens.copy(), hypothesis_tokens.copy())
print("\nLevenshtein Distance: ",levenshtein_distance_result)

Reference file data (source):
 this is some text and we would like to see if it has been identified correctly by speech recognition system

Hypothesis file data (target):
 this is a text and we would like to check what has been identified by the speech recognition

Processed Array: 
 [[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20]
 [ 1  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
 [ 2  1  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18]
 [ 3  2  1  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18]
 [ 4  3  2  2  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17]
 [ 5  4  3  3  2  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16]
 [ 6  5  4  4  3  2  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]
 [ 7  6  5  5  4  3  2  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
 [ 8  7  6  6  5  4  3  2  1  2  3  4  5  6  7  8  9 10 11 12 13]
 [ 9  8  7  7  6  5  4  3  2  1  2  3  4  5  6  7  8  9 10 11 12]
 [10  9  8  8  7  6  5  4  3  2  2  3  4 

## Save the Levenshtein distance in result.txt

In [350]:
with open('result.txt','w') as result_file:
    save_text = 'Levenshtein distance is '+ str(levenshtein_distance_result)
    print(save_text)
    result_file.write(save_text)

Levenshtein distance is 7
