## Minimum Edit Distance using Dynamic Programming

1. Given a source string and a target string, we compute the minimum edit distance between every prefix Source[0..i] and every prefix Target[0..j].
2. We store the distances already computed for shorter prefixes and reuse them to compute the distances for longer prefixes.
3. We maintain these distances in a matrix D.
4. Rows = characters of the source string + 1 extra row for the empty prefix; columns = characters of the target string + 1 extra column for the empty prefix.
5. We begin with "Initialization"
    a) D[0,0] = 0
    b) Compute edit distances for all elements of the 1st row and 1st column
        D[i,0] = D[i-1,0] + del_cost(source[i])
        D[0,j] = D[0,j-1] + ins_cost(target[j])
6. Now compute D[i,j] for every remaining cell
        D[i,j] = min{ D[i-1,j] + del_cost(source[i]),
                      D[i,j-1] + ins_cost(target[j]),
                      D[i-1,j-1] + { rep_cost, if source[i] != target[j]
                                     0,        if source[i] == target[j] } }
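
To make the recurrence concrete, here is a small worked sketch that fills just two cells for source `ant` and target `want`. It assumes costs of 1 for deletion, 1 for insertion and 2 for replacement; the helper name `cell` is hypothetical and only used for illustration.

```python
# Worked example of the recurrence for two cells (illustrative sketch).
# Assumed costs: deletion 1, insertion 1, replacement 2.
source, target = "ant", "want"
del_cost, ins_cost, rep_cost = 1, 1, 2

# Initialization: first row D[0, j] and first column D[i, 0].
first_row = [j * ins_cost for j in range(len(target) + 1)]  # [0, 1, 2, 3, 4]
first_col = [i * del_cost for i in range(len(source) + 1)]  # [0, 1, 2, 3]


def cell(up, left, diag, same_char):
    """Hypothetical helper: apply the min{...} rule to one cell."""
    rep = 0 if same_char else rep_cost
    return min(up + del_cost, left + ins_cost, diag + rep)


# D[1,1] compares 'a' with 'w': the characters differ, so replacing costs 2.
d_11 = cell(up=first_row[1], left=first_col[1], diag=first_row[0],
            same_char=source[0] == target[0])

# D[1,2] compares 'a' with 'a': the characters match, so the diagonal is free.
d_12 = cell(up=first_row[2], left=d_11, diag=first_row[1],
            same_char=source[0] == target[1])

print(d_11, d_12)  # 2 1
```

The second cell takes the diagonal value unchanged because the characters match, which is why turning `a` into `wa` costs only the single insertion of `w`.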
    

In [7]:
import numpy as np
import pandas as pd

In [8]:
# minimum total cost of the edits required to turn a source string into a target string

def minimum_edit_distance(source, target, del_cost=1, ins_cost=1, rep_cost=2):
    
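    """
    Compute the minimum edit distance between source and target.

    1. Initialize:
        D of size [len(source)+1, len(target)+1]
        D[0,0] = 0
        D[i,0] = D[i-1,0] + del_cost
        D[0,j] = D[0,j-1] + ins_cost

    2. Loop through i,j:
        D[i,j] = min{ D[i-1,j] + del_cost,
                      D[i,j-1] + ins_cost,
                      D[i-1,j-1] + { rep_cost, if source[i-1] != target[j-1]
                                     0,        if source[i-1] == target[j-1] } }

    3. The minimum edit distance is the last cell of the matrix,
       D[len(source), len(target)]

    4. Make the matrix more readable by converting it to a DataFrame
       labelled with the characters of both strings

    Input:
    source: source string
    target: target string
    del_cost, ins_cost, rep_cost: costs of a deletion, insertion and replacement

    Output:
    DataFrame D, minimum edit distance
    """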
    #create matrix
    
    i_len = len(source)
    j_len = len(target)
    
    idx = ['#']+list(source)
    jdx = ['#']+list(target)
    
    D = np.zeros((i_len+1,j_len+1),int)
    
    #initialize the matrix
    D[0,0] = 0
    
    for i in range(1,i_len+1):
        D[i,0] = D[i-1,0]+del_cost

    for j in range(1,j_len+1):
        D[0,j] = D[0,j-1]+ins_cost
        
    #fill each cell [i,j]
    for i in range(1,i_len+1):
        for j in range(1,j_len+1):
            
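            #replacement is free when the two characters already match
            rep=rep_cost
            
            if source[i-1]==target[j-1]:
                rep=0
            
            #cheapest of: delete from source, insert from target, replace/match
            D[i,j] = min(D[i-1,j]+del_cost, D[i,j-1]+ins_cost, D[i-1,j-1]+rep)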
                
                
    med = D[i_len,j_len]
    
    D = pd.DataFrame(D, index=idx, columns=jdx)
    print(f'minimum edit distance is {med}')
    return D, med

In [9]:
minimum_edit_distance('ant','want')

minimum edit distance is 1


(   #  w  a  n  t
 #  0  1  2  3  4
 a  1  2  1  2  3
 n  2  3  2  1  2
 t  3  4  3  2  1, 1)
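
With the default `rep_cost=2`, a replacement costs exactly as much as a deletion followed by an insertion. As a rough sketch of how the cost parameters change the result, the pure-Python helper below repeats the same recurrence without NumPy or pandas. The name `edit_cost` is hypothetical and not part of the function above; it returns only the distance, not the matrix.

```python
def edit_cost(source, target, del_cost=1, ins_cost=1, rep_cost=2):
    """Illustrative sketch: same recurrence as above, using nested lists."""
    rows, cols = len(source) + 1, len(target) + 1
    D = [[0] * cols for _ in range(rows)]

    # First column: delete every character of the source prefix.
    for i in range(1, rows):
        D[i][0] = D[i - 1][0] + del_cost
    # First row: insert every character of the target prefix.
    for j in range(1, cols):
        D[0][j] = D[0][j - 1] + ins_cost

    for i in range(1, rows):
        for j in range(1, cols):
            rep = 0 if source[i - 1] == target[j - 1] else rep_cost
            D[i][j] = min(D[i - 1][j] + del_cost,
                          D[i][j - 1] + ins_cost,
                          D[i - 1][j - 1] + rep)
    return D[-1][-1]


print(edit_cost("ant", "want"))                    # 1
print(edit_cost("kitten", "sitting"))              # 5
print(edit_cost("kitten", "sitting", rep_cost=1))  # 3
```

Setting `rep_cost=1` gives the distance in which every edit costs the same, so the two replacements and one insertion needed for `kitten` to `sitting` total 3 instead of 5.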