### Finding the edit distance using dynamic programming
seq1: alpha * C (...BSDFDC)  
seq2: beta * A (...GEFDSA)  
*Note: alpha and beta are the prefixes of the two sequences, i.e. each sequence without its last base.*

**edist(alpha * C, beta * A) = min(edist(alpha, beta) + 1, edist(alpha * C, beta) + 1, edist(alpha, beta * A) + 1)**

The edit distance of seq1 and seq2 is the **minimum** of:
1. the edit distance of the two prefixes (alpha and beta), plus 1 for a **substitution** of C with A  
2. the edit distance of seq1 and the prefix of seq2 (beta), plus 1 to **insert** A at the end of seq1  
3. the edit distance of the prefix of seq1 (alpha) and seq2, plus 1 to **insert** C at the end of seq2  

The principle can be generalized into:  
  
seq1: alpha * X  
seq2: beta * Y  
  
**edist(alpha * X, beta * Y) = min(edist(alpha, beta) + delta(X, Y), edist(alpha * X, beta) + 1, edist(alpha, beta * Y) + 1)  
where delta(X, Y) = 0 if X = Y, or 1 otherwise**
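
The recurrence can be written almost word for word as a recursive function. The sketch below is only an illustration: the name `edist_recursive` and the inner helper `go` are hypothetical, and the base cases (the distance to an empty prefix is the length of the other prefix) are an assumption added here because the recurrence above does not state them. Results are cached with `functools.lru_cache` so that each pair of prefixes is evaluated only once.

```python
from functools import lru_cache


def edist_recursive(a, b):
    """Edit distance of a and b, computed directly from the recurrence."""

    @lru_cache(maxsize=None)
    def go(i, j):
        # go(i, j) is the edit distance between the prefixes a[:i] and b[:j]
        if i == 0:
            return j  # assumed base case: empty prefix of a vs. j characters of b
        if j == 0:
            return i  # assumed base case: i characters of a vs. empty prefix of b
        delta = 0 if a[i - 1] == b[j - 1] else 1
        return min(
            go(i - 1, j - 1) + delta,  # edist(alpha, beta) + delta(X, Y)
            go(i, j - 1) + 1,          # edist(alpha * X, beta) + 1
            go(i - 1, j) + 1,          # edist(alpha, beta * Y) + 1
        )

    return go(len(a), len(b))


print(edist_recursive('kitten', 'sitting'))  # prints 3
```

Without the cache, the same prefix pairs would be recomputed many times over, which is what motivates filling in a table instead.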

In [10]:
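def editdis(t, p):
    """Return the edit distance between the strings t and p."""
    # D[i][j] holds the edit distance between p[:i] and t[:j]
    D = []
    for i in range(len(p) + 1):
        D.append([0] * (len(t) + 1))

    # Base cases: an empty prefix vs. a prefix of length k costs k edits
    for j in range(len(t) + 1):
        D[0][j] = j
    for i in range(len(p) + 1):
        D[i][0] = i

    # Fill in the rest of the table row by row
    for i in range(1, len(p) + 1):
        for j in range(1, len(t) + 1):
            x = D[i-1][j] + 1   # p[i-1] has no partner in t
            y = D[i][j-1] + 1   # t[j-1] has no partner in p
            if p[i-1] == t[j-1]:
                z = D[i-1][j-1]       # last characters match: no extra cost
            else:
                z = D[i-1][j-1] + 1   # substitution
            D[i][j] = min(x, y, z)
    # The bottom-right cell is the distance between the full strings
    return D[-1][-1]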

In [12]:
#    *substitution of 's' with 'S'
#         *insert a ' ' in y
#              *insert a 'r' in x  
x = 'shake spea'
y = 'Shakespear'
editdis(x, y)

3
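
The three edits named in the comments can be recovered from the table by walking back from the bottom-right cell to the top-left one. The following is a minimal sketch under stated assumptions: `edit_script` is a hypothetical helper, not part of the notebook. It rebuilds the same kind of table and, when several moves are equally good, prefers the diagonal one. The wording of the operations follows the comments in the cell above.

```python
def edit_script(x, y):
    """Return (distance, operations) for the strings x and y.

    Hypothetical helper: fills the dynamic-programming table, then
    traces back from the bottom-right cell to list the edits.
    """
    # D[i][j] is the edit distance between x[:i] and y[:j]
    D = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(len(x) + 1):
        D[i][0] = i
    for j in range(len(y) + 1):
        D[0][j] = j
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            delta = 0 if x[i - 1] == y[j - 1] else 1
            D[i][j] = min(D[i - 1][j - 1] + delta,
                          D[i - 1][j] + 1,
                          D[i][j - 1] + 1)

    # Traceback: at each cell, find a neighbour that explains its value
    ops = []
    i, j = len(x), len(y)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            delta = 0 if x[i - 1] == y[j - 1] else 1
            if D[i][j] == D[i - 1][j - 1] + delta:
                if delta == 1:
                    ops.append(f"substitute {x[i - 1]!r} with {y[j - 1]!r}")
                i, j = i - 1, j - 1
                continue
        if i > 0 and D[i][j] == D[i - 1][j] + 1:
            # x[i-1] has no partner in y
            ops.append(f"insert {x[i - 1]!r} in y")
            i -= 1
        else:
            # y[j-1] has no partner in x
            ops.append(f"insert {y[j - 1]!r} in x")
            j -= 1
    ops.reverse()
    return D[-1][-1], ops


distance, ops = edit_script('shake spea', 'Shakespear')
print(distance)  # prints 3
for op in ops:
    print(op)
```

Other optimal edit scripts may exist for a given pair of strings; the traceback reports only the one selected by its tie-breaking order.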

The computation time of the dynamic programming approach is proportional to the number of characters in P (the pattern) times the number of characters in T (the text), because one table cell is filled in for every pair of prefixes. The computation time of Boyer-Moore is typically proportional to the length of T. However, Boyer-Moore can only do exact matching.