## Edit distance (problem)

Given two strings word1 and word2, calculate their edit distance.
In this problem, the edit distance is the minimum number of single-character insertions, deletions, and substitutions needed to transform word1 into word2.


### Example:

Input: word1 = "inside", word2 = "index"\
Output: 3

Explanation: to go from "inside" to "index", we can delete the character 's', delete the second 'i', and insert 'x' at the end. That is 3 operations in total:\
"inside" -> "inide" -> "inde" -> "index"
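The three edits in the explanation can be replayed with plain string slicing to confirm that they turn "inside" into "index". Indices are 0-based.

```python
# Replay the three edits from the example, one step at a time
s = "inside"
s = s[:2] + s[3:]   # delete 's' (index 2): "inside" -> "inide"
assert s == "inide"
s = s[:2] + s[3:]   # delete the second 'i' (now at index 2): "inide" -> "inde"
assert s == "inde"
s = s + "x"         # insert 'x' at the end: "inde" -> "index"
print(s)            # index
```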

## The relation
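Let $d(i, j)$ be the edit distance between the first $i$ characters of word1 and the first $j$ characters of word2, with $n$ = len(word1) and $m$ = len(word2). The relation below is the one the table-based solutions in this notebook fill in. The recursive versions apply the same idea to suffixes instead of prefixes.

$$
d(i, 0) = i, \qquad d(0, j) = j
$$

$$
d(i, j) =
\begin{cases}
d(i-1, j-1) & \text{if } \text{word1}[i-1] = \text{word2}[j-1] \\
1 + \min\{\, d(i-1, j),\ d(i, j-1),\ d(i-1, j-1) \,\} & \text{otherwise}
\end{cases}
$$

The answer is $d(n, m)$. In the second case, the three terms correspond to three possible last operations:

- deleting word1[i-1]
- inserting word2[j-1]
- substituting word1[i-1] with word2[j-1]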


## The bottom-up approach

```python
word1 = "inside"
word2 = "index"
```

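```python
def dist(word1, word2):
    # dp[j][i] = edit distance between word1[:i] and word2[:j]
    # (rows follow word2, columns follow word1)
    dp = [[0] * (len(word1) + 1) for _ in range(len(word2) + 1)]

    # First row: turning word1[:i] into "" takes i deletions
    for i in range(1, len(word1) + 1):
        dp[0][i] = i

    # First column: building word2[:j] from "" takes j insertions
    for j in range(1, len(word2) + 1):
        dp[j][0] = j
    print(dp)  # table after initialisation

    for i in range(1, len(word1) + 1):
        for j in range(1, len(word2) + 1):
            if word1[i-1] == word2[j-1]:
                # Last characters match: no extra operation needed
                dp[j][i] = dp[j-1][i-1]
            else:
                # 1 + cheapest of insert, delete, substitute
                dp[j][i] = 1 + min(dp[j-1][i], dp[j][i-1], dp[j-1][i-1])
    print(dp)  # filled table
    return dp[-1][-1]
```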

```python
dist("inside", "index")
```

Printed output (the table before and after filling):

```
[[0, 1, 2, 3, 4, 5, 6], [1, 0, 0, 0, 0, 0, 0], [2, 0, 0, 0, 0, 0, 0], [3, 0, 0, 0, 0, 0, 0], [4, 0, 0, 0, 0, 0, 0], [5, 0, 0, 0, 0, 0, 0]]
[[0, 1, 2, 3, 4, 5, 6], [1, 0, 1, 2, 3, 4, 5], [2, 1, 0, 1, 2, 3, 4], [3, 2, 1, 1, 2, 2, 3], [4, 3, 2, 2, 2, 3, 2], [5, 4, 3, 3, 3, 3, 3]]
```

Return value: 3

## The original solution

## Recursive

Time complexity: $O(3^{n+m})$\
Space complexity: $O(n+m)$ for the recursion stack

Here $n$ and $m$ are the lengths of word1 and word2.

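```python
def dist(word1, word2, i=0, j=0):
    # Edit distance between the suffixes word1[i:] and word2[j:]
    if i == len(word1):
        # word1 exhausted: insert the remaining characters of word2
        return len(word2) - j

    elif j == len(word2):
        # word2 exhausted: delete the remaining characters of word1
        return len(word1) - i

    elif word1[i] == word2[j]:
        # Characters match: move past both at no cost
        return dist(word1, word2, i+1, j+1)

    else:
        # 1 + best of: delete word1[i], insert word2[j], substitute
        return 1 + min(dist(word1, word2, i+1, j),
                       dist(word1, word2, i, j+1),
                       dist(word1, word2, i+1, j+1))
```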

```python
dist(word1, word2)  # → 3
```
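The following sketch shows how quickly the naive recursion grows. `count_calls` is a hypothetical helper, not part of the original notebook. It re-implements the same recursion with a call counter and returns both the distance and the number of calls made.

```python
def count_calls(word1, word2):
    """Return (edit distance, number of recursive calls) for the naive recursion."""
    calls = 0

    def rec(i, j):
        nonlocal calls
        calls += 1
        if i == len(word1):
            return len(word2) - j
        if j == len(word2):
            return len(word1) - i
        if word1[i] == word2[j]:
            return rec(i + 1, j + 1)
        # Same three branches as dist: delete, insert, substitute
        return 1 + min(rec(i + 1, j), rec(i, j + 1), rec(i + 1, j + 1))

    result = rec(0, 0)
    return result, calls
```

Consider strings that share no characters, so every call branches three ways. For "ab" vs "cd", `count_calls` returns `(2, 19)`. For "abc" vs "def", it already returns `(3, 94)`.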

## Memoization (top-down)

Time complexity: $O(nm)$\
Space complexity: $O(nm)$

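```python
def dist(word1, word2, i=0, j=0, lookup=None):
    # lookup caches the result for each (i, j) pair. It must be passed down
    # to the recursive calls; otherwise each call starts with an empty cache
    # and the running time stays exponential.
    lookup = {} if lookup is None else lookup

    if (i, j) in lookup:
        return lookup[(i, j)]

    if i == len(word1):
        return len(word2) - j

    elif j == len(word2):
        return len(word1) - i

    elif word1[i] == word2[j]:
        lookup[(i, j)] = dist(word1, word2, i+1, j+1, lookup)
        return lookup[(i, j)]

    else:
        lookup[(i, j)] = 1 + min(dist(word1, word2, i+1, j, lookup),
                                 dist(word1, word2, i, j+1, lookup),
                                 dist(word1, word2, i+1, j+1, lookup))
        return lookup[(i, j)]
```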

```python
dist(word1, word2)  # → 3
```
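Instead of a hand-written lookup dictionary, Python's `functools.lru_cache` can memoize a nested helper. This is a sketch with a hypothetical name, `dist_cached`. It caches on `(i, j)` only, since `word1` and `word2` are fixed during a single call. Like the other recursive versions, it is bounded by Python's recursion limit (1000 frames by default in CPython), so very long strings are better handled bottom-up.

```python
from functools import lru_cache


def dist_cached(word1, word2):
    @lru_cache(maxsize=None)  # unbounded cache keyed on (i, j)
    def solve(i, j):
        # Edit distance between the suffixes word1[i:] and word2[j:]
        if i == len(word1):
            return len(word2) - j
        if j == len(word2):
            return len(word1) - i
        if word1[i] == word2[j]:
            return solve(i + 1, j + 1)
        return 1 + min(solve(i + 1, j), solve(i, j + 1), solve(i + 1, j + 1))

    return solve(0, 0)
```

A new cache is created on every call to `dist_cached`, so results from one pair of strings never leak into another.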

## Tabulation (bottom-up)

Time complexity: $O(nm)$\
Space complexity: $O(nm)$

```python
def dist(word1, word2):
    n, m = len(word1), len(word2)
    # dp[i][j] = edit distance between word1[:i] and word2[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]

    for j in range(1, m + 1):
        dp[0][j] = j   # j insertions

    for i in range(1, n + 1):
        dp[i][0] = i   # i deletions

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if word1[i-1] == word2[j-1]:
                dp[i][j] = dp[i-1][j-1]
            else:
                # 1 + min(delete, insert, substitute)
                dp[i][j] = 1 + min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1])

    return dp[n][m]
```

```python
dist(word1, word2)  # → 3
```
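The full table also records which operations achieve the minimum. The hypothetical `edit_script` below is a sketch, not part of the original notebook. It rebuilds the same `dp[i][j]` table and then walks back from `dp[n][m]` to `dp[0][0]`, choosing at each cell a predecessor that is consistent with the recurrence. When several predecessors tie, it prefers substitute, then delete, then insert, so other equally short scripts may exist.

```python
def edit_script(word1, word2):
    """Return (distance, list of edit operations) using the dp[i][j] table."""
    n, m = len(word1), len(word2)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if word1[i - 1] == word2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])

    # Walk back from (n, m), recording one operation per non-matching step
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and word1[i - 1] == word2[j - 1]:
            i, j = i - 1, j - 1                      # match: no operation
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            ops.append(("substitute", word1[i - 1], word2[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("delete", word1[i - 1]))
            i -= 1
        else:
            ops.append(("insert", word2[j - 1]))
            j -= 1
    ops.reverse()  # operations were collected from the end backwards
    return dp[n][m], ops
```

For the running example, `edit_script("inside", "index")` returns `(3, [('delete', 's'), ('delete', 'i'), ('insert', 'x')])`. These are the same three edits as in the explanation at the top.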

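Each row of the table depends only on the row directly above it. Keeping just two rows of length $m + 1$ is therefore enough, which gives:

Time complexity: $O(nm)$\
Space complexity: $O(m)$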

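```python
def dist(word1, word2):
    n = len(word1)
    m = len(word2)

    # prev_dp holds row i-1 of the table, dp holds row i
    prev_dp = [0] * (m + 1)
    dp = [0] * (m + 1)

    for j in range(1, m + 1):
        prev_dp[j] = j   # row 0: j insertions

    for i in range(1, n + 1):
        dp[0] = i        # column 0: i deletions
        for j in range(1, m + 1):
            if word1[i-1] == word2[j-1]:
                dp[j] = prev_dp[j-1]
            else:
                dp[j] = 1 + min(prev_dp[j], dp[j-1], prev_dp[j-1])
        # The finished row becomes the previous row; the old one is reused,
        # since every entry of dp is overwritten in the next iteration
        prev_dp, dp = dp, prev_dp

    return prev_dp[m]
```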

```python
dist(word1, word2)  # → 3
```
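The two-row version needs $O(m)$ space, where $m$ is the length of word2. Insertions and deletions both cost 1, so the distance is symmetric in its two arguments. A further tweak, sketched here and not taken from the original, is to swap the arguments so that the rows run over the shorter string. This gives $O(\min(n, m))$ space. `dist_min_space` is a hypothetical name.

```python
def dist_min_space(word1, word2):
    # With unit costs the distance is symmetric, so let word2 be the shorter string
    if len(word2) > len(word1):
        word1, word2 = word2, word1
    m = len(word2)

    prev = list(range(m + 1))            # row 0: dp[0][j] = j
    for i in range(1, len(word1) + 1):
        cur = [i] + [0] * m              # dp[i][0] = i
        for j in range(1, m + 1):
            if word1[i - 1] == word2[j - 1]:
                cur[j] = prev[j - 1]
            else:
                cur[j] = 1 + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]
```

The time stays $O(nm)$. Only the row length changes.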