In [1]:
# setup
from IPython.display import display, HTML
display(HTML('<style>.prompt{width: 0px; min-width: 0px; visibility: collapse}</style>'))
display(HTML(open('../rise.css').read()))

# imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set(style="whitegrid", font_scale=1.5, rc={'figure.figsize':(12, 6)})




<style>.jp-RenderedHTMLCommon table {
  border-collapse: collapse;
  border-spacing: 0;
  border: none;
  color: var(--jp-ui-font-color1);
  font-size: 20px;
  table-layout: fixed;
  margin-left: auto;
  margin-right: auto;
}</style>

# CMPS 2200
# Introduction to Algorithms

## Longest Increasing Subsequence, Edit Distance


Agenda:

- Longest Increasing Subsequence
- Edit Distance

## Longest Increasing Subsequence 

Given a sequence $S = \langle s_0, s_1, \ldots, s_{n-1} \rangle$, what is the longest increasing subsequence? 

> Note that subsequences don't need to be contiguous.

A subsequence is $s_{i_1}, s_{i_2},\dots, s_{i_k}$ where $0\leq i_1<i_2<\dots<i_k\leq n-1$. An increasing subsequence is one where the numbers get strictly larger, i.e., $s_{i_1}<s_{i_2}<\dots<s_{i_k}$.



Example: $S=\langle 5, 2, 8, 6, 3, 6, 9, 7\rangle$. 

Every subsequence of length 1 is trivially increasing. 



Increasing subsequences include:

- $\langle 2, 6, 9 \rangle$

- $\langle 2, 8, 9 \rangle$

- $\langle 5, 6, 7\rangle$ 



What is the longest?



- $\langle 2, 3, 6, 9 \rangle$

- $\langle 2, 3, 6, 7 \rangle$
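
As a sanity check on the example, here is a minimal brute-force sketch that tries every subsequence, longest first. The names `is_increasing` and `lis_brute_force` are illustrative, not part of the course code, and the approach takes exponential time, so it is only meant for tiny inputs.

```python
from itertools import combinations

def is_increasing(seq):
    """True if every element is strictly larger than the one before it."""
    return all(a < b for a, b in zip(seq, seq[1:]))

def lis_brute_force(S):
    """Length of the longest strictly increasing subsequence, by exhaustive search."""
    # combinations() keeps elements in their original order, so each
    # combination of size k is a subsequence of length k
    for k in range(len(S), 0, -1):
        if any(is_increasing(c) for c in combinations(S, k)):
            return k
    return 0  # empty sequence

S = [5, 2, 8, 6, 3, 6, 9, 7]
print(lis_brute_force(S))  # 4
```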

## Solving LIS

$$S=\langle 5, 2, 8, 6, 3, 6, 9, 7\rangle$$

Let's reduce this problem to something slightly simpler with the observation that the longest increasing subsequence must start somewhere in $S$.



Let $\mathit{LIS}(S, i)$ be the length of the longest increasing subsequence of $S$ that starts with $S[i]$ as the first element. 



How can we use the function $\mathit{LIS}(S, i)$ to solve the original problem?



<br>

If we can compute $\mathit{LIS}(S, i)$ then we can compute $\mathit{LIS}(S) = \max_{0\leq i < n} \mathit{LIS}(S, i).$


- If $S[i]$ is the first element, then the next element $S[j]$ in the longest increasing subsequence must have $j>i$ and $S[j] > S[i]$. 
- Whichever element is next, we must have $\mathit{LIS}(S, i) = 1 + \max_{j > i,\, S[j] > S[i]} \mathit{LIS}(S,j)$, where the max is taken to be $0$ if no such $j$ exists.




## Optimal Substructure

**Optimal Substructure for Longest Increasing Subsequence**: Given a sequence $S$, we have that the length of the longest increasing subsequence of $S$ is $\mathit{LIS}(S) = \max_{0\leq i < n} \mathit{LIS}(S, i)$ where
$$\mathit{LIS}(S, i) = 1 + \max_{j > i,\, S[j] > S[i]} \mathit{LIS}(S, j),$$
and the max over an empty set of indices is taken to be $0$.



To evaluate this recurrence, how many distinct subproblems must be computed from scratch? 



There are only a linear number of starting points for an optimal solution. 

But for each subproblem, the work to compute an optimal solution, even if we have computed all subproblems already, is actually linear in the size of the sequence we consider. 



This optimal substructure property is a little different from what we saw in Knapsack, where every subproblem depended on two subproblems and could be solved in $O(1)$ time once they were known.


## Implementation: Top-down w/out memoization

In [2]:
# length of the longest increasing subsequence that starts with S[0]
def LIS_helper(S):
    if (S == []):
        return(0)
    else:
        # indices of later elements that are larger than S[0]
        rest = [j for j in range(1,len(S)) if S[j]>S[0]]
        if (rest == []):
            # nothing can follow S[0]
            return(1)
        else:
            # S[0] followed by the best subsequence starting at a larger element
            return(1 + max([LIS_helper(S[j:]) for j in rest]))

def LIS(S):
    # try every starting position; an empty sequence has LIS length 0
    return(max([LIS_helper(S[i:]) for i in range(len(S))], default=0))

L = [5,2,8,6,3,6,9,7]
print(LIS(L))


4


## Runtime

<img src="figures/lis_dag.png" width="70%">

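Picture the DAG (directed acyclic graph) of subproblems, with an edge from $i$ to $j$ whenever $j > i$ and $S[j] > S[i]$. The solution is then the longest path in the DAG.

If each subproblem is solved only once (for example, by memoizing), we traverse every edge once. $|E|$ is at most $n^2$, so our work is $O(n^2)$. Without memoization, shared subproblems are recomputed and the work can grow exponentially.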

The longest path in the DAG is at most $n$.

$W(n) \in O(n^2)$

$S(n) \in O(n)$
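
The $O(n^2)$ bound assumes each $\mathit{LIS}(S, i)$ is computed once. A minimal sketch of a memoized top-down version, using `functools.lru_cache` and indices instead of list slices, might look like this (the name `lis_memo` and the inner helper `lis_from` are illustrative):

```python
from functools import lru_cache

def lis_memo(S):
    """Length of the longest strictly increasing subsequence of S."""
    n = len(S)

    @lru_cache(maxsize=None)
    def lis_from(i):
        # longest increasing subsequence that starts with S[i];
        # default=0 covers the case where nothing larger follows S[i]
        best_rest = max((lis_from(j) for j in range(i + 1, n) if S[j] > S[i]),
                        default=0)
        return 1 + best_rest

    # the answer starts somewhere; an empty sequence has length 0
    return max((lis_from(i) for i in range(n)), default=0)

print(lis_memo([5, 2, 8, 6, 3, 6, 9, 7]))  # 4
```

There are $n$ cached subproblems and each scans at most $n$ later positions, which matches the $O(n^2)$ work bound.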


# Edit Distance

Given two strings $S, T \in \Sigma^*$, how similar are they?

We can measure this using *edit distance*, which is the minimum number of insertions and deletions needed to turn $S$ into $T$. Note that we can also go from $T$ to $S$ by reversing the edits (turning insertions into deletions and vice versa).

Example: 

$S$ = `abcdefghijkl`<br>
$T$ = `abcdghikjl` 

How many edits are needed?

> This might seem like a toy problem, but it is a critical problem in comparing gene and protein sequences. By attaching weights to insertions and deletions, we can assess the evolutionary distance between two sequences.


**Notation Note**: 

We will represent insertions and deletions using dashes. All insertions and deletions refer to modifying the top string.

A dash in the bottom string means delete the character above it. 

E.g.,

`abc`<br>
`a-c`



A dash in the top string means insert a character here to match the character below.

E.g.,

`jkl-l`<br>
`jklol`

**Consider the following edit sequence:**

$S$: `abcdefghijkl---`<br>
$T$: `abcd--ghi---kjl`

This has 5 deletions and 3 insertions, for a total of 8 edits. 



What about this one:

$S$: `abcdefghijk-l`<br>
$T$: `abcd--ghi-kjl`



We have 3 deletions and 1 insertion for a total of 4 edits.
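
To make the dash notation concrete, here is a small sketch that counts the edits in an alignment. It assumes both strings are written with dashes so that they have equal length; the function name `count_edits` is illustrative.

```python
def count_edits(top, bottom):
    """Count (deletions, insertions) in an alignment written with dashes."""
    assert len(top) == len(bottom), "aligned strings must have equal length"
    # a dash in the bottom string deletes the character above it
    deletions = sum(1 for a, b in zip(top, bottom) if b == '-' and a != '-')
    # a dash in the top string inserts the character below it
    insertions = sum(1 for a, b in zip(top, bottom) if a == '-' and b != '-')
    return deletions, insertions

print(count_edits('abcdefghijkl---', 'abcd--ghi---kjl'))  # (5, 3)
print(count_edits('abcdefghijk-l', 'abcd--ghi-kjl'))      # (3, 1)
```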



Our goal is to compute the **minimum edit distance** between two strings $S$ and $T$ of lengths $m$ and $n$, respectively.




# Greedy Solution?

Suppose we had the following two strings

<br>
$S=$`relevant`

$T=$`elephant`

and suppose we had a greedy choice: 


**If the characters match, do nothing, else insert into $S$**.


<br>
$S=$`--------relevant`

$T=$`elephant--------`

Clearly not ideal: after 8 insertions we still have to delete every character of `relevant`, which now sits past the end of $T$.



Cost: $|S|+|T| = 16$ edits, literally the worst possible!


Greedy choice 2: 
**If the characters match, do nothing, else delete from $S$**.


<br>
$S=$`relevant-----`

$T=$`-ele----phant`

Also not ideal: 5 deletions and 5 insertions, for a total of 10 edits.
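
Both greedy rules can be simulated directly. The sketch below is one reading of the two rules (the function `greedy_edits` and its `on_mismatch` parameter are hypothetical names): it scans both strings left to right, applies the chosen rule on every mismatch, and then deletes or inserts whatever is left over.

```python
def greedy_edits(S, T, on_mismatch):
    """Count edits made by a greedy rule; on_mismatch is 'insert' or 'delete'."""
    i, j, edits = 0, 0, 0
    while i < len(S) and j < len(T):
        if S[i] == T[j]:
            i, j = i + 1, j + 1          # match: no edit
        elif on_mismatch == 'insert':
            j, edits = j + 1, edits + 1  # insert T[j] into S
        else:
            i, edits = i + 1, edits + 1  # delete S[i]
    # leftover characters of S are deleted, leftover characters of T are inserted
    return edits + (len(S) - i) + (len(T) - j)

print(greedy_edits('relevant', 'elephant', 'insert'))  # 16
print(greedy_edits('relevant', 'elephant', 'delete'))  # 10
```

Neither rule looks ahead, so both miss alignments that accept one edit now in order to save several later.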


### Why isn't greedy working?



The choice we make at any particular position should depend on how it affects the rest of the problem. 



We should examine both options, delete and insert, to decide what to do with respect to how that choice affects everything after.



Here is an optimal set of edits for `relevant` and `elephant`, using 2 deletions and 2 insertions:

$S=$`relev--ant`<br>
$T=$`-ele-phant`

## Optimal substructure?

Let's use case-based reasoning about the optimal solution as we did for Knapsack. 

Let $\mathit{MED}(S, T)$ be the optimal number of edits between $S$ and $T$. 

In an optimal sequence of edits, how would we deal with the first characters of $S$ and $T$, respectively?

**Three cases:**

$~~$ 1) $~~S$ or $T$ is empty

$~~$ 2) $~~S[0]=T[0]$

$~~$ 3) $~~S[0] \neq T[0]$


### 1) $S$ or $T$ is empty

If $S$ is empty and $T$ is not, what is the edit cost?  

S=`''`<br>
T=`'abcde'`



If either string is empty, then the edit cost is simply the length of the other string.



<br>

### 2) $S[0] = T[0]$ 

S=`'abc'`<br>
T=`'ade'`



There is no benefit to editing position $0$. The edit distance is the edit distance of the tails of these strings.



$\mathit{MED}(S, T) = \mathit{MED}(S[1:], T[1:])$

e.g., 

$\mathit{MED}(S, T) = \mathit{MED}($`'bc'`$, $`'de'`$)$. 



### 3) $S[0] \neq T[0]$

S=`'abc'`<br>
T=`'bde'`

We must perform either a deletion or an insertion, whichever is less costly.

**Deleting from $S$**. The cost is 1 (the cost of the deletion) plus the $\mathit{MED}$ of the tail of $S$ with all of $T$.

S=`'abc'`<br>
T=`'-bde'`

$\mathit{MED}(S, T) = 1+\mathit{MED}($`'bc'`,`'bde'`$)$ 

$\mathit{MED}(S, T) = 1+\mathit{MED}(S[1:], T)$  


**Inserting into $S$**. The cost is 1 (the cost of the insertion) plus the $\mathit{MED}$ of all of $S$, which still remains, with the tail of $T$, since the inserted character matches $T[0]$.

S=`'-abc'`<br>
T=`'bde'`

$\mathit{MED}(S, T) = 1+\mathit{MED}($`'abc'`,`'de'`$)$ 

$\mathit{MED}(S, T) = 1+\mathit{MED}(S, T[1:])$  

## Optimal Substructure

**Optimal Substructure for Edit Distance**: Let $S$ and $T$ be strings of length $m$ and $n$. Then,

$$
\mathit{MED}(S, T) = 
\begin{cases}
|T|, & \texttt{if}~~S~\texttt{is empty} \\
|S|, & \texttt{if}~~T~\texttt{is empty} \\
\mathit{MED}(S[1:], T[1:]), & \texttt{if}~~S[0]=T[0] \\
1+\min\{\mathit{MED}(S[1:], T),\mathit{MED}(S, T[1:])\}, & \texttt{otherwise} \\
\end{cases}
$$

<br>


### Problem Size

Just as with Knapsack, the recursion tree for this recurrence has an exponential number of nodes. How many nodes are there, and what is the depth? 

The recursion tree has $O(2^{m+n})$ nodes and depth $O(m+n)$. Are there shared subproblems?

For $S$=`ABC` and $T$=`DBC` we have the following DAG:

<img src="figures/edit_distance_dag.jpg" width="80%">

How much sharing is possible? In other words, how many distinct subproblems are there?


Every recursive call removes the first character from one or both strings, so every subproblem is a pair of suffixes $(S[i:], T[j:])$. There are only $(m+1)(n+1) \in O(mn)$ such pairs, and each can be computed in $O(1)$ time once the subproblems it depends on are known. The longest path in the recursion DAG is $O(m+n)$.


## Implementation: Top-down w/out memoization 

In [3]:
def MED(S, T):
    #print("S:%s, T:%s" % (S, T))
    if (S == ""):
        return(len(T))
    elif (T == ""):
        return(len(S))
    else:
        if (S[0] == T[0]):
            return(MED(S[1:], T[1:]))
        else:
            return(1 + min(MED(S, T[1:]), MED(S[1:], T)))
        
S= 'abcdefghijkl'
T= 'abcdghikjl'
#   abcdefghijk-l
#   abcd--ghi-kjl
print(MED(S, T))

4


In [4]:
S = 'relevant'
T = 'elephant'
#    relev--ant
#    -ele-phant
print(MED(S, T))

4
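
Since there are only $O(mn)$ distinct subproblems, caching them removes the exponential blowup. Here is a minimal sketch of a memoized top-down version; it indexes into the strings instead of slicing them, and the names `med_memo` and `med` are illustrative.

```python
from functools import lru_cache

def med_memo(S, T):
    """Minimum number of insertions and deletions to turn S into T."""
    m, n = len(S), len(T)

    @lru_cache(maxsize=None)
    def med(i, j):
        # edit distance between the suffixes S[i:] and T[j:]
        if i == m:
            return n - j          # S exhausted: insert the rest of T
        if j == n:
            return m - i          # T exhausted: delete the rest of S
        if S[i] == T[j]:
            return med(i + 1, j + 1)
        return 1 + min(med(i + 1, j), med(i, j + 1))

    return med(0, 0)

print(med_memo('abcdefghijkl', 'abcdghikjl'))  # 4
print(med_memo('relevant', 'elephant'))        # 4
```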


## Solving MED using a Bottom-Up Approach

As we did for Knapsack, we can use a table to calculate and store solutions to subproblems.



<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  overflow:hidden;padding:9px 16px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  font-weight:normal;overflow:hidden;padding:9px 16px;word-break:normal;}
.tg .tg-0alu{border-color:#ffffff;color:#036400;font-size:20px;font-style:italic;text-align:center;vertical-align:middle;writing-mode:vertical-lr;transform:rotate(-90deg);}
.tg .tg-trnk{border-color:#ffffff;color:#cb0000;font-size:20px;font-style:italic;text-align:center;vertical-align:middle}
.tg .tg-re03{border-color:#000000;font-size:20px;text-align:left;vertical-align:middle}
.tg .tg-d999{border-color:#ffffff;font-size:20px;text-align:left;vertical-align:middle}
.tg .tg-1e1z{border-color:#c0c0c0;font-size:20px;font-weight:bold;text-align:left;vertical-align:middle}
</style>
<table class="tg">
<thead>
  <tr>
    <th class="tg-d999"></th>
    <th class="tg-trnk" colspan="8">Cost to Delete</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td class="tg-0alu" rowspan="9">Cost to Insert</td>
    <td class="tg-1e1z"></td>
    <td class="tg-1e1z">''</td>
    <td class="tg-1e1z">k</td>
    <td class="tg-1e1z">i</td>
    <td class="tg-1e1z">t</td>
    <td class="tg-1e1z">t</td>
    <td class="tg-1e1z">e</td>
    <td class="tg-1e1z">n</td>
  </tr>
  <tr>
    <td class="tg-1e1z">''</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">s</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">i</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">t</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">t</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">i</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">n</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">g</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
</tbody>
</table>

<br>

We fill in the values in the first row to give the cost of deleting characters from `'kitten'` to match the empty string `''` to the left.


<style type="text/css">
    .tg  {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  overflow:hidden;padding:9px 16px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  font-weight:normal;overflow:hidden;padding:9px 16px;word-break:normal;}
.tg .tg-0alu{border-color:#ffffff;color:#036400;font-size:20px;font-style:italic;text-align:center;vertical-align:middle;writing-mode:vertical-lr;transform:rotate(-90deg);}
.tg .tg-trnk{border-color:#ffffff;color:#cb0000;font-size:20px;font-style:italic;text-align:center;vertical-align:middle}
.tg .tg-re03{border-color:#000000;font-size:20px;text-align:left;vertical-align:middle}
.tg .tg-d999{border-color:#ffffff;font-size:20px;text-align:left;vertical-align:middle}
.tg .tg-1e1z{border-color:#c0c0c0;font-size:20px;font-weight:bold;text-align:left;vertical-align:middle}

</style>
<table class="tg">
<thead>
  <tr>
    <th class="tg-d999"></th>
    <th class="tg-trnk" colspan="8">Cost to Delete</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td class="tg-0alu" rowspan="9">Cost to Insert</td>
    <td class="tg-1e1z"></td>
    <td class="tg-1e1z">''</td>
    <td class="tg-1e1z">k</td>
    <td class="tg-1e1z">i</td>
    <td class="tg-1e1z">t</td>
    <td class="tg-1e1z">t</td>
    <td class="tg-1e1z">e</td>
    <td class="tg-1e1z">n</td>
  </tr>
  <tr>
    <td class="tg-1e1z">''</td>
    <td class="tg-re02">0</td>
    <td class="tg-re06">1</td>
    <td class="tg-re06">2</td>
    <td class="tg-re06">3</td>
    <td class="tg-re06">4</td>
    <td class="tg-re06">5</td>
    <td class="tg-re06">6</td>
  </tr>
  <tr>
    <td class="tg-1e1z">s</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">i</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">t</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">t</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">i</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">n</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">g</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
</tbody>
</table>


We fill in the values in the first column to give the cost of inserting characters from `'sitting'` to match the empty string above.

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  overflow:hidden;padding:9px 16px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  font-weight:normal;overflow:hidden;padding:9px 16px;word-break:normal;}
.tg .tg-0alu{border-color:#ffffff;color:#036400;font-size:20px;font-style:italic;text-align:center;vertical-align:middle;writing-mode:vertical-lr;transform:rotate(-90deg);}
.tg .tg-trnk{border-color:#ffffff;color:#cb0000;font-size:20px;font-style:italic;text-align:center;vertical-align:middle}
.tg .tg-re03{border-color:#000000;font-size:20px;text-align:left;vertical-align:middle}
.tg .tg-d999{border-color:#ffffff;font-size:20px;text-align:left;vertical-align:middle}
.tg .tg-1e1z{border-color:#c0c0c0;font-size:20px;font-weight:bold;text-align:left;vertical-align:middle}
</style>
<table class="tg">
<thead>
  <tr>
    <th class="tg-d999"></th>
    <th class="tg-trnk" colspan="8">Cost to Delete</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td class="tg-0alu" rowspan="9">Cost to Insert</td>
    <td class="tg-1e1z"></td>
    <td class="tg-1e1z">''</td>
    <td class="tg-1e1z">k</td>
    <td class="tg-1e1z">i</td>
    <td class="tg-1e1z">t</td>
    <td class="tg-1e1z">t</td>
    <td class="tg-1e1z">e</td>
    <td class="tg-1e1z">n</td>
  </tr>
  <tr>
    <td class="tg-1e1z">''</td>
    <td class="tg-re02">0</td>
    <td class="tg-re03">1</td>
    <td class="tg-re03">2</td>
    <td class="tg-re03">3</td>
    <td class="tg-re03">4</td>
    <td class="tg-re03">5</td>
    <td class="tg-re03">6</td>
  </tr>
  <tr>
    <td class="tg-1e1z">s</td>
    <td class="tg-re07">1</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">i</td>
    <td class="tg-re07">2</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">t</td>
    <td class="tg-re07">3</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">t</td>
    <td class="tg-re07">4</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">i</td>
    <td class="tg-re07">5</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">n</td>
    <td class="tg-re07">6</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">g</td>
    <td class="tg-re07">7</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
</tbody>
</table>

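We fill in the rest using our recurrence, applied to prefixes instead of suffixes: when the row and column characters match, a cell copies its upper-left neighbor; otherwise it is 1 plus the smaller of the cell above and the cell to the left.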

$$
\mathit{MED}(S, T) = 
\begin{cases}
\mathit{MED}(S[1:], T[1:]), ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\texttt{if}~~S[0]=T[0] \\
1+\min\{\mathit{MED}(S[1:], T),\mathit{MED}(S, T[1:])\}, ~~~~~\texttt{otherwise} \\
\end{cases}
$$

<style type="text/css">
.tg .tg-6zma{border-color:#000000;color:#f56b00;font-size:20px;font-weight:bold;text-align:left;vertical-align:middle}
.tg .tg-409p{border-color:#ffffff;color:#cb0000;font-size:20px;font-style:italic;text-align:center;vertical-align:middle}
.tg .tg-1um3{border-color:#ffffff;color:#036400;font-size:20px;font-style:italic;text-align:center;vertical-align:middle;writing-mode:vertical-lr;transform:rotate(-90deg);}
.tg .tg-re03{border-color:#000000;font-size:20px;text-align:left;vertical-align:middle}
.tg .tg-re08{border-color:#000000;color:#000000;font-size:20px;font-weight:bold;text-align:left;vertical-align:middle}
.tg .tg-re09{border-color:#000000;color:#f56b00;font-size:20px;font-weight:bold;text-align:left;vertical-align:middle}
.tg .tg-d999{border-color:#ffffff;font-size:20px;text-align:left;vertical-align:middle}
.tg .tg-1e1z{border-color:#c0c0c0;font-size:20px;font-weight:bold;text-align:left;vertical-align:middle}
.tg .tg-6j81{border-color:#000000;color:#cb0000;font-size:20px;font-weight:bold;text-align:left;vertical-align:middle;text-decoration:underline;}
.tg .tg-4qyb{border-color:#000000;color:#036400;font-size:20px;font-weight:bold;text-align:left;vertical-align:middle;text-decoration:underline;}
</style>
<table class="tg">
<thead>
  <tr>
    <th class="tg-d999"></th>
    <th class="tg-409p" colspan="8">Cost to Delete</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td class="tg-1um3" rowspan="9">Cost to Insert</td>
    <td class="tg-1e1z"></td>
    <td class="tg-1e1z">''</td>
    <td class="tg-1e1z">k</td>
    <td class="tg-1e1z">i</td>
    <td class="tg-1e1z">t</td>
    <td class="tg-1e1z">t</td>
    <td class="tg-1e1z">e</td>
    <td class="tg-1e1z">n</td>
  </tr>
  <tr>
    <td class="tg-1e1z">''</td>
    <td class="tg-re09">0</td>
    <td class="tg-6j81">1</td>
    <td class="tg-re03">2</td>
    <td class="tg-re03">3</td>
    <td class="tg-re03">4</td>
    <td class="tg-re03">5</td>
    <td class="tg-re03">6</td>
  </tr>
  <tr>
    <td class="tg-1e1z">s</td>
    <td class="tg-4qyb">1</td>
    <td class="tg-re08">2</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">i</td>
    <td class="tg-re03">2</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">t</td>
    <td class="tg-re03">3</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">t</td>
    <td class="tg-re03">4</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">i</td>
    <td class="tg-re03">5</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">n</td>
    <td class="tg-re03">6</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">g</td>
    <td class="tg-re03">7</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
</tbody>
</table>

<table class="tg">
<thead>
  <tr>
    <th class="tg-d999"></th>
    <th class="tg-409p" colspan="8">Cost to Delete</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td class="tg-1um3" rowspan="9">Cost to Insert</td>
    <td class="tg-1e1z"></td>
    <td class="tg-1e1z">''</td>
    <td class="tg-1e1z">k</td>
    <td class="tg-1e1z">i</td>
    <td class="tg-1e1z">t</td>
    <td class="tg-1e1z">t</td>
    <td class="tg-1e1z">e</td>
    <td class="tg-1e1z">n</td>
  </tr>
  <tr>
    <td class="tg-1e1z">''</td>
    <td class="tg-ve0v">0</td>
    <td class="tg-ve0v">1</td>
    <td class="tg-re03">2</td>
    <td class="tg-re03">3</td>
    <td class="tg-re03">4</td>
    <td class="tg-re03">5</td>
    <td class="tg-re03">6</td>
  </tr>
  <tr>
    <td class="tg-1e1z">s</td>
    <td class="tg-ve0v">1</td>
    <td class="tg-xmct">2</td>
    <td class="tg-41fk">3</td>
    <td class="tg-re03">4</td>
    <td class="tg-re03">5</td>
    <td class="tg-re03">6</td>
    <td class="tg-re03">7</td>
  </tr>
  <tr>
    <td class="tg-1e1z">i</td>
    <td class="tg-re03">2</td>
    <td class="tg-295u">3</td>
    <td class="tg-re04">2</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">t</td>
    <td class="tg-re03">3</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">t</td>
    <td class="tg-re03">4</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">i</td>
    <td class="tg-re03">5</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">n</td>
    <td class="tg-re03">6</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">g</td>
    <td class="tg-re03">7</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
</tbody>
</table>

<table class="tg">
<thead>
  <tr>
    <th class="tg-d999"></th>
    <th class="tg-409p" colspan="8">Cost to Delete</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td class="tg-1um3" rowspan="9">Cost to Insert</td>
    <td class="tg-1e1z"></td>
    <td class="tg-1e1z">''</td>
    <td class="tg-1e1z">k</td>
    <td class="tg-1e1z">i</td>
    <td class="tg-1e1z">t</td>
    <td class="tg-1e1z">t</td>
    <td class="tg-1e1z">e</td>
    <td class="tg-1e1z">n</td>
  </tr>
  <tr>
    <td class="tg-1e1z">''</td>
    <td class="tg-ve0v">0</td>
    <td class="tg-ve0v">1</td>
    <td class="tg-re03">2</td>
    <td class="tg-re03">3</td>
    <td class="tg-re03">4</td>
    <td class="tg-re03">5</td>
    <td class="tg-re03">6</td>
  </tr>
  <tr>
    <td class="tg-1e1z">s</td>
    <td class="tg-ve0v">1</td>
    <td class="tg-ve0v">2</td>
    <td class="tg-8xes">3</td>
    <td class="tg-41fk">4</td>
    <td class="tg-re03">5</td>
    <td class="tg-re03">6</td>
    <td class="tg-re03">7</td>
  </tr>
  <tr>
    <td class="tg-1e1z">i</td>
    <td class="tg-re03">2</td>
    <td class="tg-ve0v">3</td>
    <td class="tg-drif">2</td>
    <td class="tg-6zmb">3</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">t</td>
    <td class="tg-re03">3</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">t</td>
    <td class="tg-re03">4</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">i</td>
    <td class="tg-re03">5</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">n</td>
    <td class="tg-re03">6</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">g</td>
    <td class="tg-re03">7</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
</tbody>
</table>

<table class="tg">
<thead>
  <tr>
    <th class="tg-d999"></th>
    <th class="tg-409p" colspan="8">Cost to Delete</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td class="tg-1um3" rowspan="9">Cost to Insert</td>
    <td class="tg-1e1z"></td>
    <td class="tg-1e1z">''</td>
    <td class="tg-1e1z">k</td>
    <td class="tg-1e1z">i</td>
    <td class="tg-1e1z">t</td>
    <td class="tg-1e1z">t</td>
    <td class="tg-1e1z">e</td>
    <td class="tg-1e1z">n</td>
  </tr>
  <tr>
    <td class="tg-1e1z">''</td>
    <td class="tg-ve0v">0</td>
    <td class="tg-ve0v">1</td>
    <td class="tg-re03">2</td>
    <td class="tg-re03">3</td>
    <td class="tg-re03">4</td>
    <td class="tg-re03">5</td>
    <td class="tg-re03">6</td>
  </tr>
  <tr>
    <td class="tg-1e1z">s</td>
    <td class="tg-ve0v">1</td>
    <td class="tg-ve0v">2</td>
    <td class="tg-wo5g">3</td>
    <td class="tg-ve0v">4</td>
    <td class="tg-re03">5</td>
    <td class="tg-re03">6</td>
    <td class="tg-re03">7</td>
  </tr>
  <tr>
    <td class="tg-1e1z">i</td>
    <td class="tg-re03">2</td>
    <td class="tg-ve0v">3</td>
    <td class="tg-ve0v">2</td>
    <td class="tg-re03">3</td>
    <td class="tg-re03">4</td>
    <td class="tg-re03">5</td>
    <td class="tg-re03">6</td>
  </tr>
  <tr>
    <td class="tg-1e1z">t</td>
    <td class="tg-re03">3</td>
    <td class="tg-re03">4</td>
    <td class="tg-re03">3</td>
    <td class="tg-re03">2</td>
    <td class="tg-re03">3</td>
    <td class="tg-re03">4</td>
    <td class="tg-re03">5</td>
  </tr>
  <tr>
    <td class="tg-1e1z">t</td>
    <td class="tg-re03">4</td>
    <td class="tg-re03">5</td>
    <td class="tg-re03">4</td>
    <td class="tg-re03">3</td>
    <td class="tg-re03">2</td>
    <td class="tg-re03">3</td>
    <td class="tg-re03">4</td>
  </tr>
  <tr>
    <td class="tg-1e1z">i</td>
    <td class="tg-re03">5</td>
    <td class="tg-re03">6</td>
    <td class="tg-re03">5</td>
    <td class="tg-re03">4</td>
    <td class="tg-re03">3</td>
    <td class="tg-re03">4</td>
    <td class="tg-re03">5</td>
  </tr>
  <tr>
    <td class="tg-1e1z">n</td>
    <td class="tg-re03">6</td>
    <td class="tg-re03">7</td>
    <td class="tg-re03">6</td>
    <td class="tg-re03">5</td>
    <td class="tg-re03">4</td>
    <td class="tg-8xes">5</td>
    <td class="tg-6ua1">4</td>
  </tr>
  <tr>
    <td class="tg-1e1z">g</td>
    <td class="tg-re03">7</td>
    <td class="tg-re03">8</td>
    <td class="tg-re03">7</td>
    <td class="tg-re03">6</td>
    <td class="tg-re03">5</td>
    <td class="tg-295u">6</td>
    <td class="tg-6zma">5</td>
  </tr>
</tbody>
</table>

### Tracing back the edits

<img src="figures/edit-trace-1.jpeg" width="80%">

<img src="figures/edit-trace-2.jpeg" width="80%">

<img src="figures/edit-trace-3.jpeg" width="80%">

**Alignments:**

`k-itte-n-` $~~~~~~~$ `-kitt-en-`<br>
`-sitt-ing` $~~~~~~~$ `s-itti-ng`
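
The traceback pictured above can be sketched in code. This is a minimal illustration, assuming only insertions and deletions at unit cost, as in the table; the helper names `med_table` and `traceback` are hypothetical and are not part of the notebook's `MED` implementation. Rows follow `T` and columns follow `S`, matching the table layout.

```python
def med_table(S, T):
    """Fill the insert/delete-only edit distance table (hypothetical helper).

    table[i][j] is the cost of turning S[:j] into T[:i].
    """
    rows, cols = len(T) + 1, len(S) + 1
    table = [[0] * cols for _ in range(rows)]
    for j in range(cols):
        table[0][j] = j                      # delete all of S[:j]
    for i in range(rows):
        table[i][0] = i                      # insert all of T[:i]
    for i in range(1, rows):
        for j in range(1, cols):
            if T[i - 1] == S[j - 1]:
                table[i][j] = table[i - 1][j - 1]          # match: no cost
            else:
                table[i][j] = 1 + min(table[i][j - 1],     # delete S[j-1]
                                      table[i - 1][j])     # insert T[i-1]
    return table


def traceback(S, T, table):
    """Walk from the bottom-right cell to the top-left, recording one alignment."""
    i, j = len(T), len(S)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and T[i - 1] == S[j - 1]:
            top.append(S[j - 1]); bottom.append(T[i - 1])  # match: move diagonally
            i -= 1; j -= 1
        elif j > 0 and table[i][j] == table[i][j - 1] + 1:
            top.append(S[j - 1]); bottom.append('-')       # deletion: move left
            j -= 1
        else:
            top.append('-'); bottom.append(T[i - 1])       # insertion: move up
            i -= 1
    return ''.join(reversed(top)), ''.join(reversed(bottom))


S, T = 'kitten', 'sitting'
top, bottom = traceback(S, T, med_table(S, T))
print(top)     # -kitt-en-
print(bottom)  # s-itti-ng
```

Because this sketch prefers a deletion (left) over an insertion (up) when both are optimal, it recovers the second alignment shown above; breaking ties the other way would yield a different alignment of the same cost.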


<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  overflow:hidden;padding:9px 16px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  font-weight:normal;overflow:hidden;padding:9px 16px;word-break:normal;}
.tg .tg-0alu{border-color:#ffffff;font-size:20px;font-style:italic;text-align:center;vertical-align:middle;writing-mode:vertical-lr;transform:rotate(-180deg);}
.tg .tg-trnk{border-color:#ffffff;font-size:20px;font-style:italic;text-align:center;vertical-align:middle}
.tg .tg-re03{border-color:#000000;font-size:20px;text-align:left;vertical-align:middle}
.tg .tg-d999{border-color:#ffffff;font-size:20px;text-align:left;vertical-align:middle}
.tg .tg-1e1z{border-color:#c0c0c0;font-size:20px;font-weight:bold;text-align:left;vertical-align:middle}
</style>
<table class="tg">
<thead>
  <tr>
    <th class="tg-d999"></th>
    <th class="tg-trnk" colspan="8">Cost to Delete</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td class="tg-0alu" rowspan="9">Cost to Insert</td>
    <td class="tg-1e1z"></td>
    <td class="tg-1e1z">''</td>
    <td class="tg-1e1z">k</td>
    <td class="tg-1e1z">i</td>
    <td class="tg-1e1z">t</td>
    <td class="tg-1e1z">t</td>
    <td class="tg-1e1z">e</td>
    <td class="tg-1e1z">n</td>
  </tr>
  <tr>
    <td class="tg-1e1z">''</td>
    <td class="tg-re03">0</td>
    <td class="tg-re03">1</td>
    <td class="tg-re03">2</td>
    <td class="tg-re03">3</td>
    <td class="tg-re03">4</td>
    <td class="tg-re03">5</td>
    <td class="tg-re03">6</td>
  </tr>
  <tr>
    <td class="tg-1e1z">s</td>
    <td class="tg-re03">1</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">i</td>
    <td class="tg-re03">2</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">t</td>
    <td class="tg-re03">3</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">t</td>
    <td class="tg-re03">4</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">i</td>
    <td class="tg-re03">5</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">n</td>
    <td class="tg-re03">6</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
  <tr>
    <td class="tg-1e1z">g</td>
    <td class="tg-re03">7</td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
    <td class="tg-re03"></td>
  </tr>
</tbody>
</table>

In [5]:

S = 'kitten'
T = 'sitting'
#    -kitt-en-
#    s-itti-ng
print(MED(S, T))

5


### Edit distance variants

We can create variants of the edit distance that allow more operations than just insertion and deletion. These edit operations should maintain the property that if the zeroth letters of $S$ and $T$ match, then the minimum edit distance can be obtained by calculating the minimum edit distance of $S[1:]$ and $T[1:]$. Otherwise, if the zeroth characters do not match, the minimum edit distance must be obtained by first performing an edit to make the zeroth characters match. Some edits to consider:

- Insertions
- Deletions
- Substitutions
- Transpositions (either of adjacent characters or arbitrary transpositions)
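
As a rough sketch of how such variants might look, the function below extends the insert/delete recurrence with optional unit-cost substitutions and adjacent transpositions. The name `edit_distance` and its keyword flags are hypothetical, chosen for illustration; arbitrary (non-adjacent) transpositions are not handled.

```python
from functools import lru_cache


def edit_distance(S, T, substitutions=False, transpositions=False):
    """Edit distance with optional extra operations (illustrative sketch).

    Insertions and deletions are always allowed; every edit costs 1.
    """
    @lru_cache(maxsize=None)
    def go(i, j):
        # Cost of turning S[i:] into T[j:].
        if i == len(S):
            return len(T) - j                # insert the rest of T
        if j == len(T):
            return len(S) - i                # delete the rest of S
        if S[i] == T[j]:
            return go(i + 1, j + 1)          # leading characters match: no edit
        options = [1 + go(i + 1, j),         # delete S[i]
                   1 + go(i, j + 1)]         # insert T[j]
        if substitutions:
            options.append(1 + go(i + 1, j + 1))   # replace S[i] with T[j]
        if (transpositions and i + 1 < len(S) and j + 1 < len(T)
                and S[i] == T[j + 1] and S[i + 1] == T[j]):
            options.append(1 + go(i + 2, j + 2))   # swap S[i] and S[i+1]
        return min(options)

    return go(0, 0)


print(edit_distance('kitten', 'sitting'))                      # 5
print(edit_distance('kitten', 'sitting', substitutions=True))  # 3
print(edit_distance('ab', 'ba', transpositions=True))          # 1
```

With substitutions enabled, `kitten` becomes `sitting` in three edits (k→s, e→i, insert g) instead of five.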