# Longest Common Subsequence

Given a sequence $X[1..n] = \langle x_1, x_2, \ldots, x_n\rangle$, a sequence $\langle x_{i_1}, x_{i_2}, \ldots, x_{i_k}\rangle$ is a subsequence of $X$ if $1 \leq i_1 < i_2 < \cdots < i_k \leq n$. Given two sequences $X$ and $Y$, a longest common subsequence (LCS) $Z$ is a common subsequence of $X$ and $Y$ such that every common subsequence $Z'$ of $X$ and $Y$ satisfies $|Z'| \leq |Z|$.
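
The definition of a subsequence can be checked mechanically. The following is a minimal sketch; `is_subsequence` is a helper name introduced here for illustration and is not part of the original notebook. It scans $X$ once and matches the elements of the candidate in order.

```python
def is_subsequence(Z, X):
    """Return True if Z can be obtained from X by deleting zero or more elements."""
    it = iter(X)
    # `z in it` advances the iterator until it finds z, so successive
    # matches are forced to occur at strictly increasing positions of X.
    return all(z in it for z in Z)


print(is_subsequence("BCBA", "ABCBDAB"))  # True
print(is_subsequence("BCBA", "BDCABA"))   # True
print(is_subsequence("BDAC", "ABCBDAB"))  # False
```

Since `"BCBA"` is a subsequence of both strings, it is a common subsequence of `"ABCBDAB"` and `"BDCABA"`.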

How do we find an LCS of $X$ and $Y$?

Let's formulate the problem and look for a recurrence relation that describes the solution in terms of solutions to subproblems. Let $X[i..j]$ denote the substring $\langle x_i, \ldots, x_j\rangle$, and let $X = X[1..m]$ and $Y = Y[1..n]$. We first compute the length of an LCS, then augment the computation to construct an LCS.

Let $LCS(i,j)$ denote the length of the LCS of $X[1..i]$ and $Y[1..j]$. Then $LCS(m,n)$ is the length of an LCS of $X$ and $Y$. We have the following recurrence relation:
$$
LCS(i,j) = \left\{
\begin{array}{ll}
LCS(i-1,j-1) + 1, & \mbox{if $i > 0$, $j > 0$, and $x_i = y_j$,} \\
\max\{LCS(i-1, j), LCS(i,j-1)\}, & \mbox{if $i>0$, $j>0$, and $x_i \not= y_j$} \\
0, & \mbox{if $i=0$ or $j=0$.}
\end{array}
\right.
$$

# A Naive Implementation of Recursion

In [2]:
def lcs(X, Y, i, j):
    if i == 0 or j == 0:
        return 0
    elif X[i-1] == Y[j-1]:
        return 1 + lcs(X, Y, i-1, j-1)
    else:
        return max(lcs(X, Y, i, j-1), lcs(X, Y, i-1, j))

# Driver program to test the above function
X = "AGGTDACTABCGLAGLADB"
Y = "GXTACBCACGLACGCGBA"
print("Length of LCS is ", lcs(X, Y, len(X), len(Y)))

KeyboardInterrupt: 

# Complexity Analysis of Naive Recursion

Let $T(i,j)$ be the number of steps to compute $LCS(i,j)$. Then 
$$
T(i,j) = \left\{
\begin{array}{ll}
T(i-1,j-1) + 1, & \mbox{if $i,j> 0$ and $x_i = y_j$,}  \\
T(i, j-1) + T(i-1, j), & \mbox{if $i,j>0$ and $x_i \not= y_j$,} \\
1, & \mbox{if $i=0$ or $j=0$}
\end{array}
\right.
$$
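In the extreme case where the length of the LCS of $X$ and $Y$ is 0, namely $x_i \not= y_j$ for all $i$ and $j$, the second case applies whenever $i,j > 0$. In this case $T$ is nondecreasing in each argument, so $T(m,n-1) \geq T(m-1,n-1)$ and $T(m-1,n) \geq T(m-1,n-1)$. Then
$T(m,n) = T(m,n-1) + T(m-1,n) \geq 2T(m-1,n-1) \geq 2^2T(m-2,n-2) \geq \cdots \geq 2^kT(m-k,n-k)$. Assume that
$m = n$; then $T(n,n) \geq 2^kT(n-k,n-k)$. Taking $k = n$ gives $T(n-k,n-k) = T(0,0) = 1$. Hence,
$T(n,n) \geq 2^n$ (with equality at $n = 1$, where $T(1,1) = 2$), so the naive recursion takes exponential time in the worst case.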
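
The growth of $T$ can be observed numerically. The sketch below is an illustration, not part of the original notebook: `count_steps` is a hypothetical helper that evaluates $T(i,j)$ by following the same three cases as the recurrence above, and the test strings are chosen to share no characters so that the worst case is triggered.

```python
def count_steps(X, Y, i, j):
    """Evaluate T(i, j) as defined by the recurrence for the naive recursion."""
    if i == 0 or j == 0:
        return 1
    if X[i-1] == Y[j-1]:
        return count_steps(X, Y, i-1, j-1) + 1
    return count_steps(X, Y, i, j-1) + count_steps(X, Y, i-1, j)


# Strings with no common character: every comparison is a mismatch.
for n in range(1, 9):
    X = "a" * n
    Y = "b" * n
    print(n, count_steps(X, Y, n, n), 2**n)
```

Each printed line shows $n$, the value of $T(n,n)$, and $2^n$ for comparison; the middle column grows at least as fast as the last one.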

# DP Memoization

In [3]:
def memoized_lcs(X, Y, i, j, memo):
    if memo[i][j] >= 0:
        return memo[i][j]
    if i == 0 or j == 0:
        v = 0
    elif X[i-1] == Y[j-1]:
        v = 1 + memoized_lcs(X, Y, i-1, j-1, memo) 
    else:
        v = max(memoized_lcs(X, Y, i, j-1, memo), memoized_lcs(X, Y, i-1, j, memo))
    memo[i][j] = v
    return v


In [4]:
# Driver program to test the above function
X = "AGGTDACTABCGLAGLADB"
Y = "GXTACBCACGLACGCGBA"
m = len(X)
n = len(Y)
memo = [[-1] * (n + 1) for i in range(m+1)]
print("Length of LCS is ", memoized_lcs(X, Y, m, n, memo))

Length of LCS is  11
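
As an alternative to passing an explicit `memo` table, the same memoization idea can be sketched with the standard library's `functools.lru_cache`. This is an assumption about how one might restructure the code, not the notebook's own approach; `lcs_length_cached` and the inner function `go` are names introduced here.

```python
from functools import lru_cache


def lcs_length_cached(X, Y):
    """Length of an LCS of X and Y, memoized with functools.lru_cache."""

    @lru_cache(maxsize=None)
    def go(i, j):
        # Same three cases as the recurrence; results are cached per (i, j).
        if i == 0 or j == 0:
            return 0
        if X[i-1] == Y[j-1]:
            return 1 + go(i-1, j-1)
        return max(go(i, j-1), go(i-1, j))

    return go(len(X), len(Y))


print("Length of LCS is ", lcs_length_cached("ABCBDAB", "BDCABA"))  # 4
```

Like the explicit version, this is still recursive, so for long inputs the recursion depth (up to about $m + n$) can exceed Python's default recursion limit.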


# DP Bottom Up

In [5]:
def bottom_up_lcs(X, Y):
    # find the length of the strings
    m = len(X)
    n = len(Y)
 
    # declaring the array for storing the dp values
    L = [[None]*(n + 1) for i in range(m + 1)]
 
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 or j == 0 :
                L[i][j] = 0
            elif X[i-1] == Y[j-1]:
                L[i][j] = L[i-1][j-1]+1
            else:
                L[i][j] = max(L[i-1][j], L[i][j-1])
 
    # L[m][n] contains the length of an LCS of X[0..m-1] and Y[0..n-1]
    return L[m][n]
# end of function bottom_up_lcs

In [6]:
# Driver program to test the above function
X = "AGGTDACTABCGLAGLADB"
Y = "GXTACBCACGLACGCGBA"
print("Length of LCS is ", bottom_up_lcs(X, Y))

Length of LCS is  11


# Complexity Analysis of DP Approach

There are $(m+1)(n+1)$ subproblems, of which $mn$ are not base cases, and each one is computed in constant time from at most two previously solved subproblems. Hence the time complexity is at most $2mn + O(m+n) = O(mn)$, and the table takes $O(mn)$ space.
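
The bound on the number of subproblems can be checked empirically. The following sketch is illustrative only: `count_subproblems` is a hypothetical helper that runs a dictionary-based memoized recursion and reports how many distinct pairs $(i,j)$ were solved, alongside the table size $(m+1)(n+1)$.

```python
def count_subproblems(X, Y):
    """Return (number of distinct subproblems solved, (m+1)*(n+1))."""
    m, n = len(X), len(Y)
    memo = {}

    def go(i, j):
        if (i, j) in memo:
            return memo[(i, j)]
        if i == 0 or j == 0:
            v = 0
        elif X[i-1] == Y[j-1]:
            v = 1 + go(i-1, j-1)
        else:
            v = max(go(i, j-1), go(i-1, j))
        memo[(i, j)] = v  # each (i, j) is stored exactly once
        return v

    go(m, n)
    return len(memo), (m + 1) * (n + 1)


solved, bound = count_subproblems("ABCBDAB", "BDCABA")
print(solved, "subproblems solved; table size is", bound)
```

The first number never exceeds the second, since every solved subproblem corresponds to one table cell.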

# Construct an LCS

To construct an LCS, we need to remember which case was taken at each step of the recurrence. There are three cases: up for $(i-1, j)$, upleft for $(i-1, j-1)$, and left for $(i,j-1)$. Let $P[1..m, 1..n]$ store the case taken at each $(i,j)$; following these entries back from $P[m,n]$ recovers an LCS.

In [7]:
def bottom_up_lcs_path(X, Y):
    # find the length of the strings
    m = len(X)
    n = len(Y)
 
    # declaring the array for storing the dp values
    L = [[None]*(n + 1) for i in range(m + 1)]
    P = [[0]*(n + 1) for i in range(m + 1)]
    for i in range(m + 1):
        L[i][0] = 0
    for j in range(n + 1):
        L[0][j] = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i-1] == Y[j-1]:
                L[i][j] = L[i-1][j-1]+1
                P[i][j] = "upleft"
            elif L[i-1][j] >= L[i][j-1]: 
                L[i][j] = L[i-1][j]
                P[i][j] = "up"
            else:
                L[i][j] = L[i][j-1]
                P[i][j] = "left"
    return L[m][n], P

In [8]:
# Driver program to test the above function
X = "AGGTDACTABCGLAGLADB"
Y = "GXTACBCACGLACGCGBA"
# Call the function once and unpack both results
length, P = bottom_up_lcs_path(X, Y)
print("Length of LCS is ", length)
print("Path of LCS is ", P)

Length of LCS is  11
Path of LCS is  [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 'up', 'up', 'up', 'upleft', 'left', 'left', 'left', 'upleft', 'left', 'left', 'left', 'upleft', 'left', 'left', 'left', 'left', 'left', 'upleft'], [0, 'upleft', 'left', 'left', 'up', 'up', 'up', 'up', 'up', 'up', 'upleft', 'left', 'left', 'left', 'upleft', 'left', 'upleft', 'left', 'left'], [0, 'upleft', 'up', 'up', 'up', 'up', 'up', 'up', 'up', 'up', 'upleft', 'up', 'up', 'up', 'upleft', 'left', 'upleft', 'left', 'left'], [0, 'up', 'up', 'upleft', 'left', 'left', 'left', 'left', 'left', 'left', 'up', 'up', 'up', 'up', 'up', 'up', 'up', 'up', 'up'], [0, 'up', 'up', 'up', 'up', 'up', 'up', 'up', 'up', 'up', 'up', 'up', 'up', 'up', 'up', 'up', 'up', 'up', 'up'], [0, 'up', 'up', 'up', 'upleft', 'left', 'left', 'left', 'upleft', 'left', 'left', 'left', 'upleft', 'left', 'up', 'up', 'up', 'up', 'upleft'], [0, 'up', 'up', 'up', 'up', 'upleft', 'left', 'upleft', 'left', 'upleft', 'left', 'left

In [9]:
# Driver program to test the above function
X = "ABCBDAB"
Y = "BDCABA"
# Call the function once and unpack both results
length, P = bottom_up_lcs_path(X, Y)
print("Length of LCS is ", length)
print("Path of LCS is ", P)


Length of LCS is  4
Path of LCS is  [[0, 0, 0, 0, 0, 0, 0], [0, 'up', 'up', 'up', 'upleft', 'left', 'upleft'], [0, 'upleft', 'left', 'left', 'up', 'upleft', 'left'], [0, 'up', 'up', 'upleft', 'left', 'up', 'up'], [0, 'upleft', 'up', 'up', 'up', 'upleft', 'left'], [0, 'up', 'upleft', 'up', 'up', 'up', 'up'], [0, 'up', 'up', 'up', 'upleft', 'up', 'upleft'], [0, 'upleft', 'up', 'up', 'up', 'upleft', 'up']]


In [10]:
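def print_lcs(P, X, i, j):
    # Follow the cases stored in P back from (i, j); a character of X is
    # printed after the recursive call, so the LCS appears in forward order.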
    if i == 0 or j == 0:
        return
    if P[i][j] == "upleft":
        print_lcs(P, X, i - 1, j - 1)
        print(X[i-1])
    elif P[i][j] == "up":
        print_lcs(P, X, i - 1, j)
    else:
        print_lcs(P, X, i, j - 1)

In [11]:
P = bottom_up_lcs_path(X, Y)[1]
m = len(X)
n = len(Y)
print_lcs(P, X, m, n)

B
C
B
A


In [12]:
X = "AGGTDACTABCGLAGLADB"
Y = "GXTACBCACGLACGCGBA"
P = bottom_up_lcs_path(X, Y)[1]
m = len(X)
n = len(Y)
print_lcs(P, X, m, n)

G
T
A
C
A
C
G
L
A
G
A
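
The LCS can also be returned as a string instead of being printed one character per line. The sketch below is one possible variant, not the notebook's method: `lcs_string` is a name introduced here, and it walks back through the length table `L` directly rather than storing a separate `P` table, using the same tie-breaking rule (prefer up when `L[i-1][j] >= L[i][j-1]`) as `bottom_up_lcs_path`.

```python
def lcs_string(X, Y):
    """Return one LCS of X and Y as a string."""
    m, n = len(X), len(Y)
    # Fill the length table bottom-up; row 0 and column 0 stay 0.
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i-1] == Y[j-1]:
                L[i][j] = L[i-1][j-1] + 1
            else:
                L[i][j] = max(L[i-1][j], L[i][j-1])

    # Walk back from (m, n), collecting matched characters in reverse order.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if X[i-1] == Y[j-1]:      # the "upleft" case
            out.append(X[i-1])
            i -= 1
            j -= 1
        elif L[i-1][j] >= L[i][j-1]:  # the "up" case
            i -= 1
        else:                      # the "left" case
            j -= 1
    return "".join(reversed(out))


print(lcs_string("ABCBDAB", "BDCABA"))  # BCBA
```

The walk-back loop is iterative, so it does not depend on Python's recursion limit, and it takes at most $m + n$ steps after the table is filled.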
