# Longest Common Subsequence

Given two strings, s1 and s2, the task is to find the length of the Longest Common Subsequence. If there is no common subsequence, return 0. A subsequence is a string generated from the original string by deleting 0 or more characters, without changing the relative order of the remaining characters.

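For example, the subsequences of "ABC" are "", "A", "B", "C", "AB", "AC", "BC" and "ABC". In general, a string of length n has 2^n subsequences, one for each choice of which positions to keep (fewer of them are distinct when characters repeat).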
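To make the 2^n count concrete, here is a minimal sketch that enumerates every keep/delete choice with the standard library's `itertools.combinations`. The helper name `all_subsequences` is hypothetical and is not part of the original article.

```python
from itertools import combinations

def all_subsequences(s):
    """Return every subsequence of s, one per choice of kept positions."""
    result = []
    # For each length k, choose k positions in increasing order;
    # this preserves the relative order of the kept characters.
    for k in range(len(s) + 1):
        for positions in combinations(range(len(s)), k):
            result.append("".join(s[p] for p in positions))
    return result

subs = all_subsequences("ABC")
print(subs)       # ['', 'A', 'B', 'C', 'AB', 'AC', 'BC', 'ABC']
print(len(subs))  # 8, which is 2 ** 3
```

Enumerating subsequences this way is only practical for very short strings, which is exactly why the approaches below avoid it.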

```
Input: s1 = "ABC", s2 = "ACD"
Output: 2
Explanation: The longest subsequence which is present in both strings is "AC".

Input: s1 = "AGGTAB", s2 = "GXTXAYB"
Output: 4
Explanation: The longest common subsequence is "GTAB".

Input: s1 = "ABC", s2 = "CBA"
Output: 1
Explanation: There are three longest common subsequences of length 1: "A", "B" and "C".
```

## [Naive Approach] Recursion - O(2 ^ (m + n)) Time and O(m + n) Space

Let m and n be the lengths of s1 and s2. The idea is to compare the last characters of s1 and s2. Two cases arise:
- Match: Make one recursive call for the remaining strings (of lengths m-1 and n-1) and add 1 to the result.
- Do not Match: Make two recursive calls, the first for lengths m-1 and n and the second for lengths m and n-1, and take the maximum of the two results.

Base case: If either string becomes empty, return 0.

Every call shortens at least one string by one character, so the recursion is at most m + n levels deep, and each mismatch branches into two calls. When few characters match, the number of calls grows exponentially, which gives the O(2 ^ (m + n)) time bound; the recursion stack holds at most m + n frames.

For example, consider the input strings s1 = "ABX" and s2 = "ACX".
```
LCS("ABX", "ACX") = 1 + LCS("AB", "AC")                      [Last characters match]
LCS("AB", "AC")   = max( LCS("A", "AC"), LCS("AB", "A") )    [Last characters do not match]
LCS("A", "AC")    = max( LCS("", "AC"), LCS("A", "A") ) = max(0, 1 + LCS("", "")) = 1
LCS("AB", "A")    = max( LCS("A", "A"), LCS("AB", "") ) = max(1 + LCS("", ""), 0) = 1
```

So the overall result is 1 + 1 = 2.
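The hand trace above can be reproduced with a small instrumented sketch. It assumes a hypothetical helper `lcs_traced` that recurses on string slices (instead of the lengths m and n used in the implementation below) purely so that each call prints in the same LCS("...", "...") form as the trace.

```python
def lcs_traced(a, b, depth=0):
    """Naive LCS recursion that prints each call, indented by recursion depth."""
    print("  " * depth + f'LCS("{a}", "{b}")')
    # Base case: an empty string shares no characters with anything
    if not a or not b:
        return 0
    # Last characters match: keep this character and shrink both strings
    if a[-1] == b[-1]:
        return 1 + lcs_traced(a[:-1], b[:-1], depth + 1)
    # Otherwise drop the last character of a, then of b, and keep the better result
    return max(lcs_traced(a[:-1], b, depth + 1),
               lcs_traced(a, b[:-1], depth + 1))

print(lcs_traced("ABX", "ACX"))  # prints the call tree, then 2
```

Slicing copies the strings on every call, so this version is only meant for tracing small inputs.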

```python
# A naive recursive implementation of the LCS problem

# Returns the length of the LCS of s1[0..m-1] and s2[0..n-1]
def lcsRec(s1, s2, m, n):
    # Base case: if either string is empty, the length of the LCS is 0
    if m == 0 or n == 0:
        return 0

    # If the last characters of both strings match,
    # include this character in the LCS and recur for the remaining prefixes
    if s1[m-1] == s2[n-1]:
        return 1 + lcsRec(s1, s2, m-1, n-1)

    # If the last characters do not match, recur for two cases:
    # 1. Exclude the last character of s1
    # 2. Exclude the last character of s2
    # and take the maximum of the two results
    return max(lcsRec(s1, s2, m-1, n), lcsRec(s1, s2, m, n-1))

def lcs(s1, s2):
    m = len(s1)
    n = len(s2)
    return lcsRec(s1, s2, m, n)

if __name__ == "__main__":
    s1 = "AGGTAB"
    s2 = "GXTXAYB"
    print(lcs(s1, s2))
```

Output:
```
4
```
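To see the exponential growth concretely, the sketch below counts how many times the naive recursion is invoked. The wrapper `count_calls` is a hypothetical helper added for illustration. When the two strings share no characters, every call branches, and the count works out to 2 * C(m + n, m) - 1, where C is the binomial coefficient (computed here with `math.comb`, available since Python 3.8).

```python
from math import comb

def count_calls(s1, s2):
    """Run the naive LCS recursion and return (lcs_length, number_of_calls)."""
    calls = 0

    def rec(m, n):
        nonlocal calls
        calls += 1
        if m == 0 or n == 0:
            return 0
        if s1[m-1] == s2[n-1]:
            return 1 + rec(m-1, n-1)
        return max(rec(m-1, n), rec(m, n-1))

    length = rec(len(s1), len(s2))
    return length, calls

for k in range(2, 9, 2):
    # No common characters: the worst case for branching
    a, b = "A" * k, "B" * k
    length, calls = count_calls(a, b)
    # The last two columns agree: measured calls vs. the closed form
    print(k, length, calls, 2 * comb(2 * k, k) - 1)
```

Doubling k from 4 to 8 takes the count from 139 to 25739 calls, which is why the memoized version below is needed.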


## [Better Approach] Using Memoization (Top Down DP) - O(m * n) Time and O(m * n) Space

To optimize the recursive solution, we use a 2D memoization table of size (m+1) x (n+1), with every entry initialized to -1 to mark it as not yet computed. Before making recursive calls for a state (m, n), we check this table; if the value is already stored, we return it instead of recomputing the overlapping subproblem. Since there are only (m+1) x (n+1) distinct states and each one is solved once with O(1) extra work, the running time drops to O(m * n).

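```python
def lcsRec(s1, s2, m, n, memo):
    # Base case: an empty prefix has an LCS of length 0
    if m == 0 or n == 0:
        return 0

    # Memoization: return the stored result if this state was already solved
    if memo[m][n] != -1:
        return memo[m][n]

    # Last characters match
    if s1[m-1] == s2[n-1]:
        memo[m][n] = 1 + lcsRec(s1, s2, m-1, n-1, memo)
        return memo[m][n]

    # Last characters do not match
    memo[m][n] = max(lcsRec(s1, s2, m, n-1, memo),
                     lcsRec(s1, s2, m-1, n, memo))
    return memo[m][n]

def lcs(s1, s2):
    m = len(s1)
    n = len(s2)
    # memo[i][j] == -1 means the LCS of s1[0..i-1] and s2[0..j-1] is not computed yet.
    # The recursion can go m + n levels deep, so very long inputs may exceed
    # Python's default recursion limit (1000).
    memo = [[-1 for _ in range(n + 1)] for _ in range(m + 1)]
    return lcsRec(s1, s2, m, n, memo)

if __name__ == "__main__":
    s1 = "AGGTAB"
    s2 = "GXTXAYB"
    print(lcs(s1, s2))
```

Output:
```
4
```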
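In Python, the same top-down memoization can also be written with the standard library's `functools.lru_cache` decorator instead of an explicit table. This is an alternative sketch, not the original author's version; the names `lcs_cached` and `solve` are hypothetical.

```python
from functools import lru_cache

def lcs_cached(s1, s2):
    """Top-down LCS where functools.lru_cache plays the role of the memo table."""

    @lru_cache(maxsize=None)  # unbounded cache: one entry per (m, n) state
    def solve(m, n):
        if m == 0 or n == 0:
            return 0
        if s1[m-1] == s2[n-1]:
            return 1 + solve(m-1, n-1)
        return max(solve(m-1, n), solve(m, n-1))

    return solve(len(s1), len(s2))

print(lcs_cached("AGGTAB", "GXTXAYB"))  # 4
```

Because `solve` is defined inside `lcs_cached`, its cache is discarded after each call. The recursion-depth caveat of the explicit-table version still applies.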


## [Expected Approach 1] Using Bottom-Up DP (Tabulation) - O(m * n) Time and O(m * n) Space
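There are two parameters that change in the recursive solution, and they range from 0 to m and from 0 to n. So we create a 2D dp array of size (m+1) x (n+1), where dp[i][j] stores the length of the LCS of the prefixes s1[0..i-1] and s2[0..j-1].

We first fill the known entries, where i is 0 or j is 0 (they are all 0).
Then we fill the remaining entries row by row using the recursive formula, so that dp[i-1][j-1], dp[i-1][j] and dp[i][j-1] are always available before dp[i][j] is computed.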
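To visualize the fill order, here is a small sketch that builds the same table and prints it for s1 = "ABC" and s2 = "ACD", the first example above. Rows correspond to prefixes of s1 (row 0 is the empty prefix) and columns to prefixes of s2. The helper name `lcs_table` is hypothetical.

```python
def lcs_table(s1, s2):
    """Return the full (m+1) x (n+1) LCS table; row 0 and column 0 stay 0."""
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i-1] == s2[j-1]:
                dp[i][j] = dp[i-1][j-1] + 1
            else:
                dp[i][j] = max(dp[i-1][j], dp[i][j-1])
    return dp

s1, s2 = "ABC", "ACD"
for i, row in enumerate(lcs_table(s1, s2)):
    label = "-" if i == 0 else s1[i-1]  # "-" marks the empty prefix
    print(label, row)
# - [0, 0, 0, 0]
# A [0, 1, 1, 1]
# B [0, 1, 1, 1]
# C [0, 1, 2, 2]
```

The bottom-right entry, 2, is the LCS length of "ABC" and "ACD".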

```python
def lcs(s1, s2):
    m = len(s1)
    n = len(s2)

    # Initialize the (m+1) x (n+1) table with zeros;
    # row 0 and column 0 are the base cases and stay 0
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    # Fill the rest of the table in bottom-up fashion
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i-1] == s2[j-1]:
                dp[i][j] = dp[i-1][j-1] + 1
            else:
                dp[i][j] = max(dp[i-1][j],
                               dp[i][j-1])

    # dp[m][n] contains the length of the LCS of s1[0..m-1] and s2[0..n-1]
    return dp[m][n]

if __name__ == "__main__":
    s1 = "AGGTAB"
    s2 = "GXTXAYB"
    print(lcs(s1, s2))
```

Output:
```
4
```
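The table also holds enough information to recover one actual LCS, not just its length. A common technique, sketched here as an extension beyond the original code (the function name `lcs_string` is hypothetical), is to walk back from dp[m][n]: on a match, take the character and move diagonally; otherwise move toward the neighbor that keeps the same LCS length. When there are ties, different choices can produce different subsequences of the same length.

```python
def lcs_string(s1, s2):
    """Build the LCS table bottom-up, then backtrack to return one LCS string."""
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i-1] == s2[j-1]:
                dp[i][j] = dp[i-1][j-1] + 1
            else:
                dp[i][j] = max(dp[i-1][j], dp[i][j-1])

    # Walk back from the bottom-right corner, collecting matched characters
    i, j = m, n
    chars = []
    while i > 0 and j > 0:
        if s1[i-1] == s2[j-1]:
            chars.append(s1[i-1])  # this character belongs to the LCS
            i -= 1
            j -= 1
        elif dp[i-1][j] >= dp[i][j-1]:
            i -= 1                 # dropping s1[i-1] keeps the LCS length
        else:
            j -= 1                 # dropping s2[j-1] keeps the LCS length
    # Characters were collected from the end, so reverse them
    return "".join(reversed(chars))

print(lcs_string("AGGTAB", "GXTXAYB"))  # GTAB
print(lcs_string("ABC", "ACD"))         # AC
```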


## Space Optimized Solution Using Two Rows - O(m * n) Time and O(n) Space

How can we find the length of the LCS in O(n) auxiliary space?
- One important observation about the tabulation above is that, while filling row i, we only read values from the previous row (i-1) and from the current row (i). So there is no need to store every row of the dp matrix; we can keep just two rows at a time, which reduces the space from dp[m+1][n+1] to dp[2][n+1].

The recurrence relation for the Longest Common Subsequence (LCS) problem is:

If the last characters of the prefixes match (s1[i-1] == s2[j-1]):

dp[i][j] = 1 + dp[i-1][j-1]

If the last characters do not match, we take the maximum of two cases:

1. exclude the last character of s1
2. exclude the last character of s2

dp[i][j] = max(dp[i-1][j], dp[i][j-1])

Base case: when the length of either prefix is 0, the LCS is 0:

dp[i][j] = 0 for i = 0 or j = 0

Looking at this recurrence, we can observe that computing the current state dp[i][j] does not require the entire table. We only need the current row and the previous row, because each value at position (i, j) depends only on:

- the value directly above it (dp[i-1][j]),
- the value directly to its left (dp[i][j-1]),
- the value diagonally above and to the left (dp[i-1][j-1]).

Since only the previous row and the current row are required to compute the LCS, we can reduce the space complexity by storing just two rows of size n+1 instead of the entire table.
In the code below, prev holds the previous row and cur holds the current row. Once the current row is complete, we copy cur into prev so that it serves as the previous row for the next value of i.

```python
def lcs(s1, s2):
    # m is the length of s1 and n is the length of s2
    m = len(s1)
    n = len(s2)

    # Two rows of the dp table: 'prev' for row i-1 and 'cur' for row i
    prev = [0] * (n + 1)
    cur = [0] * (n + 1)

    # Loop through the characters of both strings to compute the LCS
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i - 1] == s2[j - 1]:
                # If the characters match, extend the diagonal LCS by 1
                cur[j] = 1 + prev[j - 1]
            else:
                # If the characters do not match, take the maximum LCS
                # obtained by excluding one character from s1 or from s2
                cur[j] = max(prev[j], cur[j - 1])

        # Copy 'cur' into 'prev' for the next iteration
        prev = cur[:]

    # prev[n] now holds the length of the Longest Common Subsequence
    return prev[n]


if __name__ == "__main__":
    s1 = "AGGTAB"
    s2 = "GXTXAYB"
    print(lcs(s1, s2))
```

Output:
```
4
```
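The copy prev = cur[:] costs O(n) extra work per row. A common alternative, shown here as a sketch rather than the original code (the name `lcs_swap` is hypothetical), is to swap the two list references instead. This is safe because cur[0] is never written and stays 0, and every other entry of cur is overwritten in the next row before it is read.

```python
def lcs_swap(s1, s2):
    """Two-row LCS that swaps row references instead of copying a row."""
    m, n = len(s1), len(s2)
    prev = [0] * (n + 1)  # row i-1
    cur = [0] * (n + 1)   # row i; cur[0] is never written and stays 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i-1] == s2[j-1]:
                cur[j] = 1 + prev[j-1]
            else:
                cur[j] = max(prev[j], cur[j-1])
        # The finished row becomes 'prev'; the old 'prev' is reused as scratch space
        prev, cur = cur, prev
    # After the final swap, 'prev' holds row m
    return prev[n]

print(lcs_swap("AGGTAB", "GXTXAYB"))  # 4
```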


## Using Single Array - O(m*n) Time and O(n) Space

In this approach, the space complexity is further optimized by using a single DP array, where:

dp[j] holds the value of dp[i-1][j] (the previous row's value) until it is updated. During the computation of row i, dp[j] is overwritten with the current row's value dp[i][j].

Now the recurrence relations become:

- If the characters s1[i-1] and s2[j-1] match, dp[j] = 1 + prev. Here, prev is a temporary variable storing the diagonal value dp[i-1][j-1], which would otherwise be lost once dp[j-1] is overwritten.
- If the characters don't match, dp[j] = max(dp[j-1], dp[j]). Here dp[j] still holds dp[i-1][j] because it has not been updated yet, and dp[j-1] has already been updated to dp[i][j-1].
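To watch the single array evolve, the sketch below runs the same update rule on s1 = "ABC" and s2 = "ACD" and records a copy of dp after each outer iteration. The function name `lcs_rows` is hypothetical. Each snapshot equals row i of the full tabulation table.

```python
def lcs_rows(s1, s2):
    """Single-array LCS that returns a snapshot of dp after each row i."""
    m, n = len(s1), len(s2)
    dp = [0] * (n + 1)
    snapshots = []
    for i in range(1, m + 1):
        prev = dp[0]  # diagonal value dp[i-1][0], always 0
        for j in range(1, n + 1):
            temp = dp[j]  # dp[i-1][j], the diagonal value for column j + 1
            if s1[i-1] == s2[j-1]:
                dp[j] = 1 + prev
            else:
                dp[j] = max(dp[j-1], dp[j])
            prev = temp
        snapshots.append(dp[:])  # copy, because dp keeps changing
    return snapshots

for i, row in enumerate(lcs_rows("ABC", "ACD"), start=1):
    print(i, row)
# 1 [0, 1, 1, 1]
# 2 [0, 1, 1, 1]
# 3 [0, 1, 2, 2]
```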

```python
def lcs(s1, s2):
    m = len(s1)
    n = len(s2)

    # dp[j] holds dp[i-1][j] until it is overwritten with dp[i][j]
    dp = [0] * (n + 1)

    for i in range(1, m + 1):
        # prev holds the diagonal value dp[i-1][j-1]; for j = 1 it is dp[0] = 0
        prev = dp[0]
        for j in range(1, n + 1):
            # temp saves dp[i-1][j] before it is overwritten;
            # it becomes the diagonal value for column j + 1
            temp = dp[j]
            if s1[i-1] == s2[j-1]:
                dp[j] = 1 + prev
            else:
                dp[j] = max(dp[j-1], dp[j])

            prev = temp

    # The last element of the list contains the length of the LCS
    return dp[n]

if __name__ == "__main__":
    s1 = "AGGTAB"
    s2 = "GXTXAYB"
    res = lcs(s1, s2)
    print(res)
```

Output:
```
4
```
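Because the LCS length is symmetric (LCS(s1, s2) = LCS(s2, s1)), the O(n) space bound can be tightened to O(min(m, n)) by letting the shorter string index the array. A minimal sketch of that tweak, reusing the single-array update rule above (the name `lcs_min_space` is hypothetical):

```python
def lcs_min_space(s1, s2):
    """Single-array LCS whose array length is 1 + min(len(s1), len(s2))."""
    # The LCS length is symmetric, so make s2 the shorter string
    if len(s2) > len(s1):
        s1, s2 = s2, s1
    m, n = len(s1), len(s2)
    dp = [0] * (n + 1)
    for i in range(1, m + 1):
        prev = 0  # dp[i-1][0] is always 0
        for j in range(1, n + 1):
            temp = dp[j]  # save dp[i-1][j] before overwriting it
            if s1[i-1] == s2[j-1]:
                dp[j] = 1 + prev
            else:
                dp[j] = max(dp[j-1], dp[j])
            prev = temp
    return dp[n]

print(lcs_min_space("GXTXAYB", "AGGTAB"))  # 4
```

The swap only changes which string drives the outer loop, so the time stays O(m * n).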
