# Longest Common Subsequence




In [1]:
project_name = 'longest-common-subsequence'
filename = 'lcs-template.ipynb'

In [2]:
%pip install jovian --upgrade --quiet

Note: you may need to restart the kernel to use updated packages.


In [3]:
import jovian


In [4]:
jovian.commit(project=project_name, privacy='secret', environment=None, filename=filename)

[jovian] Creating a new project "zoibderg/longest-common-subsequence"
[jovian] Committed successfully! https://jovian.ai/zoibderg/longest-common-subsequence

'https://jovian.ai/zoibderg/longest-common-subsequence'

## Problem Statement

> Write a function to find the length of the **longest common subsequence** between two sequences. E.g. Given the strings "serendipitous" and "precipitation", the longest common subsequence is "reipito" and its length is 7.
>
> A "sequence" is a group of items with a deterministic ordering. Lists, tuples and ranges are some common sequence types in Python.
>
> A "subsequence" is a sequence obtained by deleting zero or more elements from another sequence. For example, "edpt" is a subsequence of "serendipitous".
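
To make the definition of a subsequence concrete, here is a minimal sketch of a checker. The name `is_subsequence` is a hypothetical helper introduced only for illustration; it is not part of the original notebook.

```python
def is_subsequence(candidate, seq):
    """Return True if `candidate` can be obtained from `seq` by deleting elements."""
    it = iter(seq)
    # `item in it` advances the iterator until it finds `item`,
    # so elements must appear in `seq` in the same relative order.
    return all(item in it for item in candidate)


print(is_subsequence("edpt", "serendipitous"))     # True
print(is_subsequence("reipito", "serendipitous"))  # True
print(is_subsequence("reipito", "precipitation"))  # True
print(is_subsequence("tpde", "serendipitous"))     # False (wrong order)
```

Because the check only relies on iteration and equality, it should work for strings, lists, and tuples alike.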




#### General case

<img src="https://i.imgur.com/ry4Y0wS.png" width="420">

## The Method

Here's the systematic strategy we'll apply for solving problems:

1. State the problem clearly. Identify the input & output formats.
2. Come up with some example inputs & outputs. Try to cover all edge cases.
3. Come up with a correct solution for the problem. State it in plain English.
4. Implement the solution and test it using example inputs. Fix bugs, if any.
5. Analyze the algorithm's complexity and identify inefficiencies, if any.
6. Apply the right technique to overcome the inefficiency. Repeat steps 3 to 6.

This approach is explained in detail in [Lesson 1](https://jovian.ai/learn/data-structures-and-algorithms-in-python/lesson/lesson-1-binary-search-linked-lists-and-complexity) of the course. Let's apply this approach step-by-step.

## Solution


### 1. State the problem clearly. Identify the input & output formats.

While this problem is stated clearly enough, it's always useful to express it in your own words, in a way that makes it most clear for you.


**Problem**

> Given two sequences, find the length of the longest common subsequence (LCS) between them.

<br/>


**Input**

1. **seq1** - A sequence, this could be a string, list, or even tuple. e.g. `serendipitous`
2. **seq2** - Another sequence. e.g. `precipitation`


**Output**

1. **len_lcs** - The length of the LCS of our 2 sequences. e.g. `7` (Here we specify that this is the *length* of the LCS, since we may want to return the LCS itself in the future.)


<br/>

Based on the above, we can now create a signature of our function:

In [5]:
def find_len_lcs(seq1, seq2):
    pass

Save and upload your work before continuing.

In [6]:
import jovian

In [7]:
jovian.commit(project=project_name, filename=filename)

[jovian] Updating notebook "zoibderg/longest-common-subsequence" on https://jovian.ai/
[jovian] Committed successfully! https://jovian.ai/zoibderg/longest-common-subsequence

'https://jovian.ai/zoibderg/longest-common-subsequence'

### 2. Come up with some example inputs & outputs. Try to cover all edge cases.

Our function should be able to handle any set of valid inputs we pass into it. Here's a list of some possible variations we might encounter:

1. A sequence that is a string.
2. A sequence that is a list.
3. An empty sequence.
4. Two empty sequences.
5. A sequence that is a subsequence of the other sequence.
6. Two sequences that have no elements in common.
7. Multiple common subsequences of the same (maximum) length.
    - e.g. "abcdef" and "badcfe"


We'll express our test cases as dictionaries, to test them easily. Each dictionary will contain 2 keys: `input` (a dictionary itself containing one key for each argument to the function) and `output` (the expected result from the function).

In [8]:
# a sequence that is a string (general case)
T0 = {
    'input': {
        'seq1': 'serendipitous',
        'seq2': 'precipitation'
    },
    'output': 7
}

# a sequence that is a list (general case)
T1 = {
    'input': {
        'seq1': [1, 3, 5, 6, 7, 2, 5, 2, 3],
        'seq2': [6, 2, 4, 7, 1, 5, 6, 2, 3]
    },
    'output': 5
}

# another general case
T2 = {
    'input': {
        'seq1': 'longest',
        'seq2': 'stone'
    },
    'output': 3
}

# two sequences that have no elements in common
T3 = {
    'input': {
        'seq1': 'asdfwevad',
        'seq2': 'opkpoiklklj'
    },
    'output': 0
}

# a sequence that is a subsequence of the other sequence
T4 = {
    'input': {
        'seq1': 'dense',
        'seq2': 'condensed'
    },
    'output': 5
}

# one empty sequence
T5 = {
    'input': {
        'seq1': '',
        'seq2': 'opkpoiklklj'
    },
    'output': 0
}

# two empty sequences
T6 = {
    'input': {
        'seq1': '',
        'seq2': ''
    },
    'output': 0
}

# multiple common subsequences of the same length
T7 = {
    'input': {
        'seq1': 'abcdef',
        'seq2': 'badcfe'
    },
    'output': 3
}

Create one test case for each of the scenarios listed above. We'll store our test cases in a list called `lcs_tests`.

In [9]:
lcs_tests = [T0, T1, T2, T3, T4, T5, T6, T7]
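
If the `jovian` test helpers are not available, a plain loop can run test cases in this dictionary format. The following is a minimal sketch: `run_tests`, `always_zero`, and `sample_tests` are hypothetical names made up for this illustration, and the output format is only loosely modeled on the helper used later in this notebook.

```python
import time


def run_tests(func, tests):
    """Hypothetical helper: call func(**test['input']) and compare with test['output']."""
    results = []
    for i, test in enumerate(tests):
        start = time.perf_counter()
        actual = func(**test['input'])
        elapsed_ms = (time.perf_counter() - start) * 1000
        passed = actual == test['output']
        print(f"Test #{i}: {'PASSED' if passed else 'FAILED'} ({elapsed_ms:.3f} ms)")
        results.append((actual, passed))
    return results


def always_zero(seq1, seq2):
    # Deliberately wrong stand-in, used only to show a pass and a failure.
    return 0


sample_tests = [
    {'input': {'seq1': '', 'seq2': ''}, 'output': 0},
    {'input': {'seq1': 'longest', 'seq2': 'stone'}, 'output': 3},
]

run_tests(always_zero, sample_tests)
```

The stand-in function passes the empty-input case and fails the general case, which is a quick way to confirm that the runner itself reports failures.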

### 3. Come up with a correct solution for the problem. State it in plain English.

Our first goal should always be to come up with a _correct_ solution to the problem, which may not necessarily be the most _efficient_ solution. Come up with a correct solution and explain it in simple words below:

1. Create two counters `idx1` and `idx2` starting at 0. Our recursive function will compute the length of the LCS of `seq1[idx1:]` and `seq2[idx2:]`.
2. If `seq1[idx1]` and `seq2[idx2]` are equal, then this element belongs to the LCS of `seq1[idx1:]` and `seq2[idx2:]`. Further, the length of this LCS is one more than the length of the LCS of `seq1[idx1+1:]` and `seq2[idx2+1:]`.
3. If not, then the LCS of `seq1[idx1:]` and `seq2[idx2:]` is the longer one among the LCS of `seq1[idx1+1:]`, `seq2[idx2:]` and the LCS of `seq1[idx1:]`, `seq2[idx2+1:]`.
4. If either `seq1[idx1:]` or `seq2[idx2:]` is empty, then their LCS is empty and its length is 0.


Let's save and upload our work before continuing.


In [10]:
jovian.commit(project=project_name, filename=filename)

[jovian] Updating notebook "zoibderg/longest-common-subsequence" on https://jovian.ai/
[jovian] Committed successfully! https://jovian.ai/zoibderg/longest-common-subsequence

'https://jovian.ai/zoibderg/longest-common-subsequence'

### 4. Implement the solution and test it using example inputs. Fix bugs, if any.

In [11]:
def len_lcs_recursive(seq1, seq2, idx1=0, idx2=0):
    if idx1 == len(seq1) or idx2 == len(seq2):
        return 0

    elif seq1[idx1] == seq2[idx2]:
        return 1 + len_lcs_recursive(seq1, seq2, idx1+1, idx2+1)

    else:
        first_idx = len_lcs_recursive(seq1, seq2, idx1+1, idx2)
        second_idx = len_lcs_recursive(seq1, seq2, idx1, idx2+1)
        return max(first_idx, second_idx)


In [12]:
from jovian.pythondsa import evaluate_test_case, evaluate_test_cases

In [13]:
evaluate_test_case(len_lcs_recursive, T0)


Input:
{'seq1': 'serendipitous', 'seq2': 'precipitation'}

Expected Output:
7


Actual Output:
7

Execution Time:
737.29 ms

Test Result:
PASSED



(7, True, 737.29)

In [14]:
evaluate_test_cases(len_lcs_recursive, lcs_tests)


TEST CASE #0

Input:
{'seq1': 'serendipitous', 'seq2': 'precipitation'}

Expected Output:
7


Actual Output:
7

Execution Time:
752.158 ms

Test Result:
PASSED


TEST CASE #1

Input:
{'seq1': [1, 3, 5, 6, 7, 2, 5, 2, 3], 'seq2': [6, 2, 4, 7, 1, 5, 6, 2, 3]}

Expected Output:
5


Actual Output:
5

Execution Time:
10.536 ms

Test Result:
PASSED


TEST CASE #2

Input:
{'seq1': 'longest', 'seq2': 'stone'}

Expected Output:
3


Actual Output:
3

Execution Time:
0.354 ms

Test Result:
PASSED


TEST CASE #3

Input:
{'seq1': 'asdfwevad', 'seq2': 'opkpoiklklj'}

Expected Output:
0


Actual Output:
0

Execution Time:
173.86 ms

Test Result:
PASSED


TEST CASE #4

Input:
{'seq1': 'dense', 'seq2': 'condensed'}

Expected Output:
5


Actual Output:
5

Execution Time:
0.195 ms

Test Result:
PASSED


TEST CASE #5

Input:
{'seq1': '', 'seq2': 'opkpoiklklj'}

Expected Output:
0


Actual Output:
0

Execution Tim

[(7, True, 752.158),
 (5, True, 10.536),
 (3, True, 0.354),
 (0, True, 173.86),
 (5, True, 0.195),
 (0, True, 0.003),
 (0, True, 0.002),
 (3, True, 0.069)]

In [15]:
jovian.commit(project=project_name, filename=filename)

[jovian] Updating notebook "zoibderg/longest-common-subsequence" on https://jovian.ai/
[jovian] Committed successfully! https://jovian.ai/zoibderg/longest-common-subsequence

'https://jovian.ai/zoibderg/longest-common-subsequence'

### 5. Analyze the algorithm's complexity and identify inefficiencies, if any.

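Because each call can make two further recursive calls and nothing is cached, this algorithm's complexity is exponential. If `m` and `n` are the lengths of the two sequences, the recursion tree can be up to `m+n` levels deep, and at each level we may need to make 2 decisions, giving up to $2^{m+n}$ calls in the worst case.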

Big O - $O(2^{m+n})$
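
One way to get a feel for this growth is to count the recursive calls directly. The sketch below re-implements the same recursion with a call counter; `count_calls` is a hypothetical name used only here, and the inputs are chosen to share no elements, which forces the two-way branch at every step.

```python
def count_calls(seq1, seq2):
    """Return (LCS length, number of recursive calls) for the plain recursion."""
    calls = 0

    def recurse(idx1, idx2):
        nonlocal calls
        calls += 1
        if idx1 == len(seq1) or idx2 == len(seq2):
            return 0
        if seq1[idx1] == seq2[idx2]:
            return 1 + recurse(idx1 + 1, idx2 + 1)
        return max(recurse(idx1 + 1, idx2), recurse(idx1, idx2 + 1))

    return recurse(0, 0), calls


# Sequences with nothing in common: every call branches twice.
for n in range(2, 11, 2):
    length, calls = count_calls('a' * n, 'b' * n)
    print(f"n = {n:2d}  lcs = {length}  calls = {calls}")
```

The call count grows very quickly as `n` increases, even though the answer is always 0, which is the inefficiency the next step tries to remove.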

In [16]:
jovian.commit(project=project_name, filename=filename)

[jovian] Updating notebook "zoibderg/longest-common-subsequence" on https://jovian.ai/
[jovian] Committed successfully! https://jovian.ai/zoibderg/longest-common-subsequence

'https://jovian.ai/zoibderg/longest-common-subsequence'

### 6. Apply the right technique to overcome the inefficiency. Repeat steps 3 to 6.

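The inefficiency comes from solving the same subproblem, identified by the pair `(idx1, idx2)`, many times over. To overcome it, we will implement memoization: the first time we compute the result for a pair `(idx1, idx2)` we store it, and whenever the same pair comes up again we look the result up instead of recomputing it. This saves us from having to make the same decisions multiple times.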

In [17]:
jovian.commit(project=project_name, filename=filename)

[jovian] Updating notebook "zoibderg/longest-common-subsequence" on https://jovian.ai/
[jovian] Committed successfully! https://jovian.ai/zoibderg/longest-common-subsequence

'https://jovian.ai/zoibderg/longest-common-subsequence'

### 7. Come up with a correct solution for the problem. State it in plain English.

Come up with the optimized correct solution and explain it in simple words below:

1. Create a memo dictionary to store the result for each pair of indices we have already solved.
2. Use the pair `(idx1, idx2)` as the key.
3. If our key `(idx1, idx2)` is in the memo, the subproblem is already solved: return `memo[key]`.
4. If `idx1` equals `len(seq1)` or `idx2` equals `len(seq2)`, one of the remaining sequences is empty, so the LCS length is 0.
5. Otherwise, perform the same recursive steps as before, storing the result in the memo before returning it.

Let's save and upload our work before continuing.



In [18]:
jovian.commit(project=project_name, filename=filename)

[jovian] Updating notebook "zoibderg/longest-common-subsequence" on https://jovian.ai/
[jovian] Committed successfully! https://jovian.ai/zoibderg/longest-common-subsequence

'https://jovian.ai/zoibderg/longest-common-subsequence'

### 8. Implement the solution and test it using example inputs. Fix bugs, if any.

In [27]:
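def len_lcs_memo(seq1, seq2):
    # memo maps (idx1, idx2) to the LCS length of seq1[idx1:] and seq2[idx2:]
    memo = {}

    def recurse(idx1=0, idx2=0):
        key = (idx1, idx2)
        if key in memo:
            return memo[key]

        # base case: one of the remaining sequences is empty
        elif idx1 == len(seq1) or idx2 == len(seq2):
            memo[key] = 0

        # matching elements extend the LCS by one
        elif seq1[idx1] == seq2[idx2]:
            memo[key] = 1 + recurse(idx1+1, idx2+1)

        # otherwise skip one element from either sequence and keep the better result
        else:
            memo[key] = max(recurse(idx1+1, idx2), recurse(idx1, idx2+1))

        return memo[key]
    return recurse(0, 0)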

In [28]:
evaluate_test_case(len_lcs_memo, T0)


Input:
{'seq1': 'serendipitous', 'seq2': 'precipitation'}

Expected Output:
7


Actual Output:
7

Execution Time:
1.224 ms

Test Result:
PASSED



(7, True, 1.224)

In [29]:
evaluate_test_cases(len_lcs_memo, lcs_tests)


TEST CASE #0

Input:
{'seq1': 'serendipitous', 'seq2': 'precipitation'}

Expected Output:
7


Actual Output:
7

Execution Time:
0.385 ms

Test Result:
PASSED


TEST CASE #1

Input:
{'seq1': [1, 3, 5, 6, 7, 2, 5, 2, 3], 'seq2': [6, 2, 4, 7, 1, 5, 6, 2, 3]}

Expected Output:
5


Actual Output:
5

Execution Time:
0.169 ms

Test Result:
PASSED


TEST CASE #2

Input:
{'seq1': 'longest', 'seq2': 'stone'}

Expected Output:
3


Actual Output:
3

Execution Time:
0.117 ms

Test Result:
PASSED


TEST CASE #3

Input:
{'seq1': 'asdfwevad', 'seq2': 'opkpoiklklj'}

Expected Output:
0


Actual Output:
0

Execution Time:
0.263 ms

Test Result:
PASSED


TEST CASE #4

Input:
{'seq1': 'dense', 'seq2': 'condensed'}

Expected Output:
5


Actual Output:
5

Execution Time:
0.157 ms

Test Result:
PASSED


TEST CASE #5

Input:
{'seq1': '', 'seq2': 'opkpoiklklj'}

Expected Output:
0


Actual Output:
0

Execution Time:
0

[(7, True, 0.385),
 (5, True, 0.169),
 (3, True, 0.117),
 (0, True, 0.263),
 (5, True, 0.157),
 (0, True, 0.005),
 (0, True, 0.006),
 (3, True, 0.042)]

In [30]:
jovian.commit(project=project_name, filename=filename)

[jovian] Updating notebook "zoibderg/longest-common-subsequence" on https://jovian.ai/
[jovian] Committed successfully! https://jovian.ai/zoibderg/longest-common-subsequence

'https://jovian.ai/zoibderg/longest-common-subsequence'

### 9. Analyze the algorithm's complexity and identify inefficiencies, if any.

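Memoization ensures that each distinct `(idx1, idx2)` pair is solved only once. With `m = len(seq1)` and `n = len(seq2)` there are at most $(m+1)(n+1)$ such pairs, and each takes constant work beyond its recursive calls, so the time complexity is reduced to $O(m*n)$. The memo dictionary can also grow to $O(m*n)$ entries, and the recursion can still go up to `m+n` levels deep.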
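
The bound on the number of subproblems can be checked empirically. The sketch below swaps the hand-written memo dictionary for the standard library's `functools.lru_cache` and reads the number of cache misses, which equals the number of distinct `(idx1, idx2)` pairs actually evaluated. The name `len_lcs_cached` is hypothetical and not part of the original notebook.

```python
from functools import lru_cache


def len_lcs_cached(seq1, seq2):
    """Return (LCS length, number of distinct subproblems evaluated)."""

    @lru_cache(maxsize=None)
    def recurse(idx1, idx2):
        if idx1 == len(seq1) or idx2 == len(seq2):
            return 0
        if seq1[idx1] == seq2[idx2]:
            return 1 + recurse(idx1 + 1, idx2 + 1)
        return max(recurse(idx1 + 1, idx2), recurse(idx1, idx2 + 1))

    length = recurse(0, 0)
    # Each cache miss corresponds to one (idx1, idx2) pair computed for the first time.
    return length, recurse.cache_info().misses


seq1, seq2 = 'serendipitous', 'precipitation'
length, subproblems = len_lcs_cached(seq1, seq2)
upper_bound = (len(seq1) + 1) * (len(seq2) + 1)
print(f"lcs = {length}, subproblems = {subproblems}, upper bound = {upper_bound}")
```

Only the indices are passed to the cached function, so the sequences themselves do not need to be hashable; lists work as well as strings.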

### 10. Come up with a correct solution for the problem. State it in plain English.

Come up with the optimized correct solution and explain it in simple words below:

1. Create a table of size $(n1+1) * (n2+1)$ initialized with 0s, where $n1$ and $n2$ are the lengths of the sequences.
2. $table[i][j]$ represents the length of the longest common subsequence of $seq1[:i]$ and $seq2[:j]$.
3. If $seq1[i]$ and $seq2[j]$ are equal, then $table[i+1][j+1] = 1 + table[i][j]$
4. If $seq1[i]$ and $seq2[j]$ are not equal, then $table[i+1][j+1] = max(table[i][j+1], table[i+1][j])$
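
To see what the table holds, the steps above can be sketched on a small input, printing each row. As an extension that the notebook itself does not implement, the sketch also walks back through the filled table to recover one LCS (there may be several of the same length). `lcs_with_table` is a hypothetical name used only for this illustration.

```python
def lcs_with_table(seq1, seq2):
    """Fill the DP table, then backtrack to recover one longest common subsequence."""
    n1, n2 = len(seq1), len(seq2)
    table = [[0] * (n2 + 1) for _ in range(n1 + 1)]

    for i in range(n1):
        for j in range(n2):
            if seq1[i] == seq2[j]:
                table[i + 1][j + 1] = 1 + table[i][j]
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])

    # Walk back from the bottom-right corner, collecting matched elements.
    i, j, picked = n1, n2, []
    while i > 0 and j > 0:
        if seq1[i - 1] == seq2[j - 1]:
            picked.append(seq1[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    picked.reverse()
    return table, picked


table, picked = lcs_with_table('longest', 'stone')
for row in table:
    print(row)
print("length:", table[-1][-1], "one LCS:", picked)
```

The first row and first column stay at 0 because they correspond to an empty prefix of one of the sequences, and the bottom-right cell holds the final answer.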


### 11. Implement the solution and test it using example inputs. Fix bugs, if any.

In [32]:
def len_lcs_dynamic(seq1, seq2):
    n1, n2 = len(seq1), len(seq2)
    table = [[0] * (n2+1) for _ in range(n1+1)]

    for i in range(n1):
        for j in range(n2):
            if seq1[i] == seq2[j]:
                table[i+1][j+1] = 1 + table[i][j]
            
            else:
                table[i+1][j+1] = max(table[i][j+1], table[i+1][j])

    return table[-1][-1]

In [33]:
evaluate_test_case(len_lcs_dynamic, T0)


Input:
{'seq1': 'serendipitous', 'seq2': 'precipitation'}

Expected Output:
7


Actual Output:
7

Execution Time:
0.174 ms

Test Result:
PASSED



(7, True, 0.174)

In [34]:
evaluate_test_cases(len_lcs_dynamic, lcs_tests)


TEST CASE #0

Input:
{'seq1': 'serendipitous', 'seq2': 'precipitation'}

Expected Output:
7


Actual Output:
7

Execution Time:
0.173 ms

Test Result:
PASSED


TEST CASE #1

Input:
{'seq1': [1, 3, 5, 6, 7, 2, 5, 2, 3], 'seq2': [6, 2, 4, 7, 1, 5, 6, 2, 3]}

Expected Output:
5


Actual Output:
5

Execution Time:
0.081 ms

Test Result:
PASSED


TEST CASE #2

Input:
{'seq1': 'longest', 'seq2': 'stone'}

Expected Output:
3


Actual Output:
3

Execution Time:
0.049 ms

Test Result:
PASSED


TEST CASE #3

Input:
{'seq1': 'asdfwevad', 'seq2': 'opkpoiklklj'}

Expected Output:
0


Actual Output:
0

Execution Time:
0.115 ms

Test Result:
PASSED


TEST CASE #4

Input:
{'seq1': 'dense', 'seq2': 'condensed'}

Expected Output:
5


Actual Output:
5

Execution Time:
0.067 ms

Test Result:
PASSED


TEST CASE #5

Input:
{'seq1': '', 'seq2': 'opkpoiklklj'}

Expected Output:
0


Actual Output:
0

Execution Time:
0

[(7, True, 0.173),
 (5, True, 0.081),
 (3, True, 0.049),
 (0, True, 0.115),
 (5, True, 0.067),
 (0, True, 0.009),
 (0, True, 0.007),
 (3, True, 0.05)]

In [35]:
jovian.commit(project=project_name, filename=filename)

[jovian] Updating notebook "zoibderg/longest-common-subsequence" on https://jovian.ai/
[jovian] Committed successfully! https://jovian.ai/zoibderg/longest-common-subsequence

'https://jovian.ai/zoibderg/longest-common-subsequence'

### 12. Analyze the algorithm's complexity and identify inefficiencies, if any.

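Here our time complexity is still the same as with memoization, $O(N1 * N2)$, and the table still takes $O(N1 * N2)$ memory. The gain is that the iterative version avoids the overhead of recursive calls and dictionary lookups and cannot run into Python's recursion depth limit, so it copes much better with extremely long sequences (like DNA).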
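
If memory becomes the bottleneck, note that each table row is computed only from the row above it and from earlier cells in the same row. A possible refinement, sketched below under that observation, keeps just two rows at a time, so extra memory drops to roughly the length of the shorter sequence. `len_lcs_two_rows` is a hypothetical name; this variant returns only the length and cannot recover the LCS itself.

```python
def len_lcs_two_rows(seq1, seq2):
    """LCS length using two rows of the DP table instead of the full table."""
    # Put the shorter sequence along the row to keep the rows small.
    if len(seq2) > len(seq1):
        seq1, seq2 = seq2, seq1

    prev = [0] * (len(seq2) + 1)
    for item1 in seq1:
        curr = [0]  # column 0: empty prefix of seq2
        for j, item2 in enumerate(seq2):
            if item1 == item2:
                curr.append(1 + prev[j])
            else:
                curr.append(max(prev[j + 1], curr[j]))
        prev = curr
    return prev[-1]


print(len_lcs_two_rows('serendipitous', 'precipitation'))
print(len_lcs_two_rows([1, 3, 5, 6, 7, 2, 5, 2, 3], [6, 2, 4, 7, 1, 5, 6, 2, 3]))
```

The time complexity is unchanged at $O(N1 * N2)$; only the space requirement is reduced.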