### Shortest Common Supersequence

The shortest common supersequence (SCS) problem is to find the shortest sequence Z such that two given sequences X and Y are both subsequences of Z.

Example:

```
X: ABCBDAB
Y: BDCABA

Length of the SCS is 9.
The SCS are ABCBDCABA, ABDCABDAB, and ABDCBDABA
```
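A candidate supersequence can be checked mechanically. The sketch below is only an illustration (the helper name `is_subsequence` is ours, not part of the original notebook); it confirms that each string listed above has length 9 and contains both X and Y as subsequences.

```python
def is_subsequence(short, long):
    """Return True if `short` can be obtained from `long` by deleting characters."""
    remaining = iter(long)
    # `ch in remaining` advances the iterator until it finds ch (or runs out),
    # so the characters of `short` must appear in `long` in the same order.
    return all(ch in remaining for ch in short)


X, Y = "ABCBDAB", "BDCABA"
candidates = ["ABCBDCABA", "ABDCABDAB", "ABDCBDABA"]
for z in candidates:
    # prints the candidate, its length (9), and True for each line
    print(z, len(z), is_subsequence(X, z) and is_subsequence(Y, z))
```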

This problem has optimal substructure: it can be broken down repeatedly into smaller, simpler subproblems until the solution becomes trivial.

1. Consider two sequences that end in the same element. Shorten each sequence by removing the last element, find the SCS of the shortened sequences, and then append the removed element.

``` 
CCBA and CDBA
Find SCS of CCB and CDB and append A
```

2. Consider two sequences that do not end in the same element. The SCS will end with either of the two final characters, so remove the last character from one sequence and find the SCS of that and the other, and append the removed character. Do the same with the other sequence, and choose the shorter of the two.

``` 
CCBA and CCB
Find the shorter of: the SCS of CCBA and CC with B appended, and the SCS of CCB and CCB with A appended.
```
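The two cases above can be summarised as a recurrence on lengths. The notation is ours and not from the original: $S(i, j)$ is the SCS length of the first $i$ elements of X and the first $j$ elements of Y, and $x_i$, $y_j$ are the last elements of those prefixes. The base case covers an empty sequence, where the supersequence is just the other sequence.

$$
S(i, j) =
\begin{cases}
\max(i, j) & \text{if } i = 0 \text{ or } j = 0 \\
S(i-1,\, j-1) + 1 & \text{if } x_i = y_j \\
\min\bigl(S(i-1,\, j),\; S(i,\, j-1)\bigr) + 1 & \text{otherwise}
\end{cases}
$$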

In [2]:
"""
With just recursion
"""
def shortestCommonSupRec(s1, s2, sup=0):
    if len(s1) == 0 or len(s2) == 0:
        return sup + max(len(s1), len(s2))
    if s1[-1] == s2[-1]:
        return shortestCommonSupRec(s1[:-1], s2[:-1]) + 1
    return min(shortestCommonSupRec(s1[:-1], s2), shortestCommonSupRec(s1, s2[:-1])) + 1

In [10]:
s1 = "ABCBDAB"
s2 = "BDCABA"
s3 = "ABCBDBABADABSABASD"
s4 = "BASDVASBAADASBASDAB"

In [11]:
%%time
shortestCommonSupRec(s1, s2)

CPU times: user 164 µs, sys: 22 µs, total: 186 µs
Wall time: 191 µs


9

In [12]:
%%time
shortestCommonSupRec(s3, s4)

CPU times: user 5.31 s, sys: 22.5 ms, total: 5.34 s
Wall time: 5.4 s


25

To reduce the runtime, we'll memoize: a lookup table stores the result for each pair of remaining sequence lengths, so each subproblem is solved only once.

In [25]:
def shortestCommonSupDy(s1, s2, lookup=None):
    if lookup is None:
        lookup = {}
    m = len(s1)
    n = len(s2)
    # only prefixes are ever passed down, so the two lengths identify the subproblem
    key = (m, n)
    if key in lookup:
        return lookup[key]
    # key does not exist in lookup, have to assign it
    if m == 0 or n == 0:
        lookup[key] = max(m, n)
    elif s1[-1] == s2[-1]:
        truncated = shortestCommonSupDy(s1[:-1], s2[:-1], lookup)
        lookup[key] = truncated + 1
    else:
        truncated = min(shortestCommonSupDy(s1[:-1], s2, lookup), shortestCommonSupDy(s1, s2[:-1], lookup))
        lookup[key] = truncated + 1
    return lookup[key]
        

In [26]:
%%time
shortestCommonSupDy(s1, s2)

CPU times: user 97 µs, sys: 1e+03 ns, total: 98 µs
Wall time: 99.9 µs


9

In [27]:
%%time
shortestCommonSupDy(s3, s4)

CPU times: user 1.16 ms, sys: 1e+03 ns, total: 1.16 ms
Wall time: 1.17 ms


25
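As an alternative sketch of the same memoization idea, the standard library's `functools.lru_cache` can hold the lookup table. This version is an assumption of ours rather than the notebook's method: the names `scs_length_cached` and `solve` are hypothetical, and it recurses on prefix lengths instead of slicing the strings.

```python
from functools import lru_cache


def scs_length_cached(s1, s2):
    """Length of the shortest common supersequence, memoized on prefix lengths."""

    @lru_cache(maxsize=None)
    def solve(i, j):
        # i and j are the lengths of the prefixes s1[:i] and s2[:j] still in play
        if i == 0 or j == 0:
            return max(i, j)
        if s1[i - 1] == s2[j - 1]:
            return solve(i - 1, j - 1) + 1
        return min(solve(i - 1, j), solve(i, j - 1)) + 1

    return solve(len(s1), len(s2))


print(scs_length_cached("ABCBDAB", "BDCABA"))  # 9
print(scs_length_cached("ABCBDBABADABSABASD", "BASDVASBAADASBASDAB"))
```

Because it is still recursive, very long inputs could hit Python's recursion limit; the recursion depth is at most `len(s1) + len(s2)`.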

### Bottom-up approach

With the bottom-up approach, we'll fill in a grid, building up from the smallest prefixes to the full given sequences. In the grid below, the columns follow the first sequence and the rows follow the second.

If one of the sequences is empty, the supersequence is simply the other sequence. That fixes the first row and the first column: they count up from 0 characters to the full length of each sequence.

From there, we move into the grid, starting by comparing the first letter of each sequence. If the letters match, we take the result for both sequences without that letter and add 1 (look at the upper-left cell and add 1).

If they don't match, we take the smaller of the two results obtained by removing that letter from one sequence or the other (the left and upper cells) and add 1. The final answer is in the bottom-right corner.

```
            A   B   C   B   C   B
        0   1   2   3   4   5   6
    B   1   2   2   3   4   5   6
    C   2   3   3   3   4   5   6
    C   3   4   4   4   5   5   6
    A   4   4   5   5   6   6   7
```

In [32]:
def shortestCommonSupDyBU(s1, s2):
    lookup = [[0] * (len(s1) + 1) for _ in range(len(s2) + 1)]
    # start off the first row and first column
    for col in range(len(lookup[0])):
        lookup[0][col] = col
    for row in range(len(lookup)):
        lookup[row][0] = row
    # fill in the rest of the lookup table by comparing characters of the two sequences
    for row in range(1, len(lookup)):
        for col in range(1, len(lookup[0])):
            if s1[col - 1] == s2[row - 1]:
                lookup[row][col] = lookup[row - 1][col - 1] + 1
            else:
                lookup[row][col] = min(lookup[row][col - 1], lookup[row - 1][col]) + 1
    return lookup[-1][-1]

In [34]:
%%time
shortestCommonSupDyBU(s1, s2)

CPU times: user 41 µs, sys: 1e+03 ns, total: 42 µs
Wall time: 56.3 µs


9

In [35]:
%%time
shortestCommonSupDyBU(s3, s4)

CPU times: user 207 µs, sys: 1e+03 ns, total: 208 µs
Wall time: 211 µs


25
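The functions above return only the length. The same grid can also be walked backwards from the bottom-right corner to recover one actual supersequence. The following is a minimal sketch under that assumption; the function name `shortest_common_supersequence` is ours, and when several SCSs exist it returns just one of them, depending on how ties are broken.

```python
def shortest_common_supersequence(s1, s2):
    """Return one shortest common supersequence of s1 and s2."""
    rows, cols = len(s2) + 1, len(s1) + 1
    table = [[0] * cols for _ in range(rows)]
    for col in range(cols):
        table[0][col] = col
    for row in range(rows):
        table[row][0] = row
    for row in range(1, rows):
        for col in range(1, cols):
            if s1[col - 1] == s2[row - 1]:
                table[row][col] = table[row - 1][col - 1] + 1
            else:
                table[row][col] = min(table[row][col - 1], table[row - 1][col]) + 1

    # Walk back from the bottom-right corner, collecting characters in reverse.
    out = []
    row, col = len(s2), len(s1)
    while row > 0 and col > 0:
        if s1[col - 1] == s2[row - 1]:
            out.append(s1[col - 1])  # shared character, emitted once
            row, col = row - 1, col - 1
        elif table[row - 1][col] <= table[row][col - 1]:
            out.append(s2[row - 1])  # the upper cell is no worse: consume s2
            row -= 1
        else:
            out.append(s1[col - 1])  # the left cell is smaller: consume s1
            col -= 1
    # One sequence is exhausted; what is left of the other goes at the front.
    out.extend(reversed(s1[:col]))
    out.extend(reversed(s2[:row]))
    return "".join(reversed(out))


result = shortest_common_supersequence("ABCBDAB", "BDCABA")
print(result, len(result))  # the length printed is 9
```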