## Longest Common Subsequence Problem - Finding All LCS

Given two sequences, print all the possible longest common subsequences present in them.

    Input: X = XMJYAUZ
           Y = MZJAWXU
    
    Output: MJAU
    
    
    Input: X = ABCBDAB
           Y = BDCABA
           
    Output: BCAB, BCBA, BDAB
    
In the previous problem we worked out how to build a 2D lookup table that gives the length of the longest common subsequence. The same table can also be used to recover every longest common subsequence.

Starting in the lower right corner, we know the length of the longest common subsequence. The question is, which is the last letter of the subsequence? We follow the trail by looking left and up to see if that number is also there.

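1. If the two letters for this cell match: Add this letter to the start of the sequence, move diagonally left and start searching for the next letter. Every longest common subsequence of the letters up to this cell ends with this letter, so no other direction needs to be searched.

2. If the letters differ and the number is to the left but not to the top: Move to the left

3. If the letters differ and the number is to the top but not to the left: Move to the top

4. If the letters differ and the number is both to the left and to the top: Use recursion and move both left and to the top. Two branches can arrive at the same sequence, so duplicates have to be dropped.

The search stops when it reaches a 0, because no letters are left to match. Checking only the numbers is not enough: for X = ABC and Y = BCAC the number in the lower right corner (2) also appears above it, and following it upwards finds BC but never AC.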
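To see the trail-following idea without the branching, here is a minimal sketch that recovers just one longest common subsequence. The names `lcs_table` and `one_lcs` are made up for this illustration, and the sketch decides when to keep a letter by comparing the two letters directly. Ties between up and left are broken by going up, which is an arbitrary choice.

```python
def lcs_table(x, y):
    # Hypothetical helper: rows follow y, columns follow x.
    table = [[0] * (len(x) + 1) for _ in range(len(y) + 1)]
    for b in range(len(y)):
        for a in range(len(x)):
            if x[a] == y[b]:
                table[b + 1][a + 1] = table[b][a] + 1
            else:
                table[b + 1][a + 1] = max(table[b][a + 1], table[b + 1][a])
    return table


def one_lcs(x, y):
    # Walk from the lower right corner towards the top left, never branching.
    table = lcs_table(x, y)
    h, w = len(y), len(x)
    letters = []
    while h > 0 and w > 0:
        if x[w - 1] == y[h - 1]:
            # Matching letters: keep the letter and move diagonally.
            letters.append(x[w - 1])
            h -= 1
            w -= 1
        elif table[h - 1][w] >= table[h][w - 1]:
            # The number is above (ties also go up): move up.
            h -= 1
        else:
            # The number is only to the left: move left.
            w -= 1
    # The letters were collected from last to first.
    return "".join(reversed(letters))


print(one_lcs("XMJYAUZ", "MZJAWXU"))  # MJAU
```

Because the loop never splits, it returns a single answer even when several exist. Finding all of them needs the recursion from rule 4.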
    

In [1]:
def LCS(x, y):
    # lookup[b+1][a+1] holds the LCS length of y[:b+1] and x[:a+1];
    # row 0 and column 0 stand for the empty prefix and stay 0
    lookup = [[0 for _ in range(len(x)+1)] for _ in range(len(y)+1)]
    for b in range(len(y)):
        for a in range(len(x)):
            above = lookup[b][a+1]
            left = lookup[b+1][a]
            upleft = lookup[b][a]
            if x[a] == y[b]:
                # matching letters extend the best result that uses neither of them
                lookup[b+1][a+1] = upleft + 1
            else:
                # otherwise carry over the better result from dropping one letter
                lookup[b+1][a+1] = max(above, left)
    return lookup

In [15]:
A = "ABCBDAB"
B = "BDCABA"
C = "BADJAR"
D = "BAD"
X = "XMJYAUZ"
Y = "MZJAWXU"

In [3]:
LCS(X, Y)

[[0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 1, 1, 1, 1, 1, 1],
 [0, 0, 1, 1, 1, 1, 1, 2],
 [0, 0, 1, 2, 2, 2, 2, 2],
 [0, 0, 1, 2, 2, 3, 3, 3],
 [0, 0, 1, 2, 2, 3, 3, 3],
 [0, 1, 1, 2, 2, 3, 3, 3],
 [0, 1, 1, 2, 2, 3, 4, 4]]

In [8]:
LCS(A, B)

[[0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 1, 1, 1, 1, 1, 1],
 [0, 0, 1, 1, 1, 2, 2, 2],
 [0, 0, 1, 2, 2, 2, 2, 2],
 [0, 1, 1, 2, 2, 2, 3, 3],
 [0, 1, 2, 2, 3, 3, 3, 4],
 [0, 1, 2, 2, 3, 3, 4, 4]]
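
The raw nested lists are hard to follow when tracing a path by hand. As a small aid, the sketch below labels the columns with the letters of the first string and the rows with the letters of the second. The helper name `format_labeled` is made up for this illustration, `-` marks the empty prefix, and the layout assumes single-digit numbers. The table is copied from the `LCS(A, B)` output.

```python
def format_labeled(table, x, y):
    # Hypothetical helper: x labels the columns, y labels the rows.
    # "-" marks the extra row and column for the empty prefix.
    lines = ["    " + "  ".join("-" + x)]
    for label, row in zip("-" + y, table):
        lines.append(label + "   " + "  ".join(str(n) for n in row))
    return "\n".join(lines)


# Table copied from the LCS("ABCBDAB", "BDCABA") output.
table = [[0, 0, 0, 0, 0, 0, 0, 0],
         [0, 0, 1, 1, 1, 1, 1, 1],
         [0, 0, 1, 1, 1, 2, 2, 2],
         [0, 0, 1, 2, 2, 2, 2, 2],
         [0, 1, 1, 2, 2, 2, 3, 3],
         [0, 1, 2, 2, 3, 3, 3, 4],
         [0, 1, 2, 2, 3, 3, 4, 4]]

print(format_labeled(table, "ABCBDAB", "BDCABA"))
```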

In [10]:
def all_LCS(x, y, sequences=None, h=None, w=None, sequence="", lookup=None):
    if lookup is None:
        # build the table once and reuse it in every recursive call
        lookup = LCS(x, y)
    if sequences is None:
        sequences = []
    if h is None:
        h = len(y)
        w = len(x)
    max_number = lookup[h][w]
    if max_number == 0:
        # no letters left to match: keep the sequence unless it is empty or a duplicate
        if sequence and sequence not in sequences:
            sequences.append(sequence)
        return sequences
    up = lookup[h-1][w]
    left = lookup[h][w-1]
    if x[w-1] == y[h-1]:
        # matching letters: add the character to the sequence, move diagonally
        all_LCS(x, y, sequences, h-1, w-1, y[h-1] + sequence, lookup)
    else:
        # follow every neighbour that still holds the same number
        if max_number == up:
            all_LCS(x, y, sequences, h-1, w, sequence, lookup)
        if max_number == left:
            all_LCS(x, y, sequences, h, w-1, sequence, lookup)
    return sequences

In [11]:
all_LCS(X, Y)

['MJAU']

In [12]:
all_LCS(A, B)

['BDAB', 'BCAB', 'BCBA']

In [17]:
all_LCS(C, D)

['BAD']
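
One way to gain confidence in results like these is to compare them with a brute-force search on short inputs. The sketch below is not part of the table-based method: it simply lists every subsequence of each string, longest first, and returns the common ones in sorted order. The names `subsequences` and `brute_force_all_lcs` are made up for this illustration, and the running time grows exponentially, so it is only suitable for short strings.

```python
from itertools import combinations


def subsequences(s, n):
    # combinations keeps the letters in their original order,
    # so each tuple is a subsequence of s with n letters.
    return {"".join(c) for c in combinations(s, n)}


def brute_force_all_lcs(x, y):
    # Try the longest possible length first and stop at the first hit.
    for n in range(min(len(x), len(y)), 0, -1):
        common = subsequences(x, n) & subsequences(y, n)
        if common:
            return sorted(common)
    return []


print(brute_force_all_lcs("ABCBDAB", "BDCABA"))  # ['BCAB', 'BCBA', 'BDAB']
```

Since the recursive search does not return its results in sorted order, sort both lists or compare them as sets.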