Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Note that this Pre-class Work is estimated to take **46 minutes**.

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [None]:
NAME = "Nahom Agize"
COLLABORATORS = ""

# CS110 Pre-class Work - Computational applications of dynamic programming and greedy algorithms

## Question 1 [time estimate: 18 minutes]
Complete the following functions, following the algorithms in Cormen et al.
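For reference, here is the LCS recurrence from Cormen et al. that `lcs_length` fills in bottom-up. $c[i,j]$ is the length of an LCS of the first $i$ elements of $x$ and the first $j$ elements of $y$:

```latex
c[i,j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0,\\
c[i-1,j-1] + 1 & \text{if } i,j > 0 \text{ and } x_i = y_j,\\
\max\left(c[i-1,j],\, c[i,j-1]\right) & \text{if } i,j > 0 \text{ and } x_i \neq y_j.
\end{cases}
```

The "NW" arrow records the middle case. "N" and "W" record which neighbour supplied the maximum in the last case.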

In [None]:
def lcs_length(x, y):
    """
    Computes the length of an LCS of strings x and y.
    
    Inputs:
    - x, y: strings
    
    Outputs:
    - c: a list of lists of ints OR a numpy array. c[i,j] contains the 
    length of an LCS of x[:i] and y[:j]
    - b: a list of lists of strings OR a numpy array, containing the information
    used for LCS reconstruction (See Cormen et al.) Use "N" (North), "NW" 
    (North West), and "W" (West) that correspond to the directions of the arrows 
    used in Cormen et al.
    """
    # Define m and n, the length of the sequences we compare
    m = len(x)
    n = len(y)
    
    # Initialize both tables, for the pointers and the lengths of LCSs
    b = [[0 for i in range(n+1)] for j in range(m+1)]
    c = [[0 for i in range(n+1)] for j in range(m+1)]
    
    # Fill the tables row by row. Row 0 and column 0 stay 0 (an LCS with an empty prefix
    # is empty), so the loops start at 1, as in Cormen et al. Starting at 0 would make
    # x[i-1] and c[i-1] wrap around to the last element in Python.
    for i in range(1, m+1):
        for j in range(1, n+1):
            
            # If the current elements match, extend the LCS of the two shorter prefixes
            if x[i-1] == y[j-1]:
                c[i][j] = c[i-1][j-1] + 1
                # Add a pointer so we know where to follow in the table
                b[i][j] = "NW" # North-west
            
            # Otherwise, if the value "above" is at least as large as the value "to the left",
            # copy it and point north (ties go north, as in Cormen et al.)
            elif c[i-1][j] >= c[i][j-1]:
                c[i][j] = c[i-1][j]
                b[i][j] = "N" # North
            
            # Otherwise, copy the value to the left and point west
            else:
                c[i][j] = c[i][j-1]
                b[i][j] = "W" # West
    return c, b

In [None]:
def print_lcs(b,x,i,j):
    """
    Finds an LCS.
    
    Inputs:
    - b: a list of lists of strings OR a numpy array, returned by lcs_length
    - x: string, an input to lcs_length
    - i, j: ints, the indices of the last elements of x and y to consider.
    print_lcs(b,x,i,j) returns an LCS of x[:i+1] and y[:j+1], where y is an
    input to lcs_length.
    
    Outputs:
    - lcs: list of strings, representing an LCS of x and y
    - length: int, the length of the LCS
    
    You can choose to actually PRINT OUT the LCS or not using the print function.
    
    """
    # Initialize a list to store the elements of the longest common subsequence
    recursive_arr = []
    
    # Define a local recursive function that appends the elements of the LCS to subseq
    # as the recursion unwinds, so they come out in left-to-right order
    def recursion(b, x, i, j, subseq):
        # Base case: an empty prefix contributes nothing, so return what we have
        if i == 0 or j == 0:
            return subseq
        if b[i][j] == "NW":
            recursion(b, x, i-1, j-1, subseq)
            subseq.append(x[i-1])
        elif b[i][j] == "N":
            recursion(b, x, i-1, j, subseq)
        else:
            recursion(b, x, i, j-1, subseq)
        return subseq
    
    # i and j are the indices of the last elements (the tests pass len - 1),
    # so convert them to prefix lengths before walking the table
    lcs = recursion(b, x, i+1, j+1, recursive_arr)
    length = len(lcs)
    
    return lcs, length
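The recursive traceback makes one Python call per step through the table. The path can be up to `len(x) + len(y)` steps long, while CPython's default recursion limit is 1000 (see `sys.getrecursionlimit()`), so long inputs can raise `RecursionError`. Below is a minimal sketch of the same walk done with a loop. It assumes the same "N"/"NW"/"W" arrow convention. `build_tables` and `traceback_lcs` are hypothetical helper names, not part of the assignment, and `traceback_lcs` takes prefix lengths rather than last indices.

```python
def build_tables(x, y):
    """Bottom-up LCS tables: c holds lengths, b holds "N"/"NW"/"W" arrows."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[""] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = "NW"
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = "N"
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = "W"
    return c, b


def traceback_lcs(b, x, i, j):
    """Rebuild an LCS of x[:i] and y[:j] by following arrows, without recursion."""
    out = []
    while i > 0 and j > 0:
        if b[i][j] == "NW":
            out.append(x[i - 1])
            i, j = i - 1, j - 1
        elif b[i][j] == "N":
            i -= 1
        else:
            j -= 1
    out.reverse()  # elements were collected from the end backwards
    return out


x, y = "ambgdec", "aubyci"
c, b = build_tables(x, y)
print(traceback_lcs(b, x, len(x), len(y)))  # ['a', 'b', 'c']
```

The table still takes O(mn) time and memory. Only the traceback changes, so it is no longer limited by the call stack.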

In [None]:
import numpy as np
x, y = 'ambgdec', 'aubyci'
c, b = lcs_length(x, y)
assert(print_lcs(b,x,len(x)-1,len(y)-1)[0] == ['a', 'b', 'c'])
assert(print_lcs(b,x,len(x)-1,len(y)-1)[1] == 3)

x, y = 'xyqwsssazdesaqqf', 'xoppoypllzookjdef'
c, b = lcs_length(x, y)
assert(print_lcs(b,x,len(x)-1,len(y)-1)[0]  == ['x', 'y', 'z', 'd', 'e', 'f'])
assert(print_lcs(b,x,len(x)-1,len(y)-1)[1]  == 6)

## Question 2. (Adapted from Exercise 15.4-1 Cormen et al.) [time estimate: 3 minutes]
Use the functions built in Question 1 to find an LCS of ```'10010101'``` and ```'010110110'```. You should store the list that represents the LCS you found in a variable named ```lcs_q2```

In [None]:
seq1 = [1,0,0,1,0,1,0,1]
seq2 = [0,1,0,1,1,0,1,1,0]

lcs_q2 = print_lcs(lcs_length(seq1, seq2)[1], seq1, len(seq1)-1, len(seq2)-1)[0]
print(lcs_q2)

[0, 1, 0, 1, 0, 1]
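An LCS is generally not unique. For these two inputs, both `[0, 1, 0, 1, 0, 1]` and `[1, 0, 0, 1, 1, 0]` are common subsequences of length 6, which is the LCS length. A quick sanity check is therefore to confirm that the answer is a subsequence of both inputs, rather than to compare it against one fixed list. `is_subsequence` is a hypothetical helper, not part of the assignment.

```python
def is_subsequence(sub, seq):
    """Return True if every element of sub appears in seq in the same order."""
    it = iter(seq)
    # `item in it` consumes the iterator up to the first match, which enforces order
    return all(item in it for item in sub)


seq1 = [1, 0, 0, 1, 0, 1, 0, 1]
seq2 = [0, 1, 0, 1, 1, 0, 1, 1, 0]

for candidate in ([0, 1, 0, 1, 0, 1], [1, 0, 0, 1, 1, 0]):
    print(candidate, is_subsequence(candidate, seq1) and is_subsequence(candidate, seq2))
```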


## Question 3. (Adapted from Exercise 15.4-5 Cormen et al.) [time estimate: 15 minutes]
Complete the following function, making use of ```lcs_length``` and ```print_lcs```.

In [None]:
def lmis(lst):
    """
    Finds the Longest Monotonically Increasing Subsequence (LMIS) of a list 
    (lst) of n numbers in O(n^2) time. Note that a monotonically increasing 
    sequence is a sequence of numbers such that: a_1 <= a_2 <= ... <= a_n .
    
    Inputs:
    - lst: a list of ints
    
    Outputs:
    - out_lst: a list of ints, a longest monotonically increasing subsequence
    of lst
    """
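    # An LCS of lst and a sorted copy of lst is an LMIS of lst. Being a subsequence of lst
    # preserves the original order, and being a subsequence of the sorted copy makes it
    # non-decreasing. Building the LCS table of two length-n lists takes O(n^2) time.
    sorted_lst = sorted(lst)
    
    # print_lcs expects the indices of the last elements, hence len(...) - 1
    out_lst = print_lcs(lcs_length(lst, sorted_lst)[1], lst, len(lst)-1, len(sorted_lst)-1)[0]
    
    return out_lst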
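To cross-check `lmis` independently, the same kind of subsequence (non-decreasing, matching the docstring's `a_1 <= a_2 <= ...`) can also be computed directly with a classic O(n^2) dynamic program. This is a different technique from the LCS reduction the exercise asks for. In the sketch below, `length[k]` is the length of the best subsequence ending at `lst[k]`, and `lmis_direct` is a hypothetical name.

```python
def lmis_direct(lst):
    """O(n^2) DP for a longest non-decreasing subsequence of lst."""
    n = len(lst)
    if n == 0:
        return []
    length = [1] * n  # length[k]: longest valid subsequence ending at index k
    prev = [-1] * n   # prev[k]: index of the element before lst[k] in it, or -1
    for k in range(n):
        for p in range(k):
            if lst[p] <= lst[k] and length[p] + 1 > length[k]:
                length[k] = length[p] + 1
                prev[k] = p
    # Start at the first index with the largest length and follow prev pointers back
    k = max(range(n), key=lambda idx: length[idx])
    out = []
    while k != -1:
        out.append(lst[k])
        k = prev[k]
    return out[::-1]


print(lmis_direct([3, 1, 2, 2, 5, 4]))  # [1, 2, 2, 5]
```

When several longest subsequences exist, this function and `lmis` may return different ones. In that case, compare lengths rather than the lists themselves.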

## Question 4 [time estimate: 5 minutes]
How would you devise a greedy algorithm to compute the longest common subsequence in a string? Explain your strategy step by step, and comment on any advantages/limitations over the dynamic programming approach. Provide a few test cases to check the validity of the greedy approach.

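In a greedy approach, we take the shorter of the two strings and store its characters in a hashmap. Then we iterate through the longer string and check whether each character is in the hashmap. If it is, we append that character to the output list to retrieve it later. The advantage is speed: this takes O(m + n) expected time instead of filling an O(mn) dynamic programming table. The limitation is that a hashmap does not record the order of the characters. As a result, the output is not guaranteed to be a common subsequence (for `'ab'` and `'ba'` it returns both characters, although the LCS has length 1), and it is not guaranteed to be the longest one either.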
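Below is a minimal sketch of an order-preserving greedy variant, along with the test cases the question asks for. It is a swapped-in two-pointer scan, not the hashmap method above: each character of the first string is matched to its next occurrence in the second string after the previous match. The result is always a common subsequence, and no m×n table is needed. However, a failed `str.find` can rescan the rest of the second string, so the worst case is still O(mn) time, and the result can be shorter than the LCS. `greedy_common_subsequence` and `lcs_len` are hypothetical helper names. `lcs_len` is a compact DP length used only for comparison.

```python
def greedy_common_subsequence(x, y):
    """Greedy: match each character of x to its next occurrence in y, left to right."""
    out = []
    pos = 0  # index in y just after the last matched character
    for ch in x:
        idx = y.find(ch, pos)  # -1 if ch does not occur in y[pos:]
        if idx != -1:
            out.append(ch)
            pos = idx + 1
    return out


def lcs_len(x, y):
    """Dynamic programming LCS length, keeping only the previous row of the table."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


tests = [("ambgdec", "aubyci"), ("abcde", "ace"), ("zab", "abz")]
for x, y in tests:
    g = greedy_common_subsequence(x, y)
    print(x, y, "greedy:", "".join(g), "DP length:", lcs_len(x, y))
```

On the first two pairs, the greedy result matches the DP length (3). On `'zab'` and `'abz'`, the greedy scan commits to the early `'z'` and returns `'z'`, while the DP finds `'ab'` of length 2.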