## Final Review

#### Randomized Algorithm
 

- Expected Work and Span: the bounds hold in expectation, so the worst case can still occur, but only with extremely small probability.
- <span style="color:red">**Question**:</span> Why do we consider the worst case for other algorithms?
- The expected number of trials to get an outcome of probability $p$ is $1/p$.
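
The $1/p$ claim can be checked numerically. The sketch below assumes independent trials, so the number of trials $X$ is geometric and $E[X] = \sum_{k \ge 1} k \cdot p \cdot (1-p)^{k-1}$; the function name `expected_trials` and the truncation point `max_trials` are illustrative choices, not part of the course material.

```python
def expected_trials(p, max_trials=10_000):
    """Approximate E[X] = sum_{k>=1} k * p * (1-p)^(k-1) by truncating the series."""
    total = 0.0
    prob_all_failed_so_far = 1.0  # (1-p)^(k-1): the first k-1 trials all failed
    for k in range(1, max_trials + 1):
        total += k * p * prob_all_failed_so_far
        prob_all_failed_so_far *= 1.0 - p
    return total


for p in (0.5, 0.25, 0.1):
    # The truncated sum should agree with 1/p up to floating-point error
    print(p, expected_trials(p), 1 / p)
```

For quicksort, this is the kind of argument behind the expected bounds: if a "good" pivot is picked with constant probability, only a constant number of picks is needed in expectation.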



- QuickSort

 <p>\[\begin{array}{ll}  
\mathit{quicksort}~a =  \\  
~~~~\texttt{if}~|a| = 0~\texttt{then}~a  \\  
~~~~\texttt{else}   \\  
~~~~~~~~\texttt{let}  \\  
~~~~~~~~~~~~p = \texttt{pick a random pivot from}~a  \\  
~~~~~~~~~~~~    a_1 = \left\langle\, x \in a \;|\; x < p \,\right\rangle  \\  
~~~~~~~~~~~~    a_2 = \left\langle\, x \in a \;|\; x = p \,\right\rangle  \\  
~~~~~~~~~~~~    a_3 = \left\langle\, x \in a \;|\; x > p \,\right\rangle  \\  
~~~~~~~~~~~~    (s_1,s_3) = (\mathit{quicksort}~a_1)~\mid\mid{}~(\mathit{quicksort}~a_3)  \\  
~~~~~~~~   \texttt{in}  \\  
~~~~~~~~~~~~    s_1 \texttt{++}{} a_2 \texttt{++}{} s_3  \\  
~~~~~~~~  \texttt{end}  
\end{array}\]</p>

<img src="module-05-random/sorting.jpg" width="60%">
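
A direct Python transcription of the pseudocode above, as a minimal sketch: it runs the two recursive calls sequentially (the pseudocode runs them in parallel with $\mid\mid$), and the optional `rng` parameter is an assumption added here so the random source can be swapped out.

```python
import random


def quicksort(a, rng=random):
    """Randomized quicksort on a list; returns a new sorted list."""
    if len(a) == 0:
        return a
    p = rng.choice(a)                 # pick a random pivot from a
    a1 = [x for x in a if x < p]      # elements smaller than the pivot
    a2 = [x for x in a if x == p]     # the pivot and its duplicates
    a3 = [x for x in a if x > p]      # elements larger than the pivot
    # Sequential here; the two calls are independent and could run in parallel
    s1, s3 = quicksort(a1, rng), quicksort(a3, rng)
    return s1 + a2 + s3


print(quicksort([3, 1, 2, 3, 0]))  # [0, 1, 2, 3, 3]
```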

#### Greedy Algorithm

> Optimal substructure: An optimal solution can be constructed from optimal solutions of smaller subproblems.

> Greedy choice: A greedy choice must be in some optimal solution (of a given size).

<span style="color:red">**Question**:</span> Which of the algorithms we have learned so far belong to this class?

- Unit Task Scheduling
 **Unit task scheduling problem**: given a set of $n$ tasks $A = \{a_0, \ldots, a_{n-1}\}$, where each task $a_i$ has start and finish times $(s_i, f_i)$, select a subset $S$ of tasks with no overlaps that is as large as possible.
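
One greedy rule that solves the unweighted problem is "earliest finish time first". The sketch below assumes tasks are `(start, finish)` pairs and that a task may start exactly when the previous one finishes (half-open intervals); the name `schedule` and the sample data are illustrative.

```python
def schedule(tasks):
    """Greedy earliest-finish-first selection of non-overlapping (start, finish) tasks."""
    selected = []
    last_finish = float("-inf")
    for s, f in sorted(tasks, key=lambda t: t[1]):  # consider tasks by finish time
        if s >= last_finish:                        # compatible with everything chosen so far
            selected.append((s, f))
            last_finish = f
    return selected


tasks = [(0, 3), (1, 2), (2, 4), (4, 6), (3, 5)]
print(schedule(tasks))  # [(1, 2), (2, 4), (4, 6)]
```

The greedy choice here is the task that finishes first: it leaves the most room for the remaining tasks, which is why it appears in some optimal solution.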
 
<span style="color:red">**Question**:</span> What about the weighted version of the task scheduling problem?

- Huffman Coding


|$$\sigma$$ |$$f(\sigma)$$| $$e'(\sigma)$$|
|-------|--|-------------|
| A     | 9 | 0   |
| B     | 1 |10  |
| C     | 1 |110 | 
| D     | 1 |111 |

So the optimal compression of $D$ can be achieved by identifying the encoding tree $T$ that minimizes:

$$C(T) = \sum_{\sigma\in\Sigma} f(\sigma)\cdot d_T(\sigma)$$
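
As a worked check of $C(T)$ on the table above, note that $d_T(\sigma)$ is the codeword length of $\sigma$. The sketch below compares the table's code with a 2-bit fixed-length code (the dictionary `fixed` is a hypothetical example) and with the cost that Huffman's merging procedure produces; it relies on the fact that the cost of the resulting tree equals the sum of the merged weights.

```python
import heapq

freq = {"A": 9, "B": 1, "C": 1, "D": 1}
code = {"A": "0", "B": "10", "C": "110", "D": "111"}   # e' from the table
fixed = {"A": "00", "B": "01", "C": "10", "D": "11"}   # hypothetical fixed-length code


def cost(freq, code):
    """C(T) = sum of f(sigma) * d_T(sigma), with depth = codeword length."""
    return sum(freq[s] * len(code[s]) for s in freq)


def huffman_cost(freqs):
    """Repeatedly merge the two lightest subtrees; each merge adds its weight to C(T)."""
    heap = list(freqs)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total


print(cost(freq, code))             # 17
print(cost(freq, fixed))            # 24
print(huffman_cost(freq.values()))  # 17
```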


<span style="color:red">**Question**:</span> How do we decode a bit string given the Huffman tree? Say `00000000010110111`
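
Decoding works because the code is prefix-free: no codeword is a prefix of another, so reading bits left to right until a codeword matches is unambiguous (equivalent to walking the tree from the root to a leaf). A minimal sketch using the code table above; the function name `decode` and the short sample bit string are illustrative.

```python
def decode(bits, code):
    """Decode a bit string with a prefix-free code given as {symbol: codeword}."""
    inverse = {word: symbol for symbol, word in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:          # reached a leaf of the encoding tree
            out.append(inverse[current])
            current = ""                # restart from the root
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


code = {"A": "0", "B": "10", "C": "110", "D": "111"}
print(decode("010110111", code))  # ABCD
```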

<span style="color:red">**Question**:</span> What is the tree depth for fixed-length coding?




## Dynamic Programming 

> Directed Acyclic Graph or DAG.
> Optimal Substructure

<span style="color:red">Question:</span>  What is the key difference from divide-and-conquer?

### 0-1 Knapsack


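**Optimal Substructure for Knapsack**: For any set of objects $[n]$ and capacity $W>0$, where object $n$ has value $v(n)$ and weight $w(n)$, we have

$$v(OPT([n], W)) = \max \{v(n) + v(OPT([n-1], W - w(n))),~ v(OPT([n-1], W))\},$$

where the first term (take object $n$) is considered only when $w(n) \le W$.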
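
The recurrence can be filled in bottom-up. The sketch below assumes items are `(value, weight)` pairs with non-negative integer weights and an integer capacity; `best[i][cap]` plays the role of $v(OPT([i], cap))$, and the sample items are illustrative.

```python
def knapsack(items, W):
    """0-1 knapsack: items is a list of (value, weight); returns the best total value."""
    n = len(items)
    # best[i][cap] = best value using only the first i items with capacity cap
    best = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = items[i - 1]
        for cap in range(W + 1):
            best[i][cap] = best[i - 1][cap]              # skip item i
            if w <= cap:                                 # take item i if it fits
                best[i][cap] = max(best[i][cap], v + best[i - 1][cap - w])
    return best[n][W]


items = [(60, 10), (100, 20), (120, 30)]
print(knapsack(items, 50))  # 220
```

The table has $(n+1)(W+1)$ entries and each takes constant work, so the work is $O(nW)$.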

### Minimum Edit Distance

**Optimal Substructure for Edit Distance**: Let $S$ and $T$ be strings of length $m$ and $n$. <span style="color:red">If we only consider insertions and deletions</span>, then the optimal substructure is:

$$\mathit{MED}(S, T) = 
\begin{cases}
|T|, & \mbox{if $S$ is empty} \\
|S|, & \mbox{if $T$ is empty} \\
\mathit{MED}(S[1:], T[1:]), & \mbox{if}~S[0]=T[0] \\
1+\min\{\mathit{MED}(S[1:], T),\mathit{MED}(S, T[1:])\}, & \mbox{otherwise} \\
\end{cases}
$$

<span style="color:red">If we consider insertions, deletions and substitutions</span>, then the optimal substructure is:

$$\mathit{MED}(S, T) = 
\begin{cases}
|T|, & \mbox{if $S$ is empty} \\
|S|, & \mbox{if $T$ is empty} \\
\mathit{MED}(S[1:], T[1:]), & \mbox{if}~S[0]=T[0] \\
1+\min\{\mathit{MED}(S[1:], T),\mathit{MED}(S, T[1:]), \mathit{MED}(S[1:], T[1:])\}, & \mbox{otherwise} \\
\end{cases}
$$




```python
test_cases = [('book', 'back'), ('kookaburra', 'kookybird'), ('elephant', 'relevant'), ('AAAGAATTCA', 'AAATCA')]
# Reference alignments; an optimal alignment is generally not unique,
# so the traceback below may return a different one with the same cost.
alignments = [('book', 'back'), ('kookaburra', 'kookybird-'), ('relev-ant','-elephant'), ('AAAGAATTCA', 'AAA---T-CA')]


def MED(S, T):
    # Plain recursion over insertions, deletions and substitutions (exponential work)
    if S == "":
        return len(T)
    elif T == "":
        return len(S)
    elif S[0] == T[0]:
        return MED(S[1:], T[1:])
    else:
        return 1 + min(MED(S, T[1:]), MED(S[1:], T), MED(S[1:], T[1:]))


def fast_MED(S, T):
    m = len(S)
    n = len(T)
    # Fill the table bottom-up: fMED[i][j] is the edit distance between S[:i] and T[:j]
    fMED = [[0 for _ in range(n + 1)] for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0:
                fMED[i][j] = j  # S[:0] is empty: insert j characters
            elif j == 0:
                fMED[i][j] = i  # T[:0] is empty: delete i characters
            elif S[i-1] == T[j-1]:
                # Last characters match: no cost, reuse the diagonal entry
                fMED[i][j] = fMED[i-1][j-1]
            else:
                # Last characters differ: 1 + best of insert, delete, substitute
                fMED[i][j] = 1 + min(fMED[i][j-1], fMED[i-1][j], fMED[i-1][j-1])
    return fMED


def fast_align_MED(S, T):
    # Trace back through the table from (len(S), len(T)) to (0, 0) to recover an alignment
    fMED = fast_MED(S, T)
    S_align = []
    T_align = []
    i = len(S)
    j = len(T)
    inf = float('inf')
    while i > 0 or j > 0:
        # Neighbours outside the table are unreachable; without these guards,
        # index -1 would silently wrap around to the last row or column
        insert = fMED[i][j-1] if j > 0 else inf
        remove = fMED[i-1][j] if i > 0 else inf
        sub = fMED[i-1][j-1] if (i > 0 and j > 0) else inf
        minimum = min(insert, remove, sub)
        if sub == minimum:
            # Match or substitution: consume one character from each string
            S_align = [S[i-1]] + S_align
            T_align = [T[j-1]] + T_align
            i = i-1
            j = j-1
        elif insert == minimum:
            # Insertion: gap in S, consume one character of T
            S_align = ['-'] + S_align
            T_align = [T[j-1]] + T_align
            j = j-1
        else:
            # Deletion: gap in T, consume one character of S
            T_align = ['-'] + T_align
            S_align = [S[i-1]] + S_align
            i = i-1

    return "".join(S_align), "".join(T_align)


for S, T in test_cases:
    print('Recursive Results:\n')
    print(MED(S, T))

    print('Memorized Results:\n')
    print(fast_MED(S, T)[-1][-1])

    print(fast_MED(S, T))

    align_S, align_T = fast_align_MED(S, T)
    print(align_S)
    print(align_T)
    print('\n')
```



```
Recursive Results:

2
Memorized Results:

2
[[0, 1, 2, 3, 4], [1, 0, 1, 2, 3], [2, 1, 1, 2, 3], [3, 2, 2, 2, 3], [4, 3, 3, 3, 2]]
book
back


Recursive Results:

4
Memorized Results:

4
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 0, 1, 2, 3, 4, 5, 6, 7, 8], [2, 1, 0, 1, 2, 3, 4, 5, 6, 7], [3, 2, 1, 0, 1, 2, 3, 4, 5, 6], [4, 3, 2, 1, 0, 1, 2, 3, 4, 5], [5, 4, 3, 2, 1, 1, 2, 3, 4, 5], [6, 5, 4, 3, 2, 2, 1, 2, 3, 4], [7, 6, 5, 4, 3, 3, 2, 2, 3, 4], [8, 7, 6, 5, 4, 4, 3, 3, 2, 3], [9, 8, 7, 6, 5, 5, 4, 4, 3, 3], [10, 9, 8, 7, 6, 6, 5, 5, 4, 4]]
kookaburra
kookybir-d


Recursive Results:

3
Memorized Results:

3
[[0, 1, 2, 3, 4, 5, 6, 7, 8], [1, 1, 1, 2, 3, 4, 5, 6, 7], [2, 2, 2, 1, 2, 3, 4, 5, 6], [3, 3, 2, 2, 1, 2, 3, 4, 5], [4, 4, 3, 3, 2, 2, 3, 4, 5], [5, 5, 4, 4, 3, 3, 3, 4, 5], [6, 6, 5, 5, 4, 4, 3, 4, 5], [7, 7, 6, 6, 5, 5, 4, 3, 4], [8, 8, 7, 7, 6, 6, 5, 4, 3]]
-elephant
rele-vant


Recursive Results:

4
Memorized Results:

4
[[0, 1, 2, 3, 4, 5, 6], [1, 0, 1, 2, 3, 4, 5], [2, 1, 0, 1, 2, 3
```

#### Graph Search

- BFS, DFS, Dijkstra, Bellman-Ford [<span style="color:red">Visiting Order per given source node</span>]

**Unweighted Graph:** BFS and DFS both have work $O(|V| + |E|)$.

- Breadth-first: visits siblings before children.
- Depth-first: visits children before siblings.
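
The difference in visiting order shows up on a small tree. This is a minimal sketch assuming an adjacency-list dictionary; the graph and the names `bfs_order` and `dfs_order` are illustrative.

```python
from collections import deque

# A small tree: A has children B and C; B has children D and E; C has child F
graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"], "D": [], "E": [], "F": []}


def bfs_order(graph, source):
    """Visit vertices level by level using a FIFO queue."""
    seen, order, frontier = {source}, [], deque([source])
    while frontier:
        u = frontier.popleft()
        order.append(u)
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                frontier.append(v)
    return order


def dfs_order(graph, source, seen=None):
    """Visit a vertex, then recurse fully into each unvisited neighbour."""
    seen = set() if seen is None else seen
    seen.add(source)
    order = [source]
    for v in graph[source]:
        if v not in seen:
            order += dfs_order(graph, v, seen)
    return order


print(bfs_order(graph, "A"))  # ['A', 'B', 'C', 'D', 'E', 'F']
print(dfs_order(graph, "A"))  # ['A', 'B', 'D', 'E', 'C', 'F']
```

BFS reaches C (a sibling of B) before D (a child of B); DFS does the opposite.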

**Non-negative Weighted Graph:**
Dijkstra's algorithm has work $O(|E| \log |E|)$.

**Negative Weighted Graph:**
Bellman-Ford has work $O(|V| \cdot |E|)$.


<span style="color:blue">BFS/DFS still run on a weighted graph, and Dijkstra still runs on a graph with negative weights. The result is just not guaranteed to be a correct shortest path.</span>
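
The sketch below illustrates both algorithms and the failure on negative weights. It assumes an adjacency-list dictionary `{u: [(v, w), ...]}`, uses a binary heap for Dijkstra (finalizing each vertex the first time it is popped), and omits negative-cycle detection from Bellman-Ford for brevity; the two sample graphs are illustrative.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from source; correct only when all weights are non-negative."""
    dist = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in dist:              # u was already finalized with a smaller distance
            continue
        dist[u] = d
        for v, w in graph[u]:
            if v not in dist:
                heapq.heappush(heap, (d + w, v))
    return dist


def bellman_ford(graph, source):
    """Relax every edge |V| - 1 times; handles negative weights (assuming no negative cycle)."""
    dist = {u: float("inf") for u in graph}
    dist[source] = 0
    for _ in range(len(graph) - 1):
        for u in graph:
            for v, w in graph[u]:
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
    return dist


positive = {"s": [("a", 1), ("b", 4)], "a": [("b", 2), ("c", 6)], "b": [("c", 3)], "c": []}
negative = {"s": [("a", 2), ("b", 5)], "a": [], "b": [("a", -4)]}

print(dijkstra(positive, "s") == bellman_ford(positive, "s"))          # True
print(dijkstra(negative, "s")["a"], bellman_ford(negative, "s")["a"])  # 2 1
```

On `negative`, Dijkstra finalizes `a` at distance 2 before it ever looks at the edge from `b` with weight -4, so it misses the shorter path `s -> b -> a` of length 1.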
 






#### Minimum Spanning Tree

- Prim's Algorithm and Kruskal's Algorithm [<span style="color:red">Visiting Order per Step</span>]

<span style="color:green">Question:</span> Given some intermediate steps, can you quickly identify which algorithm it is using?
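
One way to tell the two apart is the order in which edges are added. The sketch below assumes vertices numbered `0..n-1` and edges given as `(weight, u, v)` triples on a connected graph; Kruskal uses a simple union-find, Prim uses a binary heap, and the sample graph is illustrative.

```python
import heapq


def kruskal(n, edges):
    """Add edges in increasing weight order, skipping those that would close a cycle."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    picked = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            picked.append((w, u, v))
    return picked


def prim(n, edges, start=0):
    """Grow a single tree from start, always adding the lightest edge leaving the tree."""
    adj = {u: [] for u in range(n)}
    for w, u, v in edges:
        adj[u].append((w, u, v))
        adj[v].append((w, v, u))
    in_tree = {start}
    heap = list(adj[start])
    heapq.heapify(heap)
    picked = []
    while heap and len(in_tree) < n:
        w, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue
        in_tree.add(v)
        picked.append((w, u, v))
        for e in adj[v]:
            if e[2] not in in_tree:
                heapq.heappush(heap, e)
    return picked


edges = [(1, 2, 3), (2, 0, 1), (3, 1, 2), (4, 0, 2), (5, 1, 3)]
print(kruskal(4, edges))  # [(1, 2, 3), (2, 0, 1), (3, 1, 2)]
print(prim(4, edges))     # [(2, 0, 1), (3, 1, 2), (1, 2, 3)]
```

Both return a tree of total weight 6, but the intermediate steps differ: Kruskal's first two edges form two disconnected pieces (a forest), while every intermediate step of Prim is a single connected tree containing the start vertex.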




