# The Knapsack Problem

Given $n$ items, each with two properties:
1. value $v_i > 0$
2. size $w_i \in \mathbb{Z}^+$

and a capacity $W \in \mathbb{Z}^+$,

return a subset $S \subseteq \{1, 2, 3, \cdots, n\}$ that maximises
$$
\sum_{i \in S}{v_i}
$$
while ensuring that
$$
\sum_{i \in S}{w_i} \leq W
$$


## Structure of Optimal Solution

Let $S$ be a max-value solution to an instance of Knapsack.

### Case 1:
Suppose $n \notin S$. Then $S$ must be optimal for the first $n-1$ items with capacity $W$.

### Case 2:
Suppose $n \in S$. Then $S - \{n\}$ is an optimal solution for the first $n-1$ items with capacity
$$
W - w_n
$$

Let $V_{i, x} = $ value of the best solution that:
1. considers the first $i$ items
2. has total size $\leq x$

For $i \in \{1, 2, \cdots, n\}$ and any $x$,
$$
V_{i, x} = \max
\begin{cases}
V_{(i-1), x} & \text{case} \; 1 \; \text{, item}\; i \; \text{excluded} \\
V_{(i-1), (x-w_i)} + v_i & \text{case} \; 2 \; \text{, item}\; i \; \text{included} \\
\end{cases}
$$
We take the maximum of the two cases, with base case $V_{0, x} = 0$ for all $x$.

Note that if $w_i > x$, item $i$ does not fit and only case 1 applies, so $V_{i, x} = V_{(i-1), x}$.

Space of subproblems:
1. All possible prefixes of items $\{1, 2, \cdots, i\}$
2. All possible residual capacities $x \in \{0, 1, 2, \cdots, W\}$
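The recurrence for $V_{i, x}$ can be transcribed almost directly as a memoised recursion. The following is a minimal sketch, assuming the items are passed as 0-indexed Python lists; the names `knapsack_value`, `values`, `sizes` and `capacity` are illustrative and not part of the notes.

```python
from functools import lru_cache


def knapsack_value(values, sizes, capacity):
    """Best total value using all items with total size <= capacity."""
    n = len(values)

    @lru_cache(maxsize=None)
    def V(i, x):
        # Base case: with no items to consider, the best value is 0.
        if i == 0:
            return 0
        # The notes index items from 1, Python lists from 0.
        v_i, w_i = values[i - 1], sizes[i - 1]
        best = V(i - 1, x)  # case 1: item i excluded
        if w_i <= x:        # case 2 is only feasible when item i fits
            best = max(best, V(i - 1, x - w_i) + v_i)
        return best

    return V(n, capacity)
```

For example, `knapsack_value([3, 2, 4, 4], [4, 3, 2, 3], 6)` evaluates to `8` (the third and fourth items). Each pair $(i, x)$ is computed at most once, so the cache holds at most $(n+1)(W+1)$ entries.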



## Pseudocode
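```
Knapsack():
    Let A = 2D array of size (n+1) x (W+1)

    # Base case: with no items, the best value is 0
    A[0, x] = 0 for x = 0, 1, 2, ..., W

    For i = 1 to n:
        For x = 0 to W:
            if w_i > x:
                # item i does not fit, so only case 1 applies
                A[i, x] = A[i-1, x]
            else:
                A[i, x] = max(
                    A[i-1, x],
                    A[i-1, x-w_i] + v_i
                )

    Return A[n, W]
```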

This returns the value of the optimal solution. We can reconstruct the list of items by tracing back through the filled table:

```
i = n
x = W

while i > 0:
    if w_i > x or A[i, x] = A[i-1, x]: (case 1)
        item i not included
        i -= 1
    else: (case 2)
        item i included
        x -= w_i
        i -= 1
```
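The table fill and the traceback can be combined into one function. This is a sketch under the same conventions as the pseudocode, assuming 0-indexed input lists; the function name `knapsack` and its parameter names are illustrative. When excluding and including item $i$ give the same value, this version excludes it, which is one of several equally valid choices.

```python
def knapsack(values, sizes, capacity):
    """Return (best value, sorted list of chosen 1-indexed items)."""
    n = len(values)
    # A[i][x] = best value using the first i items with capacity x.
    # Row 0 is the base case A[0][x] = 0.
    A = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v_i, w_i = values[i - 1], sizes[i - 1]
        for x in range(capacity + 1):
            A[i][x] = A[i - 1][x]  # case 1: item i excluded
            if w_i <= x:           # case 2: item i included, if it fits
                A[i][x] = max(A[i][x], A[i - 1][x - w_i] + v_i)

    # Trace back from A[n][capacity] to recover the chosen items.
    chosen = []
    x = capacity
    for i in range(n, 0, -1):
        # If excluding item i cannot explain A[i][x], it must be included.
        if A[i][x] != A[i - 1][x]:
            chosen.append(i)
            x -= sizes[i - 1]
    chosen.reverse()
    return A[n][capacity], chosen
```

For example, `knapsack([3, 2, 4, 4], [4, 3, 2, 3], 6)` returns `(8, [3, 4])`. Filling the table takes $O(nW)$ time and the traceback takes $O(n)$.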

Example
![example](img/knapsack%20example.png)

# Sequence Alignment 

Given a pair of strings, compute the [Needleman-Wunsch Score](https://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm#:~:text=The%20Needleman%E2%80%93Wunsch%20algorithm%20is%20still%20widely%20used%20for%20optimal,alignments%20having%20the%20highest%20score.)

The problem is stated in Week 1 of this section.

For some alphabet $\Phi$, let $X$ and $Y$ be strings
$$
X = x_1, x_2, x_3, \cdots, x_m \\
Y = y_1, y_2, y_3, \cdots, y_n \\[10pt]
x_i, y_j \in \Phi
$$
Define a penalty for a gap and a penalty for matching the characters $a$ and $b$
$$
\alpha_{gap} \geq 0 \\[10pt]
\alpha_{a,b} = 
\begin{cases}
0 & \text{if} \; a = b \\
\lambda & \text{otherwise}
\end{cases}
$$

Over all feasible alignments, compute the alignment with the lowest total penalty.

## Structure of Optimal Solution

Let's try to identify the subproblem structure to look at.

Consider an optimal alignment of $X$, $Y$. Let's scrutinise the possible cases for the final positions of each string.

Let $X^\prime = X - x_m$, $Y^\prime = Y - y_n$

Let the "induced alignment" be the optimal alignment with its final column removed.

#### Case 1 ($x_m$, $y_n$ matched):

The induced alignment of $X^\prime$ and $Y^\prime$ remains optimal.

Proof:

Assume that the induced alignment is not optimal. Then there is an alignment of $X^\prime$ and $Y^\prime$ with a strictly lower penalty, and appending the matched pair $x_m$, $y_n$ to it would give a better alignment of $X$ and $Y$. This contradicts the assumption that we started with an optimal alignment. The other two cases follow by the same argument.

#### Case 2 ($x_m$ matched to gap):

The induced alignment of $X^\prime$ and $Y$ remains optimal.

#### Case 3 ($y_n$ matched to gap):

The induced alignment of $X$ and $Y^\prime$ remains optimal.

Therefore we can solve a given problem with respect to each of the three cases and a smaller subproblem.

Space of all subproblems $(X_i, Y_j)$
$$
X_i = \text{first} \; i \; \text{letters of} \; X \\
Y_j = \text{first} \; j \; \text{letters of} \; Y \\
$$

Let $P_{i,j}$ = penalty of optimal alignment of $X_i$ and $Y_j$

For all $i=1, 2, \cdots, m$ and $j=1, 2, \cdots, n$
$$
P_{i,j} = \min
\begin{cases}
\alpha_{x_i, y_j} + P_{(i-1), (j-1)} & \text{case 1} \\
\alpha_{gap} + P_{(i-1), j} & \text{case 2} \\
\alpha_{gap} + P_{i, (j-1)} & \text{case 3} \\
\end{cases}
$$

We seek to choose the minimum over all 3 possibilities.

For the base cases,
$$
P_{i, 0} = P_{0, i} = i \times \alpha_{gap}
$$

## Pseudocode
```
Let A = 2D array

Initialise

For i = 0 to m:
    A[i, 0] = i * p_gap
For j = 0 to n:
    A[0, j] = j * p_gap

For i = 1 to m:
    For j = 1 to n:
        A[i, j] = min(
            A[i-1, j-1] + p_(x_i, y_j),
            A[i-1, j] + p_gap,
            A[i, j-1] + p_gap
        )

Return A[m, n]
```

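This returns the minimal penalty score. We can reconstruct the actual alignment by tracing back from $A[m, n]$.

```
i = m
j = n

while i > 0 and j > 0:
    if A[i, j] = A[i-1, j-1] + p_(x_i, y_j):
        Case 1: match x_i with y_j
        i -= 1
        j -= 1
    elif A[i, j] = A[i-1, j] + p_gap:
        Case 2: match x_i with a gap
        i -= 1
    elif A[i, j] = A[i, j-1] + p_gap:
        Case 3: match y_j with a gap
        j -= 1
```

Once one string is exhausted ($i = 0$ or $j = 0$), pad it with gaps to match the remaining prefix of the other string.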
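The table fill and traceback for sequence alignment can be sketched in Python as follows. The sketch assumes a single mismatch penalty $\lambda$ as in the definition of $\alpha_{a,b}$, integer penalties (so that the equality tests in the traceback are exact), and `-` as the gap symbol; the names `align`, `gap_penalty` and `mismatch_penalty` are illustrative.

```python
def align(X, Y, gap_penalty, mismatch_penalty):
    """Return (minimum penalty, aligned X, aligned Y), with '-' for gaps."""
    m, n = len(X), len(Y)

    def alpha(a, b):
        # Penalty for matching characters a and b.
        return 0 if a == b else mismatch_penalty

    # A[i][j] = penalty of an optimal alignment of X[:i] and Y[:j].
    A = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        A[i][0] = i * gap_penalty
    for j in range(n + 1):
        A[0][j] = j * gap_penalty
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            A[i][j] = min(
                A[i - 1][j - 1] + alpha(X[i - 1], Y[j - 1]),  # case 1
                A[i - 1][j] + gap_penalty,                    # case 2
                A[i][j - 1] + gap_penalty,                    # case 3
            )

    # Trace back from A[m][n], building both rows in reverse.
    top, bottom = [], []
    i, j = m, n
    while i > 0 and j > 0:
        if A[i][j] == A[i - 1][j - 1] + alpha(X[i - 1], Y[j - 1]):
            top.append(X[i - 1])
            bottom.append(Y[j - 1])
            i -= 1
            j -= 1
        elif A[i][j] == A[i - 1][j] + gap_penalty:
            top.append(X[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(Y[j - 1])
            j -= 1
    # One string is exhausted: pad it with gaps against the other's prefix.
    while i > 0:
        top.append(X[i - 1])
        bottom.append("-")
        i -= 1
    while j > 0:
        top.append("-")
        bottom.append(Y[j - 1])
        j -= 1
    return A[m][n], "".join(reversed(top)), "".join(reversed(bottom))
```

For example, `align("AGGGCT", "AGGCA", 1, 2)` gives a minimum penalty of `3` (one gap plus one mismatch). Several alignments can share the minimum penalty; which one is returned depends on the order in which the three cases are tested.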


# Optimal Binary Search Trees

Out of all the possible binary search trees on a set of keys, which one minimises the average search time, given a known search frequency for each key?

Consider the items $1, 2, \cdots, n$ (in sorted order $1 < 2 < \cdots < n$).

Let the frequencies be
$$
p_1, p_2, \cdots, p_n
$$

Compute a valid search tree that minimises the weighted search time
$$
C(T) = \sum_{i \in \; \text{items}}{p_i \times (\text{depth of} \; i \; \text{in} \; T + 1)}
$$
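To make the cost function concrete, here is a small sketch that evaluates $C(T)$ for a given tree. The tree representation (nested tuples `(key, left, right)`, with `None` for an empty subtree) and the name `weighted_search_cost` are assumptions made for illustration, and the frequencies are given as integer counts to keep the arithmetic exact.

```python
def weighted_search_cost(tree, p, depth=0):
    """C(T) = sum over keys of p[key] * (depth of key in T + 1).

    tree: None, or a tuple (key, left_subtree, right_subtree)
    p:    dict mapping each key to its search frequency
    """
    if tree is None:
        return 0
    key, left, right = tree
    return (
        p[key] * (depth + 1)
        + weighted_search_cost(left, p, depth + 1)
        + weighted_search_cost(right, p, depth + 1)
    )


p = {1: 8, 2: 1, 3: 1}
balanced = (2, (1, None, None), (3, None, None))  # root 2
chain = (1, None, (2, None, (3, None, None)))     # root 1, then 2, then 3
```

With these frequencies, `weighted_search_cost(balanced, p)` is `19` while `weighted_search_cost(chain, p)` is `13`, so the balanced tree is not the optimal one when key 1 dominates the searches.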

## Structure of Optimal Solution

Let's see if we can construct an optimal solution from an optimal solution to a smaller subproblem.

Suppose we have an optimal BST $T$ for keys $\{1, 2, \cdots, n\}$. Further, let the root of this optimal BST be $r$, the left subtree be $T_l$ and the right subtree be $T_r$.

Then, we know that
1. $T_l$ is optimal for $\{1, 2, \cdots, r-1\}$
2. $T_r$ is optimal for $\{r+1, r+2, \cdots, n\}$

Proof (by contradiction):

Suppose that $T_l$ is not optimal for $\{1, 2, \cdots, r-1\}$. Then there must exist a better tree $T^*_l$ on the same keys such that
$$
C(T^*_l) < C(T_l)
$$

Swapping $T_l$ for $T^*_l$ in $T$ gives a new BST with a lower cost than $T$, which is a contradiction. To see that the cost does drop, decompose $C(T)$:
$$
\begin{aligned}
C(T) &= \sum_{i=1}^{n}{p_i \cdot \text{search time for} \; i} \\
&= p_r \cdot 1 + \sum_{i=1}^{r-1}{p_i \cdot \text{search time for} \; i} + \sum_{i=r+1}^{n}{p_i \cdot \text{search time for} \; i} \\
\end{aligned}
$$
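Note that every item in $T_l$ or $T_r$ sits one level deeper in $T$ than in its own subtree:
$$
\begin{aligned}
\text{search time for} \; i \; \text{in} \; T &= 1 + \text{search time for} \; i \; \text{in} \; T_l \quad (i < r) \\
\text{search time for} \; i \; \text{in} \; T &= 1 + \text{search time for} \; i \; \text{in} \; T_r \quad (i > r)
\end{aligned}
$$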
Therefore
$$
\begin{aligned}
C(T) &= p_r \cdot 1 + \sum_{i=1}^{r-1}{p_i \cdot \text{search time for} \; i} + \sum_{i=r+1}^{n}{p_i \cdot \text{search time for} \; i} \\
&= \sum_{i=1}^{n}{p_i} + \sum_{i=1}^{r-1}{p_i \cdot \text{search time for} \; i \; \text{in} \; T_l} + \sum_{i=r+1}^{n}{p_i \cdot \text{search time for} \; i \; \text{in} \; T_r} \\
&= \sum_{i=1}^{n}{p_i} + C(T_l) + C(T_r)
\end{aligned}
$$

For a given optimal solution, we need both a prefix (left subtree) and a suffix (right subtree) of the keys. However, the subsequent calls take suffixes of prefixes and prefixes of suffixes, so in order to memoise the necessary smaller subproblems, the subproblem space is in fact every contiguous interval
$$
s = \{i, i+1, \cdots, j-1, j\} \; \text{for every} \; i \leq j
$$
such that the algorithm is able to look up any given interval that it might request.

For $1 \leq i \leq j \leq n$, let $C_{i,j}$ be the weighted search cost of an optimal BST for the items $\{i, i+1, \cdots, j-1, j\}$, with $C_{i,j} = 0$ whenever $i > j$ (an empty tree).
$$
C_{i, j} = \min_{r =i}^{j}\left(
    \sum_{k=i}^{j} p_k + C_{i, (r-1)} + C_{(r+1), j}
\right)
$$

## Pseudocode
```
Let A = 2D array

For s = 0 to n-1:
    For i = 1 to n-s:
        # A[i, j] is treated as 0 whenever i > j (empty subtree)
        A[i, i+s] = min over the range r = i to i+s (
            sum over k = i to i+s p_k 
            + A[i, r-1]
            + A[r+1, i+s]
        )

Return A[1, n]
```

In general this runs in $O(n^3)$ time, since there are $O(n^2)$ subproblems and each takes $O(n)$ time to evaluate. However, there is a [speedup by Knuth, Yao](https://cp-algorithms.com/dynamic_programming/knuth-optimization.html#references) that runs in only $O(n^2)$.
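The $O(n^3)$ version can be sketched in Python as follows. This is a minimal sketch that does not include the Knuth-Yao speedup; it assumes the frequencies arrive as a 0-indexed list `p` in sorted key order, and it uses a prefix-sum array so that each interval sum $\sum_{k=i}^{j} p_k$ is a constant-time lookup. The names `optimal_bst_cost` and `prefix` are illustrative.

```python
def optimal_bst_cost(p):
    """Return C_{1,n}, the minimum weighted search cost over all BSTs.

    p[k - 1] is the search frequency of key k, for keys 1..n in sorted order.
    """
    n = len(p)
    # prefix[k] = p_1 + ... + p_k, so sum(p_i..p_j) = prefix[j] - prefix[i-1].
    prefix = [0] * (n + 1)
    for k in range(1, n + 1):
        prefix[k] = prefix[k - 1] + p[k - 1]

    # A[i][j] = C_{i,j} for 1 <= i <= j <= n. Entries with i > j stay 0 and
    # represent empty subtrees; the extra row and column make A[i][i-1] and
    # A[j+1][j] valid lookups.
    A = [[0] * (n + 2) for _ in range(n + 2)]
    for s in range(n):                 # s = j - i, smallest intervals first
        for i in range(1, n - s + 1):
            j = i + s
            total = prefix[j] - prefix[i - 1]
            # Try every key r in the interval as the root.
            A[i][j] = total + min(
                A[i][r - 1] + A[r + 1][j] for r in range(i, j + 1)
            )
    return A[1][n]
```

For example, `optimal_bst_cost([8, 1, 1])` evaluates to `13`, which corresponds to placing the most frequent key at the root.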