## 1. Greedy
Assume you will drive from city A to city B along a highway and you wish to minimize the time you spend on adding gas to your tank. The following quantities are given to you:
- the capacity $C$ of your gas tank in liters
- the rate $F$ of fuel consumption in liters/mile
- the rate $r$ in liters/minute at which you can fill your tank at a gas station. $r$ is the same for all gas stations.
- the distances of the $n$ gas stations from your start point $x_1, x_2, \ldots, x_n$ along the highway. We know $x_1=0, x_1<x_2<\ldots<x_n$ and $x_i-x_{i-1} \leq C / F$ for $2 \leq i \leq n$.
You will start with an empty tank. For example, if you stop to fill your tank from 2 liters to 8 liters, you would have to stop for $6 / r$ minutes. The goal is to minimize the total time you stop for gas.
Consider the following two algorithms:
(a) Stop at every gas station, and fill the tank with just enough gas to make it to the next gas station.
(b) Stop if and only if you don't have enough gas to make it to the next gas station, and if you stop, fill the tank up all the way.
For each algorithm either prove or disprove that this algorithm solves the problem optimally. Your proof of correctness can use an exchange argument.


<div style="color:blue">

### Algorithm (a): Stop at Every Gas Station

This algorithm suggests filling the tank with just enough gas to make it to the next gas station at each stop.

**Analysis:**
- **Optimal:** There is no fixed penalty per stop, so the total stopping time is simply (total liters purchased) $/ r$. Assume city B is at the last station $x_n$. The car starts empty and burns $F(x_n - x_1)$ liters on the way, so every feasible plan must purchase at least that much. Algorithm (a) purchases exactly $F(x_{i+1} - x_i)$ liters at station $x_i$, which always fits in the tank because $x_{i+1} - x_i \leq C / F$, and it arrives with an empty tank. Its total time is $F(x_n - x_1) / r$, which meets the lower bound.
- **Exchange Argument:** Take any optimal plan. If it buys more at station $x_i$ than is needed to reach $x_{i+1}$, move the surplus purchase to $x_{i+1}$. The tank level after refueling at $x_{i+1}$ is unchanged, so the plan stays feasible, and the total liters purchased (hence the total time) is unchanged, because a liter takes $1 / r$ minutes regardless of where it is bought. Repeating this from left to right turns the optimal plan into the plan of Algorithm (a) without increasing its cost.

### Algorithm (b): Stop Only When Necessary


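- **Not optimal in general:** This approach minimizes the number of stops, but a stop costs nothing by itself; only the liters purchased matter. Filling the tank completely at the last stop can leave fuel unused on arrival, and the time spent buying that fuel is wasted.
- **Counterexample:** Let $C = 10$ liters and $F = 1$ liter/mile, with two stations at $x_1 = 0$ and $x_2 = 5$, and assume city B is at $x_2$. Algorithm (b) must stop at $x_1$ because the tank is empty, and it fills all 10 liters, which takes $10 / r$ minutes. Buying only the 5 liters actually needed takes $5 / r$ minutes.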

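**Conclusion:**
- Algorithm (a) is optimal, because without a fixed penalty per stop only the total amount of fuel purchased matters. Algorithm (b) matches it only on instances where it happens to arrive with an empty tank; in general it buys more fuel than needed, so it does not solve the problem optimally.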

</div>
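
To compare the two rules concretely, the following minimal sketch simulates both of them. It assumes that city B is at the last station $x_n$, which the problem statement does not say explicitly. The function names `refuel_time_a` and `refuel_time_b` are illustrative and not part of the original solution.

```python
def refuel_time_a(x, F, r):
    """Algorithm (a): at every station buy exactly the fuel for the next leg."""
    liters = sum(F * (x[i + 1] - x[i]) for i in range(len(x) - 1))
    return liters / r


def refuel_time_b(x, C, F, r):
    """Algorithm (b): stop only if the next leg is out of reach, then fill up."""
    tank = 0.0    # liters currently in the tank (the trip starts empty)
    liters = 0.0  # total liters purchased so far
    for i in range(len(x) - 1):
        need = F * (x[i + 1] - x[i])
        if tank < need:          # cannot reach the next station
            liters += C - tank   # fill the tank all the way
            tank = C
        tank -= need             # drive the leg
    return liters / r


# Two stations 5 miles apart, 10-liter tank, 1 liter/mile, 2 liters/minute
print(refuel_time_a([0, 5], F=1, r=2))        # 2.5
print(refuel_time_b([0, 5], C=10, F=1, r=2))  # 5.0
```

On this instance rule (b) stops for twice as long as rule (a), because half of the fuel it buys is still in the tank on arrival.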






## 2. Dynamic Programming


This problem is to determine the set of states with the smallest total population that can provide the votes to win the election.
Formally, the problem is: We are given a list of states $\{1, \ldots, n\}$ where each state $i$ has population $p_i$, and $v_i$, which is the number of electoral votes for state $i$. All electoral votes of a state go to a single candidate. The overall winning candidate is the one who receives at least $V$ electoral votes, where $V=\left(\sum_i v_i\right) / 2+1$. Our goal is to find a set of states $S$ that minimizes the value of $\sum_{i \in S} p_i$ subject to the constraint that $\sum_{i \in S} v_i \geq V$.
Please design a dynamic programming algorithm for this problem. Define the subproblem, give the recurrence relations, and analyze the time and space complexity of your algorithm. Remember to include steps to output the optimal set of states (you do not need to output all the optimal solutions if there are more than one.)


<div style="color:blue">

### Recurrence relation

This problem is a variant of the 0-1 Knapsack problem, where we are trying to minimize the total population while ensuring that the electoral votes meet a certain threshold. In this case, the "weight" of each item (state) is the number of electoral votes $v_i$, and the "value" to minimize is the population $p_i$. The capacity of the knapsack is the required number of electoral votes $V$.

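For each state $i$ and each vote target $j$, let $dp[i][j]$ be the minimum total population of a subset of the first $i$ states that supplies at least $j$ electoral votes. For state $i$ we have 2 choices: exclude it or include it.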

```
dp[i][j] = min(dp[i - 1][j], dp[i - 1][max(j - v_i, 0)] + p_i) 

```

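* $dp[i - 1][j]$ is the minimum population needed to reach at least $j$ votes without state $i$.
* $dp[i - 1][\max(j - v_i, 0)] + p_i$ is the minimum population needed to reach at least $j$ votes when including state $i$. Note that we need to take the maximum with 0 because state $i$ alone may supply more than $j$ votes, in which case nothing further is required from the first $i - 1$ states.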

### Algorithm

* Initialize a 2D array with shape $(n + 1) \times (V + 1)$
* Initialize the base cases
  * Set $dp[i][0] = 0$ for all $i$ (0 electoral votes requires 0 population)
  * Set $dp[0][j] = \infty$ for $j > 0$ (Reaching a positive vote target $j$ with 0 states is impossible)
* For i = 1 ... n:
  * For j = 1 ... V:
    * Use the recurrence relation to compute $dp[i][j]$
* The minimum total population is found at $dp[n][V]$

Trace back to find the optimal set of states

* Start at $(i, j) = (n, V)$
* If $dp[i][j] = dp[i - 1][j]$, then state $i$ is not included. We move to $(i - 1, j)$
* Otherwise $dp[i][j] = dp[i - 1][\max(j - v_i, 0)] + p_i$, so state $i$ is included. We add $i$ to the set of selected states and move to $(i - 1, \max(j - v_i, 0))$
* Continue until $j = 0$ or $i = 0$


### Time Complexity

The time complexity is $O(nV)$, where $n$ is the number of states and $V$ is the required number of electoral votes. This is because we compute $dp[i][j]$ in constant time for each $i$ from 1 to $n$ and for each $j$ from 1 to $V$. The traceback adds only $O(n)$.

### Space Complexity

The space complexity is also $O(nV)$ due to the storage of the DP array. However, if only the minimum total population is required (not the set of states), this can be reduced to $O(V)$ by using a 1D rolling array.
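
The rolling-array idea can be sketched as follows. This is a minimal illustration, assuming $V = \lfloor \sum_i v_i / 2 \rfloor + 1$; the function name `min_population_1d` and the sample numbers are hypothetical. It returns only the minimum population, since the 1D array no longer holds the information needed for the traceback.

```python
import math


def min_population_1d(votes, population):
    """Minimum population needed to win, using O(V) space."""
    V = sum(votes) // 2 + 1
    # dp[j]: minimum population that supplies at least j votes
    dp = [0] + [math.inf] * V
    for v, p in zip(votes, population):
        # Iterate j downwards so dp[max(j - v, 0)] still refers to the
        # previous row, i.e. each state is used at most once.
        for j in range(V, 0, -1):
            dp[j] = min(dp[j], dp[max(j - v, 0)] + p)
    return dp[V]


# Three hypothetical states: 12 votes in total, so V = 7
print(min_population_1d([3, 4, 5], [10, 20, 25]))  # 30
```

Here the cheapest winning set is the first two states (7 votes, population 30).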



</div>

```python
import numpy as np


def q2(votes, population):
    # V = floor(total / 2) + 1 electoral votes are needed to win
    num_votes_to_win = sum(votes) // 2 + 1
    num_states = len(votes)

    # dp[i, j]: minimum population of a subset of the first i states
    # that supplies at least j votes (inf if impossible)
    dp = np.full((num_states + 1, num_votes_to_win + 1), np.inf)
    dp[:, 0] = 0

    # Populate DP table
    for i in range(1, num_states + 1):
        for j in range(1, num_votes_to_win + 1):
            not_take = dp[i - 1, j]
            take = dp[i - 1, max(0, j - votes[i - 1])] + population[i - 1]
            dp[i, j] = min(not_take, take)

    # Backtrack to find the set of states
    states = []
    j = num_votes_to_win
    for i in range(num_states, 0, -1):
        if dp[i, j] != dp[i - 1, j]:
            states.append(i)
            j = max(0, j - votes[i - 1])  # clamp at 0, mirroring the recurrence

    return dp[num_states, num_votes_to_win], states
```

```python
votes, population = ([1, 2, 3, 4], [6, 7, 8, 5])
min_population, states = q2(votes, population)
print("Minimum population:", min_population)
print("States:", states)  # Should give states that add up to at least 6 votes
```

```
Minimum population: 12.0
States: [4, 2]
```


## 3. Dynamic Programming: merging two sequences

Let $X = \{x_1,x_2,...,x_m\}$ and $Y = \{y_1,y_2,...,y_n\}$ be two genomic sequences represented by strings of letters, and let $C[i, j]$ be a cost function defined on pairs of letters, one from $X$ and one from $Y$.
Our task is to merge these two sequences to create a new genomic sequence $Z$, and we need to find the cheapest merge of $X$ and $Y$ while maintaining the order of letters from both $X$ and $Y$. For instance, if $X = \{a,b,a,c\}$ and $Y = \{d,a,e,b\}$, then $Z = \{a,d,a,b,a,e,c,b\}$ and $Z = \{a,d,a,e,b,b,a,c\}$ are valid merges, but $Z = \{a, b, a, e, c, d, a, b\}$ is not, because $e$ from the second sequence is used before $d$ and $a$.
The total cost of the merge is the sum of the merging costs of adjacent letters that come from different sequences. So if $Z$ includes $x_i$ and $x_{i+1}$ as consecutive letters, there is no cost, but if $Z$ has $x_s y_t$ as consecutive letters, then there is a cost $C[s,t]$ that is added to the total cost of the merge. Note: You can assume that the cost function is symmetric.
Please design a dynamic programming algorithm to find the minimum cost of merging $X$ and $Y$. Please define the subproblem(s) and give the recurrence relations. Analyze the time and space complexity of your algorithm. A backtracing step and pseudocode are not required.



<div style="color:blue">

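Knowing only how many letters of each sequence have been merged is not enough, because whether the next letter incurs a cost depends on which sequence supplied the previous letter. We therefore define two subproblems:

- $dpX[i][j]$: the minimum cost of merging the first $i$ letters of $X$ and the first $j$ letters of $Y$ such that the merged sequence ends with $x_i$.
- $dpY[i][j]$: the minimum cost of merging the first $i$ letters of $X$ and the first $j$ letters of $Y$ such that the merged sequence ends with $y_j$.

### Recurrence Relation

The recurrence is based on the last two letters of the merged sequence. The last letter comes either from $X$ or from $Y$, and in each case we select the predecessor that minimizes the cost.

1. If the last letter is $x_i$, the letter before it is either $x_{i-1}$ (same sequence, no cost) or $y_j$ (different sequence, cost $C[i, j]$).
2. If the last letter is $y_j$, the letter before it is either $y_{j-1}$ (same sequence, no cost) or $x_i$ (different sequence, cost $C[i, j]$).

Thus, the recurrence relations are:

$dpX[i][j] = \min \{ dpX[i-1][j],\ dpY[i-1][j] + C[i, j] \}$

$dpY[i][j] = \min \{ dpY[i][j-1],\ dpX[i][j-1] + C[i, j] \}$

The minimum cost of merging $X$ and $Y$ is $\min \{ dpX[m][n],\ dpY[m][n] \}$.

### Base Case

The base cases are those where one of the sequences is empty, i.e., $i = 0$ or $j = 0$. The cost is then 0 since all letters come from the same sequence: $dpX[i][0] = 0$ for $i \geq 1$ and $dpY[0][j] = 0$ for $j \geq 1$. The impossible states $dpX[0][j]$ (ending with a letter of $X$ when none is used) and $dpY[i][0]$ are set to $\infty$.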

### Time Complexity

The time complexity of this algorithm is $O(m \times n)$ because we compute the cost for each pair of indices $(i, j)$, where $i$ ranges from $1$ to $m$ and $j$ ranges from $1$ to $n$.

### Space Complexity

The space complexity is also $O(m \times n)$ as we need to store the cost for each pair of indices in the dynamic programming table.

### Note

This algorithm assumes that the cost function $C$ is given and can be accessed in constant time. The costs are also assumed to be symmetric, as mentioned in the problem statement. The actual implementation would involve initializing and populating the dynamic programming table according to the recurrence relation and base cases outlined above.
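
One way to realize such a table-filling implementation is sketched below. It keeps one table per choice of final letter (`end_x` and `end_y`), so that the cost of appending a letter can depend on which sequence supplied the previous one. The function name, the table names, and the 1-indexed layout of `cost` are assumptions of this sketch.

```python
import math


def min_merge_cost(m, n, cost):
    """Minimum merge cost; cost[i][j] is the cost of x_i next to y_j (1-indexed)."""
    if m == 0 or n == 0:
        return 0  # letters from a single sequence never incur a cost
    INF = math.inf
    # end_x[i][j]: cheapest merge of x_1..x_i and y_1..y_j ending with x_i
    # end_y[i][j]: cheapest merge of x_1..x_i and y_1..y_j ending with y_j
    end_x = [[INF] * (n + 1) for _ in range(m + 1)]
    end_y = [[INF] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        end_x[i][0] = 0
    for j in range(1, n + 1):
        end_y[0][j] = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # x_i is last: it follows x_{i-1} (free) or y_j (pay cost[i][j])
            end_x[i][j] = min(end_x[i - 1][j], end_y[i - 1][j] + cost[i][j])
            # y_j is last: it follows y_{j-1} (free) or x_i (pay cost[i][j])
            end_y[i][j] = min(end_y[i][j - 1], end_x[i][j - 1] + cost[i][j])
    return min(end_x[m][n], end_y[m][n])


# Hypothetical 2 x 2 instance; row 0 and column 0 are unused padding
cost = [[0, 0, 0],
        [0, 5, 1],
        [0, 2, 7]]
print(min_merge_cost(2, 2, cost))  # 1
```

In this instance the cheapest merge is $y_1 y_2 x_1 x_2$, whose only cross-sequence adjacency is $y_2 x_1$ with cost 1.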

</div>

## 4. NP-complete

The Degree-Constrained-ST problem is defined as follows: given an undirected, unweighted graph $G=(V, E)$, does there exist a spanning tree of this graph where each node in the spanning tree has a degree of at most $k$ ?

Prove that this problem is NP-Complete using the statement that the HamPath problem is NP-Complete.

In the HamPath problem, we are given a connected graph $G$, and are asked whether it contains a simple path that visits all vertices of the graph (the path should visit each vertex exactly once).

Please remember to include all steps of the NP-completeness proof.

<div style="color:blue">

To prove that the Degree-Constrained Spanning Tree (DC-ST) problem is NP-Complete, we will follow the standard approach to proving NP-completeness: first, we show that the problem is in NP, and then we demonstrate that a known NP-Complete problem can be reduced to it in polynomial time. We will use the Hamiltonian Path (HamPath) problem for the reduction, as it is a well-known NP-Complete problem.




### 1. DC-ST is in NP

A problem is in NP if a proposed solution (certificate) can be verified in polynomial time. For the DC-ST problem, the certificate is a set of edges $T$. We check that every edge of $T$ belongs to $E$, that $T$ has exactly $|V| - 1$ edges and contains no cycle (so it is a spanning tree of $G$), and that each node has a degree of at most $k$ in $T$. These checks take time polynomial in $|V| + |E|$.
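
A polynomial-time verifier along these lines might look like the sketch below. It assumes vertices are numbered $0, \ldots, n-1$ with $n \geq 1$ and that both the graph and the certificate are given as edge lists; the name `verify_dcst` is hypothetical.

```python
def verify_dcst(n, edges, tree_edges, k):
    """Return True if tree_edges is a spanning tree of the graph with max degree <= k."""
    edge_set = {frozenset(e) for e in edges}
    # A spanning tree on n vertices has exactly n - 1 edges, all taken from E
    if len(tree_edges) != n - 1:
        return False
    if any(frozenset(e) not in edge_set for e in tree_edges):
        return False

    parent = list(range(n))  # union-find forest used for cycle detection

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    degree = [0] * n
    for u, v in tree_edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge would close a cycle
        parent[ru] = rv
        degree[u] += 1
        degree[v] += 1
    # n - 1 acyclic edges on n vertices form a spanning tree
    return max(degree) <= k


graph = [(0, 1), (1, 2), (2, 3), (1, 3)]
print(verify_dcst(4, graph, [(0, 1), (1, 2), (2, 3)], k=2))  # True
print(verify_dcst(4, graph, [(0, 1), (1, 2), (1, 3)], k=2))  # False
```

The second certificate is a spanning tree, but vertex 1 has degree 3, so it is rejected for $k = 2$.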

### 2. Reduction from HamPath to DC-ST

To show that DC-ST is NP-Complete, we need to reduce an NP-Complete problem to it. We will use the HamPath problem for this purpose. The goal is to demonstrate that if we can solve DC-ST in polynomial time, then we can also solve HamPath in polynomial time.

#### Reduction Algorithm

Given an instance of HamPath, an undirected, connected graph $G = (V, E)$, we construct an instance of DC-ST as follows:

1. **Construct a Graph $G'$ for DC-ST**: We use the same graph $G$ as $G'$ for the DC-ST problem. So, $G' = G$.

2. **Set the Degree Constraint $k$**: We set $k = 2$ for the DC-ST problem. This is because in a Hamiltonian Path, every internal vertex has exactly two neighbors on the path and the two endpoints have exactly one.

The construction only copies $G$ and fixes a constant, so it runs in polynomial time.

#### Proving the Reduction

- **If HamPath has a solution, then DC-ST has a solution**: If there exists a Hamiltonian Path in $G$, then this path visits each vertex exactly once. It is connected, acyclic, and contains every vertex, so it is a spanning tree of $G$. Its internal nodes have degree 2 and its two endpoints have degree 1, so every node has a degree of at most 2. Therefore, this path is also a solution to the DC-ST problem with $k = 2$.

- **If DC-ST has a solution, then HamPath has a solution**: If there exists a spanning tree of $G$ where each node has a degree of at most 2, then this tree is a path: a tree is connected and acyclic, and a connected acyclic graph with maximum degree 2 cannot branch, so it is a single simple path. Because the tree is spanning, this path visits every vertex exactly once and is, therefore, a Hamiltonian Path.
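
The second direction can be made concrete with a short sketch that reads the Hamiltonian Path off a spanning tree of maximum degree 2 by walking from one leaf to the other. It assumes its input really is such a tree on vertices $0, \ldots, n-1$; the name `tree_to_path` is hypothetical.

```python
def tree_to_path(n, tree_edges):
    """Return the vertex order of a spanning tree whose maximum degree is 2."""
    adj = [[] for _ in range(n)]
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    # A path has an endpoint of degree 1 (degree 0 if n == 1); start there
    start = next(u for u in range(n) if len(adj[u]) <= 1)
    path, prev, cur = [start], None, start
    while True:
        # With degree <= 2 there is at most one neighbor other than prev
        nxt = [w for w in adj[cur] if w != prev]
        if not nxt:
            break
        prev, cur = cur, nxt[0]
        path.append(cur)
    return path


print(tree_to_path(4, [(2, 0), (0, 3), (3, 1)]))  # [1, 3, 0, 2]
```

Every vertex appears exactly once in the returned order, which is what the Hamiltonian Path definition requires.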

### Conclusion

Since we have shown that DC-ST is in NP and that HamPath (an NP-Complete problem) can be polynomially reduced to DC-ST, it follows that DC-ST is NP-Complete. This reduction shows that solving DC-ST is at least as hard as solving HamPath, cementing its status as an NP-Complete problem.


</div>