# Chapter 1 Dynamic Programming and Bellman Equations

## Example: 0-1 Knapsack Problem

In this example, we will solve the 0-1 knapsack problem using deterministic dynamic programming (DP).

### 0-1 Knapsack Problem

The 0-1 knapsack problem [1] can be described as follows.
We are given $n$ items, each with a weight $w_i > 0$ and a value $v_i$, $i=0,1,\ldots,n-1$, and a knapsack with a limited weight capacity $W>0$. Each item is either included once or left out (hence "0-1"). The task is to determine which items to include so that the total weight is at most $W$ and the total value is as large as possible. Assume that all weights $w_i$, $i=0,1,\ldots,n-1$, and the capacity $W$ are integers. The values $v_i\in\mathbb{R}$, $i=0,1,\ldots,n-1$, need not be integers.
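
To make the problem statement concrete, the following is a minimal brute-force sketch that enumerates all $2^n$ subsets of items and keeps the best feasible one. It is only practical for very small $n$ and serves as a reference point for the DP solution. The function name `brute_force_knapsack` is hypothetical (it is not part of this chapter's exercises), and the numbers are the sample instance used in the test cells of this chapter.

```python
from itertools import product


def brute_force_knapsack(W, weights, values):
    """Enumerate every 0-1 choice vector and return (best value, best choice)."""
    # The empty knapsack is always feasible and has total value 0.
    best_value = 0.0
    best_choice = tuple(0 for _ in weights)
    for choice in product((0, 1), repeat=len(weights)):
        total_weight = sum(w for w, a in zip(weights, choice) if a)
        if total_weight > W:
            continue  # violates the capacity constraint
        total_value = sum(v for v, a in zip(values, choice) if a)
        if total_value > best_value:
            best_value, best_choice = total_value, choice
    return best_value, best_choice


best_value, best_choice = brute_force_knapsack(3, [2, 1, 3], [8.0, 9.0, 10.0])
print(best_value, best_choice)  # 17.0 (1, 1, 0)
```

Here items $0$ and $1$ together weigh exactly $W=3$ and are worth $17$, which beats taking item $2$ alone (value $10$).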

### Problem Formulation

We can formulate the 0-1 knapsack problem as a finite-horizon deterministic DP problem as follows:

- Stage $i$: the time at which we decide on item $i$.
    
    We decide on the items one at a time. Stage $i$ is the time at which we decide whether to include item $i$, $i=0,1,\ldots,n-1$. Stage $n$ is the final stage, after all decisions have been made.
    
- State $s_i$: the remaining weight capacity of the knapsack at stage $i$.
    
    By assumption, $s_i$ is an integer between $0$ and $W$.
    
- Action $a_i$:

    - $a_i=1$: to include item $i$ in the knapsack;
    - $a_i=0$: not to include item $i$ in the knapsack.

    Note that if $w_i > s_i$, i.e., the weight of item $i$ is larger than the remaining weight capacity of the knapsack, then the only admissible action is $a_i=0$.
    
- Reward $r_i(s_i,a_i)$: 
    
    $r_i(s_i,a_i)=v_i$ if $a_i=1$ and $w_i \le s_i$;
    otherwise, $r_i(s_i,a_i)=0$.
    
- State transition function $f_i(s_i, a_i)$: 
    \begin{align*}
        s_{i+1} = f_i(s_i, a_i) = \begin{cases}
                s_i - w_i, & a_i=1, w_i \le s_i \\
                s_i, & a_i=0.
                \end{cases}
    \end{align*}
    
- Goal: maximize the total reward $\sum_{i=0}^{n-1} r_i(s_i,a_i)$, i.e., the total value in the knapsack, by finding an optimal sequence of actions $a_0,\ldots,a_{n-1}$, starting from the initial state $s_0 = W$.
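
The formulation above can be sketched as a small simulator that applies the reward $r_i$ and the transition $f_i$ along a given action sequence. This is an illustrative sketch only: the function name `rollout` and the choice to raise an error on an inadmissible action are assumptions, not part of the original formulation.

```python
def rollout(W, weights, values, actions):
    """Apply actions a_0..a_{n-1} from s_0 = W; return (states, total reward)."""
    s = W
    states = [s]          # s_0, s_1, ..., s_n
    total_reward = 0.0
    for w, v, a in zip(weights, values, actions):
        if a == 1:
            if w > s:
                # a_i = 1 is not admissible when item i does not fit.
                raise ValueError("item does not fit in the remaining capacity")
            total_reward += v   # r_i(s_i, 1) = v_i
            s -= w              # s_{i+1} = s_i - w_i
        # For a_i = 0 the reward is 0 and the state is unchanged.
        states.append(s)
    return states, total_reward


states, total = rollout(3, [2, 1, 3], [8.0, 9.0, 10.0], [1, 1, 0])
print(states, total)  # [3, 1, 0, 0] 17.0
```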

### Value Function

Let $V_i(s_i)$ denote the optimal value function for state $s_i$ at stage $i$, defined by
$$
V_i(s_i) = \max_{a_i,\ldots,a_{n-1}} \sum_{j=i}^{n-1} r_j(s_j, a_j).
$$
The optimal value function $V_i(s_i)$ can be interpreted as the largest total value we can obtain given remaining weight capacity $s_i$ and items $i,i+1,...,n-1$. Note that $V_{n}(s_{n}) = 0$ by definition. 
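
As a quick check of the definition, consider the last decision stage $i=n-1$, where only item $n-1$ remains and the sum has a single term:
\begin{align*}
    V_{n-1}(s_{n-1}) = \begin{cases}
                            \max\{v_{n-1}, 0\}, & s_{n-1} \ge w_{n-1} \\
                            0, & s_{n-1} < w_{n-1}.
                        \end{cases}
\end{align*}
For the sample instance used in the test cells of this chapter ($n=3$, $W=3$, $w_2=3$, $v_2=10$), this gives $V_2(3)=10$ and $V_2(s_2)=0$ for $s_2\in\{0,1,2\}$.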

### Bellman Equation

Next, we write down the Bellman equation for this problem.
Let $\mathcal{A}_i$ denote the set of admissible actions at stage $i$ in state $s_i$:
\begin{align*}
    \mathcal{A}_i = \begin{cases}
                        \{1, 0\} & s_i \ge w_i \\
                        \{0\} & s_i < w_i.
                    \end{cases}
\end{align*}
For $i=0,1,...,n-1$, we have
\begin{align*}
    V_i(s_i) = & \max_{a_i\in\mathcal{A}_i}\{r_i(s_i, a_i) + V_{i+1}(s_{i+1})\}\nonumber\\
             = & \begin{cases}
                    \max\{v_i + V_{i+1}(s_i - w_i), V_{i+1}(s_i)\} & s_i \ge w_i \\
                    V_{i+1}(s_i) & s_i < w_i.
                \end{cases}
\end{align*}
Starting from the terminal condition $V_n(\cdot)=0$, we can then solve for $V_i(\cdot)$ recursively, backward in $i$, using the Bellman equation.
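
The Bellman equation translates almost line by line into a memoized recursion. The sketch below is a top-down alternative to the tabular computation and is given for illustration only. The helper name `make_value_function` is hypothetical, and `functools.lru_cache` is used here simply as a convenient memoization tool.

```python
from functools import lru_cache


def make_value_function(weights, values):
    """Return a memoized function V(i, s) implementing the Bellman equation."""
    n = len(weights)

    @lru_cache(maxsize=None)
    def V(i, s):
        if i == n:
            return 0.0                     # terminal condition V_n(s) = 0
        skip = V(i + 1, s)                 # a_i = 0 is always admissible
        if weights[i] <= s:                # a_i = 1 is admissible
            return max(values[i] + V(i + 1, s - weights[i]), skip)
        return skip

    return V


V = make_value_function([2, 1, 3], [8.0, 9.0, 10.0])
print(V(0, 3))  # 17.0
print(V(1, 3))  # 10.0
```

Memoization ensures that each pair $(i, s_i)$ is evaluated at most once, so the work is bounded by the $(n+1)(W+1)$ possible pairs.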

## Reference

[1] Hans Kellerer, Ulrich Pferschy, and David Pisinger. Introduction. In Knapsack problems, pages 1–14. Springer, 2004.


## Code

### Backward Search

We will calculate the optimal value function backward using the Bellman equation, i.e., compute the values of $V_{n}(s_{n}), V_{n-1}(s_{n-1}),...,V_0(s_0)$.

For the Python function `backward_cal` in the next cell, the inputs are `n`, `W`, `weights`, `values`:

   - `n`: the number of items, i.e., $n$.
 
   - `W`: the weight limit of the knapsack, i.e., $W$.

   - `weights`: the weights of the items. It is a numpy array with size $n$. `weights[i]` represents $w_i$.

   - `values`: the values of the items. It is a numpy array with size $n$. The precision is up to 4 decimal places. `values[i]` represents $v_i$.
 
The output `value_function` is the value function $V_i(s_i)$:

   - `value_function`: a numpy array with shape `(n + 1, W + 1)`. The precision is up to 4 decimal places. `value_function[i, s_i]` represents $V_i(s_i)$.
    
    For example, `value_function[0, 1]` is the value of $V_{0}(1)$ given the inputs.


In [3]:
# Import packages. Run this cell.

import numpy as np


In [4]:
def backward_cal(n, W, weights, values):
    """
    Calculate the optimal value function $V_{i}(s_i)$ using the Bellman equation
    Args:
        n: the number of items.
        W: the weight limit of the knapsack.
        weights: the weights of the items. It is a numpy array with size $n$. weights[i] represents $w_i$.
        values: the values of the items. It is a numpy array with size $n$. values[i] represents $v_i$.
    Returns:
        value_function: a numpy array with shape (n + 1, W + 1). value_function[i, s_i] represents $V_i(s_i)$.
    """
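    # Row n is the terminal condition V_n(s) = 0. Column 0 also stays 0:
    # every weight is positive, so no item fits when the capacity is 0.
    value_function = np.zeros((n + 1, W + 1))

    # Fill the stages backward, since V_i depends only on V_{i+1}.
    for i in range(n - 1, -1, -1):
        for j in range(1, W + 1):
            # Skipping item i is always admissible.
            skip = value_function[i + 1, j]
            if weights[i] <= j:
                # Item i fits: take the better of including and skipping it.
                take = values[i] + value_function[i + 1, j - weights[i]]
                value_function[i, j] = max(take, skip)
            else:
                value_function[i, j] = skip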
    
    return value_function


In [5]:
# Sample Test, checking the output of the function backward_cal

# Sample input
n = 3
W = 3
weights = np.array([2, 1, 3])
values = np.array([8.0, 9.0, 10.0])

# Sample output
value_function = np.array([[ 0.0, 9.0, 9.0, 17.0],
                           [ 0.0, 9.0, 9.0, 10.0],
                           [ 0.0, 0.0, 0.0, 10.0],
                           [ 0.0, 0.0, 0.0, 0.0]])

# Sample test
func_out = backward_cal(n, W, weights, values)
for i in range(n + 1):
    for j in range(W + 1):
        assert round(func_out[i, j], 4) == round(value_function[i, j], 4), "The sample test failed."
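

The inner loop over the capacity can also be expressed with NumPy slicing, because $V_i(s_i)$ for all $s_i \ge w_i$ depends on $V_{i+1}$ shifted by $w_i$. The following is a sketch of that variant, not a replacement for `backward_cal`. The name `backward_cal_vectorized` is hypothetical.

```python
import numpy as np


def backward_cal_vectorized(n, W, weights, values):
    """Backward pass with the capacity loop replaced by array slicing."""
    value_function = np.zeros((n + 1, W + 1))
    for i in range(n - 1, -1, -1):
        w, v = int(weights[i]), values[i]
        # Default: skip item i, so V_i(s) = V_{i+1}(s) for every s.
        value_function[i] = value_function[i + 1]
        if w <= W:
            # For s >= w, compare skipping with v_i + V_{i+1}(s - w_i).
            value_function[i, w:] = np.maximum(
                value_function[i + 1, w:],
                v + value_function[i + 1, : W + 1 - w],
            )
    return value_function


print(backward_cal_vectorized(3, 3, np.array([2, 1, 3]), np.array([8.0, 9.0, 10.0])))
```

The guard `w <= W` matters: without it, the slice bound `W + 1 - w` would be zero or negative and the two slices would no longer line up.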


### Forward Search

Assume that we have obtained the optimal value function $V_{i}(s_{i})$ for all $i$ and $s_i$. Then we can recover an optimal sequence of actions $a_0,\ldots,a_{n-1}$ with a forward pass that starts from $s_0 = W$.
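
In terms of the Bellman equation, the forward pass picks at each stage an action that attains the maximum and then moves to the next state:
\begin{align*}
    a_i^* \in \arg\max_{a_i\in\mathcal{A}_i}\{r_i(s_i, a_i) + V_{i+1}(f_i(s_i, a_i))\}, \qquad s_{i+1} = f_i(s_i, a_i^*).
\end{align*}
For the sample instance used in the test cells ($w=(2,1,3)$, $v=(8,9,10)$, $W=3$), the pass reads:
\begin{align*}
    s_0 = 3: \quad & 8 + V_1(1) = 17 > V_1(3) = 10 && \Rightarrow a_0^* = 1,\ s_1 = 1, \\
    s_1 = 1: \quad & 9 + V_2(0) = 9 > V_2(1) = 0 && \Rightarrow a_1^* = 1,\ s_2 = 0, \\
    s_2 = 0: \quad & w_2 = 3 > s_2 && \Rightarrow a_2^* = 0,\ s_3 = 0.
\end{align*}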
 
The Python function `find_optimal_actions` in the next cell has the following inputs and output.

Inputs:
   - `n`: the number of items, i.e., $n$.
 
   - `W`: the weight limit of the knapsack, i.e., $W$.
   
   - `weights`: the weights of the items. It is a numpy array with size $n$. `weights[i]` represents $w_i$.

   - `values`: the values of the items. It is a numpy array with size $n$. The precision is up to 4 decimal places. `values[i]` represents $v_i$.

   - `value_function`: the optimal value function $V_{i}(s_{i})$. It is a numpy array with shape `(n + 1, W + 1)`. The precision is up to 4 decimal places. `value_function[i, s_i]` represents $V_i(s_i)$. For example, `value_function[0, 1]` is the value of $V_{0}(1)$.

Output:
   - `opt_actions`: the optimal actions $a_0, a_1,\ldots,a_{n-1}$. It is a numpy array with size $n$. `opt_actions[i]` represents $a_i$. For example, `opt_actions[0]=1` means that item $0$ is included in the knapsack.


In [6]:
def find_optimal_actions(n, W, weights, values, value_function):
    """
    Find the optimal actions $a_0,...,a_{n-1}$ using the optimal value function.
    Args:
        n: the number of items, i.e., $n$.
        W: the weight limit of the knapsack, i.e., $W$.
        weights: the weights of the items. It is a numpy array with size $n$. weights[i] represents $w_i$.
        values: the values of the items. It is a numpy array with size $n$. values[i] represents $v_i$.
        value_function: optimal value function, a numpy array with shape (n + 1, W + 1). value_function[i, s_i] represents $V_i(s_i)$.
    Returns:
        opt_actions: the optimal actions, a numpy array with size n. opt_actions[i] represents $a_i$.
    """
    opt_actions = np.zeros((n,), dtype=int)

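    # s tracks the remaining capacity s_i along the optimal trajectory.
    s = W
    for i in range(n):
        # Item i is included only if it fits and including it is strictly
        # better than skipping it; ties are broken in favor of skipping.
        # Otherwise opt_actions[i] keeps its initial value 0.
        if weights[i] <= s:
            take = values[i] + value_function[i + 1, s - weights[i]]
            if take > value_function[i + 1, s]:
                opt_actions[i] = 1
                s = s - weights[i]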
    
    return opt_actions


In [7]:
# Sample Test, checking the output of the function find_optimal_actions

# Sample input
n = 3
W = 3
weights = np.array([2, 1, 3])
values = np.array([8.0, 9.0, 10.0])
value_function = np.array([[ 0.0, 9.0, 9.0, 17.0],
                           [ 0.0, 9.0, 9.0, 10.0],
                           [ 0.0, 0.0, 0.0, 10.0],
                           [ 0.0, 0.0, 0.0, 0.0]])

# Sample output
opt_actions = np.array([1, 1, 0])

# Sample test
func_out = find_optimal_actions(n, W, weights, values, value_function)
for i in range(n):
    assert round(func_out[i]) == round(opt_actions[i]), "The sample test failed."
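

A recovered action sequence can be sanity-checked against the value function: it should respect the capacity, and its total value should equal $V_0(W)$. The sketch below illustrates such a check on the sample instance. The function name `check_solution` is hypothetical, and the use of `np.isclose` for the floating-point comparison is an assumption about the tolerance that is appropriate.

```python
import numpy as np


def check_solution(W, weights, values, opt_actions, value_function):
    """Return (ok, total_weight, total_value) for a candidate action sequence."""
    opt_actions = np.asarray(opt_actions)
    total_weight = int(np.dot(weights, opt_actions))
    total_value = float(np.dot(values, opt_actions))
    feasible = total_weight <= W
    # An optimal sequence attains V_0(W), the top-right entry of the table.
    optimal = bool(np.isclose(total_value, value_function[0, W]))
    return feasible and optimal, total_weight, total_value


weights = np.array([2, 1, 3])
values = np.array([8.0, 9.0, 10.0])
value_function = np.array([[0.0, 9.0, 9.0, 17.0],
                           [0.0, 9.0, 9.0, 10.0],
                           [0.0, 0.0, 0.0, 10.0],
                           [0.0, 0.0, 0.0, 0.0]])
print(check_solution(3, weights, values, [1, 1, 0], value_function))  # (True, 3, 17.0)
```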
