# Chapter 1 Dynamic Programming and Bellman Equations

## Example: 0-1 Knapsack Problem

In this example, we will solve the 0-1 knapsack problem using deterministic dynamic programming (DP).

### 0-1 Knapsack Problem

The 0-1 knapsack problem [1] can be described as follows.
We are given $n$ items, where item $k$ has weight $w_k > 0$ and value $v_k$ for $k=0,1,...,n-1$, and a knapsack with weight capacity $W>0$. The task is to decide which items to include so that the total weight does not exceed $W$ and the total value is as large as possible. We assume that the weights $w_k$, $k=0,1,...,n-1$, and the capacity $W$ are integers. The values $v_k\in\mathbb{R}$, $k=0,1,...,n-1$, need not be integers.

### Problem Formulation

We can formulate the 0-1 knapsack problem as a finite-horizon deterministic DP problem as follows:

- Stage $k$: the time at which we decide on item $k$.
    
    We decide whether to include the items one at a time. Stage $k$ is the time at which we decide whether to include item $k$, for $k=0,1,...,n-1$. Stage $n$ is the final stage, after all decisions have been made.
    
- State $x_k$: the remaining weight capacity of the knapsack at stage $k$.
    
    Since the weights and $W$ are integers, $x_k$ is an integer between $0$ and $W$. The initial state is $x_0 = W$.
    
- Action $u_k$:

    - $u_k=1$: to include item $k$ in the knapsack;
    - $u_k=0$: not to include item $k$ in the knapsack.

    Note that if $w_k > x_k$, i.e., the weight of item $k$ exceeds the remaining weight capacity of the knapsack, then the only admissible action is $u_k=0$.
    
- Reward $r_k(x_k,u_k)$: 
    
    $r_k(x_k,u_k)=v_k$ if $u_k=1$ and $w_k \le x_k$;
    otherwise, $r_k(x_k,u_k)=0$.
    
- State transition function $f_k(x_k, u_k)$: 
    \begin{align*}
        x_{k+1} = f_k(x_k, u_k) = \begin{cases}
                x_k - w_k, & u_k=1, w_k \le x_k \\
                x_k, & u_k=0.
                \end{cases}
    \end{align*}
    
- Goal: maximize the total reward $\sum_{k=0}^{n-1} r_k(x_k,u_k)$, i.e., the total value in the knapsack, by finding the optimal sequence of actions $u_0,\ldots,u_{n-1}$.
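
The formulation above can be sketched in a few lines of Python. The helper names `reward`, `transition`, and `rollout` are hypothetical and introduced only for illustration. This sketch assumes that an infeasible "include" action should raise an error, which is one possible way to enforce $u_k=0$ when $w_k > x_k$.

```python
def reward(k, x, u, weights, values):
    """r_k(x_k, u_k): v_k if item k is included and fits, otherwise 0."""
    return values[k] if u == 1 and weights[k] <= x else 0.0


def transition(k, x, u, weights):
    """f_k(x_k, u_k): remaining capacity after acting on item k."""
    if u == 1:
        if weights[k] > x:
            # Assumption of this sketch: reject inadmissible actions loudly.
            raise ValueError(f"item {k} does not fit: w={weights[k]} > x={x}")
        return x - weights[k]
    return x


def rollout(W, weights, values, actions):
    """Apply a full action sequence from x_0 = W; return (total reward, x_n)."""
    x, total = W, 0.0
    for k, u in enumerate(actions):
        total += reward(k, x, u, weights, values)
        x = transition(k, x, u, weights)
    return total, x


# Small illustrative instance with three items and capacity 3.
weights = [2, 1, 3]
values = [8.0, 9.0, 10.0]
print(rollout(3, weights, values, [1, 1, 0]))  # (17.0, 0)
```

Including items $0$ and $1$ uses the full capacity and collects a total value of $17$; any other action sequence can be scored the same way.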

### Value Function

Let $V_k(x_k)$ denote the optimal value function for state $x_k$ at stage $k$, defined by
$$
V_k(x_k) = \max_{u_k,\ldots,u_{n-1}} \sum_{j=k}^{n-1} r_j(x_j, u_j).
$$
The optimal value function $V_k(x_k)$ can be interpreted as the largest total value we can obtain given remaining weight capacity $x_k$ and items $k,k+1,...,n-1$. Note that $V_{n}(x_{n}) = 0$ for every $x_n$, since the sum is empty at stage $n$.
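
To make the definition concrete, the following sketch evaluates $V_k(x_k)$ literally, by enumerating every action sequence $u_k,\ldots,u_{n-1}$ and keeping the best feasible one. The function name `brute_force_value` is hypothetical. The approach is meant only as an illustration of the definition, since it examines $2^{n-k}$ sequences.

```python
from itertools import product


def brute_force_value(k, x, weights, values):
    """V_k(x) by enumerating every action sequence u_k, ..., u_{n-1}."""
    n = len(weights)
    best = 0.0  # the all-zero sequence is always feasible and earns 0
    for actions in product((0, 1), repeat=n - k):
        remaining, total, feasible = x, 0.0, True
        for j, u in zip(range(k, n), actions):
            if u == 1:
                if weights[j] > remaining:
                    feasible = False  # item j does not fit; discard sequence
                    break
                remaining -= weights[j]
                total += values[j]
        if feasible:
            best = max(best, total)
    return best


weights = [2, 1, 3]
values = [8.0, 9.0, 10.0]
print(brute_force_value(0, 3, weights, values))  # 17.0
print(brute_force_value(3, 3, weights, values))  # 0.0, i.e. V_n(x_n) = 0
```

The exponential cost of this enumeration is what a recursive characterization of $V_k$ is meant to avoid.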

### Bellman Equation

Next, we can write down the Bellman equation for this problem.
Let
\begin{align*}
    \mathcal{U}_k(x_k) = \begin{cases}
                        \{1, 0\} & x_k \ge w_k \\
                        \{0\} & x_k < w_k.
                    \end{cases}
\end{align*}
For $k=0,1,...,n-1$, we have
\begin{align*}
    V_k(x_k) = & \max_{u_k\in\mathcal{U}_k(x_k)}\{r_k(x_k, u_k) + V_{k+1}(x_{k+1})\}\nonumber\\
             = & \begin{cases}
                    \max\{v_k + V_{k+1}(x_k - w_k), V_{k+1}(x_k)\} & x_k \ge w_k \\
                    V_{k+1}(x_k) & x_k < w_k.
                \end{cases}
\end{align*}
Starting from the terminal condition $V_n(\cdot)=0$, we can then solve for $V_{n-1}(\cdot),\ldots,V_0(\cdot)$ recursively using the Bellman equation.
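
As an illustration, the Bellman equation can be evaluated directly as a recursive function. The sketch below is a top-down variant with memoization via `functools.lru_cache`. The name `solve_top_down` is hypothetical, and this is only one possible way to organize the computation.

```python
from functools import lru_cache


def solve_top_down(W, weights, values):
    """Return V_0(W) by evaluating the Bellman equation recursively."""
    n = len(weights)

    @lru_cache(maxsize=None)
    def V(k, x):
        if k == n:  # terminal condition V_n(x) = 0
            return 0.0
        skip = V(k + 1, x)  # value of u_k = 0
        if weights[k] > x:  # only u_k = 0 is admissible
            return skip
        take = values[k] + V(k + 1, x - weights[k])  # value of u_k = 1
        return max(take, skip)

    return V(0, W)


print(solve_top_down(3, [2, 1, 3], [8.0, 9.0, 10.0]))  # 17.0
```

Memoization ensures that each pair $(k, x_k)$ is evaluated at most once, so at most $(n+1)(W+1)$ values are computed. The recursion depth grows with $n$, which can matter for large instances in Python.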

## Reference

[1] Hans Kellerer, Ulrich Pferschy, and David Pisinger. Introduction. In Knapsack problems, pages 1–14. Springer, 2004.


## Code

### Backward Search

We compute the optimal value function backward using the Bellman equation, i.e., we compute $V_{n}(\cdot), V_{n-1}(\cdot),...,V_0(\cdot)$ in that order.
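
As a worked illustration of the backward order, take the instance used in the sample test for `backward_cal`: $n=3$, $W=3$, $(w_0,w_1,w_2)=(2,1,3)$, and $(v_0,v_1,v_2)=(8,9,10)$. Starting from $V_3(x)=0$ for all $x$, the Bellman equation gives
\begin{align*}
    V_2(3) &= \max\{10 + V_3(0),\, V_3(3)\} = 10, & V_2(x) &= V_3(x) = 0 \text{ for } x < 3, \\
    V_1(1) &= \max\{9 + V_2(0),\, V_2(1)\} = 9, & V_1(2) &= \max\{9 + V_2(1),\, V_2(2)\} = 9, \\
    V_1(3) &= \max\{9 + V_2(2),\, V_2(3)\} = 10, & V_0(1) &= V_1(1) = 9, \\
    V_0(2) &= \max\{8 + V_1(0),\, V_1(2)\} = 9, & V_0(3) &= \max\{8 + V_1(1),\, V_1(3)\} = 17.
\end{align*}
Every entry with $x=0$ equals $0$ because no item fits when the remaining capacity is zero.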

For the Python function `backward_cal` in the next cell, the inputs are `n`, `W`, `weights`, `values`:

   - `n`: the number of items, i.e., $n$.
 
   - `W`: the weight limit of the knapsack, i.e., $W$.

   - `weights`: the weights of the items. It is a numpy array with size $n$. `weights[k]` represents $w_k$.

   - `values`: the values of the items. It is a numpy array with size $n$. The precision is up to 4 decimal places. `values[k]` represents $v_k$.
 
The output `value_function` is the value function $V_k(x_k)$:

   - `value_function`: a numpy array with shape `(n + 1, W + 1)`. The precision is up to 4 decimal places. `value_function[k, x_k]` represents $V_k(x_k)$.
    
    For example, `value_function[0, 1]` is the value of $V_{0}(1)$ given the inputs.


In [None]:
# Import packages. Run this cell.

import numpy as np


In [None]:
def backward_cal(n, W, weights, values):
    """
    Calculate the optimal value function $V_{k}(x_k)$ using the Bellman equation.
    Args:
        n: the number of items.
        W: the weight limit of the knapsack.
        weights: the weights of the items. It is a numpy array with size $n$. weights[k] represents $w_k$.
        values: the values of the items. It is a numpy array with size $n$. values[k] represents $v_k$.
    Returns:
        value_function: a numpy array with shape (n + 1, W + 1). value_function[k, x_k] represents $V_k(x_k)$.
    """
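    # Row n stays zero: V_n(x) = 0 for every x (terminal condition).
    value_function = np.zeros((n + 1, W + 1))

    # Sweep backward over stages k = n-1, ..., 0.
    for k in range(n - 1, -1, -1):
        for x in range(W + 1):
            skip = value_function[k + 1, x]  # value of u_k = 0
            if weights[k] <= x:
                # Item k fits: take the better of including and skipping it.
                take = values[k] + value_function[k + 1, x - weights[k]]
                value_function[k, x] = max(take, skip)
            else:
                # Item k does not fit: only u_k = 0 is admissible.
                value_function[k, x] = skip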
    
    return value_function


In [None]:
# Sample Test, checking the output of the function backward_cal

# Sample input
n = 3
W = 3
weights = np.array([2, 1, 3])
values = np.array([8.0, 9.0, 10.0])

# Sample output
value_function = np.array([[ 0.0, 9.0, 9.0, 17.0],
                           [ 0.0, 9.0, 9.0, 10.0],
                           [ 0.0, 0.0, 0.0, 10.0],
                           [ 0.0, 0.0, 0.0, 0.0]])

# Sample test
func_out = backward_cal(n, W, weights, values)
for k in range(n + 1):
    for x in range(W + 1):
        assert round(func_out[k, x], 4) == round(value_function[k, x], 4), "The sample test failed."


### Find the Optimal Sequence of Actions

Assume that we have obtained the optimal value function $V_{k}(x_{k})$ for all $k$ and $x_k$. Then we can find the optimal sequence of actions $u_0,...,u_{n-1}$ by a forward pass that starts from $x_0 = W$.
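
One way to state the forward rule, consistent with the Bellman equation, is
\begin{align*}
    u_k = \begin{cases}
            1, & w_k \le x_k \text{ and } v_k + V_{k+1}(x_k - w_k) > V_{k+1}(x_k) \\
            0, & \text{otherwise,}
          \end{cases}
    \qquad x_{k+1} = x_k - u_k w_k.
\end{align*}
Ties are broken in favor of $u_k=0$ here, which is a convention rather than a requirement. For the instance $W=3$, $(w_0,w_1,w_2)=(2,1,3)$, $(v_0,v_1,v_2)=(8,9,10)$ from the sample tests, the forward pass reads
\begin{align*}
    x_0 &= 3: & 8 + V_1(1) = 17 &> V_1(3) = 10 & &\Rightarrow u_0 = 1,\ x_1 = 1, \\
    x_1 &= 1: & 9 + V_2(0) = 9 &> V_2(1) = 0 & &\Rightarrow u_1 = 1,\ x_2 = 0, \\
    x_2 &= 0: & w_2 = 3 &> x_2 & &\Rightarrow u_2 = 0,\ x_3 = 0.
\end{align*}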
 
The inputs and output of the Python function `find_optimal_actions` in the next cell are as follows.

Inputs:
   - `n`: the number of items, i.e., $n$.
 
   - `W`: the weight limit of the knapsack, i.e., $W$.
   
   - `weights`: the weights of the items. It is a numpy array with size $n$. `weights[k]` represents $w_k$.

   - `values`: the values of the items. It is a numpy array with size $n$. The precision is up to 4 decimal places. `values[k]` represents $v_k$.

   - `value_function`: the optimal value function $V_{k}(x_{k})$. It is a numpy array with shape `(n + 1, W + 1)`. The precision is up to 4 decimal places. `value_function[k, x_k]` represents $V_k(x_k)$. For example, `value_function[0, 1]` is the value of $V_{0}(1)$.

Output:
   - `opt_actions`: the optimal actions $u_0, u_1,...,u_{n-1}$. It is a numpy array with size $n$. `opt_actions[k]` represents $u_k$. For example, `opt_actions[0]=1` means that item $0$ is included in the knapsack.


In [None]:
def find_optimal_actions(n, W, weights, values, value_function):
    """
    Find the optimal actions $u_0,...,u_{n-1}$ using the optimal value function.
    Args:
        n: the number of items, i.e., $n$.
        W: the weight limit of the knapsack, i.e., $W$.
        weights: the weights of the items. It is a numpy array with size $n$. weights[k] represents $w_k$.
        values: the values of the items. It is a numpy array with size $n$. values[k] represents $v_k$.
        value_function: optimal value function, a numpy array with shape (n + 1, W + 1). value_function[k, x_k] represents $V_k(x_k)$.
    Returns:
        opt_actions: the optimal actions, a numpy array with size n. opt_actions[k] represents $u_k$.
    """
    opt_actions = np.zeros((n,), dtype=int)

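    # Forward pass: start with the full capacity x_0 = W.
    x = W
    for k in range(n):
        if weights[k] > x:
            # Item k does not fit, so u_k = 0 is the only admissible action.
            opt_actions[k] = 0
        else:
            # Compare the two branches of the Bellman equation; ties favor u_k = 0.
            if value_function[k + 1, x] >= value_function[k + 1, x - weights[k]] + values[k]:
                opt_actions[k] = 0
            else:
                opt_actions[k] = 1
                x = x - weights[k]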
    
    return opt_actions


In [None]:
# Sample Test, checking the output of the function find_optimal_actions

# Sample input
n = 3
W = 3
weights = np.array([2, 1, 3])
values = np.array([8.0, 9.0, 10.0])
value_function = np.array([[ 0.0, 9.0, 9.0, 17.0],
                           [ 0.0, 9.0, 9.0, 10.0],
                           [ 0.0, 0.0, 0.0, 10.0],
                           [ 0.0, 0.0, 0.0, 0.0]])

# Sample output
opt_actions = np.array([1, 1, 0])

# Sample test
func_out = find_optimal_actions(n, W, weights, values, value_function)
for k in range(n):
    assert round(func_out[k]) == round(opt_actions[k]), "The sample test failed."
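
Because several action sequences can attain the same total value, comparing against one fixed sequence can reject a valid answer on other instances. A complementary check, sketched below, verifies that a sequence fits in the knapsack and attains $V_0(W)$. The helper name `check_actions` and the tolerance `tol` are assumptions made for this illustration.

```python
def check_actions(W, weights, values, actions, optimal_value, tol=1e-4):
    """Return True if `actions` is feasible and earns `optimal_value`."""
    total_weight = sum(int(u) * int(w) for u, w in zip(actions, weights))
    total_value = sum(int(u) * float(v) for u, v in zip(actions, values))
    return total_weight <= W and abs(total_value - optimal_value) <= tol


weights = [2, 1, 3]
values = [8.0, 9.0, 10.0]
print(check_actions(3, weights, values, [1, 1, 0], optimal_value=17.0))  # True
print(check_actions(3, weights, values, [0, 0, 1], optimal_value=17.0))  # False
```

The second call fails because taking only item $2$ is feasible but earns $10$, which is less than $V_0(3)=17$.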
