# Dynamic Programming: Theory and Tools

This lecture draws on material from:

* [QuantEcon lectures](https://quantecon.org/lectures/)
* [Dynamic Economics by Jerome Adda and Russell Cooper](https://mitpress.mit.edu/books/dynamic-economics)
* [Economic Dynamics: Theory and Computation by John Stachurski](https://mitpress.mit.edu/books/economic-dynamics)

We've previously referenced the first, but the two books mentioned here are also of exceptional quality.


In [None]:
import matplotlib.pyplot as plt
import numpy as np
import scipy.interpolate as interp
import scipy.stats as st
import quantecon as qe

from ipywidgets import interact, IntSlider

%matplotlib inline

## What have we learned so far?

### Two applications

1. Shortest path: How do we find the shortest path connecting two pre-specified nodes on a (directed) graph?
2. Cake eating: How should an individual divide consumption of a cake over an infinite horizon?

The solutions to these two problems consisted of two functions:

1. Policy function: This is a rule that specifies what action to take given the "state"
  - In the shortest path problem, the policy function specifies which edge to follow (to move to the next node) given the node that you're currently at.
  - In the cake eating problem, the policy function specifies how much of the cake I should eat today given how much cake I have to begin the day.
2. Value function: This specifies the value (or cost) of following a particular policy function
  - In the shortest path problem, the value function tells us how much it will "cost" to follow the policy that gets us to the terminal node.
  - In the cake eating problem, the value function specifies the total discounted utility, $\sum_t \beta^t u(c_t)$, of having a particular amount of cake and following the given policy function going forward.

### Two algorithms

**Value function iteration**

Value function iteration focuses on finding the optimal value function.

We described (and used!) value function iteration in the context of both problems.

The algorithm is structured as follows:

1. Make a guess at the value function
2. Update the value function according to the Bellman equation
3. Check for convergence:
  - If the updated value function is "close enough" to the current value function, use that as the optimal value function
  - Otherwise, use the updated value function as the input to step 2
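
The three steps above can be sketched as a short loop. This is a minimal illustration rather than the implementation used in earlier lectures: `value_function_iteration` and `bellman_update` are hypothetical names, the value function is assumed to be stored as an array of values on a grid, and the toy update $T(v) = r + \beta v$ stands in for a real Bellman equation because its fixed point, $r / (1 - \beta)$, is known in closed form.

```python
import numpy as np

def value_function_iteration(bellman_update, v_init, tol=1e-8, max_iter=10_000):
    """Iterate v <- bellman_update(v) until successive iterates are close.

    `bellman_update` is a hypothetical callable that maps an array of values
    on a grid to an updated array of the same shape.
    """
    v = np.asarray(v_init, dtype=float)          # step 1: initial guess
    for n_iter in range(1, max_iter + 1):
        v_new = bellman_update(v)                # step 2: apply the update
        distance = np.max(np.abs(v_new - v))     # step 3: sup-norm distance
        v = v_new
        if distance < tol:
            return v, n_iter
    raise RuntimeError("value function iteration did not converge")

# Toy stand-in for a Bellman update: T(v) = r + beta * v,
# whose fixed point is r / (1 - beta)
beta = 0.9
r = np.array([1.0, 2.0, 3.0])
v_star, n_iter = value_function_iteration(lambda v: r + beta * v, np.zeros(3))

print(f"Converged in {n_iter} iterations to {v_star}")
print(f"Closed-form fixed point: {r / (1 - beta)}")
```

The stopping rule compares successive iterates, not the distance to the true fixed point, so the final error can be somewhat larger than `tol`.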

**Policy function iteration**

Policy function iteration focuses on finding the optimal policy function.

We described (and used!) policy function iteration in the context of the cake eating problem.

The algorithm is structured as follows:

1. Make a guess at the policy function
2. Find the value function associated with the specified policy function (by iterating on the value function).
3. Find the optimal policy function for the value function from step 2.
4. Check for convergence:
  - If the updated policy function is "close enough" to the current policy function, use that as the optimal policy function (and the value function from step 2 as the optimal value function)
  - Otherwise, use the updated policy function as the input to step 2
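
A minimal sketch of these steps on a small version of the cake eating problem is below. The setup is an assumption made for illustration: the cake comes in whole slices (sizes 0 through 5), utility is $u(c) = \sqrt{c}$, and $\beta = 0.9$. The function names are hypothetical.

```python
import numpy as np

beta = 0.9
n_states = 6                       # cake sizes 0, 1, ..., 5 slices (assumed)
states = np.arange(n_states)

def evaluate_policy(policy, tol=1e-12):
    """Step 2: iterate V(s) = u(c(s)) + beta * V(s - c(s)) for a fixed policy."""
    v = np.zeros(n_states)
    while True:
        v_new = np.sqrt(policy) + beta * v[states - policy]
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new

def improve_policy(v):
    """Step 3: pick the consumption that maximizes u(c) + beta * V(s - c)."""
    policy = np.zeros(n_states, dtype=int)
    for s in states:
        c = np.arange(s + 1)       # feasible consumption: 0, ..., s
        policy[s] = np.argmax(np.sqrt(c) + beta * v[s - c])
    return policy

policy = states.copy()             # step 1: guess "eat the whole cake today"
for _ in range(100):               # cap on the number of policy updates
    v = evaluate_policy(policy)
    new_policy = improve_policy(v)
    if np.array_equal(new_policy, policy):   # step 4: policy stopped changing
        break
    policy = new_policy

print("Policy (slices eaten at each cake size):", policy)
print("Value function:", v)
```

Because the state and action sets are finite here, "close enough" in step 4 can be replaced by an exact equality check on the policy.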

### New tools

**Interpolation**

Value function iteration and policy function iteration both occur in function space because we are iterating on functions.

Function space is a complicated thing to represent in a computer, so we instead approximate the functions using interpolation on a grid of points. When we evaluate convergence, rather than compare $||V_j - V_{j+1}||_p$ we compare the function values on the grid points $x_0, x_1, \dots, x_n$:

$$\left|\left| \begin{bmatrix} V_j(x_0) & V_j(x_1) & \dots & V_j(x_n) \end{bmatrix} - \begin{bmatrix} V_{j+1}(x_0) & V_{j+1}(x_1) & \dots & V_{j+1}(x_n) \end{bmatrix} \right|\right|_p$$

Interpolation solves the following problem:

> Given a function $f$, $n$ points $x \in X$, and function evaluations at those points, $f(x) \; \forall x \in X$, construct an approximate function $\tilde{f}$ from a pre-specified class of functions that can be used to evaluate the function at $x \notin X$.
>
> Interpolation will typically enforce $\tilde{f}(x) = f(x) \; \forall x \in X$.

We briefly discussed piecewise linear interpolation. We see an example of piecewise linear interpolation below, but defer a more complete exploration of these tools: entire courses could be taught on interpolation methods and on bounding the error of the resulting function approximations.

In [None]:
def interp_example(n):
    # Original data
    x = np.linspace(0, 2*np.pi, n)
    y = np.sin(x)
    
    fig, ax = plt.subplots()

    # Plot the original data
    ax.scatter(x, y, color="k")

    # Create and plot interpolator
    x_interp = np.linspace(0, 2*np.pi, 10*n)
    pwl = interp.interp1d(x, y)
    y_interp = pwl(x_interp)
    ax.plot(x_interp, y_interp, color="b")

    # Plot exact function
    y_exact = np.sin(x_interp)
    ax.plot(x_interp, y_exact, color="r", linewidth=1.0, linestyle="--")

    pass

interact(
    interp_example,
    n=IntSlider(min=5, max=25, step=1, value=5)
);
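
To put a number on what the plot shows, the sketch below measures the largest gap between $\sin(x)$ and its piecewise linear interpolant as the grid is refined. It uses `np.interp` rather than `interp.interp1d` (both interpolate linearly between grid points), and `max_interp_error` is a hypothetical helper. The comparison column is the standard bound for linear interpolation, $h^2 \max|f''| / 8$, where $h$ is the grid spacing and $\max|f''| = 1$ for $\sin$.

```python
import numpy as np

def max_interp_error(n, n_eval=2001):
    """Largest error of piecewise linear interpolation of sin on n grid points."""
    x = np.linspace(0, 2 * np.pi, n)
    x_eval = np.linspace(0, 2 * np.pi, n_eval)
    approx = np.interp(x_eval, x, np.sin(x))
    return np.max(np.abs(approx - np.sin(x_eval)))

grid_sizes = [5, 10, 20, 40]
errors = [max_interp_error(n) for n in grid_sizes]
bounds = [(2 * np.pi / (n - 1)) ** 2 / 8 for n in grid_sizes]

for n, err, bound in zip(grid_sizes, errors, bounds):
    print(f"n = {n:3d}: max error = {err:.5f}, bound h^2/8 = {bound:.5f}")
```

Halving the grid spacing cuts the error by roughly a factor of four, which is the quadratic convergence that the bound predicts.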

## Fixed points

Value function iteration (and policy function iteration) rely on [fixed point theory](https://en.wikipedia.org/wiki/Fixed_point_(mathematics)).

### What is a fixed point?

Let $T$ be a function that maps a given space $X$ into itself, i.e. $T: X \rightarrow X$.

A fixed point is an input, $x \in X$, such that $T(x) = x$ (or, equivalently, $T(x) - x = 0$).

**Examples**


Example 1: Let $T: \mathcal{R} \rightarrow \mathcal{R}$ be defined by $T(x) = x^2$.

Then $T(0) - 0 = 0^2 - 0 = 0$, so $x = 0$ is a fixed point of $T$ (as is $x = 1$).

Example 2: Let $\mathcal{F}$ represent the space of all functions and define $T: \mathcal{F} \rightarrow \mathcal{F}$ as $T(f) = (x + 0.1) + 0.9 f$. Let $f(x) \equiv 10x + 1$ then

\begin{align*}
  T(f) &= (x + 0.1) + 0.9f \\
  &= (x + 0.1) + 0.9(10x + 1) \\
  &= (x + 9x) + (0.1 + 0.9)  \\
  &= 10x + 1 = f \\
  &\Rightarrow T(f) = f
\end{align*}
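
The algebra in Example 2 can also be checked numerically. In the sketch below, the operator takes a Python function and returns a new Python function; the evaluation grid and the comparison function `g` are arbitrary choices made for illustration.

```python
def T(f):
    """Operator from Example 2: maps f to the function x -> (x + 0.1) + 0.9 * f(x)."""
    return lambda x: (x + 0.1) + 0.9 * f(x)

def f(x):
    return 10 * x + 1      # the candidate fixed point from Example 2

def g(x):
    return x               # an arbitrary function that is not a fixed point

grid = [-2.0, -0.5, 0.0, 1.0, 3.0]
gap_f = max(abs(T(f)(x) - f(x)) for x in grid)
gap_g = max(abs(T(g)(x) - g(x)) for x in grid)

print(f"max |T(f) - f| on the grid: {gap_f:.2e}")
print(f"max |T(g) - g| on the grid: {gap_g:.2e}")
```

Agreement on a handful of grid points is only a sanity check; the derivation above is what establishes $T(f) = f$ for every $x$.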

### How do we find fixed points?

Well... One choice, as illustrated above, is to guess and check.

We might want something slightly more robust than that though, so we turn to [contraction mappings](https://en.wikipedia.org/wiki/Contraction_mapping) and [fixed-point theorems](https://en.wikipedia.org/wiki/Fixed-point_theorem).

**Contraction mapping**

A contraction mapping (on a metric space $(M, d)$) is a function $f$ from $M$ to itself with the property that there is some real number $0 \leq k < 1$ such that for all $x$ and $y$ in $M$, $d(f(x), f(y)) \leq k \, d(x, y)$.

What does this mean in terms of value function iteration?

If we define $T$ as the Bellman operator (used in the update step in value function iteration) and let

$$d_0 = d(V_0, T(V_0)) = d(V_0, V_1)$$

and

$$d_1 = d(T(V_0), T(T(V_0))) = d(V_1, V_2)$$

then $d_1 \leq k d_0$, so $d_1 < d_0$ whenever $V_0$ is not already a fixed point. As we continue this sequence, $d(V_n, V_{n+1}) \leq k^n d_0$, so for any $\varepsilon > 0$, we can find $N$ such that $d(V_N, V_{N+1}) < \varepsilon$.

The above reasoning is what motivates the algorithm driving value function iteration (and policy function iteration).

The above is effectively a word description of the outcome of the Banach Fixed Point Theorem which states,

> Let $(M, d)$ be a non-empty complete metric space with a contraction mapping $T: M \rightarrow M$. Then $T$ admits a unique fixed-point, $x^* \in M$. Furthermore, $x^*$ can be found as follows: start with an arbitrary element $x_0 \in M$ and define a sequence $\{x_n\}$ by $x_n = T(x_{n-1})$ for $n \geq 1$ then $x_n \rightarrow x^*$.
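
The successive approximation described in the theorem can be sketched in a few lines. The example map is an assumption chosen for illustration: $T(x) = \cos(x)$ on $M = [0, 1]$ with $d(x, y) = |x - y|$, which is a contraction with modulus $k = \sin(1) \approx 0.84$ by the mean value theorem. `successive_approximation` is a hypothetical helper name.

```python
import math

def successive_approximation(T, x0, tol=1e-12, max_iter=1_000):
    """Iterate x_n = T(x_{n-1}) and record the distances d(x_{n-1}, x_n)."""
    x = x0
    distances = []
    for _ in range(max_iter):
        x_next = T(x)
        distances.append(abs(x_next - x))
        x = x_next
        if distances[-1] < tol:
            return x, distances
    raise RuntimeError("successive approximation did not converge")

x_star, distances = successive_approximation(math.cos, 0.0)

# Ratio of consecutive distances; a contraction keeps each ratio at or below k
ratios = [d1 / d0 for d0, d1 in zip(distances, distances[1:])]

print(f"Approximate fixed point: {x_star:.10f}")
print(f"Residual |cos(x*) - x*|: {abs(math.cos(x_star) - x_star):.2e}")
print(f"Largest distance ratio: {max(ratios):.4f} (k = sin(1) = {math.sin(1.0):.4f})")
```

Every ratio stays below $k$, which is the $d_1 \leq k d_0$ argument from above applied at each step.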

## General formulation of dynamic programming

**Notation**

- Let $s_t$ be the *state variables*. The state variables tell us everything that we need to know in order to understand the value in period $t$: they summarize all of the past information that we need to make a forward-looking decision.
- Let $a_t$ be the *control variables*. The control variables are what the agent can directly choose. We can potentially restrict $a_t \in \Gamma(s_t)$ where $\Gamma(s_t)$ is a non-empty compact set. We will often denote a particular policy as $\sigma(s_t) : S \rightarrow \Gamma(S)$.
- Let $r(s_t, a_t)$ be the reward function (flow value or flow utility). This notation acknowledges that we are going to focus on additively separable utility, i.e. $\sum_t \beta^t r(s_t, a_t)$.
- Let $F(s_t, a_t)$ be the transition function that expresses how the state variables move in response to the current state and control variables.
- Let $\beta$ be the discount factor with $0 < \beta < 1$. We could allow $\beta$ to be equal to 1 in cases where the horizon is finite...


We can then write the general Bellman equation as

\begin{align*}
  V(s_t) &= \max_{a_t \in \Gamma(s_t)} \left\{ r(s_t, a_t) + \beta V(s_{t+1}) \right\} \\
  &\text{where }\\
  &s_{t+1} = F(s_t, a_t) \\
\end{align*}

### Mapping the general framework to the cake eating problem

* $s_t$ was the size of the cake because that summarized all relevant information needed to make decisions going forward. Finding the state in more complex problems sometimes requires some additional thought: "Finding the state is an art"
* $a_t$ was how much cake to consume today
* $r(s_t, a_t)$ was the per-period utility function
* $\Gamma(s_t) = [0, s_t]$, written $[0, x_t]$ in the cake eating notation: the agent can't consume more cake than is available
* $F(s_t, a_t) = s_t - a_t$, i.e. $x_{t+1} = x_t - c_t$, was the transition function
* $\beta$ mapped to $\beta$...
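
The mapping above can be written almost verbatim in code. The sketch below assumes a finite set of states and actions so that the maximization is a simple search; `bellman_operator` is a hypothetical name, and the cake eating inputs (whole slices, $u(c) = \sqrt{c}$, $\beta = 0.9$) are illustrative choices.

```python
import math

def bellman_operator(v, states, Gamma, r, F, beta):
    """One Bellman update: V(s) = max over a in Gamma(s) of r(s, a) + beta * V(F(s, a))."""
    v_new, policy = {}, {}
    for s in states:
        best_a = max(Gamma(s), key=lambda a: r(s, a) + beta * v[F(s, a)])
        policy[s] = best_a
        v_new[s] = r(s, best_a) + beta * v[F(s, best_a)]
    return v_new, policy

# Cake eating, with the cake measured in whole slices (assumed discretization)
states = range(6)                   # s_t: slices of cake remaining
Gamma = lambda s: range(s + 1)      # feasible actions: eat 0, ..., s slices
r = lambda s, a: math.sqrt(a)       # per-period utility (assumed functional form)
F = lambda s, a: s - a              # transition: cake left for tomorrow
beta = 0.9

v = {s: 0.0 for s in states}        # initial guess for the value function
for _ in range(200):
    v, policy = bellman_operator(v, states, Gamma, r, F, beta)

print("Policy:", policy)
print("Value function:", {s: round(val, 4) for s, val in v.items()})
```

Only the four inputs `Gamma`, `r`, `F`, and `beta` are specific to cake eating; swapping them out gives the Bellman update for a different problem with the same structure.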


### Deterministic vs stochastic

What we've described so far only applies to environments without randomness... What happens when $\{s_t\}$ includes a random variable?

Almost exactly the same! The main difference is that we need to update the transition function and take expectations over tomorrow:

Rather than use $s_{t+1} = F(s_t, a_t)$, we use $s_{t+1} = F(s_t, a_t, w_{t+1})$ where $w_{t+1}$ is a random variable realized in period $t+1$ and so is not known in $t$.

The Bellman equation becomes

\begin{align*}
  V(s_t) &= \max_{a_t \in \Gamma(s_t)} \left\{ r(s_t, a_t) + \beta E \left[ V(s_{t+1}) \right] \right\} \\
  &\text{where }\\
  &s_{t+1} = F(s_t, a_t, w_{t+1}) \\
\end{align*}
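
When $w_{t+1}$ takes finitely many values, the expectation inside the maximization is a probability-weighted sum over tomorrow's possible states. The sketch below applies one stochastic Bellman update to a made-up variant of cake eating in which $w_{t+1} \in \{0, 1\}$ slices spoil overnight with equal probability. The shock process, the utility function, and the function names are all assumptions for illustration.

```python
import math

def expected_value(v, s, a, F, shocks, probs):
    """E[V(s')] with s' = F(s, a, w): a probability-weighted sum over shocks."""
    return sum(p * v[F(s, a, w)] for w, p in zip(shocks, probs))

def stochastic_bellman_update(v, states, Gamma, r, F, beta, shocks, probs):
    """One update of V(s) = max over a of r(s, a) + beta * E[V(F(s, a, w))]."""
    return {
        s: max(
            r(s, a) + beta * expected_value(v, s, a, F, shocks, probs)
            for a in Gamma(s)
        )
        for s in states
    }

states = range(6)
Gamma = lambda s: range(s + 1)
r = lambda s, a: math.sqrt(a)
F = lambda s, a, w: max(s - a - w, 0)    # hypothetical: w slices spoil overnight
shocks, probs = [0, 1], [0.5, 0.5]
beta = 0.9

v = {s: math.sqrt(s) for s in states}    # an arbitrary initial guess
v_new = stochastic_bellman_update(v, states, Gamma, r, F, beta, shocks, probs)

print({s: round(val, 4) for s, val in v_new.items()})
```

Compared with the deterministic update, the only change is that `v[F(s, a)]` is replaced by the call to `expected_value`.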

## Numerically computing expectations

In stochastic dynamic programming, we will need to be able to evaluate expectations of random variables.

To keep things "simple", we are going to work through the examples of evaluating expectations of $g(x_{t+1}) = \sqrt{|x_{t+1}|}$ for different random variables.

### Discrete random variables

In the case of a discrete random variable, evaluating an expectation reduces to summations and multiplications... Luckily, computers are very good at summing and multiplying.

Let

\begin{align*}
  x_{t+1} \sim \begin{cases}
    0.0 \; \text{ with probability } 0.25 \\
    2.5 \; \text{ with probability } 0.25 \\
    5.0 \; \text{ with probability } 0.25 \\
    25.0 \; \text{ with probability } 0.25 \\
  \end{cases}
\end{align*}

then $E[g(x_{t+1})] = \sum_i p(x_i) g(x_i) = \sum_i 0.25 \sqrt{|x_i|}$

In [None]:
probabilities = np.ones(4)*0.25
values = np.array([0.0, 2.5, 5.0, 25.0])

E_drv = np.sum(probabilities * np.sqrt(np.abs(values)))
E_drv

### Continuous random variables

In the case of continuous random variables, evaluating an expectation becomes integration. Exactly evaluating an integral is not a trivial exercise and there are many integrals that cannot be exactly evaluated even by a human...

We will need to resort to an approximation and there are two main ways forward:

1. Monte Carlo
2. Quadrature

In the examples that follow, we will assume that $x_{t+1} \sim N(0, 1)$.


**Monte Carlo**

[Monte Carlo integration](https://en.wikipedia.org/wiki/Monte_Carlo_integration) relies on the law of large numbers. If we draw enough samples from our random variable and evaluate the function on those random variables then in the limit, we should be able to evaluate our expectation.

In practice, this works well enough for low dimensional objects, but because the approximation error shrinks only at rate $1/\sqrt{n}$ in the number of draws $n$, it can become prohibitively expensive when high accuracy is needed.

In [None]:
def monte_carlo_example(n):
    d = st.norm()
    draws = d.rvs(n)
    x = np.linspace(-3.5, 3.5, 500)

    fig, ax = plt.subplots()

    # Plot a histogram of the draws against the true density
    ax.hist(draws, density=True)
    ax.plot(x, d.pdf(x), color="k", linestyle="--")

    print(f"Approximated expectation is {np.mean(np.sqrt(np.abs(draws)))}")
    pass

interact(
    monte_carlo_example,
    n=IntSlider(min=25, max=10_000, step=250, value=25)
);
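
For this particular $g$, the exact answer is available in closed form, $E\left[\sqrt{|x|}\right] = 2^{1/4} \, \Gamma(3/4) / \sqrt{\pi} \approx 0.822$, which makes it possible to see how quickly the Monte Carlo estimate improves. The sketch below uses NumPy's random generator with a fixed seed (an arbitrary choice) instead of `st.norm().rvs`, and reports the usual standard error estimate, the sample standard deviation divided by $\sqrt{n}$.

```python
import math
import numpy as np

# Closed form for E[|x|^(1/2)] when x ~ N(0, 1)
exact = 2 ** 0.25 * math.gamma(0.75) / math.sqrt(math.pi)

rng = np.random.default_rng(0)      # fixed seed so the output is reproducible
results = []
for n in (100, 10_000, 1_000_000):
    g = np.sqrt(np.abs(rng.standard_normal(n)))
    estimate = g.mean()
    std_error = g.std(ddof=1) / np.sqrt(n)
    results.append((n, estimate, std_error))
    print(f"n = {n:>9,}: estimate = {estimate:.5f}, "
          f"standard error = {std_error:.5f}, "
          f"actual error = {abs(estimate - exact):.5f}")

print(f"Exact value: {exact:.5f}")
```

Each hundredfold increase in the number of draws shrinks the standard error by about a factor of ten.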

**Quadrature**

[Quadrature](https://en.wikipedia.org/wiki/Quadrature_(mathematics)) is a class of methods used to approximate the area under a curve (aka integrating...).

Given a function, $h(x)$, quadrature approximates the area under a curve by finding values $\{x_i\}_i$ and weights $\{w_i\}_i$ such that

$$\int_a^b h(x) dx \approx \sum_i w_i h(x_i)$$

One of the assumptions typically required for quadrature to work well is that the function can be approximated well by polynomials.

In [None]:
def gaussian_quadrature(n):
    d = st.norm()
    vals, weights = qe.quad.qnwnorm(n, 0.0, 1.0)

    y = np.sqrt(np.abs(vals))

    fig, ax = plt.subplots()

    # Plot the quadrature weights and the weighted function values at the nodes
    ax.scatter(vals, weights, color="k")
    ax.plot(vals, y*weights, color="k")

    print(f"Approximated expectation is {np.sum(weights*y)}")
    pass

interact(
    gaussian_quadrature,
    n=IntSlider(min=5, max=75, step=5, value=5)
);

In this case, the expectation is hard to approximate because the integrand isn't well approximated by a polynomial: $\sqrt{|x|}$ is not differentiable at $x = 0$.
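
To see the role of smoothness, the sketch below compares quadrature errors for $g(x) = x^2$, a polynomial with $E[x^2] = 1$, and for $g(x) = \sqrt{|x|}$, using the closed form $2^{1/4} \, \Gamma(3/4) / \sqrt{\pi}$ as the reference. It uses NumPy's `hermegauss` nodes and weights rather than `qe.quad.qnwnorm`. The NumPy weights are for the weight function $e^{-x^2/2}$, so they are divided by $\sqrt{2\pi}$ to obtain an expectation under $N(0, 1)$. `normal_expectation` is a hypothetical helper name.

```python
import math
import numpy as np

def normal_expectation(g, n):
    """Approximate E[g(x)] for x ~ N(0, 1) with n-point Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n)
    return np.sum(weights * g(nodes)) / np.sqrt(2 * np.pi)

exact_smooth = 1.0                                                  # E[x^2]
exact_rough = 2 ** 0.25 * math.gamma(0.75) / math.sqrt(math.pi)     # E[sqrt(|x|)]

errors = {}
for n in (5, 10, 20, 40):
    err_smooth = abs(normal_expectation(np.square, n) - exact_smooth)
    err_rough = abs(normal_expectation(lambda x: np.sqrt(np.abs(x)), n) - exact_rough)
    errors[n] = (err_smooth, err_rough)
    print(f"n = {n:2d}: error for x^2 = {err_smooth:.2e}, "
          f"error for sqrt(|x|) = {err_rough:.2e}")
```

The polynomial is integrated essentially exactly with only a few nodes, while the error for $\sqrt{|x|}$ remains visible even as the number of nodes grows.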