# Dynamic Programming: Eating a Magic Cake

> Finding the optimal saving rate with dynamic programming.

- toc: true
- badges: true
- comments: true
- author: David Ngo
- image: images/cake.jpg
- categories: [economics, mathematics, python]

In layman's terms, [dynamic programming](https://en.wikipedia.org/wiki/Dynamic_programming) is a method of mathematical optimization that breaks a complex problem down into simpler parts and stores the results to save on computing power. We use dynamic programming to solve the Bellman equation efficiently, which will be explained in the following sections.

## Background

Suppose a nice genie has given us a magic cake. The cake's initial size at time 0 is $y_0z_0$, a known number. Once per day, we are allowed to eat a piece of any size from the cake. Overnight, the cake regenerates: what is left is multiplied by a random variable $z_t$ whose logarithm is independently and identically distributed $N(\mu, \sigma)$. This means we potentially have an unlimited source of cake!

Suppose the genie places the condition that we must choose a constant fraction of the cake to eat each day. As rational economists, we want to maximize our happiness by eating the right amount of cake each day. For this experiment, we take our objective to be the expected discounted utility $E\big[\sum_{t=0}^{\infty}\beta^tu(c_t)\big]$, where $c_t$ is the amount of cake eaten on day $t$.

## Exercise

We apply the following:

1. Guess that the optimal policy is to eat a constant fraction of the magic cake in every period. 
2. Assume that $u(c) = \frac{c^{1-r}-1}{1-r}$ for $r \geq 0$. Fix $z_0y_0=69$, $r=2$, $\beta = 0.95$, $\mu=0$, and $\sigma = 0.1$. 
3. Using `np.random.seed(1234)`, generate a fixed sequence of 100 lognormally distributed random variables.
4. Use Python to search for the fraction that maximizes the expected reward for that fixed sequence.
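
The steps above can be sketched as a simple grid search. This is a minimal sketch rather than a definitive solution: the names `utility`, `discounted_reward`, and `alphas`, the grid resolution, and the timing convention (100 days of eating, each followed by one overnight shock) are assumptions made here for illustration. It also reads $\sigma$ as the standard deviation of $\log z_t$, which is how `np.random.lognormal` interprets its `sigma` argument.

```python
import numpy as np

# Parameters from the exercise statement.
x0 = 69.0              # initial cake size z_0 * y_0
r = 2.0                # curvature of the utility function
beta = 0.95            # discount factor
mu, sigma = 0.0, 0.1   # parameters of log(z_t)
T = 100                # length of the fixed shock sequence


def utility(c, r=r):
    """u(c) = (c^(1-r) - 1) / (1 - r), with log utility as the r = 1 limit."""
    if r == 1:
        return np.log(c)
    return (c ** (1 - r) - 1) / (1 - r)


# Fixed sequence of lognormal shocks, seeded as in the exercise.
np.random.seed(1234)
z = np.random.lognormal(mean=mu, sigma=sigma, size=T)


def discounted_reward(alpha, shocks=z):
    """Discounted utility from eating the fraction alpha of the cake every day.

    Assumed timing: eat c_t = alpha * x_t, then the remainder is multiplied
    by the next shock overnight.
    """
    x = x0
    total = 0.0
    for t, shock in enumerate(shocks):
        c = alpha * x
        total += beta ** t * utility(c)
        x = shock * (x - c)
    return total


# Grid search over candidate fractions strictly between 0 and 1.
alphas = np.linspace(0.001, 0.999, 999)
rewards = np.array([discounted_reward(a) for a in alphas])
best_alpha = alphas[np.argmax(rewards)]

print(f"best fraction on the grid: {best_alpha:.3f}")
print(f"discounted reward at that fraction: {rewards.max():.4f}")
```

A grid search is used here because the problem has a single bounded choice variable; a scalar optimizer would also work, but the grid makes it easy to plot the reward against the fraction.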

## Terminology

We use the following terminology:

- The **state** space, denoted $X$, is the set of possible sizes of the cake, which starts at $y_0z_0$.
- The **control**, denoted $u \in U$, is the fraction of the cake that you choose to eat, so $0 < u < 1$.
- The **law of motion** describes the transition into the next state. It is driven by the multiplicative random variable, or shock, $z_t$, whose logarithm is independently and identically distributed $N(\mu, \sigma)$.
- The **reward** is a function $r: X \times U \rightarrow \mathbb{R}$, so $r(x,u)$ is the reward in state $x \in X$ if we choose some fraction $u \in U$.
- The **discount factor** is $\beta$, the rate at which we discount future felicity relative to current felicity.

## Setting up the Dynamic Program

Let us reiterate what we know so far:

<ol>
    <li> The state space is the random size of the cake</li>  
    <li> The control space is how much one eats in period $t$</li> 
    <li> The law of motion is $x_{t+1} = z_{t+1}(x_t-c_t)$, where $x_t$ is the known size of the cake at time $t$. $z_{t+1}$ is lognormally distributed with parameters $\mu$ and $\sigma$.</li> 
    <li> The reward function is $u(c)$.</li>
    <li> The discount factor is $\beta$. </li>   
</ol>
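
To make the law of motion concrete, here is a small sketch that steps the cake forward for a handful of days. The function name `simulate_cake`, the five-day horizon, and the rule of eating 10% of the cake each day are hypothetical choices for illustration only; the initial size of 69 and the lognormal parameters come from the exercise.

```python
import numpy as np


def simulate_cake(x0, consumption_rule, shocks):
    """Apply x_{t+1} = z_{t+1} * (x_t - c_t) once per shock.

    Returns the cake sizes x_0..x_T and the amounts eaten c_0..c_{T-1}.
    """
    sizes = [x0]
    eaten = []
    for z_next in shocks:
        x = sizes[-1]
        c = consumption_rule(x)          # control: how much to eat today
        eaten.append(c)
        sizes.append(z_next * (x - c))   # overnight regeneration
    return np.array(sizes), np.array(eaten)


np.random.seed(1234)
shocks = np.random.lognormal(mean=0.0, sigma=0.1, size=5)

# Hypothetical rule: eat 10% of whatever cake is on the table.
sizes, eaten = simulate_cake(69.0, lambda x: 0.1 * x, shocks)

for t, (x, c) in enumerate(zip(sizes, eaten)):
    print(f"day {t}: cake = {x:.2f}, eaten = {c:.2f}")
```

Passing the consumption rule in as a function keeps the law of motion separate from the policy, so the same loop can be reused for any candidate rule.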

For this problem, we can apply the Bellman equation:

$$
V(x_t) = \max_{c_t \in [0,x_t]}\left\{ u(c_t) + \beta E\left[V\left(z_{t+1}(x_t-c_t)\right) \mid z_t\right] \right\}
$$

$V(\cdot)$ is the value function: for a cake of size $x_t$, it gives the expected discounted utility of the optimal plan, obtained by maximizing over the amount $c_t$ eaten today (equivalently, over the fraction $0 \leq u \leq 1$ of the cake). Essentially, the Bellman equation lets us solve a complex, recursive problem by breaking it down into simpler parts, which is why we use dynamic programming to solve this problem. An explanation of the Bellman equation can be found [here](https://towardsdatascience.com/the-bellman-equation-59258a0d3fa7).
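
To see why a constant fraction is a sensible guess, here is a sketch of the usual guess-and-verify argument for the infinite-horizon problem. It assumes the utility function from the exercise, i.i.d. shocks, and a value function of the form $V(x) = A\frac{x^{1-r}}{1-r} + B$. The constants $A$, $B$, $\phi$ and the fraction $\alpha$ are notation introduced here for illustration. Writing $\phi \equiv \beta E[z^{1-r}]$, the first-order condition of the Bellman equation is

$$
c^{-r} = \phi A (x-c)^{-r} \quad\Longrightarrow\quad c = \alpha x, \qquad \alpha = \frac{1}{1 + (\phi A)^{1/r}}
$$

Substituting $c = \alpha x$ back into the Bellman equation and matching the coefficients on $x^{1-r}$ gives

$$
A = \alpha^{1-r} + \phi A (1-\alpha)^{1-r} = \alpha^{-r} \quad\Longrightarrow\quad \alpha = 1 - \phi^{1/r} = 1 - \left(\beta E[z^{1-r}]\right)^{1/r}
$$

which lies strictly between 0 and 1 whenever $\phi < 1$. Under these assumptions, and reading $\sigma$ as the standard deviation of $\log z$, the exercise's parameters give $E[z^{-1}] = e^{\sigma^2/2} \approx 1.005$ and $\alpha \approx 0.023$. A search over a finite, fixed sequence of shocks need not reproduce this number exactly.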
