# Dynamic programming II: sequential continuous choice
ECON 3127/4414/8014 Computational methods in economics  
Week 10  
Fedor Iskhakov  
<img src="../img/lecture.png" width="64px"/>

&#128214; Adda and Russell Cooper "Dynamic Economics. Quantitative Methods and Applications."
    *Chapters: 2,3*

## Plan for the lecture
1. Recap of last lecture on DP
2. Contraction mappings and fixed point theory
3. Cake eating problems in various formulations
4. Corresponding solution methods
5. Lab tomorrow: practice of formulating and solving a DP problem

### Recap: What is Dynamic Programming
**DP is a general algorithm design technique for solving problems with overlapping sub-problems.**

**"DP is recursive method for solving sequential decision problems"**  
&#128214; Rust 2006, _New Palgrave Dictionary of Economics_

**Bellman's Principle of Optimality**
"An optimal policy has a property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision."  
&#128214; Bellman, 1957 "Dynamic Programming"

### Recap: Dynamic programming in economics
Macro:
- Stochastic growth models
- Consumption and savings
- Investment
- Heterogeneous agents and overlapping generations models

Micro:
- Dynamic models of labor supply and job search
- Human capital accumulation
- Health process, insurance and long term care
- Durable consumption
- Numerical solutions to game-theoretic models


### Components of DP problem
- **State variables**  — vector of variables that describe all relevant information about the modeled decision process
- **Decision variables** — vector of variables describing the choices
- **Instantaneous payoff** — utility function, additively separable across time periods
- **Motion rules** — agent's beliefs of how state variables evolve through time, conditional on choices (controlled Markov process in Markovian problems)
- **Value function** — maximum attainable utility at any point of the state space (and time period)
- **Policy function** — mapping from state space to action space that returns the optimal choice

### Bellman equation (finite horizon)

\begin{eqnarray}
V_t(x_t) &=& 
\max_{\sigma_t \in \Sigma_t} \Big\{ u_t\big(x_t,d_t\big) 
+ \beta \mathbb{E}_t \big[ V_{t+1}(x_{t+1})  \big| x_t, d_t \big] \Big\}, t<T
\\
V_T(x_T) &=& 
\max_{\sigma_T \in \Sigma_T} \Big\{ u_T\big(x_T,d_T\big) \Big\}
\end{eqnarray}

- $t$ index of __time period__, $t=1,\dots,T$
- $x_t \in S$ state variables
- $d_t \in D$ decision variables, $d_t=\sigma_t(x_t), t=1,\dots,T$
- $\sigma_t(x_t): S\rightarrow D$ policy function from __feasible__ set $\Sigma_t$
- $\mathbb{E}_t$ conditional expectation given $t$-motion rule
- time indexes can be dropped for the time-invariant elements
- solution is a __collection__ of policy functions $\{\sigma_t(x_t)\}_{t=1,\dots,T}$

### Bellman equation (infinite horizon)

$$
V(x) =
\max_{\sigma \in \Sigma} \Big\{ u\big(x,d\big) 
+ \beta \mathbb{E} \big[ V(x')  \big| x, d \big] \Big\}
$$

- $x,x' \in S$ state variables, $x'$ is next period state
- $d \in D$ decision variables, $d=\sigma(x)$
- $\sigma(x): S\rightarrow D$ policy function from __feasible__ set $\Sigma$
- solution is a __single__ policy function $\sigma(x)$
- fixed point problem for value function $V(x)$

### What is value function?

\begin{eqnarray}
V(x) &=& 
\max_{\sigma \in \Sigma} \Big\{ u\big(x,d\big) 
+ \beta \mathbb{E} \big[ V(x')  \big| x, d \big] \Big\}
\\
&=& 
\max_{\sigma \in \Sigma} \Big\{ u\big(x,d\big) 
+ \beta \mathbb{E} \big[ 
\max_{\sigma \in \Sigma} \Big\{ u\big(x',d'\big) 
+ \beta \mathbb{E} \big[ V(x'')  \big| x', d' \big] \Big\}
\big| x, d \big] \Big\}
\\
&\dots&\\
&=&
\max_{\sigma \in \Sigma} \mathbb{E} \Big[ \sum_{t=0}^{\infty} \beta^t u\big(x_t,\sigma(x_t)\big) \Big| x_0=x \Big]
\end{eqnarray}

Maximum expected utility at each point of state space (both infinite and finite horizon)


### Important characteristics of DP problems
1. Dimensions/cardinality of states and choices
1. Finite / infinite horizon
1. Discrete / continuous / discrete-continuous choice
1. Discrete / continuous states

### Solution methods
Finite horizon $\leftrightarrow$ recursive computation
1. Backward induction = VFI without fixed point

Infinite horizon $\leftrightarrow$ fixed point problem
1. Value function iterations (VFI) = backward induction with convergence
2. Policy iterations
3. Combination of the two (Newton-Kantorovich step)

Plus special methods for particular model classes

## Infinite horizon $\leftrightarrow$ fixed point problem

**Contraction mapping**

Let $(S,\rho)$ be a metric space, and $T: S \rightarrow S$.
The mapping $T$ is a _contraction mapping_ with modulus $\lambda$ if there exists $\lambda \in [0,1)$ such that

$$
||T(s)-T(s')|| \le \lambda ||s-s'|| \text{ for all } s, s' \in S,
$$

where $||\cdot||$ is the norm that generates metric $\rho$.

### Value of annuity

$$
\stackrel{\nearrow}{V} \quad
\stackrel{\searrow}{c} \quad
\stackrel{\searrow}{c} \quad
\stackrel{\searrow}{c} \quad
\dots
$$

$$
V=\quad
\frac{c}{(1+r)^0} + \quad
\frac{c}{(1+r)^1} + \quad
\frac{c}{(1+r)^2} + \quad
\frac{c}{(1+r)^3} + \quad
\dots
$$

- interest rate $r$
- $V$ can be found from the "Bellman equation"

$$
V = c + \frac{1}{1+r} V = T(V)
$$


### Backward induction/VFI algorithm

1. Start with a guess $V_0$
2. Insert into the Bellman equation

$$
V_{i+1} = c + \frac{1}{1+r} V_i = T(V_i)
$$

3. Repeat until convergence

$$
||V_{i}-V_{i-1}||\leq\varepsilon\text{ (small number)}
$$
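The iterations above can be sketched in a few lines. This is a minimal illustration, assuming $c=1$ and $r=0.05$; these values, the tolerance and the function name `annuity_vfi` are chosen for the example and are not part of the lecture. The closed form $V=c(1+r)/r$ (the sum of the geometric series) serves as a check.

```python
def annuity_vfi(c=1.0, r=0.05, V0=0.0, tol=1e-10, maxiter=10_000):
    '''Iterate V_{i+1} = c + V_i/(1+r) until successive values are within tol'''
    V = V0
    for i in range(1, maxiter + 1):
        V_new = c + V / (1 + r)  # one application of the operator T
        if abs(V_new - V) <= tol:
            return V_new, i
        V = V_new
    raise RuntimeError('No convergence in %d iterations' % maxiter)

V_star, niter = annuity_vfi()
V_exact = 1.0 * (1 + 0.05) / 0.05  # closed form c(1+r)/r for the same c and r
print('V = %.6f after %d iterations, closed form %.6f' % (V_star, niter, V_exact))
```

Changing `V0` should change the number of iterations but not the limit.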


### Contraction mapping?


$$
||V_{i}-V_{i-1}|| = ||(c+\beta V_{i-1})-(c+\beta V_{i-2})||=\beta ||V_{i-1}-V_{i-2}||
$$

- Here $\beta=\frac{1}{1+r}$; if $\beta<1$ (i.e. $r>0$), with every iteration the difference $||V_{i}-V_{i-1}||$ becomes **strictly smaller**
- Recursive formula (Bellman equation) is a **contraction mapping**
- Banach fixed point theorem guarantees unique solution!

$$
V'=T(V) \; \Rightarrow ||T(V')-T(V)|| = \beta ||V'-V|| < ||V'-V||
$$

### Blackwell conditions for contraction mapping

Let $B(\mathbb{R}^n,\mathbb{R})$ be a set of bounded functions $f:\mathbb{R}^n \rightarrow \mathbb{R}$, with the sup norm. If an operator $T:B(\mathbb{R}^n,\mathbb{R}) \rightarrow B(\mathbb{R}^n,\mathbb{R})$ satisfies
1. Monotonicity
$$
\forall f,g \in B(\mathbb{R}^n,\mathbb{R}), f(x)\le g(x) \forall x \Longrightarrow Tf(x) \le Tg(x) \forall x,
$$
2. Discounting
$$
\exists \beta \in (0,1) \text{ such that } \forall f \in B(\mathbb{R}^n,\mathbb{R}), x \in  \mathbb{R}^n,
\text{ and } \alpha>0
$$
$$
\text {we have } T\big( f + \alpha \big)(x) \le  Tf(x) + \beta\alpha,
$$

then $T$ is a contraction.

### Blackwell conditions apply for Bellman equation

Assuming the utility is bounded:

1. Monotonicity is satisfied due to maximization inside $T(V)$
2. Discounting is satisfied by elementary argument when $\beta<1$

**The Bellman operator is a contraction mapping under typical conditions!**

### Banach fixed point theorem

Let $(S,\rho)$ be a complete metric space with a contraction mapping $T: S \rightarrow S$.
Then 
1. $T$ admits a unique fixed-point $V^{\star} \in S$, i.e. $T(V^{\star}) = V^{\star}$. 
2. $V^{\star}$ can be found by repeated application of the operator $T$, i.e. $T^n(V) \rightarrow V^{\star}$ as $n\rightarrow \infty$.

**Global solution method for infinite horizon DP problems!**

### Cake eating problem

![cake](img/cake.png)

- Cake of initial size $W_0$
- **How much of the cake to eat each period $t$?**
- Time is discrete, $t=0,1,2,\dots$
- What is not eaten in period $t$ is left for the future

$$W_{t+1}=W_t-c_t$$

- Utility flow from cake consumption

$$
u(c_{t})=\log(c_t)
$$

- Future is discounted with discount factor $\beta$
- Optimization problem: 

$$
\max_{\{c_{t}\}_{0}^{\infty}}\sum_{t=0}^{\infty}\beta^{t}u(c_{t})
$$

**Value function $V(W_t)$** = the maximum attainable value given the size of cake $W_t$ (in period $t$)

$$
\begin{eqnarray*}
  V(W_{0}) & = & \max_{\{c_{t}\}_{0}^{\infty}}\sum_{t=0}^{\infty}\beta^{t}u(c_{t})\\
  & = & \max_{c_{0}}\{u(c_{0})+\beta\max_{\{c_{t}\}_{1}^{\infty}}\sum_{t=1}^{\infty}\beta^{t-1}u(c_{t})\}\\
  & = & \max_{c_{0}}\{u(c_{0})+\beta V(W_{1})\}
\end{eqnarray*}
$$

**Bellman equation**

$$
V(W_{t})=\max_{0 \le c_{t} \le W_t}\big\{u(c_{t})+\beta V(\underset{=W_{t}-c_{t}}{\underbrace{W_{t+1}}})\big\}
$$

- **State variables**  — vector of variables that describe all relevant information about the modeled decision process, $W_t$
- **Decision variables** — vector of variables describing the choices, $c_t$
- **Instantaneous payoff** — utility function, $u(c_t)$, with time separable discounted utility
- **Motion rules** — agent's beliefs of how state variables evolve through time, conditional on choices, $W_{t+1}=W_t-c_t$
- **Value function** — maximum attainable utility, $V(W_t)$
- **Policy function** — mapping from state space to action space that returns the optimal choice, $c^{\star}(W_t)$

### Cake eating: analytic solution
- Start with a (good) guess of $V(W)=A+B\log W$
$$
\begin{eqnarray*}
  V(W) & = & \max_{c}\big\{u(c)+\beta V(W-c)\big\} \\
  A+B\log W & = & \max_{c} \big\{\log c+\beta(A+B\log (W-c)) \big\}
\end{eqnarray*}
$$
- Determine $A$ and $B$ and find the optimal rule for cake consumption.
- This is only possible in **few** models!

$$
c^{\star}(W) = \arg\max_{c}\big\{\log(c)+\beta V(W-c)\big\} = (1-\beta)W
$$

$$
A=\frac{\log(1-\beta)}{1-\beta} + \frac{\beta \log(\beta)}{(1-\beta)^2},\quad
B=\frac{1}{1-\beta}
$$
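A quick numerical check of these formulas is sketched below: plug $V(W)=A+B\log W$ into the right-hand side of the Bellman equation, maximize over $c$ numerically, and compare with the left-hand side. The discount factor $\beta=0.9$, the test points and the helper name `bellman_rhs` are assumptions made for this illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar

beta = 0.9  # illustrative discount factor
B = 1 / (1 - beta)
A = np.log(1 - beta) / (1 - beta) + beta * np.log(beta) / (1 - beta) ** 2

def V(W):
    '''Candidate value function A + B*log(W)'''
    return A + B * np.log(W)

def bellman_rhs(W):
    '''Maximize log(c) + beta*V(W-c) over 0 < c < W, return (value, argmax)'''
    res = minimize_scalar(lambda c: -(np.log(c) + beta * V(W - c)),
                          bounds=(1e-10 * W, (1 - 1e-10) * W), method='bounded')
    return -res.fun, res.x

for W in (0.5, 1.0, 5.0, 10.0):
    rhs, c_opt = bellman_rhs(W)
    print('W=%5.2f  V(W)=%9.4f  RHS=%9.4f  c*=%.4f  (1-beta)W=%.4f'
          % (W, V(W), rhs, c_opt, (1 - beta) * W))
```

The two value columns should agree up to the optimizer tolerance, and the numerical maximizer should be close to $(1-\beta)W$.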



### Cake eating: numeric solution
- Have to solve the _functional equation_ for $V(W)$
- The Bellman operator in functional space
$$
T({V})(W) \equiv \max_{0 \le c \le W}\{u(c)+\beta {V}(W-c)\}
$$


- The Bellman equation is then $V(W) = T({V})(W)$, with the solution given by the fixed point


### Can we find the fixed point by iterations?

**Standard fixed point argument applies**

- Bellman operator is a contraction mapping $\Rightarrow$
- Unique fixed point $\Leftrightarrow$ unique solution to the Bellman equation
- The fixed point can be reached by an iterative process using an **arbitrary
initial guess**!
- Therefore VFI algorithm converges globally


### Value function iterations (VFI)
1. Start with an arbitrary guess $V_0(W)$
2. At each iteration $i$ compute 

$$
\begin{eqnarray*}
V_i(W) = T(V_{i-1})(W) &=& 
\max_{0 \le c \le W} \big\{u(c)+\beta V_{i-1}(W-c) \big \}  \\
c_{i-1}(W) &=& 
\underset{0 \le c \le W}{\arg\max} \big\{u(c)+\beta V_{i-1}(W-c) \big \} 
\end{eqnarray*}
$$

3. Repeat until convergence 

$$
||V_{i}(W)-V_{i-1}(W)||\leq\varepsilon\text{ (small number,} ||\cdot|| \text{ sup norm)}
$$


### Numerical implementation of the Bellman operator
- Cake is continuous $\rightarrow$ value function is a function of continuous variable
- Solution: **discretize $W$**  
Construct a _grid_ (vector) of cake-sizes  $\vec{W}\in\{0,\dots\overline{W}\}$

$$V_{i}(\vec{W})=\max_{0 \le c \le \vec{W}}\{u(c)+\beta V_{i-1}(\vec{W}-c)\}$$

- Compute value and policy function sequentially point-by-point
- May need to compute the value function _between grid points_
$\Rightarrow$
Interpolation and function approximation

### Can interpolation be avoided?
- Note that conditional on $W_t$, the choice of $c$ defines $W_{t+1}$ 
- Can replace $c$ with $W_{t+1}$ in Bellman equation so that _next period cake size is the decision variable_
- "Dual" formulation of the same problem 

$$
V_{i}(\vec{W})=\max_{0 \le \vec{W}' \le \vec{W}}\{u(\vec{W}-\vec{W}')+\beta V_{i-1}(\vec{W}')\}
$$

- Compute value and policy function sequentially point-by-point
- Note that grid $\vec{W}\in\{0,\dots\overline{W}\}$ is used twice: for state space and for decision space
- _Can you spot the potential problem?_

In [None]:
import numpy as np
import matplotlib.pyplot as plt
# plotting parameters
plt.rcParams['axes.autolimit_mode'] = 'round_numbers'
plt.rcParams['axes.xmargin'] = 0
plt.rcParams['axes.ymargin'] = 0
plt.rcParams['patch.force_edgecolor'] = True
from cycler import cycler
plt.rcParams['axes.prop_cycle'] = cycler(color='bgrcmyk')

In [None]:
class cake:
    '''Cake eating model fundamentals'''

    def __init__(self):
        '''Cake eating default parameters'''
        self.beta=.9        # Discount factor
        self.Wbar=10        # Upper bound on cake size
        self.ngrid=100      # Number of grid points
        self.ngridd=500     # Number of grid points for decisions
        self.maxiter=1000   # Maximum number of iterations
        self.tol=1e-4       # Convergence tolerance
        # analytical solution
        self.apolicy = lambda w: w*(1-self.beta) 
        self.avalue = lambda w: np.log(1-self.beta)/(1-self.beta) + self.beta*np.log(self.beta)/((1-self.beta)**2) + np.log(w)/(1-self.beta)
        
    def utility(self,x):
        '''Utility function'''
        return np.log(x)    
    
    def marginal_utility(self,x):
        '''Marginal utility function'''
        return 1/x
    
    def inverse_marginal_utility(self,x):
        '''Inverse marginal utility function'''
        return 1/x

In [None]:
def solve1(model,plotting=True):
    '''Solve cake eating problem on-the-grid'''
    machine_epsilon=np.finfo(float).eps #machine epsilon: gap between 1.0 and the next representable float
    ngrid=model.ngrid
    #(1 by ngrid) grid for both state and decision space
    grid=np.linspace(machine_epsilon,model.Wbar,ngrid).reshape(1,ngrid)
    
    def bellman(V0):
        '''Bellman operator for on-the-grid solution'''
        #V0 should be vector-row of values on grid
        matW=np.repeat(grid,ngrid,0) #matrix with state space repeated in rows
        matWpr=np.repeat(np.transpose(grid),ngrid,1) #matrix with decision space repeated in columns
        matV0=np.repeat(np.transpose(V0),ngrid,1) #current value function repeated in columns
        c=matW-matWpr #level of cake consumption in current period
        c[c==0]=machine_epsilon #add small quantity to avoid log(0)
        mask=c>0 #mask off infeasible choices
        preV=-np.inf*np.ones((ngrid,ngrid)) #prepare space for trial values for all possible choices
        preV[mask]=model.utility(c[mask])+model.beta*matV0[mask] #maximand of the Bellman equation
        V1=np.amax(preV,0,keepdims=True) #maximum in every column
        ic=np.argmax(preV,axis=0) #index of arg-maximum in every column
        cstar=c[ic,range(ngrid)].reshape((1,ngrid))
        return V1, cstar

    if plotting:
        # prepare to make plots 
        fig1, ax1 = plt.subplots(figsize=(12,8))
        plt.grid(True, which='both', color='0.65', linestyle='-')
        ax1.set_title('Value function convergence with VFI')
        ax1.set_xlabel('Cake size, W')
        ax1.set_ylabel('Value function')
    
    V0=np.zeros((1,model.ngrid)) #initial value function
    for i in range(model.maxiter):
        V1,c=bellman(V0)
        if plotting and (i%5==0):
            # plot all but the first point for better viewing
            ax1.plot(np.squeeze(grid[0,1:]),np.squeeze(V1[0,1:]),linewidth=2.5)
        if np.max(abs(V1-V0))<model.tol:
            print('Convergence achieved after %d iterations'%i)
            break
        V0=V1
    else:
        print('No convergence in %d iterations: maximum number of iterations achieved'%model.maxiter)
    return grid, V1, c

m = cake()
w,v,c = solve1(m)
plt.show()

### How to measure numerical errors?
- In our case there is an analytic solution
- Typically very dense (slow) grid is used in place of true solution
- Can control for max or mean error at the grid points of value and policy functions
- Better yet: simulations of the known analytic cases (Keane's test)
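The max and mean error measures can be sketched as follows. The helper name `error_report` and the stand-in "numerical" solution (the analytic one with a small artificial distortion) are hypothetical and only serve to make the example self-contained.

```python
import numpy as np

def error_report(W, v_num, c_num, v_true, c_true):
    '''Max and mean absolute errors of value and policy at the points W.
    v_true and c_true are callables returning the reference solution.'''
    v_err = np.abs(v_num - v_true(W))
    c_err = np.abs(c_num - c_true(W))
    return {'value_max': v_err.max(), 'value_mean': v_err.mean(),
            'policy_max': c_err.max(), 'policy_mean': c_err.mean()}

beta = 0.9  # illustrative discount factor
# analytic solution of the cake eating problem with log utility
avalue = lambda w: (np.log(1 - beta) / (1 - beta)
                    + beta * np.log(beta) / (1 - beta) ** 2
                    + np.log(w) / (1 - beta))
apolicy = lambda w: (1 - beta) * w

W = np.linspace(0.1, 10, 100)
# hypothetical stand-in for a numerical solution: shifted value, 5% overeating
report = error_report(W, avalue(W) + 0.01, 1.05 * apolicy(W), avalue, apolicy)
for name, err in report.items():
    print('%-12s %.5f' % (name, err))
```

With the solver from this lecture one could pass its output instead, for example `error_report(w[0,1:], v[0,1:], c[0,1:], m.avalue, m.apolicy)`, skipping the first grid point as in the plots.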

In [None]:
m = cake()
# m.ngrid=100
w,v,c = solve1(m,plotting=False)

fig1, ax1 = plt.subplots(figsize=(12,8))
plt.grid(True, which='both', color='0.65', linestyle='-')
ax1.set_title('Solution')
ax1.set_xlabel('Cake size, W')
ax1.set_ylabel('Value function')
ax1.plot(w[0,1:].squeeze(),v[0,1:].squeeze(),linewidth=2.5,label='Numerical')
ax1.plot(w[0,1:].squeeze(),m.avalue(w[0,1:].squeeze()),linewidth=2.5,label='Analytical')
plt.legend(loc=4)
plt.show()

fig1, ax1 = plt.subplots(figsize=(12,8))
plt.grid(True, which='both', color='0.65', linestyle='-')
ax1.set_title('Solution')
ax1.set_xlabel('Cake size, W')
ax1.set_ylabel('Policy function')
ax1.plot(w[0,1:].squeeze(),c[0,1:].squeeze(),linewidth=2.5,label='Numerical')
ax1.plot(w[0,1:].squeeze(),m.apolicy(w[0,1:].squeeze()),linewidth=2.5,label='Analytical')
plt.legend(loc=4)
plt.show()

## Cake eating with discretized choices

_Control for grid over state space separately from the discretization of the choice variables to increase accuracy_

- As before solve cake eating Bellman equation by VFI

$$V(W) = \max_{0 \le c \le W} \big\{u(c)+\beta V(W-c) \big \}$$

- Discretize state space with $\vec{W}\in\{0,\dots\overline{W}\}$

- Discretize decision space with $\vec{D}\in\{0,\dots\overline{D}\}$, usually $\overline{D}=\overline{W}$

$$V_{i}(\vec{W})=\max_{c \in \vec{D},\; 0 \le c \le \vec{W}}\{u(c)+\beta V_{i-1}(\vec{W}-c)\}$$

- Compute value/policy function point-by-point on grid $\vec{W}$
- Find the maximum over the points of grid $\vec{D}$ that satisfy the choice set condition $0 \le \vec{D} \le W$

### Need interpolation
- In each iteration, the value function $V_{i}(\vec{W})$ is computed on a set of grid points
- But for iteration $i+1$ we need to compute $V_{i}(\vec{W}-c)=V_{i}(\vec{W}-\vec{D})$, which in general falls between the grid points
- **Interpolation of the value function**
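A small illustration of this step, assuming linear interpolation with `np.interp` and using $\log(W)/(1-\beta)$ as a stand-in for a value function that is only known on the grid. The grid size, the chosen state point and the trial consumption levels are arbitrary choices for the example.

```python
import numpy as np

beta = 0.9                                   # illustrative discount factor
grid = np.linspace(0.1, 10, 25)              # coarse state grid
V_on_grid = np.log(grid) / (1 - beta)        # stand-in for V_i known only on the grid

W = grid[10]                                 # one point of the state grid
c = np.linspace(0.01, 0.5, 4) * W            # a few trial consumption levels
W_next = W - c                               # generally falls between grid points
V_next = np.interp(W_next, grid, V_on_grid)  # linear interpolation of V_i
print(np.column_stack((c, W_next, V_next)))
```

Note that `np.interp` does not extrapolate: for arguments outside the grid range it returns the value at the nearest end point.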

In [None]:
def solve2(model,plotting=True):
    '''Solve cake eating problem by discretization'''
    machine_epsilon=np.finfo(float).eps #machine epsilon: gap between 1.0 and the next representable float
    ngrid=model.ngrid
    ngrid_decision=model.ngridd
    #(1 by ngrid) grid for state space only
    grid=np.linspace(machine_epsilon,model.Wbar,ngrid).reshape(1,ngrid)
    #(ngrid_decision by ngrid) grids of decisions between 0 and each w on grid
    grid_decision=np.zeros((ngrid_decision,ngrid)) #allocate space
    for i,x in zip(range(ngrid),grid.squeeze()):
        grid_decision[:,i]=np.linspace(machine_epsilon,x,ngrid_decision)
        
    def bellman(V0):
        '''Bellman operator for discretized solution'''
        #V0 should be vector-row of values on grid
        matW=np.repeat(grid,ngrid_decision,0) #matrix with state space repeated in rows
        c=grid_decision #decisions grid in columns
        matWpr=matW-c #size of cake in the next period
        matWpr[matWpr==0]=machine_epsilon #keep next period cake strictly positive so that eating everything stays feasible
        mask=matWpr>0 #mask off infeasible choices
        matV=np.interp(matWpr,np.squeeze(grid),np.squeeze(V0)) #next period values at next period cake sizes
        preV=-np.inf*np.ones((ngrid_decision,ngrid)) #prepare space for trial values for all possible choices
        preV[mask]=model.utility(c[mask])+model.beta*matV[mask] #maximand of the Bellman equation
        V1=np.amax(preV,0,keepdims=True) #maximum in every column
        ic=np.argmax(preV,axis=0) #index of arg-maximum in every column
        cstar=c[ic,range(ngrid)].reshape((1,ngrid))
        return V1, cstar
    
    if plotting:
        # prepare to make plots 
        fig1, ax1 = plt.subplots(figsize=(12,8))
        plt.grid(True, which='both', color='0.65', linestyle='-')
        ax1.set_title('Value function convergence with VFI')
        ax1.set_xlabel('Cake size, W')
        ax1.set_ylabel('Value function')
    
    V0=np.zeros((1,model.ngrid)) #initial value function
    for i in range(model.maxiter):
        V1,c=bellman(V0)
        if plotting and (i%5==0):
            # plot all but the first point for better viewing
            ax1.plot(np.squeeze(grid[0][1:]),np.squeeze(V1[0][1:]),linewidth=2.5)
        if np.max(abs(V1-V0))<model.tol:
            print('Convergence achieved after %d iterations'%i)
            break
        V0=V1
    else:
        print('No convergence in %d iterations: maximum number of iterations achieved'%model.maxiter)
    return grid, V1, c

m = cake()
w,v,c = solve2(m)
plt.show()

In [None]:
m = cake()
# m.beta=0.9925
# m.ngrid=500
# m.ngridd=200
w,v,c = solve2(m,plotting=False)

fig1, ax1 = plt.subplots(figsize=(12,8))
plt.grid(True, which='both', color='0.65', linestyle='-')
ax1.set_title('Solution')
ax1.set_xlabel('Cake size, W')
ax1.set_ylabel('Value function')
ax1.plot(w[0,1:].squeeze(),v[0,1:].squeeze(),linewidth=2.5,label='Numerical')
ax1.plot(w[0,1:].squeeze(),m.avalue(w[0,1:].squeeze()),linewidth=2.5,label='Analytical')
plt.legend(loc=4)
plt.show()

fig1, ax1 = plt.subplots(figsize=(12,8))
plt.grid(True, which='both', color='0.65', linestyle='-')
ax1.set_title('Solution')
ax1.set_xlabel('Cake size, W')
ax1.set_ylabel('Policy function')
ax1.plot(w[0,1:].squeeze(),c[0,1:].squeeze(),linewidth=2.5,label='Numerical')
ax1.plot(w[0,1:].squeeze(),m.apolicy(w[0,1:].squeeze()),linewidth=2.5,label='Analytical')
plt.legend(loc=4)
plt.show()

### Why are the results so different?
- Solving "on the grid" implies a very coarse discretization of consumption for higher levels of wealth
- Errors accumulate in the backwards induction and VFI
- Discrepancy depends on the 

## Continuous choice

**Can we avoid discretization of consumption altogether? Yes!**

- Discretize only the state space $W$

$$
V_{i}(\vec{W})=\max_{0 \le c \le \vec{W}}\{u(c)+\beta V_{i-1}(\vec{W}-c)\}
$$

- Treat choices as continuous $\rightarrow$ optimization problem for each point of $\vec{W}$
- Again, compute value and policy function sequentially point-by-point
- Need to compute the value function _between grid points_ $\rightarrow$ 
again, interpolation and function approximation




In [None]:
from scipy import optimize
def solve3(model,plotting=True):
    '''Solve cake eating problem with truly continuous consumption'''
    machine_epsilon=np.finfo(float).eps #machine epsilon: gap between 1.0 and the next representable float
    ngrid=model.ngrid
    #(1 by ngrid) grid for state space only
    grid=np.linspace(machine_epsilon,model.Wbar,ngrid).reshape(1,ngrid)
        
    def bellman(V0):
        '''Bellman operator for continuous consumption, maximized numerically (COBYLA)'''
        cstar=np.zeros((1,ngrid)) #allocate space for optimal choices
        V1=np.zeros((1,ngrid)) #allocate space for values
        # first point without optimization
        cstar[0,0]=machine_epsilon
        V1[0,0]=model.utility(machine_epsilon)+model.beta*V0[0,0]
        x0=np.average(grid[0,0:2]) #initial starting value
        # look for all points except first
        for i in np.arange(1,ngrid):
            W = grid[0,i]
            maximand = lambda c: -(model.utility(c)+model.beta*np.interp(W-c,np.squeeze(grid),np.squeeze(V0)))
            cnstr1 = lambda c: c - machine_epsilon 
            cnstr2 = lambda c: W - c
            cnstrs=[{'type':'ineq','fun':cnstr1},
                    {'type':'ineq','fun':cnstr2}]
            optim=optimize.minimize(maximand,x0,method='COBYLA',constraints=cnstrs)
            if not optim.success:
                raise RuntimeError('Failed to optimize for W=%1.4f: %s'%(W,optim.message))
            else:
                cstar[0,i]=optim.x.item() #optim.x is a one-element array
                V1[0,i]=-optim.fun
                x0=optim.x #use previous optimum for starting value
        return V1, cstar
    
    if plotting:
        # prepare to make plots 
        fig1, ax1 = plt.subplots(figsize=(12,8))
        plt.grid(True, which='both', color='0.65', linestyle='-')
        ax1.set_title('Value function convergence with VFI')
        ax1.set_xlabel('Cake size, W')
        ax1.set_ylabel('Value function')
    
    V0=np.zeros((1,model.ngrid)) #initial value function
    for i in range(model.maxiter):
        V1,c=bellman(V0)
        if plotting and (i%5==0):
            # plot all but the first point for better viewing
            ax1.plot(np.squeeze(grid[0][1:]),np.squeeze(V1[0][1:]),linewidth=2.5)
        if np.max(abs(V1-V0))<model.tol:
            print('Convergence achieved after %d iterations'%i)
            break
        V0=V1
    else:
        print('No convergence in %d iterations: maximum number of iterations achieved'%model.maxiter)
    return grid, V1, c

m = cake()
w,v,c = solve3(m)
plt.show()

In [None]:
m = cake()
# m.ngrid=200
w,v,c = solve3(m,plotting=False)

fig1, ax1 = plt.subplots(figsize=(12,8))
plt.grid(True, which='both', color='0.65', linestyle='-')
ax1.set_title('Solution')
ax1.set_xlabel('Cake size, W')
ax1.set_ylabel('Value function')
ax1.plot(w[0,1:].squeeze(),v[0,1:].squeeze(),linewidth=2.5,label='Numerical')
ax1.plot(w[0,1:].squeeze(),m.avalue(w[0,1:].squeeze()),linewidth=2.5,label='Analytical')
plt.legend(loc=4)
plt.show()

fig1, ax1 = plt.subplots(figsize=(12,8))
plt.grid(True, which='both', color='0.65', linestyle='-')
ax1.set_title('Solution')
ax1.set_xlabel('Cake size, W')
ax1.set_ylabel('Policy function')
ax1.plot(w[0,1:].squeeze(),c[0,1:].squeeze(),linewidth=2.5,label='Numerical')
ax1.plot(w[0,1:].squeeze(),m.apolicy(w[0,1:].squeeze()),linewidth=2.5,label='Analytical')
plt.legend(loc=4)
plt.show()

### Evaluation
Directly attacking the continuous choice problem is hard:
- VFI becomes very slow
- Not robust (will be even worse in more complicated problems)

### Endogenous grid point method (EGM)

**What if no root-finding is necessary during VFI?**

Model-specific solution method that is VERY fast and accurate

Conditions:
1. Consumption-savings (stock and flow) model structure
2. Invertible marginal utility

### Euler equation
First order condition (FOC) of the Bellman equation w.r.t. $c$
$$
0 = u'(c_{t})+\beta \frac{\partial V(W_{t+1})}{\partial c_{t}} =
u'(c_{t})+\beta V'(W_{t+1})
\underset{-1}{\underbrace{\frac{\partial W_{t+1}}{\partial c_{t}}}},
$$

$$
\Rightarrow u'(c^\star_{t})=\beta V'(W_{t+1})
$$

### Euler equation
From the Envelope theorem we have
$$
V'(W_{t}) 
=\frac{\partial}{\partial W_t}\big[ u(c_t)+\beta V(W_{t+1}) \big] \Big|_{c_t^{\star}}
=\beta V'(W_{t+1}) \underset{1}{\underbrace{\frac{\partial W_{t+1}}{\partial W_{t}}}}
$$

### Euler equation
Shift one period over, combine and plug back into FOC
$$
u'(c^\star_t) = V'(W_t) \Rightarrow u'(c^\star_{t+1}) = V'(W_{t+1}) \Rightarrow
$$

$$
u'(c^\star_{t})=\beta u'(c^\star_{t+1})
$$
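As a sanity check, the analytic policy $c^\star(W)=(1-\beta)W$ from the earlier slide can be simulated forward and plugged into this Euler equation. The sketch below assumes log utility, so $u'(c)=1/c$; the values of $\beta$, $W_0$ and the number of simulated periods are illustrative.

```python
import numpy as np

beta, W0, T = 0.9, 10.0, 20                 # illustrative values
W = W0 * beta ** np.arange(T + 1)            # W_{t+1} = W_t - c_t = beta*W_t
c = (1 - beta) * W                           # analytic policy c*(W) = (1-beta)W
euler_residual = 1 / c[:-1] - beta / c[1:]   # u'(c_t) - beta*u'(c_{t+1})
print('max |Euler residual| = %.2e' % np.max(np.abs(euler_residual)))
```

The residuals should be zero up to floating point rounding.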

### Idea of EGM 
&#128214; Carroll 2006, _Economics letters_
"The method of endogenous gridpoints for solving dynamic stochastic
optimization problems"

- Instead of searching for optimal decision in each point of the state
space (traditional approaches)
- Look for the state variable (level of assets) where an arbitrarily chosen decision (consumption $\rightarrow$ savings) would be optimal


### EGM algorithm

1. Start with $c_T^{\star}=W_T$. In each period $t=T,T-1,..,1$  
(or time iteration $t$):
2. Take a guess $A$ = current period savings ($A=W_{t}-c_{t}$)  
(these guesses come from fixed or adaptive grid)
3. Intertemporal budget constraint: $A \rightarrow W_{t+1}$
$$W_{t+1}=W_{t}-c_{t}=A$$
4. Policy function at period $t+1$: $W_{t+1} \rightarrow c_{t+1}$
$$c_{t+1}=c_{t+1}^{\star}\big(W_{t+1}\big)$$

5. Inverted Euler equation: $c_{t+1} \rightarrow c_{t}$
$$c_{t}=\big(u^{\prime}\big)^{-1}\big( \beta u'(c_{t+1}) \big)$$
6. Intratemporal budget constraint: $c_{t}+A=W_{t} \rightarrow c_{t}\left(W_{t}\right)$
$$W_{t}=c_{t}+A \rightarrow c_{t}^{\star}\left(W_{t}\right)$$
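A single pass through steps 2 to 6 can be sketched as below, assuming log utility and starting from the terminal policy $c_T^{\star}(W)=W$. The savings grid and $\beta$ are illustrative choices. With one period remaining after the current one, maximizing $\log c+\beta\log(W-c)$ gives $c=W/(1+\beta)$, which serves as a check.

```python
import numpy as np

beta = 0.9                        # illustrative discount factor
A = np.linspace(0.01, 10, 50)     # step 2: grid of guesses on savings
W_next = A                        # step 3: next period cake W_{t+1} = A
c_next = W_next                   # step 4: terminal policy c_T(W) = W
c = 1 / (beta * (1 / c_next))     # step 5: inverted Euler equation, u'(c) = 1/c
W = c + A                         # step 6: endogenous grid of current cake sizes
print('max deviation from W/(1+beta): %.2e' % np.max(np.abs(c - W / (1 + beta))))
```

Note that the points `W` are not chosen in advance: they are implied by the savings grid `A`, which is why the grid is called endogenous.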

### Evaluation
- No root finding, simple direct computation of the optimal policy
- Value function is not needed, although it can be computed alongside
- Resulting functions are defined over irregular endogenous grid $\rightarrow$ potential issues in multiple dimensions
- Interpolation of __policy function__ which has much lower curvature!
- Euler errors are zero at the grid points

### Applicability
1. Marginal utility must be invertible
2. There must be the **post-decision state variable** $A$ which serves as a _sufficient statistic_ for period $t$ state $W_t$ and decision $c_t$

\begin{eqnarray}
V(x) &=&
\max_{\sigma \in \Sigma} \Big\{ u\big(x,d\big) 
+ \beta \mathbb{E} \big[ V(x')  \big| x, d \big] \Big\}
\\ &=&
\max_{\sigma \in \Sigma} \Big\{ u\big(x,d\big) 
+ \beta \mathbb{E} \big[ V(x')  \big| A(x,d) \big] \Big\}
\end{eqnarray}

3. Current period state must be recoverable from decision and post-decision variables

In [None]:
from scipy.interpolate import interp1d
def solve4(model,plotting=True):
    '''Solve cake eating problem by EGM'''
    machine_epsilon=np.finfo(float).eps #machine epsilon: gap between 1.0 and the next representable float
    ngrid=model.ngrid
    #(1 by ngrid) grid for post decision variable A
    grid=np.linspace(machine_epsilon,model.Wbar,ngrid).reshape(1,ngrid)
        
    def egm_step(W0,c0,v0):
        '''EGM step: input/output grid, policy, value for next period/current period'''
        Wnxt = grid #size of cake next period
        cnxt = np.interp(Wnxt,np.squeeze(W0),np.squeeze(c0)) #next period optimal consumption
        rhs = model.beta*model.marginal_utility(cnxt)
        c = model.inverse_marginal_utility(rhs)
        W = c + grid
        v = model.utility(c) + model.beta*np.interp(Wnxt,np.squeeze(W0),np.squeeze(v0))
        return W,c,v
    
    if plotting:
        # prepare to make plots
        fig1, ax1 = plt.subplots(figsize=(12,8))
        plt.grid(True, which='both', color='0.65', linestyle='-')
        ax1.set_title('Value function convergence with EGM')
        ax1.set_xlabel('Cake size, W')
        ax1.set_ylabel('Value function')
    
    #initial grid, consumption and value
    W0=grid
    c0=grid
    v0=model.utility(grid)
    #loop over EGM steps
    for i in range(model.maxiter):
        W1,c1,v1=egm_step(W0,c0,v0)
        if plotting and (i%5==0):
            # plot all but the first point for better viewing
            ax1.plot(np.squeeze(W1[0][1:]),np.squeeze(v1[0][1:]),linewidth=2.5)
            plt.xlim(right=model.Wbar) 
        # note complex convergence criterion!
        dif=np.interp(grid,np.squeeze(W1),np.squeeze(c1)) - np.interp(grid,np.squeeze(W0),np.squeeze(c0))
        if np.max(abs(dif))<model.tol:
            print('Convergence achieved after %d iterations'%i)
            break
        W0=W1
        c0=c1
        v0=v1
    else:
        print('No convergence in %d iterations: maximum number of iterations achieved'%model.maxiter)
    return W1, v1, c1

m = cake()
# m.ngrid=50
# m.maxiter=5
w,v,c = solve4(m)
plt.show()

In [None]:
    def egm_step(W0,c0,v0):
        '''EGM step: input/output grid, policy, value for next period/current period'''
        Wnxt = grid #size of cake next period
        cnxt = np.interp(Wnxt,np.squeeze(W0),np.squeeze(c0)) #next period optimal consumption
        rhs = model.beta*model.marginal_utility(cnxt)
        c = model.inverse_marginal_utility(rhs)
        W = c + grid
        # linear interpolation
        v = model.utility(c) + model.beta*np.interp(Wnxt,np.squeeze(W0),np.squeeze(v0))
        # cubic splines
#         interp=interp1d(np.squeeze(W0),np.squeeze(v0),kind='cubic',fill_value='extrapolate')
#         v = model.utility(c) + model.beta*interp(Wnxt)
        # log-transform + linear
#         tr = lambda x: np.log(x)
#         itr = lambda x: np.exp(x)
#         v = model.utility(c) + model.beta*tr(np.interp(Wnxt,np.squeeze(W0),itr(np.squeeze(v0))))
        return W,c,v

In [None]:
m = cake()
# m.ngrid=1000
w,v,c = solve4(m,plotting=False)

fig1, ax1 = plt.subplots(figsize=(12,8))
plt.grid(True, which='both', color='0.65', linestyle='-')
ax1.set_title('Solution')
ax1.set_xlabel('Cake size, W')
ax1.set_ylabel('Value function')
ax1.plot(w[0,1:].squeeze(),v[0,1:].squeeze(),linewidth=2.5,label='Numerical')
ax1.plot(w[0,1:].squeeze(),m.avalue(w[0,1:].squeeze()),linewidth=2.5,label='Analytical')
plt.legend(loc=4)
plt.show()

fig1, ax1 = plt.subplots(figsize=(12,8))
plt.grid(True, which='both', color='0.65', linestyle='-')
ax1.set_title('Solution')
ax1.set_xlabel('Cake size, W')
ax1.set_ylabel('Policy function')
ax1.plot(w[0,1:].squeeze(),c[0,1:].squeeze(),linewidth=2.5,label='Numerical')
ax1.plot(w[0,1:].squeeze(),m.apolicy(w[0,1:].squeeze()),linewidth=2.5,label='Analytical')
plt.legend(loc=4)
plt.show()

## Which DP solution algorithm will you choose for your project?

## Further learning resources
* QuantEcon DP section https://lectures.quantecon.org/py/index_dynamic_programming.html
