# Bellman Abstract Representation Kit - BARK Design

_Notebook by Sebastian Benthall_

In [1]:
from dataclasses import dataclass, field

import itertools
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import minimize
from typing import Any, Callable, Mapping, Sequence
import xarray as xr

from HARK import distribution
from HARK.rewards import CRRAutility, CRRAutilityP, CRRAutility_inv, CRRAutilityP_inv

In [2]:
from HARK.stage import Stage, backwards_induction, simulate_stage

import cons_stages

In [3]:
## Doing this because of the CRRAutility warnings

import warnings
warnings.filterwarnings('ignore')

**TODO: Use language of 'blocks'.**

**TODO: The shifted Bellman equation here.**

**TODO: Other notes from the PDF, which I lost earlier...**

**TODO: Work in 'steps' and $\underline{v}$**

This notebook demonstrates HARK's ability to represent and compose Bellman stages.
This is possible because all Bellman stages have a general form.

In each Bellman stage $S = (\vec{X}, P_\vec{K}, \vec{A}, \Gamma, F, \vec{Y}, T, B)$, the agent:
 - begins in some input states $\vec{x} \in \vec{X}$
 - experiences some exogenous shocks $\vec{k} \in \vec{K}$ according to distribution $P_\vec{K}$
 - can choose some actions $\vec{a} \in \vec{A}$
 - subject to constraints $\Gamma: \vec{X} \times \vec{K} \rightarrow \mathcal{P}(\vec{A})$
     - For scalar actions, these may be expressed as upper and lower bounds, such that $\Gamma_{lb} \leq a \leq \Gamma_{ub}$:
         - $\Gamma_{ub}: \vec{X} \times \vec{K} \rightarrow \mathbb{R}$
         - $\Gamma_{lb}: \vec{X} \times \vec{K} \rightarrow \mathbb{R}$
         - such that $\Gamma(\vec{x}, \vec{k}) = [\Gamma_{lb}(\vec{x}, \vec{k}), \Gamma_{ub}(\vec{x}, \vec{k})]$
 - experience a reward $F: \vec{X} \times \vec{K} \times \vec{A} \rightarrow \mathbb{R}$
 - together, these determine some output states $\vec{y} \in \vec{Y}$ via...
 - a **deterministic** transition function $T: \vec{X} \times \vec{K} \times \vec{A} \rightarrow \vec{Y}$
   - _This is deterministic because shocks have been isolated to the beginning of the stage._
 - The agent has a discount factor $B$ for future utility.
     - This is often a constant, such as $\beta$.
     - but it can also be a function $B: \vec{X} \times \vec{K} \times \vec{A} \rightarrow \mathbb{R}$
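
As a rough illustration of the tuple $S$, the sketch below collects the same ingredients in a plain dataclass. `ToyStage` and all of its field names are hypothetical and chosen only for this example; this is not the `Stage` class imported from `HARK.stage`, whose interface is not shown here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence, Union

import numpy as np


@dataclass
class ToyStage:
    """Hypothetical container mirroring the tuple S; NOT HARK's Stage class."""

    inputs: Sequence[str]    # names of the input states x
    shocks: Sequence[str]    # names of the shocks k
    actions: Sequence[str]   # names of the actions a
    outputs: Sequence[str]   # names of the output states y
    reward: Callable[[Any, Any, Any], float]       # F(x, k, a)
    transition: Callable[[Any, Any, Any], float]   # T(x, k, a)
    action_lower_bound: Callable[[Any, Any], float]  # Gamma_lb(x, k)
    action_upper_bound: Callable[[Any, Any], float]  # Gamma_ub(x, k)
    discount: Union[float, Callable[[Any, Any, Any], float]] = 1.0  # B
    shock_dist: Any = None   # P_K, left unspecified in this sketch

    def discount_at(self, x, k, a):
        """Evaluate B whether it is a constant or a function of (x, k, a)."""
        return self.discount(x, k, a) if callable(self.discount) else self.discount


# A shock-free consumption stage: log reward, resources m, consumption c.
consumption = ToyStage(
    inputs=["m"],
    shocks=[],
    actions=["c"],
    outputs=["a"],
    reward=lambda m, k, c: np.log(c),
    transition=lambda m, k, c: m - c,
    action_lower_bound=lambda m, k: 0.0,
    action_upper_bound=lambda m, k: m,
    discount=0.96,
)
```

The `discount_at` helper reflects the two cases listed above: a constant such as $\beta$, or a function of $(\vec{x}, \vec{k}, \vec{a})$.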
     
**TODO**: Notation.
 - Allow an $h$ operation which is more general than scalar discount factors.
 - Rename shocks to $Z$, with evidence from Stachurski and Sargent.
 - rename transition equation to $g$? unless something better from sargent.
     
### Grids

In practice, we will use discretized versions of the stage. We will use bold-faced, non-italic $\mathbf{X}$, $\mathbf{K}$, $\mathbf{A}$, and $\mathbf{Y}$ for grids over the input, shock, action, and output spaces.

Note that the shock space $\mathbf{K}$ will normally be generated by discretizing the continuous probability distribution $P_\vec{K}$. We will refer to the discretized probability distribution over $\mathbf{K}$ that preserves point mass values as $\mathbf{P_\vec{K}}$.
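
One possible way to build such a discretized shock distribution is sketched below, assuming a normally distributed scalar shock and using Gauss-Hermite quadrature from NumPy. The function name `discretize_normal` and the parameter values are illustrative assumptions; HARK's own discretization tools are not used here.

```python
import numpy as np


def discretize_normal(mu, sigma, n):
    """Return n grid points and probability masses approximating N(mu, sigma^2).

    Uses Gauss-Hermite quadrature; this is one choice among several.
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n)
    points = mu + np.sqrt(2.0) * sigma * nodes  # change of variables
    probs = weights / np.sqrt(np.pi)            # normalise to sum to one
    return points, probs


K_grid, P_K = discretize_normal(mu=0.0, sigma=0.1, n=7)

# Expectations under the discretized distribution are weighted sums.
mean_k = K_grid @ P_K
var_k = (K_grid - mean_k) ** 2 @ P_K
```

The pair `(K_grid, P_K)` plays the role of $\mathbf{K}$ and its probability masses.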

**TODO: Use grave mark for interpolated functions?**

## Solving one stage 

### Policy Optimization

Given a stage $S$, we often want to know the optimal policy or decision rule $\pi^*(\vec{x}, \vec{k})= \vec{a}^*_{xk}$ that yields the best choice of action given input states and shock realizations.

There are several different techniques available for policy optimization. In general, the more one is able to provide analytic information about the functions in the stage definition, the better a policy optimization algorithm one can employ.

| Method                 | Requirements           | Discretization    | Value Input | Computation  | Products*  |
| ---------------------  |:--------------         | :-------------    | :---------- | :----------  | :--------  |
| Value Optimization     |                        | $$\mathbf{X, K}$$ | $$v_y$$     | Optimization | $\pi^*, q$ |
| First Order Condition  | $F', T'$, $B' = 0$     | $$\mathbf{X, K}$$ | $$v'_y$$    | Rootfinding  | $\pi^*$    |
| Endogenous Gridpoints  | $F'^{-1}, T'$, $B' = 0$| $$\mathbf{Y} $$   | $$v'_y$$    | None         | $\pi_y^*$  |     

#### Value optimization

Given the output value function $v_y : \vec{Y} \rightarrow \mathbb{R}$, the action-value function $q$ is defined as:

$$q(\vec{x}, \vec{k}, \vec{a}) = F(\vec{x}, \vec{k}, \vec{a}) + B(\vec{x},\vec{k},\vec{a}) v_y(T(\vec{x}, \vec{k}, \vec{a}))$$

This can be computed for all points on the grids $\mathbf{X, K}$.

The optimal policy $\pi: \vec{X} \times \vec{K} \rightarrow \vec{A}$ is:

$$\pi^*(\vec{x}, \vec{k}) = \underset{\vec{a} \in \Gamma(\vec{x}, \vec{k})}{\mathrm{argmax}} q(\vec{x}, \vec{k}, \vec{a})$$

(This corresponds to Equation 3 in the notes).
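
A minimal sketch of value optimization on a grid follows, assuming a toy stage with log reward, transition $T = m + k - c$, a constant discount factor, and output value $v_y = \log$. It uses `scipy.optimize.minimize_scalar` for the scalar action; the function names are hypothetical and this is not HARK's solver.

```python
import numpy as np
from scipy.optimize import minimize_scalar

beta = 0.96  # constant discount factor B (assumed)


def F(m, k, c):
    return np.log(c)       # reward


def T(m, k, c):
    return m + k - c       # deterministic transition


def v_y(a):
    return np.log(a)       # assumed output value function


def q(m, k, c):
    """Action-value: reward plus discounted value of the output state."""
    return F(m, k, c) + beta * v_y(T(m, k, c))


def optimal_action(m, k, eps=1e-9):
    """Maximise q over the feasible interval [eps, m + k - eps]."""
    res = minimize_scalar(
        lambda c: -q(m, k, c), bounds=(eps, m + k - eps), method="bounded"
    )
    return res.x


X_grid = np.linspace(1.0, 5.0, 5)
K_grid = np.array([-0.5, 0.0, 0.5])

# pi_star[i, j] is the optimal action at (X_grid[i], K_grid[j]).
pi_star = np.array([[optimal_action(m, k) for k in K_grid] for m in X_grid])
```

For this toy problem the closed-form answer is $c^* = (m + k)/(1 + \beta)$, which gives a convenient check on the numerical result.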

#### First Order Condition

Mathematically, the optimization step above truly depends on the marginal value function $v'_y$, and not on $v_y$ (i.e., the solution is indifferent to an additive constant on $v_y$). Optimizing an interpolated value function can introduce errors. Sometimes, one can get improved results by starting from an interpolated marginal value function $v'_y$.
 
Given:

 - $v'_y : \vec{Y} \rightarrow \mathbb{R}$ is the marginal value of output states.
 - A marginal reward function $F' = \frac{\partial F}{\partial \vec{a}}$
 - A marginal transition function $T' = \frac{\partial T}{\partial \vec{a}}$
 - A discount factor $B$ such that $B' = \frac{\partial B}{\partial \vec{a}} = 0$, e.g. because it is constant.
 
Assuming the $q$ function is concave in $\vec{a}$, an unconstrained optimal $\pi^*(\vec{x}, \vec{k}) \in \vec{A}$ will satisfy the first order condition (FOC). This condition is that the marginal action value function $q' =  \frac{\partial q}{\partial \vec{a}}$ is 0:

$$0 = q'(\vec{x}, \vec{k}, \vec{a})$$

$$0 = F'(\vec{x}, \vec{k}, \vec{a}) + B(\vec{x},\vec{k},\vec{a}) v_y'(T(\vec{x}, \vec{k}, \vec{a}))T'(\vec{x}, \vec{k}, \vec{a})$$

(This condition is more complex if $B$ depends on the actions because of the Product Rule of differentiation.)

This can be computed for all points on the grids $\mathbf{X, K}$.

If $q'(\vec{x}, \vec{k}, \Gamma_{lb}(\vec{x}, \vec{k}))$ and $q'(\vec{x}, \vec{k}, \Gamma_{ub}(\vec{x}, \vec{k}))$ have the same sign, then $q'$ has no root between the bounds, a constraint binds, and
- $\pi^*(\vec{x}, \vec{k}) = \Gamma_{lb}(\vec{x}, \vec{k})$ if both are negative ($q$ is decreasing in the action),
- or $\pi^*(\vec{x}, \vec{k}) = \Gamma_{ub}(\vec{x}, \vec{k})$ if both are positive ($q$ is increasing in the action).

**TODO: Nice LaTeX braces and cases for this?**

Computationally, this involves replacing the numerical optimization step with a numerical root-finding step.
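
The root-finding step, including the check for binding constraints, might look like the sketch below. It assumes a shock-free toy stage with $F = \log c$, $T = m - c$, $v'_y(y) = 1/y$, and constant $B = \beta$, and uses `scipy.optimize.brentq`. The helper names are hypothetical.

```python
from scipy.optimize import brentq

beta = 0.96  # constant discount factor, so B' = 0 (assumed)


def F_prime(m, k, c):
    return 1.0 / c          # dF/da for F = log(c)


def T(m, k, c):
    return m - c


def T_prime(m, k, c):
    return -1.0             # dT/da


def v_y_prime(y):
    return 1.0 / y          # assumed marginal value of the output state


def q_prime(m, k, c):
    """Marginal action value: F' + B * v_y'(T) * T'."""
    return F_prime(m, k, c) + beta * v_y_prime(T(m, k, c)) * T_prime(m, k, c)


def foc_policy(m, k, lb, ub):
    """Solve q' = 0 on [lb, ub], falling back to a bound when it binds."""
    q_lb, q_ub = q_prime(m, k, lb), q_prime(m, k, ub)
    if q_lb < 0 and q_ub < 0:
        return lb           # q decreasing everywhere: lower bound binds
    if q_lb > 0 and q_ub > 0:
        return ub           # q increasing everywhere: upper bound binds
    return brentq(lambda c: q_prime(m, k, c), lb, ub)


interior = foc_policy(2.0, None, 1e-6, 2.0 - 1e-6)  # FOC has an interior root
constrained = foc_policy(2.0, None, 1e-6, 0.5)      # upper bound binds
```

In the interior case the root is $c^* = m/(1+\beta)$; in the constrained case both endpoint derivatives are positive, so the upper bound is returned.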

#### Endogenous Gridpoints

Sometimes no numerical search is needed at all to compute the optimal policy, though with a catch.

Given:
 - $v'_y : \vec{Y} \rightarrow \mathbb{R}$ is the marginal value of output states.
 - A constant marginal transition function $T' = \frac{\partial T}{\partial \vec{a}}$. Note: Some researchers are discovering how to lift this condition!
 - An inverse marginal reward function $F'^{-1} : \mathbb{R} \rightarrow \vec{A}$.
 - A discount factor $B$ such that $B' = \frac{\partial B}{\partial \vec{a}} = 0$, e.g. because it is constant.
 - A grid $\mathbf{Y}$ over output states.

Then we can derive from the FOC that:

$$0 = F'(\vec{x}, \vec{k}, \vec{a}) + B T'(\vec{x}, \vec{k}, \vec{a}) v_y'(T(\vec{x}, \vec{k}, \vec{a}))$$

$$F'(\vec{x},\vec{k},\pi^*_y(\vec{y})) = - B T' v_y'(\vec{y}) $$

$$\pi^*_y(\vec{y})  =  F'^{-1}(- B T' v_y'(\vec{y}))$$

Unlike the other policy optimization methods, this is computed over a grid over outputs $\mathbf{Y}$ and requires no numerical searching over the action space. This takes $O(|\mathbf{Y}|)$ time, but with a very small constant.

Note that this produces the function $\pi^*_y : \vec{Y} \rightarrow \vec{A}$, which chooses the optimal action for a given _output_. This is part of the Endogenous Gridpoints Method (Carroll, 2006), so called because it implies an endogenous grid over actions $\mathbf{A}^* = \pi^*_y(\mathbf{Y})$.
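
A sketch of this step under CRRA assumptions: $F'(c) = c^{-\rho}$, so $F'^{-1}(u) = u^{-1/\rho}$, with $T' = -1$ and an assumed marginal output value $v'_y(y) = y^{-\rho}$. All parameter values and names are illustrative.

```python
import numpy as np

rho, beta = 2.0, 0.96   # CRRA coefficient and constant discount factor (assumed)
T_prime = -1.0          # constant marginal transition, e.g. T = m - c


def F_prime_inv(u):
    """Inverse marginal reward for CRRA utility: F'(c) = c**(-rho)."""
    return u ** (-1.0 / rho)


def v_y_prime(y):
    """Assumed marginal value of the output state."""
    return y ** (-rho)


Y_grid = np.linspace(0.1, 10.0, 50)

# pi_y^*(y) = F'^{-1}(-B T' v_y'(y)), evaluated on the whole grid at once.
pi_y_star = F_prime_inv(-beta * T_prime * v_y_prime(Y_grid))
```

With these choices the result is linear in the output, $\pi^*_y(y) = \beta^{-1/\rho} y$, and the whole computation is a single vectorized expression with no search over actions.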

Under special conditions, this function $\pi^*_y$ can be used to efficiently compute the input value function.

**TODO: What happens if $\vec{y}$ is multidimensional? Is it e.g. $\frac{\partial v_y'(\vec{y})}{\partial y_1 \partial y_2}$?**

### Value Backup Methods

When solving a problem with backwards induction, we will want to derive the input value function $v_x$ from the output value function $v_y$. This requires solving for the optimal policy, discussed above. Once this is in hand, several methods are available.

| Method                  | Requirements  | Discretization      | Policy      | Value    | Products*  |
| ---------------------   |:------------- | :-------------      | :---------  | :------  | :--------  |
| Value Update            |               | $$\mathbf{X, P_K}$$ | $$\pi^*$$   | $$v_y$$  | $v_x, q$   |
| Analytic Marginal Value | $T'_x, T'_a$  | $$\mathbf{X, P_K}$$ | $$\pi^*$$   | $$v'_y$$ | $v'_x$     |
| Endogenous Gridpoints?  | $T^{-1}_a$    | $$\mathbf{Y} $$     | $$\pi^*_y$$ | $$v'_y$$ | $v'_x$     |  


#### Basic value backup

With the optimal policy $\pi^*$ in hand, it is possible to compute the input value function $v_x: \vec{X} \rightarrow \mathbb{R}$. 

$$v_x(\vec{x}) = \mathbb{E}_{\vec{k} \in \vec{K}}[q(\vec{x}, \vec{k}, \pi^*(\vec{x}, \vec{k}))]$$

This can be computed over the discretized shock distribution $\mathbf{P_K}$, which includes probability mass values for each point. This step takes time proportional to the size of the discretization, $O(|\mathbf{X}||\mathbf{K}|)$.
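
The expectation can be sketched as a matrix-vector product over the shock grid. The example assumes the toy stage $F = \log c$, $T = m + k - c$, $v_y = \log$, a constant $\beta$, and a made-up three-point shock distribution; the policy used is the closed-form optimum for that toy stage.

```python
import numpy as np

beta = 0.96
X_grid = np.linspace(1.0, 5.0, 5)
K_grid = np.array([-0.5, 0.0, 0.5])
P_K = np.array([0.25, 0.5, 0.25])  # illustrative probability masses


def q(m, k, c):
    """Action-value for the toy stage."""
    return np.log(c) + beta * np.log(m + k - c)


def pi_star(m, k):
    """Closed-form optimal policy for this toy stage."""
    return (m + k) / (1.0 + beta)


# Evaluate q at the optimal action for every (x, k) pair: shape (|X|, |K|).
M, K = np.meshgrid(X_grid, K_grid, indexing="ij")
q_opt = q(M, K, pi_star(M, K))

# Take the expectation over shocks: one weighted sum per input grid point.
v_x = q_opt @ P_K
```

The array `q_opt` has $|\mathbf{X}||\mathbf{K}|$ entries, which is where the stated cost comes from.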

#### Marginal value backup

**TODO**

Getting $v'_x$ from $v'_y$.

#### Analytic marginal value from marginal reward

See notebook "The Envelope Theorem for Abstract Bellman Stages" for the derivation.

Using the Envelope Theorem, it is possible to derive an expression for the marginal value function using the partial derivatives of the transition function and the marginal reward function.

Under the following conditions:

 - The conditions of the FOC policy solution, described above.
 - $B$ is a constant
 - A reward function such that $\frac{\partial F}{\partial \vec{x}} = 0$
 - A discount factor such that $\frac{\partial B}{\partial \vec{x}} = 0$.

Then:

$$v'_x(\vec{x}) = \mathbb{E}\left[- \frac{T^x(\vec{x}, \vec{k}, \pi^*(\vec{x}, \vec{k}))}{T^a(\vec{x}, \vec{k}, \pi^*(\vec{x}, \vec{k}))} F^a(\vec{x}, \vec{k}, \pi^*(\vec{x}, \vec{k})) \right]$$

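Here $T^x$, $T^a$, and $F^a$ denote the partial derivatives of $T$ and $F$ with respect to $\vec{x}$ and $\vec{a}$. This method applies in general, provided analytic expressions for these partial derivatives are supplied.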
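
The formula can be checked numerically on a toy stage. The sketch below assumes $F = \log a$ (so $\partial F / \partial x = 0$), $T = Rx + k - a$, $v_y = \log$, and constant $\beta$, for which the optimal policy is known in closed form. It compares the envelope expression with a finite difference of $v_x$. All names and parameter values are illustrative.

```python
import numpy as np

beta, R = 0.96, 1.03
K_grid = np.array([0.5, 1.0, 1.5])
P_K = np.array([0.25, 0.5, 0.25])  # illustrative probability masses


def pi_star(x, k):
    """Closed-form optimal policy for this toy stage."""
    return (R * x + k) / (1.0 + beta)


def T_x(x, k, a):
    return R          # dT/dx


def T_a(x, k, a):
    return -1.0       # dT/da


def F_a(x, k, a):
    return 1.0 / a    # dF/da


def v_x_prime(x):
    """Envelope expression: E[-(T^x / T^a) F^a] at the optimal action."""
    a = pi_star(x, K_grid)
    integrand = -T_x(x, K_grid, a) / T_a(x, K_grid, a) * F_a(x, K_grid, a)
    return integrand @ P_K


def v_x(x):
    """Input value function, used only to check the derivative."""
    a = pi_star(x, K_grid)
    return (np.log(a) + beta * np.log(R * x + K_grid - a)) @ P_K


h = 1e-5
finite_diff = (v_x(2.0 + h) - v_x(2.0 - h)) / (2.0 * h)
analytic = v_x_prime(2.0)
```

The two numbers should agree closely, which is a useful sanity check when supplying hand-derived partial derivatives.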

#### Endogenous Gridpoint Method

Given:
 - an invertible transition function so that $T^{-1}_a: \vec{Y} \times \vec{A} \rightarrow \vec{X}  $ is well defined
   - No shocks allowed!
 - The optimal policy with respect to the output $\pi^*_y$. (See EG Policy optimization above).
 
Then we can create an endogenous grid $\mathbf{X} = T^{-1}_a(\mathbf{Y}, \pi^*_y(\mathbf{Y}))$. This is $O(|\mathbf{Y}|)$ with a low constant.
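
A sketch of building the endogenous grid, assuming a shock-free transition $T(x, a) = x - a$ (so $T^{-1}_a(y, a) = y + a$) and the linear CRRA policy $\pi^*_y(y) = \beta^{-1/\rho} y$ obtained from the EG policy step under those same assumptions. The interpolation at the end recovers a policy over inputs; the names are hypothetical.

```python
import numpy as np

rho, beta = 2.0, 0.96


def pi_y_star(y):
    """Optimal action as a function of the output (CRRA toy case)."""
    return beta ** (-1.0 / rho) * y


def T_inv_a(y, a):
    """Invert T(x, a) = x - a for the input x, given output y and action a."""
    return y + a


Y_grid = np.linspace(0.1, 10.0, 50)
A_star = pi_y_star(Y_grid)          # endogenous action grid
X_endog = T_inv_a(Y_grid, A_star)   # endogenous input grid


def policy(x):
    """Policy over inputs, interpolated on the endogenous grid."""
    return np.interp(x, X_endog, A_star)
```

Because the toy policy is linear in $x$, the interpolated policy is exact inside the grid here; in general its accuracy depends on the grid density.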

## Special Points, and Interpolation

Interpolation brings with it a host of challenges when the underlying functions are very curved.

For example, with a CRRA utility function the utility of consuming $0$ resources is $-\infty$. This means both that linear extrapolation from low-but-positive values will be too high, and that when an agent has no choice but to consume 0 resources they will be impossibly miserable.

For this reason we offer a few tricks:

### Transformed value function interpolation

Rather than requiring users to use a linearly interpolated value function $v_x \sim i(\vec{v_x})$, we allow the user to define a transform function $f$ and its inverse $f^{-1}$ such that $v_x \sim f^{-1}(i(f(\vec{v_x})))$.

Commonly, the transformation function $f$ is the inverse of the CRRA utility function, which is $e^x$ when $\rho = 1$. 

Likewise, for the marginal value function $v'_x$, a transformation can be provided so that the interpolation is on a more linear function. In this case, with CRRA utility, the transformation is often $g(u) = u^{-\frac{1}{\rho}}$, which reduces to $u^{-1}$ when $\rho = 1$.
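
The effect of the transform can be illustrated with log utility ($\rho = 1$), where $f = \exp$ and $f^{-1} = \log$. The sketch assumes a value function $v_x(m) = \log m$ sampled on a coarse grid and uses `np.interp` as the linear interpolator $i$; the grid sizes are arbitrary.

```python
import numpy as np

m_grid = np.linspace(0.5, 10.0, 6)   # coarse grid over the input state
v_grid = np.log(m_grid)              # sampled value function (assumed form)

f, f_inv = np.exp, np.log            # transform and its inverse for rho = 1

m_fine = np.linspace(0.5, 10.0, 200)
v_true = np.log(m_fine)

# Plain linear interpolation of the curved value function.
v_linear = np.interp(m_fine, m_grid, v_grid)

# Interpolate the transformed values, then map back: f^{-1}(i(f(v))).
v_transformed = f_inv(np.interp(m_fine, m_grid, f(v_grid)))

err_linear = np.max(np.abs(v_linear - v_true))
err_transformed = np.max(np.abs(v_transformed - v_true))
```

In this special case the transformed values are exactly linear in $m$, so the transformed interpolant is accurate up to floating-point error, while plain linear interpolation shows a visible error near the low end of the grid.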

### Solution Points

When it is easy to determine the optimal policy $\pi^*$ or value function $v_x$ for a particular state $x^*$ analytically, but difficult to solve for it using optimization, it is useful to input that value directly in the stage definition.
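
As one illustration of why a known point helps, the sketch below adds the analytically known point "zero resources implies zero consumption" to an interpolated consumption policy. The policy values and grid are made up for the example, and this does not show how the stage definition itself accepts such points.

```python
import numpy as np

beta = 0.96
m_grid = np.linspace(1.0, 5.0, 5)
c_grid = m_grid / (1.0 + beta)   # policy values found numerically (assumed)

# Known solution point: with m = 0 the only feasible action is c = 0.
m_star, c_star = 0.0, 0.0

m_aug = np.concatenate(([m_star], m_grid))
c_aug = np.concatenate(([c_star], c_grid))

query = 0.5  # below the lowest grid point found by optimization

# np.interp holds the first grid value constant for queries below the grid.
c_without = np.interp(query, m_grid, c_grid)

# With the known point included, the query falls inside the grid.
c_with = np.interp(query, m_aug, c_aug)
```

Without the extra point the interpolant returns the value at the lowest grid point, which overstates consumption at low resources; with it, the query is interpolated between the known point and the first optimized point.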