In this notebook we will generalize [these notes](http://www.econ2.jhu.edu/people/ccarroll/public/lecturenotes/consumption/Envelope/) about the Envelope Theorem to abstract Bellman stages.

Recall that an abstract Bellman stage is defined as a tuple $S = (\vec{X}, P_\vec{K}, \vec{A}, \Gamma, F, \vec{Y}, T, B)$.

## Defining the value function

The agent chooses the action $\vec{a} \in \vec{A}$ that maximizes their beginning-of-stage value $v_x$, given an end-of-stage value function $v_y$.

Given the output value function $v_y : \vec{Y} \rightarrow \mathbb{R}$, the action-value function $q$ is defined as:

$$q(\vec{x}, \vec{k}, \vec{a}) = F(\vec{x}, \vec{k}, \vec{a}) + B(\vec{x},\vec{k},\vec{a}) v_y(T(\vec{x}, \vec{k}, \vec{a}))$$

$$v_x(\vec{x}) = \mathbb{E}_{\vec{k} \in \vec{K}}\left[ \max_{\vec{a} \in \Gamma(\vec{x}, \vec{k})} q(\vec{x}, \vec{k}, \vec{a})\right]$$

(This corresponds to **Equation 3** in the notes.)

The optimal policy $\pi^*: \vec{X} \times \vec{K} \rightarrow \vec{A}$ is:

$$\pi^*(\vec{x}, \vec{k}) = \underset{\vec{a} \in \Gamma(\vec{x}, \vec{k})}{\mathrm{argmax}} q(\vec{x}, \vec{k}, \vec{a})$$

With the optimal policy $\pi^*$ in hand, it is possible to compute the input value function $v_x: \vec{X} \rightarrow \mathbb{R}$. 

$$v_x(\vec{x}) = \mathbb{E}_{\vec{k} \in \vec{K}}[q(\vec{x}, \vec{k}, \pi^*(\vec{x}, \vec{k}))]$$
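The definitions above can be sketched numerically. The example below is only an illustration under assumed primitives that are not part of the notes: log reward $F = \log a$, transition $T = x + k - a$, a constant discount factor $B = 0.9$, end-of-stage value $v_y = \log y$, a two-point shock distribution, and a feasible set $\Gamma(x, k)$ taken to be the open interval $(0, x + k)$. The helper `argmax_golden` is a hypothetical name for a plain golden-section search.

```python
import math

BETA = 0.9                          # assumed constant discount factor B
SHOCKS = [(0.5, 0.5), (1.5, 0.5)]   # assumed shock distribution: (k, probability)

def F(x, k, a):
    return math.log(a)              # assumed reward, independent of x

def T(x, k, a):
    return x + k - a                # assumed transition to the output state y

def B(x, k, a):
    return BETA

def v_y(y):
    return math.log(y)              # assumed end-of-stage value function

def q(x, k, a):
    """Action-value: reward plus discounted end-of-stage value."""
    return F(x, k, a) + B(x, k, a) * v_y(T(x, k, a))

def argmax_golden(f, lo, hi, tol=1e-10):
    """Golden-section search for the maximizer of a unimodal f on (lo, hi)."""
    inv_phi = (math.sqrt(5) - 1) / 2
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if f(c) > f(d):
            hi = d                  # maximizer lies in [lo, d]
        else:
            lo = c                  # maximizer lies in [c, hi]
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
    return (lo + hi) / 2

def pi_star(x, k):
    """Optimal action: argmax of q over Gamma(x, k) = (0, x + k)."""
    eps = 1e-9
    return argmax_golden(lambda a: q(x, k, a), eps, x + k - eps)

def v_x(x):
    """Beginning-of-stage value: expectation over shocks of q at the optimum."""
    return sum(p * q(x, k, pi_star(x, k)) for k, p in SHOCKS)
```

For these assumed primitives the maximizer has the closed form $a^* = (x + k)/(1 + \beta)$, which gives a way to check the numerical search.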

## First order condition

Given: **TODO: Use partials throughout here, or super- derivative notation.**

 - $v'_y : \vec{Y} \rightarrow \mathbb{R}$, the marginal value of output states, i.e. $v'_y = \frac{\partial v_y}{\partial \vec{y}}$.
 - A marginal reward function $F^a = \frac{\partial F}{\partial \vec{a}}$
 - A marginal transition function $T^a = \frac{\partial T}{\partial \vec{a}}$
 - A discount factor $B$ such that $B^a = \frac{\partial B}{\partial \vec{a}} = 0$, e.g. because it is constant.
 
If $q$ is concave in $\vec{a}$ and the optimum lies in the interior of $\Gamma(\vec{x}, \vec{k})$, then the optimal action $\pi^*(\vec{x}, \vec{k}) \in \vec{A}$ satisfies the first order condition (FOC): the marginal action-value function $q^a = \frac{\partial q}{\partial \vec{a}}$ is 0:

$$0 = \frac{\partial q}{\partial \vec{a}}(\vec{x}, \vec{k}, \vec{a}) = F^a(\vec{x}, \vec{k}, \vec{a}) + B(\vec{x},\vec{k},\vec{a}) v'_y(T(\vec{x}, \vec{k}, \vec{a}))T^a(\vec{x}, \vec{k}, \vec{a})$$

(This condition is more complex if $B$ depends on the actions because of the Product Rule of differentiation.)

Thus:

$$F^a(\vec{x}, \vec{k}, \vec{a}) = - B(\vec{x},\vec{k},\vec{a}) v'_y(T(\vec{x}, \vec{k}, \vec{a}))T^a(\vec{x}, \vec{k}, \vec{a})$$

(This is, roughly, **Equation 4** in the notes.)

Under the right conditions, this can be solved with a numerical solver to get the optimal policy $\pi^*$.
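As a minimal sketch of such a numerical solution, the code below solves the FOC by bisection for a scalar action. The primitives are assumptions chosen for illustration, not taken from the notes: $F^a = 1/a$ (log reward), $T = x + k - a$ so that $T^a = -1$, $v'_y(y) = 1/y$, and a constant $B = 0.9$. Bisection works here because the FOC residual is strictly decreasing in $a$ on $(0, x + k)$; other problems may need a different root finder.

```python
BETA = 0.9  # assumed constant discount factor B

def F_a(x, k, a):
    return 1.0 / a            # marginal reward for the assumed F = log(a)

def T(x, k, a):
    return x + k - a          # assumed transition

def T_a(x, k, a):
    return -1.0               # marginal transition with respect to the action

def B(x, k, a):
    return BETA

def v_y_prime(y):
    return 1.0 / y            # marginal value for the assumed v_y = log(y)

def foc_residual(x, k, a):
    """q^a = F^a + B * v_y'(T) * T^a; zero at an interior optimum."""
    return F_a(x, k, a) + B(x, k, a) * v_y_prime(T(x, k, a)) * T_a(x, k, a)

def solve_foc(x, k, tol=1e-12):
    """Bisection on (0, x + k); the residual is positive near 0, negative near x + k."""
    lo, hi = 1e-9, x + k - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if foc_residual(x, k, mid) > 0:
            lo = mid          # still to the left of the root
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Calling `solve_foc(x, k)` over a grid of $(x, k)$ values would then tabulate $\pi^*$ for this assumed stage.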

However, a puzzle remains if this method is used to solve a problem that has been decomposed recursively into a series of stages: given $v'_y$ and $\pi^*$, how does one recover the marginal beginning-of-stage value function $v'_x$? This can be derived using the Envelope condition.

## Envelope condition

Consider again the state-action value function $q$. We now want to know its derivative with respect to the beginning-of-stage states.

$$\frac{\partial q}{\partial \vec{x}} = \frac{\partial F}{\partial \vec{x}}(\vec{x}, \vec{k}, \vec{a}) + \frac{\partial B}{\partial \vec{x}}(\vec{x},\vec{k},\vec{a}) v_y(T(\vec{x}, \vec{k}, \vec{a})) + B(\vec{x},\vec{k},\vec{a}) v'_y(T(\vec{x}, \vec{k}, \vec{a}))\frac{\partial T}{\partial \vec{x}}(\vec{x}, \vec{k}, \vec{a})$$

Given:

 - A reward function such that $\frac{\partial F}{\partial \vec{x}} = 0$.
 - A discount factor such that $\frac{\partial B}{\partial \vec{x}} = 0$.

This simplifies to:

$$\frac{\partial q}{\partial \vec{x}} = B(\vec{x},\vec{k},\vec{a}) v'_y(T(\vec{x}, \vec{k}, \vec{a}))\frac{\partial T}{\partial \vec{x}}(\vec{x}, \vec{k}, \vec{a})$$

(This corresponds to **Equation 8** in the notes.)

Note that:

$$v_x(\vec{x}) = \mathbb{E}[q(\vec{x}, \vec{k}, \pi^*(\vec{x}, \vec{k}))]$$

(This corresponds to **Equation 9** in the notes).

By the Chain Rule:

$$v'_x(\vec{x}) \equiv \frac{\partial v_x}{\partial \vec{x}} = \mathbb{E}\left[\frac{\partial q}{\partial \vec{x}}(\vec{x}, \vec{k}, \pi^*(\vec{x}, \vec{k})) + \frac{\partial \pi^*}{\partial \vec{x}}(\vec{x}, \vec{k})\frac{\partial q}{\partial \vec{a}}(\vec{x}, \vec{k}, \pi^*(\vec{x}, \vec{k}))\right]$$

(This corresponds to **Equation 10** in the notes.)

Note that this depends on the term $\frac{\partial \pi^*}{\partial \vec{x}}(\vec{x}, \vec{k})$, the derivative of the policy with respect to the state, which we don't have! Luckily, we don't need it, because of the Envelope theorem.



We know that, from the FOC:

$$0 = \frac{\partial q}{\partial \vec{a}}(\vec{x}, \vec{k}, \pi^*(\vec{x}, \vec{k}))$$

So, writing $T^x = \frac{\partial T}{\partial \vec{x}}$, the marginal beginning-of-stage value function simplifies to:

$$\frac{\partial v_x}{\partial \vec{x}} = \mathbb{E}\left[\frac{\partial q}{\partial \vec{x}}(\vec{x}, \vec{k}, \pi^*(\vec{x}, \vec{k}))\right] = \mathbb{E}\left[B(\vec{x},\vec{k},\pi^*(\vec{x}, \vec{k})) v'_y(T(\vec{x}, \vec{k}, \pi^*(\vec{x}, \vec{k})))T^x(\vec{x}, \vec{k},\pi^*(\vec{x}, \vec{k}))\right]$$

(Corresponding to **Equation 11**.)
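The envelope result can be checked numerically. The sketch below compares the envelope expression $\mathbb{E}[B \, v'_y(T) \, T^x]$ with a central finite difference of $v_x$. Everything concrete here is an assumption made for illustration: log reward, $T = x + k - a$, $v_y = \log y$, constant $B = 0.9$, a two-point shock distribution, and the closed-form policy $\pi^*(x, k) = (x + k)/(1 + \beta)$ that solves the FOC for these particular primitives.

```python
import math

BETA = 0.9                          # assumed constant discount factor B
SHOCKS = [(0.5, 0.5), (1.5, 0.5)]   # assumed shock distribution: (k, probability)

def T(x, k, a):
    return x + k - a                # assumed transition

def T_x(x, k, a):
    return 1.0                      # partial derivative of T with respect to x

def v_y(y):
    return math.log(y)              # assumed end-of-stage value

def v_y_prime(y):
    return 1.0 / y

def q(x, k, a):
    return math.log(a) + BETA * v_y(T(x, k, a))

def pi_star(x, k):
    # Closed-form FOC solution for these assumed primitives only.
    return (x + k) / (1.0 + BETA)

def v_x(x):
    return sum(p * q(x, k, pi_star(x, k)) for k, p in SHOCKS)

def v_x_prime_envelope(x):
    """E[B * v_y'(T) * T^x] evaluated at the optimal action."""
    total = 0.0
    for k, p in SHOCKS:
        a = pi_star(x, k)
        total += p * BETA * v_y_prime(T(x, k, a)) * T_x(x, k, a)
    return total

def v_x_prime_fd(x, h=1e-5):
    """Central finite difference of v_x, which includes the policy's response to x."""
    return (v_x(x + h) - v_x(x - h)) / (2 * h)
```

The finite difference lets the policy move with $\vec{x}$, while the envelope expression holds the action fixed at $\pi^*$; the two agreeing up to discretization error is the content of the Envelope theorem in this example.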



Note the similarity to the rearranged FOC, evaluated at the optimal action:

$$F^a(\vec{x}, \vec{k}, \pi^*(\vec{x}, \vec{k})) = - B(\vec{x},\vec{k},\pi^*(\vec{x}, \vec{k})) v'_y(T(\vec{x}, \vec{k}, \pi^*(\vec{x}, \vec{k})))T^a(\vec{x}, \vec{k}, \pi^*(\vec{x}, \vec{k}))$$

Thus:

$$B(\vec{x},\vec{k},\pi^*(\vec{x}, \vec{k})) v'_y(T(\vec{x}, \vec{k}, \pi^*(\vec{x}, \vec{k})))T^x(\vec{x}, \vec{k}, \pi^*(\vec{x}, \vec{k})) = - \frac{T^x(\vec{x}, \vec{k}, \pi^*(\vec{x}, \vec{k}))}{T^a(\vec{x}, \vec{k}, \pi^*(\vec{x}, \vec{k}))} F^a(\vec{x}, \vec{k}, \pi^*(\vec{x}, \vec{k})) $$

We can then plug this expression into the formula for $v'_x$:

$$v'_x(\vec{x}) = \mathbb{E}\left[- \frac{T^x(\vec{x}, \vec{k}, \pi^*(\vec{x}, \vec{k}))}{T^a(\vec{x}, \vec{k}, \pi^*(\vec{x}, \vec{k}))} F^a(\vec{x}, \vec{k}, \pi^*(\vec{x}, \vec{k})) \right]$$
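To see the ratio $-T^x / T^a$ at work, the sketch below uses an assumed transition $T = R x + k - a$, for which $T^x = R$ and $T^a = -1$, so the formula predicts $v'_x(x) = \mathbb{E}[R \, F^a]$. The gross return `R`, the log reward, $v_y = \log y$, the constant $B = 0.9$, the shock distribution, and the closed-form policy $\pi^*(x, k) = (R x + k)/(1 + \beta)$ are all illustrative assumptions, and the check is restricted to scalar states and actions.

```python
import math

BETA = 0.9                          # assumed constant discount factor B
R = 1.03                            # assumed gross return on the input state
SHOCKS = [(0.5, 0.5), (1.5, 0.5)]   # assumed shock distribution: (k, probability)

def T(x, k, a):
    return R * x + k - a            # assumed transition

def T_x(x, k, a):
    return R

def T_a(x, k, a):
    return -1.0

def F_a(x, k, a):
    return 1.0 / a                  # marginal reward for the assumed F = log(a)

def q(x, k, a):
    return math.log(a) + BETA * math.log(T(x, k, a))

def pi_star(x, k):
    # Closed-form FOC solution for these assumed primitives only.
    return (R * x + k) / (1.0 + BETA)

def v_x(x):
    return sum(p * q(x, k, pi_star(x, k)) for k, p in SHOCKS)

def v_x_prime_formula(x):
    """E[-(T^x / T^a) * F^a] evaluated at the optimal action."""
    total = 0.0
    for k, p in SHOCKS:
        a = pi_star(x, k)
        total += p * (-T_x(x, k, a) / T_a(x, k, a)) * F_a(x, k, a)
    return total

def v_x_prime_fd(x, h=1e-5):
    """Central finite difference of v_x for comparison."""
    return (v_x(x + h) - v_x(x - h)) / (2 * h)
```

Setting `R = 1.0` recovers a transition of the form $y = x + k - a$, where the ratio equals 1.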

This looks complicated! But consider the simple, common case where $y = x - a$, so that $T^x = 1$ and $T^a = -1$. The ratio $-T^x / T^a$ is then 1, and:

$$v'_x(\vec{x}) = \mathbb{E}\left[F^a(\vec{x}, \vec{k}, \pi^*(\vec{x}, \vec{k})) \right]$$

This is what economists working with intertemporal consumption functions are used to seeing!

## References

Carroll, C. 20**. The Envelope Theorem and the Euler Equation. http://www.econ2.jhu.edu/people/ccarroll/public/lecturenotes/consumption/Envelope/

Parker, J.A., 2007. Euler equations. Prepared for the New Palgrave Dictionary of Economics, pp.1-6.