# Model training with all-in-one expectation method

- companion notebook to the paper "Will Artificial Intelligence Replace Computational Economists any Time Soon?"

- demonstrates how to solve a consumption-savings model using one of the methods presented in the paper

- we use a neural network to approximate the agent's decision rule and use the Euler equation to formulate a single objective, which is minimized by adjusting the weights of the neural network (i.e. training).

- to formulate this objective, we use one single unified expectation operator, which allows for efficient, parallel calculations using a deep learning framework.

## Preliminaries

In [None]:
# we load several libraries
# missing library x can be installed using `pip install x`
import numpy as np
from math import sqrt
from matplotlib import pyplot as plt
from tqdm.auto import tqdm                     # tqdm is a nice library to visualize ongoing loops
import datetime
class Vector: pass
from typing import Tuple

[TensorFlow](https://www.tensorflow.org/) is a deep learning library. It can be used to train neural networks, but also to produce computational graphs (i.e. programs), which can be differentiated automatically, optimized, and run on very scalable architectures.
Version 2.0 introduces a new way to build graphs, which allows for more intuitive graph definitions, essentially by writing numpy-like code. We can install it using:
`pip install tensorflow==2.0.0-rc1`

In [None]:
import tensorflow as tf

## The model

We consider the following version of a consumption-saving problem.

There are four stochastic processes: the interest rate ($r_t$), the discount factor shock ($\delta_t$), the transitory component of income ($q_t$) and the permanent component of income ($p_t$). Total income is $y_t=p_t q_t$.
The processes $r_t$, $\delta_t$, $q_t$ and $p_t$ each follow an AR(1) process whose specification is given in the code.

The agent consumes $c_t$, a fraction $\zeta_t\in[0,1]$ of disposable income $w_t$, whose evolution is given by:

$$w_t = y_t + (w_{t-1}-c_{t-1}) r_t$$
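
As a rough illustration of these budget dynamics, the sketch below iterates the law of motion for $w_t$ under a constant consumption share. The constant income `y`, the constant gross return `r`, the fixed share `zeta` and the helper name `simulate_wealth` are simplifying assumptions made only for this example; the notebook itself uses stochastic processes and a learned decision rule.

```python
import numpy as np

def simulate_wealth(w0, zeta, y=1.0, r=1.04, T=200):
    """Iterate w_t = y + (w_{t-1} - c_{t-1}) * r with c = zeta * w."""
    path = np.empty(T + 1)
    path[0] = w0
    for t in range(T):
        c = zeta * path[t]                  # consumption is a share of disposable income
        path[t + 1] = y + (path[t] - c) * r
    return path

path = simulate_wealth(w0=1.0, zeta=0.5)
w_star = 1.0 / (1.0 - (1.0 - 0.5) * 1.04)   # fixed point of the recursion
print(path[-1], w_star)
```

When $(1-\zeta)\,r < 1$ the recursion is a contraction, so the simulated path settles at $y / (1 - (1-\zeta)\,r)$.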

Given a discount parameter $\beta\in[0,1)$, the objective is to maximize:

$$E_0 \sum_{t\geq0} \delta_t \beta^t U(c_t)$$

where $U(x)=\frac{x^{1-\gamma}}{1-\gamma}$, given the initial state. The corresponding Euler equation, written in terms of marginal utility $U'$, is:

$$ \beta E_t \left[ \frac{\delta_{t+1}}{\delta_t}  \frac{U'(c_{t+1})}{U'(c_t)} r_{t+1} \right] \leq 1 \perp \zeta_t \leq 1$$

which is, by definition of the complementarity sign ($\perp$), equivalent to:

$$\min\left(1-\beta E_t \left[ \frac{\delta_{t+1}}{\delta_t}  \frac{U'(c_{t+1})}{U'(c_t)} r_{t+1} \right], 1- \zeta_t \right)= 0$$


The presence of an expectation inside a nonlinear operator (the $\min$) is a problem for our algorithm, so we reformulate the problem as finding $h_t$ and $\zeta_t$ that satisfy the "optimality" conditions:

$$\min\left(1-h_t, 1- \zeta_t \right)= 0$$
$$\beta E_t \left[ \frac{\delta_{t+1}}{\delta_t}  \frac{U'(c_{t+1})}{U'(c_t)} r_{t+1} \right] - h_t = 0$$


Thanks to this transformation, the problem takes the form of a single expectation taken over a vector-valued nonlinear function:


$$E_t \left[ \left( \begin{matrix}
\min\left(1-h_t, 1- \zeta_t \right) \\
\beta \frac{\delta_{t+1}}{\delta_t}  \frac{U'(c_{t+1})}{U'(c_t)} r_{t+1} - h_t
\end{matrix}\right)\right] = \left( \begin{matrix}0\\0\end{matrix} \right)$$



In [None]:
# model calibration
β = 0.9
γ = 2.0
σ = 0.1
ρ = 0.9
σ_r = 0.001
ρ_r = 0.2
σ_p = 0.001
ρ_p = 0.9
σ_q = 0.001
ρ_q = 0.9
σ_δ = 0.001
ρ_δ = 0.2
rbar = 1.04
eps = 0.0001   # so that utility is never negative

mute = 0.0     # if 1 then collapses to basic model
# xibar = 0.95   # steady state value of ξ

In [None]:
# ergodic standard deviations of the AR(1) processes
σ_e_r = σ_r/(1-ρ_r**2)**0.5
σ_e_p = σ_p/(1-ρ_p**2)**0.5
σ_e_q = σ_q/(1-ρ_q**2)**0.5
σ_e_δ = σ_δ/(1-ρ_δ**2)**0.5

wmin = 0.1
wmax = 4.0
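
The formula $\sigma/\sqrt{1-\rho^2}$ used above is the standard deviation of the stationary distribution of a zero-mean AR(1) process. The following numpy sketch checks it by simulation; the number of paths, the horizon and the seed are arbitrary choices for this example, and the values of `rho` and `sigma` simply copy $\rho_p$ and $\sigma_p$ from the calibration.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, sigma = 0.9, 0.001                  # same values as ρ_p and σ_p above
n_paths, T = 10_000, 200

x = np.zeros(n_paths)                    # every path starts at the mean
for _ in range(T):
    x = rho * x + sigma * rng.standard_normal(n_paths)

sigma_ergodic = sigma / (1 - rho**2) ** 0.5
print(x.std(), sigma_ergodic)            # the two numbers should be close
```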

## The Decision Rule

Since the model is time-homogeneous, we look for a decision rule $\left( \begin{matrix} \zeta_t\\ h_t \end{matrix} \right) = \varphi(s_t)$, where $s_t=(r_t, \delta_t, q_t, p_t, w_t)$ is the 5-dimensional state and $\varphi$ is a function to be determined. We approximate the true $\varphi$ by searching within a family of functions $\varphi(\cdot;\theta)$ parameterized by $\theta$.


In our application, this family is determined by the topology of a neural network, which can easily be built with Keras.

In [None]:
# construction of d.r.

lrelu = tf.keras.layers.LeakyReLU(alpha=0.1)
layers = [
    tf.keras.layers.Dense(32, activation='relu', input_dim=5, bias_initializer='he_uniform'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(2, activation=tf.keras.activations.linear)
]

perceptron = tf.keras.Sequential(layers)
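
To make the shape of this network concrete, here is a plain numpy sketch of a forward pass through a 5-32-32-32-2 multilayer perceptron with ReLU hidden layers and a linear output. It is only an illustration: the random weights stand in for the Keras initializers, and the names `sizes`, `params` and `forward` are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [5, 32, 32, 32, 2]               # input, three hidden layers, output

# one (weights, bias) pair per dense layer
params = [(rng.normal(size=(m, n)) / np.sqrt(m), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

def forward(s):
    """Map an N x 5 batch of states to an N x 2 batch of raw outputs."""
    x = s
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)   # ReLU hidden layers
    W, b = params[-1]
    return x @ W + b                     # linear output layer

n_params = sum(W.size + b.size for W, b in params)
out = forward(rng.normal(size=(10, 5)))
print(n_params, out.shape)
```

Counting weights and biases layer by layer gives 192 + 1056 + 1056 + 66 = 2370 trainable parameters, which is the size of $\theta$ for this topology.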

In [None]:
tf.keras.utils.plot_model(perceptron, to_file='model.png', show_shapes=True)

In [None]:
# the input of the neural network is an N×5 matrix, where N is the number of points to be evaluated, i.e. the size of the mini-batch

def dr(r: Vector, δ: Vector, q: Vector, p: Vector, w: Vector)-> Tuple[Vector, Vector]:

    # we normalize the inputs so that they typically lie between -1 and 1
    r = r/σ_e_r/2
    δ = δ/σ_e_δ/2
    q = q/σ_e_q/2
    p = p/σ_e_p/2
    w = (w-wmin)/(wmax-wmin)*2.0-1.0

    # we prepare the input to the perceptron
    s = tf.concat([_e[:,None] for _e in [r,δ,q,p,w]], axis=1) # equivalent to np.column_stack

    x = perceptron(s) # an N×2 matrix

    # the consumption share is always in [0,1]
    ζ = tf.sigmoid( x[:,0] )
    # h approximates an expected marginal utility, which is always positive
    h = tf.exp( x[:,1] )
    return (ζ, h)
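
The decision rule wraps the raw network outputs in transformations that enforce the economic restrictions. The numpy sketch below reproduces these transformations on a few hand-picked values; the helper names `normalize_w` and `sigmoid` and the array `raw` are assumptions for this illustration and are not part of the notebook.

```python
import numpy as np

wmin, wmax = 0.1, 4.0

def normalize_w(w):
    """Affine map sending [wmin, wmax] to [-1, 1]."""
    return (w - wmin) / (wmax - wmin) * 2.0 - 1.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

raw = np.array([-5.0, 0.0, 5.0])         # stand-in for raw network outputs
zeta = sigmoid(raw)                      # lies strictly between 0 and 1
h = np.exp(raw)                          # strictly positive

print(normalize_w(np.array([wmin, wmax])))
print(zeta, h)
```

Whatever values the network produces, the share stays inside the unit interval and `h` stays positive, so no constraint has to be handled by the optimizer.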



In [None]:
# let's plot the initial guess (against w). Note that the coefficients of the perceptron are initialized with random values,
# so that each run will provide a different plot

In [None]:
wvec = np.linspace(wmin, wmax, 100)
svec = [wvec*0]*4 + [wvec]   # list of the five state vectors (r, δ, q, p, w)
# r,p,q,δ are zero-mean
ζvec, hvec = dr(wvec*0, wvec*0, wvec*0, wvec*0, wvec)

In [None]:
plt.plot(wvec, wvec, linestyle='--', color='black')
plt.plot(wvec, wvec*ζvec)
plt.xlabel("w_t")
plt.ylabel("c_t")

We see that, so far, using TensorFlow has not departed significantly from using numpy.

## Construct the objective

By substituting $c_t$ and $c_{t+1}$ it is clear that Euler equation in (ref) depends ...

In [None]:
def euler_residual(r: Vector, δ: Vector, q: Vector, p: Vector, w: Vector):

    # all inputs are expected to have the same size N
    N = tf.size(r)

    # arguments correspond to the values of the states today
    ζ, h = dr(r, δ, q, p, w)
    c = ζ*w

    # transitions of the exogenous processes
    rnext = r*ρ_r + tf.random.normal(shape=(N,), stddev=σ_r)
    δnext = δ*ρ_δ + tf.random.normal(shape=(N,), stddev=σ_δ)
    pnext = p*ρ_p + tf.random.normal(shape=(N,), stddev=σ_p)
    qnext = q*ρ_q + tf.random.normal(shape=(N,), stddev=σ_q)

    # transition of endogenous variables
    wnext = tf.exp(pnext)*tf.exp(qnext) + (w-c)*rbar*tf.exp(rnext)

    ζnext, hnext = dr(rnext, δnext, qnext, pnext, wnext)
    cnext = ζnext*wnext
    
    res1 = β*tf.exp(δnext-δ)*(cnext)**(-γ)*rbar*tf.exp(rnext) - h
    res2 = tf.minimum(h**(-1/γ), w) - c

    return (res1, res2)
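
The second residual compares consumption with `min(h**(-1/γ), w)`. With CRRA utility, marginal utility is $U'(c)=c^{-\gamma}$, so $h^{-1/\gamma}$ is the consumption level whose marginal utility equals $h$, and the `min` caps it at disposable income. A small numpy sketch with made-up values of `h` and `w` (the helper names are hypothetical) may make this concrete:

```python
import numpy as np

gamma = 2.0                              # same value as γ in the calibration

def marginal_utility(c):
    return c ** (-gamma)

def consumption_from_h(h, w):
    """Invert U'(c) = h, then cap consumption at disposable income w."""
    return np.minimum(h ** (-1.0 / gamma), w)

h = np.array([4.0, 0.25])
w = np.array([1.0, 1.0])
c = consumption_from_h(h, w)

print(c)                                 # first agent is unconstrained, second consumes all of w
print(marginal_utility(c))
```

In the first case the cap does not bind and $U'(c)=h$ holds exactly. In the second case the cap binds, consumption equals `w`, and marginal utility stays above `h`.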



In [None]:
def euler(r: Vector, δ: Vector, q: Vector, p: Vector, w: Vector):

    res1_1, res1_2 = euler_residual(r, δ, q, p, w)
    res2_1, res2_2 = euler_residual(r, δ, q, p, w)

    res = res1_1*res2_1 + res1_2*res2_2

    return res
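
The function above evaluates the residuals twice, with independent shocks, and multiplies the results. The reason is that the product of two independent draws is an unbiased estimate of the squared expectation, whereas the square of a single draw also picks up the variance of the shock. The toy numpy check below illustrates this; the function `f`, the constant `a` and the sample size are arbitrary assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000
a = 0.5                                   # true expectation of f

def f(eps):
    return a + eps                        # toy residual with mean a and variance 1

eps1 = rng.standard_normal(N)
eps2 = rng.standard_normal(N)             # drawn independently of eps1

product_estimate = np.mean(f(eps1) * f(eps2))   # targets (E[f])**2 = 0.25
squared_estimate = np.mean(f(eps1) ** 2)        # targets E[f**2] = 1.25

print(product_estimate, squared_estimate)
```

Only the first estimate goes to zero when the expected residual is zero, which is the property the training objective needs.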

In [None]:
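def loss(r: Vector, δ: Vector, q: Vector, p: Vector, w: Vector):
    # `euler` already multiplies two independent draws of each residual,
    # so its mean estimates the sum of squared expected residuals directly
    res = euler(r, δ, q, p, w)
    return (tf.reduce_mean(res))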

In [None]:
def draw_state(N):
    # drawn with TensorFlow ops (not numpy) so that fresh states are sampled
    # on every call, including calls made from inside a `tf.function`
    r = tf.random.normal(shape=(N,), stddev=σ_r)
    δ = tf.random.normal(shape=(N,), stddev=σ_δ)
    q = tf.random.normal(shape=(N,), stddev=σ_q)
    p = tf.random.normal(shape=(N,), stddev=σ_p)
    w = wmin + tf.random.uniform(shape=(N,))*(wmax-wmin)
    return (r,δ,q,p,w)

In [None]:
N = 1024
s = draw_state(N)
v = loss(*s)

It looks like we have been using numpy but the result is a tensor object.

In [None]:
v.numpy()

TODO: concise graph about the computation.
Maybe screenshot of tensorboard.

In [None]:
from tensorflow.keras.optimizers import Adam, SGD

In [None]:
variables = perceptron.trainable_variables
optimizer = Adam()
# optimizer = SGD(0.1)

In [None]:
@tf.function
def train_step(k, κ=0.5):

    s = draw_state(N)

    with tf.GradientTape() as tape:
        xi = loss(*s)

    grads = tape.gradient(xi, variables)
    optimizer.apply_gradients(zip(grads,variables))

    return xi
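
The training step relies on the Keras `Adam` optimizer. As a reminder of what such an update does, here is a plain numpy version of the Adam iteration applied to a toy quadratic. The objective, the larger step size `lr=0.1` and the helper name `adam_minimize` are assumptions for this example and are unrelated to the model.

```python
import numpy as np

def adam_minimize(grad, theta0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-7, steps=2000):
    """Plain Adam iteration on a scalar or array parameter."""
    theta = np.asarray(theta0, dtype=float)
    m = np.zeros_like(theta)              # running mean of gradients
    v = np.zeros_like(theta)              # running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)        # bias corrections
        v_hat = v / (1 - beta2**t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# toy objective (theta - 3)**2, whose gradient is 2 * (theta - 3)
theta = adam_minimize(lambda th: 2.0 * (th - 3.0), theta0=0.0)
print(theta)
```

In the notebook, the gradient comes from `tf.GradientTape` and the parameter vector is the list of perceptron weights, but the update has the same structure.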



In [None]:

def train_me(K):

    drs = []
    vals = []
    for k in tqdm(tf.range(K)):
        val = train_step(k)
        vals.append(val.numpy())
    return vals, drs

In [None]:
# with writer.as_default():
res, drs = train_me(50000)

In [None]:
plt.plot(np.sqrt( res) )
plt.xscale('log')
plt.yscale('log')
plt.grid()

In [None]:

wvec = np.linspace(wmin, wmax, 100)
ζvec, hvec = dr(wvec*0, wvec*0, wvec*0, wvec*0, wvec)

plt.plot(wvec, wvec, linestyle='--', color='black')
plt.plot(wvec, wvec*ζvec)
plt.xlabel("w_t")
plt.ylabel("c_t")