# Computational Methods in Economics

## Lecture 4 - Root Finding

In [1]:
# Author: Alex Schmitt (schmitt@ifo.de)

import datetime
print('Last update: ' + str(datetime.datetime.today()))

Last update: 2017-11-03 17:42:01.221204


### Preliminaries

#### Import Modules

In [2]:
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn

import scipy.optimize

# import sys
from importlib import reload

## Introduction

A function $f(x)$ has a *root* (also called a *zero*) at $x^*$ if $f(x^*) = 0$. Note that $f$ here can be a univariate function (both input and output are scalars, i.e. both its range and its domain have a dimension of 1), a multivariate function (its inputs are vectors, hence its domain has a dimension greater than 1) or a vector-valued function (both its range and its domain have a dimension greater than 1). In the latter case, finding the roots of a vector-valued function is equivalent to *solving a system of nonlinear equations*.

Finding the root(s) of a function is one of the most common computational problems in economics, often applied when looking for an equilibrium. In other words, an equilibrium is usually defined by a set of equations that must hold simultaneously, and computing it amounts to finding a root of the corresponding system.


### Example: Neoclassical Growth Model


- Utility function:

\begin{equation}
    u(c, h) = \frac{c^{1-\nu}}{1-\nu} - B \frac{h^{1+\eta}}{1+\eta}
\end{equation}

with $c$ denoting consumption and $h$ labor supply.

- Production function:

\begin{equation}
    f(k, h) = A k^\alpha h^{1-\alpha}
\end{equation}
with $k$ denoting the capital stock, and $A$ a productivity parameter.

- Resource Constraint:

\begin{equation}
    k_{t+1} + c_t = f(k_t, h_t) + (1 - \delta) k_t = A k_t^\alpha h_t^{1-\alpha} + (1 - \delta) k_t
\end{equation}

- Planner's Problem:

\begin{equation}
    \max_{\left\{c_t, k_{t+1}, h_t\right\}} \sum^\infty_{t = 0} \beta^t u(c_t, h_t) 
\end{equation}
s.t. the resource constraint.

- F.o.c's:
(1) Euler equation

\begin{equation}
    c^{-\nu} = \beta \left[ (c')^{-\nu} (f_k(k', h') + 1 - \delta) \right]    
\end{equation}

(2) intratemporal optimality condition

\begin{equation}
    B h^{\eta} = c^{-\nu} f_h(k, h)  
\end{equation}

where I have used the notation $c = c_t$ and $c' = c_{t+1}$ for brevity. 

In an equilibrium, the two first-order conditions, combined with the resource constraint, must hold in every period. We will get to how to solve for the full dynamic allocation later in this course. For now, let's consider the *steady state*, where all variables are constant over time, i.e. $c_t = c_{t+1} = c_s$ and so on. The Euler equation can then be simplified to:

\begin{equation}
    1 = \beta \left[f_k(k_s, h_s) + 1 - \delta \right]    
\end{equation}

For the intratemporal optimality condition, use the resource constraint to substitute consumption:

\begin{equation}
    B h_s^{\eta} = \left[ f(k_s, h_s) - \delta k_s \right]^{-\nu} f_h(k_s, h_s)  
\end{equation}

This is a nonlinear system of two equations in two unknowns, $k_s$ and $h_s$, which can be solved using the methods introduced below. We can also define a vector-valued function $\mathbf{S}$ with

\begin{equation}
   \mathbf{S}(k, h) = 
    \left[
    \begin{array}{c}
        \beta \left[f_k(k, h) + 1 - \delta \right]  - 1 \\
        \left[ f(k, h) - \delta k \right]^{-\nu} f_h(k, h) - B h^{\eta}
    \end{array}
    \right]
\end{equation}

Finding the steady state of the model then requires finding a root of function $\mathbf{S}$, i.e. a vector $(k_s, h_s)$ such that 

\begin{equation}
   \mathbf{S}(k_s, h_s) = 
    \left[
    \begin{array}{c}
        0 \\
        0
    \end{array}
    \right]
\end{equation}

--------------------------------------------------------------------------------------------------------------------------------

## Bisection

The simplest way to compute the root of a univariate real-valued function is the *bisection method*. While simple, bisection captures two important features of most root-finding and optimization methods: it is a *local* method and it is based on an *iterative procedure*.

The key idea behind the bisection method is based on the *Intermediate Value Theorem*: if $f$ is continuous and defined on the interval $[a,b]$, and if $f(a)$ and $f(b)$ are distinct values, then $f$ must assume all values in between. Since we are interested in where $f$ assumes the value 0, we need $f(a)$ and $f(b)$ to have different signs.

The bisection method implements the following "pseudo-code":

(i) Start with two values $a$ and $b$ such that $f(a)$ and $f(b)$ are defined and have different signs. Moreover, specify a "tolerance level" $tol$ which should be a very small number, e.g. 1e-8.

(ii) Compute the midpoint between $a$ and $b$, $x = \frac{a + b}{2}$. 

(iii) If $f(x)$ has the same sign as $f(a)$, replace the left endpoint of the interval with $x$, i.e. $a = x$.

(iv) If $f(x)$ has the same sign as $f(b)$, replace the right endpoint of the interval with $x$, i.e. $b = x$.

(v) Repeat from (ii) until the absolute value of $f(x)$ is less than $tol$, i.e. $|f(x)| < tol$.

Note the following:
- Bisection is an *iterative procedure*: at the beginning of each iteration step, the interval $[a,b]$ contains a root of $f$. The interval is then divided ("bisected") into two subintervals of equal length. For one of the two subintervals, $f$ must take values of different signs at the endpoints, and hence this subinterval contains a root. It is taken as the interval $[a,b]$ used for the next iteration. This process continues until the function value $f(x)$ at the midpoint $x$ of the current interval is sufficiently close to 0.
- Moreover, bisection is a *local* method: it will not give you all the roots of a function, but only one of the roots (in case there are multiple roots) between $a$ and $b$. A corollary of this is that the outcome of bisection (and of local methods in general) is sensitive to the starting point chosen by the user, here the values for $a$ and $b$.

In this week's problem set, you will be asked to code up the bisection method. Of course, most programming languages already have built-in implementations (e.g. in SciPy: **scipy.optimize.bisect**, as discussed below), so writing your own function may seem a bit redundant, but it will help you get used to the inner workings of many of the algorithms used in scientific computing.
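
As a starting point for that exercise, steps (i) to (v) above can be sketched as follows. This is a minimal illustration rather than a reference implementation: the name `bisection`, the iteration cap `maxit` and the error handling are assumptions made here, not part of the lecture or of SciPy.

```python
import numpy as np

def bisection(f, a, b, tol=1e-8, maxit=200):
    """
    Sketch of bisection steps (i)-(v); the name, defaults and error handling are illustrative assumptions.
    """
    fa, fb = f(a), f(b)
    # step (i): the function values at the endpoints must differ in sign
    if np.sign(fa) * np.sign(fb) > 0:
        raise ValueError("f(a) and f(b) must have different signs")

    for it in range(maxit):
        x = (a + b) / 2              # step (ii): midpoint of the current interval
        fx = f(x)
        if abs(fx) < tol:            # step (v): stop once |f(x)| < tol
            return x
        if np.sign(fx) == np.sign(fa):
            a, fa = x, fx            # step (iii): replace the left endpoint
        else:
            b, fb = x, fx            # step (iv): replace the right endpoint

    raise RuntimeError("no convergence within maxit iterations")

# usage: the root of 4*ln(x) - 4 on [1, 4] should be close to e
root = bisection(lambda x: 4*np.log(x) - 4, 1, 4)
print(root)
```

The iteration cap is a safeguard only; for a continuous function and a valid bracket, the interval halves in every step, so the loop normally ends well before it is reached.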

--------------------------------------------------------------------------------------------------------------------------------

## Function Iteration

We started to talk about iterative methods at the end of the last lecture. To recap, the basic idea of iterative methods is to generate a sequence of approximations to the object of interest, e.g. the solution to a linear or nonlinear system of equations, following an iteration rule:

\begin{equation}
    x^{(k+1)} = g( x^{(k)} ),
\end{equation}

where $k$ is an index counting the number of iterations. In words, the value for $x$ in iteration $k+1$ is obtained by applying the function $g$ to the value for $x$ in iteration $k$. Ideally, these approximations become more and more precise with an increasing number of iterations. Recall that iterative methods, in contrast to direct methods, do not yield an exact solution.


When finding the root of a function $f$ or solving for a system of nonlinear equations, the functional form of $g$ is simply

\begin{equation}
    g( x ) = x - f(x).
\end{equation}

This is intuitive: at the root $x = x^*$, we have $f(x^*) = 0$ and hence $g(x^*) = x^*$. In other words, $x^*$ is a *fixed point* of $g$.

The following piece of code implements function iteration. For illustration, we print the current guess for $x^{(k)}$ for each iteration. As we can see,  $x^{(k)}$ converges to $x^*$ as the number of iterations increases.

In [9]:
def fun(x):
    return 4*np.log(x) - 4

def g(x):
    return x - fun(x)

In [10]:
eps = 1e-8
x = 4
it = 0
lst = []
while abs((x - g(x))) > eps:
    it += 1
    x = g(x)
    lst.append(x)
    print(x)
    

print("Number of iterations = {}".format(it) )

2.45482255552
2.86260463623
2.65567694682
2.74887857583
2.70410642422
2.72502036232
2.71511676033
2.7197769279
2.71757746733
2.71861408156
2.7181251951
2.71835569051
2.71824700267
2.71829824977
2.71827408559
2.71828547937
2.71828010699
2.71828264016
2.71828144573
2.71828200892
2.71828174337
2.71828186858
2.71828180954
2.71828183738
2.71828182425
Number of iterations = 25
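
Why does the iteration converge here? A standard sufficient condition for local convergence of $x^{(k+1)} = g(x^{(k)})$ is $|g'(x^*)| < 1$. For this example $g'(x) = 1 - 4/x$, so $g'(e) = 1 - 4/e \approx -0.47$: close to the root the error shrinks by roughly half per step and flips sign, which matches the oscillating sequence printed above. The following sketch checks this numerically; the variable names `slope`, `errors` and `ratios` are illustrative.

```python
import numpy as np

def fun(x):
    return 4*np.log(x) - 4

def g(x):
    return x - fun(x)

# slope of g at the fixed point x* = e, using g'(x) = 1 - 4/x
slope = 1 - 4/np.e

# run a few iterations and record the error x - e after each step
x = 4.0
errors = []
for _ in range(15):
    x = g(x)
    errors.append(x - np.e)

# the ratio of successive errors should approach g'(x*)
ratios = [errors[i+1] / errors[i] for i in range(len(errors) - 1)]
print(slope, ratios[-1])
```

Function iteration is therefore not guaranteed to work for every $f$: if $|g'(x^*)| > 1$, the same scheme moves away from the root.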


--------------------------------------------------------------------------------------------------------------------------------

## Newton's Method

Most algorithms used in practice to find the roots of a nonlinear system of equations are based on Newton's method. Like function iteration, it is an iterative method.

In [21]:
def fd(x):
    return 4/x
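
The derivative `fd` above is what a Newton step for $f(x) = 4\ln(x) - 4$ needs. As a minimal sketch, assuming the textbook update $x^{(k+1)} = x^{(k)} - f(x^{(k)}) / f'(x^{(k)})$, the iteration could look as follows. The function name `newton_sketch`, the tolerance and the iteration cap are illustrative choices, not the lecture's implementation.

```python
import numpy as np

def fun(x):
    return 4*np.log(x) - 4

def fd(x):
    # derivative of fun
    return 4/x

def newton_sketch(f, fprime, x0, tol=1e-8, maxit=50):
    """
    Textbook Newton iteration for a univariate function (illustrative sketch).
    Returns the approximate root and the number of iterations used.
    """
    x = x0
    for it in range(1, maxit + 1):
        step = f(x) / fprime(x)   # Newton step f(x) / f'(x)
        x = x - step
        if abs(step) < tol:       # stop once the update is negligible
            return x, it
    raise RuntimeError("no convergence within maxit iterations")

x_star, n_it = newton_sketch(fun, fd, 4.0)
print(x_star, n_it)
```

Starting from the same initial value $x = 4$ as in the function iteration example, this should need noticeably fewer steps, at the cost of requiring the derivative.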

--------------------------------------------------------------------------------------------------------------------------------

## Quasi-Newton Methods

--------------------------------------------------------------------------------------------------------------------------------

## The Scipy Package

As a simple example, consider the function 

\begin{equation}
    f(x) = 4 \ln(x) - 4
\end{equation}

$f$ has a root at $x = e \approx 2.718282$. To find it numerically, the first thing we need to do is to import SciPy's subpackage *optimize*. We then define the function and use the **bisect()** function, an implementation of the bisection method outlined above.

In [22]:
def fun(x):
    return 4*np.log(x) - 4

print(scipy.optimize.bisect(fun,1,4))

**bisect(fun,a,b)** takes three arguments: the function, and a lower and an upper bound for the root. In other words, you tell the algorithm to look for a root in the interval $[a,b]$. The important thing to note here is that $f(a)$ and $f(b)$ must have different signs. If they do not, you will get an error message (in this case, change $a$ or $b$ and try again).
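
To illustrate the sign requirement, the following sketch calls `bisect` once with an interval that does not bracket the root and once with one that does. The flag `bracket_ok` is just an illustrative name.

```python
import numpy as np
import scipy.optimize

def fun(x):
    return 4*np.log(x) - 4

# fun(3) and fun(4) are both positive, so [3, 4] does not bracket the root
try:
    scipy.optimize.bisect(fun, 3, 4)
    bracket_ok = True
except ValueError as err:
    bracket_ok = False
    print("bisect failed:", err)

# fun(1) < 0 < fun(4), so [1, 4] is a valid bracket
root = scipy.optimize.bisect(fun, 1, 4)
print(root)
```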

In the example above, solving for the root using Python is not really necessary. The real advantage of numerical root finding is in situations where finding a solution to $f(x) = 0$ analytically is not feasible. Consider, for example,

\begin{equation}
    f(x) = \sin(4 (x - 1/4)) + x + x^{20} - 1
\end{equation}

Finding a root via the bisection method is straightforward:

In [3]:
def fun(x):
    return np.sin(4 * (x - 0.25)) + x + x**20 - 1

print(scipy.optimize.bisect(fun,0,2))

0.4082935042806639


In [27]:
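f = lambda x: 4*np.log(x) - 4   # assumption: f is the log example from above (its definition is not shown in these notes)
%timeit scipy.optimize.bisect(f,1,4)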

10000 loops, best of 3: 58.7 µs per loop


In [29]:
%timeit scipy.optimize.fsolve(f,1)

The slowest run took 4.14 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 57.7 µs per loop


In [30]:
%timeit scipy.optimize.fsolve(f,4)

10000 loops, best of 3: 53.9 µs per loop


In [32]:
%timeit scipy.optimize.root(f,1)

The slowest run took 4.42 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 59.5 µs per loop
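
The timing cells above call three different SciPy routines, which differ in what they take and return. `bisect` needs a bracketing interval and returns a float, whereas `fsolve` and `root` take a single starting value; `fsolve` returns an array, and `root` returns a result object whose `x` attribute holds the solution. A short sketch, assuming the function being timed is the log example (the name `f_log` is chosen here for illustration):

```python
import numpy as np
import scipy.optimize

def f_log(x):
    return 4*np.log(x) - 4

x_bisect = scipy.optimize.bisect(f_log, 1, 4)   # float, needs a bracket [a, b]
x_fsolve = scipy.optimize.fsolve(f_log, 1)      # array, needs a starting value
res = scipy.optimize.root(f_log, 1)             # result object with fields x, fun, success, ...

print(x_bisect, x_fsolve, res.x, res.success)
```

Unlike `bisect`, the latter two also accept vector-valued functions, which is what the steady-state example below relies on.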


### Example: NGM Revisited

#### Parameters

In [13]:
## utility
beta = 0.8      # discount factor
nu = 2       # risk-aversion coefficient for consumption
eta = 1         # elasticity parameter for labor supply
eps = 1e-6      # lower bound of consumption and labor supply
## production
alpha = 0.25
delta = 0
## derived
A = (1 - beta * (1 - delta))/(alpha*beta) # normalization parameter for production function => steady state k = 1
B = (1 - alpha) * A * (A - delta)**(-nu)   # parameter for utility function => steady state h = 1

#### Functions

In [14]:
def cobb_douglas(x, alpha, A):
    """
    Evaluates the Cobb-Douglas function with coefficient alpha and shift parameter A, for two inputs (x)
    """
    return A * x[0]**alpha * x[1]**(1 - alpha)

def cd_diff(x, alpha, A):
    """
    Evaluates the first partial derivatives of the Cobb-Douglas function with coefficient alpha and shift parameter A, for two inputs (x)
    """
    # cobb_douglas() already includes the factor A, so it must not be multiplied in again
    return (alpha * cobb_douglas(x, alpha, A) / x[0], (1 - alpha) * cobb_douglas(x, alpha, A) / x[1])

def steady(x):
    """
    Returns the residuals of the steady-state conditions 
    """
    y = np.zeros(2)
    mp = cd_diff(x, alpha, A)
    
    y[0] = beta * (mp[0] + 1 - delta) - 1
    y[1] = (cobb_douglas(x, alpha, A) - delta * x[0])**(-nu) * mp[1] - B * x[1]**eta
    
    return y

#### Solve for the steady state

In [16]:
x0 = np.array([0.5, 0.5])
scipy.optimize.root(steady, x0)

    fjac: array([[-0.22330597, -0.9747484 ],
       [ 0.9747484 , -0.22330597]])
     fun: array([  1.16562315e-11,   8.72503181e-11])
 message: 'The solution converged.'
    nfev: 11
     qtf: array([ -9.08976505e-09,  -2.05845083e-09])
       r: array([ 1.34344783,  0.8497232 ,  0.5024392 ])
  status: 1
 success: True
       x: array([ 1.,  1.])
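
The solver returns $k_s = h_s = 1$, which is what the normalization through `A` and `B` is meant to deliver. As a sanity check, one can evaluate $\mathbf{S}(1, 1)$ directly, without a solver. The sketch below is self-contained; it writes the marginal products as $f_k = \alpha f / k$ and $f_h = (1 - \alpha) f / h$ and sets $B = (1 - \alpha) A (A - \delta)^{-\nu}$, the value implied by setting the second row of $\mathbf{S}$ to zero at $k = h = 1$. The helper name `residuals_at_one` and the alternative value $\delta = 0.1$ are hypothetical choices for illustration.

```python
import numpy as np

def residuals_at_one(beta, nu, eta, alpha, delta):
    """
    Evaluates the steady-state residuals S(k, h) at k = h = 1 (illustrative helper).
    """
    # normalization parameters, chosen so that k = h = 1 solves S(k, h) = 0
    A = (1 - beta*(1 - delta)) / (alpha*beta)
    B = (1 - alpha) * A * (A - delta)**(-nu)

    k, h = 1.0, 1.0
    f = A * k**alpha * h**(1 - alpha)   # output
    f_k = alpha * f / k                 # marginal product of capital
    f_h = (1 - alpha) * f / h           # marginal product of labor

    return np.array([
        beta*(f_k + 1 - delta) - 1,                    # Euler equation
        (f - delta*k)**(-nu) * f_h - B * h**eta,       # intratemporal condition
    ])

# parameter values from the lecture (delta = 0), and a hypothetical delta = 0.1
print(residuals_at_one(beta=0.8, nu=2, eta=1, alpha=0.25, delta=0))
print(residuals_at_one(beta=0.8, nu=2, eta=1, alpha=0.25, delta=0.1))
```

Both residual vectors should be zero up to floating-point error, which suggests that the normalization does not depend on $\delta = 0$.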