In [None]:
# === Environment Setup ===
import os, sys, math, time, random, json, textwrap, warnings
import numpy as np, pandas as pd, matplotlib.pyplot as plt
from numpy.polynomial import chebyshev
from scipy.optimize import minimize_scalar, brentq
from scipy.interpolate import interp1d
from scipy.stats import norm
from numba import njit, prange
from IPython.display import Image, Markdown, display

# --- Configuration ---
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams.update({'figure.dpi': 130, 'font.size': 12, 'axes.titlesize': 'x-large',
    'axes.labelsize': 'large', 'xtick.labelsize': 'medium', 'ytick.labelsize': 'medium'})
np.set_printoptions(suppress=True, linewidth=120, precision=6)

# --- Utility Functions ---
def note(msg, **kwargs):
    display(Markdown(f"<div class='alert alert-info'>📝 {textwrap.fill(msg, width=100)}</div>"))
def sec(title):
    print(f"\n{100*'='}\n| {title.upper()} |\n{100*'='}")

note("Environment initialized.")

# Part 3: Dynamic Models
## Chapter 3.2: DP with Continuous States via Function Approximation

### Table of Contents
1.  [The Method of Function Approximation](#1.-The-Method-of-Function-Approximation)
    *   [1.1 Projection Methods](#1.1-Projection-Methods)
    *   [1.2 Basis Functions: Why Chebyshev?](#1.2-Basis-Functions:-Why-Chebyshev?)
2.  [Solving the Stochastic Growth Model](#2.-Solving-the-Stochastic-Growth-Model)
    *   [2.1 Method 1: VFI on Coefficients](#2.1-Method-1:-VFI-on-Coefficients)
    *   [2.2 Method 2: Iterating on the Euler Equation Residuals](#2.2-Method-2:-Iterating-on-the-Euler-Equation-Residuals)
    *   [2.3 Method 3: The Endogenous Grid Method (EGM)](#2.3-Method-3:-The-Endogenous-Grid-Method-(EGM))
3.  [Performance Comparison and Visualization](#3.-Performance-Comparison-and-Visualization)
4.  [Chapter Summary](#4.-Chapter-Summary)
5.  [Exercises](#5.-Exercises)

### Introduction: From Grids to Functions

The dynamic programming methods we studied in the previous chapter all relied on **discretizing the state space**. We represented a continuous state variable with a finite grid of points and solved for the value/policy functions at these points, using interpolation to fill in the gaps. This chapter introduces a more advanced and scalable solution: **function approximation**.

Instead of finding the value of the function at every grid point, we approximate the entire value function (or policy function) using a flexible, parameterized functional form, such as a linear combination of **Chebyshev polynomials**. The goal is to find the set of parameters that makes our approximation as close as possible to the true, unknown function. This chapter will introduce three powerful methods for doing so:
1.  **Value Function Iteration on Coefficients**: The most direct extension of VFI to a continuous setting.
2.  **Euler Equation Residual Iteration**: A more accurate method that directly targets the model's first-order condition.
3.  **The Endogenous Grid Method (EGM)**: A highly efficient algorithm that exploits the structure of the Euler equation to dramatically speed up computation.
> **Historical Context: Christopher Carroll and the EGM**
> The Endogenous Grid Method was introduced by Christopher Carroll in a paper published in 2006. The method's key innovation was to reverse the standard logic of dynamic programming. Instead of fixing an exogenous grid of states and finding the optimal choices, EGM fixes a grid of end-of-period choices and finds the endogenous states that would lead to those choices. This seemingly simple change dramatically speeds up the solution of a large class of consumption-savings models, making it a cornerstone of modern computational economics.

### 1. The Method of Function Approximation
The core idea is to approximate a true, unknown function $f(s)$ with a parameterized function $\hat{f}(s; \boldsymbol{\theta})$, where $\boldsymbol{\theta} = [\theta_0, \dots, \theta_{N-1}]$ is a vector of coefficients for a set of $N$ basis functions, $\phi_i(s)$:
$$ f(s) \approx \hat{f}(s; \boldsymbol{\theta}) = \sum_{i=0}^{N-1} \theta_i \phi_i(s) $$
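
As a minimal sketch of this representation, the snippet below builds the basis matrix for Chebyshev polynomials with NumPy and evaluates $\hat{f}(s; \boldsymbol{\theta})$ as a matrix-vector product. The coefficient values and the name `theta_demo` are purely illustrative and are not taken from the model.

```python
import numpy as np
from numpy.polynomial import chebyshev

# Illustrative coefficients (an assumption): f_hat(s) = 1*T_0(s) + 0.5*T_1(s) - 0.25*T_2(s)
theta_demo = np.array([1.0, 0.5, -0.25])
s = np.linspace(-1.0, 1.0, 5)

# Column i of the basis matrix holds phi_i(s) = T_i(s)
Phi = chebyshev.chebvander(s, len(theta_demo) - 1)
f_hat = Phi @ theta_demo

# chebval evaluates the same series directly from the coefficients
f_hat_direct = chebyshev.chebval(s, theta_demo)
```

Writing the approximation as `Phi @ theta` makes explicit that it is linear in the coefficients, which is what makes the fitting steps later in the chapter cheap.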

#### 1.1 Projection Methods
We need a way to choose the coefficients $\boldsymbol{\theta}$ to make the approximation "good." **Projection methods** do this by requiring that the residual of the relevant functional equation (e.g., the Bellman equation or the Euler equation) is "close to zero." A common way to enforce this is **collocation**, where we demand that the functional equation holds exactly at a set of $N$ specific *collocation nodes*. This gives us a system of $N$ equations in $N$ unknown coefficients, which we can solve.

This approach transforms an infinite-dimensional problem (finding a function) into a finite-dimensional one (finding the coefficient vector $\boldsymbol{\theta}^*$).
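
To make collocation concrete, here is a small sketch on a toy functional equation, $f(x) = x + 0.5\, f(x/2)$ on $[-1, 1]$, whose exact solution is $f(x) = 4x/3$. The equation is an assumption chosen for illustration (it is not part of the growth model), but it has the same "function equals something plus a discounted transformation of itself" shape as a Bellman equation. Because the residual is linear in $\boldsymbol{\theta}$, the $N$ collocation conditions form a linear system.

```python
import numpy as np
from numpy.polynomial import chebyshev

N = 6
nodes = chebyshev.chebpts1(N)   # N collocation nodes in [-1, 1]

# Residual of the toy equation at node x_j, linear in theta:
#   sum_i theta_i * [T_i(x_j) - 0.5 * T_i(x_j / 2)] - x_j = 0
A = chebyshev.chebvander(nodes, N - 1) - 0.5 * chebyshev.chebvander(nodes / 2, N - 1)
theta = np.linalg.solve(A, nodes)

# Compare the collocation solution with the exact solution 4x/3 off the nodes
x_test = np.linspace(-1, 1, 11)
f_hat = chebyshev.chebval(x_test, theta)
max_err = np.max(np.abs(f_hat - 4 * x_test / 3))
```

In the growth model the residual is nonlinear in the coefficients, so the linear solve is replaced by an iteration, but the structure (one equation per node, one unknown per coefficient) is the same.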

#### 1.2 Basis Functions: Why Chebyshev?
While we could use standard monomials ($1, x, x^2, ...$) as our basis functions, they are a poor choice: they are not orthogonal, and on a bounded interval the higher powers become nearly collinear, so solving for the coefficients is an ill-conditioned problem (the analogue of multicollinearity in regression). **Chebyshev polynomials** are a sequence of polynomials on the interval `[-1, 1]` that are orthogonal with respect to the weight $1/\sqrt{1-x^2}$ and are much better behaved. Approximating a smooth function by fitting a polynomial at the **Chebyshev nodes** is accurate and numerically stable, and it avoids the Runge phenomenon.
> **The Runge Phenomenon**
> The Runge phenomenon, named after the German mathematician Carl Runge, describes the problem of oscillation at the edges of an interval when using polynomial interpolation with high-degree polynomials and equally spaced nodes. As the degree of the polynomial increases, the approximation can become worse, with large oscillations near the endpoints. Chebyshev nodes, which are more densely clustered near the endpoints of the interval, are specifically chosen to mitigate this problem, ensuring a more stable and accurate approximation.
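
The effect is easy to reproduce. The sketch below interpolates the classic test function $1/(1+25x^2)$ with a degree-20 polynomial, once through equally spaced nodes and once through Chebyshev nodes, and reports the maximum error on a fine grid. The choice of test function and of 21 nodes is an assumption made for illustration.

```python
import numpy as np
from numpy.polynomial import chebyshev

def runge(x):
    return 1.0 / (1.0 + 25.0 * x**2)

n = 21                               # number of nodes, so the polynomial has degree 20
x_fine = np.linspace(-1, 1, 2001)

def max_interp_error(nodes):
    # With n nodes and degree n - 1, the least-squares fit passes through every node
    coef = chebyshev.chebfit(nodes, runge(nodes), n - 1)
    return np.max(np.abs(chebyshev.chebval(x_fine, coef) - runge(x_fine)))

err_equispaced = max_interp_error(np.linspace(-1, 1, n))
err_chebyshev = max_interp_error(chebyshev.chebpts1(n))
print(f"equispaced nodes: {err_equispaced:.3g}, Chebyshev nodes: {err_chebyshev:.3g}")
```

Both fits use the same polynomial degree and the same basis; only the node placement differs, which isolates the effect described above.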

### 2. Solving the Stochastic Growth Model
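We return to the stochastic growth model, written here as a consumption-savings problem. The state is $(a, y)$, where $a$ is beginning-of-period assets and $y$ is stochastic income; $r$ is the interest rate, $\beta$ the discount factor, and $a'$ next period's assets. The Bellman equation is: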
$$ V(a, y) = \max_{0 < a' \le (1+r)a+y} \left\{ u((1+r)a+y-a') + \beta E[V(a', y')|y] \right\} $$
The corresponding Euler equation is:
$$ u'(c(a,y)) = \beta (1+r) E[u'(c(a',y'))|y] $$
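
For the CRRA utility used in the code below, $u(c) = c^{1-\gamma}/(1-\gamma)$, marginal utility and its inverse have closed forms, so the Euler equation can be solved for current consumption directly. The last line is the standard inequality form that applies when the lower bound on $a'$ binds; it is stated here as a reminder rather than derived.

$$ u'(c) = c^{-\gamma}, \qquad (u')^{-1}(x) = x^{-1/\gamma} $$
$$ c(a,y) = \Big( \beta (1+r)\, E\big[ c(a',y')^{-\gamma} \,\big|\, y \big] \Big)^{-1/\gamma} $$
$$ u'(c(a,y)) \ge \beta (1+r) E[u'(c(a',y'))|y] \quad \text{when the lower bound on } a' \text{ binds} $$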

In [None]:
sec("Model Setup")
# --- Model Primitives & Parameters ---
PARAMS = {
    'BETA': 0.96,
    'GAMMA': 2.0,
    'R_INTEREST': 0.03,
    'A_MIN': 1e-3,
    'A_MAX': 50.0,
    'N_DEGREE': 25, # Number of Chebyshev nodes/coefficients (polynomial degree is N_DEGREE - 1)
    'N_Y_STATES': 7, # Number of income states
    'RHO_Y': 0.95, # Persistence of income shock
    'SIGMA_Y': 0.1, # Std. dev. of income shock
}

@njit
def u(c, gamma=PARAMS['GAMMA']): return (c**(1 - gamma)) / (1 - gamma) if gamma != 1 else np.log(c)
@njit
def u_prime(c, gamma=PARAMS['GAMMA']): return c**(-gamma)
@njit
def u_prime_inv(x, gamma=PARAMS['GAMMA']): return x**(-1/gamma)

# --- Discretize income process ---
@njit
def norm_cdf_numba(x):
    'Cumulative distribution function for the standard normal distribution'
    return (1.0 + math.erf(x / np.sqrt(2.0))) / 2.0

@njit
def tauchen(rho, sigma_e, n, m):
    """Discretizes an AR(1) process using the Tauchen (1986) method."""
    z_max = m * sigma_e / np.sqrt(1 - rho**2)
    z_grid = np.linspace(-z_max, z_max, n)
    step = (z_grid[1] - z_grid[0]) / 2
    P = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            upper = norm_cdf_numba((z_grid[j] + step - rho * z_grid[i]) / sigma_e)
            lower = norm_cdf_numba((z_grid[j] - step - rho * z_grid[i]) / sigma_e)
            if j == 0:
                P[i, j] = upper          # lowest state absorbs the left tail
            elif j == n - 1:
                P[i, j] = 1.0 - lower    # highest state absorbs the right tail
            else:
                P[i, j] = upper - lower
    return z_grid, P                     # each row of P now sums to one

Z_GRID, P_TRANS = tauchen(rho=PARAMS['RHO_Y'], sigma_e=PARAMS['SIGMA_Y'], n=PARAMS['N_Y_STATES'], m=3)
Y_STATES = np.exp(Z_GRID)

# --- Chebyshev Approximation Setup ---
cheb_nodes = chebyshev.chebpts1(PARAMS['N_DEGREE'])
A_NODES = (cheb_nodes + 1) * (PARAMS['A_MAX'] - PARAMS['A_MIN']) / 2 + PARAMS['A_MIN']

#### 2.1 Method 1: VFI on Coefficients
Iterate on the coefficients of the value function's Chebyshev approximation.

In [None]:
def solve_vfi_chebyshev(params, y_states, p_trans, a_nodes, tol=1e-6, max_iter=1000):
    """Solves the growth model using VFI on Chebyshev coefficients."""
    beta, r, a_min, a_max = params['BETA'], params['R_INTEREST'], params['A_MIN'], params['A_MAX']
    n_degree, n_y_states = params['N_DEGREE'], params['N_Y_STATES']
    domain = [a_min, a_max]
    theta_guess = np.zeros((n_y_states, n_degree))

    for it in range(max_iter):
        # Value function at the nodes for every income state, then its conditional expectation
        V_at_nodes = np.array([chebyshev.Chebyshev(theta, domain=domain)(a_nodes) for theta in theta_guess])
        EV = p_trans @ V_at_nodes

        V_target_at_nodes = np.empty((n_y_states, n_degree))
        for i in range(n_y_states):
            for j in range(n_degree):
                coh = (1 + r) * a_nodes[j] + y_states[i]  # cash-on-hand at this node
                def objective(a_prime):
                    c = coh - a_prime
                    if c <= 0: return 1e12
                    # Continuation value: linear interpolation of EV between the nodes
                    return -(u(c) + beta * np.interp(a_prime, a_nodes, EV[i, :]))
                res = minimize_scalar(objective, bounds=(a_min, min(coh - 1e-6, a_max)), method='bounded')
                V_target_at_nodes[i, j] = -res.fun

        # Chebyshev.fit maps `domain` onto [-1, 1], so these coefficients are consistent
        # with the Chebyshev(theta, domain=domain) objects evaluated above
        theta_new = np.array([chebyshev.Chebyshev.fit(a_nodes, V_target_at_nodes[i, :], deg=n_degree - 1, domain=domain).coef
                              for i in range(n_y_states)])
        if np.max(np.abs(theta_new - theta_guess)) < tol: break
        theta_guess = theta_new
    return theta_new

note("VFI on Coefficients solver defined.")

#### 2.2 Method 2: Iterating on the Euler Equation Residuals
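Instead of iterating on the value function, we can iterate on the policy function, finding the coefficients for $\hat{c}(a, y; \boldsymbol{\theta})$ that make the Euler equation error as small as possible. Because this targets the first-order condition directly, the resulting policy is often more accurate than one recovered from an approximate value function.

**Algorithm:**
1. Start with an initial guess for the policy function coefficients $\boldsymbol{\theta}^0$ that implies strictly positive consumption, so that marginal utility is finite.
2. Given the current policy $\hat{c}_k$, compute next period's assets at every node, $a' = (1+r)a + y - \hat{c}_k(a, y)$, and then the RHS of the Euler equation, which gives the conditional expectation of next period's marginal utility, $E_k = E[u'(\hat{c}_k(a', y'))|y]$.
3. Find the updated consumption level that satisfies the Euler equation: $c_{k+1}(a,y) = u'^{-1}(\beta(1+r)E_k)$, capped at the level of consumption at which the lower bound on $a'$ binds.
4. Project this new consumption function onto the Chebyshev basis to get the new coefficients $\boldsymbol{\theta}^{k+1}$.
5. Repeat until the coefficients converge.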

In [None]:
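def solve_euler_residuals(params, y_states, p_trans, a_nodes, tol=1e-7, max_iter=500, damp=0.5):
    """Solves the growth model by fixed-point iteration on the Euler equation."""
    beta, r, a_min, a_max = params['BETA'], params['R_INTEREST'], params['A_MIN'], params['A_MAX']
    n_degree, n_y_states = params['N_DEGREE'], params['N_Y_STATES']
    domain = [a_min, a_max]
    # Chebyshev.fit maps `domain` onto [-1, 1], matching Chebyshev(theta, domain=domain) below
    fit = lambda vals: np.array([chebyshev.Chebyshev.fit(a_nodes, v, deg=n_degree - 1, domain=domain).coef for v in vals])

    coh = (1 + r) * a_nodes[None, :] + y_states[:, None]   # cash-on-hand at each (y, a) node
    c_max = coh - a_min                                    # consumption when the borrowing limit binds
    # Initial guess: consume interest plus income (strictly positive, unlike an all-zero guess)
    policy_coeffs = fit(r * a_nodes[None, :] + y_states[:, None])

    for it in range(max_iter):
        policy_funcs = [chebyshev.Chebyshev(theta, domain=domain) for theta in policy_coeffs]
        c_now = np.clip(np.array([p(a_nodes) for p in policy_funcs]), 1e-8, c_max)
        a_prime = np.clip(coh - c_now, a_min, a_max)       # savings implied by the current policy

        # E_u_prime[i, j] = sum_k P[i, k] * u'(c(a'_ij, y_k))
        E_u_prime = np.zeros((n_y_states, n_degree))
        for k in range(n_y_states):
            c_next = np.maximum(policy_funcs[k](a_prime), 1e-8)
            E_u_prime += p_trans[:, k][:, None] * u_prime(c_next)
        # Invert the Euler equation; cap at c_max where the borrowing constraint binds
        c_target_at_nodes = np.minimum(u_prime_inv(beta * (1 + r) * E_u_prime), c_max)

        # Damped update (damp is a tuning choice added here to stabilize the iteration)
        new_policy_coeffs = fit(damp * c_target_at_nodes + (1 - damp) * c_now)
        if np.max(np.abs(new_policy_coeffs - policy_coeffs)) < tol: break
        policy_coeffs = new_policy_coeffs
    return new_policy_coeffs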

note("Euler Residual Iteration solver defined.")

#### 2.3 Method 3: The Endogenous Grid Method (EGM)

##### The Problem with Value Function Iteration (VFI)

In the previous notebooks, we solved dynamic programming problems using Value Function Iteration (VFI). VFI is robust and general, but it can be computationally expensive. The main bottleneck is the maximization step within the Bellman operator:
$$ V(w) = \max_{c} \left\{ u(c) + \beta E[V(w')] \right\} $$
For each point on our state grid for wealth ($w$), we have to perform a numerical optimization to find the optimal consumption ($c$). If the grid for $w$ has 1000 points and the optimization for each point takes time, the total time can add up quickly, especially as the dimensionality of the state space increases.

Can we do better? For a large class of problems, the answer is a resounding **yes**.

##### The Logic of the Endogenous Grid Method (EGM)

The Endogenous Grid Method, developed by Carroll (2006), cleverly avoids the repeated maximization step. Instead of fixing a grid for the state variable (wealth, $w$) and finding the optimal consumption for each grid point, EGM fixes a grid for the agent's choice *at the end of the period* and finds the corresponding state that makes that choice optimal.

The key insight comes from the Euler equation, which is the first-order condition of the Bellman equation (here $R = 1 + r$ is the gross interest rate):
$$ u'(c_t) = \beta R E_t[u'(c_{t+1})] $$

EGM works backward from a known policy function in the next period ($c_{t+1}$) to find the optimal policy today ($c_t$). Here is the logic:
1.  **Start at the end:** Assume we know the optimal consumption rule in the next period, $c_{t+1}(w')$.
2.  **Choose an endogenous grid:** Instead of an exogenous grid on today's wealth ($w_t$), we create a grid for *end-of-period assets*, which we'll call $a_t$. This is the amount of wealth left *after* consumption today.
3.  **Calculate future consumption:** For each point on our $a_t$ grid and each possible income realization $y_{t+1}$, we know that tomorrow's wealth (cash-on-hand) will be $w_{t+1} = R a_t + y_{t+1}$. We can then use the known future consumption rule to find $c_{t+1}(R a_t + y_{t+1})$.
4.  **Invert the Euler Equation:** Now, we can use the Euler equation to find today's optimal consumption, $c_t$, that corresponds to each choice of $a_t$:
    $$ c_t = (u')^{-1}(\beta R E_t[u'(c_{t+1}(R a_t + y_{t+1}))]) $$
    where $(u')^{-1}$ is the inverse of the marginal utility function. This step is analytical and fast—no numerical optimization is needed!
5.  **Find the endogenous wealth grid:** We have a set of optimal consumption choices $c_t$ and the corresponding end-of-period assets $a_t$. We can find the level of wealth today, $w_t$, that would lead to these choices using the budget constraint: $w_t = c_t + a_t$. This gives us an "endogenous" grid of wealth points.
6.  **Interpolate:** We now have the policy function $c(w_t)$ on an uneven, endogenous grid. We can use interpolation to get the policy function on our original, evenly spaced grid, and we are ready for the next iteration.

By replacing the costly maximization step with an analytical inversion of the Euler equation and interpolation, EGM can be orders of magnitude faster than VFI.
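
The six steps can be sketched in a few lines for a deliberately simple special case. The snippet below assumes constant income and $\beta R = 1$ (both assumptions made only so that the answer is known in closed form: consume permanent income, $c = (1-\beta) w + \beta y$, or all cash-on-hand when that would require borrowing). Starting from that rule as next period's policy, one EGM step should return the same rule. All parameter values and names are illustrative.

```python
import numpy as np

# Illustrative parameters (assumptions for this sketch, not the chapter's calibration)
beta, gamma, y = 0.96, 2.0, 1.0
R = 1.0 / beta                  # chosen so that beta * R = 1 and the fixed point is known
r = R - 1.0

u_prime = lambda c: c ** (-gamma)
inv_u_prime = lambda x: x ** (-1.0 / gamma)

def c_next(w):
    """Assumed next-period rule: permanent-income consumption, capped at cash-on-hand."""
    return np.minimum(w, (r / R) * w + y / R)

# Steps 2-3: grid on end-of-period assets, implied cash-on-hand and consumption tomorrow
a_grid = np.linspace(0.0, 10.0, 50)
w_next = R * a_grid + y
# Step 4: invert the Euler equation (no expectation is needed without income risk)
c_now = inv_u_prime(beta * R * u_prime(c_next(w_next)))
# Step 5: endogenous grid from the budget constraint
w_endog = a_grid + c_now
# Step 6: interpolate back onto an exogenous grid; below w_endog[0] the borrowing
# constraint binds and the agent consumes all cash-on-hand
w_grid = np.linspace(0.1, 10.0, 100)
c_policy = np.where(w_grid < w_endog[0], w_grid, np.interp(w_grid, w_endog, c_now))
```

No root-finder or optimizer appears anywhere in the step: the only operations are array arithmetic and one interpolation.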

**Unpacking the EGM Algorithm:**
1.  **Initialize:** We start with a guess for the policy function, `policy`. At the top of every iteration it is copied into `policy_old`.
2.  **Step 1: Calculate Future Consumption:** For each point on our end-of-period asset grid `a_grid`, and for each possible income state in `y_states`, we calculate cash-on-hand next period, `w_prime`. We then interpolate `policy_old` to find the consumption that would occur at that future wealth level, `c_prime`.
3.  **Step 2: Calculate Expected Marginal Utility:** We calculate the marginal utility for each possible `c_prime` (`marg_u_prime`) and then take the expectation over next period's income states using `p_trans` to get `E_u_prime`.
4.  **Step 3: Invert the Euler Equation:** We use the `inv_u_prime` function to find the current consumption, `c_target`, that corresponds to `E_u_prime`.
5.  **Step 4: Find the Endogenous Wealth Grid:** We use the budget constraint to find the endogenous wealth grid, `w_endog`, that corresponds to `c_target` and `a_grid`.
6.  **Step 5: Update the Policy Function:** We now have the policy function on the `w_endog` grid. We interpolate it back onto the fixed grid `a_grid` (read here as a grid of cash-on-hand values) and cap consumption at cash-on-hand, which gives us `policy_new`.
7.  **Iterate:** We repeat steps 2-6 until the policy function converges.

##### Implementing EGM for a Consumption-Savings Model

In [None]:
@njit(parallel=True)
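def egm_solver(R_INTEREST, BETA, GAMMA, y_states, p_trans, a_grid, tol=1e-7, max_iter=1000):
    """Solves the growth model with EGM for a persistent income process.

    `a_grid` serves both as the end-of-period asset grid and as the cash-on-hand
    grid on which the returned policy is stored: policy[j, k] is consumption in
    income state j when cash-on-hand equals a_grid[k].
    """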
    # Unpack params
    R = 1 + R_INTEREST
    beta = BETA
    gamma = GAMMA
    n_y = len(y_states)
    n_a = len(a_grid)

    # Utility functions
    def u_prime(c):
        return c**(-gamma)
    def inv_u_prime(x):
        return x**(-1/gamma)

    # Initial guess for policy: consume current assets
    policy = np.zeros((n_y, n_a))
    for i in range(n_y):
         policy[i, :] = a_grid

    for i in range(max_iter):
        policy_old = policy.copy()

        # c_prime[j,k] is consumption next period if next state is y_j and savings are a_k
        c_prime = np.empty((n_y, n_a))
        for j in range(n_y):
            w_prime = R * a_grid + y_states[j]
            c_prime[j, :] = np.interp(w_prime, a_grid, policy_old[j, :])

        marg_u_prime = u_prime(c_prime)
        E_u_prime = p_trans @ marg_u_prime
        c_target = inv_u_prime(beta * R * E_u_prime)
        w_endog = a_grid + c_target

        policy_new = np.empty_like(policy)
        for j in prange(n_y):
            policy_new[j, :] = np.interp(a_grid, w_endog[j, :], c_target[j, :])
            policy_new[j, :] = np.minimum(policy_new[j, :], a_grid)

        if np.max(np.abs(policy_new - policy_old)) < tol:
            return policy_new
        policy = policy_new
    return policy

### 3. Performance Comparison and Visualization

In [None]:
sec("Solving and Comparing Runtimes")
A_GRID = np.linspace(PARAMS['A_MIN'], PARAMS['A_MAX'], 200)
print("Solving with VFI on Coefficients...")
start_time = time.time()
theta_vfi = solve_vfi_chebyshev(PARAMS, Y_STATES, P_TRANS, A_NODES)
vfi_time = time.time() - start_time
print(f"VFI on Coefficients took {vfi_time:.4f} seconds.")

print("\nSolving with Euler Residual Iteration...")
start_time = time.time()
theta_euler = solve_euler_residuals(PARAMS, Y_STATES, P_TRANS, A_NODES)
euler_time = time.time() - start_time
print(f"Euler Residual Iteration took {euler_time:.4f} seconds.")

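print("\nSolving with Endogenous Grid Method (Grid-Based)...")
# Warm-up call (tol=1e-7, max_iter=1) so Numba's JIT compilation is not counted in the timing
egm_solver(PARAMS['R_INTEREST'], PARAMS['BETA'], PARAMS['GAMMA'], Y_STATES, P_TRANS, A_GRID, 1e-7, 1)
start_time = time.time()
# Note: The EGM solver returns the policy function directly, not the coefficients.
policy_egm = egm_solver(PARAMS['R_INTEREST'], PARAMS['BETA'], PARAMS['GAMMA'], Y_STATES, P_TRANS, A_GRID)
egm_time = time.time() - start_time
print(f"EGM took {egm_time:.4f} seconds.")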

note("EGM is typically orders of magnitude faster because it avoids the costly optimization step inside the main loop.")

#### Visualizing the Policy Functions
Let's compare the consumption policy functions produced by the different methods for the highest and lowest income states, all expressed as functions of beginning-of-period assets $a$. If the solvers have converged, the curves should be close to one another up to approximation error, showing that the three methods reach essentially the same solution at very different speeds.

In [None]:
sec("Policy Function Visualization")
a_fine_grid = np.linspace(PARAMS['A_MIN'], PARAMS['A_MAX'], 200)

# --- Put all three policies on the same footing: consumption as a function of assets a ---
DOMAIN = [PARAMS['A_MIN'], PARAMS['A_MAX']]
R_GROSS = 1 + PARAMS['R_INTEREST']
policy_euler = [chebyshev.Chebyshev(theta, domain=DOMAIN) for theta in theta_euler]
# The EGM policy is stored on a cash-on-hand grid, so it is evaluated at w = (1 + r) * a + y
policy_egm_interp = [interp1d(A_GRID, pol, bounds_error=False, fill_value="extrapolate") for pol in policy_egm]

def policy_vfi(j, a_eval):
    """theta_vfi holds value-function coefficients, so c(a, y_j) is recovered with one more maximization."""
    V = [chebyshev.Chebyshev(theta, domain=DOMAIN) for theta in theta_vfi]
    ev = lambda a_prime: sum(P_TRANS[j, k] * V[k](a_prime) for k in range(len(V)))
    c_out = np.empty(len(a_eval))
    for n, a in enumerate(a_eval):
        coh = R_GROSS * a + Y_STATES[j]  # cash-on-hand
        res = minimize_scalar(lambda a_prime: -(u(coh - a_prime) + PARAMS['BETA'] * ev(a_prime)),
                              bounds=(PARAMS['A_MIN'], min(coh - 1e-6, PARAMS['A_MAX'])), method='bounded')
        c_out[n] = coh - res.x
    return c_out

fig, axes = plt.subplots(1, 2, figsize=(16, 6), sharey=True)

# Left panel: lowest income state (index 0); right panel: highest income state (index -1)
for ax, j, label in zip(axes, [0, -1], ['Low', 'High']):
    ax.plot(a_fine_grid, policy_vfi(j, a_fine_grid), label='VFI', linestyle='--')
    ax.plot(a_fine_grid, policy_euler[j](a_fine_grid), label='Euler Iteration', linestyle=':')
    ax.plot(a_fine_grid, policy_egm_interp[j](R_GROSS * a_fine_grid + Y_STATES[j]), label='EGM (Grid)', linestyle='-')
    ax.set_title(f'Consumption Policy: {label} Income State (y={Y_STATES[j]:.2f})')
    ax.set_xlabel('Assets (a)')
    ax.legend()
axes[0].set_ylabel('Consumption (c)')

plt.suptitle('Comparison of Policy Functions from Different Solvers', fontsize=18)
plt.tight_layout(rect=[0, 0, 1, 0.96])
if not os.path.exists('../images/03-Economic-Modeling'):
    os.makedirs('../images/03-Economic-Modeling')
plt.savefig('../images/03-Economic-Modeling/policy_comparison.png')
plt.close()
display(Image(filename='../images/03-Economic-Modeling/policy_comparison.png'))
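
Visual agreement is a weak accuracy test; a sharper diagnostic is the Euler equation error that Exercise 1 refers to. The sketch below shows one way to compute it. The function name `euler_errors` and its interface (a callable `c_func(a, j)` returning consumption at assets `a` in income state `j`) are hypothetical, not part of the solvers above. The check at the end uses a case with a known exact policy: constant income and $\beta(1+r) = 1$, for which $c = ra + y$ satisfies the Euler equation exactly.

```python
import numpy as np

def euler_errors(c_func, a_eval, y_states, p_trans, beta, r, gamma, a_min):
    """Unit-free Euler errors |1 - c_implied / c| at points where a' is above its lower bound."""
    n_y = len(y_states)
    err = np.zeros((n_y, len(a_eval)))
    for i in range(n_y):
        c = c_func(a_eval, i)
        a_next = (1 + r) * a_eval + y_states[i] - c
        # Expected marginal utility next period under CRRA utility
        emu = sum(p_trans[i, k] * c_func(a_next, k) ** (-gamma) for k in range(n_y))
        c_implied = (beta * (1 + r) * emu) ** (-1.0 / gamma)
        unconstrained = a_next > a_min + 1e-8
        err[i, unconstrained] = np.abs(1 - c_implied[unconstrained] / c[unconstrained])
    return err

# Sanity check on a case with a known solution (illustrative values)
r_demo = 0.04
demo_policy = lambda a, j: r_demo * a + 1.0
errs = euler_errors(demo_policy, np.linspace(0.1, 10, 20), np.array([1.0]),
                    np.array([[1.0]]), 1 / (1 + r_demo), r_demo, 2.0, 0.0)
```

To apply this to the solvers above, one would wrap each solved policy in a callable with the same `(a, j)` signature and report `np.log10(errs.max())`, which is a common way of summarizing accuracy.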

### 4. Chapter Summary
- **Function Approximation:** For continuous state spaces, we can approximate the value or policy function using a parameterized functional form, most commonly a basis of **Chebyshev polynomials**.
- **Projection Methods:** We solve for the unknown coefficients by requiring the relevant functional equation (Bellman or Euler) to hold at a set of **collocation nodes**.
- **Solution Algorithms:**
    - **VFI on Coefficients:** Conceptually simple but can be slow.
    - **Euler Equation Iteration:** More accurate as it directly targets the first-order condition.
    - **Endogenous Grid Method (EGM):** A highly efficient method that avoids the inner optimization loop by inverting the Euler equation on a grid of end-of-period assets and recovering the endogenous state from the budget constraint. For models where it applies, it is often the best choice.
- **Curse of Dimensionality:** Function approximation is the primary tool for combating the curse of dimensionality, but standard methods still face challenges. Advanced techniques like sparse grids and machine learning are active areas of research.

### 5. Exercises

1.  **Accuracy vs. Degree:** Re-solve the stochastic growth model using the Euler Equation method with different degrees for the Chebyshev polynomial (e.g., N=10, 20, 30). How does the maximum absolute Euler equation residual (evaluated on a fine grid) change as you increase the degree? Plot the maximum residual for each degree.

2.  **Log-Utility:** The log-utility function, $u(c) = \ln(c)$, is the special case of CRRA where `γ=1`. Modify the EGM solver to work with log utility (check what, if anything, needs to change in its `u_prime` and `inv_u_prime` helpers). How does the policy function change?

3.  **The Role of Interest Rates:** Using the EGM solver, resolve the model for a higher interest rate (`R_INTEREST = 0.05`) and a lower one (`R_INTEREST = 0.01`). Plot the three policy functions (for the low, medium, and high interest rates) for the high-income state. How does the interest rate affect savings behavior? Explain the income and substitution effects at play.

4.  **Implement Time Iteration:** The Euler Equation method shown is a form of time iteration. A simpler version iterates on the policy function directly. Given a policy approximation $\hat{c}_k$, compute the RHS of the Euler equation to get the target consumption $c_{k+1}$. Then find the coefficients for the new policy approximation $\hat{c}_{k+1}$. Implement this method and compare its convergence speed to the other methods.

5.  **2D Approximation:** Consider a model where the agent has two assets, a liquid one ($a_1$) and an illiquid one ($a_2$). The value function is now $V(a_1, a_2, y)$. To approximate this, you would use a 2D basis of Chebyshev polynomials: $V(a_1, a_2) \approx \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} \theta_{ij} T_i(a_1) T_j(a_2)$. How many coefficients would you need for a degree-5 approximation in each asset dimension? Why does this illustrate the curse of dimensionality?