# Tutorial 05: Inference in Maximum Likelihood Estimation

**Author**: PanelBox Development Team  
**Date**: 2026-02-16  
**Estimated Duration**: 90-120 minutes  
**Prerequisites**: Tutorials 01 (Robust Fundamentals) and 02 (Clustering Panels)

---

## Learning Objectives

By the end of this tutorial, you will be able to:

1. **Understand** MLE standard errors and the role of the Hessian matrix
2. **Implement** classical MLE inference using the information matrix
3. **Apply** robust (sandwich) standard errors for misspecified models
4. **Use** the delta method for nonlinear transformations (odds ratios, marginal effects)
5. **Implement** cluster-robust MLE for panel data
6. **Apply** bootstrap for nonlinear models
7. **Compare** classical, robust, and bootstrap inference

---

## Table of Contents

1. [Setup and Data Loading](#setup)
2. [MLE Basics: Likelihood, Score, and Hessian](#mle-basics)
3. [Classical MLE Standard Errors](#classical)
4. [Robust (Sandwich) Standard Errors](#robust)
5. [Cluster-Robust MLE for Panel Data](#cluster)
6. [Delta Method for Nonlinear Transformations](#delta)
7. [Bootstrap for MLE](#bootstrap)
8. [Comparison and Best Practices](#comparison)
9. [Exercises](#exercises)
10. [Summary and Key Takeaways](#summary)
11. [References](#references)

---

<a id='setup'></a>
## 1. Setup and Data Loading

We work with two datasets:
1. **Credit Approval** (`credit_approval.csv`): 5000 cross-section observations, binary outcome `approved`
2. **Health Insurance** (`health_insurance.csv`): 1000 individuals × 5 years, multinomial plan choice (we derive a binary indicator)

In [None]:
# Standard imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

# Statistical tools
from scipy import stats
from scipy.special import expit  # logistic CDF: Λ(z) = 1/(1+exp(-z))
from scipy.optimize import minimize

# PanelBox imports
import sys
sys.path.insert(0, '../../..')  # Ensure panelbox is on path
import panelbox as pb
from panelbox.models.discrete import PooledLogit, PooledProbit
from panelbox.marginal_effects.discrete_me import compute_ame, compute_mem
from panelbox.standard_errors.mle import (
    sandwich_estimator,
    cluster_robust_mle,
    delta_method,
    bootstrap_mle,
)

# Configuration
np.random.seed(42)
plt.rcParams['figure.dpi'] = 100
plt.rcParams['figure.figsize'] = (10, 5)
pd.set_option('display.precision', 4)

# Paths
DATA_PATH = '../data/'
FIG_PATH  = '../outputs/figures/05_mle/'

import os
os.makedirs(FIG_PATH, exist_ok=True)

print('Setup complete.')

In [None]:
# ── Load Cross-Section: Credit Approval ─────────────────────────────────────
credit = pd.read_csv(DATA_PATH + 'credit_approval.csv')
# PooledLogit expects entity/time identifiers: give each row its own entity and a single period
credit['entity'] = credit['id']
credit['time']   = 1

print('Credit Approval dataset')
print(f'  Shape : {credit.shape}')
print(f'  Approval rate : {credit["approved"].mean():.1%}')
print(f'  Columns: {list(credit.columns)}')
print()
print(credit[['approved','income','age','debt_ratio','credit_score','employment_length']].describe().round(2))

In [None]:
# ── Load Panel: Health Insurance Choice ─────────────────────────────────────
health = pd.read_csv(DATA_PATH + 'health_insurance.csv')
# Derive binary outcome: 1 if individual chose the highest-coverage plan (plan 3)
health['high_coverage'] = (health['plan_choice'] == 3).astype(int)

print('Health Insurance Panel dataset')
print(f'  Shape          : {health.shape}')
print(f'  Individuals    : {health["person_id"].nunique()}')
print(f'  Years          : {sorted(health["year"].unique())}')
print(f'  High-coverage rate : {health["high_coverage"].mean():.1%}')
print()
print(health[['high_coverage','age','income','family_size','health_status']].describe().round(2))

---

<a id='mle-basics'></a>
## 2. MLE Basics: Likelihood, Score, and Hessian

### 2.1 The MLE Framework

**Maximum Likelihood** chooses parameters $\hat{\beta}$ to maximise the log-likelihood:

$$
\hat{\beta}_{MLE} = \arg\max_\beta \; \ell(\beta) = \arg\max_\beta \sum_{i=1}^n \log f(y_i \mid x_i, \beta)
$$

**Binary Logit** — the workhorse model for binary outcomes:

$$
P(y_i = 1 \mid x_i) = \Lambda(x_i'\beta) = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}
$$

$$
\ell(\beta) = \sum_{i=1}^n \bigl[y_i \log \Lambda(x_i'\beta) + (1-y_i)\log(1-\Lambda(x_i'\beta))\bigr]
$$
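As a self-contained check of these formulas, the sketch below codes the logit log-likelihood together with its analytic score $\sum_i (y_i - \Lambda_i)x_i$ and Hessian $-\sum_i \Lambda_i(1-\Lambda_i)x_i x_i'$, then compares the score with a finite-difference gradient on simulated data. The function `logit_ll_score_hessian` and the simulated design are illustrative assumptions, not part of PanelBox.

```python
import numpy as np
from scipy.special import expit

def logit_ll_score_hessian(beta, y, X):
    """Return log-likelihood, score vector and Hessian of a binary logit."""
    eta = X @ beta
    p = expit(eta)                                  # Λ(x_i'β)
    ll = np.sum(y * eta - np.logaddexp(0, eta))     # Σ [y_i η_i − log(1 + e^{η_i})]
    score = X.T @ (y - p)                           # Σ (y_i − Λ_i) x_i
    hessian = -(X.T * (p * (1 - p))) @ X            # −Σ Λ_i(1 − Λ_i) x_i x_i'
    return ll, score, hessian

# Simulated data (illustrative only)
rng = np.random.default_rng(0)
n = 1_000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([0.3, 1.0, -0.5])
y = rng.binomial(1, expit(X @ beta_true))

ll, score, hessian = logit_ll_score_hessian(beta_true, y, X)

# Central finite-difference gradient of the log-likelihood
eps = 1e-6
fd_grad = np.array([
    (logit_ll_score_hessian(beta_true + eps * e, y, X)[0]
     - logit_ll_score_hessian(beta_true - eps * e, y, X)[0]) / (2 * eps)
    for e in np.eye(3)
])
print('Analytic score :', score.round(4))
print('Finite diff    :', fd_grad.round(4))
```

Writing the log-likelihood as $y_i\eta_i - \log(1+e^{\eta_i})$ is algebraically identical to the $\log\Lambda$ / $\log(1-\Lambda)$ form above, but `np.logaddexp` avoids overflow for large $|\eta_i|$.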

### 2.2 Why MLE Inference Differs from OLS

**OLS**: Under homoskedasticity the variance has the closed form $\mathrm{Var}(\hat\beta) = \sigma^2 (X'X)^{-1}$, built from the design matrix and the residual variance.

**MLE**: Variance depends on the **curvature** of the log-likelihood at $\hat\beta$:
- Steep curvature $\Rightarrow$ precise estimate (small variance)
- Flat curvature $\Rightarrow$ imprecise estimate (large variance)

> **Key insight**: Curvature = Information, and the asymptotic variance is the inverse of the information.

In [None]:
# ── Visualise: Steep vs Flat Log-Likelihood ──────────────────────────────────
beta_grid = np.linspace(0, 4, 1000)

# Steep: high curvature → low variance (well-identified parameter)
ll_steep = -10 * (beta_grid - 2) ** 2
# Flat: low curvature → high variance (poorly identified)
ll_flat  =  -1 * (beta_grid - 2) ** 2

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

for ax, ll, label, color in zip(
    axes,
    [ll_steep, ll_flat],
    ['Steep Log-Likelihood\n(Low Variance, Precise Estimate)',
     'Flat Log-Likelihood\n(High Variance, Imprecise Estimate)'],
    ['steelblue', 'darkorange']
):
    ax.plot(beta_grid, ll, color=color, linewidth=2.5)
    ax.axvline(2, color='red', linestyle='--', linewidth=2, label='Maximum at $\\beta = 2$')
    ax.set_title(label, fontsize=12, fontweight='bold')
    ax.set_xlabel('$\\beta$', fontsize=12)
    ax.set_ylabel('Log-Likelihood $\\ell(\\beta)$', fontsize=11)
    ax.legend(fontsize=11)
    ax.grid(alpha=0.3)

plt.suptitle('Log-Likelihood Curvature Determines Precision of MLE',
             fontsize=13, fontweight='bold', y=1.02)
plt.tight_layout()
plt.savefig(FIG_PATH + 'loglik_curvature.png', dpi=300, bbox_inches='tight')
plt.show()

print('Hessian (second derivative) at beta = 2:')
print(f'  Steep: d²ℓ/dβ² = -20  →  Var(β̂) = 1/20 = {1/20:.3f}')
print(f'  Flat : d²ℓ/dβ² =  -2  →  Var(β̂) = 1/2  = {1/2:.3f}')

### 2.3 Asymptotic Normality of MLE

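Under correct specification and standard regularity conditions, in large samples:

$$
\hat{\beta} \overset{a}{\sim} N\!\left(\beta_0,\; [\mathcal{I}(\beta_0)]^{-1}\right)
$$

where $\mathcal{I}(\beta) = -\mathbb{E}[\nabla^2 \ell(\beta)]$ is the **Fisher Information Matrix** of the full sample. Stated formally, $\sqrt{n}\,(\hat\beta - \beta_0) \xrightarrow{d} N\bigl(0, [\mathcal{I}_1(\beta_0)]^{-1}\bigr)$, where $\mathcal{I}_1 = \mathcal{I}/n$ is the information per observation.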

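**Three estimators** of the information matrix, asymptotically equivalent under correct specification:

| Name | Formula | Used when |
|------|---------|-----------|
| Observed Hessian | $-H(\hat\beta)$ | Most common default |
| Expected (Fisher) | $\mathbb{E}[-H(\beta)]$ evaluated at $\hat\beta$ | Analytic form available (for logit it equals the observed Hessian) |
| Outer product of scores (OPG/BHHH) | $\sum_i s_i s_i'$ | Only first derivatives available |

All three converge to the same matrix under correct specification (the **information matrix equality**). Under misspecification they diverge, and **robust (sandwich) SEs** combine the Hessian and the outer product of scores instead of relying on either one alone.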
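To see the information matrix equality at work, the sketch below fits a correctly specified logit to simulated data by Newton-Raphson and compares the negative Hessian $-H(\hat\beta)$ with the outer product of scores $\sum_i s_i s_i'$. Under correct specification the two should agree up to sampling noise. The helper `fit_logit_newton` and the data-generating process are illustrative assumptions, not PanelBox functions.

```python
import numpy as np
from scipy.special import expit

def fit_logit_newton(y, X, n_iter=25):
    """Logit MLE by Newton-Raphson: β ← β − H⁻¹ s (assumes no perfect separation)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        score = X.T @ (y - p)
        hessian = -(X.T * (p * (1 - p))) @ X
        beta = beta - np.linalg.solve(hessian, score)
    return beta

rng = np.random.default_rng(1)
n = 20_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.binomial(1, expit(X @ np.array([-0.5, 1.0])))   # correctly specified DGP

beta_hat = fit_logit_newton(y, X)
p_hat = expit(X @ beta_hat)

neg_H = (X.T * (p_hat * (1 - p_hat))) @ X     # −H(β̂); for logit observed = expected
scores = (y - p_hat)[:, None] * X              # s_i = (y_i − Λ_i) x_i
opg = scores.T @ scores                        # Σ s_i s_i'

print('Negative Hessian:\n', neg_H.round(1))
print('Outer product of scores:\n', opg.round(1))
print('Diagonal ratio OPG / (−H):', (np.diag(opg) / np.diag(neg_H)).round(3))
```

Re-running the comparison on data from a misspecified DGP (for example, one with an omitted $x^2$ term) is a quick way to see the two matrices drift apart.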

---

<a id='classical'></a>
## 3. Classical MLE Standard Errors

### 3.1 The Information Matrix Estimator

**Classical variance** uses the inverted (negative) Hessian:

$$
\widehat{\mathrm{Var}}_{\mathrm{classical}}(\hat{\beta}) = [-H(\hat{\beta})]^{-1}
$$

**Hessian for Logit** (analytical):

$$
H = -\sum_{i=1}^n \Lambda(x_i'\beta)\left[1 - \Lambda(x_i'\beta)\right] x_i x_i'
$$

This is negative definite whenever $X$ has full column rank (because $0 < \Lambda(1-\Lambda) \le 1/4$), so $[-H]^{-1}$ exists and is positive definite: a valid covariance matrix.
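Because $[-H]^{-1}$ is only an asymptotic approximation, one way to build intuition is to check it by simulation: under a correctly specified logit, the average classical SE should be close to the Monte Carlo standard deviation of $\hat\beta$. The sketch below assumes a simple two-parameter DGP of our own choosing; `logit_mle_classical` is a hypothetical helper.

```python
import numpy as np
from scipy.special import expit

def logit_mle_classical(y, X, n_iter=25):
    """Newton-Raphson logit MLE plus classical SEs from [−H(β̂)]⁻¹."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        H = -(X.T * (p * (1 - p))) @ X
        beta = beta - np.linalg.solve(H, X.T @ (y - p))
    p = expit(X @ beta)
    H = -(X.T * (p * (1 - p))) @ X             # Hessian at the final estimate
    se = np.sqrt(np.diag(np.linalg.inv(-H)))   # classical SEs
    return beta, se

rng = np.random.default_rng(2)
n, reps = 500, 200
beta_true = np.array([0.5, 1.0])
slopes, ses = [], []
for _ in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = rng.binomial(1, expit(X @ beta_true))  # correctly specified
    b, se = logit_mle_classical(y, X)
    slopes.append(b[1])
    ses.append(se[1])

mc_sd = np.std(slopes, ddof=1)
mean_se = np.mean(ses)
print(f'Monte Carlo SD of slope : {mc_sd:.4f}')
print(f'Mean classical SE       : {mean_se:.4f}')
print(f'Ratio (SE / SD)         : {mean_se / mc_sd:.3f}')
```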

### 3.2 Estimation with PanelBox

In [None]:
# ── Logit model on credit data (cross-section, dummy panel ids) ───────────────
formula = 'approved ~ income + age + debt_ratio + credit_score + employment_length'

logit_nonrob = PooledLogit(formula, credit, 'entity', 'time')
res_nonrob   = logit_nonrob.fit(cov_type='nonrobust')

print(res_nonrob.summary())

### 3.3 Inspecting the Hessian and Covariance Matrix

The covariance matrix $[-H]^{-1}$ is stored in `result.cov_params`. Let's reconstruct it manually to understand the underlying computation.

In [None]:
# ── Manual Hessian computation for Logit ─────────────────────────────────────
from scipy.special import expit
import patsy

# Build design matrices (same formula)
y_cr, X_cr = patsy.dmatrices(
    'approved ~ income + age + debt_ratio + credit_score + employment_length',
    credit, return_type='matrix'
)
y_cr = np.asarray(y_cr).ravel()
X_cr = np.asarray(X_cr)
beta_hat = res_nonrob.params.values  # MLE estimates
var_names = res_nonrob.params.index.tolist()

# Fitted probabilities at MLE
eta   = X_cr @ beta_hat
p_hat = expit(eta)          # Λ(η)
w     = p_hat * (1 - p_hat) # Λ(η)(1 − Λ(η))

# Hessian: H = -X'WX  (negative definite)
H = -(X_cr.T * w) @ X_cr

# Covariance: Var(β̂) = (-H)^{-1}
vcov_classical = np.linalg.inv(-H)
se_classical   = np.sqrt(np.diag(vcov_classical))

print('Classical Standard Errors (manual vs PanelBox):')
print(f'{"Variable":<22} {"Manual SE":>10} {"PanelBox SE":>12} {"Match":>8}')
print('-' * 55)
for i, v in enumerate(var_names):
    pb_se  = res_nonrob.std_errors.iloc[i]
    match  = abs(se_classical[i] - pb_se) < 1e-8
    print(f'{v:<22} {se_classical[i]:>10.6f} {pb_se:>12.6f} {str(match):>8}')

### 3.4 When Classical SEs Are Valid

Classical SEs require three conditions:
1. **Correct specification**: The likelihood $f(y_i \mid x_i, \beta)$ is the true DGP
2. **Independent observations**: No cross-observation correlation
3. **Homogeneous parameters**: No unmodeled heterogeneity

**Reality**: These conditions are rarely all satisfied in applied work. When they fail, classical SEs are generally inconsistent: they can understate or overstate the true sampling variability.

### 3.5 Monte Carlo: Classical SEs Under Misspecification

In [None]:
# ── Monte Carlo: classical vs robust SEs under misspecification ──────────────
# True DGP includes x^2, but we estimate a logit that is linear in x.
# The MLE then converges to a pseudo-true value β*, not to the DGP's slope of 1.
np.random.seed(42)
n_sim, n_mc = 500, 200

b_x, se_cls, se_rob = [], [], []

def neg_ll(beta, y, X):
    """Negative logit log-likelihood; logaddexp(0, η) = log(1 + e^η) without overflow."""
    eta = X @ beta
    return -np.sum(y * eta - np.logaddexp(0, eta))

for _ in range(n_mc):
    x        = np.random.normal(0, 1, n_sim)
    eta_true = 0.5 + 1.0 * x + 0.5 * x**2      # true model: quadratic in x
    y        = np.random.binomial(1, expit(eta_true))

    # Design matrix for the MISSPECIFIED model (omits x^2)
    X_sim = np.column_stack([np.ones(n_sim), x])

    opt = minimize(neg_ll, np.zeros(2), args=(y, X_sim), method='BFGS')
    if not opt.success:
        continue
    b = opt.x

    p_fit = expit(X_sim @ b)
    H_fit = -(X_sim.T * (p_fit * (1 - p_fit))) @ X_sim   # Hessian
    H_inv = np.linalg.inv(-H_fit)                        # classical vcov, also the "bread"

    scores = (y - p_fit)[:, None] * X_sim                # s_i = (y_i - p_i) x_i
    S_fit  = scores.T @ scores                           # "meat"

    b_x.append(b[1])
    se_cls.append(np.sqrt(H_inv[1, 1]))
    se_rob.append(np.sqrt((H_inv @ S_fit @ H_inv)[1, 1]))

b_x, se_cls, se_rob = map(np.array, (b_x, se_cls, se_rob))

# Approximate the pseudo-true slope β*_x by the Monte Carlo mean of the estimates
beta_star = b_x.mean()
cover_cls = np.mean(np.abs(b_x - beta_star) <= 1.96 * se_cls)
cover_rob = np.mean(np.abs(b_x - beta_star) <= 1.96 * se_rob)

print(f'Replications used         : {len(b_x)}')
print(f'Monte Carlo SD of β̂_x     : {b_x.std(ddof=1):.4f}')
print(f'Mean classical SE         : {se_cls.mean():.4f}')
print(f'Mean robust SE            : {se_rob.mean():.4f}')
print()
print('Coverage of nominal 95% CIs for the pseudo-true slope β*_x:')
print(f'  Classical : {cover_cls:.3f}')
print(f'  Robust    : {cover_rob:.3f}')
print()
print('A well-calibrated SE is close to the Monte Carlo SD, and its CI covers')
print('β*_x about 95% of the time. Compare the classical and robust rows above.')

---

<a id='robust'></a>
## 4. Robust (Sandwich) Standard Errors

### 4.1 The Huber-White Sandwich Estimator

$$
\widehat{\mathrm{Var}}_{\mathrm{robust}}(\hat{\beta}) = H^{-1}\, S\, H^{-1} = [-H]^{-1}\, S\, [-H]^{-1}
$$

where:
- $H = \nabla^2 \ell(\hat{\beta})$ is the **Hessian** from Section 3 (the "bread"; the two minus signs cancel)
- $S = \displaystyle\sum_{i=1}^n s_i s_i'$ is the **outer product of scores** (the "meat")
- $s_i = \nabla_\beta \log f(y_i \mid x_i, \hat{\beta})$ is the individual **score**

**For Logit**: $s_i = \bigl(y_i - \Lambda(x_i'\hat\beta)\bigr)\, x_i$

**Robustness**: The sandwich remains valid when the likelihood is misspecified, as long as observations are independent. It then estimates the variance of $\hat\beta$ around the pseudo-true parameter, and it does not rely on the information matrix equality $-H \approx S$.
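A minimal sketch of the sandwich as a reusable function, assuming the per-observation scores and the Hessian are already available. `sandwich_vcov` is a hypothetical helper (not the PanelBox `sandwich_estimator` imported in Setup, whose signature we do not reproduce); the example fits a logit that omits an $x^2$ term present in the simulated DGP.

```python
import numpy as np
from scipy.special import expit

def sandwich_vcov(scores, hessian):
    """Huber-White sandwich H⁻¹ S H⁻¹ with S = Σ s_i s_i'.

    scores  : (n, k) array of per-observation scores s_i
    hessian : (k, k) Hessian of the log-likelihood at β̂
    """
    H_inv = np.linalg.inv(hessian)
    meat = scores.T @ scores
    return H_inv @ meat @ H_inv          # the minus signs of [−H]⁻¹ cancel

# Misspecified example: the true index has an x² term that the fitted logit omits
rng = np.random.default_rng(3)
n = 5_000
x = rng.normal(size=n)
y = rng.binomial(1, expit(0.5 + x + 0.5 * x**2))
X = np.column_stack([np.ones(n), x])

beta = np.zeros(2)
for _ in range(25):                       # Newton-Raphson logit fit
    p = expit(X @ beta)
    H = -(X.T * (p * (1 - p))) @ X
    beta = beta - np.linalg.solve(H, X.T @ (y - p))

p = expit(X @ beta)
H = -(X.T * (p * (1 - p))) @ X
scores = (y - p)[:, None] * X

V_classical = np.linalg.inv(-H)
V_robust = sandwich_vcov(scores, H)
ratio = np.sqrt(np.diag(V_robust) / np.diag(V_classical))
print('SE ratio robust / classical:', ratio.round(3))
```

If the meat happened to equal $-H$ exactly, the function would return $[-H]^{-1}$, which is why the sandwich collapses to the classical formula under correct specification.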

### 4.2 Using PanelBox

In [None]:
# ── Fit with robust (sandwich) SEs ────────────────────────────────────────────
logit_rob  = PooledLogit(formula, credit, 'entity', 'time')
res_rob    = logit_rob.fit(cov_type='robust')

print('=== Robust (Sandwich) SE Results ===')
print(res_rob.summary())

In [None]:
# ── Comparison table: Classical vs Robust ────────────────────────────────────
comp = pd.DataFrame({
    'Coefficient'  : res_nonrob.params,
    'SE Classical' : res_nonrob.std_errors,
    'SE Robust'    : res_rob.std_errors,
    'Ratio Rob/Cls': res_rob.std_errors / res_nonrob.std_errors,
    't Classical'  : res_nonrob.params / res_nonrob.std_errors,
    't Robust'     : res_rob.params    / res_rob.std_errors,
})

print('Classical vs Robust Standard Errors — Credit Approval Logit')
print('=' * 80)
print(comp.round(4).to_string())
print()
print('Interpretation:')
print('  Ratio > 1: Robust SE > Classical SE (classical understates uncertainty)')
print('  Ratio < 1: Classical SE > Robust SE (can occur by chance or under misspecification)')
print('  Ratio far from 1: Signals possible misspecification (e.g. wrong functional form)')

In [None]:
# ── Bar chart: Classical vs Robust SEs ───────────────────────────────────────
vars_plot = [v for v in res_nonrob.params.index if v != 'Intercept']
x_pos  = np.arange(len(vars_plot))
width  = 0.35

fig, ax = plt.subplots(figsize=(12, 5))
ax.bar(x_pos - width/2, res_nonrob.std_errors[vars_plot], width,
       label='Classical', color='steelblue', edgecolor='black', alpha=0.8)
ax.bar(x_pos + width/2, res_rob.std_errors[vars_plot], width,
       label='Robust', color='darkorange', edgecolor='black', alpha=0.8)

ax.set_xlabel('Variable', fontsize=12, fontweight='bold')
ax.set_ylabel('Standard Error', fontsize=12, fontweight='bold')
ax.set_title('Classical vs Robust Standard Errors — Logit Model (Credit Approval)',
             fontsize=13, fontweight='bold')
ax.set_xticks(x_pos)
ax.set_xticklabels(vars_plot, rotation=30, ha='right')
ax.legend(fontsize=11)
ax.grid(alpha=0.3, axis='y')
plt.tight_layout()
plt.savefig(FIG_PATH + 'classical_vs_robust_se.png', dpi=300, bbox_inches='tight')
plt.show()

### 4.3 Sandwich vs Classical: When Does It Matter?

The ratio $SE_{robust} / SE_{classical}$ is informative:

| Ratio | Interpretation | Action |
|-------|----------------|--------|
| ≈ 1.0 | Consistent with correct specification | Both valid; classical slightly more efficient |
| 1.1–1.5 | Mild misspecification | Use robust |
| > 1.5 | Notable misspecification | Use robust; consider respecifying |
| < 1.0 | Unusual (may indicate small-sample issues) | Inspect carefully |

> **Rule of thumb**: Always use `cov_type='robust'` or better. Classical SEs are only presented for educational comparison.

---

<a id='cluster'></a>
## 5. Cluster-Robust MLE for Panel Data

### 5.1 The Problem: Correlated Observations in Panels

In the health insurance panel, the same individual is observed 5 times. Their choices are likely correlated across years, so robust SEs that assume independent observations are still invalid.

**Solution**: Cluster-robust sandwich estimator:

$$
\widehat{\mathrm{Var}}_{\mathrm{cluster}}(\hat{\beta}) = H^{-1}\!\left[\sum_{g=1}^{G} S_g S_g'\right]H^{-1}
$$

where $S_g = \displaystyle\sum_{i \in g} s_i$ is the **sum of scores within cluster** $g$.

This allows arbitrary correlation within clusters while maintaining independence across clusters.

**Degrees-of-freedom correction** (always applied in PanelBox):

$$
\text{adj} = \frac{G}{G-1} \times \frac{N-1}{N-K}
$$
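A minimal sketch of the cluster-robust formula, including the small-sample factor above: scores are summed within each cluster before forming the meat. `cluster_robust_vcov` is a hypothetical helper, not the PanelBox `cluster_robust_mle` imported in Setup. The simulated panel adds a person effect to create within-cluster correlation, and the final line evaluates the factor for the health panel assuming $G = 1000$, $N = 5000$ and $K = 5$ (an intercept plus the four regressors, treated as numeric).

```python
import numpy as np
from scipy.special import expit

def cluster_robust_vcov(scores, hessian, cluster_ids, dof_correction=True):
    """H⁻¹ [Σ_g S_g S_g'] H⁻¹ with S_g = Σ_{i∈g} s_i, optionally times G/(G−1)·(N−1)/(N−K)."""
    n, k = scores.shape
    labels, inverse = np.unique(cluster_ids, return_inverse=True)
    G = len(labels)
    S_g = np.zeros((G, k))
    np.add.at(S_g, inverse, scores)              # sum scores within each cluster
    H_inv = np.linalg.inv(hessian)
    V = H_inv @ (S_g.T @ S_g) @ H_inv
    if dof_correction:
        V *= (G / (G - 1)) * ((n - 1) / (n - k))
    return V

rng = np.random.default_rng(4)
G, T = 300, 5
person = np.repeat(np.arange(G), T)
u = rng.normal(size=G)[person]                                # person effect
x = rng.normal(size=G)[person] + rng.normal(size=G * T)       # partly time-invariant regressor
y = rng.binomial(1, expit(0.2 + x + u))
X = np.column_stack([np.ones(G * T), x])

beta = np.zeros(2)
for _ in range(25):                                           # pooled logit via Newton-Raphson
    p = expit(X @ beta)
    H = -(X.T * (p * (1 - p))) @ X
    beta = beta - np.linalg.solve(H, X.T @ (y - p))
p = expit(X @ beta)
H = -(X.T * (p * (1 - p))) @ X
scores = (y - p)[:, None] * X

# One observation per "cluster" and no correction reproduces the plain sandwich
se_robust = np.sqrt(np.diag(cluster_robust_vcov(scores, H, np.arange(G * T), dof_correction=False)))
se_cluster = np.sqrt(np.diag(cluster_robust_vcov(scores, H, person)))
print('Robust SE   :', se_robust.round(4))
print('Clustered SE:', se_cluster.round(4))

adj = (1000 / 999) * ((5000 - 1) / (5000 - 5))                # assumed G, N, K for the health panel
print(f'Small-sample factor for the health panel: {adj:.4f}')
```

With $G = 1000$ the factor is barely above 1, so in this panel any gap between robust and clustered SEs comes from the within-person score sums, not from the correction.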

### 5.2 Estimation on the Panel

In [None]:
# ── Logit on panel data: health insurance choice ──────────────────────────────
formula_panel = 'high_coverage ~ age + income + family_size + health_status'

logit_nonrob_p = PooledLogit(formula_panel, health, 'person_id', 'year')
logit_rob_p    = PooledLogit(formula_panel, health, 'person_id', 'year')
logit_clust_p  = PooledLogit(formula_panel, health, 'person_id', 'year')

res_nonrob_p = logit_nonrob_p.fit(cov_type='nonrobust')
res_rob_p    = logit_rob_p.fit(cov_type='robust')
res_clust_p  = logit_clust_p.fit(cov_type='cluster')  # cluster by entity (default)

print('Cluster-Robust Results — Health Insurance Panel')
print(res_clust_p.summary())

In [None]:
# ── Three-way comparison table ────────────────────────────────────────────────
panel_vars = [v for v in res_clust_p.params.index if v != 'Intercept']

comp_panel = pd.DataFrame({
    'Coef.'      : res_clust_p.params[panel_vars],
    'SE Classic' : res_nonrob_p.std_errors[panel_vars],
    'SE Robust'  : res_rob_p.std_errors[panel_vars],
    'SE Cluster' : res_clust_p.std_errors[panel_vars],
    'Clust/Class': res_clust_p.std_errors[panel_vars] / res_nonrob_p.std_errors[panel_vars],
})

G = health['person_id'].nunique()
print(f'Health Insurance Panel — G = {G} clusters (persons)')
print('=' * 72)
print(comp_panel.round(4).to_string())
print()
print('Expectation: with positive within-person correlation, clustered SEs are')
print('typically larger than both classical and robust SEs. Check the ratios above.')

In [None]:
# ── Cluster diagnostics ───────────────────────────────────────────────────────
cluster_sizes = health.groupby('person_id').size()

print('Cluster (person) diagnostics:')
print(f'  Number of clusters (G) : {G}')
print(f'  Min cluster size       : {cluster_sizes.min()}')
print(f'  Mean cluster size      : {cluster_sizes.mean():.1f}')
print(f'  Max cluster size       : {cluster_sizes.max()}')
print(f'  Balanced               : {cluster_sizes.nunique() == 1}')
print()

if G < 20:
    print('WARNING: Too few clusters. Consider cluster bootstrap.')
elif G < 50:
    print('CAUTION: Modest number of clusters. Verify robustness.')
else:
    print('Sufficient clusters for reliable asymptotic inference.')

# Visualise SE comparison for panel
fig, ax = plt.subplots(figsize=(12, 5))
x_pos = np.arange(len(panel_vars))
w = 0.25

ax.bar(x_pos - w, res_nonrob_p.std_errors[panel_vars], w,
       label='Classical', color='steelblue', edgecolor='black', alpha=0.8)
ax.bar(x_pos,     res_rob_p.std_errors[panel_vars], w,
       label='Robust', color='darkorange', edgecolor='black', alpha=0.8)
ax.bar(x_pos + w, res_clust_p.std_errors[panel_vars], w,
       label='Clustered (by person)', color='green', edgecolor='black', alpha=0.8)

ax.set_xlabel('Variable', fontsize=12, fontweight='bold')
ax.set_ylabel('Standard Error', fontsize=12, fontweight='bold')
ax.set_title('SE Comparison: Classical / Robust / Clustered\nHealth Insurance Panel Logit',
             fontsize=12, fontweight='bold')
ax.set_xticks(x_pos)
ax.set_xticklabels(panel_vars, rotation=30, ha='right')
ax.legend(fontsize=10)
ax.grid(alpha=0.3, axis='y')
plt.tight_layout()
plt.savefig(FIG_PATH + 'cluster_se_comparison.png', dpi=300, bbox_inches='tight')
plt.show()

---

<a id='delta'></a>
## 6. Delta Method for Nonlinear Transformations

### 6.1 The Problem

We often report transformations of MLE coefficients:
- **Odds ratios**: $OR = \exp(\beta)$
- **Average marginal effects**: $AME_j = \frac{1}{n}\sum_i \Lambda'(x_i'\beta)\beta_j$
- **Elasticities** (at the means): $\varepsilon_j = \beta_j\, \bar{x}_j\,\bigl[1 - \Lambda(\bar{x}'\beta)\bigr]$

**Challenge**: $\mathrm{Var}(\exp(\hat\beta)) \neq \exp(\mathrm{Var}(\hat\beta))$

### 6.2 Delta Method Theory

For a smooth transformation $g: \mathbb{R}^k \to \mathbb{R}^m$, the **first-order Taylor approximation** gives:

$$
g(\hat{\beta}) \approx g(\beta_0) + J(\beta_0)(\hat{\beta} - \beta_0)
$$

where $J(\beta) = \nabla_\beta g(\beta)$ is the $m \times k$ **Jacobian**. Therefore:

$$
\boxed{\widehat{\mathrm{Var}}\bigl(g(\hat{\beta})\bigr) \approx J(\hat{\beta})\; \widehat{\mathrm{Var}}(\hat{\beta})\; J(\hat{\beta})'}
$$
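The formula needs only $\hat\beta$, $\widehat{\mathrm{Var}}(\hat\beta)$ and the Jacobian, which can be approximated numerically when an analytic derivative is inconvenient. The sketch below is a generic version with a central-difference Jacobian. `delta_method_vcov` is a hypothetical helper, not the PanelBox `delta_method` imported in Setup, and the ratio example uses made-up numbers.

```python
import numpy as np

def delta_method_vcov(g, beta_hat, vcov, eps=1e-6):
    """Delta-method covariance J V J' using a central-difference Jacobian of g.

    g must map a length-k array to a scalar or a length-m array.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    g0 = np.atleast_1d(g(beta_hat))
    J = np.empty((g0.size, beta_hat.size))
    for j in range(beta_hat.size):
        step = np.zeros_like(beta_hat)
        step[j] = eps
        J[:, j] = (np.atleast_1d(g(beta_hat + step))
                   - np.atleast_1d(g(beta_hat - step))) / (2 * eps)
    return J @ vcov @ J.T

# Example: ratio of two coefficients, g(β) = β₁ / β₂ (illustrative numbers)
beta_hat = np.array([0.8, 0.4])
vcov = np.array([[0.010, 0.002],
                 [0.002, 0.005]])
V_ratio = delta_method_vcov(lambda b: b[0] / b[1], beta_hat, vcov)
print('Ratio estimate :', beta_hat[0] / beta_hat[1])
print('Delta-method SE:', np.sqrt(V_ratio[0, 0]))
```

Here the analytic Jacobian is $J = (1/\beta_2,\; -\beta_1/\beta_2^2) = (2.5, -5)$, so $J V J' = 0.0625 + 0.125 - 0.05 = 0.1375$, which the numerical version should reproduce.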

### 6.3 Application: Odds Ratios in Logit

**Odds ratio**: $OR_j = \exp(\hat\beta_j)$

**Jacobian**: $J = \mathrm{diag}\bigl(\exp(\beta_1), \dots, \exp(\beta_k)\bigr)$, because $\partial OR_j / \partial \beta_j = \exp(\beta_j) = OR_j$ and all cross-derivatives are zero

**SE of OR**: $SE(OR_j) = OR_j \cdot SE(\hat\beta_j)$
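A small worked example with made-up numbers ($\hat\beta_j = 0.5$, $SE(\hat\beta_j) = 0.1$) shows why the next cell builds the OR confidence interval on the log scale: the delta-method interval $OR_j \pm 1.96\,SE(OR_j)$ is symmetric around $OR_j$, while exponentiating the coefficient interval gives an asymmetric interval that can never cross zero.

```python
import numpy as np

beta_j, se_beta = 0.5, 0.1            # illustrative values, not from the credit data
z = 1.96

or_j = np.exp(beta_j)
se_or = or_j * se_beta                # delta method: SE(OR) = OR · SE(β)

ci_delta = (or_j - z * se_or, or_j + z * se_or)                        # symmetric around OR
ci_log = (np.exp(beta_j - z * se_beta), np.exp(beta_j + z * se_beta))  # exponentiated Wald CI

print(f'OR = {or_j:.4f}, SE(OR) = {se_or:.4f}')
print(f'Delta-method CI : [{ci_delta[0]:.4f}, {ci_delta[1]:.4f}]')
print(f'Log-scale CI    : [{ci_log[0]:.4f}, {ci_log[1]:.4f}]')
```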

In [None]:
# ── Delta method: Odds Ratios ─────────────────────────────────────────────────
# Use the robust result on credit data
beta_hat = res_rob.params.values
vcov_rob = res_rob.cov_params.values  # k×k covariance matrix
var_names = res_rob.params.index.tolist()

# Transformation g(β) = exp(β)  [element-wise]
def odds_ratio_transform(beta):
    return np.exp(beta)

# PanelBox delta_method: returns covariance matrix of g(β̂)
vcov_or = delta_method(vcov_rob, odds_ratio_transform, beta_hat)
se_or   = np.sqrt(np.diag(vcov_or))
or_vals = np.exp(beta_hat)

# 95% CI for OR (log-scale is more accurate, then exponentiate)
ci_or_lower = np.exp(beta_hat - 1.96 * res_rob.std_errors.values)
ci_or_upper = np.exp(beta_hat + 1.96 * res_rob.std_errors.values)

or_table = pd.DataFrame({
    'Coefficient': beta_hat,
    'Odds Ratio' : or_vals,
    'OR SE (delta)': se_or,
    'OR 95% CI lower': ci_or_lower,
    'OR 95% CI upper': ci_or_upper,
}, index=var_names)

print('Odds Ratios with Delta Method Standard Errors')
print('(Based on robust / sandwich covariance matrix)')
print('=' * 80)
print(or_table.round(4).to_string())
print()
print('Interpretation:')
print('  OR > 1: Factor increases odds of approval')
print('  OR < 1: Factor decreases odds of approval')
print('  OR = 1: No effect on approval odds')

In [None]:
# ── Forest plot: Odds Ratios with 95% CIs ────────────────────────────────────
vars_or = [v for v in var_names if v != 'Intercept']
idx_or  = [var_names.index(v) for v in vars_or]

or_vals_plot   = or_vals[idx_or]
ci_lower_plot  = ci_or_lower[idx_or]
ci_upper_plot  = ci_or_upper[idx_or]

fig, ax = plt.subplots(figsize=(10, 5))
y_pos = np.arange(len(vars_or))

ax.scatter(or_vals_plot, y_pos, color='steelblue', s=100, zorder=5)
ax.hlines(y_pos, ci_lower_plot, ci_upper_plot, color='steelblue', linewidth=2)
ax.axvline(1, color='red', linestyle='--', linewidth=2, label='OR = 1 (no effect)')

ax.set_yticks(y_pos)
ax.set_yticklabels(vars_or, fontsize=11)
ax.set_xlabel('Odds Ratio (95% CI)', fontsize=12, fontweight='bold')
ax.set_title('Odds Ratios: Credit Approval Logit\n(Robust SEs, CIs exponentiated from the log-odds scale)',
             fontsize=12, fontweight='bold')
ax.legend(fontsize=11)
ax.grid(alpha=0.3, axis='x')
plt.tight_layout()
plt.savefig(FIG_PATH + 'odds_ratios_forest.png', dpi=300, bbox_inches='tight')
plt.show()

### 6.4 Application: Average Marginal Effects

**Marginal effect** of variable $j$ for observation $i$:

$$
ME_{ij} = \Lambda'(x_i'\hat\beta) \cdot \hat\beta_j = \Lambda(x_i'\hat\beta)\bigl[1-\Lambda(x_i'\hat\beta)\bigr] \hat\beta_j
$$

**Average Marginal Effect (AME)** — average over all observations:

$$
\widehat{AME}_j = \frac{1}{n} \sum_{i=1}^n ME_{ij}
$$

Standard errors via **delta method** account for estimation uncertainty in $\hat\beta$.
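For the logit, the AME Jacobian has a closed form: $\partial AME_j/\partial\beta_k = \beta_j\,\frac{1}{n}\sum_i \Lambda''(x_i'\beta)\,x_{ik} + \mathbf{1}\{j=k\}\,\frac{1}{n}\sum_i \Lambda'(x_i'\beta)$, with $\Lambda'' = \Lambda(1-\Lambda)(1-2\Lambda)$. The sketch below computes AMEs and their delta-method SEs from that formula on simulated data with two continuous regressors. `logit_ame_and_jacobian` is a hypothetical helper; we do not claim this is how `compute_ame` is implemented internally.

```python
import numpy as np
from scipy.special import expit

def logit_ame_and_jacobian(beta, X):
    """Logit AMEs, AME_j = mean_i[Λ'(x_i'β)] β_j, and their Jacobian with respect to β."""
    p = expit(X @ beta)
    w = p * (1 - p)                  # Λ'(η_i) = Λ(1 − Λ)
    ame = w.mean() * beta            # one entry per column of X (intercept entry unused)
    dw = w * (1 - 2 * p)             # Λ''(η_i) = Λ(1 − Λ)(1 − 2Λ)
    # ∂AME_j/∂β_k = β_j · mean_i[Λ''(η_i) x_ik] + mean_i[Λ'(η_i)] · 1{j = k}
    J = np.outer(beta, (dw[:, None] * X).mean(axis=0)) + w.mean() * np.eye(len(beta))
    return ame, J

rng = np.random.default_rng(5)
n = 2_000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = rng.binomial(1, expit(X @ np.array([-0.3, 0.8, -0.5])))

beta = np.zeros(3)
for _ in range(25):                                   # Newton-Raphson logit fit
    p = expit(X @ beta)
    H = -(X.T * (p * (1 - p))) @ X
    beta = beta - np.linalg.solve(H, X.T @ (y - p))
p = expit(X @ beta)
vcov = np.linalg.inv((X.T * (p * (1 - p))) @ X)       # classical [−H]⁻¹

ame, J = logit_ame_and_jacobian(beta, X)
se_ame = np.sqrt(np.diag(J @ vcov @ J.T))             # delta method: J V J'
for name, m, s in zip(['x1', 'x2'], ame[1:], se_ame[1:]):
    print(f'{name}: AME = {m: .4f}, SE = {s:.4f}')
```

Swapping `vcov` for a sandwich or cluster-robust matrix carries that robustness through to the AME standard errors, since only the middle term of $J V J'$ changes.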

In [None]:
# ── Average Marginal Effects ──────────────────────────────────────────────────
ame_result = compute_ame(res_rob)

print('Average Marginal Effects — Credit Approval Logit (Robust SEs)')
ame_df = ame_result.summary()
print(ame_df.round(5).to_string())

In [None]:
# ── AME horizontal bar chart with 95% CIs ────────────────────────────────────
me_vals  = ame_result.marginal_effects
me_se    = ame_result.std_errors
me_ci    = ame_result.conf_int()

# Exclude intercept if present
me_vars  = [v for v in me_vals.index if v.lower() != 'intercept']
me_vals  = me_vals[me_vars]
me_se    = me_se[me_vars]
me_ci    = me_ci.loc[me_vars]

fig, ax = plt.subplots(figsize=(10, 5))
y_pos = np.arange(len(me_vars))
colors = ['steelblue' if v > 0 else 'darkorange' for v in me_vals]

ax.barh(y_pos, me_vals, xerr=1.96 * me_se, color=colors,
        edgecolor='black', alpha=0.8, capsize=5)
ax.axvline(0, color='red', linestyle='--', linewidth=2)
ax.set_yticks(y_pos)
ax.set_yticklabels(me_vars, fontsize=11)
ax.set_xlabel('Average Marginal Effect on P(Approved)', fontsize=12, fontweight='bold')
ax.set_title('Average Marginal Effects with 95% Confidence Intervals\n(Delta Method, Robust SEs)',
             fontsize=12, fontweight='bold')
ax.grid(alpha=0.3, axis='x')
plt.tight_layout()
plt.savefig(FIG_PATH + 'ame_barplot.png', dpi=300, bbox_inches='tight')
plt.show()

print('Interpretation:')
print('  AME = average derivative of P(approved) with respect to the variable,')
print('        i.e. the approximate change in probability per 1-unit increase')
print('  Averaged over all 5000 observations in the sample')

---

<a id='bootstrap'></a>
## 7. Bootstrap for MLE

### 7.1 When Bootstrap Is Necessary

| Situation | Recommended approach |
|-----------|---------------------|
| Asymmetric sampling distribution (OR, RR) | Percentile/BCa bootstrap |
| Small samples (n < 200) | Bootstrap |
| Complex transformations | Bootstrap |
| Few clusters (G < 20) | Cluster bootstrap |
| Large n, simple transformation | Delta method |

### 7.2 Nonparametric Bootstrap Algorithm

1. Draw $n$ observations **with replacement** from the sample
2. Re-estimate the model: obtain $\hat\beta^{*(b)}$
3. Repeat $B$ times (typically $B = 999$ or $1999$)
4. Use the empirical distribution of $\{\hat\beta^{*(b)}\}_{b=1}^B$:
   - $SE_{boot} = \mathrm{sd}(\hat\beta^{*(b)})$
   - Percentile CI: $[q_{2.5}, q_{97.5}]$ of bootstrap distribution
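The four steps above can be sketched without any library support, assuming a plain Newton-Raphson logit fitter. `fit_logit` and `pairs_bootstrap` are hypothetical helpers, distinct from the PanelBox `bootstrap_mle` used below, and the fitter has no safeguard against perfect separation in a resample.

```python
import numpy as np
from scipy.special import expit

def fit_logit(y, X, n_iter=25):
    """Newton-Raphson logit MLE (assumes no perfect separation)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        H = -(X.T * (p * (1 - p))) @ X
        beta = beta - np.linalg.solve(H, X.T @ (y - p))
    return beta

def pairs_bootstrap(y, X, B=199, alpha=0.05, seed=0):
    """Resample (y_i, x_i) pairs with replacement and refit B times."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((B, X.shape[1]))
    for b in range(B):
        idx = rng.integers(0, n, size=n)          # step 1: draw n rows with replacement
        draws[b] = fit_logit(y[idx], X[idx])      # step 2: re-estimate
    se = draws.std(axis=0, ddof=1)                # step 4: SE_boot
    ci = np.percentile(draws, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0)
    return draws, se, ci

rng = np.random.default_rng(6)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.binomial(1, expit(X @ np.array([0.2, 0.7])))

draws, se_boot, ci_pct = pairs_bootstrap(y, X, B=199)
print('Estimate      :', fit_logit(y, X).round(4))
print('Bootstrap SE  :', se_boot.round(4))
print('Percentile CI :', ci_pct.round(4).T.tolist())
```

Resampling whole $(y_i, x_i)$ rows (a "pairs" bootstrap) keeps the joint distribution of outcome and regressors intact, so it does not rely on the logit model being correctly specified.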

In [None]:
# ── Bootstrap SEs for Logit — Credit Approval ────────────────────────────────
import patsy

y_boot_cr, X_boot_cr = patsy.dmatrices(
    'approved ~ income + age + debt_ratio + credit_score + employment_length',
    credit, return_type='matrix'
)
y_boot_cr = np.asarray(y_boot_cr).ravel()
X_boot_cr = np.asarray(X_boot_cr)

def estimate_logit(y, X):
    """Logit MLE via BFGS with an analytic gradient; returns the parameter vector."""
    def neg_ll(beta):
        eta = X @ beta
        # Numerically stable: logaddexp(0, η) = log(1 + exp(η))
        return -np.sum(y * eta - np.logaddexp(0, eta))
    def neg_score(beta):
        # Gradient of neg_ll: -X'(y - Λ(Xβ))
        return -X.T @ (y - expit(X @ beta))
    result = minimize(neg_ll, np.zeros(X.shape[1]), jac=neg_score, method='BFGS',
                      options={'maxiter': 1000, 'disp': False})
    return result.x

# Bootstrap (B=499 for speed; use B=999 in practice)
boot_result = bootstrap_mle(
    estimate_logit, y_boot_cr, X_boot_cr,
    n_bootstrap=499, seed=42
)

var_names_boot = res_rob.params.index.tolist()

print('Bootstrap Standard Errors (B=499)')
print(f'{"Variable":<22} {"SE Classical":>13} {"SE Robust":>10} {"SE Bootstrap":>13}')
print('-' * 60)
for i, v in enumerate(var_names_boot):
    print(f'{v:<22} {res_nonrob.std_errors.iloc[i]:>13.5f}'
          f' {res_rob.std_errors.iloc[i]:>10.5f}'
          f' {boot_result.std_errors[i]:>13.5f}')

### 7.3 Bootstrap Confidence Intervals

Three CI methods, each with different properties:

| Method | Formula | When to use |
|--------|---------|-------------|
| **Normal** | $\hat\beta \pm 1.96 \cdot SE_{boot}$ | Symmetric distribution |
| **Percentile** | $[q_{2.5}, q_{97.5}]$ of $\hat\beta^*$ | Asymmetric distribution |
| **BCa** | Adjusted percentile (bias+acceleration) | Most accurate, asymmetric |

For **odds ratios** (always positive, skewed), percentile or BCa CIs are preferred.
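The next code cell uses only the bias-correction term $z_0$. A full BCa interval also needs an acceleration constant $a$, usually estimated by the jackknife, and shifts the percentile levels to $\Phi\bigl(z_0 + \frac{z_0 + z_{\alpha}}{1 - a(z_0 + z_{\alpha})}\bigr)$. As a self-contained sketch, the code below applies both corrections to a simple MLE, the log-odds $\hat\theta = \log[\bar y/(1-\bar y)]$ of a Bernoulli sample (the intercept of an intercept-only logit). `bca_interval` and the simulated data are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def bca_interval(data, statistic, B=2000, alpha=0.05, seed=0):
    """Bias-corrected and accelerated (BCa) bootstrap CI for a scalar statistic."""
    rng = np.random.default_rng(seed)
    n = len(data)
    theta_hat = statistic(data)
    boot = np.array([statistic(data[rng.integers(0, n, size=n)]) for _ in range(B)])

    # Bias correction z0: share of bootstrap draws below the original estimate
    z0 = stats.norm.ppf(np.clip(np.mean(boot < theta_hat), 1e-6, 1 - 1e-6))

    # Acceleration a from the jackknife (leave-one-out) estimates
    jack = np.array([statistic(np.delete(data, i)) for i in range(n)])
    d = jack.mean() - jack
    a = np.sum(d**3) / (6.0 * np.sum(d**2) ** 1.5)

    # Adjusted percentile levels, then read them off the bootstrap distribution
    z = stats.norm.ppf([alpha / 2, 1 - alpha / 2])
    levels = stats.norm.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))
    lo, hi = np.percentile(boot, 100 * levels)
    pct_lo, pct_hi = np.percentile(boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return theta_hat, (lo, hi), (pct_lo, pct_hi), z0, a

def log_odds(y):
    p = y.mean()
    return np.log(p / (1 - p))

rng = np.random.default_rng(7)
y = rng.binomial(1, 0.15, size=150).astype(float)

theta, ci_bca, ci_pct, z0, a = bca_interval(y, log_odds)
print(f'log-odds estimate : {theta:.4f}')
print(f'Percentile CI     : [{ci_pct[0]:.4f}, {ci_pct[1]:.4f}]')
print(f'BCa CI            : [{ci_bca[0]:.4f}, {ci_bca[1]:.4f}]  (z0 = {z0:.3f}, a = {a:.4f})')
```

Setting `a = 0` in the level formula gives $\Phi(2z_0 + z_\alpha)$, which is exactly the bias-corrected interval computed in the next cell.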

In [None]:
# ── Reconstruct bootstrap distribution and compute CIs ────────────────────────
# Re-run bootstrap to store individual estimates
np.random.seed(42)
n_obs_cr = len(y_boot_cr)
B = 499
boot_estimates = []

for _ in range(B):
    idx = np.random.choice(n_obs_cr, size=n_obs_cr, replace=True)
    try:
        b_est = estimate_logit(y_boot_cr[idx], X_boot_cr[idx])
        boot_estimates.append(b_est)
    except Exception:
        pass

boot_arr = np.array(boot_estimates)  # shape (B_valid, k)
beta_orig = res_rob.params.values
boot_se   = boot_arr.std(axis=0)

# ── Confidence intervals ──────────────────────────────────────────────────────
alpha = 0.05

# Normal CI
ci_normal_lo = beta_orig - 1.96 * boot_se
ci_normal_hi = beta_orig + 1.96 * boot_se

# Percentile CI
ci_pct_lo = np.percentile(boot_arr, 100 * alpha/2, axis=0)
ci_pct_hi = np.percentile(boot_arr, 100 * (1 - alpha/2), axis=0)

# Bias-corrected (BC) percentile CI: BCa with the acceleration term set to zero
# z0 = Φ⁻¹(share of bootstrap draws below the original estimate)
z0 = stats.norm.ppf((boot_arr < beta_orig).mean(axis=0).clip(1e-6, 1-1e-6))
z_alpha    = stats.norm.ppf(alpha / 2)
z_1malpha  = stats.norm.ppf(1 - alpha / 2)
p_lo = stats.norm.cdf(2*z0 + z_alpha)
p_hi = stats.norm.cdf(2*z0 + z_1malpha)
ci_bca_lo  = np.array([np.percentile(boot_arr[:, j], 100*p_lo[j])
                        for j in range(len(beta_orig))])
ci_bca_hi  = np.array([np.percentile(boot_arr[:, j], 100*p_hi[j])
                        for j in range(len(beta_orig))])

print('95% Bootstrap Confidence Intervals for Logit Coefficients')
print(f'{"Variable":<22} {"Normal CI":>22} {"Percentile CI":>22} {"BC CI":>22}')
print('-' * 90)
for i, v in enumerate(var_names_boot):
    print(f'{v:<22} '
          f'[{ci_normal_lo[i]:7.4f}, {ci_normal_hi[i]:7.4f}]  '
          f'[{ci_pct_lo[i]:7.4f}, {ci_pct_hi[i]:7.4f}]  '
          f'[{ci_bca_lo[i]:7.4f}, {ci_bca_hi[i]:7.4f}]')

In [None]:
# ── Bootstrap distribution: histogram + Q-Q plot for income ──────────────────
# Select a coefficient of interest (income)
try:
    param_idx = var_names_boot.index('income')
except ValueError:
    param_idx = 1

boot_param = boot_arr[:, param_idx]
orig_val   = beta_orig[param_idx]

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Histogram
axes[0].hist(boot_param, bins=40, density=True, alpha=0.7,
             edgecolor='black', color='steelblue', label='Bootstrap dist.')
axes[0].axvline(orig_val, color='red', linewidth=2.5, linestyle='--', label='Original estimate')
axes[0].axvline(ci_pct_lo[param_idx], color='green', linewidth=2, linestyle=':',
                label='Percentile 95% CI')
axes[0].axvline(ci_pct_hi[param_idx], color='green', linewidth=2, linestyle=':')
axes[0].set_xlabel(f'Bootstrap $\\hat{{\\beta}}_{{\\mathrm{{{var_names_boot[param_idx]}}}}}$',
                   fontsize=12)
axes[0].set_ylabel('Density', fontsize=12)
axes[0].set_title(f'Bootstrap Distribution: {var_names_boot[param_idx]}', fontsize=12, fontweight='bold')
axes[0].legend(fontsize=10)
axes[0].grid(alpha=0.3)

# Q-Q plot
stats.probplot(boot_param, dist='norm', plot=axes[1])
axes[1].set_title('Q-Q Plot: Normality of Bootstrap Distribution', fontsize=12, fontweight='bold')
axes[1].grid(alpha=0.3)

plt.suptitle('Bootstrap Inference Diagnostics', fontsize=13, fontweight='bold', y=1.01)
plt.tight_layout()
plt.savefig(FIG_PATH + 'bootstrap_distribution.png', dpi=300, bbox_inches='tight')
plt.show()

skewness = stats.skew(boot_param)
print(f'Skewness of bootstrap distribution: {skewness:.3f}')
if abs(skewness) > 0.3:
    print('  Substantial skewness detected: use Percentile or BCa CIs (not Normal CI)')
else:
    print('  Distribution is approximately symmetric: Normal CI is acceptable')

### 7.4 Cluster Bootstrap for Panel Data

**Algorithm**: Instead of resampling observations, resample **entire clusters** (persons). This preserves within-person dependence.

Particularly useful when:
- $G < 20$ clusters (analytical cluster SEs unreliable)
- Cluster size is highly unbalanced
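A minimal sketch of the cluster (block) bootstrap, assuming a plain Newton-Raphson logit fitter: whole persons are drawn with replacement and all of their rows enter the resample together. `fit_logit` and `cluster_bootstrap` are hypothetical helpers, not the PanelBox `bootstrap_mle`, and the small simulated panel is illustrative only.

```python
import numpy as np
from scipy.special import expit

def fit_logit(y, X, n_iter=25):
    """Newton-Raphson logit MLE (assumes no perfect separation)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        H = -(X.T * (p * (1 - p))) @ X
        beta = beta - np.linalg.solve(H, X.T @ (y - p))
    return beta

def cluster_bootstrap(y, X, cluster_ids, B=199, seed=0):
    """Resample whole clusters with replacement and refit; returns (SEs, draws)."""
    rng = np.random.default_rng(seed)
    labels = np.unique(cluster_ids)
    rows = {g: np.flatnonzero(cluster_ids == g) for g in labels}   # row indices per cluster
    draws = np.empty((B, X.shape[1]))
    for b in range(B):
        picked = rng.choice(labels, size=len(labels), replace=True)
        idx = np.concatenate([rows[g] for g in picked])            # keep each cluster's rows together
        draws[b] = fit_logit(y[idx], X[idx])
    return draws.std(axis=0, ddof=1), draws

# Simulated panel: G persons × T years with a person effect (illustrative only)
rng = np.random.default_rng(8)
G, T = 40, 5
person = np.repeat(np.arange(G), T)
u = rng.normal(size=G)[person]
X = np.column_stack([np.ones(G * T), rng.normal(size=G)[person] + rng.normal(size=G * T)])
y = rng.binomial(1, expit(0.2 + 0.8 * X[:, 1] + u))

se_cb, draws = cluster_bootstrap(y, X, person, B=199)
print('Point estimate        :', fit_logit(y, X).round(4))
print('Cluster bootstrap SEs :', se_cb.round(4))
```

A person drawn twice contributes two full copies of their rows, so each resample has exactly $G$ clusters but its row count varies when cluster sizes are unbalanced. If a cluster-robust statistic is computed inside the loop, the duplicated clusters need fresh labels.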

In [None]:
# ── Cluster bootstrap — health insurance panel ────────────────────────────────
y_h, X_h = patsy.dmatrices(
    'high_coverage ~ age + income + family_size + health_status',
    health, return_type='dataframe'
)
# Align cluster ids with the rows patsy kept (it drops rows with missing values)
cluster_ids = health.loc[X_h.index, 'person_id'].values
y_h = y_h.values.ravel()
X_h = X_h.values

# Cluster bootstrap (B=199 for speed)
boot_cluster_result = bootstrap_mle(
    estimate_logit, y_h, X_h,
    n_bootstrap=199,
    cluster_ids=cluster_ids,
    seed=42
)

var_names_panel = res_clust_p.params.index.tolist()

print('Cluster Bootstrap vs Cluster-Robust SEs — Health Panel')
print(f'{"Variable":<22} {"Cluster-Robust SE":>18} {"Cluster Bootstrap SE":>22}')
print('-' * 65)
for i, v in enumerate(var_names_panel):
    print(f'{v:<22} {res_clust_p.std_errors.iloc[i]:>18.5f}'
          f' {boot_cluster_result.std_errors[i]:>22.5f}')

print()
print('Cluster bootstrap is a common alternative when G is small (e.g. G < 20).')
print(f'Here G = {G}, so the two methods should give similar SEs.')

---

<a id='comparison'></a>
## 8. Comparison and Best Practices

### 8.1 Summary Table

| Method | Variance Formula | Assumptions | Pros | Cons | When to use |
|--------|-----------------|-------------|------|------|-------------|
| **Classical** | $[-H]^{-1}$ | Correct spec., independence | Efficient if valid | Biased under misspec. | Never (educational only) |
| **Robust (sandwich)** | $H^{-1}SH^{-1}$ | Independence | Valid under misspec. | Assumes indep. | Cross-section |
| **Cluster-robust** | $H^{-1}[\sum_g S_g S_g']H^{-1}$ | Indep. clusters | Handles within-cluster corr. | Needs roughly $G\geq 20$ | Panel data |
| **Bootstrap** | $\mathrm{Var}(\hat\beta^*)$ | Independent resampling units (obs. or clusters) | Often better with few clusters, skewed dist. | Computationally costly | Small $n$, $G < 20$, transformations |
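To make the table's formulas concrete, the sketch below computes the classical, sandwich, and cluster covariance matrices for a Logit by hand. It builds them from the scores $s_i = x_i(y_i - \hat P_i)$ and the Hessian $H = -\sum_i \hat P_i(1-\hat P_i)\,x_i x_i'$. The simulated panel and the Newton-Raphson loop are illustrative assumptions. No small-sample corrections are applied (for example, the $G/(G-1)$ factor many packages use for the cluster estimator), so the numbers need not match PanelBox output exactly.

```python
import numpy as np

# Hypothetical panel: G clusters of T observations with a shared cluster effect u_g
rng = np.random.default_rng(1)
G, T = 100, 5
n = G * T
cluster_ids = np.repeat(np.arange(G), T)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
u = rng.normal(size=G)[cluster_ids]
y = (X @ np.array([-0.2, 0.8]) + u + rng.logistic(size=n) > 0).astype(float)

def logit_prob(beta):
    return 1.0 / (1.0 + np.exp(-X @ beta))

def logit_hessian(p):
    # H = -sum_i p_i (1 - p_i) x_i x_i'  (negative definite)
    return -(X * (p * (1 - p))[:, None]).T @ X

# Pooled Logit MLE by Newton-Raphson
beta = np.zeros(X.shape[1])
for _ in range(25):
    p = logit_prob(beta)
    beta = beta - np.linalg.solve(logit_hessian(p), X.T @ (y - p))

p = logit_prob(beta)
H = logit_hessian(p)
H_inv = np.linalg.inv(H)
s = X * (y - p)[:, None]                 # row i is the score s_i'

V_cls = np.linalg.inv(-H)                # [-H]^{-1}
V_rob = H_inv @ (s.T @ s) @ H_inv        # H^{-1} S H^{-1},  S = sum_i s_i s_i'
S_g = np.array([s[cluster_ids == g].sum(axis=0) for g in range(G)])  # per-cluster score sums
V_cl = H_inv @ (S_g.T @ S_g) @ H_inv     # H^{-1} [sum_g S_g S_g'] H^{-1}

for name, V in [('Classical', V_cls), ('Robust', V_rob), ('Cluster', V_cl)]:
    print(f'{name:<10}', np.round(np.sqrt(np.diag(V)), 4))
```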

### 8.2 Decision Tree

```
Nonlinear model (Logit / Probit / Tobit / Count)
         │
         ├── Cross-sectional data
         │       └── Robust (sandwich) SEs
         │
         └── Panel data
                 ├── G >= 20 clusters → Cluster-robust SEs
                 └── G <  20 clusters → Cluster bootstrap

Reporting transformations (OR, ME, elasticity)?
         ├── Large n, symmetric → Delta method
         └── Small n, or skewed → Bootstrap CIs (percentile/BCa)
```

In [None]:
# ── Final visual: all SE methods side by side ─────────────────────────────────
# Focus on panel model where we have all four
panel_vars_plot = [v for v in var_names_panel if v.lower() != 'intercept']
idx_plot = [var_names_panel.index(v) for v in panel_vars_plot]

se_c  = res_nonrob_p.std_errors[panel_vars_plot].values
se_r  = res_rob_p.std_errors[panel_vars_plot].values
se_cl = res_clust_p.std_errors[panel_vars_plot].values
se_b  = boot_cluster_result.std_errors[idx_plot]

x_pos = np.arange(len(panel_vars_plot))
w = 0.2

fig, ax = plt.subplots(figsize=(13, 5))
ax.bar(x_pos - 1.5*w, se_c,  w, label='Classical', color='steelblue',   edgecolor='black', alpha=0.85)
ax.bar(x_pos - 0.5*w, se_r,  w, label='Robust',    color='darkorange',  edgecolor='black', alpha=0.85)
ax.bar(x_pos + 0.5*w, se_cl, w, label='Clustered', color='green',       edgecolor='black', alpha=0.85)
ax.bar(x_pos + 1.5*w, se_b,  w, label='Cluster Bootstrap', color='purple', edgecolor='black', alpha=0.85)

ax.set_xlabel('Variable', fontsize=12, fontweight='bold')
ax.set_ylabel('Standard Error', fontsize=12, fontweight='bold')
ax.set_title('All SE Methods — Health Insurance Panel Logit\n'
             'Classical / Robust / Cluster-Robust / Cluster Bootstrap',
             fontsize=12, fontweight='bold')
ax.set_xticks(x_pos)
ax.set_xticklabels(panel_vars_plot, rotation=30, ha='right')
ax.legend(fontsize=10)
ax.grid(alpha=0.3, axis='y')
plt.tight_layout()
plt.savefig(FIG_PATH + 'all_se_comparison.png', dpi=300, bbox_inches='tight')
plt.show()

print('Summary:')
print('  Typical ordering in panel data: Classical < Robust < Clustered')
print('  (expected with positive within-person correlation, but not guaranteed)')
print('  The cluster bootstrap serves as a cross-check on the cluster-robust SEs')

### 8.3 Reporting Standards

**In papers**:
- "Standard errors are robust (Huber-White sandwich)" — for cross-section
- "Standard errors are clustered by *person* (1,000 clusters)" — for panel
- "95% confidence intervals based on 999 cluster bootstrap replications" — for few clusters

**For transformations**:
- "Marginal effects computed via delta method with robust standard errors"
- "Odds ratios with 95% CIs based on log-scale robust SEs"

**Always include**: (1) method name, (2) number of clusters if applicable, (3) number of bootstrap replications.
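"Log-scale robust SEs" means building the interval for $\beta$ and then exponentiating its endpoints, instead of applying the delta method to $e^{\beta}$. The sketch below contrasts the two approaches using hypothetical numbers ($\hat\beta = 0.40$, robust SE $= 0.15$; not taken from the tutorial's output).

```python
import numpy as np
from scipy import stats

# Hypothetical Logit coefficient and robust SE (illustrative, not tutorial output)
beta_hat, se_beta = 0.40, 0.15
z = stats.norm.ppf(0.975)
odds_ratio = np.exp(beta_hat)

# (a) Log-scale CI: interval for beta, then exponentiate both endpoints
ci_log_scale = np.exp([beta_hat - z * se_beta, beta_hat + z * se_beta])

# (b) Delta-method CI: SE(exp(beta)) = exp(beta) * SE(beta), symmetric around the OR
se_or = odds_ratio * se_beta
ci_delta = np.array([odds_ratio - z * se_or, odds_ratio + z * se_or])

print(f'OR = {odds_ratio:.3f}')
print('Log-scale CI:   ', np.round(ci_log_scale, 3))
print('Delta-method CI:', np.round(ci_delta, 3))
```

The log-scale interval is asymmetric and always stays positive, which matches the range of an odds ratio. The delta-method interval is symmetric by construction.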

---

<a id='exercises'></a>
## 9. Exercises

### Exercise 1: Classical vs Robust SEs — Probit Model (Easy)

**Task**:
1. Estimate a **Probit** model (not Logit) on `credit_approval.csv` using `PooledProbit`
2. Obtain classical and robust SEs
3. Identify which coefficient shows the largest ratio $\mathrm{SE}_{\text{robust}}/\mathrm{SE}_{\text{classical}}$
4. Interpret: What does a large ratio suggest about that variable?

**Hint**: Use `from panelbox.models.discrete import PooledProbit`

In [None]:
# Exercise 1: Classical vs Robust SEs with Probit
# ── YOUR CODE HERE ──────────────────────────────────────────────────────────

# Step 1: Import PooledProbit and fit the model
from panelbox.models.discrete import PooledProbit
# probit_classic = PooledProbit(...)
# res_probit_classic = probit_classic.fit(cov_type='nonrobust')

# Step 2: Fit with robust SEs
# probit_robust = PooledProbit(...)
# res_probit_robust = probit_robust.fit(cov_type='robust')

# Step 3: Build comparison table and find largest ratio
# ...

# Step 4: Print interpretation
# ───────────────────────────────────────────────────────────────────────────
print('Exercise 1 — complete the code above.')

### Exercise 2: Delta Method for Income Elasticity (Moderate)

**Task**: Compute the **income elasticity** of the probability of approval.

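**Definition** (elasticity at the average predicted probability, built from the average marginal effect):
$$
\varepsilon_{income} = \widehat{\mathrm{AME}}_{income} \cdot \frac{\bar{x}_{income}}{\bar{P}},
\qquad
\widehat{\mathrm{AME}}_{income} = \hat\beta_{income} \cdot \frac{1}{n}\sum_i \hat P_i\,(1-\hat P_i)
$$

where $\hat P_i = \Lambda(x_i'\hat\beta)$ and $\bar P = \frac{1}{n}\sum_i \hat P_i$ is the average predicted probability. In a Logit, $\partial P_i/\partial x_{income} = \beta_{income}\,P_i(1-P_i)$, so $\hat\beta_{income}$ on its own is not the marginal effect on the probability.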

**Steps**:
1. Estimate Logit on `credit_approval.csv` with robust SEs (already done: `res_rob`)
2. Define the elasticity as a function of `beta`
3. Use `delta_method` from `panelbox.standard_errors.mle` to get SE
4. Construct 95% CI
5. Interpret: A 1% increase in income → how many % change in approval probability?

In [None]:
# Exercise 2: Income elasticity via delta method
# ── YOUR CODE HERE ──────────────────────────────────────────────────────────

# Available objects: res_rob, X_cr, y_cr, var_names_boot

# Step 1: Locate income and compute its sample mean
# income_idx   = var_names_boot.index('income')
# x_bar_income = credit['income'].mean()

# Step 2: Define elasticity as a function of beta
# (P̂_i = Λ(x_i'β) depends on beta, so recompute it inside the function;
#  otherwise the delta method ignores that source of uncertainty)
# def income_elasticity(beta):
#     p_hat = expit(X_cr @ beta)
#     ame_income = beta[income_idx] * np.mean(p_hat * (1 - p_hat))
#     return np.array([ame_income * x_bar_income / p_hat.mean()])

# Step 3: Apply delta method
# vcov_elast = delta_method(res_rob.cov_params.values, income_elasticity, res_rob.params.values)
# se_elast   = np.sqrt(vcov_elast[0, 0])
# elast_val  = income_elasticity(res_rob.params.values)[0]

# Step 4: 95% CI
# ci_lo = elast_val - 1.96 * se_elast
# ci_hi = elast_val + 1.96 * se_elast

# Step 5: Print interpretation
# ───────────────────────────────────────────────────────────────────────────
print('Exercise 2 — complete the code above.')

### Exercise 3: Bootstrap vs Cluster-Robust with Few Clusters (Challenging)

**Task**: Investigate whether the cluster bootstrap delivers more accurate inference than cluster-robust SEs when there are only a few clusters.

**Steps**:
1. Simulate a panel: $G = 10$ clusters, $T = 50$ time periods, binary outcome
   - Within-cluster correlation $\rho = 0.4$ (strong clustering)
2. Estimate Logit with (a) cluster-robust SEs and (b) cluster bootstrap ($B = 999$)
3. Run Monte Carlo (200 replications):
   - Check empirical coverage of 95% CI for $\beta_{x}$ under each method
   - Nominal coverage = 95%
4. Compare: Which method is closer to 95%?
5. Write a short conclusion (3-4 sentences)

**Deliverable**: Coverage table and written conclusion.
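Step 1 requires a data-generating process with within-cluster correlation. The exercise does not prescribe one, so what follows is one possible assumption. A normal cluster effect $u_g$ is added to the latent Logit index and scaled so that the latent intraclass correlation equals $\rho$. Because a standard logistic error has variance $\pi^2/3$, this gives $\sigma_u^2 = \rho\,\pi^2 / (3(1-\rho))$. The regressor also gets a cluster component, since clustering matters for the slope SE mainly when $x$ is itself correlated within clusters. `simulate_clustered_logit` is a hypothetical helper.

```python
import numpy as np

def simulate_clustered_logit(G=10, T=50, rho=0.4, beta=(0.0, 1.0), seed=None):
    """Binary panel with a shared cluster effect in the latent index (hypothetical DGP).

    rho is the latent intraclass correlation: var(u_g) / (var(u_g) + pi^2 / 3).
    """
    rng = np.random.default_rng(seed)
    var_e = np.pi ** 2 / 3                          # variance of a standard logistic error
    sigma_u = np.sqrt(rho / (1 - rho) * var_e)      # solves rho = s_u^2 / (s_u^2 + var_e)
    cluster_ids = np.repeat(np.arange(G), T)
    x = rng.normal(size=G * T) + rng.normal(size=G)[cluster_ids]  # x also clustered
    u = rng.normal(scale=sigma_u, size=G)[cluster_ids]
    latent = beta[0] + beta[1] * x + u + rng.logistic(size=G * T)
    y = (latent > 0).astype(float)
    X = np.column_stack([np.ones(G * T), x])
    return y, X, cluster_ids

y, X, cluster_ids = simulate_clustered_logit(seed=0)
print(X.shape, y.mean().round(3), np.unique(cluster_ids).size)
```

For step 3, note that with a cluster effect in the latent index, the pooled Logit estimates an attenuated, population-averaged slope rather than `beta[1] = 1`. Coverage should therefore be measured against that pseudo-true value, for example approximated by the pooled estimate from one very large simulated sample.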

In [None]:
# Exercise 3: Few clusters — bootstrap vs cluster-robust
# ── YOUR CODE HERE ──────────────────────────────────────────────────────────

# Step 1: Simulate panel with G=10, T=50, rho=0.4
# G, T_obs, rho = 10, 50, 0.4
# beta_true = np.array([0.0, 1.0])  # intercept, slope
# ...

# Step 2: Estimation functions for cluster-robust and cluster-bootstrap SEs
# ...

# Step 3: Monte Carlo loop (200 replications)
# coverage_clustered  = []
# coverage_bootstrap  = []
# for rep in range(200):
#     # generate data
#     # fit model
#     # compute CIs
#     # check coverage

# Step 4: Report results
# print(f'Cluster-robust 95% CI coverage: {np.mean(coverage_clustered):.3f}')
# print(f'Cluster bootstrap 95% CI coverage: {np.mean(coverage_bootstrap):.3f}')

# Step 5: Written conclusion
# ───────────────────────────────────────────────────────────────────────────
print('Exercise 3 — complete the code above.')
print()
print('CONCLUSION:')
print('[Your 3-4 sentence conclusion here]')

---

<a id='summary'></a>
## 10. Summary and Key Takeaways

### What We Learned

1. **Classical MLE SEs** are based on the inverted Hessian $[-H]^{-1}$ and valid only under correct specification
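2. **Robust (sandwich) SEs** use $H^{-1}SH^{-1}$ and remain valid under misspecification of the likelihood, in the sense that they correctly measure the sampling variability of $\hat\beta$ around its pseudo-true value (the coefficients themselves may then no longer estimate the parameters of interest)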
3. **Cluster-robust SEs** aggregate scores within clusters to handle within-cluster correlation (essential for panels)
4. **Delta method** propagates parameter uncertainty to nonlinear transformations (OR, AME, elasticities)
5. **Bootstrap** is flexible: it can improve on analytical SEs with few clusters, handles non-normal sampling distributions, and extends easily to complex transformations
6. **Never rely on classical SEs** in applied work: use robust, cluster-robust, or bootstrap SEs instead

### Key Formulas

**Classical**: $\widehat{\mathrm{Var}}_{\mathrm{cls}} = [-H(\hat\beta)]^{-1}$

**Sandwich**: $\widehat{\mathrm{Var}}_{\mathrm{rob}} = H^{-1}\, S\, H^{-1}, \quad S = \textstyle\sum_i s_i s_i'$

**Cluster**: $\widehat{\mathrm{Var}}_{\mathrm{cl}} = H^{-1}\left[\sum_g S_g S_g'\right]H^{-1}, \quad S_g = \textstyle\sum_{i\in g} s_i$

**Delta method**: $\widehat{\mathrm{Var}}\bigl(g(\hat\beta)\bigr) \approx J\, \widehat{\mathrm{Var}}(\hat\beta)\, J'$
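The delta-method formula only needs the Jacobian $J = \partial g/\partial\beta'$ evaluated at $\hat\beta$. Below is a minimal sketch that assumes central finite differences are accurate enough; analytic derivatives are preferable when available. `numerical_jacobian` and `delta_vcov` are hypothetical helpers, not the internals of PanelBox's `delta_method`. The coefficient vector and covariance matrix are made-up numbers. For odds ratios $g(\beta) = e^{\beta}$, the result should match the analytic $\mathrm{SE}(e^{\hat\beta_j}) = e^{\hat\beta_j}\,\mathrm{SE}(\hat\beta_j)$.

```python
import numpy as np

def numerical_jacobian(g, beta, eps=1e-6):
    """Central finite-difference Jacobian of g: R^k -> R^m (hypothetical helper)."""
    beta = np.asarray(beta, dtype=float)
    g0 = np.atleast_1d(g(beta))
    J = np.empty((g0.size, beta.size))
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = eps
        J[:, j] = (np.atleast_1d(g(beta + step)) - np.atleast_1d(g(beta - step))) / (2 * eps)
    return J

def delta_vcov(vcov, g, beta):
    """Var(g(beta_hat)) ≈ J Var(beta_hat) J'."""
    J = numerical_jacobian(g, beta)
    return J @ vcov @ J.T

# Hypothetical estimates: odds ratios g(beta) = exp(beta), Jacobian = diag(exp(beta))
beta_hat = np.array([-0.5, 0.4])
vcov_beta = np.array([[0.04, -0.01],
                      [-0.01, 0.0225]])
vcov_or = delta_vcov(vcov_beta, np.exp, beta_hat)
se_or = np.sqrt(np.diag(vcov_or))
print('Odds ratios:', np.round(np.exp(beta_hat), 4), ' SEs:', np.round(se_or, 4))
```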

### PanelBox Cheat Sheet

```python
import numpy as np
from panelbox.models.discrete import PooledLogit
from panelbox.marginal_effects.discrete_me import compute_ame
from panelbox.standard_errors.mle import delta_method, bootstrap_mle

# Estimation (formula, data, entity_col, time_col are placeholders for your own inputs)
model = PooledLogit(formula, data, entity_col, time_col)
res_robust  = model.fit(cov_type='robust')   # cross-section
res_cluster = model.fit(cov_type='cluster')  # panel (cluster by entity)

# Marginal effects
ame = compute_ame(res_robust)
ame.summary()

# Delta method: covariance matrix of the odds ratios exp(beta)
vcov_or = delta_method(res_robust.cov_params.values, np.exp, res_robust.params.values)

# Bootstrap (estimate_func, y, X, entity_ids: your estimator, outcome, design matrix, cluster labels)
boot = bootstrap_mle(estimate_func, y, X, n_bootstrap=999,
                     cluster_ids=entity_ids, seed=42)
```

### Connection to Next Tutorial

**Tutorial 06**: Bootstrap for Quantile Regression
- The asymptotic variance of quantile regression depends on the error density at the quantile, which is difficult to estimate reliably
- Bootstrap is therefore a widely used inference method
- Extends ideas from this notebook to distributional effects

---

<a id='references'></a>
## 11. References

### Foundational Papers

1. **White, H. (1982)**. "Maximum likelihood estimation of misspecified models." *Econometrica*, 50(1), 1–25.  
   *The standard reference for the sandwich (robust) covariance matrix of the MLE under misspecification.*

2. **Huber, P. J. (1967)**. "The behavior of maximum likelihood estimates under nonstandard conditions." *Proceedings of the Fifth Berkeley Symposium*, 1, 221–233.  
   *Foundation for robustness of MLE under misspecification.*

3. **Cameron, A. C., & Miller, D. L. (2015)**. "A practitioner's guide to cluster-robust inference." *Journal of Human Resources*, 50(2), 317–372.  
   *Comprehensive review of clustering, including few-cluster problems.*

4. **Efron, B. (1987)**. "Better bootstrap confidence intervals." *Journal of the American Statistical Association*, 82(397), 171–185.  
   *Introduces BCa confidence intervals.*

### Textbooks

1. **Cameron, A. C., & Trivedi, P. K. (2005)**. *Microeconometrics: Methods and Applications*. Cambridge University Press. [Chapters 5, 10, 11]  
   *Comprehensive treatment of MLE, robust inference, and bootstrapping for microeconometrics.*

2. **Wooldridge, J. M. (2010)**. *Econometric Analysis of Cross Section and Panel Data* (2nd ed.). MIT Press. [Chapters 13, 15]  
   *Standard graduate reference for MLE in panel data contexts.*

3. **Efron, B., & Tibshirani, R. J. (1994)**. *An Introduction to the Bootstrap*. CRC Press.  
   *The definitive reference for bootstrap methods.*

### PanelBox Documentation

- MLE models: `panelbox.readthedocs.io/models/discrete.html`
- Standard errors: `panelbox.readthedocs.io/inference/standard-errors.html`
- Marginal effects: `panelbox.readthedocs.io/inference/marginal-effects.html`

---

**End of Tutorial 05**