# Tutorial 06: Dynamic Panel VAR with GMM Estimation

**Duration:** 120--150 minutes  
**Level:** Advanced  
**Prerequisites:** Panel VAR estimation (Tutorials 01--05), basic GMM concepts, matrix algebra

---

## Learning Objectives

By the end of this notebook, you will be able to:

1. Explain the **Nickell bias** problem in dynamic panel models and quantify its magnitude
2. Implement **Difference GMM** (Arellano-Bond) to eliminate fixed-effect endogeneity
3. Understand when **System GMM** (Blundell-Bond) improves upon Difference GMM
4. Perform essential **GMM diagnostics**: Hansen J-test, AR(2) test, instrument count rules
5. Address **instrument proliferation** with collapsed instruments
6. Compare **OLS vs. GMM** estimates and apply decision rules for estimator choice
7. Calculate **half-lives** of shock persistence and assess how estimator bias affects them

## Outline

1. [Nickell Bias Problem](#1-nickell-bias-problem) (25 min)
2. [Difference GMM (Arellano-Bond)](#2-difference-gmm-arellano-bond) (30 min)
3. [System GMM (Blundell-Bond)](#3-system-gmm-blundell-bond) (25 min)
4. [GMM Diagnostics](#4-gmm-diagnostics) (30 min)
5. [Instrument Collapse](#5-instrument-collapse) (20 min)
6. [OLS vs GMM Comparison](#6-ols-vs-gmm-comparison) (15 min)
7. [Application: Shock Persistence](#7-application-shock-persistence) (15 min)
8. [Summary](#8-summary)
9. [Exercises](#9-exercises)

In [None]:
# ============================================================
# Setup
# ============================================================
import sys
import os
import warnings
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from scipy import stats

%matplotlib inline

# Reproducibility
np.random.seed(42)

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')

# Add project root and utilities to path
project_root = Path('../../../').resolve()
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))
sys.path.insert(0, str(Path('../utils').resolve()))

# PanelBox imports
from panelbox.var import PanelVARData, PanelVAR

# Tutorial utilities
from data_generators import generate_dynamic_panel, generate_macro_panel
from var_simulation import simulate_panel_var
from visualization_helpers import set_academic_style

# Apply academic style
set_academic_style()

# Output directory
os.makedirs('../outputs/figures/gmm', exist_ok=True)
os.makedirs('../outputs/tables', exist_ok=True)

# Track GMM availability
GMM_AVAILABLE = False
try:
    from panelbox.var.gmm import estimate_panel_var_gmm
    from panelbox.var.instruments import build_gmm_instruments
    GMM_AVAILABLE = True
    print('panelbox.var.gmm module: available')
except ImportError:
    print('panelbox.var.gmm module: NOT available (will use manual implementations)')

print('Setup complete.')
print(f'NumPy: {np.__version__}')
print(f'Pandas: {pd.__version__}')

---

## 1. Nickell Bias Problem

### The Core Issue

Consider the simplest dynamic panel model:

$$Y_{it} = \alpha_i + \rho \cdot Y_{i,t-1} + \varepsilon_{it}$$

where:
- $\alpha_i$ is an entity-specific fixed effect
- $\rho$ is the autoregressive parameter (persistence)
- $\varepsilon_{it}$ is an i.i.d. error term

**The problem:** The fixed effect $\alpha_i$ is correlated with $Y_{i,t-1}$ by construction. Since $Y_{i,t-1}$ depends on $\alpha_i$ (through the recursive structure of the model), the regressor is endogenous.

### Why the Within Transformation Fails

The standard approach to handle fixed effects is the **within transformation** (demeaning):

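$$\tilde{Y}_{it} = \rho \cdot \tilde{Y}_{i,t-1} + \tilde{\varepsilon}_{it}$$

where $\tilde{X}_{it} = X_{it} - \bar{X}_i$. The problem is that the demeaned regressor $\tilde{Y}_{i,t-1}$ contains the entity mean of $Y$, which averages over every period and therefore depends on $\varepsilon_{it}$ (for example, $\bar{Y}_i = \frac{1}{T}\sum_{s=1}^T Y_{is}$ includes $Y_{it}$ itself). Likewise, the demeaned error $\tilde{\varepsilon}_{it}$ contains $-\bar{\varepsilon}_i$, which is correlated with $Y_{i,t-1}$. This creates a **mechanical correlation** between the transformed regressor and the transformed error that does not vanish as $N \to \infty$ with $T$ fixed.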

### Bias Magnitude

Nickell (1981) showed that the bias of the fixed-effects (within) estimator $\hat{\rho}_{FE}$ is, to first order in $1/T$:

$$\text{plim}(\hat{\rho}_{FE} - \rho) \approx -\frac{1+\rho}{T-1}$$

Key implications:
- The bias is $O(1/T)$ -- severe when $T$ is small
- The bias is **always negative** (downward bias on persistence)
- For $T=5$: bias $\approx -(1+\rho)/4$, i.e. $-0.425$ when $\rho = 0.7$, more than half the true value
- For $T=10$: bias $\approx -(1+\rho)/9$
- For $T=100$: bias $\approx -(1+\rho)/99$, negligible
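
The first-order approximation can be compared with Nickell's exact large-$N$ expression for the AR(1) panel. The sketch below is illustrative only: the function names are ours, it assumes $|\rho| < 1$, and it takes `T` to be the number of periods that enter the regression (with one further initial observation supplying the first lag), which is an assumption about the timing convention.

```python
def nickell_bias_approx(rho, T):
    """First-order approximation: -(1 + rho) / (T - 1)."""
    return -(1.0 + rho) / (T - 1)


def nickell_bias_exact(rho, T):
    """Large-N inconsistency of the within estimator in an AR(1) panel (Nickell, 1981)."""
    a = (1.0 - rho**T) / (T * (1.0 - rho))   # (1/T) * (1 - rho^T) / (1 - rho)
    numerator = -(1.0 + rho) / (T - 1) * (1.0 - a)
    denominator = 1.0 - 2.0 * rho / ((1.0 - rho) * (T - 1)) * (1.0 - a)
    return numerator / denominator


for T in (5, 10, 20, 50):
    print(f"T={T:3d}  approx={nickell_bias_approx(0.7, T):+.4f}  "
          f"exact={nickell_bias_exact(0.7, T):+.4f}")
```

The two expressions need not agree closely when $T$ is small or $\rho$ is large, so the approximation is best read as an order-of-magnitude guide.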

In [None]:
# ============================================================
# Numerical Demonstration: Nickell Bias
# ============================================================

def simulate_dynamic_panel_simple(N, T, rho_true, sigma_alpha=1.0, sigma_eps=1.0, seed=42):
    """
    Simulate a simple dynamic panel: Y_it = alpha_i + rho * Y_{i,t-1} + eps_it.
    Returns a DataFrame with columns: entity, time, y.
    """
    np.random.seed(seed)
    records = []
    for i in range(N):
        alpha_i = sigma_alpha * np.random.randn()
        # Initial value drawn from stationary distribution
        y_prev = alpha_i / (1 - rho_true) + sigma_eps / np.sqrt(1 - rho_true**2) * np.random.randn()
        for t in range(T):
            eps = sigma_eps * np.random.randn()
            y_curr = alpha_i + rho_true * y_prev + eps
            records.append({'entity': i, 'time': t, 'y': y_curr})
            y_prev = y_curr
    return pd.DataFrame(records)


def estimate_fe_ols(df, entity_col='entity', time_col='time', y_col='y'):
    """
    Estimate rho from dynamic panel using within (FE) OLS.
    Returns estimated rho.
    """
    # Create lagged y within each entity
    df = df.sort_values([entity_col, time_col]).copy()
    df['y_lag'] = df.groupby(entity_col)[y_col].shift(1)
    df = df.dropna(subset=['y_lag'])

    # Within transformation (demean by entity)
    df['y_dm'] = df[y_col] - df.groupby(entity_col)[y_col].transform('mean')
    df['y_lag_dm'] = df['y_lag'] - df.groupby(entity_col)['y_lag'].transform('mean')

    # OLS on demeaned data
    x = df['y_lag_dm'].values
    y = df['y_dm'].values
    rho_hat = np.dot(x, y) / np.dot(x, x)
    return rho_hat


# True parameters
rho_true = 0.7
N = 100
T = 10

# Simulate and estimate
df_sim = simulate_dynamic_panel_simple(N=N, T=T, rho_true=rho_true)
rho_fe = estimate_fe_ols(df_sim)

# Theoretical bias
bias_theoretical = -(1 + rho_true) / (T - 1)

print('=== Nickell Bias Demonstration ===')
print(f'True rho:            {rho_true:.4f}')
print(f'FE-OLS estimate:     {rho_fe:.4f}')
print(f'Bias (actual):       {rho_fe - rho_true:.4f}')
print(f'Bias (theoretical):  {bias_theoretical:.4f}')
print(f'\nThe FE-OLS estimator is severely biased downward!')

In [None]:
# ============================================================
# Monte Carlo: Nickell Bias for Different T Values
# ============================================================

rho_true = 0.7
N = 200
T_values = [5, 10, 20, 50, 100]
n_simulations = 200

results_mc = {}

for T in T_values:
    rho_estimates = []
    for sim in range(n_simulations):
        df_mc = simulate_dynamic_panel_simple(N=N, T=T, rho_true=rho_true, seed=sim * 100 + T)
        rho_hat = estimate_fe_ols(df_mc)
        rho_estimates.append(rho_hat)
    results_mc[T] = rho_estimates

# Create summary table
mc_summary = pd.DataFrame({
    'T': T_values,
    'Mean rho_hat': [np.mean(results_mc[T]) for T in T_values],
    'Std rho_hat': [np.std(results_mc[T]) for T in T_values],
    'Mean Bias': [np.mean(results_mc[T]) - rho_true for T in T_values],
    'Theoretical Bias': [-(1 + rho_true) / (T - 1) for T in T_values],
    'RMSE': [np.sqrt(np.mean([(r - rho_true)**2 for r in results_mc[T]])) for T in T_values],
})

print('=== Monte Carlo Results: Nickell Bias ===')
print(f'True rho = {rho_true}, N = {N}, Simulations = {n_simulations}')
print()
print(mc_summary.round(4).to_string(index=False))

# Save table
mc_summary.round(4).to_csv('../outputs/tables/06_nickell_bias_monte_carlo.csv', index=False)
print('\nTable saved to ../outputs/tables/06_nickell_bias_monte_carlo.csv')

In [None]:
# ============================================================
# Visualize the Nickell Bias: Bias vs T
# ============================================================

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left panel: bias vs T
ax = axes[0]
T_range = np.arange(3, 101)
theoretical_bias = -(1 + rho_true) / (T_range - 1)

ax.plot(T_range, theoretical_bias, 'b-', linewidth=2, label='Theoretical bias $-(1+\\rho)/(T-1)$')
ax.scatter(T_values, [np.mean(results_mc[T]) - rho_true for T in T_values],
           color='red', s=80, zorder=5, label='Monte Carlo mean bias')
ax.axhline(y=0, color='black', linewidth=0.8, linestyle='--', alpha=0.5)
ax.set_xlabel('T (time periods)', fontsize=12)
ax.set_ylabel('Bias', fontsize=12)
ax.set_title('Nickell Bias: O(1/T) Decline', fontsize=13, fontweight='bold')
ax.legend(fontsize=10)
ax.grid(True, alpha=0.3)

# Right panel: distribution of estimates for different T
ax = axes[1]
colors = sns.color_palette('husl', len(T_values))
for T_val, color in zip(T_values, colors):
    ax.hist(results_mc[T_val], bins=30, alpha=0.4, color=color,
            label=f'T={T_val}', density=True, edgecolor='white', linewidth=0.5)
ax.axvline(x=rho_true, color='black', linewidth=2, linestyle='--', label=f'True $\\rho$ = {rho_true}')
ax.set_xlabel('$\\hat{\\rho}_{FE}$', fontsize=12)
ax.set_ylabel('Density', fontsize=12)
ax.set_title('Distribution of FE-OLS Estimates by T', fontsize=13, fontweight='bold')
ax.legend(fontsize=9)
ax.grid(True, alpha=0.3)

fig.suptitle('Nickell (1981) Bias in Dynamic Panel Models', fontsize=14, fontweight='bold', y=1.02)
fig.tight_layout()
fig.savefig('../outputs/figures/gmm/06_nickell_bias.png', dpi=150, bbox_inches='tight')
plt.show()

print('Key insight: As T grows, the bias shrinks toward zero.')
print(f'At T=5, the bias is {-(1+rho_true)/4:.3f} -- about {(1+rho_true)/4/rho_true:.0%} of the true value in magnitude!')
print(f'At T=100, the bias is {-(1+rho_true)/99:.4f} -- negligible.')

### Key Takeaway: When Does Nickell Bias Matter?

| T (time periods) | Bias Magnitude | Practical Implication |
|---|---|---|
| T < 10 | Very large | **Must use GMM** |
| 10 ≤ T < 20 | Moderate | GMM strongly recommended |
| 20 ≤ T < 30 | Small but non-trivial | Compare OLS and GMM |
| 30 ≤ T ≤ 100 | Small | OLS usually acceptable |
| T > 100 | Negligible | OLS is fine |

Our dynamic panel dataset has **T = 15**, placing it squarely in the zone where GMM is strongly recommended.

---

## 2. Difference GMM (Arellano-Bond)

### Solution Strategy

Arellano and Bond (1991) proposed a two-part strategy (not to be confused with the one-step versus two-step choice of GMM weight matrix):

**Step 1:** Take **first differences** to eliminate the fixed effect:

$$\Delta Y_{it} = \rho \cdot \Delta Y_{i,t-1} + \Delta \varepsilon_{it}$$

Now $\alpha_i$ is gone. But there is a new problem: $\Delta Y_{i,t-1} = Y_{i,t-1} - Y_{i,t-2}$ is correlated with $\Delta \varepsilon_{it} = \varepsilon_{it} - \varepsilon_{i,t-1}$ because $Y_{i,t-1}$ depends on $\varepsilon_{i,t-1}$.

**Step 2:** Use **lagged levels** as instruments for the differenced equation:

$$E[Y_{i,t-s} \cdot \Delta\varepsilon_{it}] = 0 \quad \text{for } s \geq 2$$

This is valid because $Y_{i,t-2}$ and further lags are predetermined with respect to $\varepsilon_{it}$ and $\varepsilon_{i,t-1}$.
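
The moment condition can be illustrated with the simplest estimator that uses it: a just-identified IV regression of $\Delta Y_{it}$ on $\Delta Y_{i,t-1}$ with the single instrument $Y_{i,t-2}$ (the Anderson-Hsiao approach). This is a minimal sketch, not the full Arellano-Bond estimator, which stacks all available lags and weights them optimally. The helper names and the simulation settings (`N=5000`, `T=8`, `rho=0.5`) are assumptions chosen for illustration.

```python
import numpy as np


def simulate_ar1_panel(N, T, rho, rng):
    """Return an (N, T) array from y_it = alpha_i + rho * y_{i,t-1} + eps_it."""
    alpha = rng.standard_normal(N)
    y = np.empty((N, T))
    # Start from the stationary distribution, then iterate forward
    y_prev = alpha / (1 - rho) + rng.standard_normal(N) / np.sqrt(1 - rho**2)
    for t in range(T):
        y[:, t] = alpha + rho * y_prev + rng.standard_normal(N)
        y_prev = y[:, t]
    return y


def within_estimate(y):
    """Fixed-effects (within) estimate of rho."""
    y_cur, y_lag = y[:, 1:], y[:, :-1]
    y_cur = y_cur - y_cur.mean(axis=1, keepdims=True)
    y_lag = y_lag - y_lag.mean(axis=1, keepdims=True)
    return (y_lag * y_cur).sum() / (y_lag * y_lag).sum()


def anderson_hsiao_estimate(y):
    """IV estimate of rho: instrument Delta y_{t-1} with the level y_{t-2}."""
    dy = np.diff(y, axis=1)      # dy[:, j] = y[:, j+1] - y[:, j]
    dy_t = dy[:, 1:]             # Delta y_t
    dy_lag = dy[:, :-1]          # Delta y_{t-1}
    z = y[:, :-2]                # y_{t-2}, aligned with the two arrays above
    return (z * dy_t).sum() / (z * dy_lag).sum()


rng = np.random.default_rng(0)
y = simulate_ar1_panel(N=5000, T=8, rho=0.5, rng=rng)
rho_fe = within_estimate(y)
rho_iv = anderson_hsiao_estimate(y)
print(f"within: {rho_fe:.3f}   Anderson-Hsiao IV: {rho_iv:.3f}   true: 0.500")
```

The within estimate should sit well below the true value, while the IV estimate should be close to it, at the cost of a larger sampling variance.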

### Instrument Matrix

The instruments form a **block-diagonal** matrix where the number of available instruments grows with $t$:
- At $t = 3$: only $Y_{i,1}$ is available
- At $t = 4$: $Y_{i,1}, Y_{i,2}$ are available
- At $t = T$: $Y_{i,1}, \ldots, Y_{i,T-2}$ are available
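
The block-diagonal layout described above can be sketched for a single entity in the univariate AR(1) case. The function name `ab_instrument_matrix` is hypothetical and is not part of PanelBox; it only illustrates how each period gets its own block of lagged levels.

```python
import numpy as np


def ab_instrument_matrix(y_i):
    """Block-diagonal Arellano-Bond instrument matrix for one entity (AR(1) case).

    y_i holds Y_{i,1}, ..., Y_{i,T}. Row r corresponds to the differenced
    equation at t = r + 3 and contains Y_{i,1}, ..., Y_{i,t-2} in its own block.
    """
    y_i = np.asarray(y_i, dtype=float)
    T = y_i.shape[0]
    n_rows = T - 2
    n_cols = (T - 2) * (T - 1) // 2
    Z = np.zeros((n_rows, n_cols))
    col = 0
    for r in range(n_rows):
        n_inst = r + 1                     # t - 2 instruments at period t = r + 3
        Z[r, col:col + n_inst] = y_i[:n_inst]
        col += n_inst
    return Z


Z_example = ab_instrument_matrix([1.0, 2.0, 3.0, 4.0, 5.0])   # T = 5
print(Z_example)
```

With $T = 5$ the matrix has three rows (for $t = 3, 4, 5$) and $1 + 2 + 3 = 6$ columns, which shows how quickly the column count grows with $T$.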

In [None]:
# ============================================================
# Load the Dynamic Panel Data
# ============================================================

df_dyn = generate_dynamic_panel()

print('=== Dynamic Panel Data ===')
print(f'Shape: {df_dyn.shape}')
print(f'Columns: {list(df_dyn.columns)}')
print(f'Countries: {df_dyn["country"].nunique()}')
print(f'Years per country: {df_dyn["year"].nunique()}')
print(f'\nFirst few rows:')
df_dyn.head(10)

In [None]:
# ============================================================
# Descriptive Statistics
# ============================================================

print('=== Descriptive Statistics ===')
print(df_dyn[['y1', 'y2', 'y3']].describe().round(4).to_string())

# Correlation matrix
print('\n=== Correlation Matrix ===')
print(df_dyn[['y1', 'y2', 'y3']].corr().round(4).to_string())

In [None]:
# ============================================================
# Difference GMM Estimation
# ============================================================

# Create PanelVARData and PanelVAR objects
data_gmm = PanelVARData(
    df_dyn,
    endog_vars=['y1', 'y2'],
    entity_col='country',
    time_col='year',
    lags=2
)

model_gmm = PanelVAR(data_gmm)

print('=== PanelVARData Properties ===')
print(f'K (endogenous vars): {data_gmm.K}')
print(f'p (lags):            {data_gmm.p}')
print(f'N (entities):        {data_gmm.N}')
print(f'n_obs (total):       {data_gmm.n_obs}')

In [None]:
# ============================================================
# Estimate OLS for comparison (baseline with Nickell bias)
# ============================================================

results_ols = model_gmm.fit(method='ols', cov_type='clustered')

print('=== OLS Estimation (with Nickell Bias) ===')
print(results_ols.summary())

In [None]:
# ============================================================
# Difference GMM Estimation (try/except for API availability)
# ============================================================
# PanelVAR.fit() currently supports method='ols' only. For GMM estimation,
# we use the lower-level panelbox.var.gmm module with try/except wrapping.

coef_names = ['y1(t-1)', 'y2(t-1)', 'y1(t-2)', 'y2(t-2)']
result_diff_gmm = None

# First: try via PanelVAR.fit(method='gmm')
try:
    result_via_fit = model_gmm.fit(
        method='gmm',
        gmm_type='difference',
        max_lags_instruments=4
    )
    print('Difference GMM via PanelVAR.fit() succeeded!')
    print(result_via_fit.summary())
except Exception as e:
    print(f'PanelVAR.fit(method="gmm"): {e}')
    print('  (Expected -- PanelVAR currently supports method="ols" only)')

# Second: try via panelbox.var.gmm module directly
print()
try:
    result_diff_gmm = estimate_panel_var_gmm(
        data=df_dyn,
        var_lags=2,
        value_cols=['y1', 'y2'],
        entity_col='country',
        time_col='year',
        transform='fd',
        gmm_step='two-step',
        instrument_type='all',
        max_instruments=4,
        windmeijer_correction=True
    )

    print('=== Difference GMM Results ===')
    print(f'Coefficients shape: {result_diff_gmm.coefficients.shape}')
    print(f'Number of instruments: {result_diff_gmm.n_instruments}')
    print(f'Number of observations: {result_diff_gmm.n_obs}')
    print(f'GMM step: {result_diff_gmm.gmm_step}')
    print(f'Transform: {result_diff_gmm.transform}')
    print(f'Instrument type: {result_diff_gmm.instrument_type}')
    print(f'Windmeijer corrected: {result_diff_gmm.windmeijer_corrected}')
    print(f'\nCoefficients:')
    for eq in range(result_diff_gmm.coefficients.shape[1]):
        print(f'\n  Equation y{eq+1}:')
        for j, name in enumerate(coef_names):
            if j < result_diff_gmm.coefficients.shape[0]:
                coef = result_diff_gmm.coefficients[j, eq]
                se = result_diff_gmm.standard_errors[j, eq] if result_diff_gmm.standard_errors.ndim > 1 else result_diff_gmm.standard_errors[j]
                print(f'    {name}: {coef:+.4f} (SE: {se:.4f})')

except Exception as e:
    print(f'Difference GMM via panelbox.var.gmm: {e}')
    print()
    print('Theoretical Result:')
    print('Difference GMM removes the Nickell bias by using')
    print('lagged levels as instruments for first-differenced equations.')
    print('The estimator is consistent for N -> infinity with T fixed.')

In [None]:
# ============================================================
# Visualize: OLS vs Difference GMM Coefficients
# ============================================================

# Extract OLS coefficients
A1_ols = results_ols.A_matrices[0]  # K x K matrix for lag 1

# Extract Diff-GMM coefficients (lag 1 portion) -- with fallback
K = 2
if result_diff_gmm is not None:
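    # coefficients is indexed [regressor, equation] (see the printing loop above);
    # transpose so rows are equations, matching the layout of A_matrices[0]
    A1_diff = result_diff_gmm.coefficients[:K, :K].T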
else:
    # Fallback: use theoretical values adjusted for illustration
    A1_diff = A1_ols * 1.0  # placeholder
    print('Note: GMM not available; using OLS as placeholder for visualization.')

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Heatmap for OLS
ax = axes[0]
df_ols_coef = pd.DataFrame(A1_ols, index=['y1', 'y2'], columns=['y1(t-1)', 'y2(t-1)'])
sns.heatmap(df_ols_coef, annot=True, fmt='.4f', cmap='RdBu_r', center=0,
            linewidths=0.5, square=True, ax=ax, vmin=-0.6, vmax=0.6)
ax.set_title('OLS (FE) - A$_1$ Coefficients', fontsize=13, fontweight='bold')
ax.set_xlabel('Regressor (lag 1)', fontsize=11)
ax.set_ylabel('Equation', fontsize=11)

# Heatmap for Diff-GMM
ax = axes[1]
df_diff_coef = pd.DataFrame(A1_diff, index=['y1', 'y2'], columns=['y1(t-1)', 'y2(t-1)'])
sns.heatmap(df_diff_coef, annot=True, fmt='.4f', cmap='RdBu_r', center=0,
            linewidths=0.5, square=True, ax=ax, vmin=-0.6, vmax=0.6)
title_suffix = '' if result_diff_gmm is not None else ' (placeholder)'
ax.set_title(f'Difference GMM - A$_1$ Coefficients{title_suffix}', fontsize=13, fontweight='bold')
ax.set_xlabel('Regressor (lag 1)', fontsize=11)
ax.set_ylabel('Equation', fontsize=11)

fig.suptitle('OLS vs Difference GMM: Lag 1 Coefficient Comparison',
             fontsize=14, fontweight='bold', y=1.02)
fig.tight_layout()
fig.savefig('../outputs/figures/gmm/06_ols_vs_diff_gmm_coefs.png', dpi=150, bbox_inches='tight')
plt.show()

print('Interpretation:')
print('  - OLS (with FE) typically underestimates the diagonal coefficients (persistence)')
print('  - Difference GMM corrects the downward Nickell bias')
print('  - The true diagonal values of A_1 are 0.50 and 0.40 (from the DGP)')

---

## 3. System GMM (Blundell-Bond)

### Motivation: Weak Instruments in Difference GMM

Difference GMM uses lagged **levels** $Y_{i,t-s}$ as instruments for **differenced** equations. When the autoregressive parameter $\rho$ is close to 1 (near unit root), lagged levels are **weakly correlated** with first differences.

Intuitively, if $Y_{it}$ is very persistent, then:
- $\Delta Y_{it} = Y_{it} - Y_{i,t-1} \approx \varepsilon_{it}$ (nearly unpredictable)
- $Y_{i,t-2}$ is a poor predictor of $\Delta Y_{i,t-1}$

This leads to **weak instruments**, resulting in:
- Large standard errors
- Finite-sample bias, typically downward (toward the within-group estimate)
- Poor finite-sample performance
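
The weakening of the instruments can be quantified through the population first-stage slope from regressing $\Delta Y_{i,t-1}$ on $Y_{i,t-2}$. Under mean stationarity of the AR(1) model above, this slope is $(\rho - 1)\,k / (\sigma_\alpha^2/\sigma_\varepsilon^2 + k)$ with $k = (1-\rho)/(1+\rho)$, the form discussed by Blundell and Bond (1998). The sketch below evaluates it for a few values of $\rho$; the function name and the default variances are assumptions for illustration.

```python
def first_stage_coefficient(rho, var_alpha=1.0, var_eps=1.0):
    """Population slope from regressing Delta y_{t-1} on y_{t-2} (stationary AR(1) panel)."""
    k = (1.0 - rho) / (1.0 + rho)
    return (rho - 1.0) * k / (var_alpha / var_eps + k)


for rho in (0.2, 0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"rho={rho:.2f}  first-stage slope={first_stage_coefficient(rho):+.5f}")
```

The slope shrinks toward zero as $\rho \to 1$, and also as the fixed-effect variance grows relative to the error variance, which is the sense in which lagged levels become weak instruments.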

### The System GMM Solution

Blundell and Bond (1998) proposed adding a second set of **level equations** to the system:

$$Y_{it} = \alpha_i + \rho \cdot Y_{i,t-1} + \varepsilon_{it}$$

with **lagged differences** $\Delta Y_{i,t-1}$ as instruments for the level equation.

The additional moment condition is:

$$E[\Delta Y_{i,t-1} \cdot (\alpha_i + \varepsilon_{it})] = 0$$

This requires a **stationarity** assumption: the initial deviations $Y_{i,1} - \alpha_i/(1-\rho)$ must be uncorrelated with $\alpha_i$.
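
A small simulation can make the role of the initial condition concrete. The sketch below computes the sample analogue of the level moment condition for the first usable period under two starting rules: a stationary start, and a start fixed at zero (so the initial deviation is perfectly correlated with $\alpha_i$). The function name, the zero-based time index (`y0` is the initial observation), and the simulation settings are assumptions for illustration.

```python
import numpy as np


def level_moment(rho, stationary_start, N=20000, seed=0):
    """Sample analogue of E[Delta y_{i,1} * (alpha_i + eps_{i,2})]."""
    rng = np.random.default_rng(seed)
    alpha = rng.standard_normal(N)
    if stationary_start:
        # y_0 drawn around the entity-specific long-run mean alpha_i / (1 - rho)
        y0 = alpha / (1 - rho) + rng.standard_normal(N) / np.sqrt(1 - rho**2)
    else:
        # y_0 fixed at zero: the initial deviation equals -alpha_i / (1 - rho)
        y0 = np.zeros(N)
    y1 = alpha + rho * y0 + rng.standard_normal(N)
    eps2 = rng.standard_normal(N)
    u2 = alpha + eps2            # composite error of the level equation at t = 2
    return np.mean((y1 - y0) * u2)


m_stationary = level_moment(0.5, stationary_start=True)
m_fixed_start = level_moment(0.5, stationary_start=False)
print(f"stationary start: {m_stationary:+.4f}")
print(f"start at zero:    {m_fixed_start:+.4f}")
```

With a stationary start the moment should be close to zero; with the fixed start it should be close to $\sigma_\alpha^2$, so the lagged difference would not be a valid instrument for the level equation.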

### System GMM = Differenced Equations + Level Equations

| Component | Equation | Instruments |
|---|---|---|
| Differenced | $\Delta Y_{it} = \rho \cdot \Delta Y_{i,t-1} + \Delta \varepsilon_{it}$ | Lagged levels: $Y_{i,t-2}, Y_{i,t-3}, \ldots$ |
| Level | $Y_{it} = \alpha_i + \rho \cdot Y_{i,t-1} + \varepsilon_{it}$ | Lagged differences: $\Delta Y_{i,t-1}$ |

In [None]:
# ============================================================
# System GMM Estimation (try/except)
# ============================================================

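# Note: transform='fod' applies forward orthogonal deviations (Arellano-Bover, 1995),
# an alternative to first differencing for removing the fixed effect. FOD alone does
# not add the level equations that define System GMM; whether this call stacks them
# depends on the installed panelbox version. Treat the result as this tutorial's
# "Sys-GMM" column rather than a confirmed Blundell-Bond estimate.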
result_sys_gmm = None

try:
    result_sys_gmm = estimate_panel_var_gmm(
        data=df_dyn,
        var_lags=2,
        value_cols=['y1', 'y2'],
        entity_col='country',
        time_col='year',
        transform='fod',
        gmm_step='two-step',
        instrument_type='all',
        max_instruments=3,
        windmeijer_correction=True
    )

    print('=== System GMM (FOD) Results ===')
    print(f'Coefficients shape: {result_sys_gmm.coefficients.shape}')
    print(f'Number of instruments: {result_sys_gmm.n_instruments}')
    print(f'Number of observations: {result_sys_gmm.n_obs}')
    print(f'Transform: {result_sys_gmm.transform}')
    print(f'\nCoefficients:')
    for eq in range(result_sys_gmm.coefficients.shape[1]):
        print(f'\n  Equation y{eq+1}:')
        for j, name in enumerate(coef_names):
            if j < result_sys_gmm.coefficients.shape[0]:
                coef = result_sys_gmm.coefficients[j, eq]
                se = result_sys_gmm.standard_errors[j, eq] if result_sys_gmm.standard_errors.ndim > 1 else result_sys_gmm.standard_errors[j]
                print(f'    {name}: {coef:+.4f} (SE: {se:.4f})')

except Exception as e:
    print(f'System GMM via panelbox.var.gmm: {e}')
    print()
    print('Theoretical Comparison:')
    print('-' * 60)
    print('System GMM adds level equations instrumented by lagged differences.')
    print('This provides more efficient estimates, especially when rho is high.')
    print()
    print('Expected properties:')
    print('  - Lower standard errors than Difference GMM')
    print('  - More robust to weak instruments near unit root')
    print('  - Requires mean-stationarity assumption')
    print('  - Adds moment conditions for the level equations (higher instrument count)')

In [None]:
# ============================================================
# Three-Way Comparison: OLS vs Diff-GMM vs Sys-GMM
# ============================================================

# True DGP values (from data_generators.py)
A1_true = np.array([
    [0.50, 0.00],
    [0.15, 0.40]
])

# Collect lag-1 coefficient matrices (with fallbacks)
K = 2
A1_ols_vals = results_ols.A_matrices[0]
# GMM coefficients are indexed [regressor, equation]; transpose to [equation, regressor]
A1_diff_vals = result_diff_gmm.coefficients[:K, :K].T if result_diff_gmm is not None else A1_ols_vals.copy()
A1_sys_vals = result_sys_gmm.coefficients[:K, :K].T if result_sys_gmm is not None else A1_ols_vals.copy()

# Build comparison table
var_pairs = [('y1', 'y1(t-1)'), ('y1', 'y2(t-1)'), ('y2', 'y1(t-1)'), ('y2', 'y2(t-1)')]
comparison_data = []
for idx, (eq, reg) in enumerate(var_pairs):
    i, j = idx // K, idx % K
    row = {
        'Equation': eq,
        'Regressor': reg,
        'True': A1_true[i, j],
        'OLS (FE)': A1_ols_vals[i, j],
    }
    if result_diff_gmm is not None:
        row['Diff-GMM'] = A1_diff_vals[i, j]
    if result_sys_gmm is not None:
        row['Sys-GMM'] = A1_sys_vals[i, j]
    comparison_data.append(row)

df_comparison = pd.DataFrame(comparison_data)
df_comparison['OLS Bias'] = df_comparison['OLS (FE)'] - df_comparison['True']
if result_diff_gmm is not None:
    df_comparison['Diff-GMM Bias'] = df_comparison['Diff-GMM'] - df_comparison['True']
if result_sys_gmm is not None:
    df_comparison['Sys-GMM Bias'] = df_comparison['Sys-GMM'] - df_comparison['True']

print('=== Estimator Comparison (Lag 1 Coefficients) ===')
print(df_comparison.round(4).to_string(index=False))

if result_diff_gmm is None and result_sys_gmm is None:
    print('\nNote: GMM estimates not available. Only OLS shown.')
    print('The OLS diagonal elements are biased downward (Nickell bias).')

# Save comparison table
df_comparison.round(4).to_csv('../outputs/tables/06_estimator_comparison.csv', index=False)
print('\nTable saved to ../outputs/tables/06_estimator_comparison.csv')

In [None]:
# ============================================================
# Visual Comparison: True vs Estimated Coefficients
# ============================================================

fig, ax = plt.subplots(figsize=(12, 6))

coef_labels = [f'{eq} <- {reg}' for eq, reg in var_pairs]
x_pos = np.arange(len(coef_labels))

# Determine how many estimators to plot
estimator_data = [
    ('True DGP', [A1_true[i//K, i%K] for i in range(K*K)], '#2ca02c', 'darkgreen'),
    ('OLS (FE)', [A1_ols_vals[i//K, i%K] for i in range(K*K)], '#d62728', 'darkred'),
]
if result_diff_gmm is not None:
    estimator_data.append(
        ('Diff-GMM', [A1_diff_vals[i//K, i%K] for i in range(K*K)], '#1f77b4', 'navy')
    )
if result_sys_gmm is not None:
    estimator_data.append(
        ('Sys-GMM', [A1_sys_vals[i//K, i%K] for i in range(K*K)], '#ff7f0e', 'darkorange')
    )

n_est = len(estimator_data)
width = 0.8 / n_est

for idx, (label, vals, color, edge) in enumerate(estimator_data):
    offset = (idx - (n_est - 1) / 2) * width
    ax.bar(x_pos + offset, vals, width, label=label, color=color, alpha=0.8, edgecolor=edge)

ax.set_xticks(x_pos)
ax.set_xticklabels(coef_labels, fontsize=11)
ax.set_ylabel('Coefficient Value', fontsize=12)
ax.set_title('Estimator Comparison: Lag 1 Coefficients',
             fontsize=14, fontweight='bold')
ax.legend(fontsize=10, loc='upper right')
ax.axhline(y=0, color='black', linewidth=0.8, linestyle='--', alpha=0.5)
ax.grid(True, alpha=0.3, axis='y')

fig.tight_layout()
fig.savefig('../outputs/figures/gmm/06_three_way_comparison.png', dpi=150, bbox_inches='tight')
plt.show()

print('Key observations:')
print('  - OLS underestimates persistence (diagonal) due to Nickell bias')
if result_diff_gmm is not None or result_sys_gmm is not None:
    print('  - GMM estimators correct the downward bias')
    print('  - Sys-GMM often has lower variance than Diff-GMM')

---

## 4. GMM Diagnostics

Proper GMM estimation requires careful diagnostic checking. Three tests are essential:

### 4.1 Hansen J-Test (Over-identifying Restrictions)

The Hansen J-test checks whether the instruments are **jointly valid** (i.e., uncorrelated with the error term):

$$H_0: E[Z_i' \varepsilon_i] = 0 \quad \text{(instruments are valid)}$$
$$J = n \cdot \bar{g}' \hat{W} \bar{g} \sim \chi^2(m - k)$$

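where $n$ is the number of cross-sectional units, $\bar{g}$ is the sample mean of the moment conditions evaluated at the estimates, $\hat{W}$ is the efficient (second-step) weight matrix, $m$ is the number of instruments, and $k$ is the number of estimated parameters.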

- **Reject** ($p < 0.05$): instruments may be invalid
- **Do not reject** ($p > 0.05$): instruments appear valid
- **Caveat:** A very high p-value ($p > 0.99$) with many instruments suggests the test has lost power
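
The mechanics of the J statistic can be sketched on a simple cross-sectional IV problem, which avoids the panel bookkeeping. The code below runs a two-step linear GMM with three instruments and one endogenous regressor, then compares a case where all instruments are valid with one where an instrument enters the outcome directly. Everything here (function name, data-generating process, sample size) is an assumption for illustration and is not the PanelBox implementation.

```python
import numpy as np
from scipy import stats


def two_step_gmm_with_j(y, X, Z):
    """Two-step linear GMM and Hansen J statistic (cross-sectional sketch)."""
    n, m = Z.shape
    k = X.shape[1]
    ZX = Z.T @ X / n                       # (m, k)
    Zy = Z.T @ y / n                       # (m,)

    def solve(W):
        return np.linalg.solve(ZX.T @ W @ ZX, ZX.T @ W @ Zy)

    # Step 1: 2SLS weighting
    beta1 = solve(np.linalg.inv(Z.T @ Z / n))
    u1 = y - X @ beta1
    # Step 2: efficient weighting built from first-step residuals
    Zu = Z * u1[:, None]
    W2 = np.linalg.inv(Zu.T @ Zu / n)
    beta2 = solve(W2)
    g_bar = Z.T @ (y - X @ beta2) / n      # mean moment vector
    J = float(n * g_bar @ W2 @ g_bar)
    df = m - k
    return beta2, J, df, float(stats.chi2.sf(J, df))


rng = np.random.default_rng(1)
n = 5000
Z = rng.standard_normal((n, 3))
v = rng.standard_normal(n)
u = 0.5 * v + rng.standard_normal(n)       # error correlated with x through v
x = Z @ np.array([1.0, 0.5, 0.5]) + v
X = x[:, None]
y_valid = 2.0 * x + u
y_invalid = y_valid + 1.0 * Z[:, 2]        # third instrument enters the outcome directly

b_ok, J_ok, df_ok, p_ok = two_step_gmm_with_j(y_valid, X, Z)
b_bad, J_bad, df_bad, p_bad = two_step_gmm_with_j(y_invalid, X, Z)
print(f"valid instruments:   beta={b_ok[0]:.3f}  J={J_ok:.2f}  df={df_ok}  p={p_ok:.3f}")
print(f"invalid instrument:  beta={b_bad[0]:.3f}  J={J_bad:.2f}  df={df_bad}  p={p_bad:.3g}")
```

With one regressor and three instruments there are two over-identifying restrictions. The invalid case should produce a very large J and a p-value near zero.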

### 4.2 Arellano-Bond AR(2) Test

The validity of GMM instruments depends on the **absence of second-order serial correlation** in the differenced residuals:

- **AR(1)**: Expected to be significant (first differencing induces AR(1) mechanically)
- **AR(2)**: Should NOT be significant; rejection indicates invalid instruments
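
The logic behind the two tests can be checked numerically. The sketch below differences (a) serially uncorrelated errors and (b) AR(1) errors with coefficient 0.5, then computes pooled first- and second-order autocorrelations of the differences. It illustrates the reasoning only; it is not the Arellano-Bond $m_1$/$m_2$ test statistic, and the helper name and simulation settings are assumptions.

```python
import numpy as np


def pooled_autocorr(d, lag):
    """Pooled correlation between d_t and d_{t-lag} across entities and periods."""
    a = d[:, lag:].ravel()
    b = d[:, :-lag].ravel()
    return np.corrcoef(a, b)[0, 1]


rng = np.random.default_rng(2)
N, T, phi = 4000, 12, 0.5

# (a) White-noise level errors
eps = rng.standard_normal((N, T))
d_eps = np.diff(eps, axis=1)

# (b) AR(1) level errors, started from their stationary distribution
e = np.empty((N, T))
e[:, 0] = rng.standard_normal(N) / np.sqrt(1 - phi**2)
for t in range(1, T):
    e[:, t] = phi * e[:, t - 1] + rng.standard_normal(N)
d_e = np.diff(e, axis=1)

r1_white, r2_white = pooled_autocorr(d_eps, 1), pooled_autocorr(d_eps, 2)
r1_ar, r2_ar = pooled_autocorr(d_e, 1), pooled_autocorr(d_e, 2)
print(f"white-noise errors: lag-1 = {r1_white:+.3f}, lag-2 = {r2_white:+.3f}")
print(f"AR(1) errors:       lag-1 = {r1_ar:+.3f}, lag-2 = {r2_ar:+.3f}")
```

Differenced white noise should show a lag-1 correlation near $-0.5$ and a lag-2 correlation near zero. Once the level errors are serially correlated, the lag-2 correlation moves away from zero, which is what the AR(2) test is designed to detect.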

### 4.3 Instrument Count Rule

A critical rule of thumb:

$$\text{Number of instruments} \leq \text{Number of entities (N)}$$

Too many instruments:
- Weaken the Hansen J-test
- Can bias GMM toward OLS
- Create numerical instability in the weight matrix
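
The rule of thumb is easier to apply with an explicit count. The sketch below counts instruments for a univariate AR(1) Difference GMM, with and without a cap on lag depth. The function name and the example entity count `N_example = 30` are hypothetical, and a multivariate VAR would have more instruments than this.

```python
def count_diff_gmm_instruments(T, max_lag_depth=None):
    """Instrument count for a univariate AR(1) Difference GMM with T periods."""
    total = 0
    for t in range(3, T + 1):              # differenced equations exist for t = 3..T
        available = t - 2                  # Y_{i,1}, ..., Y_{i,t-2}
        if max_lag_depth is not None:
            available = min(available, max_lag_depth)
        total += available
    return total


N_example = 30                             # hypothetical number of entities
for cap in (None, 4, 2):
    n_inst = count_diff_gmm_instruments(T=15, max_lag_depth=cap)
    status = "OK" if n_inst <= N_example else "too many"
    print(f"T=15, max lag depth={cap}: {n_inst} instruments ({status} for N={N_example})")
```

Because the uncapped count grows quadratically in $T$, even a moderate panel length can exceed the number of entities unless the lag depth is restricted.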

In [None]:
# ============================================================
# GMM Diagnostics: Instrument Construction (try/except)
# ============================================================

Z_instruments = None
instrument_meta = None

try:
    # Build instruments for diagnostics demonstration
    Z_instruments, instrument_meta = build_gmm_instruments(
        data=df_dyn,
        var_lags=2,
        n_vars=2,
        entity_col='country',
        time_col='year',
        value_cols=['y1', 'y2'],
        instrument_type='all',
        max_instruments=4
    )

    print('=== Instrument Information ===')
    print(f'Instrument matrix shape: {Z_instruments.shape}')
    print(f'Number of instruments:   {Z_instruments.shape[1]}')
    print(f'Number of observations:  {Z_instruments.shape[0]}')
    print(f'\nInstrument metadata:')
    for key, val in instrument_meta.items():
        if key != 'observation_metadata':
            print(f'  {key}: {val}')

except Exception as e:
    print(f'Instrument construction: {e}')
    print()
    print('Theoretical instrument structure for VAR(2) with K=2:')
    print('  Instruments are lagged levels y_{i,t-s} for s >= p+1 = 3')
    print('  For T=15: up to 12 lag depths per variable')
    print('  Uncapped count under this convention: K * (T-3)*(T-2)/2 = 2 * 12*13/2 = 156')
    print('  With max_instruments=4: 2 * 4 = 8 instruments')

In [None]:
# ============================================================
# Hansen J-Test Framework
# ============================================================

# The test statistic is: J = n * g_bar' * W_hat * g_bar
# where g_bar = (1/n) * sum_i Z_i' * epsilon_hat_i
# Under H0: J ~ chi^2(n_instruments - n_params)

if Z_instruments is not None:
    n_instruments = Z_instruments.shape[1]
else:
    n_instruments = 8  # theoretical value with max_instruments=4, K=2

n_params = 2 * 2 * 2  # K * K * p parameters
df_hansen = n_instruments - n_params

print('=== Hansen J-Test Framework ===')
print(f'Number of instruments (m):      {n_instruments}')
print(f'Number of parameters (k):       {n_params}')
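print(f'Degrees of freedom (m - k):     {df_hansen}')
if df_hansen <= 0:
    # Without over-identifying restrictions there is nothing for the J-test to test
    print('  Note: m <= k, so the model is not over-identified and the J-test is not defined.')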
print(f'Number of entities (N):         {df_dyn["country"].nunique()}')
print(f'\nInstrument count rule check:')
print(f'  Instruments ({n_instruments}) <= Entities ({df_dyn["country"].nunique()})? '
      f'{"PASS" if n_instruments <= df_dyn["country"].nunique() else "WARNING: Too many instruments!"}')
print()
print('Interpretation guidelines:')
print('  - p-value > 0.05: Do not reject H0 -> instruments appear valid')
print('  - p-value < 0.05: Reject H0 -> instruments may be invalid')
print('  - p-value > 0.99 with many instruments: Test may lack power')

# Try to run the actual Hansen J-test if diagnostics are available
try:
    from panelbox.var.diagnostics import GMMDiagnostics
    if result_diff_gmm is not None and Z_instruments is not None:
        resid = result_diff_gmm.residuals
        n_p = result_diff_gmm.coefficients.shape[0]
        # Trim both arrays to a common number of rows. This assumes the rows of
        # Z_instruments and the residuals are stacked in the same order.
        n_common = min(Z_instruments.shape[0], resid.shape[0])
        Z_aligned = Z_instruments[:n_common, :]
        resid = resid[:n_common]

        diagnostics = GMMDiagnostics(
            residuals=resid,
            instruments=Z_aligned,
            n_params=n_p,
            n_entities=result_diff_gmm.n_entities
        )
        hansen = diagnostics.hansen_j_test()
        print(f'\n=== Hansen J-Test Result ===')
        print(f'  Statistic: {hansen["statistic"]:.4f}')
        print(f'  P-value:   {hansen["p_value"]:.4f}')
        print(f'  DF:        {hansen["df"]}')
        print(f'  Result:    {hansen["interpretation"]}')
except Exception as e:
    print(f'\nHansen J-test computation: {e}')

In [None]:
# ============================================================
# AR(2) Test for Serial Correlation
# ============================================================

# The Arellano-Bond AR test checks for serial correlation in the
# differenced residuals. The key logic:
#
# In first differences: Delta_eps_it = eps_it - eps_{i,t-1}
# AR(1) in differenced residuals is EXPECTED (mechanical)
# AR(2) in differenced residuals indicates AR(1) in LEVELS -> instruments invalid

print('=== Arellano-Bond Serial Correlation Tests ===')
print()
print('Logic of the AR tests:')
print('  1. AR(1) in differenced residuals: EXPECTED to be significant')
print('     (because Delta_eps_it and Delta_eps_{i,t-1} share eps_{i,t-1})')
print()
print('  2. AR(2) in differenced residuals: SHOULD NOT be significant')
print('     (if significant, suggests eps_it has serial correlation in levels,')
print('     which would invalidate the moment conditions)')
print()
print('Decision rule:')
print('  - Reject AR(1)?  Expected -> OK')
print('  - Reject AR(2)?  Problem! -> instruments may be invalid')
print('  - Do not reject AR(2)?  Good -> model specification is likely correct')

In [None]:
# ============================================================
# Difference-in-Hansen Test
# ============================================================

# The Difference-in-Hansen test (also called C-test or incremental Sargan)
# tests the validity of a SUBSET of instruments.
# 
# For System GMM, it tests whether the ADDITIONAL level-equation
# instruments are valid:
#
# H0: The additional instruments in the system are valid
# Test: J_system - J_difference ~ chi^2(df_system - df_difference)

print('=== Difference-in-Hansen Test ===')
print()
print('Purpose: Test validity of the additional System GMM instruments')
print()
print('Procedure:')
print('  1. Estimate Difference GMM -> obtain J_diff statistic')
print('  2. Estimate System GMM     -> obtain J_sys statistic')
print('  3. Compute: C = J_sys - J_diff')
print('  4. Under H0: C ~ chi^2(df_sys - df_diff)')
print()
print('If C is significant (p < 0.05):')
print('  -> The stationarity assumption for System GMM may be violated')
print('  -> Fall back to Difference GMM')
print()
print('If C is not significant (p > 0.05):')
print('  -> System GMM additional instruments appear valid')
print('  -> System GMM is preferred (more efficient)')
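
The procedure above reduces to a few lines once the two J statistics are available. The following is a minimal sketch using `scipy.stats.chi2`; the function name `difference_in_hansen` is hypothetical, the input numbers are placeholders rather than results from this notebook, and clamping a negative difference to zero is a design choice of this example.

```python
from scipy import stats

def difference_in_hansen(j_sys, df_sys, j_diff, df_diff, alpha=0.05):
    """Return (C statistic, degrees of freedom, p-value, reject flag)."""
    c_stat = j_sys - j_diff
    c_df = df_sys - df_diff
    if c_df <= 0:
        raise ValueError("System GMM must add overidentifying restrictions")
    # The two J statistics use different weighting matrices, so in finite
    # samples the difference can be negative; here it is clamped to zero.
    c_stat = max(c_stat, 0.0)
    p_value = float(stats.chi2.sf(c_stat, c_df))
    return c_stat, c_df, p_value, p_value < alpha

# Placeholder inputs for illustration only (not results from this notebook)
c_stat, c_df, p_value, reject = difference_in_hansen(
    j_sys=31.4, df_sys=26, j_diff=22.9, df_diff=20
)
print(f"C = {c_stat:.2f}, df = {c_df}, p = {p_value:.3f}, reject H0: {reject}")
```

With these placeholder inputs the null is not rejected, which under the decision rule above would favor keeping the System GMM instruments.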

In [None]:
# ============================================================
# Diagnostic Summary Table
# ============================================================

diag_summary = pd.DataFrame({
    'Test': [
        'Hansen J-test',
        'AR(1) test',
        'AR(2) test',
        'Instrument count rule',
        'Difference-in-Hansen'
    ],
    'Null Hypothesis': [
        'Instruments are valid',
        'No AR(1) in diff. residuals',
        'No AR(2) in diff. residuals',
        '(Rule of thumb) n_instruments <= N',
        'Additional instruments valid'
    ],
    'Desired Outcome': [
        'Do NOT reject (p > 0.05)',
        'Reject (p < 0.05) -- expected',
        'Do NOT reject (p > 0.05)',
        'PASS (instruments <= entities)',
        'Do NOT reject (p > 0.05)'
    ],
    'Action if Failed': [
        'Reconsider instrument set',
        'No action needed (expected)',
        'Add model lags or use deeper-lag instruments',
        'Use collapsed instruments',
        'Use Diff-GMM instead of Sys-GMM'
    ]
})

print('=== GMM Diagnostic Decision Table ===')
print(diag_summary.to_string(index=False))

# Save
diag_summary.to_csv('../outputs/tables/06_diagnostic_decision_table.csv', index=False)

# Visualize as a formatted table figure
fig, ax = plt.subplots(figsize=(14, 4))
ax.axis('off')
table = ax.table(
    cellText=diag_summary.values,
    colLabels=diag_summary.columns,
    cellLoc='left',
    loc='center',
    colColours=['#f0f0f0'] * len(diag_summary.columns)
)
table.auto_set_font_size(False)
table.set_fontsize(9)
table.scale(1.0, 1.6)

ax.set_title('GMM Diagnostic Decision Table', fontsize=14, fontweight='bold', pad=20)
fig.tight_layout()
fig.savefig('../outputs/figures/gmm/06_diagnostic_decision_table.png', dpi=150, bbox_inches='tight')
plt.show()

---

## 5. Instrument Collapse

### The Problem: Instrument Proliferation

In standard GMM, the number of instruments grows **quadratically** with T:

- At $t=3$: 1 instrument per variable
- At $t=4$: 2 instruments per variable
- At $t=T$: $T-2$ instruments per variable
- **Total per variable:** $\sum_{s=1}^{T-2} s = \frac{(T-2)(T-1)}{2}$

For our panel with $T=15$ and $K=2$ variables, this can quickly exceed the number of entities $N=100$.
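
The arithmetic for this panel is easy to verify. The short sketch below applies the per-variable formula to $T=15$, $K=2$ and compares the total with $N=100$; the helper name `n_instruments_standard` is made up for this example.

```python
def n_instruments_standard(T, K):
    """Uncollapsed instrument count: K * (T-2)(T-1)/2."""
    return K * (T - 2) * (T - 1) // 2

T, K, N = 15, 2, 100
per_variable = n_instruments_standard(T, 1)
total = n_instruments_standard(T, K)
print(f"per variable: {per_variable}, total: {total}, exceeds N={N}: {total > N}")
```

This gives 91 instruments per variable and 182 in total, so the uncollapsed instrument set already violates the $m \leq N$ rule of thumb.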

### Consequences of Too Many Instruments

1. **Weakens the Hansen J-test:** The test loses power and almost never rejects, even when instruments are invalid
2. **Biases GMM toward OLS:** With many instruments, GMM approaches the biased OLS estimator
3. **Overfits endogenous variables:** Creates a spurious perfect fit

### Solution: Collapsed Instruments (Roodman 2009)

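Instead of a separate column for each lag at each time period, **collapsed instruments** use one column per lag depth, which is equivalent to summing the moment conditions across time periods:

$$E\left[\sum_{t} y_{i,t-s}\,\Delta\varepsilon_{it}\right] = 0 \quad \text{for each lag depth } s \geq 2$$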

This reduces the instrument count from $O(T^2)$ to $O(T)$, dramatically limiting proliferation.
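
A toy example makes the two layouts concrete. The sketch below builds both instrument matrices for a single entity and a single variable with NumPy. The function names are hypothetical, and this is not how `build_gmm_instruments` is implemented; it only shows the textbook layouts, assuming a balanced series with instruments dated $t-2$ and earlier.

```python
import numpy as np

def standard_instruments(y):
    """One column per (period, lag) pair: block-diagonal layout."""
    T = len(y)
    n_cols = (T - 2) * (T - 1) // 2
    Z = np.zeros((T - 2, n_cols))
    col = 0
    for row, t in enumerate(range(3, T + 1)):   # equations for t = 3..T
        lags = y[: t - 2]                        # y_1 .. y_{t-2}
        Z[row, col : col + len(lags)] = lags
        col += len(lags)
    return Z

def collapsed_instruments(y):
    """One column per lag depth s = 2..T-1; unavailable lags stay zero."""
    T = len(y)
    Z = np.zeros((T - 2, T - 2))
    for row, t in enumerate(range(3, T + 1)):
        for s in range(2, t):                    # lag depths with t - s >= 1
            Z[row, s - 2] = y[t - s - 1]         # y_{t-s} in 0-based indexing
    return Z

y = np.arange(1.0, 7.0)                          # toy series y_1..y_6 = 1..6
Z_std, Z_col = standard_instruments(y), collapsed_instruments(y)
print(Z_std.shape, Z_col.shape)
print(Z_col)
```

For $T=6$ the standard matrix has 10 columns and the collapsed matrix has 4: each collapsed column stacks every period's value at one lag depth, so no lag information is dropped, only the period-specific moment conditions are pooled.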

In [None]:
# ============================================================
# Instrument Count: Standard vs Collapsed (try/except)
# ============================================================

Z_all = None
Z_collapsed = None
meta_all = None
meta_collapsed = None

try:
    # Standard instruments (all available lags)
    Z_all, meta_all = build_gmm_instruments(
        data=df_dyn,
        var_lags=2,
        n_vars=2,
        entity_col='country',
        time_col='year',
        value_cols=['y1', 'y2'],
        instrument_type='all',
        max_instruments=None  # use all available
    )

    # Collapsed instruments
    Z_collapsed, meta_collapsed = build_gmm_instruments(
        data=df_dyn,
        var_lags=2,
        n_vars=2,
        entity_col='country',
        time_col='year',
        value_cols=['y1', 'y2'],
        instrument_type='collapsed',
        max_instruments=None
    )

    print('=== Instrument Count Comparison ===')
    print(f'Standard instruments: {Z_all.shape[1]}')
    print(f'Collapsed instruments: {Z_collapsed.shape[1]}')
    print(f'Reduction: {Z_all.shape[1] - Z_collapsed.shape[1]} '
          f'({(1 - Z_collapsed.shape[1]/Z_all.shape[1])*100:.1f}%)')
    print(f'\nNumber of entities (N): {df_dyn["country"].nunique()}')
    print(f'\nRule check (instruments <= N):')
    print(f'  Standard:  {Z_all.shape[1]} <= {df_dyn["country"].nunique()} -> '
          f'{"PASS" if Z_all.shape[1] <= df_dyn["country"].nunique() else "FAIL"}')
    print(f'  Collapsed: {Z_collapsed.shape[1]} <= {df_dyn["country"].nunique()} -> '
          f'{"PASS" if Z_collapsed.shape[1] <= df_dyn["country"].nunique() else "FAIL"}')

except Exception as e:
    print(f'Instrument construction: {e}')
    print()
    print('Theoretical Instrument Counts (K=2, p=2):')
    print('-' * 50)
    print(f'{"T":>5} {"Standard":>12} {"Collapsed":>12}')
    for T_ex in [5, 10, 15, 20, 30]:
        standard = 2 * (T_ex - 2) * (T_ex - 1) // 2
        collapsed = 2 * (T_ex - 2)
        print(f'{T_ex:>5} {standard:>12} {collapsed:>12}')

In [None]:
# ============================================================
# Estimate with Collapsed Instruments (try/except)
# ============================================================

result_collapsed = None

try:
    result_collapsed = estimate_panel_var_gmm(
        data=df_dyn,
        var_lags=2,
        value_cols=['y1', 'y2'],
        entity_col='country',
        time_col='year',
        transform='fod',
        gmm_step='two-step',
        instrument_type='collapsed',
        windmeijer_correction=True
    )

    print('=== Collapsed Instruments GMM Results ===')
    print(f'Number of instruments: {result_collapsed.n_instruments}')
    print(f'Instrument type: {result_collapsed.instrument_type}')
    print(f'\nCoefficients:')
    for eq in range(result_collapsed.coefficients.shape[1]):
        print(f'\n  Equation y{eq+1}:')
        for j, name in enumerate(coef_names):
            if j < result_collapsed.coefficients.shape[0]:
                coef = result_collapsed.coefficients[j, eq]
                se = result_collapsed.standard_errors[j, eq] if result_collapsed.standard_errors.ndim > 1 else result_collapsed.standard_errors[j]
                print(f'    {name}: {coef:+.4f} (SE: {se:.4f})')

except Exception as e:
    print(f'Collapsed GMM estimation: {e}')
    print()
    print('Theoretical Result:')
    print('  Collapsed instruments reduce the instrument count from O(T^2) to O(T)')
    print('  by summing instruments across time periods within each lag depth.')
    print('  The coefficient estimates should be similar to standard GMM,')
    print('  but diagnostic tests retain more power.')

In [None]:
# ============================================================
# Instrument Proliferation Visualization
# ============================================================

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: Instrument count growth with T
T_range = np.arange(5, 31)
K_var = 2
n_instr_standard = [K_var * (t-2)*(t-1)//2 for t in T_range]
n_instr_collapsed = [K_var * (t-2) for t in T_range]

ax = axes[0]
ax.plot(T_range, n_instr_standard, 'o-', color='#d62728', linewidth=2, markersize=5,
        label='Standard: $O(T^2)$')
ax.plot(T_range, n_instr_collapsed, 's-', color='#2ca02c', linewidth=2, markersize=5,
        label='Collapsed: $O(T)$')
ax.axhline(y=100, color='blue', linewidth=2, linestyle='--',
           label='N = 100 (entity count)', alpha=0.7)
ax.fill_between(T_range, 100, max(n_instr_standard), alpha=0.1, color='red')
ax.set_xlabel('T (time periods)', fontsize=12)
ax.set_ylabel('Number of Instruments', fontsize=12)
ax.set_title('Instrument Growth: Standard vs Collapsed', fontsize=13, fontweight='bold')
ax.legend(fontsize=10)
ax.grid(True, alpha=0.3)

# Right: Coefficient comparison (with fallbacks for unavailable GMM results)
ax = axes[1]
estimators = ['OLS (FE)']
rho_y1_estimates = [A1_ols_vals[0, 0]]
colors_bar = ['#d62728']

if result_diff_gmm is not None:
    estimators.append('Diff-GMM\n(standard)')
    rho_y1_estimates.append(A1_diff_vals[0, 0])
    colors_bar.append('#1f77b4')

if result_sys_gmm is not None:
    estimators.append('Sys-GMM\n(standard)')
    rho_y1_estimates.append(A1_sys_vals[0, 0])
    colors_bar.append('#ff7f0e')

if result_collapsed is not None:
    estimators.append('Sys-GMM\n(collapsed)')
    rho_y1_estimates.append(result_collapsed.coefficients[0, 0])
    colors_bar.append('#2ca02c')

bars = ax.bar(estimators, rho_y1_estimates, color=colors_bar, alpha=0.8,
              edgecolor='black', linewidth=0.5)
ax.axhline(y=0.50, color='black', linewidth=2, linestyle='--',
           label='True $\\rho_{y1}$ = 0.50')
ax.set_ylabel('Estimated $\\hat{\\rho}_{y1}$', fontsize=12)
ax.set_title('Effect of Instrument Choice on $\\hat{\\rho}$',
             fontsize=13, fontweight='bold')
ax.legend(fontsize=10)
ax.grid(True, alpha=0.3, axis='y')

if result_collapsed is None and result_diff_gmm is None and result_sys_gmm is None:
    ax.text(0.5, 0.5, 'GMM not available\n(OLS only shown)',
            transform=ax.transAxes, ha='center', va='center',
            fontsize=11, style='italic', color='gray')

fig.suptitle('Instrument Proliferation and Its Consequences',
             fontsize=14, fontweight='bold', y=1.02)
fig.tight_layout()
fig.savefig('../outputs/figures/gmm/06_instrument_proliferation.png', dpi=150, bbox_inches='tight')
plt.show()

print('Key insights:')
print('  - Standard instruments grow quadratically with T')
print('  - Collapsed instruments grow linearly, staying well below N')
if result_collapsed is not None:
    print('  - Collapsed instruments produce similar coefficient estimates')
print('  - Always check: n_instruments <= N')

---

## 6. OLS vs GMM Comparison

### When to Use Which Estimator?

The choice between OLS (with fixed effects) and GMM depends primarily on the **time dimension T**.

Recall:
- **OLS bias:** $O(1/T)$ -- decreases as T grows
- **GMM:** Consistent for fixed T, large N -- but higher variance in finite samples

### Decision Rule

| Condition | Recommendation | Rationale |
|---|---|---|
| $T < 20$ | **Use GMM** | Nickell bias is substantial |
| $T > 30$ | **OLS usually OK** | Bias is small relative to variance |
| $20 \leq T \leq 30$ | **Compare both** | Gray zone -- bias is moderate |

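When comparing, the GMM estimate should be **above** the fixed-effects OLS estimate (for persistence parameters), since fixed-effects OLS is biased downward. If OLS and GMM are close, this suggests the bias is small and OLS is adequate. A GMM estimate at or below the OLS estimate is a warning sign, typically of weak instruments.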
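
The rule of thumb can be written down as a small helper. This is only a sketch of the table above: the function names are hypothetical, and the tolerance `tol` used to call two estimates "close" is an arbitrary illustrative value, not a threshold from the literature.

```python
def recommend_estimator(T):
    """Map the time dimension T to the rule-of-thumb recommendation."""
    if T < 20:
        return "Use GMM"
    if T > 30:
        return "OLS usually OK"
    return "Compare both"

def compare_estimates(rho_ols, rho_gmm, tol=0.05):
    """Gray-zone check; tol is an arbitrary illustrative threshold."""
    gap = rho_gmm - rho_ols
    if gap < 0:
        return "GMM below OLS: unexpected sign, check specification"
    return "close: OLS adequate" if gap <= tol else "gap is material: prefer GMM"

for T in (15, 25, 40):
    print(T, "->", recommend_estimator(T))
print(compare_estimates(rho_ols=0.40, rho_gmm=0.50))
```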

In [None]:
# ============================================================
# Comprehensive OLS vs GMM Comparison Table
# ============================================================

# Generate macro panel for a large-T comparison
df_macro = generate_macro_panel(n_countries=30, n_quarters=40, seed=42)

print(f'Dynamic panel: N={df_dyn["country"].nunique()}, T={df_dyn["year"].nunique()} (small T)')
print(f'Macro panel:   N={df_macro["country"].nunique()}, T={df_macro["quarter"].nunique()} (large T)')

# Estimate OLS on macro panel for comparison
try:
    data_macro = PanelVARData(
        df_macro,
        endog_vars=['gdp_growth', 'inflation'],
        entity_col='country',
        time_col='quarter',
        lags=2
    )
    model_macro = PanelVAR(data_macro)
    results_macro_ols = model_macro.fit(method='ols', cov_type='clustered')

    print('\n=== Macro Panel OLS Results ===')
    print(f'A1 diagonal (persistence):')
    for i, var in enumerate(['gdp_growth', 'inflation']):
        print(f'  {var}: {results_macro_ols.A_matrices[0][i,i]:.4f}')
except Exception as e:
    results_macro_ols = None
    print(f'\nMacro panel OLS estimation: {e}')
    print('Continuing with dynamic panel comparison only.')

In [None]:
# ============================================================
# Side-by-Side Comparison Table
# ============================================================

# Dynamic panel (T=15): OLS vs GMM (with fallbacks for unavailable GMM)
comparison_cols = {
    'Property': [
        'Dataset',
        'N (entities)',
        'T (time periods)',
        'Method',
        'rho_y1 (persistence of y1)',
        'rho_y2 (persistence of y2)',
        'y1 <- y2(t-1) (cross-effect)',
        'Number of instruments',
        'Nickell bias concern',
    ],
    'OLS (FE)': [
        'Dynamic panel',
        str(df_dyn['country'].nunique()),
        str(df_dyn['year'].nunique()),
        'Within OLS',
        f'{A1_ols_vals[0,0]:.4f}',
        f'{A1_ols_vals[1,1]:.4f}',
        f'{A1_ols_vals[1,0]:.4f}',
        'N/A',
        'SEVERE (T=15)',
    ],
}

if result_diff_gmm is not None:
    comparison_cols['Diff-GMM'] = [
        'Dynamic panel',
        str(df_dyn['country'].nunique()),
        str(df_dyn['year'].nunique()),
        'Arellano-Bond (FD)',
        f'{A1_diff_vals[0,0]:.4f}',
        f'{A1_diff_vals[1,1]:.4f}',
        f'{A1_diff_vals[1,0]:.4f}',
        str(result_diff_gmm.n_instruments),
        'Corrected',
    ]

if result_sys_gmm is not None:
    comparison_cols['Sys-GMM'] = [
        'Dynamic panel',
        str(df_dyn['country'].nunique()),
        str(df_dyn['year'].nunique()),
        'Blundell-Bond (FOD)',
        f'{A1_sys_vals[0,0]:.4f}',
        f'{A1_sys_vals[1,1]:.4f}',
        f'{A1_sys_vals[1,0]:.4f}',
        str(result_sys_gmm.n_instruments),
        'Corrected',
    ]

comparison_table = pd.DataFrame(comparison_cols)

print('=== OLS vs GMM Decision Framework ===')
print(comparison_table.to_string(index=False))
print()
print('Decision rules:')
print('  - T < 20:  Use GMM (Nickell bias is substantial)')
print('  - T > 30:  OLS usually OK (bias is small)')
print('  - 20 <= T <= 30: Compare both estimates; if close, OLS is fine')

if result_diff_gmm is None and result_sys_gmm is None:
    print()
    print('Note: GMM estimates are not available. In practice, you would')
    print('compare OLS against GMM results to assess Nickell bias severity.')

# Save
comparison_table.to_csv('../outputs/tables/06_ols_vs_gmm_comparison.csv', index=False)

---

## 7. Application: Shock Persistence

### Half-Life of Shocks

A key application of dynamic panel models is measuring **how long shocks persist**. The half-life tells us how many periods it takes for a shock to decay to half its initial impact.

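For an AR(1) process with autoregressive coefficient $0 < \rho < 1$:

$$\text{Half-life} = \frac{\ln(0.5)}{\ln(\rho)}$$

The half-life is increasing in $\rho$. Since OLS underestimates $\rho$ (Nickell bias), it also **underestimates the half-life** -- making shocks appear to dissipate faster than they actually do. Below, the formula is applied to the diagonal elements of $A_1$, which treats each variable as a univariate AR(1) and ignores cross-effects and second-lag terms.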

This has important policy implications:
- If a GDP shock truly persists for 5 years but OLS says 2 years, policymakers may withdraw stimulus too early
- If an inflation shock persists longer than estimated, central banks may under-react
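
To see how strongly a downward bias in $\rho$ compresses the half-life, the sketch below combines the half-life formula with the large-$T$ Nickell approximation $-(1+\rho)/(T-1)$. The function names are hypothetical, and the approximation is only a rough guide at small $T$, so the numbers are indicative rather than exact.

```python
import math

def half_life(rho):
    """Half-life ln(0.5)/ln(rho); defined here only for 0 < rho < 1."""
    if not 0 < rho < 1:
        raise ValueError("half-life formula requires 0 < rho < 1")
    return math.log(0.5) / math.log(rho)

def nickell_biased_rho(rho, T):
    """Large-T approximation to the FE-OLS estimate: rho - (1 + rho)/(T - 1)."""
    return rho - (1 + rho) / (T - 1)

T = 15
for rho in (0.5, 0.7, 0.9):
    rho_ols = nickell_biased_rho(rho, T)
    print(f"rho={rho:.2f}: true HL={half_life(rho):.2f}, "
          f"approx. OLS rho={rho_ols:.3f}, OLS HL={half_life(rho_ols):.2f}")
```

The distortion grows with persistence: at $\rho = 0.9$ and $T = 15$ the approximate OLS estimate cuts the half-life from about 6.6 periods to about 2.6, because the half-life is highly nonlinear in $\rho$ near 1.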

In [None]:
# ============================================================
# Half-Life Calculation: OLS vs GMM
# ============================================================

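def compute_half_life(rho):
    """Half-life ln(0.5)/ln(rho); inf if rho >= 1, NaN if rho <= 0 (formula undefined)."""
    if rho >= 1:
        return np.inf   # shock never decays
    if rho <= 0:
        return np.nan   # no monotone decay, so the formula does not apply
    return np.log(0.5) / np.log(rho)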

# Extract persistence parameters (diagonal of A1)
rho_y1_ols = A1_ols_vals[0, 0]
rho_y2_ols = A1_ols_vals[1, 1]
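# GMM values fall back to NaN when the corresponding estimation did not run
rho_y1_diff = A1_diff_vals[0, 0] if result_diff_gmm is not None else np.nan
rho_y2_diff = A1_diff_vals[1, 1] if result_diff_gmm is not None else np.nan
rho_y1_sys = A1_sys_vals[0, 0] if result_sys_gmm is not None else np.nan
rho_y2_sys = A1_sys_vals[1, 1] if result_sys_gmm is not None else np.nan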

# Build half-life table dynamically based on available results
hl_rows = []
for var_name, rho_true_val, rho_ols_val, rho_diff_val, rho_sys_val in [
    ('y1', 0.50, rho_y1_ols, rho_y1_diff, rho_y1_sys),
    ('y2', 0.40, rho_y2_ols, rho_y2_diff, rho_y2_sys),
]:
    hl_rows.append({'Variable': var_name, 'Method': 'True DGP', 'rho': rho_true_val})
    hl_rows.append({'Variable': var_name, 'Method': 'OLS (FE)', 'rho': rho_ols_val})
    if result_diff_gmm is not None:
        hl_rows.append({'Variable': var_name, 'Method': 'Diff-GMM', 'rho': rho_diff_val})
    if result_sys_gmm is not None:
        hl_rows.append({'Variable': var_name, 'Method': 'Sys-GMM', 'rho': rho_sys_val})

hl_table = pd.DataFrame(hl_rows)
hl_table['Half-Life (periods)'] = hl_table['rho'].apply(compute_half_life)

print('=== Half-Life Comparison ===')
print(hl_table.round(3).to_string(index=False))

if result_diff_gmm is None and result_sys_gmm is None:
    print()
    print('Note: GMM results not available. Only True DGP and OLS shown.')
    print('In practice, GMM estimates would show longer half-lives (higher rho)')
    print('because OLS underestimates persistence due to Nickell bias.')

# Save
hl_table.round(3).to_csv('../outputs/tables/06_half_life_comparison.csv', index=False)
print('\nTable saved to ../outputs/tables/06_half_life_comparison.csv')

In [None]:
# ============================================================
# Impulse Decay Visualization
# ============================================================

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

horizons = np.arange(0, 21)

# y1 persistence
ax = axes[0]
y1_lines = [
    ('True ($\\rho$=0.50)', 0.50, '#2ca02c', '-'),
    ('OLS', rho_y1_ols, '#d62728', '--'),
]
if result_diff_gmm is not None:
    y1_lines.append(('Diff-GMM', rho_y1_diff, '#1f77b4', '-.'))
if result_sys_gmm is not None:
    y1_lines.append(('Sys-GMM', rho_y1_sys, '#ff7f0e', ':'))

for label, rho, color, ls in y1_lines:
    if 0 < rho < 1:
        decay = rho ** horizons
        ax.plot(horizons, decay, color=color, linewidth=2, linestyle=ls, label=label)

ax.axhline(y=0.5, color='gray', linewidth=0.8, linestyle='--', alpha=0.5)
ax.set_xlabel('Periods After Shock', fontsize=12)
ax.set_ylabel('Remaining Impact (fraction)', fontsize=12)
ax.set_title('Shock Decay: y1', fontsize=13, fontweight='bold')
ax.legend(fontsize=10)
ax.grid(True, alpha=0.3)
ax.set_ylim(-0.05, 1.05)

# y2 persistence
ax = axes[1]
y2_lines = [
    ('True ($\\rho$=0.40)', 0.40, '#2ca02c', '-'),
    ('OLS', rho_y2_ols, '#d62728', '--'),
]
if result_diff_gmm is not None:
    y2_lines.append(('Diff-GMM', rho_y2_diff, '#1f77b4', '-.'))
if result_sys_gmm is not None:
    y2_lines.append(('Sys-GMM', rho_y2_sys, '#ff7f0e', ':'))

for label, rho, color, ls in y2_lines:
    if 0 < rho < 1:
        decay = rho ** horizons
        ax.plot(horizons, decay, color=color, linewidth=2, linestyle=ls, label=label)

ax.axhline(y=0.5, color='gray', linewidth=0.8, linestyle='--', alpha=0.5)
ax.set_xlabel('Periods After Shock', fontsize=12)
ax.set_ylabel('Remaining Impact (fraction)', fontsize=12)
ax.set_title('Shock Decay: y2', fontsize=13, fontweight='bold')
ax.legend(fontsize=10)
ax.grid(True, alpha=0.3)
ax.set_ylim(-0.05, 1.05)

fig.suptitle('How Estimator Bias Affects Perceived Shock Persistence',
             fontsize=14, fontweight='bold', y=1.02)
fig.tight_layout()
fig.savefig('../outputs/figures/gmm/06_shock_persistence.png', dpi=150, bbox_inches='tight')
plt.show()

print('Key insight:')
print('  OLS makes shocks appear to die out FASTER than they actually do.')
print('  This is because OLS underestimates persistence (rho) due to Nickell bias.')
if result_diff_gmm is not None or result_sys_gmm is not None:
    print('  GMM provides more accurate half-life estimates.')

---

## 8. Summary

### Key Takeaways

1. **Nickell Bias** is a fundamental problem in dynamic panel models. The fixed-effects OLS estimator has a downward bias of $O(1/T)$ on the autoregressive parameter. This bias is **severe** when $T < 20$.

2. **Difference GMM (Arellano-Bond)** eliminates the fixed effect through first-differencing and uses lagged levels as instruments. It is consistent for fixed T, large N.

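3. **System GMM (Blundell-Bond)** adds level equations with lagged-difference instruments. It is more efficient than Difference GMM when $\rho$ is close to 1 (persistent processes), because lagged levels are then weak instruments for first differences, but it requires an additional stationarity assumption on the initial conditions.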

4. **GMM Diagnostics** are essential:
   - **Hansen J-test**: the null of jointly valid instruments should not be rejected ($p > 0.05$)
   - **AR(2) test**: no second-order serial correlation in the differenced residuals ($p > 0.05$)
   - **Instrument count**: must not exceed the number of entities ($m \leq N$)

5. **Instrument collapse** (Roodman 2009) reduces the instrument count from $O(T^2)$ to $O(T)$, preventing proliferation and maintaining the power of diagnostic tests.

6. **Half-life analysis** demonstrates a practical consequence of estimator bias: OLS underestimates shock persistence, potentially leading to premature policy withdrawal.

### Decision Framework

```
Is T < 20?
  YES -> Use GMM (Diff-GMM or Sys-GMM)
           Is rho near 1 (persistent)?
             YES -> Prefer System GMM
             NO  -> Difference GMM is fine
           Check: n_instruments <= N?
             NO  -> Use collapsed instruments
  NO  -> Is T > 30?
           YES -> OLS is usually adequate
                    Still compare with GMM as robustness check
           NO  -> Gray zone (20 <= T <= 30): compare OLS and GMM;
                    if the estimates are close, OLS is fine
```

### References

- Nickell, S. (1981). Biases in dynamic models with fixed effects. *Econometrica*, 49(6), 1417-1426.
- Arellano, M., & Bond, S. (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. *Review of Economic Studies*, 58(2), 277-297.
- Blundell, R., & Bond, S. (1998). Initial conditions and moment restrictions in dynamic panel data models. *Journal of Econometrics*, 87(1), 115-143.
- Roodman, D. (2009). How to do xtabond2: An introduction to difference and system GMM in Stata. *The Stata Journal*, 9(1), 86-136.
- Windmeijer, F. (2005). A finite sample correction for the variance of linear efficient two-step GMM estimators. *Journal of Econometrics*, 126(1), 25-51.

---

## 9. Exercises

### Exercise 1: Nickell Bias Monte Carlo (Easy)

Reproduce the Monte Carlo experiment from Section 1 with:
- $\rho_{true} = 0.9$ (high persistence)
- $N = 500$
- $T \in \{5, 10, 20, 50, 100\}$
- 100 simulations per T

Questions:
1. How does the bias compare to $\rho_{true} = 0.7$?
2. Is the theoretical formula $-(1+\rho)/(T-1)$ still accurate?
3. At what T does the bias become less than 5% of the true value?

In [None]:
# YOUR CODE HERE

### Exercise 2: Difference GMM vs System GMM Comparison (Medium)

Using the dynamic panel data (`generate_dynamic_panel()`), estimate both Difference GMM and System GMM with:
- `var_lags=1` (simpler model)
- `max_instruments=3`
- Both one-step and two-step variants

Compare:
1. Coefficient estimates across all four variants
2. Standard errors (are two-step SEs smaller?)
3. Number of instruments
4. Which is closest to the true DGP values?

In [None]:
# YOUR CODE HERE

### Exercise 3: Instrument Proliferation Analysis (Medium)

Systematically analyze how the number of instruments affects GMM estimates:

1. Estimate System GMM with `max_instruments` ranging from 2 to 10
2. For each, record: number of instruments, coefficient estimates, and standard errors
3. Plot how the persistence estimate $\hat{\rho}_{y1}$ changes with instrument count
4. Identify the point where adding more instruments starts pushing estimates toward OLS

This exercise demonstrates why controlling instrument count matters.

In [None]:
# YOUR CODE HERE

### Exercise 4: Forward Orthogonal Deviations (Hard)

The Forward Orthogonal Deviations (FOD) transformation is an alternative to first-differencing:

$$\tilde{Y}_{it} = \sqrt{\frac{T-t}{T-t+1}} \left( Y_{it} - \frac{1}{T-t} \sum_{s=t+1}^{T} Y_{is} \right)$$

FOD has several advantages:
- Preserves orthogonality of the errors (if original errors are i.i.d.)
- Works better with unbalanced panels
- Uses all available instruments efficiently

Tasks:
1. Implement the FOD transformation manually for a single entity
2. Compare first-difference and FOD transformed data visually
3. Estimate GMM using both `transform='fd'` and `transform='fod'`
4. Discuss: when does the choice of transformation matter most?

In [None]:
# YOUR CODE HERE