# Bias-Corrected GMM for Dynamic Panels

This tutorial demonstrates bias correction in dynamic panel data models using the Hahn-Kuersteiner (2002) approach.

## Contents
1. Dynamic Panel Model Setup
2. Standard GMM (Arellano-Bond)
3. Bias-Corrected GMM
4. Comparison and Bias Magnitude
5. Monte Carlo: Demonstrating Bias Reduction
6. Economic Interpretation

## References
- Hahn, J., & Kuersteiner, G. (2002). "Asymptotically Unbiased Inference for a Dynamic Panel Model with Fixed Effects when Both n and T Are Large." *Econometrica*, 70(4), 1639-1657.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

from panelbox.gmm import BiasCorrectedGMM
from panelbox import DifferenceGMM  # Standard Arellano-Bond

# Set random seed for reproducibility
np.random.seed(123)
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 8)

## 1. Dynamic Panel Model Setup

Consider the dynamic panel data model:

$$y_{it} = \rho y_{i,t-1} + X_{it}'\beta + \alpha_i + \varepsilon_{it}$$

where:
- $y_{it}$ is the outcome for entity $i$ at time $t$
- $\rho$ is the autoregressive coefficient
- $X_{it}$ are exogenous regressors
- $\alpha_i$ are fixed effects
- $\varepsilon_{it}$ are idiosyncratic errors

**Problem:** The standard GMM estimator has a finite-sample bias of order $O(1/N)$.

**Solution:** The Hahn-Kuersteiner bias correction removes the leading bias term, reducing the bias to order $O(1/N^2)$.

In [None]:
def generate_dynamic_panel(N=100, T=10, rho=0.5, beta=0.3, seed=None):
    """
    Generate dynamic panel data.
    
    Parameters
    ----------
    N : int
        Number of entities
    T : int
        Number of time periods
    rho : float
        Autoregressive coefficient (persistence)
    beta : float
        Coefficient on exogenous regressor
    seed : int, optional
        Random seed
    
    Returns
    -------
    pd.DataFrame
        Panel data with columns: entity, time, y, y_lag, x, alpha
    """
    if seed is not None:
        np.random.seed(seed)
    
    # Fixed effects
    alpha = np.random.randn(N)
    
    # Exogenous regressor
    X = np.random.randn(N, T)
    
    # Initialize outcome
    y = np.zeros((N, T))
    
    # Initial condition
    y[:, 0] = alpha + np.random.randn(N)
    
    # Generate dynamic process
    for t in range(1, T):
        epsilon = np.random.randn(N)
        y[:, t] = rho * y[:, t-1] + beta * X[:, t] + alpha + epsilon
    
    # Create long-format dataframe
    data = []
    for i in range(N):
        for t in range(T):
            data.append({
                'entity': i,
                'time': t,
                'y': y[i, t],
                'y_lag': y[i, t-1] if t > 0 else np.nan,
                'x': X[i, t],
                'alpha': alpha[i]
            })
    
    df = pd.DataFrame(data)
    
    # Drop first period (no lag)
    df = df[df['time'] > 0].reset_index(drop=True)
    
    return df

# Generate data with moderate N and T
N, T = 100, 10
rho_true = 0.5
beta_true = 0.3

df = generate_dynamic_panel(N=N, T=T, rho=rho_true, beta=beta_true, seed=123)

print(f"Panel dimensions: N={N}, T={T}")
print(f"Total observations: {len(df)}")
print(f"\nTrue parameters:")
print(f"  ρ (persistence): {rho_true}")
print(f"  β (x coefficient): {beta_true}")
print(f"\nFirst few rows:")
print(df.head(10))

### Visualize the Dynamic Process

In [None]:
# Plot time series for first 5 entities
fig, ax = plt.subplots(figsize=(12, 6))

for entity in range(min(5, N)):
    entity_data = df[df['entity'] == entity]
    ax.plot(entity_data['time'], entity_data['y'], marker='o', label=f'Entity {entity}')

ax.set_xlabel('Time')
ax.set_ylabel('y')
ax.set_title('Dynamic Panel: First 5 Entities')
ax.legend()
ax.grid(True, alpha=0.3)
plt.show()

# Summary statistics
print("\nSummary statistics:")
print(df[['y', 'y_lag', 'x']].describe())

## 2. Standard GMM (Arellano-Bond)

First, estimate using standard GMM without bias correction.

**Moment conditions:**
$$E[Z_{it}' \Delta \varepsilon_{it}] = 0$$

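where $Z_{it}$ collects the instruments: lags of $y$ dated $t-2$ and earlier. First-differencing the model removes the fixed effect $\alpha_i$, and these lags are uncorrelated with the differenced error $\Delta \varepsilon_{it} = \varepsilon_{it} - \varepsilon_{i,t-1}$ as long as the errors are serially uncorrelated.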
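
To make the moment condition concrete, here is a minimal NumPy sketch that does not use PanelBox. It covers only the simplest just-identified case (the Anderson-Hsiao estimator): a model with no exogenous regressor, first-differenced, with $y_{i,t-2}$ as the single instrument for $\Delta y_{i,t-1}$. The function names and simulation settings are illustrative assumptions; Arellano-Bond GMM generalizes this idea by using all available lags as instruments.

```python
import numpy as np

def simulate_ar1_panel(n_entities, n_periods, rho, rng):
    """Simulate y_it = rho * y_{i,t-1} + alpha_i + eps_it (no regressors)."""
    alpha = rng.standard_normal(n_entities)
    y = np.zeros((n_entities, n_periods))
    y[:, 0] = alpha + rng.standard_normal(n_entities)
    for t in range(1, n_periods):
        y[:, t] = rho * y[:, t - 1] + alpha + rng.standard_normal(n_entities)
    return y

def anderson_hsiao(y):
    """IV estimate of rho from E[y_{i,t-2} * (dy_it - rho * dy_{i,t-1})] = 0."""
    dy = np.diff(y, axis=1)   # dy[:, k] holds y_{k+1} - y_k
    dy_t = dy[:, 1:]          # differenced outcome for t = 2, ..., T-1
    dy_lag = dy[:, :-1]       # its first lag
    z = y[:, :-2]             # instrument: the level dated t-2
    return np.sum(z * dy_t) / np.sum(z * dy_lag)

# Illustrative settings (assumed, not taken from the tutorial's data)
rng = np.random.default_rng(0)
y_sim = simulate_ar1_panel(n_entities=5000, n_periods=10, rho=0.5, rng=rng)
rho_ah = anderson_hsiao(y_sim)
print(f"Anderson-Hsiao estimate of rho: {rho_ah:.3f} (true value 0.5)")
```

With a large cross-section the estimate should land close to the true value, because the fixed effect has been differenced out and the instrument is dated before both shocks in $\Delta \varepsilon_{it}$.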

In [None]:
# Estimate standard GMM (Arellano-Bond)
gmm_standard = DifferenceGMM.from_formula(
    formula='y ~ y_lag + x',
    data=df,
    entity_col='entity',
    time_col='time'
)

result_gmm = gmm_standard.fit()
print("="*60)
print("STANDARD GMM (ARELLANO-BOND)")
print("="*60)
print(result_gmm.summary())

## 3. Bias-Corrected GMM

Now estimate using Hahn-Kuersteiner bias correction:

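$$\hat{\theta}^{BC} = \hat{\theta}^{GMM} - \frac{\hat{B}(\hat{\theta}^{GMM})}{N}$$

where $\theta = (\rho, \beta')'$ stacks all model coefficients and $\hat{B}(\theta)$ is the estimated bias term, computed from analytical formulas and evaluated at the GMM estimate.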
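
The arithmetic of the correction step can be sketched as follows. The numbers are made up for illustration, and `bias_correct` is a hypothetical helper, not a PanelBox function; in practice the bias term comes from the estimator's analytical formulas.

```python
import numpy as np

def bias_correct(estimate_gmm, bias_term, n_entities):
    """Subtract the estimated bias term, scaled by 1/N, from the GMM estimate."""
    estimate_gmm = np.asarray(estimate_gmm, dtype=float)
    bias_term = np.asarray(bias_term, dtype=float)
    return estimate_gmm - bias_term / n_entities

# Hypothetical values: GMM estimates of (rho, beta) and an estimated bias term
estimate_gmm = np.array([0.46, 0.31])
bias_term = np.array([-3.0, 0.5])

estimate_bc = bias_correct(estimate_gmm, bias_term, n_entities=100)
print(estimate_bc)
```

Because the adjustment is divided by $N$, the same bias term moves the estimate less as the cross-section grows.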

In [None]:
# Estimate Bias-Corrected GMM
gmm_bc = BiasCorrectedGMM.from_formula(
    formula='y ~ y_lag + x',
    data=df,
    entity_col='entity',
    time_col='time',
    bias_order=1  # First-order bias correction
)

result_bc = gmm_bc.fit()
print("="*60)
print("BIAS-CORRECTED GMM (HAHN-KUERSTEINER)")
print("="*60)
print(result_bc.summary())

## 4. Comparison and Bias Magnitude

Compare standard GMM vs bias-corrected GMM:

In [None]:
# Create comparison table
comparison = pd.DataFrame({
    'True': [rho_true, beta_true],
    'Standard GMM': result_gmm.params[['y_lag', 'x']].values,
    'Bias-Corrected': result_bc.params[['y_lag', 'x']].values,
    'GMM Bias': result_gmm.params[['y_lag', 'x']].values - [rho_true, beta_true],
    'BC Bias': result_bc.params[['y_lag', 'x']].values - [rho_true, beta_true]
}, index=['ρ (y_lag)', 'β (x)'])

print("\n" + "="*80)
print("COMPARISON: STANDARD GMM vs BIAS-CORRECTED GMM")
print("="*80)
print(comparison)

# Bias magnitude
print("\n" + "="*80)
print("BIAS CORRECTION MAGNITUDE")
print("="*80)
bias_mag = result_bc.bias_magnitude()
print(f"Bias correction applied: {bias_mag}")
print(f"\nBias reduction:")
print(f"  ρ: {abs(comparison.loc['ρ (y_lag)', 'GMM Bias']):.4f} → {abs(comparison.loc['ρ (y_lag)', 'BC Bias']):.4f}")
print(f"  β: {abs(comparison.loc['β (x)', 'GMM Bias']):.4f} → {abs(comparison.loc['β (x)', 'BC Bias']):.4f}")

In [None]:
# Visualize comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

params = ['ρ (y_lag)', 'β (x)']
true_vals = [rho_true, beta_true]

for i, (param, true_val) in enumerate(zip(params, true_vals)):
    ax = axes[i]
    
    x_pos = np.arange(3)
    values = [
        true_val,
        comparison.loc[param, 'Standard GMM'],
        comparison.loc[param, 'Bias-Corrected']
    ]
    colors = ['green', 'orange', 'blue']
    labels = ['True', 'Standard GMM', 'Bias-Corrected']
    
    bars = ax.bar(x_pos, values, color=colors, alpha=0.7, edgecolor='black')
    ax.axhline(y=true_val, color='green', linestyle='--', linewidth=2, label='True value')
    ax.set_xticks(x_pos)
    ax.set_xticklabels(labels, rotation=15, ha='right')
    ax.set_ylabel('Estimate')
    ax.set_title(f'Parameter: {param}')
    ax.grid(True, alpha=0.3, axis='y')
    ax.legend()

plt.tight_layout()
plt.show()

## 5. Monte Carlo: Demonstrating Bias Reduction

Run a Monte Carlo simulation to demonstrate bias reduction properties:
- 500 replications by default (the cell below runs 100 for speed)
- N=50, T=10 (moderate panel)
- Compare bias, standard deviation, and RMSE
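
As a sketch of how these metrics relate, the snippet below computes bias, standard deviation, and RMSE from a vector of replication estimates and checks the identity $\text{RMSE}^2 = \text{bias}^2 + \text{variance}$. The draws are synthetic normal values with a known shift, standing in for an estimator's sampling distribution; `mc_summary` is an illustrative helper, not part of PanelBox.

```python
import numpy as np

def mc_summary(estimates, true_value):
    """Bias, standard deviation, and RMSE of Monte Carlo estimates."""
    estimates = np.asarray(estimates, dtype=float)
    errors = estimates - true_value
    return {
        'bias': errors.mean(),
        'std': estimates.std(),                 # population form (ddof=0)
        'rmse': np.sqrt(np.mean(errors ** 2)),
    }

# Synthetic "estimates": true value 0.5, assumed bias -0.05, assumed spread 0.08
rng = np.random.default_rng(0)
draws = 0.5 - 0.05 + 0.08 * rng.standard_normal(500)
summary = mc_summary(draws, true_value=0.5)

# With ddof=0 the decomposition holds exactly, up to floating-point error
decomposition_holds = np.isclose(
    summary['rmse'] ** 2, summary['bias'] ** 2 + summary['std'] ** 2
)
print(summary, decomposition_holds)
```

The decomposition explains why a correction that slightly raises the variance can still lower RMSE: what matters is the sum of the squared bias and the variance.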

In [None]:
def monte_carlo_bias_correction(n_reps=500, N=50, T=10, rho=0.5, beta=0.3):
    """
    Run Monte Carlo to demonstrate bias correction.
    """
    gmm_estimates = []
    bc_estimates = []
    
    print(f"Running {n_reps} Monte Carlo replications with N={N}, T={T}...")
    
    for rep in range(n_reps):
        if (rep + 1) % 50 == 0:
            print(f"  Replication {rep + 1}/{n_reps}")
        
        # Generate data
        df_mc = generate_dynamic_panel(N=N, T=T, rho=rho, beta=beta, seed=rep)
        
        try:
            # Standard GMM
            gmm = DifferenceGMM.from_formula(
                'y ~ y_lag + x',
                data=df_mc,
                entity_col='entity',
                time_col='time'
            )
            res_gmm = gmm.fit()
            est_gmm = res_gmm.params[['y_lag', 'x']].values
            
            # Bias-Corrected GMM
            bc = BiasCorrectedGMM.from_formula(
                'y ~ y_lag + x',
                data=df_mc,
                entity_col='entity',
                time_col='time'
            )
            res_bc = bc.fit()
            est_bc = res_bc.params[['y_lag', 'x']].values
        except Exception:
            # Skip replications where either estimator fails
            continue
        
        # Store results only when both fits succeed, so the samples stay paired
        gmm_estimates.append(est_gmm)
        bc_estimates.append(est_bc)
    
    gmm_estimates = np.array(gmm_estimates)
    bc_estimates = np.array(bc_estimates)
    
    true_params = np.array([rho, beta])
    
    results = {
        'gmm_bias': np.mean(gmm_estimates - true_params, axis=0),
        'bc_bias': np.mean(bc_estimates - true_params, axis=0),
        'gmm_std': np.std(gmm_estimates, axis=0),
        'bc_std': np.std(bc_estimates, axis=0),
        'gmm_rmse': np.sqrt(np.mean((gmm_estimates - true_params)**2, axis=0)),
        'bc_rmse': np.sqrt(np.mean((bc_estimates - true_params)**2, axis=0))
    }
    
    return results

# Run Monte Carlo (using fewer reps for speed)
print("Running Monte Carlo simulation...")
print("(Using 100 replications for speed - increase for production)\n")
mc_results = monte_carlo_bias_correction(n_reps=100, N=50, T=10)

In [None]:
# Visualize Monte Carlo results
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

metrics = ['bias', 'std', 'rmse']
metric_names = ['Bias', 'Std. Dev.', 'RMSE']
param_names = ['ρ (persistence)', 'β (x coef.)']

x_pos = np.arange(2)
width = 0.35

for i, (metric, metric_name) in enumerate(zip(metrics, metric_names)):
    ax = axes[i]
    
    gmm_vals = mc_results[f'gmm_{metric}']
    bc_vals = mc_results[f'bc_{metric}']
    
    ax.bar(x_pos - width/2, gmm_vals, width, label='Standard GMM', 
           color='orange', alpha=0.7, edgecolor='black')
    ax.bar(x_pos + width/2, bc_vals, width, label='Bias-Corrected', 
           color='blue', alpha=0.7, edgecolor='black')
    
    if metric == 'bias':
        ax.axhline(y=0, color='black', linestyle='--', linewidth=1, alpha=0.5)
    
    ax.set_xticks(x_pos)
    ax.set_xticklabels(param_names)
    ax.set_ylabel(metric_name)
    ax.set_title(f'Monte Carlo: {metric_name}')
    ax.legend()
    ax.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

# Print results
print("\n" + "="*80)
print("MONTE CARLO RESULTS (N=50, T=10)")
print("="*80)
print("\nBias:")
print(f"  Standard GMM: ρ={mc_results['gmm_bias'][0]:.4f}, β={mc_results['gmm_bias'][1]:.4f}")
print(f"  Bias-Corrected: ρ={mc_results['bc_bias'][0]:.4f}, β={mc_results['bc_bias'][1]:.4f}")
print(f"  Bias reduction: ρ={abs(mc_results['gmm_bias'][0]) - abs(mc_results['bc_bias'][0]):.4f}, "
      f"β={abs(mc_results['gmm_bias'][1]) - abs(mc_results['bc_bias'][1]):.4f}")

print("\nStandard Deviation:")
print(f"  Standard GMM: ρ={mc_results['gmm_std'][0]:.4f}, β={mc_results['gmm_std'][1]:.4f}")
print(f"  Bias-Corrected: ρ={mc_results['bc_std'][0]:.4f}, β={mc_results['bc_std'][1]:.4f}")

print("\nRMSE:")
print(f"  Standard GMM: ρ={mc_results['gmm_rmse'][0]:.4f}, β={mc_results['gmm_rmse'][1]:.4f}")
print(f"  Bias-Corrected: ρ={mc_results['bc_rmse'][0]:.4f}, β={mc_results['bc_rmse'][1]:.4f}")
print(f"  RMSE reduction: ρ={mc_results['gmm_rmse'][0] - mc_results['bc_rmse'][0]:.4f}, "
      f"β={mc_results['gmm_rmse'][1] - mc_results['bc_rmse'][1]:.4f}")

## 6. Economic Interpretation

### Key Findings:

1. **Persistence Parameter (ρ):**
   - True value: 0.5
   - Standard GMM typically shows downward bias in dynamic panels
   - Bias-Corrected GMM is designed to reduce this bias
   - Interpretation: The half-life of shocks, $\ln(0.5)/\ln(\rho)$, is estimated more accurately

2. **Exogenous Regressor (β):**
   - True value: 0.3
   - Bias correction also improves estimates for other coefficients
   - Standard errors properly adjusted for the correction

3. **When to Use Bias Correction:**
   - **N and T both moderate** (e.g., N=50-200, T=5-20): Substantial benefits
   - **High persistence** (ρ close to 1): Bias can be severe without correction
   - **Small T relative to N**: Bias correction more important

4. **Trade-offs:**
   - **Bias vs Variance:** Bias correction reduces bias but may slightly increase variance
   - **RMSE:** Usually improved overall (bias reduction dominates)
   - **Computational cost:** Minimal additional cost over standard GMM

### Practical Recommendations:

1. **Always report bias magnitude:** Use `.bias_magnitude()` to show correction size
2. **Check sensitivity:** Compare standard GMM vs bias-corrected
3. **Sample size warnings:** Heed warnings if N < 50 or T < 10
4. **Economic significance:** A 10% bias in ρ can substantially affect policy conclusions
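
One way to gauge economic significance is the implied long-run effect of $x$ on $y$, which in this model is $\beta / (1 - \rho)$. The sketch below compares that quantity at a true $\rho$ and at a value understated by 10%; the helper name and the scenario values are illustrative assumptions.

```python
def long_run_effect(rho, beta):
    """Long-run effect of a permanent unit change in x: beta / (1 - rho)."""
    if not abs(rho) < 1.0:
        raise ValueError("long-run effect requires |rho| < 1")
    return beta / (1.0 - rho)

beta_example = 0.3
for rho_0 in (0.5, 0.9):
    rho_understated = 0.9 * rho_0  # rho biased downward by 10%
    lr_true = long_run_effect(rho_0, beta_example)
    lr_biased = long_run_effect(rho_understated, beta_example)
    pct_change = 100.0 * (lr_biased / lr_true - 1.0)
    print(f"rho = {rho_0:.2f}: long-run effect {lr_true:.3f} -> "
          f"{lr_biased:.3f} ({pct_change:+.1f}%)")
```

The same proportional bias in $\rho$ distorts the long-run effect far more when persistence is high, since $1 - \rho$ is then small.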

### Example Application: Firm Growth Dynamics

If this were firm growth data:
- **ρ = 0.5** means shocks to growth have a half-life of one period, since $\ln(0.5)/\ln(\rho) = 1$
- **Biased ρ → biased persistence estimates → incorrect forecasts**
- Bias correction improves inference about firm dynamics
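
The half-life calculation can be sketched in a few lines. `half_life` is an illustrative helper, and the alternative values of ρ are assumptions chosen to show how a downward-biased estimate shortens the implied half-life.

```python
import math

def half_life(rho):
    """Periods until a shock decays to half its size in an AR(1) process."""
    if not 0.0 < rho < 1.0:
        raise ValueError("half-life is defined here only for 0 < rho < 1")
    return math.log(0.5) / math.log(rho)

# True values next to values understated by 10%
for rho_value in (0.5, 0.45, 0.9, 0.81):
    print(f"rho = {rho_value:.2f} -> half-life = {half_life(rho_value):.2f} periods")
```

As with other persistence measures, the half-life is most sensitive to errors in ρ when ρ is close to 1.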

### References:

1. Hahn, J., & Kuersteiner, G. (2002). "Asymptotically Unbiased Inference for a Dynamic Panel Model with Fixed Effects when Both n and T Are Large." *Econometrica*, 70(4), 1639-1657.

2. Arellano, M., & Bond, S. (1991). "Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations." *Review of Economic Studies*, 58(2), 277-297.

3. Bun, M.J.G., & Carree, M.A. (2005). "Bias-Corrected Estimation in Dynamic Panel Data Models." *Journal of Business & Economic Statistics*, 23(2), 200-210.

---

**PanelBox** - Advanced Panel Data Econometrics in Python  
https://github.com/bernardodionisi/panelbox