# Complete Guide to Dynamic Panel GMM

This notebook provides a **comprehensive guide to dynamic panel data models** using the Generalized Method of Moments (GMM), the flagship feature of PanelBox.

## What You'll Learn

- ✅ Why GMM? (When OLS/FE fail)
- ✅ Difference GMM (Arellano-Bond 1991)
- ✅ System GMM (Blundell-Bond 1998)
- ✅ Instrument selection and the collapse option
- ✅ All 5 GMM specification tests
- ✅ Troubleshooting and common pitfalls
- ✅ Difference vs System GMM comparison

## Table of Contents

1. [Why GMM?](#why-gmm)
2. [Data Preparation](#data-preparation)
3. [Difference GMM](#difference-gmm)
4. [System GMM](#system-gmm)
5. [Specification Tests](#specification-tests)
6. [Troubleshooting](#troubleshooting)
7. [Comparison](#comparison)
8. [Decision Guide](#decision-guide)

---

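## 1. Why GMM? {#why-gmm}

### The Dynamic Panel Problem

Consider:

$$y_{it} = \alpha y_{it-1} + \beta' X_{it} + \eta_i + \varepsilon_{it}$$

**Why OLS fails**: the individual effect $\eta_i$ sits in the error term and also drives $y_{it-1}$. So $\mathbb{E}[y_{it-1} \eta_i] \neq 0$ → **upward bias** in $\hat{\alpha}$.

**Why FE fails**: the within transformation removes $\eta_i$. However, the demeaned lag $(y_{it-1} - \bar{y}_i)$ is correlated with the demeaned error $(\varepsilon_{it} - \bar{\varepsilon}_i)$, because $\bar{\varepsilon}_i$ averages shocks that also determined $y_{it-1}$ → **downward bias (Nickell bias)**. This bias is of order $1/T$, so it is severe when $T$ is small.

### The GMM Solution

1. **First-difference** the equation to eliminate $\eta_i$.
2. Use **lags of $y$ dated $t-2$ and earlier as instruments**. If $\varepsilon_{it}$ is serially uncorrelated, these lags are correlated with $\Delta y_{it-1}$ but not with $\Delta\varepsilon_{it}$.

### Key Result

The OLS and FE biases run in opposite directions, so the two estimators roughly bracket the true $\alpha$. A well-specified GMM estimate should therefore usually satisfy:

$$\hat{\alpha}_{FE} < \hat{\alpha}_{GMM} < \hat{\alpha}_{OLS}$$

Let's demonstrate this!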
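A small simulation makes both biases easy to see. The sketch below uses only NumPy and does not call PanelBox. The data-generating process is an illustrative assumption: $\alpha = 0.5$, unit-variance $\eta_i$ and $\varepsilon_{it}$, and a stationary starting point. The names `simulate_ar1_panel` and `slope` are hypothetical helpers. The sketch also computes the Anderson-Hsiao IV estimator, which first-differences the model and uses $y_{it-2}$ as the single instrument for $\Delta y_{it-1}$. It is a just-identified precursor of Difference GMM, not GMM itself.

```python
import numpy as np

def simulate_ar1_panel(n=5000, t=6, alpha=0.5, seed=0):
    """Simulate y_it = alpha*y_it-1 + eta_i + eps_it, started from its stationary distribution."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(size=n)  # individual effects
    y = np.empty((n, t))
    y[:, 0] = eta / (1 - alpha) + rng.normal(scale=np.sqrt(1 / (1 - alpha**2)), size=n)
    for s in range(1, t):
        y[:, s] = alpha * y[:, s - 1] + eta + rng.normal(size=n)
    return y

def slope(x, y):
    """OLS slope of y on x with an intercept."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (xc @ xc)

y = simulate_ar1_panel()
y_cur, y_lag = y[:, 1:], y[:, :-1]

# Pooled OLS: eta_i stays in the error and is positively correlated with y_it-1
alpha_ols = slope(y_lag.ravel(), y_cur.ravel())

# Fixed effects: demean within each unit, then regress (Nickell bias)
y_cur_w = y_cur - y_cur.mean(axis=1, keepdims=True)
y_lag_w = y_lag - y_lag.mean(axis=1, keepdims=True)
alpha_fe = (y_lag_w.ravel() @ y_cur_w.ravel()) / (y_lag_w.ravel() @ y_lag_w.ravel())

# Anderson-Hsiao IV: first differences, instrument dy_it-1 with y_it-2
dy = y[:, 2:] - y[:, 1:-1]
dy_lag = y[:, 1:-1] - y[:, :-2]
z = y[:, :-2]
alpha_ah = (z.ravel() @ dy.ravel()) / (z.ravel() @ dy_lag.ravel())

print(f"OLS: {alpha_ols:.3f}  FE: {alpha_fe:.3f}  Anderson-Hsiao: {alpha_ah:.3f}  (true alpha = 0.5)")
```

With these settings, OLS should land well above 0.5, FE well below it, and the instrumented estimate close to 0.5. This reproduces the ordering in the Key Result.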

In [None]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import panelbox as pb

# Configuration
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 4)
np.random.seed(42)
plt.style.use('seaborn-v0_8-darkgrid')  # style name used by matplotlib >= 3.6
sns.set_palette("husl")

print(f"PanelBox version: {pb.__version__}")
print("Ready for GMM!")

---

## 2. Data Preparation {#data-preparation}

We'll use the **Arellano-Bond employment dataset**. It is the unbalanced panel of UK companies analyzed by Arellano and Bond (1991) and the classic benchmark for GMM.

In [None]:
# Load the Arellano-Bond data
data = pb.load_abdata()

print("Arellano-Bond Dataset:")
print("=" * 60)
print(f"Shape: {data.shape}")
print(f"\nVariables: {list(data.columns)}")
print("\nFirst rows:")
data.head(10)

### Check Panel Structure

In [None]:
# Panel structure
n_firms = data['id'].nunique()
n_years = data['year'].nunique()
is_balanced = len(data) == n_firms * n_years

print("Panel Structure Analysis:")
print("=" * 60)
print(f"Number of firms (N): {n_firms}")
print(f"Number of years (T): {n_years}")
print(f"Total observations: {len(data)}")
print(f"Expected if balanced: {n_firms * n_years}")
print(f"Panel type: {'Balanced' if is_balanced else 'Unbalanced'}")
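Unbalanced panels can also contain gaps, where a firm is missing a year in the middle of its history. Gaps matter for dynamic models because lags built with `groupby(...).shift(1)` pair each row with the previous *row*, not the previous *year*. Below is a minimal pandas check. It runs on a toy DataFrame that reuses this notebook's `id`/`year` column names. `panel_gaps` is a hypothetical helper, not a PanelBox function.

```python
import pandas as pd

def panel_gaps(df, id_col="id", time_col="year"):
    """Return, per entity, whether its observed time periods skip any value."""
    years = df.sort_values([id_col, time_col]).groupby(id_col)[time_col]
    # diff() is NaN for each entity's first row; any step other than 1 is a gap
    return years.apply(lambda s: bool((s.diff().dropna() != 1).any()))

# Toy panel: firm 1 is consecutive, firm 2 skips 1977
toy = pd.DataFrame({"id": [1, 1, 1, 2, 2], "year": [1976, 1977, 1978, 1976, 1978]})
gaps = panel_gaps(toy)
print(gaps)
print(toy.groupby("id").size().describe())  # distribution of observations per firm
```

Applied to `data`, an all-`False` result would mean every firm's years are consecutive, so row-based lags coincide with year-based lags.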

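---

## 3. Difference GMM (Arellano-Bond 1991) {#difference-gmm}

### Theory

**Step 1**: First-difference to eliminate $\eta_i$:

$$\Delta y_{it} = \alpha \Delta y_{it-1} + \beta' \Delta X_{it} + \Delta\varepsilon_{it}$$

**Step 2**: Use deeper lags as instruments. The regressor $\Delta y_{it-1} = y_{it-1} - y_{it-2}$ is correlated with $\Delta\varepsilon_{it} = \varepsilon_{it} - \varepsilon_{it-1}$, because both contain $\varepsilon_{it-1}$, so it must be instrumented. If $\varepsilon_{it}$ is serially uncorrelated, the valid instruments for the equation in period $t$ are $y_{it-2}, y_{it-3}, \ldots, y_{i1}$. They satisfy the moment conditions

$$\mathbb{E}[y_{it-s}\,\Delta\varepsilon_{it}] = 0 \quad \text{for } s \geq 2.$$

### Implementation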
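The instrument set for one firm can be written as a matrix $Z_i$ with one row per differenced equation. The NumPy sketch below is illustrative and is not PanelBox's internal code; `gmm_instruments` is a hypothetical helper. It builds two versions of $Z_i$:

- **Standard:** block-diagonal and GMM-style, so each period's equation gets its own columns.
- **Collapsed** (Roodman 2009): each lag depth gets a single column shared across periods.

Periods are 0-indexed, so the first usable differenced equation is $t = 2$.

```python
import numpy as np

def gmm_instruments(y_i, collapse=False):
    """GMM-style instruments for the differenced equations t = 2..T-1 of one unit.

    Standard: the equation for period t gets its own columns y_i0, ..., y_i,t-2.
    Collapsed: column j holds lag (j + 2), i.e. y_i,t-2-j, shared across periods.
    """
    t_len = len(y_i)
    rows = list(range(2, t_len))
    if collapse:
        z = np.zeros((len(rows), t_len - 2))
        for r, t in enumerate(rows):
            for j in range(t - 1):
                z[r, j] = y_i[t - 2 - j]
    else:
        n_cols = sum(t - 1 for t in rows)  # (T-2)(T-1)/2 columns
        z = np.zeros((len(rows), n_cols))
        col = 0
        for r, t in enumerate(rows):
            z[r, col:col + t - 1] = y_i[: t - 1]  # y_i0 .. y_i,t-2
            col += t - 1
    return z

y_i = np.arange(1.0, 8.0)  # toy values for one unit observed over T = 7 periods
print(gmm_instruments(y_i).shape)                 # → (5, 15)
print(gmm_instruments(y_i, collapse=True).shape)  # → (5, 5)
```

The standard set grows quadratically in $T$, while the collapsed set grows only linearly. That slower growth is why collapsing helps keep the instrument count manageable.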

In [None]:
# Estimate Difference GMM
diff_gmm = pb.DifferenceGMM(
    data=data,
    dep_var='n',          # Employment
    lags=1,               # One lag of dependent variable
    id_var='id',          # Firm ID
    time_var='year',      # Time variable
    exog_vars=['w', 'k'], # Wages and capital
    time_dummies=False,   # No time dummies (for simplicity)
    collapse=True,        # ⭐ RECOMMENDED: Collapse instruments (Roodman 2009)
    two_step=True,        # Two-step estimation
    robust=True           # Windmeijer correction
)
diff_gmm_results = diff_gmm.fit()

print("=" * 70)
print("DIFFERENCE GMM (ARELLANO-BOND 1991)")
print("=" * 70)
print(diff_gmm_results.summary())
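To show roughly what a Difference GMM estimator computes, here is a minimal NumPy/SciPy sketch for the simplest case. The assumptions are:

- a pure AR(1) panel, simulated rather than loaded, with no `w`, `k`, or time dummies;
- collapsed instruments;
- a one-step weight matrix $\left(\sum_i Z_i' H Z_i\right)^{-1}$, where $H$ encodes the MA(1) structure of $\Delta\varepsilon_{it}$ under serially uncorrelated errors.

The two-step estimate reweights by the covariance of the one-step moments, and the sketch then reports the Hansen J statistic. This is an illustration under those assumptions, not PanelBox's implementation. In particular, it computes no standard errors, so no Windmeijer correction is applied. All function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def simulate_ar1_panel(n, t, alpha, seed=0):
    """Simulate y_it = alpha*y_it-1 + eta_i + eps_it from its stationary distribution."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(size=n)
    y = np.empty((n, t))
    y[:, 0] = eta / (1 - alpha) + rng.normal(scale=np.sqrt(1 / (1 - alpha**2)), size=n)
    for s in range(1, t):
        y[:, s] = alpha * y[:, s - 1] + eta + rng.normal(size=n)
    return y

def collapsed_instruments(y_i):
    """Rows: differenced equations t = 2..T-1; column j holds y_i,t-2-j (0 if unavailable)."""
    m = len(y_i) - 2
    z = np.zeros((m, m))
    for r, t in enumerate(range(2, len(y_i))):
        for j in range(t - 1):
            z[r, j] = y_i[t - 2 - j]
    return z

def diff_gmm_ar1(y):
    """One- and two-step Difference GMM for a pure AR(1) panel, plus the Hansen J test."""
    n, t_len = y.shape
    m = t_len - 2
    h = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)  # Var(Δε_i) up to sigma^2
    dy = y[:, 2:] - y[:, 1:-1]   # Δy_it   for t = 2..T-1
    dx = y[:, 1:-1] - y[:, :-2]  # Δy_it-1 for t = 2..T-1
    zs = [collapsed_instruments(y[i]) for i in range(n)]
    zx = sum(z.T @ dx[i] for i, z in enumerate(zs))  # Σ Z_i' Δy_i,-1
    zy = sum(z.T @ dy[i] for i, z in enumerate(zs))  # Σ Z_i' Δy_i

    def estimate(w):
        # Minimizer of (zy - a*zx)' W (zy - a*zx) for a single coefficient a
        return (zx @ w @ zy) / (zx @ w @ zx)

    def moments(a):
        # Per-unit moment contributions g_i = Z_i' (Δy_i - a*Δy_i,-1), shape (n, L)
        return np.array([z.T @ (dy[i] - a * dx[i]) for i, z in enumerate(zs)])

    w1 = np.linalg.inv(sum(z.T @ h @ z for z in zs))
    a1 = estimate(w1)
    g1 = moments(a1)
    w2 = np.linalg.inv(g1.T @ g1)  # two-step weight built from one-step residuals
    a2 = estimate(w2)
    g_sum = moments(a2).sum(axis=0)
    hansen_j = g_sum @ w2 @ g_sum
    df = m - 1  # L = T-2 collapsed instruments minus 1 parameter
    return a1, a2, hansen_j, chi2.sf(hansen_j, df)

y = simulate_ar1_panel(n=2000, t=7, alpha=0.5)
a1, a2, j_stat, j_p = diff_gmm_ar1(y)
print(f"one-step: {a1:.3f}  two-step: {a2:.3f}  Hansen J: {j_stat:.2f} (p = {j_p:.3f})  true alpha = 0.5")
```

Because the simulated model is correctly specified, both estimates should sit near 0.5. The Hansen J statistic should behave like a draw from $\chi^2(4)$.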

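---

## 4. System GMM (Blundell-Bond 1998) {#system-gmm}

### When to Use System GMM

Use System GMM when:

- Variables are **persistent** (high autocorrelation, $\alpha$ close to 1)
- Difference GMM instruments are **weak**: with persistent series, lagged levels carry little information about subsequent changes
- The extra assumption behind the level moments is plausible: $\Delta y_{it}$ must be uncorrelated with $\eta_i$ (and likewise for any instrumented $\Delta X_{it}$), which holds when the series is mean-stationary

### Theory

System GMM stacks **level equations** alongside the difference equations:

**Difference**: $\Delta y_{it} = \alpha \Delta y_{it-1} + ... + \Delta\varepsilon_{it}$

**Level**: $y_{it} = \alpha y_{it-1} + ... + \eta_i + \varepsilon_{it}$

In the level equation, lagged differences such as $\Delta y_{it-1}$ serve as instruments for $y_{it-1}$. This adds the moment conditions $\mathbb{E}[\Delta y_{it-1}(\eta_i + \varepsilon_{it})] = 0$.

### Implementation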

In [None]:
# Estimate System GMM
sys_gmm = pb.SystemGMM(
    data=data,
    dep_var='n',
    lags=1,
    id_var='id',
    time_var='year',
    exog_vars=['w', 'k'],
    time_dummies=False,
    collapse=True,        # ⭐ Always use collapse
    two_step=True,
    robust=True
)
sys_gmm_results = sys_gmm.fit()

print("=" * 70)
print("SYSTEM GMM (BLUNDELL-BOND 1998)")
print("=" * 70)
print(sys_gmm_results.summary())
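Two facts behind the System GMM motivation can be checked numerically on a simulated AR(1) panel. The setup is an illustrative assumption: unit-variance $\eta_i$ and $\varepsilon_{it}$, with a mean-stationary start.

1. As $\alpha \to 1$, the correlation between the Difference GMM instrument $y_{it-2}$ and the regressor $\Delta y_{it-1}$ shrinks toward zero, so the instrument becomes weak.
2. Under mean stationarity, $\Delta y_{it-1}$ is uncorrelated with $\eta_i$, while the level $y_{it-1}$ is strongly correlated with it. This is what makes lagged differences usable instruments in the level equation.

This NumPy sketch does not estimate System GMM itself, and `simulate_ar1_panel` and `corr` are hypothetical helpers.

```python
import numpy as np

def simulate_ar1_panel(n, t, alpha, seed=0):
    """Simulate y_it = alpha*y_it-1 + eta_i + eps_it from its stationary distribution."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(size=n)
    y = np.empty((n, t))
    y[:, 0] = eta / (1 - alpha) + rng.normal(scale=np.sqrt(1 / (1 - alpha**2)), size=n)
    for s in range(1, t):
        y[:, s] = alpha * y[:, s - 1] + eta + rng.normal(size=n)
    return y, eta

def corr(a, b):
    """Pooled correlation between two equally shaped arrays."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

# 1) The Difference GMM first stage weakens as persistence rises
first_stage = {}
for alpha in (0.3, 0.6, 0.95):
    y, _ = simulate_ar1_panel(20_000, 6, alpha)
    first_stage[alpha] = corr(y[:, :-2], y[:, 1:-1] - y[:, :-2])  # corr(y_t-2, Δy_t-1)
    print(f"alpha = {alpha:.2f}: corr(y_t-2, Δy_t-1) = {first_stage[alpha]:.3f}")

# 2) Level-equation instrument: differences are free of eta_i, levels are not
y, eta = simulate_ar1_panel(20_000, 6, 0.5)
eta_b = np.repeat(eta[:, None], y.shape[1] - 1, axis=1)  # eta_i repeated for each period
corr_dy_eta = corr(np.diff(y, axis=1), eta_b)
corr_y_eta = corr(y[:, :-1], eta_b)
print(f"corr(Δy, eta) = {corr_dy_eta:.3f}   corr(y, eta) = {corr_y_eta:.3f}")
```

If the starting values were not mean-stationary (for example, all firms starting at zero), $\Delta y_{it}$ would carry part of $\eta_i$ in early periods. In that case the level moments would be invalid, which is the key assumption to defend when choosing System GMM.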

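---

## 5. GMM Specification Tests - CRITICAL! {#specification-tests}

### The 5 Essential Tests

1. **Hansen J-test**: Overidentifying restrictions (robust to heteroskedasticity, but weakened by many instruments)
2. **Sargan test**: Overidentifying restrictions (not robust to heteroskedasticity, but not weakened by many instruments)
3. **AR(1) test**: First-order serial correlation in the differenced residuals
4. **AR(2) test**: Second-order serial correlation in the differenced residuals
5. **Instrument ratio**: Number of instruments relative to the number of groups, a rule-of-thumb check for instrument proliferation

The cell below reports the Hansen J, AR(1), AR(2), and instrument-ratio results for the System GMM model: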

In [None]:
# Extract test results
print("=" * 70)
print("GMM SPECIFICATION TESTS")
print("=" * 70)

print("\n1. HANSEN J-TEST (Overidentification)")
print("-" * 60)
print(f"Statistic: {sys_gmm_results.hansen_j.statistic:.4f}")
print(f"P-value: {sys_gmm_results.hansen_j.pvalue:.4f}")
print("Interpretation: ", end="")
if sys_gmm_results.hansen_j.pvalue > 0.10:
    # Not rejecting does not prove validity; p-values near 1.0 can signal too many instruments
    print("✓ PASS (p > 0.10) - No evidence against instrument validity")
else:
    print("✗ FAIL (p <= 0.10) - Instruments may be invalid")

print("\n2. AR(1) TEST")
print("-" * 60)
print(f"Statistic: {sys_gmm_results.ar1_test.statistic:.4f}")
print(f"P-value: {sys_gmm_results.ar1_test.pvalue:.4f}")
print("Interpretation: ", end="")
if sys_gmm_results.ar1_test.pvalue < 0.05:
    print("✓ PASS - AR(1) expected in differenced errors")
else:
    print("⚠ Unexpected - Check specification")

print("\n3. AR(2) TEST - MOST IMPORTANT!")
print("-" * 60)
print(f"Statistic: {sys_gmm_results.ar2_test.statistic:.4f}")
print(f"P-value: {sys_gmm_results.ar2_test.pvalue:.4f}")
print("Interpretation: ", end="")
if sys_gmm_results.ar2_test.pvalue > 0.10:
    print("✓ PASS (p > 0.10) - No evidence of AR(2); lag-2 instruments admissible")
else:
    print("✗ FAIL (p <= 0.10) - AR(2) present; lag-2 instruments invalid!")

print("\n4. INSTRUMENT RATIO")
print("-" * 60)
print(f"Number of instruments: {sys_gmm_results.n_instruments}")
print(f"Number of groups: {sys_gmm_results.n_groups}")
print(f"Instrument ratio: {sys_gmm_results.instrument_ratio:.3f}")
print("Recommendation: ", end="")
if sys_gmm_results.instrument_ratio < 1.0:
    print("✓ GOOD (ratio < 1.0) - Not too many instruments")
else:
    print("⚠ WARNING (ratio >= 1.0) - Too many instruments!")
    print("  → Use collapse=True or reduce lags")
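The instrument-ratio rule of thumb can be reasoned about with simple arithmetic. The sketch below counts Difference GMM instruments for a specification with:

- one GMM-style variable (the lagged dependent variable) using all lags from $t-2$;
- two IV-style columns standing in for exogenous regressors.

The panel lengths and the $N = 100$ groups are hypothetical. Real estimators, PanelBox included, may count differently; System GMM, for example, adds level-equation instruments. `diff_gmm_instrument_count` is a hypothetical helper.

```python
def diff_gmm_instrument_count(t_periods, collapse, n_gmm_vars=1, n_iv_vars=0):
    """Count Difference GMM instruments when each GMM-style variable uses all lags from t-2.

    Standard: (T-2)(T-1)/2 columns per variable; collapsed: T-2 columns per variable.
    IV-style instruments (e.g. strictly exogenous regressors) add one column each.
    """
    per_var = (t_periods - 2) if collapse else (t_periods - 2) * (t_periods - 1) // 2
    return n_gmm_vars * per_var + n_iv_vars

n_groups = 100  # hypothetical number of groups (firms)
for t_periods in (5, 9, 15):
    for collapse in (False, True):
        k = diff_gmm_instrument_count(t_periods, collapse, n_gmm_vars=1, n_iv_vars=2)
        print(f"T={t_periods:2d} collapse={collapse!s:5}: {k:3d} instruments, ratio = {k / n_groups:.2f}")
```

With uncollapsed instruments the count grows roughly with $T^2/2$, so even a moderate $T$ can push the ratio toward 1. The collapsed count grows only linearly.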

### Interpretation Guide

| Test | Desired Result | If Failed |
|------|----------------|-----------|
| Hansen J | p > 0.10 (but not suspiciously close to 1.0) | Some instruments are likely invalid: revisit which variables are treated as exogenous or predetermined and which lags are used |
| AR(1) | p < 0.05 | Not fatal on its own, but unexpected, since AR(1) is the norm in differences; check the specification |
| AR(2) | **p > 0.10** | ⚠ **Critical failure!** Lag-2 instruments are invalid; use deeper lags (from $t-3$) |
| Inst. Ratio | < 1.0 | Use `collapse=True`, reduce lag depth |

**Golden Rule**: The AR(2) test is the most important. If it fails, **do not trust your results**!
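Why is AR(1) expected while AR(2) is not? If $\varepsilon_{it}$ is serially uncorrelated, $\Delta\varepsilon_{it}$ and $\Delta\varepsilon_{it-1}$ share the term $\varepsilon_{it-1}$, so their correlation is $-0.5$. By contrast, $\Delta\varepsilon_{it}$ and $\Delta\varepsilon_{it-2}$ share nothing. The NumPy sketch below checks this by simulation and contrasts it with AR(1) errors, which leave second-order correlation in the differences. It shows the pattern the Arellano-Bond AR tests look for, not the test statistics themselves. `lag_corr` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods = 20_000, 8

def lag_corr(d, lag):
    """Pooled correlation between d_it and d_i,t-lag across units and periods."""
    return np.corrcoef(d[:, lag:].ravel(), d[:, :-lag].ravel())[0, 1]

# Case 1: serially uncorrelated errors
eps = rng.normal(size=(n_units, n_periods))
d_eps = np.diff(eps, axis=1)  # Δε_it
ar1 = lag_corr(d_eps, 1)      # theory: -0.5
ar2 = lag_corr(d_eps, 2)      # theory: 0

# Case 2: AR(1) errors eps_t = rho*eps_t-1 + v_t, started from the stationary distribution
rho = 0.5
v = rng.normal(size=(n_units, n_periods))
eps_ar = np.empty_like(v)
eps_ar[:, 0] = v[:, 0] / np.sqrt(1 - rho**2)
for s in range(1, n_periods):
    eps_ar[:, s] = rho * eps_ar[:, s - 1] + v[:, s]
ar2_bad = lag_corr(np.diff(eps_ar, axis=1), 2)  # theory: (2*rho^2 - rho - rho^3) / (2*(1 - rho)) = -0.125

print(f"iid errors:   AR(1) corr = {ar1:.3f}, AR(2) corr = {ar2:.3f}")
print(f"AR(1) errors: AR(2) corr = {ar2_bad:.3f}")
```

In the second case, $y_{it-2}$ would be correlated with $\Delta\varepsilon_{it}$, which is exactly why an AR(2) rejection invalidates the lag-2 instruments.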

---

## 7. Difference vs System GMM {#comparison}

Let's compare the coefficient on lagged employment across pooled OLS, fixed effects, Difference GMM, and System GMM:

In [None]:
# Also estimate OLS and FE for comparison
from panelbox import PooledOLS, FixedEffects

# Create the lagged dependent variable within each firm
data_lag = data.sort_values(['id', 'year']).copy()
data_lag['n_lag1'] = data_lag.groupby('id')['n'].shift(1)
# Discard lags that skip a year (possible in an unbalanced panel)
prev_year = data_lag.groupby('id')['year'].shift(1)
data_lag.loc[data_lag['year'] - prev_year != 1, 'n_lag1'] = np.nan
data_lag = data_lag.dropna()

# Pooled OLS (biased upward)
ols = PooledOLS(
    formula="n ~ n_lag1 + w + k",
    data=data_lag,
    entity_col='id',
    time_col='year'
)
ols_results = ols.fit()

# Fixed Effects (biased downward - Nickell bias)
fe = FixedEffects(
    formula="n ~ n_lag1 + w + k",
    data=data_lag,
    entity_col='id',
    time_col='year'
)
fe_results = fe.fit()

# Comparison table
comparison = pd.DataFrame({
    'Model': ['OLS (upward bias)', 'FE (Nickell bias)', 'Diff GMM', 'Sys GMM'],
    'n_lag1': [
        ols_results.params['n_lag1'],
        fe_results.params['n_lag1'],
        diff_gmm_results.params.get('n.L1', np.nan),
        sys_gmm_results.params.get('n.L1', np.nan)
    ],
    'SE': [
        ols_results.std_errors['n_lag1'],
        fe_results.std_errors['n_lag1'],
        diff_gmm_results.std_errors.get('n.L1', np.nan),
        sys_gmm_results.std_errors.get('n.L1', np.nan)
    ]
})

print("=" * 70)
print("ESTIMATOR COMPARISON")
print("=" * 70)
print(comparison.to_string(index=False))
print("\nExpected: FE < GMM < OLS")
print("If a GMM estimate falls outside this range → possible specification problem!")

---

## 8. Decision Guide {#decision-guide}

### When to Use Which GMM?

```
Do you have a lagged dependent variable?
    |
    YES → Dynamic panel
    |     |
    |     Is the series highly persistent (ρ > 0.8)?
    |     |
    |     YES → System GMM ✓
    |     |     (More efficient, uses level moments)
    |     |
    |     NO → Difference GMM ✓
    |           (Safer, fewer assumptions)
    |
    NO → Use static models (see notebook 01)
```

### Best Practices

1. ✅ **Always use `collapse=True`** (Roodman 2009)
2. ✅ **Check the AR(2) test** (most critical!)
3. ✅ **Hansen J p-value > 0.10**
4. ✅ **Instrument ratio < 1.0**
5. ✅ **Compare with OLS and FE** (the GMM estimate of $\alpha$ should usually lie between them)
6. ✅ **Use two-step estimation with the Windmeijer correction**
7. ✅ **Report all specification tests**

### Common Mistakes to Avoid

- ❌ Using `collapse=False` (too many instruments)
- ❌ Ignoring an AR(2) test failure
- ❌ Using time dummies with unbalanced panels
- ❌ Not checking the instrument ratio
- ❌ Accepting a GMM estimate outside the OLS-FE bounds

---

## Summary

You learned:

- ✅ **Why GMM**: Solves dynamic panel bias
- ✅ **Difference GMM**: First-differencing + lag instruments
- ✅ **System GMM**: Additional level moments
- ✅ **5 Specification Tests**: Hansen, AR(1), AR(2), Sargan, Inst. Ratio
- ✅ **Best Practices**: collapse=True, check AR(2)
- ✅ **When to Use Which**: Decision tree

### Next Steps

- **[03_validation_complete.ipynb](./03_validation_complete.ipynb)**: Validation tests
- **[04_robust_inference.ipynb](./04_robust_inference.ipynb)**: Advanced inference
- **[08_unbalanced_panels.ipynb](./08_unbalanced_panels.ipynb)**: Unbalanced panel tricks

---

*Master GMM with PanelBox!*