# Complete Validation and Diagnostic Tests

This notebook provides a **comprehensive guide to validation and diagnostic testing** for panel data models.

## What You'll Learn

- ✅ 18 validation tests for panel data
- ✅ Specification tests (Hausman, RESET, Mundlak, Chow)
- ✅ Diagnostic tests (serial correlation, heteroskedasticity, cross-sectional dependence)
- ✅ Unit root tests (LLC, IPS, Fisher)
- ✅ Cointegration tests (Pedroni, Kao)
- ✅ ValidationSuite (run all tests at once)
- ✅ Interpretation and remediation strategies

## Table of Contents

1. [Introduction to Validation](#introduction)
2. [Setup and Data](#setup)
3. [Specification Tests](#specification-tests)
4. [Diagnostic Tests](#diagnostic-tests)
5. [Unit Root Tests](#unit-root)
6. [Cointegration Tests](#cointegration)
7. [ValidationSuite](#validation-suite)
8. [Remediation Guide](#remediation)

---

## 1. Introduction to Validation {#introduction}

### Why Validate?

**Model validation** is critical for reliable inference:
- Ensures assumptions are satisfied
- Detects specification errors
- Guides model improvements
- Increases credibility of results

### Types of Tests

| Category | Purpose | Examples |
|----------|---------|----------|
| **Specification** | Model form correct? | Hausman, RESET, Mundlak |
| **Diagnostics** | Assumptions satisfied? | Serial correlation, heteroskedasticity |
| **Stationarity** | Data properties | Unit root tests |
| **Long-run** | Cointegration | Pedroni, Kao |

### The Validation Workflow

```
1. Estimate model
   ↓
2. Run specification tests
   ↓
3. Run diagnostic tests
   ↓
4. Check for violations
   ↓
5. Apply remediation if needed
   ↓
6. Re-estimate and re-test
```

Let's start!

In [None]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

import panelbox as pb

# Configuration
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 4)
np.random.seed(42)

plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print(f"PanelBox version: {pb.__version__}")
print("Validation toolkit ready!")

---

## 2. Setup and Data {#setup}

We'll use the **Grunfeld dataset** for demonstration.

In [None]:
# Load data
data = pb.load_grunfeld()

print("Dataset loaded:")
print(f"Shape: {data.shape}")
print(f"Variables: {list(data.columns)}")
print(f"\nFirst rows:")
data.head()

### Estimate Baseline Models

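We estimate a Fixed Effects (FE) and a Random Effects (RE) model of investment on firm value and capital stock. The tests below are run on these two fits: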

In [None]:
# Fixed Effects model
fe = pb.FixedEffects(
    formula="invest ~ value + capital",
    data=data,
    entity_col="firm",
    time_col="year"
)
fe_results = fe.fit()

print("Fixed Effects Model Estimated")
print("="*60)
print(fe_results.summary())

In [None]:
# Random Effects model
re = pb.RandomEffects(
    formula="invest ~ value + capital",
    data=data,
    entity_col="firm",
    time_col="year"
)
re_results = re.fit()

print("\nRandom Effects Model Estimated")
print("="*60)

---

## 3. Specification Tests {#specification-tests}

Specification tests check if the **model form is correct**.

### 3.1 Hausman Test (FE vs RE)

**H₀**: Individual effects are uncorrelated with the regressors, so both FE and RE are consistent and RE is efficient

**H₁**: Fixed effects is consistent, random effects is not

**Rule**: If p < 0.05 → Use Fixed Effects
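For intuition, the classic Hausman statistic is $H = (\hat\beta_{FE} - \hat\beta_{RE})'\,[\hat V_{FE} - \hat V_{RE}]^{-1}\,(\hat\beta_{FE} - \hat\beta_{RE})$, which is $\chi^2(k)$ under H₀, with $k$ the number of time-varying regressors. The sketch below computes it with NumPy and SciPy from coefficient vectors and covariance matrices. `hausman_statistic` is a hypothetical helper, not part of PanelBox, and the inputs are made-up numbers chosen so the arithmetic is easy to follow (they are not Grunfeld estimates).

```python
import numpy as np
from scipy import stats


def hausman_statistic(b_fe, b_re, v_fe, v_re):
    """Classic Hausman statistic for FE vs RE (illustrative helper).

    b_fe, b_re: coefficient vectors for the same time-varying regressors.
    v_fe, v_re: the corresponding covariance matrices.
    """
    diff = np.asarray(b_fe) - np.asarray(b_re)
    v_diff = np.asarray(v_fe) - np.asarray(v_re)
    # Pseudo-inverse guards against a singular V_FE - V_RE in finite samples
    stat = float(diff @ np.linalg.pinv(v_diff) @ diff)
    df = diff.size
    pvalue = float(stats.chi2.sf(stat, df))
    return stat, pvalue


# Made-up inputs purely to show the arithmetic
b_fe = np.array([0.11, 0.31])
b_re = np.array([0.01, 0.11])
v_fe = np.diag([0.02, 0.05])
v_re = np.diag([0.01, 0.01])

stat, pvalue = hausman_statistic(b_fe, b_re, v_fe, v_re)
print(f"H = {stat:.4f}, p = {pvalue:.4f}")
```

In finite samples $\hat V_{FE} - \hat V_{RE}$ need not be positive definite, which is why the sketch uses a pseudo-inverse; a negative $H$ is a warning sign rather than a result to report.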

In [None]:
# Hausman test
hausman_test = pb.HausmanTest(fe_results, re_results)
hausman = hausman_test.run()

print("HAUSMAN TEST")
print("="*60)
print(hausman)
print(f"\nStatistic: {hausman.statistic:.4f}")
print(f"P-value: {hausman.pvalue:.4f}")
print(f"\nDecision: ", end="")
if hausman.pvalue < 0.05:
    print("✗ Reject H₀ → Use Fixed Effects")
    print("   (Random effects assumption violated)")
else:
    print("✓ Fail to reject H₀ → Can use Random Effects")
    print("   (RE is consistent and more efficient)")

### 3.2 RESET Test (Functional Form)

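**Ramsey's RESET test** checks for functional-form misspecification: it adds powers of the fitted values (ŷ², ŷ³) to the model and tests whether they are jointly significant.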

**H₀**: Model is correctly specified

**H₁**: Model has specification errors (missing variables, wrong functional form)
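A minimal sketch of the RESET mechanics on simulated data, assuming plain pooled OLS via `numpy.linalg.lstsq`. PanelBox's `RESETTest` may apply the same idea to within-transformed data, so its numbers will differ. `reset_test` is a hypothetical helper; the quadratic data-generating process should be flagged, while the linear one generally should not.

```python
import numpy as np
from scipy import stats


def reset_test(y, X, powers=(2, 3)):
    """Ramsey RESET F-test for an OLS regression (illustrative sketch).

    y: (n,) outcome; X: (n, k) regressors INCLUDING a constant column.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta
    rss_r = np.sum((y - y_hat) ** 2)  # restricted (original) model

    # Augment with powers of the fitted values and re-estimate
    X_aug = np.column_stack([X] + [y_hat ** p for p in powers])
    beta_aug, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    rss_u = np.sum((y - X_aug @ beta_aug) ** 2)  # unrestricted model

    q = len(powers)                     # number of added terms
    df_resid = len(y) - X_aug.shape[1]  # residual df of the augmented model
    f_stat = ((rss_r - rss_u) / q) / (rss_u / df_resid)
    pvalue = stats.f.sf(f_stat, q, df_resid)
    return float(f_stat), float(pvalue)


rng = np.random.default_rng(0)
x = rng.uniform(0, 3, size=400)
X = np.column_stack([np.ones_like(x), x])
y_linear = 1 + 2 * x + rng.normal(size=x.size)
y_curved = 1 + 2 * x + 1.5 * x**2 + rng.normal(size=x.size)

f_lin, p_lin = reset_test(y_linear, X)
f_cur, p_cur = reset_test(y_curved, X)
print(f"Linear DGP:    F = {f_lin:.2f}, p = {p_lin:.4f}")
print(f"Quadratic DGP: F = {f_cur:.2f}, p = {p_cur:.4g}")
```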

In [None]:
# RESET test
reset_test = pb.RESETTest(fe_results)

print("\nRESET TEST (Ramsey)")
print("="*60)
print(reset_test)
print(f"\nStatistic: {reset_test.statistic:.4f}")
print(f"P-value: {reset_test.pvalue:.4f}")
print(f"\nDecision: ", end="")
if reset_test.pvalue < 0.05:
    print("✗ Reject H₀ → Specification error detected")
    print("   Consider: non-linear terms, interactions, omitted variables")
else:
    print("✓ Fail to reject H₀ → Functional form appears correct")

### 3.3 Mundlak Test (RE Specification)

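**Mundlak test** adds the entity means of the time-varying regressors to the RE model and tests whether their coefficients are jointly zero.

**H₀**: Random effects specification is correct (the group-mean coefficients are zero)

**H₁**: Effects are correlated with the regressors (use FE, or keep the group means as a correlated random effects model)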
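The auxiliary regression behind the Mundlak test is easy to set up by hand. A minimal pandas sketch on a made-up six-row panel: the column names mirror Grunfeld, but the numbers do not come from it. It only builds the group-mean regressors; fitting the augmented RE model and running the Wald test is left to the estimator.

```python
import pandas as pd

# Toy panel standing in for Grunfeld-style data (values are arbitrary)
panel = pd.DataFrame({
    "firm":    [1, 1, 1, 2, 2, 2],
    "year":    [1935, 1936, 1937] * 2,
    "value":   [10.0, 12.0, 14.0, 50.0, 55.0, 60.0],
    "capital": [1.0, 2.0, 3.0, 5.0, 5.0, 8.0],
})

# Mundlak device: add each regressor's entity (firm) mean as an extra column
for col in ["value", "capital"]:
    panel[f"{col}_mean"] = panel.groupby("firm")[col].transform("mean")

print(panel)
# The augmented RE model would then be
#   invest ~ value + capital + value_mean + capital_mean
# and the Mundlak test is a Wald test that the *_mean coefficients are jointly zero.
```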

In [None]:
# Mundlak test
mundlak = pb.MundlakTest(re_results)

print("\nMUNDLAK TEST")
print("="*60)
print(mundlak)
print(f"\nStatistic: {mundlak.statistic:.4f}")
print(f"P-value: {mundlak.pvalue:.4f}")
print(f"\nDecision: ", end="")
if mundlak.pvalue < 0.05:
    print("✗ Reject H₀ → Use Fixed Effects or correlated RE")
else:
    print("✓ Fail to reject H₀ → RE specification OK")

### 3.4 Chow Test (Structural Break)

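**Chow test** checks whether the coefficients change at a known break point by comparing a model fitted on the full sample with models fitted separately before and after the break.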

**H₀**: No structural break

**H₁**: Structural break exists
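The Chow statistic compares residual sums of squares: $F = \frac{(RSS_{pooled} - RSS_1 - RSS_2)/k}{(RSS_1 + RSS_2)/(n - 2k)}$. Below is a pooled-OLS sketch on simulated data whose slope jumps at 1945. It ignores the fixed effects that `pb.ChowTest` works with, and `chow_test` and `ols_rss` are hypothetical helpers.

```python
import numpy as np
from scipy import stats


def ols_rss(y, X):
    """Residual sum of squares from an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))


def chow_test(y, X, before):
    """Chow F-test for a known break (pooled OLS sketch).

    before: boolean mask, True for observations before the break.
    """
    n, k = X.shape
    rss_pooled = ols_rss(y, X)
    rss_split = ols_rss(y[before], X[before]) + ols_rss(y[~before], X[~before])
    f_stat = ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))
    pvalue = float(stats.f.sf(f_stat, k, n - 2 * k))
    return f_stat, pvalue


rng = np.random.default_rng(1)
years = np.repeat(np.arange(1935, 1955), 10)  # 20 years x 10 units, stacked
x = rng.normal(size=years.size)
X = np.column_stack([np.ones_like(x), x])
before = years < 1945
# Simulated slope jumps from 1 to 3 at 1945
y = 1 + np.where(before, 1.0, 3.0) * x + rng.normal(size=x.size)

f_stat, pvalue = chow_test(y, X, before)
print(f"Chow F = {f_stat:.2f}, p = {pvalue:.4g}")
```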

In [None]:
# Chow test (test break at year 1945 - middle of sample)
chow = pb.ChowTest(fe_results, break_point=1945)

print("\nCHOW TEST (Structural Break)")
print("="*60)
print(f"Break point: 1945")
print(f"Statistic: {chow.statistic:.4f}")
print(f"P-value: {chow.pvalue:.4f}")
print(f"\nDecision: ", end="")
if chow.pvalue < 0.05:
    print("✗ Reject H₀ → Structural break detected")
    print("   Consider: split sample, interaction with time dummy")
else:
    print("✓ Fail to reject H₀ → No structural break")

---

## 4. Diagnostic Tests {#diagnostic-tests}

Diagnostic tests check if model **assumptions are satisfied**.

### 4.1 Serial Correlation Tests

Serial correlation violates the assumption $\mathbb{E}[\varepsilon_{it}\varepsilon_{is}] = 0$ for $t \neq s$.

#### Wooldridge AR Test

**H₀**: No first-order serial correlation

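The test uses the residuals from a first-differenced regression: if the level errors are serially uncorrelated, the differenced errors have a first-order autocorrelation of -0.5, and the test checks that value. It requires few assumptions and is a common default for FE and RE models.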
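The -0.5 benchmark is easy to check by simulation. This sketch skips estimation and uses simulated level errors directly in place of first-difference residuals. It reports only the pooled slope of $\Delta e_{it}$ on $\Delta e_{i,t-1}$; the full test turns that slope into an F-statistic for H₀: slope = -0.5 using entity-clustered standard errors. `wooldridge_ar_slope` is a hypothetical helper.

```python
import numpy as np


def wooldridge_ar_slope(resid_fd):
    """Pooled OLS slope of d_it on d_i,t-1, where d are first-differenced residuals.

    resid_fd: (N, T-1) array, one row per entity, columns in time order.
    Under H0 (serially uncorrelated level errors) the slope is about -0.5.
    """
    current = resid_fd[:, 1:].ravel()
    lagged = resid_fd[:, :-1].ravel()
    lagged_c = lagged - lagged.mean()
    current_c = current - current.mean()
    return float(lagged_c @ current_c / (lagged_c @ lagged_c))


rng = np.random.default_rng(2)
N, T = 1000, 10

# Case 1: serially uncorrelated level errors
u = rng.normal(size=(N, T))
slope_iid = wooldridge_ar_slope(np.diff(u, axis=1))

# Case 2: AR(1) level errors with rho = 0.8
e = np.zeros((N, T))
e[:, 0] = rng.normal(size=N)
for t in range(1, T):
    e[:, t] = 0.8 * e[:, t - 1] + rng.normal(size=N)
slope_ar = wooldridge_ar_slope(np.diff(e, axis=1))

print(f"Slope, i.i.d. errors: {slope_iid:.3f}  (H0 value: -0.5)")
print(f"Slope, AR(1) errors:  {slope_ar:.3f}")
```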

In [None]:
# Wooldridge test
wooldridge = pb.WooldridgeARTest(fe_results)

print("WOOLDRIDGE AR TEST")
print("="*60)
print(wooldridge)
print(f"\nStatistic: {wooldridge.statistic:.4f}")
print(f"P-value: {wooldridge.pvalue:.4f}")
print(f"\nDecision: ", end="")
if wooldridge.pvalue < 0.05:
    print("✗ Reject H₀ → Serial correlation detected")
    print("   Remedy: Use robust SE (Driscoll-Kraay or Newey-West)")
else:
    print("✓ Fail to reject H₀ → No serial correlation")

#### Breusch-Godfrey Test

**H₀**: No serial correlation up to order p

In [None]:
# Breusch-Godfrey test
bg = pb.BreuschGodfreyTest(fe_results, lags=2)

print("\nBREUSCH-GODFREY TEST")
print("="*60)
print(f"Statistic: {bg.statistic:.4f}")
print(f"P-value: {bg.pvalue:.4f}")
print(f"\nDecision: ", end="")
if bg.pvalue < 0.05:
    print("✗ Serial correlation detected")
else:
    print("✓ No serial correlation")

#### Baltagi-Wu Test

**H₀**: No first-order serial correlation

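A locally best invariant (LBI) test built on the panel Durbin-Watson statistic; it also handles unbalanced and unequally spaced panels. Values near 2 indicate no first-order serial correlation.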
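The Baltagi-Wu LBI statistic has a more involved formula. This sketch computes the simpler balanced-panel Durbin-Watson statistic it builds on, $d = \sum_i\sum_{t\ge 2}(e_{it}-e_{i,t-1})^2 / \sum_i\sum_t e_{it}^2 \approx 2(1-\rho)$, to show why values near 2 signal no first-order correlation. It uses simulated errors rather than estimated FE residuals, and `panel_durbin_watson` is a hypothetical helper.

```python
import numpy as np


def panel_durbin_watson(resid):
    """Balanced-panel Durbin-Watson statistic.

    resid: (N, T) array of residuals, one row per entity, columns in time order.
    """
    num = np.sum(np.diff(resid, axis=1) ** 2)  # sum over i, t of (e_it - e_i,t-1)^2
    den = np.sum(resid ** 2)
    return float(num / den)


rng = np.random.default_rng(3)
N, T, rho = 200, 20, 0.8

iid_resid = rng.normal(size=(N, T))
ar_resid = np.zeros((N, T))
ar_resid[:, 0] = rng.normal(scale=1 / np.sqrt(1 - rho**2), size=N)  # stationary start
for t in range(1, T):
    ar_resid[:, t] = rho * ar_resid[:, t - 1] + rng.normal(size=N)

dw_iid = panel_durbin_watson(iid_resid)
dw_ar = panel_durbin_watson(ar_resid)
print(f"d with i.i.d. errors:     {dw_iid:.3f}")
print(f"d with AR(1) rho=0.8:     {dw_ar:.3f}")
```

With short panels the i.i.d. value sits slightly below 2, roughly $2(T-1)/T$, because the numerator has one fewer term per entity than the denominator.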

In [None]:
# Baltagi-Wu test
bw = pb.BaltagiWuTest(fe_results)

print("\nBALTAGI-WU TEST")
print("="*60)
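print(f"Statistic: {bw.statistic:.4f}")
print("Reference value under no serial correlation: ~2.0")
print("\nInterpretation (rule of thumb, not a formal critical value): ", end="")
# As with Durbin-Watson, values well below 2 suggest positive serial
# correlation and values well above 2 suggest negative serial correlation.
if bw.statistic < 1.5:
    print("Strong positive serial correlation")
elif bw.statistic > 2.5:
    print("Negative serial correlation")
else:
    print("No strong evidence of serial correlation")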

### 4.2 Heteroskedasticity Tests

Heteroskedasticity violates $\mathbb{E}[\varepsilon_{it}^2] = \sigma^2$ (constant variance).

#### Modified Wald Test

**H₀**: Homoskedastic errors (constant variance across entities)

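Tests for *groupwise* heteroskedasticity in the FE residuals (each entity has its own error variance under H₁). Under H₀ the statistic is χ²(N), with N the number of entities.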
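The statistic compares each entity's residual variance $\hat\sigma_i^2$ with the pooled variance $\hat\sigma^2$: $W = \sum_i (\hat\sigma_i^2 - \hat\sigma^2)^2 / V_i$, where $V_i = \frac{1}{T_i(T_i-1)}\sum_t (e_{it}^2 - \hat\sigma_i^2)^2$. Below is a NumPy sketch of that formula on simulated residuals whose standard deviation differs across ten entities. `modified_wald` is a hypothetical helper, and PanelBox's implementation details may differ.

```python
import numpy as np
from scipy import stats


def modified_wald(resid_by_entity):
    """Modified Wald test for groupwise heteroskedasticity (sketch).

    resid_by_entity: list of 1-D residual arrays, one per entity.
    H0: sigma_i^2 = sigma^2 for all i; statistic ~ chi2(N) under H0.
    """
    all_resid = np.concatenate(resid_by_entity)
    sigma2 = np.mean(all_resid ** 2)  # pooled error variance
    stat = 0.0
    for e in resid_by_entity:
        t_i = e.size
        sigma2_i = np.mean(e ** 2)  # entity-specific error variance
        # Estimated sampling variance of sigma2_i
        v_i = np.sum((e ** 2 - sigma2_i) ** 2) / (t_i * (t_i - 1))
        stat += (sigma2_i - sigma2) ** 2 / v_i
    n_groups = len(resid_by_entity)
    return float(stat), float(stats.chi2.sf(stat, n_groups))


rng = np.random.default_rng(4)
scales = np.linspace(1, 10, 10)  # error s.d. differs by entity
resid = [rng.normal(scale=s, size=20) for s in scales]

stat, pvalue = modified_wald(resid)
print(f"Modified Wald = {stat:.1f}, p = {pvalue:.3g}")
```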

In [None]:
# Modified Wald test
mwald = pb.ModifiedWaldTest(fe_results)

print("MODIFIED WALD TEST")
print("="*60)
print(mwald)
print(f"\nStatistic: {mwald.statistic:.4f}")
print(f"P-value: {mwald.pvalue:.4f}")
print(f"\nDecision: ", end="")
if mwald.pvalue < 0.05:
    print("✗ Reject H₀ → Heteroskedasticity detected")
    print("   Remedy: Use robust SE (HC0-HC3, clustered)")
else:
    print("✓ Fail to reject H₀ → Homoskedastic errors")

#### Breusch-Pagan Test

**H₀**: Homoskedasticity
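One common variant, the studentized (Koenker) Breusch-Pagan test, regresses the squared residuals on the variables suspected of driving the variance and uses $LM = nR^2 \sim \chi^2(p)$. The sketch assumes that variant; `pb.BreuschPaganTest` may implement a different one. `breusch_pagan_lm` is a hypothetical helper, and the residuals are simulated with a standard deviation proportional to $x$.

```python
import numpy as np
from scipy import stats


def breusch_pagan_lm(resid, Z):
    """Studentized (Koenker) Breusch-Pagan LM test, sketch.

    resid: (n,) residuals; Z: (n, p) variance regressors WITHOUT a constant.
    """
    n = resid.size
    u2 = resid ** 2
    Zc = np.column_stack([np.ones(n), Z])
    gamma, *_ = np.linalg.lstsq(Zc, u2, rcond=None)
    fitted = Zc @ gamma
    r2 = 1 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    lm = n * r2
    df = Zc.shape[1] - 1  # number of variance regressors
    return float(lm), float(stats.chi2.sf(lm, df))


rng = np.random.default_rng(5)
x = rng.uniform(1, 5, size=500)
resid_hetero = rng.normal(scale=x)  # error s.d. grows with x

lm, pvalue = breusch_pagan_lm(resid_hetero, x[:, None])
print(f"BP LM = {lm:.2f}, p = {pvalue:.3g}")
```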

In [None]:
# Breusch-Pagan test
bp = pb.BreuschPaganTest(fe_results)

print("\nBREUSCH-PAGAN TEST")
print("="*60)
print(f"Statistic: {bp.statistic:.4f}")
print(f"P-value: {bp.pvalue:.4f}")
print(f"\nDecision: ", end="")
if bp.pvalue < 0.05:
    print("✗ Heteroskedasticity detected")
else:
    print("✓ Homoskedasticity")

#### White Test

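**H₀**: Homoskedasticity. White's alternative is more general than Breusch-Pagan's: the auxiliary regression also includes squares and cross-products of the regressors.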

In [None]:
# White test
white = pb.WhiteTest(fe_results)

print("\nWHITE TEST")
print("="*60)
print(f"Statistic: {white.statistic:.4f}")
print(f"P-value: {white.pvalue:.4f}")
print(f"\nDecision: ", end="")
if white.pvalue < 0.05:
    print("✗ Heteroskedasticity detected")
else:
    print("✓ Homoskedasticity")

### 4.3 Cross-Sectional Dependence Tests

Cross-sectional dependence: $\mathbb{E}[\varepsilon_{it}\varepsilon_{jt}] \neq 0$ for $i \neq j$

Common in macro panels (countries, regions).

#### Pesaran CD Test

**H₀**: No cross-sectional dependence

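Designed for large N and remains valid when T is small. Because it averages the pairwise correlations, it can miss dependence when positive and negative correlations cancel out.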
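For a balanced panel the CD statistic is $CD = \sqrt{\frac{2T}{N(N-1)}}\sum_{i<j}\hat\rho_{ij}$, where $\hat\rho_{ij}$ is the correlation between the residuals of entities $i$ and $j$; it is approximately $N(0,1)$ under H₀. A sketch on simulated residuals driven partly by one common shock; `pesaran_cd` is a hypothetical helper and assumes no missing values.

```python
import numpy as np
from scipy import stats


def pesaran_cd(resid):
    """Pesaran CD statistic for a balanced panel, sketch.

    resid: (T, N) array of residuals, one column per entity.
    """
    t_len, n = resid.shape
    rho = np.corrcoef(resid, rowvar=False)  # N x N pairwise correlations
    upper = rho[np.triu_indices(n, k=1)]    # rho_ij for i < j
    cd = np.sqrt(2 * t_len / (n * (n - 1))) * upper.sum()
    pvalue = 2 * stats.norm.sf(abs(cd))     # two-sided, N(0, 1) under H0
    return float(cd), float(pvalue)


rng = np.random.default_rng(6)
T, N = 20, 30
factor = rng.normal(size=(T, 1))             # common shock hitting every unit
resid = factor + rng.normal(size=(T, N))

cd, pvalue = pesaran_cd(resid)
print(f"CD = {cd:.2f}, p = {pvalue:.3g}")
```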

In [None]:
# Pesaran CD test
cd = pb.PesaranCDTest(fe_results)

print("PESARAN CD TEST")
print("="*60)
print(cd)
print(f"\nStatistic: {cd.statistic:.4f}")
print(f"P-value: {cd.pvalue:.4f}")
print(f"\nDecision: ", end="")
if cd.pvalue < 0.05:
    print("✗ Reject H₀ → Cross-sectional dependence detected")
    print("   Remedy: Use Driscoll-Kraay SE or spatial models")
else:
    print("✓ Fail to reject H₀ → No cross-sectional dependence")

#### Breusch-Pagan LM Test

**H₀**: No cross-sectional dependence

Designed for large T and small N; with many entities relative to T it tends to over-reject.
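The LM statistic sums squared rather than raw correlations, $LM = T\sum_{i<j}\hat\rho_{ij}^2 \sim \chi^2\big(N(N-1)/2\big)$, so opposite-signed correlations cannot cancel. A sketch on simulated residuals with a long time dimension; `breusch_pagan_lm_csd` is a hypothetical helper.

```python
import numpy as np
from scipy import stats


def breusch_pagan_lm_csd(resid):
    """Breusch-Pagan LM test for cross-sectional dependence, sketch.

    resid: (T, N) balanced residual matrix, one column per entity.
    """
    t_len, n = resid.shape
    rho = np.corrcoef(resid, rowvar=False)
    upper = rho[np.triu_indices(n, k=1)]  # rho_ij for i < j
    lm = t_len * np.sum(upper ** 2)
    df = n * (n - 1) // 2
    return float(lm), float(stats.chi2.sf(lm, df))


rng = np.random.default_rng(7)
T, N = 50, 5                     # long T, small N
factor = rng.normal(size=(T, 1))
resid = 0.8 * factor + rng.normal(size=(T, N))

lm, pvalue = breusch_pagan_lm_csd(resid)
print(f"LM = {lm:.2f}, p = {pvalue:.3g}")
```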

In [None]:
# Breusch-Pagan LM test
bplm = pb.BreuschPaganLMTest(fe_results)

print("\nBREUSCH-PAGAN LM TEST")
print("="*60)
print(f"Statistic: {bplm.statistic:.4f}")
print(f"P-value: {bplm.pvalue:.4f}")
print(f"\nDecision: ", end="")
if bplm.pvalue < 0.05:
    print("✗ Cross-sectional dependence detected")
else:
    print("✓ No cross-sectional dependence")

#### Frees Test

**H₀**: No cross-sectional dependence

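Uses Spearman rank correlations of the residuals, so it does not rely on normally distributed errors. It targets large-N, small-T panels and is judged against a critical value rather than a p-value.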

In [None]:
# Frees test
frees = pb.FreesTest(fe_results)

print("\nFREES TEST")
print("="*60)
print(f"Statistic: {frees.statistic:.4f}")
print(f"Critical value (5%): {frees.critical_value:.4f}")
print(f"\nDecision: ", end="")
if frees.statistic > frees.critical_value:
    print("✗ Cross-sectional dependence detected")
else:
    print("✓ No cross-sectional dependence")

---

## 5. Unit Root Tests {#unit-root}

Unit root tests check if variables are **stationary** (constant mean/variance over time).

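**Why it matters**: Regressing one non-stationary variable on another can produce a *spurious regression*: large t-statistics and a high R² even when the variables are unrelated.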

### 5.1 LLC Test (Levin-Lin-Chu)

**H₀**: All panels contain a unit root (non-stationary)

**H₁**: All panels are stationary

**Assumption**: Homogeneous autoregressive coefficient (restrictive)

In [None]:
# LLC test for 'value'
llc_value = pb.LLCTest(data, var='value', entity_col='firm', time_col='year')

print("LLC TEST - Value")
print("="*60)
print(llc_value)
print(f"\nStatistic: {llc_value.statistic:.4f}")
print(f"P-value: {llc_value.pvalue:.4f}")
print(f"\nDecision: ", end="")
if llc_value.pvalue < 0.05:
    print("✓ Reject H₀ → Variable is stationary")
else:
    print("✗ Fail to reject H₀ → Unit root present (non-stationary)")
    print("   Consider: first-differencing or cointegration")

### 5.2 IPS Test (Im-Pesaran-Shin)

**H₀**: All panels contain a unit root

**H₁**: A non-zero fraction of panels is stationary (not necessarily all)

**Advantage**: Allows heterogeneous autoregressive coefficients (more flexible)
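The core of IPS is the average of entity-by-entity Dickey-Fuller t-statistics, $\bar t$. This sketch computes $\bar t$ only: it omits the lag augmentation of the ADF regression and the standardization with tabulated moments that turns $\bar t$ into the reported statistic. `df_tstat` and `ips_tbar` are hypothetical helpers, and both panels are simulated.

```python
import numpy as np


def df_tstat(series):
    """t-statistic on phi in the DF regression dy_t = a + phi * y_{t-1} + e_t."""
    dy = np.diff(series)
    X = np.column_stack([np.ones(dy.size), series[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    s2 = resid @ resid / (dy.size - X.shape[1])
    se_phi = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se_phi


def ips_tbar(panel):
    """Average of entity-by-entity DF t-statistics (rows = entities)."""
    return float(np.mean([df_tstat(row) for row in panel]))


rng = np.random.default_rng(8)
N, T = 20, 100

random_walks = np.cumsum(rng.normal(size=(N, T)), axis=1)  # unit root in every row
stationary = np.zeros((N, T))
for t in range(1, T):
    stationary[:, t] = 0.3 * stationary[:, t - 1] + rng.normal(size=N)

tbar_rw = ips_tbar(random_walks)
tbar_stat = ips_tbar(stationary)
print(f"t-bar, random walks:     {tbar_rw:.2f}")
print(f"t-bar, stationary AR(1): {tbar_stat:.2f}")
```

More negative values of $\bar t$ are evidence against the unit root; only after standardization can it be compared with normal critical values.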

In [None]:
# IPS test for 'value'
ips_value = pb.IPSTest(data, var='value', entity_col='firm', time_col='year')

print("\nIPS TEST - Value")
print("="*60)
print(ips_value)
print(f"\nStatistic: {ips_value.statistic:.4f}")
print(f"P-value: {ips_value.pvalue:.4f}")
print(f"\nDecision: ", end="")
if ips_value.pvalue < 0.05:
    print("✓ Reject H₀ → At least some panels stationary")
else:
    print("✗ Fail to reject H₀ → Consistent with a unit root in all panels")

### 5.3 Fisher Test

**H₀**: All panels contain a unit root

**H₁**: At least one panel is stationary

**Method**: Combines p-values from individual ADF tests
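The combination rule is $P = -2\sum_{i=1}^{N}\ln p_i$, which is $\chi^2(2N)$ under H₀ when the $N$ individual tests are independent. The p-values below are hypothetical stand-ins for entity-level ADF results; `scipy.stats.combine_pvalues` applies the same rule and serves as a cross-check.

```python
import numpy as np
from scipy import stats

# Hypothetical ADF p-values for five entities (illustrative, not Grunfeld results)
adf_pvalues = np.array([0.01, 0.20, 0.03, 0.50, 0.08])

# Fisher combination: P = -2 * sum(ln p_i) ~ chi2(2N) under H0
fisher_stat = -2 * np.sum(np.log(adf_pvalues))
fisher_p = stats.chi2.sf(fisher_stat, 2 * adf_pvalues.size)
print(f"Fisher P = {fisher_stat:.3f}, p = {fisher_p:.4f}")

# SciPy implements the same combination rule
stat_scipy, p_scipy = stats.combine_pvalues(adf_pvalues, method="fisher")
print(f"scipy.stats.combine_pvalues: {stat_scipy:.3f}, p = {p_scipy:.4f}")
```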

In [None]:
# Fisher test for 'value'
fisher_value = pb.FisherTest(data, var='value', entity_col='firm', time_col='year')

print("\nFISHER TEST - Value")
print("="*60)
print(fisher_value)
print(f"\nStatistic: {fisher_value.statistic:.4f}")
print(f"P-value: {fisher_value.pvalue:.4f}")
print(f"\nDecision: ", end="")
if fisher_value.pvalue < 0.05:
    print("✓ Reject H₀ → Evidence of stationarity")
else:
    print("✗ Fail to reject H₀ → Unit root present")

### Unit Root Summary

Compare all three tests:

In [None]:
# Summary table
unit_root_summary = pd.DataFrame({
    'Test': ['LLC', 'IPS', 'Fisher'],
    'Statistic': [llc_value.statistic, ips_value.statistic, fisher_value.statistic],
    'P-value': [llc_value.pvalue, ips_value.pvalue, fisher_value.pvalue],
    'Decision': [
        'Stationary' if llc_value.pvalue < 0.05 else 'Unit Root',
        'Stationary' if ips_value.pvalue < 0.05 else 'Unit Root',
        'Stationary' if fisher_value.pvalue < 0.05 else 'Unit Root'
    ]
})

print("\nUNIT ROOT TEST SUMMARY - Value Variable")
print("="*60)
print(unit_root_summary.to_string(index=False))
print("\nRecommendation:")
print("- If all tests reject → Variable is stationary ✓")
print("- If all tests fail to reject → Variable has unit root")
print("- If mixed results → Check with additional lags or transformations")

---

## 6. Cointegration Tests {#cointegration}

**Cointegration**: Long-run equilibrium relationship between non-stationary variables.

If $y$ and $x$ are I(1) but $y - \beta x$ is I(0), they are **cointegrated**.
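A short simulation makes the definition concrete. It assumes $\beta = 2$ is known (in practice it is estimated), builds one series $y$ cointegrated with a random walk $x$ and one unrelated random walk $z$, and compares the spreads: $y - 2x$ stays bounded while $z - 2x$ keeps wandering.

```python
import numpy as np

rng = np.random.default_rng(9)
T = 1000
x = np.cumsum(rng.normal(size=T))  # I(1): random walk
y = 2.0 * x + rng.normal(size=T)   # I(1), cointegrated with x (beta = 2)
z = np.cumsum(rng.normal(size=T))  # I(1), unrelated to x

spread_coint = y - 2.0 * x         # I(0): only the stationary noise remains
spread_spur = z - 2.0 * x          # still I(1)

print(f"var(y - 2x) = {spread_coint.var():.2f}")
print(f"var(z - 2x) = {spread_spur.var():.2f}")
```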

### 6.1 Pedroni Test

**H₀**: No cointegration

**H₁**: Cointegration exists

Provides 7 different test statistics.

In [None]:
# Pedroni test
pedroni = pb.PedroniTest(fe_results)

print("PEDRONI COINTEGRATION TEST")
print("="*60)
print(pedroni)
print("\nPanel statistics (within-dimension):")
print(f"  Panel v-stat: {pedroni.panel_v:.4f}")
print(f"  Panel rho-stat: {pedroni.panel_rho:.4f}")
print(f"  Panel PP-stat: {pedroni.panel_pp:.4f}")
print(f"  Panel ADF-stat: {pedroni.panel_adf:.4f}")
print("\nGroup statistics (between-dimension):")
print(f"  Group rho-stat: {pedroni.group_rho:.4f}")
print(f"  Group PP-stat: {pedroni.group_pp:.4f}")
print(f"  Group ADF-stat: {pedroni.group_adf:.4f}")
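print("\nDecision: If a majority of the statistics reject H₀ → Cointegration exists")
print("Note: the panel v-stat rejects for large positive values; the other six reject for large negative values")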

### 6.2 Kao Test

**H₀**: No cointegration

**H₁**: Cointegration exists

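A residual-based (ADF-type) test that assumes a homogeneous cointegrating vector across entities. It is simpler than Pedroni and reports a single test statistic.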

In [None]:
# Kao test
kao = pb.KaoTest(fe_results)

print("\nKAO COINTEGRATION TEST")
print("="*60)
print(kao)
print(f"\nStatistic: {kao.statistic:.4f}")
print(f"P-value: {kao.pvalue:.4f}")
print(f"\nDecision: ", end="")
if kao.pvalue < 0.05:
    print("✓ Reject H₀ → Cointegration exists")
    print("   Long-run relationship is valid")
else:
    print("✗ Fail to reject H₀ → No cointegration")
    print("   Spurious regression risk if variables are I(1)")

---

## 7. ValidationSuite - Run All Tests at Once {#validation-suite}

**ValidationSuite** runs all relevant tests and creates a comprehensive report.

In [None]:
# Create validation suite
suite = pb.ValidationSuite(fe_results)

# Run all tests
suite.run_all()

print("VALIDATION SUITE - COMPREHENSIVE REPORT")
print("="*70)
print(suite.summary())

### Visualize Results

In [None]:
# Plot validation results: p-value of each test against the 5% threshold
from matplotlib.patches import Patch

fig, ax = plt.subplots(figsize=(12, 6))

test_names = []
test_pvalues = []
test_colors = []

for test_name, result in suite.results.items():
    pvalue = getattr(result, 'pvalue', None)
    if pvalue is None:
        continue  # skip tests that report only a statistic/critical value
    test_names.append(test_name)
    test_pvalues.append(pvalue)
    # Simplification: green = H₀ not rejected at 5%. For unit-root and
    # cointegration tests, rejecting H₀ is usually the desired outcome.
    test_colors.append('green' if pvalue > 0.05 else 'red')

# Bar length = p-value, so rejected tests are still visible
y_pos = np.arange(len(test_names))
ax.barh(y_pos, test_pvalues, color=test_colors, alpha=0.7)
ax.axvline(0.05, color='black', linestyle='--', linewidth=1)
ax.set_yticks(y_pos)
ax.set_yticklabels(test_names)
ax.set_xlabel('P-value', fontsize=12)
ax.set_title('Validation Test Results', fontsize=14, fontweight='bold')
ax.set_xlim([0, 1.05])

legend_elements = [
    Patch(facecolor='green', alpha=0.7, label='p > 0.05 (H₀ not rejected)'),
    Patch(facecolor='red', alpha=0.7, label='p ≤ 0.05 (H₀ rejected)')
]
ax.legend(handles=legend_elements, loc='lower right')

plt.tight_layout()
plt.show()

print("\nGreen = H₀ not rejected at 5%; Red = H₀ rejected")
print("For unit-root and cointegration tests, red (rejection) is usually the desirable outcome")

---

## 8. Remediation Guide {#remediation}

### Decision Tree: What to Do When Tests Fail

```
Test Failed → What to do?
│
├─ Hausman test failed
│  └→ Use Fixed Effects (not Random Effects)
│
├─ RESET test failed
│  └→ Add non-linear terms, interactions, or omitted variables
│
├─ Serial correlation detected
│  ├→ Use Driscoll-Kraay SE
│  ├→ Use Newey-West SE
│  └→ Or include lagged dependent variable (→ GMM)
│
├─ Heteroskedasticity detected
│  ├→ Use robust SE (HC0-HC3)
│  ├→ Use clustered SE
│  └→ Or use WLS (weighted least squares)
│
├─ Cross-sectional dependence
│  ├→ Use Driscoll-Kraay SE
│  ├→ Add common time effects
│  └→ Or use spatial econometrics models
│
├─ Unit root detected
│  ├→ First-difference the data
│  ├→ Test for cointegration
│  └→ If cointegrated → Error correction model
│
└─ No cointegration (but unit roots)
   └→ Use differenced data (avoid spurious regression)
```

### Quick Reference Table

| Problem | Solutions | PanelBox Implementation |
|---------|-----------|------------------------|
| **Serial Correlation** | Robust SE | `fit(cov_type='driscoll_kraay')` |
| | | `fit(cov_type='newey_west')` |
| **Heteroskedasticity** | Robust SE | `fit(cov_type='HC1')` |
| | Clustered SE | `fit(cov_type='clustered')` |
| **Cross-sectional Dep** | Driscoll-Kraay | `fit(cov_type='driscoll_kraay')` |
| **Unit Root** | First difference | Use `FirstDifferenceEstimator` |
| | GMM | Use `DifferenceGMM` or `SystemGMM` |
| **Non-stationarity** | Cointegration | ECM (coming soon) |

### Best Practices

1. ✅ **Always test**: Don't assume assumptions hold
2. ✅ **Test in order**: Specification → Diagnostics → Stationarity
3. ✅ **Use ValidationSuite**: Run all tests systematically
4. ✅ **Report tests**: Include in papers/reports
5. ✅ **Apply remediation**: Fix violations before drawing inferences
6. ✅ **Re-test**: After remediation, test again

---

## Summary

You learned:

- ✅ **Specification Tests**: Hausman, RESET, Mundlak, Chow
- ✅ **Serial Correlation**: Wooldridge, Breusch-Godfrey, Baltagi-Wu
- ✅ **Heteroskedasticity**: Modified Wald, Breusch-Pagan, White
- ✅ **Cross-sectional Dependence**: Pesaran CD, BP-LM, Frees
- ✅ **Unit Root**: LLC, IPS, Fisher
- ✅ **Cointegration**: Pedroni, Kao
- ✅ **ValidationSuite**: Comprehensive testing framework
- ✅ **Remediation**: How to fix violations

### Key Takeaways

1. **Validation is not optional** - always test your assumptions
2. **ValidationSuite** makes it easy to run all tests at once
3. **Most violations can be fixed** with robust SE or model adjustments
4. **Report your tests** to increase credibility

### Next Steps

- **[04_robust_inference.ipynb](./04_robust_inference.ipynb)**: Deep dive into robust SE and bootstrap
- **[05_report_generation.ipynb](./05_report_generation.ipynb)**: Create publication-ready reports
- **[02_dynamic_gmm_complete.ipynb](./02_dynamic_gmm_complete.ipynb)**: If you have dynamics

---

*Validate with confidence using PanelBox!*