# HAC Standard Errors: Newey-West and Driscoll-Kraay

**Date**: 2026-02-16
**Target Audience**: Economists working with time series and macro panels
**Estimated Duration**: 75-90 minutes
**Difficulty**: Intermediate

---

## Learning Objectives

By the end of this notebook, you will be able to:

1. **Understand** autocorrelation in time series and panel data
2. **Detect** autocorrelation using ACF/PACF plots and formal tests
3. **Implement** Newey-West HAC for time series data
4. **Apply** Driscoll-Kraay HAC for panel data with cross-sectional dependence
5. **Choose** appropriate lag length for HAC estimators
6. **Compare** different HAC kernels (Bartlett, Parzen, Quadratic Spectral)
7. **Distinguish** when to use Newey-West vs Driscoll-Kraay vs clustering

---

## Prerequisites

- **Conceptual**: Autocorrelation, AR(1) processes, time series concepts
- **Technical**: Notebooks 01 (Robust) and 02 (Clustering) completed
- **Statistical**: ACF/PACF interpretation
- **PanelBox Version**: 0.8.0+
- **Python Version**: 3.9+

---

## Setup and Configuration

In [None]:
# Standard libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
import math
from pathlib import Path

# Statistical libraries
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.stats.stattools import durbin_watson

# PanelBox
import sys
sys.path.insert(0, '../../../')
import panelbox as pb
from panelbox.models.static import PooledOLS, FixedEffects

# Set random seed for reproducibility
np.random.seed(42)

# Configure plotting style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 11

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')

# Create output directories
output_dir = Path('../outputs/figures/03_hac')
output_dir.mkdir(parents=True, exist_ok=True)

print("✅ Setup complete!")
print(f"PanelBox version: {pb.__version__}")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")

## 1. Introduction: Autocorrelation in Economic Data

### 1.1 What is Autocorrelation?

**Autocorrelation** (also called serial correlation) is the correlation of a variable with its own past values:

$$
\rho_k = \text{Corr}(y_t, y_{t-k})
$$

where $k$ is the lag.
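The definition of $\rho_k$ translates directly into a sample estimator. Below is a minimal NumPy sketch of the standard sample ACF: the lag-$k$ autocovariance divided by the lag-0 autocovariance, both computed from deviations around the sample mean. It is checked against a simulated AR(1) series, whose theoretical ACF is $\rho_k = \phi^k$. The function name `sample_acf` is hypothetical and is not part of PanelBox or statsmodels.

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelations rho_1, ..., rho_max_lag of a 1-D series.

    Standard estimator: the lag-k autocovariance (sum of products of deviations
    from the mean) divided by the lag-0 autocovariance. The common 1/T factor cancels.
    """
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    gamma0 = np.dot(d, d)
    return np.array([np.dot(d[k:], d[:-k]) / gamma0 for k in range(1, max_lag + 1)])

# Simulated AR(1) with phi = 0.7, whose theoretical ACF is rho_k = 0.7**k
rng = np.random.default_rng(0)
T, phi = 5000, 0.7
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi * y[t - 1] + rng.normal()

acf_hat = sample_acf(y, max_lag=4)
print("sample ACF:     ", np.round(acf_hat, 3))
print("theoretical ACF:", np.round(phi ** np.arange(1, 5), 3))
```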

**Common in Economics**:
- **GDP growth**: Booms and recessions persist over multiple quarters
- **Inflation**: Persistent due to central bank targeting and expectations
- **Stock market volatility**: Volatility clusters (large moves tend to follow large moves), even though returns themselves are only weakly autocorrelated
- **Unemployment**: Hysteresis effects (past unemployment affects current)

### 1.2 Why Standard Robust SEs Fail

**Robust SEs (HC1, HC2, HC3) Assumption**:
$$
\text{Cov}(\epsilon_i, \epsilon_j) = 0 \text{ for } i \neq j
$$

This assumes the errors are **uncorrelated** across observations. Heteroskedasticity is allowed, but dependence is not.

**Reality with Autocorrelation**:
$$
\text{Cov}(\epsilon_t, \epsilon_{t-k}) \neq 0 \text{ for } k \geq 1
$$

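**Consequences** (in the typical macro case, where both the errors and the regressors are positively autocorrelated):
- Robust SEs **underestimate** the true sampling uncertainty
- **Liberal inference**: H₀ is rejected too often (Type I error inflation)
- t-statistics, p-values, and confidence intervals are **invalid**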

**Solution**: HAC (Heteroskedasticity and Autocorrelation Consistent) estimators
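The underestimation is easiest to see in the simplest regression: a regression on a constant only, where the OLS coefficient is the sample mean. For a stationary AR(1) series with lag-$k$ autocorrelation $\rho^k$, $\text{Var}(\bar y) = \frac{\gamma_0}{T}\left[1 + 2\sum_{k=1}^{T-1}(1 - k/T)\rho^k\right]$. The bracketed factor is exactly what an i.i.d. formula leaves out, and it approaches $(1+\rho)/(1-\rho)$ as $T$ grows. The sketch below evaluates that factor. The function name is hypothetical.

```python
import numpy as np

def ar1_variance_inflation(rho, T):
    """Var(sample mean) divided by the i.i.d. formula Var(y_t) / T, for a stationary AR(1).

    Factor = 1 + 2 * sum_{k=1}^{T-1} (1 - k/T) * rho**k  ->  (1 + rho) / (1 - rho).
    """
    k = np.arange(1, T)
    return 1 + 2 * np.sum((1 - k / T) * rho ** k)

for rho in [0.0, 0.3, 0.6, 0.9]:
    f = ar1_variance_inflation(rho, T=200)
    print(f"rho = {rho:.1f}: variance factor = {f:5.2f}, SE factor = {np.sqrt(f):4.2f}")
```

With $\rho = 0.6$, the variance is about four times the i.i.d. value, so a naive SE is roughly half as large as it should be.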

---

### 1.3 Visual Demonstration: GDP Growth Persistence

In [None]:
# Load quarterly GDP data
ts_data = pd.read_csv('../data/gdp_quarterly.csv')

# Create derived variables for pedagogical purposes
np.random.seed(42)
ts_data['gdp_growth'] = ts_data['gdp'].pct_change() * 100  # Quarterly GDP growth rate
ts_data['inflation'] = np.random.normal(2.5, 1.2, len(ts_data))  # Simulated inflation
ts_data['unemployment'] = np.random.normal(6.0, 1.5, len(ts_data))  # Simulated unemployment
ts_data['interest_rate'] = np.random.normal(3.5, 1.0, len(ts_data))  # Simulated interest rate

# Add autocorrelation to inflation: AR(1) around a mean of 2.5 (rho = 0.6)
# (without the mean term, the recursion would drift toward zero)
for i in range(1, len(ts_data)):
    ts_data.loc[i, 'inflation'] = (2.5 + 0.6 * (ts_data.loc[i-1, 'inflation'] - 2.5)
                                   + np.random.normal(0, 0.8))

# Add autocorrelation to unemployment: AR(1) around a mean of 6.0 (rho = 0.7)
np.random.seed(43)
for i in range(1, len(ts_data)):
    ts_data.loc[i, 'unemployment'] = (6.0 + 0.7 * (ts_data.loc[i-1, 'unemployment'] - 6.0)
                                      + np.random.normal(0, 0.6))

# Drop missing values from pct_change
ts_data = ts_data.dropna().reset_index(drop=True)

# Add entity and time columns for PooledOLS compatibility
ts_data['entity'] = 1  # Single entity for time series
ts_data['time'] = ts_data['quarter']

print("Time Series Data Shape:", ts_data.shape)
print("\nFirst few rows:")
print(ts_data[['quarter', 'gdp', 'gdp_growth', 'inflation', 'unemployment', 'interest_rate']].head())

print("\nDescriptive Statistics:")
print(ts_data[['gdp_growth', 'inflation', 'unemployment', 'interest_rate']].describe())

In [None]:
# Plot GDP growth over time
fig, ax = plt.subplots(figsize=(14, 6))
ax.plot(ts_data['quarter'], ts_data['gdp_growth'],
        marker='o', markersize=4, linewidth=1.5, color='steelblue', label='GDP Growth')
ax.axhline(ts_data['gdp_growth'].mean(), color='red', linestyle='--',
           linewidth=2, label=f"Mean = {ts_data['gdp_growth'].mean():.2f}%")

# Shade recession periods (simulated for demonstration)
recession_periods = [(10, 15), (40, 45), (75, 80)]
for start, end in recession_periods:
    ax.axvspan(start, end, alpha=0.2, color='gray', label='Recession' if start == 10 else '')

ax.set_xlabel('Quarter', fontsize=12, fontweight='bold')
ax.set_ylabel('GDP Growth (%)', fontsize=12, fontweight='bold')
ax.set_title('Quarterly GDP Growth: Visual Evidence of Persistence',
             fontsize=14, fontweight='bold', pad=20)
ax.legend(loc='best', fontsize=11)
ax.grid(alpha=0.3)
plt.tight_layout()
plt.savefig(output_dir / '01_gdp_growth_time_series.png', dpi=300, bbox_inches='tight')
plt.show()

print("üìä Observation: GDP growth shows clear PERSISTENCE")
print("   - High growth periods cluster together (booms)")
print("   - Low growth periods cluster together (recessions)")
print("   - This is autocorrelation: current value depends on past values")

**Key Insight**: Visual inspection suggests GDP growth is **not independent** over time: high growth tends to follow high growth, and low growth tends to follow low growth. Persistence in the outcome often carries over to the regression errors, and autocorrelated errors violate the assumption behind standard robust SEs.

---

## 2. Diagnosing Autocorrelation

Before applying HAC corrections, we need to **diagnose** whether autocorrelation is present.

### 2.1 ACF and PACF Plots

**Autocorrelation Function (ACF)**: Correlation between $y_t$ and $y_{t-k}$ at different lags $k$

**Partial Autocorrelation Function (PACF)**: Direct correlation at lag $k$, controlling for intermediate lags

**Interpretation**:
- **ACF bars outside confidence bands**: Significant autocorrelation at that lag
- **ACF decays slowly**: Persistent autocorrelation (AR structure)
- **PACF significant at lag 1 only**: AR(1) process
- **PACF significant at lags 1 and 4 (quarterly data)**: AR(1) dynamics plus seasonality (an annual cycle)

---

In [None]:
# Estimate a simple regression model
model = PooledOLS(
    formula="gdp_growth ~ inflation + unemployment",
    data=ts_data,
    entity_col='entity',
    time_col='time'
)

# Fit with non-robust SEs first (baseline)
result_base = model.fit(cov_type='nonrobust')
print(result_base.summary())

In [None]:
# Extract residuals
residuals = result_base.resid

# Plot ACF and PACF of residuals
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

plot_acf(residuals, lags=20, ax=axes[0], alpha=0.05)
axes[0].set_title('ACF of Residuals', fontsize=13, fontweight='bold')
axes[0].set_xlabel('Lag', fontsize=11)
axes[0].set_ylabel('Autocorrelation', fontsize=11)

plot_pacf(residuals, lags=20, ax=axes[1], alpha=0.05, method='ywm')
axes[1].set_title('PACF of Residuals', fontsize=13, fontweight='bold')
axes[1].set_xlabel('Lag', fontsize=11)
axes[1].set_ylabel('Partial Autocorrelation', fontsize=11)

plt.tight_layout()
plt.savefig(output_dir / '02_acf_pacf_residuals.png', dpi=300, bbox_inches='tight')
plt.show()

print("\nüìä Interpretation:")
print("   - ACF: Bars outside blue bands indicate significant autocorrelation")
print("   - PACF: Shows direct correlation at each lag (controlling for intermediate lags)")
print("   - If lag 1 is significant ‚Üí AR(1) process (common in economic data)")
print("   - If lags 1 and 4 significant ‚Üí Quarterly seasonality")

### 2.2 Durbin-Watson Test

**Purpose**: Classic test for first-order autocorrelation (AR(1)) in regression residuals. It is not valid when the regressors include a lagged dependent variable.

**Test Statistic**:
$$
DW = \frac{\sum_{t=2}^{T} (\hat{\epsilon}_t - \hat{\epsilon}_{t-1})^2}{\sum_{t=1}^{T} \hat{\epsilon}_t^2} \approx 2(1 - \hat{\rho}_1)
$$

**Interpretation**:
- $DW \approx 2$: No autocorrelation ($\rho_1 \approx 0$)
- $DW < 2$: Positive autocorrelation ($\rho_1 > 0$) **[most common in economics]**
- $DW > 2$: Negative autocorrelation ($\rho_1 < 0$) **[rare]**

**Rule of Thumb**:
- $DW < 1.5$: Strong positive autocorrelation → **Use HAC**
- $1.5 < DW < 2.5$: Weak or no autocorrelation
- $DW > 2.5$: Negative autocorrelation (unusual)
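The approximation $DW \approx 2(1 - \hat\rho_1)$ follows from expanding the numerator. With $\hat\rho_1 = \sum_{t\ge 2}\hat\epsilon_t\hat\epsilon_{t-1} / \sum_t \hat\epsilon_t^2$, it holds exactly up to the end-point term $(\hat\epsilon_1^2 + \hat\epsilon_T^2)/\sum_t \hat\epsilon_t^2$. This sketch computes both quantities by hand on simulated AR(1) "residuals" with $\rho = 0.5$. The helper names are hypothetical. statsmodels' `durbin_watson`, used in the next cell, implements the same ratio.

```python
import numpy as np

def durbin_watson_stat(resid):
    """DW = sum_{t>=2} (e_t - e_{t-1})^2 / sum_t e_t^2."""
    e = np.asarray(resid, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def first_order_rho(resid):
    """rho_hat = sum_{t>=2} e_t * e_{t-1} / sum_t e_t^2, the rho in DW ~ 2(1 - rho)."""
    e = np.asarray(resid, dtype=float)
    return np.dot(e[1:], e[:-1]) / np.dot(e, e)

# AR(1) "residuals" with rho = 0.5, so DW should be near 2 * (1 - 0.5) = 1
rng = np.random.default_rng(2)
T = 2000
e = np.zeros(T)
for t in range(1, T):
    e[t] = 0.5 * e[t - 1] + rng.normal()

dw = durbin_watson_stat(e)
rho_hat = first_order_rho(e)
print(f"DW = {dw:.3f}, 2(1 - rho_hat) = {2 * (1 - rho_hat):.3f}")
```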

---

In [None]:
# Compute Durbin-Watson statistic
dw_stat = durbin_watson(residuals)

print("="*70)
print("DURBIN-WATSON TEST FOR AUTOCORRELATION")
print("="*70)
print(f"\nDurbin-Watson Statistic: {dw_stat:.4f}")
print(f"Implied ρ₁ (AR(1) coefficient): {(1 - dw_stat/2):.4f}")

# Interpretation
print("\n📊 Interpretation:")
if dw_stat < 1.5:
    print(f"   ✗ DW = {dw_stat:.4f} < 1.5")
    print("   → STRONG evidence of positive autocorrelation")
    print("   → Standard robust SEs are INVALID")
    print("   → Recommendation: Use Newey-West HAC standard errors")
elif dw_stat > 2.5:
    print(f"   ⚠ DW = {dw_stat:.4f} > 2.5")
    print("   → Evidence of negative autocorrelation (rare in economics)")
    print("   → Consider Newey-West HAC for safety")
else:
    print(f"   ✓ DW = {dw_stat:.4f} ∈ [1.5, 2.5]")
    print("   → No strong evidence of autocorrelation")
    print("   → Robust SEs may be acceptable, but HAC provides extra safety")

print("\n" + "="*70)

### 2.3 Breusch-Godfrey LM Test

**Advantage over DW**: Tests for autocorrelation up to lag $L$ (not just lag 1), and remains valid when the regressors include a lagged dependent variable

**Procedure**:
1. Estimate original model, get residuals $\hat{\epsilon}_t$
2. Regress $\hat{\epsilon}_t$ on $X_t$ and $\hat{\epsilon}_{t-1}, \ldots, \hat{\epsilon}_{t-L}$
3. Test joint significance of lagged residuals

**Null Hypothesis**: $H_0: \rho_1 = \rho_2 = \cdots = \rho_L = 0$ (no autocorrelation)

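**Test Statistic**: $(T - L) \cdot R^2 \overset{a}{\sim} \chi^2_L$ under $H_0$, where $R^2$ comes from the auxiliary regression in step 2 and $T - L$ is its number of observations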

**Decision**:
- $p < 0.05$: Reject $H_0$ → Autocorrelation present → **Use HAC**
- $p \geq 0.05$: Fail to reject $H_0$ → No significant autocorrelation
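The three-step procedure above can be written out directly. This is a minimal sketch under stated assumptions: plain OLS, regressors `X` that include a constant, and the first $L$ observations dropped so the auxiliary regression has no pre-sample residuals. That matches the $(T-L)\cdot R^2$ statistic above. The function name `breusch_godfrey` is hypothetical. statsmodels provides `acorr_breusch_godfrey` for its own regression results objects, and its conventions (e.g. handling of pre-sample residuals) may differ slightly.

```python
import numpy as np
from scipy import stats

def breusch_godfrey(resid, X, L):
    """Breusch-Godfrey LM test for residual autocorrelation up to lag L.

    resid : (T,) OLS residuals; X : (T, k) original regressors incl. constant.
    Returns the LM statistic (T - L) * R^2 and its chi-square(L) p-value.
    """
    e = np.asarray(resid, dtype=float)
    X = np.asarray(X, dtype=float)
    T = e.size
    # Auxiliary regressors: original X plus e_{t-1}, ..., e_{t-L} (first L rows dropped)
    lags = np.column_stack([e[L - l:T - l] for l in range(1, L + 1)])
    Z = np.column_stack([X[L:], lags])
    y = e[L:]
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    u = y - Z @ coef
    r2 = 1 - np.sum(u ** 2) / np.sum((y - y.mean()) ** 2)
    lm = (T - L) * r2
    return lm, stats.chi2.sf(lm, df=L)

# Example: OLS with AR(1) errors (rho = 0.6)
rng = np.random.default_rng(3)
T = 400
x = rng.normal(size=T)
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = 0.6 * eps[t - 1] + rng.normal()
y = 1.0 + 0.5 * x + eps
X = np.column_stack([np.ones(T), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

lm, pval = breusch_godfrey(resid, X, L=4)
print(f"BG LM = {lm:.2f}, p-value = {pval:.4g}")
```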

---

In [None]:
# Ljung-Box Q test: a portmanteau test on the residual autocorrelations.
# It is often used as a quick stand-in for Breusch-Godfrey, but it is NOT the same test:
# it does not include the regressors in an auxiliary regression, so it can be
# unreliable when they are not strictly exogenous (e.g. a lagged dependent variable).
max_lags = 4

print("="*70)
print("LJUNG-BOX TEST FOR RESIDUAL AUTOCORRELATION")
print("="*70)
print(f"\nTesting for autocorrelation up to lag {max_lags}")
print(f"\nNull Hypothesis (H₀): No autocorrelation at any lag ≤ {max_lags}\n")

# Row k of the output tests H₀: rho_1 = ... = rho_k = 0 jointly
lb_result = acorr_ljungbox(residuals, lags=max_lags, return_df=True)
print(lb_result)

# Overall interpretation: use the joint test up to max_lags (last row).
# Taking the minimum p-value across rows would inflate the Type I error.
joint_pvalue = lb_result['lb_pvalue'].iloc[-1]

print("\n📊 Overall Interpretation:")
print(f"   Joint p-value (lags 1-{max_lags}): {joint_pvalue:.4f}")

if joint_pvalue < 0.05:
    print("   ✗ REJECT H₀ at α = 0.05")
    print("   → Significant autocorrelation detected")
    print("   → Standard errors:")
    print("      • Robust SEs: INVALID (likely underestimate uncertainty)")
    print("      • Newey-West HAC: REQUIRED")
else:
    print("   ✓ FAIL TO REJECT H₀ at α = 0.05")
    print("   → No strong evidence of autocorrelation")
    print("   → Robust SEs may be acceptable, but HAC is safer")

print("\n" + "="*70)

## 3. Newey-West HAC for Time Series

### 3.1 The Newey-West (1987) Estimator

**Purpose**: Correct standard errors for **both** heteroskedasticity AND autocorrelation

**Variance-Covariance Matrix**:
$$
V_{NW} = (X'X)^{-1} S (X'X)^{-1}
$$

where the "meat" matrix $S$ is:
$$
S = \Gamma_0 + \sum_{l=1}^{L} w_l (\Gamma_l + \Gamma_l')
$$

**Components**:
- $\Gamma_0 = \sum_{t=1}^{T} x_t x_t' \hat{\epsilon}_t^2$ (heteroskedasticity part)
- $\Gamma_l = \sum_{t=l+1}^{T} x_t x_{t-l}' \hat{\epsilon}_t \hat{\epsilon}_{t-l}$ (autocorrelation part)
- $w_l$ = kernel weights (downweight distant lags)
- $L$ = bandwidth (maximum lag)

**Key Insight**: NW-HAC is a **generalization** of robust SEs:
- If $L = 0$: NW-HAC = White's heteroskedasticity-robust SEs (HC0)
- If $L > 0$: NW-HAC accounts for autocorrelation
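The sandwich formula above can be computed directly in NumPy. This is a minimal sketch of the textbook Newey-West estimator for plain OLS. It uses Bartlett weights $w_l = 1 - l/(L+1)$ and applies no degrees-of-freedom correction. The function name `newey_west_cov` is hypothetical, and PanelBox's `cov_type='HAC'` may apply small-sample corrections that this sketch omits. Setting `L = 0` recovers HC0, which illustrates the "generalization" point above.

```python
import numpy as np

def newey_west_cov(X, resid, L):
    """Newey-West HAC covariance of OLS coefficients (Bartlett kernel, no df correction).

    V = (X'X)^{-1} S (X'X)^{-1},  S = Gamma_0 + sum_l w_l (Gamma_l + Gamma_l'),
    Gamma_l = sum_{t>l} (x_t e_t)(x_{t-l} e_{t-l})',  w_l = 1 - l / (L + 1).
    """
    X = np.asarray(X, dtype=float)
    scores = X * np.asarray(resid, dtype=float)[:, None]   # row t is x_t * e_t
    S = scores.T @ scores                                   # Gamma_0
    for l in range(1, L + 1):
        gamma_l = scores[l:].T @ scores[:-l]                # Gamma_l
        S += (1 - l / (L + 1)) * (gamma_l + gamma_l.T)
    bread = np.linalg.inv(X.T @ X)
    return bread @ S @ bread

# Example: persistent regressor AND persistent errors (rho = 0.7 for both)
rng = np.random.default_rng(4)
T, rho = 500, 0.7
x = np.zeros(T)
eps = np.zeros(T)
for t in range(1, T):
    x[t] = rho * x[t - 1] + rng.normal()
    eps[t] = rho * eps[t - 1] + rng.normal()
y = 1.0 + 0.5 * x + eps
X = np.column_stack([np.ones(T), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta

L = int(np.floor(4 * (T / 100) ** (2 / 9)))
se_white = np.sqrt(np.diag(newey_west_cov(X, resid, 0)))   # L = 0 -> HC0
se_nw = np.sqrt(np.diag(newey_west_cov(X, resid, L)))
print(f"L = {L}; slope SE: HC0 = {se_white[1]:.4f}, Newey-West = {se_nw[1]:.4f}")
```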

---

### 3.2 Implementation in PanelBox

In [None]:
# Estimate with Newey-West (automatic lag selection)
res_nw_auto = model.fit(cov_type='HAC', cov_config={'kernel': 'bartlett'})

print("Newey-West HAC (Automatic Lag Selection):")
print("="*70)
print(res_nw_auto.summary())

In [None]:
# Manual lag selection: 4 lags (typical for quarterly data)
res_nw_4 = model.fit(
    cov_type='HAC',
    cov_config={'kernel': 'bartlett', 'bandwidth': 4}
)

print("\nNewey-West HAC (Manual: 4 lags):")
print("="*70)
print(res_nw_4.summary())

### 3.2.1 Comparing Robust vs Newey-West

Let's compare standard errors under different assumptions:

In [None]:
# Compare with robust (WRONG for autocorrelated data)
res_robust = model.fit(cov_type='robust')

# Create comparison table
comparison_data = []

for var in ['inflation', 'unemployment']:
    se_robust = res_robust.std_errors[var]
    se_nw = res_nw_4.std_errors[var]
    ratio = se_nw / se_robust
    
    comparison_data.append({
        'Variable': var,
        'Robust SE': f'{se_robust:.4f}',
        'Newey-West SE (L=4)': f'{se_nw:.4f}',
        'Ratio (NW/Robust)': f'{ratio:.2f}'
    })

comp_df = pd.DataFrame(comparison_data)

print("="*80)
print("STANDARD ERROR COMPARISON: Robust vs Newey-West HAC")
print("="*80)
print(comp_df.to_string(index=False))
print("="*80)

print("\nüìä Interpretation:")
print("   ‚Üí Newey-West SEs are LARGER than Robust SEs")
print("   ‚Üí Ratio > 1.0 indicates autocorrelation bias in robust SEs")
print("   ‚Üí With autocorrelation, robust SEs UNDERESTIMATE true uncertainty")
print("   ‚Üí This leads to LIBERAL inference (rejecting H‚ÇÄ too often)")
print("\n   ‚úÖ Recommendation: Always use Newey-West HAC for time series data")

### 3.3 Choosing Lag Length (Bandwidth)

**Critical Decision**: How many lags ($L$) to include in HAC estimator?

**Automatic Rule** (Newey-West 1994):
$$
L = \text{floor}\left(4 \left(\frac{T}{100}\right)^{2/9}\right)
$$

**Common Sample Sizes**:
- $T = 50$: $L = 3$
- $T = 100$: $L = 4$
- $T = 200$: $L = 4$
- $T = 500$: $L = 5$
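Evaluating the rule directly is a quick way to check lag choices for any sample size. It uses the same expression as the sensitivity cell below. The helper name is hypothetical.

```python
import math

def nw_rule_of_thumb_lag(T):
    """L = floor(4 * (T / 100) ** (2/9))."""
    return math.floor(4 * (T / 100) ** (2 / 9))

for T in [50, 100, 200, 500]:
    print(f"T = {T:>3}: L = {nw_rule_of_thumb_lag(T)}")
```

Because $L$ grows like $T^{2/9}$, doubling $T$ raises the unrounded value by only about 17%.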

**Trade-off**:
- **Too few lags**: SEs remain biased (autocorrelation at longer lags is ignored)
- **Too many lags**: The SE estimate itself becomes noisy (high variance), which distorts test size and lowers power

Let's perform a **sensitivity analysis**:

---

In [None]:
# Compute automatic lag
T = len(ts_data)
L_auto = math.floor(4 * (T / 100) ** (2/9))
print(f"Sample size (T): {T}")
print(f"Automatic lag selection (Newey-West 1994): L = {L_auto}\n")

# Test sensitivity to different lag choices
lags_to_test = [1, 2, 3, 4, 5, 6, 8, 10]
results_by_lag = {'inflation': [], 'unemployment': []}

for L in lags_to_test:
    res = model.fit(cov_type='HAC', cov_config={'kernel': 'bartlett', 'bandwidth': L})
    results_by_lag['inflation'].append(res.std_errors['inflation'])
    results_by_lag['unemployment'].append(res.std_errors['unemployment'])

# Create visualization
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

# Plot for inflation
axes[0].plot(lags_to_test, results_by_lag['inflation'], 
             marker='o', linewidth=2.5, markersize=8, color='steelblue', label='Newey-West SE')
axes[0].axhline(res_robust.std_errors['inflation'], color='red', linestyle='--',
                linewidth=2, label='Robust SE (no autocorrelation correction)')
axes[0].axvline(L_auto, color='green', linestyle=':', linewidth=2, alpha=0.7,
                label=f'Automatic L = {L_auto}')
axes[0].set_xlabel('Number of Lags (L)', fontsize=12, fontweight='bold')
axes[0].set_ylabel('Standard Error', fontsize=12, fontweight='bold')
axes[0].set_title('SE Sensitivity to Lag Choice: Inflation', fontsize=13, fontweight='bold')
axes[0].legend(loc='best', fontsize=10)
axes[0].grid(alpha=0.3)

# Plot for unemployment
axes[1].plot(lags_to_test, results_by_lag['unemployment'], 
             marker='s', linewidth=2.5, markersize=8, color='darkorange', label='Newey-West SE')
axes[1].axhline(res_robust.std_errors['unemployment'], color='red', linestyle='--',
                linewidth=2, label='Robust SE (no autocorrelation correction)')
axes[1].axvline(L_auto, color='green', linestyle=':', linewidth=2, alpha=0.7,
                label=f'Automatic L = {L_auto}')
axes[1].set_xlabel('Number of Lags (L)', fontsize=12, fontweight='bold')
axes[1].set_ylabel('Standard Error', fontsize=12, fontweight='bold')
axes[1].set_title('SE Sensitivity to Lag Choice: Unemployment', fontsize=13, fontweight='bold')
axes[1].legend(loc='best', fontsize=10)
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.savefig(output_dir / '03_lag_sensitivity.png', dpi=300, bbox_inches='tight')
plt.show()

print("\nüìä Interpretation:")
print("   ‚Üí SE increases with L (more autocorrelation captured), then plateaus")
print("   ‚Üí Automatic L (green line) provides reasonable balance")
print("   ‚Üí All NW-HAC SEs > Robust SE (confirming autocorrelation bias)")
print("   ‚Üí Choice of L matters most when L is small (< 4)")
print("   ‚Üí For quarterly data, L = 4 is natural (annual cycle)")

### 3.4 Kernel Functions

**Purpose**: Weight autocorrelations at different lags

**Available Kernels**:

1. **Bartlett** (triangular, default):
   $$w_l = 1 - \frac{l}{L+1}$$
   - Simple, widely used
   - Linear downweighting

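2. **Parzen** (smoother):
   - Flatter than Bartlett near lag 0, but downweights distant lags more aggressively
   - Smaller asymptotic bias than Bartlett (characteristic exponent 2 vs 1)

3. **Quadratic Spectral (QS)**:
   - Optimal asymptotic MSE among kernels that guarantee positive semi-definite estimates (Andrews 1991)
   - Assigns nonzero weight to every lag (no truncation); usually paired with Andrews' data-dependent bandwidth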

**Rule of Thumb**: Bartlett is fine for most applications
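To see how the kernels differ, the sketch below evaluates the standard Bartlett, Parzen, and QS kernel functions $k(x)$ at $x = l/(L+1)$. This scaling is an assumption chosen so the Bartlett values match $w_l = 1 - l/(L+1)$ above. Software packages scale the bandwidth differently, so PanelBox's weights for a given `bandwidth` may not match these exactly. The function names are hypothetical. QS does not truncate, so it would also give nonzero weights beyond lag $L$. Only lags $0, \ldots, L$ are shown here.

```python
import numpy as np

def bartlett(x):
    x = np.abs(x)
    return np.where(x <= 1, 1 - x, 0.0)

def parzen(x):
    x = np.abs(x)
    return np.where(x <= 0.5, 1 - 6 * x**2 + 6 * x**3,
                    np.where(x <= 1, 2 * (1 - x) ** 3, 0.0))

def quadratic_spectral(x):
    x = np.abs(np.asarray(x, dtype=float))
    safe = np.where(x == 0, 1.0, x)          # avoid 0/0; k(0) = 1 is set below
    z = 6 * np.pi * safe / 5
    k = 25 / (12 * np.pi**2 * safe**2) * (np.sin(z) / z - np.cos(z))
    return np.where(x == 0, 1.0, k)

L = 4
x = np.arange(L + 1) / (L + 1)               # assumed scaling: x = l / (L + 1)
for name, kern in [("Bartlett", bartlett), ("Parzen", parzen), ("QS", quadratic_spectral)]:
    print(f"{name:<9}", np.round(kern(x), 3))
```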

---

In [None]:
# Compare different kernels
kernels = ['bartlett', 'parzen', 'qs']
kernel_names = ['Bartlett (Triangular)', 'Parzen', 'Quadratic Spectral']
results_by_kernel = {}

for kernel in kernels:
    res = model.fit(cov_type='HAC', cov_config={'kernel': kernel, 'bandwidth': 4})
    results_by_kernel[kernel] = {
        'inflation': res.std_errors['inflation'],
        'unemployment': res.std_errors['unemployment']
    }

# Create comparison table
kernel_comparison = []
for kernel, name in zip(kernels, kernel_names):
    kernel_comparison.append({
        'Kernel': name,
        'SE (inflation)': f"{results_by_kernel[kernel]['inflation']:.5f}",
        'SE (unemployment)': f"{results_by_kernel[kernel]['unemployment']:.5f}"
    })

kernel_df = pd.DataFrame(kernel_comparison)

print("="*70)
print("KERNEL COMPARISON (L = 4)")
print("="*70)
print(kernel_df.to_string(index=False))
print("="*70)

print("\nüìä Interpretation:")
print("   ‚Üí All three kernels give SIMILAR results (differences < 5%)")
print("   ‚Üí Bartlett: Most common, simple interpretation")
print("   ‚Üí Parzen: Smoother weights, slightly more conservative")
print("   ‚Üí QS: Best asymptotic properties, but more complex")
print("\n   ‚úÖ Recommendation: Use Bartlett (default) for most applications")
print("      Only switch to QS if you have very large T and care about asymptotic optimality")

## 4. Driscoll-Kraay HAC for Panels

### 4.1 The Problem: Cross-Sectional Dependence

**Panel Data Complication**: Countries/firms are affected by **common shocks**

**Examples**:
- 2008 financial crisis: affected all countries simultaneously
- Oil price shocks: affect all oil-importing countries
- COVID-19: global pandemic affecting all economies

**Two-Dimensional Correlation Structure**:
1. **Temporal** (within entity): $\text{Cov}(\epsilon_{it}, \epsilon_{is}) \neq 0$ for $t \neq s$
2. **Cross-sectional** (across entities): $\text{Cov}(\epsilon_{it}, \epsilon_{jt}) \neq 0$ for $i \neq j$

**Solution**: Driscoll-Kraay (1998) HAC estimator
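A simple way to check for the cross-sectional correlation described above is to average the pairwise correlations between entities' residual series. Formal tests, such as Pesaran's CD test, are built on these same pairwise correlations. This sketch assumes residuals arranged as a $T \times N$ array, with one column per entity. It compares simulated data with and without a shock shared by all entities. The function name is hypothetical.

```python
import numpy as np

def avg_cross_sectional_corr(resid_wide):
    """Average pairwise correlation across entities.

    resid_wide : (T, N) array; column i is the residual series of entity i.
    Returns the mean of the N(N-1)/2 off-diagonal correlations.
    """
    R = np.corrcoef(np.asarray(resid_wide, dtype=float), rowvar=False)
    N = R.shape[0]
    return R[np.triu_indices(N, k=1)].mean()

rng = np.random.default_rng(5)
T, N = 200, 10
common = rng.normal(size=(T, 1))       # shock shared by all entities at time t
idio = rng.normal(size=(T, N))         # entity-specific noise
with_shock = common + idio             # population pairwise correlation = 0.5
without_shock = idio

print(f"with common shock:    {avg_cross_sectional_corr(with_shock):.3f}")
print(f"without common shock: {avg_cross_sectional_corr(without_shock):.3f}")
```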

---

In [None]:
# Load macro panel data
macro_data = pd.read_csv('../data/macro_growth.csv')

# Add simulated variables for pedagogical purposes
np.random.seed(42)
macro_data['trade_openness'] = macro_data['openness']
macro_data['fdi'] = np.random.normal(5, 2, len(macro_data))
macro_data['population_growth'] = np.random.normal(1.5, 0.5, len(macro_data))

print("Macro Panel Data Shape:", macro_data.shape)
print("\nPanel Structure:")
print(f"  Number of countries (N): {macro_data['country_id'].nunique()}")
print(f"  Number of years (T): {macro_data['year'].nunique()}")
print(f"  Total observations: {len(macro_data)}")
print("\nFirst few rows:")
print(macro_data.head(10))

### 4.2 The Driscoll-Kraay (1998) Estimator

**Innovation**: Aggregate across entities at each time point BEFORE applying HAC

**Key Insight**: Handles **both**:
1. **Temporal autocorrelation** (within entities over time)
2. **Cross-sectional correlation** (across entities at same time)

**Procedure**:
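1. For each time $t$, sum the scores across all entities:
   $$u_t = \sum_{i=1}^{N} x_{it} \hat{\epsilon}_{it}$$
   (for fixed effects, $x_{it}$ and $\hat{\epsilon}_{it}$ are the within-transformed regressors and residuals)

2. Apply Newey-West HAC to the time series $\{u_t\}_{t=1}^T$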

**Variance Formula**:
$$
V_{DK} = (X'X)^{-1} \left[\Gamma_0 + \sum_{l=1}^{L} w_l (\Gamma_l + \Gamma_l')\right] (X'X)^{-1}
$$

where:
$$
\Gamma_l = \sum_{t=l+1}^{T} u_t u_{t-l}'
$$

**Asymptotics**: Requires $T \to \infty$, $N$ can be **fixed** (unlike clustering)
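The two-step procedure can be sketched in NumPy: sum the scores within each period, then apply Bartlett-weighted Newey-West to the resulting $T$-period series. This is a minimal sketch under stated assumptions. It assumes a balanced panel with consecutive periods, uses pooled OLS with a constant rather than fixed effects to keep it short (for FE, pass within-transformed `X` and residuals), and applies no small-sample correction. PanelBox's `cov_type='driscoll_kraay'` may differ in those details. The function name is hypothetical. Giving every row its own period with `L = 0` reduces it to HC0.

```python
import numpy as np

def driscoll_kraay_cov(X, resid, time_ids, L):
    """Driscoll-Kraay covariance for OLS coefficients (Bartlett kernel, no df correction).

    (1) u_t = sum_i x_it * e_it for each period; (2) Newey-West on the series u_t;
    (3) sandwich with (X'X)^{-1}. Periods are sorted by np.unique, so lags follow time order.
    """
    X = np.asarray(X, dtype=float)
    scores = X * np.asarray(resid, dtype=float)[:, None]
    periods, inverse = np.unique(np.asarray(time_ids), return_inverse=True)
    u = np.zeros((periods.size, X.shape[1]))
    np.add.at(u, inverse, scores)                   # row t = sum of scores in period t
    S = u.T @ u                                     # Gamma_0
    for l in range(1, L + 1):
        gamma_l = u[l:].T @ u[:-l]
        S += (1 - l / (L + 1)) * (gamma_l + gamma_l.T)
    bread = np.linalg.inv(X.T @ X)
    return bread @ S @ bread

# Example: common time shocks in BOTH the regressor and the error
rng = np.random.default_rng(6)
N, T = 20, 100
t_ids = np.tile(np.arange(T), N)                    # entity-major stacking
common_x = rng.normal(size=T)
common_e = rng.normal(size=T)
x = common_x[t_ids] + rng.normal(size=N * T)
e = common_e[t_ids] + rng.normal(size=N * T)
y = 0.5 * x + e
X = np.column_stack([np.ones(N * T), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta

se_dk = np.sqrt(np.diag(driscoll_kraay_cov(X, resid, t_ids, L=3)))
print(f"slope: beta = {beta[1]:.3f}, Driscoll-Kraay SE = {se_dk[1]:.4f}")
```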

---

### 4.3 Implementation with Macro Panel Data

In [None]:
# Estimate Fixed Effects model with Driscoll-Kraay
fe_model = FixedEffects(
    formula="gdp_growth ~ trade_openness + fdi + population_growth",
    data=macro_data,
    entity_col='country_id',
    time_col='year'
)

# Driscoll-Kraay HAC (3 lags)
res_dk = fe_model.fit(cov_type='driscoll_kraay', cov_config={'bandwidth': 3})

print("Driscoll-Kraay HAC Standard Errors:")
print("="*70)
print(res_dk.summary())

### 4.4 Comparing All Methods

Let's compare **four** approaches:
1. **Robust** (HC1): Ignores both autocorrelation and cross-sectional correlation
2. **Cluster by Entity**: Handles within-entity correlation (not cross-sectional)
3. **Cluster by Time**: Handles cross-sectional correlation (not autocorrelation)
4. **Driscoll-Kraay**: Handles BOTH temporal AND cross-sectional correlation

---

In [None]:
# Estimate with different covariance types
res_panel_robust = fe_model.fit(cov_type='robust')
res_cluster_entity = fe_model.fit(cov_type='clustered', cov_config={'cluster': 'entity'})
res_cluster_time = fe_model.fit(cov_type='clustered', cov_config={'cluster': 'time'})

# Create comprehensive comparison
variables = ['trade_openness', 'fdi', 'population_growth']
comparison_panel = []

for var in variables:
    comparison_panel.append({
        'Variable': var,
        'Robust': f"{res_panel_robust.std_errors[var]:.5f}",
        'Cluster (Entity)': f"{res_cluster_entity.std_errors[var]:.5f}",
        'Cluster (Time)': f"{res_cluster_time.std_errors[var]:.5f}",
        'Driscoll-Kraay': f"{res_dk.std_errors[var]:.5f}"
    })

comp_panel_df = pd.DataFrame(comparison_panel)

print("="*90)
print("COMPREHENSIVE SE COMPARISON: Panel Methods")
print("="*90)
print(comp_panel_df.to_string(index=False))
print("="*90)

print("\nüìä Key Observations:")
print("   1. Driscoll-Kraay SEs are LARGEST (most conservative)")
print("   2. Robust SEs are SMALLEST (underestimate uncertainty)")
print("   3. Cluster by Entity > Robust (handles within-entity correlation)")
print("   4. Cluster by Time > Robust (handles cross-sectional correlation)")
print("   5. DK > Both clustering methods (handles BOTH dimensions)")

print("\n   ‚úÖ Conclusion for Macro Panels:")
print("      With T large (30 years) and N small (20 countries):")
print("      ‚Üí Use Driscoll-Kraay for correct inference")
print("      ‚Üí Clustering methods are INSUFFICIENT (miss one dimension)")

### 4.5 Visualization: Comparing Methods

Let's visualize the differences across methods:

---

In [None]:
# Extract SEs for one variable (trade_openness) for visualization
var = 'trade_openness'

methods = ['Robust', 'Cluster\n(Entity)', 'Cluster\n(Time)', 'Driscoll-\nKraay']
ses = [
    res_panel_robust.std_errors[var],
    res_cluster_entity.std_errors[var],
    res_cluster_time.std_errors[var],
    res_dk.std_errors[var]
]

# Create bar plot
fig, ax = plt.subplots(figsize=(12, 7))
colors = ['skyblue', 'orange', 'lightgreen', 'red']
bars = ax.bar(methods, ses, color=colors, edgecolor='black', linewidth=2, alpha=0.8)

# Annotate bars
for i, (method, se) in enumerate(zip(methods, ses)):
    ax.text(i, se + max(ses)*0.02, f'{se:.5f}', 
            ha='center', fontsize=12, fontweight='bold')

ax.set_ylabel('Standard Error', fontsize=13, fontweight='bold')
ax.set_title(f'SE Comparison Across Methods: {var}', 
             fontsize=14, fontweight='bold', pad=20)
ax.set_ylim(0, max(ses) * 1.15)
ax.grid(axis='y', alpha=0.3, linestyle='--')

# Add interpretation text box
textstr = '\n'.join([
    'Interpretation:',
    '• Driscoll-Kraay (red) is typically the most conservative',
    '• Handles both temporal & cross-sectional correlation',
    '• Use for macro panels (T large, N small)'
])
props = dict(boxstyle='round', facecolor='wheat', alpha=0.3)
ax.text(0.02, 0.98, textstr, transform=ax.transAxes, fontsize=10,
        verticalalignment='top', bbox=props)

plt.tight_layout()
plt.savefig(output_dir / '04_se_comparison_panel.png', dpi=300, bbox_inches='tight')
plt.show()

## 5. Newey-West vs Driscoll-Kraay: Key Differences

### 5.1 Comparison Table

| Aspect | Newey-West | Driscoll-Kraay |
|--------|------------|----------------|
| **Data Structure** | Time series (single entity) | Panel data (N entities, T periods) |
| **Handles Heteroskedasticity** | Yes | Yes |
| **Handles Autocorrelation** | Yes (within series) | Yes (within entities) |
| **Handles Cross-Sectional Correlation** | ❌ No | ✅ **Yes** (key difference) |
| **Asymptotics** | $T \to \infty$ | $T \to \infty$ ($N$ can be fixed) |
| **Minimum T** | ~50 | ~20 |
| **Typical Use** | Macro time series, single-country | Macro panels, multi-country |

---

### 5.2 When to Use Each Method

**Decision Rule**:

✅ **Use Newey-West** when:
- Time series data (single entity)
- No cross-sectional correlation
- $T \geq 50$

✅ **Use Driscoll-Kraay** when:
- Panel data with cross-sectional correlation
- Macro panels (countries, regions)
- Common shocks suspected (crises, policies)
- $T \geq 20$, $N$ can be small

✅ **Use Clustering** when:
- Micro panels (large $N$, small $T$)
- Well-defined clusters
- $G \geq 20$ clusters

---

### 5.3 Simulation: Demonstrating DK Superiority with Cross-Sectional Correlation

Let's demonstrate why Driscoll-Kraay is necessary when cross-sectional correlation is present:

---

In [None]:
# Simulate panel with BOTH temporal AND cross-sectional correlation
def simulate_panel_with_common_shock(N=20, T=30, rho=0.5, common_shock_sd=1.0):
    """
    Simulate panel data with:
    - AR(1) errors within each entity (temporal autocorrelation)
    - Common time-specific shocks (cross-sectional correlation)
    - A regressor with a persistent common component. Without it, x_it * e_it
      would be (almost) uncorrelated across entities and over time, and even
      robust SEs would be approximately valid.
    """
    # Common shock in the errors (affects ALL entities at time t)
    common_shock = np.random.normal(0, common_shock_sd, T)

    # Common AR(1) component in the regressor (also shared by all entities)
    common_x = np.zeros(T)
    common_x[0] = np.random.normal(0, 1)
    for t in range(1, T):
        common_x[t] = rho * common_x[t-1] + np.random.normal(0, 1)

    data_list = []
    for i in range(N):
        x = common_x + np.random.normal(0, 1, T)

        # Error = AR(1) process driven by idiosyncratic + common shocks
        epsilon = np.zeros(T)
        epsilon[0] = np.random.normal(0, 1) + common_shock[0]

        for t in range(1, T):
            epsilon[t] = rho * epsilon[t-1] + np.random.normal(0, 0.8) + common_shock[t]

        # True coefficient = 0.5
        y = 1 + 0.5 * x + epsilon

        df = pd.DataFrame({
            'entity': i,
            'time': range(T),
            'y': y,
            'x': x
        })
        data_list.append(df)

    return pd.concat(data_list, ignore_index=True)

# Run Monte Carlo simulation
print("="*70)
print("MONTE CARLO SIMULATION: Newey-West vs Driscoll-Kraay")
print("="*70)
print("\nData Generating Process:")
print("  • Panel: N = 20 entities, T = 30 periods")
print("  • Model: y = 1 + 0.5*x + ε")
print("  • Errors: AR(1) with ρ = 0.5 + common time shock")
print("  • Regressor: common AR(1) component + idiosyncratic noise")
print("  • True β₁ = 0.5")
print("\nRunning 1000 simulations (this may take a minute)...\n")

n_sim = 1000
reject_robust = []
reject_cluster_entity = []
reject_nw = []
reject_dk = []

np.random.seed(42)

for sim in range(n_sim):
    if (sim + 1) % 250 == 0:
        print(f"  Completed {sim + 1}/{n_sim} simulations...")
    
    data_sim = simulate_panel_with_common_shock(N=20, T=30, rho=0.5, common_shock_sd=1.0)
    
    fe_sim = FixedEffects(
        formula="y ~ x",
        data=data_sim,
        entity_col='entity',
        time_col='time'
    )
    
    # Test different SE methods. A failed fit is recorded as NaN rather than
    # as "no rejection", so failures cannot bias the rejection rates downward.
    # 1. Robust (WRONG - ignores both correlations)
    try:
        res_r = fe_sim.fit(cov_type='robust')
        t_r = (res_r.params['x'] - 0.5) / res_r.std_errors['x']
        reject_robust.append(abs(t_r) > 1.96)
    except Exception:
        reject_robust.append(np.nan)
    
    # 2. Cluster by entity (ignores cross-sectional correlation)
    try:
        res_ce = fe_sim.fit(cov_type='clustered', cov_config={'cluster': 'entity'})
        t_ce = (res_ce.params['x'] - 0.5) / res_ce.std_errors['x']
        reject_cluster_entity.append(abs(t_ce) > 1.96)
    except Exception:
        reject_cluster_entity.append(np.nan)
    
    # 3. Newey-West (ignores cross-sectional correlation)
    try:
        res_nw_sim = fe_sim.fit(cov_type='HAC', cov_config={'kernel': 'bartlett', 'bandwidth': 3})
        t_nw = (res_nw_sim.params['x'] - 0.5) / res_nw_sim.std_errors['x']
        reject_nw.append(abs(t_nw) > 1.96)
    except Exception:
        reject_nw.append(np.nan)
    
    # 4. Driscoll-Kraay (handles BOTH correlations)
    try:
        res_dk_sim = fe_sim.fit(cov_type='driscoll_kraay', cov_config={'bandwidth': 3})
        t_dk = (res_dk_sim.params['x'] - 0.5) / res_dk_sim.std_errors['x']
        reject_dk.append(abs(t_dk) > 1.96)
    except Exception:
        reject_dk.append(np.nan)

print("\n" + "="*70)
print("SIMULATION RESULTS: Rejection Rates (α = 0.05)")
print("="*70)

all_rejections = [reject_robust, reject_cluster_entity, reject_nw, reject_dk]
results_sim = pd.DataFrame({
    'Method': ['Robust', 'Cluster (Entity)', 'Newey-West', 'Driscoll-Kraay'],
    'Rejection Rate': [np.nanmean(r) for r in all_rejections],
    'Failed Fits': [int(np.isnan(np.asarray(r, dtype=float)).sum()) for r in all_rejections]
})

# With 1000 replications the Monte Carlo SE of a 5% rate is about 0.7 pp,
# so [3%, 7%] is a reasonable "close to nominal" window
results_sim['Status'] = results_sim['Rejection Rate'].apply(
    lambda x: '✓ Close to nominal' if 0.03 <= x <= 0.07
    else ('✗ Liberal' if x > 0.07 else '✗ Conservative')
)

print(results_sim.to_string(index=False))
print("="*70)

print("\n📊 Interpretation:")
print("   • Expected rejection rate: 5.0% (under correct inference)")
for _, row in results_sim.iterrows():
    print(f"   • {row['Method']}: {row['Rejection Rate']:.1%} ({row['Status']})")

print("\n   ✅ Conclusion:")
print("      Only Driscoll-Kraay targets BOTH temporal autocorrelation AND")
print("      cross-sectional correlation, so it should be closest to 5%;")
print("      some over-rejection is still possible with small T (here T = 30).")

## 6. Practical Considerations

### 6.1 Minimum Sample Size Requirements

**Newey-West HAC**:
- **Minimum**: $T \geq 50$
- **Comfortable**: $T \geq 100$
- **Reason**: Asymptotic approximation quality

**Driscoll-Kraay HAC**:
- **Minimum**: $T \geq 20$
- **Comfortable**: $T \geq 30$
- **Reason**: Need enough time periods for cross-sectional aggregation

**Warning**: Always check your sample size before using HAC methods!

---

### 6.2 Choosing Max Lags: Rules of Thumb

**1. Automatic Rule** (Newey-West 1994):
$$
L = \text{floor}\left(4 \left(\frac{T}{100}\right)^{2/9}\right)
$$
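As a quick worked example, the rule can be evaluated for a few sample sizes. The function name `newey_west_auto_lag` is ours, not a library API.

```python
import math

def newey_west_auto_lag(n_periods: int) -> int:
    """Newey-West (1994) rule of thumb: L = floor(4 * (T / 100)^(2/9))."""
    return math.floor(4 * (n_periods / 100) ** (2 / 9))

for T in (30, 50, 100, 200, 1000):
    print(T, newey_west_auto_lag(T))
# 30 3
# 50 3
# 100 4
# 200 4
# 1000 6
```

The lag grows very slowly with $T$: a tenfold increase from 100 to 1000 periods only raises $L$ from 4 to 6.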

**2. Domain-Specific Rules**:
- **Quarterly data with annual cycle**: $L = 4$
- **Monthly data with annual cycle**: $L = 12$
- **Daily stock returns**: $L = 5$ (trading week)

**3. Diagnostic-Based**: Plot the ACF of the residuals and set $L$ to the last lag before the autocorrelations fall inside the confidence band
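The diagnostic rule can be sketched in plain NumPy. This version assumes the simple white-noise band $\pm 1.96/\sqrt{T}$; the shaded band drawn by `plot_acf` may use a different (widening) formula, so the two can disagree at higher lags. `acf_based_lag` is a hypothetical helper name.

```python
import numpy as np

def acf_based_lag(resid, max_lag=20):
    """Number of leading lags whose sample autocorrelation lies outside
    the approximate 95% white-noise band +/- 1.96 / sqrt(T)."""
    e = np.asarray(resid, dtype=float)
    e = e - e.mean()
    T = e.size
    denom = np.sum(e ** 2)
    band = 1.96 / np.sqrt(T)
    for lag in range(1, max_lag + 1):
        rho_hat = np.sum(e[lag:] * e[:-lag]) / denom  # sample ACF at this lag
        if abs(rho_hat) < band:
            return lag - 1  # last lag before the ACF first becomes insignificant
    return max_lag

# Example: simulated AR(1) residuals with rho = 0.8
rng = np.random.default_rng(0)
T = 500
eps = rng.standard_normal(T)
e = np.empty(T)
e[0] = eps[0]
for t in range(1, T):
    e[t] = 0.8 * e[t - 1] + eps[t]

L_acf = acf_based_lag(e)
print("Lag chosen from ACF:", L_acf)
```

With persistent residuals like these, the ACF-based choice is typically larger than the automatic rule would give for the same $T$.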

---

### 6.3 Reporting HAC Results

**Good Practice**:

✅ **Always report**:
1. Which HAC method was used (Newey-West or Driscoll-Kraay)
2. Number of lags ($L$) and how it was chosen (automatic vs manual)
3. Kernel function (Bartlett, Parzen, QS)

**Example Reporting**:

> "Standard errors are Newey-West HAC with automatic lag selection ($L = 4$) using the Bartlett kernel, robust to heteroskedasticity and autocorrelation."

> "Standard errors are Driscoll-Kraay with 3 lags, robust to heteroskedasticity, autocorrelation, and cross-sectional dependence."
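The kernel named in the report determines the weights $w_l$ applied to each lagged autocovariance. The sketch below writes out the three kernels under one stated assumption: all are evaluated at $x = l/(L+1)$. Software packages differ in how they scale the bandwidth (Andrews, 1991, parameterises it separately), so exact weights can vary across implementations.

```python
import math

# Kernel weight w_l for lag l and truncation lag L, using x = l / (L + 1).
# Assumption: the same scaling of x is used for all three kernels.

def bartlett(l, L):
    x = l / (L + 1)
    return max(0.0, 1.0 - x)

def parzen(l, L):
    x = l / (L + 1)
    if x <= 0.5:
        return 1.0 - 6.0 * x**2 + 6.0 * x**3
    if x <= 1.0:
        return 2.0 * (1.0 - x) ** 3
    return 0.0

def quadratic_spectral(l, L):
    x = l / (L + 1)
    if x == 0:
        return 1.0  # limit of the formula as x -> 0
    z = 6.0 * math.pi * x / 5.0
    return 25.0 / (12.0 * math.pi**2 * x**2) * (math.sin(z) / z - math.cos(z))

L = 4
for l in range(L + 2):
    print(l, round(bartlett(l, L), 3), round(parzen(l, L), 3), round(quadratic_spectral(l, L), 3))
```

Bartlett and Parzen truncate at lag $L$, whereas the Quadratic Spectral kernel assigns nonzero (and possibly negative) weight to every lag.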

---

## 7. Case Studies

### 7.1 Case Study 1: Monetary Policy Impact (Time Series)

**Research Question**: Do interest rates affect inflation?

**Context**: Single-country quarterly macroeconomic data

**Appropriate Method**: Newey-West HAC

---

In [None]:
# Case Study 1: Monetary Policy
print("="*70)
print("CASE STUDY 1: Monetary Policy Impact on Inflation")
print("="*70)

policy_model = PooledOLS(
    formula="inflation ~ interest_rate + unemployment",
    data=ts_data,
    entity_col='entity',
    time_col='time'
)

res_policy = policy_model.fit(cov_type='HAC', cov_config={'kernel': 'bartlett'})

print("\nEstimation Results:")
print(res_policy.summary())

t_policy = res_policy.params['interest_rate'] / res_policy.std_errors['interest_rate']

print("\n📊 Interpretation:")
print(f"   • Interest rate coefficient: {res_policy.params['interest_rate']:.4f}")
print(f"   • Standard error (Newey-West): {res_policy.std_errors['interest_rate']:.4f}")
print(f"   • t-statistic: {t_policy:.4f}")

# |t| > 1.96: two-sided test at the 5% level (normal approximation)
if abs(t_policy) > 1.96:
    print("   • Statistically significant at 5% level ✓")
else:
    print("   • Not statistically significant at 5% level")

print("\n✅ Reporting:")
print('   "We estimate the effect of interest rates on inflation using quarterly')
print('    data. Standard errors are Newey-West HAC with automatic lag selection')
print('    (L=4), robust to heteroskedasticity and autocorrelation."')

### 7.2 Case Study 2: Trade and Growth (Macro Panel)

**Research Question**: Does trade openness promote economic growth?

**Context**: Panel of 20 countries over 30 years

**Appropriate Method**: Driscoll-Kraay HAC (handles global shocks)
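To see what Driscoll-Kraay does mechanically, here is a minimal NumPy sketch, assuming pooled OLS and no small-sample correction: it sums the moment conditions $x_{it} e_{it}$ across entities within each period and then applies the Bartlett-weighted Newey-West formula to that length-$T$ series. For a fixed-effects model, `X` and `resid` would be the within-transformed (demeaned) versions. `driscoll_kraay_cov` is a hypothetical name, and PanelBox may apply additional corrections.

```python
import numpy as np

def driscoll_kraay_cov(X, resid, time_ids, L):
    """Bartlett-weighted HAC applied to period-level score sums h_t = sum_i x_it * e_it."""
    X = np.asarray(X, dtype=float)
    resid = np.asarray(resid, dtype=float)
    periods, t_index = np.unique(time_ids, return_inverse=True)  # sorted periods
    h = np.zeros((periods.size, X.shape[1]))
    np.add.at(h, t_index, X * resid[:, None])  # aggregate scores within each period
    S = h.T @ h
    for l in range(1, L + 1):
        w = 1.0 - l / (L + 1)
        G = h[l:].T @ h[:-l]
        S += w * (G + G.T)
    XtX_inv = np.linalg.inv(X.T @ X)
    return XtX_inv @ S @ XtX_inv

# Example: 20 entities x 30 periods with a common (global) shock each period
rng = np.random.default_rng(1)
N, T = 20, 30
year = np.tile(np.arange(T), N)
common_shock = rng.standard_normal(T)
x = rng.standard_normal(N * T) + common_shock[year]
e = rng.standard_normal(N * T) + common_shock[year]
y = 0.5 * x + e
X = np.column_stack([np.ones(N * T), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # pooled OLS
resid = y - X @ beta

V_dk = driscoll_kraay_cov(X, resid, year, L=3)
print("Driscoll-Kraay SE of slope:", np.sqrt(V_dk[1, 1]))
```

Because the correlation across entities is absorbed when the scores are summed within a period, the estimator stays valid whatever the cross-sectional correlation pattern, which is why its reliability hinges on $T$.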

---

In [None]:
# Case Study 2: Trade and Growth
print("="*70)
print("CASE STUDY 2: Trade Openness and Economic Growth")
print("="*70)

trade_model = FixedEffects(
    formula="gdp_growth ~ trade_openness + fdi",
    data=macro_data,
    entity_col='country_id',
    time_col='year'
)

res_trade = trade_model.fit(cov_type='driscoll_kraay', cov_config={'bandwidth': 3})

print("\nEstimation Results:")
print(res_trade.summary())

t_trade = res_trade.params['trade_openness'] / res_trade.std_errors['trade_openness']

print("\n📊 Interpretation:")
print(f"   • Trade openness coefficient: {res_trade.params['trade_openness']:.4f}")
print(f"   • Standard error (Driscoll-Kraay): {res_trade.std_errors['trade_openness']:.4f}")
print(f"   • t-statistic: {t_trade:.4f}")

# |t| > 1.96: two-sided test at the 5% level (normal approximation)
if abs(t_trade) > 1.96:
    print("   • Statistically significant at 5% level ✓")
else:
    print("   • Not statistically significant at 5% level")

print("\n✅ Reporting:")
print('   "We estimate the effect of trade openness on GDP growth using a panel')
print('    of 20 countries over 30 years. We use entity fixed effects to control')
print('    for time-invariant country characteristics. Standard errors are')
print('    Driscoll-Kraay with 3 lags, robust to heteroskedasticity,')
print('    autocorrelation, and cross-sectional dependence due to global shocks."')

## 8. Exercises

### Exercise 1: ACF Diagnosis and Newey-West (Easy)

**Task**: Practice diagnosing autocorrelation and applying Newey-West

**Steps**:
1. Load `gdp_quarterly.csv`
2. Estimate: `inflation ~ unemployment`
3. Plot ACF of residuals
4. Determine appropriate lag length from ACF
5. Estimate with Newey-West using your chosen lag
6. Compare SEs: robust vs Newey-West
7. Report your findings

**Deliverable**: Write 2-3 sentences explaining:
- Is autocorrelation present?
- How much do SEs change?
- What does this mean for inference?

---

### Exercise 2: Driscoll-Kraay Necessity (Moderate)

**Task**: Test whether cross-sectional correlation matters

**Steps**:
1. Load `macro_panel.csv`
2. Estimate FE model: `gdp_growth ~ trade_openness + fdi`
3. Compare 4 methods: Robust, Cluster (Entity), Cluster (Time), Driscoll-Kraay
4. Create comparison table
5. Identify which variable has the largest DK/Robust SE ratio
6. Explain: why can Driscoll-Kraay SEs exceed those from entity clustering?

**Deliverable**: Short write-up (1 paragraph) explaining when Driscoll-Kraay is necessary

---

### Exercise 3: Monte Carlo Simulation (Challenging)

**Task**: Replicate and extend the simulation from Section 5.3

**Steps**:
1. Modify the simulation function to vary AR(1) coefficient ($\rho$)
2. Run simulations for $\rho \in \{0, 0.3, 0.5, 0.7, 0.9\}$
3. For each $\rho$, calculate rejection rates for all 4 methods
4. Plot rejection rate vs $\rho$ for each method
5. Identify when methods start to fail

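**Expected Result**: As $\rho$ increases:
- Robust becomes increasingly liberal
- Newey-West stays close to 5% (if cross-sectional correlation = 0)
- Driscoll-Kraay stays close to 5% across designs, although with very persistent errors ($\rho$ near 0.9) a short fixed bandwidth may still over-reject somewhat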

**Deliverable**: 
- Plot showing rejection rates vs $\rho$
- 1-paragraph explanation of why autocorrelation matters for inference

---

**Space for your work:**

In [None]:
# Space for Exercise 1

# Your code here:
# 1. Load data
# 2. Estimate model
# 3. Plot ACF
# 4. Apply Newey-West
# 5. Compare SEs

## 9. Summary and Key Takeaways

### What We Learned

1. ✅ **Autocorrelation** invalidates classical and heteroskedasticity-robust SEs, which typically understate uncertainty when both errors and regressors are positively autocorrelated
2. ✅ **Newey-West HAC** handles heteroskedasticity + autocorrelation (time series)
3. ✅ **Driscoll-Kraay HAC** additionally handles cross-sectional correlation (panels)
4. ✅ **Lag selection** is critical: use the automatic rule or domain knowledge
5. ✅ **Minimum T requirements** (rules of thumb): NW needs $T \geq 50$, DK needs $T \geq 20$
6. ✅ **Choose between HAC and clustering** based on the $(N, T)$ structure

---

### Key Formulas

**Newey-West HAC Variance**:
$$
V_{NW} = (X'X)^{-1} \left[\Gamma_0 + \sum_{l=1}^{L} w_l (\Gamma_l + \Gamma_l')\right] (X'X)^{-1}
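$$

where $\Gamma_l = \sum_{t=l+1}^{T} e_t e_{t-l}\, x_t x_{t-l}'$ is the lag-$l$ autocovariance of the scores $x_t e_t$ (so $\Gamma_0 = \sum_t e_t^2 x_t x_t'$) and $w_l = 1 - \frac{l}{L+1}$ are the Bartlett weights.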
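The formula translates almost line by line into NumPy. This is a minimal sketch assuming Bartlett weights and no small-sample degrees-of-freedom correction; `newey_west_cov` is our own name, not a PanelBox function.

```python
import numpy as np

def newey_west_cov(X, resid, L):
    """V_NW = (X'X)^{-1} [Gamma_0 + sum_l w_l (Gamma_l + Gamma_l')] (X'X)^{-1},
    with Bartlett weights w_l = 1 - l / (L + 1) and no small-sample correction."""
    X = np.asarray(X, dtype=float)
    u = X * np.asarray(resid, dtype=float)[:, None]  # row t is the score x_t * e_t
    S = u.T @ u                                     # Gamma_0
    for l in range(1, L + 1):
        w = 1.0 - l / (L + 1)
        G = u[l:].T @ u[:-l]                        # Gamma_l = sum_t e_t e_{t-l} x_t x_{t-l}'
        S += w * (G + G.T)
    XtX_inv = np.linalg.inv(X.T @ X)
    return XtX_inv @ S @ XtX_inv

# Example: AR(1) regressor and AR(1) errors, T = 200, L = 4
rng = np.random.default_rng(7)
T = 200
x = np.empty(T)
e = np.empty(T)
x[0], e[0] = rng.standard_normal(2)
for t in range(1, T):
    x[t] = 0.7 * x[t - 1] + rng.standard_normal()
    e[t] = 0.7 * e[t - 1] + rng.standard_normal()
y = 1.0 + 0.5 * x + e
X = np.column_stack([np.ones(T), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta

V = newey_west_cov(X, resid, L=4)
print("Newey-West SE of slope:", np.sqrt(V[1, 1]))
```

With $L = 0$ the sum over lags is empty and the estimator collapses to the White (HC0) robust variance, which makes the role of the autocovariance terms easy to see.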

**Automatic Lag Selection**:
$$
L = \text{floor}\left(4 \left(\frac{T}{100}\right)^{2/9}\right)
$$

---

### Decision Flowchart

```
Data Structure?
    │
    ├─→ Time Series (T ≥ 50)
    │       │
    │       └─→ Newey-West HAC
    │
    ├─→ Panel (N small, T ≥ 20)
    │       │
    │       ├─→ Cross-sectional correlation? YES → Driscoll-Kraay
    │       └─→ NO → Cluster by entity
    │
    └─→ Panel (N large, T < 20)
            │
            └─→ Cluster by entity (G ≥ 20)
```
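One possible way to encode the flowchart as a function is sketched below. It is an interpretation, not a PanelBox feature: `choose_se_method` is hypothetical, and because the flowchart does not define "N small" or "N large", the sketch splits panels on $T$ alone.

```python
def choose_se_method(n_entities: int, n_periods: int, cross_sectional_corr: bool) -> str:
    """Hypothetical encoding of the decision flowchart."""
    if n_entities == 1:
        # Pure time series
        if n_periods < 50:
            return "Newey-West HAC (caution: T < 50)"
        return "Newey-West HAC"
    if n_periods >= 20:
        # Long panel: the 'N small, T >= 20' branch
        return "Driscoll-Kraay" if cross_sectional_corr else "Cluster by entity"
    # Short panel: the 'N large, T < 20' branch
    if n_entities < 20:
        return "Cluster by entity (caution: fewer than 20 clusters)"
    return "Cluster by entity"

# Case Study 2: 20 countries, 30 years, global shocks
print(choose_se_method(20, 30, cross_sectional_corr=True))  # Driscoll-Kraay
```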

---

## 10. References

### Foundational Papers

1. **Newey, W. K., & West, K. D. (1987)**. "A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix." *Econometrica*, 55(3), 703-708.

2. **Newey, W. K., & West, K. D. (1994)**. "Automatic lag selection in covariance matrix estimation." *Review of Economic Studies*, 61(4), 631-653.

3. **Driscoll, J. C., & Kraay, A. C. (1998)**. "Consistent covariance matrix estimation with spatially dependent panel data." *Review of Economics and Statistics*, 80(4), 549-560.

4. **Andrews, D. W. K. (1991)**. "Heteroskedasticity and autocorrelation consistent covariance matrix estimation." *Econometrica*, 59(3), 817-858.

---

### Textbooks

1. **Wooldridge, J. M. (2010)**. *Econometric Analysis of Cross Section and Panel Data* (2nd ed.). MIT Press.
2. **Hamilton, J. D. (1994)**. *Time Series Analysis*. Princeton University Press.
3. **Greene, W. H. (2018)**. *Econometric Analysis* (8th ed.). Pearson.

---

### Online Resources

- **PanelBox Documentation**: `panelbox.readthedocs.io/standard_errors/hac.html`
- **StatsModels HAC**: `statsmodels.org/stable/generated/statsmodels.stats.sandwich_covariance.cov_hac.html`

---

**End of Notebook 03: HAC Standard Errors**