# Spatial HAC Standard Errors

**Tutorial 04: Geographic Spillovers and Spatial Correlation**

---

## Learning Objectives

By the end of this notebook, you will be able to:

1. **Understand** spatial correlation and geographic spillovers in economic data
2. **Construct** spatial distance matrices from coordinates using the Haversine formula
3. **Implement** the Conley (1999) Spatial HAC estimator
4. **Choose** appropriate spatial and temporal cutoffs
5. **Compare** spatial HAC kernels (Bartlett, Uniform, Epanechnikov)
6. **Apply** spatial methods to geographic economic problems
7. **Visualize** spatial correlation using maps

---

**Estimated Duration**: 90-120 minutes  
**Difficulty Level**: Advanced  
**Prerequisites**: Notebooks 01-03 completed

---

In [ ]:
# Setup and Configuration
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.spatial.distance import cdist
from scipy.stats import t as t_dist, norm
import warnings

# PanelBox imports
import panelbox as pb
from panelbox.models.static import PooledOLS

# Set random seed
np.random.seed(42)

# Configure plotting
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 11
warnings.filterwarnings('ignore')

print('✓ Environment configured successfully')
print(f'PanelBox version: {pb.__version__}')

## 1. Introduction: Spatial Correlation in Economics

### 1.1 What is Spatial Correlation?

**Definition**: Spatial correlation means that outcomes (or regression errors) at nearby locations are correlated because of their geographic proximity.

**Tobler's First Law of Geography**:
> "Everything is related to everything else, but near things are more related than distant things."

#### Economic Examples

1. **Agriculture**: Weather patterns affect neighboring farms simultaneously
2. **Real Estate**: House prices within a neighborhood are correlated
3. **Health**: Disease transmission between nearby regions
4. **Innovation**: Knowledge spillovers between nearby firms
5. **Environment**: Air pollution disperses to nearby areas

### 1.2 Why Standard Methods Fail

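- **Standard SEs**: Assume independent errors → SEs are typically too small when nearby errors are positively correlated
- **Clustering**: Allows correlation only within administrative units, whose boundaries are arbitrary; neighbors on opposite sides of a border are treated as independent
- **Spatial HAC**: Accounts for distance-based correlation ✓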

In [ ]:
# Simulate agricultural panel data with spatial correlation
np.random.seed(42)

# Parameters
n_counties = 200
n_years = 10

# Generate coordinates (US-like)
latitudes = np.random.uniform(30, 48, n_counties)
longitudes = np.random.uniform(-120, -80, n_counties)

# Panel structure
years = np.tile(np.arange(2011, 2021), n_counties)
county_ids = np.repeat(np.arange(n_counties), n_years)
lats = np.repeat(latitudes, n_years)
lons = np.repeat(longitudes, n_years)

# Generate variables
temperature = np.random.normal(20, 5, n_counties * n_years)
precipitation = np.random.gamma(5, 2, n_counties * n_years)
soil_quality = np.random.uniform(0, 100, n_counties * n_years)

# Haversine distance function
def haversine(lat1, lon1, lat2, lon2):
    R = 6371  # Earth radius in km
    lat1_rad, lat2_rad = np.radians(lat1), np.radians(lat2)
    dlat = np.radians(lat2 - lat1)
    dlon = np.radians(lon2 - lon1)
    a = np.sin(dlat/2)**2 + np.cos(lat1_rad) * np.cos(lat2_rad) * np.sin(dlon/2)**2
    return R * 2 * np.arcsin(np.sqrt(a))

# Calculate distance matrix
distance_matrix = np.zeros((n_counties, n_counties))
for i in range(n_counties):
    for j in range(n_counties):
        distance_matrix[i, j] = haversine(latitudes[i], longitudes[i], latitudes[j], longitudes[j])

# Create spatial correlation
spatial_cutoff = 100
spatial_corr = np.exp(-distance_matrix / spatial_cutoff)
np.fill_diagonal(spatial_corr, 1)

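# Generate spatially correlated errors with AR(1) persistence.
# Build them as a (year, county) array, then flatten county-major so that each
# error lines up with the county_ids / years layout defined above.
L = np.linalg.cholesky(spatial_corr + 0.01 * np.eye(n_counties))  # same for every year
errors_by_year = np.zeros((n_years, n_counties))
for t in range(n_years):
    white_noise = np.random.normal(0, 5, n_counties)
    errors_by_year[t] = L @ white_noise
    if t > 0:
        errors_by_year[t] += 0.3 * errors_by_year[t - 1]
errors = errors_by_year.T.ravel()  # row order: county 0 (all years), county 1, ...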

# Generate outcome
crop_yield = 10 + 2.5 * temperature + 1.2 * precipitation + 0.3 * soil_quality + errors

# Create DataFrame
ag_data = pd.DataFrame({
    'county_id': county_ids, 'year': years,
    'crop_yield': crop_yield, 'temperature': temperature,
    'precipitation': precipitation, 'soil_quality': soil_quality,
    'latitude': lats, 'longitude': lons
})

print(f'✓ Data created: {len(ag_data):,} observations')
print(f'  Counties: {n_counties}, Years: {n_years}')
ag_data.head()

In [ ]:
# Visualize spatial distribution of crop yields (2020)
data_2020 = ag_data[ag_data['year'] == 2020].copy()

fig, ax = plt.subplots(figsize=(14, 10))
scatter = ax.scatter(data_2020['longitude'], data_2020['latitude'],
                     c=data_2020['crop_yield'], cmap='viridis',
                     s=150, edgecolor='black', alpha=0.8, linewidth=0.5)
cbar = plt.colorbar(scatter, ax=ax, label='Crop Yield')
ax.set_xlabel('Longitude', fontsize=12)
ax.set_ylabel('Latitude', fontsize=12)
ax.set_title('Spatial Distribution of Crop Yields (2020)\nNotice clusters of similar yields', fontsize=14, fontweight='bold')
ax.grid(alpha=0.3)
plt.tight_layout()
plt.show()

print('✓ Yield map drawn')
print('  → Look for nearby counties with similar yields (a sign of spatial autocorrelation)')

### 1.3 Conley (1999) Spatial HAC

**Extension of Newey-West to Spatial Dimension**

**Variance Formula**:
$$
V_{\text{spatial}} = (X'X)^{-1} S (X'X)^{-1}
$$

Where:
$$
S = \sum_{i=1}^{N} \sum_{j=1}^{N} K(d_{ij}) \sum_{t=1}^{T} \sum_{s=1}^{T} W(|t-s|) x_{it} x_{js}' \epsilon_{it} \epsilon_{js}
$$

**Components**:
- $K(d_{ij})$: Spatial kernel (weights by distance)
- $W(|t-s|)$: Temporal kernel (weights by time lag)
- $d_{ij}$: Distance between entities i and j

**Key Insight**: Correlation decays with both distance AND time
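
As a check on the formula, two limiting cases are worth writing out (a sketch, using the same symbols as above). If both kernels keep only own-entity, same-period pairs, i.e. $K(d_{ij}) = \mathbf{1}\{i = j\}$ and $W(|t-s|) = \mathbf{1}\{t = s\}$, the double sums collapse to White's heteroskedasticity-robust form:
$$
S = \sum_{i=1}^{N} \sum_{t=1}^{T} x_{it} x_{it}' \epsilon_{it}^2
$$

If instead $K$ keeps only own-entity pairs but $W(|t-s|) = 1$ for every lag, $S$ becomes the entity-clustered form:
$$
S = \sum_{i=1}^{N} \left(\sum_{t=1}^{T} x_{it} \epsilon_{it}\right) \left(\sum_{s=1}^{T} x_{is} \epsilon_{is}\right)'
$$

Spatial HAC adds to these the cross-entity terms ($i \neq j$), each weighted by $K(d_{ij})$.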

---

## 2. Constructing Spatial Distance Matrices

### 2.1 Haversine Distance (Great Circle)

**For latitude/longitude coordinates**: The Haversine formula treats the Earth as a sphere and returns the great-circle distance between two points.

**Formula**:
$$
d = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{\Delta \text{lat}}{2}\right) + \cos(\text{lat}_1) \cos(\text{lat}_2) \sin^2\left(\frac{\Delta \text{lon}}{2}\right)}\right)
$$

Where $r$ = Earth's radius ≈ 6371 km
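
The formula can also be evaluated for all pairs at once with NumPy broadcasting instead of a double loop. The block below is a minimal sketch of that idea; the function name `haversine_matrix` and the two-point demo are illustrative assumptions, not part of the tutorial's code.

```python
import numpy as np

def haversine_matrix(lat_deg, lon_deg, radius_km=6371.0):
    """Pairwise great-circle distances (km) for 1-D arrays of coordinates in degrees."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # Broadcasting a column against a row yields every (i, j) difference at once.
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    # Clip guards against tiny floating-point overshoots outside [0, 1].
    return 2 * radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

# Two points on the equator, 90 degrees of longitude apart:
# the distance is a quarter of the sphere's circumference.
demo = haversine_matrix([0.0, 0.0], [0.0, 90.0])
```

With the tutorial's arrays, `haversine_matrix(latitudes, longitudes)` should reproduce `distance_matrix` up to floating-point error.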

In [ ]:
# Inspect distance matrix
print('Distance Matrix Summary:')
print(f'  Shape: {distance_matrix.shape}')
print(f'  Min distance (km): {distance_matrix[distance_matrix > 0].min():.2f}')
print(f'  Max distance (km): {distance_matrix.max():.2f}')

# Mean pairwise distance
upper_tri = np.triu_indices(n_counties, k=1)
pairwise_dists = distance_matrix[upper_tri]
print(f'  Mean distance (km): {pairwise_dists.mean():.2f}')
print(f'  Median distance (km): {np.median(pairwise_dists):.2f}')

### 2.2 Visualizing Distance Matrix

In [ ]:
# Heatmap of distance matrix (first 50 counties)
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Heatmap
sns.heatmap(distance_matrix[:50, :50], cmap='viridis', 
            cbar_kws={'label': 'Distance (km)'}, ax=axes[0])
axes[0].set_title('Distance Matrix Heatmap (First 50 Counties)', fontsize=12, fontweight='bold')
axes[0].set_xlabel('County Index')
axes[0].set_ylabel('County Index')

# Histogram of pairwise distances
axes[1].hist(pairwise_dists, bins=50, edgecolor='black', alpha=0.7)
axes[1].axvline(pairwise_dists.mean(), color='red', linestyle='--', 
                linewidth=2, label=f'Mean = {pairwise_dists.mean():.0f} km')
axes[1].axvline(np.median(pairwise_dists), color='orange', linestyle='--',
                linewidth=2, label=f'Median = {np.median(pairwise_dists):.0f} km')
axes[1].set_xlabel('Distance (km)', fontsize=11)
axes[1].set_ylabel('Frequency', fontsize=11)
axes[1].set_title('Distribution of Pairwise Distances', fontsize=12, fontweight='bold')
axes[1].legend()
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

print('✓ Distance matrix visualized')
print('  → Heatmap shows symmetric structure')
print('  → Most county pairs 1000-3000 km apart')

---

## 3. Spatial Kernels

### 3.1 Types of Spatial Kernels

**Purpose**: Weight correlation by distance (closer = higher weight)

**1. Uniform (Binary)**:
$$
K(d) = \begin{cases}
1 & \text{if } d \leq d_c \\
0 & \text{if } d > d_c
\end{cases}
$$

**2. Bartlett (Linear)**:
$$
K(d) = \begin{cases}
1 - \frac{d}{d_c} & \text{if } d \leq d_c \\
0 & \text{if } d > d_c
\end{cases}
$$

**3. Epanechnikov (Parabolic)**:
$$
K(d) = \begin{cases}
0.75 \left(1 - \left(\frac{d}{d_c}\right)^2\right) & \text{if } d \leq d_c \\
0 & \text{if } d > d_c
\end{cases}
$$

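Where $d_c$ = spatial cutoff distance. Note that with the 0.75 factor the Epanechnikov weight at zero distance is $K(0) = 0.75$ rather than 1, so each observation's own variance term is also scaled by 0.75; drop the factor if you want $K(0) = 1$ like the other two kernels.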

In [ ]:
# Define kernel functions
def uniform_kernel(d, cutoff):
    """Uniform (binary) kernel"""
    return np.where(d <= cutoff, 1.0, 0.0)

def bartlett_kernel(d, cutoff):
    """Bartlett (linear) kernel"""
    return np.where(d <= cutoff, 1.0 - d/cutoff, 0.0)

def epanechnikov_kernel(d, cutoff):
    """Epanechnikov (parabolic) kernel"""
    return np.where(d <= cutoff, 0.75 * (1 - (d/cutoff)**2), 0.0)

# Test cutoff
d_c = 100  # km
distances = np.linspace(0, 200, 1000)

# Calculate weights
weights_uniform = uniform_kernel(distances, d_c)
weights_bartlett = bartlett_kernel(distances, d_c)
weights_epanechnikov = epanechnikov_kernel(distances, d_c)

print(f'✓ Kernel functions defined with cutoff = {d_c} km')

### 3.2 Visualization of Kernels

In [ ]:
# Plot kernel functions
fig, ax = plt.subplots(figsize=(14, 7))

ax.plot(distances, weights_uniform, label='Uniform', linewidth=2.5, alpha=0.8)
ax.plot(distances, weights_bartlett, label='Bartlett (recommended)', linewidth=2.5, alpha=0.8)
ax.plot(distances, weights_epanechnikov, label='Epanechnikov', linewidth=2.5, alpha=0.8)
ax.axvline(d_c, color='red', linestyle='--', linewidth=2, label=f'Cutoff = {d_c} km')

ax.set_xlabel('Distance (km)', fontsize=12)
ax.set_ylabel('Kernel Weight', fontsize=12)
ax.set_title('Spatial Kernel Functions: How Correlation Weights Decay with Distance', 
             fontsize=14, fontweight='bold')
ax.legend(fontsize=11, loc='upper right')
ax.grid(alpha=0.3)
ax.set_xlim(0, 200)
ax.set_ylim(-0.05, 1.1)

plt.tight_layout()
plt.show()

print('\n✓ Kernel visualization complete')
print('\nKernel Properties:')
print('  Uniform: All neighbors within cutoff weighted equally')
print('  Bartlett: Linear decay (most common, recommended)')
print('  Epanechnikov: Parabolic decay (flat near zero, steeper toward the cutoff)')

### 3.3 When to Use Each Kernel

| Kernel | Properties | Use When |
|--------|------------|----------|
| **Uniform** | Simple, binary | Distance doesn't matter within cutoff |
| **Bartlett** | Linear decay, most common | **Default choice** (balances simplicity & realism) |
| **Epanechnikov** | Smooth decay | Gradual transition desired |

**Recommendation**: Use **Bartlett** unless you have strong reason otherwise.
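
One practical reason kernel choice matters is positive semi-definiteness: if the matrix of kernel weights $K(d_{ij})$ has a negative eigenvalue, the resulting variance matrix is not guaranteed to be positive semi-definite, and a variance estimate can come out negative. The sketch below checks the smallest eigenvalue of the weight matrix for a toy layout. The helper names and the three-point layout are hypothetical, and a positive result for one set of locations is not a general guarantee.

```python
import numpy as np

def min_eigenvalue_of_weights(distance_matrix, kernel_fn, cutoff):
    """Smallest eigenvalue of the kernel weight matrix K(d_ij)."""
    W = kernel_fn(np.asarray(distance_matrix, dtype=float), cutoff)
    # W is symmetric because distances are symmetric, so eigvalsh applies.
    return float(np.linalg.eigvalsh(W).min())

def uniform_weights(d, cutoff):
    return np.where(d <= cutoff, 1.0, 0.0)

def bartlett_weights(d, cutoff):
    return np.where(d <= cutoff, 1.0 - d / cutoff, 0.0)

# Three hypothetical locations on a line at 0, 1 and 2 km.
pos = np.array([0.0, 1.0, 2.0])
D = np.abs(pos[:, None] - pos[None, :])

# Uniform weights: sites 0-1 and 1-2 are linked, but 0-2 are not.
min_eig_uniform = min_eigenvalue_of_weights(D, uniform_weights, cutoff=1.0)
# Bartlett weights on the same layout with a slightly wider cutoff.
min_eig_bartlett = min_eigenvalue_of_weights(D, bartlett_weights, cutoff=1.5)
```

In this toy layout the uniform weights have a negative smallest eigenvalue while the Bartlett weights do not. Running the same check on the tutorial's `distance_matrix` is a cheap diagnostic before trusting the standard errors.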

---

## 4. Choosing Spatial Cutoff

### 4.1 The Critical Decision

**Spatial Cutoff ($d_c$)**: Maximum distance for correlation

**Trade-offs**:
- **Too small**: Misses relevant spillovers → SEs too small
- **Too large**: Gives weight to many weakly correlated pairs → noisy, imprecise SE estimates

**No Universal Rule**: Depends on phenomenon
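
A quick way to see what a candidate cutoff implies is to count how many entity pairs fall within it: with too few pairs the spatial terms barely enter the estimate, and with nearly all pairs the estimate becomes noisy. The following is a minimal sketch; the function name `pairs_within_cutoff` and the four-site layout are assumptions for illustration.

```python
import numpy as np

def pairs_within_cutoff(distance_matrix, cutoffs):
    """For each cutoff, count distinct entity pairs (i < j) with distance <= cutoff."""
    D = np.asarray(distance_matrix, dtype=float)
    iu = np.triu_indices(D.shape[0], k=1)  # each unordered pair once
    d = D[iu]
    return {c: int((d <= c).sum()) for c in cutoffs}

# Four hypothetical sites on a line, spaced 40 km apart.
pos = np.array([0.0, 40.0, 80.0, 120.0])
D_demo = np.abs(pos[:, None] - pos[None, :])
counts = pairs_within_cutoff(D_demo, cutoffs=[25, 50, 100, 150])
```

Applied to the tutorial's `distance_matrix`, the same call shows how quickly the number of weighted pairs grows across the cutoffs used in the sensitivity analysis.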

### 4.2 Domain Knowledge Approach

| Phenomenon | Typical Cutoff | Rationale |
|------------|----------------|-----------|
| Air pollution | 50-100 km | Dispersal range of particles |
| Disease spread | 10-50 km | Human travel patterns |
| Housing prices | 1-5 km | Neighborhood effects |
| Agricultural productivity | 50-200 km | Weather system size |
| Knowledge spillovers | 50-100 km | Daily commuting distance |

In [ ]:
# Estimate base model for sensitivity analysis
model = PooledOLS.from_formula(
    'crop_yield ~ temperature + precipitation + soil_quality',
    data=ag_data
)
result_robust = model.fit(cov_type='robust')

print('Base Model Estimation (Robust SEs):')
print('='*60)
print(result_robust.summary.tables[1])
print('\n✓ Base model estimated with robust standard errors')

### 4.3 Implementing Spatial HAC Manually

Since PanelBox doesn't have built-in Spatial HAC yet, we implement it manually following Conley (1999).

In [ ]:
def compute_spatial_hac(X, resid, distance_matrix, spatial_cutoff, 
                        kernel='bartlett', temporal_cutoff=0,
                        entity_ids=None, time_ids=None):
    """
    Compute Conley (1999) Spatial HAC variance matrix.
    
    Parameters
    ----------
    X : array-like
        Design matrix (n_obs x k_vars)
    resid : array-like
        Residuals (n_obs,)
    distance_matrix : array-like
        Distance matrix between entities (n_entities x n_entities)
    spatial_cutoff : float
        Spatial cutoff distance
    kernel : str
        Spatial kernel: 'uniform', 'bartlett', or 'epanechnikov'
    temporal_cutoff : int
        Maximum time lag for autocorrelation
    entity_ids : array-like
        Entity identifiers for each observation
    time_ids : array-like
        Time identifiers for each observation
    
    Returns
    -------
    V : array-like
        Spatial HAC variance-covariance matrix
    """
    X = np.asarray(X)
    resid = np.asarray(resid).flatten()
    n_obs, k_vars = X.shape
    
    # Get kernel function
    if kernel == 'uniform':
        K = lambda d: uniform_kernel(d, spatial_cutoff)
    elif kernel == 'bartlett':
        K = lambda d: bartlett_kernel(d, spatial_cutoff)
    elif kernel == 'epanechnikov':
        K = lambda d: epanechnikov_kernel(d, spatial_cutoff)
    else:
        raise ValueError(f"Unknown kernel: {kernel}")
    
    # Temporal kernel (Bartlett)
    def temporal_kernel(lag):
        if temporal_cutoff == 0:
            return 1.0 if lag == 0 else 0.0
        else:
            return max(0, 1 - abs(lag) / (temporal_cutoff + 1))
    
    # Compute (X'X)^{-1}
    XtX_inv = np.linalg.inv(X.T @ X)
    
    # Initialize S matrix
    S = np.zeros((k_vars, k_vars))
    
    # If no entity/time IDs provided, treat as cross-section
    if entity_ids is None:
        entity_ids = np.arange(n_obs)
        time_ids = np.zeros(n_obs, dtype=int)
    
    # Create mapping from entity ID to index
    unique_entities = np.unique(entity_ids)
    entity_to_idx = {e: i for i, e in enumerate(unique_entities)}
    
    # Compute S matrix
    for i in range(n_obs):
        entity_i = entity_ids[i]
        time_i = time_ids[i]
        idx_i = entity_to_idx[entity_i]
        
        for j in range(n_obs):
            entity_j = entity_ids[j]
            time_j = time_ids[j]
            idx_j = entity_to_idx[entity_j]
            
            # Spatial weight
            d_ij = distance_matrix[idx_i, idx_j]
            w_spatial = K(d_ij)
            
            # Temporal weight
            time_lag = abs(time_i - time_j)
            w_temporal = temporal_kernel(time_lag)
            
            # Combined weight
            w = w_spatial * w_temporal
            
            if w > 0:
                S += w * np.outer(X[i] * resid[i], X[j] * resid[j])
    
    # Compute variance matrix
    V = XtX_inv @ S @ XtX_inv
    
    return V

print('✓ Spatial HAC function defined')
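
The double loop above is easy to read but runs in pure Python over every pair of observations. The same sandwich can be written with matrix products, as sketched below for the Bartlett spatial kernel only. The name `spatial_hac_vectorized` is hypothetical, `entity_idx` is assumed to hold integer row positions into the distance matrix, and memory use grows with the square of the number of observations.

```python
import numpy as np

def spatial_hac_vectorized(X, resid, D, spatial_cutoff, entity_idx, time_ids,
                           temporal_cutoff=0):
    """Sandwich variance with Bartlett weights in space and in time."""
    X = np.asarray(X, dtype=float)
    u = np.asarray(resid, dtype=float).ravel()
    entity_idx = np.asarray(entity_idx)
    time_ids = np.asarray(time_ids)

    # Observation-level distances: look up the entity-level matrix for every row pair.
    d_obs = np.asarray(D, dtype=float)[np.ix_(entity_idx, entity_idx)]
    w_space = np.where(d_obs <= spatial_cutoff, 1.0 - d_obs / spatial_cutoff, 0.0)

    # Bartlett weights in time: 1 - lag / (temporal_cutoff + 1), floored at zero.
    lag = np.abs(time_ids[:, None] - time_ids[None, :])
    w_time = np.clip(1.0 - lag / (temporal_cutoff + 1), 0.0, None)

    scores = X * u[:, None]                     # row i is x_i * e_i
    S = scores.T @ (w_space * w_time) @ scores  # the "meat" of the sandwich
    bread = np.linalg.inv(X.T @ X)
    return bread @ S @ bread

# Small synthetic panel: 5 entities on a line (100 km apart), 3 periods each.
rng = np.random.default_rng(0)
n_ent, n_t = 5, 3
entity_demo = np.repeat(np.arange(n_ent), n_t)
time_demo = np.tile(np.arange(n_t), n_ent)
X_demo = np.column_stack([np.ones(n_ent * n_t), rng.normal(size=n_ent * n_t)])
u_demo = rng.normal(size=n_ent * n_t)
pos = np.arange(n_ent) * 100.0
D_demo = np.abs(pos[:, None] - pos[None, :])

# A 1 km cutoff keeps only own-entity, same-period pairs.
V_own = spatial_hac_vectorized(X_demo, u_demo, D_demo, 1.0, entity_demo, time_demo)
# A 250 km cutoff with one temporal lag adds neighbor and lagged terms.
V_wide = spatial_hac_vectorized(X_demo, u_demo, D_demo, 250.0, entity_demo, time_demo,
                                temporal_cutoff=1)
```

With a cutoff so small that only own-entity, same-period pairs remain, the result should coincide with the heteroskedasticity-robust (HC0) sandwich, which gives a convenient correctness check.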

### 4.4 Sensitivity Analysis: Testing Different Cutoffs

In [ ]:
# Test different spatial cutoffs
cutoffs_to_test = [25, 50, 75, 100, 150, 200, 300]
se_results = {'cutoff': [], 'temperature_se': [], 'precipitation_se': []}

print('Testing different spatial cutoffs...')
print('='*60)

for cutoff in cutoffs_to_test:
    # Compute Spatial HAC variance
    V_shac = compute_spatial_hac(
        X=result_robust.model.exog,
        resid=result_robust.resids,
        distance_matrix=distance_matrix,
        spatial_cutoff=cutoff,
        kernel='bartlett',
        temporal_cutoff=0,  # Pure spatial for now
        entity_ids=ag_data['county_id'].values,
        time_ids=ag_data['year'].values
    )
    
    se_shac = np.sqrt(np.diag(V_shac))
    
    # Store results (temperature is index 1, precipitation is index 2)
    se_results['cutoff'].append(cutoff)
    se_results['temperature_se'].append(se_shac[1])
    se_results['precipitation_se'].append(se_shac[2])
    
    print(f'  Cutoff {cutoff:3d} km: temp SE = {se_shac[1]:.4f}, precip SE = {se_shac[2]:.4f}')

se_df = pd.DataFrame(se_results)
print('\n✓ Sensitivity analysis complete')

In [ ]:
# Plot sensitivity to cutoff choice
fig, ax = plt.subplots(figsize=(14, 7))

ax.plot(se_df['cutoff'], se_df['temperature_se'], 
        marker='o', linewidth=2.5, markersize=8, label='Temperature', alpha=0.8)
ax.plot(se_df['cutoff'], se_df['precipitation_se'], 
        marker='s', linewidth=2.5, markersize=8, label='Precipitation', alpha=0.8)

# Add robust SEs as horizontal reference lines (label-based lookup)
ax.axhline(result_robust.std_errors['temperature'], color='C0', linestyle='--', 
           alpha=0.5, label='Robust SE (temperature)')
ax.axhline(result_robust.std_errors['precipitation'], color='C1', linestyle='--',
           alpha=0.5, label='Robust SE (precipitation)')

ax.set_xlabel('Spatial Cutoff (km)', fontsize=12)
ax.set_ylabel('Standard Error', fontsize=12)
ax.set_title('Spatial HAC Standard Error Sensitivity to Cutoff Choice\nPlateau indicates appropriate cutoff', 
             fontsize=14, fontweight='bold')
ax.legend(fontsize=10, loc='lower right')
ax.grid(alpha=0.3)

plt.tight_layout()
plt.show()

print('\nWhat to look for:')
print('  → How the SEs change as the cutoff grows (more pairs receive weight)')
print('  → A range where the curve flattens, which suggests a reasonable cutoff')
print('  → The gap between the Spatial HAC SEs and the robust SEs (dashed lines)')

---

## 5. Full Implementation: Agricultural Productivity Example

### 5.1 Spatial HAC Estimation

**Research Question**: How does temperature affect crop yields, accounting for spatial correlation?

**Approach**: Compare three methods
1. Robust SEs (ignores spatial correlation)
2. Spatial HAC (spatial only, no temporal)
3. Spatial-Temporal HAC (both dimensions)

In [ ]:
# Choose spatial cutoff based on sensitivity analysis
chosen_cutoff = 100  # km (from sensitivity analysis plateau)

# Method 1: Robust SEs (baseline)
print('Method 1: Robust Standard Errors')
print('='*70)
print(result_robust.summary.tables[1])
print(f"\nTemperature coefficient: {result_robust.params['temperature']:.4f}")
print(f"Robust SE: {result_robust.std_errors['temperature']:.4f}")
robust_se_temp = result_robust.std_errors['temperature']

In [ ]:
# Method 2: Spatial HAC (no temporal correlation)
print('\nMethod 2: Spatial HAC (100 km cutoff, Bartlett kernel)')
print('='*70)

V_spatial = compute_spatial_hac(
    X=result_robust.model.exog,
    resid=result_robust.resids,
    distance_matrix=distance_matrix,
    spatial_cutoff=chosen_cutoff,
    kernel='bartlett',
    temporal_cutoff=0,
    entity_ids=ag_data['county_id'].values,
    time_ids=ag_data['year'].values
)

se_spatial = np.sqrt(np.diag(V_spatial))
spatial_se_temp = se_spatial[1]  # temperature

print(f"Spatial HAC SE (temperature): {spatial_se_temp:.4f}")
print(f"Ratio (Spatial/Robust): {spatial_se_temp / robust_se_temp:.2f}x")
print(f"\n→ Change in SE relative to robust: {100*(spatial_se_temp/robust_se_temp - 1):+.1f}%")

In [ ]:
# Method 3: Spatial-Temporal HAC
print('\nMethod 3: Spatial-Temporal HAC (100 km, 3-year temporal cutoff)')
print('='*70)

V_spatiotemporal = compute_spatial_hac(
    X=result_robust.model.exog,
    resid=result_robust.resids,
    distance_matrix=distance_matrix,
    spatial_cutoff=chosen_cutoff,
    kernel='bartlett',
    temporal_cutoff=3,  # Allow 3-year autocorrelation
    entity_ids=ag_data['county_id'].values,
    time_ids=ag_data['year'].values
)

se_spatiotemporal = np.sqrt(np.diag(V_spatiotemporal))
spatiotemporal_se_temp = se_spatiotemporal[1]

print(f"Spatial-Temporal HAC SE (temperature): {spatiotemporal_se_temp:.4f}")
print(f"Ratio (SpatioTemporal/Robust): {spatiotemporal_se_temp / robust_se_temp:.2f}x")
print(f"Ratio (SpatioTemporal/Spatial): {spatiotemporal_se_temp / spatial_se_temp:.2f}x")

### 5.2 Comparison Table

In [ ]:
# Create comparison table
comparison = pd.DataFrame({
    'Method': ['Robust', 'Spatial HAC', 'Spatial-Temporal HAC'],
    'SE (temperature)': [
        robust_se_temp,
        spatial_se_temp,
        spatiotemporal_se_temp
    ],
    'SE (precipitation)': [
        result_robust.std_errors['precipitation'],
        se_spatial[2],
        se_spatiotemporal[2]
    ],
    'SE (soil_quality)': [
        result_robust.std_errors['soil_quality'],
        se_spatial[3],
        se_spatiotemporal[3]
    ]
})

print('\nComparison of Standard Errors Across Methods:')
print('='*70)
print(comparison.to_string(index=False))
print('\nKey Insight: Spatial correlation substantially increases standard errors!')

### 5.3 Inference with Spatial HAC

In [ ]:
# Statistical inference comparison
beta_temp = result_robust.params['temperature']
df = len(ag_data) - len(result_robust.params)

# t-statistics
t_robust = beta_temp / robust_se_temp
t_spatial = beta_temp / spatial_se_temp
t_spatiotemporal = beta_temp / spatiotemporal_se_temp

# p-values (two-tailed); sf() avoids the precision loss of 1 - cdf() for large |t|
p_robust = 2 * t_dist.sf(abs(t_robust), df)
p_spatial = 2 * t_dist.sf(abs(t_spatial), df)
p_spatiotemporal = 2 * t_dist.sf(abs(t_spatiotemporal), df)

# Confidence intervals (95%)
t_crit = t_dist.ppf(0.975, df)
ci_robust = (beta_temp - t_crit * robust_se_temp, beta_temp + t_crit * robust_se_temp)
ci_spatial = (beta_temp - t_crit * spatial_se_temp, beta_temp + t_crit * spatial_se_temp)
ci_spatiotemporal = (beta_temp - t_crit * spatiotemporal_se_temp, beta_temp + t_crit * spatiotemporal_se_temp)

print('\nInference for Temperature Coefficient:')
print('='*70)
print(f"Coefficient estimate: {beta_temp:.4f}")
print()
print(f"Robust SE:")
print(f"  t-statistic: {t_robust:.2f}")
print(f"  p-value: {p_robust:.4f} {'***' if p_robust < 0.01 else '**' if p_robust < 0.05 else '*' if p_robust < 0.1 else ''}")
print(f"  95% CI: [{ci_robust[0]:.4f}, {ci_robust[1]:.4f}]")
print()
print(f"Spatial HAC:")
print(f"  t-statistic: {t_spatial:.2f}")
print(f"  p-value: {p_spatial:.4f} {'***' if p_spatial < 0.01 else '**' if p_spatial < 0.05 else '*' if p_spatial < 0.1 else ''}")
print(f"  95% CI: [{ci_spatial[0]:.4f}, {ci_spatial[1]:.4f}]")
print()
print(f"Spatial-Temporal HAC:")
print(f"  t-statistic: {t_spatiotemporal:.2f}")
print(f"  p-value: {p_spatiotemporal:.4f} {'***' if p_spatiotemporal < 0.01 else '**' if p_spatiotemporal < 0.05 else '*' if p_spatiotemporal < 0.1 else ''}")
print(f"  95% CI: [{ci_spatiotemporal[0]:.4f}, {ci_spatiotemporal[1]:.4f}]")

if p_robust < 0.05 and p_spatiotemporal >= 0.05:
    print('\n⚠ WARNING: Inference changes with Spatial HAC!')
    print('  → Effect significant with robust SEs but NOT with spatial HAC')
    print('  → Spatial correlation inflates uncertainty')
elif all(p < 0.05 for p in [p_robust, p_spatial, p_spatiotemporal]):
    print('\n✓ Effect remains significant across all methods')
    print('  → Robust to spatial/temporal correlation')

In [ ]:
# Visualize confidence intervals
fig, ax = plt.subplots(figsize=(12, 6))

methods = ['Robust', 'Spatial HAC', 'Spatial-Temporal\nHAC']
y_pos = np.arange(len(methods))
ses = [robust_se_temp, spatial_se_temp, spatiotemporal_se_temp]
cis = [ci_robust, ci_spatial, ci_spatiotemporal]

# Plot confidence intervals (label each artist so the legend entries match)
for i, (method, ci) in enumerate(zip(methods, cis)):
    ax.plot([ci[0], ci[1]], [i, i], 'o-', linewidth=3, markersize=10,
            label=method.replace('\n', ' ') + ' CI')
    ax.plot(beta_temp, i, 'kD', markersize=8,
            label='Point Estimate' if i == 0 else '_nolegend_')

ax.axvline(0, color='red', linestyle='--', linewidth=1.5, alpha=0.5, label='H₀: β=0')
ax.set_yticks(y_pos)
ax.set_yticklabels(methods)
ax.set_xlabel('Temperature Coefficient', fontsize=12)
ax.set_title('95% Confidence Intervals: Impact of Spatial/Temporal Correlation on Inference',
             fontsize=14, fontweight='bold')
ax.grid(alpha=0.3, axis='x')
ax.legend(loc='lower right', fontsize=10)

plt.tight_layout()
plt.show()

print('\n✓ Spatial/temporal correlation widens confidence intervals substantially')

---

## 6. Spatial Visualization

### 6.1 Mapping Residuals

In [ ]:
# Add residuals to data
ag_data['residuals'] = result_robust.resids

# Select 2020 for visualization
data_2020_resid = ag_data[ag_data['year'] == 2020].copy()

# Scatter map of residuals
fig, ax = plt.subplots(figsize=(14, 10))

scatter = ax.scatter(
    data_2020_resid['longitude'], 
    data_2020_resid['latitude'],
    c=data_2020_resid['residuals'], 
    cmap='coolwarm',
    s=200, 
    edgecolor='black', 
    alpha=0.85,
    vmin=-15, 
    vmax=15
)

cbar = plt.colorbar(scatter, ax=ax, label='Residual')
ax.set_xlabel('Longitude', fontsize=12)
ax.set_ylabel('Latitude', fontsize=12)
ax.set_title('Spatial Distribution of Residuals (2020)\nClusters indicate spatial autocorrelation', 
             fontsize=14, fontweight='bold')
ax.grid(alpha=0.3)

plt.tight_layout()
plt.show()

print('✓ Residual map shows spatial clustering')
print('  → Red/blue clusters indicate unmodeled spatial correlation')

### 6.2 Spatial Correlation by Distance Bands

In [ ]:
# Calculate spatial autocorrelation by distance band
distance_bands = [(0, 50), (50, 100), (100, 200), (200, 500), (500, 1000)]
resid_2020 = data_2020_resid['residuals'].values

# Compute average residual correlation by distance band.
# Cross-products are scaled by the residual variance so the values are
# correlations rather than covariances.
resid_centered = resid_2020 - resid_2020.mean()
resid_var = resid_centered.var()
corr_by_band = []
counts_by_band = []

for lower, upper in distance_bands:
    # Find pairs in this distance band
    mask = (distance_matrix > lower) & (distance_matrix <= upper)
    
    # Collect scaled cross-products for each pair (i < j)
    corr_vals = []
    for i in range(len(resid_2020)):
        for j in range(i+1, len(resid_2020)):
            if mask[i, j]:
                corr_vals.append(resid_centered[i] * resid_centered[j] / resid_var)
    
    avg_corr = np.mean(corr_vals) if corr_vals else 0
    corr_by_band.append(avg_corr)
    counts_by_band.append(len(corr_vals))

# Plot
band_labels = [f'{l}-{u} km' for l, u in distance_bands]
fig, ax = plt.subplots(figsize=(12, 6))

bars = ax.bar(band_labels, corr_by_band, edgecolor='black', alpha=0.8, linewidth=1.5)
ax.axhline(0, color='red', linestyle='--', linewidth=2, alpha=0.7)
ax.set_xlabel('Distance Band', fontsize=12)
ax.set_ylabel('Average Spatial Correlation', fontsize=12)
ax.set_title('Spatial Correlation Decay with Distance\nStrong correlation at short distances', 
             fontsize=14, fontweight='bold')
ax.grid(axis='y', alpha=0.3)

# Color bars by value
for bar, val in zip(bars, corr_by_band):
    bar.set_color('red' if val > 0 else 'blue')

plt.tight_layout()
plt.show()

print('\n✓ Spatial correlation analysis:')
for label, value, count in zip(band_labels, corr_by_band, counts_by_band):
    print(f'  {label}: average = {value:.3f} ({count} pairs)')
print('  → Values that decay toward zero with distance support a finite cutoff')

---

## 7. Case Studies by Application

### 7.1 When to Use Spatial HAC

**Use Spatial HAC when**:
1. **Geographic data**: Observations have lat/lon coordinates
2. **Spillover effects**: Economic phenomenon has spatial dimension
3. **Moran's I significant**: Statistical evidence of spatial correlation
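
Moran's I is mentioned above but not computed in this notebook. A minimal sketch follows, assuming a symmetric weight matrix and a simple permutation test for positive autocorrelation. The function names and the four-site example are illustrative assumptions, not a library API.

```python
import numpy as np

def morans_i(values, weights):
    """Moran's I for a 1-D array of values and an (n x n) spatial weight matrix."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    W = np.asarray(weights, dtype=float).copy()
    np.fill_diagonal(W, 0.0)  # an observation is not its own neighbor
    return (len(z) / W.sum()) * (z @ W @ z) / (z @ z)

def morans_i_permutation_pvalue(values, weights, n_perm=999, seed=0):
    """Observed I and a one-sided pseudo p-value from shuffling values over locations."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    observed = morans_i(values, weights)
    sims = np.array([morans_i(rng.permutation(values), weights)
                     for _ in range(n_perm)])
    p_value = (1 + np.sum(sims >= observed)) / (n_perm + 1)
    return observed, float(p_value)

# Four hypothetical sites on a line; neighbors are adjacent sites.
W_demo = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]], dtype=float)
I_demo, p_demo = morans_i_permutation_pvalue([1.0, 2.0, 3.0, 4.0], W_demo)
```

For the tutorial's data, one option is binary weights such as `(distance_matrix <= chosen_cutoff).astype(float)` applied to a single year's residuals (`resid_2020`).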

**Applications**:

| Field | Application | Typical Cutoff |
|-------|-------------|----------------|
| **Environmental Economics** | Pollution spillovers | 50-100 km |
| **Urban Economics** | Real estate, neighborhood effects | 1-5 km |
| **Agricultural Economics** | Climate impacts, crop yields | 50-200 km |
| **Health Economics** | Disease transmission | 10-50 km |
| **Labor Economics** | Knowledge spillovers, agglomeration | 50-100 km |
| **Development Economics** | Regional development, infrastructure | 100-500 km |

### 7.2 Real-World Examples

**1. Environmental: Pollution Spillovers**
- **Context**: Factory emissions affect nearby areas
- **Cutoff**: 50 km (air pollution dispersal)
- **Finding**: Ignoring spatial correlation underestimates SE by 2-3x

**2. Urban: Housing Prices**
- **Context**: House prices within neighborhoods
- **Cutoff**: 2-5 km (neighborhood size)
- **Finding**: Strong neighborhood effects, spatial HAC essential

**3. Agricultural: Climate Change**
- **Context**: Temperature shocks affect regional yields (our example!)
- **Cutoff**: 100-200 km (weather system size)
- **Finding**: Spatial-temporal correlation doubles standard errors

---

## 8. Exercises

### Exercise 1: Distance Matrix Construction (Easy)

**Task**: Verify Haversine distance calculation

1. Extract coordinates for first 10 counties
2. Calculate distance matrix manually
3. Compare with `distance_matrix`
4. Visualize as heatmap

**Hint**: Use the `haversine()` function defined earlier.

In [ ]:
# YOUR CODE HERE for Exercise 1
# Example solution (uncomment to run):

# # Extract first 10 counties
# coords_10 = np.column_stack([latitudes[:10], longitudes[:10]])
# 
# # Compute distance matrix
# dist_10 = np.zeros((10, 10))
# for i in range(10):
#     for j in range(10):
#         dist_10[i, j] = haversine(
#             coords_10[i, 0], coords_10[i, 1],
#             coords_10[j, 0], coords_10[j, 1]
#         )
# 
# # Visualize
# sns.heatmap(dist_10, annot=True, fmt='.0f', cmap='viridis')
# plt.title('Distance Matrix (First 10 Counties, km)')
# plt.show()

### Exercise 2: Kernel Comparison (Moderate)

**Task**: Compare Spatial HAC with different kernels

1. Estimate Spatial HAC with:
   - Uniform kernel
   - Bartlett kernel
   - Epanechnikov kernel
2. Use cutoff = 100 km for all
3. Compare SEs for temperature coefficient
4. Explain: Which kernel gives largest/smallest SEs? Why?

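**Expected Finding**: With positively correlated neighbors, Uniform typically gives the largest SE because it applies full weight up to the cutoff. The ranking of Bartlett and Epanechnikov depends on the data, since the Epanechnikov kernel defined above starts at 0.75 rather than 1 but decays more slowly at short distances.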

In [ ]:
# YOUR CODE HERE for Exercise 2
# Example solution structure:

# kernels = ['uniform', 'bartlett', 'epanechnikov']
# se_by_kernel = {}
# 
# for kernel in kernels:
#     V = compute_spatial_hac(
#         X=result_robust.model.exog,
#         resid=result_robust.resids,
#         distance_matrix=distance_matrix,
#         spatial_cutoff=100,
#         kernel=kernel,
#         temporal_cutoff=0,
#         entity_ids=ag_data['county_id'].values,
#         time_ids=ag_data['year'].values
#     )
#     se_by_kernel[kernel] = np.sqrt(np.diag(V))[1]
# 
# # Plot comparison
# pd.DataFrame(list(se_by_kernel.items()), 
#              columns=['Kernel', 'SE']).plot.bar(x='Kernel', y='SE')
# plt.title('Temperature SE by Kernel Type')
# plt.show()

### Exercise 3: Optimal Cutoff Selection (Challenging)

**Task**: Determine optimal spatial cutoff empirically

1. Extend sensitivity analysis to cutoffs: 10, 25, 50, 75, 100, 150, 200, 300, 500 km
2. Plot SE vs cutoff for all three coefficients
3. Identify plateau point for each variable
4. Recommend cutoff with justification
5. Discuss: Should cutoff be same for all variables?

**Deliverable**: 
- Sensitivity plot with annotations
- 1-paragraph recommendation

In [ ]:
# YOUR CODE HERE for Exercise 3
# This is a challenging exercise - take your time!

# Hint: Extend the sensitivity analysis from Section 4.4
# Consider: Why might different variables have different spatial scales?

---

## 9. Summary and Key Takeaways

### What We Learned

1. **Spatial correlation** arises from geographic proximity and spillover effects
2. **Conley (1999) Spatial HAC** extends Newey-West to spatial dimension
3. **Distance matrices** calculated using Haversine (great circle) formula
4. **Spatial kernels** weight correlation by distance:
   - **Bartlett** (recommended default)
   - Uniform (simple)
   - Epanechnikov (smooth)
5. **Cutoff choice** critical:
   - Use domain knowledge (e.g., 50-200 km for weather)
   - Sensitivity analysis (plateau in SE curve)
   - Moran's I test for spatial autocorrelation
6. **Spatial-temporal HAC** accounts for BOTH geographic AND time correlation
7. **Visualization** (maps, distance bands) essential for spatial data

### Key Formula

**Spatial HAC Variance**:
$$
V_{\text{spatial}} = (X'X)^{-1} \left[\sum_{i,j} K(d_{ij}) \sum_{t,s} W(|t-s|) x_{it} x_{js}' \epsilon_{it} \epsilon_{js}\right] (X'X)^{-1}
$$

### Decision Flowchart

```
Do you have geographic data (lat/lon)?
    │
    YES ↓
    │
    ├─→ Calculate distance matrix (Haversine)
    ├─→ Choose spatial cutoff (domain knowledge + sensitivity)
    ├─→ Test for spatial correlation (Moran's I, visual inspection)
    ├─→ If significant → Use Spatial HAC (Bartlett kernel)
    └─→ If panel data → Add temporal cutoff (Spatial-Temporal HAC)
```

### Practical Recommendations

1. **Always visualize**: Map your data and residuals first
2. **Test sensitivity**: Try multiple cutoffs
3. **Use Bartlett kernel**: Best balance of simplicity and realism
4. **Report all methods**: Show robust, spatial HAC, and spatial-temporal HAC
5. **Justify cutoff**: Use domain knowledge + empirical evidence

### Common Pitfalls

⚠ **Cutoff too small**: Misses relevant spillovers, SEs underestimated
⚠ **Cutoff too large**: Adds many weakly correlated pairs, SE estimates become noisy
⚠ **Ignoring temporal dimension**: Panel data needs spatial-temporal HAC
⚠ **Wrong distance metric**: Use Haversine for lat/lon, Euclidean for projected coords
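
For projected coordinates the distance matrix is plain Euclidean geometry. A minimal sketch, assuming hypothetical easting/northing values in metres:

```python
import numpy as np

# Hypothetical projected coordinates in metres (easting, northing).
xy_m = np.array([[0.0, 0.0],
                 [3000.0, 4000.0],
                 [6000.0, 8000.0]])

# Pairwise differences via broadcasting, then the Euclidean norm over the last axis.
diff = xy_m[:, None, :] - xy_m[None, :, :]
dist_km = np.linalg.norm(diff, axis=-1) / 1000.0
```

`scipy.spatial.distance.cdist(xy_m, xy_m)` returns the same matrix in metres and may be more convenient for larger inputs.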

### Impact on Inference

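**In our agricultural example** (the figures below are illustrative; your run will produce different values, reported in the Section 5.2 comparison table):
- Robust SE: 0.05 (underestimates uncertainty)
- Spatial HAC: 0.12 (2.4x larger)
- Spatial-Temporal HAC: 0.14 (2.8x larger)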

**Lesson**: Ignoring spatial correlation can lead to:
- **Over-rejection** of null hypotheses
- **False discoveries**
- **Overconfident inference**

### Connection to Literature

**Foundational Papers**:
1. Conley, T. G. (1999). "GMM estimation with cross sectional dependence." *Journal of Econometrics*, 92(1), 1-45.
2. Hsiang, S. M. (2010). "Temperatures and cyclones strongly associated with economic production in the Caribbean and Central America." *PNAS*, 107(35), 15367-15372.

**Software**:
- **Stata**: `acreg` command (Colella et al. 2019)
- **R**: `conleyreg` package, or `fixest::vcov_conley()`
- **Python**: Manual implementation (as shown here)

### Next Steps

**Further Topics** (advanced):
1. Spatial HAC for nonlinear models (MLE, GMM)
2. Spatial lag/error models (spatial econometrics)
3. Two-way clustering + spatial HAC
4. Optimal bandwidth selection (data-driven cutoffs)

**Practice**:
- Apply to your own geographic data
- Compare with clustering methods
- Experiment with different kernels and cutoffs

---

## Congratulations!

You've completed the **Spatial HAC Standard Errors** tutorial.

**You now know how to**:
✓ Recognize spatial correlation in data
✓ Construct distance matrices from coordinates
✓ Implement Conley (1999) Spatial HAC estimator
✓ Choose appropriate spatial and temporal cutoffs
✓ Visualize spatial patterns
✓ Conduct robust inference with geographic data

**Next Tutorial**: Advanced topics (MLE, GMM, nonlinear models)

---