# Solutions: Tutorial 04 - Spatial Fundamentals

**Series**: PanelBox - Fundamentals (Solutions)
**Level**: Intermediate
**Tutorial**: 04_spatial_fundamentals.ipynb

This notebook contains complete solutions to the exercises in Tutorial 04.

---

## Setup and Imports

In [None]:
# Standard libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.spatial import distance_matrix
import scipy
from IPython.display import display

# PanelBox library
import sys
sys.path.append('/home/guhaase/projetos/panelbox')
import panelbox as pb

# Configuration
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (10, 8)
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 4)

print(f"PanelBox version: {pb.__version__}")
print("Setup complete!")

## Create Synthetic Spatial Data

In [None]:
# Create 5x5 grid
n_rows, n_cols = 5, 5
n_regions = n_rows * n_cols

coords = []
region_ids = []
for i in range(n_rows):
    for j in range(n_cols):
        coords.append([j, i])
        region_ids.append(f"R{i*n_cols + j + 1:02d}")

coords = np.array(coords)
spatial_data = pd.DataFrame({
    'region_id': region_ids,
    'x': coords[:, 0],
    'y': coords[:, 1]
})

# Add GDP per capita with spatial pattern
np.random.seed(42)
# Grid coordinates run from 0 to n-1, so the geometric center is (n-1)/2
center_x, center_y = (n_cols - 1) / 2, (n_rows - 1) / 2
distance_from_center = np.sqrt((spatial_data['x'] - center_x)**2 + 
                                (spatial_data['y'] - center_y)**2)
spatial_data['gdp_pc'] = 100 - 5 * distance_from_center + np.random.normal(0, 5, n_regions)

print(f"Created {n_regions} regions in {n_rows}×{n_cols} grid")
display(spatial_data.head())

---

## Exercise 1: Compare Weight Matrices

**Task**: Compare rook, queen, and KNN (k=6) weight matrices.

In [None]:
print("="*70)
print("SOLUTION 1: COMPARING WEIGHT MATRICES")
print("="*70)

# Helper functions
def create_rook_contiguity(n_rows, n_cols):
    n = n_rows * n_cols
    W = np.zeros((n, n))
    
    for i in range(n):
        row_i = i // n_cols
        col_i = i % n_cols
        
        neighbors = [
            (row_i - 1, col_i),
            (row_i + 1, col_i),
            (row_i, col_i - 1),
            (row_i, col_i + 1)
        ]
        
        for row_j, col_j in neighbors:
            if 0 <= row_j < n_rows and 0 <= col_j < n_cols:
                j = row_j * n_cols + col_j
                W[i, j] = 1
    
    return W

def create_queen_contiguity(n_rows, n_cols):
    n = n_rows * n_cols
    W = np.zeros((n, n))
    
    for i in range(n):
        row_i = i // n_cols
        col_i = i % n_cols
        
        for dr in [-1, 0, 1]:
            for dc in [-1, 0, 1]:
                if dr == 0 and dc == 0:
                    continue
                
                row_j = row_i + dr
                col_j = col_i + dc
                
                if 0 <= row_j < n_rows and 0 <= col_j < n_cols:
                    j = row_j * n_cols + col_j
                    W[i, j] = 1
    
    return W

def create_knn_weights(D, k):
    n = D.shape[0]
    W = np.zeros((n, n))
    
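    for i in range(n):
        distances = D[i, :].astype(float)  # astype returns a copy, so D is left untouched
        distances[i] = np.inf  # exclude self explicitly, even if another unit sits at distance 0
        # Stable sort: equidistant candidates are taken in index order, so results are reproducible
        nearest_indices = np.argsort(distances, kind='stable')[:k]
        W[i, nearest_indices] = 1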
    
    return W
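
As a cross-check on the loop-based helpers, the same rook and queen matrices can be built from coordinate differences: on a unit grid, rook neighbors sit at Manhattan distance 1 and queen neighbors at Chebyshev distance 1. The sketch below is self-contained; `grid_contiguity` is a hypothetical name used only for this illustration, not a PanelBox function.

```python
import numpy as np

def grid_contiguity(n_rows, n_cols, kind="rook"):
    """Binary contiguity matrix for a regular grid (cells in row-major order)."""
    rows, cols = np.divmod(np.arange(n_rows * n_cols), n_cols)
    d_row = np.abs(rows[:, None] - rows[None, :])
    d_col = np.abs(cols[:, None] - cols[None, :])
    if kind == "rook":
        # Shared edge: exactly one step along one axis (Manhattan distance 1)
        W = (d_row + d_col) == 1
    elif kind == "queen":
        # Shared edge or corner: Chebyshev distance 1
        W = np.maximum(d_row, d_col) == 1
    else:
        raise ValueError("kind must be 'rook' or 'queen'")
    return W.astype(float)

W_rook_check = grid_contiguity(5, 5, "rook")
W_queen_check = grid_contiguity(5, 5, "queen")

for name, W in [("rook", W_rook_check), ("queen", W_queen_check)]:
    print(name,
          "| symmetric:", np.array_equal(W, W.T),
          "| zero diagonal:", not W.diagonal().any(),
          "| links:", int(W.sum()))
```

If the loop-based helpers are in scope, comparing with `np.array_equal(W_rook_check, create_rook_contiguity(5, 5))` is a quick way to confirm both constructions agree.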

In [None]:
# Step 1: Create all three matrices
W_rook = create_rook_contiguity(n_rows, n_cols)
W_queen = create_queen_contiguity(n_rows, n_cols)

# For KNN, need distance matrix
coords_matrix = spatial_data[['x', 'y']].values
D = distance_matrix(coords_matrix, coords_matrix)
W_knn = create_knn_weights(D, k=6)

print("\nWeight matrices created:")
print(f"  Rook: {W_rook.shape}")
print(f"  Queen: {W_queen.shape}")
print(f"  KNN (k=6): {W_knn.shape}")

In [None]:
# Step 2: Calculate statistics
print("\n" + "="*70)
print("WEIGHT MATRIX STATISTICS")
print("="*70)

def matrix_stats(W, name):
    total_connections = int(W.sum())
    avg_neighbors = W.sum(axis=1).mean()
    density = total_connections / (W.shape[0] * (W.shape[0] - 1))
    
    print(f"\n{name}:")
    print(f"  Total connections: {total_connections}")
    print(f"  Average neighbors per region: {avg_neighbors:.2f}")
    print(f"  Density: {density:.4f} ({100*density:.2f}%)")
    print(f"  Min neighbors: {int(W.sum(axis=1).min())}")
    print(f"  Max neighbors: {int(W.sum(axis=1).max())}")
    
    return total_connections, avg_neighbors, density

rook_stats = matrix_stats(W_rook, "Rook Contiguity")
queen_stats = matrix_stats(W_queen, "Queen Contiguity")
knn_stats = matrix_stats(W_knn, "KNN (k=6)")

In [None]:
# Step 3: Which is most/least connected?
print("\n" + "-"*70)
print("COMPARISON")
print("-"*70)

stats_df = pd.DataFrame({
    'Type': ['Rook', 'Queen', 'KNN (k=6)'],
    'Total Connections': [rook_stats[0], queen_stats[0], knn_stats[0]],
    'Avg Neighbors': [rook_stats[1], queen_stats[1], knn_stats[1]],
    'Density': [rook_stats[2], queen_stats[2], knn_stats[2]]
})

display(stats_df)

most_connected = stats_df.loc[stats_df['Total Connections'].idxmax(), 'Type']
least_connected = stats_df.loc[stats_df['Total Connections'].idxmin(), 'Type']

print(f"\nMost connected: {most_connected}")
print(f"Least connected: {least_connected}")

print("\nInterpretation:")
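# Derive the ranking from the computed totals instead of hard-coding it
ranking = " > ".join(stats_df.sort_values('Total Connections', ascending=False)['Type'])
print(f"  - Connectivity ranking (total connections): {ranking}")
print("  - Queen adds diagonal neighbors to rook, but corner and edge cells still have only 3 or 5")
print("  - KNN gives every region exactly k=6 neighbors (the matrix can be asymmetric)")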
print("  - Rook only considers edge-sharing neighbors (most restrictive)")
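
Two properties of KNN weights on a regular grid are worth checking directly: the matrix need not be symmetric, and equidistant candidates make the neighbor set depend on how ties are broken. The following self-contained sketch rebuilds the 5×5 coordinates and uses a stable sort as the tie-breaking rule, which is an assumption of this example rather than a requirement of KNN.

```python
import numpy as np
from scipy.spatial import distance_matrix

# Rebuild the 5x5 grid coordinates (x = column, y = row), row-major order
xy = np.array([[j, i] for i in range(5) for j in range(5)], dtype=float)
D = distance_matrix(xy, xy)

k = 6
W = np.zeros_like(D)
for i in range(len(D)):
    d = D[i].copy()
    d[i] = np.inf                          # never pick the region itself
    W[i, np.argsort(d, kind="stable")[:k]] = 1

print("Every row has k neighbors:", bool((W.sum(axis=1) == k).all()))
print("Total connections:", int(W.sum()))
print("Symmetric:", np.array_equal(W, W.T))

# Ties at the center cell (index 12): count candidates at the two shortest distances
center = 12
n_at_1 = int(np.isclose(D[center], 1.0).sum())
n_at_sqrt2 = int(np.isclose(D[center], np.sqrt(2)).sum())
print(f"Center cell: {n_at_1} candidates at distance 1, {n_at_sqrt2} at sqrt(2)")
```

With k=6 the center cell takes all four edge neighbors and only two of its four diagonal neighbors, so which diagonals appear in the plot depends on the tie-breaking rule.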

In [None]:
# Step 4: Visualize network for one focal region
focal_idx = 12  # Center region (R13)

fig, axes = plt.subplots(1, 3, figsize=(18, 5))

matrices = [W_rook, W_queen, W_knn]
titles = ['Rook', 'Queen', 'KNN (k=6)']

for ax, W, title in zip(axes, matrices, titles):
    # Plot all regions
    ax.scatter(spatial_data['x'], spatial_data['y'], s=150,
              c='lightgray', edgecolors='black', linewidth=1.5, zorder=2)
    
    # Highlight focal region
    ax.scatter(spatial_data.loc[focal_idx, 'x'],
              spatial_data.loc[focal_idx, 'y'],
              s=300, c='red', edgecolors='black', linewidth=2, zorder=3,
              label='Focal region')
    
    # Draw connections
    num_neighbors = 0
    for j in range(n_regions):
        if W[focal_idx, j] == 1:
            ax.plot([spatial_data.loc[focal_idx, 'x'], spatial_data.loc[j, 'x']],
                   [spatial_data.loc[focal_idx, 'y'], spatial_data.loc[j, 'y']],
                   'b-', linewidth=2, alpha=0.6, zorder=1)
            ax.scatter(spatial_data.loc[j, 'x'], spatial_data.loc[j, 'y'],
                      s=200, c='lightblue', edgecolors='black', linewidth=2, zorder=2)
            num_neighbors += 1
    
    ax.set_xlabel('X Coordinate', fontsize=11, fontweight='bold')
    ax.set_ylabel('Y Coordinate', fontsize=11, fontweight='bold')
    ax.set_title(f'{title}\n({num_neighbors} neighbors)', fontsize=12, fontweight='bold')
    ax.legend()
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\nFocal region: {spatial_data.loc[focal_idx, 'region_id']} (center of grid)")
print(f"  Rook neighbors: 4 (up, down, left, right)")
print(f"  Queen neighbors: 8 (rook + diagonals)")
print(f"  KNN neighbors: 6 (closest regions by distance)")

---

## Exercise 2: Custom Distance Function

**Task**: Create distance-based weight matrix with exponential decay.

In [None]:
print("="*70)
print("SOLUTION 2: EXPONENTIAL DISTANCE DECAY WEIGHTS")
print("="*70)

# Step 1: Create exponential decay weights
alpha = 0.3
W_exp = np.exp(-alpha * D)
np.fill_diagonal(W_exp, 0)  # No self-connection

print(f"\nFormula: w_ij = exp(-{alpha} × d_ij)")
print(f"\nWeight matrix created: {W_exp.shape}")
print(f"Total connections (non-zero): {np.sum(W_exp > 0)}")

print("\nFirst 5×5 block:")
display(pd.DataFrame(W_exp[:5, :5],
                    index=spatial_data['region_id'][:5],
                    columns=spatial_data['region_id'][:5]))

In [None]:
# Step 2: Row-normalize
def row_normalize(W):
    row_sums = W.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1
    return W / row_sums

W_exp_row = row_normalize(W_exp)

print("\n" + "-"*70)
print("ROW-NORMALIZED EXPONENTIAL WEIGHTS")
print("-"*70)

print("\nRow-normalized matrix (first 5×5):")
display(pd.DataFrame(W_exp_row[:5, :5],
                    index=spatial_data['region_id'][:5],
                    columns=spatial_data['region_id'][:5]))

# Verify row sums
row_sums = W_exp_row.sum(axis=1)
print(f"\nRow sums (first 10): {row_sums[:10]}")
print(f"All row sums = 1? {np.allclose(row_sums, 1)}")
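
The zero-row guard in `row_normalize` matters once a unit has no neighbors. The toy matrix below is hypothetical and only illustrates two side effects: an isolated unit keeps a row sum of 0 (so an `np.allclose(row_sums, 1)` check would fail), and row normalization generally breaks symmetry.

```python
import numpy as np

def row_normalize(W):
    """Divide each row by its sum; rows that sum to zero are left as zeros."""
    row_sums = W.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1      # avoid 0/0 for units with no neighbors
    return W / row_sums

# Hypothetical 4-unit example: unit 3 has no neighbors
W_toy = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
], dtype=float)

W_toy_row = row_normalize(W_toy)
print(W_toy_row)
print("Row sums:", W_toy_row.sum(axis=1))
print("Still symmetric:", np.array_equal(W_toy_row, W_toy_row.T))
```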

In [None]:
# Step 3: Compare with inverse distance weights
W_inv = 1 / (D + np.eye(n_regions))
np.fill_diagonal(W_inv, 0)
W_inv_row = row_normalize(W_inv)

print("\n" + "-"*70)
print("COMPARISON: EXPONENTIAL vs INVERSE DISTANCE")
print("-"*70)

# Compare weights for one region
focal_idx = 12
comparison_df = pd.DataFrame({
    'Distance': D[focal_idx, :],
    'Exp Decay': W_exp_row[focal_idx, :],
    'Inv Distance': W_inv_row[focal_idx, :]
}, index=spatial_data['region_id'])

comparison_df = comparison_df[comparison_df['Distance'] > 0].sort_values('Distance')
print(f"\nWeights from {spatial_data.loc[focal_idx, 'region_id']} (sorted by distance):")
display(comparison_df.head(10))

# Visualize decay functions
fig, ax = plt.subplots(figsize=(10, 6))

distances = comparison_df['Distance'].values
ax.scatter(distances, comparison_df['Exp Decay'], label='Exponential decay', 
          alpha=0.7, s=60, marker='o')
ax.scatter(distances, comparison_df['Inv Distance'], label='Inverse distance', 
          alpha=0.7, s=60, marker='^')

# Plot smooth curves, scaled by the focal region's actual row sums so that
# they pass through the row-normalized scatter points
d_range = np.linspace(distances.min(), distances.max(), 100)
ax.plot(d_range, np.exp(-alpha * d_range) / W_exp[focal_idx].sum(),
       'b-', linewidth=2, alpha=0.5, label='Exp (normalized)')
ax.plot(d_range, (1 / d_range) / W_inv[focal_idx].sum(),
       'r--', linewidth=2, alpha=0.5, label='Inv (normalized)')

ax.set_xlabel('Distance', fontsize=12, fontweight='bold')
ax.set_ylabel('Weight', fontsize=12, fontweight='bold')
ax.set_title('Distance Decay Comparison', fontsize=14, fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("\nInterpretation:")
print("  - Inverse distance: steepest drop at short range (weight halves from d=1 to d=2)")
print(f"  - Exponential decay (alpha={alpha}): flatter over this grid's distances; a larger alpha steepens it")
print("  - Choice depends on economic theory about spatial spillovers")
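
How fast each scheme discounts distance depends on `alpha` and on the range of distances in the data. One way to see this is to express each weight relative to the nearest neighbor's weight. The sketch below uses the distances that occur around the center of a unit grid and the same `alpha = 0.3` as the solution; it is an illustration, not part of the original exercise.

```python
import numpy as np

alpha = 0.3                                        # same decay rate as in the solution
d = np.array([1.0, np.sqrt(2), 2.0, np.sqrt(8)])   # distances from the center of a 5x5 unit grid

w_exp = np.exp(-alpha * d)
w_inv = 1.0 / d

# Weight relative to the nearest neighbor (d = 1). Row normalization rescales
# each scheme by a constant, so these ratios are unaffected by it.
rel_exp = w_exp / w_exp[0]
rel_inv = w_inv / w_inv[0]

for di, we, wi in zip(d, rel_exp, rel_inv):
    print(f"d = {di:.3f}   exp: {we:.3f}   inverse: {wi:.3f}")
```

Over this range the inverse-distance ratios fall faster than the exponential ones; raising `alpha` makes the exponential scheme steeper.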

---

## Exercise 3: Spatial Lag with Real Data

**Task**: Use Grunfeld dataset to compute spatial lag of investment.

In [None]:
print("="*70)
print("SOLUTION 3: SPATIAL LAG WITH GRUNFELD DATA")
print("="*70)

# Load Grunfeld data
data_path = '/home/guhaase/projetos/panelbox/examples/datasets/grunfeld.csv'
grunfeld = pd.read_csv(data_path)

print(f"\nGrunfeld dataset loaded: {grunfeld.shape[0]} observations")
print(f"Variables: {list(grunfeld.columns)}")
print(f"Firms: {grunfeld['firm'].nunique()}")
print(f"Years: {grunfeld['year'].min()} to {grunfeld['year'].max()}")

In [None]:
# Step 1: Assume firms can be ordered by ID
# Get unique firms
firms = sorted(grunfeld['firm'].unique())
n_firms = len(firms)
print(f"\nFirms: {firms}")
print(f"Number of firms: {n_firms}")

In [None]:
# Step 2: Create KNN weight matrix based on value similarity
# Use firm averages across years
firm_avg = grunfeld.groupby('firm')['value'].mean().sort_index()

print("\nFirm average values:")
display(firm_avg)

# Create distance matrix based on value differences
value_matrix = firm_avg.values.reshape(-1, 1)
D_value = np.abs(value_matrix - value_matrix.T)

print(f"\nValue-based distance matrix: {D_value.shape}")
print("\nDistance matrix (first 5×5):")
display(pd.DataFrame(D_value[:5, :5],
                    index=firm_avg.index[:5],
                    columns=firm_avg.index[:5]))

# Create KNN weights (k=3)
k = 3
W_value_knn = create_knn_weights(D_value, k)

# Symmetrize
W_value_knn = np.maximum(W_value_knn, W_value_knn.T)

# Row-normalize
W_value_knn_row = row_normalize(W_value_knn)

print(f"\nKNN weight matrix (k={k}) created and symmetrized")
print(f"Row-normalized: {np.allclose(W_value_knn_row.sum(axis=1), 1)}")
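
Symmetrizing with `np.maximum(W, W.T)` links two units if either one picked the other, so units no longer have exactly k neighbors afterwards. The values below are hypothetical (they are not the Grunfeld averages) and the stable sort is an assumed tie-breaking rule.

```python
import numpy as np

# Hypothetical average firm values (not the Grunfeld numbers)
values = np.array([10.0, 12.0, 13.0, 40.0, 45.0, 100.0])
D = np.abs(values[:, None] - values[None, :])

k = 2
W = np.zeros_like(D)
for i in range(len(values)):
    d = D[i].copy()
    d[i] = np.inf                          # exclude self
    W[i, np.argsort(d, kind="stable")[:k]] = 1

W_sym = np.maximum(W, W.T)                 # link i-j if either picked the other

before = W.sum(axis=1).astype(int)
after = W_sym.sum(axis=1).astype(int)
print("Neighbors before symmetrizing:", before)
print("Neighbors after symmetrizing: ", after)
```

Units that many others consider "close" gain extra neighbors, which changes how much each neighbor counts once the matrix is row-normalized.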

In [None]:
# Step 3: Compute spatial lag of investment
# Use firm averages for this exercise
invest_avg = grunfeld.groupby('firm')['invest'].mean().sort_index()

# Spatial lag: W × y
invest_lag = W_value_knn_row @ invest_avg.values

results_df = pd.DataFrame({
    'Firm': firm_avg.index,
    'Investment': invest_avg.values,
    'Investment_Lag': invest_lag,
    'Value': firm_avg.values
})

print("\n" + "="*70)
print("INVESTMENT AND SPATIAL LAG")
print("="*70)
display(results_df)

print("\nInterpretation:")
print("  Investment_Lag = weighted average of investment in similar firms")
print("  (similar = close in terms of firm value)")
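
A small worked example makes the "weighted average of neighbors" reading concrete. The three-unit matrix and values are hypothetical. The second part sketches one way to apply the same row-normalized `W` period by period when the data are kept as a panel instead of firm averages.

```python
import numpy as np

# Hypothetical 3-unit example: unit 0 neighbors 1 and 2; units 1 and 2 neighbor only 0
W = np.array([
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
])
y = np.array([10.0, 20.0, 40.0])

# Cross-section: unit 0 gets the mean of 20 and 40; units 1 and 2 get unit 0's value
y_lag = W @ y
print(y_lag)

# Panel: rows are periods, columns are units; Y @ W.T applies W within each period
Y = np.array([
    [10.0, 20.0, 40.0],
    [12.0, 18.0, 44.0],
])
Y_lag = Y @ W.T
print(Y_lag)
```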

In [None]:
# Step 4: Plot investment vs spatial lag
fig, ax = plt.subplots(figsize=(10, 6))

ax.scatter(results_df['Investment'], results_df['Investment_Lag'], 
          alpha=0.7, s=100, edgecolors='k', linewidth=1.5)

# Add firm labels
for idx, row in results_df.iterrows():
    ax.annotate(row['Firm'], 
               xy=(row['Investment'], row['Investment_Lag']),
               xytext=(5, 5), textcoords='offset points',
               fontsize=8, alpha=0.7)

# Add regression line
from scipy.stats import linregress
slope, intercept, r_value, p_value, std_err = linregress(
    results_df['Investment'], results_df['Investment_Lag'])

x_line = np.linspace(results_df['Investment'].min(), 
                     results_df['Investment'].max(), 100)
y_line = slope * x_line + intercept
ax.plot(x_line, y_line, 'r--', linewidth=2, 
       label=f'Slope = {slope:.3f} (p={p_value:.4f})')

ax.set_xlabel('Investment', fontsize=12, fontweight='bold')
ax.set_ylabel('Spatial Lag of Investment', fontsize=12, fontweight='bold')
ax.set_title('Moran Scatterplot: Investment vs Spatial Lag\n(Neighbors = Similar Firms by Value)', 
            fontsize=14, fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

if slope > 0 and p_value < 0.05:
    print("\nResult: Positive spatial autocorrelation")
    print("  → Firms with similar values tend to have similar investment levels")
    print("  → Suggests clustering in investment behavior")
else:
    print("\nResult: No significant spatial autocorrelation")
    print("  → Investment patterns not strongly related to firm value similarity")
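
With a row-normalized weight matrix, the OLS slope of the spatial lag on the variable (with an intercept) equals Moran's I, which is why the fitted line in a Moran scatterplot is informative. The sketch below checks this numerically on hypothetical data: a ring of 10 units and random values. Note that the p-value reported by `linregress` treats observations as independent, so it is only a rough guide here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 10 units on a ring, each linked to its two adjacent units
n = 10
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 1
    W[i, (i + 1) % n] = 1
W = W / W.sum(axis=1, keepdims=True)       # row-normalize

y = rng.normal(size=n)
y_lag = W @ y

# OLS slope of the spatial lag on y (degree-1 fit includes an intercept)
slope = np.polyfit(y, y_lag, 1)[0]

# Moran's I; with row-normalized W, S0 = n, so the n / S0 factor equals 1
z = y - y.mean()
morans_i_value = (n / W.sum()) * (z @ W @ z) / (z @ z)

print(f"OLS slope: {slope:.6f}")
print(f"Moran's I: {morans_i_value:.6f}")
```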

---

## Exercise 4: Test for Spatial Autocorrelation

**Task**: Test if investment exhibits spatial autocorrelation.

In [None]:
print("="*70)
print("SOLUTION 4: MORAN'S I TEST")
print("="*70)

# Step 1: Use firm averages (N=10)
print(f"\nData: Firm averages across years")
print(f"N = {n_firms} firms")
print(f"Variable: Investment (average)")

In [None]:
# Step 2: Create contiguity matrix
# Assume firms 1-5 in one group, 6-10 in another
W_groups = np.zeros((n_firms, n_firms))

# Group 1: firms 0-4 (first 5)
for i in range(5):
    for j in range(5):
        if i != j:
            W_groups[i, j] = 1

# Group 2: firms 5-9 (last 5)
for i in range(5, 10):
    for j in range(5, 10):
        if i != j:
            W_groups[i, j] = 1

print("\nContiguity matrix (group-based):")
print("  Group 1: Firms 1-5 (all connected within group)")
print("  Group 2: Firms 6-10 (all connected within group)")
print("  Between groups: No connections")

print(f"\nWeight matrix:")
display(pd.DataFrame(W_groups,
                    index=firm_avg.index,
                    columns=firm_avg.index))

# Row-normalize
W_groups_row = row_normalize(W_groups)
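
The nested loops above build a block-diagonal matrix. As a compact alternative, the same structure can be assembled from one within-group block; the group count and size below mirror the two groups of five assumed in this exercise.

```python
import numpy as np
from scipy.linalg import block_diag

n_groups, group_size = 2, 5

# One fully connected group: ones everywhere except the diagonal
within = np.ones((group_size, group_size)) - np.eye(group_size)

# Stack one copy of the block per group along the diagonal
W_blocks = block_diag(*[within] * n_groups)

# Equivalent construction with a Kronecker product
W_kron = np.kron(np.eye(n_groups), within)

print(W_blocks.shape, int(W_blocks.sum()))
print("Same matrix:", np.array_equal(W_blocks, W_kron))
```

Both forms generalize to more groups of equal size; `block_diag` also accepts blocks of different sizes.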

In [None]:
# Step 3: Calculate Moran's I
def morans_i(y, W):
    """Calculate Moran's I statistic."""
    n = len(y)
    y_mean = y.mean()
    y_dev = y - y_mean
    
    # Numerator
    numerator = np.sum(W * np.outer(y_dev, y_dev))
    
    # Denominator
    denominator = np.sum(y_dev**2)
    
    # Normalization
    S0 = W.sum()
    
    I = (n / S0) * (numerator / denominator)
    
    return I

y = invest_avg.values
I_observed = morans_i(y, W_groups_row)

print("\n" + "="*70)
print("MORAN'S I STATISTIC")
print("="*70)

print(f"\nMoran's I = {I_observed:.4f}")

# Expected value under null
E_I = -1 / (n_firms - 1)
print(f"Expected I (H₀: no autocorrelation) = {E_I:.4f}")

# Interpretation
print("\n" + "-"*70)
print("INTERPRETATION")
print("-"*70)

# Note: comparing I with E(I) is descriptive; it is not a significance test
if I_observed > E_I:
    print(f"\nI = {I_observed:.4f} > E(I) = {E_I:.4f}")
    print("→ Points toward positive spatial autocorrelation")
    print("→ Firms in the same group tend to have similar investment levels")
    print(f"→ Descriptive only: with N = {n_firms} and no significance test, treat as suggestive")
elif I_observed < E_I:
    print(f"\nI = {I_observed:.4f} < E(I) = {E_I:.4f}")
    print("→ Points toward negative spatial autocorrelation")
    print("→ Firms in the same group tend to have dissimilar investment levels")
    print(f"→ Descriptive only: with N = {n_firms} and no significance test, treat as suggestive")
else:
    print(f"\nI ≈ {I_observed:.4f} ≈ E(I) = {E_I:.4f}")
    print("→ No spatial autocorrelation")
    print("→ Investment is randomly distributed")
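
Comparing I with E(I) shows the direction of the association but not whether it could arise by chance. A common way to assess that is a permutation test: shuffle the values across units many times and see how often the shuffled statistic is at least as large as the observed one. The sketch below is a minimal version on hypothetical data; `permutation_test`, the number of permutations, and the seed are illustrative choices.

```python
import numpy as np

def morans_i(y, W):
    """Moran's I for values y and weight matrix W."""
    z = y - y.mean()
    return (len(y) / W.sum()) * (z @ W @ z) / (z @ z)

def permutation_test(y, W, n_perm=999, seed=0):
    """One-sided pseudo p-value for positive autocorrelation."""
    rng = np.random.default_rng(seed)
    i_obs = morans_i(y, W)
    i_perm = np.array([morans_i(rng.permutation(y), W) for _ in range(n_perm)])
    # Count the observed statistic itself among the draws (the +1 terms)
    p_value = (np.sum(i_perm >= i_obs) + 1) / (n_perm + 1)
    return i_obs, i_perm.mean(), p_value

# Hypothetical data: two fully connected groups of 5 with clearly different levels
within = np.ones((5, 5)) - np.eye(5)
W = np.kron(np.eye(2), within)
W = W / W.sum(axis=1, keepdims=True)       # row-normalize
y = np.array([10.0, 12.0, 11.0, 9.0, 13.0, 50.0, 55.0, 48.0, 52.0, 51.0])

i_obs, i_perm_mean, p = permutation_test(y, W)
print(f"Observed I: {i_obs:.3f}")
print(f"Mean of permuted I: {i_perm_mean:.3f} (theory: {-1 / (len(y) - 1):.3f})")
print(f"Pseudo p-value: {p:.3f}")
```

To apply this to the exercise, one would pass the firm-average investment vector and `W_groups_row` in place of the hypothetical `y` and `W`.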

In [None]:
# Visualize groups
results_df['Group'] = ['Group 1' if i < 5 else 'Group 2' for i in range(n_firms)]

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Investment by group
for group in ['Group 1', 'Group 2']:
    group_data = results_df[results_df['Group'] == group]
    axes[0].scatter(group_data.index, group_data['Investment'],
                   label=group, s=100, alpha=0.7)
    axes[0].plot(group_data.index, group_data['Investment'],
                alpha=0.3, linewidth=2)

axes[0].set_xlabel('Firm Index', fontsize=12, fontweight='bold')
axes[0].set_ylabel('Investment', fontsize=12, fontweight='bold')
axes[0].set_title('Investment by Group', fontsize=13, fontweight='bold')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Box plot
group1_inv = results_df[results_df['Group'] == 'Group 1']['Investment']
group2_inv = results_df[results_df['Group'] == 'Group 2']['Investment']

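axes[1].boxplot([group1_inv, group2_inv])
# Set tick labels separately: the `labels` keyword was renamed to `tick_labels` in Matplotlib 3.9
axes[1].set_xticklabels(['Group 1', 'Group 2'])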
axes[1].set_ylabel('Investment', fontsize=12, fontweight='bold')
axes[1].set_title('Investment Distribution by Group', fontsize=13, fontweight='bold')
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print("\nGroup statistics:")
print(f"\nGroup 1 (Firms 1-5):")
print(f"  Mean investment: {group1_inv.mean():.2f}")
print(f"  Std deviation: {group1_inv.std():.2f}")

print(f"\nGroup 2 (Firms 6-10):")
print(f"  Mean investment: {group2_inv.mean():.2f}")
print(f"  Std deviation: {group2_inv.std():.2f}")

if I_observed > E_I:
    print("\nMoran's I above its null expectation suggests firms within the same group")
    print("are more similar than firms in different groups → spatial clustering")

---

## Summary

In these exercises, you practiced:

- ✅ **Exercise 1**: Comparing different weight matrix types (rook, queen, KNN)
- ✅ **Exercise 2**: Creating custom distance-based weights with exponential decay
- ✅ **Exercise 3**: Computing spatial lags using real panel data
- ✅ **Exercise 4**: Testing for spatial autocorrelation with Moran's I

### Key Skills Acquired

1. **Weight matrix construction**: Contiguity, distance, KNN approaches
2. **Normalization**: Row and spectral normalization methods
3. **Spatial lags**: Computing weighted averages of neighbors
4. **Autocorrelation testing**: Moran's I statistic and interpretation

### Weight Matrix Decision Guide

| Scenario | Recommended Weight Matrix | Reason |
|----------|---------------------------|--------|
| Contiguous polygons or grid cells (states, counties) | Rook or Queen contiguity | Natural geographic neighbors |
| Irregular boundaries | Queen contiguity | Counts corner (vertex) contacts as well as shared edges |
| Point locations | Distance-based or KNN | No clear boundaries |
| Economic spillovers | Inverse distance or exponential | Theory of distance decay |
| Uneven spatial distribution | KNN | Ensures balanced connectivity |
| Trade/network data | Economic distance | Based on actual flows |

### Normalization Guide

- **Row normalization**: Most common; weights sum to 1 per row
  - Interpretation: Share of influence
  - Spatial lag = weighted average

- **Spectral normalization**: For theoretical models
  - Rescales W by its largest absolute eigenvalue, so no eigenvalue exceeds 1 in absolute value
  - Required for some spatial econometric estimators
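
The two normalizations can be compared side by side on a small matrix. The sketch below uses only NumPy and a hypothetical four-unit line graph; PanelBox may offer its own normalization helpers, which are not assumed here.

```python
import numpy as np

# Hypothetical symmetric binary weights: 4 units on a line (0-1-2-3)
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Row normalization: each row sums to 1; symmetry is generally lost
W_row = W / W.sum(axis=1, keepdims=True)

# Spectral normalization: divide by the largest absolute eigenvalue
spectral_radius = np.max(np.abs(np.linalg.eigvals(W)))
W_spec = W / spectral_radius
radius_after = np.max(np.abs(np.linalg.eigvals(W_spec)))

print("Row sums (row-normalized):     ", W_row.sum(axis=1))
print("Row sums (spectral-normalized):", W_spec.sum(axis=1).round(3))
print(f"Largest |eigenvalue| after spectral normalization: {radius_after:.6f}")
print("Symmetric: row =", np.array_equal(W_row, W_row.T),
      "| spectral =", np.array_equal(W_spec, W_spec.T))
```

Spectral normalization keeps the relative size of the original weights and preserves symmetry, at the cost of losing the "weighted average" reading of the spatial lag.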

---

### Next Steps

You are now ready for:

**Module 4: Spatial Panel Models**
- Spatial Lag Model (SAR)
- Spatial Error Model (SEM)
- Spatial Durbin Model (SDM)
- Direct and indirect effects decomposition

Or continue with:

**Module 2: Classical Panel Estimators**
- Fixed Effects, Random Effects
- Then return to spatial models later

---

In [None]:
print("="*70)
print("SOLUTIONS COMPLETED!")
print("="*70)
print("\nYou've successfully completed all exercises in Tutorial 04.")
print("You now understand spatial weight matrices and autocorrelation!")
print("\nNext: Module 2 (Classical Estimators) or Module 4 (Spatial Models)")
print("\nExcellent work!")