# Spatial Durbin Model (SDM)

**Duration**: 150-170 minutes | **Level**: Advanced

---

## Objective

Master the **Spatial Durbin Model (SDM)**, a flexible spatial model that combines:
- **Endogenous spillovers** (ρWy) like SAR
- **Exogenous spillovers** (WXθ) from neighbors' characteristics

Learn when SDM is superior to SAR, estimate via QML/ML, interpret θ coefficients, and test model restrictions.

---

## Prerequisites

- Notebooks 01-04 completed (SAR and SEM understood)
- Matrix algebra for marginal effects
- Understanding of likelihood ratio tests

## 1. Introduction to SDM

### Model Specification

$$
y = \rho Wy + X\beta + WX\theta + \alpha + \varepsilon
$$

**Components**:
- **ρWy**: Endogenous spatial spillover (like SAR)
- **Xβ**: Direct effect of own characteristics
- **WXθ**: Exogenous spatial spillover (neighbors' characteristics)
- **α**: Fixed or random effects
- **ε**: Error term
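
Solving the specification for $y$ gives the reduced form, which shows how every term on the right-hand side reaches every region through the spatial multiplier $(I - \rho W)^{-1}$. This assumes $I - \rho W$ is invertible, which holds for a row-standardized $W$ when $|\rho| < 1$:

$$
(I - \rho W)\,y = X\beta + WX\theta + \alpha + \varepsilon
\quad\Longrightarrow\quad
y = (I - \rho W)^{-1}\left(X\beta + WX\theta + \alpha + \varepsilon\right)
$$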

### Why SDM?

1. **Flexibility**: Nests SAR (θ=0), SLX (ρ=0), and SEM (θ = -ρβ) as special cases
2. **Realism**: Both types of spillovers likely in real data
3. **Testable**: Can test restrictions to simplify
4. **Complete**: Captures spillovers through both the outcome (Wy) and the covariates (WX)

### Economic Example: Regional Growth

- **β_invest**: Effect of own investment on own growth
- **θ_invest**: Effect of neighbors' investment on own growth (knowledge spillovers)
- **ρ**: Contagion/imitation in growth rates

**Interpretation**: A region's growth depends on:
1. Its own investment (β)
2. Neighbors' investment (θ) - technology spillovers, infrastructure links
3. Neighbors' growth (ρ) - demand linkages, policy imitation
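
To make the three channels concrete, here is a minimal sketch on a hypothetical three-region line (0 - 1 - 2). The weights, the parameter values, and the helper `sdm_outcome` are illustrative assumptions, and fixed effects and noise are left out so that only the spillover mechanics remain.

```python
import numpy as np

# Hypothetical 3-region line: 0 - 1 - 2, row-standardized weights
W_line = np.array([[0.0, 1.0, 0.0],
                   [0.5, 0.0, 0.5],
                   [0.0, 1.0, 0.0]])

rho, beta, theta = 0.35, 0.15, 0.10   # illustrative values only

def sdm_outcome(x):
    """Solve y = rho*W*y + x*beta + W*x*theta for y (no effects, no noise)."""
    A = np.eye(3) - rho * W_line
    return np.linalg.solve(A, beta * x + theta * (W_line @ x))

x_base = np.array([20.0, 20.0, 20.0])
x_shock = x_base.copy()
x_shock[0] += 1.0                     # raise investment in region 0 only

dy = sdm_outcome(x_shock) - sdm_outcome(x_base)
print("Change in growth by region:", np.round(dy, 4))
```

Region 0 gains more than β because part of its own shock comes back through its neighbor, region 1 gains through both θ and ρ, and region 2 gains even though it does not border region 0.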

In [None]:
# Setup
import sys
from pathlib import Path
import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.stats import chi2
import warnings
warnings.filterwarnings('ignore')

# PanelBox path (adjust to your local checkout)
panelbox_path = Path("/home/guhaase/projetos/panelbox")
sys.path.insert(0, str(panelbox_path))

from panelbox.models.spatial import SpatialDurbin, SpatialLag
from libpysal.weights import Queen

# Plotting style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print("✓ Imports successful")
print(f"✓ PanelBox path: {panelbox_path}")

### Load Regional Growth Data

We'll use synthetic data that mimics European NUTS-2 regions, with:
- GDP growth rates
- Investment rates
- Education levels
- R&D spending

In [None]:
# Generate synthetic regional growth data
np.random.seed(42)

n_regions = 100
n_years = 10
n_obs = n_regions * n_years

# Generate grid for spatial structure
grid_size = int(np.sqrt(n_regions))
coords = [(i, j) for i in range(grid_size) for j in range(grid_size)]

# Generate time-invariant regional baselines (independent draws, no spatial correlation)
investment = np.random.uniform(15, 35, n_regions)  # Investment rate (%)
education = np.random.uniform(20, 80, n_regions)    # Tertiary education (%)
rd_spending = np.random.uniform(0.5, 3.5, n_regions)  # R&D (% GDP)

# Create panel data
data_list = []
for year in range(n_years):
    for idx, region_id in enumerate(range(n_regions)):
        # Add time variation
        inv_t = investment[idx] + np.random.normal(0, 2)
        edu_t = education[idx] + np.random.normal(0, 5)
        rd_t = rd_spending[idx] + np.random.normal(0, 0.3)
        
        data_list.append({
            'region_id': region_id,
            'year': 2010 + year,
            'x_coord': coords[region_id][0],
            'y_coord': coords[region_id][1],
            'investment': inv_t,
            'education': edu_t,
            'rd_spending': rd_t,
            'region_name': f'Region_{region_id:03d}'
        })

regions_df = pd.DataFrame(data_list)

# Generate GDP growth with spatial structure
# Will add this after creating W matrix

print(f"✓ Created panel data: {len(regions_df):,} observations")
print(f"  {n_regions} regions × {n_years} years")
print(f"\nSample data:")
print(regions_df.head(10))

In [None]:
# Create GeoDataFrame for spatial weights
from shapely.geometry import Point

# Get unique regions for spatial structure
unique_regions = regions_df.drop_duplicates('region_id')[['region_id', 'region_name', 'x_coord', 'y_coord']].copy()
unique_regions['geometry'] = unique_regions.apply(
    lambda row: Point(row['x_coord'], row['y_coord']), axis=1
)

gdf = gpd.GeoDataFrame(unique_regions, geometry='geometry')

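# Build spatial weights matrix (Queen contiguity on the regular grid).
# Queen contiguity needs shared borders or corners, which point geometries do not have,
# so the weights are built from the lattice itself. lat2W numbers cells row by row,
# which matches region_id = i * grid_size + j used for coords above.
from libpysal.weights import lat2W

W = lat2W(grid_size, grid_size, rook=False)
W.transform = 'r'  # Row-standardized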

print(f"✓ Spatial weights matrix created")
print(f"  {W.n} regions")
print(f"  Average neighbors: {np.mean(list(W.cardinalities.values())):.1f}")
print(f"  Min neighbors: {min(W.cardinalities.values())}")
print(f"  Max neighbors: {max(W.cardinalities.values())}")

In [None]:
# Generate GDP growth with SDM structure
# y = ρWy + Xβ + WXθ + α + ε

rho_true = 0.35      # Endogenous spillover
beta_true = np.array([0.15, 0.08, 0.25])  # [investment, education, rd_spending]
theta_true = np.array([0.10, 0.05, 0.15]) # Spillovers from neighbors' X

# Dense weights and the spatial multiplier are the same in every year
W_dense = W.full()[0]
A_inv = np.linalg.inv(np.eye(n_regions) - rho_true * W_dense)

# Region fixed effects: drawn once so they stay constant over time
region_fe = np.random.normal(2.0, 0.5, n_regions)

gdp_growth_list = []

for year in range(n_years):
    year_data = regions_df[regions_df['year'] == 2010 + year].sort_values('region_id')

    X = year_data[['investment', 'education', 'rd_spending']].values
    WX = W_dense @ X  # neighbors' average characteristics

    # Error term
    epsilon = np.random.normal(0, 1.0, n_regions)

    # Solve for y: y = ρWy + Xβ + WXθ + α + ε
    # (I - ρW)y = Xβ + WXθ + α + ε
    # y = (I - ρW)^{-1}(Xβ + WXθ + α + ε)
    y = A_inv @ (X @ beta_true + WX @ theta_true + region_fe + epsilon)

    gdp_growth_list.extend(y)

# regions_df is ordered by year, then region_id, so the stacked values line up row by row
regions_df['gdp_growth'] = gdp_growth_list

print("✓ Generated GDP growth with SDM structure")
print(f"  True ρ = {rho_true}")
print(f"  True β = {beta_true}")
print(f"  True θ = {theta_true}")
print(f"\nGDP growth summary:")
print(regions_df['gdp_growth'].describe())

## 2. Estimating SDM

The SDM model extends SAR by adding spatial lags of exogenous variables (WX).
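
In a panel, the spatial lag WX is computed within each time period. The sketch below shows one way to do this with NumPy, assuming a balanced panel stacked period by period with regions in the same order as the rows of W. The helper `panel_spatial_lag` and the toy data are hypothetical and are not part of PanelBox.

```python
import numpy as np

def panel_spatial_lag(X, W, n_periods):
    """Return WX for data stacked period by period (all regions of t=1, then t=2, ...)."""
    n = W.shape[0]
    X = np.asarray(X, dtype=float).reshape(n_periods, n, -1)  # (T, N, k)
    return (W @ X).reshape(n_periods * n, -1)                  # W applied within each period

W_toy = np.array([[0.0, 1.0],
                  [1.0, 0.0]])          # two regions, each the other's only neighbor
X_toy = np.array([[1.0], [2.0],         # period 1: regions 0, 1
                  [3.0], [4.0]])        # period 2: regions 0, 1

WX_toy = panel_spatial_lag(X_toy, W_toy, n_periods=2)
print(WX_toy.ravel())                   # [2. 1. 4. 3.]: each region receives its neighbor's value
```

This is equivalent to multiplying the stacked data by the block-diagonal matrix `np.kron(np.eye(T), W)`, without building that large matrix.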

### Estimation Strategy

1. **Quasi-Maximum Likelihood (QML)**: Robust to non-normality
2. **Maximum Likelihood (ML)**: Efficient under normality
3. **Fixed Effects**: Control for region heterogeneity
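
A common way to obtain the ML estimate is to concentrate β, θ, and σ² out of the Gaussian likelihood and search over ρ alone. The sketch below does this for a simulated cross-section on a ring of regions. It is not PanelBox's implementation: the function `sdm_concentrated_loglik`, the ring weights, the grid search, and all parameter values are assumptions made for illustration, and fixed effects are omitted for brevity.

```python
import numpy as np

def sdm_concentrated_loglik(rho, y, X, W):
    """Gaussian log-likelihood of a cross-sectional SDM with beta, theta, sigma2 profiled out."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X, W @ X])          # regressors: constant, X, WX
    y_filtered = y - rho * (W @ y)                       # (I - rho*W) y
    coef, *_ = np.linalg.lstsq(Z, y_filtered, rcond=None)
    resid = y_filtered - Z @ coef
    sigma2 = resid @ resid / n                           # ML variance estimate
    _, logdet = np.linalg.slogdet(np.eye(n) - rho * W)   # Jacobian term ln|I - rho*W|
    loglik = logdet - 0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    return loglik, coef

# Simulate an SDM on a ring where each region has two neighbors (illustrative values)
rng = np.random.default_rng(0)
n = 200
W_ring = np.zeros((n, n))
for i in range(n):
    W_ring[i, (i - 1) % n] = 0.5
    W_ring[i, (i + 1) % n] = 0.5

rho_sim, beta_sim, theta_sim = 0.4, 1.0, 0.5
x = rng.normal(size=(n, 1))
eps = rng.normal(scale=0.5, size=n)
y_sim = np.linalg.solve(np.eye(n) - rho_sim * W_ring,
                        beta_sim * x[:, 0] + theta_sim * (W_ring @ x)[:, 0] + eps)

# Grid search over rho; coefficients follow from OLS at the maximizing rho
rho_grid = np.linspace(-0.9, 0.9, 91)
logliks = [sdm_concentrated_loglik(r, y_sim, x, W_ring)[0] for r in rho_grid]
rho_hat = rho_grid[int(np.argmax(logliks))]
_, coef_hat = sdm_concentrated_loglik(rho_hat, y_sim, x, W_ring)

print(f"rho_hat = {rho_hat:.2f}, beta_hat = {coef_hat[1]:.3f}, theta_hat = {coef_hat[2]:.3f}")
```

The log-determinant term is what separates ML from simply regressing y on Wy, X, and WX: without it the estimate of ρ would be biased because Wy is endogenous.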

In [None]:
# Prepare panel data
regions_df['entity_id'] = regions_df['region_id']
regions_df['time'] = regions_df['year']

print("✓ Panel structure prepared")
print(f"  Entity column: entity_id")
print(f"  Time column: time")
print(f"  Unique entities: {regions_df['entity_id'].nunique()}")
print(f"  Time periods: {regions_df['time'].nunique()}")

In [None]:
# Estimate SDM
print("Estimating Spatial Durbin Model (SDM)...\n")

sdm_model = SpatialDurbin(
    formula="gdp_growth ~ investment + education + rd_spending",
    data=regions_df,
    entity_col='entity_id',
    time_col='time',
    W=W
)

sdm_results = sdm_model.fit(effects='fixed', method='qml')

print("✓ SDM estimation complete\n")
print(sdm_results.summary())

### Interpret SDM Results

SDM provides three sets of parameters:
- **ρ (rho)**: Endogenous spatial spillover
- **β**: Direct effects of own characteristics
- **θ (theta)**: Exogenous spillovers from neighbors' characteristics

In [None]:
print("\n" + "="*70)
print("SDM COEFFICIENT INTERPRETATION")
print("="*70)

# Endogenous spillover
print(f"\n1. ENDOGENOUS SPILLOVER (ρ)")
print(f"   ρ = {sdm_results.rho:.4f}")
if hasattr(sdm_results, 'rho_pvalue'):
    print(f"   p-value = {sdm_results.rho_pvalue:.4f}")
print(f"\n   Interpretation: a 1 p.p. rise in neighbors' average growth is associated")
print(f"   with a {sdm_results.rho:.3f} p.p. change in own growth (demand linkages, policy spillovers)")

# Direct effects (β)
print(f"\n2. DIRECT EFFECTS (β)")
print(f"   Effect of OWN characteristics on OWN growth:")
print()
for var in ['investment', 'education', 'rd_spending']:
    if var in sdm_results.params.index:
        coef = sdm_results.params.loc[var]
        se = sdm_results.std_errors.loc[var] if hasattr(sdm_results, 'std_errors') else np.nan
        t_stat = coef / se if not np.isnan(se) else np.nan
        
        sig = "***" if abs(t_stat) > 2.576 else "**" if abs(t_stat) > 1.96 else "*" if abs(t_stat) > 1.645 else ""
        
        print(f"   {var:15s}: β = {coef:7.4f} {sig}")
        print(f"   {'':15s}   (SE = {se:.4f}, t = {t_stat:.2f})")
        print()

# Exogenous spillovers (θ)
print(f"\n3. EXOGENOUS SPILLOVERS (θ)")
print(f"   Effect of NEIGHBORS' characteristics on OWN growth:")
print()

wx_vars = [col for col in sdm_results.params.index if col.startswith('W_')]
if wx_vars:
    for wx_var in wx_vars:
        orig_var = wx_var.replace('W_', '')
        coef = sdm_results.params.loc[wx_var]
        se = sdm_results.std_errors.loc[wx_var] if hasattr(sdm_results, 'std_errors') else np.nan
        t_stat = coef / se if not np.isnan(se) else np.nan
        
        sig = "***" if abs(t_stat) > 2.576 else "**" if abs(t_stat) > 1.96 else "*" if abs(t_stat) > 1.645 else ""
        
        print(f"   W·{orig_var:13s}: θ = {coef:7.4f} {sig}")
        print(f"   {'':15s}   (SE = {se:.4f}, t = {t_stat:.2f})")
        print()
else:
    print("   [WX terms not found in results - check model specification]")

print("="*70)
print("\n⚠️  NOTE: These are NOT marginal effects!")
print("   Marginal effects account for feedback loops (see Notebook 06)")

## 3. Testing SDM vs SAR

### Likelihood Ratio Test

**H₀**: θ = 0 (SDM reduces to SAR)

**Test Statistic**: LR = 2(ℓ_SDM - ℓ_SAR) ~ χ²(k) under H₀

where k = number of WX variables
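
The test can be wrapped in a small helper. This is a sketch under the assumption that both models were fitted by maximum likelihood on the same sample. The function name `lr_test` and the log-likelihood values in the example are made up for illustration.

```python
from scipy.stats import chi2

def lr_test(ll_unrestricted, ll_restricted, n_restrictions):
    """Likelihood ratio test of a nested model; returns (LR statistic, p-value)."""
    lr = 2.0 * (ll_unrestricted - ll_restricted)
    lr = max(lr, 0.0)                      # guard against tiny negative values from optimizer noise
    p_value = chi2.sf(lr, n_restrictions)  # upper-tail probability of chi-square(k)
    return lr, p_value

# Hypothetical log-likelihoods: SDM (unrestricted) vs SAR (theta = 0, three restrictions)
lr_stat, p_val = lr_test(ll_unrestricted=-1402.3, ll_restricted=-1410.8, n_restrictions=3)
print(f"LR = {lr_stat:.2f}, p-value = {p_val:.5f}")
```

With these illustrative numbers the statistic is 17.0, which exceeds the 1% critical value of χ²(3), so the SAR restriction would be rejected.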

In [None]:
# Estimate SAR for comparison
print("Estimating Spatial Lag Model (SAR) for comparison...\n")

sar_model = SpatialLag(
    formula="gdp_growth ~ investment + education + rd_spending",
    data=regions_df,
    entity_col='entity_id',
    time_col='time',
    W=W
)

sar_results = sar_model.fit(effects='fixed', method='qml')

print("✓ SAR estimation complete\n")

In [None]:
# Likelihood Ratio Test
print("\n" + "="*70)
print("LIKELIHOOD RATIO TEST: SDM vs SAR")
print("="*70)

ll_sdm = sdm_results.log_likelihood
ll_sar = sar_results.log_likelihood

lr_statistic = 2 * (ll_sdm - ll_sar)
df = 3  # Number of WX variables added (investment, education, rd_spending)
p_value = chi2.sf(lr_statistic, df)  # survival function: more accurate than 1 - cdf for small p-values

print(f"\nH₀: θ = 0 (SDM reduces to SAR)")
print(f"H₁: θ ≠ 0 (SDM is necessary)\n")

print(f"Log-likelihood (SAR): {ll_sar:.2f}")
print(f"Log-likelihood (SDM): {ll_sdm:.2f}")
print(f"\nLR statistic: {lr_statistic:.3f}")
print(f"Degrees of freedom: {df}")
print(f"p-value: {p_value:.4f}")

print(f"\nCritical value (α=0.05): {chi2.ppf(0.95, df):.3f}")
print(f"Critical value (α=0.01): {chi2.ppf(0.99, df):.3f}")

print("\n" + "-"*70)
if p_value < 0.01:
    print("✓✓ STRONGLY REJECT H₀ at α=0.01")
    print("   → SDM is SIGNIFICANTLY superior to SAR")
    print("   → Exogenous spillovers (WX) are highly significant")
    print("   → Use SDM for inference")
elif p_value < 0.05:
    print("✓ REJECT H₀ at α=0.05")
    print("   → SDM is superior to SAR")
    print("   → Exogenous spillovers (WX) are significant")
else:
    print("✗ FAIL TO REJECT H₀")
    print("   → SAR is sufficient")
    print("   → No evidence of exogenous spillovers")
    print("   → Simpler SAR model preferred (parsimony)")

print("="*70)

## 4. Interpreting θ Coefficients

### Key Distinction: θ ≠ Marginal Effect

- **θ**: Direct exogenous spillover from neighbors' X
- **Marginal Effect**: Total effect including feedback through ρWy

### Economic Interpretation Framework

For each variable, we compare:
1. **β**: Own characteristic → Own outcome
2. **θ**: Neighbors' characteristic → Own outcome
3. **Economic mechanism**: Why θ ≠ 0?
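
To see how far θ can sit from the marginal effect, the sketch below computes the matrix of partial derivatives ∂y/∂x' = (I - ρW)⁻¹(βI + θW) for one regressor and averages it in the way LeSage & Pace propose. The four-region ring, the parameter values, and the helper `sdm_effects` are assumptions for illustration only.

```python
import numpy as np

def sdm_effects(rho, beta_k, theta_k, W):
    """Average direct, indirect, and total effects of one regressor in an SDM."""
    n = W.shape[0]
    # S[i, j] = dy_i / dx_j = [(I - rho*W)^{-1} (beta_k*I + theta_k*W)]_{ij}
    S = np.linalg.solve(np.eye(n) - rho * W, beta_k * np.eye(n) + theta_k * W)
    direct = np.trace(S) / n     # average own-region effect (includes feedback)
    total = S.sum() / n          # average effect of raising x in every region by one unit
    return direct, total - direct, total

# Hypothetical ring of four regions, row-standardized
W_ring = np.array([[0.0, 0.5, 0.0, 0.5],
                   [0.5, 0.0, 0.5, 0.0],
                   [0.0, 0.5, 0.0, 0.5],
                   [0.5, 0.0, 0.5, 0.0]])

direct, indirect, total = sdm_effects(rho=0.35, beta_k=0.15, theta_k=0.10, W=W_ring)
print(f"direct = {direct:.4f}, indirect = {indirect:.4f}, total = {total:.4f}")
```

In this example the indirect effect is larger than θ and the direct effect is larger than β, because both are amplified by feedback through ρWy. With a row-standardized W, the average total effect equals (β + θ)/(1 - ρ).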

In [None]:
print("\n" + "="*70)
print("ECONOMIC INTERPRETATION OF θ COEFFICIENTS")
print("="*70)

# Helper function
def interpret_spillover(var_name, beta, theta, beta_se, theta_se):
    """
    Interpret exogenous spillover coefficient
    """
    t_beta = beta / beta_se
    t_theta = theta / theta_se
    
    sig_beta = "***" if abs(t_beta) > 2.576 else "**" if abs(t_beta) > 1.96 else "*" if abs(t_beta) > 1.645 else ""
    sig_theta = "***" if abs(t_theta) > 2.576 else "**" if abs(t_theta) > 1.96 else "*" if abs(t_theta) > 1.645 else ""
    
    print(f"\n{var_name.upper()}")
    print("-" * 70)
    print(f"  β (direct):              {beta:7.4f} {sig_beta} (t = {t_beta:6.2f})")
    print(f"  θ (exogenous spillover): {theta:7.4f} {sig_theta} (t = {t_theta:6.2f})")
    
    if abs(t_theta) > 1.96:
        print(f"\n  ✓ Significant exogenous spillover detected")
    else:
        print(f"\n  ✗ No significant exogenous spillover")
    
    return sig_beta, sig_theta

# Investment
if 'investment' in sdm_results.params.index:
    beta_inv = sdm_results.params.loc['investment']
    se_beta_inv = sdm_results.std_errors.loc['investment']
    
    wx_inv = 'W_investment'
    if wx_inv in sdm_results.params.index:
        theta_inv = sdm_results.params.loc[wx_inv]
        se_theta_inv = sdm_results.std_errors.loc[wx_inv]
        
        interpret_spillover('investment', beta_inv, theta_inv, se_beta_inv, se_theta_inv)
        
        if theta_inv > 0:
            print(f"\n  Interpretation:")
            print(f"    → 1 percentage point increase in NEIGHBORS' investment rate")
            print(f"      raises OWN growth by {theta_inv:.3f} percentage points DIRECTLY")
            print(f"    → This is BEFORE accounting for ρWy feedback loops")
            print(f"\n  Economic mechanisms:")
            print(f"    • Knowledge spillovers (technology diffusion)")
            print(f"    • Infrastructure complementarities (roads, utilities)")
            print(f"    • Agglomeration economies")
        elif theta_inv < 0:
            print(f"\n  Interpretation:")
            print(f"    → Neighbors' investment REDUCES own growth")
            print(f"\n  Possible mechanisms:")
            print(f"    • Competition for capital/labor")
            print(f"    • Market stealing effects")

# Education
if 'education' in sdm_results.params.index:
    beta_edu = sdm_results.params.loc['education']
    se_beta_edu = sdm_results.std_errors.loc['education']
    
    wx_edu = 'W_education'
    if wx_edu in sdm_results.params.index:
        theta_edu = sdm_results.params.loc[wx_edu]
        se_theta_edu = sdm_results.std_errors.loc[wx_edu]
        
        interpret_spillover('education', beta_edu, theta_edu, se_beta_edu, se_theta_edu)
        
        if theta_edu > 0:
            print(f"\n  Economic mechanisms:")
            print(f"    • Knowledge networks across regions")
            print(f"    • Labor mobility (skilled workers commute)")
            print(f"    • Collaboration spillovers")

# R&D spending
if 'rd_spending' in sdm_results.params.index:
    beta_rd = sdm_results.params.loc['rd_spending']
    se_beta_rd = sdm_results.std_errors.loc['rd_spending']
    
    wx_rd = 'W_rd_spending'
    if wx_rd in sdm_results.params.index:
        theta_rd = sdm_results.params.loc[wx_rd]
        se_theta_rd = sdm_results.std_errors.loc[wx_rd]
        
        interpret_spillover('r&d spending', beta_rd, theta_rd, se_beta_rd, se_theta_rd)
        
        if theta_rd > 0:
            print(f"\n  Economic mechanisms:")
            print(f"    • Technology spillovers (patents, innovations)")
            print(f"    • R&D collaborations across borders")
            print(f"    • Supply chain linkages")

print("\n" + "="*70)

## 5. Model Comparison: OLS vs SAR vs SDM

Compare three models:
1. **OLS**: No spatial dependence
2. **SAR**: Endogenous spillovers only (ρWy)
3. **SDM**: Both endogenous and exogenous spillovers (ρWy + WXθ)
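
Comparing the three models by AIC and BIC requires a log-likelihood for OLS on the same footing as the spatial models. The sketch below shows the Gaussian log-likelihood evaluated at the ML variance estimate (SSE/n) and the two information criteria. The function names and the numbers in the example are illustrative assumptions.

```python
import math

def gaussian_loglik(sse, n):
    """Maximized Gaussian log-likelihood of a linear model with residual sum of squares sse."""
    sigma2 = sse / n   # ML variance estimate: divide by n, not n - k
    return -0.5 * n * (math.log(2 * math.pi) + math.log(sigma2) + 1.0)

def information_criteria(loglik, k, n):
    """Return (AIC, BIC) for a model with k estimated parameters and n observations."""
    aic = -2.0 * loglik + 2.0 * k
    bic = -2.0 * loglik + k * math.log(n)
    return aic, bic

# Hypothetical example: 100 observations, SSE of 100, 4 parameters
ll_example = gaussian_loglik(sse=100.0, n=100)
aic_example, bic_example = information_criteria(ll_example, k=4, n=100)
print(f"log-lik = {ll_example:.3f}, AIC = {aic_example:.2f}, BIC = {bic_example:.2f}")
```

The closed form with the trailing `+ 1` is only valid when σ² is the ML estimate. Using SSE/(n - k) in the same formula gives a value that is not the maximized likelihood.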

In [None]:
# Estimate OLS for baseline
from sklearn.linear_model import LinearRegression

X_vars = ['investment', 'education', 'rd_spending']
X = regions_df[X_vars].values
y = regions_df['gdp_growth'].values

ols = LinearRegression().fit(X, y)

print("✓ OLS estimation complete")

In [None]:
# Create comparison table
print("\n" + "="*80)
print("COEFFICIENT COMPARISON: OLS vs SAR vs SDM")
print("="*80)

comparison_data = []

for var in X_vars:
    idx = X_vars.index(var)
    
    ols_coef = ols.coef_[idx]
    sar_coef = sar_results.params.loc[var] if var in sar_results.params.index else np.nan
    sdm_beta = sdm_results.params.loc[var] if var in sdm_results.params.index else np.nan
    
    wx_var = f'W_{var}'
    sdm_theta = sdm_results.params.loc[wx_var] if wx_var in sdm_results.params.index else np.nan
    
    comparison_data.append({
        'Variable': var,
        'OLS': ols_coef,
        'SAR': sar_coef,
        'SDM (β)': sdm_beta,
        'SDM (θ)': sdm_theta
    })

comparison_df = pd.DataFrame(comparison_data)

print("\n" + comparison_df.to_string(index=False, float_format=lambda x: f'{x:7.4f}'))

print("\n" + "-"*80)
print("Notes:")
print("  • OLS ignores spatial dependence → Biased if spatial effects present")
print("  • SAR captures endogenous spillovers (ρWy) → Better than OLS")
print("  • SDM captures BOTH endogenous (ρWy) AND exogenous (WXθ) spillovers")
print("  • SDM β ≠ SAR coefficient (SDM controls for WX)")
print("="*80)

In [None]:
# Model fit comparison
print("\n" + "="*70)
print("MODEL FIT COMPARISON")
print("="*70)

# OLS fit metrics
ols_pred = ols.predict(X)
ols_resid = y - ols_pred
ols_sse = np.sum(ols_resid**2)
ols_n = len(y)
ols_k = len(X_vars) + 1
ols_sigma2 = ols_sse / ols_n  # ML variance estimate, required by the closed-form log-likelihood below
ols_ll = -0.5 * ols_n * (np.log(2*np.pi) + np.log(ols_sigma2) + 1)
ols_aic = -2*ols_ll + 2*ols_k
ols_bic = -2*ols_ll + np.log(ols_n)*ols_k

print(f"\n{'Model':<15} {'AIC':>12} {'BIC':>12} {'Log-Lik':>14} {'ρ':>8}")
print("-" * 70)
print(f"{'OLS':<15} {ols_aic:>12.1f} {ols_bic:>12.1f} {ols_ll:>14.2f} {'-':>8}")
print(f"{'SAR':<15} {sar_results.aic:>12.1f} {sar_results.bic:>12.1f} {sar_results.log_likelihood:>14.2f} {sar_results.rho:>8.3f}")
print(f"{'SDM':<15} {sdm_results.aic:>12.1f} {sdm_results.bic:>12.1f} {sdm_results.log_likelihood:>14.2f} {sdm_results.rho:>8.3f}")

print("\n" + "-"*70)
print("Interpretation:")

if sdm_results.aic < sar_results.aic and sdm_results.aic < ols_aic:
    print("  ✓ SDM has LOWEST AIC → Best model")
    print("    → Both endogenous and exogenous spillovers are important")
elif sar_results.aic < sdm_results.aic and sar_results.aic < ols_aic:
    print("  ✓ SAR has lowest AIC → Preferred over SDM")
    print("    → Exogenous spillovers not strong enough to justify extra parameters")
else:
    print("  ✗ OLS has lowest AIC → No spatial dependence detected")

if sdm_results.bic < sar_results.bic and sdm_results.bic < ols_bic:
    print("  ✓ SDM has LOWEST BIC → Best model (BIC penalizes complexity more)")
elif sar_results.bic < sdm_results.bic:
    print("  ⚠ SAR has lower BIC → BIC prefers simpler SAR model")
    print("    → Trade-off between fit and parsimony")

print("="*70)

## 6. Case Study: Regional Economic Growth

### Research Question

Do neighbors' investments create spillovers in regional growth beyond the contagion effect?

### Hypotheses

1. **H1**: β_invest > 0 (own investment boosts growth)
2. **H2**: θ_invest > 0 (neighbors' investment boosts growth via spillovers)
3. **H3**: ρ > 0 (growth is contagious)

### Economic Implications

- If H2 confirmed → Coordinated regional investment policies beneficial
- If H2 rejected → Regional policies can be independent
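
All three hypotheses are directional, so a one-sided p-value matches them more closely than a two-sided one. The sketch below uses the standard normal approximation that is usual for ML estimates. The helper `one_sided_pvalue` and the coefficient and standard error in the example are hypothetical.

```python
import math

def one_sided_pvalue(coef, se):
    """P-value for H0: coef <= 0 against H1: coef > 0, using the normal approximation."""
    z = coef / se
    return 0.5 * math.erfc(z / math.sqrt(2.0))   # upper-tail probability P(Z >= z)

# Hypothetical estimate: theta = 0.10 with standard error 0.04 (z = 2.5)
p_one_sided = one_sided_pvalue(coef=0.10, se=0.04)
print(f"one-sided p-value = {p_one_sided:.4f}")
```

For a positive estimate, the one-sided p-value is half the two-sided one. For a negative estimate it exceeds 0.5, so a hypothesis of a positive effect can never be supported by a negative coefficient.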

In [None]:
print("="*70)
print("CASE STUDY: REGIONAL GROWTH SPILLOVERS")
print("="*70)
print("\nResearch Question:")
print("  Do neighbors' investments create spillovers beyond the")
print("  contagion effect of growth itself?")

print("\n" + "-"*70)
print("HYPOTHESIS TESTS")
print("-"*70)

# H1: β_investment > 0
if 'investment' in sdm_results.params.index:
    beta_inv = sdm_results.params.loc['investment']
    se_inv = sdm_results.std_errors.loc['investment']
    t_beta = beta_inv / se_inv
    p_beta = 2 * (1 - stats.t.cdf(abs(t_beta), df=sdm_results.nobs - sdm_results.k_params))
    
    print(f"\nH1: β_investment > 0 (Own investment boosts own growth)")
    print(f"    β = {beta_inv:.4f} (SE = {se_inv:.4f})")
    print(f"    t = {t_beta:.2f}, p-value = {p_beta:.4f}")
    
    if p_beta < 0.05 and beta_inv > 0:
        print(f"    ✓ CONFIRMED: Own investment significantly boosts growth")
        print(f"      → 1 p.p. increase in investment → {beta_inv:.3f} p.p. growth (direct)")
    elif beta_inv > 0:
        print(f"    ~ SUGGESTIVE but not significant at α=0.05")
    else:
        print(f"    ✗ REJECTED: No positive effect detected")

# H2: θ_investment > 0
wx_inv = 'W_investment'
if wx_inv in sdm_results.params.index:
    theta_inv = sdm_results.params.loc[wx_inv]
    se_theta = sdm_results.std_errors.loc[wx_inv]
    t_theta = theta_inv / se_theta
    p_theta = 2 * (1 - stats.t.cdf(abs(t_theta), df=sdm_results.nobs - sdm_results.k_params))
    
    print(f"\nH2: θ_investment > 0 (Neighbors' investment creates spillovers)")
    print(f"    θ = {theta_inv:.4f} (SE = {se_theta:.4f})")
    print(f"    t = {t_theta:.2f}, p-value = {p_theta:.4f}")
    
    if p_theta < 0.05 and theta_inv > 0:
        print(f"    ✓ CONFIRMED: Exogenous spillovers are significant at α=0.05")
        print(f"      → 1 p.p. increase in NEIGHBORS' investment")
        print(f"        → {theta_inv:.3f} p.p. boost to OWN growth")
        print(f"\n    Economic mechanisms:")
        print(f"      • Knowledge spillovers (technology diffusion)")
        print(f"      • Infrastructure complementarities")
        print(f"      • Agglomeration economies")
        print(f"\n    Policy implications:")
        print(f"      → Coordinated regional investment strategies beneficial")
        print(f"      → Invest in neighbors to boost own growth")
        print(f"      → Regional cooperation creates positive-sum outcomes")
    elif theta_inv > 0:
        print(f"    ~ SUGGESTIVE but not significant at α=0.05")
    else:
        print(f"    ✗ REJECTED: No positive spillovers detected")
        print(f"      → Regional investment policies can be independent")

# H3: ρ > 0
rho = sdm_results.rho
rho_pval = sdm_results.rho_pvalue if hasattr(sdm_results, 'rho_pvalue') else np.nan

print(f"\nH3: ρ > 0 (Growth is spatially contagious)")
print(f"    ρ = {rho:.4f}")
if not np.isnan(rho_pval):
    print(f"    p-value = {rho_pval:.4f}")

if not np.isnan(rho_pval) and rho_pval < 0.05 and rho > 0:
    print(f"    ✓ CONFIRMED: Growth is spatially contagious")
    print(f"      → {abs(rho)*100:.1f}% of neighbors' growth transmits to own region")
    print(f"\n    Mechanisms:")
    print(f"      • Demand linkages (trade)")
    print(f"      • Policy imitation")
    print(f"      • Market integration")
elif rho > 0:
    print(f"    ~ Positive but check significance")
else:
    print(f"    ✗ REJECTED: No spatial contagion")

print("\n" + "="*70)

# Overall conclusion
if 'investment' in sdm_results.params.index and wx_inv in sdm_results.params.index:
    if p_beta < 0.05 and p_theta < 0.05 and beta_inv > 0 and theta_inv > 0:
        print("\n🎯 MAIN FINDING:")
        print("   Both OWN and NEIGHBORS' investment matter for regional growth.")
        print("   SDM is the appropriate model for this phenomenon.")
        print("\n💡 POLICY RECOMMENDATION:")
        print("   Regional investment coordination can create positive spillovers.")
        print("   Investing in neighboring regions benefits all parties.")
    elif p_beta < 0.05 and beta_inv > 0:
        print("\n🎯 MAIN FINDING:")
        print("   Own investment matters, but no evidence of exogenous spillovers.")
        print("   SAR model may be sufficient.")

print("="*70)

## 7. Visualizations

### Coefficient Comparison Plot

Visualize direct effects (β) vs exogenous spillovers (θ) for each variable.

In [None]:
# Prepare data for visualization
vars_plot = ['investment', 'education', 'rd_spending']
var_labels = ['Investment', 'Education', 'R&D Spending']

betas = []
thetas = []
beta_ses = []
theta_ses = []

for var in vars_plot:
    if var in sdm_results.params.index:
        betas.append(sdm_results.params.loc[var])
        beta_ses.append(sdm_results.std_errors.loc[var])
    else:
        betas.append(0)
        beta_ses.append(0)
    
    wx_var = f'W_{var}'
    if wx_var in sdm_results.params.index:
        thetas.append(sdm_results.params.loc[wx_var])
        theta_ses.append(sdm_results.std_errors.loc[wx_var])
    else:
        thetas.append(0)
        theta_ses.append(0)

betas = np.array(betas)
thetas = np.array(thetas)
beta_ses = np.array(beta_ses)
theta_ses = np.array(theta_ses)

# Create plot
fig, ax = plt.subplots(figsize=(12, 7))

x = np.arange(len(var_labels))
width = 0.35

# Bar plots with error bars (95% CI)
bars1 = ax.bar(x - width/2, betas, width, 
               label='β (Direct Effect)',
               yerr=1.96*beta_ses, 
               capsize=5, 
               alpha=0.8,
               color='steelblue',
               edgecolor='black',
               linewidth=1.5)

bars2 = ax.bar(x + width/2, thetas, width, 
               label='θ (Exogenous Spillover)',
               yerr=1.96*theta_ses, 
               capsize=5, 
               alpha=0.8,
               color='coral',
               edgecolor='black',
               linewidth=1.5)

ax.set_xlabel('Variable', fontsize=13, fontweight='bold')
ax.set_ylabel('Coefficient', fontsize=13, fontweight='bold')
ax.set_title('SDM Coefficients: Direct Effects (β) vs Exogenous Spillovers (θ)',
             fontsize=14, fontweight='bold', pad=20)
ax.set_xticks(x)
ax.set_xticklabels(var_labels, fontsize=11)
ax.axhline(0, color='red', linestyle='--', linewidth=2, alpha=0.7)
ax.legend(fontsize=11, loc='upper left')
ax.grid(True, axis='y', alpha=0.3, linestyle=':')

# Add value labels on bars
for i, (b, t) in enumerate(zip(betas, thetas)):
    if abs(b) > 0.001:
        ax.text(i - width/2, b + 0.01 if b > 0 else b - 0.01, 
                f'{b:.3f}', ha='center', va='bottom' if b > 0 else 'top',
                fontsize=9, fontweight='bold')
    if abs(t) > 0.001:
        ax.text(i + width/2, t + 0.01 if t > 0 else t - 0.01, 
                f'{t:.3f}', ha='center', va='bottom' if t > 0 else 'top',
                fontsize=9, fontweight='bold')

plt.tight_layout()

# Save figure
output_dir = Path('../outputs/figures')
output_dir.mkdir(parents=True, exist_ok=True)
plt.savefig(output_dir / 'nb05_sdm_coefficients.png', dpi=300, bbox_inches='tight')

plt.show()

print("✓ Figure saved: ../outputs/figures/nb05_sdm_coefficients.png")

In [None]:
# Model comparison visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

# Panel 1: Investment coefficients across models
models = ['OLS', 'SAR', 'SDM (β)', 'SDM (θ)']
idx_inv = X_vars.index('investment')
coeffs = [
    ols.coef_[idx_inv],
    sar_results.params.loc['investment'],
    sdm_results.params.loc['investment'],
    sdm_results.params.loc['W_investment'] if 'W_investment' in sdm_results.params.index else 0
]

colors = ['gray', 'steelblue', 'darkblue', 'coral']
bars = ax1.bar(models, coeffs, color=colors, alpha=0.7, edgecolor='black', linewidth=1.5)

ax1.axhline(0, color='red', linestyle='--', linewidth=2, alpha=0.7)
ax1.set_ylabel('Coefficient', fontsize=12, fontweight='bold')
ax1.set_title('Investment Coefficients Across Models', fontsize=13, fontweight='bold')
ax1.grid(True, axis='y', alpha=0.3, linestyle=':')

# Add value labels
for bar, coef in zip(bars, coeffs):
    height = bar.get_height()
    ax1.text(bar.get_x() + bar.get_width()/2., height + 0.005 if height > 0 else height - 0.005,
             f'{coef:.3f}', ha='center', va='bottom' if height > 0 else 'top',
             fontsize=10, fontweight='bold')

# Panel 2: Information criteria
ic_models = ['OLS', 'SAR', 'SDM']
aics = [ols_aic, sar_results.aic, sdm_results.aic]
bics = [ols_bic, sar_results.bic, sdm_results.bic]

x_pos = np.arange(len(ic_models))
width = 0.35

ax2.bar(x_pos - width/2, aics, width, label='AIC', alpha=0.8, color='teal', edgecolor='black')
ax2.bar(x_pos + width/2, bics, width, label='BIC', alpha=0.8, color='orange', edgecolor='black')

ax2.set_xlabel('Model', fontsize=12, fontweight='bold')
ax2.set_ylabel('Information Criterion', fontsize=12, fontweight='bold')
ax2.set_title('Model Fit Comparison (Lower = Better)', fontsize=13, fontweight='bold')
ax2.set_xticks(x_pos)
ax2.set_xticklabels(ic_models)
ax2.legend(fontsize=11)
ax2.grid(True, axis='y', alpha=0.3, linestyle=':')

plt.tight_layout()
plt.savefig(output_dir / 'nb05_model_comparison.png', dpi=300, bbox_inches='tight')
plt.show()

print("✓ Figure saved: ../outputs/figures/nb05_model_comparison.png")

## 8. Summary

### Key Takeaways

1. ✓ **SDM = SAR + WXθ**: A flexible spatial model
   - Captures both endogenous (ρWy) and exogenous (WXθ) spillovers

2. ✓ **Three types of effects**:
   - **β**: Direct effect of own characteristics
   - **θ**: Exogenous spillover from neighbors' characteristics
   - **ρ**: Endogenous spillover (contagion)

3. ✓ **Test SDM vs SAR**: Use likelihood ratio test
   - H₀: θ = 0 (SDM reduces to SAR)
   - Reject → SDM necessary

4. ✓ **θ ≠ Marginal Effect**: 
   - θ is DIRECT exogenous spillover
   - Marginal effects account for feedback (Notebook 06)

5. ✓ **When to use SDM**:
   - Theory predicts both types of spillovers
   - LR test rejects SAR
   - Interested in exogenous spillover mechanisms

### What We Learned

- ✓ Estimate SDM using PanelBox
- ✓ Interpret β, θ, and ρ parameters
- ✓ Test SDM vs SAR using LR test
- ✓ Explain economic mechanisms behind θ
- ✓ Compare OLS, SAR, and SDM

### Next Steps

**Notebook 06**: Marginal Effects in Spatial Models
- Compute direct, indirect, and total effects
- Account for feedback loops
- Understand the true magnitude of spillovers

---

### Additional Resources

1. **LeSage & Pace (2009)**: *Introduction to Spatial Econometrics*
   - Chapter 2: Spatial Durbin Model
   - Chapter 5: Marginal effects interpretation

2. **Elhorst (2014)**: *Spatial Econometrics*
   - Section 2.3: SDM specification
   - Section 3.4: Model selection

3. **Halleck Vega & Elhorst (2015)**: "The SLX Model"
   - Journal of Regional Science
   - Compares SDM with SLX (no ρWy)

---

**Notebook complete!** ✓

Continue to **Notebook 06** to learn about marginal effects.