# Spatial Lags: Concept and Applications

Spatial lags incorporate spatial relationships into the analysis of spatial data. The spatial lag of a variable at a location is a weighted average of that variable's values at neighboring locations, with the weights taken from a row-standardized spatial weights matrix. The concept is rooted in spatial dependence: the value of a variable at one location depends on its values at nearby locations.
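As a minimal sketch of this definition, the snippet below computes a spatial lag with plain NumPy. The four-location adjacency matrix, the values, and the helper names `row_standardize` and `spatial_lag` are hypothetical and chosen only for illustration.

```python
import numpy as np

def row_standardize(weights):
    """Scale each row of a weights matrix to sum to 1 (all-zero rows stay zero)."""
    weights = np.asarray(weights, dtype=float)
    row_sums = weights.sum(axis=1, keepdims=True)
    return np.divide(weights, row_sums, out=np.zeros_like(weights), where=row_sums > 0)

def spatial_lag(weights, values):
    """Return the average of each location's neighbors, weighted by `weights`."""
    return row_standardize(weights) @ np.asarray(values, dtype=float)

# Hypothetical example: four locations along a line; adjacent locations are neighbors.
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
values = np.array([10.0, 20.0, 30.0, 40.0])

lag = spatial_lag(adjacency, values)
print(lag)  # [20. 20. 30. 30.]
```

Each entry is the mean of the neighboring values only; a location's own value does not enter its lag because the diagonal of the weights matrix is zero.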

## **Why Use Spatial Lags?**
Spatial lags are used to:
 - Capture spatial dependencies in data (e.g., areas with similar socio-economic characteristics are often geographically clustered).
 - Include spatial effects in regression models to improve model accuracy and interpretability.
 - Measure the influence of neighbors' characteristics on outcomes at a specific location.

## **When Should Spatial Lags Be Used?**
Spatial lags are relevant when:
 - There is a clear spatial structure in the data (e.g., geographic proximity, adjacency).
 - Spatial autocorrelation exists, indicated by tests like Moran's I or Geary's C.
 - Spatial spillovers or diffusion effects are hypothesized (e.g., policies, economic trends, or environmental effects).
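To make the autocorrelation check concrete, here is a rough NumPy sketch of the Moran's I statistic. It computes only the statistic, not a significance test; the function name `morans_i` and the six-location chain are assumptions made for this example.

```python
import numpy as np

def morans_i(values, weights):
    """Moran's I: (N / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean(x)."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    numerator = (w * np.outer(z, z)).sum()
    denominator = (z ** 2).sum()
    return (len(x) / w.sum()) * numerator / denominator

# Hypothetical layout: six locations in a chain; adjacent locations are neighbors.
n = 6
chain = np.zeros((n, n))
for i in range(n - 1):
    chain[i, i + 1] = chain[i + 1, i] = 1.0

clustered = np.array([1, 1, 1, 5, 5, 5])    # similar values sit next to each other
alternating = np.array([1, 5, 1, 5, 1, 5])  # neighbors always differ

i_clustered = morans_i(clustered, chain)
i_alternating = morans_i(alternating, chain)
print(i_clustered, i_alternating)
```

The clustered pattern gives a positive value and the alternating pattern a negative one. Judging significance requires a reference distribution, for example the permutation approach used by `esda.Moran`.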

## **Utilities of Spatial Lags**
 1. **Descriptive Analysis**: Understanding spatial patterns and clusters.
 2. **Regression Models**: Including a spatial lag as a regressor helps model spatial spillover effects. Lagging the dependent variable gives the Spatial Lag Model; lagging the explanatory variables gives the SLX model.
 3. **Policy Implications**: Assessing the impact of interventions or neighboring regions' influence.
 4. **Visualization**: Creating maps of spatial lag variables to identify local trends.
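The regression use case can be sketched with simulated data. The example below assumes a hypothetical ring of 200 locations in which each unit has two neighbors, and fits ordinary least squares with a lagged explanatory variable (an SLX-type specification). A model with a lagged dependent variable would need maximum likelihood or instrumental-variable estimation instead, which this sketch does not attempt.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Row-standardized weights on a ring: each unit's two neighbors get weight 0.5.
W = np.zeros((n, n))
idx = np.arange(n)
W[idx, (idx - 1) % n] = 0.5
W[idx, (idx + 1) % n] = 0.5

x = rng.normal(size=n)
wx = W @ x  # spatial lag of x: the mean of the two neighbors' values

# Assumed data-generating process: direct effect 2.0, spillover effect 0.5.
y = 1.0 + 2.0 * x + 0.5 * wx + rng.normal(scale=0.1, size=n)

# Ordinary least squares with an explicit intercept column.
design = np.column_stack([np.ones(n), x, wx])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print(dict(zip(["intercept", "x", "lagged_x"], coef.round(2))))
```

The coefficient on `lagged_x` estimates the spillover: how much a unit's outcome changes when the average of its neighbors' `x` rises by one unit.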

## **Key References**:
 - Anselin, L. (1988). *Spatial Econometrics: Methods and Models*. Springer Science & Business Media.
 - LeSage, J. P., & Pace, R. K. (2009). *Introduction to Spatial Econometrics*. CRC Press.
 - Griffith, D. A. (1987). *Spatial Autocorrelation: A Primer*. Resource Publications in Geography.

The following code demonstrates how to create, use, and analyze spatial lags, including visualization and regression models, using Python and freely available datasets.


In [None]:
import geopandas as gpd
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Import from the individual PySAL packages; the pysal.explore.esda.* submodule
# paths are not importable in recent pysal releases.
import libpysal
import spreg
from esda.moran import Moran
from libpysal.weights import Queen

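# Load a sample dataset: Natural Earth low-resolution country polygons.
# It contains a population estimate ("pop_est") and a geometry for each country.
# To use the Columbus, Ohio neighborhood data instead, substitute its file path here
# and adjust the variable names below.
# Note: gpd.datasets was deprecated in GeoPandas 0.14 and removed in 1.0; on newer
# versions, download the Natural Earth file and pass its local path to gpd.read_file.
map_data = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
print(map_data.head())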

# Step 1: Create a neighbors list
# A neighbors list defines which spatial units are considered "neighbors".
# Here, we use the Queen contiguity method (shares at least one boundary or corner).
queen_weights = Queen.from_dataframe(map_data)

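# Step 2: Generate a spatial weights matrix
# Row-standardize the contiguity weights so that each row sums to 1.
# Setting transform = 'r' does this in place, so a lag computed from these weights
# is the average of each unit's neighbors.
# Units without neighbors (islands) keep all-zero rows, so their spatial lag is 0.
queen_weights.transform = 'r'
w_matrix = queen_weights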

# Step 3: Create a spatial lag
# A spatial lag is the weighted average of a variable for each spatial unit's neighbors.
# Here, we calculate the spatial lag of the "pop_est" variable (example).
map_data['spatial_lagged_pop'] = w_matrix.sparse.dot(map_data['pop_est'])

# Inspect the new spatial lag column
print(map_data[['pop_est', 'spatial_lagged_pop']].head())

# Step 4: Visualize the spatial lag
# Plot the original variable and its spatial lag side-by-side
fig, axes = plt.subplots(1, 2, figsize=(12, 6))
map_data.plot(column='pop_est', ax=axes[0], legend=True, cmap='OrRd')
axes[0].set_title('Original Population Estimate')
map_data.plot(column='spatial_lagged_pop', ax=axes[1], legend=True, cmap='OrRd')
axes[1].set_title('Spatial Lag of Population')
plt.show()

# Step 5: Regression analysis with spatial lags
# Simulate a dependent variable (e.g., crime rate, CRIME) for illustration
np.random.seed(123)
map_data['CRIME'] = 50 + 0.5 * map_data['pop_est'] - 0.3 * map_data['spatial_lagged_pop'] + np.random.normal(0, 10, len(map_data))

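# Fit a linear regression model
# spreg.OLS adds the intercept itself, so X must not contain a column of ones.
X = map_data[['pop_est', 'spatial_lagged_pop']].values
Y = map_data['CRIME'].values.reshape(-1, 1)  # spreg expects an (n, 1) array
model = spreg.OLS(Y, X, name_y='CRIME', name_x=['pop_est', 'spatial_lagged_pop'])
print(model.summary)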

# Step 6: Explore spatial autocorrelation
# Moran's I measures spatial autocorrelation in a variable.
# It is calculated as:
#   I = (N / W) * (sum(i, j) w_ij (x_i - x_bar)(x_j - x_bar)) / (sum(i) (x_i - x_bar)^2)
#
# Where:
# - N: Number of spatial units.
# - W: Sum of all spatial weights.
# - x_i, x_j: Observed values at locations i and j.
# - x_bar: Mean of the observed values.
# - w_ij: Spatial weight between locations i and j.
#
# Interpretation:
# - Positive Moran’s I indicates clustering of similar values.
# - Negative Moran’s I indicates a dispersed pattern.
# - Values near zero suggest spatial randomness.
moran_test = Moran(map_data['pop_est'], w_matrix)
print(f"Moran's I: {moran_test.I}, p-value: {moran_test.p_sim}")

# Step 7: Advanced: Simulate data for further examples
# If no suitable dataset is available, simulate spatial data
def simulate_spatial_data(n_units=50, seed=42):
    np.random.seed(seed)
    coords = np.random.rand(n_units, 2)  # Random coordinates
    nb = libpysal.weights.KNN.from_array(coords, k=4)  # 4 nearest neighbors
    nb.transform = 'r'  # Row-standardize so the lag is a neighbor average, not a sum
    wmat = nb.sparse  # Sparse weights matrix
    X = np.random.randn(n_units)  # Simulate independent variable
    lagged_X = wmat.dot(X)  # Spatial lag of X
    Y = 0.6 * X + lagged_X + np.random.randn(n_units)  # Response depends on X and its lag
    return pd.DataFrame({'ID': np.arange(n_units), 'X': X, 'Y': Y, 'spatial_lagged_X': lagged_X})

# Simulate data and analyze
simulated_data = simulate_spatial_data()
# spreg.OLS adds the intercept itself, so no column of ones is needed.
sim_X = simulated_data[['X', 'spatial_lagged_X']].values
sim_Y = simulated_data['Y'].values.reshape(-1, 1)  # spreg expects an (n, 1) array
sim_model = spreg.OLS(sim_Y, sim_X, name_y='Y', name_x=['X', 'spatial_lagged_X'])
print(sim_model.summary)