# **Module 1: Spatial Dependence and Regression**

#### Data
For this workshop, data are created and saved to the directory `./data-module-1/`.
- `mnp.shp` - a synthetic dataset representing hypothetical pest pressure for selected Minnesota counties.

#### Software
To run the code you need a Python environment with the packages imported below. The default environment does not include all of them, so install `PySAL` (Python Spatial Analysis Library: http://pysal.org/pysal/index.html) beforehand with the following command:
- `pip install pysal --user`

In [None]:
# general use packages
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as statsmodels

# geospatial packages
import geopandas as gpd
from libpysal import weights
import esda
import spreg
from splot.esda import plot_moran, plot_local_autocorrelation
from splot.libpysal import plot_spatial_weights

import os
os.environ['PROJ_LIB'] = '/opt/conda/envs/user_default/share/proj'

### **Explore input data**

In [None]:
mnp = gpd.read_file("./data-module-1/mnp.shp")
print (f"Coordinate reference system is {mnp.crs}")
print (f"Number of records is {len(mnp)}")
mnp.head()

In [None]:
fig, ax = plt.subplots(figsize=(14,8), tight_layout=True)

mnp.plot(ax=ax, column="PEST", legend=True, scheme="User_Defined", cmap="YlOrBr", 
         edgecolor="grey", classification_kwds=dict(bins=[40,60,80,100,120]),
         legend_kwds={"labels": ["< 40", "40 - 60", "60 - 80", "80 - 100", "100 - 120", "> 120"]})
mnp["coords"] = mnp["geometry"].apply(lambda x: x.representative_point().coords[:])
mnp["coords"] = [coords[0] for coords in mnp["coords"]]
for idx, row in mnp.iterrows():
    ax.annotate(text=idx, xy=row["coords"], horizontalalignment="center")
ax.set_title("Minnesota Pest Pressure for selected counties", weight="bold")

### **Spatial weights**

#### How to define a neighbourhood in the form of spatial weights?
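As a toy illustration of the two contiguity rules used in this section, the sketch below enumerates neighbours on a regular 3 x 3 grid of cells in plain Python, without PySAL. Rook contiguity counts cells that share an edge; queen contiguity also counts cells that share only a corner. The grid and the helper name `grid_neighbors` are assumptions made for illustration and are not part of the workshop data.

```python
def grid_neighbors(n_rows, n_cols, rule="queen"):
    """Return {cell_id: [neighbour ids]} for a regular grid (hypothetical helper)."""
    # rook: edge-sharing moves only; queen: edge- or corner-sharing moves
    rook_steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    diag_steps = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    steps = rook_steps + (diag_steps if rule == "queen" else [])
    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            cell = r * n_cols + c  # cells are numbered row by row, starting at 0
            neighbors[cell] = sorted(
                (r + dr) * n_cols + (c + dc)
                for dr, dc in steps
                if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols
            )
    return neighbors

rook = grid_neighbors(3, 3, rule="rook")
queen = grid_neighbors(3, 3, rule="queen")
print("centre cell, rook :", rook[4])
print("centre cell, queen:", queen[4])
```

On this grid the centre cell has four rook neighbours and eight queen neighbours. Real county polygons are irregular, so the two rules differ only where polygons touch at a single point.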

In [None]:
# neighbours by Queen's case contiguity (shared edge or corner)
mnp_nbq = weights.contiguity.Queen.from_dataframe(mnp, use_index=False)
# neighbours by Rook's case contiguity (shared edge only)
mnp_nbr = weights.contiguity.Rook.from_dataframe(mnp, use_index=False)
# neighbours by K-nearest neighbors (distance-based, k=3)
mnp_nbk3 = weights.distance.KNN.from_dataframe(mnp, k=3)
# neighbours within a distance threshold (distance-based; 80000 is in the units of the CRS)
mnp_nbd = weights.distance.DistanceBand.from_dataframe(mnp, 80000)

In [None]:
fig, axs = plt.subplots(2, 2, figsize=(8, 8), tight_layout=True)

plot_spatial_weights(mnp_nbq, mnp, ax=axs[0, 0])
axs[0, 0].set_title("Queen's Case Contiguity")

plot_spatial_weights(mnp_nbr, mnp, ax=axs[0, 1])
axs[0, 1].set_title("Rook's Case Contiguity")

plot_spatial_weights(mnp_nbk3, mnp, ax=axs[1, 0])
axs[1, 0].set_title("K-nearest Neighbors (k=3)")

plot_spatial_weights(mnp_nbd, mnp, ax=axs[1, 1])
axs[1, 1].set_title("Distance (80,000)")

plt.show()

#### Characterize a spatial weights matrix

In [None]:
print (f"Number of units: {mnp_nbd.n}")
print (f"Number of nonzero weights: {mnp_nbd.nonzero}")
print (f"Percentage of nonzero weights: {mnp_nbd.pct_nonzero}")
print (f"Average number of neighbors: {mnp_nbd.mean_neighbors}")
print (f"Largest number of neighbors is {mnp_nbd.max_neighbors}")
print (f"Minimum number of neighbors is {mnp_nbd.min_neighbors}")
print (f"Number of units without any neighbors {len(mnp_nbd.islands)}")
print (f"Neighbour list: {mnp_nbd.neighbors}")

#### Spatial weights transformation and weights summary
In this example, we set a transformation on the weights and then compute an adjacency list representation of the weights object. Two different transforms are presented: `B` – Binary and `R` – Row-standardization.
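
To make the two transforms concrete, here is a small sketch on a made-up four-unit neighbour list, written in plain Python rather than with `libpysal`. Binary weights give every neighbour a weight of 1; row-standardization divides each weight by its row total so that every unit's weights sum to 1. The neighbour dictionary and the helper `row_standardize` are illustrative assumptions.

```python
# hypothetical neighbour list: unit -> list of neighbouring units
neighbors = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2]}

# binary ("B") weights: each neighbour link gets weight 1
binary = {i: [1.0] * len(js) for i, js in neighbors.items()}

def row_standardize(weights_by_unit):
    """Divide each unit's weights by their row sum (units without neighbours keep an empty list)."""
    out = {}
    for i, ws in weights_by_unit.items():
        total = sum(ws)
        out[i] = [w / total for w in ws] if total else []
    return out

# row-standardized ("R") weights: each unit's weights sum to 1
row_std = row_standardize(binary)
for i in neighbors:
    print(i, binary[i], row_std[i])
```

With row-standardized weights, a unit with many neighbours gives each of them a smaller weight than a unit with few neighbours does.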

In [None]:
# Spatial Weights Summary - Binary
mnp_nbd.set_transform("B")
# adjacency list for the binary weights (suffix _b = binary)
mnp_nbd_lw_b = mnp_nbd.to_adjlist()
print (f"Weights: {mnp_nbd.weights}")
print ("Weights summary: ")
print (mnp_nbd_lw_b["weight"].describe())

In [None]:
# Spatial Weights Summary - Row Standardized
mnp_nbd.set_transform("R")
mnp_nbd_lw_r = mnp_nbd.to_adjlist()
print (f"Weights: {mnp_nbd.weights}")
print ("Weights summary: ")
print (mnp_nbd_lw_r["weight"].describe())

#### Spatial lag
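
The spatial lag of a variable at a unit is the weighted sum of the values at its neighbours; with row-standardized weights this is simply the neighbours' average. A minimal sketch in plain Python, using a made-up four-unit neighbour list and made-up values (both are assumptions for illustration, not the `mnp` data):

```python
# hypothetical neighbour list and attribute values for four units
neighbors = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2]}
values = [10.0, 20.0, 30.0, 40.0]

def spatial_lag_row_std(neighbors, values):
    """Row-standardized spatial lag: mean of the neighbours' values for each unit.

    Assumes every unit has at least one neighbour.
    """
    lag = []
    for i in sorted(neighbors):
        js = neighbors[i]
        lag.append(sum(values[j] for j in js) / len(js))
    return lag

lag = spatial_lag_row_std(neighbors, values)
print(lag)
```

If the cells are run in order, `mnp_nbd` still carries the row-standardized transform at this point, so `PEST_lag` is the average pest pressure of each county's neighbours.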

In [None]:
mnp["PEST_lag"] = weights.spatial_lag.lag_spatial(mnp_nbd, mnp["PEST"])

In [None]:
mnp.head()

In [None]:
fig, axs = plt.subplots(1,2, figsize=(12,3), tight_layout=True)

mnp.plot(ax=axs[0], column="PEST", cmap="plasma", legend=True, vmin=20, vmax=130)
axs[0].set_title("Pest Pressure", weight="bold")

mnp.plot(ax=axs[1], column="PEST_lag", cmap="plasma", legend=True, vmin=20, vmax=130)
axs[1].set_title("Pest Pressure - Spatial Lag", weight="bold")

for idx, row in mnp.iterrows():
    axs[0].annotate(text=idx, xy=row["coords"], horizontalalignment="center")
    axs[1].annotate(text=idx, xy=row["coords"], horizontalalignment="center")

### **Spatial Autocorrelation**

Spatial autocorrelation measures the correlation of a variable with itself across space. Moran's I statistic is one of the most common measures of spatial autocorrelation. It lets us evaluate whether the spatial pattern of the values is clustered, dispersed, or random.
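
To show what the statistic computes, the sketch below evaluates the standard formula for global Moran's I, I = (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2), by hand in plain Python. Here z are the deviations from the mean and S0 is the sum of all weights. The four-unit chain of neighbours and the two value patterns are made-up assumptions, and `morans_i` is a hypothetical helper, not the `esda` implementation.

```python
def morans_i(values, weights):
    """Global Moran's I for a dense weights matrix given as a list of rows."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]           # deviations from the mean
    s0 = sum(sum(row) for row in weights)    # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)

# hypothetical binary weights for four units in a line: 0-1, 1-2, 2-3
w_line = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

i_gradient = morans_i([1, 2, 3, 4], w_line)     # similar values next to each other
i_alternating = morans_i([1, 4, 1, 4], w_line)  # dissimilar values next to each other
print(i_gradient, i_alternating)
```

The smooth gradient gives a positive value (about 0.33) and the alternating pattern gives -1.0. Both can be compared with the expected value under no spatial autocorrelation, -1 / (n - 1), which is about -0.33 for four units.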

#### Global Moran's I

In [None]:
mi = esda.Moran(mnp["PEST"], mnp_nbd)
print ("Moran's I statistic: {}".format(mi.I))
print ("p-value of I under randomization assumption: {}".format(mi.p_rand))
print ("variance of I under randomization assumption: {}".format(mi.VI_rand))
print ("Expected value under normality assumption: {}".format(mi.EI))

In [None]:
plot_moran(mi)
plt.show()

#### Local Moran's I

In [None]:
mi_loc = esda.Moran_Local(mnp["PEST"], mnp_nbd)
print ("Local Moran's I values: {}".format(mi_loc.Is))

In [None]:
plot_local_autocorrelation(mi_loc, mnp, "PEST")
plt.show()

### **Spatial Regression Models**

#### Ordinary Least Squares model - NOT Spatial

In [None]:
# define dependent (response, or y) and independent (explanatory, or x) variables
y = mnp["PEST"].to_numpy()
x = mnp[["HOST"]].values

In [None]:
# adding the constant term
x_ = statsmodels.add_constant(x)
# performing the regression
# and fitting the model
result = statsmodels.OLS(y, x_).fit()
# printing the summary table
print(result.summary())

#### Ordinary Least Squares model - WITH Spatial Diagnostics

In [None]:
mnp_ols = spreg.OLS(y, x, w=mnp_nbd, name_w="Distance based", name_x=["HOST"], name_y="PEST", 
                 name_ds="MN Pest Pressure", white_test=True, spat_diag=True, moran=True)
print(mnp_ols.summary)

In [None]:
mi_ols = esda.Moran(mnp_ols.u, mnp_nbd)
print ("Moran's I statistic: {}".format(mi_ols.I))

In [None]:
plot_moran(mi_ols)
plt.show()

#### ML estimation of the spatial error model

In [None]:
mnp_sem = spreg.ML_Error(y, x, w=mnp_nbd, name_w="Distance based", name_x=["HOST"], name_y="PEST", 
                   name_ds="MN Pest Pressure")
print(mnp_sem.summary)

#### ML estimation of the spatial lag model

In [None]:
mnp_slm = spreg.ML_Lag(y, x, w=mnp_nbd, name_w="Distance based", name_x=["HOST"], name_y="PEST", 
                 name_ds="MN Pest Pressure")
print ("Estimate of spatial autoregressive coefficient rho: {}".format(mnp_slm.rho))
print(mnp_slm.summary)

#### Spatial Durbin model
Although some models are not directly offered by the PySAL API, they can be derived from existing standard models. For example, a spatial Durbin model can be estimated by computing the spatial lag of the independent variables and appending these lagged variables to the original independent variables before fitting a spatial lag model.
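
In matrix notation, and assuming the usual textbook specification (the symbols below are introduced here for illustration and are not defined elsewhere in this notebook), the spatial lag model and the spatial Durbin model can be written as:

$$
y = \rho W y + X \beta + \varepsilon
$$

$$
y = \rho W y + X \beta + W X \theta + \varepsilon
$$

Here $W$ is the spatial weights matrix, $\rho$ is the spatial autoregressive coefficient, $\beta$ holds the coefficients of the original regressors, and $\theta$ holds the coefficients of their spatial lags. The second equation is the first one with the regressor matrix $X$ replaced by the column-wise stack $[X, WX]$, which is why stacking `x` and `lag_x` and passing the result to `spreg.ML_Lag` yields a Durbin-type model.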

In [None]:
lag_x = weights.lag_spatial(mnp_nbd, x)
new_x = np.hstack((x,lag_x))

In [None]:
mnp_sdm = spreg.ML_Lag(y, new_x, w=mnp_nbd, name_w="Distance based", name_x=["HOST", "HOST_lag"], name_y="PEST", 
                 name_ds="MN Pest Pressure")
print ("Estimate of spatial autoregressive coefficient rho: {}".format(mnp_sdm.rho))
print(mnp_sdm.summary)

### **Exercises**
#### Data
For the exercises, data are created and saved to the directory `./data-module-1/`.
- `mwi.shp` - a dataset downloaded from the Malawi Living Standard Measurement Survey Integrated Household Sample (LSMS-IHS) Wave 5 data (available from https://microdata.worldbank.org/index.php/catalog/3818).

**Question 1. Read the vector dataset `mwi.shp` into a `GeoDataFrame`. Print its Coordinate Reference System. Explore the attributes of this dataset.**

**Question 2. Calculate neighboring using Queen's case (contiguity), Rook's case (contiguity), K-nearest neighbors (k=3), and distance (200,000 m). Visualize and compare all 4 weights networks. What differences do you see?** 

**Question 3. Print the properties of the distance-based spatial weights matrix, such as the number of units, the number of nonzero weights, etc.**

**Question 4. Apply the row-standardized transform to your distance-based neighbourhood.**

**Question 5. Compute the Moran's I statistic to test for spatial autocorrelation in the `poverty` variable. Use the distance-based neighbourhood structure. Visualize the Moran's I plot for the `poverty` variable.**

**Question 6. Compute and visualize Local Moran's I for the `poverty` variable.**

**Question 7. Run the Ordinary Least Squares model with spatial diagnostics. Use the distance-based neighbourhood structure. Predict `poverty` as a function of cropland cultivated (`croplnd`), livestock owned (`livstck`), share of off-farm income (`income`), years of education (`edu`), female head of household (`female`), and tobacco-growing household (`tobccHH`).**

**Question 8. Run ML estimation of the spatial lag model. Use the same neighbourhood structure and the same `x` and `y` formulation as in the previous question.**