# **Module 1: Spatial Dependence and Regression**

### **Exercises**
#### Data
The data for the exercises are saved in the directory `./data-module-1/`.
- `mwi.shp` - a dataset downloaded from the Malawi Living Standard Measurement Survey Integrated Household Sample (LSMS-IHS) Wave 5 data (available from https://microdata.worldbank.org/index.php/catalog/3818).

In [None]:
# general use packages
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# geospatial packages
import geopandas as gpd
from libpysal import weights
import esda
import spreg
from splot.esda import plot_moran, plot_local_autocorrelation
from splot.libpysal import plot_spatial_weights

import os
os.environ['PROJ_LIB'] = '/opt/conda/envs/user_default/share/proj'

**Question 1. Read the vector dataset `mwi.shp` into a `GeoDataFrame`. Print its Coordinate Reference System. Explore the attributes of this dataset.**

In [None]:
mwi = gpd.read_file("./data-module-1/mwi.shp")
print(mwi.crs)
mwi.head()

**Question 2. Build neighbor structures using Queen's case (contiguity), Rook's case (contiguity), K-nearest neighbors (k=3), and a distance band (200,000 m). Visualize and compare all four weights networks. What differences do you see?**

In [None]:
# calculate neighboring using Queen's case (contiguity)
mwi_nbq = weights.contiguity.Queen.from_dataframe(mwi, use_index=False)
# calculate neighboring using Rook's case (contiguity)
mwi_nbr = weights.contiguity.Rook.from_dataframe(mwi, use_index=False)
# calculate neighboring using K-nearest neighbors (distance-based)
mwi_nbk3 = weights.distance.KNN.from_dataframe(mwi, k=3)
# calculate neighboring by distance (distance-based)
mwi_nbd = weights.distance.DistanceBand.from_dataframe(mwi, 200000)

In [None]:
fig, axs = plt.subplots(1, 4, figsize=(12, 8), tight_layout=True)

plot_spatial_weights(mwi_nbq, mwi, ax=axs[0])
axs[0].set_title("Queen's Case Contiguity")

plot_spatial_weights(mwi_nbr, mwi, ax=axs[1])
axs[1].set_title("Rook's Case Contiguity")

plot_spatial_weights(mwi_nbk3, mwi, ax=axs[2])
axs[2].set_title("K-nearest Neighbors (k=3)")

plot_spatial_weights(mwi_nbd, mwi, ax=axs[3])
axs[3].set_title("Distance (200,000 m)")

plt.show()
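
As a rough aid for the comparison, the sketch below builds K-nearest-neighbor and distance-band weights by hand with NumPy. The coordinates, `k`, and `threshold` are hypothetical toy values, not taken from `mwi.shp`, and the code only mimics the idea behind the `libpysal` constructors rather than reproducing them.

```python
import numpy as np

# Hypothetical toy coordinates (not from mwi.shp): three clustered points and one remote point.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
n = len(coords)

# Pairwise Euclidean distances; the diagonal is set to inf so a unit is never its own neighbor.
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
np.fill_diagonal(dist, np.inf)

# K-nearest neighbors: every unit gets exactly k neighbors, however far away they are.
k = 2
knn = np.zeros((n, n), dtype=int)
nearest = np.argsort(dist, axis=1)[:, :k]
knn[np.repeat(np.arange(n), k), nearest.ravel()] = 1

# Distance band: units are neighbors only if they lie within the threshold.
threshold = 1.5
band = (dist <= threshold).astype(int)

print("KNN neighbors per unit: ", knn.sum(axis=1))
print("Band neighbors per unit:", band.sum(axis=1))
print("KNN symmetric: ", bool((knn == knn.T).all()))
print("Band symmetric:", bool((band == band.T).all()))
```

In this toy case the remote point still receives `k` neighbors under KNN (so the relation is not symmetric), while under the distance band it has none.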

**Question 3. Print the properties of the distance-based spatial weights matrix, such as the number of units, the number of nonzero weights, etc.**

In [None]:
print(f"Number of units: {mwi_nbd.n}")
print(f"Number of nonzero weights: {mwi_nbd.nonzero}")
print(f"Percentage of nonzero weights: {mwi_nbd.pct_nonzero}")
print(f"Average number of neighbors: {mwi_nbd.mean_neighbors}")
print(f"Largest number of neighbors: {mwi_nbd.max_neighbors}")
print(f"Smallest number of neighbors: {mwi_nbd.min_neighbors}")
print(f"Number of units without any neighbors: {len(mwi_nbd.islands)}")
print(f"Neighbor list: {mwi_nbd.neighbors}")
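
To make these summary properties concrete, here is a minimal sketch that derives them from a plain neighbor dictionary. The `neighbors` dictionary is a hypothetical toy example, and the formulas are assumed to match the usual definitions (for instance, percentage of nonzero weights as nonzero entries over `n**2`).

```python
# Hypothetical neighbor lists for four units; unit 3 has no neighbors.
neighbors = {0: [1], 1: [0, 2], 2: [1], 3: []}

n = len(neighbors)
# Number of neighbors of each unit.
cardinalities = {i: len(js) for i, js in neighbors.items()}
# Each neighbor link is one nonzero entry of the n x n weights matrix.
nonzero = sum(cardinalities.values())
pct_nonzero = 100.0 * nonzero / n**2
mean_neighbors = nonzero / n
max_neighbors = max(cardinalities.values())
min_neighbors = min(cardinalities.values())
# Units without any neighbors.
islands = [i for i, c in cardinalities.items() if c == 0]

print(n, nonzero, pct_nonzero, mean_neighbors, max_neighbors, min_neighbors, islands)
```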

**Question 4. Apply a row-standardized transform to your distance-based neighbourhood.**

In [None]:
# Row-standardize the distance-based weights (in place) and summarize them
mwi_nbd.set_transform("R")
mwi_nbd_lw_r = mwi_nbd.to_adjlist()
print(f"Weights: {mwi_nbd.weights}")
print("Weights summary: ")
print(mwi_nbd_lw_r["weight"].describe())
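
For intuition, row standardization divides each row of the weights matrix by its row sum, so that every unit's weights add up to 1. The sketch below shows this with NumPy on a hypothetical 3 x 3 binary matrix; it illustrates the transform itself, not the internals of `libpysal`.

```python
import numpy as np

# Hypothetical binary weights: unit 0 has two neighbors, units 1 and 2 have one each.
W = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)

# Divide each row by its sum; rows that sum to zero (islands) are left as zeros.
row_sums = W.sum(axis=1, keepdims=True)
W_r = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

print(W_r)
print("Row sums:", W_r.sum(axis=1))
```

Units with more neighbors end up with smaller individual weights (0.5 each for unit 0 here), which is why the weight summary changes after the transform.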

**Question 5. Compute the Moran's I statistic to test for spatial autocorrelation in the `poverty` variable. Use the distance-based neighbouring structure. Visualize the Moran's I plot for the `poverty` variable.**

In [None]:
mi = esda.moran.Moran(mwi["poverty"], mwi_nbd)
print("Moran's I statistic: {}".format(mi.I))
print("p-value of I under randomization assumption: {}".format(mi.p_rand))
print("Variance of I under randomization assumption: {}".format(mi.VI_rand))
print("Expected value under normality assumption: {}".format(mi.EI))

In [None]:
plot_moran(mi)
plt.show()
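
The statistic itself can be written as $I = \frac{n}{S_0}\,\frac{z^\top W z}{z^\top z}$, where $z$ holds the deviations from the mean and $S_0$ is the sum of all weights. The sketch below evaluates this formula with NumPy on a hypothetical four-unit chain; the values and weights are toy assumptions, not the Malawi data.

```python
import numpy as np

# Hypothetical values on a chain of four units (0-1-2-3), binary contiguity weights.
y = np.array([1.0, 2.0, 3.0, 4.0])
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

n = len(y)
z = y - y.mean()   # deviations from the mean
s0 = W.sum()       # sum of all weights

# Global Moran's I and its expected value under no spatial autocorrelation.
moran_i = (n / s0) * (z @ W @ z) / (z @ z)
expected_i = -1.0 / (n - 1)

print(f"Moran's I: {moran_i:.4f}")      # 0.3333
print(f"Expected I: {expected_i:.4f}")  # -0.3333
```

Because neighboring units have similar values in this toy chain, I lies above its expected value, which points to positive spatial autocorrelation.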

**Question 6. Compute and visualize Local Moran's I for the `poverty` variable.**

In [None]:
mi_loc = esda.Moran_Local(mwi["poverty"], mwi_nbd)

In [None]:
plot_local_autocorrelation(mi_loc, mwi, "poverty")
plt.show()
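
To illustrate what the local statistic and the cluster map encode, the sketch below computes a local Moran's I and the Moran scatterplot quadrant for each unit of a hypothetical four-unit chain. The scaling constant used here is one common convention and is an assumption; libraries such as `esda` may scale the local values differently.

```python
import numpy as np

# Hypothetical values on a chain of four units, with row-standardized weights.
y = np.array([1.0, 2.0, 3.0, 4.0])
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W = W / W.sum(axis=1, keepdims=True)

n = len(y)
z = y - y.mean()   # deviations from the mean
lag = W @ z        # average deviation among each unit's neighbors

# Local Moran's I (assumed scaling: second moment m2 = z'z / n).
local_i = z * lag / (z @ z / n)

# Quadrant labels: own value (High/Low) followed by neighbors' average (High/Low).
labels = np.where(z > 0,
                  np.where(lag > 0, "HH", "HL"),
                  np.where(lag > 0, "LH", "LL"))

print("Local I:", local_i)
print("Quadrants:", labels.tolist())
```

With row-standardized weights and this scaling, the mean of the local values equals the global Moran's I.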

**Question 7. Run an ordinary least squares (OLS) model with spatial diagnostics, using the distance-based neighbouring structure. Predict `poverty` as a function of cropland cultivated (`croplnd`), livestock owned (`livstck`), share of off-farm income (`income`), years of education (`edu`), female head of household (`female`), and tobacco-growing household (`tobccHH`).**

In [None]:
# spreg expects y as an (n, 1) column vector and x as an (n, k) array
y = mwi["poverty"].to_numpy().reshape(-1, 1)
x = mwi[["croplnd", "livstck", "income", "edu", "female", "tobccHH"]].to_numpy()

In [None]:
mwi_ols = spreg.OLS(y, x, w=mwi_nbd, name_w="Distance based",
                    name_x=["croplnd", "livstck", "income", "edu", "female", "tobccHH"],
                    name_y="poverty", name_ds="Malawi Poverty function",
                    white_test=True, spat_diag=True, moran=True)
print(mwi_ols.summary)
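
The idea behind the residual-based spatial diagnostics can be sketched with NumPy alone: fit OLS, then check whether the residuals are spatially autocorrelated. Everything below (the simulated data, the chain-shaped weights, the coefficient values) is a hypothetical setup, and the plain Moran's I on residuals shown here is a simplification; the test reported by `spreg` uses moments adjusted for regression residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Hypothetical data: intercept plus two regressors, independent noise.
beta_true = np.array([1.0, 2.0, -0.5])
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# OLS estimate via least squares, and the resulting residuals.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

# Hypothetical neighbor structure: units arranged on a chain (i neighbors i-1 and i+1).
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = 1.0
W[idx + 1, idx] = 1.0

# Moran's I of the residuals (no mean-centering needed: the residuals sum to zero
# because the model includes an intercept).
moran_resid = (n / W.sum()) * (resid @ W @ resid) / (resid @ resid)

print("Estimated coefficients:", beta_hat)
print("Moran's I of residuals:", moran_resid)
```

A residual Moran's I far from its expected value would suggest that the OLS assumption of independent errors is violated and that a spatial model is worth considering.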

**Question 8. Run maximum likelihood (ML) estimation of the spatial lag model. Use the same neighbouring structure and the same `x` and `y` as in the previous question.**

In [None]:
mwi_slm = spreg.ML_Lag(y, x, w=mwi_nbd, name_w="Distance based", 
                       name_x=["croplnd", "livstck", "income", "edu", "female", "tobccHH"],
                       name_y="poverty", name_ds="Malawi Poverty function")
print(mwi_slm.summary)
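
For reference, the spatial lag model is $y = \rho W y + X\beta + \varepsilon$, which has the reduced form $y = (I - \rho W)^{-1}(X\beta + \varepsilon)$. The sketch below simulates data from this model with NumPy to show how the pieces fit together; `rho`, `beta`, and the chain-shaped weights are hypothetical values, and the code does not perform the ML estimation itself.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

# Hypothetical row-standardized weights for units on a chain.
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = 1.0
W[idx + 1, idx] = 1.0
W = W / W.sum(axis=1, keepdims=True)

# Hypothetical parameters: spatial autoregressive coefficient and regression coefficients.
rho = 0.4
beta = np.array([1.0, 0.5])

X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = rng.normal(scale=0.1, size=n)

# Reduced form: solve (I - rho * W) y = X beta + eps for y.
y = np.linalg.solve(np.eye(n) - rho * W, X @ beta + eps)

# The simulated y satisfies the structural equation y = rho * W y + X beta + eps.
print(np.allclose(y, rho * (W @ y) + X @ beta + eps))
```

Because each unit's outcome depends on its neighbors' outcomes, a change in one unit's regressors spreads to other units through $(I - \rho W)^{-1}$, so the coefficients in the ML lag summary are not simple marginal effects.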