<div class="frontmatter text-center">
<h1>Geospatial Data Science</h1>
<h2>Exercise 5: Spatial Autocorrelation</h2>
<h3>IT University of Copenhagen, Spring 2022</h3>
<h3>Instructors: Anastassia Vybornova & Ane Rahbek Vierø</h3>
</div>

# Source
This notebook was adapted from:
* A course on geographic data science: https://darribas.org/gds_course/content/bF/diy_F.html


## Data preparation

In this exercise we will try to detect *spatial autocorrelation* in voting behaviour in Denmark.
The dataset is from the 2019 parliamentary election, at the most detailed spatial resolution available, called 'afstemningsområder' (*voting areas*).

You can find the necessary data in the data subfolder. The original data source is https://valgdatabase.dst.dk/

The data set with the number of votes for each party in each area does not include the geometries of the areas.
To plot the election data with the geometries, you therefore first need to join it with a spatial dataset of all voting areas in DK (just like we did in the exercise with LSOAs and the AHAH index).

The dataset with the geometries is from https://dataforsyningen.dk/ (a very good source of public spatial data in DK).


## Task I: get the data ready!

To join the two datasets you need to use the column 'ValgstedId' in the csv file. For the data set with the geometries, you need to create a corresponding ID from the column with the municipal ID ('Kommunekode') and the column with the area number ('Afstemningsomraadenummer').

*Tip: Make sure that the two ID columns have the same length. It might also be a good idea to check for NA/non-matched geometries.*
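One possible way to build the ID and check the join is sketched below on made-up toy tables. The padding rule (municipal code without leading zeros, followed by the area number zero-padded to 3 digits) and the `votes_total` column are assumptions for illustration only; inspect the real files to see which format 'ValgstedId' actually uses.

```python
import pandas as pd

# Toy stand-ins for the two real datasets; all values are made up.
votes = pd.DataFrame({
    "ValgstedId": [101001, 101002, 147010],
    "votes_total": [1200, 950, 400],  # hypothetical column name
})
areas = pd.DataFrame({
    "Kommunekode": ["0101", "0101", "0147", "0151"],
    "Afstemningsomraadenummer": [1, 2, 10, 3],
})

# ASSUMPTION: ID = municipal code (no leading zeros) + area number padded
# to 3 digits. Verify this against the real data before relying on it.
areas["ValgstedId"] = (
    areas["Kommunekode"].astype(int).astype(str)
    + areas["Afstemningsomraadenummer"].astype(int).astype(str).str.zfill(3)
)

# Both key columns must have the same dtype (and length) to match.
votes["ValgstedId"] = votes["ValgstedId"].astype(str)

# indicator=True adds a "_merge" column showing which rows found a partner.
merged = areas.merge(votes, on="ValgstedId", how="left", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]
print(len(merged), "areas,", len(unmatched), "without vote data")
```

With the real data, `areas` would be the GeoDataFrame read from the geopackage, so the merged result keeps its geometry column.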

Since there are no geometries in the CSV data, there is no need to reproject the data before joining, but it is still good practice to check the CRS and see if it is the one we want.
In our case the data is in a specific version of the UTM CRS usually used for Denmark.

When the data are ready, complete all the other bits required for the ESDA analysis of spatial autocorrelation:

- Make sure your geography does not have islands (either by removing them or using a method which does not create islands in our data set)
- Create a spatial weights matrix
- Standardise the spatial weights matrix
- Create a standardised version of the data/column you are working with
- Create the spatial lag of the number of votes in the voting areas
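To make the last four steps concrete, here is a minimal NumPy sketch on a hypothetical chain of four areas with made-up values. It is not the PySAL workflow itself: with `pysal.lib.weights` you would typically build a `W` object, set `w.transform = "R"`, check `w.islands`, and call `weights.lag_spatial(w, y)`, but the arithmetic underneath is the same.

```python
import numpy as np

# Hypothetical binary contiguity matrix for four areas in a row: 0-1-2-3.
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# An "island" is an area without neighbours, i.e. an all-zero row.
islands = np.where(W.sum(axis=1) == 0)[0]

# Row-standardise the weights: every row sums to 1.
W_std = W / W.sum(axis=1, keepdims=True)

# Made-up values for the variable of interest (e.g. a vote share in percent).
y = np.array([10.0, 20.0, 30.0, 60.0])

# Standardise the variable (z-scores).
z = (y - y.mean()) / y.std()

# Spatial lag: for each area, the weighted average of its neighbours' values.
z_lag = W_std @ z

print("islands:", islands)
print("z:", z.round(2))
print("spatial lag of z:", z_lag.round(2))
```

Because the weights are row-standardised, the lag of an area with a single neighbour is simply that neighbour's value, and the lag of an area with two neighbours is their mean.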

**When creating your spatial weights matrix, think of one criterion for building it that you think would fit this variable (e.g. contiguity, distance-based, etc.), and apply it.**

**The data set on votes in each area contains columns with votes for each party, number of people entitled to vote etc. You have to choose yourself which part of the data you want to do your analysis on.**

In [None]:
import seaborn as sns
import pandas as pd
import esda
from pysal.lib import weights
from splot.esda import (
    moran_scatterplot, lisa_cluster, plot_local_autocorrelation
)
import geopandas as gpd
import contextily as ctx
import matplotlib.pyplot as plt

In [None]:
# Filepaths

fp_votes = './data/electiondata.csv'
fp_geom = './data/afstemningsomraader_dk.gpkg'

## Task II: global spatial autocorrelation

Let's move on to the analytics:

- Visualise your chosen variable with a Moran Plot
- Calculate Moran's I
- *What conclusions can you reach from the Moran Plot and Moran's I? What's the main spatial pattern?*
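As a reminder of what the statistic measures, the sketch below computes Moran's I by hand with NumPy, using a hypothetical four-area chain and made-up values. In the exercise itself you would more likely use `esda.Moran` and `moran_scatterplot`, which also provide permutation-based inference; this sketch only illustrates the formula $I = \frac{n}{S_0} \frac{z^\top W z}{z^\top z}$, where $z$ holds the deviations from the mean and $S_0$ is the sum of all weights.

```python
import numpy as np

# Hypothetical contiguity for four areas in a row (0-1-2-3), row-standardised.
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
W_std = W / W.sum(axis=1, keepdims=True)

# Made-up values for the variable of interest.
y = np.array([10.0, 20.0, 30.0, 60.0])

z = y - y.mean()          # deviations from the mean
z_lag = W_std @ z         # spatial lag of the deviations
n = len(y)
s0 = W_std.sum()          # sum of all weights (equals n when row-standardised)

morans_i = (n / s0) * (z @ z_lag) / (z @ z)

# In the Moran Plot (lag on the y-axis, variable on the x-axis),
# the slope of the fitted line equals Moran's I for row-standardised weights.
slope = np.polyfit(z, z_lag, 1)[0]

print("Moran's I:", round(morans_i, 4))
print("Slope of Moran Plot fit:", round(slope, 4))
```

A positive value indicates that similar values tend to sit next to each other; whether it is statistically significant has to be judged against a reference distribution, for example from random permutations.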

## Task III: local spatial autocorrelation

Now that you have a good sense of the overall pattern in the dataset, let's move to the local scale:

- Calculate LISA (Local Indicators of Spatial Association) statistics for the voting areas.
- Make a map of significant clusters at the 5% significance level
- Can you identify hotspots or coldspots? If so, what do they mean? What about spatial outliers?
- Create cluster maps for significance levels 1% and 10%; compare them with each other. *What are the main changes? Why?*
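The logic behind the cluster maps can be sketched with NumPy on a hypothetical chain of six areas with made-up values: compute the local statistic, label each area by its Moran Plot quadrant, and derive pseudo p-values from conditional permutations. This is a simplified illustration under those assumptions, not the exact `esda` implementation; in practice `esda.Moran_Local` gives you the quadrants (`q`) and pseudo p-values (`p_sim`), and `lisa_cluster(lisa, gdf, p=0.05)` draws the map for a chosen threshold.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical chain of six areas (0-1-2-3-4-5) with row-standardised weights.
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
W_std = W / W.sum(axis=1, keepdims=True)

# Made-up values: three low areas next to three high areas.
y = np.array([5.0, 6.0, 7.0, 30.0, 32.0, 35.0])
z = (y - y.mean()) / y.std()
z_lag = W_std @ z

# Local Moran statistic: own value times the average of the neighbours.
local_i = z * z_lag

# Quadrant labels: HH/LL are clusters, HL/LH are spatial outliers.
labels = np.where(
    z > 0,
    np.where(z_lag > 0, "HH", "HL"),
    np.where(z_lag > 0, "LH", "LL"),
)

# Conditional permutations: keep area i fixed, reshuffle all other values.
n_perm = 999
p_sim = np.empty(n)
for i in range(n):
    others = np.delete(z, i)
    neigh = np.flatnonzero(W_std[i])
    sims = np.empty(n_perm)
    for k in range(n_perm):
        draw = rng.choice(others, size=neigh.size, replace=False)
        sims[k] = z[i] * (W_std[i, neigh] @ draw)
    extreme = np.sum(sims >= local_i[i])
    extreme = min(extreme, n_perm - extreme)
    p_sim[i] = (extreme + 1) / (n_perm + 1)

# A stricter threshold can only keep the same or fewer areas as significant.
sig_counts = {a: int(np.sum(p_sim < a)) for a in (0.01, 0.05, 0.10)}
print(labels, sig_counts)
```

The quadrant labels do not depend on the significance level; only the set of areas that are coloured on the map changes when you move between 1%, 5% and 10%.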