<div class="frontmatter text-center">
<h1>Geospatial Data Science</h1>
<h2>Exercise 7: Point Pattern Analysis</h2>
<h3>IT University of Copenhagen, Spring 2022</h3>
<h3>Instructors: Anastassia Vybornova &amp; Ane Rahbek Vierø</h3>
</div>

# Source
This notebook was adapted from:
* A course on geographic data science: https://darribas.org/gds_course/content/bH/diy_H.html


In [None]:
import pandas as pd
import geopandas as gpd
import contextily as cx
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.cluster import DBSCAN
from ipywidgets import interact, fixed

## Task I: Airbnb distribution in Copenhagen

In this task, you will explore spatial patterns in the locations of Airbnb properties in Copenhagen, using data from [Inside Airbnb](http://insideairbnb.com). We are going to read a file with the locations of the properties available as of December 2021:




**Make sure you are connected to the internet when you run this cell :)**


In [None]:
url_abb = 'http://data.insideairbnb.com/denmark/hovedstaden/copenhagen/2021-12-28/visualisations/listings.csv'

abb_df = pd.read_csv(url_abb)

abb_df.info()

This gives us a table with the following information:

In [None]:
abb_df.head()

The dataset stores the locations as longitude and latitude coordinates, but you have to construct the geometry objects from them before we can continue working with them:
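As a hint, the general pattern is sketched here on a tiny made-up table rather than on the Airbnb data. It assumes the coordinate columns are called `longitude` and `latitude` (check `abb_df.info()` for the actual names); `toy_df` and `toy_gdf` are hypothetical names.

```python
import pandas as pd
import geopandas as gpd

# Made-up coordinates for illustration only (not taken from the Airbnb file)
toy_df = pd.DataFrame(
    {
        "name": ["a", "b", "c"],
        "longitude": [12.57, 12.55, 12.60],
        "latitude": [55.68, 55.67, 55.70],
    }
)

# points_from_xy expects x (longitude) first, then y (latitude)
toy_gdf = gpd.GeoDataFrame(
    toy_df,
    geometry=gpd.points_from_xy(toy_df["longitude"], toy_df["latitude"]),
    crs="EPSG:4326",  # assumption: plain lon/lat values are WGS84
)

print(toy_gdf.head())
```

Passing longitude as `x` and latitude as `y` matters: swapping them is a common source of points that end up in the wrong part of the world.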

In [None]:
# ADD YOUR CODE HERE AND READ THE DATA INTO THE 'abb' GEODATAFRAME

abb = gpd.GeoDataFrame()

As an ancillary geography, we will use the neighbourhoods provided by the same source. 
First we need to make sure that the CRSs of the two datasets match and are suitable for analysis in Denmark (i.e. use **EPSG:25832**). 
Some of the data might not have a CRS defined, but since the coordinates are in longitude and latitude, we can assume that they are in WGS84 / EPSG:4326.
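
A minimal sketch of the two CRS operations this involves, on made-up points: `set_crs` only attaches a CRS label to coordinates that lack one, while `to_crs` actually transforms the coordinates. The names `toy` and `toy_utm` are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import Point

# Made-up lon/lat points with no CRS attached
toy = gpd.GeoDataFrame(geometry=[Point(12.57, 55.68), Point(12.60, 55.70)])
print(toy.crs)  # no CRS is defined at this point

# Declare what the coordinates already are (assumption: WGS84 lon/lat)
toy = toy.set_crs("EPSG:4326")

# Reproject to EPSG:25832 (ETRS89 / UTM zone 32N), where units are metres
toy_utm = toy.to_crs("EPSG:25832")

print(toy_utm.crs)
print(toy_utm.geometry.iloc[0])
```

After reprojection, distances and areas computed from the geometries are in metres and square metres.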

**Make sure you are connected to the internet when you run this cell as well:**

In [None]:
# Filepath to neighbourhood geometries
url_neis = 'http://data.insideairbnb.com/denmark/hovedstaden/copenhagen/2021-12-28/visualisations/neighbourhoods.geojson'

neis = gpd.read_file(url_neis)

neis.info()

In [None]:
# ADD YOUR CODE HERE


**When reading files with 'æøå' we often end up with encoding errors (have a look at the neighbourhood names). To make sure that the names will work for a later join, run the cell below to rename the neighbourhoods.** 

Because of this, it is recommended to join on a numerical ID rather than a name whenever one is available.

In [None]:
# Fix neighbourhood names

rename_dict = {'Brnshj-Husum' : 'Broenshoej-Husum', 'sterbro' : 'Oesterbro', 'Nrrebro': 'Noerrebro', 'Amager st' : 'Amager Oest', 'Vanlse': 'Vanloese'}

abb.replace(rename_dict, inplace=True)
neis.replace(rename_dict, inplace=True)

In [None]:
# Let's try to plot the two datasets together and see what it looks like


# ADD YOUR CODE HERE

### With these at hand, get to work with the following challenges:

1. Create a hex binning map of the property locations.
2. Compute and display a kernel density estimate (KDE) of the distribution of the properties.
3. Using the neighbourhood layer:
    - Obtain a count of properties by neighbourhood (note that the neighbourhood name is present in the property table, so you can connect the two tables through it).
    - Create a raw count choropleth.
    - Create a choropleth of the density of properties by polygon (for this step you need to find the number of Airbnb points per unit of area for each polygon/neighbourhood).
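
The counting and density steps in item 3 could look roughly like the sketch below, which uses two made-up rectangular polygons and four made-up points. The column name `neighbourhood` and all `toy_*` names are assumptions, and areas are only meaningful in a projected CRS such as EPSG:25832.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two made-up neighbourhoods: a 1 km x 1 km square and a 2 km x 1 km rectangle
toy_neis = gpd.GeoDataFrame(
    {"neighbourhood": ["A", "B"]},
    geometry=[box(0, 0, 1000, 1000), box(1000, 0, 3000, 1000)],
    crs="EPSG:25832",
)

# Four made-up properties that already carry a neighbourhood name
toy_abb = gpd.GeoDataFrame(
    {"neighbourhood": ["A", "A", "A", "B"]},
    geometry=[Point(100, 100), Point(500, 500), Point(900, 200), Point(2000, 500)],
    crs="EPSG:25832",
)

# Count properties per neighbourhood name
counts = toy_abb.groupby("neighbourhood").size()

# Attach the counts to the polygons; neighbourhoods without properties get 0
toy_neis["n_properties"] = (
    toy_neis["neighbourhood"].map(counts).fillna(0).astype(int)
)

# Density as properties per square kilometre (area is in square metres)
toy_neis["density_km2"] = toy_neis["n_properties"] / (toy_neis.area / 1e6)

print(toy_neis[["neighbourhood", "n_properties", "density_km2"]])
```

A choropleth of either column can then be drawn with, for example, `toy_neis.plot(column="density_km2", legend=True)`.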
    


* **Think about the pros and cons of these different ways of visualising point density. Which one do you prefer? Why?**
    

## Task II: Clusters of Danish tourist attractions

For this part, we are going to use a dataset on attractions ('Seværdigheder') in Denmark.
The original data can be found here: https://dataforsyningen.dk/data/1038

The data set covers all of Denmark, but if you want to, you can select a specific region to analyse.

*Tip: The geometries in the data are polygons - but very small ones. It is a good idea to use their centroids instead of the original geometries.*


In [None]:
fp = 'data/pois.gpkg'

pois = gpd.read_file(fp)

In [None]:
# ADD YOUR CODE HERE


This is what we have to work with then:

In [None]:
ax = pois.plot(
    color="xkcd:bright yellow", figsize=(9, 9)
)
cx.add_basemap(
    ax, 
    crs=pois.crs,
    source=cx.providers.CartoDB.DarkMatter
)

### With this at hand, get to work:

- Use the DBSCAN algorithm to identify clusters.
- Start with the following parameters: at least 3 sites for a cluster (`min_samples`) and a maximum distance of 1 km (`eps`).
- Obtain the clusters and plot them on a map. *Does it pick up any interesting pattern?*
- Based on the results above, tweak the values of both parameters to find the parameter values you think make sense.
- What clusters can you identify?
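
As a starting point, here is a minimal sketch of how these parameters map onto scikit-learn's `DBSCAN`, run on made-up coordinates in metres. Because `eps` is expressed in the units of the coordinates, 1 km only corresponds to `eps=1000` in a metre-based CRS such as EPSG:25832. The array `coords` is hypothetical; with real data it could be built from the x and y values of the centroids.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Made-up coordinates in metres: two tight groups and one isolated site
coords = np.array(
    [
        [0, 0], [300, 0], [0, 400],                             # group 1
        [10_000, 10_000], [10_500, 10_000], [10_000, 10_600],   # group 2
        [50_000, 50_000],                                       # isolated site
    ]
)

# At least 3 sites per cluster, neighbours within 1 km (1000 m)
db = DBSCAN(eps=1000, min_samples=3).fit(coords)

# One label per site; -1 marks noise (sites that belong to no cluster)
print(db.labels_)
```

Note that `min_samples` counts the point itself, so three mutually close sites are enough to form a cluster here.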

#### Challenge 1
- Create a function that identifies clusters and plots them. Its parameters should be the dataframe, the minimum sample size and the maximum distance.

#### Challenge 2
- Use [ipywidgets](https://ipywidgets.readthedocs.io/en/stable/examples/Using%20Interact.html) and your function from Challenge 1 to create an interactive plot that changes as you modify the cluster parameters (i.e. min sample size and max distance)
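
One possible shape for such a function is sketched below. The names `plot_clusters` and `toy_pois` and the `cluster` column are hypothetical, the sample points are made up, and the function assumes the GeoDataFrame is in a metre-based CRS.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
from shapely.geometry import Point
from sklearn.cluster import DBSCAN


def plot_clusters(gdf, min_samples=3, eps=1000):
    """Run DBSCAN on the centroids of `gdf` and plot the result.

    Returns a copy of `gdf` with an added `cluster` column (-1 = noise).
    """
    centroids = gdf.geometry.centroid
    coords = np.column_stack([centroids.x, centroids.y])
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(coords)

    out = gdf.copy()
    out["cluster"] = labels

    fig, ax = plt.subplots(figsize=(9, 9))
    noise = out[out["cluster"] == -1]
    clustered = out[out["cluster"] != -1]
    # Noise in grey, clustered sites coloured by cluster label
    if not noise.empty:
        noise.plot(ax=ax, color="grey", markersize=10)
    if not clustered.empty:
        clustered.plot(ax=ax, column="cluster", categorical=True, markersize=20)
    ax.set_title(f"DBSCAN: min_samples={min_samples}, eps={eps}")
    return out


# Made-up points in a metre-based CRS
toy_pois = gpd.GeoDataFrame(
    geometry=[Point(0, 0), Point(300, 0), Point(0, 400), Point(50_000, 50_000)],
    crs="EPSG:25832",
)
result = plot_clusters(toy_pois, min_samples=3, eps=1000)
print(result["cluster"].tolist())
```

In a notebook, a call along the lines of `interact(plot_clusters, gdf=fixed(pois), min_samples=(2, 20), eps=(100, 5000, 100))` would then expose the two parameters as sliders; the slider ranges shown are arbitrary.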

In [None]:
# ADD YOUR CODE HERE

