<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Point-Pattern-Analysis" data-toc-modified-id="Point-Pattern-Analysis-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Point Pattern Analysis</a></span><ul class="toc-item"><li><span><a href="#Point-pattern-file-import" data-toc-modified-id="Point-pattern-file-import-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Point pattern file import</a></span></li><li><span><a href="#Preparing-Information" data-toc-modified-id="Preparing-Information-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Preparing Information</a></span></li><li><span><a href="#Plotting-dependencies-in-graph" data-toc-modified-id="Plotting-dependencies-in-graph-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Plotting dependencies in graph</a></span></li><li><span><a href="#Hex-binning" data-toc-modified-id="Hex-binning-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Hex-binning</a></span></li><li><span><a href="#Kernel-density-estimation-(KDE)" data-toc-modified-id="Kernel-density-estimation-(KDE)-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>Kernel density estimation (KDE)</a></span></li></ul></li></ul></div>

# Point Pattern Analysis

## Point pattern file import
Aachen has a large number of bus stops, and the AVV publishes information about its bus network. First we load the CSV file. As the table shows, it contains only a little information; what matters for us are the coordinates. The README file provided by the AVV explains why the coordinates look the way they do: they are multiplied by 1000000. Before using them we need to divide all coordinate values by 1000000.

In [1]:
import numpy
import pandas as pd
import geopandas
import pysal
import seaborn
import contextily
import os
import matplotlib.pyplot as plt 
avv_halt = pd.read_csv(r"./data/AVV_Haltestellen.csv", sep=',', engine='python')
avv_halt.head()

Unnamed: 0,HSTNUMSYS,HSTNUM,HSTNAME,AGS,KOMMUNE,ORTSTEIL,WGS84_RW_X_1000000,WGS84_HW_X_1000000,GLOBAL_ID
0,AVV,1001,"Aachen, Bushof",5334002,Aachen,Aachen,6089768.0,50776453.0,de:05334:1001
1,AVV,1002,"Aachen, Kaiser-Friedrich-Park",5334002,Aachen,Aachen,6075421.0,50759141.0,de:05334:1002
2,AVV,1003,"Aachen, Ehrenmal / Lousberg",5334002,Aachen,Aachen,6083328.0,50782274.0,de:05334:1003
3,AVV,1004,"Aachen, STAWAG",5334002,Aachen,Aachen,6100047.0,50783114.0,de:05334:1004
4,AVV,1005,"Aachen, Misereor",5334002,Aachen,Aachen,6083912.0,50768538.0,de:05334:1005


## Preparing Information
We first take all values of the column "WGS84_RW_X_1000000" and divide them by 1000000, then we do the same for the second column, "WGS84_HW_X_1000000". To keep the code clean and simple, one could implement a function instead.

In [None]:
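# Collect the raw longitude values and their rescaled counterparts
ort = avv_halt['WGS84_RW_X_1000000']
old = []
new = []
for elem in ort:
    old.append(elem)
    # Undo the factor of 1000000 applied by the AVV
    new.append(elem / 1000000)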

In [None]:
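# Assign the result back instead of calling replace(..., inplace=True) on a
# column selection, which newer pandas versions no longer write through
avv_halt['WGS84_RW_X_1000000'] = avv_halt['WGS84_RW_X_1000000'].replace(old, new)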
avv_halt.head()

In [None]:
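# Collect the raw latitude values and their rescaled counterparts
ort2 = avv_halt['WGS84_HW_X_1000000']
old2 = []
new2 = []
for elem2 in ort2:
    old2.append(elem2)
    # Undo the factor of 1000000 applied by the AVV
    new2.append(elem2 / 1000000)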

In [None]:
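# Assign the result back instead of calling replace(..., inplace=True) on a
# column selection, which newer pandas versions no longer write through
avv_halt['WGS84_HW_X_1000000'] = avv_halt['WGS84_HW_X_1000000'].replace(old2, new2)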
avv_halt.head()
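
The function mentioned above could look roughly like the following sketch. The name `rescale_columns` and the constant `SCALE` are hypothetical and not part of the original notebook; the small `sample` frame only repeats two rows from the table shown earlier, so the example runs without the CSV file.

```python
import pandas as pd

SCALE = 1_000_000  # factor stated in the AVV README


def rescale_columns(df, columns, scale=SCALE):
    """Return a copy of df in which the given columns are divided by scale."""
    result = df.copy()
    # Vectorised division: no explicit loop over the rows is needed
    result[columns] = result[columns] / scale
    return result


# Two rows copied from the table above (illustration only)
sample = pd.DataFrame({
    "HSTNAME": ["Aachen, Bushof", "Aachen, STAWAG"],
    "WGS84_RW_X_1000000": [6089768.0, 6100047.0],
    "WGS84_HW_X_1000000": [50776453.0, 50783114.0],
})

converted = rescale_columns(
    sample, ["WGS84_RW_X_1000000", "WGS84_HW_X_1000000"]
)
print(converted)
```

Because the function works on a copy, the input frame keeps its raw values, and running the cell twice does not divide the coordinates twice.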

As the existing column names are rather long, we rename them to longitude and latitude, which are not only shorter but also more intuitive.

In [None]:
# Rename both coordinate columns in a single call
avv_halt = avv_halt.rename(columns = {
    'WGS84_RW_X_1000000' : 'longitude',
    'WGS84_HW_X_1000000' : 'latitude'
})
avv_halt.head()

## Plotting dependencies in graph
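We use seaborn's jointplot here to draw the stops as a scatter plot of longitude against latitude, with the distribution of each coordinate shown along the margins. Plotted this way, the point dataset takes on a spatial shape, and one can see that the points are concentrated in certain areas.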

In [None]:
seaborn.jointplot(x='longitude', y='latitude', data=avv_halt, s=0.5);

Next, we use the contextily library, whose **add_basemap()** function adds a base map that gives the points geographic context.

In [None]:
# Generate scatter plot
joint_axes = seaborn.jointplot(
    x='longitude', y='latitude', data=avv_halt, s=0.5
)
contextily.add_basemap(
    joint_axes.ax_joint,
    crs="EPSG:4326",
    source=contextily.providers.CartoDB.PositronNoLabels
);

## Hex-binning
When many bus stops (points) are concentrated in one area, it can become hard to explore the nature of the pattern. To simplify the picture and improve visibility, one generates a regular hexagonal grid. The colour of each grid cell is chosen according to how many points fall into it.

In [None]:
# Set up figure and axis
f, ax = plt.subplots(1, figsize=(12, 9))
# Generate and add hexbin with 50 hexagons in the
# x-direction, no borderlines, half transparency
hb = ax.hexbin(
    avv_halt['longitude'],
    avv_halt['latitude'], 
    gridsize=50, 
    linewidths=0,
    alpha=0.5, 
    cmap='viridis_r'
)
# Add basemap
contextily.add_basemap(
    ax, 
    crs="EPSG:4326",
    source=contextily.providers.CartoDB.Positron
)
# Add colorbar
plt.colorbar(hb)
# Remove axes
ax.set_axis_off()
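
To make the link between point counts and colour more concrete, the following sketch bins synthetic points (randomly generated, not the AVV data) and reads the count of each hexagon back from the object that `hexbin` returns. The point count, the spread of the points and `gridsize=10` are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic points scattered around an arbitrary centre (not real bus stops)
rng = np.random.default_rng(0)
n_points = 500
x = rng.normal(6.08, 0.02, n_points)
y = rng.normal(50.77, 0.01, n_points)

fig, ax = plt.subplots()
hb = ax.hexbin(x, y, gridsize=10)

# One value per hexagon: the number of points that fell into it.
# These values are what the colour map translates into colours.
counts = hb.get_array()
print("points binned:", int(counts.sum()))
print("largest count in one hexagon:", int(counts.max()))
```

A smaller `gridsize` gives larger hexagons and therefore higher counts per cell, which smooths the picture at the cost of spatial detail.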

## Kernel density estimation (KDE)
The grids are the spatial equivalent of a histogram. An alternative is kernel density estimation (an empirical approximation of the probability density function). The most common kernel function is the Gaussian one: a normal distribution is applied to weight the points. The result is a continuous surface. The following cell draws a Gaussian kernel density estimate with seaborn.

In [None]:
# Set up figure and axis
f, ax = plt.subplots(1, figsize=(9, 9))
# Generate and add KDE with 50 filled contour levels
# at 55% opacity (fill and levels replace the
# deprecated shade and n_levels arguments)
seaborn.kdeplot(
    x = avv_halt['longitude'],
    y = avv_halt['latitude'], 
    levels=50, 
    fill=True,
    alpha=0.55, 
    cmap='viridis_r',
    ax=ax
)
# Add basemap
contextily.add_basemap(
    ax,
    crs="EPSG:4326",
    source=contextily.providers.CartoDB.Positron,
    
)
# Remove axes
ax.set_axis_off()
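
The weighting by a normal distribution described above can be sketched by hand with NumPy. This is a minimal illustration under simplifying assumptions, not what seaborn does internally: the function name `gaussian_kde_grid` is hypothetical, the points are synthetic, and a single fixed bandwidth of 0.5 is used in both directions, whereas seaborn chooses its bandwidth automatically.

```python
import numpy as np


def gaussian_kde_grid(x, y, grid_x, grid_y, bandwidth):
    """Evaluate a 2D Gaussian KDE with one isotropic bandwidth on a grid."""
    gx, gy = np.meshgrid(grid_x, grid_y)
    # Squared distance from every grid node to every data point
    d2 = (gx[..., None] - x) ** 2 + (gy[..., None] - y) ** 2
    # Each point contributes a small normal distribution centred on itself
    kernel = np.exp(-d2 / (2 * bandwidth ** 2)) / (2 * np.pi * bandwidth ** 2)
    # Averaging the contributions gives the estimated density surface
    return kernel.mean(axis=-1)


# Synthetic points (not the AVV data)
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 200)
y = rng.normal(0.0, 1.0, 200)

grid = np.linspace(-6.0, 6.0, 121)
density = gaussian_kde_grid(x, y, grid, grid, bandwidth=0.5)

# A density should integrate to roughly 1 over the grid
cell_area = (grid[1] - grid[0]) ** 2
print(density.shape, density.sum() * cell_area)
```

A larger bandwidth spreads each point's contribution further and yields a smoother surface, while a smaller one follows the individual points more closely.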