# Hello GeoPandas, Alaaf Aachen!


In [None]:
import geopandas as gpd
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Let us load a dataset containing the statistical neighbourhoods of Aachen. It is located in the `data/aachen` directory next to this notebook. GeoPandas supports a wide range of GIS data formats, including shapefiles (usually ending in `.shp`).

In [None]:
bezirke = gpd.read_file("./data/aachen/StatistischeBezirkeAachen.shp")

GeoPandas has now loaded the geometries from the shapefile, together with the attributes of each shape, into a table. We can inspect the table with `head()` and draw the shapes with the `plot()` function offered by the GeoDataFrame.

In [None]:
bezirke.head()

In [None]:
bezirke.plot()

So far, the data set is not very interesting: just the shapes of the borders, an identifier (`STATBEZ`) for each area, and its name in the column `ST_NAME`. To make it more interesting, we load the population statistics from a CSV file.

In [None]:
statistics = pd.read_csv("./data/aachen/einwohnerstatistik-31.12.2020.csv")
statistics.head()
statistics.plot()

This is a plain CSV table with no geographic information attached to it. However, the two DataFrames have a feature in common: the **`id`** of the area. The column names differ, but they describe the same entity. Let's sort both tables by district name and see whether they match.


In [None]:
sorted_stats = statistics.sort_values(by='StatBezName')
sorted_stats.head()

In [None]:
sorted_bezirke = bezirke.sort_values(by='ST_NAME')
sorted_bezirke.head()
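
Comparing the first rows by eye only covers a handful of districts. A minimal sketch of a stricter check compares the key columns as sets. The two small tables below are stand-ins with invented ids and names, not the Aachen data:

```python
import pandas as pd

# Hypothetical stand-ins for `bezirke` and `statistics`; all values are made up.
left = pd.DataFrame({"STATBEZ": [10, 13, 14], "ST_NAME": ["A", "B", "C"]})
right = pd.DataFrame({"Bez": [10, 13, 15], "StatBezName": ["A", "B", "D"]})

left_ids = set(left["STATBEZ"])
right_ids = set(right["Bez"])

# Ids present in only one of the two tables will not find a partner in a merge.
only_left = sorted(left_ids - right_ids)
only_right = sorted(right_ids - left_ids)
print("only in left:", only_left)
print("only in right:", only_right)
```

Two empty lists would mean that every id appears in both tables.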

With this information we can add the columns of the `statistics` DataFrame to the GeoDataFrame. Every column of `statistics` is added to the table. With tabular data, this concept is referred to as **merging** or **joining** data.

See a general explanation of merging data frames here
See an explanation on merging GeoDataFrames here https://geopandas.org/docs/user_guide/mergingdata.html 

Here, we join the `statistics` DataFrame onto the `bezirke` data with a left join: every row of `bezirke` is kept and extended with the matching row of `statistics`.

```
bezirke <-- statistics
```
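
What a left join keeps and drops can be seen on a toy example. The sketch below uses invented tables; `indicator=True` is a standard option of pandas `merge` that adds a `_merge` column showing where each row came from.

```python
import pandas as pd

# Toy tables with made-up values, standing in for `bezirke` and `statistics`.
left = pd.DataFrame({"STATBEZ": [10, 13, 14], "ST_NAME": ["A", "B", "C"]})
right = pd.DataFrame({"STATBEZ": [10, 13, 99], "Pers": [500, 700, 900]})

# how="left": every row of `left` is kept; rows of `right` without a partner are dropped.
joined = left.merge(right, on="STATBEZ", how="left", indicator=True)
print(joined)
```

The row with id 14 has no partner, so its `Pers` value is missing and `_merge` reads `left_only`. The row with id 99 does not appear at all.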


First, however, we have to rename the column `Bez` in the `statistics` DataFrame to match the corresponding column `STATBEZ` in our `bezirke` DataFrame.

In [None]:
statistics = statistics.rename(columns = {'Bez' : 'STATBEZ'})
statistics.head(3)
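
Renaming is one option. As an alternative sketch, pandas `merge` also accepts `left_on` and `right_on`, so the key columns can keep their different names. The toy tables below are invented for illustration:

```python
import pandas as pd

# Made-up tables whose key columns have different names.
left = pd.DataFrame({"STATBEZ": [10, 13], "ST_NAME": ["A", "B"]})
right = pd.DataFrame({"Bez": [10, 13], "Pers": [500, 700]})

# Name the key column of each side explicitly instead of renaming one of them.
merged = left.merge(right, left_on="STATBEZ", right_on="Bez", how="left")
print(merged.columns.tolist())
```

With this variant both key columns end up in the result, which is one reason to prefer renaming first.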



**The initial attempt will result in an error. This is expected; see below for explanation and solution.**

bezirke_stats  = bezirke.merge(statistics, on='STATBEZ', how='left')

In [None]:
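#pd.to_numeric(statistics["STATBEZ"])
# Treat empty or whitespace-only cells as missing values
statistics = statistics.replace(r'^\s*$', np.nan, regex=True)
# NaN never compares equal to anything, so use isna() to find rows without an id
index = statistics[statistics['STATBEZ'].isna()].index
#statistics.drop(34, inplace=True)
statistics.head()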
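
A note on locating missing values: NaN is defined to be unequal to everything, including itself, so a comparison with `== np.nan` never matches. A small sketch on an invented column shows the difference to `isna()`:

```python
import numpy as np
import pandas as pd

# Made-up column with one whitespace-only cell, mimicking a blank CSV field.
s = pd.Series(["10", "13", "  "], name="STATBEZ", dtype=object)
s = s.replace(r"^\s*$", np.nan, regex=True)

matches_eq = int((s == np.nan).sum())   # comparing with NaN is always False
matches_isna = int(s.isna().sum())      # isna() finds the missing cell
print(matches_eq, matches_isna)
```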

Next we have an example of using a NumPy dtype. Here we set the content of the column __STATBEZ__ to type `int64`: integer, because we want whole numbers rather than floating point numbers, and 64 for the size of each value, here 64 bits. Missing ids are first filled with 0, because an integer column cannot hold NaN.

In [None]:
statistics["STATBEZ"] = statistics["STATBEZ"].fillna(0)
statistics["STATBEZ"] = statistics["STATBEZ"].astype('int64')

statistics["STATBEZ"].tail()
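
Why the dtype matters for the merge can be reproduced on toy data. The sketch below assumes the failure comes from key columns of different types (integers on one side, text on the other). The tables are invented, and the actual cause in the Aachen data may differ.

```python
import pandas as pd

left = pd.DataFrame({"STATBEZ": [10, 13], "ST_NAME": ["A", "B"]})
# Keys stored as text, as can happen when a CSV column contains blank cells.
right = pd.DataFrame({"STATBEZ": ["10", "13"], "Pers": [500, 700]})

try:
    left.merge(right, on="STATBEZ", how="left")
    merge_failed = False
except ValueError as err:
    merge_failed = True
    print("merge failed:", err)

# After converting the text keys to int64, both key columns have the same dtype.
right["STATBEZ"] = right["STATBEZ"].astype("int64")
merged = left.merge(right, on="STATBEZ", how="left")
print(merged)
```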

In [None]:
# second attempt, after converting STATBEZ in `statistics` to int64
bezirke_stats  = bezirke.merge(statistics, on='STATBEZ', how='left')
# bezirke_stats = bezirke.set_index('STATBEZ').join(statistics.set_index('STATBEZ'))
bezirke_stats.tail(3)
bezirke_stats = gpd.GeoDataFrame(bezirke_stats)
type(bezirke_stats)
bezirke_stats.head()

In [None]:
# set to min and max of data
vmin, vmax = 0, 133623

# create figure and axes for Matplotlib
fig, ax = plt.subplots(1, figsize=(14,6))
# add a title and annotation
ax.set_title('Total population of Aachen', fontdict={'fontsize': '25', 'fontweight' : '3'})

ax = bezirke_stats.plot(column='Pers', cmap='YlGnBu', ax=ax, legend=True)
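
The colour limits do not have to be typed in by hand. As a sketch, they can be derived from the column and passed to `plot()` through its `vmin` and `vmax` parameters. The two square "districts" and their population counts below are invented:

```python
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import box

# Two made-up square districts with invented population counts.
gdf = gpd.GeoDataFrame(
    {"Pers": [500, 2000]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
)

# Derive the colour limits from the data instead of hard-coding them.
vmin, vmax = gdf["Pers"].min(), gdf["Pers"].max()

fig, ax = plt.subplots(1, figsize=(6, 3))
gdf.plot(column="Pers", cmap="YlGnBu", ax=ax, legend=True, vmin=vmin, vmax=vmax)
```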

In [None]:
print(bezirke_stats[1:5]["geometry"].area)

In [None]:
bezirke_stats['popdens'] = bezirke_stats["Pers"] / bezirke_stats["geometry"].area*1000
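
`area` is returned in squared units of the layer's coordinate reference system, so the factor 1000 only yields "thousand inhabitants per km²" if those units are metres. The sketch below spells out the conversion on one invented square. The CRS `EPSG:25832` is an assumption for illustration; the actual CRS can be checked with `bezirke_stats.crs`.

```python
import geopandas as gpd
from shapely.geometry import box

# One made-up square district of 1000 m x 1000 m (1 km^2) with 2000 inhabitants.
# EPSG:25832 (ETRS89 / UTM zone 32N) is an assumed CRS with metre units.
gdf = gpd.GeoDataFrame(
    {"Pers": [2000]},
    geometry=[box(0, 0, 1000, 1000)],
    crs="EPSG:25832",
)

area_m2 = gdf.geometry.area                      # CRS units squared, here m^2
area_km2 = area_m2 / 1_000_000                   # m^2 -> km^2
gdf["popdens"] = gdf["Pers"] / area_km2 / 1000   # thousand inhabitants per km^2
print(gdf["popdens"].iloc[0])
```

Dividing by 1,000,000 and then by 1000 is the same as multiplying `Pers / area` by 1000, which is the compact form used in the cell.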

In [None]:
bezirke_stats.head()

In [None]:
# set to min and max of data
vmin, vmax = 0, 133623

# create figure and axes for Matplotlib
fig, ax = plt.subplots(1, figsize=(14,6))
# add a title and annotation
ax.set_title('Population Density of Aachen in 1000 inhabitants per km^2', fontdict={'fontsize': '25', 'fontweight' : '3'})

ax = bezirke_stats.plot(column='popdens', cmap='YlGnBu', ax=ax, legend=True)

In [None]:
from ipyleaflet import Map, GeoData, basemaps, LayersControl
m = Map(center=(46.91, 7.43), zoom=15, basemap= basemaps.Esri.WorldTopoMap)
m


In [None]:
m = Map(center=(52.3,8.0), zoom = 3, basemap= basemaps.Esri.WorldTopoMap)

In [None]:
m
