# Prepare German city boundaries

As in the [newspaper report](https://interaktiv.morgenpost.de/gruenste-staedte-deutschlands/), we'd like to select large German cities and look at the green fraction within their boundaries.

For this we need:


*   City polygons: download the Level-4 GeoJSON file for Germany from [GADM](https://gadm.org/download_country.html)
*   City information: download the .xlsx table with information on German cities as of 31.12.2020, available from [DESTATIS](https://www.destatis.de/DE/Themen/Laender-Regionen/Regionales/Gemeindeverzeichnis/Administrativ/05-staedte.html)


The notebook below performs the following steps:


1.   Read the city .xlsx and select the cities with a population > POP_SIZE_THRES
2.   Extract the same cities from the GeoJSON
3.   Merge the two tables into one
4.   Clean the table and save it as a shapefile to *output/* in your Drive folder
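
Steps 1 to 3 can be sketched on toy tables before running them on the real files. This is only a minimal illustration using plain pandas: the column names `name` and `population`, the city names, the numbers, and the placeholder strings in `geometry` are all made up for the example, and the real tables used below have different columns.

```python
import pandas as pd

POP_SIZE_THRES = 100000  # same threshold as used in this notebook

# Toy stand-in for the DESTATIS table (hypothetical columns, made-up values)
df_city_toy = pd.DataFrame({
    "name": ["Alphastadt, Stadt", "Betastadt, Stadt", "Gammadorf, Stadt"],
    "population": [250000, 330000, 20000],
})

# Toy stand-in for the GADM attributes (geometry replaced by placeholder strings)
df_geo_toy = pd.DataFrame({
    "NAME_4": ["Alphastadt", "Betastadt", "Gammadorf"],
    "geometry": ["polygon_a", "polygon_b", "polygon_c"],
})

# Step 1: keep the large cities and strip everything after the first comma
large = df_city_toy[df_city_toy["population"] > POP_SIZE_THRES].copy()
large["CITY"] = large["name"].str.split(",").str[0]

# Steps 2 and 3: an inner merge on the city name keeps only the matching polygons
merged = large.merge(df_geo_toy, left_on="CITY", right_on="NAME_4")
print(merged[["CITY", "population", "geometry"]])
```

The inner merge does the selection and the join in one go: rows without a partner in the other table simply drop out.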


The newspaper article explores cities with a population above 100,000, which corresponds to POP_SIZE_THRES = 100000.

Let's do the same here!

In [None]:
# First load your Google Drive.
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

# After mounting, check folders in left side-bar
# If this code does not work, you can use the "Mount Drive" button in the left side-bar

In [None]:
!pip install geopandas

In [None]:
import pandas as pd
import geopandas as gpd

In [None]:
# Set relevant directories
import os

# Your own baseline directory - if mounted under MyDrive, do not change.
BASE_DIR = os.path.join('/content/drive/MyDrive/BUCSS22')
print('BASE_DIR: ',BASE_DIR)

# Create a link to notebooks directory
YOUR_NAME = "Matthias_Lecturer" # Name of your own folder
NOTEBOOKS_DIR = os.path.join(BASE_DIR, YOUR_NAME, 'notebooks')
print('NOTEBOOKS_DIR: ',NOTEBOOKS_DIR)

# Other relevant folders
GEE_DIR         = os.path.join(BASE_DIR, 'DATA_SHARE', 'GEE')
OUT_DIR         = os.path.join(BASE_DIR, YOUR_NAME, 'output')
print('GEE_DIR: ',GEE_DIR)
print('OUT_DIR: ',OUT_DIR)

In [None]:
# Set the population threshold to be used
POP_SIZE_THRES = 100000

In [None]:
# Define the relevant files
fn_city = os.path.join(GEE_DIR, "05-staedte.xlsx")
fn_city_json = os.path.join(GEE_DIR, "gadm41_DEU_4.json")

In [None]:
# Read the city data
df_city = pd.read_excel(fn_city, sheet_name="Städte", skiprows=6)
df_city

In [None]:
# Only cities with a population > POP_SIZE_THRES
# [:-1] drops the last matching row (presumably the 'Städte insgesamt' total row)
# .copy() avoids a SettingWithCopyWarning when the CITY column is added below
df_city = df_city[df_city['Unnamed: 9'] > POP_SIZE_THRES][:-1].copy()

# Keep only the part before the first comma as the city name used for merging
# Caveat: some cities / villages in Germany share a name, e.g. Oberhausen (exists 4 times)
cities = [i.split(',')[0] for i in df_city['Unnamed: 6'] if i != 'Städte insgesamt']
df_city['CITY'] = cities

In [None]:
# Read json file
df_json = gpd.read_file(fn_city_json)
print(df_json.head())

In [None]:
# Keep only the GeoJSON entries whose TYPE_4 is "Stadt" (city)
df_json_city = df_json[df_json['TYPE_4'] == "Stadt"]
df_json_city

In [None]:
# Join the GeoJSON entries with the city table, based on city name
df_all = pd.merge(df_city[['Unnamed: 8', 'Unnamed: 9', 'CITY']], df_json_city, left_on='CITY', right_on='NAME_4')
df_all
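
Because the join key is the city name, a name that occurs more than once in the GeoJSON table produces extra rows in the merged result. A minimal sketch of how this could be detected, using toy tables with placeholder values (the population figure and the `State X` / `State Y` labels are invented for illustration, not taken from the data):

```python
import pandas as pd

# Toy tables: one city on the left, two same-named entries on the right
left = pd.DataFrame({"CITY": ["Oberhausen"], "Population": [200000]})  # placeholder value
right = pd.DataFrame({
    "NAME_4": ["Oberhausen", "Oberhausen"],
    "NAME_1": ["State X", "State Y"],  # placeholder labels
})

# A plain merge silently duplicates the left row for every match
merged = left.merge(right, left_on="CITY", right_on="NAME_4")
print(len(left), "row(s) in, ", len(merged), "row(s) out")

# List the names that occur more than once on the right side
dup_names = right.loc[right["NAME_4"].duplicated(keep=False), "NAME_4"].unique()
print("Duplicated names:", list(dup_names))

# validate="one_to_one" makes pandas raise instead of duplicating rows
try:
    left.merge(right, left_on="CITY", right_on="NAME_4", validate="one_to_one")
    merge_is_unique = True
except pd.errors.MergeError:
    merge_is_unique = False
print("Merge keys unique:", merge_is_unique)
```

Comparing `len(df_all)` with `len(df_city)` after the merge above is a quick way to see whether this affects the real data.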

In [None]:
# Clean up a bit before saving: rename the columns and keep only the ones needed
df_all_final = gpd.GeoDataFrame(
    df_all.rename({'Unnamed: 8': 'Area (km2)', 'Unnamed: 9': 'Population'}, axis=1)[
        ['CITY', 'Area (km2)', 'Population', 'geometry']
    ]
)
df_all_final

In [None]:
# Save to file (create the output folder first if it does not exist yet)
os.makedirs(OUT_DIR, exist_ok=True)
OFILE = os.path.join(OUT_DIR, f"df_large_cities_{POP_SIZE_THRES}.shp")
df_all_final.to_file(OFILE)

In [None]:
print(f"Shapefile available at: {OFILE}")