# Neighborhood Analysis of a German City for Young Families
We'll investigate the neighborhoods of a German city, including their venues, and try to find the best possible place for families with children to move to. As our measure, we use the distance to venues that matter for children, such as schools, playgrounds and medical care.

## Description of the problem
Young families, or couples planning to have children, frequently face the task of finding a new home that offers the best possible neighborhood for their changing life as a family.
Identifying such a place is not easy: multiple factors have to be considered, and not all information is readily available. To support their decision, we want to provide a geographical analysis of child-friendly neighborhoods based on the distance to desired venues. For example, the new home should be near a school but also close to a playground for leisure time.
We'll focus on the German city of Wuppertal out of curiosity.
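Since distance is our measure, here is a small sketch of how it could be computed. The haversine formula gives the great-circle distance between two latitude/longitude points, and the distance from a candidate home to the nearest venue of a given type is then the minimum over all venues of that type. The coordinates and the helper names `haversine_km` and `nearest_distance_km` are hypothetical, and the Earth is assumed to be a sphere with a mean radius of 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_distance_km(home, venues):
    """Distance from a (lat, lon) home to the closest (lat, lon) venue."""
    return min(haversine_km(*home, *venue) for venue in venues)


# Hypothetical candidate home and venues, for illustration only
home = (51.2562, 7.1508)
schools = [(51.2600, 7.1500), (51.2400, 7.1700)]
playgrounds = [(51.2570, 7.1520)]

print(round(nearest_distance_km(home, schools), 2))
print(round(nearest_distance_km(home, playgrounds), 2))
```

A neighborhood could then be scored by combining these per-category distances, for example with weights that reflect how important each venue type is to a family.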

## Description of the data
As our dataset, we use a publicly available table on the German city of Wuppertal, covering its districts and several population metrics.

https://de.wikipedia.org/wiki/Liste_der_Stadtbezirke_und_Stadtteile_von_Wuppertal

The original table looks like this:
<img src="pic1.png">

During preparation, we'll translate and transform the data. Please bear with me for now; the relevant columns will be the following:
- 'Neighborhood'
- 'Borough'
- 'Residents'
- 'Size'
- 'Population_Density'
- 'Foreigner_Percentage'
- 'Unemployment_Rate'
- 'Livinghouses'
- 'Flats_thereof'
- 'Schools(Elementary_Schools)'
- 'Private_Cars'

Additionally, we'll use the publicly available GeoJSON data for these districts, which includes their geographic boundaries.
URL: http://daten.wuppertal.de/Infrastruktur_Bauen_Wohnen/Quartiere_EPSG4326_JSON.json

<img src="pic2.png">

Furthermore, we'll connect to Foursquare and use its venue data.
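Venue searches typically take a center coordinate, while the GeoJSON describes each neighborhood as a boundary polygon. One simple way to bridge the two is to use the polygon's centroid. The following is a minimal sketch that assumes each feature is a plain `Polygon` whose first coordinate ring is the outer boundary in GeoJSON's `[longitude, latitude]` order. Holes and `MultiPolygon` geometries are ignored, and degrees are treated as planar coordinates, which is a rough approximation at city scale. The feature and the helper name `ring_centroid` are hypothetical.

```python
def ring_centroid(ring):
    """Area-weighted (shoelace) centroid of a closed ring of [lon, lat] pairs."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    # GeoJSON rings repeat the first point at the end, so consecutive pairs cover every edge
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * area2), cy / (3 * area2)


# Hypothetical square neighborhood in GeoJSON Feature layout
feature = {
    "type": "Feature",
    "properties": {"NAME": "Example"},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[7.0, 51.0], [7.2, 51.0], [7.2, 51.2], [7.0, 51.2], [7.0, 51.0]]],
    },
}

lon, lat = ring_centroid(feature["geometry"]["coordinates"][0])
print(feature["properties"]["NAME"], round(lat, 4), round(lon, 4))
```

The resulting latitude/longitude pair could then serve as the center point of a venue query for that neighborhood.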

## Methodology section

```python
import pandas as pd
import numpy as np
import requests  # HTTP requests, e.g. to download the GeoJSON file
from bs4 import BeautifulSoup  # HTML parsing
from sklearn.cluster import KMeans  # k-means clustering
from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values
import folium  # map rendering library
import matplotlib.cm as cm
import matplotlib.colors as colors
```

### Fetch and initially clean data from Wikipedia

```python
# read_html returns every table on the page; index 2 is the list of residential quarters
d_wiki = pd.read_html("https://de.wikipedia.org/wiki/Liste_der_Stadtbezirke_und_Stadtteile_von_Wuppertal#Die_Wohnquartiere_Wuppertals_(Stand:_31._Dezember_2007)")[2]
```

We won't need some of the columns, so we drop them.

```python
print(d_wiki.columns)
# Drop the map link, the running number and the municipality before incorporation
d_wiki.drop(["Karte[4]", "Nr.", "Kommunale Zuordnung vor der Eingemeindung"], axis=1, inplace=True)
```

```
Index(['Karte[4]', 'Nr.', 'Statistisches Wohnquartier', 'Stadtbezirk',
       'Kommunale Zuordnung vor der Eingemeindung', 'Einwohner-zahl',
       'Fläche( km² )', 'Bevölkerungs-dichte(Einw. / km² )',
       'Ausländer-anteil (in %)', 'Arbeitslosen-quote (in %)', 'Wohn-gebäude',
       'darin Wohnungen', 'Schulen(davonGrundschulen)', 'PrivateKFZ'],
      dtype='object')
```


To make the data understandable for everyone, we translate the column names from German to English.

```python
# Translate the column names to English; several columns are addressed
# by their position after the drop above
d_wiki.rename(columns={
    "Statistisches Wohnquartier": "Neighborhood",
    "Stadtbezirk": "Borough",
    "Einwohner-zahl": "Residents",
    d_wiki.columns[3]: "Size",
    d_wiki.columns[4]: "Population_Density",
    d_wiki.columns[5]: "Foreigner_Percentage",
    d_wiki.columns[6]: "Unemployment_Rate",
    "Wohn-gebäude": "Livinghouses",
    d_wiki.columns[8]: "Flats_thereof",
    "Schulen(davonGrundschulen)": "Schools(Elementary_Schools)",
    "PrivateKFZ": "Private_Cars",
}, inplace=True)
d_wiki.columns
```

```
Index(['Neighborhood', 'Borough', 'Residents', 'Size', 'Population_Density',
       'Foreigner_Percentage', 'Unemployment_Rate', 'Livinghouses',
       'Flats_thereof', 'Schools(Elementary_Schools)', 'Private_Cars'],
      dtype='object')
```

The schools column actually holds two values: the total number of schools and, in parentheses, how many of them are elementary schools. For simplicity, we split it into two columns by extracting the elementary-school count with a regular expression.

```python
# Elementary schools: the single character inside the parentheses, where "-" means none
d_wiki["Elementary_Schools"] = d_wiki["Schools(Elementary_Schools)"].str.extract(r"\((.)\)", expand=False)
d_wiki["Elementary_Schools"] = d_wiki["Elementary_Schools"].astype(str).str.replace("-", "0", regex=False).astype(int)
# Total schools: everything before the opening parenthesis
d_wiki["Schools(Elementary_Schools)"] = d_wiki["Schools(Elementary_Schools)"].str.extract(r"(.*)(?=\()", expand=False)
d_wiki.rename(columns={"Schools(Elementary_Schools)": "Schools"}, inplace=True)
```
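The pattern `\((.)\)` captures a single character, which works as long as no quarter has more than nine elementary schools. Below is a sketch of a variant that extracts both counts in one pass with named groups and also accepts multi-digit values. The sample strings are hypothetical and assume the cells follow a `total(elementary)` pattern with `-` for none.

```python
import pandas as pd

# Hypothetical cell values in the assumed "total(elementary)" format; "-" means none
raw = pd.Series(["2(-)", "8(3)", "-(-)", "12(10)"])

# Named groups become the column names of the extracted DataFrame
parts = raw.str.extract(r"^(?P<Schools>[\d-]+)\((?P<Elementary_Schools>[\d-]+)\)$")

# Treat a bare "-" as zero, then convert both columns to integers
parts = parts.replace("-", "0").astype(int)
print(parts)
```

Because the schools total is converted to integers as well, it can be summed or compared directly, unlike the string column kept above.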

Two columns, Residents and Flats_thereof, were read as floats. Integers are sufficient and easier to read, so we convert them.

```python
d_wiki["Residents"] = d_wiki["Residents"].astype(int)
d_wiki["Flats_thereof"] = d_wiki["Flats_thereof"].astype(int)
```

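The percentage values are off by a factor of 10 or 100. By default, `read_html` treats `,` as the thousands separator and `.` as the decimal point, while the German table uses the opposite convention, so a value such as 25,1 is read as 251. Dividing restores the intended scale. The same mismatch affects other columns: Size is effectively in hundredths of a km² (Elberfeld-Mitte shows 108, consistent with 1.08 km² at 5352 residents per km²), and counts written with a German thousands separator are read as decimals and then truncated by the integer conversion above, which is why Nordstadt shows only 17 residents.

```python
# Undo the misread decimal comma: one decimal place for the foreigner share, two for unemployment
d_wiki["Foreigner_Percentage"] = d_wiki["Foreigner_Percentage"] / 10
d_wiki["Unemployment_Rate"] = d_wiki["Unemployment_Rate"] / 100
```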
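A more robust option is to tell pandas about the German number format while reading. `pd.read_html` accepts `thousands` and `decimal` parameters, as does `pd.read_csv`. The sketch below shows the effect with `read_csv` on a few hypothetical rows (invented column names and values) so that it runs without network access. In this notebook, the equivalent would be passing `thousands=".", decimal=","` to the `read_html` call, after which the division above would no longer be needed.

```python
import io

import pandas as pd

# Hypothetical rows in German number format: "." groups thousands, "," marks decimals
sample = io.StringIO(
    "Quartier;Einwohner;Fläche;Ausländeranteil\n"
    "A;12.345;1,18;27,7\n"
    "B;5.780;1,08;25,1\n"
)

# Declare the German separators so counts stay whole and decimals keep their scale
df = pd.read_csv(sample, sep=";", thousands=".", decimal=",")
print(df.dtypes)
print(df)
```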

The final table of neighborhoods looks like this:

```python
d_wiki
```

| | Neighborhood | Borough | Residents | Size | Population_Density | Foreigner_Percentage | Unemployment_Rate | Livinghouses | Flats_thereof | Schools | Private_Cars | Elementary_Schools |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Elberfeld-Mitte | Elberfeld | 5780 | 108 | 5352 | 25.1 | 9.13 | 651 | 3718 | 2 | 1764 | 0 |
| 1 | Nordstadt | Elberfeld | 17 | 118 | 14635 | 27.7 | 9.03 | 1637 | 10 | 8 | 4926 | 3 |
| 2 | Ostersbaum | Elberfeld | 14 | 138 | 10811 | 24.6 | 9.67 | 1416 | 8807 | 4 | 4877 | 3 |
| 3 | Südstadt | Elberfeld | 9640 | 59 | 16339 | 18.5 | 7.66 | 771 | 6048 | 1 | 2977 | 1 |
| 4 | Grifflenberg | Elberfeld | 11 | 445 | 2628 | 10.1 | 3.21 | 1557 | 6289 | 1 | 5181 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 64 | Blombach-Lohsiepen | Ronsdorf | 2851 | 147 | 1939 | 5.9 | 3.68 | 595 | 1521 | 1 | 1526 | 1 |
| 65 | Rehsiepen | Ronsdorf | 2007 | 97 | 2069 | 13.4 | 7.08 | 169 | 993 | - | 750 | 0 |
| 66 | Schenkstraße | Ronsdorf | 3444 | 90 | 3827 | 4.8 | 3.46 | 663 | 1785 | 1 | 1769 | 1 |
| 67 | Blutfinke | Ronsdorf | 4714 | 396 | 1190 | 2.9 | 2.46 | 1073 | 2458 | 3 | 2821 | 2 |


### Initial analysis of the neighborhoods

```python
# Aggregate the neighborhoods per borough and sort by population
(d_wiki[["Borough", "Neighborhood", "Size", "Residents", "Foreigner_Percentage", "Unemployment_Rate"]]
    .groupby("Borough")
    .agg({
        "Neighborhood": "count",
        "Size": "sum",
        "Residents": "sum",
        "Foreigner_Percentage": "mean",
        "Unemployment_Rate": "mean",
    })
    .sort_values(by="Residents", ascending=False))
```

| Borough | Neighborhoods (count) | Size (sum) | Residents (sum) | Foreigner_Percentage (mean) | Unemployment_Rate (mean) |
|---|---|---|---|---|---|
| Barmen | 10 | 1544 | 59410 | 14.66 | 5.977 |
| Uellendahl-Katernberg | 7 | 2591 | 38192 | 5.242857 | 2.585714 |
| Vohwinkel | 9 | 2042 | 31578 | 9.2 | 3.795556 |
| Oberbarmen | 5 | 1257 | 29130 | 14.6 | 6.664 |
| Elberfeld-West | 7 | 1174 | 27774 | 13.728571 | 5.01 |
| Langerfeld-Beyenburg | 9 | 2940 | 25517 | 9.555556 | 5.825556 |
| Elberfeld | 6 | 1107 | 21911 | 20.066667 | 7.58 |
| Cronenberg | 7 | 2150 | 21846 | 4.9 | 2.572857 |
| Ronsdorf | 6 | 1605 | 21776 | 5.883333 | 3.526667 |
| Heckinghausen | 3 | 566 | 8687 | 11.633333 | 5.426667 |
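The borough table averages the neighborhood percentages without weights, so a small neighborhood counts as much as a large one. If a borough-level rate is wanted, a resident-weighted mean is one alternative. A minimal sketch on hypothetical numbers, not the Wuppertal data; on the real table it also assumes the Residents column holds correctly parsed counts:

```python
import pandas as pd

# Hypothetical neighborhoods: borough A has one small and one large neighborhood
df = pd.DataFrame({
    "Borough": ["A", "A", "B"],
    "Residents": [1000, 9000, 5000],
    "Foreigner_Percentage": [30.0, 10.0, 20.0],
})

# Weight each neighborhood's percentage by its number of residents
df["weighted"] = df["Foreigner_Percentage"] * df["Residents"]
grouped = df.groupby("Borough")[["weighted", "Residents"]].sum()
weighted_mean = grouped["weighted"] / grouped["Residents"]

# Unweighted mean, as used in the aggregation above
simple_mean = df.groupby("Borough")["Foreigner_Percentage"].mean()

print(pd.DataFrame({"simple_mean": simple_mean, "weighted_mean": weighted_mean}))
```

For borough A the two approaches give 20 % and 12 % respectively, because the larger neighborhood has the lower share.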


```python
url = "http://daten.wuppertal.de/Infrastruktur_Bauen_Wohnen/Quartiere_EPSG4326_JSON.json"  # GeoJSON file
geojson = requests.get(url).json()  # FeatureCollection with one feature per neighborhood
```

```python
# Load the same file into a DataFrame with one row per feature
geojson_df = pd.read_json(url)
geojson_df
```

| | type | name | features |
|---|---|---|---|
| 0 | FeatureCollection | Quartiere_EPSG4326_JSON | {'type': 'Feature', 'geometry': {'type': 'Poly... |
| 1 | FeatureCollection | Quartiere_EPSG4326_JSON | {'type': 'Feature', 'geometry': {'type': 'Poly... |
| 2 | FeatureCollection | Quartiere_EPSG4326_JSON | {'type': 'Feature', 'geometry': {'type': 'Poly... |
| 3 | FeatureCollection | Quartiere_EPSG4326_JSON | {'type': 'Feature', 'geometry': {'type': 'Poly... |
| 4 | FeatureCollection | Quartiere_EPSG4326_JSON | {'type': 'Feature', 'geometry': {'type': 'Poly... |
| ... | ... | ... | ... |
| 64 | FeatureCollection | Quartiere_EPSG4326_JSON | {'type': 'Feature', 'geometry': {'type': 'Poly... |
| 65 | FeatureCollection | Quartiere_EPSG4326_JSON | {'type': 'Feature', 'geometry': {'type': 'Poly... |
| 66 | FeatureCollection | Quartiere_EPSG4326_JSON | {'type': 'Feature', 'geometry': {'type': 'Poly... |
| 67 | FeatureCollection | Quartiere_EPSG4326_JSON | {'type': 'Feature', 'geometry': {'type': 'Poly... |


```python
# Print the neighborhood name stored in each GeoJSON feature
for feature in geojson_df["features"]:
    print(feature["properties"]["NAME"])
```

```
Herbringhausen
Schöller-Dornap
Dönberg
Nächstebreck-Ost
Cronenberg
Eckbusch
Erbschlö-Linde
Siebeneick
Ehrenberg
Küllenhahn
Grifflenberg
Blutfinke
Ronsdorf-Mitte/Nord
Lichtenplatz
Sudberg
Nächstebreck-West
Hatzfeld
Cronenfeld
Beyenburg-Mitte
Uellendahl-West
Varresbeck
Uellendahl-Ost
Nevigeser Str
Friedrichsberg
Zoo
Hammesberg
Hahnerberg
Westring
Schrödersbusch
Osterholz
Heckinghausen
Kothen
Sedansberg
Beek
Oberbarmen-Schwarzbach
Heidt
Nützenberg
Blombach-Lohsiepen
Vohwinkel-Mitte
Ostersbaum
Tesche
Lüntenbeck
Clausen
Brill
Höhe
Nordstadt
Berghausen
Fleute
Kohlfurth
Elberfeld
Rauental
Fr.-Engels-Allee
Hesselnberg
Sonnborn
Loh
Rehsiepen
Wichlinghausen-Nord
Schenkstr.
Buchenhofen
Arrenberg
Löhrerlen
Wichlinghausen-Süd
Rott
Industriestr.
Langerfeld-Mitte
Jesinghauser Str.
Südstadt
Barmen-Mitte
Hilgershöhe
```
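The choropleth map joins the Wikipedia table to the GeoJSON features on the neighborhood name, so both sources must spell each name identically. The two lists do not always agree: for example, Wikipedia lists "Schenkstraße" while the GeoJSON uses "Schenkstr.". Below is a minimal sketch using a handful of names copied from the two lists in this notebook. On the full data, the sets would come from `d_wiki["Neighborhood"]` and from `feature["properties"]["NAME"]` for each entry of `geojson["features"]`. The `name_fix` mapping is a hypothetical manual fix, and pairing Elberfeld-Mitte with Elberfeld is an assumption.

```python
# Small excerpts of the two name lists as they appear in this notebook
wiki_names = {"Elberfeld-Mitte", "Nordstadt", "Südstadt", "Schenkstraße", "Blutfinke"}
geojson_names = {"Elberfeld", "Nordstadt", "Südstadt", "Schenkstr.", "Blutfinke"}

# Names without a counterpart would stay uncolored in the choropleth
unmatched_wiki = sorted(wiki_names - geojson_names)
unmatched_geo = sorted(geojson_names - wiki_names)
print(unmatched_wiki)
print(unmatched_geo)

# Hypothetical manual mapping from the Wikipedia spelling to the GeoJSON spelling
name_fix = {"Schenkstraße": "Schenkstr.", "Elberfeld-Mitte": "Elberfeld"}
fixed = {name_fix.get(name, name) for name in wiki_names}
print(fixed == geojson_names)
```

On the real DataFrame, the same mapping could be applied with `d_wiki["Neighborhood"].replace(name_fix)` before drawing the map.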


```python
d_wiki.columns
```

```
Index(['Neighborhood', 'Borough', 'Residents', 'Size', 'Population_Density',
       'Foreigner_Percentage', 'Unemployment_Rate', 'Livinghouses',
       'Flats_thereof', 'Schools', 'Private_Cars', 'Elementary_Schools'],
      dtype='object')
```

```python
# Create six evenly spaced bin edges from the minimum to the maximum unemployment rate
threshold_scale = np.linspace(d_wiki["Unemployment_Rate"].min(),
                              d_wiki["Unemployment_Rate"].max(),
                              6).tolist()
# Make sure the last bin edge is greater than the maximum unemployment rate
threshold_scale[-1] = threshold_scale[-1] + 1

# Center the map on Wuppertal and color each neighborhood by its unemployment rate
wuppertal_map = folium.Map(location=[51.256214, 7.150764], zoom_start=11)
folium.Choropleth(
    geo_data=geojson,
    data=d_wiki,
    columns=["Neighborhood", "Unemployment_Rate"],
    key_on="feature.properties.NAME",  # join GeoJSON features to table rows by name
    bins=threshold_scale,
    fill_color="YlOrRd",
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name="Unemployment rate (%)",
).add_to(wuppertal_map)
wuppertal_map
```
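The `+ 1` on the last bin edge follows from how half-open bins `[a, b)` work, the convention `numpy.digitize` uses by default: the maximum value sits exactly on the last edge and therefore falls outside every bin. A sketch with hypothetical rates illustrates the reasoning behind that line. It does not claim that folium bins values in exactly this way.

```python
import numpy as np

# Hypothetical unemployment rates in percent (not the Wuppertal values)
rates = np.array([2.46, 3.21, 5.0, 7.66, 9.67])

edges = np.linspace(rates.min(), rates.max(), 6).tolist()
# Bin indices 1..5 are inside the scale; len(edges) == 6 means "beyond the last edge"
before = np.digitize(rates, edges)

# Nudge the last edge above the maximum so every value lands in a bin
edges[-1] += 1
after = np.digitize(rates, edges)

print(before)
print(after)
```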

## Results section

## Discussion section

## Conclusion section