# The Battle of the Neighborhoods - Week 2

## Part 1: Import and explore geo data from Munich

##### Munich has a total of 25 boroughs and 107 neighborhoods. To segment and explore them, we need a data set containing the 25 boroughs, the neighborhoods in each borough, and the latitude and longitude coordinates of each neighborhood.

I could not find such a data set on the Internet, so I assigned the latitudes and longitudes to the individual boroughs and neighborhoods manually. For this purpose I created an Excel table.\
The following web page was used to determine the coordinates:  https://www.koordinaten-umrechner.de/decimal/51.000000,10.000000?karte=OpenStreetMap&zoom=8

The following web page was used to determine all the boroughs and their neighborhoods: https://de.wikipedia.org/wiki/Stadtbezirke_M%C3%BCnchens

#### Install and download necessary libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, available since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

#!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

# import k-means from clustering stage
from sklearn.cluster import KMeans

from sklearn.metrics import silhouette_score

import csv # implements classes to read and write tabular data in CSV form

print('Libraries imported.')

Libraries imported.


## 1. Download and Explore Data

### Web scraping of boroughs and neighborhood data from wikipedia page using Pandas

With the method read_html(), Pandas offers a quick way to extract tables from web pages.

### Load and explore the data

In [18]:
neighbourhoods = pd.read_html("https://de.wikipedia.org/wiki/Stadtbezirke_M%C3%BCnchens")

In [19]:
type(neighbourhoods) 

list

Pandas returns a list containing all tables of the respective website.

In [20]:
len(neighbourhoods)

2

The requested website contains 2 tables. The second table is the one we are looking for.

In [21]:
neigh = neighbourhoods[1].drop("Stadt-bezirks-nr.", axis=1)
neigh.columns

Index(['Stadtbezirk', 'Stadtbezirksteile (Nr.)'], dtype='object')

### Check the imported data

In [22]:
neigh.head()

Unnamed: 0,Stadtbezirk,Stadtbezirksteile (Nr.)
0,Altstadt-Lehel,"Graggenau (1), Angerviertel (2), Hackenviertel..."
1,Ludwigsvorstadt-Isarvorstadt,"Gärtnerplatz (1), Deutsches Museum (2), Glocke..."
2,Maxvorstadt,"Königsplatz (1), Augustenstraße (2), St. Benno..."
3,Schwabing-West,"Neuschwabing (1), Am Luitpoldpark (2), Schwere..."
4,Au-Haidhausen,"Maximilianeum (1), Steinhausen (2), Haidhausen..."


#### Split the column "Stadtbezirksteile (Nr.)" into its items (= neighborhoods)

In [23]:
# remove the numbering such as " (1)" that follows each neighborhood name
neighborhoods = neigh['Stadtbezirksteile (Nr.)'].str.replace(r"\s*\(\d+\)", "", regex=True)
# split on commas (and any following whitespace) into one column per neighborhood
neighborhoods = neighborhoods.str.split(r",\s*", expand=True)
neighborhoods.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8
0,Graggenau,Angerviertel,Hackenviertel,Kreuzviertel,Lehel,Englischer Garten Süd,,,
1,Gärtnerplatz,Deutsches Museum,Glockenbach,Dreimühlen,Am alten südlichen Friedhof,Am Schlachthof,Ludwigsvorstadt-Kliniken,St. Paul,
2,Königsplatz,Augustenstraße,St. Benno,Marsfeld,Josephsplatz,Am alten nördlichen Friedhof,Universität,Schönfeldvorstadt,Maßmannbergl
3,Neuschwabing,Am Luitpoldpark,Schwere-Reiter-Straße,,,,,,
4,Maximilianeum,Steinhausen,Haidhausen-Nord,Haidhausen-Süd,Obere Au,Untere Au,,,


#### Merge it with "Stadtbezirk" (=borough)

In [24]:
merged = pd.merge(left = neigh["Stadtbezirk"], right = neighborhoods, right_index=True, left_index=True)
merged.head()

Unnamed: 0,Stadtbezirk,0,1,2,3,4,5,6,7,8
0,Altstadt-Lehel,Graggenau,Angerviertel,Hackenviertel,Kreuzviertel,Lehel,Englischer Garten Süd,,,
1,Ludwigsvorstadt-Isarvorstadt,Gärtnerplatz,Deutsches Museum,Glockenbach,Dreimühlen,Am alten südlichen Friedhof,Am Schlachthof,Ludwigsvorstadt-Kliniken,St. Paul,
2,Maxvorstadt,Königsplatz,Augustenstraße,St. Benno,Marsfeld,Josephsplatz,Am alten nördlichen Friedhof,Universität,Schönfeldvorstadt,Maßmannbergl
3,Schwabing-West,Neuschwabing,Am Luitpoldpark,Schwere-Reiter-Straße,,,,,,
4,Au-Haidhausen,Maximilianeum,Steinhausen,Haidhausen-Nord,Haidhausen-Süd,Obere Au,Untere Au,,,


#### Unpivot the dataframe

In [25]:
df_unpivoted = merged.melt(id_vars=['Stadtbezirk'],  value_name="Neighborhood").drop("variable", axis=1)
neighborhoods = df_unpivoted.dropna().sort_values(by="Stadtbezirk").reset_index(drop=True)
neighborhoods.columns = ["Borough","Neighborhood"]
print(neighborhoods.shape)
neighborhoods.head()

(107, 2)


Unnamed: 0,Borough,Neighborhood
0,Allach-Untermenzing,Untermenzing-Allach
1,Allach-Untermenzing,Industriebezirk
2,Altstadt-Lehel,Graggenau
3,Altstadt-Lehel,Kreuzviertel
4,Altstadt-Lehel,Lehel
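
The split, merge and unpivot steps above could also be collapsed into a single chain. The following is only a sketch under assumptions: the `sample` frame is a small hand-made stand-in for the scraped table (same column names, illustrative rows), and `long_df` is a name introduced here. It relies on `DataFrame.explode`, which turns each list element into its own row.

```python
import pandas as pd

# Hand-made stand-in for the scraped table (illustrative rows, same column names)
sample = pd.DataFrame({
    "Stadtbezirk": ["Altstadt-Lehel", "Schwabing-West"],
    "Stadtbezirksteile (Nr.)": [
        "Graggenau (1), Angerviertel (2)",
        "Neuschwabing (1), Am Luitpoldpark (2), Schwere-Reiter-Straße (3)",
    ],
})

# Clean the names, turn each cell into a list, then emit one row per list element
names = (
    sample["Stadtbezirksteile (Nr.)"]
    .str.replace(r"\s*\(\d+\)", "", regex=True)  # drop " (1)", " (2)", ...
    .str.split(r",\s*")                          # list of names per borough
)

long_df = (
    sample.assign(Neighborhood=names)
    .explode("Neighborhood")                     # one row per neighborhood
    .rename(columns={"Stadtbezirk": "Borough"})
    [["Borough", "Neighborhood"]]
    .reset_index(drop=True)
)

print(long_df)
```

Compared with the wide intermediate table, this variant never creates empty cells, so no dropna() is needed.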


### Save dataframe as csv file

In [26]:
neighborhoods.to_csv("Neighborhoods.csv", index=False)

I then filled in the latitude and longitude of each neighborhood manually and saved the result as the Excel file Neighborhoods.xlsx.

### Reload the dataframe with latitude and longitude of each neighborhood

In [2]:
munich_neighborhoods = pd.read_excel("Neighborhoods.xlsx")
munich_neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Allach Untermenzing,Industriebezirk,48.196839,11.476602
1,Allach Untermenzing,Untermenzing Allach,48.177715,11.472676
2,Altstadt Lehel,Graggenau,48.139168,11.581965
3,Altstadt Lehel,Angerviertel,48.13367,11.571569
4,Altstadt Lehel,Hackenviertel,48.135731,11.569955


### Let's check if the dataframe has 25 boroughs and 107 neighborhoods

In [4]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        munich_neighborhoods['Borough'].nunique(),munich_neighborhoods.shape[0]))

The dataframe has 25 boroughs and 107 neighborhoods.
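
Because the coordinates were entered by hand, a few extra sanity checks beyond the row count may be worthwhile. The sketch below is not part of the original workflow: `find_suspect_rows` is a hypothetical helper, the bounding box is a rough assumption chosen for illustration, and the `sample` rows are copied from the table above.

```python
import pandas as pd

# Sample rows copied from the table above
sample = pd.DataFrame({
    "Borough": ["Allach Untermenzing", "Altstadt Lehel", "Altstadt Lehel"],
    "Neighborhood": ["Industriebezirk", "Graggenau", "Angerviertel"],
    "Latitude": [48.196839, 48.139168, 48.13367],
    "Longitude": [11.476602, 11.581965, 11.571569],
})

# Assumed, deliberately generous bounding box around Munich (illustrative values)
LAT_RANGE = (48.0, 48.3)
LNG_RANGE = (11.3, 11.8)


def find_suspect_rows(df):
    """Return rows with missing, duplicated, or out-of-range coordinates."""
    missing = df[["Latitude", "Longitude"]].isna().any(axis=1)
    in_range = df["Latitude"].between(*LAT_RANGE) & df["Longitude"].between(*LNG_RANGE)
    duplicated = df.duplicated(subset=["Latitude", "Longitude"], keep=False)
    return df[missing | ~in_range | duplicated]


# A typical manual-entry mistake: latitude and longitude swapped in one row
bad = sample.copy()
bad.loc[1, ["Latitude", "Longitude"]] = [11.581965, 48.139168]

print(find_suspect_rows(sample))  # no rows flagged
print(find_suspect_rows(bad))     # flags the swapped row
```

Applied to the full table, any flagged row would be a candidate for a second look in the Excel file.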


### Save the dataframe with coordinates as csv file

In [29]:
munich_neighborhoods.to_csv("Neighborhoods.csv", index=False)

### Use geocoder to get latitude and longitude of Munich.

In [5]:
geolocator = Nominatim(user_agent="munich")
location = geolocator.geocode("Munich")
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Munich are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Munich are 48.1371079, 11.5753822.
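
In principle the same kind of lookup could be repeated for every neighborhood instead of entering coordinates by hand. The following is only a sketch, not what was done here: `geocode_neighborhoods` is a hypothetical helper, and the geocoder is passed in as a plain callable so the example runs offline with a stand-in (`fake_geocode`) whose single coordinate pair is taken from the table above. Whether a real geocoder resolves all 107 neighborhood names correctly is an open assumption, so the results would still need to be checked.

```python
from collections import namedtuple

import pandas as pd


def geocode_neighborhoods(df, geocode, city="Munich"):
    """Add Latitude/Longitude columns by calling `geocode` once per row.

    `geocode` is any callable that takes a query string and returns an object
    with .latitude and .longitude attributes, or None if nothing was found.
    """
    lats, lngs = [], []
    for name in df["Neighborhood"]:
        hit = geocode("{}, {}".format(name, city))
        lats.append(hit.latitude if hit is not None else None)
        lngs.append(hit.longitude if hit is not None else None)
    return df.assign(Latitude=lats, Longitude=lngs)


# Offline stand-in for a real geocoder (hypothetical, for illustration only)
Location = namedtuple("Location", ["latitude", "longitude"])
known = {"Graggenau, Munich": Location(48.139168, 11.581965)}
fake_geocode = known.get  # returns None for unknown queries

sample = pd.DataFrame({"Neighborhood": ["Graggenau", "No such place"]})
result = geocode_neighborhoods(sample, fake_geocode)
print(result)
```

With geopy, the callable could be `RateLimiter(geolocator.geocode, min_delay_seconds=1)` from `geopy.extra.rate_limiter`, which spaces out the requests; the public Nominatim service asks for at most one request per second. Rows that come back empty would still have to be filled in manually.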


### Create a map of Munich with neighborhoods superimposed on top.

Folium is a great visualization library. You can zoom into the map below and click on each circle marker to reveal the name of the neighborhood and its respective borough.

In [17]:
# create map of Munich using latitude and longitude values
map_Munich = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(
        munich_neighborhoods['Latitude'],
        munich_neighborhoods['Longitude'],
        munich_neighborhoods['Borough'],
        munich_neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    # parse_html=True treats the label as plain text, so special characters are escaped
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        location=[lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_Munich)
    
map_Munich

In [11]:
for lat, lng, borough, neighborhood in zip(munich_neighborhoods['Latitude'], munich_neighborhoods['Longitude'], munich_neighborhoods['Borough'], munich_neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    print(label)
        

<folium.map.Popup object at 0x00000294F59FFF48>
<folium.map.Popup object at 0x00000294F5E01F88>
<folium.map.Popup object at 0x00000294F5E12188>
<folium.map.Popup object at 0x00000294F59FF288>
<folium.map.Popup object at 0x00000294F5E02108>
<folium.map.Popup object at 0x00000294F5EEF4C8>
<folium.map.Popup object at 0x00000294F5EC9EC8>
<folium.map.Popup object at 0x00000294F59FF908>
<folium.map.Popup object at 0x00000294F5EF5F48>
<folium.map.Popup object at 0x00000294F5932808>
<folium.map.Popup object at 0x00000294F5BF2A88>
<folium.map.Popup object at 0x00000294F5EC9048>
<folium.map.Popup object at 0x00000294F5EEA0C8>
<folium.map.Popup object at 0x00000294F5E02108>
<folium.map.Popup object at 0x00000294F5934548>
<folium.map.Popup object at 0x00000294F5EBA608>
<folium.map.Popup object at 0x00000294F5EEE688>
<folium.map.Popup object at 0x00000294F59FFF48>
<folium.map.Popup object at 0x00000294F59FFF08>
<folium.map.Popup object at 0x00000294F5E01EC8>
<folium.map.Popup object at 0x00000294F5