# Analyzing the similarity of major German cities

## Introduction

Berlin and Hamburg are two of Germany's largest cities. In this analysis I will explore how similar or dissimilar they are based on Foursquare location data.

My target audience is people who are moving from one city to the other and want to know where to rent their new apartment.

Specifically, if you move from Hamburg to Berlin, which neighborhood should you choose, given the neighborhood you lived in before?

What characteristics do these neighborhoods have?

## Data

For each Hamburg neighborhood I will create a list of recommended Berlin neighborhoods based on how similar the mix of venue types is. 
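One way to make "similar mix of venue types" concrete is to turn each neighborhood's venues into relative category frequencies and compare those with cosine similarity. The sketch below is only an illustration under that assumption: the similarity measure, the function names `venue_mix` and `cosine_similarity`, and the venue lists are all hypothetical and are not taken from the analysis itself.

```python
import math
from collections import Counter


def venue_mix(categories):
    """Return the relative frequency of each venue category in a list."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def cosine_similarity(mix_a, mix_b):
    """Cosine similarity between two sparse category-frequency dicts."""
    dot = sum(value * mix_b.get(cat, 0.0) for cat, value in mix_a.items())
    norm_a = math.sqrt(sum(value * value for value in mix_a.values()))
    norm_b = math.sqrt(sum(value * value for value in mix_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a neighborhood without venues is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical venue categories, for illustration only
hamburg_area = ["Café", "Café", "Bakery", "Park"]
berlin_areas = {
    "Berlin area A": ["Café", "Bakery", "Park", "Café"],
    "Berlin area B": ["Nightclub", "Bar", "Bar"],
}

source = venue_mix(hamburg_area)
# Rank the Berlin areas from most to least similar to the Hamburg area
ranking = sorted(
    berlin_areas,
    key=lambda name: cosine_similarity(source, venue_mix(berlin_areas[name])),
    reverse=True,
)
print(ranking)
```

Using relative frequencies rather than raw counts keeps a busy neighborhood from looking dissimilar to a quiet one that has the same proportions of venue types.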

I decided to use a regularly spaced grid of locations, centered around each city center, to define the neighborhoods.

The following data sources will be needed to extract or generate the required information:

* The neighborhoods will be generated algorithmically, and approximate addresses of their centers will be obtained by reverse geocoding with Geopy Nominatim.
* The number of venues in every neighborhood, along with their type and location, will be obtained using the Foursquare API.
* Geopy Nominatim will also be used to obtain the city centers, using the Außenalster for Hamburg and the Brandenburg Gate for Berlin.
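As a rough idea of what a venue request for one grid point could look like, the sketch below only builds a request URL for Foursquare's legacy v2 `venues/explore` endpoint. It assumes that endpoint and its `ll`, `radius`, `limit` and `v` parameters. The helper name `build_explore_url`, the credentials and the version date are placeholders, and the request used later in this analysis may differ.

```python
from urllib.parse import urlencode

# Legacy Foursquare v2 endpoint (assumption; newer API versions exist)
FOURSQUARE_EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lon, client_id, client_secret,
                      radius=500, limit=100, version="20180605"):
    """Build an explore URL for venues within `radius` meters of a point."""
    params = {
        "client_id": client_id,          # placeholder credential
        "client_secret": client_secret,  # placeholder credential
        "v": version,                    # API version date (placeholder)
        "ll": "{},{}".format(lat, lon),  # latitude,longitude of the center
        "radius": radius,
        "limit": limit,
    }
    return FOURSQUARE_EXPLORE_URL + "?" + urlencode(params)


url = build_explore_url(52.5163, 13.3777, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(url)
```

The resulting URL could then be fetched with `requests.get(url).json()`, once per neighborhood center.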

In [1]:
import numpy as np
import pandas as pd

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import re

import json # library to handle JSON files
import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if folium is not installed yet
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


### Generate neighborhoods

In [4]:
address = 'Brandenburg Gate, Berlin, Germany'

geolocator = Nominatim(user_agent="hamburg_explorer")
location = geolocator.geocode(address)
berlin_lat = location.latitude
berlin_lon = location.longitude
print('The geographical coordinates of Berlin are {}, {}.'.format(berlin_lat, berlin_lon))

The geographical coordinates of Berlin are 52.51628045, 13.37770188288172.


In [5]:
address = 'Außenalster, Hamburg, Germany'

geolocator = Nominatim(user_agent="hamburg_explorer")
location = geolocator.geocode(address)
hamburg_lat = location.latitude
hamburg_lon = location.longitude
print('The geographical coordinates of Hamburg are {}, {}.'.format(hamburg_lat, hamburg_lon))

The geographical coordinates of Hamburg are 53.5689488, 10.007305547125247.


In [8]:
import shapely.geometry
import pyproj
import math

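# UTM zone 33 is Berlin's zone. Hamburg (about 10°E) lies in zone 32,
# so projecting it with zone 33 adds some distortion to its grid.
proj_latlon = pyproj.Proj(proj='latlong', datum='WGS84')
proj_xy = pyproj.Proj(proj='utm', zone=33, datum='WGS84')

# pyproj.transform() is deprecated; Transformer objects are the supported replacement
to_xy = pyproj.Transformer.from_proj(proj_latlon, proj_xy, always_xy=True)
to_lonlat = pyproj.Transformer.from_proj(proj_xy, proj_latlon, always_xy=True)

def lonlat_to_xy(lon, lat):
    # Longitude/latitude in degrees -> UTM easting/northing in meters
    return to_xy.transform(lon, lat)

def xy_to_lonlat(x, y):
    # UTM easting/northing in meters -> longitude/latitude in degrees
    return to_lonlat.transform(x, y)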

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Hamburg center longitude={}, latitude={}'.format(hamburg_lon, hamburg_lat))
x, y = lonlat_to_xy(hamburg_lon, hamburg_lat)
print('Hamburg center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Hamburg center longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Hamburg center longitude=10.007305547125247, latitude=53.5689488
Hamburg center UTM X=169483.03662988317, Y=5947163.190106782
Hamburg center longitude=10.007305547125249, latitude=53.568948799999994


In [130]:
berlin_center_x, berlin_center_y = lonlat_to_xy(berlin_lon, berlin_lat) # City center in Cartesian coordinates
hamburg_center_x, hamburg_center_y = lonlat_to_xy(hamburg_lon, hamburg_lat)

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
square_width = 10000
neigborhood_radius = 500
x_step = neigborhood_radius
y_step = neigborhood_radius * k

x_min = berlin_center_x - square_width/2
y_min = berlin_center_y - square_width/2 - (int(21/k)*k*neigborhood_radius - square_width)/2
berlin_latitudes = []
berlin_longitudes = []
berlin_distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = neigborhood_radius/2 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        berlin_distance_from_center = calc_xy_distance(berlin_center_x, berlin_center_y, x, y)
        if (berlin_distance_from_center <= square_width/2+1):
            lon, lat = xy_to_lonlat(x, y)
            berlin_latitudes.append(lat)
            berlin_longitudes.append(lon)
            berlin_distances_from_center.append(berlin_distance_from_center)
            xs.append(x)
            ys.append(y)
            
x_min = hamburg_center_x - square_width/2
y_min = hamburg_center_y - square_width/2 - (int(21/k)*k*neigborhood_radius - square_width)/2
hamburg_latitudes = []
hamburg_longitudes = []
hamburg_distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = neigborhood_radius/2 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        hamburg_distance_from_center = calc_xy_distance(hamburg_center_x, hamburg_center_y, x, y)
        if (hamburg_distance_from_center <= square_width/2+1):
            lon, lat = xy_to_lonlat(x, y)
            hamburg_latitudes.append(lat)
            hamburg_longitudes.append(lon)
            hamburg_distances_from_center.append(hamburg_distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(berlin_latitudes), 'Berlin neighborhood centers generated.')
print(len(hamburg_latitudes), 'Hamburg neighborhood centers generated.')

364 Berlin neighborhood centers generated.
364 Hamburg neighborhood centers generated.
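
The Berlin and Hamburg loops above differ only in the city center they start from, so the grid logic could be factored into one function. The following is a minimal sketch of such a helper. The name `hex_grid_centers` and its return format are hypothetical and not part of the notebook. It uses the same row spacing, half-step offset and distance cutoff as the loops above.

```python
import math


def hex_grid_centers(center_x, center_y, radius=500, square_width=10000):
    """Return (x, y, distance) tuples for a hexagonally packed grid.

    Rows are radius * sqrt(3) / 2 apart, every other row is shifted by
    half a step, and only points within square_width / 2 (plus 1 m of
    tolerance) of the center are kept.
    """
    k = math.sqrt(3) / 2                       # row spacing factor
    columns = int(square_width / radius) + 1   # 21 for the defaults
    rows = int(columns / k)                    # 24 for the defaults
    x_min = center_x - square_width / 2
    y_min = center_y - square_width / 2 - (rows * k * radius - square_width) / 2
    centers = []
    for i in range(rows):
        y = y_min + i * radius * k
        x_offset = radius / 2 if i % 2 == 0 else 0
        for j in range(columns):
            x = x_min + j * radius + x_offset
            distance = math.hypot(x - center_x, y - center_y)
            if distance <= square_width / 2 + 1:
                centers.append((x, y, distance))
    return centers


# Works on any Cartesian center; (0, 0) is used here for illustration
centers = hex_grid_centers(0.0, 0.0)
print(len(centers), 'neighborhood centers generated.')
```

Calling it once with the Berlin center and once with the Hamburg center would replace both loops, and each `(x, y)` pair can then be passed to `xy_to_lonlat` as before.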


In [131]:
map_berlin = folium.Map(location=[berlin_lat, berlin_lon], zoom_start=12)

# add markers to map
for lat, lng in zip(berlin_latitudes, berlin_longitudes):
    label = '{}, {}'.format(lat, lng)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=neigborhood_radius/40,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.5,
        parse_html=False).add_to(map_berlin)  
    
map_berlin

In [132]:
map_hamburg = folium.Map(location=[hamburg_lat, hamburg_lon], zoom_start=12)

# add markers to map
for lat, lng in zip(hamburg_latitudes, hamburg_longitudes):
    label = '{}, {}'.format(lat, lng)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=neigborhood_radius/40,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.5,
        parse_html=False).add_to(map_hamburg)  
    
map_hamburg

In [133]:
hamburg_neighborhoods = []

for i in range(0,len(hamburg_latitudes)):
    reverse = geolocator.reverse((hamburg_latitudes[i],hamburg_longitudes[i]))
    address = reverse[0] 
    address_n = re.findall(".*, (.*),.*,.*,.*", address)[0]
    geo_lat = reverse[1][0]
    geo_lon = reverse[1][1]
    hamburg_neighborhoods.append([address, geo_lat, geo_lon, address_n])

hamburg_neighborhoods = pd.DataFrame(hamburg_neighborhoods)
hamburg_neighborhoods.rename(columns={0:"Address",1:"Latitude",2:"Longitude",3:"Neighborhood"}, inplace=True)

In [150]:
berlin_neighborhoods = []

for i in range(0,len(berlin_latitudes)):
    reverse = geolocator.reverse((berlin_latitudes[i],berlin_longitudes[i]))
    address = reverse[0] 
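    try:
        # Usual case: the neighborhood is the fourth-from-last address component
        address_n = re.findall(".*, (.*),.*,.*,.*", address)[0]
    except IndexError:
        # Short addresses with only four components: take the first one
        address_n = re.findall("(.*),.*,.*,.*", address)[0]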
    
    geo_lat = reverse[1][0]
    geo_lon = reverse[1][1]
    berlin_neighborhoods.append([address, geo_lat, geo_lon, address_n])

berlin_neighborhoods = pd.DataFrame(berlin_neighborhoods)
berlin_neighborhoods.rename(columns={0:"Address",1:"Latitude",2:"Longitude",3:"Neighborhood"}, inplace=True)
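
The two regular expressions above both pick the fourth-from-last comma-separated part of the Nominatim address. A plain string split expresses the same idea more directly. The sketch below is an approximation of that behavior, not the notebook's code: `extract_neighborhood` is a hypothetical helper, and the address is made up for illustration.

```python
def extract_neighborhood(address):
    """Return the fourth-from-last comma-separated part of an address.

    Returns None when the address has fewer than four parts, instead of
    raising an exception.
    """
    parts = [part.strip() for part in address.split(",")]
    if len(parts) < 4:
        return None
    return parts[-4]


# Made-up address in the "street, quarter, city, postcode, country" layout
example = "1, Example Street, Example Quarter, Example City, 00000, Germany"
print(extract_neighborhood(example))
```

Because it counts from the end of the address, the helper does not depend on how many house-number or street components precede the neighborhood.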

In [151]:
hamburg_neighborhoods.head()

|   | Address | Latitude | Longitude | Neighborhood |
|---|---------|----------|-----------|--------------|
| 0 | Veddeler Damm, Kleiner Grasbrook, Hamburg, 204... | 53.526194 | 9.989926 | Kleiner Grasbrook |
| 1 | 5, Am Windhukkai, Kleiner Grasbrook, Hamburg, ... | 53.525975 | 9.995175 | Kleiner Grasbrook |
| 2 | 18, Veddeler Damm, Kleiner Grasbrook, Hamburg,... | 53.524195 | 10.004317 | Kleiner Grasbrook |
| 3 | F, Dessauer Straße, O'Swaldkai, Kleiner Grasbr... | 53.526554 | 10.011925 | Kleiner Grasbrook |
| 4 | Schule auf der Veddel, 1-3, Slomanstieg, Vedde... | 53.526718 | 10.019412 | Veddel |


In [152]:
berlin_neighborhoods.head()

|   | Address | Latitude | Longitude | Neighborhood |
|---|---------|----------|-----------|--------------|
| 0 | Drosselsteg, Schöneberg, Tempelhof-Schöneberg,... | 52.473163 | 13.357282 | Tempelhof-Schöneberg |
| 1 | Sachsendammsteg, A 100, Schöneberg, Tempelhof-... | 52.473755 | 13.363731 | Tempelhof-Schöneberg |
| 2 | 105, Hoeppnerstraße, Gartenstadt Neu-Tempelhof... | 52.473391 | 13.371995 | Tempelhof-Schöneberg |
| 3 | Anwohnerparkplätze, Wiesenerstraße, Gartenstad... | 52.473657 | 13.379431 | Tempelhof-Schöneberg |
| 4 | 83, Tempelhofer Damm, Gartenstadt Neu-Tempelho... | 52.473625 | 13.386045 | Tempelhof-Schöneberg |


In [153]:
hamburg_neighborhoods.to_csv("hamburg_neighborhoods.csv", index=False)
berlin_neighborhoods.to_csv("berlin_neighborhoods.csv", index=False)

### Get venue data

## Methodology

## Results

## Discussion

## Conclusion