# Analyzing the similarity of major German cities

## Introduction

Berlin and Hamburg are two of Germany's largest cities. In this analysis I will explore how similar or dissimilar they are based on Foursquare location data.

My target audience is people who are moving from one city to the other and want to know where to rent their new apartment.

Specifically, if you move from Hamburg to Berlin, which neighborhood should you choose, given the one you are leaving?

What characteristics do these neighborhoods have?

## Data

For each Hamburg neighborhood I will create a list of recommended Berlin neighborhoods based on how similar the mix of venue types is. 
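
One way to turn "similar mix of venue types" into a ranking is to describe each neighborhood by the share of each venue category and compare those vectors directly. The sketch below uses cosine similarity for that purpose. It is an illustrative assumption rather than the method applied later in this notebook (which clusters with k-means), and the function names and toy data are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as {category: share} dicts."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as dissimilar to everything
    return dot / (norm_a * norm_b)

def rank_similar(target, candidates):
    """Return candidate names sorted from most to least similar to the target profile."""
    return sorted(candidates,
                  key=lambda name: cosine_similarity(target, candidates[name]),
                  reverse=True)

# Hypothetical venue-category shares, for illustration only
hamburg_profile = {"Café": 0.5, "Bar": 0.5}
berlin_profiles = {
    "A": {"Café": 0.5, "Bar": 0.5},
    "B": {"Hotel": 1.0},
    "C": {"Café": 1.0},
}
ranking = rank_similar(hamburg_profile, berlin_profiles)
print(ranking)
```

A ranking like this gives a per-neighborhood recommendation list, whereas clustering only says which neighborhoods share a group.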

I decided to define the neighborhoods with a regularly spaced grid of locations centered on each city center.

The following data sources are needed to generate the required information:

* The neighborhoods will be generated algorithmically, and approximate addresses of their centers will be obtained using geopy Nominatim reverse geocoding.
* The number and type of venues in every neighborhood will be obtained using the Foursquare API.
* geopy Nominatim will also be used to obtain the city centers, using the Außenalster for Hamburg and the Brandenburg Gate for Berlin.

In [80]:
import numpy as np
import pandas as pd

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import folium # map rendering library

import re # for regular expressions

# for transforming geocoordinates
import shapely.geometry
import pyproj
import math

import requests # library to handle requests

from sklearn.cluster import KMeans # for clustering

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

print('Libraries imported.')

Libraries imported.


### Generate neighborhoods

In [77]:
address = 'Brandenburg Gate, Berlin, Germany'

geolocator = Nominatim(user_agent="hamburg_explorer")
location = geolocator.geocode(address)
berlin_lat = location.latitude
berlin_lon = location.longitude
print('The geographical coordinates of Berlin are {}, {}.'.format(berlin_lat, berlin_lon))

The geographical coordinates of Berlin are 52.51628045, 13.37770188288172.


In [78]:
address = 'Außenalster, Hamburg, Germany'

geolocator = Nominatim(user_agent="hamburg_explorer")
location = geolocator.geocode(address)
hamburg_lat = location.latitude
hamburg_lon = location.longitude
print('The geographical coordinates of Hamburg are {}, {}.'.format(hamburg_lat, hamburg_lon))

The geographical coordinates of Hamburg are 53.5689488, 10.007305547125247.


In [8]:
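# Note: pyproj.transform is deprecated in recent pyproj releases in favor of pyproj.Transformer
def lonlat_to_xy(lon, lat):
    # project WGS84 longitude/latitude to UTM zone 33 coordinates (meters)
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    # inverse of lonlat_to_xy: UTM zone 33 coordinates back to longitude/latitude
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]

def calc_xy_distance(x1, y1, x2, y2):
    # Euclidean distance between two points in projected (meter) coordinates
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)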

print('Coordinate transformation check')
print('-------------------------------')
print('Hamburg center longitude={}, latitude={}'.format(hamburg_lon, hamburg_lat))
x, y = lonlat_to_xy(hamburg_lon, hamburg_lat)
print('Hamburg center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Hamburg center longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Hamburg center longitude=10.007305547125247, latitude=53.5689488
Hamburg center UTM X=169483.03662988317, Y=5947163.190106782
Hamburg center longitude=10.007305547125249, latitude=53.568948799999994
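
The helper functions above project both cities into UTM zone 33. As a quick check of which zone a longitude nominally falls into, the standard 6-degree zone rule can be sketched as follows. The function name `utm_zone` is hypothetical, and the special-case zones around Norway and Svalbard are ignored.

```python
import math

def utm_zone(lon):
    """Return the standard UTM zone number (1-60) for a longitude in degrees."""
    # Zones are 6 degrees wide and start at 180 degrees west
    return int(math.floor((lon + 180) / 6)) % 60 + 1

berlin_zone = utm_zone(13.37770188288172)    # Berlin center longitude from above
hamburg_zone = utm_zone(10.007305547125247)  # Hamburg center longitude from above
print(berlin_zone, hamburg_zone)
```

Berlin falls into zone 33 and Hamburg into zone 32, so Hamburg is projected outside its nominal zone here. Because only local distances over a grid of about 10 km are needed, the additional distortion is probably small, but passing the matching zone would be the more conventional choice.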


In [179]:
berlin_center_x, berlin_center_y = lonlat_to_xy(berlin_lon, berlin_lat) # City center in Cartesian coordinates
hamburg_center_x, hamburg_center_y = lonlat_to_xy(hamburg_lon, hamburg_lat)

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
square_width = 10000
neigborhood_radius = 1500
x_step = neigborhood_radius
y_step = neigborhood_radius * k

x_min = berlin_center_x - square_width/2
y_min = berlin_center_y - square_width/2 - (int(21/k)*k*neigborhood_radius - square_width)/2
berlin_latitudes = []
berlin_longitudes = []
berlin_distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = neigborhood_radius/2 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        berlin_distance_from_center = calc_xy_distance(berlin_center_x, berlin_center_y, x, y)
        if (berlin_distance_from_center <= square_width/2+1):
            lon, lat = xy_to_lonlat(x, y)
            berlin_latitudes.append(lat)
            berlin_longitudes.append(lon)
            berlin_distances_from_center.append(berlin_distance_from_center)
            xs.append(x)
            ys.append(y)
            
x_min = hamburg_center_x - square_width/2
y_min = hamburg_center_y - square_width/2 - (int(21/k)*k*neigborhood_radius - square_width)/2
hamburg_latitudes = []
hamburg_longitudes = []
hamburg_distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = neigborhood_radius/2 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        hamburg_distance_from_center = calc_xy_distance(hamburg_center_x, hamburg_center_y, x, y)
        if (hamburg_distance_from_center <= square_width/2+1):
            lon, lat = xy_to_lonlat(x, y)
            hamburg_latitudes.append(lat)
            hamburg_longitudes.append(lon)
            hamburg_distances_from_center.append(hamburg_distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(berlin_latitudes), 'Berlin neighborhood centers generated.')
print(len(hamburg_latitudes), 'Hamburg neighborhood centers generated.')

39 Berlin neighborhood centers generated.
39 Hamburg neighborhood centers generated.
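
The two loops above differ only in the city center they use. A possible refactoring, sketched here with a hypothetical helper `hex_grid_centers` that keeps the same geometry and default parameters, could look like this:

```python
import math

def hex_grid_centers(center_x, center_y, step=1500, square_width=10000, n=21):
    """Return (x, y, distance) tuples for hexagonal grid cell centers near a city center.

    Coordinates are projected (meter) coordinates; only cells within
    square_width/2 (+1 m tolerance) of the center are kept.
    """
    k = math.sqrt(3) / 2                 # row spacing factor of a hexagonal layout
    rows = int(n / k)
    x_min = center_x - square_width / 2
    y_min = center_y - square_width / 2 - (rows * k * step - square_width) / 2
    centers = []
    for i in range(rows):
        y = y_min + i * step * k
        x_offset = step / 2 if i % 2 == 0 else 0   # shift every other row by half a step
        for j in range(n):
            x = x_min + j * step + x_offset
            distance = math.hypot(x - center_x, y - center_y)
            if distance <= square_width / 2 + 1:
                centers.append((x, y, distance))
    return centers

# Example with an arbitrary center at the origin
centers = hex_grid_centers(0.0, 0.0)
print(len(centers), 'neighborhood centers generated.')
```

Calling the helper once per city, for example `hex_grid_centers(berlin_center_x, berlin_center_y)`, would replace both loops; the resulting coordinates can then be converted back with `xy_to_lonlat`.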


In [249]:
map_berlin = folium.Map(location=[berlin_lat, berlin_lon], zoom_start=12)

# add markers to map
for lat, lng in zip(berlin_latitudes, berlin_longitudes):
    label = '{}, {}'.format(lat, lng)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=neigborhood_radius/40,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.5,
        parse_html=False).add_to(map_berlin)  
    
map_berlin

In [181]:
map_hamburg = folium.Map(location=[hamburg_lat, hamburg_lon], zoom_start=12)

# add markers to map
for lat, lng in zip(hamburg_latitudes, hamburg_longitudes):
    label = '{}, {}'.format(lat, lng)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=neigborhood_radius/40,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.5,
        parse_html=False).add_to(map_hamburg)  
    
map_hamburg

In [217]:
hamburg_neighborhoods = []

for i in range(0,len(hamburg_latitudes)):
    reverse = geolocator.reverse((hamburg_latitudes[i],hamburg_longitudes[i]))
    address = reverse[0] 
    address_n = re.findall(".*, (.*),.*,.*,.*", address)[0]
    geo_lat = reverse[1][0]
    geo_lon = reverse[1][1]
    city = "Hamburg"
    hamburg_neighborhoods.append([address, geo_lat, geo_lon, address_n, city])

hamburg_neighborhoods = pd.DataFrame(hamburg_neighborhoods)
hamburg_neighborhoods.rename(columns={0:"Neighborhood",1:"Latitude",2:"Longitude",3:"Borough",4:"City"}, inplace=True)

In [218]:
berlin_neighborhoods = []

for i in range(0,len(berlin_latitudes)):
    reverse = geolocator.reverse((berlin_latitudes[i],berlin_longitudes[i]))
    address = reverse[0] 
    try:
        address_n = re.findall(".*, (.*),.*,.*,.*", address)[0]
    except IndexError: # address has too few components for the first pattern
        address_n = re.findall("(.*),.*,.*,.*", address)[0]
    
    geo_lat = reverse[1][0]
    geo_lon = reverse[1][1]
    city = "Berlin"
    berlin_neighborhoods.append([address, geo_lat, geo_lon, address_n, city])

berlin_neighborhoods = pd.DataFrame(berlin_neighborhoods)
berlin_neighborhoods.rename(columns={0:"Neighborhood",1:"Latitude",2:"Longitude",3:"Borough",4:"City"}, inplace=True)
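
Assuming the address components are separated by ", ", both regular expressions above in effect select the fourth-from-last component of the address string. A sketch of the same extraction without regular expressions follows; the function name and the example address are hypothetical.

```python
def borough_from_address(address):
    """Return the fourth-from-last comma-separated part of an address, or None."""
    parts = [part.strip() for part in address.split(",")]
    return parts[-4] if len(parts) >= 4 else None

# Hypothetical address in the style "..., district, city, postcode, country"
example = "1, Example Street, Quarter, District, City, 12345, Country"
print(borough_from_address(example))
```

Returning `None` for unexpectedly short addresses makes such rows easy to spot and handle later, instead of stopping the loop with an exception.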

In [219]:
hamburg_neighborhoods.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,Borough,City
0,"Zweite Querkanalbrücke, Worthdamm, Steinwerder...",53.532424,9.981784,Steinwerder,Hamburg
1,"37, Chicagokai, Quartier Elbtorquartier, Hafen...",53.534379,10.002156,HafenCity,Hamburg
2,"12, Zweibrückenstraße, Quartier Elbbrücken, Ha...",53.534061,10.025496,HafenCity,Hamburg
3,"129e, Marckmannstraße, Rothenburgsort, Hamburg...",53.534455,10.048362,Hamburg-Mitte,Hamburg
4,"Am Altonaer Holzhafen, Altona-Altstadt, Altona...",53.54422,9.945906,Altona,Hamburg


In [220]:
berlin_neighborhoods.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,Borough,City
0,"Pasta Bar, 1, Fritz-Reuter-Straße, Schöneberg,...",52.480966,13.349539,Tempelhof-Schöneberg,Berlin
1,"Rosenpromenade, KGA Papestraße, Tempelhof, Tem...",52.481159,13.371573,Tempelhof-Schöneberg,Berlin
2,"Flughafen Tempelhof, Werner-Loebermann-Weg, Ga...",52.480649,13.388919,Tempelhof-Schöneberg,Berlin
3,"Hasenschänke, Columbiadamm, Tempelhof, Tempelh...",52.482636,13.416267,Tempelhof-Schöneberg,Berlin
4,"24, Pommersche Straße, Wilmersdorf, Charlotten...",52.492595,13.315906,Wilmersdorf,Berlin


In [226]:
neighborhoods = pd.concat([hamburg_neighborhoods, berlin_neighborhoods])

In [227]:
hamburg_neighborhoods.to_csv("Data/hamburg_neighborhoods.csv", index=False)
berlin_neighborhoods.to_csv("Data/berlin_neighborhoods.csv", index=False)
neighborhoods.to_csv("Data/neighborhoods.csv", index=False)

### Get venue data

In [2]:
hamburg_neighborhoods = pd.read_csv("Data/hamburg_neighborhoods.csv")
hamburg_neighborhoods.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,Borough,City
0,"Zweite Querkanalbrücke, Worthdamm, Steinwerder...",53.532424,9.981784,Steinwerder,Hamburg
1,"37, Chicagokai, Quartier Elbtorquartier, Hafen...",53.534379,10.002156,HafenCity,Hamburg
2,"12, Zweibrückenstraße, Quartier Elbbrücken, Ha...",53.534061,10.025496,HafenCity,Hamburg
3,"129e, Marckmannstraße, Rothenburgsort, Hamburg...",53.534455,10.048362,Hamburg-Mitte,Hamburg
4,"Am Altonaer Holzhafen, Altona-Altstadt, Altona...",53.54422,9.945906,Altona,Hamburg


In [3]:
berlin_neighborhoods = pd.read_csv("Data/berlin_neighborhoods.csv")
berlin_neighborhoods.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,Borough,City
0,"Pasta Bar, 1, Fritz-Reuter-Straße, Schöneberg,...",52.480966,13.349539,Tempelhof-Schöneberg,Berlin
1,"Rosenpromenade, KGA Papestraße, Tempelhof, Tem...",52.481159,13.371573,Tempelhof-Schöneberg,Berlin
2,"Flughafen Tempelhof, Werner-Loebermann-Weg, Ga...",52.480649,13.388919,Tempelhof-Schöneberg,Berlin
3,"Hasenschänke, Columbiadamm, Tempelhof, Tempelh...",52.482636,13.416267,Tempelhof-Schöneberg,Berlin
4,"24, Pommersche Straße, Wilmersdorf, Charlotten...",52.492595,13.315906,Wilmersdorf,Berlin


In [10]:
print("There are", berlin_neighborhoods.shape[0], "neighborhoods in Berlin and",
      hamburg_neighborhoods.shape[0], "in Hamburg.")
print("They belong to",
      berlin_neighborhoods.Borough.unique().shape[0],
      "and",
      hamburg_neighborhoods.Borough.unique().shape[0],
      "boroughs respectively."
     )

There are 39 neighborhoods in Berlin and 39 in Hamburg.
They belong to 14 and 13 boroughs respectively.


In [11]:
limit = 100
radius = 1500 # see neighborhood radius above

VERSION = '20180605' # Foursquare API version

In [12]:
%run credentials.py # client_id and client_secret for Foursquare

In [13]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [14]:
def getNearbyVenues(neighborhoods, latitudes, longitudes, boroughs, cities, radius=500):
    
    venues_list=[]
    for neigh, lat, lng, bor, city in zip(neighborhoods, latitudes, longitudes, boroughs, cities):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            neigh,
            lat,
            lng,
            bor,
            city,
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Address Latitude', 
                  'Address Longitude',
                  'Borough',
                  'City',
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
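
The list comprehension in `getNearbyVenues` assumes that every venue has at least one category. A more defensive way to flatten the response items is sketched below. The function name and the sample items are hypothetical and only mimic the fields accessed above.

```python
def extract_venue_rows(items):
    """Flatten Foursquare 'explore' items into (name, lat, lng, category) tuples."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None when a venue has no category instead of raising IndexError
        category = categories[0]["name"] if categories else None
        rows.append((venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows

# Hypothetical items with the same nesting as the API response used above
sample_items = [
    {"venue": {"name": "Venue A", "location": {"lat": 52.5, "lng": 13.4},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 52.6, "lng": 13.5},
               "categories": []}},
]
rows = extract_venue_rows(sample_items)
print(rows)
```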

In [17]:
hamburg_venues = getNearbyVenues(
    neighborhoods = hamburg_neighborhoods['Neighborhood'],
    latitudes = hamburg_neighborhoods['Latitude'],
    longitudes = hamburg_neighborhoods['Longitude'],
    boroughs = hamburg_neighborhoods['Borough'],
    cities = hamburg_neighborhoods['City']
) # radius is not passed, so the function default of 500 m applies

In [18]:
berlin_venues = getNearbyVenues(
    neighborhoods = berlin_neighborhoods['Neighborhood'],
    latitudes = berlin_neighborhoods['Latitude'],
    longitudes = berlin_neighborhoods['Longitude'],
    boroughs = berlin_neighborhoods['Borough'],
    cities = berlin_neighborhoods['City']
) # radius is not passed, so the function default of 500 m applies

In [39]:
all_venues = pd.concat([berlin_venues, hamburg_venues], ignore_index=True)

In [40]:
all_venues.shape

(2497, 9)

In [41]:
# exclude neighborhoods with fewer than 10 venues
a_count = all_venues.groupby("Neighborhood").count()
a_incl = a_count[a_count["Venue"] >= 10].reset_index().Neighborhood
all_venues = all_venues[all_venues.Neighborhood.isin(a_incl)]

all_venues.shape

(2443, 9)
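
The filter above keeps only neighborhoods with at least 10 venues. The same idea, shown without pandas on hypothetical toy records (the function name and the data are assumptions for illustration):

```python
from collections import Counter

def keep_well_covered(records, min_venues=10):
    """Keep (neighborhood, venue) records whose neighborhood has at least min_venues venues."""
    counts = Counter(neighborhood for neighborhood, _ in records)
    return [record for record in records if counts[record[0]] >= min_venues]

# Neighborhood "A" has 12 venues, neighborhood "B" only 3
records = [("A", f"venue {i}") for i in range(12)] + [("B", f"venue {i}") for i in range(3)]
filtered = keep_well_covered(records)
print(len(records), "->", len(filtered))
```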

In [42]:
all_venues.head()

Unnamed: 0,Neighborhood,Address Latitude,Address Longitude,Borough,City,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Pasta Bar, 1, Fritz-Reuter-Straße, Schöneberg,...",52.480966,13.349539,Tempelhof-Schöneberg,Berlin,Café de Enrico,52.481014,13.349788,Café
1,"Pasta Bar, 1, Fritz-Reuter-Straße, Schöneberg,...",52.480966,13.349539,Tempelhof-Schöneberg,Berlin,Osbili,52.479532,13.349973,Bistro
2,"Pasta Bar, 1, Fritz-Reuter-Straße, Schöneberg,...",52.480966,13.349539,Tempelhof-Schöneberg,Berlin,Sahara Sudanesische Spezialitäten,52.479845,13.35181,African Restaurant
3,"Pasta Bar, 1, Fritz-Reuter-Straße, Schöneberg,...",52.480966,13.349539,Tempelhof-Schöneberg,Berlin,Odeon,52.482086,13.349483,Indie Movie Theater
4,"Pasta Bar, 1, Fritz-Reuter-Straße, Schöneberg,...",52.480966,13.349539,Tempelhof-Schöneberg,Berlin,Balkan Grill,52.480042,13.35242,Eastern European Restaurant


In [53]:
all_venues.to_csv("Data/all_venues.csv", index=False)

### Find Top 10 venue types for each neighborhood and borough

In [58]:
venue_cat_onehot = pd.get_dummies(all_venues[['Venue Category']], prefix="", prefix_sep="")
venue_cat_onehot['Neighborhood'] = all_venues['Neighborhood'] 
venue_cat_onehot['Borough'] = all_venues['Borough'] 

In [59]:
boroughs_grouped = venue_cat_onehot.drop(columns='Neighborhood').groupby('Borough').mean().reset_index()
boroughs_grouped.head()

Unnamed: 0,Borough,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,...,Water Park,Waterfall,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Winery,Women's Store,Yoga Studio,Zoo Exhibit
0,Altona,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,0.0,...,0.011494,0.0,0.0,0.0,0.022989,0.0,0.0,0.0,0.0,0.0
1,Altstadt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Charlottenburg,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.011905,...,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,0.011905,0.0,0.0
3,Eimsbüttel,0.0,0.0,0.003937,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.011811,0.0,0.0,0.0,0.0,0.0
4,Friedrichshain-Kreuzberg,0.0,0.00289,0.0,0.0,0.00289,0.00289,0.0,0.00289,0.00289,...,0.0,0.00289,0.00578,0.0,0.00578,0.0,0.0,0.0,0.00289,0.0


In [60]:
neighborhoods_grouped = venue_cat_onehot.drop(columns='Borough').groupby('Neighborhood').mean().reset_index()
neighborhoods_grouped.head()

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,...,Water Park,Waterfall,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Winery,Women's Store,Yoga Studio,Zoo Exhibit
0,"10, Torstraße, Spandauer Vorstadt, Mitte, 1011...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0
1,"104, Ackerstraße, Gesundbrunnen, Mitte, 13355,...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"108, Schönhauser Allee, Arnimkiez, Prenzlauer ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"12, Dorothea-Bernstein-Weg, Uhlenhorst, Hambur...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"12, Zweibrückenstraße, Quartier Elbbrücken, Ha...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
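
Averaging one-hot columns per group yields, for each neighborhood, the fraction of its venues that fall into each category. The dependency-free sketch below computes the same quantity on hypothetical toy records; the function name and the data are assumptions for illustration.

```python
from collections import Counter, defaultdict

def category_shares(records):
    """Map each neighborhood to {category: share of its venues in that category}."""
    per_neighborhood = defaultdict(Counter)
    for neighborhood, category in records:
        per_neighborhood[neighborhood][category] += 1
    return {
        name: {cat: count / sum(counter.values()) for cat, count in counter.items()}
        for name, counter in per_neighborhood.items()
    }

records = [("A", "Café"), ("A", "Café"), ("A", "Bar"), ("A", "Hotel"), ("B", "Hotel")]
shares = category_shares(records)
print(shares["A"])
```

Because every row sums to 1, neighborhoods with many venues and neighborhoods with few venues become comparable by their mix rather than by their size.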


In [61]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [63]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError: # no 'st'/'nd'/'rd' suffix left, fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = neighborhoods_grouped['Neighborhood']

for ind in np.arange(neighborhoods_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(neighborhoods_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"10, Torstraße, Spandauer Vorstadt, Mitte, 1011...",Coffee Shop,Art Gallery,Japanese Restaurant,Italian Restaurant,Athletics & Sports,Vegetarian / Vegan Restaurant,Cocktail Bar,Furniture / Home Store,Modern European Restaurant,Restaurant
1,"104, Ackerstraße, Gesundbrunnen, Mitte, 13355,...",Bakery,Bar,IT Services,Park,Café,German Restaurant,Pizza Place,Supermarket,Italian Restaurant,Cupcake Shop
2,"108, Schönhauser Allee, Arnimkiez, Prenzlauer ...",Café,Bakery,Organic Grocery,Italian Restaurant,Bar,Korean Restaurant,Pub,Grocery Store,Coffee Shop,Drugstore
3,"12, Dorothea-Bernstein-Weg, Uhlenhorst, Hambur...",Clothing Store,Indian Restaurant,Shopping Mall,Fast Food Restaurant,Italian Restaurant,Gym / Fitness Center,Greek Restaurant,Bar,Theater,Drugstore
4,"12, Zweibrückenstraße, Quartier Elbbrücken, Ha...",Restaurant,Nightclub,Hotel,Light Rail Station,Hotel Pool,Hotel Bar,Fast Food Restaurant,Electronics Store,Empanada Restaurant,Event Space
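
The column names above rely on a `try`/`except` around a three-element suffix list, which works for the top 10 but would label an 11th or 21st column incorrectly. A small helper that handles the general case might look like this; the name `ordinal` is hypothetical.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the other teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, 11)]
print(columns[1], "|", columns[4], "|", columns[10])
```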


In [64]:
columns = ['Borough']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError: # no 'st'/'nd'/'rd' suffix left, fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
boroughs_venues_sorted = pd.DataFrame(columns=columns)
boroughs_venues_sorted['Borough'] = boroughs_grouped['Borough']

for ind in np.arange(boroughs_grouped.shape[0]):
    boroughs_venues_sorted.iloc[ind, 1:] = return_most_common_venues(boroughs_grouped.iloc[ind, :], num_top_venues)

boroughs_venues_sorted.head()

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Altona,Café,Bar,Restaurant,Seafood Restaurant,Pub,Italian Restaurant,Cocktail Bar,Park,Furniture / Home Store,Fish Market
1,Altstadt,German Restaurant,Asian Restaurant,Café,Italian Restaurant,Hotel,French Restaurant,Restaurant,Bistro,Exhibit,Coffee Shop
2,Charlottenburg,Hotel,Café,Chinese Restaurant,Indian Restaurant,German Restaurant,Plaza,Bank,Cocktail Bar,Theater,Bookstore
3,Eimsbüttel,Café,Italian Restaurant,Bakery,Ice Cream Shop,Hotel,Coffee Shop,Bar,Park,Supermarket,Bookstore
4,Friedrichshain-Kreuzberg,Café,Coffee Shop,Bar,Italian Restaurant,Supermarket,Bakery,Hotel,Cocktail Bar,Nightclub,Plaza


In [156]:
boroughs = all_venues[["Borough","City"]].drop_duplicates().reset_index(drop=True)
boroughs.head()

Unnamed: 0,Borough,City
0,Tempelhof-Schöneberg,Berlin
1,Wilmersdorf,Berlin
2,Friedrichshain-Kreuzberg,Berlin
3,Charlottenburg,Berlin
4,Tiergarten,Berlin


In [157]:
boroughs_loc = []
geolocator = Nominatim(user_agent="hamburg_explorer")

for i in range(0,len(boroughs)):
    borough = boroughs["Borough"][i]
    city = boroughs["City"][i]
    address = '{}, {}'.format(borough, city)
    location = geolocator.geocode(address)
    lat = location.latitude
    lon = location.longitude
    boroughs_loc.append([borough, city, lat, lon])

boroughs_loc = pd.DataFrame(boroughs_loc)
boroughs_loc.rename(columns={0:"Borough",1:"City",2:"Latitude",3:"Longitude"}, inplace=True)
boroughs_loc.head()

Unnamed: 0,Borough,City,Latitude,Longitude
0,Tempelhof-Schöneberg,Berlin,52.440603,13.373703
1,Wilmersdorf,Berlin,52.487115,13.32033
2,Friedrichshain-Kreuzberg,Berlin,52.515306,13.461612
3,Charlottenburg,Berlin,52.515747,13.309683
4,Tiergarten,Berlin,52.509778,13.35726


### Cluster neighborhoods and boroughs

In [148]:
kclusters_n = 10
kclusters_b = 5

neighborhood_clustering = neighborhoods_grouped.drop(columns='Neighborhood')
borough_clustering = boroughs_grouped.drop(columns='Borough')

In [149]:
kmeans_n = KMeans(n_clusters=kclusters_n, random_state=0).fit(neighborhood_clustering)
kmeans_b = KMeans(n_clusters=kclusters_b, random_state=0).fit(borough_clustering)
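
To illustrate what the clustering step does conceptually, here is a plain-Python sketch of Lloyd's algorithm, the classic k-means iteration. It is a simplified illustration, not scikit-learn's implementation; the function name, the fixed starting centroids and the toy data are assumptions.

```python
import math

def kmeans(points, centroids, n_iter=10):
    """Plain Lloyd's algorithm on lists of equal-length tuples."""
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid
        labels = [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids

# Toy venue-share vectors (share of cafés, share of hotels), for illustration only
points = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
labels, centroids = kmeans(points, [(1.0, 0.0), (0.0, 1.0)])
print(labels)
```

Neighborhoods that end up with the same label have a similar venue mix, which is what the cluster labels are used for in the following cells.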

In [154]:
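# drop labels from a previous run, if any, so the cell can be re-executed
neighborhoods_venues_sorted.drop(columns="Cluster Labels", errors="ignore", inplace=True)
boroughs_venues_sorted.drop(columns="Cluster Labels", errors="ignore", inplace=True)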

neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans_n.labels_)
boroughs_venues_sorted.insert(0, 'Cluster Labels', kmeans_b.labels_)

In [155]:
neighborhoods_merged = all_venues
neighborhoods_merged = neighborhoods_merged.merge(neighborhoods_venues_sorted.set_index('Neighborhood'), left_on='Neighborhood', right_on="Neighborhood")
neighborhoods_merged.head()

Unnamed: 0,Neighborhood,Address Latitude,Address Longitude,Borough,City,Venue,Venue Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Pasta Bar, 1, Fritz-Reuter-Straße, Schöneberg,...",52.480966,13.349539,Tempelhof-Schöneberg,Berlin,Café de Enrico,52.481014,13.349788,Café,3,Indian Restaurant,Pub,Vietnamese Restaurant,Café,Eastern European Restaurant,Supermarket,Taverna,Grocery Store,Bistro,Falafel Restaurant
1,"Pasta Bar, 1, Fritz-Reuter-Straße, Schöneberg,...",52.480966,13.349539,Tempelhof-Schöneberg,Berlin,Osbili,52.479532,13.349973,Bistro,3,Indian Restaurant,Pub,Vietnamese Restaurant,Café,Eastern European Restaurant,Supermarket,Taverna,Grocery Store,Bistro,Falafel Restaurant
2,"Pasta Bar, 1, Fritz-Reuter-Straße, Schöneberg,...",52.480966,13.349539,Tempelhof-Schöneberg,Berlin,Sahara Sudanesische Spezialitäten,52.479845,13.35181,African Restaurant,3,Indian Restaurant,Pub,Vietnamese Restaurant,Café,Eastern European Restaurant,Supermarket,Taverna,Grocery Store,Bistro,Falafel Restaurant
3,"Pasta Bar, 1, Fritz-Reuter-Straße, Schöneberg,...",52.480966,13.349539,Tempelhof-Schöneberg,Berlin,Odeon,52.482086,13.349483,Indie Movie Theater,3,Indian Restaurant,Pub,Vietnamese Restaurant,Café,Eastern European Restaurant,Supermarket,Taverna,Grocery Store,Bistro,Falafel Restaurant
4,"Pasta Bar, 1, Fritz-Reuter-Straße, Schöneberg,...",52.480966,13.349539,Tempelhof-Schöneberg,Berlin,Balkan Grill,52.480042,13.35242,Eastern European Restaurant,3,Indian Restaurant,Pub,Vietnamese Restaurant,Café,Eastern European Restaurant,Supermarket,Taverna,Grocery Store,Bistro,Falafel Restaurant


In [158]:
boroughs_merged = boroughs_loc
boroughs_merged = boroughs_merged.merge(boroughs_venues_sorted.set_index('Borough'), left_on='Borough', right_on="Borough")
boroughs_merged.head()

Unnamed: 0,Borough,City,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Tempelhof-Schöneberg,Berlin,52.440603,13.373703,1,Café,Bakery,German Restaurant,Vietnamese Restaurant,Supermarket,Park,Italian Restaurant,Cocktail Bar,Historic Site,Pizza Place
1,Wilmersdorf,Berlin,52.487115,13.32033,3,French Restaurant,German Restaurant,Hotel,Flea Market,Greek Restaurant,Gas Station,Supermarket,Street Food Gathering,Spa,Organic Grocery
2,Friedrichshain-Kreuzberg,Berlin,52.515306,13.461612,1,Café,Coffee Shop,Bar,Italian Restaurant,Supermarket,Bakery,Hotel,Cocktail Bar,Nightclub,Plaza
3,Charlottenburg,Berlin,52.515747,13.309683,1,Hotel,Café,Chinese Restaurant,Indian Restaurant,German Restaurant,Plaza,Bank,Cocktail Bar,Theater,Bookstore
4,Tiergarten,Berlin,52.509778,13.35726,1,Hotel,Café,Park,Coffee Shop,Plaza,Italian Restaurant,Hotel Bar,Bakery,Art Gallery,Bar


### Map neighborhood clusters

In [161]:
# set color scheme for the clusters
x = np.arange(kclusters_n)
ys = [i + x + (i*x)**2 for i in range(kclusters_n)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# create map
map_clusters = folium.Map(location=[hamburg_lat, hamburg_lon], zoom_start=12)

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(neighborhoods_merged['Address Latitude'],
                                  neighborhoods_merged['Address Longitude'],
                                  neighborhoods_merged['Neighborhood'],
                                  neighborhoods_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=35, # neighborhood_radius/40
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.01,
        parse_html=False
    ).add_to(map_clusters)
       
map_clusters
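
The color setup above only needs one distinct color per cluster label. A dependency-free alternative is sketched below, using the standard-library `colorsys` module; the function name `distinct_hex_colors` is hypothetical.

```python
import colorsys

def distinct_hex_colors(n):
    """Return n hex color strings spread evenly around the hue wheel."""
    colors_hex = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, 0.8, 0.9)
        colors_hex.append("#{:02x}{:02x}{:02x}".format(int(r * 255), int(g * 255), int(b * 255)))
    return colors_hex

palette = distinct_hex_colors(10)  # one color per neighborhood cluster
print(palette)
```

Indexing such a palette with the label itself, as in `palette[cluster]`, avoids the `cluster-1` offset used above, under which label 0 wraps around to the last color.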

### Map borough clusters

In [166]:
# set color scheme for the clusters
x = np.arange(kclusters_b)
ys = [i + x + (i*x)**2 for i in range(kclusters_b)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# create map
map_clusters_b = folium.Map(location=[hamburg_lat, hamburg_lon], zoom_start=12)

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(boroughs_merged['Latitude'],
                                  boroughs_merged['Longitude'],
                                  boroughs_merged['Borough'],
                                  boroughs_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=35, # neighborhood_radius/40
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.01,
        parse_html=False
    ).add_to(map_clusters_b)
       
map_clusters_b

### Explore neighborhood clusters

## Methodology

## Results

## Discussion

## Conclusion