# The Battle of the Neighborhoods - Week 2

## Part 4: Segmenting and clustering the neighborhoods of Munich

## Introduction

In this part, we use the Foursquare API to explore the neighborhoods of Munich. We use the explore endpoint to retrieve the venues around each neighborhood, use their venue categories as features, and group the neighborhoods into clusters with the k-means clustering algorithm. Finally, we use the Folium library to visualize the neighborhoods in Munich and the resulting clusters.

## Table of Contents
1. Explore Neighborhoods in Munich with Foursquare
2. Analyze Each Neighborhood
3. Cluster Neighborhoods and Examine Clusters

#### Import the necessary libraries

In [60]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

#!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

# import k-means from clustering stage
from sklearn.cluster import KMeans

from sklearn.metrics import silhouette_score # measure how well the clusters are separated

import csv # implements classes to read and write tabular data in CSV form

print('Libraries imported.')

Libraries imported.


## 1. Explore Neighborhoods in Munich with Foursquare

#### Reload the dataframe with the latitude and longitude of each neighborhood

In [61]:
munich_neighborhoods = pd.read_excel("Neighborhoods.xlsx")
munich_neighborhoods.head()

,Borough,Neighborhood,Latitude,Longitude
0,Allach Untermenzing,Industriebezirk,48.196839,11.476602
1,Allach Untermenzing,Untermenzing Allach,48.177715,11.472676
2,Altstadt Lehel,Graggenau,48.139168,11.581965
3,Altstadt Lehel,Angerviertel,48.13367,11.571569
4,Altstadt Lehel,Hackenviertel,48.135731,11.569955


#### Define Foursquare Credentials, Version and Limit of Results

In [62]:
# @hidden_cell
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare client ID (do not publish it)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare client secret (do not publish it)
VERSION = '20200304' # API version date (YYYYMMDD)
LIMIT = 200 # requested venues per neighborhood; the API returned at most 100 per neighborhood in this run


#### Extract venue data for each neighborhood in Munich from Foursquare

In [63]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000): 
    
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # query parameters for the explore endpoint
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }
            
        # make the GET request and stop on HTTP errors
        response = requests.get('https://api.foursquare.com/v2/venues/explore', params=params, timeout=30)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']
        
        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into one dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
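The parsing step inside getNearbyVenues can also be written as a standalone helper that tolerates venues without a category. The following is a minimal sketch, assuming the item layout that the function above reads (venue → name, location, categories); the helper name parse_explore_items and the second sample item are hypothetical and only for illustration.

```python
def parse_explore_items(name, lat, lng, items):
    """Flatten Foursquare 'explore' items into tuples (hypothetical helper).

    Each item is assumed to look like
    {'venue': {'name': ..., 'location': {'lat': ..., 'lng': ...},
               'categories': [{'name': ...}, ...]}}
    """
    rows = []
    for item in items:
        venue = item.get('venue', {})
        categories = venue.get('categories') or []
        if not categories:
            # skip venues without a category instead of raising an IndexError
            continue
        location = venue.get('location', {})
        rows.append((name, lat, lng,
                     venue.get('name'),
                     location.get('lat'),
                     location.get('lng'),
                     categories[0]['name']))
    return rows


# sample items shaped like the explore response (the second one is made up)
sample_items = [
    {'venue': {'name': 'Sport Bittl',
               'location': {'lat': 48.191447, 'lng': 11.466553},
               'categories': [{'name': 'Sporting Goods Shop'}]}},
    {'venue': {'name': 'Uncategorized Spot',
               'location': {'lat': 48.19, 'lng': 11.47},
               'categories': []}},
]
rows = parse_explore_items('Industriebezirk', 48.196839, 11.476602, sample_items)
print(rows)
```

Only the categorized venue is kept, so the result has one tuple.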

#### Run the function above on each neighborhood

In [64]:
Munich_venues = getNearbyVenues(names=munich_neighborhoods['Neighborhood'],
                                   latitudes=munich_neighborhoods['Latitude'],
                                   longitudes=munich_neighborhoods['Longitude']
                                  )

Industriebezirk
Untermenzing Allach
Graggenau
Angerviertel
Hackenviertel
Kreuzviertel
Lehel
Englischer Garten Süd
Maximilianeum
Steinhausen
Haidhausen Nord
Haidhausen Süd
Obere Au
Untere Au
Altaubing
Aubing Süd
Lochhausen
Freiham
Echarding
Josephsburg
Berg am Laim Ost
Oberföhring
Johanneskirchen
Herzogpark
Englschalking
Daglfing
Parkstadt
Altbogenhausen
Feldmoching
Hasenbergl Lerchenau Ost
Ludwigsfeld
Lerchenau West
Blumenau
Neuhadern
Großhadern
Friedenheim
St Ulrich
Gärtnerplatz
Deutsches Museum
Glockenbach
Dreimühlen
Am alten südlichen Friedhof
Am Schlachthof
Ludwigsvorstadt Kliniken
St Paul
Königsplatz
Augustenstraße
St Benno
Marsfeld
Josephsplatz
Am alten nördlichen Friedhof
Universität
Schönfeldvorstadt
Maßmannbergl
Am Hart
Am Riesenfeld
Milbertshofen
Alt Moosach
Moosach Bahnhof
Neuhausen
Nymphenburg
Oberwiesenfeld
St Vinzenz
Alte Kaserne
Dom Pedro
Obergiesing
Südgiesing
Neupasing
Am Westbad
Pasing
Obermenzing
Ramersdorf
Balanstraße West
Altperlach
Neuperlach
Waldperlach
Freimann


#### Check the size of the Munich_venues dataframe, which contains all venues of Munich

In [65]:
print('The "Munich_venues" dataframe has {} venues and {} unique venue types.'.format(
      len(Munich_venues['Venue Category']),
      Munich_venues['Venue Category'].nunique()))

The "Munich_venues" dataframe has 5923 venues and 296 unique venue types.


#### Save this dataframe as a CSV file

In [66]:
Munich_venues.to_csv('Munich_venues.csv', sep=',', encoding='UTF8', index=False)

#### Let's reload this dataframe and remove the spaces from its column names

In [67]:
Munich_venues = pd.read_csv('Munich_venues.csv')
Munich_venues.columns = Munich_venues.columns.str.replace(' ', '')
Munich_venues.head()

,Neighborhood,NeighborhoodLatitude,NeighborhoodLongitude,Venue,VenueLatitude,VenueLongitude,VenueCategory
0,Industriebezirk,48.196839,11.476602,Sport Bittl,48.191447,11.466553,Sporting Goods Shop
1,Industriebezirk,48.196839,11.476602,dm-drogerie markt,48.194118,11.46564,Drugstore
2,Industriebezirk,48.196839,11.476602,Rossmann,48.193301,11.466388,Drugstore
3,Industriebezirk,48.196839,11.476602,Lidl,48.194468,11.465456,Supermarket
4,Industriebezirk,48.196839,11.476602,REWE,48.193755,11.466,Supermarket


#### Let's check the shape

In [68]:
Munich_venues.shape

(5923, 7)

#### Let's check the number of neighborhoods found by Foursquare

In [69]:
Munich_venues['Neighborhood'].nunique()

99

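##### The number of neighborhoods found by Foursquare (99) differs from the official number of neighborhoods (107). Neighborhoods for which the API returned no venues have no rows in Munich_venues, which presumably explains the gap. Let's nevertheless work with the Foursquare data to analyze the neighborhoods.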
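To see which neighborhoods are missing from the Foursquare results, a set difference between the two Neighborhood columns is enough. A minimal sketch, using two small stand-in dataframes in place of munich_neighborhoods and Munich_venues (the toy names are placeholders, not the real data):

```python
import pandas as pd

# stand-ins for munich_neighborhoods (queried list) and Munich_venues (API results)
neighborhoods = pd.DataFrame({'Neighborhood': ['Neighborhood A', 'Neighborhood B',
                                               'Neighborhood C', 'Neighborhood D']})
venues = pd.DataFrame({'Neighborhood': ['Neighborhood A', 'Neighborhood A', 'Neighborhood C'],
                       'Venue': ['Venue 1', 'Venue 2', 'Venue 3']})

# neighborhoods that were queried but have no venue rows
missing = sorted(set(neighborhoods['Neighborhood']) - set(venues['Neighborhood']))
print(missing)  # ['Neighborhood B', 'Neighborhood D']
```

Applied to the real dataframes, the same expression should list the neighborhoods that account for the difference between 107 and 99.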

#### Now, let's visualize the Munich Venues

In [70]:
def Venues_Map(Borough_name, Borough_neighborhoods):
    
    # Use geopy library to get the latitude and longitude values 
    geolocator = Nominatim(user_agent="Munich")
    Borough_location = geolocator.geocode(Borough_name) 
    Borough_latitude = Borough_location.latitude
    Borough_longitude = Borough_location.longitude
    print('The geographical coordinates of "{}" are {}, {}.'.format(Borough_name, Borough_latitude, Borough_longitude))
    
    # print the number of venue categories and neighborhoods in the extracted data
    print('The "{}" dataframe has {} different venue types and {} neighborhoods.'.format(
          Borough_name,
          len(Borough_neighborhoods['VenueCategory'].unique()),
          len(Borough_neighborhoods['Neighborhood'].unique())))
    
    # create map of city using latitude and longitude values
    map_Borough = folium.Map(location=[Borough_latitude, Borough_longitude], zoom_start=11)

    # add markers to map
    for lat, lng, venue, category in zip(Borough_neighborhoods['VenueLatitude'], Borough_neighborhoods['VenueLongitude'], Borough_neighborhoods['Venue'], Borough_neighborhoods['VenueCategory']):
        label = '{}, {}'.format(category, venue)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=0.1,
            popup=label,
            color='red',
            fill=True,
            fill_color='#FF0000',
            fill_opacity=0.3).add_to(map_Borough)  

    return map_Borough

In [71]:
Venues_Map('Munich', Munich_venues)

The geographical coordinates of "Munich" are 48.1371079, 11.5753822.
The "Munich" dataframe has 296 different venue types and 99 neighborhoods.


#### Let's check how many venues there are in each category

In [72]:
Munich_venues.groupby('VenueCategory')['Venue'].count().sort_values(ascending=False)

VenueCategory
Café                                        402
Italian Restaurant                          329
German Restaurant                           253
Hotel                                       230
Supermarket                                 227
Bakery                                      172
Plaza                                       169
Bar                                         150
Bus Stop                                    141
Restaurant                                  123
Drugstore                                   112
Ice Cream Shop                              108
Vietnamese Restaurant                       100
Coffee Shop                                  86
Greek Restaurant                             84
Indian Restaurant                            83
Asian Restaurant                             83
Pizza Place                                  81
Bavarian Restaurant                          72
Trattoria/Osteria                            72
Cocktail Bar              

### Let's also check how many venues there are in each neighborhood

In [73]:
Munich_venues.groupby('Neighborhood').count().sort_values(by="Venue",ascending=False)

Neighborhood,NeighborhoodLatitude,NeighborhoodLongitude,Venue,VenueLatitude,VenueLongitude,VenueCategory
Westend,100,100,100,100,100,100
Haidhausen Nord,100,100,100,100,100,100
Ludwigsvorstadt Kliniken,100,100,100,100,100,100
Schwanthalerhöhe,100,100,100,100,100,100
Obere Au,100,100,100,100,100,100
Glockenbach,100,100,100,100,100,100
Graggenau,100,100,100,100,100,100
Neuschwabing,100,100,100,100,100,100
Gärtnerplatz,100,100,100,100,100,100
Hackenviertel,100,100,100,100,100,100


### Let's find out how many unique venue categories there are

In [74]:
print("There are {} Venue Categories in Munich".format(Munich_venues["VenueCategory"].nunique()))

There are 296 Venue Categories in Munich


## 2. Analyze Each Neighborhood

In [75]:
# one hot encoding (dtype=int keeps 0/1 integers, as in the output below, on newer pandas versions)
Munich_onehot = pd.get_dummies(Munich_venues[['VenueCategory']], prefix="", prefix_sep="", dtype=int)

#column lists before adding neighborhood
column_names = ['Neighborhood'] + list(Munich_onehot.columns)

# add neighborhood column back to dataframe
Munich_onehot['Neighborhood'] = Munich_venues['Neighborhood'] 

# move neighborhood column to the first column
Munich_onehot = Munich_onehot[column_names]

Munich_onehot.head()

,Neighborhood,Afghan Restaurant,African Restaurant,American Restaurant,Aquarium,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Austrian Restaurant,Auto Dealership,Auto Garage,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Basketball Stadium,Bavarian Restaurant,Beach,Beach Bar,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bistro,Board Shop,Boarding House,Boat Rental,Boat or Ferry,Bookstore,Bosnian Restaurant,Botanical Garden,Boutique,Bowling Alley,Breakfast Spot,Brewery,Bridge,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Camera Store,Candy Store,Castle,Caucasian Restaurant,Cemetery,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Circus,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Cafeteria,Comedy Club,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Cretan Restaurant,Cultural Center,Cupcake Shop,Currywurst Joint,Czech Restaurant,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Fair,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Service,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grilled Meat Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Hardware Store,Hawaiian Restaurant,Hill,Historic Site,History Museum,Hobby Shop,Hockey Arena,Hockey Rink,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Intersection,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Kebab Restaurant,Kitchen Supply Store,Korean Restaurant,Lake,Laundry Service,Lebanese Restaurant,Light Rail Station,Liquor Store,Lottery Retailer,Lounge,Malay Restaurant,Manti Place,Market,Martial Arts Dojo,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Modern Greek Restaurant,Monument / Landmark,Motel,Motorcycle Shop,Mountain,Movie Theater,Moving Target,Multiplex,Museum,Music Store,Music Venue,Nature Preserve,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Outdoor Sculpture,Outlet Store,Palace,Paper / Office Supplies Store,Park,Pastry Shop,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Photography Studio,Pide Place,Pie Shop,Pilates Studio,Pizza Place,Planetarium,Platform,Playground,Plaza,Poke Place,Pool,Pool Hall,Portuguese Restaurant,Post Office,Pub,Public Art,Ramen Restaurant,Recreation Center,Rental Car Location,Residential Building (Apartment / Condo),Rest Area,Restaurant,River,Rock Club,Roof Deck,Salad Place,Salon / Barbershop,Sandwich Place,Sauna / Steam Room,Sausage Shop,Scenic Lookout,Schnitzel Restaurant,Science Museum,Seafood Restaurant,Shipping Store,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Skating Rink,Smoke Shop,Snack Place,Soccer Field,Soccer Stadium,Soup Place,South American Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Steakhouse,Storage Facility,Supermarket,Surf Spot,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tapas Restaurant,Taverna,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park Ride / Attraction,Theme Restaurant,Tibetan Restaurant,Toy / Game Store,Track,Trail,Train Station,Tram Station,Trattoria/Osteria,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Volleyball Court,Water Park,Waterfall,Wine Bar,Wine Shop,Xinjiang Restaurant,Yoga Studio,Zoo,Zoo Exhibit
0,Industriebezirk,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Industriebezirk,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Industriebezirk,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Industriebezirk,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Industriebezirk,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
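One-hot encoding followed by groupby(...).sum() (the approach used for the restaurant columns below) turns the venue rows into a neighborhood-by-category count table. A minimal sketch on a few toy rows (illustrative, not the real data) showing that pd.crosstab produces the same counts in one call:

```python
import pandas as pd

# toy venue rows (illustrative only)
venues = pd.DataFrame({
    'Neighborhood': ['Lehel', 'Lehel', 'Lehel', 'Pasing', 'Pasing'],
    'VenueCategory': ['Café', 'Italian Restaurant', 'Café', 'Bakery', 'Café'],
})

# one-hot encode the category, then count the venues per neighborhood
onehot = pd.get_dummies(venues['VenueCategory']).astype(int)
onehot['Neighborhood'] = venues['Neighborhood']
counts = onehot.groupby('Neighborhood').sum()

# the same count table in a single call
counts_ct = pd.crosstab(venues['Neighborhood'], venues['VenueCategory'])

print(counts)
print(counts_ct)
```

Both tables have one row per neighborhood and one column per category, sorted alphabetically.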


### We are only interested in restaurants. Let's filter the data.

In [76]:
restaurant_List = []
search = 'Restaurant'
for i in Munich_onehot.columns :
    if search in i:
        restaurant_List.append(i)

In [77]:
restaurant_List

['Afghan Restaurant',
 'African Restaurant',
 'American Restaurant',
 'Argentinian Restaurant',
 'Asian Restaurant',
 'Austrian Restaurant',
 'Bavarian Restaurant',
 'Bosnian Restaurant',
 'Caucasian Restaurant',
 'Chinese Restaurant',
 'Comfort Food Restaurant',
 'Cretan Restaurant',
 'Czech Restaurant',
 'Dim Sum Restaurant',
 'Doner Restaurant',
 'Dumpling Restaurant',
 'Eastern European Restaurant',
 'English Restaurant',
 'Ethiopian Restaurant',
 'Falafel Restaurant',
 'Fast Food Restaurant',
 'French Restaurant',
 'German Restaurant',
 'Greek Restaurant',
 'Grilled Meat Restaurant',
 'Hawaiian Restaurant',
 'Indian Restaurant',
 'Israeli Restaurant',
 'Italian Restaurant',
 'Japanese Restaurant',
 'Jewish Restaurant',
 'Kebab Restaurant',
 'Korean Restaurant',
 'Lebanese Restaurant',
 'Malay Restaurant',
 'Mediterranean Restaurant',
 'Mexican Restaurant',
 'Middle Eastern Restaurant',
 'Modern European Restaurant',
 'Modern Greek Restaurant',
 'Peruvian Restaurant',
 'Portuguese 

#### Let's check the number of restaurant categories

In [78]:
print("There are {} different categories of restaurants in Munich".format(len(restaurant_List)))

There are 58 different categories of restaurants in Munich
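The substring test keeps every category whose name contains "Restaurant", including the generic "Restaurant" category, but it leaves out food venues such as "Pizza Place" or "Trattoria/Osteria", which both appear in the category counts above. A minimal sketch of the same filter as a list comprehension; the function name restaurant_columns and its extra parameter are hypothetical additions for categories you may also want to count as restaurants:

```python
def restaurant_columns(columns, keyword='Restaurant', extra=()):
    """Return the column names containing `keyword`, plus any names listed in `extra`.

    `extra` is a hypothetical hook for food categories without 'Restaurant'
    in their name; the notebook itself uses only the keyword test.
    """
    selected = [c for c in columns if keyword in c]
    selected += [c for c in extra if c in columns and c not in selected]
    return selected


cols = ['Neighborhood', 'Bakery', 'Italian Restaurant', 'Pizza Place',
        'Restaurant', 'Trattoria/Osteria']
print(restaurant_columns(cols))
print(restaurant_columns(cols, extra=['Pizza Place', 'Trattoria/Osteria']))
```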


In [79]:
# keep the neighborhood name and the restaurant columns only
col_name = ['Neighborhood'] + restaurant_List
Munich_restaurant = Munich_onehot[col_name]

In [80]:
Munich_restaurant_grouped = Munich_restaurant.groupby('Neighborhood').sum().reset_index()

In [81]:
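# total number of restaurants per neighborhood (sum over the restaurant columns only)
Munich_restaurant_grouped['Total'] = Munich_restaurant_grouped[restaurant_List].sum(axis=1)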

## 3. Cluster Neighborhoods and Examine Clusters

There are several methods to determine the best number of clusters. I decided to use the silhouette score: the higher the coefficient, the better an object fits into its own cluster and the worse it fits into neighboring clusters. To find the best number of clusters, I loop over 2 to 9 clusters, using k-means as the clustering algorithm.
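For a single sample i, the silhouette value is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from i to the other members of its own cluster and b(i) is the mean distance to the members of the nearest other cluster. The silhouette score is the mean of s(i) over all samples and lies between -1 and 1. A minimal sketch that computes this by hand on toy points and compares it with scikit-learn's silhouette_score (the helper silhouette_by_hand is illustrative and assumes every cluster has at least two members):

```python
import numpy as np
from sklearn.metrics import silhouette_score


def silhouette_by_hand(X, labels):
    """Mean silhouette value with Euclidean distances (every cluster needs >= 2 members)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # pairwise Euclidean distance matrix
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    values = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the sample itself
        a = dist[i, same].mean()
        # mean distance to each other cluster, keep the nearest one
        b = min(dist[i, labels == other].mean()
                for other in np.unique(labels) if other != labels[i])
        values.append((b - a) / max(a, b))
    return float(np.mean(values))


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels = np.array([0, 0, 0, 1, 1, 1])
print(silhouette_by_hand(X, labels), silhouette_score(X, labels, metric='euclidean'))
```

Both numbers agree; two tight, distant groups give a score close to 1.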

In [82]:
Munich_grouped_clustering = Munich_restaurant_grouped.drop(columns='Neighborhood')

for n_cluster in range(2, 10): # 2 to 9 clusters
    # n_init=10 keeps the former scikit-learn default number of restarts
    kmeans = KMeans(n_clusters=n_cluster, n_init=10).fit(Munich_grouped_clustering)
    label = kmeans.labels_
    sil_coeff = silhouette_score(Munich_grouped_clustering, label, metric='euclidean')
    print("The Silhouette Coefficient for {} Clusters is {}".format(n_cluster, sil_coeff)) 

The Silhouette Coefficient for 2 Clusters is 0.642988506884554
The Silhouette Coefficient for 3 Clusters is 0.5349862084994662
The Silhouette Coefficient for 4 Clusters is 0.42489165167271004
The Silhouette Coefficient for 5 Clusters is 0.414404087118509
The Silhouette Coefficient for 6 Clusters is 0.3949431880719668
The Silhouette Coefficient for 7 Clusters is 0.3865595643406843
The Silhouette Coefficient for 8 Clusters is 0.2750731481848784
The Silhouette Coefficient for 9 Clusters is 0.27108324320527194


##### 2 clusters have the highest silhouette coefficient (about 0.64). Therefore, the number of clusters is set to 2.
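Instead of reading the best value off the printed lines, the scores can be stored and the number of clusters with the highest silhouette picked programmatically. A minimal sketch on three well-separated toy groups, so that the expected answer (3) is known in advance; n_init=10 and random_state=0 are set explicitly to make the runs repeatable:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# three tight, well-separated groups of four points each (toy data)
square = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
X = np.vstack([square, square + [20, 0], square + [0, 20]])

scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels, metric='euclidean')

# number of clusters with the highest silhouette score
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```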

In [83]:
# set number of clusters
kclusters = 2

Munich_grouped_clustering = Munich_restaurant_grouped.drop(columns='Neighborhood')

# run k-means clustering (n_init=10 keeps the former scikit-learn default)
kmeans = KMeans(n_clusters=kclusters, n_init=10, random_state=0).fit(Munich_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1,
       1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0,
       0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1,
       0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0,
       1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1])

In [84]:
Munich_results = pd.DataFrame(kmeans.cluster_centers_)
Munich_results.columns = Munich_grouped_clustering.columns
Munich_results.index = ['cluster0','cluster1']
Munich_results['Total Sum'] = Munich_results.sum(axis = 1)
Munich_results

,Afghan Restaurant,African Restaurant,American Restaurant,Argentinian Restaurant,Asian Restaurant,Austrian Restaurant,Bavarian Restaurant,Bosnian Restaurant,Caucasian Restaurant,Chinese Restaurant,Comfort Food Restaurant,Cretan Restaurant,Czech Restaurant,Dim Sum Restaurant,Doner Restaurant,Dumpling Restaurant,Eastern European Restaurant,English Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,French Restaurant,German Restaurant,Greek Restaurant,Grilled Meat Restaurant,Hawaiian Restaurant,Indian Restaurant,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Korean Restaurant,Lebanese Restaurant,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Modern Greek Restaurant,Peruvian Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Schnitzel Restaurant,Seafood Restaurant,South American Restaurant,Spanish Restaurant,Sushi Restaurant,Szechuan Restaurant,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Tibetan Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Xinjiang Restaurant,Total,Total Sum
cluster0,0.051724,-6.245005e-17,0.051724,0.017241,0.396552,-1.110223e-16,0.275862,1.387779e-17,0.017241,0.224138,2.4286130000000003e-17,1.387779e-17,3.469447e-18,1.387779e-17,0.327586,6.938894e-18,0.06896552,0.017241,1.387779e-17,0.086207,0.258621,0.017241,1.344828,0.741379,1.387779e-17,6.938894e-18,0.310345,8.326673e-17,1.275862,0.017241,1.387779e-17,4.8572260000000006e-17,4.8572260000000006e-17,6.938894e-18,0.01724138,2.498002e-16,0.086207,0.034483,0.051724,2.4286130000000003e-17,-6.245005e-17,0.017241,2.775558e-17,0.396552,0.01724138,0.103448,3.469447e-18,0.017241,0.12069,3.469447e-18,1.665335e-16,0.086207,0.034483,3.469447e-18,0.12069,0.068966,0.086207,1.387779e-17,6.758621,13.517241
cluster1,0.780488,0.1219512,0.146341,0.02439,1.463415,0.4634146,1.365854,0.09756098,0.097561,0.414634,0.07317073,0.09756098,0.02439024,0.09756098,0.756098,0.04878049,-2.0816680000000002e-17,0.04878,0.09756098,0.512195,0.292683,0.804878,4.268293,1.0,0.09756098,0.04878049,1.585366,0.1707317,6.219512,0.829268,0.09756098,0.1463415,0.1463415,0.04878049,-5.2041700000000004e-18,0.5365854,0.268293,0.585366,0.243902,0.07317073,0.1219512,0.170732,0.195122,2.439024,-5.2041700000000004e-18,0.536585,0.02439024,0.560976,0.829268,0.02439024,0.3414634,0.853659,0.04878,0.02439024,0.585366,0.414634,2.317073,0.09756098,33.780488,67.560976


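#### Cluster0 has a much smaller Total (about 6.8 restaurants per neighborhood on average) than cluster1 (about 33.8). This indicates that cluster0 is not saturated, whereas cluster1 seems to be highly saturated. Total Sum is exactly twice Total because it adds the Total column to the sum of all restaurant columns, and since Total is itself one of the clustering features, the split is driven largely by the overall number of restaurants.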
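Each row of kmeans.cluster_centers_ is the mean feature vector of the neighborhoods assigned to that cluster, so the center values read as average restaurant counts per neighborhood. For example, the value 4.268293 for German Restaurant in cluster1 means that its 41 neighborhoods have about 4.3 German restaurants each on average. A minimal sketch on toy count data that checks this relationship:

```python
import numpy as np
from sklearn.cluster import KMeans

# toy restaurant counts: rows are neighborhoods, columns are categories
X = np.array([[0, 1], [1, 0], [0, 0], [8, 9], [9, 8], [10, 10]], dtype=float)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# mean feature vector of the members of each cluster
member_means = np.array([X[kmeans.labels_ == c].mean(axis=0) for c in range(2)])
print(np.allclose(member_means, kmeans.cluster_centers_))  # True
```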

##### Let's create a DataFrame that contains the Neighborhood, the Total number of restaurants and the cluster label

In [85]:
Munich_results_merged = pd.DataFrame(Munich_restaurant_grouped['Neighborhood'])

Munich_results_merged['Total'] = Munich_restaurant_grouped['Total']
Munich_results_merged = Munich_results_merged.assign(Cluster_Labels = kmeans.labels_)

In [86]:
print(Munich_results_merged.shape)
Munich_results_merged.head()

(99, 3)


,Neighborhood,Total,Cluster_Labels
0,Alt Moosach,3,0
1,Altaubing,1,0
2,Altbogenhausen,27,1
3,Alte Heide Hirschau,1,0
4,Alte Kaserne,48,1


#### Let's get the neighborhoods and their respective longitudes and latitudes from the Foursquare data

In [87]:
# take one row per neighborhood so that each name keeps its own coordinates
Munich_Geo = (Munich_venues[['Neighborhood', 'NeighborhoodLatitude', 'NeighborhoodLongitude']]
              .drop_duplicates(subset='Neighborhood')
              .rename(columns={'NeighborhoodLatitude': 'Latitude', 'NeighborhoodLongitude': 'Longitude'})
              .reset_index(drop=True))
Munich_Geo.head()

,Neighborhood,Latitude,Longitude
0,Industriebezirk,48.196839,11.476602
1,Untermenzing Allach,48.177715,11.472676
2,Graggenau,48.139168,11.581965
3,Angerviertel,48.13367,11.571569
4,Hackenviertel,48.135731,11.569955
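Building the coordinate table from separate .unique() calls on the name, latitude and longitude columns only lines up when every neighborhood has a distinct latitude and a distinct longitude; if two neighborhoods share a coordinate value, the lists get different lengths and the pairing breaks. A minimal sketch on toy rows comparing that approach with drop_duplicates, which keeps each name together with its own coordinates:

```python
import pandas as pd

# toy venue rows: neighborhoods B and C happen to share the same latitude
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B', 'C'],
    'NeighborhoodLatitude': [48.10, 48.10, 48.20, 48.20],
    'NeighborhoodLongitude': [11.50, 11.50, 11.60, 11.70],
})

# separate unique() calls: 3 names but only 2 distinct latitudes
print(len(venues['Neighborhood'].unique()), len(venues['NeighborhoodLatitude'].unique()))

# one row per neighborhood, coordinates stay attached to their name
geo = (venues[['Neighborhood', 'NeighborhoodLatitude', 'NeighborhoodLongitude']]
       .drop_duplicates(subset='Neighborhood')
       .reset_index(drop=True))
print(geo)
```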


#### Let's merge the Munich_Geo data with the Munich_results_merged data

In [88]:
Munich_merged = Munich_Geo
Munich_merged = Munich_merged.join(Munich_results_merged.set_index('Neighborhood'), on='Neighborhood')

print(Munich_merged.shape)
Munich_merged.head(10) 

(99, 5)


,Neighborhood,Latitude,Longitude,Total,Cluster_Labels
0,Industriebezirk,48.196839,11.476602,0,0
1,Untermenzing Allach,48.177715,11.472676,3,0
2,Graggenau,48.139168,11.581965,25,1
3,Angerviertel,48.13367,11.571569,23,1
4,Hackenviertel,48.135731,11.569955,20,0
5,Kreuzviertel,48.139698,11.573209,21,1
6,Lehel,48.139656,11.587921,27,1
7,Englischer Garten Süd,48.149148,11.60579,21,1
8,Maximilianeum,48.136134,11.595202,34,1
9,Steinhausen,48.138524,11.627511,4,0
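A join on Neighborhood silently produces missing values when a name exists in only one of the two tables. A minimal sketch, on toy frames, of the same step written with DataFrame.merge, where validate='one_to_one' raises an error if a name is duplicated and indicator=True marks rows that found no match:

```python
import pandas as pd

geo = pd.DataFrame({'Neighborhood': ['A', 'B', 'C'],
                    'Latitude': [48.10, 48.20, 48.30],
                    'Longitude': [11.50, 11.60, 11.70]})
results = pd.DataFrame({'Neighborhood': ['A', 'B'],
                        'Total': [3, 27],
                        'Cluster_Labels': [0, 1]})

# left join that checks uniqueness and records where each row came from
merged = geo.merge(results, on='Neighborhood', how='left',
                   validate='one_to_one', indicator=True)

# neighborhoods that are missing from the results table
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Neighborhood'].tolist()
print(unmatched)  # ['C']
```

If any row is unmatched, Cluster_Labels becomes a float column containing NaN, which would break the list-based color lookup in the map cell, so checking for unmatched rows first is worthwhile.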


### Use the geopy Nominatim geocoder to get the latitude and longitude of Munich

In [91]:
geolocator = Nominatim(user_agent="munich")
location = geolocator.geocode("Munich")
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Munich are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Munich are 48.1371079, 11.5753822.


### Let's visualize the resulting clusters

In [92]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(Munich_merged['Latitude'], Munich_merged['Longitude'], Munich_merged['Neighborhood'], Munich_merged['Cluster_Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    # cluster-1 wraps around for cluster 0, so cluster 0 gets the last color of the scale
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### The red circles belong to cluster0 and the purple ones to cluster1
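The map cell indexes the color list with cluster-1, which relies on Python's negative indexing to give cluster 0 the last color of the scale. A minimal sketch of an explicit mapping from cluster label to hex color; the helper name cluster_colors is hypothetical:

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors


def cluster_colors(n_clusters):
    """Map each cluster label 0..n_clusters-1 to a hex color from the rainbow colormap."""
    rgba = cm.rainbow(np.linspace(0, 1, n_clusters))
    return {label: colors.rgb2hex(c) for label, c in enumerate(rgba)}


palette = cluster_colors(2)
print(palette)
```

In the map cell, color=palette[cluster] would then replace rainbow[cluster-1]. Note that this assigns the first color to cluster 0, so the two colors on the map would swap compared with the current cell.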

### Let's make lists of the neighborhoods of interest

## Cluster 0: Less Saturated Markets

In [57]:
Munich_merged[Munich_merged['Cluster_Labels'] == 0].reset_index(drop=True).sort_values(by="Total",ascending=False)

,Neighborhood,Latitude,Longitude,Total,Cluster_Labels
2,Hackenviertel,48.135731,11.569955,20,0
25,Gärtnerplatz,48.131486,11.575828,19,0
37,Pasing,48.149956,11.461767,18,0
34,Obergiesing,48.111156,11.588909,17,0
28,Am Riesenfeld,48.182373,11.558598,16,0
50,Thalkirchen,48.10284,11.545979,14,0
45,Kleinhesselohe,48.15542,11.600026,13,0
35,Neupasing,48.152271,11.469903,13,0
57,Giesing,48.11113,11.596084,13,0
23,Friedenheim,48.136197,11.518428,12,0


## Untapped Markets: Neighborhoods with Only One Restaurant


In [58]:
Munich_merged[Munich_merged['Total'] == 1].reset_index(drop=True)

,Neighborhood,Latitude,Longitude,Total,Cluster_Labels
0,Altaubing,48.165736,11.401493,1,0
1,Lochhausen,48.176021,11.408845,1,0
2,St Ulrich,47.995819,11.487573,1,0
3,Alte Heide Hirschau,48.177105,11.606023,1,0
4,Obersendling,48.97802,11.520807,1,0


### There are five neighborhoods that appear untapped:

- Altaubing
- Lochhausen
- St Ulrich
- Alte Heide Hirschau
- Obersendling

The analysis (and the map above) shows that these neighborhoods have only one restaurant each, which makes them the most promising untapped markets for establishing a restaurant in this analysis. Most of them are located on the outskirts of Munich. The highly saturated markets in the centre of Munich are probably too risky for starting a restaurant.
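The latitudes of St Ulrich (47.996) and Obersendling (48.978) in the table above place them outside Munich, which suggests geocoding errors worth checking before drawing conclusions about these two. A minimal sketch of such a plausibility check, assuming an approximate bounding box for the city of Munich of about 48.06 to 48.25 degrees north and 11.36 to 11.73 degrees east (rounded values, not an official boundary) and using the coordinates from the table above:

```python
import pandas as pd

# approximate bounding box of the city of Munich (assumed, rounded values)
LAT_MIN, LAT_MAX = 48.06, 48.25
LON_MIN, LON_MAX = 11.36, 11.73

candidates = pd.DataFrame({
    'Neighborhood': ['Altaubing', 'Lochhausen', 'St Ulrich', 'Alte Heide Hirschau', 'Obersendling'],
    'Latitude': [48.165736, 48.176021, 47.995819, 48.177105, 48.97802],
    'Longitude': [11.401493, 11.408845, 11.487573, 11.606023, 11.520807],
})

# True where both coordinates fall inside the bounding box
inside = (candidates['Latitude'].between(LAT_MIN, LAT_MAX)
          & candidates['Longitude'].between(LON_MIN, LON_MAX))
outside = candidates.loc[~inside, 'Neighborhood'].tolist()
print(outside)  # ['St Ulrich', 'Obersendling']
```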