# IBM Applied Data Science Capstone
### Week 5 
**Opening a new Indian restaurant in Amsterdam, Netherlands**
- Build a dataframe of neighbourhoods in Amsterdam, Netherlands, from a Wikipedia page
- Get the latitude and longitude of each neighbourhood
- Get the venue data for each neighbourhood using the Foursquare API
- Cluster the neighbourhoods
- Choose the best neighbourhood

### Importing Libraries

In [5]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder
import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, available since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print("Libraries imported.")

Libraries imported.


#### 2. Web scraping through Wikipedia page

In [6]:
# send the GET request
data = requests.get('https://en.wikipedia.org/wiki/Category:Neighbourhoods_of_Amsterdam').text
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')
# create neighbourhood list to store table data
Neighbourhoods = []
# append the data into the list
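for row in soup.find_all("div", class_="mw-category-generated")[0].find_all("li"):
    Neighbourhoods.append(row.text)
amsterdam_df = pd.DataFrame({"Neighbourhoods": Neighbourhoods})
# drop the first scraped row (index label 0); the keyword form also works on recent pandas versions
amsterdam_df.drop(index=0, inplace=True)
# strip the disambiguation suffixes; raw strings with regex=True keep the escaped parentheses working as a pattern
amsterdam_df['Neighbourhoods'] = amsterdam_df['Neighbourhoods'].str.replace(r" \(Amsterdam\)", "", regex=True)
amsterdam_df['Neighbourhoods'] = amsterdam_df['Neighbourhoods'].str.replace(r" \(neighbourhood\)", "", regex=True)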
amsterdam_df.head()

Unnamed: 0,Neighbourhoods
1,Admiralenbuurt
2,Amsteldorp
3,Amsterdam Oud-West
4,Amsterdam Oud-Zuid
5,Amsterdam Science Park


In [7]:
# print the number of rows of the dataframe
amsterdam_df.shape

(105, 1)
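
The suffix clean-up above can also be done in a single pass. The following is a minimal sketch using only the standard library; the helper name `clean_name` and the sample strings are illustrative assumptions, not part of the original notebook.

```python
import re

# One pattern for both disambiguation suffixes, anchored at the end of the name.
SUFFIX_PATTERN = re.compile(r"\s*\((?:Amsterdam|neighbourhood)\)$")

def clean_name(raw_name):
    """Return the name without a trailing ' (Amsterdam)' or ' (neighbourhood)'."""
    return SUFFIX_PATTERN.sub("", raw_name.strip())

# Illustrative inputs (hypothetical list items, not scraped output).
samples = ["De Pijp (Amsterdam)", "Jordaan (neighbourhood)", "Admiralenbuurt"]
cleaned = [clean_name(name) for name in samples]
print(cleaned)
```

Anchoring the pattern with `$` means parentheses elsewhere in a name are left alone.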

## Get latitude and longitude details

In [8]:
# define a function to get coordinates
def get_latlng(neighbourhoods):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while lat_lng_coords is None:
        g = geocoder.arcgis('{}, Amsterdam, Netherlands'.format(neighbourhoods))
        lat_lng_coords = g.latlng
    # unpack only after a successful lookup, so a failed attempt (None) is retried instead of raising a TypeError
    lat = lat_lng_coords[0]
    long = lat_lng_coords[1]
    return neighbourhoods, lat, long

#lat_lng_coords
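
The loop above retries forever if a lookup never succeeds. One possible variation, sketched here under assumptions, caps the number of attempts. The names `geocode_with_retry` and `fake_lookup` are hypothetical; `fake_lookup` stands in for a real call such as `geocoder.arcgis(query).latlng` so the example runs offline.

```python
import time

def geocode_with_retry(lookup, query, max_attempts=3, pause_seconds=0.0):
    """Call lookup(query) until it returns coordinates or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        coords = lookup(query)
        if coords is not None:
            lat, lng = coords
            return lat, lng
        if attempt < max_attempts:
            time.sleep(pause_seconds)  # brief pause before the next attempt
    return None  # the caller decides how to handle a neighbourhood with no coordinates

# Stand-in lookup: fails on the first call, succeeds on the second.
calls = {"count": 0}

def fake_lookup(query):
    calls["count"] += 1
    if calls["count"] < 2:
        return None
    return [52.372734, 4.856363]  # Admiralenbuurt coordinates from the table in this notebook

result = geocode_with_retry(fake_lookup, "Admiralenbuurt, Amsterdam, Netherlands")
print(result)
```

Returning `None` after the last attempt lets the calling code skip or log the neighbourhood rather than hang.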

In [10]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighbourhoods) for neighbourhoods in amsterdam_df["Neighbourhoods"].tolist() ]

In [12]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Neighbourhoods','Latitude', 'Longitude'])

In [14]:
amsterdam_df = df_coords
amsterdam_df.head()

Unnamed: 0,Neighbourhoods,Latitude,Longitude
0,Admiralenbuurt,52.372734,4.856363
1,Amsteldorp,52.36054,4.90516
2,Amsterdam Oud-West,52.36539,4.87022
3,Amsterdam Oud-Zuid,52.35235,4.87788
4,Amsterdam Science Park,52.35432,4.95803


In [15]:
CLIENT_ID = 'QBEOHTMYDSZGNPWZNKIMGAHL1BVY5KVZBJGFUPM3UDPN2JAF' # your Foursquare ID
CLIENT_SECRET = 'Q142X5G20QH1QKQEWWXLLF1KZDH4HD0LQVTNCQZH3VYEXJLL' # your Foursquare Secret
VERSION = '20180604'
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: QBEOHTMYDSZGNPWZNKIMGAHL1BVY5KVZBJGFUPM3UDPN2JAF
CLIENT_SECRET:Q142X5G20QH1QKQEWWXLLF1KZDH4HD0LQVTNCQZH3VYEXJLL


In [16]:
address = 'Amsterdam, NL'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

52.3745403 4.89797550561798


In [17]:
# create map of Amsterdam using latitude and longitude values
map_amsterdam = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, long, neighbourhoods in zip(amsterdam_df['Latitude'], amsterdam_df['Longitude'], amsterdam_df['Neighbourhoods']):
    label = '{}'.format(neighbourhoods)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_amsterdam)  
    
map_amsterdam

#### I am restricting the search to restaurants and leaving out other venue categories such as shopping malls and grocery stores. The criteria can be broadened later for a fuller analysis.

In [18]:
search_query = 'Restaurants'
radius = 500
print(search_query + ' .... OK!')

Restaurants .... OK!


In [20]:
radius = 500
LIMIT = 100

venues = []

for row in amsterdam_df.itertuples():
    neighbourhoods = getattr(row,'Neighbourhoods')
    lat = getattr(row,'Latitude')
    long = getattr(row,'Longitude')
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID,CLIENT_SECRET,lat,long,VERSION,search_query,radius,LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighbourhoods,
            lat,
            long,
            venue['venue']['name'],
            venue['venue']['location']['lat'],
            venue['venue']['location']['lng'],
            venue['venue']['categories'][0]['name']))
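
The extraction step indexes `categories[0]` directly, which raises an `IndexError` for any venue returned without a category. The following is a minimal sketch of a more defensive version. The helper name `flatten_items` is hypothetical, and the second sample item is invented purely to exercise the empty-category case; only the key layout (`venue`, `name`, `location`, `categories`) is taken from the loop above.

```python
def flatten_items(neighbourhood, lat, lng, items):
    """Turn a list of response items into flat tuples, skipping venues without a category."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories") or []
        if not categories:
            continue  # no category to classify this venue by
        location = venue.get("location", {})
        rows.append((
            neighbourhood,
            lat,
            lng,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            categories[0]["name"],
        ))
    return rows

# Sample items shaped like the entries the loop above reads.
sample_items = [
    {"venue": {"name": "Radijs",
               "location": {"lat": 52.371049, "lng": 4.856756},
               "categories": [{"name": "Bistro"}]}},
    {"venue": {"name": "Uncategorised place (hypothetical)",
               "location": {"lat": 52.0, "lng": 4.0},
               "categories": []}},
]

rows = flatten_items("Admiralenbuurt", 52.372734, 4.856363, sample_items)
print(rows)
```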

In [21]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighbourhoods', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)

(2778, 7)


In [22]:
venues_df.head()

Unnamed: 0,Neighbourhoods,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Admiralenbuurt,52.372734,4.856363,Radijs,52.371049,4.856756,Bistro
1,Admiralenbuurt,52.372734,4.856363,Maz Mez,52.371231,4.857968,Middle Eastern Restaurant
2,Admiralenbuurt,52.372734,4.856363,Broodje Daan,52.373456,4.853299,Deli / Bodega
3,Admiralenbuurt,52.372734,4.856363,Kattencafé Kopjes,52.370556,4.855507,Pet Café
4,Admiralenbuurt,52.372734,4.856363,Sapporo Ramen Sora,52.371294,4.855144,Ramen Restaurant


In [23]:
venues_df.groupby(['Neighbourhoods']).count()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighbourhoods,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Admiralenbuurt,29,29,29,29,29,29
Amsteldorp,28,28,28,28,28,28
Amsterdam Oud-West,77,77,77,77,77,77
Amsterdam Oud-Zuid,41,41,41,41,41,41
Amsterdam Science Park,4,4,4,4,4,4
Apollobuurt,15,15,15,15,15,15
Betondorp,2,2,2,2,2,2
Binnenstad,43,43,43,43,43,43
Bos en Lommer,42,42,42,42,42,42
Buiksloot,5,5,5,5,5,5


#### Let's find out how many unique categories can be identified from all the returned venues

In [25]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 101 unique categories.


#### Number of Indian restaurants across all neighbourhoods

In [26]:
# boolean mask of the rows whose category is exactly 'Indian Restaurant'
seriesObj = venues_df['VenueCategory'] == 'Indian Restaurant'
# summing a boolean Series counts the True values
numOfRows = int(seriesObj.sum())
print(numOfRows)

32


#### Count of restaurants category-wise

In [27]:
venues_df.groupby('VenueCategory').count()

Unnamed: 0_level_0,Neighbourhoods,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude
VenueCategory,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Afghan Restaurant,2,2,2,2,2,2
African Restaurant,22,22,22,22,22,22
American Restaurant,4,4,4,4,4,4
Argentinian Restaurant,11,11,11,11,11,11
Asian Restaurant,46,46,46,46,46,46
Australian Restaurant,3,3,3,3,3,3
Austrian Restaurant,4,4,4,4,4,4
BBQ Joint,15,15,15,15,15,15
Bagel Shop,51,51,51,51,51,51
Bakery,158,158,158,158,158,158


#### Which neighbourhoods have the most restaurants

In [28]:
venues_sort = venues_df.groupby(['Neighbourhoods'])['VenueCategory'].count() \
                             .reset_index(name='Restaurant count') \
                             .sort_values(['Restaurant count'], ascending=False) \
                             .head(10)
#top 10 neighbourhoods with maximum number of restaurants
venues_sort = pd.DataFrame(venues_sort)
venues_sort

Unnamed: 0,Neighbourhoods,Restaurant count
66,Oude Pijp,100
19,De Pijp,100
38,Jordaan,100
20,De Wallen,97
13,Burgwallen Oude Zijde,97
2,Amsterdam Oud-West,77
26,Frederik Hendrikbuurt,68
37,Jodenbuurt,65
85,Trompbuurt,65
32,Hoofddorppleinbuurt,57


#### Let's see how many Indian restaurants there are in each of the top 10 neighbourhoods

In [29]:
top10_neighbourhoodlist = venues_sort['Neighbourhoods'].tolist()
# all venues in the top 10 neighbourhoods (reused later for one hot encoding)
top10_list = venues_df.loc[venues_df['Neighbourhoods'].isin(top10_neighbourhoodlist)]
# keep only the Indian restaurants among them
top10_venues_df = top10_list.loc[top10_list['VenueCategory']=='Indian Restaurant']
top10_venues_df

Unnamed: 0,Neighbourhoods,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
132,Amsterdam Oud-West,52.36539,4.87022,Tandoori Express,52.367866,4.874061,Indian Restaurant
347,Burgwallen Oude Zijde,52.37169,4.89724,Koh-I-Noor,52.371892,4.892994,Indian Restaurant
421,Burgwallen Oude Zijde,52.37169,4.89724,Kamasutra,52.375108,4.898459,Indian Restaurant
424,Burgwallen Oude Zijde,52.37169,4.89724,Gandhi,52.375384,4.894932,Indian Restaurant
590,De Pijp,52.35625,4.89057,Taj Mahal,52.357233,4.890989,Indian Restaurant
595,De Pijp,52.35625,4.89057,Balti House,52.354743,4.888495,Indian Restaurant
622,De Pijp,52.35625,4.89057,Restaurant Surya,52.353228,4.893747,Indian Restaurant
680,De Wallen,52.37169,4.89724,Koh-I-Noor,52.371892,4.892994,Indian Restaurant
754,De Wallen,52.37169,4.89724,Kamasutra,52.375108,4.898459,Indian Restaurant
757,De Wallen,52.37169,4.89724,Gandhi,52.375384,4.894932,Indian Restaurant


#### Number of Indian restaurants in the top 10 neighbourhoods.

In [30]:
top10_venues_indian = top10_venues_df.groupby(['Neighbourhoods']).size() \
                    .reset_index(name='Indian Restaurant count') \
                    .sort_values(['Indian Restaurant count'], ascending=False)
top10_venues_indian = pd.DataFrame(top10_venues_indian)
top10_venues_indian

Unnamed: 0,Neighbourhoods,Indian Restaurant count
1,Burgwallen Oude Zijde,3
2,De Pijp,3
3,De Wallen,3
7,Oude Pijp,3
5,Jodenbuurt,2
6,Jordaan,2
8,Trompbuurt,2
0,Amsterdam Oud-West,1
4,Frederik Hendrikbuurt,1


#### Count of all restaurants and Indian restaurants in each of the top 10 neighbourhoods

In [34]:
merge_df = pd.merge(venues_sort,top10_venues_indian,how='inner',on='Neighbourhoods')
merge_df

Unnamed: 0,Neighbourhoods,Restaurant count,Indian Restaurant count
0,Oude Pijp,100,3
1,De Pijp,100,3
2,Jordaan,100,2
3,De Wallen,97,3
4,Burgwallen Oude Zijde,97,3
5,Amsterdam Oud-West,77,1
6,Frederik Hendrikbuurt,68,1
7,Jodenbuurt,65,2
8,Trompbuurt,65,2


As the table shows, the five neighbourhoods with the most restaurants already have 2 or 3 Indian restaurants each, whereas the remaining four have 1 or 2. Only nine rows appear because the inner merge drops Hoofddorppleinbuurt, the one top 10 neighbourhood with no Indian restaurants. Opening in a neighbourhood that already has at least 2 Indian restaurants means facing more competition, so it may not be a wise choice. On competition alone, anyone opening an Indian restaurant should prefer <b>Amsterdam Oud-West or Frederik Hendrikbuurt</b>, because each of these neighbourhoods has only 1 Indian restaurant.
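
The merged table has nine rows because an inner merge keeps only neighbourhoods present in both inputs. A minimal sketch of the alternative, using the counts printed in the two tables above: a left merge keeps all ten neighbourhoods and fills the missing count with 0. The column name `Indian share` is an assumption added for illustration.

```python
import pandas as pd

# Restaurant counts for the top 10 neighbourhoods, copied from the output above.
totals = pd.DataFrame({
    "Neighbourhoods": ["Oude Pijp", "De Pijp", "Jordaan", "De Wallen",
                       "Burgwallen Oude Zijde", "Amsterdam Oud-West",
                       "Frederik Hendrikbuurt", "Jodenbuurt", "Trompbuurt",
                       "Hoofddorppleinbuurt"],
    "Restaurant count": [100, 100, 100, 97, 97, 77, 68, 65, 65, 57],
})

# Indian restaurant counts, copied from the output above (nine neighbourhoods only).
indian = pd.DataFrame({
    "Neighbourhoods": ["Burgwallen Oude Zijde", "De Pijp", "De Wallen", "Oude Pijp",
                       "Jodenbuurt", "Jordaan", "Trompbuurt", "Amsterdam Oud-West",
                       "Frederik Hendrikbuurt"],
    "Indian Restaurant count": [3, 3, 3, 3, 2, 2, 2, 1, 1],
})

# how="left" keeps every row of `totals`; unmatched rows get NaN, replaced by 0 here.
merged = totals.merge(indian, how="left", on="Neighbourhoods")
merged["Indian Restaurant count"] = merged["Indian Restaurant count"].fillna(0).astype(int)

# Share of Indian restaurants among all restaurants found in the neighbourhood.
merged["Indian share"] = merged["Indian Restaurant count"] / merged["Restaurant count"]
print(merged.sort_values("Indian share"))
```

With the left merge, the neighbourhood with no Indian restaurants stays visible in the comparison instead of silently disappearing.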

### Let's analyze each neighbourhood in the top 10 list

In [37]:
# one hot encoding
venues_onehot = pd.get_dummies(top10_list[['VenueCategory']], prefix="", prefix_sep="")

# add Neighbourhoods column back to dataframe
venues_onehot['Neighbourhoods'] = top10_list['Neighbourhoods'] 

# move neighbourhoods column to the first column
fixed_columns = [venues_onehot.columns[-1]] + list(venues_onehot.columns[:-1])
venues_onehot = venues_onehot[fixed_columns]

venues_onehot.head()

Unnamed: 0,Neighbourhoods,Afghan Restaurant,American Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,Belgian Restaurant,Bistro,Brazilian Restaurant,Breakfast Spot,Burger Joint,Burrito Place,Café,Caribbean Restaurant,Chinese Restaurant,Creperie,Deli / Bodega,Diner,Donut Shop,Dumpling Restaurant,Dutch Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Fish & Chips Shop,Food Court,Food Stand,Food Truck,French Restaurant,Friterie,Gastropub,German Restaurant,Greek Restaurant,Indian Chinese Restaurant,Indian Restaurant,Indonesian Restaurant,Irish Pub,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Latin American Restaurant,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,Noodle House,Paella Restaurant,Persian Restaurant,Peruvian Restaurant,Pizza Place,Portuguese Restaurant,Ramen Restaurant,Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Snack Place,Soup Place,South American Restaurant,Southern / Soul Food Restaurant,Spanish Restaurant,Steakhouse,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Tibetan Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
57,Amsterdam Oud-West,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
58,Amsterdam Oud-West,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
59,Amsterdam Oud-West,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
60,Amsterdam Oud-West,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
61,Amsterdam Oud-West,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [38]:
venues_onehot.shape

(826, 80)

#### Next, let's group rows by neighbourhood and take the mean frequency of occurrence of each category

In [39]:
venues_grouped = venues_onehot.groupby('Neighbourhoods').mean().reset_index()
venues_grouped

Unnamed: 0,Neighbourhoods,Afghan Restaurant,American Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,Belgian Restaurant,Bistro,Brazilian Restaurant,Breakfast Spot,Burger Joint,Burrito Place,Café,Caribbean Restaurant,Chinese Restaurant,Creperie,Deli / Bodega,Diner,Donut Shop,Dumpling Restaurant,Dutch Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Fish & Chips Shop,Food Court,Food Stand,Food Truck,French Restaurant,Friterie,Gastropub,German Restaurant,Greek Restaurant,Indian Chinese Restaurant,Indian Restaurant,Indonesian Restaurant,Irish Pub,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Latin American Restaurant,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,Noodle House,Paella Restaurant,Persian Restaurant,Peruvian Restaurant,Pizza Place,Portuguese Restaurant,Ramen Restaurant,Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Snack Place,Soup Place,South American Restaurant,Southern / Soul Food Restaurant,Spanish Restaurant,Steakhouse,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Tibetan Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,Amsterdam Oud-West,0.0,0.0,0.0,0.025974,0.0,0.012987,0.012987,0.025974,0.0,0.0,0.0,0.038961,0.025974,0.0,0.116883,0.0,0.0,0.0,0.012987,0.012987,0.0,0.012987,0.025974,0.012987,0.0,0.0,0.012987,0.0,0.012987,0.0,0.0,0.012987,0.0,0.0,0.0,0.012987,0.0,0.012987,0.025974,0.0,0.077922,0.0,0.012987,0.0,0.0,0.025974,0.012987,0.012987,0.025974,0.0,0.012987,0.0,0.0,0.0,0.012987,0.038961,0.0,0.0,0.103896,0.0,0.025974,0.038961,0.012987,0.0,0.0,0.0,0.012987,0.0,0.012987,0.0,0.0,0.0,0.012987,0.012987,0.0,0.012987,0.025974,0.051948,0.012987
1,Burgwallen Oude Zijde,0.0,0.0,0.030928,0.020619,0.0,0.0,0.020619,0.051546,0.0,0.010309,0.0,0.030928,0.020619,0.0,0.092784,0.0,0.072165,0.0,0.041237,0.010309,0.020619,0.0,0.0,0.0,0.0,0.0,0.010309,0.0,0.0,0.010309,0.0,0.030928,0.010309,0.020619,0.0,0.010309,0.0,0.030928,0.020619,0.010309,0.051546,0.0,0.0,0.0,0.020619,0.010309,0.010309,0.010309,0.010309,0.0,0.0,0.0,0.0,0.0,0.0,0.020619,0.0,0.010309,0.041237,0.0,0.041237,0.010309,0.020619,0.010309,0.010309,0.0,0.0,0.030928,0.030928,0.010309,0.010309,0.0,0.010309,0.041237,0.0,0.010309,0.0,0.0,0.0
2,De Pijp,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.05,0.0,0.01,0.0,0.03,0.04,0.01,0.06,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.05,0.01,0.01,0.0,0.0,0.0,0.03,0.03,0.0,0.06,0.05,0.0,0.0,0.0,0.04,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.03,0.0,0.02,0.06,0.03,0.04,0.02,0.0,0.0,0.03,0.0,0.01,0.01,0.03,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.04,0.01
3,De Wallen,0.0,0.0,0.030928,0.020619,0.0,0.0,0.020619,0.051546,0.0,0.010309,0.0,0.030928,0.020619,0.0,0.092784,0.0,0.072165,0.0,0.041237,0.010309,0.020619,0.0,0.0,0.0,0.0,0.0,0.010309,0.0,0.0,0.010309,0.0,0.030928,0.010309,0.020619,0.0,0.010309,0.0,0.030928,0.020619,0.010309,0.051546,0.0,0.0,0.0,0.020619,0.010309,0.010309,0.010309,0.010309,0.0,0.0,0.0,0.0,0.0,0.0,0.020619,0.0,0.010309,0.041237,0.0,0.041237,0.010309,0.020619,0.010309,0.010309,0.0,0.0,0.030928,0.030928,0.010309,0.010309,0.0,0.010309,0.041237,0.0,0.010309,0.0,0.0,0.0
4,Frederik Hendrikbuurt,0.014706,0.0,0.014706,0.0,0.0,0.0,0.029412,0.029412,0.0,0.029412,0.0,0.0,0.0,0.0,0.058824,0.014706,0.029412,0.0,0.014706,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.014706,0.014706,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.014706,0.0,0.014706,0.044118,0.0,0.132353,0.014706,0.0,0.0,0.0,0.014706,0.029412,0.014706,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.029412,0.0,0.014706,0.058824,0.0,0.029412,0.058824,0.0,0.0,0.014706,0.0,0.029412,0.014706,0.029412,0.0,0.0,0.014706,0.014706,0.073529,0.0,0.0,0.0,0.014706,0.0
5,Hoofddorppleinbuurt,0.0,0.0,0.017544,0.052632,0.017544,0.017544,0.017544,0.070175,0.017544,0.0,0.0,0.052632,0.017544,0.0,0.087719,0.0,0.0,0.0,0.035088,0.035088,0.0,0.0,0.0,0.0,0.0,0.017544,0.035088,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.017544,0.017544,0.0,0.017544,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.035088,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.035088,0.0,0.0,0.035088,0.017544,0.017544,0.0,0.017544,0.0,0.0,0.017544,0.0,0.017544,0.035088,0.0,0.0,0.0,0.0,0.035088,0.0,0.0,0.017544,0.017544,0.017544
6,Jodenbuurt,0.0,0.015385,0.0,0.046154,0.0,0.0,0.015385,0.015385,0.0,0.0,0.015385,0.061538,0.015385,0.0,0.123077,0.0,0.030769,0.0,0.0,0.0,0.015385,0.0,0.015385,0.0,0.0,0.015385,0.0,0.015385,0.0,0.0,0.0,0.030769,0.0,0.046154,0.015385,0.0,0.0,0.030769,0.015385,0.0,0.061538,0.0,0.0,0.015385,0.0,0.0,0.015385,0.0,0.0,0.015385,0.0,0.015385,0.0,0.0,0.0,0.0,0.015385,0.015385,0.123077,0.015385,0.030769,0.0,0.0,0.015385,0.0,0.0,0.0,0.046154,0.030769,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.015385,0.015385,0.0
7,Jordaan,0.01,0.0,0.02,0.01,0.0,0.0,0.01,0.04,0.0,0.03,0.0,0.02,0.01,0.0,0.11,0.01,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.01,0.0,0.02,0.04,0.0,0.08,0.01,0.0,0.01,0.0,0.02,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.03,0.0,0.02,0.03,0.01,0.05,0.04,0.01,0.0,0.01,0.0,0.02,0.02,0.02,0.0,0.0,0.01,0.02,0.05,0.0,0.0,0.04,0.0,0.0
8,Oude Pijp,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.05,0.0,0.01,0.0,0.03,0.04,0.01,0.06,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.05,0.01,0.01,0.0,0.0,0.0,0.03,0.03,0.0,0.06,0.05,0.0,0.0,0.0,0.04,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.03,0.0,0.02,0.06,0.03,0.04,0.02,0.0,0.0,0.03,0.0,0.01,0.01,0.03,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.04,0.01
9,Trompbuurt,0.0,0.015385,0.0,0.046154,0.0,0.0,0.015385,0.015385,0.0,0.0,0.015385,0.061538,0.015385,0.0,0.123077,0.0,0.030769,0.0,0.0,0.0,0.015385,0.0,0.015385,0.0,0.0,0.015385,0.0,0.015385,0.0,0.0,0.0,0.030769,0.0,0.046154,0.015385,0.0,0.0,0.030769,0.015385,0.0,0.061538,0.0,0.0,0.015385,0.0,0.0,0.015385,0.0,0.0,0.015385,0.0,0.015385,0.0,0.0,0.0,0.0,0.015385,0.015385,0.123077,0.015385,0.030769,0.0,0.0,0.015385,0.0,0.0,0.0,0.046154,0.030769,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.015385,0.015385,0.0


In [40]:
venues_grouped.shape

(10, 80)
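
Taking the mean of one hot columns per neighbourhood gives each category's share of that neighbourhood's venues. The sketch below checks this on a tiny made-up dataset (the neighbourhoods `A` and `B` and their venues are hypothetical) by comparing the result with a row-normalized cross-tabulation.

```python
import pandas as pd

# Hypothetical toy data: four venues in neighbourhood A, two in B.
toy = pd.DataFrame({
    "Neighbourhoods": ["A", "A", "A", "A", "B", "B"],
    "VenueCategory": ["Café", "Café", "Indian Restaurant", "Bakery", "Café", "Bakery"],
})

# One hot encode the category, then average the indicator columns per neighbourhood.
onehot = pd.get_dummies(toy["VenueCategory"], dtype=float)
onehot.insert(0, "Neighbourhoods", toy["Neighbourhoods"])
grouped = onehot.groupby("Neighbourhoods").mean()

# The same shares computed directly: category counts divided by the row total.
shares = pd.crosstab(toy["Neighbourhoods"], toy["VenueCategory"], normalize="index")

print(grouped)
print(shares)
```

Because each value is a share of the venues found, neighbourhoods with very different venue counts become comparable on the same 0 to 1 scale.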

#### Let's print each neighborhood along with the top 5 most common venues

In [41]:
num_top_venues = 5

for hood in venues_grouped['Neighbourhoods']:
    print("----"+hood+"----")
    temp = venues_grouped[venues_grouped['Neighbourhoods'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Amsterdam Oud-West----
                           venue  freq
0                           Café  0.12
1                     Restaurant  0.10
2             Italian Restaurant  0.08
3  Vegetarian / Vegan Restaurant  0.05
4                    Pizza Place  0.04


----Burgwallen Oude Zijde----
                venue  freq
0                Café  0.09
1  Chinese Restaurant  0.07
2  Italian Restaurant  0.05
3              Bakery  0.05
4     Thai Restaurant  0.04


----De Pijp----
                 venue  freq
0                 Café  0.06
1           Restaurant  0.06
2   Italian Restaurant  0.06
3  Japanese Restaurant  0.05
4    French Restaurant  0.05


----De Wallen----
                venue  freq
0                Café  0.09
1  Chinese Restaurant  0.07
2  Italian Restaurant  0.05
3              Bakery  0.05
4     Thai Restaurant  0.04


----Frederik Hendrikbuurt----
                venue  freq
0  Italian Restaurant  0.13
1     Thai Restaurant  0.07
2  Seafood Restaurant  0.06
3              

As you can see, the Indian Restaurant category does not appear among the top 5 most common venues in any of these neighbourhoods.

#### Let's put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [42]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [43]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhoods']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhoods'] = venues_grouped['Neighbourhoods']

for ind in np.arange(venues_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(venues_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighbourhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Amsterdam Oud-West,Café,Restaurant,Italian Restaurant,Vegetarian / Vegan Restaurant,Pizza Place,Breakfast Spot,Seafood Restaurant,Bakery,Mediterranean Restaurant,Modern European Restaurant
1,Burgwallen Oude Zijde,Café,Chinese Restaurant,Italian Restaurant,Bakery,Deli / Bodega,Thai Restaurant,Restaurant,Sandwich Place,Steakhouse,Indian Restaurant
2,De Pijp,Café,Restaurant,Italian Restaurant,French Restaurant,Japanese Restaurant,Bakery,Vegetarian / Vegan Restaurant,Sandwich Place,Thai Restaurant,Mediterranean Restaurant
3,De Wallen,Café,Chinese Restaurant,Italian Restaurant,Bakery,Deli / Bodega,Thai Restaurant,Restaurant,Sandwich Place,Steakhouse,Indian Restaurant
4,Frederik Hendrikbuurt,Italian Restaurant,Thai Restaurant,Restaurant,Café,Seafood Restaurant,Indonesian Restaurant,Bagel Shop,Bakery,Bistro,Spanish Restaurant
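
The column labels above rely on a three-item suffix list with a fallback to `th`, which is correct for 1 to 10 but would label an 11th column "11st" style only if the list were extended. A small sketch of a general alternative follows; the helper name `ordinal` is hypothetical and not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

num_top_venues = 10
columns = ["Neighbourhoods"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns)
```

This avoids using exception handling for control flow and stays correct if `num_top_venues` is raised beyond 10.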


In [44]:
amsterdam_grouped_indian = venues_grouped[["Neighbourhoods","Indian Restaurant"]]

In [45]:
amsterdam_grouped_indian

Unnamed: 0,Neighbourhoods,Indian Restaurant
0,Amsterdam Oud-West,0.012987
1,Burgwallen Oude Zijde,0.030928
2,De Pijp,0.03
3,De Wallen,0.030928
4,Frederik Hendrikbuurt,0.014706
5,Hoofddorppleinbuurt,0.0
6,Jodenbuurt,0.030769
7,Jordaan,0.02
8,Oude Pijp,0.03
9,Trompbuurt,0.030769


## Cluster Neighborhoods

In [46]:
# set number of clusters
kclusters = 5

amsterdam_grouped_clustering = amsterdam_grouped_indian.drop(columns='Neighbourhoods')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(amsterdam_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 1, 1, 1, 2, 0, 1, 3, 1, 1], dtype=int32)
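
Because the clustering uses a single feature, each cluster amounts to an interval of Indian restaurant frequency. The sketch below uses only the frequencies and the labels printed above (no re-clustering) to list the interval each cluster covers and to check that the intervals do not overlap.

```python
from collections import defaultdict

# Indian restaurant frequencies per neighbourhood, in the row order of the table above.
frequencies = [0.012987, 0.030928, 0.03, 0.030928, 0.014706,
               0.0, 0.030769, 0.02, 0.03, 0.030769]
# Cluster labels printed by kmeans.labels_ above, in the same row order.
labels = [4, 1, 1, 1, 2, 0, 1, 3, 1, 1]

# Collect the frequencies that fall in each cluster.
by_cluster = defaultdict(list)
for freq, label in zip(frequencies, labels):
    by_cluster[label].append(freq)

# Sort clusters by the lowest frequency they contain.
intervals = sorted((min(v), max(v), label) for label, v in by_cluster.items())
for low, high, label in intervals:
    size = len(by_cluster[label])
    print(f"cluster {label}: {low:.6f} to {high:.6f} ({size} neighbourhoods)")

# With one feature, consecutive clusters should occupy disjoint ranges.
non_overlapping = all(prev[1] < nxt[0] for prev, nxt in zip(intervals, intervals[1:]))
print(non_overlapping)
```

Four of the five clusters contain a single neighbourhood, so with 10 rows and 5 clusters the result is close to a simple ranking by frequency; the cluster numbers themselves carry no order.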

In [48]:
# start from the Indian restaurant frequency of each neighbourhood
amsterdam_merged = amsterdam_grouped_indian.copy()

# add clustering labels
amsterdam_merged["Cluster Labels"] = kmeans.labels_

# add the latitude and longitude of each neighbourhood from amsterdam_df
amsterdam_merged = amsterdam_merged.join(amsterdam_df.set_index("Neighbourhoods"), on="Neighbourhoods")

amsterdam_merged.head()

Unnamed: 0,Neighbourhoods,Indian Restaurant,Cluster Labels,Latitude,Longitude
0,Amsterdam Oud-West,0.012987,4,52.36539,4.87022
1,Burgwallen Oude Zijde,0.030928,1,52.37169,4.89724
2,De Pijp,0.03,1,52.35625,4.89057
3,De Wallen,0.030928,1,52.37169,4.89724
4,Frederik Hendrikbuurt,0.014706,2,52.378646,4.877719


In [56]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(amsterdam_merged['Latitude'], amsterdam_merged['Longitude'], amsterdam_merged['Neighbourhoods'], amsterdam_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        # index by the label itself; cluster-1 relied on negative indexing for cluster 0
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Examine Clusters

### cluster 0

In [51]:
amsterdam_merged.loc[amsterdam_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighbourhoods,Indian Restaurant,Cluster Labels,Latitude,Longitude
5,Hoofddorppleinbuurt,0.0,0,52.351592,4.850202


### cluster 1

In [52]:
amsterdam_merged.loc[amsterdam_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighbourhoods,Indian Restaurant,Cluster Labels,Latitude,Longitude
1,Burgwallen Oude Zijde,0.030928,1,52.37169,4.89724
2,De Pijp,0.03,1,52.35625,4.89057
3,De Wallen,0.030928,1,52.37169,4.89724
6,Jodenbuurt,0.030769,1,52.363,4.88436
8,Oude Pijp,0.03,1,52.35625,4.89057
9,Trompbuurt,0.030769,1,52.363,4.88436


### cluster 2

In [53]:
amsterdam_merged.loc[amsterdam_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighbourhoods,Indian Restaurant,Cluster Labels,Latitude,Longitude
4,Frederik Hendrikbuurt,0.014706,2,52.378646,4.877719


### cluster 3

In [54]:
amsterdam_merged.loc[amsterdam_merged['Cluster Labels'] == 3]

Unnamed: 0,Neighbourhoods,Indian Restaurant,Cluster Labels,Latitude,Longitude
7,Jordaan,0.02,3,52.37687,4.87927


### cluster 4

In [55]:
amsterdam_merged.loc[amsterdam_merged['Cluster Labels'] == 4]

Unnamed: 0,Neighbourhoods,Indian Restaurant,Cluster Labels,Latitude,Longitude
0,Amsterdam Oud-West,0.012987,4,52.36539,4.87022


Most of the Indian restaurants are located in cluster 1, whereas in clusters 0, 2, 3 and 4 Indian restaurants do not appear among the top 10 most common venues of the top 10 neighbourhoods. In my first-level analysis (above, without k-means clustering), I said that anyone opening an Indian restaurant could consider Amsterdam Oud-West, Frederik Hendrikbuurt or Hoofddorppleinbuurt, because each of these neighbourhoods has either 0 or 1 Indian restaurant. Jordaan could be added to that list, but I would not choose it because it already has 2 Indian restaurants. I would rather go with Amsterdam Oud-West, as it has the lowest non-zero Indian restaurant frequency among the clusters (0.012987) and is closer to Amsterdam train st