### Dear Reader: Folium maps do not render natively on GitHub. To see the maps on this page rendered, follow this link to the notebook on IBM Cloud:
https://dataplatform.cloud.ibm.com/analytics/notebooks/v2/66379341-1798-49e5-aa5c-23f47289194e/view?access_token=b4656ccc0634ec85d38a537db99dc9a3017bef7c3df87dca3c3ca5733227ba6c

## Import Libraries

In [1]:
import pandas as pd
import numpy as np

In [8]:
import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if it is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # installs folium 0.5.0; comment this line out if it is already installed
import folium # map rendering library

print('Libraries imported.')



#### Foursquare Credentials, Limit and Radius

In [193]:
# Removed For Security

#### Define the API Request Function

In [196]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?categoryId=4d4b7105d754a06374d81259&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]["groups"][0]["items"]
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
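
The list comprehension in `getNearbyVenues` indexes `v['venue']['categories'][0]` directly, so an item with an empty category list would raise an `IndexError`. Below is a minimal sketch of a more defensive row extractor. The helper name `extract_venue_rows` and the sample items are hypothetical, and the payload shape is assumed only from the keys used in the function above, not from the Foursquare documentation.

```python
def extract_venue_rows(name, lat, lng, items):
    """Flatten explore-style 'items' into row tuples.

    Hypothetical helper: the payload shape is assumed from the keys
    that getNearbyVenues reads.
    """
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories') or []
        # Fall back to None instead of raising IndexError on an empty list
        category = categories[0].get('name') if categories else None
        rows.append((
            name, lat, lng,
            venue.get('name'),
            location.get('lat'),
            location.get('lng'),
            category,
        ))
    return rows


# Made-up sample items, for illustration only
sample_items = [
    {'venue': {'name': 'Example Cafe',
               'location': {'lat': 40.75, 'lng': -73.99},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Uncategorised Place',
               'location': {'lat': 40.76, 'lng': -73.98},
               'categories': []}},
]

rows = extract_venue_rows('10001', 40.750742, -73.99653, sample_items)
for row in rows:
    print(row)
```

Rows with a missing category come back with `None` in the last position, so they can be filtered or filled before the one-hot encoding step.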

## New York

In [27]:
# read source data on GITHUB
ny = pd.read_html('https://github.com/alecdavo/Coursera_Project/blob/master/nyzip.csv')[0]
ny.head()

Unnamed: 0.1,Unnamed: 0,Zip,City,State,Latitude,Longitude,Timezone,Daylight savings time flag,Lat,Long
0,,10001,New York,NY,40.750742,-73.99653,-5,1,40.750742,-73.99653
1,,10002,New York,NY,40.71704,-73.987,-5,1,40.71704,-73.987
2,,10003,New York,NY,40.732509,-73.98935,-5,1,40.732509,-73.98935
3,,10005,New York,NY,40.706019,-74.00858,-5,1,40.706019,-74.00858
4,,10006,New York,NY,40.707904,-74.01342,-5,1,40.707904,-74.01342


In [28]:
#drop unnecessary columns
dfny = ny.drop(['Lat', 'Long', 'Timezone', 'Unnamed: 0', 'Daylight savings time flag' ], axis=1)
dfny.head()


Unnamed: 0,Zip,City,State,Latitude,Longitude
0,10001,New York,NY,40.750742,-73.99653
1,10002,New York,NY,40.71704,-73.987
2,10003,New York,NY,40.732509,-73.98935
3,10005,New York,NY,40.706019,-74.00858
4,10006,New York,NY,40.707904,-74.01342


In [29]:
# define the zip field as text - string
dfny['Zip'] = dfny['Zip'].astype(str)

In [30]:
# create geolocator
nyaddress = 'New York City, NY'

geolocator = Nominatim(user_agent="New York")
location = geolocator.geocode(nyaddress)
latitude_ny = location.latitude
longitude_ny = location.longitude

In [31]:
# create map
map_newyork = folium.Map(location=[latitude_ny, longitude_ny], zoom_start=11)

# add markers to map
for lat, lng in zip(dfny['Latitude'], dfny['Longitude']):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

In [34]:
# find venues
ny_venues = getNearbyVenues(names=dfny['Zip'],
                                   latitudes=dfny['Latitude'],
                                   longitudes=dfny['Longitude']
                                  )

10001
10002
10003
10005
10006
10007
10008
10009
10010
10011
10012
10013
10014
10015
10016
10017
10018
10019
10020
10021
10022
10023
10024
10025
10026
10027
10028
10029
10030
10031
10032
10033
10034
10035
10036
10037
10038
10039
10040
10041
10043
10044
10045
10046
10047
10048
10055
10060
10069
10072
10079
10080
10081
10082
10087
10090
10094
10095
10096
10098
10099
10101
10102
10103
10104
10105
10106
10107
10108
10109
10110
10111
10112
10113
10114
10115
10116
10117
10118
10119
10120
10121
10122
10123
10124
10125
10126
10128
10129
10130
10131
10132
10133
10138
10149
10150
10151
10152
10153
10154
10155
10156
10157
10158
10159
10160
10161
10162
10163
10164
10165
10166
10167
10168
10169
10170
10171
10172
10173
10174
10175
10176
10177
10178
10179
10184
10185
10196
10197
10199
10203
10211
10212
10213
10242
10249
10250
10256
10257
10258
10259
10260
10261
10265
10268
10269
10270
10271
10272
10273
10274
10275
10276
10277
10278
10279
10280
10281
10282
10285
10286
10292
10422
11286
11302
11517


In [35]:
# summarise count of venues in each neighbourhood
ny_venues.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
10001,72,72,72,72,72,72
10002,100,100,100,100,100,100
10003,78,78,78,78,78,78
10005,100,100,100,100,100,100
10006,58,58,58,58,58,58
10007,74,74,74,74,74,74
10008,70,70,70,70,70,70
10009,73,73,73,73,73,73
10010,82,82,82,82,82,82
10011,69,69,69,69,69,69


In [36]:
# one hot encoding
ny_onehot = pd.get_dummies(ny_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
ny_onehot['Neighbourhood'] = ny_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [ny_onehot.columns[-1]] + list(ny_onehot.columns[:-1])
ny_onehot = ny_onehot[fixed_columns]

In [37]:
# mean of the one-hot columns = share of venues per category in each neighbourhood
ny_grouped = ny_onehot.groupby('Neighbourhood').mean().reset_index()
ny_grouped

Unnamed: 0,Neighbourhood,Afghan Restaurant,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,Austrian Restaurant,BBQ Joint,...,Thai Restaurant,Theme Restaurant,Tonkatsu Restaurant,Turkish Restaurant,Udon Restaurant,Ukrainian Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant,Wings Joint
0,10001,0.000000,0.013889,0.055556,0.000000,0.013889,0.000000,0.000000,0.00,0.000000,...,0.041667,0.000000,0.0,0.013889,0.000000,0.000000,0.013889,0.00,0.000000,0.000000
1,10002,0.000000,0.000000,0.040000,0.000000,0.010000,0.070000,0.020000,0.01,0.000000,...,0.010000,0.000000,0.0,0.000000,0.000000,0.000000,0.030000,0.01,0.020000,0.000000
2,10003,0.000000,0.000000,0.051282,0.000000,0.000000,0.012821,0.012821,0.00,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.012821,0.038462,0.00,0.012821,0.000000
3,10005,0.000000,0.000000,0.080000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,...,0.010000,0.000000,0.0,0.000000,0.000000,0.000000,0.010000,0.00,0.000000,0.000000
4,10006,0.000000,0.000000,0.086207,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,0.000000
5,10007,0.000000,0.000000,0.054054,0.000000,0.000000,0.013514,0.013514,0.00,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.00,0.013514,0.027027
6,10008,0.000000,0.000000,0.042857,0.000000,0.000000,0.014286,0.000000,0.00,0.000000,...,0.057143,0.000000,0.0,0.000000,0.000000,0.000000,0.014286,0.00,0.028571,0.000000
7,10009,0.000000,0.000000,0.041096,0.000000,0.013699,0.013699,0.000000,0.00,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.041096,0.00,0.013699,0.000000
8,10010,0.000000,0.000000,0.073171,0.000000,0.000000,0.000000,0.000000,0.00,0.012195,...,0.048780,0.000000,0.0,0.012195,0.000000,0.000000,0.012195,0.00,0.036585,0.000000
9,10011,0.000000,0.000000,0.101449,0.000000,0.000000,0.000000,0.000000,0.00,0.014493,...,0.043478,0.000000,0.0,0.000000,0.000000,0.000000,0.014493,0.00,0.000000,0.000000
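
The values in the grouped table are fractions, not counts: averaging 0/1 indicator columns within a neighbourhood gives the share of that neighbourhood's venues in each category. A small sketch on made-up data (the `toy_` names are hypothetical) shows the mechanics:

```python
import pandas as pd

# Toy data (made up): two neighbourhoods and their venue categories
toy_venues = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Café', 'Café', 'Pizza Place', 'Bakery',
                       'Pizza Place', 'Pizza Place'],
})

# One indicator column per category
toy_onehot = pd.get_dummies(toy_venues['Venue Category'], dtype=float)
toy_onehot.insert(0, 'Neighbourhood', toy_venues['Neighbourhood'])

# Mean of 0/1 indicators = share of venues in each category
toy_grouped = toy_onehot.groupby('Neighbourhood').mean().reset_index()
print(toy_grouped)
```

Each row of the result sums to 1, which is why neighbourhoods with very few venues (for example three venues at 0.33 each) can look deceptively distinctive.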


In [38]:
# show top 5 venues in each neighbourhood
num_top_venues = 5

for hood in ny_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = ny_grouped[ny_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----10001----
                 venue  freq
0                 Café  0.08
1          Salad Place  0.07
2        Deli / Bodega  0.07
3         Burger Joint  0.06
4  American Restaurant  0.06


----10002----
                venue  freq
0  Mexican Restaurant  0.08
1    Asian Restaurant  0.07
2         Pizza Place  0.06
3                Café  0.06
4  Chinese Restaurant  0.05


----10003----
                      venue  freq
0       Japanese Restaurant  0.08
1               Pizza Place  0.08
2        Italian Restaurant  0.06
3  Mediterranean Restaurant  0.06
4       American Restaurant  0.05


----10005----
                 venue  freq
0       Sandwich Place  0.09
1  American Restaurant  0.08
2        Deli / Bodega  0.06
3          Pizza Place  0.06
4          Salad Place  0.06


----10006----
                 venue  freq
0          Pizza Place  0.12
1  American Restaurant  0.09
2   Mexican Restaurant  0.07
3           Steakhouse  0.07
4                 Café  0.05


----10007----
            

                      venue  freq
0        Italian Restaurant  0.09
1                    Bakery  0.09
2                Restaurant  0.06
3           Thai Restaurant  0.06
4  Mediterranean Restaurant  0.04


----10047----
                      venue  freq
0        Italian Restaurant  0.09
1                    Bakery  0.09
2                Restaurant  0.06
3           Thai Restaurant  0.06
4  Mediterranean Restaurant  0.04


----10048----
                 venue  freq
0       Sandwich Place  0.08
1          Pizza Place  0.08
2         Burger Joint  0.07
3   Italian Restaurant  0.05
4  American Restaurant  0.05


----10055----
                      venue  freq
0        Italian Restaurant  0.09
1                    Bakery  0.09
2                Restaurant  0.06
3           Thai Restaurant  0.06
4  Mediterranean Restaurant  0.04


----10060----
                      venue  freq
0        Italian Restaurant  0.09
1                    Bakery  0.09
2                Restaurant  0.06
3           Th

                  venue  freq
0           Pizza Place  0.11
1  Fast Food Restaurant  0.06
2        Sandwich Place  0.06
3                Bakery  0.06
4           Salad Place  0.05


----10124----
                      venue  freq
0        Italian Restaurant  0.09
1                    Bakery  0.09
2                Restaurant  0.06
3           Thai Restaurant  0.06
4  Mediterranean Restaurant  0.04


----10125----
                      venue  freq
0        Italian Restaurant  0.09
1                    Bakery  0.09
2                Restaurant  0.06
3           Thai Restaurant  0.06
4  Mediterranean Restaurant  0.04


----10126----
                      venue  freq
0        Italian Restaurant  0.09
1                    Bakery  0.09
2                Restaurant  0.06
3           Thai Restaurant  0.06
4  Mediterranean Restaurant  0.04


----10128----
                venue  freq
0         Pizza Place  0.12
1  Italian Restaurant  0.07
2    Sushi Restaurant  0.07
3                Café  0.06
4  M

                      venue  freq
0        Italian Restaurant  0.09
1                    Bakery  0.09
2                Restaurant  0.06
3           Thai Restaurant  0.06
4  Mediterranean Restaurant  0.04


----10184----
                      venue  freq
0        Italian Restaurant  0.09
1                    Bakery  0.09
2                Restaurant  0.06
3           Thai Restaurant  0.06
4  Mediterranean Restaurant  0.04


----10185----
                      venue  freq
0        Italian Restaurant  0.09
1                    Bakery  0.09
2                Restaurant  0.06
3           Thai Restaurant  0.06
4  Mediterranean Restaurant  0.04


----10196----
                      venue  freq
0        Italian Restaurant  0.09
1                    Bakery  0.09
2                Restaurant  0.06
3           Thai Restaurant  0.06
4  Mediterranean Restaurant  0.04


----10197----
                      venue  freq
0        Italian Restaurant  0.09
1                    Bakery  0.09
2                R

                 venue  freq
0           Food Truck  0.33
1  American Restaurant  0.33
2                Diner  0.33
3         Noodle House  0.00
4           Restaurant  0.00


----11517----
                      venue  freq
0        Italian Restaurant  0.08
1                Taco Place  0.08
2                 BBQ Joint  0.08
3  Mediterranean Restaurant  0.04
4          Cuban Restaurant  0.04




In [39]:
# create function that returns most common venues
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
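
To see what `return_most_common_venues` does, here is the same logic applied to a single made-up row. The frequencies and the name `toy_row` are hypothetical; the function body is repeated only so the snippet runs on its own.

```python
import pandas as pd


def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighbourhood label), sort the rest,
    # and keep the names of the most frequent categories
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


# Made-up frequencies for one neighbourhood
toy_row = pd.Series({
    'Neighbourhood': '10001',
    'Bakery': 0.10,
    'Café': 0.40,
    'Pizza Place': 0.30,
    'Steakhouse': 0.20,
})

top3 = return_most_common_venues(toy_row, 3)
print(list(top3))
```

Note that the function returns category names only, so the ranking is kept but the frequencies themselves are dropped.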

In [40]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
ny_venues_sorted = pd.DataFrame(columns=columns)
ny_venues_sorted['Neighbourhood'] = ny_grouped['Neighbourhood']

for ind in np.arange(ny_grouped.shape[0]):
    ny_venues_sorted.iloc[ind, 1:] = return_most_common_venues(ny_grouped.iloc[ind, :], num_top_venues)

ny_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,10001,Café,Salad Place,Deli / Bodega,American Restaurant,Sandwich Place,Burger Joint,Pizza Place,Thai Restaurant,French Restaurant,Spanish Restaurant
1,10002,Mexican Restaurant,Asian Restaurant,Pizza Place,Café,Bakery,Chinese Restaurant,Japanese Restaurant,American Restaurant,Ramen Restaurant,Sandwich Place
2,10003,Pizza Place,Japanese Restaurant,Mediterranean Restaurant,Italian Restaurant,American Restaurant,Mexican Restaurant,Bagel Shop,Vegetarian / Vegan Restaurant,Salad Place,Chinese Restaurant
3,10005,Sandwich Place,American Restaurant,Deli / Bodega,Pizza Place,Salad Place,Mexican Restaurant,Food Truck,Italian Restaurant,Café,New American Restaurant
4,10006,Pizza Place,American Restaurant,Steakhouse,Mexican Restaurant,Café,Food Truck,Donut Shop,Restaurant,Bakery,Italian Restaurant
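
The column labels above rely on indexing into `['st', 'nd', 'rd']` and falling back to `th` when the index runs out. That is fine for ten columns, but would produce labels such as "21th" for larger values. A possible alternative is a small ordinal helper; the name `ordinal` is hypothetical and not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11, 12 and 13 take 'th' even though they end in 1, 2 and 3
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


num_top_venues = 10
columns = ['Neighbourhood'] + [
    '{} Most Common Venue'.format(ordinal(i))
    for i in range(1, num_top_venues + 1)
]
print(columns)
```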


In [53]:
# set number of clusters
kclusters = 10

ny_grouped_clustering = ny_grouped.drop('Neighbourhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(ny_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 5, 0, 0, 0, 8, 1, 5, 1, 8], dtype=int32)
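
The notebook sets `kclusters = 10` without showing how that value was chosen. One common heuristic, not used in the original, is the elbow method: fit k-means for several values of k and look for the point where inertia stops dropping sharply. The sketch below runs it on synthetic data with three obvious groups; the data and variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (made up): three tight blobs in four dimensions
rng = np.random.RandomState(0)
centres = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
X = np.vstack([c + 0.01 * rng.randn(20, 4) for c in centres])

# Fit k-means for a range of k and record the within-cluster sum of squares
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 4))
```

On this toy data the inertia collapses once k reaches 3 and barely changes afterwards. Real venue-frequency data is unlikely to show such a clean elbow, so the result should be read as a guide rather than a rule.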

In [54]:
# add clustering labels (run once; re-running raises an error because the column already exists)
ny_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

ny_merged = dfny

# merge ny_venues_sorted with dfny to add latitude/longitude for each zip code
ny_merged = ny_merged.join(ny_venues_sorted.set_index('Neighbourhood'), on='Zip')

ny_merged.head()


Unnamed: 0,Zip,City,State,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,10001,New York,NY,40.750742,-73.99653,0,Café,Salad Place,Deli / Bodega,American Restaurant,Sandwich Place,Burger Joint,Pizza Place,Thai Restaurant,French Restaurant,Spanish Restaurant
1,10002,New York,NY,40.71704,-73.987,5,Mexican Restaurant,Asian Restaurant,Pizza Place,Café,Bakery,Chinese Restaurant,Japanese Restaurant,American Restaurant,Ramen Restaurant,Sandwich Place
2,10003,New York,NY,40.732509,-73.98935,0,Pizza Place,Japanese Restaurant,Mediterranean Restaurant,Italian Restaurant,American Restaurant,Mexican Restaurant,Bagel Shop,Vegetarian / Vegan Restaurant,Salad Place,Chinese Restaurant
3,10005,New York,NY,40.706019,-74.00858,0,Sandwich Place,American Restaurant,Deli / Bodega,Pizza Place,Salad Place,Mexican Restaurant,Food Truck,Italian Restaurant,Café,New American Restaurant
4,10006,New York,NY,40.707904,-74.01342,0,Pizza Place,American Restaurant,Steakhouse,Mexican Restaurant,Café,Food Truck,Donut Shop,Restaurant,Bakery,Italian Restaurant
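
`DataFrame.join` with `on=` performs a left join, so any zip code that returned no venues would keep its row but carry `NaN` in the cluster and venue columns. Whether that happens here depends on the API results, which are not fully shown. The sketch below uses made-up stand-ins for `dfny` and `ny_venues_sorted` to show the effect and one way to handle it.

```python
import pandas as pd

# Made-up stand-ins for dfny and ny_venues_sorted
locations = pd.DataFrame({
    'Zip': ['10001', '10002', '10003'],
    'Latitude': [40.75, 40.72, 40.73],
})
labelled = pd.DataFrame({
    'Neighbourhood': ['10001', '10003'],
    'Cluster Labels': [0, 5],
})

# Left join on the zip code; '10002' has no match
merged = locations.join(labelled.set_index('Neighbourhood'), on='Zip')
print(merged)

# Unmatched rows carry NaN, which also turns the label column into floats.
# Drop them and restore integer labels before using labels as list indices.
merged_clean = merged.dropna(subset=['Cluster Labels']).copy()
merged_clean['Cluster Labels'] = merged_clean['Cluster Labels'].astype(int)
print(merged_clean)
```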


In [56]:
# create map
ny_map_clusters = folium.Map(location=[latitude_ny, longitude_ny], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, zipcode, cluster in zip(ny_merged['Latitude'], ny_merged['Longitude'], ny_merged['Zip'], ny_merged['Cluster Labels']):
    label = folium.Popup(str(zipcode) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(ny_map_clusters)
       
ny_map_clusters
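
In the colour setup above, `ys` is only used for its length, and `rainbow[cluster-1]` sends cluster 0 to the last colour through negative indexing. A simpler sketch builds one colour per cluster directly and indexes by the label itself. The helper `cluster_colour` and its grey fallback for missing labels are hypothetical additions, not part of the original notebook.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 10

# One evenly spaced rainbow colour per cluster, as hex strings
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]


def cluster_colour(cluster, fallback='#808080'):
    """Map a cluster label to a hex colour; missing labels get the fallback."""
    if cluster is None or (isinstance(cluster, float) and np.isnan(cluster)):
        return fallback
    return palette[int(cluster)]


print(palette)
print(cluster_colour(0), cluster_colour(float('nan')))
```

The returned string can be passed as both `color` and `fill_color` to `folium.CircleMarker`.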

## Yo! Sushi Locations

In [57]:
# read source data on GITHUB
london = pd.read_html('https://github.com/alecdavo/Coursera_Project/blob/master/Yo.csv')[0]
london.head()

Unnamed: 0.1,Unnamed: 0,Name,Postcode,Latitude,Longitude
0,,YO! London Harvey Nichols,SW1X 7RJ,51.501584,-0.159805
1,,YO! Fulham Broadway,SW6 1BW,51.480416,-0.194515
2,,YO! Southbank Centre Festival Hall,SE1 8XX,51.505767,-0.116825
3,,YO! Bond Street,W1C 2AQ,51.514054,-0.147643
4,,YO! Russell Square,WC1N 1AE,51.52418,-0.123922


In [59]:
# remove unnecessary column
dflondon = london.drop(['Unnamed: 0' ], axis=1)
dflondon.head()


Unnamed: 0,Name,Postcode,Latitude,Longitude
0,YO! London Harvey Nichols,SW1X 7RJ,51.501584,-0.159805
1,YO! Fulham Broadway,SW6 1BW,51.480416,-0.194515
2,YO! Southbank Centre Festival Hall,SE1 8XX,51.505767,-0.116825
3,YO! Bond Street,W1C 2AQ,51.514054,-0.147643
4,YO! Russell Square,WC1N 1AE,51.52418,-0.123922


In [67]:
# create geolocator
londonaddress = 'London, GB'

geolocator = Nominatim(user_agent="London")
location = geolocator.geocode(londonaddress)
latitude_london = location.latitude
longitude_london = location.longitude

In [68]:
map_london = folium.Map(location=[latitude_london, longitude_london], zoom_start=11)

# add markers to map
for lat, lng in zip(dflondon['Latitude'], dflondon['Longitude']):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london)  
    
map_london

In [64]:
# find venues in proximity to each YO! Sushi Restaurant
london_venues = getNearbyVenues(names=dflondon['Name'],
                                   latitudes=dflondon['Latitude'],
                                   longitudes=dflondon['Longitude']
                                  )

YO! London Harvey Nichols
YO! Fulham Broadway
YO! Southbank Centre Festival Hall
YO! Bond Street
YO! Russell Square
YO! London Selfridges
YO! St Pancras Station
YO! Waterloo Station
YO! Victoria Station
YO! St Paul's
YO! Finchley Road
YO! Westfield Stratford
YO! Brent Cross
YO! Kitchen Westfield White City
YO! Bromley
YO! Richmond
YO! Kingston
YO! Heathrow T3
YO!


In [70]:
# group found food establishments by neighbourhood
london_venues.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
YO!,15,15,15,15,15,15
YO! Bond Street,100,100,100,100,100,100
YO! Brent Cross,16,16,16,16,16,16
YO! Bromley,36,36,36,36,36,36
YO! Finchley Road,35,35,35,35,35,35
YO! Fulham Broadway,37,37,37,37,37,37
YO! Heathrow T3,11,11,11,11,11,11
YO! Kingston,74,74,74,74,74,74
YO! Kitchen Westfield White City,65,65,65,65,65,65
YO! London Harvey Nichols,51,51,51,51,51,51


In [71]:
# one hot encoding
london_onehot = pd.get_dummies(london_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
london_onehot['Neighbourhood'] = london_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [london_onehot.columns[-1]] + list(london_onehot.columns[:-1])
london_onehot = london_onehot[fixed_columns]

In [94]:
# mean of the one-hot columns = share of venues per category in each neighbourhood
london_grouped = london_onehot.groupby('Neighbourhood').mean().reset_index()
london_grouped

Unnamed: 0,Neighbourhood,American Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,Belgian Restaurant,Bistro,...,Sri Lankan Restaurant,Steakhouse,Sushi Restaurant,Szechuan Restaurant,Tapas Restaurant,Thai Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint
0,YO!,0.0,0.0,0.066667,0.0,0.0,0.0,0.066667,0.0,0.133333,...,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,YO! Bond Street,0.0,0.01,0.03,0.0,0.0,0.0,0.06,0.0,0.0,...,0.01,0.03,0.04,0.0,0.0,0.01,0.02,0.01,0.0,0.0
2,YO! Brent Cross,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,YO! Bromley,0.0,0.0,0.027778,0.0,0.0,0.0,0.055556,0.027778,0.0,...,0.0,0.0,0.055556,0.0,0.0,0.0,0.027778,0.0,0.0,0.0
4,YO! Finchley Road,0.0,0.0,0.057143,0.0,0.0,0.0,0.057143,0.0,0.0,...,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0
5,YO! Fulham Broadway,0.0,0.0,0.027027,0.0,0.027027,0.0,0.0,0.0,0.0,...,0.0,0.0,0.027027,0.0,0.0,0.027027,0.0,0.0,0.0,0.0
6,YO! Heathrow T3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,YO! Kingston,0.0,0.0,0.027027,0.0,0.013514,0.0,0.067568,0.0,0.0,...,0.0,0.0,0.054054,0.0,0.0,0.054054,0.013514,0.013514,0.013514,0.0
8,YO! Kitchen Westfield White City,0.0,0.0,0.015385,0.0,0.0,0.0,0.061538,0.0,0.0,...,0.0,0.015385,0.046154,0.0,0.015385,0.015385,0.015385,0.0,0.0,0.0
9,YO! London Harvey Nichols,0.0,0.0,0.019608,0.0,0.019608,0.0,0.039216,0.0,0.019608,...,0.0,0.039216,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [95]:
# show top 5 venue types in each neighbourhood
num_top_venues = 5

for hood in london_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = london_grouped[london_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----YO!----
                venue  freq
0          Restaurant  0.20
1  Italian Restaurant  0.13
2              Bistro  0.13
3      Sandwich Place  0.13
4                Café  0.07


----YO! Bond Street----
                venue  freq
0                Café  0.09
1  Italian Restaurant  0.08
2   French Restaurant  0.08
3              Bakery  0.06
4        Burger Joint  0.06


----YO! Brent Cross----
                   venue  freq
0                   Café  0.25
1   Fast Food Restaurant  0.19
2       Asian Restaurant  0.06
3  Portuguese Restaurant  0.06
4          Deli / Bodega  0.06


----YO! Bromley----
                venue  freq
0                Café  0.14
1         Pizza Place  0.08
2      Sandwich Place  0.08
3  Italian Restaurant  0.06
4  Mexican Restaurant  0.06


----YO! Finchley Road----
                venue  freq
0                Café  0.20
1         Pizza Place  0.11
2  Italian Restaurant  0.09
3              Bakery  0.06
4  Chinese Restaurant  0.06


----YO! Fulham Broadway---

In [97]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
london_venues_sorted = pd.DataFrame(columns=columns)
london_venues_sorted['Neighbourhood'] = london_grouped['Neighbourhood']

for ind in np.arange(london_grouped.shape[0]):
    london_venues_sorted.iloc[ind, 1:] = return_most_common_venues(london_grouped.iloc[ind, :], num_top_venues)

london_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,YO!,Restaurant,Italian Restaurant,Bistro,Sandwich Place,Asian Restaurant,Sushi Restaurant,Bakery,Café,Seafood Restaurant,Fast Food Restaurant
1,YO! Bond Street,Café,Italian Restaurant,French Restaurant,Bakery,Burger Joint,Sandwich Place,English Restaurant,Deli / Bodega,Sushi Restaurant,Restaurant
2,YO! Brent Cross,Café,Fast Food Restaurant,Portuguese Restaurant,Deli / Bodega,Donut Shop,Sandwich Place,Burger Joint,Snack Place,Pizza Place,Sushi Restaurant
3,YO! Bromley,Café,Pizza Place,Sandwich Place,Italian Restaurant,Bakery,Mexican Restaurant,Burger Joint,Fast Food Restaurant,Sushi Restaurant,Irish Pub
4,YO! Finchley Road,Café,Pizza Place,Italian Restaurant,Asian Restaurant,Chinese Restaurant,Japanese Restaurant,Bakery,Restaurant,Sandwich Place,Diner


In [98]:
# set number of clusters
kclusters = 10

london_grouped_clustering = london_grouped.drop('Neighbourhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(london_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 6, 3, 0, 5, 7, 1, 0, 8, 6], dtype=int32)

In [100]:
# add clustering labels
london_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

london_merged = dflondon

london_merged = london_merged.join(london_venues_sorted.set_index('Neighbourhood'), on='Name')

london_merged.head()


Unnamed: 0,Name,Postcode,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,YO! London Harvey Nichols,SW1X 7RJ,51.501584,-0.159805,6,Italian Restaurant,Café,Restaurant,French Restaurant,Steakhouse,Japanese Restaurant,Middle Eastern Restaurant,Gastropub,Bakery,BBQ Joint
1,YO! Fulham Broadway,SW6 1BW,51.480416,-0.194515,7,Pizza Place,Gastropub,Café,Burger Joint,Fast Food Restaurant,Portuguese Restaurant,Sandwich Place,French Restaurant,Food Court,Japanese Restaurant
2,YO! Southbank Centre Festival Hall,SE1 8XX,51.505767,-0.116825,6,Restaurant,Italian Restaurant,Sandwich Place,Café,Sushi Restaurant,Bakery,Asian Restaurant,Fish & Chips Shop,Indian Restaurant,Food Truck
3,YO! Bond Street,W1C 2AQ,51.514054,-0.147643,6,Café,Italian Restaurant,French Restaurant,Bakery,Burger Joint,Sandwich Place,English Restaurant,Deli / Bodega,Sushi Restaurant,Restaurant
4,YO! Russell Square,WC1N 1AE,51.52418,-0.123922,5,Café,Sandwich Place,Italian Restaurant,Bakery,Pizza Place,Indian Restaurant,Deli / Bodega,Fish & Chips Shop,Greek Restaurant,French Restaurant


In [101]:
# create map
london_map_clusters = folium.Map(location=[latitude_london, longitude_london], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, neighbourhood, cluster in zip(london_merged['Latitude'], london_merged['Longitude'], london_merged['Name'], london_merged['Cluster Labels']):
    label = folium.Popup(str(neighbourhood) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(london_map_clusters)
       
london_map_clusters

## London and New York

In [124]:
dfny.head()

Unnamed: 0,Zip,City,State,Latitude,Longitude
0,10001,New York,NY,40.750742,-73.99653
1,10002,New York,NY,40.71704,-73.987
2,10003,New York,NY,40.732509,-73.98935
3,10005,New York,NY,40.706019,-74.00858
4,10006,New York,NY,40.707904,-74.01342


In [125]:
dflondon.head()

Unnamed: 0,Name,Postcode,Latitude,Longitude
0,YO! London Harvey Nichols,SW1X 7RJ,51.501584,-0.159805
1,YO! Fulham Broadway,SW6 1BW,51.480416,-0.194515
2,YO! Southbank Centre Festival Hall,SE1 8XX,51.505767,-0.116825
3,YO! Bond Street,W1C 2AQ,51.514054,-0.147643
4,YO! Russell Square,WC1N 1AE,51.52418,-0.123922


In [130]:
# amend new york data to fit london
dfnytemp = dfny.drop(['City', 'State'], axis = 1)
dfnytemp = dfnytemp.rename(columns= {"Zip": "Name"})
dfnytemp.head()

Unnamed: 0,Name,Latitude,Longitude
0,10001,40.750742,-73.99653
1,10002,40.71704,-73.987
2,10003,40.732509,-73.98935
3,10005,40.706019,-74.00858
4,10006,40.707904,-74.01342


In [133]:
# merge two datasets
DFLDNY = pd.concat([dflondon, dfnytemp], sort=False)
DFLDNY.dtypes

Name          object
Postcode      object
Latitude     float64
Longitude    float64
dtype: object
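
Stacking the two frames leaves `Postcode` empty for the New York rows, because that column exists only in the London data, and the row index repeats unless it is reset. A minimal sketch with `pd.concat` on made-up rows (the `_toy` frames are hypothetical) shows both effects:

```python
import pandas as pd

# Made-up rows standing in for dflondon and dfnytemp
london_toy = pd.DataFrame({
    'Name': ['YO! Example'],
    'Postcode': ['AB1 2CD'],
    'Latitude': [51.5],
    'Longitude': [-0.1],
})
ny_toy = pd.DataFrame({
    'Name': ['10001'],
    'Latitude': [40.75],
    'Longitude': [-73.99],
})

# ignore_index=True gives the combined frame a fresh 0..n-1 index
combined = pd.concat([london_toy, ny_toy], sort=False, ignore_index=True)
print(combined)
print(combined.dtypes)
```

Since `getNearbyVenues` reads only `Name`, `Latitude` and `Longitude`, the missing postcodes do not affect the venue lookup.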

In [138]:
# find venues in proximity
LDNY_venues = getNearbyVenues(names=DFLDNY['Name'],
                                   latitudes=DFLDNY['Latitude'],
                                   longitudes=DFLDNY['Longitude']
                                  )

YO! London Harvey Nichols
YO! Fulham Broadway
YO! Southbank Centre Festival Hall
YO! Bond Street
YO! Russell Square
YO! London Selfridges
YO! St Pancras Station
YO! Waterloo Station
YO! Victoria Station
YO! St Paul's
YO! Finchley Road
YO! Westfield Stratford
YO! Brent Cross
YO! Kitchen Westfield White City
YO! Bromley
YO! Richmond
YO! Kingston
YO! Heathrow T3
YO!
10001
10002
10003
10005
10006
10007
10008
10009
10010
10011
10012
10013
10014
10015
10016
10017
10018
10019
10020
10021
10022
10023
10024
10025
10026
10027
10028
10029
10030
10031
10032
10033
10034
10035
10036
10037
10038
10039
10040
10041
10043
10044
10045
10046
10047
10048
10055
10060
10069
10072
10079
10080
10081
10082
10087
10090
10094
10095
10096
10098
10099
10101
10102
10103
10104
10105
10106
10107
10108
10109
10110
10111
10112
10113
10114
10115
10116
10117
10118
10119
10120
10121
10122
10123
10124
10125
10126
10128
10129
10130
10131
10132
10133
10138
10149
10150
10151
10152
10153
10154
10155
10156
10157
10158
10159
1016

In [139]:
# show counts of each venue found by neighbourhood
LDNY_venues.groupby('Neighbourhood').count()

Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
10001,72,72,72,72,72,72
10002,100,100,100,100,100,100
10003,78,78,78,78,78,78
10005,100,100,100,100,100,100
10006,58,58,58,58,58,58
10007,74,74,74,74,74,74
10008,70,70,70,70,70,70
10009,73,73,73,73,73,73
10010,82,82,82,82,82,82
10011,69,69,69,69,69,69


In [140]:
# one hot encoding
LDNY_onehot = pd.get_dummies(LDNY_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
LDNY_onehot['Neighbourhood'] = LDNY_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [LDNY_onehot.columns[-1]] + list(LDNY_onehot.columns[:-1])
LDNY_onehot = LDNY_onehot[fixed_columns]

In [147]:
# mean frequency (share, 0 to 1) of each venue category per neighbourhood
LDNY_grouped = LDNY_onehot.groupby('Neighbourhood').mean().reset_index()
LDNY_grouped.tail()

Unnamed: 0,Neighbourhood,Afghan Restaurant,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,Austrian Restaurant,BBQ Joint,...,Thai Restaurant,Theme Restaurant,Tonkatsu Restaurant,Turkish Restaurant,Udon Restaurant,Ukrainian Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant,Wings Joint
180,YO! St Pancras Station,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,...,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
181,YO! St Paul's,0.0,0.0,0.015152,0.0,0.0,0.030303,0.0,0.0,0.0,...,0.0,0.0,0.0,0.015152,0.0,0.0,0.0,0.0,0.060606,0.0
182,YO! Victoria Station,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,...,0.03,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0
183,YO! Waterloo Station,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.011111,...,0.011111,0.0,0.0,0.011111,0.0,0.0,0.0,0.0,0.011111,0.0
184,YO! Westfield Stratford,0.0,0.0,0.018868,0.0,0.0,0.018868,0.0,0.0,0.0,...,0.037736,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
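
The one-hot and group-by-mean steps above can be checked on a toy table. The sketch below uses made-up venues. The point is that each row of the grouped frame holds the share of a neighbourhood's venues in each category, so every row sums to 1. Passing `dtype=float` to `get_dummies` is an addition here, made so the result does not depend on the pandas version's default dummy dtype:

```python
import pandas as pd

# Made-up venue list: four venues in neighbourhood A, two in B
venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Bakery", "Pizza Place", "Bakery", "Bakery"],
})

# One column per category, 1.0 where the venue belongs to it
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=float)
onehot.insert(0, "Neighbourhood", venues["Neighbourhood"])

# Averaging the indicator columns gives the share of each category
grouped = onehot.groupby("Neighbourhood").mean().reset_index()
print(grouped)
```

Inserting `Neighbourhood` at position 0 also avoids the separate column-reordering step.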


In [148]:
# show top 5 venue types by neighbourhood
num_top_venues = 5

for hood in LDNY_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = LDNY_grouped[LDNY_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----10001----
                 venue  freq
0                 Café  0.08
1          Salad Place  0.07
2        Deli / Bodega  0.07
3  American Restaurant  0.06
4       Sandwich Place  0.06


----10002----
                venue  freq
0  Mexican Restaurant  0.08
1    Asian Restaurant  0.07
2         Pizza Place  0.06
3                Café  0.06
4  Chinese Restaurant  0.05


----10003----
                      venue  freq
0       Japanese Restaurant  0.08
1               Pizza Place  0.08
2        Italian Restaurant  0.06
3  Mediterranean Restaurant  0.06
4       American Restaurant  0.05


----10005----
                 venue  freq
0       Sandwich Place  0.09
1  American Restaurant  0.08
2          Pizza Place  0.06
3        Deli / Bodega  0.06
4          Salad Place  0.06


----10006----
                 venue  freq
0          Pizza Place  0.12
1  American Restaurant  0.09
2   Mexican Restaurant  0.07
3           Steakhouse  0.07
4                 Café  0.05


----10007----
            

                venue  freq
0              Bakery  0.09
1  Italian Restaurant  0.09
2     Thai Restaurant  0.06
3          Restaurant  0.06
4  Chinese Restaurant  0.04


----10047----
                venue  freq
0              Bakery  0.09
1  Italian Restaurant  0.09
2     Thai Restaurant  0.06
3          Restaurant  0.06
4  Chinese Restaurant  0.04


----10048----
               venue  freq
0        Pizza Place  0.08
1     Sandwich Place  0.08
2       Burger Joint  0.07
3  French Restaurant  0.05
4               Café  0.05


----10055----
                venue  freq
0              Bakery  0.09
1  Italian Restaurant  0.09
2     Thai Restaurant  0.06
3          Restaurant  0.06
4  Chinese Restaurant  0.04


----10060----
                venue  freq
0              Bakery  0.09
1  Italian Restaurant  0.09
2     Thai Restaurant  0.06
3          Restaurant  0.06
4  Chinese Restaurant  0.04


----10069----
                venue  freq
0                Café  0.21
1          Food Truck  0.21
2 

                venue  freq
0              Bakery  0.09
1  Italian Restaurant  0.09
2     Thai Restaurant  0.06
3          Restaurant  0.06
4  Chinese Restaurant  0.04


----10130----
                venue  freq
0              Bakery  0.09
1  Italian Restaurant  0.09
2     Thai Restaurant  0.06
3          Restaurant  0.06
4  Chinese Restaurant  0.04


----10131----
                venue  freq
0              Bakery  0.09
1  Italian Restaurant  0.09
2     Thai Restaurant  0.06
3          Restaurant  0.06
4  Chinese Restaurant  0.04


----10132----
                venue  freq
0              Bakery  0.09
1  Italian Restaurant  0.09
2     Thai Restaurant  0.06
3          Restaurant  0.06
4  Chinese Restaurant  0.04


----10133----
                venue  freq
0              Bakery  0.09
1  Italian Restaurant  0.09
2     Thai Restaurant  0.06
3          Restaurant  0.06
4  Chinese Restaurant  0.04


----10138----
                venue  freq
0              Bakery  0.09
1  Italian Restaurant  0

                venue  freq
0              Bakery  0.09
1  Italian Restaurant  0.09
2     Thai Restaurant  0.06
3          Restaurant  0.06
4  Chinese Restaurant  0.04


----10242----
                venue  freq
0              Bakery  0.09
1  Italian Restaurant  0.09
2     Thai Restaurant  0.06
3          Restaurant  0.06
4  Chinese Restaurant  0.04


----10249----
                venue  freq
0              Bakery  0.09
1  Italian Restaurant  0.09
2     Thai Restaurant  0.06
3          Restaurant  0.06
4  Chinese Restaurant  0.04


----10250----
                venue  freq
0         Pizza Place  0.13
1          Food Truck  0.07
2      Sandwich Place  0.07
3  Chinese Restaurant  0.06
4              Bakery  0.06


----10256----
                venue  freq
0              Bakery  0.09
1  Italian Restaurant  0.09
2     Thai Restaurant  0.06
3          Restaurant  0.06
4  Chinese Restaurant  0.04


----10257----
                venue  freq
0              Bakery  0.09
1  Italian Restaurant  0

                venue  freq
0  Italian Restaurant  0.14
1                Café  0.12
2              Bakery  0.10
3          Restaurant  0.10
4    Sushi Restaurant  0.07


----YO! Russell Square----
               venue  freq
0               Café  0.23
1     Sandwich Place  0.09
2  Indian Restaurant  0.06
3      Deli / Bodega  0.06
4        Pizza Place  0.06


----YO! Southbank Centre Festival Hall----
                venue  freq
0          Restaurant  0.09
1  Italian Restaurant  0.08
2      Sandwich Place  0.08
3                Café  0.07
4    Sushi Restaurant  0.05


----YO! St Pancras Station----
                venue  freq
0                Café  0.12
1  Italian Restaurant  0.10
2      Sandwich Place  0.07
3        Burger Joint  0.07
4      Breakfast Spot  0.07


----YO! St Paul's----
                   venue  freq
0     Italian Restaurant  0.14
1                   Café  0.11
2         Sandwich Place  0.11
3    Japanese Restaurant  0.08
4  Vietnamese Restaurant  0.06


----YO! Victori

In [150]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions beyond 3rd use the generic 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
LDNY_venues_sorted = pd.DataFrame(columns=columns)
LDNY_venues_sorted['Neighbourhood'] = LDNY_grouped['Neighbourhood']

for ind in np.arange(LDNY_grouped.shape[0]):
    LDNY_venues_sorted.iloc[ind, 1:] = return_most_common_venues(LDNY_grouped.iloc[ind, :], num_top_venues)

LDNY_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,10001,Café,Deli / Bodega,Salad Place,American Restaurant,Sandwich Place,Burger Joint,Thai Restaurant,Pizza Place,Bagel Shop,Restaurant
1,10002,Mexican Restaurant,Asian Restaurant,Pizza Place,Café,Chinese Restaurant,Bakery,Ramen Restaurant,Sandwich Place,Japanese Restaurant,American Restaurant
2,10003,Japanese Restaurant,Pizza Place,Italian Restaurant,Mediterranean Restaurant,American Restaurant,Sushi Restaurant,Vegetarian / Vegan Restaurant,Mexican Restaurant,Chinese Restaurant,Salad Place
3,10005,Sandwich Place,American Restaurant,Salad Place,Pizza Place,Deli / Bodega,Food Truck,Italian Restaurant,Mexican Restaurant,Café,New American Restaurant
4,10006,Pizza Place,American Restaurant,Mexican Restaurant,Steakhouse,Café,Food Truck,Falafel Restaurant,Sandwich Place,Salad Place,Deli / Bodega
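
The column-naming loop above relies on a three-element suffix list, so it is only correct up to 20 columns (position 21 would come out as "21th"). A small sketch of a general suffix helper, in case `num_top_venues` is ever raised. `ordinal` is a hypothetical name, not part of the notebook:

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # Numbers ending in 11, 12 or 13 take 'th' despite their last digit
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["Neighbourhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns[1], "|", columns[4], "|", columns[-1])
```

This also removes the need for the `try`/`except` around the list lookup.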


In [152]:
# set number of clusters
kclusters = 10

LDNY_grouped_clustering = LDNY_grouped.drop('Neighbourhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(LDNY_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 3, 0, 1, 1, 8, 2, 3, 0, 0], dtype=int32)
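
The notebook fixes `kclusters = 10` without showing how that value was chosen. One common, informal check is to compare the within-cluster sum of squares (`inertia_`) across several values of k and look for the point where it stops dropping sharply. The sketch below runs on synthetic points rather than `LDNY_grouped_clustering`. The data, the range of k, and the explicit `n_init=10` are assumptions for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)

# Three well-separated synthetic blobs standing in for the venue-frequency rows
centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
points = np.vstack([c + 0.3 * rng.randn(30, 2) for c in centres])

# Fit k-means for k = 1..5 and record the inertia of each fit
inertias = []
for k in range(1, 6):
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(points)
    inertias.append(model.inertia_)

for k, value in zip(range(1, 6), inertias):
    print(k, round(value, 2))

# Number of points assigned to each cluster for k = 3
labels = KMeans(n_clusters=3, random_state=0, n_init=10).fit_predict(points)
print(np.bincount(labels))
```

Applied to the real data, the same loop would take `LDNY_grouped_clustering` in place of `points`. `np.bincount(kmeans.labels_)` is also a quick way to spot clusters that contain only one or two neighbourhoods.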

In [158]:
# add clustering labels
LDNY_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

LDNY_merged = DFLDNY

LDNY_merged = LDNY_merged.join(LDNY_venues_sorted.set_index('Neighbourhood'), on='Name')

LDNY_merged = LDNY_merged.drop(['Postcode'], axis = 1)
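
`join` performs a left join here, so any row of `DFLDNY` whose `Name` has no entry in `LDNY_venues_sorted` (for example, a location for which no venues were returned) would receive `NaN` in `Cluster Labels`, and the column would become float. A sketch on toy frames showing the effect and one way to handle it. Dropping the unmatched rows is an assumption, not something the notebook does:

```python
import pandas as pd

# Toy stand-ins: three places, but cluster labels for only two of them
places = pd.DataFrame({"Name": ["A", "B", "C"],
                       "Latitude": [51.5, 40.7, 40.8],
                       "Longitude": [-0.1, -74.0, -73.9]})
labels = pd.DataFrame({"Neighbourhood": ["A", "B"],
                       "Cluster Labels": [4, 3]})

# Left join on Name: place C has no match and gets NaN
merged = places.join(labels.set_index("Neighbourhood"), on="Name")
print(merged["Cluster Labels"].isna().sum())

# Drop unmatched rows and restore an integer label column
clean = merged.dropna(subset=["Cluster Labels"]).copy()
clean["Cluster Labels"] = clean["Cluster Labels"].astype(int)
print(clean)
```

An integer label matters for the map cells, which use the label as a list index when picking a colour.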


In [154]:
# create map
LDNY_map_clusters = folium.Map(location=[latitude_ny, longitude_ny], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, neighbourhood, cluster in zip(LDNY_merged['Latitude'], LDNY_merged['Longitude'], LDNY_merged['Name'], LDNY_merged['Cluster Labels']):
    label = folium.Popup(str(neighbourhood) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(LDNY_map_clusters)
       
LDNY_map_clusters

In [155]:
# create map
LDNY_map_clusters = folium.Map(location=[latitude_london, longitude_london], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, neighbourhood, cluster in zip(LDNY_merged['Latitude'], LDNY_merged['Longitude'], LDNY_merged['Name'], LDNY_merged['Cluster Labels']):
    label = folium.Popup(str(neighbourhood) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(LDNY_map_clusters)
       
LDNY_map_clusters
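
Both map cells build the same palette and index it with `rainbow[cluster-1]`, which sends cluster 0 to the last colour through negative indexing. A sketch of a more direct label-to-colour mapping using only the standard library. `cluster_colours` is a hypothetical helper, and the HSV palette is a stand-in for `cm.rainbow`, so the colours will not match the notebook's maps:

```python
import colorsys


def cluster_colours(k):
    """Return k evenly spaced hex colours, one per cluster label 0..k-1."""
    colours = []
    for i in range(k):
        # Spread the hues evenly around the colour wheel
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.9, 0.9)
        colours.append("#{:02x}{:02x}{:02x}".format(
            int(round(r * 255)), int(round(g * 255)), int(round(b * 255))))
    return colours


palette = cluster_colours(10)
for label in (0, 3, 9):
    print(label, palette[label])
```

With this mapping the marker call would use `palette[cluster]` directly, and the same palette could be shared by the New York and London maps.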

In [159]:
LDNY_merged


Unnamed: 0,Name,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,YO! London Harvey Nichols,51.501584,-0.159805,4,Italian Restaurant,Café,Restaurant,French Restaurant,Bakery,Steakhouse,Middle Eastern Restaurant,Japanese Restaurant,Gastropub,English Restaurant
1,YO! Fulham Broadway,51.480416,-0.194515,3,Pizza Place,Gastropub,Café,Sandwich Place,Burger Joint,Fast Food Restaurant,Portuguese Restaurant,Asian Restaurant,Brazilian Restaurant,Modern European Restaurant
2,YO! Southbank Centre Festival Hall,51.505767,-0.116825,4,Restaurant,Italian Restaurant,Sandwich Place,Café,Bakery,Sushi Restaurant,Indian Restaurant,Fish & Chips Shop,Asian Restaurant,Steakhouse
3,YO! Bond Street,51.514054,-0.147643,4,Café,French Restaurant,Italian Restaurant,Burger Joint,Bakery,Sandwich Place,Restaurant,English Restaurant,Sushi Restaurant,Deli / Bodega
4,YO! Russell Square,51.524180,-0.123922,4,Café,Sandwich Place,Pizza Place,Deli / Bodega,Italian Restaurant,Indian Restaurant,Bakery,Bistro,Restaurant,Chinese Restaurant
5,YO! London Selfridges,51.514586,-0.152839,4,Italian Restaurant,Café,Restaurant,French Restaurant,Sandwich Place,Middle Eastern Restaurant,Burger Joint,Japanese Restaurant,Chinese Restaurant,Bakery
6,YO! St Pancras Station,51.531428,-0.126147,4,Café,Italian Restaurant,Breakfast Spot,Sandwich Place,Burger Joint,Sushi Restaurant,Pizza Place,Gastropub,English Restaurant,Modern European Restaurant
7,YO! Waterloo Station,51.503146,-0.113259,4,Sandwich Place,Bakery,Restaurant,Café,Italian Restaurant,Burger Joint,Japanese Restaurant,Breakfast Spot,Korean Restaurant,Fish & Chips Shop
8,YO! Victoria Station,51.494974,-0.144354,4,Italian Restaurant,Café,Sandwich Place,Restaurant,Sushi Restaurant,Bakery,Indian Restaurant,Mediterranean Restaurant,English Restaurant,Thai Restaurant
9,YO! St Paul's,51.513354,-0.099841,4,Italian Restaurant,Café,Sandwich Place,Japanese Restaurant,Vietnamese Restaurant,Salad Place,Sushi Restaurant,Burger Joint,Falafel Restaurant,Restaurant


In [201]:
LDNY_merged.loc[LDNY_merged['Cluster Labels'] == 3]

Unnamed: 0,Name,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,YO! Fulham Broadway,51.480416,-0.194515,3,Pizza Place,Gastropub,Café,Sandwich Place,Burger Joint,Fast Food Restaurant,Portuguese Restaurant,Asian Restaurant,Brazilian Restaurant,Modern European Restaurant
11,YO! Westfield Stratford,51.543408,-0.006244,3,Sandwich Place,Café,Pizza Place,Burger Joint,Bakery,Italian Restaurant,Lebanese Restaurant,Thai Restaurant,Tapas Restaurant,Diner
13,YO! Kitchen Westfield White City,51.508054,-0.22367,3,Burger Joint,Bakery,Café,Pizza Place,Middle Eastern Restaurant,Chinese Restaurant,Sushi Restaurant,Sandwich Place,Falafel Restaurant,Indian Restaurant
0,10001,40.750742,-73.99653,3,Café,Deli / Bodega,Salad Place,American Restaurant,Sandwich Place,Burger Joint,Thai Restaurant,Pizza Place,Bagel Shop,Restaurant
1,10002,40.71704,-73.987,3,Mexican Restaurant,Asian Restaurant,Pizza Place,Café,Chinese Restaurant,Bakery,Ramen Restaurant,Sandwich Place,Japanese Restaurant,American Restaurant
7,10009,40.727093,-73.97864,3,Deli / Bodega,Pizza Place,Mexican Restaurant,Italian Restaurant,Korean Restaurant,Vegetarian / Vegan Restaurant,American Restaurant,Ramen Restaurant,Chinese Restaurant,Sushi Restaurant
23,10025,40.798502,-73.96811,3,Mexican Restaurant,Indian Restaurant,Pizza Place,Latin American Restaurant,Bagel Shop,Café,Szechuan Restaurant,Japanese Restaurant,Chinese Restaurant,Vietnamese Restaurant
24,10026,40.802853,-73.95471,3,African Restaurant,Deli / Bodega,Caribbean Restaurant,Southern / Soul Food Restaurant,Seafood Restaurant,Mexican Restaurant,Café,Indian Restaurant,Sandwich Place,Fried Chicken Joint
30,10032,40.840686,-73.94154,3,Pizza Place,Mexican Restaurant,Latin American Restaurant,Deli / Bodega,Chinese Restaurant,Bakery,Diner,American Restaurant,Sandwich Place,Thai Restaurant
31,10033,40.848764,-73.93496,3,Deli / Bodega,Bakery,Spanish Restaurant,Mexican Restaurant,Donut Shop,Pizza Place,Latin American Restaurant,Chinese Restaurant,Caribbean Restaurant,Snack Place


In [203]:
LDNY_merged.loc[LDNY_merged['Cluster Labels'] == 4]

Unnamed: 0,Name,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,YO! London Harvey Nichols,51.501584,-0.159805,4,Italian Restaurant,Café,Restaurant,French Restaurant,Bakery,Steakhouse,Middle Eastern Restaurant,Japanese Restaurant,Gastropub,English Restaurant
2,YO! Southbank Centre Festival Hall,51.505767,-0.116825,4,Restaurant,Italian Restaurant,Sandwich Place,Café,Bakery,Sushi Restaurant,Indian Restaurant,Fish & Chips Shop,Asian Restaurant,Steakhouse
3,YO! Bond Street,51.514054,-0.147643,4,Café,French Restaurant,Italian Restaurant,Burger Joint,Bakery,Sandwich Place,Restaurant,English Restaurant,Sushi Restaurant,Deli / Bodega
4,YO! Russell Square,51.52418,-0.123922,4,Café,Sandwich Place,Pizza Place,Deli / Bodega,Italian Restaurant,Indian Restaurant,Bakery,Bistro,Restaurant,Chinese Restaurant
5,YO! London Selfridges,51.514586,-0.152839,4,Italian Restaurant,Café,Restaurant,French Restaurant,Sandwich Place,Middle Eastern Restaurant,Burger Joint,Japanese Restaurant,Chinese Restaurant,Bakery
6,YO! St Pancras Station,51.531428,-0.126147,4,Café,Italian Restaurant,Breakfast Spot,Sandwich Place,Burger Joint,Sushi Restaurant,Pizza Place,Gastropub,English Restaurant,Modern European Restaurant
7,YO! Waterloo Station,51.503146,-0.113259,4,Sandwich Place,Bakery,Restaurant,Café,Italian Restaurant,Burger Joint,Japanese Restaurant,Breakfast Spot,Korean Restaurant,Fish & Chips Shop
8,YO! Victoria Station,51.494974,-0.144354,4,Italian Restaurant,Café,Sandwich Place,Restaurant,Sushi Restaurant,Bakery,Indian Restaurant,Mediterranean Restaurant,English Restaurant,Thai Restaurant
9,YO! St Paul's,51.513354,-0.099841,4,Italian Restaurant,Café,Sandwich Place,Japanese Restaurant,Vietnamese Restaurant,Salad Place,Sushi Restaurant,Burger Joint,Falafel Restaurant,Restaurant
10,YO! Finchley Road,51.547945,-0.18143,4,Café,Pizza Place,Italian Restaurant,Restaurant,Japanese Restaurant,Asian Restaurant,Chinese Restaurant,Bakery,Burger Joint,Portuguese Restaurant
