# The Battle of Neighborhoods (Week 2)

# Business problem

  Jerry is an international business man and he is always looking for new opportunities around the world. Recently, He wants to start his restaurant business in north America. After tons of research, he is finally down to two cities: New York and Toronto. Both cities are very diverse and are the financial capitals of their respective countries. Jerry thinks the safest way is to compare the neighborhoods of the two cities and determine how similar or dissimilar they are. And then he can get more helpful information for him making decisions.

# Data description

  Firstly, we need to fetch and explore neighborhoods data of those two cities from Foursquare API. The information we want to focus on are restaurants, coffee shops, house prices and crime rate. We will choose one venue from each city: Manhattan for New York and Downtown for Toronto. We need to apply Neighborhood Segmentation and Clustering to analyzing the neighborhood data and prioritize the best restaurant location in both cities based on foot traffic and cost. Lastly, we can decide which city is the better place for Jerry to get start. 

# Methodology

  We need to explore two cities one by one and the methodology will be same. Firstly, we will take the processed Toronto data from week 3 but only include Downtown of Toronto for this project. For the New York Manhattan data, we will extract that part of data from the previous data file,  Then explore both cities by using Foursquare API and visualize the results separately.



# Data  preparation-Downtown of Toronto

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import urllib
from bs4 import BeautifulSoup
import requests
from urllib.request import urlopen
import json
from pandas.io.json import json_normalize
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim 


In [2]:
!conda install -c conda-forge folium=0.5.0 --yes
import folium
from folium import plugins

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    altair-3.2.0               |           py36_0         770 KB  conda-forge
    branca-0.3.1               |             py_0          25 KB  conda-forge
    certifi-2019.9.11          |           py36_0         147 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    openssl-1.1.1d             |       h516909a_0         2.1 MB  conda-forge
    ca-certificates-2019.9.11  |       hecc5488_0         144 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         3.2 MB

The following NEW packages will be 

In [3]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
import urllib.request

with urllib.request.urlopen("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M") as url:
    Html = url.read()

Soup = BeautifulSoup(Html, 'lxml')
Soup.prettify
table = Soup.find('table', class_ ='wikitable sortable')
list1 = table.find_all('tr')
list2=[]
for i in list1:
    a = i.text.split('\n')[1:-1]
    list2.append(a)
    
list2[0][-1] = 'Neighborhood'   
df1 = pd.DataFrame(list2[1:], columns=list2[0])

df1.head()

Unnamed: 0,Postcode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


In [4]:
not_boroughs = df1.index[df1['Borough'] == 'Not assigned']

df1.drop(df1.index[not_boroughs], inplace=True)
df1.reset_index(drop=True, inplace=True)
not_neighborhood = df1.index[df1['Neighborhood'] == 'Not assigned'] 
for k in not_neighborhood:
    df1['Neighborhood'][k] ==df1['Borough'][k]  
group = df1.groupby('Postcode')                         

list_neighborhoods = group['Neighborhood'].apply(lambda x: "%s" % ', '.join(x))
list_boroughs = group['Borough'].apply(lambda x: set(x).pop())
df2 = pd.DataFrame(list(zip(list_boroughs.index, list_boroughs, list_neighborhoods)))
df2.columns = ['Postcode', 'Borough', 'Neighborhood']
df2.head(20)

Unnamed: 0,Postcode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"


In [5]:
coordinates_df = pd.read_csv('http://cocl.us/Geospatial_data')
df3 = df2.join(coordinates_df.set_index('Postal Code'), on='Postcode')
df3.head(10)

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848


In [6]:
downtown_toronto = df3[df3['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
downtown_toronto = downtown_toronto.drop(['Postcode'], axis=1)
downtown_toronto.head(10)

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Downtown Toronto,Rosedale,43.679563,-79.377529
1,Downtown Toronto,"Cabbagetown, St. James Town",43.667967,-79.367675
2,Downtown Toronto,Church and Wellesley,43.66586,-79.38316
3,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636
4,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937
5,Downtown Toronto,St. James Town,43.651494,-79.375418
6,Downtown Toronto,Berczy Park,43.644771,-79.373306
7,Downtown Toronto,Central Bay Street,43.657952,-79.387383
8,Downtown Toronto,"Adelaide, King, Richmond",43.650571,-79.384568
9,Downtown Toronto,"Harbourfront East, Toronto Islands, Union Station",43.640816,-79.381752


# Data  preparation-Manhattan 

In [7]:
!wget -q -O 'newyork_data.json' http://cocl.us/new_york_dataset

In [8]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [9]:
neighborhoods_data = newyork_data['features']

In [10]:
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']
neighborhoods = pd.DataFrame(columns=column_names)
for data in neighborhoods_data:
    borough = neighborhood_name = data['properties']['borough']
    neighborhood_name = data['properties']['name']
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    neighborhoods = neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)
neighborhoods.head()    

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [11]:
print('The dataframe has {} boroughs ans {} neighborhoods.'.format( 
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
   )
)

The dataframe has 5 boroughs ans 306 neighborhoods.


In [12]:
manhattan = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


In [13]:
CLIENT_ID = 'A22KAOOJ5K2HDCPOKCUSCRS1HG2QFU33J0GSSBA4QYGU0LG5'
CLIENT_SECRET = 'V3ERF4RHQGGDDBXE3OFK3SQERWZLGHHXGOKL1ZX3GVNSXFCY'
VERSION = '20180604'


# Downtown_toronto API & Map

In [14]:
address = 'Downtown Toronto, Toronto, ON, Canada'
geolocator = Nominatim()
location = geolocator.geocode(address)
downtown_toronto_latitude = location.latitude
downtown_toronto_longitude = location.longitude

  from ipykernel import kernelapp as app


In [15]:
Downtown_toronto_map = folium.Map(location=[downtown_toronto_latitude, downtown_toronto_longitude], zoom_start=13)
for lat, lng, label in zip(downtown_toronto['Latitude'], downtown_toronto['Longitude'], downtown_toronto['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=6,
        popup=label,
        color='orange',
        fill=True,
        fill_color='#31a5cc',
        fill_opacity=0.6,
        parse_html=False).add_to(Downtown_toronto_map)  
    
Downtown_toronto_map


In [16]:
Downtown_toronto_map = folium.Map(location=[downtown_toronto_latitude, downtown_toronto_longitude], zoom_start=13)
Food = plugins.MarkerCluster().add_to(Downtown_toronto_map)
for lat, lng, label in zip(downtown_toronto['Latitude'], downtown_toronto['Longitude'], downtown_toronto['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=6,
        popup=label,
        color='orange',
        fill=True,
        fill_color='#31a5cc',
        fill_opacity=0.6,
        parse_html=False).add_to(Food)  

Downtown_toronto_map  

# Foursquare API-Manhattan

In [17]:
CLIENT_ID = 'A22KAOOJ5K2HDCPOKCUSCRS1HG2QFU33J0GSSBA4QYGU0LG5'
CLIENT_SECRET = 'V3ERF4RHQGGDDBXE3OFK3SQERWZLGHHXGOKL1ZX3GVNSXFCY'
VERSION = '20180604'


In [18]:
address = 'Manhattan, NY'

geolocator = Nominatim()
location = geolocator.geocode(address)
manhattan_latitude = location.latitude
manhattan_longitude = location.longitude

  app.launch_new_instance()


In [19]:
manhattan_map = folium.Map(location=[manhattan_latitude, manhattan_longitude], zoom_start=10)

for lat, lng, label in zip(manhattan['Latitude'], manhattan['Longitude'], manhattan['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=6,
        popup=label,
        color='orange',
        fill=True,
        fill_color='#31a5cc',
        fill_opacity=0.6,
        parse_html=False).add_to(manhattan_map)  
manhattan_map    

In [20]:
manhattan_map = folium.Map(location=[manhattan_latitude, manhattan_longitude], zoom_start=10)

grouping = plugins.MarkerCluster().add_to(manhattan_map)

for lat, lng, label in zip(manhattan['Latitude'], manhattan['Longitude'], manhattan['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=6,
        popup=label,
        color='orange',
        fill=True,
        fill_color='#31a5cc',
        fill_opacity=0.6,
        parse_html=False).add_to(grouping)  
    
manhattan_map

# Analyzing & Clustering 

#### Downtown Toronto

In [21]:
radius = 500
limit = 20
venues = []

for lat, long, neighborhood in zip(downtown_toronto['Latitude'], downtown_toronto['Longitude'],downtown_toronto['Neighborhood']):
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        limit)
    
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))






In [22]:
venues_df1 = pd.DataFrame(venues)
venues_df1.columns = ['Neighborhood', 'NeighborhoodLatitude', 'NeighborhoodLongitude', 'Venue', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']
venues_df1.head()

Unnamed: 0,Neighborhood,NeighborhoodLatitude,NeighborhoodLongitude,Venue,VenueLatitude,VenueLongitude,VenueCategory
0,Rosedale,43.679563,-79.377529,Rosedale Park,43.682328,-79.378934,Playground
1,Rosedale,43.679563,-79.377529,Whitney Park,43.682036,-79.373788,Park
2,Rosedale,43.679563,-79.377529,Alex Murray Parkette,43.6783,-79.382773,Park
3,Rosedale,43.679563,-79.377529,Milkman's Lane,43.676352,-79.373842,Trail
4,"Cabbagetown, St. James Town",43.667967,-79.367675,Butter Chicken Factory,43.667072,-79.369184,Indian Restaurant


In [23]:
venues_df1.groupby('Neighborhood').count()
venues_type_onehot1 = pd.get_dummies(venues_df1[['VenueCategory']], prefix="", prefix_sep="")

venues_type_onehot1 ['Neighborhood'] = venues_df1['Neighborhood'] 

fixed_columns = [venues_type_onehot1.columns[-1]] + list(venues_type_onehot1.columns[:-1])
venues_type_onehot1 = venues_type_onehot1[fixed_columns]

venues_type_onehot1.head()  


Unnamed: 0,Wine Bar,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Art Gallery,Arts & Crafts Store,...,Taco Place,Tailor Shop,Taiwanese Restaurant,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [24]:
venues_grouped1 = venues_type_onehot1.groupby('Neighborhood').mean().reset_index()
venues_grouped1.head()

Unnamed: 0,Neighborhood,Wine Bar,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Art Gallery,...,Taco Place,Tailor Shop,Taiwanese Restaurant,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.05,0.05,0.0,0.0,0.0,0.05,0.0
2,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.0,0.071429,0.071429,0.071429,0.142857,0.142857,0.142857,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Cabbagetown, St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0


In [25]:
num_top_venues = 5
for hood in venues_grouped1 ['Neighborhood']:
    print("----"+hood+"----")
    temp = venues_grouped1[venues_grouped1 ['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')		



----Adelaide, King, Richmond----
              venue  freq
0  Asian Restaurant  0.10
1        Steakhouse  0.10
2  Greek Restaurant  0.05
3        Food Court  0.05
4       Pizza Place  0.05


----Berczy Park----
                venue  freq
0  Seafood Restaurant  0.10
1      Farmers Market  0.10
2            Beer Bar  0.10
3              Bistro  0.05
4            Fountain  0.05


----CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara----
              venue  freq
0    Airport Lounge  0.14
1   Airport Service  0.14
2  Airport Terminal  0.14
3   Harbor / Marina  0.07
4           Airport  0.07


----Cabbagetown, St. James Town----
           venue  freq
0           Café  0.10
1         Market  0.05
2  Jewelry Store  0.05
3           Park  0.05
4          Diner  0.05


----Central Bay Street----
                venue  freq
0         Coffee Shop  0.30
1     Bubble Tea Shop  0.10
2                 Spa  0.05
3    Sushi Restaurant  0.05
4  

In [26]:
def most_common_venues1(row, venues_rank1):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:venues_rank1]

venues_rank1 = 10
indicators = ['st', 'nd', 'rd']

columns = ['Neighborhood']
for ind in np.arange(venues_rank1):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))




In [27]:
neighborhoods_venues_sorted1 = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted1['Neighborhood'] = venues_grouped1['Neighborhood']

for ind in np.arange(venues_grouped1.shape[0]):
    neighborhoods_venues_sorted1.iloc[ind, 1:] = most_common_venues1(venues_grouped1.iloc[ind, :], venues_rank1)


In [63]:
k = 5
grouped_clustering1 = venues_grouped1.drop('Neighborhood', 1)
kmeans = KMeans(n_clusters=k, random_state=0).fit(grouped_clustering1)

merged1 = downtown_toronto
merged1['Cluster Labels'] = kmeans.labels_
merged1 = merged1.join(neighborhoods_venues_sorted1.set_index('Neighborhood'), on='Neighborhood')

merged1


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,Rosedale,43.679563,-79.377529,1,Park,Trail,Playground,Comic Shop,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church
1,Downtown Toronto,"Cabbagetown, St. James Town",43.667967,-79.367675,0,Café,Gift Shop,Restaurant,Diner,Gastropub,General Entertainment,Deli / Bodega,Indian Restaurant,Italian Restaurant,Japanese Restaurant
2,Downtown Toronto,Church and Wellesley,43.66586,-79.38316,4,Breakfast Spot,Salon / Barbershop,Mexican Restaurant,Burger Joint,Bubble Tea Shop,Park,Bookstore,Pizza Place,Pub,Ramen Restaurant
3,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636,1,Coffee Shop,Bakery,Breakfast Spot,Gym / Fitness Center,Restaurant,Spa,Pub,Mexican Restaurant,Historic Site,Performing Arts Venue
4,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937,3,Café,Sandwich Place,Comic Shop,Movie Theater,Burger Joint,Music Venue,Pizza Place,Clothing Store,Plaza,Ramen Restaurant
5,Downtown Toronto,St. James Town,43.651494,-79.375418,1,Gastropub,Hotel,Coffee Shop,Italian Restaurant,Church,Café,Middle Eastern Restaurant,Restaurant,Cosmetics Shop,Japanese Restaurant
6,Downtown Toronto,Berczy Park,43.644771,-79.373306,1,Farmers Market,Beer Bar,Seafood Restaurant,Fountain,Steakhouse,Park,Cocktail Bar,Liquor Store,Breakfast Spot,Bistro
7,Downtown Toronto,Central Bay Street,43.657952,-79.387383,1,Coffee Shop,Bubble Tea Shop,Seafood Restaurant,Japanese Restaurant,Italian Restaurant,Ramen Restaurant,Spa,Sushi Restaurant,Park,Tea Room
8,Downtown Toronto,"Adelaide, King, Richmond",43.650571,-79.384568,1,Asian Restaurant,Steakhouse,Gym / Fitness Center,Plaza,Concert Hall,Pizza Place,Seafood Restaurant,Hotel,Lounge,Speakeasy
9,Downtown Toronto,"Harbourfront East, Toronto Islands, Union Station",43.640816,-79.381752,3,Park,Plaza,Café,Hotel,Bubble Tea Shop,Supermarket,Performing Arts Venue,Lake,Salad Place,Skating Rink


In [29]:
map_clusters1 = folium.Map(location=[downtown_toronto_latitude,downtown_toronto_longitude], zoom_start=11)

x = np.arange(k)
ys = [i+x+(i*x)**2 for i in range(k)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]


markers_colors = []
for lat, lon, poi, cluster in zip(merged1['Latitude'], merged1['Longitude'], merged1['Neighborhood'], merged1['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters1)

map_clusters1


### Cluster 1

In [30]:
merged1.loc[merged1['Cluster Labels'] == 0, merged1.columns[[1] + list(range(5, merged1.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,"Cabbagetown, St. James Town",Café,Gift Shop,Restaurant,Diner,Gastropub,General Entertainment,Deli / Bodega,Indian Restaurant,Italian Restaurant,Japanese Restaurant
17,Christie,Café,Grocery Store,Park,Italian Restaurant,Coffee Shop,Restaurant,Nightclub,Baby Store,Athletics & Sports,Convenience Store


### Cluster2

In [31]:
merged1.loc[merged1['Cluster Labels'] == 1, merged1.columns[[1] + list(range(5, merged1.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Rosedale,Park,Trail,Playground,Comic Shop,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church
3,"Harbourfront, Regent Park",Coffee Shop,Bakery,Breakfast Spot,Gym / Fitness Center,Restaurant,Spa,Pub,Mexican Restaurant,Historic Site,Performing Arts Venue
5,St. James Town,Gastropub,Hotel,Coffee Shop,Italian Restaurant,Church,Café,Middle Eastern Restaurant,Restaurant,Cosmetics Shop,Japanese Restaurant
6,Berczy Park,Farmers Market,Beer Bar,Seafood Restaurant,Fountain,Steakhouse,Park,Cocktail Bar,Liquor Store,Breakfast Spot,Bistro
7,Central Bay Street,Coffee Shop,Bubble Tea Shop,Seafood Restaurant,Japanese Restaurant,Italian Restaurant,Ramen Restaurant,Spa,Sushi Restaurant,Park,Tea Room
8,"Adelaide, King, Richmond",Asian Restaurant,Steakhouse,Gym / Fitness Center,Plaza,Concert Hall,Pizza Place,Seafood Restaurant,Hotel,Lounge,Speakeasy
10,"Design Exchange, Toronto Dominion Centre",Coffee Shop,Café,Restaurant,Gym / Fitness Center,Pub,Salad Place,Japanese Restaurant,Hotel,Deli / Bodega,Beer Bar
11,"Commerce Court, Victoria Hotel",Café,Restaurant,Gastropub,Museum,Coffee Shop,Japanese Restaurant,Beer Bar,Deli / Bodega,Bakery,Gym
12,"Harbord, University of Toronto",Bakery,Restaurant,Japanese Restaurant,Bookstore,Sandwich Place,Bar,Comfort Food Restaurant,Italian Restaurant,College Gym,College Arts Building
15,Stn A PO Boxes 25 The Esplanade,Farmers Market,Cocktail Bar,Food Truck,Steakhouse,Café,Park,Clothing Store,Jazz Club,Beer Bar,Concert Hall


### Cluster3

In [32]:
merged1.loc[merged1['Cluster Labels'] == 2, merged1.columns[[1] + list(range(5, merged1.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Lounge,Airport Service,Airport Terminal,Boat or Ferry,Airport,Airport Food Court,Airport Gate,Boutique,Sculpture Garden,Harbor / Marina


### Cluster4

In [33]:
merged1.loc[merged1['Cluster Labels'] == 3, merged1.columns[[1] + list(range(5, merged1.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,"Ryerson, Garden District",Café,Sandwich Place,Comic Shop,Movie Theater,Burger Joint,Music Venue,Pizza Place,Clothing Store,Plaza,Ramen Restaurant
9,"Harbourfront East, Toronto Islands, Union Station",Park,Plaza,Café,Hotel,Bubble Tea Shop,Supermarket,Performing Arts Venue,Lake,Salad Place,Skating Rink
13,"Chinatown, Grange Park, Kensington Market",Café,Vietnamese Restaurant,Caribbean Restaurant,Mexican Restaurant,Organic Grocery,Arts & Crafts Store,Bakery,Bar,Belgian Restaurant,Cheese Shop


### Cluster5

In [34]:
merged1.loc[merged1['Cluster Labels'] == 4, merged1.columns[[1] + list(range(5, merged1.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Church and Wellesley,Breakfast Spot,Salon / Barbershop,Mexican Restaurant,Burger Joint,Bubble Tea Shop,Park,Bookstore,Pizza Place,Pub,Ramen Restaurant


### manhattan

In [92]:
radius = 500
limit = 20
venues = []

for lat, long, neighborhood in zip(manhattan['Latitude'], manhattan['Longitude'],manhattan['Neighborhood']):
    url = "https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        limit)
    
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))

                  

In [93]:
venues_df2 = pd.DataFrame(venues)
venues_df2.columns = ['Neighborhood', 'NeighborhoodLatitude', 'NeighborhoodLongitude', 'Venue', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']
venues_df2.head()

Unnamed: 0,Neighborhood,NeighborhoodLatitude,NeighborhoodLongitude,Venue,VenueLatitude,VenueLongitude,VenueCategory
0,Marble Hill,40.876551,-73.91066,Arturo's,40.874412,-73.910271,Pizza Place
1,Marble Hill,40.876551,-73.91066,Bikram Yoga,40.876844,-73.906204,Yoga Studio
2,Marble Hill,40.876551,-73.91066,Tibbett Diner,40.880404,-73.908937,Diner
3,Marble Hill,40.876551,-73.91066,Starbucks,40.877531,-73.905582,Coffee Shop
4,Marble Hill,40.876551,-73.91066,Blink Fitness Riverdale,40.877147,-73.905837,Gym


In [94]:
venues_df2.groupby('Neighborhood').count()
venues_type_onehot2 = pd.get_dummies(venues_df2[['VenueCategory']], prefix="", prefix_sep="")

venues_type_onehot2['Neighborhood'] = venues_df2['Neighborhood'] 

fixed_columns = [venues_type_onehot2.columns[-1]] + list(venues_type_onehot2.columns[:-1])
venues_type_onehot2 = venues_type_onehot2[fixed_columns]

venues_type_onehot2.head()  


Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Animal Shelter,Antique Shop,Art Gallery,Art Museum,...,Turkish Restaurant,Vegetarian / Vegan Restaurant,Veterinarian,Video Store,Vietnamese Restaurant,Waterfront,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
2,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [95]:
venues_grouped2 = venues_type_onehot2.groupby('Neighborhood').mean().reset_index()
venues_grouped2.head()

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Animal Shelter,Antique Shop,Art Gallery,Art Museum,...,Turkish Restaurant,Vegetarian / Vegan Restaurant,Veterinarian,Video Store,Vietnamese Restaurant,Waterfront,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Battery Park City,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Carnegie Hill,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.05,0.0,0.0
2,Central Harlem,0.0,0.0,0.0,0.05,0.1,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Chelsea,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Chinatown,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [96]:
num_top_venues = 5

for hood in venues_grouped2 ['Neighborhood']:
    print("----"+hood+"----")
    temp = venues_grouped2[venues_grouped2['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')		



----Battery Park City----
           venue  freq
0           Park  0.15
1     Food Court  0.10
2  Memorial Site  0.10
3    Coffee Shop  0.05
4      BBQ Joint  0.05


----Carnegie Hill----
                  venue  freq
0  Gym / Fitness Center  0.10
1                   Gym  0.10
2           Coffee Shop  0.10
3    Italian Restaurant  0.10
4             Bookstore  0.05


----Central Harlem----
                 venue  freq
0    French Restaurant  0.10
1  American Restaurant  0.10
2         Dessert Shop  0.05
3              Library  0.05
4          Pizza Place  0.05


----Chelsea----
                venue  freq
0             Theater  0.10
1        Cupcake Shop  0.05
2                 Bar  0.05
3                Café  0.05
4  Chinese Restaurant  0.05


----Chinatown----
                 venue  freq
0   Chinese Restaurant  0.15
1                  Spa  0.10
2       Sandwich Place  0.10
3          Pizza Place  0.05
4  American Restaurant  0.05


----Civic Center----
                  venue  freq


In [97]:
def most_common_venues2(row, venues_rank2):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:venues_rank2]


In [98]:
venues_rank2 = 10
indicators = ['st', 'nd', 'rd']

columns = ['Neighborhood']
for ind in np.arange(venues_rank2):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))


In [99]:
neighborhoods_venues_sorted2 = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted2['Neighborhood'] = venues_grouped2['Neighborhood']

for ind in np.arange(venues_grouped2.shape[0]):
    neighborhoods_venues_sorted2.iloc[ind, 1:] = most_common_venues2(venues_grouped2.iloc[ind, :], venues_rank2)
neighborhoods_venues_sorted2

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Battery Park City,Park,Memorial Site,Food Court,Performing Arts Venue,Grocery Store,Burrito Place,Sandwich Place,Shopping Mall,Plaza,Coffee Shop
1,Carnegie Hill,Coffee Shop,Gym,Gym / Fitness Center,Italian Restaurant,Pizza Place,Bookstore,Café,Gourmet Shop,Bagel Shop,Spa
2,Central Harlem,French Restaurant,American Restaurant,Boutique,Jazz Club,Café,Cycle Studio,Ethiopian Restaurant,Library,Music Venue,Beer Bar
3,Chelsea,Theater,Asian Restaurant,Steakhouse,Beer Bar,Speakeasy,Bar,Tapas Restaurant,Coffee Shop,Italian Restaurant,Cupcake Shop
4,Chinatown,Chinese Restaurant,Spa,Sandwich Place,New American Restaurant,English Restaurant,Garden Center,Museum,Cocktail Bar,Bakery,Greek Restaurant
5,Civic Center,Falafel Restaurant,Yoga Studio,Bar,Park,Dance Studio,Nail Salon,Monument / Landmark,Molecular Gastronomy Restaurant,Martial Arts Dojo,Spa
6,Clinton,Theater,Gym / Fitness Center,Peruvian Restaurant,Mediterranean Restaurant,Lounge,Café,Sporting Goods Shop,French Restaurant,Movie Theater,Dog Run
7,East Harlem,Mexican Restaurant,Pet Store,Beer Bar,Café,Steakhouse,Latin American Restaurant,French Restaurant,Thai Restaurant,Sandwich Place,Dance Studio
8,East Village,Vietnamese Restaurant,Park,Dog Run,Moroccan Restaurant,Coffee Shop,Japanese Restaurant,Bar,Korean Restaurant,Pizza Place,Bagel Shop
9,Financial District,Coffee Shop,Gym / Fitness Center,New American Restaurant,Spa,Food Truck,Monument / Landmark,Café,Steakhouse,Salad Place,Doctor's Office


In [100]:
kclusters = 5

grouped_clustering2 = venues_grouped2.drop('Neighborhood', 1)
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(grouped_clustering2)

kmeans.labels_[0:10] 



array([0, 4, 1, 1, 0, 0, 0, 0, 2, 0], dtype=int32)

In [101]:
merged2 = manhattan

merged2['Cluster_Labels'] = kmeans.labels_
merged2 = merged2.join(neighborhoods_venues_sorted2.set_index('Neighborhood'), on='Neighborhood')

merged2

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Manhattan,Marble Hill,40.876551,-73.91066,"KMeans(algorithm='auto', copy_x=True, init='k-...",0,Coffee Shop,Yoga Studio,Department Store,Sandwich Place,Miscellaneous Shop,Donut Shop,Kids Store,Steakhouse,Supplement Shop,Discount Store
1,Manhattan,Chinatown,40.715618,-73.994279,"KMeans(algorithm='auto', copy_x=True, init='k-...",4,Chinese Restaurant,Spa,Sandwich Place,New American Restaurant,English Restaurant,Garden Center,Museum,Cocktail Bar,Bakery,Greek Restaurant
2,Manhattan,Washington Heights,40.851903,-73.9369,"KMeans(algorithm='auto', copy_x=True, init='k-...",1,Wine Shop,Café,Market,Bakery,Pet Café,Deli / Bodega,Pizza Place,Coffee Shop,Ramen Restaurant,Restaurant
3,Manhattan,Inwood,40.867684,-73.92121,"KMeans(algorithm='auto', copy_x=True, init='k-...",1,Café,Bakery,Wine Bar,Park,Yoga Studio,Latin American Restaurant,Frozen Yogurt Shop,Farmers Market,Mexican Restaurant,Diner
4,Manhattan,Hamilton Heights,40.823604,-73.949688,"KMeans(algorithm='auto', copy_x=True, init='k-...",0,Yoga Studio,Mexican Restaurant,Caribbean Restaurant,Cocktail Bar,Coffee Shop,Italian Restaurant,Mediterranean Restaurant,Café,Bar,Bakery
5,Manhattan,Manhattanville,40.816934,-73.957385,"KMeans(algorithm='auto', copy_x=True, init='k-...",0,Italian Restaurant,Park,Japanese Curry Restaurant,Café,Lounge,Mexican Restaurant,Ramen Restaurant,Bike Trail,Dumpling Restaurant,Juice Bar
6,Manhattan,Central Harlem,40.815976,-73.943211,"KMeans(algorithm='auto', copy_x=True, init='k-...",0,French Restaurant,American Restaurant,Boutique,Jazz Club,Café,Cycle Studio,Ethiopian Restaurant,Library,Music Venue,Beer Bar
7,Manhattan,East Harlem,40.792249,-73.944182,"KMeans(algorithm='auto', copy_x=True, init='k-...",0,Mexican Restaurant,Pet Store,Beer Bar,Café,Steakhouse,Latin American Restaurant,French Restaurant,Thai Restaurant,Sandwich Place,Dance Studio
8,Manhattan,Upper East Side,40.775639,-73.960508,"KMeans(algorithm='auto', copy_x=True, init='k-...",2,Hotel,Boutique,Italian Restaurant,French Restaurant,Park,Spa,Bookstore,Chocolate Shop,Coffee Shop,Jazz Club
9,Manhattan,Yorkville,40.77593,-73.947118,"KMeans(algorithm='auto', copy_x=True, init='k-...",0,Wine Shop,Coffee Shop,Deli / Bodega,Liquor Store,Bagel Shop,Hobby Shop,Pub,Diner,Dog Run,Monument / Landmark


In [104]:
map_clusters2 = folium.Map(location=[manhattan_latitude, manhattan_longitude], zoom_start=10)

x = np.arange(k)
ys = [i+x+(i*x)**2 for i in range(k)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]


markers_colors = []
for lat, lon, poi, cluster in zip(merged2['Latitude'], merged2['Longitude'],merged2['Neighborhood'], merged2['Cluster_Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters2)

map_clusters2


### Cluster 1

In [108]:
merged2.loc[merged2['Cluster_Labels'] == 0, merged2.columns[[1] + list(range(5, merged2.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Marble Hill,0,Coffee Shop,Yoga Studio,Department Store,Sandwich Place,Miscellaneous Shop,Donut Shop,Kids Store,Steakhouse,Supplement Shop,Discount Store
4,Hamilton Heights,0,Yoga Studio,Mexican Restaurant,Caribbean Restaurant,Cocktail Bar,Coffee Shop,Italian Restaurant,Mediterranean Restaurant,Café,Bar,Bakery
5,Manhattanville,0,Italian Restaurant,Park,Japanese Curry Restaurant,Café,Lounge,Mexican Restaurant,Ramen Restaurant,Bike Trail,Dumpling Restaurant,Juice Bar
6,Central Harlem,0,French Restaurant,American Restaurant,Boutique,Jazz Club,Café,Cycle Studio,Ethiopian Restaurant,Library,Music Venue,Beer Bar
7,East Harlem,0,Mexican Restaurant,Pet Store,Beer Bar,Café,Steakhouse,Latin American Restaurant,French Restaurant,Thai Restaurant,Sandwich Place,Dance Studio
9,Yorkville,0,Wine Shop,Coffee Shop,Deli / Bodega,Liquor Store,Bagel Shop,Hobby Shop,Pub,Diner,Dog Run,Monument / Landmark
16,Murray Hill,0,Hotel,Jazz Club,Ramen Restaurant,Jewish Restaurant,Sushi Restaurant,Chinese Restaurant,Restaurant,Japanese Restaurant,Tea Room,Speakeasy
22,Little Italy,0,Café,Wine Bar,Sandwich Place,Ice Cream Shop,History Museum,Coffee Shop,Snack Place,Spanish Restaurant,Salon / Barbershop,Salad Place
23,Soho,0,Women's Store,Men's Store,Salon / Barbershop,Clothing Store,Cupcake Shop,Art Museum,Miscellaneous Shop,Tea Room,Dessert Shop,Arts & Crafts Store
29,Financial District,0,Coffee Shop,Gym / Fitness Center,New American Restaurant,Spa,Food Truck,Monument / Landmark,Café,Steakhouse,Salad Place,Doctor's Office


### Cluster 2

In [107]:
merged2.loc[merged2['Cluster_Labels'] == 1, merged2.columns[[1] + list(range(5, merged2.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Washington Heights,1,Wine Shop,Café,Market,Bakery,Pet Café,Deli / Bodega,Pizza Place,Coffee Shop,Ramen Restaurant,Restaurant
3,Inwood,1,Café,Bakery,Wine Bar,Park,Yoga Studio,Latin American Restaurant,Frozen Yogurt Shop,Farmers Market,Mexican Restaurant,Diner
14,Clinton,1,Theater,Gym / Fitness Center,Peruvian Restaurant,Mediterranean Restaurant,Lounge,Café,Sporting Goods Shop,French Restaurant,Movie Theater,Dog Run
19,East Village,1,Vietnamese Restaurant,Park,Dog Run,Moroccan Restaurant,Coffee Shop,Japanese Restaurant,Bar,Korean Restaurant,Pizza Place,Bagel Shop
25,Manhattan Valley,1,Bar,Yoga Studio,Hawaiian Restaurant,Grocery Store,Italian Restaurant,Korean Restaurant,Fried Chicken Joint,Mexican Restaurant,New American Restaurant,Noodle House
26,Morningside Heights,1,American Restaurant,Bookstore,Park,Sandwich Place,Coffee Shop,Farmers Market,Ice Cream Shop,Outdoor Sculpture,Burger Joint,Salad Place
27,Gramercy,1,Pizza Place,Spa,Coffee Shop,Thrift / Vintage Store,Comedy Club,Playground,Mexican Restaurant,Food Truck,Liquor Store,Bike Rental / Bike Share
35,Turtle Bay,1,Karaoke Bar,Sushi Restaurant,Tourist Information Center,Grocery Store,Ramen Restaurant,Gift Shop,Café,Cocktail Bar,Greek Restaurant,Museum
36,Tudor City,1,Park,Yoga Studio,Asian Restaurant,Pizza Place,Deli / Bodega,Café,Bridge,Martial Arts Dojo,Spanish Restaurant,Sushi Restaurant
38,Flatiron,1,Cycle Studio,Japanese Restaurant,Gym / Fitness Center,Wine Shop,Gym,Salad Place,Furniture / Home Store,Thai Restaurant,Sports Club,Café


### Cluster 3

In [109]:
merged2.loc[merged2['Cluster_Labels'] == 2, merged2.columns[[1] + list(range(5, merged2.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Upper East Side,2,Hotel,Boutique,Italian Restaurant,French Restaurant,Park,Spa,Bookstore,Chocolate Shop,Coffee Shop,Jazz Club
11,Roosevelt Island,2,Sandwich Place,Hotpot Restaurant,Japanese Restaurant,Coffee Shop,Pizza Place,Playground,Deli / Bodega,Residential Building (Apartment / Condo),Bus Line,Scenic Lookout
13,Lincoln Square,2,Theater,Concert Hall,Performing Arts Venue,Indie Movie Theater,Library,Gift Shop,College Arts Building,Plaza,Fountain,Circus
20,Lower East Side,2,Japanese Restaurant,Coffee Shop,Art Gallery,Yoga Studio,Pet Café,Cocktail Bar,Chinese Restaurant,Café,Filipino Restaurant,Bubble Tea Shop
21,Tribeca,2,Park,Wine Shop,Greek Restaurant,Cycle Studio,Coffee Shop,Poke Place,Café,Salad Place,Spa,Steakhouse
24,West Village,2,Coffee Shop,Cocktail Bar,Chinese Restaurant,Italian Restaurant,Accessories Store,Bakery,Cosmetics Shop,New American Restaurant,Candy Store,Mediterranean Restaurant
28,Battery Park City,2,Park,Memorial Site,Food Court,Performing Arts Venue,Grocery Store,Burrito Place,Sandwich Place,Shopping Mall,Plaza,Coffee Shop
30,Carnegie Hill,2,Coffee Shop,Gym,Gym / Fitness Center,Italian Restaurant,Pizza Place,Bookstore,Café,Gourmet Shop,Bagel Shop,Spa
33,Midtown South,2,Korean Restaurant,Lingerie Store,Hotel,Grocery Store,Cosmetics Shop,Dessert Shop,Coffee Shop,Clothing Store,Building,Food Truck


### Cluster 4

In [110]:
merged2.loc[merged2['Cluster_Labels'] == 3, merged2.columns[[1] + list(range(5, merged2.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,Chelsea,3,Theater,Asian Restaurant,Steakhouse,Beer Bar,Speakeasy,Bar,Tapas Restaurant,Coffee Shop,Italian Restaurant,Cupcake Shop


### Cluster 5

In [111]:
merged2.loc[merged2['Cluster_Labels'] == 4, merged2.columns[[1] + list(range(5, merged2.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Chinatown,4,Chinese Restaurant,Spa,Sandwich Place,New American Restaurant,English Restaurant,Garden Center,Museum,Cocktail Bar,Bakery,Greek Restaurant
10,Lenox Hill,4,Gym,Thai Restaurant,Restaurant,Dessert Shop,Japanese Restaurant,Pizza Place,Gourmet Shop,Salad Place,College Academic Building,Czech Restaurant
12,Upper West Side,4,Italian Restaurant,American Restaurant,Southern / Soul Food Restaurant,Mediterranean Restaurant,Café,Bookstore,Chinese Restaurant,Juice Bar,Pub,Movie Theater
15,Midtown,4,Hotel,Historic Site,Plaza,Smoke Shop,Park,Spa,Miscellaneous Shop,Salad Place,Sporting Goods Shop,Steakhouse
18,Greenwich Village,4,Café,Italian Restaurant,Yoga Studio,Jazz Club,Snack Place,Food Truck,French Restaurant,Caribbean Restaurant,Sushi Restaurant,Beer Bar
32,Civic Center,4,Falafel Restaurant,Yoga Studio,Bar,Park,Dance Studio,Nail Salon,Monument / Landmark,Molecular Gastronomy Restaurant,Martial Arts Dojo,Spa
34,Sutton Place,4,Indian Restaurant,Grocery Store,Gym,Yoga Studio,Italian Restaurant,French Restaurant,Liquor Store,Spiritual Center,Steakhouse,Beer Store
37,Stuyvesant Town,4,Bar,Park,Playground,Pet Service,Bistro,Fountain,Heliport,Baseball Field,Harbor / Marina,Gym / Fitness Center
39,Hudson Yards,4,American Restaurant,Art Gallery,Pet Store,Theater,Cocktail Bar,Deli / Bodega,Music School,Public Art,Residential Building (Apartment / Condo),Scenic Lookout


#### Recommendation & Conlusion

As we can see from clustering results: Manhattan has more diversed restaurants and entertament venues than Downtown Toronto, which means it could attract more foot traffic and opportunites. Also, the more diversed restaurants and cafes are in neborhoods , the more people are open to different types of food cultures. This may also lower the risk for Jerry strating his restaurant business since the poetential customers base at Manhattan. 