### CAPSTONE PROJECT - BATTLE OF NEIGHBORHOODS

### INTRODUCTION

In this project, I work with neighborhood data for New York City and Toronto. First, we will
find the most visited type of commercial shop according to the number of check-ins; then we
will look for the neighborhoods that lack that type of shop, which could represent a potential
business opportunity. The aim is also to explore the two cities for tourists who want to
visit them, keeping in mind the areas of food, hotels, museums and much more.

### PREPROCESSING

In [16]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import json
import requests
import matplotlib.cm as cm
from geopy.geocoders import Nominatim
import matplotlib.colors as colors
import folium
from sklearn.cluster import KMeans
from bs4 import BeautifulSoup

In [4]:
URL = r'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
actual_content = soup.table
table_content = actual_content.tbody
info = list(table_content.find_all('tr'))
actual_data = []
for i in info : 
  actual_data.append([j.string.replace('\n','') for j in i.find_all('td')])
head = [j.string.replace('\n','') for j in info[0].find_all('th')]
df = pd.DataFrame(actual_data[1:],columns = head)
df.drop(df.index[df['Borough'] == 'Not assigned'], inplace = True)
df = df.reset_index(drop = True)
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [6]:
df1=pd.read_csv('https://cocl.us/Geospatial_data')
df_final = pd.merge(left=df, right=df1, left_on='Postal Code', right_on='Postal Code')
df_final.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


In [8]:
downtown_toronto_data = df_final[df_final['Borough'].str.contains("Downtown Toronto")].reset_index(drop=True)
downtown_toronto_data=downtown_toronto_data.drop(['Postal Code'], axis=1)
downtown_toronto_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,Downtown Toronto,St. James Town,43.651494,-79.375418
4,Downtown Toronto,Berczy Park,43.644771,-79.373306


In [9]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
neighborhoods_data = newyork_data['features']

In [11]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']
# collect one dict per neighborhood (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']
    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})
# instantiate the dataframe from the collected rows
neighborhoods = pd.DataFrame(rows, columns=column_names)
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [12]:
# Creating new Dataframe manhattan_data
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


In [13]:
CLIENT_ID = 'GTVVD23UNJDTNFAYV23DAS0VHUN4NMSGWWVQKUFREWIWG5TE' 
CLIENT_SECRET = '5NE04HTZ3BPELSN2114VZRI1KQ0QALUTR2KF4E3WHXNZO0SJ' 
VERSION = '20180605' # Foursquare API version
limit = 20
print('Your credentials:')
print('CLIENT_ID:'+ CLIENT_ID)
print('CLIENT_SECRET:'+ CLIENT_SECRET)

Your credentials:
CLIENT_ID:GTVVD23UNJDTNFAYV23DAS0VHUN4NMSGWWVQKUFREWIWG5TE
CLIENT_SECRET:5NE04HTZ3BPELSN2114VZRI1KQ0QALUTR2KF4E3WHXNZO0SJ
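
The cell above hard-codes the Foursquare credentials and prints them in full. As a hedged alternative, the sketch below reads them from environment variables and masks them before printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_credentials` and `mask` are assumptions made for illustration; they are not part of the original notebook.

```python
import os


def load_credentials(env=None):
    """Return (client_id, client_secret) from an environment-style mapping.

    The key names below are hypothetical; pick whatever names you export.
    """
    env = os.environ if env is None else env
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Foursquare credentials are not set')
    return client_id, client_secret


def mask(value, visible=4):
    """Keep the first `visible` characters and replace the rest with '*'."""
    return value[:visible] + '*' * max(len(value) - visible, 0)
```

With this in place, a cell could call `CLIENT_ID, CLIENT_SECRET = load_credentials()` and print `mask(CLIENT_ID)` instead of the raw value.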


In [17]:
# get the geographical coordinates of Downtown Toronto
address = 'Downtown Toronto, ON, Canada'
geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude_downtown_toronto = location.latitude
longitude_downtown_toronto = location.longitude
print("Downtown Toronto","latitude",latitude_downtown_toronto, "& " "longitude" ,longitude_downtown_toronto)

Downtown Toronto latitude 43.6563221 & longitude -79.3809161


In [18]:
address = 'Manhattan, NY'
geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Manhattan are 40.7896239, -73.9598939.


### VISUALIZATION

In [19]:
# create map of Downtown Toronto using latitude and longitude values
map_downtown_toronto = folium.Map(location=[latitude_downtown_toronto,longitude_downtown_toronto], zoom_start=11)
# add markers to map
for lat, lng, label in zip(downtown_toronto_data['Latitude'], downtown_toronto_data['Longitude'], downtown_toronto_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng], radius=5, popup=label, color='blue', fill=True, fill_color='#3186cc', fill_opacity=0.7, parse_html=False).add_to(map_downtown_toronto)  

map_downtown_toronto

In [20]:
from folium import plugins
# create map of Downtown Toronto using latitude and longitude values
map_downtown_toronto = folium.Map(location=[latitude_downtown_toronto,longitude_downtown_toronto], zoom_start=11)
# instantiate a marker cluster object for the neighborhoods in the dataframe
incidents = plugins.MarkerCluster().add_to(map_downtown_toronto)
# add markers to map
for lat, lng, label in zip(downtown_toronto_data['Latitude'], downtown_toronto_data['Longitude'], downtown_toronto_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng], radius=5, popup=label, color='blue', fill=True, fill_color='#3186cc', fill_opacity=0.7, parse_html=False).add_to(incidents)  
    
map_downtown_toronto

In [21]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)
# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng], radius=5, popup=label, color='blue', fill=True, fill_color='#3186cc', fill_opacity=0.7, parse_html=False).add_to(map_manhattan)  
    
map_manhattan

In [22]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)
grouping = plugins.MarkerCluster().add_to(map_manhattan)
# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng], radius=5, popup=label, color='blue', fill=True, fill_color='#3186cc', fill_opacity=0.7, parse_html=False).add_to(grouping)  
    
map_manhattan

In [23]:
# create a function that fetches nearby venues for every neighborhood in Toronto and Manhattan
def getNearbyVenues(names, latitudes,longitudes, radius=500):
    venues_list=[]
    for name, lat, lng in zip(names,latitudes,longitudes):
        print(name)
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID,CLIENT_SECRET,VERSION,lat,lng,radius,limit)
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # return only relevant information for each nearby venue
        venues_list.append([(name,lat,lng, v['venue']['name'],v['venue']['location']['lat'], v['venue']['location']['lng'],  v['venue']['categories'][0]['name']) for v in results])
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood','Neighborhood Latitude','Neighborhood Longitude','Venue','Venue Latitude','Venue Longitude','Venue Category']
    return(nearby_venues)
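
The list comprehension in `getNearbyVenues` indexes `categories[0]` and `groups[0]` directly, so a venue without a category or a response without groups would raise an error. The sketch below shows the same extraction as a standalone helper that skips such entries. The helper name `extract_venues` and the toy payload are hypothetical; the payload only mimics the fields the function above reads and is not a real API response.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore-style response into a list of venue tuples."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        if not categories:
            # skip venues that carry no category instead of raising IndexError
            continue
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     categories[0]['name']))
    return rows


# toy payload (made up) with one complete venue and one venue lacking a category
toy_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Venue One',
               'location': {'lat': 43.6450, 'lng': -79.3730},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Venue Two',
               'location': {'lat': 43.6440, 'lng': -79.3740},
               'categories': []}},
]}]}}

rows = extract_venues(toy_payload, 'Berczy Park', 43.644771, -79.373306)
```

Only the first toy venue survives, because the second has an empty category list.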

In [24]:
downtown_toronto_venues = getNearbyVenues(names=downtown_toronto_data['Neighborhood'],latitudes=downtown_toronto_data['Latitude'],longitudes=downtown_toronto_data['Longitude'],)

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
Stn A PO Boxes
St. James Town, Cabbagetown
First Canadian Place, Underground city
Church and Wellesley


In [25]:
# check how many venues were returned for each neighborhood
downtown_toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Berczy Park,20,20,20,20,20,20
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",16,16,16,16,16,16
Central Bay Street,20,20,20,20,20,20
Christie,16,16,16,16,16,16
Church and Wellesley,20,20,20,20,20,20
"Commerce Court, Victoria Hotel",20,20,20,20,20,20
"First Canadian Place, Underground city",20,20,20,20,20,20
"Garden District, Ryerson",20,20,20,20,20,20
"Harbourfront East, Union Station, Toronto Islands",20,20,20,20,20,20
"Kensington Market, Chinatown, Grange Park",20,20,20,20,20,20


In [26]:
downtown_toronto_onehot = pd.get_dummies(downtown_toronto_venues[['Venue Category']], prefix="", prefix_sep="")
# add neighborhood column back to dataframe
downtown_toronto_onehot['Neighborhood'] = downtown_toronto_venues['Neighborhood'] 
# move neighborhood column to the first column
fixed_columns = [downtown_toronto_onehot.columns[-1]] + list(downtown_toronto_onehot.columns[:-1])
downtown_toronto_onehot = downtown_toronto_onehot[fixed_columns]
downtown_toronto_onehot.head()

Unnamed: 0,Yoga Studio,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Art Gallery,Art Museum,...,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Trail,Train Station,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wings Joint
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [27]:
# group rows by neighborhood and by taking the mean of the frequency of occurrence of each category
downtown_toronto_grouped = downtown_toronto_onehot.groupby('Neighborhood').mean().reset_index()
# print each neighborhood along with the top 5 most common venues
num_top_venues = 5
for hood in downtown_toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = downtown_toronto_grouped[downtown_toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
                venue  freq
0  Seafood Restaurant  0.10
1        Cocktail Bar  0.05
2  Basketball Stadium  0.05
3         Coffee Shop  0.05
4          Restaurant  0.05


----CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport----
                venue  freq
0     Airport Service  0.19
1      Airport Lounge  0.12
2    Airport Terminal  0.12
3     Harbor / Marina  0.06
4  Airport Food Court  0.06


----Central Bay Street----
                        venue  freq
0                 Coffee Shop  0.30
1   Middle Eastern Restaurant  0.05
2                        Park  0.05
3                         Spa  0.05
4  Modern European Restaurant  0.05


----Christie----
           venue  freq
0  Grocery Store  0.25
1           Café  0.19
2           Park  0.12
3     Baby Store  0.06
4    Coffee Shop  0.06


----Church and Wellesley----
             venue  freq
0      Pizza Place  0.05
1  Bubble Tea Shop  0.05
2     Burger Joint 
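
To make the grouping step concrete, here is a small sketch on made-up data: one-hot encoding the venue category and then averaging per neighborhood yields the share of each category among that neighborhood's venues. The neighborhoods `A` and `B` and their venue rows are invented for illustration.

```python
import pandas as pd

# toy venue list: four venues in neighborhood A, two in neighborhood B
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Park', 'Café', 'Park', 'Park'],
})

# one column per category, 1.0 where the venue belongs to that category
onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="", dtype=float)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# the mean of 0/1 indicators is the relative frequency of each category
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```

Each row of `grouped` sums to 1 across the category columns, which is why the printed `freq` values above read as fractions of the (at most 20) venues returned per neighborhood.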

In [28]:
# helper: return the names of the num_top_venues most frequent venue categories in a row
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

In [29]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']
# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = downtown_toronto_grouped['Neighborhood']
for ind in np.arange(downtown_toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(downtown_toronto_grouped.iloc[ind, :], num_top_venues)
neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Seafood Restaurant,Basketball Stadium,Bakery,Park,Cocktail Bar,Coffee Shop,Museum,Breakfast Spot,Concert Hall,Restaurant
1,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Airport Lounge,Airport Terminal,Plane,Harbor / Marina,Boat or Ferry,Boutique,Rental Car Location,Sculpture Garden,Airport Gate
2,Central Bay Street,Coffee Shop,Café,Sushi Restaurant,Modern European Restaurant,Bubble Tea Shop,Ramen Restaurant,Middle Eastern Restaurant,Sandwich Place,Spa,Japanese Restaurant
3,Christie,Grocery Store,Café,Park,Candy Store,Diner,Coffee Shop,Nightclub,Restaurant,Italian Restaurant,Baby Store
4,Church and Wellesley,Pizza Place,Dance Studio,Park,Coffee Shop,Mexican Restaurant,Burger Joint,Bubble Tea Shop,Breakfast Spot,Ramen Restaurant,Martial Arts Dojo


In [30]:
# set number of clusters
kclusters = 5
downtown_toronto_grouped_clustering = downtown_toronto_grouped.drop('Neighborhood', axis=1)
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(downtown_toronto_grouped_clustering)
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 2, 4, 0, 3, 3, 3, 0, 3], dtype=int32)

In [31]:
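# work on a copy so downtown_toronto_data itself is not modified
downtown_toronto_merged = downtown_toronto_data.copy()
# add clustering labels
# caution: kmeans.labels_ follows the row order of downtown_toronto_grouped (sorted by name), not of this frame
downtown_toronto_merged['Cluster Labels'] = kmeans.labels_
# join the sorted top-venue columns onto the neighborhood coordinates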
downtown_toronto_merged = downtown_toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
downtown_toronto_merged.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,0,Coffee Shop,Park,Breakfast Spot,Farmers Market,Chocolate Shop,Pub,Restaurant,Performing Arts Venue,Dessert Shop,Bakery
1,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0,Coffee Shop,Sushi Restaurant,Wings Joint,Park,Arts & Crafts Store,Beer Bar,Burrito Place,Creperie,Diner,Distribution Center
2,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,2,Café,Mexican Restaurant,Shopping Mall,Restaurant,Ramen Restaurant,Coffee Shop,Plaza,Steakhouse,Art Gallery,Sandwich Place
3,Downtown Toronto,St. James Town,43.651494,-79.375418,4,Gastropub,Coffee Shop,Café,Creperie,Art Gallery,Italian Restaurant,BBQ Joint,Cosmetics Shop,Food Truck,Restaurant
4,Downtown Toronto,Berczy Park,43.644771,-79.373306,0,Seafood Restaurant,Basketball Stadium,Bakery,Park,Cocktail Bar,Coffee Shop,Museum,Breakfast Spot,Concert Hall,Restaurant
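
One thing to watch in the cell above: `kmeans.labels_` has one entry per row of the grouped frame, and `groupby` returns neighborhoods sorted by name, while `downtown_toronto_data` keeps its original order. Assigning the labels positionally can therefore pair a label with the wrong neighborhood. A minimal sketch of aligning by name instead, using invented rows and stand-in labels (the frame contents and the `label_table` name are assumptions, not the notebook's data):

```python
import pandas as pd

# toy frame in its original (unsorted) order
data = pd.DataFrame({'Neighborhood': ['Regent Park', 'Berczy Park', 'Christie'],
                     'Latitude': [43.65, 43.64, 43.67]})

# groupby output is sorted by neighborhood name, so labels follow that order
grouped_names = sorted(data['Neighborhood'])
labels = [0, 1, 2]  # stand-in for kmeans.labels_, one label per grouped row

# attach each label to its neighborhood name, then merge on the name
label_table = pd.DataFrame({'Neighborhood': grouped_names, 'Cluster Labels': labels})
merged = data.merge(label_table, on='Neighborhood', how='left')
print(merged)
```

In the notebook, the same idea would mean inserting the labels into `neighborhoods_venues_sorted` (which shares the grouped order) before joining it to the coordinates; the cluster tables shown here were produced with the positional assignment.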


In [32]:
# create map
map_clusters = folium.Map(location=[latitude_downtown_toronto, longitude_downtown_toronto], zoom_start=11)
# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(downtown_toronto_merged['Latitude'], downtown_toronto_merged['Longitude'], downtown_toronto_merged['Neighborhood'], downtown_toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker([lat, lon],radius=5,popup=label,color=rainbow[cluster-1],fill=True,fill_color=rainbow[cluster-1],fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [33]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 0, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Regent Park, Harbourfront",Coffee Shop,Park,Breakfast Spot,Farmers Market,Chocolate Shop,Pub,Restaurant,Performing Arts Venue,Dessert Shop,Bakery
1,"Queen's Park, Ontario Provincial Government",Coffee Shop,Sushi Restaurant,Wings Joint,Park,Arts & Crafts Store,Beer Bar,Burrito Place,Creperie,Diner,Distribution Center
4,Berczy Park,Seafood Restaurant,Basketball Stadium,Bakery,Park,Cocktail Bar,Coffee Shop,Museum,Breakfast Spot,Concert Hall,Restaurant
8,"Harbourfront East, Union Station, Toronto Islands",Park,Plaza,Hotel,Café,Supermarket,Performing Arts Venue,New American Restaurant,Bubble Tea Shop,Salad Place,Lake
12,"Kensington Market, Chinatown, Grange Park",Café,Vietnamese Restaurant,Mexican Restaurant,Caribbean Restaurant,Bakery,Wine Bar,Fish Market,Farmers Market,Dessert Shop,Coffee Shop
16,"St. James Town, Cabbagetown",Restaurant,Café,Gift Shop,Indian Restaurant,Deli / Bodega,Jewelry Store,Diner,Bakery,Japanese Restaurant,Italian Restaurant
18,Church and Wellesley,Pizza Place,Dance Studio,Park,Coffee Shop,Mexican Restaurant,Burger Joint,Bubble Tea Shop,Breakfast Spot,Ramen Restaurant,Martial Arts Dojo


In [34]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 1, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Airport Lounge,Airport Terminal,Plane,Harbor / Marina,Boat or Ferry,Boutique,Rental Car Location,Sculpture Garden,Airport Gate


In [35]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 2, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,"Garden District, Ryerson",Café,Mexican Restaurant,Shopping Mall,Restaurant,Ramen Restaurant,Coffee Shop,Plaza,Steakhouse,Art Gallery,Sandwich Place
10,"Commerce Court, Victoria Hotel",Café,Coffee Shop,Bakery,Gym,Gastropub,Ice Cream Shop,Japanese Restaurant,Deli / Bodega,Museum,Pub
11,"University of Toronto, Harbord",Restaurant,Bookstore,Bakery,Japanese Restaurant,Yoga Studio,Italian Restaurant,Café,College Gym,Comfort Food Restaurant,Sandwich Place


In [36]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 3, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Central Bay Street,Coffee Shop,Café,Sushi Restaurant,Modern European Restaurant,Bubble Tea Shop,Ramen Restaurant,Middle Eastern Restaurant,Sandwich Place,Spa,Japanese Restaurant
6,Christie,Grocery Store,Café,Park,Candy Store,Diner,Coffee Shop,Nightclub,Restaurant,Italian Restaurant,Baby Store
7,"Richmond, Adelaide, King",Coffee Shop,Seafood Restaurant,Gym / Fitness Center,Steakhouse,Hotel,Concert Hall,Lounge,Opera House,Café,Pizza Place
9,"Toronto Dominion Centre, Design Exchange",Café,Coffee Shop,Gym / Fitness Center,Steakhouse,Pub,Restaurant,Bookstore,Deli / Bodega,Beer Bar,Japanese Restaurant
14,Rosedale,Park,Playground,Trail,Deli / Bodega,Cheese Shop,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,College Gym
15,Stn A PO Boxes,Café,Cocktail Bar,Tailor Shop,Museum,Comfort Food Restaurant,Restaurant,Concert Hall,Beer Bar,Park,Farmers Market
17,"First Canadian Place, Underground city",Café,Coffee Shop,Restaurant,Gym / Fitness Center,Bookstore,Deli / Bodega,Bakery,Steakhouse,Pub,Seafood Restaurant


In [37]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 4, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,St. James Town,Gastropub,Coffee Shop,Café,Creperie,Art Gallery,Italian Restaurant,BBQ Joint,Cosmetics Shop,Food Truck,Restaurant


In [38]:
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],latitudes=manhattan_data['Latitude'],longitudes=manhattan_data['Longitude'],)

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


In [39]:
# check how many venues were returned for each neighborhood
manhattan_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Battery Park City,20,20,20,20,20,20
Carnegie Hill,20,20,20,20,20,20
Central Harlem,20,20,20,20,20,20
Chelsea,20,20,20,20,20,20
Chinatown,20,20,20,20,20,20
Civic Center,20,20,20,20,20,20
Clinton,20,20,20,20,20,20
East Harlem,20,20,20,20,20,20
East Village,20,20,20,20,20,20
Financial District,20,20,20,20,20,20


In [40]:
manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")
# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = manhattan_venues['Neighborhood'] 
# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]
manhattan_onehot.head()

Unnamed: 0,Neighborhood,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Waterfront,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
2,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [41]:
# group rows by neighborhood, taking the mean of the frequency of occurrence of each category
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
# print each neighborhood along with the top 5 most common venues
num_top_venues = 5
for hood in manhattan_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = manhattan_grouped[manhattan_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Battery Park City----
           venue  freq
0  Memorial Site  0.15
1           Park  0.15
2     Food Court  0.10
3       Building  0.05
4            Gym  0.05


----Carnegie Hill----
                  venue  freq
0  Gym / Fitness Center  0.10
1    Italian Restaurant  0.10
2                   Gym  0.10
3          Dance Studio  0.05
4          Gourmet Shop  0.05


----Central Harlem----
                venue  freq
0  African Restaurant  0.10
1   French Restaurant  0.10
2           Juice Bar  0.05
3          Bagel Shop  0.05
4                Café  0.05


----Chelsea----
                venue  freq
0  Seafood Restaurant  0.10
1      Ice Cream Shop  0.10
2             Theater  0.10
3            Beer Bar  0.05
4              Market  0.05


----Chinatown----
                 venue  freq
0       Sandwich Place  0.10
1   Chinese Restaurant  0.10
2  Indie Movie Theater  0.05
3             Tea Room  0.05
4          Pizza Place  0.05


----Civic Center----
                             venue  

In [42]:
# helper: return the names of the num_top_venues most frequent venue categories in a row
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

In [43]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']
# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']
for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)
neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Battery Park City,Memorial Site,Park,Food Court,Cooking School,Sandwich Place,Shopping Mall,Smoke Shop,Gym,BBQ Joint,Auditorium
1,Carnegie Hill,Gym,Italian Restaurant,Gym / Fitness Center,Community Center,Spa,Café,Shoe Store,Bookstore,Gourmet Shop,Coffee Shop
2,Central Harlem,French Restaurant,African Restaurant,Dessert Shop,Bagel Shop,Ethiopian Restaurant,Boutique,Library,Gym / Fitness Center,Beer Bar,Café
3,Chelsea,Theater,Ice Cream Shop,Seafood Restaurant,Scenic Lookout,Italian Restaurant,Chinese Restaurant,Market,Office,Beer Bar,Hotel
4,Chinatown,Sandwich Place,Chinese Restaurant,Indie Movie Theater,Tea Room,Hotpot Restaurant,Ice Cream Shop,Greek Restaurant,Museum,English Restaurant,New American Restaurant


In [44]:
# set number of clusters
kclusters = 5
manhattan_grouped_clustering = manhattan_grouped.drop('Neighborhood', axis=1)
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 2, 1, 0, 1, 0, 4, 0, 0, 2], dtype=int32)

In [45]:
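# work on a copy so manhattan_data itself is not modified
manhattan_merged = manhattan_data.copy()
# add clustering labels
# caution: kmeans.labels_ follows the row order of manhattan_grouped (sorted by name), not of this frame
manhattan_merged['Cluster Labels'] = kmeans.labels_
# join the sorted top-venue columns onto the neighborhood coordinates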
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
manhattan_merged.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Manhattan,Marble Hill,40.876551,-73.91066,0,Gym,Coffee Shop,Yoga Studio,Diner,Pharmacy,Seafood Restaurant,Steakhouse,Supplement Shop,Sandwich Place,Donut Shop
1,Manhattan,Chinatown,40.715618,-73.994279,2,Sandwich Place,Chinese Restaurant,Indie Movie Theater,Tea Room,Hotpot Restaurant,Ice Cream Shop,Greek Restaurant,Museum,English Restaurant,New American Restaurant
2,Manhattan,Washington Heights,40.851903,-73.9369,1,Wine Shop,Café,Park,Deli / Bodega,Ramen Restaurant,Breakfast Spot,Market,Liquor Store,Cocktail Bar,Coffee Shop
3,Manhattan,Inwood,40.867684,-73.92121,0,Wine Bar,Café,Park,Yoga Studio,Diner,Spanish Restaurant,Farmers Market,Frozen Yogurt Shop,Bistro,Latin American Restaurant
4,Manhattan,Hamilton Heights,40.823604,-73.949688,1,Yoga Studio,Mexican Restaurant,Cocktail Bar,Japanese Restaurant,Café,Smoke Shop,Caribbean Restaurant,Mediterranean Restaurant,Coffee Shop,Bakery


In [46]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)
# set color scheme for the clusters: one hex color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# add one marker per neighborhood, colored by its cluster label
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    # labels run from 0 to kclusters-1, so they index the color list directly
    folium.CircleMarker([lat, lon], radius=5, popup=label, color=rainbow[cluster], fill=True, fill_color=rainbow[cluster], fill_opacity=0.7).add_to(map_clusters)

map_clusters
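
The color lookup used for the markers can be checked in isolation. The following is a minimal sketch, not the notebook's own code: it swaps `cm.rainbow` for the standard-library `colorsys` module so it runs without matplotlib, and `cluster_palette` is a hypothetical helper name introduced for illustration. It shows that cluster labels 0 to k-1 can index a k-color palette directly, with neighborhoods in the same cluster sharing one color.

```python
import colorsys

def cluster_palette(k):
    """Return k evenly spaced hex colors, one per cluster label 0..k-1."""
    palette = []
    for i in range(k):
        # spread the hues evenly around the color wheel
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(int(r * 255), int(g * 255), int(b * 255)))
    return palette

palette = cluster_palette(5)
# cluster labels of the first five neighborhoods in the merged table
labels = [0, 2, 1, 0, 1]
marker_colors = [palette[label] for label in labels]
```

Each entry of `marker_colors` could be passed as `color` and `fill_color` to `folium.CircleMarker`, which accepts hex strings.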

In [47]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Marble Hill,Gym,Coffee Shop,Yoga Studio,Diner,Pharmacy,Seafood Restaurant,Steakhouse,Supplement Shop,Sandwich Place,Donut Shop
3,Inwood,Wine Bar,Café,Park,Yoga Studio,Diner,Spanish Restaurant,Farmers Market,Frozen Yogurt Shop,Bistro,Latin American Restaurant
5,Manhattanville,Italian Restaurant,BBQ Joint,Gastropub,Sushi Restaurant,Bike Trail,Coffee Shop,Lounge,Café,Juice Bar,Dumpling Restaurant
7,East Harlem,Mexican Restaurant,Latin American Restaurant,Thai Restaurant,Sandwich Place,Park,Beer Bar,Street Art,Steakhouse,Bakery,Cocktail Bar
8,Upper East Side,Hotel,Cosmetics Shop,Bar,Italian Restaurant,Jazz Club,Gym / Fitness Center,French Restaurant,Optical Shop,Park,Pet Store
10,Lenox Hill,Thai Restaurant,Gym,Cycle Studio,Liquor Store,Chinese Restaurant,Taco Place,Salad Place,French Restaurant,Restaurant,College Academic Building
11,Roosevelt Island,Coffee Shop,Residential Building (Apartment / Condo),Bus Line,Farmers Market,Food & Drink Shop,School,Dog Run,Liquor Store,Sandwich Place,Outdoors & Recreation
15,Midtown,Clothing Store,Hotel,Miscellaneous Shop,French Restaurant,Plaza,Park,Salad Place,Smoke Shop,Spa,Cycle Studio
20,Lower East Side,Art Gallery,Cocktail Bar,Yoga Studio,Japanese Restaurant,Bubble Tea Shop,French Restaurant,Filipino Restaurant,Chinese Restaurant,Mediterranean Restaurant,Coffee Shop
21,Tribeca,Park,Yoga Studio,Italian Restaurant,Playground,Poke Place,Coffee Shop,Salad Place,Café,Men's Store,Spa


In [48]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Washington Heights,Wine Shop,Café,Park,Deli / Bodega,Ramen Restaurant,Breakfast Spot,Market,Liquor Store,Cocktail Bar,Coffee Shop
4,Hamilton Heights,Yoga Studio,Mexican Restaurant,Cocktail Bar,Japanese Restaurant,Café,Smoke Shop,Caribbean Restaurant,Mediterranean Restaurant,Coffee Shop,Bakery
13,Lincoln Square,Theater,Indie Movie Theater,Concert Hall,Performing Arts Venue,College Arts Building,Circus,Opera House,Fountain,Gift Shop,Library
18,Greenwich Village,Italian Restaurant,Yoga Studio,Sushi Restaurant,Beer Bar,Clothing Store,New American Restaurant,Gourmet Shop,Coffee Shop,French Restaurant,Jazz Club
19,East Village,Vietnamese Restaurant,Dessert Shop,Cheese Shop,Japanese Restaurant,Beer Store,Swiss Restaurant,Coffee Shop,Juice Bar,Bar,Dog Run
38,Flatiron,Cycle Studio,Wine Shop,American Restaurant,Japanese Restaurant,Gym / Fitness Center,Café,Sports Club,Furniture / Home Store,Sushi Restaurant,Donut Shop


In [49]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Chinatown,Sandwich Place,Chinese Restaurant,Indie Movie Theater,Tea Room,Hotpot Restaurant,Ice Cream Shop,Greek Restaurant,Museum,English Restaurant,New American Restaurant
9,Yorkville,Wine Shop,Deli / Bodega,Gym,Dog Run,Café,Liquor Store,Sandwich Place,Beer Store,Coffee Shop,Gym / Fitness Center
12,Upper West Side,Bakery,American Restaurant,Italian Restaurant,Pub,Bagel Shop,Bookstore,Tiki Bar,Chinese Restaurant,Greek Restaurant,Movie Theater
16,Murray Hill,Coffee Shop,Hotel,Burger Joint,Jewish Restaurant,Sandwich Place,Tea Room,Bar,Grocery Store,Event Space,Bagel Shop
22,Little Italy,Mediterranean Restaurant,Bakery,History Museum,Thai Restaurant,Mexican Restaurant,Spanish Restaurant,French Restaurant,Chinese Restaurant,Sandwich Place,Chocolate Shop
27,Gramercy,Pizza Place,Coffee Shop,Italian Restaurant,Yoga Studio,Bagel Shop,Sushi Restaurant,Gourmet Shop,Beer Bar,Bar,Park
31,Noho,Sandwich Place,Rock Club,Wine Shop,Italian Restaurant,Coffee Shop,French Restaurant,Boutique,Gourmet Shop,Greek Restaurant,Grocery Store
36,Tudor City,Park,Yoga Studio,Boxing Gym,Sushi Restaurant,Taco Place,Café,Seafood Restaurant,Thai Restaurant,Japanese Restaurant,Spanish Restaurant
39,Hudson Yards,American Restaurant,Hotel,Gym / Fitness Center,Park,Furniture / Home Store,Supermarket,Cocktail Bar,Building,Theater,Pedestrian Plaza


In [50]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 3, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,Chelsea,Theater,Ice Cream Shop,Seafood Restaurant,Scenic Lookout,Italian Restaurant,Chinese Restaurant,Market,Office,Beer Bar,Hotel


In [51]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 4, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Central Harlem,French Restaurant,African Restaurant,Dessert Shop,Bagel Shop,Ethiopian Restaurant,Boutique,Library,Gym / Fitness Center,Beer Bar,Café
14,Clinton,Theater,Gym / Fitness Center,Hotel,Sporting Goods Shop,Peruvian Restaurant,Comedy Club,Cocktail Bar,Café,Building,Food Court
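
The five cells above repeat the same selection with a different label each time. As a sketch under stated assumptions, the selection could be wrapped in a small function; `show_cluster` is a hypothetical name, and `demo` is a three-row stand-in for `manhattan_merged` built from rows shown earlier, with only the first venue column kept.

```python
import pandas as pd

def show_cluster(merged, label, first_venue_col=5):
    """Return the neighborhood name and venue columns for one cluster label."""
    # column 1 is 'Neighborhood'; venue columns start at position first_venue_col
    cols = merged.columns[[1] + list(range(first_venue_col, merged.shape[1]))]
    return merged.loc[merged['Cluster Labels'] == label, cols]

# small stand-in for manhattan_merged (assumption: same column order as the notebook)
demo = pd.DataFrame({
    'Borough': ['Manhattan', 'Manhattan', 'Manhattan'],
    'Neighborhood': ['Marble Hill', 'Chinatown', 'Inwood'],
    'Latitude': [40.876551, 40.715618, 40.867684],
    'Longitude': [-73.91066, -73.994279, -73.92121],
    'Cluster Labels': [0, 2, 0],
    '1st Most Common Venue': ['Gym', 'Sandwich Place', 'Wine Bar'],
})

cluster_0 = show_cluster(demo, 0)
# number of neighborhoods per cluster, ordered by label
cluster_sizes = demo['Cluster Labels'].value_counts().sort_index()
```

On the full table, `cluster_sizes` would make the imbalance visible at a glance: in the output above, cluster 3 contains only Chelsea.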


### RESULTS

After clustering the neighborhoods of each city, we find that both cities offer venues that tourists from all over the world can explore. The neighborhoods are broadly similar in common features such as parks; they differ in their unique places, such as historical sites and monuments.
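
One way to make such a comparison concrete is to count how often each venue category appears among a cluster's top-ranked venues. The sketch below is an illustration rather than part of the original analysis: it uses only the standard library, and `cluster_1_top3` is a hand-copied subset (the first three venue columns of the cluster 1 table above), so the variable name and the choice of three columns are assumptions.

```python
from collections import Counter

# first three "most common venue" entries for each cluster 1 neighborhood
cluster_1_top3 = {
    'Washington Heights': ['Wine Shop', 'Café', 'Park'],
    'Hamilton Heights': ['Yoga Studio', 'Mexican Restaurant', 'Cocktail Bar'],
    'Lincoln Square': ['Theater', 'Indie Movie Theater', 'Concert Hall'],
    'Greenwich Village': ['Italian Restaurant', 'Yoga Studio', 'Sushi Restaurant'],
    'East Village': ['Vietnamese Restaurant', 'Dessert Shop', 'Cheese Shop'],
    'Flatiron': ['Cycle Studio', 'Wine Shop', 'American Restaurant'],
}

# tally every category across the cluster's neighborhoods
counts = Counter(venue for venues in cluster_1_top3.values() for venue in venues)
# categories that appear in more than one neighborhood
shared = sorted(venue for venue, n in counts.items() if n > 1)
```

Here `shared` holds the categories that recur across neighborhoods, while categories with a count of one point to what is distinctive about a single neighborhood.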