# TORONTO And NEW YORK - How Similar Are They?!

## Problem In Hand
Toronto is one of the main/major cities of Canada, in fact it is the largest and most populous city of Canada. 
Similarly, New York is one of the main/major cities of USA, on an interesting note it is also the largest and most populous city of USA.
Both are multicultural as well as the financial hubs of their respective countries.
Seeing these similarities it would be interesting to see how similar or diverse they are to visit.
As a tourist, I would like to understand how similar or dissimilar these 2 cities are with regards to restaurants, 
accommodation, places to visit and so on.

Tourism is one of the major pillars for the growth of an economy. 
Every city is unique in it's own way - culture, tradition, history and so on.
So, as a tourist it would be great to have information on similarities or dissimilarities between 2 cities which would 
allow one to plan accordingly - like where to stay, places of interest and so on.
For Instance - say I have visited New York before and like the restaurants I visited and now I am planning to visit Toronto
and would like to stay in a similar place like New York.

## Data Sources

For the above comparison to take place would be using the below sources -

For Toronto - 

1. WikiPedia Page to get Borough, Post codes and Neighborhood information - 
https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

2. Geospatial Coordinates csv file used in Week 3 of the assignment to get Latitudes and Longitudes of the Borough's of Downtown Toronto

3. Foursquare API for getting nearby venues of Downtown Toronto

For New York - 

1. JSON file provided as part of the capstone project containing Borough, Neighborhood, Lat and Longitudes of Manhattan

2. Foursquare API for getting nearby venues of Manhattan

In [24]:
## Import required packages for the entire assignment
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
from sklearn.cluster import KMeans
import folium
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.colors as colors
import json

## For TORONTO

In [25]:
## Extract the webpage as text using requests package
website = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

## Extract the wiki table
soup = BeautifulSoup(website,'lxml')
soup.prettify()
my_table = soup.find('table',{'class':'wikitable sortable'})
link = soup.findAll('td')


## Scrape the table to get the required information
PostalCode=[]
Borough=[]
Neighborhood=[]
for l in range(0,859,3):
    PostalCode.append(link[l].text)

for b in range(1,860,3):
    Borough.append(link[b].text)

for n in range(2,861,3):
    Neighborhood.append(link[n].text)
    
Neighborhood= [w.replace("\n","") for w in Neighborhood]


df = pd.DataFrame()
df['PostalCode']=PostalCode
df['Borough']=Borough
df['Neighborhood']=Neighborhood

## Combining the Neighborhood from same Borough and PostalCode together
df_combined=df[df['Borough'] != 'Not assigned'].groupby(df['PostalCode']).agg({'Borough':'first','Neighborhood':','.join}).reset_index().reindex(columns=df.columns)


## In case there are no values assigned to the Neighborhood then assign the Borough Name to the Neighborhood
df_combined.loc[df_combined['Neighborhood'] == 'Not assigned','Neighborhood'] = df_combined['Borough']


## Read the csv file using pandas
geo = pd.read_csv("Geospatial_Coordinates.csv")

## Rename the columns as required
geo.rename(columns={'Postal Code':'PostalCode'},inplace=True)

## Merging the new data with old using PostalCode field to get the latitudes and longitudes into one single data frame
geo_combined = pd.merge(df_combined, geo , on='PostalCode')

In [28]:
# Sorting
# set index for only Downtown Toronto
downtown_toronto_data = geo_combined[geo_combined['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
# eliminate 'Postcode' column
downtown_toronto_data=downtown_toronto_data.drop(['PostalCode'], axis=1)
downtown_toronto_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Downtown Toronto,Rosedale,43.679563,-79.377529
1,Downtown Toronto,"Cabbagetown,St. James Town",43.667967,-79.367675
2,Downtown Toronto,Church and Wellesley,43.66586,-79.38316
3,Downtown Toronto,Harbourfront,43.65426,-79.360636
4,Downtown Toronto,"Ryerson,Garden District",43.657162,-79.378937


In [29]:
## Displaying the shape of the dataset
downtown_toronto_data.shape

(19, 4)

## Data Exploration

In [30]:
## Have removed my credentials to access the foursquare api
## Extracting nearby venues for the Neighborhoods from foursquare api
CLIENT_ID=""
CLIENT_SECRET=""
VERSION="20180605"
LIMIT=150
radius=500
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [62]:
toronto_venues = getNearbyVenues(names=downtown_toronto_data['Neighborhood'],
                                   latitudes=downtown_toronto_data['Latitude'],
                                   longitudes=downtown_toronto_data['Longitude']
                                  )

Rosedale
Cabbagetown,St. James Town
Church and Wellesley
Harbourfront
Ryerson,Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide,King,Richmond
Harbourfront East,Toronto Islands,Union Station
Design Exchange,Toronto Dominion Centre
Commerce Court,Victoria Hotel
Harbord,University of Toronto
Chinatown,Grange Park,Kensington Market
CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place,Underground city
Christie
Queen's Park


In [64]:
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Rosedale,43.679563,-79.377529,Rosedale Park,43.682328,-79.378934,Playground
1,Rosedale,43.679563,-79.377529,Whitney Park,43.682036,-79.373788,Park
2,Rosedale,43.679563,-79.377529,Alex Murray Parkette,43.6783,-79.382773,Park
3,Rosedale,43.679563,-79.377529,Milkman's Lane,43.676352,-79.373842,Trail
4,"Cabbagetown,St. James Town",43.667967,-79.367675,Butter Chicken Factory,43.667072,-79.369184,Indian Restaurant


In [65]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide,King,Richmond",100,100,100,100,100,100
Berczy Park,56,56,56,56,56,56
"CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara",17,17,17,17,17,17
"Cabbagetown,St. James Town",49,49,49,49,49,49
Central Bay Street,82,82,82,82,82,82
"Chinatown,Grange Park,Kensington Market",84,84,84,84,84,84
Christie,18,18,18,18,18,18
Church and Wellesley,83,83,83,83,83,83
"Commerce Court,Victoria Hotel",100,100,100,100,100,100
"Design Exchange,Toronto Dominion Centre",100,100,100,100,100,100


In [66]:
## Checking for unique categories per venue category
print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 206 uniques categories.


## Feature Engineering

In [67]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [68]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store
0,"Adelaide,King,Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,...,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.01
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0
2,"CN Tower,Bathurst Quay,Island airport,Harbourf...",0.0,0.0,0.058824,0.058824,0.058824,0.117647,0.176471,0.117647,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Cabbagetown,St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.012195,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012195,...,0.0,0.0,0.0,0.0,0.012195,0.0,0.0,0.012195,0.0,0.0
5,"Chinatown,Grange Park,Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.035714,0.0,0.059524,0.011905,0.0,0.0
6,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Church and Wellesley,0.012048,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,...,0.012048,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.012048,0.0
8,"Commerce Court,Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,...,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0
9,"Design Exchange,Toronto Dominion Centre",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,...,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0


In [69]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide,King,Richmond----
             venue  freq
0      Coffee Shop  0.06
1  Thai Restaurant  0.04
2             Café  0.04
3              Bar  0.04
4   Cosmetics Shop  0.03


----Berczy Park----
                venue  freq
0         Coffee Shop  0.09
1        Cocktail Bar  0.04
2            Beer Bar  0.04
3                Café  0.04
4  Seafood Restaurant  0.04


----CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara----
              venue  freq
0   Airport Service  0.18
1    Airport Lounge  0.12
2  Airport Terminal  0.12
3     Boat or Ferry  0.06
4   Harbor / Marina  0.06


----Cabbagetown,St. James Town----
         venue  freq
0  Coffee Shop  0.06
1         Park  0.06
2         Café  0.06
3   Restaurant  0.06
4  Pizza Place  0.04


----Central Bay Street----
                 venue  freq
0          Coffee Shop  0.17
1   Italian Restaurant  0.05
2         Burger Joint  0.04
3       Ice Cream Shop  0.04
4  Japanese Restaurant  0

In [70]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [71]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",Coffee Shop,Café,Bar,Thai Restaurant,Sushi Restaurant,Bakery,Restaurant,Cosmetics Shop,Steakhouse,Burger Joint
1,Berczy Park,Coffee Shop,Cheese Shop,Steakhouse,Cocktail Bar,Café,Seafood Restaurant,Bakery,Farmers Market,Beer Bar,French Restaurant
2,"CN Tower,Bathurst Quay,Island airport,Harbourf...",Airport Service,Airport Lounge,Airport Terminal,Boutique,Plane,Sculpture Garden,Airport,Airport Food Court,Airport Gate,Harbor / Marina
3,"Cabbagetown,St. James Town",Coffee Shop,Restaurant,Park,Café,Pizza Place,Italian Restaurant,Bakery,Pub,Playground,Japanese Restaurant
4,Central Bay Street,Coffee Shop,Italian Restaurant,Japanese Restaurant,Juice Bar,Sandwich Place,Burger Joint,Ice Cream Shop,Gym / Fitness Center,Department Store,Thai Restaurant


## Data Modelling using Clustering

In [72]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 3, 0, 4, 0, 2, 0, 0, 0])

In [73]:
downtown_toronto_merged = downtown_toronto_data

# add clustering labels
downtown_toronto_merged['Cluster Labels'] = kmeans.labels_

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
downtown_toronto_merged = downtown_toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

downtown_toronto_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,Rosedale,43.679563,-79.377529,0,Park,Playground,Trail,Dance Studio,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Discount Store,Diner
1,Downtown Toronto,"Cabbagetown,St. James Town",43.667967,-79.367675,0,Coffee Shop,Restaurant,Park,Café,Pizza Place,Italian Restaurant,Bakery,Pub,Playground,Japanese Restaurant
2,Downtown Toronto,Church and Wellesley,43.66586,-79.38316,3,Coffee Shop,Japanese Restaurant,Gay Bar,Sushi Restaurant,Restaurant,Café,Fast Food Restaurant,Gastropub,Pub,Men's Store
3,Downtown Toronto,Harbourfront,43.65426,-79.360636,0,Coffee Shop,Park,Bakery,Pub,Theater,Breakfast Spot,Restaurant,Café,Mexican Restaurant,Electronics Store
4,Downtown Toronto,"Ryerson,Garden District",43.657162,-79.378937,4,Coffee Shop,Clothing Store,Cosmetics Shop,Café,Japanese Restaurant,Bakery,Tea Room,Lingerie Store,Italian Restaurant,Middle Eastern Restaurant


In [75]:
# create map
address = 'Toronto, CA'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto City are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto City are 43.653963, -79.387207.


In [76]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(downtown_toronto_merged['Latitude'], downtown_toronto_merged['Longitude'], downtown_toronto_merged['Neighborhood'], downtown_toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Cluster 1

In [78]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 0, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Rosedale,Park,Playground,Trail,Dance Studio,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Discount Store,Diner
1,"Cabbagetown,St. James Town",Coffee Shop,Restaurant,Park,Café,Pizza Place,Italian Restaurant,Bakery,Pub,Playground,Japanese Restaurant
3,Harbourfront,Coffee Shop,Park,Bakery,Pub,Theater,Breakfast Spot,Restaurant,Café,Mexican Restaurant,Electronics Store
5,St. James Town,Coffee Shop,Café,Restaurant,Clothing Store,Hotel,Breakfast Spot,Cosmetics Shop,Diner,Beer Bar,Bakery
7,Central Bay Street,Coffee Shop,Italian Restaurant,Japanese Restaurant,Juice Bar,Sandwich Place,Burger Joint,Ice Cream Shop,Gym / Fitness Center,Department Store,Thai Restaurant
8,"Adelaide,King,Richmond",Coffee Shop,Café,Bar,Thai Restaurant,Sushi Restaurant,Bakery,Restaurant,Cosmetics Shop,Steakhouse,Burger Joint
9,"Harbourfront East,Toronto Islands,Union Station",Coffee Shop,Aquarium,Hotel,Italian Restaurant,Café,Restaurant,Brewery,Fried Chicken Joint,Scenic Lookout,Bakery
10,"Design Exchange,Toronto Dominion Centre",Coffee Shop,Café,Hotel,Restaurant,Italian Restaurant,Bakery,Seafood Restaurant,Steakhouse,American Restaurant,Bar
11,"Commerce Court,Victoria Hotel",Coffee Shop,Café,Restaurant,Hotel,Gym,Italian Restaurant,Deli / Bodega,Steakhouse,Seafood Restaurant,Bakery
16,"First Canadian Place,Underground city",Coffee Shop,Café,Steakhouse,Restaurant,Gastropub,Burger Joint,Asian Restaurant,Bar,Seafood Restaurant,American Restaurant


## Cluster 2

In [79]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 1, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,Stn A PO Boxes 25 The Esplanade,Coffee Shop,Café,Restaurant,Japanese Restaurant,Italian Restaurant,Hotel,Beer Bar,Seafood Restaurant,Farmers Market,Lounge


## Cluster 3

In [80]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 2, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Berczy Park,Coffee Shop,Cheese Shop,Steakhouse,Cocktail Bar,Café,Seafood Restaurant,Bakery,Farmers Market,Beer Bar,French Restaurant


## Cluster 4

In [81]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 3, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Church and Wellesley,Coffee Shop,Japanese Restaurant,Gay Bar,Sushi Restaurant,Restaurant,Café,Fast Food Restaurant,Gastropub,Pub,Men's Store


## Cluster 5

In [82]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 4, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,"Ryerson,Garden District",Coffee Shop,Clothing Store,Cosmetics Shop,Café,Japanese Restaurant,Bakery,Tea Room,Lingerie Store,Italian Restaurant,Middle Eastern Restaurant
12,"Harbord,University of Toronto",Café,Bar,Restaurant,Japanese Restaurant,Bookstore,Bakery,Italian Restaurant,Beer Bar,Beer Store,College Gym
13,"Chinatown,Grange Park,Kensington Market",Bar,Café,Vietnamese Restaurant,Chinese Restaurant,Coffee Shop,Dumpling Restaurant,Bakery,Mexican Restaurant,Vegetarian / Vegan Restaurant,Cocktail Bar
14,"CN Tower,Bathurst Quay,Island airport,Harbourf...",Airport Service,Airport Lounge,Airport Terminal,Boutique,Plane,Sculpture Garden,Airport,Airport Food Court,Airport Gate,Harbor / Marina


## For Manhattan

## Data Acquisition/Extraction

In [22]:
with open('nyu_2451_34572-geojson.json') as json_data:
    newyork_data = json.load(json_data)

ny_neighborhoods_data=newyork_data['features']
ny_neighborhoods_data[0]

# define the dataframe columns
ny_column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
ny_neighborhoods = pd.DataFrame(columns=ny_column_names)


for data in ny_neighborhoods_data:
    ny_borough = ny_neighborhood_name = data['properties']['borough'] 
    ny_neighborhood_name = data['properties']['name']
        
    ny_neighborhood_latlon = data['geometry']['coordinates']
    ny_neighborhood_lat = ny_neighborhood_latlon[1]
    ny_neighborhood_lon = ny_neighborhood_latlon[0]
    
    ny_neighborhoods = ny_neighborhoods.append({'Borough': ny_borough,
                                          'Neighborhood': ny_neighborhood_name,
                                          'Latitude': ny_neighborhood_lat,
                                          'Longitude': ny_neighborhood_lon}, ignore_index=True)

In [23]:
manhattan_data = ny_neighborhoods[ny_neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


In [63]:
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude']
                                  )

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


In [83]:
manhattan_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Battery Park City,95,95,95,95,95,95
Carnegie Hill,99,99,99,99,99,99
Central Harlem,45,45,45,45,45,45
Chelsea,100,100,100,100,100,100
Chinatown,100,100,100,100,100,100
Civic Center,100,100,100,100,100,100
Clinton,100,100,100,100,100,100
East Harlem,41,41,41,41,41,41
East Village,100,100,100,100,100,100
Financial District,100,100,100,100,100,100


In [84]:
## Checking for unique categories per venue category
print('There are {} uniques categories.'.format(len(manhattan_venues['Venue Category'].unique())))

There are 334 uniques categories.


## Feature Engineering

In [85]:
# one hot encoding
manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = manhattan_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

manhattan_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,...,Vietnamese Restaurant,Volleyball Court,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
2,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [86]:
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
manhattan_grouped

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,...,Vietnamese Restaurant,Volleyball Court,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Battery Park City,0.0,0.0,0.0,0.0,0.010526,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.021053,0.0,0.021053,0.0
1,Carnegie Hill,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.010101,...,0.020202,0.0,0.0,0.0,0.0,0.010101,0.030303,0.0,0.010101,0.030303
2,Central Harlem,0.0,0.0,0.0,0.044444,0.044444,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Chelsea,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.0
4,Chinatown,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,...,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01
5,Civic Center,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.03
6,Clinton,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.02,0.03,0.0,0.0,0.0
7,East Harlem,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,East Village,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.01,...,0.02,0.0,0.0,0.0,0.0,0.04,0.01,0.0,0.0,0.0
9,Financial District,0.01,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0


In [87]:
num_top_venues = 5

for hood in manhattan_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = manhattan_grouped[manhattan_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')


----Battery Park City----
           venue  freq
0    Coffee Shop  0.07
1           Park  0.07
2          Hotel  0.05
3            Gym  0.04
4  Shopping Mall  0.03


----Carnegie Hill----
                venue  freq
0         Coffee Shop  0.07
1         Pizza Place  0.04
2                Café  0.04
3  Italian Restaurant  0.03
4              Bakery  0.03


----Central Harlem----
                 venue  freq
0   Chinese Restaurant  0.07
1   African Restaurant  0.04
2  American Restaurant  0.04
3                  Bar  0.04
4   Seafood Restaurant  0.04


----Chelsea----
                venue  freq
0         Coffee Shop  0.06
1  Italian Restaurant  0.05
2      Ice Cream Shop  0.04
3              Bakery  0.04
4           Nightclub  0.03


----Chinatown----
                   venue  freq
0     Chinese Restaurant  0.09
1           Cocktail Bar  0.05
2  Vietnamese Restaurant  0.04
3    American Restaurant  0.04
4                    Spa  0.03


----Civic Center----
                  venue  freq


In [88]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Battery Park City,Coffee Shop,Park,Hotel,Gym,Shopping Mall,Memorial Site,Plaza,Food Court,Sandwich Place,Clothing Store
1,Carnegie Hill,Coffee Shop,Café,Pizza Place,Yoga Studio,French Restaurant,Japanese Restaurant,Bakery,Italian Restaurant,Bookstore,Gym
2,Central Harlem,Chinese Restaurant,Cosmetics Shop,American Restaurant,Bar,French Restaurant,Seafood Restaurant,African Restaurant,Music Venue,Gym,Café
3,Chelsea,Coffee Shop,Italian Restaurant,Bakery,Ice Cream Shop,Nightclub,American Restaurant,Hotel,Art Gallery,Wine Shop,Theater
4,Chinatown,Chinese Restaurant,Cocktail Bar,American Restaurant,Vietnamese Restaurant,Hotpot Restaurant,Optical Shop,Spa,Bakery,Salon / Barbershop,Sandwich Place


## Data Modelling Using Clustering

In [89]:
# set number of clusters
kclusters = 5

manhattan_grouped_clustering = manhattan_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans_man = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans_man.labels_[0:10] 


array([3, 1, 1, 0, 1, 0, 0, 4, 0, 3])

In [90]:
manhattan_merged = manhattan_data

# add clustering labels
manhattan_merged['Cluster Labels'] = kmeans_man.labels_

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

manhattan_merged.head() # check the last columns!



Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Manhattan,Marble Hill,40.876551,-73.91066,3,Gym,American Restaurant,Sandwich Place,Coffee Shop,Diner,Miscellaneous Shop,Steakhouse,Supplement Shop,Shopping Mall,Seafood Restaurant
1,Manhattan,Chinatown,40.715618,-73.994279,1,Chinese Restaurant,Cocktail Bar,American Restaurant,Vietnamese Restaurant,Hotpot Restaurant,Optical Shop,Spa,Bakery,Salon / Barbershop,Sandwich Place
2,Manhattan,Washington Heights,40.851903,-73.9369,1,Café,Grocery Store,Bakery,Mobile Phone Shop,Pizza Place,Mexican Restaurant,Spanish Restaurant,Chinese Restaurant,Supermarket,Supplement Shop
3,Manhattan,Inwood,40.867684,-73.92121,0,Mexican Restaurant,Pizza Place,Restaurant,Café,Lounge,Spanish Restaurant,Bakery,Deli / Bodega,Park,Chinese Restaurant
4,Manhattan,Hamilton Heights,40.823604,-73.949688,1,Pizza Place,Café,Coffee Shop,Deli / Bodega,Mexican Restaurant,Yoga Studio,Sandwich Place,Bakery,Caribbean Restaurant,Chinese Restaurant


In [93]:
# create map
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Manhattan City are {}, {}.'.format(latitude, longitude))


The geograpical coordinate of Manhattan City are 40.7896239, -73.9598939.


In [94]:
map_clusters_ny = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_ny)
       
map_clusters_ny

## Cluster 1

In [95]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Inwood,Mexican Restaurant,Pizza Place,Restaurant,Café,Lounge,Spanish Restaurant,Bakery,Deli / Bodega,Park,Chinese Restaurant
5,Manhattanville,Coffee Shop,Park,Italian Restaurant,Mexican Restaurant,Seafood Restaurant,Deli / Bodega,Bus Stop,Cosmetics Shop,Café,Bike Trail
6,Central Harlem,Chinese Restaurant,Cosmetics Shop,American Restaurant,Bar,French Restaurant,Seafood Restaurant,African Restaurant,Music Venue,Gym,Café
8,Upper East Side,Italian Restaurant,Exhibit,Art Gallery,Bakery,Coffee Shop,Gym / Fitness Center,French Restaurant,Juice Bar,Hotel,Pizza Place
11,Roosevelt Island,Sandwich Place,Park,Pizza Place,Residential Building (Apartment / Condo),Farmers Market,Bubble Tea Shop,Metro Station,Supermarket,School,Outdoors & Recreation
12,Upper West Side,Italian Restaurant,Wine Bar,Bar,Coffee Shop,Indian Restaurant,Bakery,Mediterranean Restaurant,Café,Seafood Restaurant,Pizza Place
14,Clinton,Theater,Gym / Fitness Center,Italian Restaurant,Coffee Shop,American Restaurant,Spa,Wine Shop,Sandwich Place,Gym,Hotel
17,Chelsea,Coffee Shop,Italian Restaurant,Bakery,Ice Cream Shop,Nightclub,American Restaurant,Hotel,Art Gallery,Wine Shop,Theater
27,Gramercy,Italian Restaurant,Pizza Place,Bagel Shop,Thai Restaurant,Thrift / Vintage Store,Mexican Restaurant,Bar,Grocery Store,Hotel,Cocktail Bar
32,Civic Center,Coffee Shop,Gym / Fitness Center,French Restaurant,Italian Restaurant,Hotel,Yoga Studio,Spa,Park,Bakery,Cocktail Bar


## Cluster 2

In [96]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Chinatown,Chinese Restaurant,Cocktail Bar,American Restaurant,Vietnamese Restaurant,Hotpot Restaurant,Optical Shop,Spa,Bakery,Salon / Barbershop,Sandwich Place
2,Washington Heights,Café,Grocery Store,Bakery,Mobile Phone Shop,Pizza Place,Mexican Restaurant,Spanish Restaurant,Chinese Restaurant,Supermarket,Supplement Shop
4,Hamilton Heights,Pizza Place,Café,Coffee Shop,Deli / Bodega,Mexican Restaurant,Yoga Studio,Sandwich Place,Bakery,Caribbean Restaurant,Chinese Restaurant
10,Lenox Hill,Coffee Shop,Italian Restaurant,Pizza Place,Sushi Restaurant,Cocktail Bar,Gym,Café,Gym / Fitness Center,Burger Joint,Wine Shop
18,Greenwich Village,Italian Restaurant,Sushi Restaurant,Clothing Store,Café,French Restaurant,Indian Restaurant,Seafood Restaurant,Cosmetics Shop,Burger Joint,Chinese Restaurant
23,Soho,Clothing Store,Boutique,Italian Restaurant,Art Gallery,Women's Store,Mediterranean Restaurant,Shoe Store,Sporting Goods Shop,Bakery,Men's Store
24,West Village,Italian Restaurant,New American Restaurant,Cosmetics Shop,Park,Wine Bar,American Restaurant,Cocktail Bar,Coffee Shop,Ice Cream Shop,Jazz Club
26,Morningside Heights,Park,Bookstore,American Restaurant,Coffee Shop,Food Truck,Deli / Bodega,Sandwich Place,Burger Joint,New American Restaurant,Seafood Restaurant
29,Financial District,Coffee Shop,Bar,American Restaurant,Gym,Hotel,Pizza Place,Food Truck,Steakhouse,Gym / Fitness Center,Cocktail Bar
31,Noho,Italian Restaurant,Hotel,French Restaurant,Cocktail Bar,Coffee Shop,Pizza Place,American Restaurant,Grocery Store,Art Gallery,Rock Club


## Cluster 3

In [97]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
30,Carnegie Hill,Coffee Shop,Café,Pizza Place,Yoga Studio,French Restaurant,Japanese Restaurant,Bakery,Italian Restaurant,Bookstore,Gym


## Cluster 4

In [98]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 3, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Marble Hill,Gym,American Restaurant,Sandwich Place,Coffee Shop,Diner,Miscellaneous Shop,Steakhouse,Supplement Shop,Shopping Mall,Seafood Restaurant
9,Yorkville,Italian Restaurant,Coffee Shop,Bar,Gym,Deli / Bodega,Pizza Place,Wine Shop,Sushi Restaurant,Japanese Restaurant,Diner
22,Little Italy,Bakery,Café,Italian Restaurant,Sandwich Place,Salon / Barbershop,Bubble Tea Shop,Mediterranean Restaurant,Cocktail Bar,Tea Room,Hotpot Restaurant
25,Manhattan Valley,Indian Restaurant,Pizza Place,Bar,Yoga Studio,Mexican Restaurant,Playground,Coffee Shop,Hostel,Deli / Bodega,Plaza
28,Battery Park City,Coffee Shop,Park,Hotel,Gym,Shopping Mall,Memorial Site,Plaza,Food Court,Sandwich Place,Clothing Store


## Cluster 5

In [99]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 4, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,East Harlem,Mexican Restaurant,Latin American Restaurant,Bakery,Deli / Bodega,Thai Restaurant,Fast Food Restaurant,Convenience Store,French Restaurant,Liquor Store,Taco Place
13,Lincoln Square,Café,Concert Hall,Plaza,Italian Restaurant,Theater,Performing Arts Venue,Gym / Fitness Center,American Restaurant,French Restaurant,Indie Movie Theater
15,Midtown,Hotel,Coffee Shop,Sporting Goods Shop,Clothing Store,French Restaurant,Bookstore,Steakhouse,Café,Bakery,Cocktail Bar
16,Murray Hill,Sandwich Place,Coffee Shop,Japanese Restaurant,American Restaurant,Gym / Fitness Center,Hotel,Italian Restaurant,Bar,Bagel Shop,Chinese Restaurant
19,East Village,Bar,Ice Cream Shop,Wine Bar,Pizza Place,Chinese Restaurant,Mexican Restaurant,Italian Restaurant,Korean Restaurant,Japanese Restaurant,Speakeasy
20,Lower East Side,Chinese Restaurant,Café,Pizza Place,Coffee Shop,Art Gallery,Ramen Restaurant,Japanese Restaurant,Cocktail Bar,Bakery,Pharmacy
21,Tribeca,American Restaurant,Park,Italian Restaurant,Café,Spa,Men's Store,Coffee Shop,Greek Restaurant,Wine Bar,Steakhouse
33,Midtown South,Korean Restaurant,Japanese Restaurant,Hotel,Hotel Bar,Dessert Shop,American Restaurant,Cosmetics Shop,Coffee Shop,Cocktail Bar,Salad Place
37,Stuyvesant Town,Bar,Park,Coffee Shop,Gym / Fitness Center,Baseball Field,Playground,Farmers Market,German Restaurant,Harbor / Marina,Gas Station
39,Hudson Yards,American Restaurant,Gym / Fitness Center,Hotel,Italian Restaurant,Café,Bar,Coffee Shop,Spanish Restaurant,Gym,Dog Run


## Results

After clustering the data of the respective neighborhoods, both cities (Boroughs) have venues which can be explored and attract the Tourists. The neighborhoods are much similar in features like Theaters, opera houses, food places, clubs, museums, parks etc. As far as concern to dissimilarity, it differs in terms of some unique places like historical places and monuments.


## Observations
When we compare the tourist places, we observe that the historical place is only situated in Downtown Toronto and the Monument or landmark venue is in Manhattan neighborhoods. Similarly, Airport facility, Harbor, Sculpture garden and Boat or ferry services are also available i**n Downtown Toronto while venues like Nightlife, Climbing gym and Museums are present in Manhattan.
## Conclusion
The downtown Toronto and Manhattan neighborhoods have more like similar venues. As we know that every place is unique in its own way, so that’s argument is present in both neighborhoods. The dissimilarity exists in terms of some different venues and facilities but not on a larger extent.