# CAPSTONE PROJECT - BATTLE OF NEIGHBORHOODS


## INTRODUCTION <a name="introduction"></a>

Toronto in Canada and New York in the United States are two of the most famous cities in the world. Both are multicultural, and both are the financial hubs of their respective countries. The aim of this project is to explore the similarities and dissimilarities between these two diverse cities from the perspective of a tourist who wants to visit one of them, keeping in mind food, places to visit, culture, accommodation, etc.

## DATA <a name="data"></a>

The Foursquare API is used to explore the two cities, Toronto and New York, in terms of their neighborhoods. The data also includes information about the venues in each neighborhood, such as restaurants, hotels, coffee shops, parks, theaters, art galleries, museums and many more. One borough is selected from each city so that its neighborhoods can be analyzed: Manhattan from New York and Downtown Toronto from Toronto.

## METHODOLOGY

Since one borough has been selected from each city, the data exploration, analysis and visualization are done in the same way for both boroughs, but separately. Each neighborhood is characterized by its venues and venue categories. Then the machine learning technique of clustering is used to segment the neighborhoods into groups with similar venue profiles, on the basis of each neighborhood's data. The venue categories are given priority on the basis of foot traffic (activity) in their respective neighborhoods. This helps to locate the tourist areas and hubs, and the similarity or dissimilarity between the two cities can then be judged on that basis.

## EXPLORATION

For Downtown Toronto, the table of Toronto's boroughs is extracted from the Wikipedia page and the data is arranged according to our requirements. Neighborhoods that share the same geographical coordinates within a borough are combined, and the coordinates are matched to each borough's neighborhoods using the csv file. For data verification and further exploration, geopy's Nominatim geocoder is used to get the coordinates of Downtown Toronto, and the Foursquare API is used to explore its neighborhoods.
For Manhattan, the dataset is loaded from the json file, the same exploration and analysis is done as for Downtown Toronto, and the Foursquare API is then used to explore its neighborhoods.
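
The combining step described above can be sketched in a few lines. This is a minimal illustration using only the standard library, not the code used in the project: the helper name `combine_by_postal_code` and the three sample rows are assumptions made for the example.

```python
from collections import OrderedDict

def combine_by_postal_code(rows):
    """Merge rows that share a postal code and borough, joining neighborhood names with ', '."""
    combined = OrderedDict()
    for postal_code, borough, neighborhood in rows:
        combined.setdefault((postal_code, borough), []).append(neighborhood)
    return [(code, borough, ', '.join(names))
            for (code, borough), names in combined.items()]

# Illustrative rows only; the real table comes from the Wikipedia page.
sample_rows = [
    ('M3A', 'North York', 'Parkwoods'),
    ('M5A', 'Downtown Toronto', 'Regent Park'),
    ('M5A', 'Downtown Toronto', 'Harbourfront'),
]
combined_rows = combine_by_postal_code(sample_rows)
for row in combined_rows:
    print(row)
```

Grouping on the postal code keeps one row per coordinate pair, which is what the later merge with the csv file relies on.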
 
 

## PREPROCESSING

### Importing required libraries

In [2]:
import numpy as np # linear algebra
import pandas as pd # data processing
# Visualization
import matplotlib.pyplot
import seaborn as sns
# To see the full dataframe...
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', None)
import json # library to handle JSON files
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means from clustering stage
from sklearn.cluster import KMeans
!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library
print('Libraries imported.')


### Now we will extract the Toronto data, from which the Downtown Toronto dataframe will be derived

In [9]:
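import pprint
from bs4 import BeautifulSoup
# download the Wikipedia page that lists the Toronto postal codes
URL = r'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
# soup.table is the first table on the page
actual_content = soup.table
table_content = actual_content.tbody
info = list(table_content.find_all('tr'))
# collect the cell text of every row (the header row has no <td> cells, so it yields an empty list)
actual_data = []
for i in info:
    actual_data.append([j.string.replace('\n', '') for j in i.find_all('td')])
# the column names come from the <th> cells of the first row
head = [j.string.replace('\n', '') for j in info[0].find_all('th')]
df = pd.DataFrame(actual_data[1:], columns=head)
# drop the postal codes that have no borough assigned
df.drop(df.index[df['Borough'] == 'Not assigned'], inplace=True)
df = df.reset_index(drop=True)
df.head()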

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


#### Now we will merge the data of boroughs and geographical coordinates

In [10]:
df1=pd.read_csv('https://cocl.us/Geospatial_data')
df_final = pd.merge(left=df, right=df1, left_on='Postal Code', right_on='Postal Code')
df_final.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


#### Now we will only take the data of Downtown Toronto

In [12]:
downtown_toronto_data = df_final[df_final['Borough'].str.contains("Downtown Toronto")].reset_index(drop=True)
downtown_toronto_data=downtown_toronto_data.drop(['Postal Code'], axis=1)
downtown_toronto_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,Downtown Toronto,St. James Town,43.651494,-79.375418
4,Downtown Toronto,Berczy Park,43.644771,-79.373306


### Now we will move on to the New York boroughs. We will select "Manhattan" as the borough and analyze its neighborhoods later

In [13]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


In [16]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
neighborhoods_data = newyork_data['features']

#### Now we will transform data into dataframe

In [17]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']
# collect one row per neighborhood; DataFrame.append was removed in pandas 2.0,
# so the rows are gathered in a list and the dataframe is built once at the end
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']
    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})
# instantiate the dataframe
neighborhoods = pd.DataFrame(rows, columns=column_names)
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585
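
The flattening of one GeoJSON feature into a table row can be shown in isolation. The sketch below is an illustration under assumptions: the helper name `feature_to_row` is hypothetical, and the sample feature is reduced to the fields the notebook reads, with values taken from the first row of the table above.

```python
import json

# One feature shaped like the entries of newyork_data['features'], reduced to the fields used here.
sample_geojson = '''
{"features": [
  {"properties": {"borough": "Bronx", "name": "Wakefield"},
   "geometry": {"coordinates": [-73.847201, 40.894705]}}
]}
'''

def feature_to_row(feature):
    """Flatten one GeoJSON feature into a dict with the four dataframe columns."""
    lon, lat = feature['geometry']['coordinates'][:2]  # GeoJSON order: longitude first
    return {
        'Borough': feature['properties']['borough'],
        'Neighborhood': feature['properties']['name'],
        'Latitude': lat,
        'Longitude': lon,
    }

sample_rows = [feature_to_row(f) for f in json.loads(sample_geojson)['features']]
print(sample_rows[0])
```

Unpacking `lon, lat` in one step makes the coordinate order explicit, which is the easiest part of this step to get backwards.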


#### Now we will create a dataframe of data of Manhattan

In [18]:
# Creating new Dataframe manhattan_data
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


### Foursquare API

In [19]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # replace with your own Foursquare client ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # replace with your own Foursquare client secret
VERSION = '20180605' # Foursquare API version
limit = 20
print('Your credentials:')
print('CLIENT_ID:'+ CLIENT_ID)
print('CLIENT_SECRET:'+ CLIENT_SECRET)

Your credentials:
CLIENT_ID:YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET:YOUR_FOURSQUARE_CLIENT_SECRET
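
API credentials are usually kept out of a notebook that is going to be shared. One possible approach, sketched here under assumptions, is to read them from environment variables and to print only a masked version. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical, not something Foursquare prescribes.

```python
import os

def load_foursquare_credentials(environ=os.environ):
    """Return (client_id, client_secret) from the environment, or raise if either is missing."""
    client_id = environ.get('FOURSQUARE_CLIENT_ID')          # assumed variable name
    client_secret = environ.get('FOURSQUARE_CLIENT_SECRET')  # assumed variable name
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first')
    return client_id, client_secret

def mask(secret, visible=4):
    """Keep the first few characters and hide the rest, so logs never show the full value."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)

# Demonstration with a fake environment; in a notebook, call the loader with no arguments.
fake_env = {'FOURSQUARE_CLIENT_ID': 'EXAMPLEID123',
            'FOURSQUARE_CLIENT_SECRET': 'EXAMPLESECRET456'}
demo_id, demo_secret = load_foursquare_credentials(fake_env)
print('CLIENT_ID:', mask(demo_id))
```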


#### Now we will find the coordinates of Downtown Toronto

In [21]:
# get the geographical coordinates of Downtown Toronto
address = 'Downtown Toronto, ON, Canada'
geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude_downtown_toronto = location.latitude
longitude_downtown_toronto = location.longitude
print("Downtown Toronto","latitude",latitude_downtown_toronto, "& " "longitude" ,longitude_downtown_toronto)

Downtown Toronto latitude 43.6563221 & longitude -79.3809161


#### Now we determine the coordinates of Manhattan

In [22]:
address = 'Manhattan, NY'
geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Manhattan are 40.7896239, -73.9598939.
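
geopy's `geocode` returns `None` when an address cannot be resolved, in which case `location.latitude` raises an `AttributeError`. A small guard makes that failure easier to read. This is a sketch under assumptions: `coordinates_or_raise` is a hypothetical helper, and a fake geocoder stands in for `geolocator.geocode` so that the example runs without network access.

```python
from collections import namedtuple

def coordinates_or_raise(geocode, address):
    """Call a geocode function and return (latitude, longitude), raising if nothing is found."""
    location = geocode(address)
    if location is None:
        raise ValueError('No geocoding result for {!r}'.format(address))
    return location.latitude, location.longitude

# Stand-in for geolocator.geocode; it only knows the one address used in this notebook.
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])

def fake_geocode(address):
    known = {'Manhattan, NY': FakeLocation(40.7896239, -73.9598939)}
    return known.get(address)

manhattan_lat, manhattan_lon = coordinates_or_raise(fake_geocode, 'Manhattan, NY')
print(manhattan_lat, manhattan_lon)
```

With the real geocoder, the call would be `coordinates_or_raise(geolocator.geocode, address)`.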


## VISUALIZATION

We visualize the data at several stages. First, we plot the neighborhoods of each selected borough to confirm that the borough's coordinates are right. Then, after clustering the neighborhoods, we plot the clusters so that we can name them.

### Downtown Toronto (before clustering)

In [23]:
# create map of Downtown Toronto using latitude and longitude values
map_downtown_toronto = folium.Map(location=[latitude_downtown_toronto,longitude_downtown_toronto], zoom_start=11)
# add markers to map
for lat, lng, label in zip(downtown_toronto_data['Latitude'], downtown_toronto_data['Longitude'], downtown_toronto_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng], radius=5, popup=label, color='blue', fill=True, fill_color='#3186cc', fill_opacity=0.7, parse_html=False).add_to(map_downtown_toronto)  

map_downtown_toronto

#### Now we will instantiate a marker cluster object for the neighborhoods in the dataframe

In [25]:
from folium import plugins
# create map of Downtown Toronto using latitude and longitude values
map_downtown_toronto = folium.Map(location=[latitude_downtown_toronto,longitude_downtown_toronto], zoom_start=11)
# instantiate a marker cluster object for the neighborhoods in the dataframe
incidents = plugins.MarkerCluster().add_to(map_downtown_toronto)
# add markers to map
for lat, lng, label in zip(downtown_toronto_data['Latitude'], downtown_toronto_data['Longitude'], downtown_toronto_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng], radius=5, popup=label, color='blue', fill=True, fill_color='#3186cc', fill_opacity=0.7, parse_html=False).add_to(incidents)  
    
map_downtown_toronto

### Manhattan (Before Clustering)

In [26]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)
# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng], radius=5, popup=label, color='blue', fill=True, fill_color='#3186cc', fill_opacity=0.7, parse_html=False).add_to(map_manhattan)  
    
map_manhattan

#### Now we will instantiate a marker cluster object for the neighborhoods in the dataframe

In [27]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)
grouping = plugins.MarkerCluster().add_to(map_manhattan)
# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng], radius=5, popup=label, color='blue', fill=True, fill_color='#3186cc', fill_opacity=0.7, parse_html=False).add_to(grouping)  
    
map_manhattan

## ANALYSIS

We analyze the neighborhoods of both boroughs through one-hot encoding (each venue gets '1' in the column of its own category and '0' in every other category column). From the one-hot encoded data, we calculate the mean frequency of occurrence of each category per neighborhood and pick the top ten venue categories for each neighborhood on that basis.
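
The mean of a one-hot column within a neighborhood is simply the share of that neighborhood's venues that fall in the category. The sketch below shows the same calculation with plain counters instead of pandas. The helper names and the four sample venues are made up for illustration.

```python
from collections import Counter, defaultdict

def category_frequencies(venues):
    """Map each neighborhood to {category: share of that neighborhood's venues}."""
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for hood, counter in counts.items()
    }

def top_categories(frequencies, n):
    """Return the n most frequent categories, most frequent first (ties broken by name)."""
    ranked = sorted(frequencies.items(), key=lambda item: (-item[1], item[0]))
    return [category for category, _ in ranked[:n]]

# Illustrative (neighborhood, category) pairs, not real API results.
sample_venues = [
    ('Christie', 'Grocery Store'), ('Christie', 'Grocery Store'),
    ('Christie', 'Coffee Shop'), ('Christie', 'Park'),
]
freqs = category_frequencies(sample_venues)
print(freqs['Christie'])
print(top_categories(freqs['Christie'], 2))
```

Breaking ties by name keeps the ranking reproducible when several categories occur equally often, which is common when only 20 venues are requested per neighborhood.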

In [28]:
# create a function to repeat the process for all the neighborhoods in Toronto and Manhattan
def getNearbyVenues(names, latitudes,longitudes, radius=500):
    venues_list=[]
    for name, lat, lng in zip(names,latitudes,longitudes):
        print(name)
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID,CLIENT_SECRET,VERSION,lat,lng,radius,limit)
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # return only relevant information for each nearby venue
        venues_list.append([(name,lat,lng, v['venue']['name'],v['venue']['location']['lat'], v['venue']['location']['lng'],  v['venue']['categories'][0]['name']) for v in results])
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood','Neighborhood Latitude','Neighborhood Longitude','Venue','Venue Latitude','Venue Longitude','Venue Category']
    return(nearby_venues)
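
The function above assumes that every response contains a group and that every venue has at least one category. A more defensive variant is sketched below using only the standard library. Both helper names are hypothetical, and the response shape is simply the one the function above already indexes into (`response`, `groups`, `items`, `venue`).

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=20):
    """Build the venues/explore request URL with URL-encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

def extract_venues(payload):
    """Return (name, lat, lng, category) tuples, tolerating missing groups or categories."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    venues = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories') or [{}]  # empty list -> placeholder dict
        venues.append((venue.get('name'), location.get('lat'), location.get('lng'),
                       categories[0].get('name', 'Unknown')))
    return venues

# A hand-written payload with an empty category list, to exercise the fallback.
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Cafe',
               'location': {'lat': 43.65, 'lng': -79.36},
               'categories': []}},
]}]}}
sample_url = build_explore_url('ID', 'SECRET', '20180605', 43.65, -79.36)
print(extract_venues(sample_payload))
```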

### Exploring Neighborhoods in Downtown Toronto

Create a dataframe by running getNearbyVenues function on each neighborhood

In [29]:
downtown_toronto_venues = getNearbyVenues(names=downtown_toronto_data['Neighborhood'],latitudes=downtown_toronto_data['Latitude'],longitudes=downtown_toronto_data['Longitude'],)

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
Stn A PO Boxes
St. James Town, Cabbagetown
First Canadian Place, Underground city
Church and Wellesley


In [31]:
# check how many venues were returned for each neighborhood
downtown_toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Berczy Park,20,20,20,20,20,20
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",17,17,17,17,17,17
Central Bay Street,20,20,20,20,20,20
Christie,17,17,17,17,17,17
Church and Wellesley,20,20,20,20,20,20
"Commerce Court, Victoria Hotel",20,20,20,20,20,20
"First Canadian Place, Underground city",20,20,20,20,20,20
"Garden District, Ryerson",20,20,20,20,20,20
"Harbourfront East, Union Station, Toronto Islands",20,20,20,20,20,20
"Kensington Market, Chinatown, Grange Park",20,20,20,20,20,20


### Analyzing each neighborhood

One Hot Encoding

In [32]:
downtown_toronto_onehot = pd.get_dummies(downtown_toronto_venues[['Venue Category']], prefix="", prefix_sep="")
# add neighborhood column back to dataframe
downtown_toronto_onehot['Neighborhood'] = downtown_toronto_venues['Neighborhood'] 
# move neighborhood column to the first column
fixed_columns = [downtown_toronto_onehot.columns[-1]] + list(downtown_toronto_onehot.columns[:-1])
downtown_toronto_onehot = downtown_toronto_onehot[fixed_columns]
downtown_toronto_onehot.head()

Unnamed: 0,Yoga Studio,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,Bakery,Bar,Basketball Stadium,Beer Bar,Belgian Restaurant,Bistro,Boat or Ferry,Bookstore,Breakfast Spot,Bubble Tea Shop,Burger Joint,Burrito Place,Butcher,Café,Candy Store,Caribbean Restaurant,Cheese Shop,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,Comfort Food Restaurant,Comic Shop,Concert Hall,Cosmetics Shop,Creperie,Dance Studio,Deli / Bodega,Dessert Shop,Diner,Distribution Center,Electronics Store,Farmers Market,Fish Market,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Gastropub,General Entertainment,General Travel,Gift Shop,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Historic Site,Hobby Shop,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Lake,Liquor Store,Lounge,Martial Arts Dojo,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Opera House,Organic Grocery,Park,Performing Arts Venue,Pet Store,Pizza Place,Plane,Playground,Plaza,Pub,Ramen Restaurant,Rental Car Location,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Sculpture Garden,Seafood Restaurant,Shopping Mall,Skating Rink,Smoke Shop,Spa,Speakeasy,Sporting Goods Shop,Steakhouse,Supermarket,Sushi Restaurant,Tailor Shop,Taiwanese Restaurant,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Trail,Train Station,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wings Joint
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [33]:
# group rows by neighborhood and by taking the mean of the frequency of occurrence of each category
downtown_toronto_grouped = downtown_toronto_onehot.groupby('Neighborhood').mean().reset_index()
# print each neighborhood along with the top 5 most common venues
num_top_venues = 5
for hood in downtown_toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = downtown_toronto_grouped[downtown_toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
                venue  freq
0  Seafood Restaurant  0.10
1        Cocktail Bar  0.05
2              Bakery  0.05
3         Coffee Shop  0.05
4          Restaurant  0.05


----CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport----
              venue  freq
0   Airport Service  0.18
1    Airport Lounge  0.12
2  Airport Terminal  0.12
3   Harbor / Marina  0.06
4           Airport  0.06


----Central Bay Street----
                        venue  freq
0                 Coffee Shop  0.30
1   Middle Eastern Restaurant  0.05
2                        Park  0.05
3                         Spa  0.05
4  Modern European Restaurant  0.05


----Christie----
           venue  freq
0  Grocery Store  0.24
1           Café  0.18
2           Park  0.12
3      Nightclub  0.06
4          Diner  0.06


----Church and Wellesley----
             venue  freq
0      Pizza Place  0.05
1  Bubble Tea Shop  0.05
2        Bookstore  0.05
3     

In [34]:
# helper that returns the most common venue categories of one neighborhood row, most frequent first
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

Create the new dataframe and display the top 10 venues for each neighborhood

In [35]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']
# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = downtown_toronto_grouped['Neighborhood']
for ind in np.arange(downtown_toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(downtown_toronto_grouped.iloc[ind, :], num_top_venues)
neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Seafood Restaurant,Cheese Shop,Farmers Market,Beer Bar,Basketball Stadium,Bakery,Restaurant,Concert Hall,Breakfast Spot,Bistro
1,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Airport Lounge,Airport Terminal,Plane,Harbor / Marina,Rental Car Location,Bar,Boat or Ferry,Sculpture Garden,Coffee Shop
2,Central Bay Street,Coffee Shop,Café,Sushi Restaurant,Modern European Restaurant,Bubble Tea Shop,Ramen Restaurant,Middle Eastern Restaurant,Sandwich Place,Bar,Spa
3,Christie,Grocery Store,Café,Park,Italian Restaurant,Candy Store,Coffee Shop,Nightclub,Restaurant,Baby Store,Athletics & Sports
4,Church and Wellesley,Pizza Place,Park,Restaurant,Bookstore,Ramen Restaurant,Breakfast Spot,Bubble Tea Shop,Burger Joint,Mexican Restaurant,Beer Bar
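
The `try`/`except` used for the column names works for the top 10, but it would produce labels such as "21th" if `num_top_venues` were raised beyond 20. A general ordinal helper avoids that. This is a small sketch with a hypothetical function name, not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

venue_columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)
]
print(venue_columns[:4])
```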


### Clustering Neighborhoods

In [36]:
# set number of clusters
kclusters = 5
downtown_toronto_grouped_clustering = downtown_toronto_grouped.drop('Neighborhood', axis=1)
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(downtown_toronto_grouped_clustering)
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 4, 0, 3, 1, 3, 3, 3, 1, 3], dtype=int32)
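
To make the clustering step less of a black box, the core of k-means (Lloyd's algorithm) can be written in a few lines. This is a simplified sketch and not scikit-learn's implementation, which uses a smarter initialization among other refinements. The function names and the four sample frequency vectors are assumptions for illustration.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def simple_kmeans(points, k, max_iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # start from k randomly chosen points
    labels = None
    for _ in range(max_iterations):
        # assignment step: each point joins its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable too
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Two made-up groups of category-frequency vectors.
sample_points = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.0, 0.1, 0.9], [0.1, 0.0, 0.9]]
sample_labels, sample_centroids = simple_kmeans(sample_points, k=2)
print(sample_labels)
```

The label numbers themselves carry no meaning; only which rows share a label matters, which is why the clusters are named by inspecting their venues afterwards.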

Create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood

In [37]:
downtown_toronto_merged = downtown_toronto_data
# add clustering labels
downtown_toronto_merged['Cluster Labels'] = kmeans.labels_
# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
downtown_toronto_merged = downtown_toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
downtown_toronto_merged.head() 

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,1,Coffee Shop,Park,Breakfast Spot,Farmers Market,Chocolate Shop,Pub,Restaurant,Performing Arts Venue,Bakery,Dessert Shop
1,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,4,Coffee Shop,Sushi Restaurant,Wings Joint,Park,Arts & Crafts Store,Beer Bar,Burrito Place,Creperie,Diner,Distribution Center
2,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,0,Café,Mexican Restaurant,Shopping Mall,Restaurant,Ramen Restaurant,Coffee Shop,Plaza,Steakhouse,Art Gallery,Sandwich Place
3,Downtown Toronto,St. James Town,43.651494,-79.375418,3,Gastropub,Coffee Shop,Café,Creperie,Art Gallery,BBQ Joint,Italian Restaurant,Cosmetics Shop,Food Truck,Restaurant
4,Downtown Toronto,Berczy Park,43.644771,-79.373306,1,Seafood Restaurant,Cheese Shop,Farmers Market,Beer Bar,Basketball Stadium,Bakery,Restaurant,Concert Hall,Breakfast Spot,Bistro


Create a map to visualize the clusters

In [38]:
# create map
map_clusters = folium.Map(location=[latitude_downtown_toronto, longitude_downtown_toronto], zoom_start=11)
# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(downtown_toronto_merged['Latitude'], downtown_toronto_merged['Longitude'], downtown_toronto_merged['Neighborhood'], downtown_toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker([lat, lon],radius=5,popup=label,color=rainbow[cluster-1],fill=True,fill_color=rainbow[cluster-1],fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
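
The map above takes its colors from matplotlib's rainbow colormap. As an alternative sketch that needs only the standard library, evenly spaced hues can be generated with `colorsys` and indexed directly by cluster label. The function name and the saturation and brightness values are assumptions chosen for the example.

```python
import colorsys

def cluster_palette(n_clusters, saturation=0.8, value=0.9):
    """Return n_clusters evenly spaced hues as '#rrggbb' strings."""
    palette = []
    for i in range(n_clusters):
        r, g, b = colorsys.hsv_to_rgb(i / n_clusters, saturation, value)
        palette.append('#{:02x}{:02x}{:02x}'.format(int(r * 255), int(g * 255), int(b * 255)))
    return palette

palette = cluster_palette(5)
# Index by the cluster label itself, so label 0 gets the first color.
for cluster_label in range(5):
    print(cluster_label, palette[cluster_label])
```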

### Examining Clusters

Now, we can examine each cluster and determine the discriminating venue categories that distinguish each cluster.

#### CLUSTER 1 (COMMERCIAL PLACES)

In [39]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 0, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,"Garden District, Ryerson",Café,Mexican Restaurant,Shopping Mall,Restaurant,Ramen Restaurant,Coffee Shop,Plaza,Steakhouse,Art Gallery,Sandwich Place
10,"Commerce Court, Victoria Hotel",Café,Coffee Shop,Bakery,Gym,Gastropub,Ice Cream Shop,Japanese Restaurant,Deli / Bodega,Museum,Pub
11,"University of Toronto, Harbord",Bakery,Bookstore,Restaurant,Japanese Restaurant,Italian Restaurant,Café,College Gym,Comfort Food Restaurant,Sandwich Place,Beer Bar


#### CLUSTER 2  (TOURIST PLACES & HUBS)

In [40]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 1, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Regent Park, Harbourfront",Coffee Shop,Park,Breakfast Spot,Farmers Market,Chocolate Shop,Pub,Restaurant,Performing Arts Venue,Bakery,Dessert Shop
4,Berczy Park,Seafood Restaurant,Cheese Shop,Farmers Market,Beer Bar,Basketball Stadium,Bakery,Restaurant,Concert Hall,Breakfast Spot,Bistro
8,"Harbourfront East, Union Station, Toronto Islands",Park,Plaza,Hotel,Café,Supermarket,Performing Arts Venue,New American Restaurant,Bubble Tea Shop,Salad Place,Lake
16,"St. James Town, Cabbagetown",Restaurant,Café,Gift Shop,Indian Restaurant,Deli / Bodega,Jewelry Store,Bakery,Diner,Japanese Restaurant,Italian Restaurant
18,Church and Wellesley,Pizza Place,Park,Restaurant,Bookstore,Ramen Restaurant,Breakfast Spot,Bubble Tea Shop,Burger Joint,Mexican Restaurant,Beer Bar


#### CLUSTER 3 (AIRPORT LOUNGE, CAFE)

In [41]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 2, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Airport Lounge,Airport Terminal,Plane,Harbor / Marina,Rental Car Location,Bar,Boat or Ferry,Sculpture Garden,Coffee Shop


#### CLUSTER 4 (RESIDENTIAL)

In [42]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 3, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,St. James Town,Gastropub,Coffee Shop,Café,Creperie,Art Gallery,BBQ Joint,Italian Restaurant,Cosmetics Shop,Food Truck,Restaurant
5,Central Bay Street,Coffee Shop,Café,Sushi Restaurant,Modern European Restaurant,Bubble Tea Shop,Ramen Restaurant,Middle Eastern Restaurant,Sandwich Place,Bar,Spa
6,Christie,Grocery Store,Café,Park,Italian Restaurant,Candy Store,Coffee Shop,Nightclub,Restaurant,Baby Store,Athletics & Sports
7,"Richmond, Adelaide, King",Coffee Shop,Seafood Restaurant,Gym / Fitness Center,Steakhouse,Hotel,Concert Hall,Lounge,Opera House,Café,Pizza Place
9,"Toronto Dominion Centre, Design Exchange",Café,Coffee Shop,Gym / Fitness Center,Steakhouse,Pub,Bookstore,Restaurant,Sandwich Place,Beer Bar,Deli / Bodega
12,"Kensington Market, Chinatown, Grange Park",Café,Vietnamese Restaurant,Mexican Restaurant,Caribbean Restaurant,Bakery,Wine Bar,Fish Market,Dessert Shop,Coffee Shop,Organic Grocery
14,Rosedale,Park,Playground,Trail,Deli / Bodega,Cheese Shop,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,College Gym
15,Stn A PO Boxes,Café,Cocktail Bar,Tailor Shop,Museum,Comfort Food Restaurant,Restaurant,Concert Hall,Beer Bar,Park,Farmers Market
17,"First Canadian Place, Underground city",Café,Coffee Shop,Restaurant,Gym / Fitness Center,Deli / Bodega,Bookstore,Bakery,Steakhouse,Pub,Seafood Restaurant


#### CLUSTER 5 (CULTURAL & GOING OUT PLACES)

In [43]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 4, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,"Queen's Park, Ontario Provincial Government",Coffee Shop,Sushi Restaurant,Wings Joint,Park,Arts & Crafts Store,Beer Bar,Burrito Place,Creperie,Diner,Distribution Center


### Exploring Neighborhoods in Manhattan

Create dataframe by running the getNearbyVenues function on each neighborhood

In [44]:
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'], latitudes=manhattan_data['Latitude'], longitudes=manhattan_data['Longitude'])

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


In [45]:
# check how many venues were returned for each neighborhood
manhattan_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Battery Park City,20,20,20,20,20,20
Carnegie Hill,20,20,20,20,20,20
Central Harlem,20,20,20,20,20,20
Chelsea,20,20,20,20,20,20
Chinatown,20,20,20,20,20,20
Civic Center,20,20,20,20,20,20
Clinton,20,20,20,20,20,20
East Harlem,20,20,20,20,20,20
East Village,20,20,20,20,20,20
Financial District,20,20,20,20,20,20


### Analyzing each neighborhood

One Hot Encoding

In [46]:
manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")
# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = manhattan_venues['Neighborhood'] 
# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]
manhattan_onehot.head()

Unnamed: 0,Neighborhood,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Auditorium,Austrian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Beer Bar,Beer Garden,Beer Store,Bike Trail,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Breakfast Spot,Bridge,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Café,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Circus,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Academic Building,College Arts Building,Comedy Club,Community Center,Concert Hall,Convenience Store,Cooking School,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Cycle Studio,Czech Restaurant,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Doctor's Office,Dog Run,Donut Shop,Dumpling Restaurant,Duty-free Shop,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Filipino Restaurant,Fish Market,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gas Station,Gastropub,General Entertainment,German Restaurant,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health Food Store,Heliport,Historic Site,History Museum,Hobby Shop,Hostel,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Irish Pub,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Korean Restaurant,Latin American Restaurant,Library,Lingerie Store,Liquor Store,Lounge,Market,Martial Arts Dojo,Mediterranean Restaurant,Memorial Site,Men's Store,Mexican Restaurant,Miscellaneous Shop,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music School,Music Venue,New American Restaurant,Newsstand,Noodle House,Office,Opera House,Optical Shop,Outdoor Sculpture,Outdoors & Recreation,Park,Pedestrian Plaza,Performing Arts Venue,Peruvian Restaurant,Pet Café,Pet Service,Pet Store,Pharmacy,Pie Shop,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pub,Public Art,Ramen Restaurant,Residential Building (Apartment / Condo),Restaurant,Rock Club,Russian Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,School,Seafood Restaurant,Shanghai Restaurant,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sports Club,Steakhouse,Street Art,Supermarket,Supplement Shop,Sushi Restaurant,Swiss Restaurant,Taco Place,Tea Room,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Tiki Bar,Tourist Information Center,Trail,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Waterfront,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
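
The one-hot table above has one row per venue and one indicator column per category. As a minimal sketch of how that table turns into per-neighborhood frequencies, the example below repeats the same two steps on a tiny made-up frame. The names `toy_venues`, `toy_onehot` and `toy_grouped` and all values in them are assumptions for illustration, not Foursquare output.

```python
import pandas as pd

# Hypothetical toy data standing in for manhattan_venues (not real Foursquare output)
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Bar", "Park", "Gym", "Bar", "Bar"],
})

# One indicator column per category, one row per venue
toy_onehot = pd.get_dummies(toy_venues[["Venue Category"]], prefix="", prefix_sep="")
toy_onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# Averaging the indicators gives each category's share of a neighborhood's venues
toy_grouped = toy_onehot.groupby("Neighborhood").mean().reset_index()
print(toy_grouped)
```

In this toy case half of neighborhood A's venues are parks, so its `Park` value is 0.5, while neighborhood B consists only of bars.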


In [48]:
# group rows by neighborhood and take the mean frequency of each venue category
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
# print each neighborhood along with the top 5 most common venues
num_top_venues = 5
for hood in manhattan_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = manhattan_grouped[manhattan_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Battery Park City----
           venue  freq
0  Memorial Site  0.15
1           Park  0.15
2     Food Court  0.10
3       Building  0.05
4  Shopping Mall  0.05


----Carnegie Hill----
                  venue  freq
0                   Gym  0.10
1  Gym / Fitness Center  0.10
2    Italian Restaurant  0.10
3          Gourmet Shop  0.05
4            Bagel Shop  0.05


----Central Harlem----
                venue  freq
0  African Restaurant  0.10
1   French Restaurant  0.10
2           Juice Bar  0.05
3          Bagel Shop  0.05
4                Café  0.05


----Chelsea----
                venue  freq
0             Theater  0.10
1  Seafood Restaurant  0.10
2      Ice Cream Shop  0.10
3              Market  0.05
4              Office  0.05


----Chinatown----
                 venue  freq
0       Sandwich Place  0.10
1   Chinese Restaurant  0.10
2  Indie Movie Theater  0.05
3               Bakery  0.05
4          Pizza Place  0.05


----Civic Center----
                venue  freq
... (output truncated)

In [49]:
# put that into a pandas dataframe
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]
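
To make the behavior of `return_most_common_venues` concrete, here is a small usage sketch on a single hand-made row. The function body is repeated so the snippet runs on its own; `toy_row` and its frequencies are assumptions chosen for illustration.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Same logic as the function above: skip the leading 'Neighborhood' entry
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical frequencies for one neighborhood (illustrative values only)
toy_row = pd.Series({"Neighborhood": "A", "Park": 0.50, "Bar": 0.30, "Gym": 0.20})

top_two = return_most_common_venues(toy_row, 2)
print(list(top_two))
```

The toy frequencies are all distinct on purpose. When several categories share the same frequency, as often happens with only 20 venues per neighborhood, their relative order in the result should not be relied upon.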

Create the new dataframe and display the top 10 venues for each neighborhood

In [50]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']
# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']
for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)
neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Battery Park City,Memorial Site,Park,Food Court,Plaza,Gym,Smoke Shop,Shopping Mall,Scenic Lookout,Sandwich Place,Monument / Landmark
1,Carnegie Hill,Italian Restaurant,Gym / Fitness Center,Gym,Wine Bar,Dance Studio,Pizza Place,American Restaurant,Spa,Bookstore,Shoe Store
2,Central Harlem,French Restaurant,African Restaurant,Juice Bar,Café,Music Venue,Boutique,Cocktail Bar,Ethiopian Restaurant,Library,Beer Bar
3,Chelsea,Seafood Restaurant,Ice Cream Shop,Theater,Market,Taco Place,Chinese Restaurant,Office,Fish Market,Coffee Shop,Scenic Lookout
4,Chinatown,Sandwich Place,Chinese Restaurant,Indie Movie Theater,Tea Room,Hotpot Restaurant,Ice Cream Shop,Greek Restaurant,Museum,English Restaurant,New American Restaurant


### Clustering Neighborhoods

In [51]:
# set number of clusters
kclusters = 5
manhattan_grouped_clustering = manhattan_grouped.drop(columns='Neighborhood')
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 0, 3, 0, 0, 1, 3, 2, 1, 0], dtype=int32)
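
A quick way to see how evenly the neighborhoods are spread across clusters is to count the labels. The sketch below does this for the ten labels printed above only; in the notebook the full `kmeans.labels_` array would be used instead, so the counts here are not the final cluster sizes.

```python
import numpy as np

# The first ten labels printed above; in the notebook this would be kmeans.labels_
labels = np.array([1, 0, 3, 0, 0, 1, 3, 2, 1, 0])
kclusters = 5

# Number of neighborhoods assigned to each cluster label 0..kclusters-1
cluster_sizes = np.bincount(labels, minlength=kclusters)
print(dict(enumerate(cluster_sizes.tolist())))
```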

Create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood

In [52]:
manhattan_merged = manhattan_data.copy()
# add clustering labels (positional: assumes manhattan_data rows are in the same order as manhattan_grouped)
manhattan_merged['Cluster Labels'] = kmeans.labels_
# join the top-venue table onto manhattan_data, which carries latitude/longitude for each neighborhood
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
manhattan_merged.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Manhattan,Marble Hill,40.876551,-73.91066,1,Coffee Shop,Gym,Yoga Studio,Pharmacy,Seafood Restaurant,Steakhouse,Supplement Shop,Sandwich Place,Donut Shop,Bank
1,Manhattan,Chinatown,40.715618,-73.994279,0,Sandwich Place,Chinese Restaurant,Indie Movie Theater,Tea Room,Hotpot Restaurant,Ice Cream Shop,Greek Restaurant,Museum,English Restaurant,New American Restaurant
2,Manhattan,Washington Heights,40.851903,-73.9369,3,Wine Shop,Park,Café,Breakfast Spot,Bakery,Pet Café,Pizza Place,Ramen Restaurant,Restaurant,Coffee Shop
3,Manhattan,Inwood,40.867684,-73.92121,0,Wine Bar,Café,Park,Yoga Studio,Diner,Spanish Restaurant,Farmers Market,Frozen Yogurt Shop,Bistro,Latin American Restaurant
4,Manhattan,Hamilton Heights,40.823604,-73.949688,0,Yoga Studio,Mexican Restaurant,Cocktail Bar,Bakery,Historic Site,Italian Restaurant,Japanese Restaurant,Mediterranean Restaurant,Pizza Place,Coffee Shop
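
One caveat on the merge above: `kmeans.labels_` follows the row order of the frame passed to `fit` (here `manhattan_grouped`, which `groupby` sorts by neighborhood name by default), whereas `manhattan_data` starts with Marble Hill. Assigning the labels by position may therefore attach a label to a different neighborhood than the one it was computed for. A minimal sketch of a name-based alternative follows. The frames `toy_grouped` and `toy_data` and the labels in `toy_labels` are made up for illustration, and this is one possible approach rather than the method used in the notebook.

```python
import pandas as pd

# Hypothetical stand-ins: grouped rows are sorted by name, the source table is not
toy_grouped = pd.DataFrame({"Neighborhood": ["Chelsea", "Inwood", "Soho"]})
toy_data = pd.DataFrame({"Neighborhood": ["Soho", "Chelsea", "Inwood"]})
toy_labels = [0, 1, 2]  # one label per row of toy_grouped, as k-means would return

# Attach each label to the neighborhood name it was computed for
labels_by_name = pd.DataFrame({
    "Neighborhood": toy_grouped["Neighborhood"],
    "Cluster Labels": toy_labels,
})

# Join on the name so the row order of toy_data no longer matters
toy_merged = toy_data.merge(labels_by_name, on="Neighborhood", how="left")
print(toy_merged)
```

Here Soho keeps label 2 even though it is the first row of `toy_data`, which a positional assignment would have labeled 0.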


Create a map to visualize the clusters

In [53]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)
# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker([lat, lon],radius=5,popup=label,color=rainbow[cluster-1],fill=True,fill_color=rainbow[cluster-1],fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Examining the clusters

Now, we can examine each cluster and determine the discriminating venue categories that distinguish each cluster.
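
As a rough aid for naming clusters, one could tabulate which category most often ranks first within each cluster. The sketch below assumes a frame shaped like `manhattan_merged`; the rows in `toy_merged` are invented for illustration and do not reproduce the notebook's results.

```python
import pandas as pd

# Hypothetical miniature of manhattan_merged (illustrative rows only)
toy_merged = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D"],
    "Cluster Labels": [0, 0, 0, 1],
    "1st Most Common Venue": ["Coffee Shop", "Coffee Shop", "Park", "Hotel"],
})

# For each cluster, find the category that most often ranks first
top_by_cluster = (
    toy_merged.groupby("Cluster Labels")["1st Most Common Venue"]
    .agg(lambda s: s.value_counts().idxmax())
)
print(top_by_cluster.to_dict())
```

A summary like this only hints at a cluster's character. The full top-10 tables below remain the basis for the cluster names.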

#### CLUSTER 1 (COMMERCIAL PLACES)

In [54]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Chinatown,Sandwich Place,Chinese Restaurant,Indie Movie Theater,Tea Room,Hotpot Restaurant,Ice Cream Shop,Greek Restaurant,Museum,English Restaurant,New American Restaurant
3,Inwood,Wine Bar,Café,Park,Yoga Studio,Diner,Spanish Restaurant,Farmers Market,Frozen Yogurt Shop,Bistro,Latin American Restaurant
4,Hamilton Heights,Yoga Studio,Mexican Restaurant,Cocktail Bar,Bakery,Historic Site,Italian Restaurant,Japanese Restaurant,Mediterranean Restaurant,Pizza Place,Coffee Shop
9,Yorkville,Wine Shop,Deli / Bodega,Gym,Park,Café,Liquor Store,Beer Store,Sandwich Place,Coffee Shop,Bagel Shop
11,Roosevelt Island,Coffee Shop,Gym,Liquor Store,Scenic Lookout,School,Baseball Field,Soccer Field,Residential Building (Apartment / Condo),Greek Restaurant,Outdoors & Recreation
12,Upper West Side,Italian Restaurant,American Restaurant,Bakery,Pub,Juice Bar,Greek Restaurant,Bagel Shop,Bookstore,Tiki Bar,Chinese Restaurant
13,Lincoln Square,Theater,Indie Movie Theater,Concert Hall,Performing Arts Venue,College Arts Building,Circus,Opera House,Gift Shop,Library,Fountain
19,East Village,Dessert Shop,Vietnamese Restaurant,Park,Bagel Shop,Dog Run,Cheese Shop,Japanese Restaurant,Beer Store,Swiss Restaurant,Coffee Shop
20,Lower East Side,Cocktail Bar,Art Gallery,Italian Restaurant,Coffee Shop,Mediterranean Restaurant,Juice Bar,Chinese Restaurant,Filipino Restaurant,Japanese Restaurant,Yoga Studio
21,Tribeca,Park,Yoga Studio,Sushi Restaurant,Hotel,Indie Theater,Italian Restaurant,Greek Restaurant,Men's Store,Cycle Studio,Poke Place


#### CLUSTER 2 (TOURIST PLACES & HUBS)

In [55]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Marble Hill,Coffee Shop,Gym,Yoga Studio,Pharmacy,Seafood Restaurant,Steakhouse,Supplement Shop,Sandwich Place,Donut Shop,Bank
5,Manhattanville,Italian Restaurant,Café,Supermarket,Sushi Restaurant,Climbing Gym,Gastropub,Coffee Shop,Lounge,Juice Bar,Dumpling Restaurant
8,Upper East Side,Hotel,Bar,Italian Restaurant,Jazz Club,Gym / Fitness Center,French Restaurant,Optical Shop,Park,Pet Store,Hotel Bar
15,Midtown,Clothing Store,Hotel,Cuban Restaurant,Park,Smoke Shop,Miscellaneous Shop,Food Truck,Bookstore,Steakhouse,French Restaurant
25,Manhattan Valley,Pizza Place,Bar,Grocery Store,Deli / Bodega,Park,Coffee Shop,Chinese Restaurant,Bubble Tea Shop,Mexican Restaurant,Korean Restaurant
28,Battery Park City,Memorial Site,Park,Food Court,Plaza,Gym,Smoke Shop,Shopping Mall,Scenic Lookout,Sandwich Place,Monument / Landmark
30,Carnegie Hill,Italian Restaurant,Gym / Fitness Center,Gym,Wine Bar,Dance Studio,Pizza Place,American Restaurant,Spa,Bookstore,Shoe Store
32,Civic Center,Spa,French Restaurant,Yoga Studio,General Entertainment,Dance Studio,Park,Falafel Restaurant,Burrito Place,Monument / Landmark,Molecular Gastronomy Restaurant
33,Midtown South,Korean Restaurant,Hotel,Japanese Restaurant,Building,Grocery Store,Lingerie Store,Fried Chicken Joint,Dessert Shop,Plaza,Coffee Shop
34,Sutton Place,Beer Garden,Grocery Store,Gym,Yoga Studio,Thai Restaurant,Bagel Shop,Beer Store,Furniture / Home Store,Greek Restaurant,Gym / Fitness Center


#### CLUSTER 3 (CENTER ACTIVITY)

In [56]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,East Harlem,Mexican Restaurant,Latin American Restaurant,Thai Restaurant,French Restaurant,Bakery,Pharmacy,Dance Studio,Park,Cocktail Bar,Sandwich Place
16,Murray Hill,Burger Joint,Jewish Restaurant,Coffee Shop,Museum,Shanghai Restaurant,Event Space,Sandwich Place,Sushi Restaurant,Taco Place,Tea Room
18,Greenwich Village,Italian Restaurant,Yoga Studio,Coffee Shop,Café,Caribbean Restaurant,Seafood Restaurant,French Restaurant,New American Restaurant,Sandwich Place,Sushi Restaurant
31,Noho,Rock Club,Wine Shop,Sandwich Place,Italian Restaurant,Coffee Shop,Ice Cream Shop,French Restaurant,Boutique,Gourmet Shop,Greek Restaurant


#### CLUSTER 4 (RESIDENTIAL)

In [57]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 3, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Washington Heights,Wine Shop,Park,Café,Breakfast Spot,Bakery,Pet Café,Pizza Place,Ramen Restaurant,Restaurant,Coffee Shop
6,Central Harlem,French Restaurant,African Restaurant,Juice Bar,Café,Music Venue,Boutique,Cocktail Bar,Ethiopian Restaurant,Library,Beer Bar
10,Lenox Hill,Thai Restaurant,Gym,Restaurant,Liquor Store,Chinese Restaurant,Salad Place,Taco Place,French Restaurant,College Academic Building,Dessert Shop
14,Clinton,Gym / Fitness Center,Theater,Hotel,Comedy Club,Mediterranean Restaurant,Café,Cocktail Bar,Sports Bar,Lounge,Building
17,Chelsea,Seafood Restaurant,Ice Cream Shop,Theater,Market,Taco Place,Chinese Restaurant,Office,Fish Market,Coffee Shop,Scenic Lookout


#### CLUSTER 5 (CULTURAL & GOING OUT PLACES)

In [58]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 4, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,Soho,Mediterranean Restaurant,Clothing Store,Dance Studio,Italian Restaurant,Salon / Barbershop,Wine Bar,Furniture / Home Store,Bakery,Coffee Shop,Sporting Goods Shop
24,West Village,Coffee Shop,Cocktail Bar,Bakery,Italian Restaurant,Hardware Store,Gourmet Shop,Board Shop,Mediterranean Restaurant,Boutique,Austrian Restaurant
29,Financial District,Coffee Shop,Gym / Fitness Center,Doctor's Office,Jewelry Store,Falafel Restaurant,Café,Steakhouse,New American Restaurant,Seafood Restaurant,Salad Place


## RESULTS

After clustering the neighborhoods of each borough, both Downtown Toronto (Toronto, Canada) and Manhattan (New York, United States) show venues that are worth exploring and can attract tourists from all over the world. The neighborhoods are largely similar in features such as theaters, opera houses, food places, clubs, museums and parks. The dissimilarity lies in a few distinctive venues, such as historical places and monuments.

## OBSERVATIONS & DISCUSSIONS

When the tourist places in the two boroughs are compared, a historical place appears only in Downtown Toronto, while a Monument / Landmark venue appears only in the Manhattan neighborhoods. Similarly, airport facilities, a harbor, a sculpture garden and boat or ferry services are found in Downtown Toronto, while venues such as nightlife spots, a climbing gym and museums are found in Manhattan.
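
A comparison of this kind can be sketched with plain set operations on the venue categories of each borough. The two sets below are a small hand-picked subset for illustration, not the full category lists; in the notebook they would be built from the `Venue Category` columns of the two venue tables.

```python
# Illustrative, hand-picked category subsets (assumptions, not the full data)
toronto_categories = {"Coffee Shop", "Park", "Airport Service", "Harbor / Marina"}
manhattan_categories = {"Coffee Shop", "Park", "Monument / Landmark", "Climbing Gym"}

# Categories present in both boroughs and in only one of them
shared = toronto_categories & manhattan_categories
only_toronto = toronto_categories - manhattan_categories
only_manhattan = manhattan_categories - toronto_categories

print("Shared:", sorted(shared))
print("Only Downtown Toronto:", sorted(only_toronto))
print("Only Manhattan:", sorted(only_manhattan))
```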

As for recommendations, the Downtown Toronto neighborhoods are recommended to visit first. Tourists have easy travel access thanks to the airport facilities, which saves both time and money. The money saved can be spent on exploring more attractions.

## CONCLUSION

The Downtown Toronto and Manhattan neighborhoods have largely similar venues. Every place is unique in its own way, and that holds for both boroughs. The dissimilarity lies in a few venues and facilities, but it is not large.