# Comparison of New York City and Toronto city neighborhoods using Foursquare data

## Objective 

The objective of this project is to compare New York City (United States of America) and the city of Toronto (Canada) based on the similarity of their neighborhoods. We use location-based social network data from the Foursquare API to segment each city's neighborhoods and then compare the two cities by the similarities and dissimilarities between those segments. Features derived from location-based social networks could reveal important social, behavioral and economic trends within big cities.

This workbook completes all tasks for the final project of the Coursera Applied Data Science course, "Comparison of New York City and Toronto city neighborhoods using Foursquare data", by clustering the neighborhoods of each city.

In [2]:
# import necessary libraries
import json  # library to handle JSON files

import folium  # map rendering library
import matplotlib.cm as cm
import matplotlib.colors as colors
import numpy as np
import pandas as pd
import requests
import wikipedia as wp
from geopy.geocoders import Nominatim
from pandas import json_normalize  # transform JSON into a pandas dataframe (pandas >= 1.0)
from sklearn.cluster import KMeans  # k-means clustering algorithm

## Data Description

To meet this objective, we use three main data sources:

1. New York City data: boroughs and neighborhoods with their GPS coordinates

2. Toronto data: postcodes, boroughs and neighborhoods with their GPS coordinates

3. Foursquare API data: venue information (geo coordinates and venue categories) for the venues that Foursquare lists around the coordinates of each neighborhood in the two cities
        

## Methodology 

The methodology we use consists of several major steps.

1. Download, clean and prepare the data for the two cities' neighborhoods along with their GPS coordinates. The data is described in detail in the Data Description section

2. Get the coordinates of New York and Toronto via Geopy Nominatim

3. Establish the connection to the Foursquare API data

4. Get the venue data from Foursquare. Because the number of API calls is restricted, we request the top 100 venues within a 500 m radius of each neighborhood's coordinates

5. Select the top 25 venue categories by frequency for each neighborhood in each city

6. Cluster the neighborhoods of each city based on the frequency of the venue categories that fall within them. We use k-means clustering, one of the most popular unsupervised learning techniques

7. Use the folium library to display a map of each city along with the created clusters

8. Display each cluster for each city separately and analyse it


## Download and Explore New York City Dataset

New York City has a total of 5 boroughs and 306 neighborhoods. In order to segment the neighborhoods and explore them, we need a dataset that contains the 5 boroughs and the neighborhoods that exist in each borough, as well as the latitude and longitude coordinates of each neighborhood.

We use the following link to get the dataset for New York City: https://geo.nyu.edu/catalog/nyu_2451_34572

In [6]:
# load the downloaded JSON file into the current notebook
with open('C:/Users/AsyaGadzhalova/Documents/GitHub/Coursera_Capstone/nyu-2451-34572-geojson.json') as json_data:
    newyork_data = json.load(json_data)

In [None]:
#read the json file
newyork_data

All the relevant data we need (borough, neighborhood name, neighborhood latitude and longitude) is under the features key. Store it in a separate variable.

In [8]:
#add a new variable with the necessary features from the json file
neighborhoods_data = newyork_data['features']

In [9]:
#check the first item of the list
neighborhoods_data[0]

{'geometry': {'coordinates': [-73.84720052054902, 40.89470517661],
  'type': 'Point'},
 'geometry_name': 'geom',
 'id': 'nyu_2451_34572.1',
 'properties': {'annoangle': 0.0,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661],
  'borough': 'Bronx',
  'name': 'Wakefield',
  'stacked': 1},
 'type': 'Feature'}

This data needs to be transformed into a pandas dataframe.

In [10]:
# create a blank dataframe
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


Loop through the features from the JSON file and fill in the dataframe:

In [11]:
# collect one row per feature, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)
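
As an alternative to the explicit loop, the same four columns can be obtained by flattening the features with `pd.json_normalize`. The following is a minimal sketch, assuming pandas 1.0 or later; `sample_features` and `features_to_frame` are illustrative names that are not part of the notebook, and the sample holds only the one feature shown in the output above, trimmed to the fields that are used.

```python
import pandas as pd

# One feature copied from the notebook output above, trimmed to the fields used here.
sample_features = [
    {
        "type": "Feature",
        "geometry": {"type": "Point",
                     "coordinates": [-73.84720052054902, 40.89470517661]},
        "properties": {"name": "Wakefield", "borough": "Bronx"},
    },
]


def features_to_frame(features):
    """Flatten GeoJSON point features into Borough/Neighborhood/Latitude/Longitude."""
    # Nested dicts become dotted column names; the coordinate list stays a list.
    flat = pd.json_normalize(features)
    # GeoJSON stores coordinates as [longitude, latitude].
    coords = pd.DataFrame(flat["geometry.coordinates"].tolist(),
                          columns=["Longitude", "Latitude"])
    return pd.DataFrame({
        "Borough": flat["properties.borough"],
        "Neighborhood": flat["properties.name"],
        "Latitude": coords["Latitude"],
        "Longitude": coords["Longitude"],
    })


sample_frame = features_to_frame(sample_features)
print(sample_frame)
```

On the full dataset the call would be `features_to_frame(newyork_data['features'])`, which should give the same columns as the loop.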

In [12]:
#result dataframe

neighborhoods.head(10)

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585
5,Bronx,Kingsbridge,40.881687,-73.902818
6,Manhattan,Marble Hill,40.876551,-73.91066
7,Bronx,Woodlawn,40.898273,-73.867315
8,Bronx,Norwood,40.877224,-73.879391
9,Bronx,Williamsbridge,40.881039,-73.857446


In [13]:
#check if we got the complete data out of the initial json file 
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


### Get the geo coordinates of New York City  

Use the geopy library to get the latitude and longitude values of New York City. In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent ny_explorer, as shown below.

In [14]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


## Deploy Foursquare venues data for New York City 

### Define Foursquare Credentials and Version

In [15]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (placeholder; never publish real credentials)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (placeholder)
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET


### Explore Neighborhoods in New York City

In [16]:
# create a function that gets from the Foursquare API the top 100 venues
# within a 500 m radius of each neighborhood's coordinates
radius = 500  # search radius in metres
LIMIT = 100   # maximum number of venues returned per call

def getNearbyVenues(names, latitudes, longitudes, radius=500):

    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # build the API request; requests URL-encodes the query parameters
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request and fail early on HTTP errors
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into one dataframe
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Neighborhood',
                 'Neighborhood Latitude',
                 'Neighborhood Longitude',
                 'Venue',
                 'Venue Latitude',
                 'Venue Longitude',
                 'Venue Category'])

    return nearby_venues
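
The function above indexes `['groups'][0]` and `['categories'][0]` directly, so a response with no groups or a venue with no category would raise an error partway through the run. The sketch below shows one way the parsing step could be made defensive. It is an illustration only: `extract_venues` and `sample_payload` are hypothetical names, and the payload is hand-written to mirror the nesting used in the function above rather than taken from a real API response.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style payload.

    Venues without a category are skipped, and a payload without groups
    yields an empty list instead of raising.
    """
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []

    rows = []
    for item in groups[0].get("items", []):
        venue = item["venue"]
        categories = venue.get("categories") or []
        if not categories:
            continue  # no category to report for this venue
        rows.append((
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0]["name"],
        ))
    return rows


# Hand-written payload for illustration; the second venue is hypothetical.
sample_payload = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Lollipops Gelato",
                           "location": {"lat": 40.894123, "lng": -73.845892},
                           "categories": [{"name": "Dessert Shop"}]}},
                {"venue": {"name": "Uncategorized Venue",
                           "location": {"lat": 40.0, "lng": -73.0},
                           "categories": []}},
            ]
        }]
    }
}

parsed_rows = extract_venues(sample_payload)
print(parsed_rows)
```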


Run the function to get the venues for each neighborhood.

In [None]:
newyork_venues = getNearbyVenues(names=neighborhoods['Neighborhood'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude']
                                  )


In [18]:
#check the resulting dataframe
print(newyork_venues.shape)
newyork_venues.head()

(10408, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Wakefield,40.894705,-73.847201,Lollipops Gelato,40.894123,-73.845892,Dessert Shop
1,Wakefield,40.894705,-73.847201,Rite Aid,40.896649,-73.844846,Pharmacy
2,Wakefield,40.894705,-73.847201,Carvel Ice Cream,40.890487,-73.848568,Ice Cream Shop
3,Wakefield,40.894705,-73.847201,Cooler Runnings Jamaican Restaurant Inc,40.898276,-73.850381,Caribbean Restaurant
4,Wakefield,40.894705,-73.847201,Shell,40.894187,-73.845862,Gas Station
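
As a sanity check on the 500 m radius, the great-circle distance between a neighborhood's coordinates and a returned venue can be computed with the haversine formula. This is a minimal sketch using only the standard library; `haversine_m` is an illustrative helper that is not part of the notebook, and the coordinates are the Wakefield rows shown above.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    earth_radius_m = 6371000.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


wakefield = (40.894705, -73.847201)
venues = {
    "Lollipops Gelato": (40.894123, -73.845892),
    "Cooler Runnings Jamaican Restaurant Inc": (40.898276, -73.850381),
}

distances = {name: haversine_m(*wakefield, *coords) for name, coords in venues.items()}
for name, dist in distances.items():
    print("{}: {:.0f} m".format(name, dist))
```

Both venues should come out under 500 m, which is consistent with the radius passed to the API.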


In [19]:
# the number of unique categories for the returned venues
print('There are {} unique categories.'.format(len(newyork_venues['Venue Category'].unique())))

There are 427 unique categories.


### Explore the neighbourhoods and the venues

In [20]:
# transform the dataset so that each venue category becomes a column
# one hot encoding
newyork_onehot = pd.get_dummies(newyork_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
# (if a venue category is itself named 'Neighborhood', this overwrites that column in place)
newyork_onehot['Neighborhood'] = newyork_venues['Neighborhood']

# move the last column to the first position
fixed_columns = [newyork_onehot.columns[-1]] + list(newyork_onehot.columns[:-1])
newyork_onehot = newyork_onehot[fixed_columns]

newyork_onehot.head(5)

Unnamed: 0,Yoga Studio,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Animal Shelter,Antique Shop,Arcade,Arepa Restaurant,...,Volleyball Court,Warehouse Store,Waste Facility,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [21]:
newyork_onehot.shape

(10408, 427)

Group the rows by neighborhood and take the mean of each one-hot column, which gives the relative frequency of each venue category within the neighborhood.

In [22]:
newyork_grouped = newyork_onehot.groupby('Neighborhood').mean().reset_index()
newyork_grouped.head(5)

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Animal Shelter,Antique Shop,Arcade,...,Volleyball Court,Warehouse Store,Waste Facility,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Allerton,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Annadale,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Arden Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Arlington,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Arrochar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [23]:
newyork_grouped.shape

(300, 427)
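
To see why the mean of one-hot columns is a frequency, consider a toy example. This is a sketch on made-up data, not the notebook's dataset: `toy_venues` is hypothetical, and the grouping is done by passing the neighborhood Series to `groupby` rather than by adding a column, which is an assumption made here to keep the example short.

```python
import pandas as pd

# Hypothetical venues: neighborhood A has 4 venues, neighborhood B has 2.
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Pizza Place", "Pizza Place", "Park", "Bank", "Park", "Park"],
})

# One column per category, 1.0 where the venue belongs to that category.
toy_onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=float)

# The mean of a 0/1 column within a group is the share of venues in that category.
toy_grouped = toy_onehot.groupby(toy_venues["Neighborhood"]).mean()
print(toy_grouped)
```

Each row sums to 1, so neighborhoods with very different venue counts become comparable before clustering.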

Print each neighborhood along with its top 5 most common venue categories.

In [None]:
num_top_venues = 5

for hood in newyork_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = newyork_grouped[newyork_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

In [25]:
# return the num_top_venues most frequent venue categories for one row, in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Create a new dataframe and display the top 25 venues by frequency for each neighborhood.

In [27]:
num_top_venues = 25

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # beyond the 3rd position the generic 'th' suffix is used
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = newyork_grouped['Neighborhood']

for ind in np.arange(newyork_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(newyork_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,...,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue,21th Most Common Venue,22th Most Common Venue,23th Most Common Venue,24th Most Common Venue,25th Most Common Venue
0,Allerton,Pizza Place,Supermarket,Chinese Restaurant,Pharmacy,Department Store,Fast Food Restaurant,Bus Station,Martial Arts Dojo,Bike Trail,...,Discount Store,Cosmetics Shop,Dessert Shop,Fried Chicken Joint,Intersection,Spanish Restaurant,Playground,Grocery Store,Deli / Bodega,Spa
1,Annadale,Pizza Place,Bakery,Bagel Shop,Train Station,Sports Bar,Pub,American Restaurant,Pet Store,Restaurant,...,Fast Food Restaurant,Exhibit,Event Space,Farmers Market,Women's Store,Field,Filipino Restaurant,Ethiopian Restaurant,Fish & Chips Shop,Fish Market
2,Arden Heights,Pharmacy,Coffee Shop,Bus Stop,Pizza Place,Filipino Restaurant,Event Space,Exhibit,Fabric Shop,Factory,...,Ethiopian Restaurant,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain
3,Arlington,Bus Stop,Intersection,Caribbean Restaurant,Women's Store,Fish Market,Fabric Shop,Factory,Falafel Restaurant,Farm,...,Flea Market,Event Space,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain,Frame Store,French Restaurant
4,Arrochar,Bus Stop,Deli / Bodega,Italian Restaurant,Pizza Place,Hotel,Middle Eastern Restaurant,Pharmacy,Liquor Store,Bagel Shop,...,Cosmetics Shop,Food Truck,Flower Shop,Farm,Exhibit,Fabric Shop,Factory,Food Stand,Falafel Restaurant,Food Court


In [28]:
neighborhoods_venues_sorted.shape

(300, 26)
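
The column headers above read "21th", "22th" and "23th" because only the first three positions receive a dedicated suffix. If correct English ordinals are wanted, a small helper such as the following could be used when building the column names. This is a sketch only: `ordinal` is a hypothetical function that the notebook does not define, and the outputs shown in this notebook keep the original headers.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 3rd, 4th, 11th, 21st."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


num_top = 25
ordinal_columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, num_top + 1)
]
print(ordinal_columns[-5:])
```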

## Cluster New York City neighbourhoods 

Run k-means to cluster the neighborhoods into 7 clusters.

In [29]:
# set number of clusters
kclusters = 7

newyork_grouped_clustering = newyork_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(newyork_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 2, 2, 0, 1, 1, 0, 1, 1])

In [31]:
#neighborhoods_venues_sorted.drop(['Cluster Labels'], axis=1,inplace=True)

Create a new dataframe that includes the cluster as well as the top 25 venues for each neighborhood.

In [32]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

newyork_merged = neighborhoods

# merge new york dataset with the initial dataframe to add latitude/longitude for each neighborhood
newyork_merged = newyork_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

newyork_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,...,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue,21th Most Common Venue,22th Most Common Venue,23th Most Common Venue,24th Most Common Venue,25th Most Common Venue
0,Bronx,Wakefield,40.894705,-73.847201,6.0,Gas Station,Dessert Shop,Sandwich Place,Caribbean Restaurant,Ice Cream Shop,...,Fast Food Restaurant,Field,Women's Store,Fish Market,Fish & Chips Shop,Event Space,Flea Market,Flower Shop,Food,Food & Drink Shop
1,Bronx,Co-op City,40.874294,-73.829939,1.0,Bus Station,Baseball Field,Fast Food Restaurant,Chinese Restaurant,Mattress Store,...,Falafel Restaurant,Exhibit,Food Truck,Food Stand,Fabric Shop,Food Court,Factory,Farm,Food & Drink Shop,Flea Market
2,Bronx,Eastchester,40.887556,-73.827806,1.0,Caribbean Restaurant,Deli / Bodega,Metro Station,Bus Station,Bus Stop,...,Factory,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop,Filipino Restaurant,Food,Field,Food Stand,Food Truck
3,Bronx,Fieldston,40.895437,-73.905643,1.0,River,Bus Station,Playground,Plaza,Women's Store,...,Fish & Chips Shop,Ethiopian Restaurant,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck
4,Bronx,Riverdale,40.890834,-73.912585,1.0,Bus Station,Park,Bank,Playground,Food Truck,...,Field,Filipino Restaurant,Ethiopian Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court


In [33]:
# check for neighborhoods without a cluster label (no venue data was joined for them)
newyork_merged[newyork_merged['Cluster Labels'].isna()]


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,...,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue,21th Most Common Venue,22th Most Common Venue,23th Most Common Venue,24th Most Common Venue,25th Most Common Venue
207,Staten Island,Port Ivory,40.639683,-74.174645,,,,,,,...,,,,,,,,,,
257,Staten Island,Howland Hook,40.638433,-74.186223,,,,,,,...,,,,,,,,,,


In [34]:
# remove neighborhoods without a cluster label
newyork_merged = newyork_merged[newyork_merged['Cluster Labels'].notna()].copy()

In [35]:
newyork_merged.shape

(304, 30)

In [36]:
newyork_merged['Cluster Labels'].dtype

dtype('float64')

In [37]:
newyork_merged['Cluster Labels'] = newyork_merged['Cluster Labels'].astype(int)
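
The reason the labels arrive as `float64` is the left join: neighborhoods with no match on the right-hand side receive NaN, and a column holding NaN cannot keep an integer dtype by default. The toy example below reproduces the sequence of join, drop and cast on made-up data; `toy_neighborhoods` and `toy_labels` are hypothetical frames used only for illustration.

```python
import pandas as pd

# Three neighborhoods, but cluster labels exist for only two of them.
toy_neighborhoods = pd.DataFrame({"Neighborhood": ["A", "B", "C"]})
toy_labels = pd.DataFrame({"Neighborhood": ["A", "C"], "Cluster Labels": [1, 0]})

# Left join on the neighborhood name: "B" has no match and gets NaN.
toy_merged = toy_neighborhoods.join(toy_labels.set_index("Neighborhood"), on="Neighborhood")
dtype_after_join = toy_merged["Cluster Labels"].dtype
print(toy_merged)
print(dtype_after_join)

# Drop the unmatched rows first, then the cast back to int is safe.
toy_clean = toy_merged.dropna(subset=["Cluster Labels"]).copy()
toy_clean["Cluster Labels"] = toy_clean["Cluster Labels"].astype(int)
print(toy_clean)
```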

In [38]:
newyork_merged['Cluster Labels'].value_counts()

1    237
0     37
2     16
6      8
3      4
5      1
4      1
Name: Cluster Labels, dtype: int64
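
The counts above are strongly imbalanced, which is worth quantifying before interpreting the clusters. The short sketch below computes each cluster's share from the printed counts; the dictionary is copied from the output above, and the variable names are illustrative.

```python
# Cluster sizes copied from the value_counts() output above (label -> count).
cluster_sizes = {1: 237, 0: 37, 2: 16, 6: 8, 3: 4, 5: 1, 4: 1}

total = sum(cluster_sizes.values())
shares = {label: size / total for label, size in cluster_sizes.items()}
largest_label = max(cluster_sizes, key=cluster_sizes.get)
singleton_labels = sorted(label for label, size in cluster_sizes.items() if size == 1)

print("total neighborhoods:", total)
print("largest cluster: label {} with {:.1%} of neighborhoods".format(
    largest_label, shares[largest_label]))
print("single-neighborhood clusters:", singleton_labels)
```

A single cluster holding most of the neighborhoods, together with clusters of size one, may indicate that a different number of clusters or a different feature scaling is worth trying.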

### Results For New York City

In [None]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(newyork_merged['Latitude'], newyork_merged['Longitude'], newyork_merged['Neighborhood'], newyork_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
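
The map relies on a one-to-one mapping from cluster label to color. The sketch below illustrates that mapping in isolation. Note that it swaps in the standard-library `colorsys` module in place of `matplotlib.cm.rainbow`, purely so the example has no dependencies, and `cluster_palette` is a hypothetical helper that the notebook does not define.

```python
import colorsys


def cluster_palette(k):
    """Return k distinct hex colors, evenly spaced around the hue wheel."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 1.0, 1.0)
        palette.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette


k = 7
palette = cluster_palette(k)

# Index the palette directly by the 0-based cluster label.
label_to_color = {label: palette[label] for label in range(k)}
print(label_to_color)
```

Indexing by the label itself keeps label 0 on the first color; an offset such as `label - 1` would silently wrap label 0 around to the last color.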


### Examine Clusters

New York Cluster 1 - this first cluster is segmented by the similarity of its top 3 most common venues, which tend to be Deli / Bodega, Italian Restaurant or Pizza Place. We could name it the Italian food cluster.

In [40]:
newyork_merged.loc[newyork_merged['Cluster Labels'] == 0, newyork_merged.columns[[1] + list(range(5, newyork_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,...,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue,21th Most Common Venue,22th Most Common Venue,23th Most Common Venue,24th Most Common Venue,25th Most Common Venue
7,Woodlawn,Deli / Bodega,Pizza Place,Pub,Playground,Bus Stop,Liquor Store,Bar,Beer Bar,Supermarket,...,Donut Shop,Plaza,Cosmetics Shop,Bakery,Rental Car Location,Italian Restaurant,Indian Restaurant,Factory,Electronics Store,Empanada Restaurant
28,Throgs Neck,Italian Restaurant,Juice Bar,Sports Bar,Coffee Shop,Pizza Place,Asian Restaurant,Bar,Deli / Bodega,American Restaurant,...,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Flower Shop,Flea Market,Food,Food & Drink Shop,Food Court
32,Van Nest,Deli / Bodega,Pizza Place,Donut Shop,BBQ Joint,Hookah Bar,Supermarket,Bus Station,Playground,Coffee Shop,...,Falafel Restaurant,Factory,Fabric Shop,Filipino Restaurant,Women's Store,Fish Market,Event Space,Flea Market,Flower Shop,Food
34,Belmont,Italian Restaurant,Pizza Place,Deli / Bodega,Bakery,Bank,Donut Shop,Liquor Store,Dessert Shop,Gas Station,...,Bar,Café,Mexican Restaurant,Fish Market,Department Store,Smoke Shop,Discount Store,Miscellaneous Shop,Cheese Shop,Seafood Restaurant
39,Edgewater Park,Italian Restaurant,Deli / Bodega,Pizza Place,Food & Drink Shop,Chinese Restaurant,Park,Bar,Coffee Shop,Farmers Market,...,Ice Cream Shop,Pub,American Restaurant,Food,Food Truck,Factory,Fountain,Falafel Restaurant,Farm,Field
40,Castle Hill,Pizza Place,Bank,Diner,Market,Pharmacy,Deli / Bodega,Fountain,Food Truck,Event Space,...,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Event Service,Flea Market,Flower Shop,Food,Food & Drink Shop
72,East New York,Deli / Bodega,Asian Restaurant,Pharmacy,Fast Food Restaurant,Caribbean Restaurant,Plaza,Liquor Store,Gym,Spanish Restaurant,...,Farm,Farmers Market,Factory,Women's Store,Filipino Restaurant,Exhibit,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop
83,Marine Park,Park,Baseball Field,Deli / Bodega,Basketball Court,Pizza Place,Soccer Field,Athletics & Sports,Ice Cream Shop,Gym,...,Exhibit,Fast Food Restaurant,Filipino Restaurant,Field,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop
89,Ocean Hill,Deli / Bodega,Bus Stop,Grocery Store,Fried Chicken Joint,Southern / Soul Food Restaurant,Construction & Landscaping,Mexican Restaurant,Park,Bakery,...,African Restaurant,Food,Metro Station,Supermarket,Flea Market,Fish Market,Factory,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant
148,South Ozone Park,Park,Deli / Bodega,Fast Food Restaurant,Hotel,Sandwich Place,Bar,Food Truck,Donut Shop,Exhibit,...,Filipino Restaurant,Women's Store,Event Service,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court
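
One way to support a cluster name is to count how often each category appears as the most common venue. The sketch below does this with `collections.Counter` for the ten rows displayed above only, not for all neighborhoods in the cluster; the list is copied by hand from the "1st Most Common Venue" column.

```python
from collections import Counter

# "1st Most Common Venue" for the ten cluster rows displayed above.
first_venues = [
    "Deli / Bodega",       # Woodlawn
    "Italian Restaurant",  # Throgs Neck
    "Deli / Bodega",       # Van Nest
    "Italian Restaurant",  # Belmont
    "Italian Restaurant",  # Edgewater Park
    "Pizza Place",         # Castle Hill
    "Deli / Bodega",       # East New York
    "Park",                # Marine Park
    "Deli / Bodega",       # Ocean Hill
    "Park",                # South Ozone Park
]

venue_counts = Counter(first_venues)
for category, count in venue_counts.most_common():
    print("{}: {}".format(category, count))
```

On the full dataframe, the same tally could be obtained with `value_counts()` on the corresponding column of the filtered rows.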


New York Cluster 2 - the biggest cluster; the majority of the neighborhoods fall within this cluster, and it is spread all over New York City.

In [41]:
newyork_merged.loc[newyork_merged['Cluster Labels'] == 1, newyork_merged.columns[[1] + list(range(5, newyork_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,...,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue,21th Most Common Venue,22th Most Common Venue,23th Most Common Venue,24th Most Common Venue,25th Most Common Venue
1,Co-op City,Bus Station,Baseball Field,Fast Food Restaurant,Chinese Restaurant,Mattress Store,Pharmacy,Grocery Store,Park,Gift Shop,...,Falafel Restaurant,Exhibit,Food Truck,Food Stand,Fabric Shop,Food Court,Factory,Farm,Food & Drink Shop,Flea Market
2,Eastchester,Caribbean Restaurant,Deli / Bodega,Metro Station,Bus Station,Bus Stop,Diner,Donut Shop,Bakery,Fast Food Restaurant,...,Factory,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop,Filipino Restaurant,Food,Field,Food Stand,Food Truck
3,Fieldston,River,Bus Station,Playground,Plaza,Women's Store,Filipino Restaurant,Event Space,Exhibit,Fabric Shop,...,Fish & Chips Shop,Ethiopian Restaurant,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck
4,Riverdale,Bus Station,Park,Bank,Playground,Food Truck,Home Service,Plaza,Event Space,Exhibit,...,Field,Filipino Restaurant,Ethiopian Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court
5,Kingsbridge,Pizza Place,Deli / Bodega,Sandwich Place,Bar,Latin American Restaurant,Mexican Restaurant,Bakery,Supermarket,Donut Shop,...,Burger Joint,Mattress Store,Liquor Store,Breakfast Spot,Mobile Phone Shop,Sports Bar,Candy Store,Café,Nail Salon,Gourmet Shop
6,Marble Hill,Coffee Shop,Sandwich Place,Yoga Studio,Kids Store,Steakhouse,Supplement Shop,Miscellaneous Shop,Tennis Stadium,Gym,...,Ice Cream Shop,Department Store,Video Game Store,American Restaurant,Discount Store,Diner,Deli / Bodega,Dive Bar,Factory,Falafel Restaurant
8,Norwood,Pizza Place,Park,Bank,Mobile Phone Shop,Deli / Bodega,American Restaurant,Pharmacy,Chinese Restaurant,Mexican Restaurant,...,Spanish Restaurant,Restaurant,Bus Station,Supermarket,Bus Stop,Food Court,Fabric Shop,Factory,Falafel Restaurant,Farm
10,Baychester,Donut Shop,Bus Station,Supermarket,Mattress Store,Mexican Restaurant,Fast Food Restaurant,Bank,Pet Store,Electronics Store,...,Spanish Restaurant,Fried Chicken Joint,Arcade,Discount Store,Convenience Store,American Restaurant,Factory,Fountain,Falafel Restaurant,Farm
11,Pelham Parkway,Italian Restaurant,Frozen Yogurt Shop,Pizza Place,Deli / Bodega,Sushi Restaurant,Metro Station,Bank,Bakery,Coffee Shop,...,Bus Station,Plaza,Ice Cream Shop,Flea Market,Food & Drink Shop,Exhibit,Fabric Shop,Factory,Falafel Restaurant,Food Stand
12,City Island,Harbor / Marina,Thrift / Vintage Store,Seafood Restaurant,Grocery Store,Bar,Liquor Store,Baseball Field,Boat or Ferry,Pharmacy,...,Music Venue,Ice Cream Shop,Diner,American Restaurant,French Restaurant,Italian Restaurant,Bus Station,Park,Bank,Falafel Restaurant


New York Cluster 3 - the Staten Island cluster (spread only across Staten Island), so it captures the venues specific to Staten Island.

In [42]:
newyork_merged.loc[newyork_merged['Cluster Labels'] == 2, newyork_merged.columns[[1] + list(range(5, newyork_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,...,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue,21th Most Common Venue,22th Most Common Venue,23th Most Common Venue,24th Most Common Venue,25th Most Common Venue
198,New Brighton,Bus Stop,Deli / Bodega,Park,Convenience Store,Bowling Alley,Discount Store,Playground,Farmers Market,Fish Market,...,Fish & Chips Shop,Women's Store,Food,Flea Market,Flower Shop,Exhibit,Food & Drink Shop,Food Court,Food Stand,Food Truck
205,Port Richmond,Pizza Place,Rental Car Location,Bus Stop,Martial Arts Dojo,Donut Shop,Bar,Food Stand,Frame Store,Event Space,...,Farmers Market,Fast Food Restaurant,Field,Food Truck,Filipino Restaurant,Fountain,Event Service,Fish Market,Flea Market,Flower Shop
208,Castleton Corners,Pizza Place,Bus Stop,Sandwich Place,Mini Golf,Bar,Bank,Grocery Store,Bagel Shop,Tattoo Parlor,...,Factory,Falafel Restaurant,Farm,French Restaurant,Frame Store,Fountain,Food Truck,Fast Food Restaurant,Food Court,Field
212,Oakwood,Bar,Chiropractor,Bus Stop,Fish Market,Exhibit,Fabric Shop,Factory,Falafel Restaurant,Farm,...,Fruit & Vegetable Store,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain,Frame Store,French Restaurant
224,Park Hill,Bus Stop,Coffee Shop,Gym / Fitness Center,Athletics & Sports,Hotel,Women's Store,Filipino Restaurant,Exhibit,Fabric Shop,...,Fish & Chips Shop,Event Service,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck
227,Arlington,Bus Stop,Intersection,Caribbean Restaurant,Women's Store,Fish Market,Fabric Shop,Factory,Falafel Restaurant,Farm,...,Flea Market,Event Space,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain,Frame Store,French Restaurant
229,Grasmere,Bus Stop,Bank,Grocery Store,Ice Cream Shop,Bagel Shop,Bakery,Park,Vegetarian / Vegan Restaurant,Cosmetics Shop,...,Italian Restaurant,Fast Food Restaurant,Falafel Restaurant,Factory,Fabric Shop,Farm,Farmers Market,Flea Market,Field,Filipino Restaurant
232,Midland Beach,Beach,Bus Stop,Restaurant,Deli / Bodega,Bookstore,Dessert Shop,Chinese Restaurant,Pet Store,Pizza Place,...,Farm,Filipino Restaurant,Factory,Field,Flea Market,Fish & Chips Shop,Fish Market,Flower Shop,Food,Food & Drink Shop
238,Butler Manor,Baseball Field,Pool,Bus Stop,Convenience Store,Fish & Chips Shop,Event Space,Exhibit,Fabric Shop,Factory,...,Women's Store,Fish Market,Event Service,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain
241,Arden Heights,Pharmacy,Coffee Shop,Bus Stop,Pizza Place,Filipino Restaurant,Event Space,Exhibit,Fabric Shop,Factory,...,Ethiopian Restaurant,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain


New York Cluster 4 - the Park Cluster of New York City

In [43]:
newyork_merged.loc[newyork_merged['Cluster Labels'] == 3, newyork_merged.columns[[1] + list(range(5, newyork_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,...,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue,21th Most Common Venue,22th Most Common Venue,23th Most Common Venue,24th Most Common Venue,25th Most Common Venue
27,Clason Point,Park,South American Restaurant,Scenic Lookout,Bus Stop,Business Service,Boat or Ferry,Grocery Store,Pool,Farmers Market,...,Fabric Shop,Farm,Fish Market,Fish & Chips Shop,Event Space,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court
192,Somerville,Park,Women's Store,Event Service,Exhibit,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market,...,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain,Frame Store,French Restaurant,Fried Chicken Joint
203,Todt Hill,Park,Women's Store,Event Service,Exhibit,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market,...,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain,Frame Store,French Restaurant,Fried Chicken Joint
303,Bayswater,Park,Playground,Women's Store,Filipino Restaurant,Event Space,Exhibit,Fabric Shop,Factory,Falafel Restaurant,...,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain,Frame Store


New York Cluster 5 - only 1 neighbourhood falls here, apparently known for its beach

In [44]:
newyork_merged.loc[newyork_merged['Cluster Labels'] == 4, newyork_merged.columns[[1] + list(range(5, newyork_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,...,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue,21th Most Common Venue,22th Most Common Venue,23th Most Common Venue,24th Most Common Venue,25th Most Common Venue
179,Neponsit,Beach,Bus Stop,Women's Store,Flea Market,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market,...,Furniture / Home Store,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain,Frame Store,French Restaurant,Fried Chicken Joint


New York Cluster 6

In [45]:
newyork_merged.loc[newyork_merged['Cluster Labels'] == 5, newyork_merged.columns[[1] + list(range(5, newyork_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,...,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue,21th Most Common Venue,22th Most Common Venue,23th Most Common Venue,24th Most Common Venue,25th Most Common Venue
255,Emerson Hill,Food,Women's Store,Fish Market,Exhibit,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market,...,Flower Shop,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain,Frame Store,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop


New York Cluster 7 - the Caribbean Restaurant Cluster

In [47]:
newyork_merged.loc[newyork_merged['Cluster Labels'] == 6, newyork_merged.columns[[1] + list(range(5, newyork_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,...,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue,21th Most Common Venue,22th Most Common Venue,23th Most Common Venue,24th Most Common Venue,25th Most Common Venue
0,Wakefield,Gas Station,Dessert Shop,Sandwich Place,Caribbean Restaurant,Ice Cream Shop,Donut Shop,Food Truck,Deli / Bodega,Pharmacy,...,Fast Food Restaurant,Field,Women's Store,Fish Market,Fish & Chips Shop,Event Space,Flea Market,Flower Shop,Food,Food & Drink Shop
9,Williamsbridge,Caribbean Restaurant,Bar,Soup Place,Nightclub,Fish & Chips Shop,Exhibit,Fabric Shop,Factory,Falafel Restaurant,...,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain,Frame Store,French Restaurant
74,Canarsie,Event Service,Chinese Restaurant,Caribbean Restaurant,Gym,Grocery Store,Asian Restaurant,Event Space,Fabric Shop,Factory,...,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain
78,Coney Island,Baseball Stadium,Caribbean Restaurant,Food Court,Beach,Skating Rink,Pharmacy,Gourmet Shop,Theme Park Ride / Attraction,Vegetarian / Vegan Restaurant,...,Fabric Shop,Factory,Frame Store,Falafel Restaurant,Farm,Fountain,Farmers Market,Food Truck,Food,Field
165,St. Albans,Caribbean Restaurant,Deli / Bodega,Fast Food Restaurant,Dance Studio,Café,Grocery Store,Market,Seafood Restaurant,Motorcycle Shop,...,Fish Market,Farm,Exhibit,Fabric Shop,Factory,Food Truck,Falafel Restaurant,Food Stand,Food & Drink Shop,Food Court
188,Laurelton,Caribbean Restaurant,Cosmetics Shop,Train Station,Women's Store,Filipino Restaurant,Event Space,Exhibit,Fabric Shop,Factory,...,Ethiopian Restaurant,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain
259,Remsen Village,Caribbean Restaurant,Fast Food Restaurant,Fried Chicken Joint,Fish Market,Pharmacy,Coffee Shop,Sandwich Place,Salad Place,Café,...,Farmers Market,Fabric Shop,Factory,Falafel Restaurant,Food Truck,Farm,Field,Food Stand,Flower Shop,Filipino Restaurant
300,Erasmus,Caribbean Restaurant,Yoga Studio,Health Food Store,Convenience Store,Donut Shop,Playground,Pizza Place,Pharmacy,Food Truck,...,Furniture / Home Store,Juice Bar,Farmers Market,English Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Fabric Shop,Factory


## Download and explore Toronto City Dataset

### Get the data from the Web, along with initial cleaning and data preparation 

From the first source we get the postcodes for Toronto, along with boroughs and neighbourhoods, and load them into a pandas dataframe.

In [48]:
#Get the html source of the Wiki page - we use pandas to scrape the Wiki table with the postcodes and neighbourhoods in Toronto
html = wp.page("List_of_postal_codes_of_Canada:_M").html().encode("UTF-8")
df = pd.read_html(html, header=0)[0]
#print (df)

In [49]:
df.columns

Index(['Postcode', 'Borough', 'Neighbourhood'], dtype='object')

In [50]:
#create a Pandas Dataframe
df_new = pd.DataFrame(df)

In [51]:
df_new.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


In [52]:
#set the index to Postcode column
df_new.set_index('Postcode',inplace=True)

In [53]:
#check the DF columns 
df_new.columns

Index(['Borough', 'Neighbourhood'], dtype='object')

In [54]:
#filter out the postcodes with Not assigned Boroughs
#.copy() gives an independent dataframe, so the column assignment in the next cell does not raise SettingWithCopyWarning
df_postal = df_new[df_new['Borough']!='Not assigned'].copy()

In [55]:
#transform the Neighbourhood column to list all the neighbourhoods that are for a given Postcode 
df_postal["Neighbourhood"] = df_postal.groupby("Postcode")["Neighbourhood"].transform(lambda neigh: ', '.join(neigh))

In [56]:
df_postal.head()

Unnamed: 0_level_0,Borough,Neighbourhood
Postcode,Unnamed: 1_level_1,Unnamed: 2_level_1
M3A,North York,Parkwoods
M4A,North York,Victoria Village
M5A,Downtown Toronto,"Harbourfront, Regent Park"
M5A,Downtown Toronto,"Harbourfront, Regent Park"
M6A,North York,"Lawrence Heights, Lawrence Manor"


In [57]:
#remove the duplicate rows, in order to have only 1 row per Postcode
df_postal = df_postal.drop_duplicates()
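
The transform-then-drop-duplicates steps above can also be written as a single aggregation. The sketch below is only an illustration on a few hand-written rows that mimic the scraped table; the names `raw`, `assigned` and `collapsed` are made up for this example and are not used elsewhere in the notebook.

```python
import pandas as pd

# Toy rows in the same shape as the scraped table (hand-written, not scraped)
raw = pd.DataFrame({
    "Postcode": ["M5A", "M5A", "M3A", "M1A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York", "Not assigned"],
    "Neighbourhood": ["Harbourfront", "Regent Park", "Parkwoods", "Not assigned"],
})

# Keep only postcodes with an assigned borough
assigned = raw[raw["Borough"] != "Not assigned"].copy()

# One row per postcode: keep the first borough, join the neighbourhood names
collapsed = (
    assigned.groupby("Postcode", as_index=False)
    .agg({"Borough": "first", "Neighbourhood": ", ".join})
)
print(collapsed)
```

Aggregating directly yields one row per postcode, so no separate de-duplication step is needed. Note that `groupby` sorts the result by postcode, whereas the cells above keep the original row order.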

In [58]:
#For the Postcodes where column Neighbourhood is Not assigned, we take the value of the column Borough
df_postal['Neighbourhood'] = df_postal['Neighbourhood'].mask(df_postal['Neighbourhood'] == 'Not assigned', df_postal['Borough'])
df_postal.head(10)

Unnamed: 0_level_0,Borough,Neighbourhood
Postcode,Unnamed: 1_level_1,Unnamed: 2_level_1
M3A,North York,Parkwoods
M4A,North York,Victoria Village
M5A,Downtown Toronto,"Harbourfront, Regent Park"
M6A,North York,"Lawrence Heights, Lawrence Manor"
M7A,Queen's Park,Queen's Park
M9A,Etobicoke,Islington Avenue
M1B,Scarborough,"Rouge, Malvern"
M3B,North York,Don Mills North
M4B,East York,"Woodbine Gardens, Parkview Hill"
M5B,Downtown Toronto,"Ryerson, Garden District"


In [59]:
#reset the index, so that Postcode becomes a regular column again
df_postal.reset_index(inplace=True)

In [60]:
df_postal.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Harbourfront, Regent Park"
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Queen's Park,Queen's Park


In [61]:
#display the size of the dataframe
df_postal.shape

(103, 3)

After that, we use the following CSV file to get the geographic coordinates of each postcode.

In [62]:
#read the csv file with the coordinates
coor = pd.read_csv('http://cocl.us/Geospatial_data')
coor.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [63]:
#merge the two tables into a single dataframe by the Postcode
data = pd.merge(df_postal, coor, left_on='Postcode', right_on='Postal Code', how = 'inner')

In [64]:
#drop the second column for Postcode
data.drop(['Postal Code'],axis=1, inplace=True)

In [65]:
data.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494


In [66]:
#check the final output for number of neighborhoods and boroughs
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(data['Borough'].unique()),
        data.shape[0]
    )
)

The dataframe has 11 boroughs and 103 neighborhoods.
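
The inner merge above silently drops any postcode that has no match in the coordinates file. Here all 103 rows survive, so nothing was lost. As a hedged sketch, a left merge with `indicator=True` would list the unmatched postcodes explicitly; the frames below are toy data and `M9Z` is an invented postcode used only to trigger a miss.

```python
import pandas as pd

# Toy postcode table; "M9Z" is a made-up code with no coordinates
postcodes = pd.DataFrame({"Postcode": ["M3A", "M4A", "M9Z"]})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# A left merge keeps every postcode; the _merge column records where each row matched
checked = pd.merge(postcodes, coords, left_on="Postcode", right_on="Postal Code",
                   how="left", indicator=True)

# Postcodes that found no coordinates
missing = checked.loc[checked["_merge"] == "left_only", "Postcode"].tolist()
print(missing)
```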


### Use geopy library to get the latitude and longitude values of Toronto City.

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent toronto_explorer, as shown below.

In [67]:
#get the coordinates of Toronto 
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


We have already defined a function for the New York venues, so we reuse it here to fetch up to 100 venues within a 500-metre radius of each neighbourhood.

In [None]:

toronto_venues = getNearbyVenues(names=data['Neighbourhood'],
                                   latitudes=data['Latitude'],
                                   longitudes=data['Longitude']
                                  )

In [70]:
#display the size of the result dataframe
print(toronto_venues.shape)
toronto_venues.head(10)

(2254, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,KFC,43.754387,-79.333021,Fast Food Restaurant
2,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
3,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
5,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant
6,Victoria Village,43.725882,-79.315572,Eglinton Ave E & Sloane Ave/Bermondsey Rd,43.726086,-79.31362,Intersection
7,Victoria Village,43.725882,-79.315572,Pizza Nova,43.725824,-79.31286,Pizza Place
8,Victoria Village,43.725882,-79.315572,Cash Money,43.725486,-79.312665,Financial or Legal Service
9,"Harbourfront, Regent Park",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery


In [71]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 277 unique categories.
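
To show how one API response turns into the seven-column venue rows above, here is a minimal offline sketch. It assumes the nested `response -> groups -> items -> venue` layout of a Foursquare "explore" style payload. The helper name `parse_explore_items` is hypothetical and is not the `getNearbyVenues` function used in this notebook. The sample payload is hand-written from the first row of the table, not a real API response.

```python
def parse_explore_items(neighborhood, lat, lng, payload):
    """Flatten one explore-style payload into venue rows (assumed layout)."""
    rows = []
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    for item in items:
        venue = item["venue"]
        # A venue may come without categories; fall back to None
        categories = venue.get("categories") or [{}]
        rows.append((
            neighborhood, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0].get("name"),
        ))
    return rows


# Hand-written sample mirroring the first row of the table above
sample_payload = {
    "response": {
        "groups": [{
            "items": [{
                "venue": {
                    "name": "Brookbanks Park",
                    "location": {"lat": 43.751976, "lng": -79.33214},
                    "categories": [{"name": "Park"}],
                }
            }]
        }]
    }
}

rows = parse_explore_items("Parkwoods", 43.753259, -79.329656, sample_payload)
print(rows)
```

An empty or malformed payload yields an empty list, which is how a neighbourhood can end up with no venues at all.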


### Analyze Each Neighborhood of Toronto

In [72]:
# transform the venue categories into columns of the dataframe
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot =toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [73]:
toronto_onehot.shape

(2254, 277)

Next, let's group the rows by neighborhood and take the mean of each one-hot column, which gives the relative frequency of each venue category in that neighborhood.

In [75]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0
1,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Agincourt North, L'Amoreaux East, Milliken, St...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [76]:
toronto_grouped.shape

(99, 277)
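
To make the frequency calculation concrete, here is a small sketch on toy data. The neighbourhood names `A` and `B` and the venue lists are invented for illustration. Each one-hot column is 1 for venues of that category, so the per-neighbourhood mean is the share of venues in that category.

```python
import pandas as pd

# Toy venue list: neighbourhood A has 4 venues, B has 2
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Park", "Coffee Shop", "Pizza Place", "Park", "Coffee Shop"],
})

# One column per category, 1 where the venue belongs to it
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)

# insert() raises ValueError if a column with this name already exists
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Mean of the 0/1 columns = relative frequency of each category
grouped = onehot.groupby("Neighborhood").mean()
print(grouped)
```

Each row of `grouped` sums to 1. Using `insert` rather than plain assignment is a deliberate choice in this sketch: if a venue category were itself named `Neighborhood`, the call would fail loudly instead of overwriting that category column.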

Let's print each neighborhood along with the top 5 most common venues

In [None]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

Now let's create the new dataframe and display the top 25 venues for each neighborhood

In [78]:
num_top_venues = 25

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,...,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue,21th Most Common Venue,22th Most Common Venue,23th Most Common Venue,24th Most Common Venue,25th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Bar,Steakhouse,Cosmetics Shop,Hotel,Restaurant,Burger Joint,American Restaurant,...,Sushi Restaurant,Gym,Concert Hall,Gastropub,Lounge,Vegetarian / Vegan Restaurant,Juice Bar,Dance Studio,Salon / Barbershop,Colombian Restaurant
1,Agincourt,Clothing Store,Lounge,Skating Rink,Breakfast Spot,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,...,Falafel Restaurant,Farmers Market,Dog Run,Diner,Festival,Creperie,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store
2,"Agincourt North, L'Amoreaux East, Milliken, St...",Park,Playground,Women's Store,Eastern European Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,...,Falafel Restaurant,Farmers Market,Diner,Department Store,Dessert Shop,College Stadium,Deli / Bodega,Dance Studio,Curling Ice,Cupcake Shop
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",Grocery Store,Pharmacy,Coffee Shop,Beer Store,Sandwich Place,Fried Chicken Joint,Fast Food Restaurant,Pizza Place,Construction & Landscaping,...,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Discount Store,Diner,Dim Sum Restaurant,Dessert Shop
4,"Alderwood, Long Branch",Pizza Place,Gym,Skating Rink,Pharmacy,Coffee Shop,Athletics & Sports,Pub,Sandwich Place,Pool,...,Dessert Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Dim Sum Restaurant,Curling Ice,Department Store,Deli / Bodega,Dance Studio
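
The column labels above come out as "21th", "22th" and "23th" because every position after the third gets the suffix "th". A small helper could produce the usual English ordinals instead. The function name `ordinal` is hypothetical, and adopting it would rename the columns shown in the tables of this notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 11th, 21st."""
    # 10-20 always take "th" (11th, 12th, 13th)
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 25
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns[1], "|", columns[21], "|", columns[25])
```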


# Cluster Neighbourhoods for Toronto City


In [79]:
# Run k-means to cluster the neighborhood into 7 clusters.
# set number of clusters
kclusters = 7

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:20] 

array([1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 3, 1, 1, 1])

In [80]:
kmeans.labels_.dtype

dtype('int32')

Let's create a new dataframe that includes the cluster as well as the top 25 venues for each neighborhood.

In [81]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# use the same column spelling as the venues dataframe so that the join key matches
data.rename(columns = {'Neighbourhood':'Neighborhood'}, inplace = True)

# join the coordinates in data with the cluster labels and top venues of each neighborhood
toronto_merged = data.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head(6) # check the last columns!

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,...,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue,21th Most Common Venue,22th Most Common Venue,23th Most Common Venue,24th Most Common Venue,25th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,3.0,Fast Food Restaurant,Food & Drink Shop,Park,Discount Store,...,Falafel Restaurant,Farmers Market,Diner,Women's Store,Festival,Department Store,Deli / Bodega,Dance Studio,Curling Ice,Cupcake Shop
1,M4A,North York,Victoria Village,43.725882,-79.315572,1.0,Pizza Place,Financial or Legal Service,Coffee Shop,Hockey Arena,...,Empanada Restaurant,Ethiopian Restaurant,Event Space,Falafel Restaurant,Diner,Department Store,Dim Sum Restaurant,Dessert Shop,Fast Food Restaurant,Deli / Bodega
2,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636,1.0,Coffee Shop,Bakery,Café,Park,...,Spa,Dessert Shop,Electronics Store,Event Space,Farmers Market,Ice Cream Shop,French Restaurant,Hotel,Historic Site,Health Food Store
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763,1.0,Furniture / Home Store,Clothing Store,Boutique,Coffee Shop,...,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant,Falafel Restaurant,Dog Run,Women's Store,Diner,Dim Sum Restaurant,Farmers Market
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494,1.0,Coffee Shop,Park,Gym,Diner,...,Bar,Nightclub,Creperie,Mexican Restaurant,Fast Food Restaurant,Japanese Restaurant,Italian Restaurant,Hobby Shop,Wings Joint,Smoothie Shop
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242,,,,,,...,,,,,,,,,,


In [82]:
toronto_merged.dropna(axis=0,inplace=True)

In [None]:
toronto_merged['Cluster Labels'].astype(np.int64)

Finally, let's visualize the resulting clusters

In [84]:
toronto_merged['Cluster Labels'] =toronto_merged['Cluster Labels'].astype(int)

In [85]:
toronto_merged['Cluster Labels'].value_counts()

1    80
3    10
6     4
4     2
5     1
2     1
0     1
Name: Cluster Labels, dtype: int64
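
The cluster labels show up as `3.0`, `1.0` and `NaN` after the join because neighbourhoods without any returned venues (such as Islington Avenue) have no label, and a column holding `NaN` is stored as float. The toy sketch below reproduces that effect and the drop-then-cast fix; the frame names are illustrative only.

```python
import pandas as pd

# Three neighbourhoods with coordinates, but only two received a cluster label
coords = pd.DataFrame({
    "Neighborhood": ["Parkwoods", "Victoria Village", "Islington Avenue"],
    "Latitude": [43.753259, 43.725882, 43.667856],
})
labels = pd.DataFrame({
    "Neighborhood": ["Parkwoods", "Victoria Village"],
    "Cluster Labels": [3, 1],
})

# The unmatched row gets NaN, which forces the label column to float
merged = coords.join(labels.set_index("Neighborhood"), on="Neighborhood")
print(merged["Cluster Labels"].dtype)

# Drop unlabelled rows, then cast the labels back to integers
clean = merged.dropna(subset=["Cluster Labels"]).astype({"Cluster Labels": int})
print(clean)
```

Restricting `dropna` to the label column, as in this sketch, avoids discarding rows that are missing some unrelated value.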

## Results for Toronto City

In [None]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

map_clusters

In [None]:
# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#  Explore each cluster for Toronto

Toronto Cluster 1 - Playground 

In [90]:
#Explore Cluster with label 0
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,...,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue,21th Most Common Venue,22th Most Common Venue,23th Most Common Venue,24th Most Common Venue,25th Most Common Venue
32,Scarborough,0,Playground,Convenience Store,Women's Store,Eastern European Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,...,Falafel Restaurant,Farmers Market,Diner,Dessert Shop,Festival,Department Store,Deli / Bodega,Dance Studio,Curling Ice,Cupcake Shop


Toronto Cluster 2 - the biggest cluster in Toronto, spread all over the city

In [91]:
#Explore Cluster with label 1
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,...,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue,21th Most Common Venue,22th Most Common Venue,23th Most Common Venue,24th Most Common Venue,25th Most Common Venue
1,North York,1,Pizza Place,Financial or Legal Service,Coffee Shop,Hockey Arena,Intersection,Portuguese Restaurant,Women's Store,Dumpling Restaurant,...,Empanada Restaurant,Ethiopian Restaurant,Event Space,Falafel Restaurant,Diner,Department Store,Dim Sum Restaurant,Dessert Shop,Fast Food Restaurant,Deli / Bodega
2,Downtown Toronto,1,Coffee Shop,Bakery,Café,Park,Theater,Gym / Fitness Center,Breakfast Spot,Pub,...,Spa,Dessert Shop,Electronics Store,Event Space,Farmers Market,Ice Cream Shop,French Restaurant,Hotel,Historic Site,Health Food Store
3,North York,1,Furniture / Home Store,Clothing Store,Boutique,Coffee Shop,Miscellaneous Shop,Sporting Goods Shop,Vietnamese Restaurant,Accessories Store,...,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant,Falafel Restaurant,Dog Run,Women's Store,Diner,Dim Sum Restaurant,Farmers Market
4,Queen's Park,1,Coffee Shop,Park,Gym,Diner,Persian Restaurant,Seafood Restaurant,Sandwich Place,Burger Joint,...,Bar,Nightclub,Creperie,Mexican Restaurant,Fast Food Restaurant,Japanese Restaurant,Italian Restaurant,Hobby Shop,Wings Joint,Smoothie Shop
7,North York,1,Basketball Court,Gym / Fitness Center,Caribbean Restaurant,Café,Japanese Restaurant,Women's Store,Doner Restaurant,Donut Shop,...,Event Space,Falafel Restaurant,Farmers Market,Dog Run,Diner,Festival,Dim Sum Restaurant,Dessert Shop,Department Store,Deli / Bodega
8,East York,1,Fast Food Restaurant,Pizza Place,Pet Store,Athletics & Sports,Gastropub,Intersection,Pharmacy,Breakfast Spot,...,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant
9,Downtown Toronto,1,Coffee Shop,Clothing Store,Middle Eastern Restaurant,Cosmetics Shop,Café,Italian Restaurant,Tea Room,Japanese Restaurant,...,Plaza,Bubble Tea Shop,Pizza Place,Bakery,Beer Bar,Seafood Restaurant,Comic Shop,Sandwich Place,Shopping Mall,Music Venue
11,Etobicoke,1,Bank,Women's Store,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,...,Festival,Dog Run,Diner,Filipino Restaurant,Creperie,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop
12,Scarborough,1,Bar,Women's Store,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,...,Festival,Dog Run,Diner,Filipino Restaurant,Creperie,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop
13,North York,1,Gym,Asian Restaurant,Coffee Shop,Beer Store,Clothing Store,Chinese Restaurant,Dim Sum Restaurant,Discount Store,...,Supermarket,Concert Hall,Fruit & Vegetable Store,Department Store,General Travel,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant


Toronto Cluster 3

In [92]:
#Explore Cluster with label 2
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,...,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue,21th Most Common Venue,22th Most Common Venue,23th Most Common Venue,24th Most Common Venue,25th Most Common Venue
6,Scarborough,2,Fast Food Restaurant,Women's Store,Eastern European Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,...,Farmers Market,Diner,Dim Sum Restaurant,Dessert Shop,Department Store,Deli / Bodega,Dance Studio,Curling Ice,Cupcake Shop,Cuban Restaurant


Toronto Cluster 4 - The Park Cluster

In [93]:
#Explore Cluster with label 3
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,...,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue,21th Most Common Venue,22th Most Common Venue,23th Most Common Venue,24th Most Common Venue,25th Most Common Venue
0,North York,3,Fast Food Restaurant,Food & Drink Shop,Park,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,...,Falafel Restaurant,Farmers Market,Diner,Women's Store,Festival,Department Store,Deli / Bodega,Dance Studio,Curling Ice,Cupcake Shop
21,York,3,Park,Women's Store,Market,Fast Food Restaurant,Concert Hall,Construction & Landscaping,Farmers Market,Comic Shop,...,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Discount Store,Diner,Festival,Dessert Shop,Department Store,Deli / Bodega
35,East York,3,Park,Convenience Store,Women's Store,Eastern European Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,...,Falafel Restaurant,Farmers Market,Diner,Dessert Shop,Festival,Department Store,Deli / Bodega,Dance Studio,Curling Ice,Cupcake Shop
40,North York,3,Park,Airport,Women's Store,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,...,Farmers Market,Fast Food Restaurant,Discount Store,Dessert Shop,Dim Sum Restaurant,Field,Department Store,Deli / Bodega,Dance Studio,Curling Ice
64,York,3,Park,Women's Store,Eastern European Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,...,Farmers Market,Diner,Dessert Shop,Festival,Department Store,Deli / Bodega,Dance Studio,Curling Ice,Cupcake Shop,Cuban Restaurant
66,North York,3,Park,Bank,Convenience Store,Women's Store,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,...,Farmers Market,Fast Food Restaurant,Dog Run,Dim Sum Restaurant,Diner,Field,Dessert Shop,Department Store,Deli / Bodega,Dance Studio
83,Central Toronto,3,Park,Tennis Court,Women's Store,Dumpling Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,...,Falafel Restaurant,Farmers Market,Diner,Department Store,Dessert Shop,Festival,Deli / Bodega,Dance Studio,Curling Ice,Cupcake Shop
85,Scarborough,3,Park,Playground,Women's Store,Eastern European Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,...,Falafel Restaurant,Farmers Market,Diner,Department Store,Dessert Shop,College Stadium,Deli / Bodega,Dance Studio,Curling Ice,Cupcake Shop
91,Downtown Toronto,3,Park,Trail,Playground,Building,Dumpling Restaurant,Discount Store,Dog Run,Doner Restaurant,...,Ethiopian Restaurant,Event Space,Falafel Restaurant,Diner,Department Store,Dessert Shop,Fast Food Restaurant,Deli / Bodega,Dance Studio,Curling Ice
98,Etobicoke,3,River,Park,Pool,Dumpling Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,...,Event Space,Falafel Restaurant,Dim Sum Restaurant,Women's Store,Department Store,College Stadium,Deli / Bodega,Dance Studio,Curling Ice,Cupcake Shop


Toronto Cluster 5 - the Baseball Field Cluster

In [94]:
#Explore Cluster with label 4
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,...,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue,21th Most Common Venue,22th Most Common Venue,23th Most Common Venue,24th Most Common Venue,25th Most Common Venue
57,North York,4,Baseball Field,Women's Store,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,...,Festival,Dog Run,Diner,Filipino Restaurant,Creperie,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop
101,Etobicoke,4,Baseball Field,Pool,Women's Store,Dumpling Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,...,Falafel Restaurant,Farmers Market,Diner,Dessert Shop,Festival,Department Store,Deli / Bodega,Dance Studio,Curling Ice,Cupcake Shop


Toronto Cluster 6

In [95]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 5, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,...,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue,21th Most Common Venue,22th Most Common Venue,23th Most Common Venue,24th Most Common Venue,25th Most Common Venue
50,North York,5,Empanada Restaurant,Women's Store,Eastern European Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,...,Fast Food Restaurant,Diner,Dim Sum Restaurant,Dessert Shop,Department Store,Deli / Bodega,Dance Studio,Curling Ice,Cupcake Shop,Cuban Restaurant


Toronto Cluster 7

In [96]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 6, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,...,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue,21th Most Common Venue,22th Most Common Venue,23th Most Common Venue,24th Most Common Venue,25th Most Common Venue
10,North York,6,Park,Pizza Place,Japanese Restaurant,Pub,Women's Store,Drugstore,Discount Store,Dog Run,...,Ethiopian Restaurant,Event Space,Falafel Restaurant,Diner,Department Store,Dessert Shop,Fast Food Restaurant,Deli / Bodega,Dance Studio,Curling Ice
61,Central Toronto,6,Park,Swim School,Bus Line,Women's Store,Dumpling Restaurant,Dog Run,Doner Restaurant,Donut Shop,...,Falafel Restaurant,Farmers Market,Discount Store,Dim Sum Restaurant,Festival,Dessert Shop,Department Store,Deli / Bodega,Dance Studio,Curling Ice
63,York,6,Pizza Place,Caribbean Restaurant,Convenience Store,Bus Line,Women's Store,Dumpling Restaurant,Dog Run,Doner Restaurant,...,Event Space,Falafel Restaurant,Farmers Market,Discount Store,Dessert Shop,Dim Sum Restaurant,Colombian Restaurant,Department Store,Deli / Bodega,Dance Studio
77,Etobicoke,6,Pizza Place,Park,Mobile Phone Shop,Bus Line,Women's Store,Dumpling Restaurant,Dog Run,Doner Restaurant,...,Event Space,Falafel Restaurant,Discount Store,Dessert Shop,Dim Sum Restaurant,Colombian Restaurant,Department Store,Deli / Bodega,Dance Studio,Curling Ice


## Results Section Summary

Based on the clustering performed, we grouped the neighbourhoods of each city into 7 clusters. 

    Cluster 1: New York City: Italian Food Cluster
    Cluster 2: New York City: Mix
    Cluster 3: New York City: Staten Island
    Cluster 4: New York City: Park Cluster
    Cluster 5: New York City: Single Spot
    Cluster 6: New York City: Single Spot
    Cluster 7: New York City: Caribbean Restaurant Cluster
   
    Cluster 1: Toronto: Playground
    Cluster 2: Toronto: Mix
    Cluster 3: Toronto: Single Spot
    Cluster 4: Toronto: Park Cluster
    Cluster 5: Toronto: Baseball Field Cluster
    Cluster 6: Toronto: Single Spot
    Cluster 7: Toronto: Mix

### Similarities and Dissimilarities

For New York City, we managed to segment some very distinct clusters, such as the Italian Food Cluster and the Staten Island Cluster. 
For both cities, there is one common cluster: the Park Cluster. 
For both cities, the majority of the neighbourhoods fall within the Mix cluster, which is the biggest one in each city and is spread all over it.
For New York, we see some typical distinctions based on the food preferences of the visitors /two food clusters/, whereas no such distinction can be made for Toronto. 
Also, the segmentation of Staten Island as a separate cluster marks the island as a distinctive part of the city, with venues typical only for that part.
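
The claim that the Mix cluster is the biggest one can be checked by counting neighbourhoods per cluster label. The following is a minimal sketch, assuming a frame shaped like `toronto_merged` with a `Cluster Labels` column; the helper name `cluster_shares` and the toy labels are hypothetical and are not taken from the analysis above.

```python
import pandas as pd


def cluster_shares(merged, label_col="Cluster Labels"):
    """Return the share of neighbourhoods in each cluster, largest first."""
    # value_counts sorts in descending order by default
    return merged[label_col].value_counts(normalize=True)


# Made-up labels for illustration only (not the real clustering output)
toy = pd.DataFrame({"Cluster Labels": [1, 1, 1, 1, 3, 3, 0, 4]})
shares = cluster_shares(toy)
print(shares)
```

Applying the same helper to the merged frame of each city would put the relative size of the Mix and Park clusters side by side.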



## Discussion 

Given the clustering done and the lack of more distinctive characteristics between the clusters, it is clear that including additional data as new features could provide more distinction when segmenting the neighbourhoods.
We think that additional clustering could be done based on the distances between the venues, using DBSCAN for spatial clustering. 
In this way, we could use just the geo coordinates of the venues from Foursquare and form spatial clusters based on the distances between the venues. 
We would then have clusters that are better separated spatially. 
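
The DBSCAN idea could be sketched as follows. This is only an illustration under stated assumptions: the venue coordinates are made up, the helper name `spatial_clusters` is hypothetical, and the values `eps_km=0.3` and `min_samples=3` are arbitrary choices that would need tuning on the real Foursquare data. The sketch uses scikit-learn's `DBSCAN` with the haversine metric, which expects latitude and longitude in radians.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, used to convert km to radians


def spatial_clusters(latlon_deg, eps_km=0.3, min_samples=3):
    """Label venues by DBSCAN spatial cluster; -1 marks noise points."""
    # The haversine metric works on (lat, lon) pairs expressed in radians
    coords_rad = np.radians(np.asarray(latlon_deg, dtype=float))
    model = DBSCAN(
        eps=eps_km / EARTH_RADIUS_KM,  # neighbourhood radius in radians
        min_samples=min_samples,
        metric="haversine",
        algorithm="ball_tree",
    )
    return model.fit_predict(coords_rad)


# Made-up (lat, lon) pairs: two tight groups and one isolated venue
venues = [
    (43.6532, -79.3832), (43.6534, -79.3830), (43.6530, -79.3835),
    (43.7001, -79.4163), (43.7003, -79.4160), (43.6999, -79.4165),
    (43.8000, -79.2000),
]
labels = spatial_clusters(venues)
print(labels)
```

Unlike K-means, DBSCAN does not need the number of clusters in advance and leaves isolated venues unassigned, which may help with the single-spot clusters seen above.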

Also, to compare the similarities and dissimilarities between the two cities, we could include additional features such as the number of inhabitants per neighbourhood, average income, household size, average real-estate expense, etc.
Such economic and social KPIs could be used in the analysis in addition to the venue frequency data so that we could get more distinct and uniform clusters.
We could then use the new clusters to better compare the two cities. Unfortunately, such information is not available at neighbourhood level, which is why we performed the current analysis using just the Foursquare data.
Including these features would be a good step for future enhancement.
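
If such indicators became available, one way to combine them with the venue frequencies is sketched below. Everything here is an assumption for illustration: the neighbourhood names, the column names `population` and `avg_income`, and all values are invented. The features are standardized before K-means so that large-valued columns such as income do not dominate the venue frequencies.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up venue frequencies per neighbourhood (illustrative values only)
venue_freq = pd.DataFrame({
    "Neighbourhood": ["A", "B", "C", "D"],
    "Park": [0.50, 0.45, 0.05, 0.00],
    "Pizza Place": [0.10, 0.05, 0.40, 0.45],
})

# Hypothetical socio-economic indicators (not real data)
socio = pd.DataFrame({
    "Neighbourhood": ["A", "B", "C", "D"],
    "population": [12000, 15000, 40000, 38000],
    "avg_income": [90000, 85000, 52000, 50000],
})

# Join both feature sets on the neighbourhood name
features = venue_freq.merge(socio, on="Neighbourhood", how="inner")

# Put all columns on a comparable scale before clustering
X = StandardScaler().fit_transform(features.drop(columns="Neighbourhood"))

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
features["Cluster Labels"] = kmeans.fit_predict(X)
print(features[["Neighbourhood", "Cluster Labels"]])
```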


## Conclusion

The current state of modern technologies, along with the development of location-based social networks, leads to new geographical features for spatial segmentation based on people's preferences.
Modern cities are much more than just administrative boundaries: they live their own life through people, places and events.
The current analysis used Foursquare API data to segment the neighbourhoods of two metropolitan cities, New York and Toronto, based on the frequency of the venues visited. 
The analysis shows certain similarities between the two cities /each one has a big Mix cluster and one Park cluster/, and also shows the need to include more features from the economic and behavioural field in order to better segment the cities' neighbourhoods.
Foursquare data is an excellent source for people's preferences; however, for a more detailed analysis we need to include other economic and behavioural measurements. 
