# Segmenting and Clustering Neighborhoods in Toronto

In [1]:
# Import necessary libraries
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
import wget
import os.path
from os import path
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import json
from pandas.io.json import json_normalize
import folium

## Part 1

To begin our project of segmenting and clustering neighbourhoods in Toronto we must first define the neighbourhoods in Toronto. Neighbourhoods often lack clear boundaries but are relatively well defined through legal processes. In particular, areas in Toronto have designated postcodes and are often assigned borough and neighbourhood names. Such information is available on the following Wikipedia page which the below code will scrape: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M. We seek a dataframe with three columns: Postalcode, Borough and Neighbourhood. We do not want boroughs with unassigned names in our dataframe, and we want neighbourhoods with unassigned names to have the same name as its corresponding borough. We also want different neighbourhoods within the same postcode to be coalesced into a single entry such that every row of the dataframe has a unique postcode.

Rather than scraping the table then perform cleaning, the below code does both simultaneously. The code does so by appending rows one by one and:
1. Refusing to append rows when the borough is not assigned
2. Appending the borough name to the neighbourhood when the neighbourhood is not assigned
3. Combining old neighbourhood entries with new ones and replacing the old entry when the postcode is the same as the one in the previous entry

In [3]:
# Initialise empty dataframe
hood = pd.DataFrame(columns = ['Postcode', 'Borough', 'Neighbourhood'])

# Scrape html code of table in webpage
source = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M").text
soup = BeautifulSoup(source, 'lxml')
table = soup.table

# Create for loop to fill in the empty dataframe, subject to certain conditions (see below code)
for entry in table.find_all('tr')[1:]:
    # Obtain postcode, borough and neighbourhood for each row entry
    entry_list = entry.text.split('\n')
    postcode = entry_list[1]
    borough = entry_list[2]
    neighbourhood = entry_list[3]
    # Condition 1: do not append those rows where borough is not assigned
    if borough == "Not assigned":
        continue
    # Condition 2: give neighbourhoods that are not assigned the same name as borough 
    if neighbourhood == "Not assigned": 
        neighbourhood = entry_list[2]
    # Condition 3: combine neighbourhood entries for entries with same postcode
    try:
        # Do so by changing new neighbourhood entry to include previous neighbourhoods and removing old entry if postcode matches previous entry
        if postcode == hood.iloc[-1, 0]: 
            neighbourhood = "{}, {}".format(hood.iloc[-1, 2], neighbourhood)
            hood.drop(hood.index[-1], inplace = True)
    # Use try/except clause to prevent error in first iteration when dataframe is empty and "previous entry" is undefined
    except:
        pass
    # Append postcode, borough and neighbourhood (possibly modified) to dataframe
    hood = hood.append({'Postcode' : postcode, 'Borough' : borough, 'Neighbourhood' : neighbourhood}, ignore_index = True)

# View top 10 entries of dataframe
hood.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Downtown Toronto,Queen's Park
5,M9A,Queen's Park,Queen's Park
6,M1B,Scarborough,"Rouge, Malvern"
7,M3B,North York,Don Mills North
8,M4B,East York,"Woodbine Gardens, Parkview Hill"
9,M5B,Downtown Toronto,"Ryerson, Garden District"


In [4]:
# Find shape of dataframe
hood.shape

(103, 3)

In [5]:
# Check that each row has unique postcode by comparing with shape
len(set(hood['Postcode']))

103

## Part 2

In the next part of our analysis we wish to append the latitude and longitude coordinates as columns to our dataframe obtained in part 1. As the geocoder package was taking too long to obtain geospatial data on the postcodes we instead used a csv file already containing this data. We download the file then read it in as a pandas dataframe. Finally, we merge the new dataframe (on geospatial data) with out previous dataframe (on neighbourhood data) to obtain our desired dataframe.

In [6]:
# Download the file corresponding to geospatial coordinates of postcodes if it doesn't exist
if path.exists('Geospatial_Coordinates.csv') == False:
    url = 'https://cocl.us/Geospatial_data'
    wget.download(url)

# Read in the csv file as a pandas dataframe
geo = pd.read_csv('Geospatial_Coordinates.csv')

# Check what dataframe looks like so we know how to proceed
geo.head(5)

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [7]:
# Check that number of postal codes matches that of our previous dataframe containing neighbourhoods
geo.shape

(103, 3)

In [8]:
# Merge the two dataframes into a single dataframe and renaming the Postcode/Postal Code column to PostalCode as instructed
hood.rename(columns = {'Postcode' : 'PostalCode'}, inplace = True)
geo.rename(columns = {'Postal Code' : 'PostalCode'}, inplace = True)
df = pd.merge(left = hood, right = geo, left_on = 'PostalCode', right_on = 'PostalCode')

# View top 10 entries of new dataframe
df.head(10)

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494
5,M9A,Queen's Park,Queen's Park,43.667856,-79.532242
6,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
7,M3B,North York,Don Mills North,43.745906,-79.352188
8,M4B,East York,"Woodbine Gardens, Parkview Hill",43.706397,-79.309937
9,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937


## Part 3

Here we first use the geolocator package to obtain the geographical coordinates of Toronto so we can visualise a map centred around Toronto. We then proceed to add markers corresponding to each postcode using the latitude and longitude data in the dataframe we obtained in part 2. Our next steps will then work towards the goal of clustering these neighbourhoods based on the types of venues in each neighbourhood.

In [9]:
address = 'Toronto'

geolocator = Nominatim(user_agent="gta_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto City are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto City are 43.653963, -79.387207.


In [10]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, postcode, borough, neighbourhood in zip(df['Latitude'], df['Longitude'], df['PostalCode'], df['Borough'], df['Neighbourhood']):
    label = '{}, {} ({})'.format(postcode, borough, neighbourhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

For each of the postcodes/groups of neighbourhoods, we now use the explore query to the Foursquare API to obtain nearby venues and their details (including name, latitude, longitude and venue category). We do so by first defining some parameters used in our API call then creating a function that loops through our dataframe from part 2 to systematically extract these venue details and append them in a dataframe along with the corresponding neighbourhood name and coordinates. We then apply this function and preview the resulting dataframe.

In [11]:
CLIENT_ID = 'ZGMWE0F5NFNKSPS1NTGJCLUWQRKW1SIDN0143DK0ZMCE3G3D' # your Foursquare ID
CLIENT_SECRET = 'NWJYE11CG3B2DCKGM0OCFN11KCB53KITLTEXSPGJL4QDGAMT' # your Foursquare Secret
VERSION = '20200211' # Foursquare API version
ACCESS_TOKEN = 'OISRZIBHSE0NS5KSQT1QR1A2XMM3BOUWMSXRKSCHANJ4E3CL'
LIMIT = 100
radius = 500

In [15]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&oauth_token={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            ACCESS_TOKEN, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [16]:
toronto_venues = getNearbyVenues(names=df['Neighbourhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

In [17]:
print(toronto_venues.shape)
toronto_venues.head()

(2738, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Careful & Reliable Painting,43.752622,-79.331957,Construction & Landscaping
2,Parkwoods,43.753259,-79.329656,649 Variety,43.754513,-79.331942,Convenience Store
3,Parkwoods,43.753259,-79.329656,Sun Life,43.75476,-79.332783,Construction & Landscaping
4,Parkwoods,43.753259,-79.329656,GTA Restoration,43.753396,-79.333477,Fireworks Store


In [18]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide, King, Richmond",100,100,100,100,100,100
Agincourt,7,7,7,7,7,7
"Agincourt North, L'Amoreaux East, Milliken, Steeles East",1,1,1,1,1,1
"Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown",26,26,26,26,26,26
"Alderwood, Long Branch",13,13,13,13,13,13
...,...,...,...,...,...,...
Willowdale West,9,9,9,9,9,9
Woburn,9,9,9,9,9,9
"Woodbine Gardens, Parkview Hill",16,16,16,16,16,16
Woodbine Heights,16,16,16,16,16,16


In [19]:
print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 303 uniques categories.


Our venues dataframe in its current form simply has as a word entry under the venue category column. We want to change this to a form which allows our clustering algorithms to work. We choose to perform a one hot encoding here, splitting the venue category columns to many columns each corresponding to a unique venue category. A 1 represents the presence of that venue category within the neighbourhood and 0 represents an absence. We then convert the resultant dataframe into proportional frequencies and add back into the dataframe the neighbourhood column.

In [20]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,ATM,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,...,Transportation Service,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [21]:
toronto_onehot.shape

(2738, 303)

In [22]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,ATM,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,...,Transportation Service,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,"Adelaide, King, Richmond",0.0,0.0000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.02,0.0,0.000000,0.0,0.0,0.01,0.0,0.0,0.0
1,Agincourt,0.0,0.0000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00,0.0,0.000000,0.0,0.0,0.00,0.0,0.0,0.0
2,"Agincourt North, L'Amoreaux East, Milliken, St...",0.0,0.0000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00,0.0,0.000000,0.0,0.0,0.00,0.0,0.0,0.0
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",0.0,0.0000,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00,0.0,0.038462,0.0,0.0,0.00,0.0,0.0,0.0
4,"Alderwood, Long Branch",0.0,0.0000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00,0.0,0.000000,0.0,0.0,0.00,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,Willowdale West,0.0,0.0000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00,0.0,0.000000,0.0,0.0,0.00,0.0,0.0,0.0
97,Woburn,0.0,0.0000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00,0.0,0.000000,0.0,0.0,0.00,0.0,0.0,0.0
98,"Woodbine Gardens, Parkview Hill",0.0,0.0000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00,0.0,0.000000,0.0,0.0,0.00,0.0,0.0,0.0
99,Woodbine Heights,0.0,0.0625,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.00,0.0,0.062500,0.0,0.0,0.00,0.0,0.0,0.0


In [23]:
toronto_grouped.shape

(101, 303)

We now define a function that returns the most common venues in each neighbourhood and apply this to our data to create a new dataframe which lists the top 10 most common venues for each neighbourhood.

In [24]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [25]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,American Restaurant,Steakhouse,Sushi Restaurant,Cosmetics Shop,Thai Restaurant,Asian Restaurant,Gym,Bar
1,Agincourt,Hardware Store,Clothing Store,Lounge,Fireworks Store,Breakfast Spot,Latin American Restaurant,Skating Rink,Creperie,Convenience Store,Electronics Store
2,"Agincourt North, L'Amoreaux East, Milliken, St...",Park,Women's Store,Empanada Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",Electronics Store,Pizza Place,Pharmacy,Mobile Phone Shop,Grocery Store,Greek Restaurant,Sandwich Place,Japanese Restaurant,Discount Store,Indie Movie Theater
4,"Alderwood, Long Branch",Pizza Place,Pharmacy,Pub,Pool,Dance Studio,Spa,Sandwich Place,Athletics & Sports,Automotive Shop,Skating Rink


Finally, to cluster our neighbourhoods we perform a k-means clustering using a value of k = 5. We append the resulting cluster labels as well as our original postcode dataframe along with the most common venues to create a dataframe set for map creation.

In [26]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 2, 4, 2, 2, 2, 2, 2, 2, 2])

In [27]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhood', how = 'right')

toronto_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0,Construction & Landscaping,Park,Fireworks Store,Convenience Store,BBQ Joint,Bus Stop,Food & Drink Shop,Electronics Store,Donut Shop,Drugstore
1,M4A,North York,Victoria Village,43.725882,-79.315572,2,Intersection,French Restaurant,Financial or Legal Service,Coffee Shop,Hockey Arena,Pizza Place,Portuguese Restaurant,Eastern European Restaurant,Dog Run,Doner Restaurant
2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636,2,Coffee Shop,Park,Bakery,Café,Pub,Breakfast Spot,Mexican Restaurant,Gym / Fitness Center,Theater,Brewery
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763,2,Clothing Store,Furniture / Home Store,Accessories Store,Women's Store,Home Service,Shoe Store,Miscellaneous Shop,Lighting Store,Kids Store,Gift Shop
4,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494,2,Coffee Shop,Sushi Restaurant,Park,Japanese Restaurant,Burger Joint,Diner,Beer Bar,Seafood Restaurant,Boutique,Salad Place


We create a map using folium to visualise the clustered neighbourhoods.

In [34]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Below, we preview the 5 clusters to look at their commonalities and how we might name/define each cluster.

In [29]:
# Cluster 1
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,0,Construction & Landscaping,Park,Fireworks Store,Convenience Store,BBQ Joint,Bus Stop,Food & Drink Shop,Electronics Store,Donut Shop,Drugstore
12,Scarborough,0,Moving Target,Construction & Landscaping,Golf Course,Women's Store,Empanada Restaurant,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
21,York,0,Park,Spa,Market,Fast Food Restaurant,Women's Store,Grocery Store,Eastern European Restaurant,Doctor's Office,Dog Run,Doner Restaurant
35,East York,0,Film Studio,Park,Convenience Store,Metro Station,Empanada Restaurant,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
40,North York,0,Park,Other Repair Shop,Construction & Landscaping,Airport,Electronics Store,Empanada Restaurant,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
45,North York,0,Park,Martial Arts Dojo,Cafeteria,Farmers Market,Fast Food Restaurant,Farm,Falafel Restaurant,Fabric Shop,Event Space,Doctor's Office
49,North York,0,Park,Bakery,Construction & Landscaping,Massage Studio,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant
57,North York,0,Paper / Office Supplies Store,Construction & Landscaping,Fabric Shop,Baseball Field,Women's Store,Empanada Restaurant,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
61,Central Toronto,0,Park,Gym / Fitness Center,Bus Line,Lawyer,Empanada Restaurant,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
64,York,0,Park,Electronics Store,Theater,Convenience Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant


In [30]:
# Cluster 2
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Scarborough,1,Fast Food Restaurant,Home Service,Women's Store,Empanada Restaurant,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store
11,Etobicoke,1,Print Shop,Home Service,Gift Shop,Women's Store,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
50,North York,1,Pharmacy,Pizza Place,Home Service,Empanada Restaurant,Women's Store,Eastern European Restaurant,Doctor's Office,Dog Run,Doner Restaurant,Donut Shop
52,North York,1,Home Service,Women's Store,Ethiopian Restaurant,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant
53,North York,1,Home Service,Baseball Field,Korean Restaurant,Business Service,Farmers Market,Farm,Falafel Restaurant,Fast Food Restaurant,Fabric Shop,Event Space
62,Central Toronto,1,Health & Beauty Service,Home Service,Ice Cream Shop,Pool,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant


In [31]:
# Cluster 3
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,2,Intersection,French Restaurant,Financial or Legal Service,Coffee Shop,Hockey Arena,Pizza Place,Portuguese Restaurant,Eastern European Restaurant,Dog Run,Doner Restaurant
2,Downtown Toronto,2,Coffee Shop,Park,Bakery,Café,Pub,Breakfast Spot,Mexican Restaurant,Gym / Fitness Center,Theater,Brewery
3,North York,2,Clothing Store,Furniture / Home Store,Accessories Store,Women's Store,Home Service,Shoe Store,Miscellaneous Shop,Lighting Store,Kids Store,Gift Shop
4,Downtown Toronto,2,Coffee Shop,Sushi Restaurant,Park,Japanese Restaurant,Burger Joint,Diner,Beer Bar,Seafood Restaurant,Boutique,Salad Place
5,Queen's Park,2,Coffee Shop,Sushi Restaurant,Park,Japanese Restaurant,Burger Joint,Diner,Beer Bar,Seafood Restaurant,Boutique,Salad Place
...,...,...,...,...,...,...,...,...,...,...,...,...
96,Downtown Toronto,2,Restaurant,Italian Restaurant,Café,Park,Thai Restaurant,Market,Butcher,Jewelry Store,Bank,Bakery
97,Downtown Toronto,2,Coffee Shop,Café,Steakhouse,Restaurant,Asian Restaurant,Gastropub,Gym,Hotel,Bar,Deli / Bodega
99,Downtown Toronto,2,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Gay Bar,Burger Joint,Yoga Studio,Gym,Restaurant,Theater,Mexican Restaurant
100,East Toronto,2,Gym / Fitness Center,Fast Food Restaurant,Light Rail Station,Yoga Studio,Auto Workshop,Park,Pizza Place,Butcher,Recording Studio,Restaurant


In [32]:
# Cluster 4
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,Scarborough,3,Women's Store,Doctor's Office,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant


In [33]:
# Cluster 5
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
85,Scarborough,4,Park,Women's Store,Empanada Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store
91,Downtown Toronto,4,Park,Trail,Playground,Doctor's Office,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant


### References

The code in part 3 was largely based on code by Alex Aklson and Polong Lin in the *Segementing and Clustering Neighborhoods in New York City* lab.