# Toronto Neighborhood Clustering

This notebook covers all three parts of the Toronto Neighborhood Clustering assignment.

## Part 1: Data Scraping

First, import all of the libraries and packages needed for this assignment. The folium install step may take several minutes. 

In [1]:
import pandas as pd
!conda install -c conda-forge folium=0.5.0 --yes
import folium
import requests
from pandas.io.json import json_normalize
import numpy as np
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.colors as colors

Solving environment: done

# All requested packages already installed.



Next, read the postal code data from the provided URL

In [2]:
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]

Drop any rows from the table which do not have a borough assigned

In [3]:
df = df[df['Borough'] != 'Not assigned'].reset_index(drop=True)

For any postcodes that do not have a neighborhood assigned, set the neighborhood name equal to the borough name 

In [4]:
df.loc[df['Neighbourhood'] =='Not assigned', 'Neighbourhood'] = df['Borough']


Group and join the rows such that any neighborhoods which share the same postcode are listed in the same row, separated by a comma

In [5]:
df = df.groupby(['Postcode','Borough'], sort=False).agg( ','.join)
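
The three cleaning steps above can be tried on a small table before running them on the scraped data. This is a minimal sketch on toy rows shaped like the Wikipedia table; the `raw`, `clean`, and `grouped` names are illustrative and not part of the notebook. It passes `as_index=False` so that the grouping keys stay ordinary columns, which is one way to avoid a separate `reset_index()` call.

```python
import pandas as pd

# Toy rows shaped like the scraped table (illustrative values only).
raw = pd.DataFrame({
    'Postcode': ['M1A', 'M3A', 'M6A', 'M6A', 'M7A'],
    'Borough': ['Not assigned', 'North York', 'North York',
                'North York', "Queen's Park"],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Lawrence Heights',
                      'Lawrence Manor', 'Not assigned'],
})

# 1. Drop rows that have no borough.
clean = raw[raw['Borough'] != 'Not assigned'].reset_index(drop=True)

# 2. Fall back to the borough name where the neighbourhood is missing.
missing = clean['Neighbourhood'] == 'Not assigned'
clean.loc[missing, 'Neighbourhood'] = clean.loc[missing, 'Borough']

# 3. One row per postcode, neighbourhoods joined by commas.
#    as_index=False keeps Postcode and Borough as regular columns.
grouped = (clean
           .groupby(['Postcode', 'Borough'], sort=False, as_index=False)
           .agg({'Neighbourhood': ','.join}))

print(grouped)
```

Because the index is never replaced by the grouping keys, re-running the cell does not add an extra `index` column.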

Reset the table indices (which were altered by the grouping) to avoid issues using the dataframe in later steps

In [39]:
df=df.reset_index()
df

Unnamed: 0,index,Postcode,Borough,Neighbourhood
0,0,M3A,North York,Parkwoods
1,1,M4A,North York,Victoria Village
2,2,M5A,Downtown Toronto,Harbourfront
3,3,M6A,North York,"Lawrence Heights,Lawrence Manor"
4,4,M7A,Downtown Toronto,Queen's Park
5,5,M9A,Queen's Park,Queen's Park
6,6,M1B,Scarborough,"Rouge,Malvern"
7,7,M3B,North York,Don Mills North
8,8,M4B,East York,"Woodbine Gardens,Parkview Hill"
9,9,M5B,Downtown Toronto,"Ryerson,Garden District"


Check the shape of the dataframe

In [7]:
df.shape

(103, 3)

## Part 2: Add latitude and longitude data to the table

Note: This notebook uses the provided csv file, since the geocoder package was not able to reliably retrieve the location data

Read in the .csv file from the url

In [8]:
lldf=pd.read_csv('https://cocl.us/Geospatial_data')

In [9]:
lldf

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


## Part 3: Explore and Cluster Neighborhoods

Combine the original dataframe (which has multiple neighborhoods for some postal codes) with the latitude and longitude data, matching rows on postal code.
Keep only the neighborhoods whose borough name contains "Toronto".

In [10]:
newdf=pd.merge(df, lldf, left_on='Postcode', right_on='Postal Code')
newdf=newdf[newdf['Borough'].str.contains('Toronto')]
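
A merge on postal code silently drops or duplicates rows if the two tables disagree. The sketch below shows one way to check that, using the `validate` and `indicator` options of `pd.merge` on a few rows taken from the outputs above. The `codes`, `coords`, `unmatched`, and `toronto` names are illustrative assumptions, not the notebook's variables.

```python
import pandas as pd

# A few rows copied from the tables shown earlier in the notebook.
codes = pd.DataFrame({
    'Postcode': ['M5A', 'M7A', 'M1B', 'M4E'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto',
                'Scarborough', 'East Toronto'],
    'Neighbourhood': ['Harbourfront', "Queen's Park",
                      'Rouge,Malvern', 'The Beaches'],
})
coords = pd.DataFrame({
    'Postal Code': ['M1B', 'M4E', 'M5A', 'M7A'],
    'Latitude': [43.806686, 43.676357, 43.654260, 43.662301],
    'Longitude': [-79.194353, -79.293031, -79.360636, -79.389494],
})

# validate raises if a postal code appears more than once on either side;
# indicator adds a _merge column showing where each row came from.
merged = pd.merge(codes, coords,
                  left_on='Postcode', right_on='Postal Code',
                  how='left', validate='one_to_one', indicator=True)

# Rows with no coordinates would show up here.
unmatched = merged[merged['_merge'] != 'both']

# Keep only boroughs whose name contains "Toronto" and drop helper columns.
toronto = (merged[merged['Borough'].str.contains('Toronto')]
           .drop(columns=['Postal Code', '_merge']))

print(len(unmatched), list(toronto['Postcode']))
```

Dropping the duplicate `Postal Code` column is optional; the notebook keeps it.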

Define a function which uses the Foursquare API to get the venues near each location

In [11]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
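
The list comprehension in `getNearbyVenues` assumes that every response has at least one group and that every venue has at least one category. A minimal sketch of a more defensive parser follows; `parse_explore_items` is a hypothetical helper, and the sample payload is hand-built to mirror only the fields the function above reads, so it is not a full Foursquare response.

```python
def parse_explore_items(payload, name, lat, lng):
    """Flatten one venues/explore payload into row tuples (hypothetical helper)."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        # Some venues may have no category; record None instead of failing.
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


# Hand-built payload: first venue copied from the output shown later,
# second venue is made up to exercise the missing-category branch.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Roselle Desserts',
               'location': {'lat': 43.653447, 'lng': -79.362017},
               'categories': [{'name': 'Bakery'}]}},
    {'venue': {'name': 'Uncategorized Place',
               'location': {'lat': 43.65, 'lng': -79.36},
               'categories': []}},
]}]}}

rows = parse_explore_items(sample, 'Harbourfront', 43.65426, -79.360636)
empty = parse_explore_items({'response': {}}, 'Nowhere', 0.0, 0.0)
print(rows[0][3], rows[0][6], rows[1][6], empty)
```

Rows with a `None` category can then be dropped or relabeled before one-hot encoding.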

Define the Foursquare API client info needed to make requests to Foursquare

In [19]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID'  # placeholder: supply your own Foursquare client ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET'  # placeholder: do not publish the real secret
VERSION = '20180605'
LIMIT=100
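
Credentials written into a notebook cell travel with the notebook whenever it is shared. One common alternative is to read them from environment variables. The sketch below assumes variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, which are made up for illustration, and `load_foursquare_credentials` is a hypothetical helper.

```python
import os


def load_foursquare_credentials(env=None):
    """Return (client_id, client_secret) from a mapping, os.environ by default.

    The variable names are assumptions for this sketch, not a Foursquare
    convention.
    """
    env = os.environ if env is None else env
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Foursquare credentials are not set')
    return client_id, client_secret


# Demonstration with an explicit mapping instead of the real environment.
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id',
            'FOURSQUARE_CLIENT_SECRET': 'demo-secret'}
CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials(demo_env)
print(CLIENT_ID)
```

In the notebook itself the call would take no argument, so the values come from the process environment.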

Call the function to retrieve the list of venues.
Note: this assumes the goal is to group "locations" by postal code, so many "locations" will include multiple neighborhoods.

In [20]:
toronto_venues = getNearbyVenues(names=newdf['Neighbourhood'],
                                   latitudes=newdf['Latitude'],
                                   longitudes=newdf['Longitude']
                                  )

Harbourfront
Queen's Park
Ryerson,Garden District
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Adelaide,King,Richmond
Dovercourt Village,Dufferin
Harbourfront East,Toronto Islands,Union Station
Little Portugal,Trinity
The Danforth West,Riverdale
Design Exchange,Toronto Dominion Centre
Brockton,Exhibition Place,Parkdale Village
The Beaches West,India Bazaar
Commerce Court,Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North,Forest Hill West
High Park,The Junction South
North Toronto West
The Annex,North Midtown,Yorkville
Parkdale,Roncesvalles
Davisville
Harbord,University of Toronto
Runnymede,Swansea
Moore Park,Summerhill East
Chinatown,Grange Park,Kensington Market
Deer Park,Forest Hill SE,Rathnelly,South Hill,Summerhill West
CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara
Rosedale
Stn A PO Boxes 25 The Esplanade
Cabbagetown,St. James Town
First Canadian Place,Underground city

Check the head of the dataframe to confirm the results look reasonable

In [21]:
toronto_venues.head(20)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Harbourfront,43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,Harbourfront,43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,Harbourfront,43.65426,-79.360636,Cooper Koo Family YMCA,43.653191,-79.357947,Gym / Fitness Center
3,Harbourfront,43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,Harbourfront,43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant
5,Harbourfront,43.65426,-79.360636,Dominion Pub and Kitchen,43.656919,-79.358967,Pub
6,Harbourfront,43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot
7,Harbourfront,43.65426,-79.360636,Corktown Common,43.655618,-79.356211,Park
8,Harbourfront,43.65426,-79.360636,The Distillery Historic District,43.650244,-79.359323,Historic Site
9,Harbourfront,43.65426,-79.360636,SOMA chocolatemaker,43.650622,-79.358127,Chocolate Shop


Count the number of venues found for each location (remembering that a location is based on postal code and may contain groupings of neighborhoods)

In [22]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide,King,Richmond",100,100,100,100,100,100
Berczy Park,56,56,56,56,56,56
"Brockton,Exhibition Place,Parkdale Village",24,24,24,24,24,24
Business Reply Mail Processing Centre 969 Eastern,17,17,17,17,17,17
"CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara",18,18,18,18,18,18
"Cabbagetown,St. James Town",47,47,47,47,47,47
Central Bay Street,82,82,82,82,82,82
"Chinatown,Grange Park,Kensington Market",87,87,87,87,87,87
Christie,17,17,17,17,17,17
Church and Wellesley,86,86,86,86,86,86


Count the number of unique types of venues

In [23]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 230 unique categories.


Create "one-hot" encoded dataframe for use in frequency calculation

In [24]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Check shape of the dataframe

In [25]:
toronto_onehot.shape

(1705, 230)
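
The one-hot frame has 230 columns, the same as the number of unique categories, and its first column is `Yoga Studio` rather than `Neighborhood`. That pattern is what would result if one of the venue categories were itself named "Neighborhood": the assignment would overwrite that dummy column instead of appending a new one, and the reordering step would then move the last category to the front. This is an inference from the output, not something the notebook states. A minimal sketch of one way to guard against such a clash, on toy data with an illustrative replacement column name:

```python
import pandas as pd

# Toy venues; one category is deliberately named "Neighborhood".
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B'],
    'Venue Category': ['Bakery', 'Neighborhood', 'Yoga Studio'],
})

onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='')

# Rename a clashing dummy column so it is not overwritten below.
# rename() ignores the key when no such column exists.
onehot = onehot.rename(columns={'Neighborhood': 'Neighborhood (venue category)'})

# insert() places the label column first and raises if the name already exists.
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

print(list(onehot.columns))
```

With this approach the frame has one more column than there are categories, and no reordering step is needed.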

Group the data by neighborhood (or combinations of neighborhoods) 

In [26]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store
0,"Adelaide,King,Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,...,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.01
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0
2,"Brockton,Exhibition Place,Parkdale Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business Reply Mail Processing Centre 969 Eastern,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"CN Tower,Bathurst Quay,Island airport,Harbourf...",0.0,0.0,0.055556,0.055556,0.055556,0.111111,0.166667,0.111111,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Cabbagetown,St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Central Bay Street,0.012195,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012195,...,0.0,0.0,0.0,0.0,0.012195,0.0,0.0,0.012195,0.0,0.0
7,"Chinatown,Grange Park,Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.045977,0.0,0.057471,0.011494,0.0,0.0
8,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Church and Wellesley,0.011628,0.011628,0.0,0.0,0.0,0.0,0.0,0.0,0.011628,...,0.0,0.0,0.0,0.0,0.0,0.0,0.011628,0.0,0.011628,0.0


In [27]:
toronto_grouped.shape

(39, 230)
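
Taking the mean of one-hot columns within each group gives the share of that group's venues in each category, so every row of the grouped frame sums to 1. A small sketch on toy data (the names and values are illustrative):

```python
import pandas as pd

# Toy venue list: location A has four venues, location B has two.
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Park', 'Bakery',
                       'Park', 'Park'],
})

onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# Mean of 0/1 indicators = fraction of venues in each category.
freq = onehot.groupby('Neighborhood').mean()

print(freq)
```

A location with few venues therefore produces coarse frequencies (for example, 17 venues give steps of about 0.06), which is worth keeping in mind when comparing locations.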

Find the top five types of venues in each grouping of neighborhoods

In [28]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide,King,Richmond----
         venue  freq
0  Coffee Shop  0.07
1   Steakhouse  0.04
2         Café  0.04
3          Bar  0.04
4   Restaurant  0.03


----Berczy Park----
                venue  freq
0         Coffee Shop  0.09
1        Cocktail Bar  0.05
2              Bakery  0.04
3  Seafood Restaurant  0.04
4         Cheese Shop  0.04


----Brockton,Exhibition Place,Parkdale Village----
            venue  freq
0            Café  0.12
1     Coffee Shop  0.08
2       Nightclub  0.08
3  Breakfast Spot  0.08
4    Climbing Gym  0.04


----Business Reply Mail Processing Centre 969 Eastern----
           venue  freq
0    Yoga Studio  0.06
1  Garden Center  0.06
2    Pizza Place  0.06
3     Comic Shop  0.06
4     Restaurant  0.06


----CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara----
              venue  freq
0   Airport Service  0.17
1    Airport Lounge  0.11
2  Airport Terminal  0.11
3       Coffee Shop  0.06
4               B

Define a function which will return the top [num_top_venues] venues in a given neighborhood grouping

In [29]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Create a new dataframe including the 10 most frequent venue types in each grouping of neighborhoods

In [30]:

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns).reset_index(drop=True)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",Coffee Shop,Steakhouse,Café,Bar,Thai Restaurant,Cosmetics Shop,Restaurant,Burger Joint,Hotel,Asian Restaurant
1,Berczy Park,Coffee Shop,Cocktail Bar,Seafood Restaurant,Cheese Shop,Beer Bar,Bakery,Steakhouse,Café,Farmers Market,Park
2,"Brockton,Exhibition Place,Parkdale Village",Café,Breakfast Spot,Nightclub,Coffee Shop,Climbing Gym,Office,Burrito Place,Italian Restaurant,Intersection,Stadium
3,Business Reply Mail Processing Centre 969 Eastern,Yoga Studio,Auto Workshop,Park,Comic Shop,Pizza Place,Butcher,Restaurant,Burrito Place,Brewery,Light Rail Station
4,"CN Tower,Bathurst Quay,Island airport,Harbourf...",Airport Service,Airport Lounge,Airport Terminal,Coffee Shop,Harbor / Marina,Boutique,Boat or Ferry,Rental Car Location,Bar,Plane
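
The column names above rely on a three-element suffix list with a fallback to "th", which is fine for ten columns but would produce labels such as "21th" for larger values. A sketch of a general ordinal helper follows; `ordinal` is a hypothetical function, not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11, 12 and 13 (and 111, 112, ...) take 'th' despite ending in 1, 2, 3.
    if 11 <= n % 100 <= 13:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


num_top_venues = 10
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i))
    for i in range(1, num_top_venues + 1)
]

print(columns[1], columns[-1])
```

For `num_top_venues = 10` this yields the same column names as the cell above.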


Perform K-means clustering to group the neighborhood groupings by similarity of venue types

In [31]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=3).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:50] 

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0,
       4, 0, 3, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
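
A quick way to see how balanced the clusters are is to count the labels. The sketch below copies the label array printed above and tallies it with `np.bincount`; in the notebook the same call could be applied to `kmeans.labels_` directly.

```python
import numpy as np

# Label array copied from the output above.
labels = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0,
                   4, 0, 3, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0])

# sizes[k] is the number of locations assigned to cluster k.
sizes = np.bincount(labels, minlength=5)

for cluster, size in enumerate(sizes):
    print('Cluster {}: {} location(s)'.format(cluster, size))
```

Here 34 of the 39 locations land in cluster 0, and the remaining four clusters hold one or two locations each.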

Add labels to the clusters

In [32]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = newdf


# join the cluster labels and top venues onto the Toronto rows, which already carry latitude/longitude
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), how = 'left',on='Neighbourhood')


toronto_merged.head() # check the last columns!


Unnamed: 0,Postcode,Borough,Neighbourhood,Postal Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,M5A,Downtown Toronto,Harbourfront,M5A,43.65426,-79.360636,0,Coffee Shop,Park,Pub,Bakery,Breakfast Spot,Café,Restaurant,Mexican Restaurant,Dessert Shop,Chocolate Shop
4,M7A,Downtown Toronto,Queen's Park,M7A,43.662301,-79.389494,0,Coffee Shop,Park,Gym,College Auditorium,Seafood Restaurant,Sandwich Place,Burger Joint,Burrito Place,Café,Portuguese Restaurant
9,M5B,Downtown Toronto,"Ryerson,Garden District",M5B,43.657162,-79.378937,0,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Middle Eastern Restaurant,Japanese Restaurant,Ramen Restaurant,Sporting Goods Shop,Italian Restaurant,Lingerie Store
15,M5C,Downtown Toronto,St. James Town,M5C,43.651494,-79.375418,0,Coffee Shop,Restaurant,Café,Beer Bar,Breakfast Spot,Cosmetics Shop,Italian Restaurant,Bakery,Cocktail Bar,Hotel
19,M4E,East Toronto,The Beaches,M4E,43.676357,-79.293031,0,Other Great Outdoors,Pub,Trail,Health Food Store,Women's Store,Diner,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant


Draw the map of Toronto with the neighborhoods superimposed and color-coded by cluster

In [33]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="To_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude


# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
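
The marker loop indexes the palette with `rainbow[cluster-1]`, so cluster 0 picks up the last color through Python's negative indexing. Each cluster still gets its own color, but the offset is easy to misread. A sketch of a direct mapping follows, using the same `cm.rainbow` colormap; `palette` and `color_for` are illustrative names.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5

# One hex color per cluster, evenly spaced along the rainbow colormap.
palette = [colors.rgb2hex(rgba)
           for rgba in cm.rainbow(np.linspace(0, 1, kclusters))]


def color_for(cluster):
    """Return the hex color for a cluster label (cluster k -> palette[k])."""
    return palette[int(cluster)]


print([color_for(k) for k in range(kclusters)])
```

In the loop above, `color_for(cluster)` would replace both `rainbow[cluster-1]` expressions.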

Explore the clusters to see what types of venues define them

## Observations

Most of the neighborhood groupings (34 of 39, per the label array printed after clustering) fall into Cluster 0, and only five are assigned to other clusters. This tells us that most of the investigated locations share a similar mix of venue types.

Cluster 0: Characterized by coffee shops, cafes, banks, bars, hotels, and similar common city venues. 

In [34]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Downtown Toronto,-79.360636,0,Coffee Shop,Park,Pub,Bakery,Breakfast Spot,Café,Restaurant,Mexican Restaurant,Dessert Shop,Chocolate Shop
4,Downtown Toronto,-79.389494,0,Coffee Shop,Park,Gym,College Auditorium,Seafood Restaurant,Sandwich Place,Burger Joint,Burrito Place,Café,Portuguese Restaurant
9,Downtown Toronto,-79.378937,0,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Middle Eastern Restaurant,Japanese Restaurant,Ramen Restaurant,Sporting Goods Shop,Italian Restaurant,Lingerie Store
15,Downtown Toronto,-79.375418,0,Coffee Shop,Restaurant,Café,Beer Bar,Breakfast Spot,Cosmetics Shop,Italian Restaurant,Bakery,Cocktail Bar,Hotel
19,East Toronto,-79.293031,0,Other Great Outdoors,Pub,Trail,Health Food Store,Women's Store,Diner,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant
20,Downtown Toronto,-79.373306,0,Coffee Shop,Cocktail Bar,Seafood Restaurant,Cheese Shop,Beer Bar,Bakery,Steakhouse,Café,Farmers Market,Park
24,Downtown Toronto,-79.387383,0,Coffee Shop,Italian Restaurant,Burger Joint,Sandwich Place,Café,Juice Bar,Japanese Restaurant,Ice Cream Shop,Salad Place,Bakery
25,Downtown Toronto,-79.422564,0,Grocery Store,Café,Park,Candy Store,Baby Store,Gas Station,Coffee Shop,Nightclub,Restaurant,Diner
30,Downtown Toronto,-79.384568,0,Coffee Shop,Steakhouse,Café,Bar,Thai Restaurant,Cosmetics Shop,Restaurant,Burger Joint,Hotel,Asian Restaurant
31,West Toronto,-79.442259,0,Pharmacy,Bakery,Middle Eastern Restaurant,Bank,Bar,Café,Music Venue,Supermarket,Gym / Fitness Center,Grocery Store


Cluster 1: Cluster of only one location characterized by outdoor activities (park, playground, trail)

In [35]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
91,Downtown Toronto,-79.377529,1,Park,Playground,Trail,Cupcake Shop,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Discount Store


Cluster 2: Cluster of only one location characterized by a seemingly random collection of venues (stores, restaurants, garden)

In [36]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
62,Central Toronto,-79.416936,2,Garden,Music Venue,Women's Store,Deli / Bodega,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run


Cluster 3: Cluster of only one location, also characterized by a seemingly random collection of venues (tennis court, electronics store, assorted restaurants)

In [37]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
83,Central Toronto,-79.38316,3,Tennis Court,Women's Store,Dance Studio,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Discount Store


Cluster 4: Small cluster of only two locations, sharing a park as the most common venue, with a deli, dumpling restaurant, and donut shop in each location

In [38]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
61,Central Toronto,-79.38879,4,Park,Swim School,Bus Line,Deli / Bodega,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Discount Store
68,Central Toronto,-79.411307,4,Park,Jewelry Store,Trail,Sushi Restaurant,Bus Line,Women's Store,Deli / Bodega,Dumpling Restaurant,Donut Shop,Doner Restaurant
