# Segmenting and Clustering Toronto

## 1. Obtaining data from Wikipedia

### Importing the important libraries

In [1]:

# install necessary libs
#!pip install beautifulsoup4 
#!pip install requests
!pip install geocoder

import geocoder # import geocoder

import requests
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
from urllib.request import urlopen
print('Libraries imported!')

Libraries imported!


### Loading data from the web by scraping the Wikipedia page with BeautifulSoup

The parsed tables are stored in the tables variable.

In [2]:
# Import data from the web using Beautiful Soup
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
html = urlopen(url) 
soup = BeautifulSoup(html, 'html.parser')
tables = soup.find_all("table")


Add the data to the lists named after the variables: postal code, borough and neighborhood.

Check that all three lists have the same length.

In [3]:
# Create lists to hold the data we extract
postalcodes = []
boroughs = []
neighborhoods = []

for table in tables:
    rows = table.find_all('tr')
    
    for row in rows:
        cells = row.find_all('td')
        
        if (len(cells) > 1):
            try:
                postalcode = cells[0]
                borough = cells[1]
                neighborhood = cells[2]                
                if len(postalcode.text)==4:
                    postalcodes.append(postalcode.text[:-1])             
                    boroughs.append(borough.text[:-1])                
                    neighborhoods.append(neighborhood.text[:-1])
            except Exception as e:
                pass
print(len(postalcodes))
print(len(boroughs))
print(len(neighborhoods))


180
180
180
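The `[:-1]` slices above remove the trailing newline that each cell's text carries, and the `len(...) == 4` test depends on that newline being present. A slightly more forgiving variant, sketched here on hypothetical sample strings rather than the live page, strips whitespace first and then checks for a three-character code.

```python
# Hypothetical raw cell texts, shaped like the ones scraped above
# (each one ends with the newline contained in the HTML cell).
raw_rows = [
    ("M3A\n", "North York\n", "Parkwoods\n"),
    ("M1A\n", "Not assigned\n", "Not assigned\n"),
    ("Canadian postal codes\n", "\n", "\n"),  # a non-data row that should be skipped
]

cleaned = []
for postalcode, borough, neighborhood in raw_rows:
    code = postalcode.strip()
    # The postal codes in this table are three characters long, e.g. "M3A".
    if len(code) == 3:
        cleaned.append((code, borough.strip(), neighborhood.strip()))

print(cleaned)
```

Stripping first means the check still works if a cell happens to carry no newline or extra spaces.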


Create an empty dataframe whose column names match the variables used on the Wikipedia page.

In [4]:
# define the dataframe columns
column_names = ['postalcode', 'borough', 'neighborhood'] 

# instantiate the dataframe
neighborhoods_df = pd.DataFrame(columns=column_names)

Assign each list to its column.

Check the head of the dataframe.

In [5]:
neighborhoods_df['postalcode'] = postalcodes
neighborhoods_df['borough'] = boroughs
neighborhoods_df['neighborhood'] = neighborhoods
neighborhoods_df.head(10)

Unnamed: 0,postalcode,borough,neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
7,M8A,Not assigned,Not assigned
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"
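As a possible shortcut, the empty dataframe and the three column assignments can be combined by passing a dict of lists to the constructor. The sketch below uses three rows copied from the output above; `sample_df` is an illustrative name, not one used in the notebook.

```python
import pandas as pd

# Sample values copied from the first rows shown above.
postalcodes = ["M1A", "M2A", "M3A"]
boroughs = ["Not assigned", "Not assigned", "North York"]
neighborhoods = ["Not assigned", "Not assigned", "Parkwoods"]

# A dict of equal-length lists creates all columns in one step.
sample_df = pd.DataFrame({
    "postalcode": postalcodes,
    "borough": boroughs,
    "neighborhood": neighborhoods,
})
print(sample_df)
```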




Check the tail of the dataframe. Many rows have the value "Not assigned".


In [6]:
neighborhoods_df.tail(10)

Unnamed: 0,postalcode,borough,neighborhood
170,M9Y,Not assigned,Not assigned
171,M1Z,Not assigned,Not assigned
172,M2Z,Not assigned,Not assigned
173,M3Z,Not assigned,Not assigned
174,M4Z,Not assigned,Not assigned
175,M5Z,Not assigned,Not assigned
176,M6Z,Not assigned,Not assigned
177,M7Z,Not assigned,Not assigned
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."
179,M9Z,Not assigned,Not assigned



### Drop all of the rows whose borough is "Not assigned".


In [7]:
neighborhoods_df.drop(neighborhoods_df[neighborhoods_df['borough']=="Not assigned"].index, inplace = True) 
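The same filtering can be written as a boolean mask. The sketch below, on a small hypothetical dataframe, also shows that the surviving rows keep their original index labels, which is why the index values of the filtered dataframe are no longer consecutive.

```python
import pandas as pd

# Hypothetical three-row sample in the same layout as neighborhoods_df.
df = pd.DataFrame({
    "postalcode": ["M1A", "M3A", "M4A"],
    "borough": ["Not assigned", "North York", "North York"],
})

# Keep only the rows whose borough has been assigned.
assigned = df[df["borough"] != "Not assigned"]
print(assigned.index.tolist())  # original labels are kept: [1, 2]

# reset_index(drop=True) renumbers the rows from zero if that is preferred.
assigned_reset = assigned.reset_index(drop=True)
print(assigned_reset.index.tolist())
```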


Check a slice of the remaining data


In [8]:
neighborhoods_df[41:60]


Unnamed: 0,postalcode,borough,neighborhood
66,M4K,East Toronto,"The Danforth West, Riverdale"
67,M5K,Downtown Toronto,"Toronto Dominion Centre, Design Exchange"
68,M6K,West Toronto,"Brockton, Parkdale Village, Exhibition Place"
72,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge"
73,M2L,North York,"York Mills, Silver Hills"
74,M3L,North York,Downsview
75,M4L,East Toronto,"India Bazaar, The Beaches West"
76,M5L,Downtown Toronto,"Commerce Court, Victoria Hotel"
77,M6L,North York,"North Park, Maple Leaf Park, Upwood Park"
80,M9L,North York,Humber Summit



### Group the rows that share a postal code and join their neighborhood names with commas, using the agg (aggregate) function.


In [9]:
# aggregate rows
neighborhoods_df = neighborhoods_df.groupby(['postalcode', 'borough'], as_index = False).agg({'neighborhood': ', '.join})
print("Number of rows after combining: ", neighborhoods_df.shape[0])
neighborhoods_df[41:60]


Number of rows after combining:  103


Unnamed: 0,postalcode,borough,neighborhood
41,M4K,East Toronto,"The Danforth West, Riverdale"
42,M4L,East Toronto,"India Bazaar, The Beaches West"
43,M4M,East Toronto,Studio District
44,M4N,Central Toronto,Lawrence Park
45,M4P,Central Toronto,Davisville North
46,M4R,Central Toronto,"North Toronto West, Lawrence Park"
47,M4S,Central Toronto,Davisville
48,M4T,Central Toronto,"Moore Park, Summerhill East"
49,M4V,Central Toronto,"Summerhill West, Rathnelly, South Hill, Forest..."
50,M4W,Downtown Toronto,Rosedale
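To make the effect of the aggregation easier to see, here is a small sketch on hypothetical rows in which one postal code appears twice. The data is invented for illustration; only the `groupby(...).agg(...)` call mirrors the cell above.

```python
import pandas as pd

# Hypothetical rows: postal code M5A is listed once per neighborhood.
rows = pd.DataFrame({
    "postalcode": ["M5A", "M5A", "M4A"],
    "borough": ["Downtown Toronto", "Downtown Toronto", "North York"],
    "neighborhood": ["Regent Park", "Harbourfront", "Victoria Village"],
})

# Each (postalcode, borough) group collapses to one row whose
# neighborhood is the comma-joined list of the group's names.
combined = rows.groupby(["postalcode", "borough"], as_index=False).agg(
    {"neighborhood": ", ".join}
)
print(combined)
```

The result is sorted by the group keys and renumbered from zero, which matches the change in index values between the two slices shown above.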


### If a row has a borough but its neighborhood is "Not assigned", set the neighborhood to the borough name.

In [10]:
for i in range(len(neighborhoods_df['neighborhood'])):
    if neighborhoods_df.loc[i, 'neighborhood'] == 'Not assigned':
        neighborhoods_df.loc[i, 'neighborhood'] = neighborhoods_df.loc[i, 'borough']
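The row-by-row loop works, but the same rule can also be expressed without a loop. A minimal sketch using `Series.mask` on a hypothetical two-row dataframe:

```python
import pandas as pd

# Hypothetical sample: the first row has a borough but no neighborhood.
df = pd.DataFrame({
    "borough": ["Etobicoke", "North York"],
    "neighborhood": ["Not assigned", "Parkwoods"],
})

# Series.mask replaces the values where the condition is True
# with the aligned values from the second argument.
df["neighborhood"] = df["neighborhood"].mask(
    df["neighborhood"] == "Not assigned", df["borough"]
)
print(df)
```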



Check how many rows remain after cleaning the data.


In [11]:
print("Number of rows:",neighborhoods_df.shape[0])

Number of rows: 103



## 2. Collect latitude and longitude information and merge it with the data collected in section 1



Load the data from the link provided by Coursera.

Rename the column Postal Code to postalcode so that it can serve as the key for the merge executed later.

In [12]:
!wget -q -O 'toronto_geo.csv' http://cocl.us/Geospatial_data

# read the longitude and latitude file into a dataframe
geo_df = pd.read_csv("toronto_geo.csv")
geo_df.rename(columns={"Postal Code":"postalcode"}, inplace=True)
geo_df.head()

Unnamed: 0,postalcode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Merge the data from the two dataframes based on postalcode.

In [13]:

nb_toronto = pd.merge(neighborhoods_df, geo_df, how= 'inner', on = 'postalcode')
nb_toronto.head()

Unnamed: 0,postalcode,borough,neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
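An inner merge silently drops postal codes that are missing from either side. One way to check for that, sketched below on hypothetical data, is a left merge with `indicator=True`, which adds a `_merge` column describing where each row came from.

```python
import pandas as pd

# Hypothetical inputs: "M9Z" has no coordinates in the right-hand table.
left = pd.DataFrame({"postalcode": ["M1B", "M1C", "M9Z"]})
right = pd.DataFrame({
    "postalcode": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

checked = pd.merge(left, right, how="left", on="postalcode", indicator=True)

# Rows marked "left_only" found no match in the coordinates table.
unmatched = checked.loc[checked["_merge"] == "left_only", "postalcode"].tolist()
print(unmatched)
```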


### Establish the connection with Foursquare 

In [14]:
# The code was removed by Watson Studio for sharing.

Your credentials:
CLIENT_ID: BM4AAK42ZLI4JZ4MB450IAUDUBHFXYB5K3NFZPPJ5MY3CJHC
CLIENT_SECRET:J4JY41DBVS2YOS24MJLDUKJ42M3AZPQD0F53JOAKWYG4PIMR


### Keep only the rows whose borough contains "Toronto"

In [15]:
nb_toronto = nb_toronto[nb_toronto['borough'].str.contains('toronto', case=False)]
nb_toronto.reset_index(drop=True, inplace=True)
nb_toronto.head()

Unnamed: 0,postalcode,borough,neighborhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


### Install and import folium for mapping

In [16]:
# import folium library for mapping
!pip install folium
import folium
print("folium installed!")

folium installed!



### Get the latitude and longitude of the map centre by averaging the values in the table.

### Draw the points corresponding to each neighborhood.


In [17]:
# Coordinates of Toronto

latitude =nb_toronto['Latitude'].mean()
longitude =nb_toronto['Longitude'].mean()

# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(nb_toronto['Latitude'], nb_toronto['Longitude'], nb_toronto['borough'], nb_toronto['neighborhood']):
    label = folium.Popup(neighborhood, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

## 3. Explore and cluster the neighborhoods in Toronto

### Build the URL to explore the venues around a given latitude and longitude on Foursquare.

In [18]:
# build the explore request URL (VERSION is assumed to be set in the credentials cell above)
LIMIT=100
radius=500
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=BM4AAK42ZLI4JZ4MB450IAUDUBHFXYB5K3NFZPPJ5MY3CJHC&client_secret=J4JY41DBVS2YOS24MJLDUKJ42M3AZPQD0F53JOAKWYG4PIMR&v=20180605&ll=43.66713498717947,-79.38987324871795&radius=500&limit=100'
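As an alternative to formatting the query string by hand, the parameters can be kept in a dict and encoded with the standard library. This is only a sketch: the credentials are placeholders, `VERSION` is assumed to be the date string visible in the printed URL, and the coordinates are rounded from the values above.

```python
from urllib.parse import urlencode

# Placeholder credentials; substitute real values before sending a request.
CLIENT_ID = "YOUR_CLIENT_ID"
CLIENT_SECRET = "YOUR_CLIENT_SECRET"
VERSION = "20180605"  # assumed from the "v=" value in the URL printed above

params = {
    "client_id": CLIENT_ID,
    "client_secret": CLIENT_SECRET,
    "v": VERSION,
    "ll": "43.667135,-79.389873",  # "latitude,longitude", rounded
    "radius": 500,
    "limit": 100,
}

# safe="," keeps the comma inside "ll" readable instead of encoding it as %2C.
url = "https://api.foursquare.com/v2/venues/explore?" + urlencode(params, safe=",")
print(url)
```

Keeping the parameters in a dict also makes it easy to pass them to `requests.get(base_url, params=params)` and to avoid printing the secret by accident.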


### Obtain the results in JSON format and store them in a variable.


In [19]:
results = requests.get(url).json()



### Function to obtain the nearby venues


In [20]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
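The function above indexes straight into `["response"]["groups"][0]["items"]` and `["categories"][0]`, so a response without groups, or a venue without categories, would raise an error. The sketch below shows one defensive way to read the same fields. The payload is a hypothetical stand-in that mimics only the keys used above, and `extract_venues` is an illustrative helper, not part of the notebook.

```python
# Hypothetical payload containing only the fields that the function above reads.
sample_json = {
    "response": {
        "groups": [
            {
                "items": [
                    {"venue": {"name": "Glen Manor Ravine",
                               "location": {"lat": 43.676821, "lng": -79.293942},
                               "categories": [{"name": "Trail"}]}},
                    {"venue": {"name": "Venue Without Category",
                               "location": {"lat": 43.68, "lng": -79.29},
                               "categories": []}},
                ]
            }
        ]
    }
}

def extract_venues(payload):
    """Return (name, lat, lng, category) tuples, tolerating missing pieces."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # Use None when a venue has no category instead of raising IndexError.
        category = categories[0]["name"] if categories else None
        rows.append((venue["name"], venue["location"]["lat"],
                     venue["location"]["lng"], category))
    return rows

venues = extract_venues(sample_json)
print(venues)
```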


### Run the custom function above and print outputs.


In [21]:

trt_venues = getNearbyVenues(nb_toronto['neighborhood'],
                                   nb_toronto['Latitude'],
                                   nb_toronto['Longitude']
                                  )

print(trt_venues.shape)
trt_venues.head()

The Beaches
The Danforth West, Riverdale
India Bazaar, The Beaches West
Studio District
Lawrence Park
Davisville North
North Toronto West,  Lawrence Park
Davisville
Moore Park, Summerhill East
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
Rosedale
St. James Town, Cabbagetown
Church and Wellesley
Regent Park, Harbourfront
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
Roselawn
Forest Hill North & West, Forest Hill Road Park
The Annex, North Midtown, Yorkville
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Stn A PO Boxes
First Canadian Place, Underground city
Christie
Dufferin, Dovercourt Village
Little Portugal, Trinity
Brockton, Parkdale Village, Exhibition Place
High

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
3,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
4,The Beaches,43.676357,-79.293031,Seaspray Restaurant,43.678888,-79.298167,Asian Restaurant


In [22]:
# one hot encoding
trt_onehot = pd.get_dummies(trt_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
trt_onehot['neighborhood'] = trt_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [trt_onehot.columns[-1]] + list(trt_onehot.columns[:-1])
trt_onehot = trt_onehot[fixed_columns]

print(trt_onehot.shape)
trt_onehot.head()

(1605, 237)


Unnamed: 0,neighborhood,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,0,0,0
1,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [23]:
trt_grouped = trt_onehot.groupby('neighborhood').mean().reset_index()
trt_grouped.shape

(39, 237)
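The two steps above (one-hot encoding, then a per-neighborhood mean) turn a list of venues into the share of each category within a neighborhood. A small sketch on hypothetical data makes the arithmetic visible; the neighborhood names "A" and "B" are invented.

```python
import pandas as pd

# Hypothetical venues: neighborhood A has four venues, B has two.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Park", "Pub", "Park", "Park"],
})

# One column per category, with 1.0 where the venue belongs to it.
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=float)
onehot.insert(0, "neighborhood", venues["Neighborhood"])

# The mean of a 0/1 column is the fraction of venues in that category.
grouped = onehot.groupby("neighborhood").mean().reset_index()
print(grouped)
```

For neighborhood A, two of four venues are coffee shops, so its Coffee Shop value is 0.5; every venue in B is a park, so its Park value is 1.0.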

### Group rows by neighborhood by taking the mean of the frequency of occurrence of each category, then print the top 5 venue categories per neighborhood

In [24]:
num_top_venues = 5

for hood in trt_grouped['neighborhood']:
    print("----"+hood+"----")
    temp = trt_grouped[trt_grouped['neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')




----Berczy Park----
                venue  freq
0         Coffee Shop  0.09
1        Cocktail Bar  0.05
2      Farmers Market  0.04
3  Seafood Restaurant  0.04
4         Cheese Shop  0.04


----Brockton, Parkdale Village, Exhibition Place----
            venue  freq
0            Café  0.14
1  Breakfast Spot  0.09
2     Coffee Shop  0.09
3    Intersection  0.05
4         Stadium  0.05


----Business reply mail Processing Centre, South Central Letter Processing Plant Toronto----
           venue  freq
0    Yoga Studio  0.06
1  Auto Workshop  0.06
2        Brewery  0.06
3        Butcher  0.06
4            Spa  0.06


----CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport----
              venue  freq
0   Airport Service  0.17
1    Airport Lounge  0.11
2  Airport Terminal  0.11
3           Airport  0.06
4               Bar  0.06


----Central Bay Street----
                venue  freq
0         Coffee Shop  0.18
1                Café 

### Function returning the most common venues

In [25]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
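To see what the function returns, it can be applied to a single hand-made row. The row below is hypothetical; its first entry plays the role of the neighborhood column, which the function skips with `iloc[1:]`.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighborhood name) and sort the frequencies.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical row: a name followed by category frequencies.
row = pd.Series({
    "neighborhood": "Hypothetical Park",
    "Pub": 0.2,
    "Coffee Shop": 0.5,
    "Park": 0.3,
})

top_two = list(return_most_common_venues(row, 2))
print(top_two)
```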

### Show the 10 most common venues

In [26]:
# create a new dataframe and display the top 10 venues for each neighborhood
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['neighborhood'] = trt_grouped['neighborhood']

for ind in np.arange(trt_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(trt_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Cocktail Bar,Cheese Shop,Restaurant,Farmers Market,Beer Bar,Bakery,Seafood Restaurant,Bistro,Basketball Stadium
1,"Brockton, Parkdale Village, Exhibition Place",Café,Coffee Shop,Breakfast Spot,Performing Arts Venue,Stadium,Burrito Place,Restaurant,Climbing Gym,Pet Store,Bakery
2,"Business reply mail Processing Centre, South C...",Yoga Studio,Auto Workshop,Park,Pizza Place,Restaurant,Butcher,Burrito Place,Brewery,Skate Park,Light Rail Station
3,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Airport Lounge,Airport Terminal,Harbor / Marina,Sculpture Garden,Airport Food Court,Airport Gate,Bar,Boat or Ferry,Boutique
4,Central Bay Street,Coffee Shop,Café,Italian Restaurant,Sandwich Place,Salad Place,Bubble Tea Shop,Burger Joint,Yoga Studio,Portuguese Restaurant,Indian Restaurant
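The column names above are built by catching the error raised when the `indicators` list runs out. A small helper can produce the same names without relying on an exception. This is an illustrative sketch; `ordinal` is a hypothetical helper, not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11, 12 and 13 (and 111, 112, ...) take "th" despite their last digit.
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

num_top_venues = 10
columns = ["neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns)
```

Unlike the three-element list, this also gives the expected suffix beyond 20 (for example "21st"), should a larger `num_top_venues` ever be used.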


In [27]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

### Run _k_-means to cluster the neighborhoods into 5 clusters.

In [28]:
# set number of clusters
kclusters = 5

trt_grouped_clustering = trt_grouped.drop(columns='neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(trt_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
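For readers who want to see what the clustering step does conceptually, here is a minimal NumPy sketch of the basic k-means loop (assign each point to its nearest centre, then move each centre to the mean of its points). It is a simplified illustration on hypothetical 2-D points, not scikit-learn's implementation, which adds refinements such as a smarter initialisation.

```python
import numpy as np

def simple_kmeans(points, k, n_iter=10, seed=0):
    """Basic k-means iterations; a teaching sketch, not scikit-learn's KMeans."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random (fancy indexing copies them).
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: distance from every point to every centre.
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centre to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    # Final assignment against the updated centres.
    distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return distances.argmin(axis=1), centers

# Hypothetical data: two well-separated pairs of points.
points = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]])
labels, centers = simple_kmeans(points, k=2)
print(labels)
```

In the notebook, each "point" is a neighborhood's vector of category frequencies rather than a 2-D coordinate.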

### Create a dataframe that combines the cluster labels and the top 10 venues with each neighborhood's location

In [29]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

trt_merged = nb_toronto

# join nb_toronto with neighborhoods_venues_sorted to add the cluster label and top venues for each neighborhood
trt_merged = trt_merged.join(neighborhoods_venues_sorted.set_index('neighborhood'), on='neighborhood')

trt_merged.head() # check the last columns!

Unnamed: 0,postalcode,borough,neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Pub,Health Food Store,Asian Restaurant,Trail,Neighborhood,Yoga Studio,Dumpling Restaurant,Dog Run,Doner Restaurant,Donut Shop
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,0,Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Furniture / Home Store,Bubble Tea Shop,Indian Restaurant,Spa,Cosmetics Shop,Juice Bar
2,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572,0,Fast Food Restaurant,Pizza Place,Park,Brewery,Sandwich Place,Liquor Store,Fish & Chips Shop,Italian Restaurant,Restaurant,Steakhouse
3,M4M,East Toronto,Studio District,43.659526,-79.340923,0,Coffee Shop,Brewery,Café,Gastropub,American Restaurant,Bakery,Yoga Studio,Neighborhood,Cheese Shop,Clothing Store
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,1,Park,Bus Line,Swim School,Filipino Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant
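One thing to watch with this kind of join: a neighborhood for which no venues were returned has no row in the venues table, so its cluster label would come back as NaN and the column would become floating point. The sketch below reproduces that situation on hypothetical data and shows one way to handle it before the labels are used as list indices.

```python
import pandas as pd

# Hypothetical tables: neighborhood "C" has no cluster label.
neighborhoods = pd.DataFrame({"neighborhood": ["A", "B", "C"]})
labels = pd.DataFrame({"neighborhood": ["A", "B"], "Cluster Labels": [0, 1]})

merged = neighborhoods.join(labels.set_index("neighborhood"), on="neighborhood")
print(merged["Cluster Labels"].isna().sum())  # number of unmatched rows

# Drop the unmatched rows and restore an integer dtype for the labels.
merged = merged.dropna(subset=["Cluster Labels"])
merged["Cluster Labels"] = merged["Cluster Labels"].astype(int)
print(merged)
```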


In [30]:

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

### Visualize the results

In [31]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(trt_merged['Latitude'], trt_merged['Longitude'], trt_merged['neighborhood'], trt_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Cluster 1 (label 0)

In [32]:
# examine cluster 1
trt_merged.loc[trt_merged['Cluster Labels'] == 0, trt_merged.columns[[1] + list(range(5, trt_merged.shape[1]))]].head()

Unnamed: 0,borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,East Toronto,0,Pub,Health Food Store,Asian Restaurant,Trail,Neighborhood,Yoga Studio,Dumpling Restaurant,Dog Run,Doner Restaurant,Donut Shop
1,East Toronto,0,Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Furniture / Home Store,Bubble Tea Shop,Indian Restaurant,Spa,Cosmetics Shop,Juice Bar
2,East Toronto,0,Fast Food Restaurant,Pizza Place,Park,Brewery,Sandwich Place,Liquor Store,Fish & Chips Shop,Italian Restaurant,Restaurant,Steakhouse
3,East Toronto,0,Coffee Shop,Brewery,Café,Gastropub,American Restaurant,Bakery,Yoga Studio,Neighborhood,Cheese Shop,Clothing Store
5,Central Toronto,0,Gym / Fitness Center,Sandwich Place,Park,Pizza Place,Breakfast Spot,Department Store,Hotel,Food & Drink Shop,Doner Restaurant,Donut Shop


## Cluster 2 (label 1)

In [33]:
# examine cluster 2
trt_merged.loc[trt_merged['Cluster Labels'] == 1, trt_merged.columns[[1] + list(range(5, trt_merged.shape[1]))]].head()

Unnamed: 0,borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Central Toronto,1,Park,Bus Line,Swim School,Filipino Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant


## Cluster 3 (label 2)

In [34]:
# examine cluster 3
trt_merged.loc[trt_merged['Cluster Labels'] == 2, trt_merged.columns[[1] + list(range(5, trt_merged.shape[1]))]].head()

Unnamed: 0,borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Central Toronto,2,Park,Lawyer,Trail,Yoga Studio,Discount Store,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store
10,Downtown Toronto,2,Park,Playground,Trail,Dessert Shop,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant


## Cluster 4 (label 3)

In [35]:
# examine cluster 4
trt_merged.loc[trt_merged['Cluster Labels'] == 3, trt_merged.columns[[1] + list(range(5, trt_merged.shape[1]))]].head()

Unnamed: 0,borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,Central Toronto,3,Trail,Jewelry Store,Mexican Restaurant,Sushi Restaurant,Yoga Studio,Distribution Center,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant


## Cluster 5 (label 4)

In [36]:
# examine cluster 5
trt_merged.loc[trt_merged['Cluster Labels'] == 4, trt_merged.columns[[1] + list(range(5, trt_merged.shape[1]))]].head()

Unnamed: 0,borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,Central Toronto,4,Garden,Health & Beauty Service,Home Service,Eastern European Restaurant,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Yoga Studio
