# Clustering Neighborhoods in Toronto

## A - Scrape Wikipedia page
### 1. Wikipedia page: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
### 2. Transform the data into a pandas dataframe.

Optional: install the lxml library (pandas.read_html uses it to parse the page)

In [1]:
pip install lxml

Note: you may need to restart the kernel to use updated packages.


Import the requests and pandas libraries

In [2]:
import requests
import pandas as pd

Save the Wikipedia URL in a variable

In [3]:
html_wiki_postal_codes_can = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
html_wiki_postal_codes_can

'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

Read the Wikipedia table whose text matches "Postal Code" and check the first rows

In [4]:
dfs = pd.read_html(html_wiki_postal_codes_can, match='Postal Code')
dfs[0].head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


Checking last 5 rows

In [5]:
dfs[0].tail()

Unnamed: 0,Postal Code,Borough,Neighborhood
175,M5Z,Not assigned,Not assigned
176,M6Z,Not assigned,Not assigned
177,M7Z,Not assigned,Not assigned
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."
179,M9Z,Not assigned,Not assigned


In [6]:
df = dfs[0]

Replace all "Not assigned" neighborhoods with their borough values

In [7]:
# copy the borough name into rows whose neighborhood is "Not assigned"
mask = df['Neighborhood'] == 'Not assigned'
df.loc[mask, 'Neighborhood'] = df.loc[mask, 'Borough']
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
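
The intent of this step, filling "Not assigned" neighborhoods from the borough column, can be sketched on a toy frame with a boolean mask and `.loc`. This is a minimal sketch, not the scraped data: the third row (postal code "X0X", borough "Example Borough") is hypothetical and exists only to show the fill.

```python
import pandas as pd

# Toy rows; the last one is hypothetical and only exists to show the fill.
toy = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "X0X"],
    "Borough": ["North York", "North York", "Example Borough"],
    "Neighborhood": ["Parkwoods", "Victoria Village", "Not assigned"],
})

# Select rows whose neighborhood is missing and copy the borough name over.
mask = toy["Neighborhood"] == "Not assigned"
toy.loc[mask, "Neighborhood"] = toy.loc[mask, "Borough"]

print(toy)
```

Rows that already have a neighborhood are left untouched because the mask is False for them.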


Remove all rows whose borough is not assigned

In [8]:
df = df[df.Borough != 'Not assigned']
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


Determine the shape of the dataframe

In [9]:
df.shape

(103, 3)

## B - Getting Latitude and Longitude for dataframe
### 1. Used library: pgeocode
### 2. Merging data: dataframe df from above + Longitude and Latitude.

Optional: install the pgeocode package

In [10]:
pip install pgeocode

Note: you may need to restart the kernel to use updated packages.


Import the library and create the lookup object nomi for Canada

In [11]:
import pgeocode
nomi = pgeocode.Nominatim('ca')

Define a function that looks up the latitude and longitude of a postal code with pgeocode

In [12]:
def calculate_LatLon(row):
    df_pgeocode = nomi.query_postal_code(row['Postal Code'])
    return pd.Series({'Latitude': df_pgeocode['latitude'], 'Longitude': df_pgeocode['longitude']})

Apply the function "calculate_LatLon" to each row and merge the returned Latitude and Longitude columns into df on the index

In [13]:
df = df.merge(df.apply(calculate_LatLon, axis=1), left_index=True, right_index=True)
df.shape

(103, 5)
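
The apply-then-merge pattern can be sketched without pgeocode or network access. In this minimal sketch the dictionary SAMPLE_COORDS and the function lookup_lat_lon are hypothetical stand-ins for the pgeocode lookup; the two coordinate pairs are copied from the notebook output for M3A and M4A.

```python
import pandas as pd

# Stand-in for the pgeocode lookup: a fixed table with values copied from
# the notebook output, so the sketch runs offline.
SAMPLE_COORDS = {"M3A": (43.7545, -79.33), "M4A": (43.7276, -79.3148)}

def lookup_lat_lon(row):
    # Unknown postal codes yield NaN, mirroring a failed lookup.
    lat, lon = SAMPLE_COORDS.get(row["Postal Code"], (float("nan"), float("nan")))
    return pd.Series({"Latitude": lat, "Longitude": lon})

toy = pd.DataFrame(
    {"Postal Code": ["M3A", "M4A"], "Borough": ["North York", "North York"]},
    index=[2, 3],  # non-contiguous index, as left behind by the earlier filter
)

# apply(axis=1) returns a frame with the same index as toy, so an
# index-on-index merge lines the coordinates up with their original rows.
coords = toy.apply(lookup_lat_lon, axis=1)
merged = toy.merge(coords, left_index=True, right_index=True)
print(merged)
```

Merging on the index rather than on a column keeps the row alignment correct even though the index has gaps after the "Not assigned" rows were removed.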

Check first 5 rows

In [14]:
df.head(5)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
2,M3A,North York,Parkwoods,43.7545,-79.33
3,M4A,North York,Victoria Village,43.7276,-79.3148
4,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6555,-79.3626
5,M6A,North York,"Lawrence Manor, Lawrence Heights",43.7223,-79.4504
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6641,-79.3889


Check for NaN values

In [15]:
#df.describe()
print('NaN Latitude values :' ,df['Latitude'].isnull().sum())
print('NaN Longitude values :' ,df['Longitude'].isnull().sum())

NaN Latitude values : 1
NaN Longitude values : 1


Check which postal codes result in NaN values

In [16]:
print(df['Postal Code'].loc[df['Latitude'].isnull()])
print(df['Postal Code'].loc[df['Longitude'].isnull()])

114    M7R
Name: Postal Code, dtype: object
114    M7R
Name: Postal Code, dtype: object


Remove the row with NaN values (postal code = M7R)

In [17]:
df.dropna(inplace=True)
df.shape

(102, 5)
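
The check and the removal can also be restricted to the coordinate columns, so that a missing value in an unrelated column would not drop a row. A minimal sketch on a two-row toy frame (M3A with the coordinates shown earlier, M7R with the missing coordinates reported above):

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "Postal Code": ["M3A", "M7R"],
        "Latitude": [43.7545, float("nan")],
        "Longitude": [-79.33, float("nan")],
    },
    index=[2, 114],
)

coord_cols = ["Latitude", "Longitude"]

# Rows where at least one coordinate is missing.
missing = toy[toy[coord_cols].isna().any(axis=1)]

# Drop only rows that lack coordinates; other columns are not considered.
cleaned = toy.dropna(subset=coord_cols)

print(missing["Postal Code"].tolist())
print(cleaned.shape)
```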

Count unique Boroughs (and number of rows) of df

In [18]:
print('The dataframe has {} boroughs and {} postal codes.'.format(
        len(df['Borough'].unique()),
        df.shape[0]
    )
)

The dataframe has 9 boroughs and 102 postal codes.


## C - Analyse Boroughs
### 1.a. Create Folium map for Toronto

Optionally: install folium

In [19]:
!conda install -c conda-forge folium=0.5.0 --yes

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



Import plotting libraries (matplotlib)

In [20]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

Import k-means and folium

In [21]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

In [22]:
import folium # map rendering library

Create a function that adds a marker for each row, labeled with its neighborhood and borough

In [23]:
# add one circle marker per row to the existing map
def create_folium_map(df):
    # relies on the global map_toronto, which must be created before the call
    for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
        label = '{}, {}'.format(neighborhood, borough)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='#4d7dcd',
            fill=True,
            fill_color='#8da5cd',
            fill_opacity=0.7,
            parse_html=False).add_to(map_toronto)  
    return map_toronto

Create the map and add the markers using the function defined above

In [24]:
map_toronto = folium.Map(location=[df['Latitude'].iloc[0], df['Longitude'].iloc[0]], zoom_start=10)
create_folium_map(df)
map_toronto

### 1.b. Filter the data to boroughs containing the string "Toronto" and define a new dataframe

Afterwards, the steps from above are repeated for the filtered dataframe.

In [25]:
df_Toronto = df[df['Borough'].str.contains('Toronto')].reset_index(drop=True)
df_Toronto.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6555,-79.3626
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6641,-79.3889
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.6572,-79.3783
3,M5C,Downtown Toronto,St. James Town,43.6513,-79.3756
4,M4E,East Toronto,The Beaches,43.6784,-79.2941


In [26]:
df_Toronto.shape

(39, 5)

In [27]:
map_toronto = folium.Map(location=[df_Toronto['Latitude'].iloc[0], df_Toronto['Longitude'].iloc[0]], zoom_start=12)
create_folium_map(df_Toronto)
map_toronto

### 2. Explore Postal Codes in Toronto using Foursquare API

Define the Foursquare Credentials

In [28]:
CLIENT_ID = 'MPHBUHTNDTSZLNA5O5SQOAN0HXN1ZUAFKS1WDVMISFRLUEHS' # your Foursquare ID
CLIENT_SECRET = '3WFMTZ2TYO1NE44TDMMUZ5PAAVFKYAGBSVIDPQEZCFWP0C3E' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: MPHBUHTNDTSZLNA5O5SQOAN0HXN1ZUAFKS1WDVMISFRLUEHS
CLIENT_SECRET:3WFMTZ2TYO1NE44TDMMUZ5PAAVFKYAGBSVIDPQEZCFWP0C3E


Set a limit for the number of returned venues and the max distance in meters

In [29]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius in [m]

Customized Function to get venues for each postal code

In [30]:
def getNearbyVenues(names, latitudes, longitudes, radius):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postal Code', 
                  'Postal Code Latitude', 
                  'Postal Code Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
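
The parsing step inside the function can be sketched in isolation. This is a minimal sketch under stated assumptions: the helper name extract_venues is hypothetical, and sample_payload is a hand-written dictionary limited to the fields the function above reads, not a real API response. Unlike the function above, the sketch tolerates an empty result and a venue without categories.

```python
def extract_venues(postal_code, lat, lng, payload):
    """Flatten one explore-style JSON payload into a list of row tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Use None when a venue carries no category instead of raising IndexError.
        category = categories[0]["name"] if categories else None
        rows.append((
            postal_code, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Hand-written payload limited to the fields the notebook reads. The first
# venue copies values from the notebook output; the second is hypothetical.
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Tandem Coffee",
                   "location": {"lat": 43.653559, "lng": -79.361809},
                   "categories": [{"name": "Coffee Shop"}]}},
        {"venue": {"name": "Uncategorized Example",
                   "location": {"lat": 43.65, "lng": -79.36},
                   "categories": []}},
    ]}]}
}

rows = extract_venues("M5A", 43.6555, -79.3626, sample_payload)
empty = extract_venues("M5A", 43.6555, -79.3626, {"response": {}})
print(rows)
```

Separating the parsing from the HTTP request makes it possible to test the row layout without calling the API.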

Call customized function "getNearbyVenues" and get venues for each postal code into dataframe toronto_venues

In [31]:
toronto_venues = getNearbyVenues(names=df_Toronto['Postal Code'],
                                   latitudes=df_Toronto['Latitude'],
                                   longitudes=df_Toronto['Longitude'],
                                  radius=radius)

M5A
M7A
M5B
M5C
M4E
M5E
M5G
M6G
M5H
M6H
M5J
M6J
M4K
M5K
M6K
M4L
M5L
M4M
M4N
M5N
M4P
M5P
M6P
M4R
M5R
M6R
M4S
M5S
M6S
M4T
M5T
M4V
M5V
M4W
M5W
M4X
M5X
M4Y
M7Y


Show shape and head of dataframe

In [32]:
print(toronto_venues.shape)
toronto_venues.head()

(1538, 7)


Unnamed: 0,Postal Code,Postal Code Latitude,Postal Code Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,M5A,43.6555,-79.3626,Tandem Coffee,43.653559,-79.361809,Coffee Shop
1,M5A,43.6555,-79.3626,Roselle Desserts,43.653447,-79.362017,Bakery
2,M5A,43.6555,-79.3626,Figs Breakfast & Lunch,43.655675,-79.364503,Breakfast Spot
3,M5A,43.6555,-79.3626,The Yoga Lounge,43.655515,-79.364955,Yoga Studio
4,M5A,43.6555,-79.3626,Body Blitz Spa East,43.654735,-79.359874,Spa


Show first 5 rows with number of venues in each Postal Code

In [33]:
#toronto_venues.groupby('Postal Code')['Venue'].count()
toronto_venues.groupby('Postal Code').count().head()

Postal Code,Postal Code Latitude,Postal Code Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
M4E,8,8,8,8,8,8
M4K,34,34,34,34,34,34
M4L,22,22,22,22,22,22
M4M,9,9,9,9,9,9
M4N,2,2,2,2,2,2


In [34]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 213 unique categories.


### 3. Analyze each postal code

Apply one-hot encoding to the venue categories, add the postal code column, and show the first 5 rows

In [35]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add Postal Code column back to dataframe
toronto_onehot['Postal Code'] = toronto_venues['Postal Code'] 

# move Postal Code column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head(5)

Unnamed: 0,Postal Code,Accessories Store,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
4,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Show Shape

In [36]:
toronto_onehot.shape

(1538, 214)

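Group the rows by postal code and take the mean of each category column, i.e. the share of a postal code's venues that fall into each category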

In [37]:
toronto_grouped = toronto_onehot.groupby('Postal Code').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Postal Code,Accessories Store,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,M4E,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M4K,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412
2,M4L,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M4M,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,M4N,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
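
Why the mean of one-hot columns yields a frequency can be seen on a small example. In this minimal sketch the M4E category counts are chosen to reproduce the frequencies the notebook prints for M4E (Pub 0.25, the other categories 0.125), and the two M4N rows follow the two venues reported for M4N; the individual venue rows themselves are illustrative.

```python
import pandas as pd

# Eight toy venue rows for M4E and two for M4N (see the lead-in for where
# the counts come from).
venues = pd.DataFrame({
    "Postal Code": ["M4E"] * 8 + ["M4N"] * 2,
    "Venue Category": [
        "Pub", "Pub", "Trail", "Gastropub", "Health Food Store",
        "Cheese Shop", "Neighborhood", "Bakery",
        "Park", "Photography Studio",
    ],
})

# One indicator column per category; dtype=float keeps the mean well defined.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Postal Code", venues["Postal Code"])

# The mean of a 0/1 column within a group is that category's share of venues.
freq = onehot.groupby("Postal Code").mean()
print(freq.loc["M4E", "Pub"], freq.loc["M4E", "Trail"], freq.loc["M4N", "Park"])
```

Two pubs out of eight venues give 0.25, which is why postal codes with very few venues, such as M4N, produce large frequencies from a single venue.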


In [38]:
toronto_grouped.shape

(39, 214)

Print the 10 most frequent venue categories for each postal code

In [39]:
num_top_venues = 10

for postal_code in toronto_grouped['Postal Code']:
    print("---- "+postal_code+" ----")
    temp = toronto_grouped[toronto_grouped['Postal Code'] == postal_code].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

---- M4E ----
               venue  freq
0                Pub  0.25
1              Trail  0.12
2          Gastropub  0.12
3  Health Food Store  0.12
4        Cheese Shop  0.12
5       Neighborhood  0.12
6             Bakery  0.12
7  Accessories Store  0.00
8        Music Store  0.00
9             Museum  0.00


---- M4K ----
                venue  freq
0    Greek Restaurant  0.21
1      Ice Cream Shop  0.06
2  Italian Restaurant  0.06
3                Café  0.06
4          Restaurant  0.06
5         Yoga Studio  0.03
6        Dessert Shop  0.03
7                 Spa  0.03
8        Cocktail Bar  0.03
9         Coffee Shop  0.03


---- M4L ----
                  venue  freq
0        Sandwich Place  0.09
1                  Park  0.09
2            Restaurant  0.09
3      Sushi Restaurant  0.05
4          Liquor Store  0.05
5               Brewery  0.05
6  Fast Food Restaurant  0.05
7            Steakhouse  0.05
8           Coffee Shop  0.05
9         Burrito Place  0.05


---- M4M ----
   

Define a function that returns the num_top_venues most frequent venue categories for a postal code row

In [40]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
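
How the sort-and-slice works can be sketched on a single toy row. The helper name top_categories is hypothetical, and the frequencies are a subset of the M4E values printed earlier. Categories with equal frequency may come out in either order, which would explain why tied categories appear in a different order in the printed lists and in the table further down.

```python
import pandas as pd

# One toy row shaped like a row of toronto_grouped: the postal code first,
# then category frequencies.
row = pd.Series({
    "Postal Code": "M4E",
    "Pub": 0.25,
    "Trail": 0.125,
    "Bakery": 0.125,
    "Museum": 0.0,
})

def top_categories(row, n):
    # Skip the postal code, cast to float, and sort by frequency, highest first.
    scores = row.iloc[1:].astype(float)
    return scores.sort_values(ascending=False).index[:n].tolist()

top3 = top_categories(row, 3)
print(top3)
```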

import numpy

In [41]:
import numpy as np # library to handle data in a vectorized manner

Create a dataframe named postal_code_venues_sorted and show the top 10 venue categories for each postal code

In [42]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Postal Code']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
postal_code_venues_sorted = pd.DataFrame(columns=columns)
postal_code_venues_sorted['Postal Code'] = toronto_grouped['Postal Code']

for ind in np.arange(toronto_grouped.shape[0]):
    postal_code_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

postal_code_venues_sorted.head()

Unnamed: 0,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,Pub,Health Food Store,Gastropub,Cheese Shop,Trail,Bakery,Neighborhood,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
1,M4K,Greek Restaurant,Ice Cream Shop,Café,Italian Restaurant,Restaurant,Yoga Studio,Dessert Shop,Spa,Cocktail Bar,Coffee Shop
2,M4L,Restaurant,Sandwich Place,Park,Steakhouse,Brewery,Light Rail Station,Fish & Chips Shop,Liquor Store,Fast Food Restaurant,Gym
3,M4M,Gym,Baseball Field,Park,Diner,Garden Center,Coffee Shop,Performing Arts Venue,Coworking Space,Fast Food Restaurant,Falafel Restaurant
4,M4N,Photography Studio,Park,Doner Restaurant,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant
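
The column names can also be built without relying on an exception for the suffix. A minimal sketch with a hypothetical helper named ordinal, which additionally handles 11th to 13th and 21st onward (not needed for 10 columns, but relevant if num_top_venues grows):

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the other teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

num_top_venues = 10
columns = ["Postal Code"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns)
```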


### 4. Cluster Postal Codes

k-means is used to cluster the postal codes into 5 clusters.

In [43]:
# set number of clusters
kclusters = 5

# keep only the category frequency columns as features
toronto_grouped_clustering = toronto_grouped.drop(columns='Postal Code')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=1).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 2, 0, 3, 0, 3, 0], dtype=int32)
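
What k-means does with frequency rows can be sketched on a tiny matrix, assuming scikit-learn and numpy are installed. The four rows and the two column meanings (Coffee Shop, Park) are made up for illustration and are not taken from toronto_grouped.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy frequency matrix: two coffee-heavy rows and two park-heavy rows.
# Columns stand for [Coffee Shop, Park]; the values are illustrative.
X = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
])

# n_init is set explicitly because its default differs across
# scikit-learn versions; random_state makes the run repeatable.
model = KMeans(n_clusters=2, random_state=1, n_init=10).fit(X)
labels = model.labels_
print(labels)
```

The label numbers themselves carry no meaning; only which rows share a label does. Rows with similar frequency profiles end up in the same cluster.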

A new dataframe is created, which includes the clusters, latitudes, longitudes as well as the top 10 venues for each postal code.

In [44]:
# add clustering labels
postal_code_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df_Toronto

# join the cluster labels and top venues onto df_Toronto by postal code
toronto_merged = toronto_merged.join(postal_code_venues_sorted.set_index('Postal Code'), on='Postal Code')

toronto_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6555,-79.3626,0,Coffee Shop,Breakfast Spot,Yoga Studio,Thai Restaurant,Gym / Fitness Center,Italian Restaurant,Food Truck,Event Space,Electronics Store,Distribution Center
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6641,-79.3889,0,Coffee Shop,Gym,Hobby Shop,Italian Restaurant,Mexican Restaurant,Burrito Place,Dance Studio,Bubble Tea Shop,Ethiopian Restaurant,Ramen Restaurant
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.6572,-79.3783,0,Coffee Shop,Clothing Store,Hotel,Japanese Restaurant,Café,Cosmetics Shop,Italian Restaurant,Tea Room,Fast Food Restaurant,Movie Theater
3,M5C,Downtown Toronto,St. James Town,43.6513,-79.3756,0,Coffee Shop,Café,Seafood Restaurant,Restaurant,Clothing Store,American Restaurant,Italian Restaurant,Cosmetics Shop,Cocktail Bar,Bakery
4,M4E,East Toronto,The Beaches,43.6784,-79.2941,0,Pub,Health Food Store,Gastropub,Cheese Shop,Trail,Bakery,Neighborhood,Fish & Chips Shop,Fast Food Restaurant,Farmers Market


Folium plots a circle marker with a popup for each postal code, colored by its cluster label.

In [45]:
# create map
map_clusters = folium.Map(location=[df_Toronto['Latitude'].iloc[0], df_Toronto['Longitude'].iloc[0]], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Postal Code'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
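
The cell above indexes the palette with `rainbow[cluster-1]`, so label 0 wraps around to the last color. A minimal alternative sketch, using the standard library's colorsys instead of matplotlib (a swapped-in technique, not the notebook's method), builds one hex color per label and indexes it by the label directly. The helper name cluster_colors is hypothetical.

```python
import colorsys

def cluster_colors(n_clusters):
    """Return n_clusters evenly spaced hues as '#rrggbb' strings."""
    palette = []
    for i in range(n_clusters):
        # Spread hues evenly around the color wheel at fixed saturation/value.
        r, g, b = colorsys.hsv_to_rgb(i / n_clusters, 0.8, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette

kclusters = 5
palette = cluster_colors(kclusters)

# Label k maps to palette[k], with no offset.
color_for_label = {label: palette[label] for label in range(kclusters)}
print(color_for_label)
```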

In [46]:
import json

Additionally, a choropleth map is drawn. It shows, for each postal code, the share of returned venues categorized as Park (the Park column of toronto_grouped). The previously created GeoJSON file is passed to the choropleth call by its file name.

In [47]:
map_toronto_choropleth = folium.Map(location=[df_Toronto['Latitude'].iloc[0], df_Toronto['Longitude'].iloc[0]], zoom_start=12)
ontario_geo = "FSATorontoMx.geojson"

map_toronto_choropleth.choropleth(geo_data=ontario_geo,
    data = toronto_grouped,
    columns=['Postal Code','Park'],
    key_on='feature.properties.CFSAUID',
    fill_color='BuGn',
    fill_opacity=0.7, 
    line_opacity=0.2,
    highlight=True,
    legend_name='Density of Parks in Toronto')


map_toronto_choropleth
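
A choropleth only colors areas whose key in the GeoJSON matches a value in the data column, so it can help to check the keys first. A minimal sketch, assuming the file stores the postal code under properties.CFSAUID as the key_on argument above implies: the helper name unmatched_codes is hypothetical, and the two-feature GeoJSON is a stand-in for the real file, whose contents are not shown here.

```python
import json

# Minimal GeoJSON shaped like what key_on='feature.properties.CFSAUID'
# expects. The two features are hypothetical stand-ins for the real file.
geojson_text = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"CFSAUID": "M4E"}, "geometry": None},
        {"type": "Feature", "properties": {"CFSAUID": "M4K"}, "geometry": None},
    ],
})

def unmatched_codes(geojson_str, postal_codes):
    """Return the postal codes that have no feature with a matching CFSAUID."""
    geo = json.loads(geojson_str)
    known = {feature["properties"]["CFSAUID"] for feature in geo["features"]}
    return sorted(set(postal_codes) - known)

missing = unmatched_codes(geojson_text, ["M4E", "M4K", "M4L"])
print(missing)  # ['M4L']
```

Postal codes reported as unmatched would stay uncolored on the map, which is easy to mistake for a low value.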

### 5. Examine Clusters of Toronto

The number of postal codes in each cluster is counted:

In [48]:
toronto_merged['Cluster Labels'].value_counts()

0    29
3     7
4     1
2     1
1     1
Name: Cluster Labels, dtype: int64

#### Cluster 1 (label 0)

For each cluster, the 10 most common venue categories of its postal codes are shown.

In [49]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,0,Coffee Shop,Breakfast Spot,Yoga Studio,Thai Restaurant,Gym / Fitness Center,Italian Restaurant,Food Truck,Event Space,Electronics Store,Distribution Center
1,Downtown Toronto,0,Coffee Shop,Gym,Hobby Shop,Italian Restaurant,Mexican Restaurant,Burrito Place,Dance Studio,Bubble Tea Shop,Ethiopian Restaurant,Ramen Restaurant
2,Downtown Toronto,0,Coffee Shop,Clothing Store,Hotel,Japanese Restaurant,Café,Cosmetics Shop,Italian Restaurant,Tea Room,Fast Food Restaurant,Movie Theater
3,Downtown Toronto,0,Coffee Shop,Café,Seafood Restaurant,Restaurant,Clothing Store,American Restaurant,Italian Restaurant,Cosmetics Shop,Cocktail Bar,Bakery
4,East Toronto,0,Pub,Health Food Store,Gastropub,Cheese Shop,Trail,Bakery,Neighborhood,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
5,Downtown Toronto,0,Coffee Shop,Café,Hotel,Bakery,Cocktail Bar,Seafood Restaurant,Restaurant,Japanese Restaurant,Beer Bar,Deli / Bodega
6,Downtown Toronto,0,Coffee Shop,Middle Eastern Restaurant,Bubble Tea Shop,Sandwich Place,Italian Restaurant,Clothing Store,Pizza Place,Café,Seafood Restaurant,Ramen Restaurant
8,Downtown Toronto,0,Café,Coffee Shop,Hotel,Restaurant,Gym,American Restaurant,Steakhouse,Salad Place,Thai Restaurant,Asian Restaurant
11,West Toronto,0,Bar,Coffee Shop,Restaurant,Vietnamese Restaurant,Cocktail Bar,Vegetarian / Vegan Restaurant,Asian Restaurant,Cuban Restaurant,Music Store,Cupcake Shop
12,East Toronto,0,Greek Restaurant,Ice Cream Shop,Café,Italian Restaurant,Restaurant,Yoga Studio,Dessert Shop,Spa,Cocktail Bar,Coffee Shop


#### Cluster 2 (label 1)

In [50]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,Central Toronto,1,Home Service,Clothing Store,Yoga Studio,Donut Shop,Flower Shop,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant


#### Cluster 3 (label 2)

In [51]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,Central Toronto,2,Photography Studio,Park,Doner Restaurant,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant


#### Cluster 4 (label 3)

In [52]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Downtown Toronto,3,Grocery Store,Café,Baby Store,Athletics & Sports,Candy Store,Coffee Shop,Playground,Park,Fish & Chips Shop,Eastern European Restaurant
9,West Toronto,3,Park,Grocery Store,Bakery,Café,Pizza Place,Middle Eastern Restaurant,Bar,Bank,Bus Line,Athletics & Sports
10,Downtown Toronto,3,Harbor / Marina,Park,Café,Music Venue,Yoga Studio,Doner Restaurant,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant
21,Central Toronto,3,Cosmetics Shop,Bus Line,Park,Trail,Yoga Studio,Donut Shop,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
23,Central Toronto,3,Playground,Gym Pool,Park,Garden,Yoga Studio,Doner Restaurant,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant
29,Central Toronto,3,Thai Restaurant,Park,Gym,Grocery Store,Tennis Court,Dog Run,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant
33,Downtown Toronto,3,Playground,Park,Grocery Store,Candy Store,Yoga Studio,Doner Restaurant,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant


#### Cluster 5 (label 4)

In [53]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,West Toronto,4,Dive Bar,Park,Residential Building (Apartment / Condo),Doner Restaurant,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space


In [54]:
#toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, 'Cluster Labels'] = 'Coffee Shop / Cafe dominated'

In [55]:
#toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, 'Cluster Labels'] = 'Home Service / Clothing Store dominated'

In [56]:
#toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, 'Cluster Labels'] = 'Photography dominated'

In [57]:
#toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, 'Cluster Labels'] = 'Playground / Park dominated'

In [58]:
#toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, 'Cluster Labels'] = 'Dive Bar / Park dominated'

In [59]:
toronto_merged['Cluster Labels'].value_counts()

0    29
3     7
4     1
2     1
1     1
Name: Cluster Labels, dtype: int64