# Postal Codes of Canada



First, let's download all the dependencies that we will need.

In [1]:
import folium # map rendering library

import numpy as np

import pandas as pd

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import matplotlib.cm as cm
import matplotlib.colors as colors

import requests # library to handle requests

from sklearn.cluster import KMeans # import k-means from clustering stage

print('Libraries imported.')

Libraries imported.


The list of postal codes of Canada is available on Wikipedia: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

Simply use pandas to grab the table from the Wikipedia page.

In [2]:
df=pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M', header=0)[0]
df.head(12)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
7,M8A,Not assigned,
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"


The first rows show that some postal codes have the borough "Not assigned", and that their "Neighborhood" cells are empty (NaN). Remove the rows whose borough is not assigned, then check whether any empty (i.e. NaN) Neighborhood values remain.

In [3]:
df = df[df.Borough != 'Not assigned']
df['Neighborhood'].isnull().sum()

0
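
As a small illustration of the filter-then-check step, the sketch below applies it to three rows copied from the table above (the names `sample`, `assigned`, and `missing` are only for this example). Taking a `.copy()` of the filtered frame is an optional precaution, assumed here to avoid chained-assignment warnings if columns are modified later.

```python
import numpy as np
import pandas as pd

# Three rows copied from the Wikipedia table shown above
sample = pd.DataFrame({
    'Postal Code': ['M1A', 'M3A', 'M4A'],
    'Borough': ['Not assigned', 'North York', 'North York'],
    'Neighborhood': [np.nan, 'Parkwoods', 'Victoria Village'],
})

# Keep only rows with an assigned borough
assigned = sample[sample['Borough'] != 'Not assigned'].copy()

# Count missing neighborhoods among the remaining rows
missing = int(assigned['Neighborhood'].isnull().sum())
print(len(assigned), missing)
```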

In [4]:
df.head(12) #quick look at first few rows of the new list

Unnamed: 0,Postal Code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"
11,M3B,North York,Don Mills
12,M4B,East York,"Parkview Hill, Woodbine Gardens"
13,M5B,Downtown Toronto,"Garden District, Ryerson"


A CSV file with the geographical coordinates of each postal code is provided: http://cocl.us/Geospatial_data

Use pandas to read the csv file and store it as another dataframe dfcor.

In [5]:
dfcor = pd.read_csv('https://cocl.us/Geospatial_data')
dfcor.head(12)

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


Merge the two dataframes, df and dfcor, on the "Postal Code" column to get the coordinates of each neighborhood, and store the result in a new dataframe dfnew.

In [6]:
dfnew = pd.merge(df, dfcor, on='Postal Code', how='outer')
dfnew.head(12)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
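
The merge above uses `how='outer'`, which keeps postal codes that appear in only one of the two tables. The sketch below, using a made-up extra code `M9Z` with placeholder coordinates, shows how `indicator=True` can reveal such unmatched rows and how `how='left'` would keep only the codes present in the first table. It is an illustration under those assumptions, not a statement about what the real files contain.

```python
import pandas as pd

left = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Borough': ['North York', 'North York'],
})

# 'M9Z' is a made-up code with placeholder coordinates and no match in `left`
right = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M9Z'],
    'Latitude': [43.753259, 43.725882, 0.0],
    'Longitude': [-79.329656, -79.315572, 0.0],
})

# An outer merge keeps the unmatched row; the _merge column says where each row came from
outer = pd.merge(left, right, on='Postal Code', how='outer', indicator=True)

# A left merge keeps only the postal codes present in `left`
left_only = pd.merge(left, right, on='Postal Code', how='left')

print(outer['_merge'].value_counts())
print(left_only.shape)
```

If every row of the outer merge is marked `both`, the two join types give the same result.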


## Explore boroughs in Toronto that contain the word 'York' (instead of 'Toronto').

The assignment mentioned "You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data. __It is up to you.__"

There are many boroughs in Toronto. The boroughs whose names contain the word Toronto are not well spread across the map, so I decided to work only with boroughs whose names contain the word __York__ instead.

In [7]:
toronto_data = dfnew[dfnew['Borough'].str.contains('York')].reset_index(drop=True)
print(toronto_data.shape)
toronto_data.head(12)

(34, 5)


Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
3,M3B,North York,Don Mills,43.745906,-79.352188
4,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
5,M6B,North York,Glencairn,43.709577,-79.445073
6,M3C,North York,Don Mills,43.7259,-79.340923
7,M4C,East York,Woodbine Heights,43.695344,-79.318389
8,M6C,York,Humewood-Cedarvale,43.693781,-79.428191
9,M6E,York,Caledonia-Fairbanks,43.689026,-79.453512
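
To make the substring filter concrete, the sketch below runs `str.contains('York')` on a short list of borough names. It matches "North York", "East York", and "York" alike. The `na=False` and `regex=False` arguments are optional additions assumed here so that missing values count as non-matches and the pattern is treated as plain text.

```python
import pandas as pd

# A few borough names from the table, plus a missing value for illustration
boroughs = pd.Series(['North York', 'East York', 'York',
                      'Downtown Toronto', 'Scarborough', None])

# True wherever the name contains the literal text 'York'
mask = boroughs.str.contains('York', na=False, regex=False)

print(boroughs[mask].tolist())
```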


In [8]:
#Let's get the geographical coordinates of Toronto.
address = 'Toronto, ON'
geolocator = Nominatim(user_agent="Toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [9]:
# let's visualize the neighborhoods in boroughs with the word 'York' in Toronto

# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers for neighborhoods in boroughs with the word 'York' to the map of Toronto
for lat, lng, label in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### Define Foursquare Credentials and Version

Next, utilize the Foursquare API to explore the neighborhoods and segment them.

In [10]:
CLIENT_ID = '12VPPIIE2VQVGUKHQWX4AFR4JTTDK3NKAIP5UAQDE51Y0AYX' # your Foursquare ID
CLIENT_SECRET = 'Q2ZTI5FVYGLMTMPRHLQXN3VVYKN12MSYZ5GYWL5HESSNQUVE' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 12VPPIIE2VQVGUKHQWX4AFR4JTTDK3NKAIP5UAQDE51Y0AYX
CLIENT_SECRET:Q2ZTI5FVYGLMTMPRHLQXN3VVYKN12MSYZ5GYWL5HESSNQUVE
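
Hard-coding and printing the secret exposes it to anyone who sees the notebook. One possible alternative, sketched below, reads the values from environment variables and prints only a masked form. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical, chosen for this example only.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping, or raise if either is missing."""
    client_id = env.get('FOURSQUARE_CLIENT_ID')          # hypothetical variable name
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')  # hypothetical variable name
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first.')
    return client_id, client_secret

def mask(secret, visible=4):
    """Show only the first few characters so a value can be checked without leaking it."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)

# Usage with an explicit mapping standing in for the real environment
demo_id, demo_secret = load_foursquare_credentials({
    'FOURSQUARE_CLIENT_ID': 'EXAMPLEID123',
    'FOURSQUARE_CLIENT_SECRET': 'EXAMPLESECRET456',
})
print('CLIENT_ID: ' + mask(demo_id))
```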


### Explore Neighborhoods in Toronto

Let's create a function to repeat the same process of getting the top 50 venues within a radius of 950 meters.

In [11]:
LIMIT = 50 # limit of number of venues returned by Foursquare API

def getNearbyVenues(names, latitudes, longitudes, radius=950):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
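
`getNearbyVenues` indexes directly into the response, so a reply without `groups`, or a venue without categories, would raise an exception. The sketch below shows one defensive way to flatten a payload of the same shape. The helper name `extract_venues` and the hand-written payload are assumptions for illustration; the payload only mirrors the fields the function above reads and is not real API output.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore-style response into rows, tolerating missing pieces."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        # Fall back to an empty category when the list is missing or empty
        categories = venue.get('categories') or [{}]
        rows.append((
            name, lat, lng,
            venue.get('name'),
            location.get('lat'),
            location.get('lng'),
            categories[0].get('name'),
        ))
    return rows

# Hand-written payload shaped like the fields read in getNearbyVenues
payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Brookbanks Park',
               'location': {'lat': 43.751976, 'lng': -79.33214},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Uncategorized Place',
               'location': {'lat': 43.75, 'lng': -79.33},
               'categories': []}},
]}]}}

rows = extract_venues(payload, 'Parkwoods', 43.753259, -79.329656)
empty = extract_venues({'response': {}}, 'Parkwoods', 43.753259, -79.329656)
print(rows)
```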

Run the above function on each neighborhood and create a new dataframe called toronto_venues.

In [12]:
toronto_venues = getNearbyVenues(names=toronto_data['Neighborhood'],
                                   latitudes=toronto_data['Latitude'],
                                   longitudes=toronto_data['Longitude']
                                  )

Parkwoods
Victoria Village
Lawrence Manor, Lawrence Heights
Don Mills
Parkview Hill, Woodbine Gardens
Glencairn
Don Mills
Woodbine Heights
Humewood-Cedarvale
Caledonia-Fairbanks
Leaside
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Bayview Village
Downsview
York Mills, Silver Hills
Downsview
North Park, Maple Leaf Park, Upwood Park
Humber Summit
Willowdale, Newtonbrook
Downsview
Bedford Park, Lawrence Manor East
Del Ray, Mount Dennis, Keelsdale and Silverthorn
Humberlea, Emery
Willowdale, Willowdale East
Downsview
Runnymede, The Junction North
Weston
York Mills West
Willowdale, Willowdale West


In [13]:
print(toronto_venues.shape) #check the size of the resulting dataframe
toronto_venues.head()

(829, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Allwyn's Bakery,43.75984,-79.324719,Caribbean Restaurant
1,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
2,Parkwoods,43.753259,-79.329656,Tim Hortons,43.760668,-79.326368,Café
3,Parkwoods,43.753259,-79.329656,A&W,43.760643,-79.326865,Fast Food Restaurant
4,Parkwoods,43.753259,-79.329656,Bruno's valu-mart,43.746143,-79.32463,Grocery Store


In [14]:
toronto_venues.groupby('Neighborhood').count() #check how many venues were returned for each neighborhood

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Bathurst Manor, Wilson Heights, Downsview North",29,29,29,29,29,29
Bayview Village,14,14,14,14,14,14
"Bedford Park, Lawrence Manor East",41,41,41,41,41,41
Caledonia-Fairbanks,21,21,21,21,21,21
"Del Ray, Mount Dennis, Keelsdale and Silverthorn",14,14,14,14,14,14
Don Mills,55,55,55,55,55,55
Downsview,66,66,66,66,66,66
"East Toronto, Broadview North (Old East York)",50,50,50,50,50,50
"Fairview, Henry Farm, Oriole",40,40,40,40,40,40
Glencairn,28,28,28,28,28,28


## Analyze Each Neighborhood

In [15]:
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood']
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]
toronto_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Afghan Restaurant,Airport,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,...,Trail,Train Station,Turkish Restaurant,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [16]:
toronto_onehot.shape #the new dataframe size

(829, 173)

In [17]:
# group rows by neighborhood, taking the mean of the frequency of occurrence of each category
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Accessories Store,Afghan Restaurant,Airport,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,...,Trail,Train Station,Turkish Restaurant,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.02439,0.0,0.0
3,Caledonia-Fairbanks,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0
4,"Del Ray, Mount Dennis, Keelsdale and Silverthorn",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0
5,Don Mills,0.0,0.0,0.0,0.0,0.018182,0.0,0.054545,0.018182,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Downsview,0.0,0.0,0.015152,0.015152,0.0,0.0,0.0,0.015152,0.0,...,0.0,0.0,0.030303,0.0,0.060606,0.0,0.0,0.0,0.0,0.0
7,"East Toronto, Broadview North (Old East York)",0.0,0.0,0.0,0.04,0.0,0.0,0.02,0.0,0.0,...,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,"Fairview, Henry Farm, Oriole",0.0,0.0,0.0,0.025,0.0,0.0,0.025,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Glencairn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [18]:
toronto_grouped.shape #confirm the new size

(30, 173)
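
The one-hot and group-by-mean steps turn a list of venues into one row of category frequencies per neighborhood. The sketch below repeats them on made-up data (neighborhoods `A` and `B` are hypothetical). Passing `dtype=float` to `get_dummies` is an assumption made here so the indicator columns are numeric regardless of the pandas version.

```python
import pandas as pd

# Made-up venues for two hypothetical neighborhoods
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Park', 'Café', 'Park', 'Bank', 'Café', 'Café'],
})

# One indicator column per category, one row per venue
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# The mean of the indicators is the share of each category in the neighborhood
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```

Each row of `grouped` sums to 1 across the category columns, which is why the values can be read as frequencies.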

In [19]:
#Let's print each neighborhood along with the top 3 most common venues

num_top_venues = 3

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Bathurst Manor, Wilson Heights, Downsview North----
         venue  freq
0  Pizza Place  0.07
1  Coffee Shop  0.07
2         Bank  0.07


----Bayview Village----
                 venue  freq
0  Japanese Restaurant  0.14
1        Grocery Store  0.14
2          Gas Station  0.14


----Bedford Park, Lawrence Manor East----
                venue  freq
0         Coffee Shop  0.07
1  Italian Restaurant  0.07
2          Restaurant  0.05


----Caledonia-Fairbanks----
                venue  freq
0                Park   0.1
1  Mexican Restaurant   0.1
2            Bus Stop   0.1


----Del Ray, Mount Dennis, Keelsdale and Silverthorn----
                    venue  freq
0           Grocery Store  0.14
1  Furniture / Home Store  0.14
2       Convenience Store  0.07


----Don Mills----
                 venue  freq
0           Restaurant  0.07
1          Coffee Shop  0.07
2  Japanese Restaurant  0.07


----Downsview----
                   venue  freq
0            Coffee Shop  0.09
1  Vietnamese R

In [20]:
#write a function to sort the venues in descending order.

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
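
As a usage sketch, the function can be tried on a single hand-made row (the frequencies below are invented and contain no ties). When several categories share the same frequency, the default sort does not guarantee their relative order, which is one plausible reason the printed top 3 and the later table can list tied categories in a different order.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighborhood name) and sort the frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Invented frequencies for one hypothetical neighborhood, with no ties
row = pd.Series({'Neighborhood': 'Example', 'Bank': 0.1, 'Café': 0.2,
                 'Park': 0.5, 'Pharmacy': 0.05, 'Gym': 0.15})

top3 = list(return_most_common_venues(row, 3))
print(top3)
```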

In [21]:
#create the new dataframe and display the top 5 venues for each neighborhood

num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Bank,Pizza Place,Park,Bridal Shop
1,Bayview Village,Gas Station,Grocery Store,Japanese Restaurant,Bank,Skating Rink
2,"Bedford Park, Lawrence Manor East",Italian Restaurant,Coffee Shop,Sandwich Place,Restaurant,Bank
3,Caledonia-Fairbanks,Pharmacy,Park,Bus Stop,Mexican Restaurant,Cosmetics Shop
4,"Del Ray, Mount Dennis, Keelsdale and Silverthorn",Furniture / Home Store,Grocery Store,Italian Restaurant,Convenience Store,Shopping Mall


## Cluster Neighborhoods

In [22]:
# Run k-means to group the neighborhoods into clusters.

# set number of clusters
kclusters = 8

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:30]

array([2, 5, 5, 0, 1, 5, 2, 5, 1, 1, 0, 6, 4, 2, 1, 5, 4, 2, 2, 0, 2, 2,
       7, 2, 2, 5, 0, 2, 0, 3])
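
The label numbers returned by k-means are arbitrary identifiers; only the grouping matters. The sketch below clusters four made-up frequency rows into two groups to show this. Setting `n_init=10` explicitly is an assumption made here so the behavior does not depend on the scikit-learn version's default.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up frequency rows: two dominated by the first category, two by the last
X = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.1, 0.7],
])

kmeans_demo = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels_demo = kmeans_demo.labels_

# Rows 0 and 1 share a label, rows 2 and 3 share the other one
print(labels_demo)
```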

In [23]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_data

# join the sorted venues (with cluster labels) onto toronto_data, which holds latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0,Park,Convenience Store,Shopping Mall,Pharmacy,Bus Stop
1,M4A,North York,Victoria Village,43.725882,-79.315572,7,Coffee Shop,Portuguese Restaurant,Intersection,Park,Sporting Goods Shop
2,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,1,Furniture / Home Store,Fast Food Restaurant,Coffee Shop,Clothing Store,Dessert Shop
3,M3B,North York,Don Mills,43.745906,-79.352188,5,Japanese Restaurant,Restaurant,Coffee Shop,Supermarket,Gym
4,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937,2,Brewery,Coffee Shop,Fast Food Restaurant,Pizza Place,Gastropub


In [24]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
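
The map cell picks colors with `rainbow[cluster-1]`. Because the labels start at 0, label 0 wraps around to the last color in the list, while labels 1 to 7 use the first seven colors. The sketch below rebuilds the palette the same way and records which color each label ends up with; `color_for_label` is a name introduced only for this illustration.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 8

# Same palette construction as in the map cell: one hex color per cluster
rainbow = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# Color actually used for each label under the rainbow[cluster-1] indexing
color_for_label = {label: rainbow[label - 1] for label in range(kclusters)}

for label, color in color_for_label.items():
    print(label, color)
```

Every label still gets its own color; the mapping is simply shifted by one position.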

## Examine Clusters

Let's take a look at the top 5 most common venues in each cluster of neighborhoods.

In [25]:
#cluster 1
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,North York,0,Park,Convenience Store,Shopping Mall,Pharmacy,Bus Stop
9,York,0,Pharmacy,Park,Bus Stop,Mexican Restaurant,Cosmetics Shop
11,North York,0,Pharmacy,Park,Convenience Store,Recreation Center,Shopping Mall
32,North York,0,Park,Golf Course,Convenience Store,Dog Run,French Restaurant
33,North York,0,Convenience Store,Pizza Place,Pharmacy,Grocery Store,Coffee Shop


In [26]:
#cluster 2
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
2,North York,1,Furniture / Home Store,Fast Food Restaurant,Coffee Shop,Clothing Store,Dessert Shop
5,North York,1,Grocery Store,Fast Food Restaurant,Coffee Shop,Gas Station,Gym Pool
14,North York,1,Coffee Shop,Clothing Store,Restaurant,Bank,Bakery
26,York,1,Furniture / Home Store,Grocery Store,Italian Restaurant,Convenience Store,Shopping Mall


In [27]:
#cluster 3
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
4,East York,2,Brewery,Coffee Shop,Fast Food Restaurant,Pizza Place,Gastropub
7,East York,2,Coffee Shop,Pharmacy,Pizza Place,Park,Sandwich Place
8,York,2,Pizza Place,Bagel Shop,Coffee Shop,Field,Bank
12,North York,2,Coffee Shop,Bank,Pizza Place,Park,Bridal Shop
13,East York,2,Coffee Shop,Indian Restaurant,Turkish Restaurant,Pizza Place,Gym
15,North York,2,Coffee Shop,Pizza Place,Furniture / Home Store,Bar,Bank
18,North York,2,Coffee Shop,Vietnamese Restaurant,Hotel,Pizza Place,Park
20,North York,2,Coffee Shop,Vietnamese Restaurant,Hotel,Pizza Place,Park
23,North York,2,Korean Restaurant,Middle Eastern Restaurant,Coffee Shop,Café,Pizza Place
24,North York,2,Coffee Shop,Vietnamese Restaurant,Hotel,Pizza Place,Park


In [28]:
#cluster 4
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
19,North York,3,Pool,Cafeteria,Martial Arts Dojo,Park,Business Service


In [29]:
#cluster 5
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
21,North York,4,Convenience Store,Bakery,Coffee Shop,Athletics & Sports,Park
27,North York,4,Convenience Store,Intersection,Storage Facility,Discount Store,Bakery


In [30]:
#cluster 6
toronto_merged.loc[toronto_merged['Cluster Labels'] == 5, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
3,North York,5,Japanese Restaurant,Restaurant,Coffee Shop,Supermarket,Gym
6,North York,5,Japanese Restaurant,Restaurant,Coffee Shop,Supermarket,Gym
10,East York,5,Coffee Shop,Furniture / Home Store,Burger Joint,Department Store,Sporting Goods Shop
16,East York,5,Café,Greek Restaurant,Coffee Shop,Beer Bar,Ethiopian Restaurant
17,North York,5,Gas Station,Grocery Store,Japanese Restaurant,Bank,Skating Rink
25,North York,5,Italian Restaurant,Coffee Shop,Sandwich Place,Restaurant,Bank
28,North York,5,Ramen Restaurant,Korean Restaurant,Pizza Place,Sushi Restaurant,Coffee Shop


In [31]:
#cluster 7
toronto_merged.loc[toronto_merged['Cluster Labels'] == 6, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
22,North York,6,Electronics Store,Bank,Arts & Crafts Store,Italian Restaurant,Park


In [32]:
#cluster 8
toronto_merged.loc[toronto_merged['Cluster Labels'] == 7, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,North York,7,Coffee Shop,Portuguese Restaurant,Intersection,Park,Sporting Goods Shop


Based on the top 5 most common venues in each cluster:
<br>__cluster 1 (red dots)__ - This cluster is convenient for everyday shopping: pharmacies, convenience stores, and parks are common. Food options may be limited.
<br>__cluster 3 (dark blue dots)__ - The cluster with the most neighborhoods, mainly in the west. It offers a variety of food, e.g. Asian restaurants and pizza. Coffee shops are among the top 3 most common venues in every neighborhood in this cluster, so coffee lovers may like these neighborhoods. Also, all neighborhoods but one have a pizza place among their top 5 most common venues.
<br>__cluster 6 (light green dots)__ - This cluster also offers a variety of food, especially restaurants. Unlike cluster 3, coffee shops are slightly less common here, and only one neighborhood has a pizza place among its top 5. However, it has food options not found in cluster 3, e.g. Japanese food (Japanese restaurants, a ramen restaurant, a sushi restaurant).

The other clusters contain too few neighborhoods to characterize.
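
The cluster sizes can be checked by counting the labels printed by `kmeans.labels_` earlier. The sketch below copies that array and counts how many of the 30 distinct neighborhood names fall under each label. Note that these counts are per unique neighborhood name, whereas the tables above list one row per postal code.

```python
import numpy as np

# Labels copied from the kmeans.labels_ output shown earlier
labels = np.array([2, 5, 5, 0, 1, 5, 2, 5, 1, 1, 0, 6, 4, 2, 1, 5, 4, 2, 2, 0, 2, 2,
                   7, 2, 2, 5, 0, 2, 0, 3])

# Number of neighborhoods assigned to each label 0..7
sizes = np.bincount(labels, minlength=8)

for label, size in enumerate(sizes):
    print('cluster {} (label {}): {} neighborhoods'.format(label + 1, label, size))
```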