# Segmenting and Clustering of Neighbourhoods in Toronto, Canada

### Part-1 : Dataframe consisting of Neighbourhoods, Postal Codes and Boroughs

In this section, we will create a dataframe that lists all the neighbourhoods and boroughs grouped by postal codes in the city of _Toronto, Canada._

First, we install and import the libraries needed for this project:

In [1]:
!pip install bs4
!pip install requests
!pip install folium

import pandas as pd
import numpy as np
import requests
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans
from bs4 import BeautifulSoup
from pandas import json_normalize          # 'pandas.io.json.json_normalize' is deprecated since pandas 1.0

print("All libraries imported")

Now, let's use `requests` to get the html file from the given Wikipedia URL, as text.

We will also create a BeautifulSoup object to scrape the html data.

In [2]:
# The html text will be stored in the variable 'data'
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
data = requests.get(url).text
 
# Uncomment the following line and run the cell to view the data
#data

In [3]:
# Create a BeautifulSoup object 'soup', using 'html5lib' to parse the html file
soup = BeautifulSoup(data, 'html5lib')
 
# Use 'find' to find the html element with a table tag
table = soup.find("table")
 
# Uncomment the following line and run the cell to view the table in html
#table

Now, we traverse through the table and get the necessary data.

Finally, we add the data, row by row, to our dataframe.

In [4]:
flag = True          # Control variable for the merge in Part-2 (the reason is explained there)
flag2 = True         # Control variable for adding the cluster labels in Part-3

# Collect the scraped rows in a list; the dataframe 'toronto' is built from it after the loop
# (DataFrame.append was removed in pandas 2.0, so we avoid appending to the dataframe row by row)
rows = []
 
# Loop through the table cells to find the postal codes and names of boroughs and neighbourhoods
for row in table.find_all('td'):
    
    if row.span.text == 'Not assigned':          # To skip all cells without borough names
        continue
    
    post = row.p.text[:3]                                                             # Get the postal code
    brgh = row.span.text.split('(')[0]                                                # Get the borough name: the text before the opening bracket
    nbhd = row.span.text.split('(')[1].replace(' /', ',').replace(')', '')            # Get the neighbourhood names, remove brackets, replace forward slashes by commas to separate individual neighbourhoods
    
    rows.append({'Postal Code':post, 'Borough':brgh, 'Neighbourhood':nbhd})           # Add one row of data per cell

# Create the dataframe 'toronto' using pandas to store the data we collected from the html file
toronto = pd.DataFrame(rows, columns=['Postal Code','Borough','Neighbourhood'])
 
# There are some erroneous borough names, so we manually replace them with the correct names
toronto['Borough'] = toronto['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})
toronto

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto Business,Enclave of M4L
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."
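
The string handling inside the loop can be tried in isolation. The sketch below assumes the cell text has the shape `Borough(Neighbourhood / Neighbourhood)` preceded by the three-character postal code; `parse_cell` is a hypothetical helper name, and the sample strings are hand-written rather than scraped. It uses `str.partition` instead of `split('(')[1]`, so a cell without brackets yields an empty neighbourhood string instead of raising an `IndexError`.

```python
def parse_cell(code_text, span_text):
    """Split one table cell into (postal code, borough, neighbourhoods)."""
    postal_code = code_text[:3]                      # first three characters, e.g. 'M5A'
    borough, _, rest = span_text.partition('(')      # text before / after the first bracket
    neighbourhoods = rest.replace(' /', ',').replace(')', '')
    return postal_code, borough.strip(), neighbourhoods.strip()


# Hand-written samples that mimic the assumed cell text
with_brackets = parse_cell('M5ADowntown Toronto(Regent Park / Harbourfront)',
                           'Downtown Toronto(Regent Park / Harbourfront)')
without_brackets = parse_cell("M7AQueen's Park", "Queen's Park")

print(with_brackets)
print(without_brackets)
```

Whether an empty neighbourhood string is the right fallback depends on how such cells should be treated; dropping them would be an equally reasonable choice.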


### Part-2 : Adding coordinates to each Postal Code in the dataframe

In this section, we will add geographical coordinates to each postal code using a separate dataset at https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs_v1/Geospatial_Coordinates.csv.

In [5]:
# Read the csv data file into a dataframe
coordinates = pd.read_csv('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs_v1/Geospatial_Coordinates.csv')
coordinates

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
...,...,...,...
98,M9N,43.706876,-79.518188
99,M9P,43.696319,-79.532242
100,M9R,43.688905,-79.554724
101,M9V,43.739416,-79.588437


We will now merge the two datasets with an inner join, using 'Postal Code' as the key.

However, if the merge statement is executed repeatedly, each run adds another set of coordinate columns.

To prevent that, we use a control variable called `flag` and execute the merge only while it is `True`. `flag` was initialised to `True` before the toronto dataframe was created, and it is set to `False` once the merge has run.

In [6]:
# Merging the two dataframes to add latitudes and longitudes to toronto dataframe
if flag:
    toronto = pd.merge(toronto, coordinates, on='Postal Code', how='inner')
    flag = False          # To prevent further merging
 
toronto

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto Business,Enclave of M4L,43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509
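
As an alternative to a control variable, the merge can be guarded by checking whether the coordinate columns are already present. The sketch below uses two toy dataframes standing in for `toronto` and `coordinates` (values copied from the tables above); `add_coordinates` is a hypothetical helper, not part of the notebook. It also shows what an unguarded second merge does to the column names.

```python
import pandas as pd

# Toy stand-ins for the 'toronto' and 'coordinates' dataframes
left = pd.DataFrame({'Postal Code': ['M3A', 'M4A'],
                     'Borough': ['North York', 'North York']})
coords = pd.DataFrame({'Postal Code': ['M3A', 'M4A', 'M1B'],
                       'Latitude': [43.753259, 43.725882, 43.806686],
                       'Longitude': [-79.329656, -79.315572, -79.194353]})


def add_coordinates(df, coords):
    """Inner-join the coordinates, unless the columns are already there."""
    if {'Latitude', 'Longitude'}.issubset(df.columns):
        return df                                   # already merged: nothing to do
    return pd.merge(df, coords, on='Postal Code', how='inner')


once = add_coordinates(left, coords)
twice = add_coordinates(once, coords)               # second call leaves the frame unchanged
naive = pd.merge(once, coords, on='Postal Code', how='inner')   # unguarded repeat

print(list(twice.columns))
print(list(naive.columns))                          # suffixed duplicates such as 'Latitude_x'
```

The column check does not depend on cell execution order, which can make it a little more robust than a flag when cells are re-run out of sequence.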


### Part-3 : Exploring and Clustering the Neighbourhoods

In this section, we will cluster postal codes of Toronto.

First, let's get the geographical coordinates of Toronto, and create a map with all the postal codes displayed.

In [7]:
from geopy.geocoders import Nominatim
 
address = 'Toronto, ON'
 
geolocator = Nominatim(user_agent="tor")          # named 'geolocator' so that the 'coordinates' dataframe is not overwritten
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [8]:
toronto_map = folium.Map(location=[latitude, longitude], zoom_start=11)
 
# Adding markers for postal codes in toronto on the map
for lat, lon, borough, postcode in zip(toronto['Latitude'], toronto['Longitude'], toronto['Borough'], toronto['Postal Code']):
    label = '{}, {}'.format(postcode, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        popup=label,
        radius=3.25,
        color='green',
        fill=True,
        fill_color='#16ff94',
        fill_opacity=0.7,
        parse_html=False).add_to(toronto_map)  
    
toronto_map

For the purpose of this assignment, we shall consider only Downtown Toronto, due to computational limitations.

In [9]:
down_toronto = toronto[toronto['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
down_toronto

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
3,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
4,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
5,M6G,Downtown Toronto,Christie,43.669542,-79.422564
6,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
7,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752
8,M5K,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.647177,-79.381576
9,M5L,Downtown Toronto,"Commerce Court, Victoria Hotel",43.648198,-79.379817


In [31]:
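# The code was removed by Watson Studio for sharing.
# (This hidden cell presumably defines the Foursquare settings used below: client_id, client_secret, version and limit.)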

Now, using Foursquare, we will get a list of venues around each postal code, within a radius of 500 meters of its coordinates.

In [11]:
def get_venues(codes, latitudes, longitudes, radius=500):
    """Loop through the postcodes in Downtown Toronto and return up to 20 nearby venues for each, within a radius of 500 meters by default."""
    
    venues_list=[]
    for code, lat, lng in zip(codes, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            client_id, 
            client_secret, 
            version, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # get the required information about the venues
        venues_list.append([(
            code, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postal Code', 
                  'Latitude', 
                  'Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
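
The two steps inside `get_venues`, building the request URL and flattening the response, can be sketched without touching the network. Everything below is illustrative: `build_explore_url` and `extract_venues` are hypothetical helper names, the credentials and version string are placeholders, the `limit=20` default is taken from the comment in the function above, and the payload is hand-written to mimic only the nesting that `get_venues` indexes into. The flattening step also tolerates a venue whose `categories` list is empty, which would raise an `IndexError` in the list comprehension above.

```python
from urllib.parse import urlencode


def build_explore_url(lat, lng, client_id, client_secret, version, radius=500, limit=20):
    """Assemble the explore URL, letting urlencode escape the parameter values."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)


def extract_venues(code, lat, lng, payload):
    """Flatten one response-shaped dict into a list of row tuples."""
    items = payload['response']['groups'][0]['items']
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None   # tolerate a missing category
        rows.append((code, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows


# Hand-written payload; it mimics the nesting only and is not real API output
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Roselle Desserts',
               'location': {'lat': 43.653447, 'lng': -79.362017},
               'categories': [{'name': 'Bakery'}]}},
    {'venue': {'name': 'Example Venue',             # hypothetical entry without a category
               'location': {'lat': 43.654, 'lng': -79.36},
               'categories': []}},
]}]}}

rows = extract_venues('M5A', 43.65426, -79.360636, sample)
url = build_explore_url(43.65426, -79.360636, 'YOUR_ID', 'YOUR_SECRET', '20180605')
print(rows)
```

Separating the parsing from the HTTP call also makes it possible to test the parsing against saved responses.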

In [12]:
down_toronto_venues = get_venues(down_toronto['Postal Code'], down_toronto['Latitude'], down_toronto['Longitude'])
down_toronto_venues

Unnamed: 0,Postal Code,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,M5A,43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,M5A,43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,M5A,43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,M5A,43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,M5A,43.65426,-79.360636,Impact Kitchen,43.656369,-79.356980,Restaurant
...,...,...,...,...,...,...,...
308,M4Y,43.66586,-79.383160,Starbucks,43.664980,-79.380510,Coffee Shop
309,M4Y,43.66586,-79.383160,Coach House Restaurant,43.664991,-79.384814,Diner
310,M4Y,43.66586,-79.383160,Baskin-Robbins,43.665073,-79.380684,Ice Cream Shop
311,M4Y,43.66586,-79.383160,Openmat Mixed Martial Arts,43.666172,-79.384767,Martial Arts School


Now, we shall proceed to create a map of Downtown Toronto and display all these venues on it.

In [13]:
address = 'Downtown Toronto, Toronto, ON'

geolocator = Nominatim(user_agent="down_tor")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Downtown Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Downtown Toronto are 43.6563221, -79.3809161.


In [14]:
down_toronto_map = folium.Map(location=[latitude, longitude], zoom_start=14)
 
# Adding markers for venues in toronto on the map
for lat, lon, place, category in zip(down_toronto_venues['Venue Latitude'], down_toronto_venues['Venue Longitude'], down_toronto_venues['Venue'], down_toronto_venues['Venue Category']):
    label = '{}, {}'.format(place, category)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        popup=label,
        radius=3.25,
        color='green',
        fill=True,
        fill_color='#16ff94',
        fill_opacity=0.7,
        parse_html=False).add_to(down_toronto_map)  
    
down_toronto_map

In [15]:
len(down_toronto_venues['Venue Category'].unique())          # to return the number of unique venue categories in Downtown Toronto

121

There are 121 unique types of venues in Downtown Toronto.

Let's encode these venues using the one-hot encoding technique.

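This is done by creating one indicator column per venue category, holding 1 if a venue belongs to that category and 0 otherwise.

Finally, we group the encoded dataframe by postal code and take the mean of each column, which gives the share of each venue category within a postal code.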

In [16]:
# one hot encoding
down_tor_onehot = pd.get_dummies(down_toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add postal code column back to dataframe
down_tor_onehot['Postal Code'] = down_toronto_venues['Postal Code'] 

# move postal codes to the first column
fixed_columns = [down_tor_onehot.columns[-1]] + list(down_tor_onehot.columns[:-1])
down_tor_onehot = down_tor_onehot[fixed_columns]
down_tor_onehot

Unnamed: 0,Postal Code,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,...,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Trail,Train Station,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
308,M4Y,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
309,M4Y,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
310,M4Y,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
311,M4Y,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [17]:
down_tor_groups = down_tor_onehot.groupby('Postal Code').mean().reset_index()
down_tor_groups

Unnamed: 0,Postal Code,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,...,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Trail,Train Station,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,M4W,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0
1,M4X,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,M4Y,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0
3,M5A,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,M5B,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,...,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,M5C,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,...,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,M5E,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0
7,M5G,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,...,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,M5H,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0
9,M5J,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
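
To see why the mean of the one-hot columns is a relative frequency, here is a minimal sketch on a toy venue list. The postal codes `A` and `B` and their venues are made up for illustration. `dtype=int` is passed explicitly because recent pandas versions return boolean indicator columns by default. The last line computes the same table directly with `pd.crosstab`, as a cross-check.

```python
import pandas as pd

# Toy data: four venues in postal code 'A', two in 'B'
venues = pd.DataFrame({
    'Postal Code': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Café', 'Café', 'Park', 'Pub', 'Park', 'Park'],
})

# One indicator column per category, 1 where the venue belongs to it
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Postal Code', venues['Postal Code'])

# Mean of the 0/1 indicators per postal code = share of venues in each category
freq = onehot.groupby('Postal Code').mean()
print(freq)

# The same shares, computed without the intermediate one-hot table
direct = pd.crosstab(venues['Postal Code'], venues['Venue Category'], normalize='index')
```

In postal code `A`, two of the four venues are cafés, so the `Café` column averages to 0.5.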


Now, let's create a dataframe that displays the 10 most common (popular) venue categories in each postal code, using the grouped dataframe.

In [18]:
def popular_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
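
One thing to keep in mind with this ranking: a postal code with only a handful of venues has fewer than 10 categories with a non-zero frequency, so its lower ranks get filled with categories that do not occur there at all. A possible variant, sketched below under the assumption that the row starts with the postal code followed by the category frequencies, keeps only categories that actually occur. `top_categories` is a hypothetical name and the row values are toy numbers.

```python
import pandas as pd


def top_categories(row, n):
    """Return up to n category names with non-zero frequency, most frequent first."""
    freqs = row.iloc[1:].astype(float)      # skip the leading 'Postal Code' entry
    freqs = freqs[freqs > 0]                # drop categories that never occur
    return freqs.nlargest(n).index.tolist()


# Toy row: three categories occur, two do not
row = pd.Series({'Postal Code': 'A', 'Park': 0.5, 'Trail': 0.3,
                 'Playground': 0.2, 'Yoga Studio': 0.0, 'Diner': 0.0})

print(top_categories(row, 10))
```

Because the result can be shorter than `n`, it would need padding (for example with `None`) before being written into a fixed-width dataframe row.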

In [19]:
n = 10          # number of top venues to be considered

indicators = ['st', 'nd', 'rd']           # suffixes for 1, 2 and 3

# create columns according to number of top venues
columns = ['Postal Code']
for i in np.arange(n):
    try:
        columns.append('{}{} Popular Venue'.format(i+1, indicators[i]))          # adds 'st', 'nd', 'rd' to 1, 2, and 3 respectively
    except IndexError:
        columns.append('{}th Popular Venue'.format(i+1))                         # adds 'th' to each number after 3

# create a new dataframe
postcode_venues_sorted = pd.DataFrame(columns=columns)
postcode_venues_sorted['Postal Code'] = down_tor_groups['Postal Code']

for i in np.arange(down_tor_groups.shape[0]):
    postcode_venues_sorted.iloc[i, 1:] = popular_venues(down_tor_groups.iloc[i, :], n)

postcode_venues_sorted

Unnamed: 0,Postal Code,1st Popular Venue,2nd Popular Venue,3rd Popular Venue,4th Popular Venue,5th Popular Venue,6th Popular Venue,7th Popular Venue,8th Popular Venue,9th Popular Venue,10th Popular Venue
0,M4W,Park,Trail,Playground,Yoga Studio,Diner,Cocktail Bar,Coffee Shop,College Gym,Comfort Food Restaurant,Comic Shop
1,M4X,Café,Park,Taiwanese Restaurant,Butcher,Deli / Bodega,Diner,Pub,Caribbean Restaurant,Bakery,Jewelry Store
2,M4Y,Pizza Place,Juice Bar,Park,Coffee Shop,Pub,Ramen Restaurant,Mexican Restaurant,Restaurant,Martial Arts School,Salon / Barbershop
3,M5A,Coffee Shop,Park,Breakfast Spot,Bakery,Chocolate Shop,Pub,Restaurant,Dessert Shop,Performing Arts Venue,Distribution Center
4,M5B,Café,Pizza Place,Electronics Store,Music Venue,Plaza,Burrito Place,Burger Joint,Ramen Restaurant,Comic Shop,Sandwich Place
5,M5C,Japanese Restaurant,Coffee Shop,Café,Gastropub,Beer Bar,Diner,Creperie,Restaurant,Cosmetics Shop,BBQ Joint
6,M5E,Beer Bar,Farmers Market,Seafood Restaurant,Cocktail Bar,Museum,Pub,Concert Hall,Restaurant,Liquor Store,Basketball Stadium
7,M5G,Coffee Shop,Poke Place,Tea Room,Modern European Restaurant,Japanese Restaurant,Italian Restaurant,Sushi Restaurant,Sandwich Place,Café,Art Museum
8,M5H,Coffee Shop,Hotel,Steakhouse,Gym / Fitness Center,Lounge,Concert Hall,Neighborhood,Opera House,Café,Plaza
9,M5J,Park,Plaza,Hotel,Sporting Goods Shop,Performing Arts Venue,Chinese Restaurant,New American Restaurant,Neighborhood,Café,Salad Place
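
The suffix logic used for the column names works for the first 10 ranks, but it would label rank 21 as "21th". If more ranks were ever needed, a small helper along the following lines could replace the `try`/`except` block. `ordinal` is a hypothetical name introduced here for illustration.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'                                   # 11th, 12th, 13th, ...
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


# Column names in the same format as the dataframe above
columns = ['Postal Code'] + ['{} Popular Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns)
```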


Now, it's time to create a clustering model. Here we are using the k-means clustering approach, with 7 clusters.

In [20]:
# set number of clusters
kclusters = 7

down_tor_clusters = down_tor_groups.drop(columns='Postal Code')          # the positional form drop('Postal Code', 1) no longer works in pandas 2.0

# run k-means clustering
cluster = KMeans(n_clusters=kclusters, random_state=0).fit(down_tor_clusters)

# shift the labels so that clusters are numbered from 1, then check the label generated for each row
cluster.labels_+=1
cluster.labels_

array([4, 2, 1, 6, 7, 2, 1, 6, 1, 1, 2, 2, 7, 7, 5, 2, 3], dtype=int32)
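
For readers new to k-means, here is a minimal sketch on toy data of what the fitted model returns. The four rows are made-up frequency vectors, two dominated by the first category and two by the last. `n_init` is set explicitly because its default differs between scikit-learn versions, and the 1-based labels are stored in a separate array so that the fitted model itself stays unchanged.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy frequency rows: rows 0 and 1 look alike, rows 2 and 3 look alike
X = np.array([[0.9, 0.1, 0.0],
              [0.8, 0.2, 0.0],
              [0.0, 0.1, 0.9],
              [0.1, 0.0, 0.9]])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# 1-based copy of the labels; model.labels_ keeps its original 0-based values
labels = model.labels_ + 1
print(labels)
```

Which of the two groups receives label 1 is arbitrary; only the grouping itself is meaningful.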

In [21]:
# add clustering labels
if flag2:
    postcode_venues_sorted.insert(0, 'Cluster Label', cluster.labels_)
    flag2 = False          # insert() raises an error if the column already exists, so run it only once

# join the cluster labels and ranked venue categories onto the location data for each postal code
down_toronto_final = down_toronto.join(postcode_venues_sorted.set_index('Postal Code'), on='Postal Code')
down_toronto_final

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Label,1st Popular Venue,2nd Popular Venue,3rd Popular Venue,4th Popular Venue,5th Popular Venue,6th Popular Venue,7th Popular Venue,8th Popular Venue,9th Popular Venue,10th Popular Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,6,Coffee Shop,Park,Breakfast Spot,Bakery,Chocolate Shop,Pub,Restaurant,Dessert Shop,Performing Arts Venue,Distribution Center
1,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,7,Café,Pizza Place,Electronics Store,Music Venue,Plaza,Burrito Place,Burger Joint,Ramen Restaurant,Comic Shop,Sandwich Place
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,2,Japanese Restaurant,Coffee Shop,Café,Gastropub,Beer Bar,Diner,Creperie,Restaurant,Cosmetics Shop,BBQ Joint
3,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,1,Beer Bar,Farmers Market,Seafood Restaurant,Cocktail Bar,Museum,Pub,Concert Hall,Restaurant,Liquor Store,Basketball Stadium
4,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,6,Coffee Shop,Poke Place,Tea Room,Modern European Restaurant,Japanese Restaurant,Italian Restaurant,Sushi Restaurant,Sandwich Place,Café,Art Museum
5,M6G,Downtown Toronto,Christie,43.669542,-79.422564,3,Grocery Store,Café,Park,Coffee Shop,Candy Store,Baby Store,Nightclub,Restaurant,Italian Restaurant,Comic Shop
6,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568,1,Coffee Shop,Hotel,Steakhouse,Gym / Fitness Center,Lounge,Concert Hall,Neighborhood,Opera House,Café,Plaza
7,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752,1,Park,Plaza,Hotel,Sporting Goods Shop,Performing Arts Venue,Chinese Restaurant,New American Restaurant,Neighborhood,Café,Salad Place
8,M5K,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.647177,-79.381576,2,Coffee Shop,Café,Japanese Restaurant,Restaurant,Hotel,Steakhouse,Gastropub,Gym / Fitness Center,Pub,Beer Bar
9,M5L,Downtown Toronto,"Commerce Court, Victoria Hotel",43.648198,-79.379817,2,Café,Restaurant,Coffee Shop,Gastropub,Japanese Restaurant,Gym,Bakery,Gym / Fitness Center,Pub,Museum


Now that the postcodes of Downtown Toronto have been clustered, we can show these clusters on a map.

In [22]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=14)

# set color scheme (named 'cluster_colors' so that the imported 'matplotlib.colors' module is not overwritten)
cluster_colors = ['red', 'blue', 'green', 'magenta', 'orange', 'purple', 'brown']

# add markers to the map ('cluster_label' is used so that the fitted KMeans model 'cluster' is not overwritten)
for lat, lon, poi, cluster_label in zip(down_toronto_final['Latitude'], down_toronto_final['Longitude'], down_toronto_final['Postal Code'], down_toronto_final['Cluster Label']):
    label = folium.Popup(str(poi) + ', Cluster ' + str(cluster_label), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=cluster_colors[cluster_label-1],
        fill=True,
        fill_opacity=0.5).add_to(map_clusters)
       
map_clusters
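
The fixed list of seven colour names works as long as there are at most seven clusters. If the number of clusters were changed, one option is to generate the palette instead. The sketch below uses only the standard library to produce evenly spaced hues as hex strings; `cluster_palette` is a hypothetical helper, and the saturation and brightness values are arbitrary choices.

```python
import colorsys


def cluster_palette(k):
    """Return k hex colour strings with evenly spaced hues."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.85, 0.9)     # hue, saturation, value
        palette.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = cluster_palette(7)

# 1-based cluster label -> colour, matching the label-1 indexing used for the map
colour_of = {label: palette[label - 1] for label in range(1, 8)}
print(colour_of)
```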

In [23]:
# creating different dataframes for each cluster, without the location and label columns
drop_cols = ['Latitude', 'Longitude', 'Cluster Label', 'Borough']

def get_cluster(label):
    # select the rows of one cluster and drop the columns that are not needed for the comparison
    return down_toronto_final[down_toronto_final['Cluster Label'] == label].drop(columns=drop_cols)

cluster1 = get_cluster(1)
cluster2 = get_cluster(2)
cluster3 = get_cluster(3)
cluster4 = get_cluster(4)
cluster5 = get_cluster(5)
cluster6 = get_cluster(6)
cluster7 = get_cluster(7)

Let's have a look at these clusters one by one.

First cluster.

In [24]:
cluster1

Unnamed: 0,Postal Code,Neighbourhood,1st Popular Venue,2nd Popular Venue,3rd Popular Venue,4th Popular Venue,5th Popular Venue,6th Popular Venue,7th Popular Venue,8th Popular Venue,9th Popular Venue,10th Popular Venue
3,M5E,Berczy Park,Beer Bar,Farmers Market,Seafood Restaurant,Cocktail Bar,Museum,Pub,Concert Hall,Restaurant,Liquor Store,Basketball Stadium
6,M5H,"Richmond, Adelaide, King",Coffee Shop,Hotel,Steakhouse,Gym / Fitness Center,Lounge,Concert Hall,Neighborhood,Opera House,Café,Plaza
7,M5J,"Harbourfront East, Union Station, Toronto Islands",Park,Plaza,Hotel,Sporting Goods Shop,Performing Arts Venue,Chinese Restaurant,New American Restaurant,Neighborhood,Café,Salad Place
16,M4Y,Church and Wellesley,Pizza Place,Juice Bar,Park,Coffee Shop,Pub,Ramen Restaurant,Mexican Restaurant,Restaurant,Martial Arts School,Salon / Barbershop


Judging by the table, the first cluster contains neighbourhoods where coffee shops, parks and hotels rank higher than in the other clusters.

Second cluster.

In [25]:
cluster2

Unnamed: 0,Postal Code,Neighbourhood,1st Popular Venue,2nd Popular Venue,3rd Popular Venue,4th Popular Venue,5th Popular Venue,6th Popular Venue,7th Popular Venue,8th Popular Venue,9th Popular Venue,10th Popular Venue
2,M5C,St. James Town,Japanese Restaurant,Coffee Shop,Café,Gastropub,Beer Bar,Diner,Creperie,Restaurant,Cosmetics Shop,BBQ Joint
8,M5K,"Toronto Dominion Centre, Design Exchange",Coffee Shop,Café,Japanese Restaurant,Restaurant,Hotel,Steakhouse,Gastropub,Gym / Fitness Center,Pub,Beer Bar
9,M5L,"Commerce Court, Victoria Hotel",Café,Restaurant,Coffee Shop,Gastropub,Japanese Restaurant,Gym,Bakery,Gym / Fitness Center,Pub,Museum
14,M4X,"St. James Town, Cabbagetown",Café,Park,Taiwanese Restaurant,Butcher,Deli / Bodega,Diner,Pub,Caribbean Restaurant,Bakery,Jewelry Store
15,M5X,"First Canadian Place, Underground city",Café,Restaurant,Coffee Shop,Japanese Restaurant,Pizza Place,Gym,Pub,Seafood Restaurant,Steakhouse,Bakery


Here, cafés and restaurants, especially Japanese ones, are very popular.

Third cluster.

In [26]:
cluster3

Unnamed: 0,Postal Code,Neighbourhood,1st Popular Venue,2nd Popular Venue,3rd Popular Venue,4th Popular Venue,5th Popular Venue,6th Popular Venue,7th Popular Venue,8th Popular Venue,9th Popular Venue,10th Popular Venue
5,M6G,Christie,Grocery Store,Café,Park,Coffee Shop,Candy Store,Baby Store,Nightclub,Restaurant,Italian Restaurant,Comic Shop


This postal code is distinct enough to form a cluster of its own. The local grocery store rules this neighbourhood.

Fourth cluster.

In [27]:
cluster4

Unnamed: 0,Postal Code,Neighbourhood,1st Popular Venue,2nd Popular Venue,3rd Popular Venue,4th Popular Venue,5th Popular Venue,6th Popular Venue,7th Popular Venue,8th Popular Venue,9th Popular Venue,10th Popular Venue
13,M4W,Rosedale,Park,Trail,Playground,Yoga Studio,Diner,Cocktail Bar,Coffee Shop,College Gym,Comfort Food Restaurant,Comic Shop


Another unique area. People here seem to love their park.

Fifth cluster.

In [28]:
cluster5

Unnamed: 0,Postal Code,Neighbourhood,1st Popular Venue,2nd Popular Venue,3rd Popular Venue,4th Popular Venue,5th Popular Venue,6th Popular Venue,7th Popular Venue,8th Popular Venue,9th Popular Venue,10th Popular Venue
12,M5V,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Airport Lounge,Airport Terminal,Airport,Harbor / Marina,Boat or Ferry,Bar,Rental Car Location,Sculpture Garden,Airport Food Court


This area is in the vicinity of Billy Bishop Toronto City Airport, so airport-related venues dominate the list here.

Sixth cluster.

In [29]:
cluster6

Unnamed: 0,Postal Code,Neighbourhood,1st Popular Venue,2nd Popular Venue,3rd Popular Venue,4th Popular Venue,5th Popular Venue,6th Popular Venue,7th Popular Venue,8th Popular Venue,9th Popular Venue,10th Popular Venue
0,M5A,"Regent Park, Harbourfront",Coffee Shop,Park,Breakfast Spot,Bakery,Chocolate Shop,Pub,Restaurant,Dessert Shop,Performing Arts Venue,Distribution Center
4,M5G,Central Bay Street,Coffee Shop,Poke Place,Tea Room,Modern European Restaurant,Japanese Restaurant,Italian Restaurant,Sushi Restaurant,Sandwich Place,Café,Art Museum


We can see here that coffee shops are the most popular venues in these three neighbourhoods, which span two postal codes.

Seventh cluster.

In [30]:
cluster7

Unnamed: 0,Postal Code,Neighbourhood,1st Popular Venue,2nd Popular Venue,3rd Popular Venue,4th Popular Venue,5th Popular Venue,6th Popular Venue,7th Popular Venue,8th Popular Venue,9th Popular Venue,10th Popular Venue
1,M5B,"Garden District, Ryerson",Café,Pizza Place,Electronics Store,Music Venue,Plaza,Burrito Place,Burger Joint,Ramen Restaurant,Comic Shop,Sandwich Place
10,M5S,"University of Toronto, Harbord",Café,Bookstore,Japanese Restaurant,Bakery,Yoga Studio,Sushi Restaurant,College Gym,Comfort Food Restaurant,Restaurant,Dessert Shop
11,M5T,"Kensington Market, Chinatown, Grange Park",Café,Vietnamese Restaurant,Farmers Market,Food Truck,Cocktail Bar,Organic Grocery,Cheese Shop,Dessert Shop,Mexican Restaurant,Belgian Restaurant


Here too, cafés are popular, whereas Japanese restaurants are less prominent, which may be why these neighbourhoods have been placed in a separate cluster.
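
The visual comparisons in this section can also be checked by counting. As a small sketch, the snippet below tallies the top three venue categories of the five postal codes in the second cluster, copied by hand from the table above, using `collections.Counter`.

```python
from collections import Counter

# Top three venue categories per postal code in the second cluster (from the table above)
cluster2_top3 = {
    'M5C': ['Japanese Restaurant', 'Coffee Shop', 'Café'],
    'M5K': ['Coffee Shop', 'Café', 'Japanese Restaurant'],
    'M5L': ['Café', 'Restaurant', 'Coffee Shop'],
    'M4X': ['Café', 'Park', 'Taiwanese Restaurant'],
    'M5X': ['Café', 'Restaurant', 'Coffee Shop'],
}

# Count how often each category appears among the top three
counts = Counter(cat for cats in cluster2_top3.values() for cat in cats)
print(counts.most_common(2))   # [('Café', 5), ('Coffee Shop', 4)]
```

Cafés appear in the top three of all five postal codes and coffee shops in four of them, which is consistent with the description of that cluster.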