<h1>Segmenting & Clustering Neighborhoods in the city of Toronto</h1>

<h2>Part 1 - Web Scraping from Wikipedia</h2>

Import the required libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib as mpl
from matplotlib import pyplot as plt
%matplotlib inline
from sklearn import preprocessing
import subprocess
import lxml
import geocoder
import geopy
from geopy.geocoders import Nominatim
import folium
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

In [2]:
#Specify the URL from which the data has to be extracted
url="https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

<h5>Use BeautifulSoup library to extract data from a particular table in the URL provided.</h5>

In [3]:
import requests
from bs4 import BeautifulSoup

#to request the text from the webpage 
website_url = requests.get(url).text
soup = BeautifulSoup(website_url,'lxml')

# Select the table by its class. The class name can be found in the HTML source of the web page.
my_table = soup.find('table',{'class':'wikitable sortable'})


# Use findAll to walk the tr rows and td cells, and append each cell's text to the matching list.
    # Strip the trailing '\n' from each value before it is appended.
Postal_Code = []
Borough = []
Neighborhood = []
for rows in my_table.findAll('tr'):
    cells = rows.findAll('td')
    if len(cells)==3:
        Postal_Code.append((cells[0].find(text=True)).rstrip('\n'))
        Borough.append((cells[1].find(text=True)).rstrip('\n'))
        Neighborhood.append((cells[2].find(text=True)).rstrip('\n'))
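
The row-by-row extraction above can be tried offline on a small inline table. The sketch below is only an illustration: the HTML snippet and the names `sample_html` and `rows_out` are made up for this example, and it assumes the built-in `html.parser` is an acceptable substitute for `lxml`.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Wikipedia table (not the real page content).
sample_html = """
<table class="wikitable sortable">
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1A
</td><td>Not assigned
</td><td>Not assigned
</td></tr>
  <tr><td>M3A
</td><td>North York
</td><td>Parkwoods
</td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")
table = soup.find("table", {"class": "wikitable sortable"})

rows_out = []
for row in table.find_all("tr"):
    cells = row.find_all("td")
    if len(cells) == 3:  # the header row only has th cells, so it is skipped
        # get_text(strip=True) removes the trailing newline in each cell
        rows_out.append([cell.get_text(strip=True) for cell in cells])

print(rows_out)
```

Collecting whole rows in one list, rather than three parallel lists, keeps the three values of a row together if a cell is ever missing.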

The table data collected in the lists is then passed to a pandas DataFrame.

In [4]:
df = pd.DataFrame(Postal_Code, columns=['Postal Code'])
df['Borough'] = Borough
df['Neighborhood'] = Neighborhood
print('Number of rows and Columns in the table are:', df.shape)
df.head(30)

Number of rows and Columns in the table are: (180, 3)


Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
7,M8A,Not assigned,Not assigned
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"


In [5]:
#check out how many Boroughs are not assigned.
df[df['Borough']=='Not assigned'].shape

(77, 3)

In [6]:
#Remove the rows whose Borough is 'Not assigned', reset the index, and display the rows that are left.

df.drop(df[df['Borough']=='Not assigned'].index, inplace=True)
df.reset_index(drop=True, inplace=True)
df

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


In [7]:
df.shape

(103, 3)
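
The filtering step can also be written as a single boolean mask. This is a minimal sketch on toy data (the frame `toy` and its values are illustrative, not the scraped table). It relies on the fact that `reset_index` returns a new frame unless `inplace=True` is passed, so the result has to be assigned.

```python
import pandas as pd

# Toy data standing in for the scraped table (values are illustrative).
toy = pd.DataFrame({
    "Postal Code": ["M1A", "M3A", "M4A", "M8A"],
    "Borough": ["Not assigned", "North York", "North York", "Not assigned"],
    "Neighborhood": ["Not assigned", "Parkwoods", "Victoria Village", "Not assigned"],
})

# Keep only the assigned boroughs, then renumber the rows from 0.
assigned = toy[toy["Borough"] != "Not assigned"].reset_index(drop=True)

print(assigned.shape)
print(list(assigned.index))
```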

<h2>Part 2 - Extract the geographical coordinates and append them to the dataframe.</h2>

In [8]:
geospat_df = pd.read_csv('https://cocl.us/Geospatial_data', index_col='Postal Code')

can_data = pd.merge(df, geospat_df, on=['Postal Code'])
can_data.head(12)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


In [9]:
can_data.shape

(103, 5)
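
The merge keeps all 103 rows here, but an inner merge would silently drop any postal code that has no coordinates. One way to check for that, sketched on toy data, is a left merge with `indicator=True`. The frames `codes` and `coords` and the postal code `M9Z` are made up for the example.

```python
import pandas as pd

# Illustrative inputs; 'M9Z' is a made-up code with no coordinates.
codes = pd.DataFrame({"Postal Code": ["M3A", "M4A", "M9Z"],
                      "Borough": ["North York", "North York", "Hypothetical"]})
coords = pd.DataFrame({"Postal Code": ["M3A", "M4A"],
                       "Latitude": [43.753259, 43.725882],
                       "Longitude": [-79.329656, -79.315572]})

# A left merge keeps every postal code; the _merge column records where each row matched.
checked = pd.merge(codes, coords, on="Postal Code", how="left", indicator=True)
missing = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()

print(missing)
```

An empty `missing` list would confirm that every postal code received a latitude and longitude.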

In [10]:
can_data.tail()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509
102,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999


<h2>Part 3 - Explore the Neighborhood in Downtown Toronto</h2>

First we map all the postal codes

In [11]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
ont_lat = location.latitude
ont_long = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(ont_lat, ont_long))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [12]:
# create map of Toronto using latitude and longitude values
ontario_map = folium.Map(location=[ont_lat, ont_long], zoom_start=10, tiles='OpenStreetMap')

# add markers to map
for lat, lng, label in zip(can_data['Latitude'], can_data['Longitude'], can_data['Postal Code']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(ontario_map)  
    
ontario_map

Count the postal codes whose Borough name contains the word "Toronto", which includes Downtown Toronto.

In [13]:
#to list all the postal codes with the word Toronto in the Borough
dtb_data = can_data[can_data['Borough'].str.contains('Toronto')].reset_index(drop=True)
dtb_data.shape

(39, 5)

In [14]:
dtb_data.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M4E,East Toronto,The Beaches,43.676357,-79.293031


To broaden the scope of the project, the rest of the analysis uses every postal code whose Borough name contains the word "Toronto", not only Downtown Toronto.

In [15]:
# create map of Toronto using latitude and longitude values
tor_map = folium.Map(location=[ont_lat, ont_long], zoom_start=12, tiles='OpenStreetMap')

# add markers to map
for lat, lng, label in zip(dtb_data['Latitude'], dtb_data['Longitude'], dtb_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(tor_map)  
    
tor_map

<b>Define Foursquare API Credentials and Version</b>

In [16]:
# API Credentials below 
key_df = pd.read_csv('k.txt')

CLIENT_ID = key_df['CLIENT_ID'].loc[0] # your Foursquare ID
CLIENT_SECRET = key_df['CLIENT_SECRET'].loc[0] # your Foursquare Secret

VERSION = '20180605' # Foursquare API version

LIMIT = 100 # A default Foursquare API limit value


<h3>Explore Neighborhood in Downtown Toronto</h3>

In [17]:
LIMIT = 50 # maximum number of venues requested per postal code; overrides the LIMIT = 100 set earlier
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postal Code', 
                  'Postal Code Latitude', 
                  'Postal Code Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
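
`getNearbyVenues` indexes `["response"]['groups'][0]['items']` and `['categories'][0]` directly, so a response without groups or a venue without categories would raise an exception. The sketch below shows one defensive way to flatten a response. The helper name `extract_venue_rows` and the `sample_payload` data are hypothetical; only the key names are taken from the function above.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Flatten one explore-style response into rows, skipping incomplete items."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories") or []
        location = venue.get("location", {})
        # Skip venues that lack a category or coordinates instead of failing.
        if not categories or "lat" not in location or "lng" not in location:
            continue
        rows.append((name, lat, lng, venue.get("name"),
                     location["lat"], location["lng"], categories[0]["name"]))
    return rows


# Made-up payload that mimics the keys used above; the second venue has no category.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Tandem Coffee",
               "location": {"lat": 43.653559, "lng": -79.361809},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Uncategorised Place",
               "location": {"lat": 43.65, "lng": -79.36},
               "categories": []}},
]}]}}

rows = extract_venue_rows("M5A", 43.65426, -79.360636, sample_payload)
empty = extract_venue_rows("M5A", 43.65426, -79.360636, {})
print(rows)
print(empty)
```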

Call the above function on each postal code and create a new dataframe called dtt_venues 

In [18]:
# fetch the nearby venues for every postal code in dtb_data
dtt_venues = getNearbyVenues(names=dtb_data['Postal Code'], latitudes=dtb_data['Latitude'], longitudes=dtb_data['Longitude'])

M5A
M7A
M5B
M5C
M4E
M5E
M5G
M6G
M5H
M6H
M5J
M6J
M4K
M5K
M6K
M4L
M5L
M4M
M4N
M5N
M4P
M5P
M6P
M4R
M5R
M6R
M4S
M5S
M6S
M4T
M5T
M4V
M5V
M4W
M5W
M4X
M5X
M4Y
M7Y


Print the size of the resulting dataframe

In [19]:
print(dtt_venues.shape)
dtt_venues.head()

(1166, 7)


Unnamed: 0,Postal Code,Postal Code Latitude,Postal Code Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,M5A,43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
1,M5A,43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
2,M5A,43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,M5A,43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,M5A,43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant


Check how many Venues were returned for each postal code

In [20]:
dtt_venues.groupby('Postal Code').count()

Postal Code,Postal Code Latitude,Postal Code Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
M4E,4,4,4,4,4,4
M4K,42,42,42,42,42,42
M4L,19,19,19,19,19,19
M4M,36,36,36,36,36,36
M4N,4,4,4,4,4,4
M4P,8,8,8,8,8,8
M4R,20,20,20,20,20,20
M4S,34,34,34,34,34,34
M4T,2,2,2,2,2,2
M4V,14,14,14,14,14,14


List how many unique categories were returned 

In [21]:
print('There are {} unique categories.'.format(len(dtt_venues['Venue Category'].unique())))

There are 217 unique categories.


<h3>Analyze each Postal Code</h3>

In [22]:
# one hot encoding
dtt_onehot = pd.get_dummies(dtt_venues[['Venue Category']], prefix="", prefix_sep="")

# add the Postal Code column back to the dataframe
dtt_onehot['Postal Code'] = dtt_venues['Postal Code']

# move the Postal Code column to the first position
fixed_columns = [dtt_onehot.columns[-1]] + list(dtt_onehot.columns[:-1])
dtt_onehot = dtt_onehot[fixed_columns]

dtt_onehot.head()

Unnamed: 0,Postal Code,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Examine the new dataframe size

In [23]:
dtt_onehot.shape

(1166, 218)

Next, group the rows by Postal Code and take the mean of each one-hot column, which gives the frequency of occurrence of each venue category

In [24]:
dtt_grouped = dtt_onehot.groupby('Postal Code').mean().reset_index()
dtt_grouped

Unnamed: 0,Postal Code,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,M4E,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M4K,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,...,0.02381,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.02381
2,M4L,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M4M,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778
4,M4N,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,M4P,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,M4R,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05
7,M4S,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,M4T,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,M4V,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0


In [25]:
dtt_grouped.shape

(39, 218)
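
To see why the mean of the one-hot columns is a frequency, here is a minimal sketch on six made-up venues. The frame `toy_venues` is illustrative. Passing `dtype=int` to `pd.get_dummies` is an assumption made so the columns are 0/1 integers on any recent pandas version.

```python
import pandas as pd

# Six illustrative venues in two postal codes.
toy_venues = pd.DataFrame({
    "Postal Code": ["M4E", "M4E", "M4K", "M4K", "M4K", "M4K"],
    "Venue Category": ["Pub", "Trail", "Coffee Shop", "Coffee Shop",
                       "Greek Restaurant", "Pub"],
})

# One column per category, one row per venue.
onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=int)
onehot.insert(0, "Postal Code", toy_venues["Postal Code"])

# The mean of a 0/1 column within a group is the share of venues in that category.
grouped = onehot.groupby("Postal Code").mean().reset_index()
print(grouped)
```

Each row of `grouped` sums to 1, so postal codes with many venues and postal codes with few venues are put on the same scale before clustering.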

Print each Postal Code with its 12 most common venues

In [26]:
num_top_venues = 12

for code in dtt_grouped['Postal Code']:
    print("----"+code+"----")
    temp = dtt_grouped[dtt_grouped['Postal Code'] == code].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----M4E----
                              venue  freq
0                      Neighborhood  0.25
1                               Pub  0.25
2                 Health Food Store  0.25
3                             Trail  0.25
4                    Adult Boutique  0.00
5                       Men's Store  0.00
6                Mexican Restaurant  0.00
7         Middle Eastern Restaurant  0.00
8                Miscellaneous Shop  0.00
9        Modern European Restaurant  0.00
10  Molecular Gastronomy Restaurant  0.00
11              Monument / Landmark  0.00


----M4K----
                     venue  freq
0         Greek Restaurant  0.17
1              Coffee Shop  0.10
2       Italian Restaurant  0.07
3           Ice Cream Shop  0.05
4   Furniture / Home Store  0.05
5               Restaurant  0.05
6              Yoga Studio  0.02
7              Pizza Place  0.02
8                  Brewery  0.02
9          Bubble Tea Shop  0.02
10                    Café  0.02
11    Caribbean Restaurant  0.02

...
11                 Bank  0.03


----M7Y----
                   venue  freq
0     Light Rail Station  0.12
1         Farmers Market  0.06
2          Auto Workshop  0.06
3   Gym / Fitness Center  0.06
4                   Park  0.06
5       Recording Studio  0.06
6             Restaurant  0.06
7          Burrito Place  0.06
8             Skate Park  0.06
9                Brewery  0.06
10            Comic Shop  0.06
11  Fast Food Restaurant  0.06




Now we transfer the same to a pandas dataframe

First we sort them in descending order

In [27]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Create a new dataframe and display the top 12 venues of each Postal Code

In [28]:
num_top_venues = 12

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Postal Code']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
dtt_venues_sorted = pd.DataFrame(columns=columns)
dtt_venues_sorted['Postal Code'] = dtt_grouped['Postal Code']

for ind in np.arange(dtt_grouped.shape[0]):
    dtt_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dtt_grouped.iloc[ind, :], num_top_venues)

dtt_venues_sorted.head()

Unnamed: 0,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue
0,M4E,Neighborhood,Pub,Health Food Store,Trail,Adult Boutique,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark
1,M4K,Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Furniture / Home Store,Restaurant,Yoga Studio,Pizza Place,Brewery,Bubble Tea Shop,Café,Caribbean Restaurant
2,M4L,Fast Food Restaurant,Gym,Sushi Restaurant,Pet Store,Park,Pub,Coffee Shop,Restaurant,Sandwich Place,Movie Theater,Brewery,Pizza Place
3,M4M,Coffee Shop,Bakery,Brewery,Gastropub,American Restaurant,Café,Yoga Studio,Neighborhood,Bookstore,Fish Market,Seafood Restaurant,Clothing Store
4,M4N,Park,Swim School,Bus Line,Business Service,Adult Boutique,Music Venue,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark
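
The column names above come from an `indicators` list with a fallback to "th", which is correct for up to 20 columns. As an alternative sketch, a small helper can produce the suffix for any position. The function name `ordinal` is hypothetical and not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


# Rebuild the same header row as the cell above for 12 venues.
columns = ["Postal Code"] + ["{} Most Common Venue".format(ordinal(i)) for i in range(1, 13)]
print(columns[1], "|", columns[12])
```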


<h3>Cluster Postal Codes</h3>

Run k-means to cluster the postal codes into 8 clusters

In [29]:
# set number of clusters
kclusters = 8

dtt_grouped_clustering = dtt_grouped.drop('Postal Code', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dtt_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 1, 1, 1, 7, 6, 1, 1, 4, 1])
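
The number of clusters is set by hand in this notebook. One common way to compare candidate values, which is not what the notebook itself does, is the silhouette score. The sketch below runs it on a made-up two-dimensional array `X`; with the real data, `dtt_grouped_clustering` would take its place.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Two well-separated toy groups of three points each (illustrative data).
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]], dtype=float)

scores = {}
for k in range(2, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # higher means tighter, better separated clusters

best_k = max(scores, key=scores.get)
print(best_k)
```

A silhouette comparison is only a guide. With 39 postal codes and many single-member clusters, the choice of k still needs to be judged against the cluster tables.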

Let's create a dataframe that includes the cluster label as well as the top 12 venues for each postal code

In [30]:
# add clustering labels
dtt_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

dtt_merged = dtb_data

# join dtt_venues_sorted onto dtb_data so that each postal code keeps its borough, latitude and longitude
dtt_merged = dtt_merged.join(dtt_venues_sorted.set_index('Postal Code'), on='Postal Code')


In [31]:
dtt_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,1,Coffee Shop,Bakery,Park,Pub,Breakfast Spot,Café,Theater,Cosmetics Shop,Spa,Shoe Store,Chocolate Shop,Restaurant
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,1,Coffee Shop,Sushi Restaurant,College Cafeteria,Yoga Studio,Café,Fried Chicken Joint,Beer Bar,Smoothie Shop,Burrito Place,Distribution Center,Sandwich Place,Bank
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,1,Coffee Shop,Café,Clothing Store,Ramen Restaurant,Middle Eastern Restaurant,Fast Food Restaurant,Theater,Cosmetics Shop,Electronics Store,Shopping Mall,Bookstore,Spa
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,1,Café,Coffee Shop,Gastropub,Cocktail Bar,Creperie,Cosmetics Shop,Bakery,American Restaurant,Farmers Market,Seafood Restaurant,French Restaurant,Fountain
4,M4E,East Toronto,The Beaches,43.676357,-79.293031,2,Neighborhood,Pub,Health Food Store,Trail,Adult Boutique,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark


In [32]:
dtt_merged.shape # check the number of rows and columns

(39, 18)

Visualize the resulting clusters

In [33]:
# create map
map_clusters = folium.Map(location=[ont_lat, ont_long], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dtt_merged['Latitude'], dtt_merged['Longitude'], dtt_merged['Postal Code'], dtt_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
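
In the map cell, `rainbow[cluster-1]` sends cluster label 0 to the last colour through negative indexing. Every cluster still gets its own colour, but a direct lookup from label to colour is easier to follow. This is a minimal sketch of that lookup; the names `palette` and `color_for` are made up for the example.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 8

# One hex colour per cluster, evenly spaced along the rainbow colormap.
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# Cluster label k maps straight to palette[k], with no offset.
color_for = {label: palette[label] for label in range(kclusters)}
print(color_for)
```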

<h3>Examine the Clusters</h3>

<b>Cluster 1</b>

In [34]:
dtt_merged.loc[dtt_merged['Cluster Labels'] == 0, dtt_merged.columns[[0] + [1] + list(range(5, dtt_merged.shape[1]))]]

Unnamed: 0,Postal Code,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue
21,M5P,Central Toronto,0,Park,Trail,Jewelry Store,Sushi Restaurant,Adult Boutique,Museum,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant


<b>Cluster 2</b>

In [35]:
dtt_merged.loc[dtt_merged['Cluster Labels'] == 1, dtt_merged.columns[[0] + [1] + list(range(5, dtt_merged.shape[1]))]]

Unnamed: 0,Postal Code,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue
0,M5A,Downtown Toronto,1,Coffee Shop,Bakery,Park,Pub,Breakfast Spot,Café,Theater,Cosmetics Shop,Spa,Shoe Store,Chocolate Shop,Restaurant
1,M7A,Downtown Toronto,1,Coffee Shop,Sushi Restaurant,College Cafeteria,Yoga Studio,Café,Fried Chicken Joint,Beer Bar,Smoothie Shop,Burrito Place,Distribution Center,Sandwich Place,Bank
2,M5B,Downtown Toronto,1,Coffee Shop,Café,Clothing Store,Ramen Restaurant,Middle Eastern Restaurant,Fast Food Restaurant,Theater,Cosmetics Shop,Electronics Store,Shopping Mall,Bookstore,Spa
3,M5C,Downtown Toronto,1,Café,Coffee Shop,Gastropub,Cocktail Bar,Creperie,Cosmetics Shop,Bakery,American Restaurant,Farmers Market,Seafood Restaurant,French Restaurant,Fountain
5,M5E,Downtown Toronto,1,Coffee Shop,Cocktail Bar,Bakery,Restaurant,Farmers Market,Beer Bar,Seafood Restaurant,Cheese Shop,Concert Hall,Fountain,Bistro,Steakhouse
6,M5G,Downtown Toronto,1,Coffee Shop,Café,Sandwich Place,Bubble Tea Shop,Burger Joint,Italian Restaurant,Yoga Studio,Salad Place,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Falafel Restaurant
7,M6G,Downtown Toronto,1,Grocery Store,Café,Park,Nightclub,Baby Store,Athletics & Sports,Restaurant,Italian Restaurant,Coffee Shop,Candy Store,Movie Theater,Monument / Landmark
8,M5H,Downtown Toronto,1,Café,Coffee Shop,Steakhouse,American Restaurant,Sushi Restaurant,Restaurant,Concert Hall,Pizza Place,Gluten-free Restaurant,Opera House,Burrito Place,Seafood Restaurant
9,M6H,West Toronto,1,Bakery,Pharmacy,Grocery Store,Pet Store,Coffee Shop,Park,Music Venue,Café,Brewery,Middle Eastern Restaurant,Liquor Store,Supermarket
10,M5J,Downtown Toronto,1,Aquarium,Coffee Shop,Brewery,Hotel,Plaza,Pizza Place,Park,Café,Chinese Restaurant,Lounge,Sports Bar,Sporting Goods Shop


<b>Cluster 3</b>

In [36]:
dtt_merged.loc[dtt_merged['Cluster Labels'] == 2, dtt_merged.columns[[0] + [1] + list(range(5, dtt_merged.shape[1]))]]

Unnamed: 0,Postal Code,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue
4,M4E,East Toronto,2,Neighborhood,Pub,Health Food Store,Trail,Adult Boutique,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark


<b>Cluster 4</b>

In [37]:
dtt_merged.loc[dtt_merged['Cluster Labels'] == 3, dtt_merged.columns[[0] + [1] + list(range(5, dtt_merged.shape[1]))]]

Unnamed: 0,Postal Code,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue
33,M4W,Downtown Toronto,3,Park,Playground,Trail,Music Venue,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark


<b>Cluster 5</b>

In [38]:
dtt_merged.loc[dtt_merged['Cluster Labels'] == 4, dtt_merged.columns[[0] + [1] + list(range(5, dtt_merged.shape[1]))]]

Unnamed: 0,Postal Code,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue
29,M4T,Central Toronto,4,Tennis Court,Summer Camp,Market,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Movie Theater


<b>Cluster 6</b>

In [39]:
dtt_merged.loc[dtt_merged['Cluster Labels'] == 5, dtt_merged.columns[[0] + [1] + list(range(5, dtt_merged.shape[1]))]]

Unnamed: 0,Postal Code,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue
19,M5N,Central Toronto,5,Health & Beauty Service,Garden,Adult Boutique,Neighborhood,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark


<b>Cluster 7</b>

In [40]:
dtt_merged.loc[dtt_merged['Cluster Labels'] == 6, dtt_merged.columns[[0] + [1] + list(range(5, dtt_merged.shape[1]))]]

Unnamed: 0,Postal Code,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue
20,M4P,Central Toronto,6,Department Store,Pizza Place,Gym / Fitness Center,Breakfast Spot,Park,Sandwich Place,Hotel,Food & Drink Shop,Movie Theater,Music Venue,Museum,Adult Boutique


<b>Cluster 8</b>

In [41]:
dtt_merged.loc[dtt_merged['Cluster Labels'] == 7, dtt_merged.columns[[0] + [1] + list(range(5, dtt_merged.shape[1]))]]

Unnamed: 0,Postal Code,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue
18,M4N,Central Toronto,7,Park,Swim School,Bus Line,Business Service,Adult Boutique,Music Venue,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark


<b>Observations:</b> Cluster 1 has the largest number of postal codes. They resemble each other mainly in food and beverage venues and health/fitness venues (yoga, gym), with a small share of outdoor activities such as shopping and bookstores. It appears to be the more commercial cluster, as there is hardly any activity related to family. Cluster 8, by contrast, also centres on eateries and food and drink, but adds a few activities for every member of a family, such as grocery and other shopping, a baby store and a museum. Individuals can decide which area appeals to them on the basis of their interests.