# Segmenting and Clustering Neighborhoods in Toronto

This notebook is for week 3 of course 9 of the IBM Data Science Professional Certificate.

## I. Get the data:

Start by importing the necessary libraries:

In [1]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
from urllib import request, parse, error

Then download the Wikipedia page with urllib and parse it with BeautifulSoup:

In [2]:
link = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
html = request.urlopen(link).read()
soup = BeautifulSoup(html,'html.parser')

### 1. Process the raw data

The data is processed as follows:
+ The dataframe has three columns: PostalCode, Borough, and Neighborhood.
+ Only rows with an assigned borough are kept; rows whose borough is "Not assigned" are ignored.
+ There is one row per postal code. If more than one neighborhood shares a postal code, the neighborhoods are combined into a single comma-separated string.
+ If a row has a borough but no assigned neighborhood, the neighborhood is set to the borough name.
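The four rules can be sketched in pandas on a few made-up rows, separately from the scraping code. This is a minimal sketch under the assumption that the table is already in a dataframe; the helper `apply_rules` and the toy rows (including the `M1A` row) are hypothetical, not taken from the page.

```python
import pandas as pd

# Hypothetical rows standing in for the scraped table
rows = pd.DataFrame({
    "PostalCode": ["M1A", "M3A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "North York", "Downtown Toronto",
                "Downtown Toronto", "Queen's Park"],
    "Neighborhood": ["Not assigned", "Parkwoods", "Harbourfront",
                     "Regent Park", "Not assigned"],
})


def apply_rules(df):
    """Hypothetical helper applying the four processing rules to a raw table."""
    # Keep only rows whose borough is assigned
    df = df[df["Borough"] != "Not assigned"].copy()
    # An unassigned neighborhood takes the borough's name
    unassigned = df["Neighborhood"] == "Not assigned"
    df.loc[unassigned, "Neighborhood"] = df.loc[unassigned, "Borough"]
    # One row per postal code, with neighborhoods joined by commas
    return (df.groupby(["PostalCode", "Borough"], sort=False)["Neighborhood"]
              .agg(", ".join)
              .reset_index())


result = apply_rules(rows)
print(result)
```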

In [3]:
PostalCode = []
Borough = []
Neighborhood = []

# The postal codes are in the first table on the page
tables = soup('table')
tags = tables[0]('tr')

# Skip the header row; each remaining <tr> is one table entry
for tag in tags[1:]:
    stag = tag('td')

    # Get the postal code
    postcode = stag[0].string

    # Get the borough: use the link text if the cell has a link, else the plain text
    try:
        boro = stag[1]('a')[0].string
    except IndexError:
        boro = stag[1].string

    # Get the neighborhood the same way, stripping the trailing newline from plain text
    try:
        neighbor = stag[2]('a')[0].string
    except IndexError:
        neighbor = stag[2].string.strip()

    # If the neighborhood is not assigned, use the borough name
    if neighbor == 'Not assigned':
        neighbor = boro

    # If the borough is not assigned, skip the row; otherwise record it
    if boro == 'Not assigned':
        continue
    PostalCode.append(postcode)
    Borough.append(boro)
    Neighborhood.append(neighbor)



### 2. Convert data lists to dataframe

In [4]:
raw_data = pd.DataFrame({'PostalCode':PostalCode,'Borough':Borough,'Neighborhood':Neighborhood})

pd.set_option('display.max_columns', 20)
pd.set_option('display.width', 2000)

raw_data.head(10)

|   | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront |
| 3 | M5A | Downtown Toronto | Regent Park |
| 4 | M6A | North York | Lawrence Heights |
| 5 | M6A | North York | Lawrence Manor |
| 6 | M7A | Queen's Park | Queen's Park |
| 7 | M9A | Etobicoke | Islington Avenue |
| 8 | M1B | Scarborough | Rouge |
| 9 | M1B | Scarborough | Malvern |


### 3. Group raw data by PostalCode

In [5]:
data = raw_data.copy()  # copy so raw_data keeps one row per neighborhood
data['Neighborhood'] = data.groupby(['PostalCode'])['Neighborhood'].transform(lambda x: ', '.join(x))
data = data.drop_duplicates().reset_index(drop=True)
data.head(10)

|   | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront, Regent Park |
| 3 | M6A | North York | Lawrence Heights, Lawrence Manor |
| 4 | M7A | Queen's Park | Queen's Park |
| 5 | M9A | Etobicoke | Islington Avenue |
| 6 | M1B | Scarborough | Rouge, Malvern |
| 7 | M3B | North York | Don Mills North |
| 8 | M4B | East York | Woodbine Gardens, Parkview Hill |
| 9 | M5B | Downtown Toronto | Ryerson, Garden District |
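`transform` returns one value per input row, which is why the cell above needs `drop_duplicates` afterwards: every row of `M5A` receives the same joined string, and the repeats are then collapsed. A sketch on hypothetical toy rows comparing this with a direct `groupby(...).agg(...)`, which produces one row per group in a single step; both should give the same table.

```python
import pandas as pd

# Hypothetical stand-in for raw_data: one row per neighborhood
raw = pd.DataFrame({
    "PostalCode": ["M5A", "M5A", "M6A", "M6A", "M7A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York",
                "North York", "Queen's Park"],
    "Neighborhood": ["Harbourfront", "Regent Park", "Lawrence Heights",
                     "Lawrence Manor", "Queen's Park"],
})

# Notebook approach: transform gives every row of a postal code the same
# joined string, then drop_duplicates collapses the repeated rows.
via_transform = raw.copy()
via_transform["Neighborhood"] = (
    via_transform.groupby("PostalCode")["Neighborhood"].transform(lambda s: ", ".join(s))
)
via_transform = via_transform.drop_duplicates().reset_index(drop=True)

# Alternative: aggregate straight to one row per (PostalCode, Borough)
via_agg = (
    raw.groupby(["PostalCode", "Borough"], sort=False)["Neighborhood"]
       .agg(", ".join)
       .reset_index()
)

same = via_transform.values.tolist() == via_agg.values.tolist()
print(same)  # True
```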


And print out the shape:

In [6]:
data.shape

(103, 3)

## II. Get the latitude and longitude:

Get the geographical coordinates of each postal code from the provided CSV file:

In [7]:
geo_data = pd.read_csv('https://cocl.us/Geospatial_data',header='infer')
geo_data.head()

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


Merge each postal code's latitude and longitude into the grouped data:

In [8]:
data = data.merge(geo_data,how='left',left_on = 'PostalCode', right_on = 'Postal Code')
data.drop('Postal Code', axis = 1, inplace = True)
data.head()

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Harbourfront, Regent Park | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Heights, Lawrence Manor | 43.718518 | -79.464763 |
| 4 | M7A | Queen's Park | Queen's Park | 43.662301 | -79.389494 |
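Because the merge is a left join, a postal code missing from the CSV would keep its row with empty coordinates and raise no error. A minimal sketch, on hypothetical toy frames, of two standard `DataFrame.merge` options that make such gaps visible: `validate` checks that keys are unique on both sides, and `indicator` records where each row came from. The code `M9Z` is made up to force one miss.

```python
import pandas as pd

# Hypothetical stand-ins for `data` and `geo_data`
data_toy = pd.DataFrame({"PostalCode": ["M3A", "M4A", "M9Z"],
                         "Borough": ["North York", "North York", "Etobicoke"]})
geo_toy = pd.DataFrame({"Postal Code": ["M3A", "M4A"],
                        "Latitude": [43.753259, 43.725882],
                        "Longitude": [-79.329656, -79.315572]})

merged = data_toy.merge(
    geo_toy,
    how="left",
    left_on="PostalCode",
    right_on="Postal Code",
    validate="one_to_one",  # raises MergeError if either side has duplicate keys
    indicator=True,         # adds a '_merge' column: 'both' or 'left_only'
).drop(columns="Postal Code")

# Postal codes that found no coordinates in the CSV
unmatched = merged.loc[merged["_merge"] == "left_only", "PostalCode"].tolist()
print(unmatched)  # ['M9Z']
```

Once the check passes, the `_merge` column can be dropped like `Postal Code`.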





## III. Cluster data

Import the necessary libraries

In [9]:
!conda install -c conda-forge folium=0.5.0 --yes
import folium
print('Folium installed and imported!')

Solving environment: done

# All requested packages already installed.

Folium installed and imported!


In [10]:
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import json  # library to handle JSON files
import requests  # library to handle HTTP requests

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Libraries imported.


### 1. Create a map of all in-scope neighborhoods

In [11]:
address = 'Toronto, CA' 

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


In [12]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map

for lat, lng, borough, neighborhood in zip(data['Latitude'], data['Longitude'], data['Borough'], data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)
    
map_toronto

### 2. Use Foursquare to explore the neighborhoods

Define the Foursquare Credentials and Version

In [35]:
CLIENT_ID = 'YOUR_CLIENT_ID'  # real value removed after running the code and before uploading to GitHub
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'  # real value removed after running the code and before uploading to GitHub
VERSION = 'YYYYMMDD'  # API version date; real value removed after running the code and before uploading to GitHub

Define a function that retrieves the venues within a given radius of each neighborhood:

In [14]:
LIMIT = 100 # maximum number of venues returned by the Foursquare API per request
radius = 500 # search radius in meters

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
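`getNearbyVenues` builds the query string by hand and assumes every venue has at least one category (`v['venue']['categories'][0]`). A sketch of both steps under those assumptions, using only the standard library: `urllib.parse.urlencode` escapes each parameter, and the parser falls back to `None` if a venue ever comes back with an empty `categories` list. The helper names, the fake payload, and its second venue are hypothetical; the payload only mirrors the keys the function above reads.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Hypothetical helper: same parameters as getNearbyVenues, escaped by urlencode."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def parse_explore_items(name, lat, lng, payload):
    """Hypothetical helper: flatten one decoded explore response into row tuples."""
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Use None instead of raising IndexError when the category list is empty
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# Fake payload with the same nesting the notebook reads (values are illustrative)
fake_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Brookbanks Park",
               "location": {"lat": 43.751976, "lng": -79.33214},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Hypothetical Venue",
               "location": {"lat": 43.75, "lng": -79.33},
               "categories": []}},
]}]}}

rows = parse_explore_items("Parkwoods", 43.753259, -79.329656, fake_payload)
url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "YYYYMMDD",
                        43.753259, -79.329656)
print(rows)
print(url)
```

In the real loop, `requests.get(url).json()` would supply the payload.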

Run the function on every in-scope neighborhood and store the results in a dataframe called *toronto_venues*:

In [15]:
toronto_venues = getNearbyVenues(names = data['Neighborhood'],
                                   latitudes = data['Latitude'],
                                   longitudes = data['Longitude']
                                  )

Parkwoods
Victoria Village
Harbourfront, Regent Park
Lawrence Heights, Lawrence Manor
Queen's Park
Islington Avenue
Rouge, Malvern
Don Mills North
Woodbine Gardens, Parkview Hill
Ryerson, Garden District
Glencairn
Cloverdale, Islington, Martin Grove, Princess Gardens, West Deane Park
Highland Creek, Rouge Hill, Port Union
Flemingdon Park, Don Mills South
Woodbine Heights
St. James Town
Humewood-Cedarvale
Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Downsview North, Wilson Heights
Thorncliffe Park
Adelaide, King, Richmond
Dovercourt Village, Dufferin
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto
Harbourfront East, Toronto Islands, Union Station
Little Portugal, Trinity
East Birchmount Park, Ionview, Kennedy Park
Bayview Village
CFB Toronto, Downsview East
The D

In [16]:
print(toronto_venues.shape)
toronto_venues.head()

(2259, 7)


|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Parkwoods | 43.753259 | -79.329656 | Brookbanks Park | 43.751976 | -79.33214 | Park |
| 1 | Parkwoods | 43.753259 | -79.329656 | KFC | 43.754387 | -79.333021 | Fast Food Restaurant |
| 2 | Parkwoods | 43.753259 | -79.329656 | Variety Store | 43.751974 | -79.333114 | Food & Drink Shop |
| 3 | Victoria Village | 43.725882 | -79.315572 | Victoria Village Arena | 43.723481 | -79.315635 | Hockey Arena |
| 4 | Victoria Village | 43.725882 | -79.315572 | Tim Hortons | 43.725517 | -79.313103 | Coffee Shop |


Get the number of unique categories

In [17]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 274 unique categories.


In [18]:
toronto_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Adelaide, King, Richmond | 100 | 100 | 100 | 100 | 100 | 100 |
| Agincourt | 4 | 4 | 4 | 4 | 4 | 4 |
| Agincourt North, L'Amoreaux East, Milliken, Steeles East | 2 | 2 | 2 | 2 | 2 | 2 |
| Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown | 9 | 9 | 9 | 9 | 9 | 9 |
| Alderwood, Long Branch | 9 | 9 | 9 | 9 | 9 | 9 |
| Bathurst Manor, Downsview North, Wilson Heights | 19 | 19 | 19 | 19 | 19 | 19 |
| Bayview Village | 4 | 4 | 4 | 4 | 4 | 4 |
| Bedford Park, Lawrence Manor East | 23 | 23 | 23 | 23 | 23 | 23 |
| Berczy Park | 55 | 55 | 55 | 55 | 55 | 55 |
| Birch Cliff, Cliffside West | 4 | 4 | 4 | 4 | 4 | 4 |


***Notes:*** *Since 103 postal codes are in scope but only 100 neighborhoods returned venue data, the 3 postal codes with no venues will be placed together in a separate cluster (handled later).*
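One way to list which postal codes returned no venues is to check which neighborhood names never appear in the venue table. A sketch on hypothetical toy frames; it assumes, as the notebook's later join on `Neighborhood` does, that each postal code has a distinct neighborhood string.

```python
import pandas as pd

# Hypothetical stand-ins: one row per postal code, and one row per returned venue
data_toy = pd.DataFrame({"PostalCode": ["M3A", "M9A", "M4A"],
                         "Neighborhood": ["Parkwoods", "Islington Avenue", "Victoria Village"]})
venues_toy = pd.DataFrame({"Neighborhood": ["Parkwoods", "Parkwoods", "Victoria Village"],
                           "Venue": ["Brookbanks Park", "KFC", "Tim Hortons"]})

# Neighborhoods that never appear in the venue table got zero results
has_venues = data_toy["Neighborhood"].isin(venues_toy["Neighborhood"])
no_venues = data_toy.loc[~has_venues, "PostalCode"].tolist()
print(no_venues)  # ['M9A']
```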

### 3. Analyze and cluster the neighborhoods


Analyze the neighborhoods

In [19]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

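# add neighborhood column back to dataframe
# (the shapes below indicate that one venue category is itself named 'Neighborhood':
# 274 categories but still 274 columns after this line, so this assignment
# overwrites that one-hot column instead of appending a new one)
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move the last column to the front; because of the overwrite above, the last
# column is 'Yoga Studio' rather than 'Neighborhood', as the output below shows
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]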

print(toronto_onehot.shape)
toronto_onehot.head()

(2259, 274)


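|   | Yoga Studio | Accessories Store | Afghan Restaurant | Airport | Airport Food Court | Airport Lounge | Airport Service | Airport Terminal | American Restaurant | Antique Shop | ... | Train Station | Transportation Service | Vegetarian / Vegan Restaurant | Video Game Store | Video Store | Vietnamese Restaurant | Warehouse Store | Wine Bar | Wings Joint | Women's Store |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |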
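The 274 categories and 274 columns above show that the assignment to `toronto_onehot['Neighborhood']` did not add a column: a venue category with that exact name already existed, so its one-hot column was overwritten, and `columns[-1]` picked up `Yoga Studio`. A sketch with hypothetical toy venues that reproduces the collision and shows one way to avoid it: rename the clashing category column first, then use `DataFrame.insert`, which raises a `ValueError` instead of silently overwriting if the name is still taken. The new column name `Neighborhood (category)` is an arbitrary choice.

```python
import pandas as pd

# Hypothetical venues; one has the category "Neighborhood"
venues = pd.DataFrame({
    "Neighborhood": ["Parkwoods", "Parkwoods", "Agincourt"],
    "Venue Category": ["Park", "Neighborhood", "Lounge"],
})

# Naive version (as in the cell above): the assignment overwrites the category column
naive = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="")
naive["Neighborhood"] = venues["Neighborhood"]
print(naive.shape[1], list(naive.columns)[-1])  # still 3 columns; last is 'Park'

# Safer version: rename the clashing category, then insert the label column at the front
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="")
onehot = onehot.rename(columns={"Neighborhood": "Neighborhood (category)"})
onehot.insert(0, "Neighborhood", venues["Neighborhood"])
print(list(onehot.columns))
```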


Group the rows by neighborhood and take the mean of each category column; the result is the relative frequency of each venue category in each neighborhood

In [20]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
print(toronto_grouped.shape)
toronto_grouped.head()

(100, 274)


|   | Neighborhood | Yoga Studio | Accessories Store | Afghan Restaurant | Airport | Airport Food Court | Airport Lounge | Airport Service | Airport Terminal | American Restaurant | ... | Train Station | Transportation Service | Vegetarian / Vegan Restaurant | Video Game Store | Video Store | Vietnamese Restaurant | Warehouse Store | Wine Bar | Wings Joint | Women's Store |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Adelaide, King, Richmond | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.03 | ... | 0.0 | 0.0 | 0.02 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 |
| 1 | Agincourt | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Agincourt North, L'Amoreaux East, Milliken, St... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Albion Gardens, Beaumond Heights, Humbergate, ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Alderwood, Long Branch | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
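Each one-hot column holds 0 or 1, so its mean within a neighborhood is the share of that neighborhood's venues in the category. For example, "Adelaide, King, Richmond" returned 100 venues, so its American Restaurant value of 0.03 corresponds to 3 venues. A sketch on a hypothetical toy table showing that each row of frequencies sums to 1 and that multiplying by the venue count recovers the counts.

```python
import pandas as pd

# Hypothetical one-hot table: 4 venues in "Hood A", 2 in "Hood B"
onehot = pd.DataFrame({
    "Neighborhood": ["Hood A"] * 4 + ["Hood B"] * 2,
    "Lounge":       [1, 0, 0, 0, 0, 0],
    "Park":         [0, 1, 1, 0, 1, 1],
    "Skating Rink": [0, 0, 0, 1, 0, 0],
})

# Mean of a 0/1 column within a group = share of that group's venues in the category
freq = onehot.groupby("Neighborhood").mean()
print(freq)

# Every venue has exactly one category, so each row of frequencies sums to 1
row_sums = freq.sum(axis=1)

# Frequency times the number of venues gives back the category counts
venue_counts = onehot.groupby("Neighborhood").size()
recovered = freq.mul(venue_counts, axis=0)
print(recovered)
```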


Print each neighborhood along with the top 5 most common venues

In [21]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
          venue  freq
0   Coffee Shop  0.08
1          Café  0.05
2    Steakhouse  0.04
3           Bar  0.04
4  Burger Joint  0.03


----Agincourt----
            venue  freq
0    Skating Rink  0.25
1          Lounge  0.25
2  Sandwich Place  0.25
3  Breakfast Spot  0.25
4     Men's Store  0.00


----Agincourt North, L'Amoreaux East, Milliken, Steeles East----
                venue  freq
0          Playground   0.5
1                Park   0.5
2  Miscellaneous Shop   0.0
3       Moving Target   0.0
4       Movie Theater   0.0


----Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown----
                  venue  freq
0         Grocery Store  0.22
1  Fast Food Restaurant  0.11
2   Fried Chicken Joint  0.11
3           Coffee Shop  0.11
4            Beer Store  0.11


----Alderwood, Long Branch----
            venue  freq
0     Pizza Place  0.22
1             Gym  0.11
2    Skating Rink  0.11
3  Sand

Define a function that returns a neighborhood's top venue categories, sorted by frequency in descending order

In [22]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
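When several categories share the same frequency, their relative order depends on the sort rather than on the data. `sort_values` uses quicksort by default, which is not stable, and the Agincourt output above has four categories tied at 0.25. A sketch of a deterministic alternative that breaks ties alphabetically; `top_categories` is a hypothetical helper that takes the row without its `Neighborhood` entry and returns a list rather than an array.

```python
import pandas as pd


def top_categories(row, n):
    """Hypothetical helper: n most frequent categories, ties broken alphabetically.

    `row` is a Series indexed by category name (Neighborhood entry already removed).
    """
    ranked = sorted(row.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:n]]


# Toy row echoing the Agincourt frequencies printed above
row = pd.Series({"Skating Rink": 0.25, "Lounge": 0.25, "Sandwich Place": 0.25,
                 "Breakfast Spot": 0.25, "Men's Store": 0.0})
print(top_categories(row, 5))
```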

Create the new dataframe and display the top 5 venues for each neighborhood

In [23]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

print(neighborhoods_venues_sorted.shape)
neighborhoods_venues_sorted.head()

(100, 6)


|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 0 | Adelaide, King, Richmond | Coffee Shop | Café | Steakhouse | Bar | Cosmetics Shop |
| 1 | Agincourt | Lounge | Breakfast Spot | Sandwich Place | Skating Rink | Women's Store |
| 2 | Agincourt North, L'Amoreaux East, Milliken, St... | Playground | Park | Women's Store | Eastern European Restaurant | Discount Store |
| 3 | Albion Gardens, Beaumond Heights, Humbergate, ... | Grocery Store | Pizza Place | Fried Chicken Joint | Coffee Shop | Sandwich Place |
| 4 | Alderwood, Long Branch | Pizza Place | Skating Rink | Sandwich Place | Pub | Coffee Shop |


Cluster the neighborhoods

Run k-means to group the 100 neighborhoods that have venue data into 5 clusters

In [24]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 2, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
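`KMeans` is built on the standard k-means iteration: assign each row to its nearest center, move each center to the mean of its rows, and repeat until the assignments stop changing. A NumPy-only sketch of that loop (Lloyd's algorithm) on hypothetical two-category frequency rows. Unlike scikit-learn it uses fixed starting centers, whereas `KMeans` picks them with k-means++ by default and may run several initializations depending on `n_init`. The returned `inertia` is the within-cluster sum of squared distances that `KMeans` reports as `inertia_`.

```python
import numpy as np


def lloyd_kmeans(X, init_centers, n_iter=100):
    """Plain Lloyd's algorithm with given starting centers (hypothetical helper)."""
    centers = np.asarray(init_centers, dtype=float).copy()
    labels = None
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center, shape (n_rows, k)
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments unchanged: converged
        labels = new_labels
        for k in range(len(centers)):
            members = X[labels == k]
            if len(members):  # leave an empty cluster's center where it is
                centers[k] = members.mean(axis=0)
    inertia = ((X - centers[labels]) ** 2).sum()
    return labels, centers, inertia


# Hypothetical rows: two park-heavy and two cafe-heavy neighborhoods
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
labels, centers, inertia = lloyd_kmeans(X, init_centers=X[[0, 2]])
print(labels)  # [0 0 1 1]
```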

Create a new dataframe that includes the cluster label as well as the top 5 venues for each neighborhood

In [25]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = data

# join the cluster labels and top venues onto data, which already has latitude/longitude for each postal code
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head(10)

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 | 2.0 | Food & Drink Shop | Fast Food Restaurant | Park | Electronics Store | Doner Restaurant |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | 0.0 | Pizza Place | Hockey Arena | Portuguese Restaurant | Coffee Shop | Intersection |
| 2 | M5A | Downtown Toronto | Harbourfront, Regent Park | 43.65426 | -79.360636 | 0.0 | Coffee Shop | Pub | Bakery | Park | Café |
| 3 | M6A | North York | Lawrence Heights, Lawrence Manor | 43.718518 | -79.464763 | 0.0 | Furniture / Home Store | Women's Store | Accessories Store | Vietnamese Restaurant | Boutique |
| 4 | M7A | Queen's Park | Queen's Park | 43.662301 | -79.389494 | 0.0 | Coffee Shop | Park | Gym | Diner | Yoga Studio |
| 5 | M9A | Etobicoke | Islington Avenue | 43.667856 | -79.532242 |  |  |  |  |  |  |
| 6 | M1B | Scarborough | Rouge, Malvern | 43.806686 | -79.194353 | 3.0 | Fast Food Restaurant | Women's Store | Electronics Store | Dog Run | Doner Restaurant |
| 7 | M3B | North York | Don Mills North | 43.745906 | -79.352188 | 0.0 | Caribbean Restaurant | Japanese Restaurant | Gym / Fitness Center | Café | Baseball Field |
| 8 | M4B | East York | Woodbine Gardens, Parkview Hill | 43.706397 | -79.309937 | 0.0 | Pizza Place | Fast Food Restaurant | Gastropub | Pharmacy | Bank |
| 9 | M5B | Downtown Toronto | Ryerson, Garden District | 43.657162 | -79.378937 | 0.0 | Clothing Store | Coffee Shop | Café | Middle Eastern Restaurant | Cosmetics Shop |


Assign the postal codes with no venue data to their own cluster, label 5

In [26]:
toronto_merged['Cluster Labels'] = toronto_merged['Cluster Labels'].fillna(5)
toronto_merged.head(10)

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 | 2.0 | Food & Drink Shop | Fast Food Restaurant | Park | Electronics Store | Doner Restaurant |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | 0.0 | Pizza Place | Hockey Arena | Portuguese Restaurant | Coffee Shop | Intersection |
| 2 | M5A | Downtown Toronto | Harbourfront, Regent Park | 43.65426 | -79.360636 | 0.0 | Coffee Shop | Pub | Bakery | Park | Café |
| 3 | M6A | North York | Lawrence Heights, Lawrence Manor | 43.718518 | -79.464763 | 0.0 | Furniture / Home Store | Women's Store | Accessories Store | Vietnamese Restaurant | Boutique |
| 4 | M7A | Queen's Park | Queen's Park | 43.662301 | -79.389494 | 0.0 | Coffee Shop | Park | Gym | Diner | Yoga Studio |
| 5 | M9A | Etobicoke | Islington Avenue | 43.667856 | -79.532242 | 5.0 |  |  |  |  |  |
| 6 | M1B | Scarborough | Rouge, Malvern | 43.806686 | -79.194353 | 3.0 | Fast Food Restaurant | Women's Store | Electronics Store | Dog Run | Doner Restaurant |
| 7 | M3B | North York | Don Mills North | 43.745906 | -79.352188 | 0.0 | Caribbean Restaurant | Japanese Restaurant | Gym / Fitness Center | Café | Baseball Field |
| 8 | M4B | East York | Woodbine Gardens, Parkview Hill | 43.706397 | -79.309937 | 0.0 | Pizza Place | Fast Food Restaurant | Gastropub | Pharmacy | Bank |
| 9 | M5B | Downtown Toronto | Ryerson, Garden District | 43.657162 | -79.378937 | 0.0 | Clothing Store | Coffee Shop | Café | Middle Eastern Restaurant | Cosmetics Shop |
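The labels are floats (`2.0`, `5.0`) because the join left missing values for the postal codes without venues, and a NumPy integer column cannot hold NaN. A sketch on a hypothetical toy series that fills the gaps with `kclusters` instead of a hard-coded 5 and casts back to integers. Assigning the result back to the column, rather than calling `fillna(..., inplace=True)` on a selected column, also keeps working when pandas copy-on-write is enabled.

```python
import numpy as np
import pandas as pd

kclusters = 5
# Hypothetical labels after the join: NaN marks a postal code with no venue data
labels = pd.Series([2.0, 0.0, np.nan, 3.0], name="Cluster Labels")

# The extra "no venues" cluster gets the next free label, then cast back to int
labels = labels.fillna(kclusters).astype(int)
print(labels.tolist())  # [2, 0, 5, 3]
```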


Visualize the resulting clusters

In [27]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme: one color per cluster, including the extra "no venues" cluster (label 5)
n_colors = kclusters + 1
colors_array = cm.rainbow(np.linspace(0, 1, n_colors))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, colored by cluster label
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=10,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### 4. Examine the clusters

#### Cluster 1

In [28]:
cluster1 = toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[0,1,2,3] + list(range(5, toronto_merged.shape[1]))]]
print("There are",cluster1.shape[0],"postal codes in Cluster 1")
cluster1

There are 85 postal codes in Cluster 1


|   | PostalCode | Borough | Neighborhood | Latitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | M4A | North York | Victoria Village | 43.725882 | 0.0 | Pizza Place | Hockey Arena | Portuguese Restaurant | Coffee Shop | Intersection |
| 2 | M5A | Downtown Toronto | Harbourfront, Regent Park | 43.654260 | 0.0 | Coffee Shop | Pub | Bakery | Park | Café |
| 3 | M6A | North York | Lawrence Heights, Lawrence Manor | 43.718518 | 0.0 | Furniture / Home Store | Women's Store | Accessories Store | Vietnamese Restaurant | Boutique |
| 4 | M7A | Queen's Park | Queen's Park | 43.662301 | 0.0 | Coffee Shop | Park | Gym | Diner | Yoga Studio |
| 7 | M3B | North York | Don Mills North | 43.745906 | 0.0 | Caribbean Restaurant | Japanese Restaurant | Gym / Fitness Center | Café | Baseball Field |
| 8 | M4B | East York | Woodbine Gardens, Parkview Hill | 43.706397 | 0.0 | Pizza Place | Fast Food Restaurant | Gastropub | Pharmacy | Bank |
| 9 | M5B | Downtown Toronto | Ryerson, Garden District | 43.657162 | 0.0 | Clothing Store | Coffee Shop | Café | Middle Eastern Restaurant | Cosmetics Shop |
| 10 | M6B | North York | Glencairn | 43.709577 | 0.0 | Pub | Japanese Restaurant | Park | Asian Restaurant | Women's Store |
| 11 | M9B | Etobicoke | Cloverdale, Islington, Martin Grove, Princess ... | 43.650943 | 0.0 | Bank | Gift Shop | Women's Store | Discount Store | Doner Restaurant |
| 13 | M3C | North York | Flemingdon Park, Don Mills South | 43.725900 | 0.0 | Gym | Coffee Shop | Asian Restaurant | Beer Store | Fast Food Restaurant |


#### Cluster 2

In [29]:
cluster2 = toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[0,1,2,3] + list(range(5, toronto_merged.shape[1]))]]
print("There are",cluster2.shape[0],"postal codes in Cluster 2")
cluster2

There are 2 postal codes in Cluster 2


|   | PostalCode | Borough | Neighborhood | Latitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|
| 32 | M1J | Scarborough | Scarborough Village | 43.744734 | 1.0 | Playground | Women's Store | Electronics Store | Dog Run | Doner Restaurant |
| 83 | M4T | Central Toronto | Moore Park, Summerhill East | 43.689574 | 1.0 | Playground | Women's Store | Electronics Store | Dog Run | Doner Restaurant |


#### Cluster 3

In [30]:
cluster3 = toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[0,1,2,3] + list(range(5, toronto_merged.shape[1]))]]
print("There are",cluster3.shape[0],"postal codes in Cluster 3")
cluster3

There are 10 postal codes in Cluster 3


|   | PostalCode | Borough | Neighborhood | Latitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | 2.0 | Food & Drink Shop | Fast Food Restaurant | Park | Electronics Store | Doner Restaurant |
| 21 | M6E | York | Caledonia-Fairbanks | 43.689026 | 2.0 | Park | Women's Store | Fast Food Restaurant | Market | Convenience Store |
| 35 | M4J | East York | East Toronto | 43.685347 | 2.0 | Park | Convenience Store | Rental Car Location | Women's Store | Eastern European Restaurant |
| 40 | M3K | North York | CFB Toronto, Downsview East | 43.737473 | 2.0 | Playground | Airport | Park | Women's Store | Electronics Store |
| 61 | M4N | Central Toronto | Lawrence Park | 43.72802 | 2.0 | Park | Swim School | Bus Line | Women's Store | Eastern European Restaurant |
| 64 | M9N | York | Weston | 43.706876 | 2.0 | Park | Convenience Store | Women's Store | Empanada Restaurant | Doner Restaurant |
| 66 | M2P | North York | York Mills West | 43.752758 | 2.0 | Park | Convenience Store | Bank | Women's Store | Empanada Restaurant |
| 77 | M9R | Etobicoke | Kingsview Village, Martin Grove Gardens, Richv... | 43.688905 | 2.0 | Pizza Place | Park | Bus Line | Eastern European Restaurant | Dog Run |
| 85 | M1V | Scarborough | Agincourt North, L'Amoreaux East, Milliken, St... | 43.815252 | 2.0 | Playground | Park | Women's Store | Eastern European Restaurant | Discount Store |
| 91 | M4W | Downtown Toronto | Rosedale | 43.679563 | 2.0 | Park | Building | Playground | Trail | Women's Store |


#### Cluster 4

In [31]:
cluster4 = toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[0,1,2,3] + list(range(5, toronto_merged.shape[1]))]]
print("There are",cluster4.shape[0],"postal codes in Cluster 4")
cluster4

There are 1 postal codes in Cluster 4


|   | PostalCode | Borough | Neighborhood | Latitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|
| 6 | M1B | Scarborough | Rouge, Malvern | 43.806686 | 3.0 | Fast Food Restaurant | Women's Store | Electronics Store | Dog Run | Doner Restaurant |


#### Cluster 5

In [32]:
cluster5 = toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[0,1,2,3] + list(range(5, toronto_merged.shape[1]))]]
print("There are",cluster5.shape[0],"postal codes in Cluster 5")
cluster5

There are 2 postal codes in Cluster 5


|   | PostalCode | Borough | Neighborhood | Latitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|
| 12 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union | 43.784535 | 4.0 | Bar | Women's Store | Electronics Store | Doner Restaurant | Donut Shop |
| 56 | M6M | York | Del Ray, Keelesdale, Mount Dennis, Silverthorn | 43.691116 | 4.0 | Discount Store | Restaurant | Sandwich Place | Bar | Diner |


#### Cluster 6

In [33]:
cluster6 = toronto_merged.loc[toronto_merged['Cluster Labels'] == 5, toronto_merged.columns[[0,1,2,3] + list(range(5, toronto_merged.shape[1]))]]
print("There are",cluster6.shape[0],"postal codes in Cluster 6")
cluster6

There are 3 postal codes in Cluster 6


|   | PostalCode | Borough | Neighborhood | Latitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | M9A | Etobicoke | Islington Avenue | 43.667856 | 5.0 |  |  |  |  |  |
| 52 | M2M | North York | Newtonbrook, Willowdale | 43.789053 | 5.0 |  |  |  |  |  |
| 95 | M1X | Scarborough | Upper Rouge | 43.836125 | 5.0 |  |  |  |  |  |
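The six per-cluster cells filter `toronto_merged` one label at a time. When only the cluster sizes or members are needed, a single `value_counts` or `groupby` pass gives the same information; the counts printed above (85, 2, 10, 1, 2, 3) sum to the 103 postal codes. A sketch on a hypothetical toy frame, assuming integer labels (float labels such as `2.0` would print as `Cluster 3.0`).

```python
import pandas as pd

# Hypothetical stand-in for toronto_merged, labels already filled and cast to int
merged = pd.DataFrame({
    "PostalCode": ["M3A", "M4A", "M5A", "M9A", "M1B"],
    "Cluster Labels": [2, 0, 0, 5, 3],
})

# Cluster sizes in one call, shown 1-based to match the headings above
sizes = merged["Cluster Labels"].value_counts().sort_index()
sizes.index = [f"Cluster {label + 1}" for label in sizes.index]
print(sizes)

# Iterating over groupby yields each cluster's rows, like the per-cluster tables
for label, rows in merged.groupby("Cluster Labels"):
    print(f"Cluster {label + 1}: {', '.join(rows['PostalCode'])}")
```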


<h3> <center> This is the end of the project </center> </h3>