# Segmenting and Clustering Neighbourhoods in the City of Toronto, Canada


This assignment segments and clusters the neighbourhoods of Toronto, Canada.

The assignment has three parts, all done in this one notebook. Markdown cells mark the start of each part.

# PART 1

### Importing Necessary Libraries and Packages

In [2]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation


!pip install geopy
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# transforming a JSON file into a pandas DataFrame
from pandas.io.json import json_normalize


! pip install folium==0.5.0
import folium # plotting library

from bs4 import BeautifulSoup

!pip install lxml

!pip install et_xmlfile
import pandas.io.json

print('Folium installed')
print('Libraries imported.')


### Reading the Wikipedia page to get the list of Toronto Neighbourhoods and Boroughs

In [3]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'


In [4]:
wiki_page=pd.read_html(url)

### Converting the Data into a Pandas DataFrame

In [6]:
raw_table=pd.DataFrame(wiki_page[0])

In [7]:
raw_table.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [8]:
raw_table.shape

(180, 3)

### Deleting the rows where the Borough is not assigned

In [9]:
df=raw_table[raw_table.Borough != 'Not assigned']

In [10]:
df.shape

(103, 3)

In [11]:
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [12]:
unique_size=df['Postal Code'].unique().shape
unique_size

(103,)

### If a row has a borough but its neighbourhood is 'Not assigned', the neighbourhood is set to the borough name

In [13]:
df = df.assign(Neighbourhood=df['Neighbourhood'].mask(df['Neighbourhood'] == 'Not assigned', df['Borough']))  # keep the result
df['Neighbourhood']

2                                              Parkwoods
3                                       Victoria Village
4                              Regent Park, Harbourfront
5                       Lawrence Manor, Lawrence Heights
6            Queen's Park, Ontario Provincial Government
                             ...                        
160        The Kingsway, Montgomery Road, Old Mill North
165                                 Church and Wellesley
168    Business reply mail Processing Centre, South C...
169    Old Mill South, King's Mill Park, Sunnylea, Hu...
178    Mimico NW, The Queensway West, South of Bloor,...
Name: Neighbourhood, Length: 103, dtype: object
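
The rule in this step can be checked on a small example. Methods such as `Series.replace` and `Series.mask` return a new Series, so the result has to be assigned back for the change to persist. The sketch below uses a toy frame whose rows are invented for illustration.

```python
import pandas as pd

# Toy data, invented for illustration: one row has a borough but no neighbourhood.
toy = pd.DataFrame({
    'Postal Code': ['M3A', 'M7A'],
    'Borough': ['North York', "Queen's Park"],
    'Neighbourhood': ['Parkwoods', 'Not assigned'],
})

# Where the neighbourhood is 'Not assigned', take the borough name instead,
# and assign the result back so the change is kept.
missing = toy['Neighbourhood'] == 'Not assigned'
toy['Neighbourhood'] = toy['Neighbourhood'].mask(missing, toy['Borough'])

print(toy['Neighbourhood'].tolist())
```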

In [14]:
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [15]:
df.shape

(103, 3)

# PART 2

### Getting the Latitude and Longitude values from the given csv file

In [16]:
geo_df=pd.read_csv('http://cocl.us/Geospatial_data')

In [17]:
geo_df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


#### Now we have two DataFrames: one with the Borough and Neighbourhood names, and a second one with the Latitude and Longitude values. We merge these two tables on the postal code.

In [18]:
result_df = pd.merge(df, geo_df, how='left', on='Postal Code')

In [19]:
result_df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


In [20]:
result_df.shape

(103, 5)
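
A left merge keeps every row of the left table even when no coordinates are found for it, so a matching row count does not by itself show that every postal code was matched. As a sketch, the `validate` and `indicator` options of `pd.merge` can make that check explicit. The two small frames below are stand-ins with partly invented content, not the real `df` and `geo_df`.

```python
import pandas as pd

# Toy stand-ins for df and geo_df; 'M5A' deliberately has no coordinates here.
boroughs = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M5A'],
    'Borough': ['North York', 'North York', 'Downtown Toronto'],
})
coords = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# validate raises MergeError if a postal code repeats on either side;
# indicator adds a _merge column saying where each row was found.
merged = pd.merge(boroughs, coords, how='left', on='Postal Code',
                  validate='one_to_one', indicator=True)

unmatched = merged.loc[merged['_merge'] == 'left_only', 'Postal Code'].tolist()
print(unmatched)
```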

# PART 3

### Entering the FourSquare Credentials

In [21]:
CLIENT_ID = 'ED4EQFEEL3JP4GJCED1WE03XIFPWLWA0FZ5S4XU0YAFAOH2C' 
CLIENT_SECRET = '3BZW24VJPXXFUVGTB10RYR4V4HSDEVBMZ45UGOHE0GHEUFIO' 
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: ED4EQFEEL3JP4GJCED1WE03XIFPWLWA0FZ5S4XU0YAFAOH2C
CLIENT_SECRET:3BZW24VJPXXFUVGTB10RYR4V4HSDEVBMZ45UGOHE0GHEUFIO
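
Credentials that are typed into a cell and printed become visible to anyone who can read the notebook. One common alternative, sketched here, is to read them from environment variables and mask them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical, chosen only for this example.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) from environment variables.

    The variable names are hypothetical; use whatever your setup defines.
    """
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Missing environment variable: {}'.format(missing)) from None

def masked(secret, visible=4):
    """Show only the first few characters of a secret when printing."""
    return secret[:visible] + '*' * (len(secret) - visible)

# Usage with a stand-in mapping instead of the real environment.
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id',
            'FOURSQUARE_CLIENT_SECRET': 'demo-secret'}
client_id, client_secret = load_foursquare_credentials(demo_env)
print('CLIENT_SECRET: ' + masked(client_secret))
```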


### Getting the coordinates of Toronto from geolocator

In [22]:
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


### Creating map of Toronto using latitude and longitude values

In [23]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

### Adding markers to map

In [24]:
for lat, lng, borough, neighborhood in zip(result_df['Latitude'], result_df['Longitude'], result_df['Borough'], result_df['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

# Explore the neighbourhoods in Toronto


In [25]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
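
The function above indexes straight into the response, so an empty result or a venue without a category would raise an exception. The sketch below separates the parsing step into a hypothetical helper, `extract_venue_rows`, that tolerates those cases. It assumes only the nesting already used above (`response`, `groups[0]`, `items`, `venue`), and the payload is hand-written for illustration, not a real API response.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Turn one explore-style response into rows for the venues table.

    Assumes the nesting used in getNearbyVenues
    (response -> groups[0] -> items -> venue). Missing groups yield no rows,
    and a venue without categories gets None as its category.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows

# Hand-written payload for illustration; not a real API response.
fake_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Brookbanks Park',
               'location': {'lat': 43.751976, 'lng': -79.33214},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Unnamed Spot',
               'location': {'lat': 43.75, 'lng': -79.33},
               'categories': []}},
]}]}}

rows = extract_venue_rows('Parkwoods', 43.753259, -79.329656, fake_payload)
print(rows[0][3], rows[0][6], rows[1][6])
print(extract_venue_rows('Nowhere', 0.0, 0.0, {'response': {}}))
```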

In [26]:
toronto_venues = getNearbyVenues(names=result_df['Neighbourhood'],
                                   latitudes=result_df['Latitude'],
                                   longitudes=result_df['Longitude']
                                  )

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmo

In [27]:
print(toronto_venues.shape)
toronto_venues.head()

(1320, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant
4,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop


In [28]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,5,5,5,5,5,5
"Alderwood, Long Branch",7,7,7,7,7,7
"Bathurst Manor, Wilson Heights, Downsview North",21,21,21,21,21,21
Bayview Village,4,4,4,4,4,4
"Bedford Park, Lawrence Manor East",22,22,22,22,22,22
...,...,...,...,...,...,...
"Willowdale, Willowdale West",5,5,5,5,5,5
Woburn,4,4,4,4,4,4
Woodbine Heights,8,8,8,8,8,8
York Mills West,2,2,2,2,2,2


In [29]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 237 unique categories.


### Analysing each Neighbourhood

In [30]:
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 
toronto_onehot.head()

Unnamed: 0,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### Merging the two DataFrames

In [31]:
df = pd.merge(toronto_onehot, result_df, how='left', left_on='Neighborhood',right_on='Neighbourhood')
df.head()

Unnamed: 0,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,M3A,North York,Parkwoods,43.753259,-79.329656
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,M3A,North York,Parkwoods,43.753259,-79.329656
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,M4A,North York,Victoria Village,43.725882,-79.315572
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,M4A,North York,Victoria Village,43.725882,-79.315572
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,M4A,North York,Victoria Village,43.725882,-79.315572


In [32]:
df.shape

(1391, 242)
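
The merged frame has 1391 rows, while `toronto_venues` had 1320. A left merge only gains rows when a key matches more than one row on the right side, so one possible cause is that some neighbourhood names occur under more than one postal code in `result_df` ("Don Mills" is printed twice in the venue-collection output). The sketch below reproduces the effect on invented data and shows one way to avoid it.

```python
import pandas as pd

# Invented data: 'Don Mills' is listed under two postal codes on the right side.
venues = pd.DataFrame({'Neighborhood': ['Don Mills', 'Don Mills', 'Parkwoods'],
                       'Venue': ['Cafe A', 'Gym B', 'Park C']})
areas = pd.DataFrame({'Neighbourhood': ['Don Mills', 'Don Mills', 'Parkwoods'],
                      'Postal Code': ['M3B', 'M3C', 'M3A']})

merged = pd.merge(venues, areas, how='left',
                  left_on='Neighborhood', right_on='Neighbourhood')
print(len(venues), len(merged))  # each Don Mills venue is matched twice

# Keeping one row per neighbourhood name on the right side preserves the row count.
unique_areas = areas.drop_duplicates(subset='Neighbourhood')
merged_unique = pd.merge(venues, unique_areas, how='left',
                         left_on='Neighborhood', right_on='Neighbourhood')
print(len(merged_unique))
```

Whether dropping duplicates or merging on the postal code is the better fix depends on whether the duplicated names should be treated as one area or as two.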

### Cleaning the DataFrame to keep only the necessary columns for analysis

In [33]:
df=df.drop(columns ='Postal Code')

In [34]:
fixed_columns= ['Neighbourhood'] + list(df.columns[:-4])
print(fixed_columns)
toronto_data=df[fixed_columns]

['Neighbourhood', 'Accessories Store', 'Airport', 'Airport Food Court', 'Airport Gate', 'Airport Lounge', 'Airport Service', 'Airport Terminal', 'American Restaurant', 'Antique Shop', 'Aquarium', 'Art Gallery', 'Art Museum', 'Arts & Crafts Store', 'Asian Restaurant', 'Athletics & Sports', 'Auto Garage', 'Auto Workshop', 'BBQ Joint', 'Baby Store', 'Bagel Shop', 'Bakery', 'Bank', 'Bar', 'Baseball Field', 'Basketball Stadium', 'Beer Bar', 'Beer Store', 'Belgian Restaurant', 'Bike Shop', 'Bistro', 'Boat or Ferry', 'Bookstore', 'Boutique', 'Breakfast Spot', 'Brewery', 'Bridal Shop', 'Bubble Tea Shop', 'Burger Joint', 'Burrito Place', 'Bus Line', 'Bus Station', 'Business Service', 'Butcher', 'Café', 'Cajun / Creole Restaurant', 'Camera Store', 'Candy Store', 'Caribbean Restaurant', 'Cheese Shop', 'Chinese Restaurant', 'Chocolate Shop', 'Climbing Gym', 'Clothing Store', 'Cocktail Bar', 'Coffee Shop', 'College Arts Building', 'College Auditorium', 'College Cafeteria', 'College Gym', 'College S

In [35]:
toronto_data.head()

Unnamed: 0,Neighbourhood,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [36]:
toronto_data.shape

(1391, 238)

### Grouping rows by neighbourhood and taking the mean frequency of occurrence of each category


In [47]:
toronto_grouped = toronto_data.groupby('Neighbourhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighbourhood,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
91,"Willowdale, Willowdale West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
92,Woburn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
93,Woodbine Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
94,York Mills West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [48]:
toronto_grouped.shape

(96, 237)
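
Taking the mean of one-hot columns within a group gives the share of that group's venues falling in each category. A minimal sketch on invented data makes the arithmetic visible: neighbourhood `A` has four venues, two of them parks, so its `Park` value is 0.5.

```python
import pandas as pd

# Invented venues for two neighbourhoods.
venues = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Park', 'Park', 'Coffee Shop', 'Bank', 'Park', 'Bank'],
})

# One indicator column per category; dtype=int keeps the columns numeric.
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighbourhood', venues['Neighbourhood'])

# The mean of 0/1 indicators is the share of venues in each category.
grouped = onehot.groupby('Neighbourhood').mean().reset_index()
print(grouped)
```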

#### Writing a function to sort the venues in descending order.

In [49]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
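
To see what the function returns, it can be applied to a single hand-made row. The frequencies below are invented and contain no ties, so the order is unambiguous.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Same logic as above: skip the name in position 0, sort the rest.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Invented frequencies for one neighbourhood.
row = pd.Series({'Neighbourhood': 'Example', 'Park': 0.5,
                 'Coffee Shop': 0.3, 'Bank': 0.2})
top_two = return_most_common_venues(row, 2)
print(list(top_two))
```

Categories with a frequency of 0 are ranked as well, so a neighbourhood with fewer distinct categories than `num_top_venues` has the rest of its list filled with categories it does not contain.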

### Creating the new DataFrame and displaying the top 10 venues for each neighbourhood

In [50]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Lounge,Latin American Restaurant,Breakfast Spot,Skating Rink,Clothing Store,Dessert Shop,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
1,"Alderwood, Long Branch",Pizza Place,Sandwich Place,Coffee Shop,Pub,Pharmacy,Gym,General Entertainment,Cuban Restaurant,Dog Run,Distribution Center
2,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Bank,Pharmacy,Chinese Restaurant,Shopping Mall,Sandwich Place,Diner,Restaurant,Deli / Bodega,Supermarket
3,Bayview Village,Café,Bank,Chinese Restaurant,Japanese Restaurant,Yoga Studio,Escape Room,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
4,"Bedford Park, Lawrence Manor East",Coffee Shop,Italian Restaurant,Sandwich Place,Sushi Restaurant,Greek Restaurant,Indian Restaurant,Juice Bar,Liquor Store,Locksmith,Comfort Food Restaurant
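
The column names above are built by indexing a three-element suffix list and falling back to 'th', which is correct for the ten columns used here. If more columns were ever needed, a small helper could produce the suffix directly. The function `ordinal` below is a hypothetical name used only for this sketch.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i))
    for i in range(1, num_top_venues + 1)
]
print(columns[1], '|', columns[4], '|', columns[10])
```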


# Clustering the Neighbourhoods

### Importing k-means from scikit-learn

In [51]:
from sklearn.cluster import KMeans

### The Clustering Analysis

In [52]:
# set number of clusters
kclusters = 10

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[:] 

array([4, 1, 6, 6, 6, 6, 4, 6, 4, 4, 0, 6, 4, 6, 4, 6, 1, 6, 6, 6, 4, 6,
       6, 4, 4, 4, 0, 1, 6, 6, 4, 6, 4, 4, 4, 4, 4, 4, 1, 8, 4, 4, 6, 4,
       1, 4, 0, 6, 6, 2, 0, 4, 4, 6, 0, 6, 4, 6, 8, 4, 1, 0, 6, 6, 6, 0,
       5, 4, 6, 4, 4, 1, 6, 6, 1, 4, 6, 6, 6, 4, 4, 7, 4, 6, 6, 1, 3, 1,
       0, 4, 6, 1, 6, 4, 0, 9], dtype=int32)
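
A quick way to read the label array is to count how many neighbourhoods fall under each label. The sketch below does this with `collections.Counter` on the labels copied from the output above.

```python
from collections import Counter

# Labels copied from the kmeans.labels_ output above (96 values).
labels = [
    4, 1, 6, 6, 6, 6, 4, 6, 4, 4, 0, 6, 4, 6, 4, 6, 1, 6, 6, 6, 4, 6,
    6, 4, 4, 4, 0, 1, 6, 6, 4, 6, 4, 4, 4, 4, 4, 4, 1, 8, 4, 4, 6, 4,
    1, 4, 0, 6, 6, 2, 0, 4, 4, 6, 0, 6, 4, 6, 8, 4, 1, 0, 6, 6, 6, 0,
    5, 4, 6, 4, 4, 1, 6, 6, 1, 4, 6, 6, 6, 4, 4, 7, 4, 6, 6, 1, 3, 1,
    0, 4, 6, 1, 6, 4, 0, 9,
]

sizes = Counter(labels)
for label in sorted(sizes):
    print(label, sizes[label])
```

In this run, labels 4 and 6 together cover 69 of the 96 grouped neighbourhoods, and five labels contain a single neighbourhood each, which is worth keeping in mind when reading the cluster tables.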

### Add clustering labels

In [53]:
neighborhoods_venues_sorted.insert(0, 'Cluster_Labels', kmeans.labels_)

neighborhoods_venues_sorted.astype({'Cluster_Labels': 'int32'}).dtypes
toronto_merged = result_df
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhood')
print(toronto_merged.dtypes)
toronto_merged

Postal Code                object
Borough                    object
Neighbourhood              object
Latitude                  float64
Longitude                 float64
Cluster_Labels            float64
1st Most Common Venue      object
2nd Most Common Venue      object
3rd Most Common Venue      object
4th Most Common Venue      object
5th Most Common Venue      object
6th Most Common Venue      object
7th Most Common Venue      object
8th Most Common Venue      object
9th Most Common Venue      object
10th Most Common Venue     object
dtype: object


Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0.0,Park,Food & Drink Shop,Yoga Studio,Department Store,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Dog Run
1,M4A,North York,Victoria Village,43.725882,-79.315572,1.0,Pizza Place,Intersection,Coffee Shop,Hockey Arena,Portuguese Restaurant,Discount Store,Dessert Shop,Dim Sum Restaurant,Diner,Dog Run
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636,6.0,Coffee Shop,Park,Bakery,Breakfast Spot,Theater,Distribution Center,Spa,Dessert Shop,Farmers Market,Café
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,4.0,Clothing Store,Accessories Store,Boutique,Gift Shop,Furniture / Home Store,Event Space,Coffee Shop,Women's Store,Vietnamese Restaurant,Airport Terminal
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,6.0,Coffee Shop,Yoga Studio,Café,Bar,Italian Restaurant,Beer Bar,Sandwich Place,Distribution Center,Diner,Mexican Restaurant
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944,7.0,Pool,River,Yoga Studio,Deli / Bodega,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Dog Run
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160,6.0,Coffee Shop,Men's Store,Martial Arts School,Sushi Restaurant,Ice Cream Shop,Indian Restaurant,Café,Beer Bar,Ethiopian Restaurant,Escape Room
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558,4.0,Pizza Place,Auto Workshop,Comic Shop,Recording Studio,Restaurant,Butcher,Burrito Place,Skate Park,Brewery,Park
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509,8.0,Construction & Landscaping,Baseball Field,Yoga Studio,Dim Sum Restaurant,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore


#### Let's check whether every row has been assigned a Cluster_Labels value

In [54]:
toronto_merged['Cluster_Labels'].unique()

array([ 0.,  1.,  6.,  4., nan,  2.,  3.,  9.,  8.,  5.,  7.])

In [55]:
toronto_merged.shape

(103, 16)

#### Let's examine the rows with a null value

In [57]:
novalue_df = toronto_merged[toronto_merged['Cluster_Labels'].isnull()]
novalue_df

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242,,,,,,,,,,,
52,M2M,North York,"Willowdale, Newtonbrook",43.789053,-79.408493,,,,,,,,,,,
95,M1X,Scarborough,Upper Rouge,43.836125,-79.205636,,,,,,,,,,,


In [58]:
novalue_df.shape

(3, 16)

#### There are three neighbourhoods with no details available. Let's delete those rows.

In [59]:
toronto_finaldf = toronto_merged[toronto_merged['Cluster_Labels'].notnull()]
toronto_finaldf.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0.0,Park,Food & Drink Shop,Yoga Studio,Department Store,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Dog Run
1,M4A,North York,Victoria Village,43.725882,-79.315572,1.0,Pizza Place,Intersection,Coffee Shop,Hockey Arena,Portuguese Restaurant,Discount Store,Dessert Shop,Dim Sum Restaurant,Diner,Dog Run
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,6.0,Coffee Shop,Park,Bakery,Breakfast Spot,Theater,Distribution Center,Spa,Dessert Shop,Farmers Market,Café
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,4.0,Clothing Store,Accessories Store,Boutique,Gift Shop,Furniture / Home Store,Event Space,Coffee Shop,Women's Store,Vietnamese Restaurant,Airport Terminal
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,6.0,Coffee Shop,Yoga Studio,Café,Bar,Italian Restaurant,Beer Bar,Sandwich Place,Distribution Center,Diner,Mexican Restaurant


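The Cluster_Labels values are stored as floats, because the join introduced NaN for the neighbourhoods without a match. With those rows removed, let's convert the values to integers.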

In [60]:
label_df=toronto_finaldf.astype({'Cluster_Labels': 'int32'})
label_df.dtypes

Postal Code                object
Borough                    object
Neighbourhood              object
Latitude                  float64
Longitude                 float64
Cluster_Labels              int32
1st Most Common Venue      object
2nd Most Common Venue      object
3rd Most Common Venue      object
4th Most Common Venue      object
5th Most Common Venue      object
6th Most Common Venue      object
7th Most Common Venue      object
8th Most Common Venue      object
9th Most Common Venue      object
10th Most Common Venue     object
dtype: object
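
The float-to-integer round trip can be reproduced on a small example. In the sketch below (invented data), the join leaves `C` without a label, which turns the column into `float64`; after the unmatched row is dropped, the conversion to `int32` is safe.

```python
import pandas as pd

# Invented data: 'C' has no entry on the right side of the join.
left = pd.DataFrame({'Neighbourhood': ['A', 'B', 'C']})
right = pd.DataFrame({'Neighborhood': ['A', 'B'], 'Cluster_Labels': [0, 6]})

joined = left.join(right.set_index('Neighborhood'), on='Neighbourhood')
print(joined['Cluster_Labels'].dtype)  # float64, because NaN needs a float column

# Drop the unmatched rows, then convert the labels back to integers.
cleaned = (joined.dropna(subset=['Cluster_Labels'])
                 .astype({'Cluster_Labels': 'int32'}))
print(cleaned['Cluster_Labels'].tolist())
```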

In [61]:
label_df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0,Park,Food & Drink Shop,Yoga Studio,Department Store,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Dog Run
1,M4A,North York,Victoria Village,43.725882,-79.315572,1,Pizza Place,Intersection,Coffee Shop,Hockey Arena,Portuguese Restaurant,Discount Store,Dessert Shop,Dim Sum Restaurant,Diner,Dog Run
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,6,Coffee Shop,Park,Bakery,Breakfast Spot,Theater,Distribution Center,Spa,Dessert Shop,Farmers Market,Café
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,4,Clothing Store,Accessories Store,Boutique,Gift Shop,Furniture / Home Store,Event Space,Coffee Shop,Women's Store,Vietnamese Restaurant,Airport Terminal
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,6,Coffee Shop,Yoga Studio,Café,Bar,Italian Restaurant,Beer Bar,Sandwich Place,Distribution Center,Diner,Mexican Restaurant


## Plotting the results in a map

In [62]:
import matplotlib.cm as cm
import matplotlib.colors as colors

In [63]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(label_df['Latitude'], label_df['Longitude'], label_df['Neighbourhood'], label_df['Cluster_Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Examining the Clusters

### Cluster 1

In [64]:
label_df.loc[label_df['Cluster_Labels'] == 0, label_df.columns[[1] + list(range(5, label_df.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,0,Park,Food & Drink Shop,Yoga Studio,Department Store,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Dog Run
21,York,0,Park,Women's Store,Pool,Deli / Bodega,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Dog Run
35,East York,0,Park,Convenience Store,Intersection,Yoga Studio,Dessert Shop,Escape Room,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
49,North York,0,Park,Construction & Landscaping,Bakery,Yoga Studio,Dessert Shop,Escape Room,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
61,Central Toronto,0,Park,Swim School,Bus Line,Yoga Studio,Dessert Shop,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop
64,York,0,Park,Yoga Studio,Deli / Bodega,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center
66,North York,0,Park,Convenience Store,Yoga Studio,Department Store,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Dog Run
85,Scarborough,0,Playground,Park,Bakery,Yoga Studio,Department Store,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop
91,Downtown Toronto,0,Park,Playground,Trail,Yoga Studio,Deli / Bodega,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Dog Run
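
The per-cluster cells in this section all apply the same selection with a different label. As a compact cross-check, the most frequent top venue of every cluster can be computed in one pass. The frame below is an invented stand-in for `label_df` containing only the two columns the sketch needs.

```python
import pandas as pd

# Invented stand-in for label_df with only the columns used here.
toy_labels = pd.DataFrame({
    'Cluster_Labels': [0, 0, 0, 1, 1, 1],
    '1st Most Common Venue': ['Park', 'Park', 'Playground',
                              'Pizza Place', 'Pizza Place', 'Grocery Store'],
})

# For each cluster, pick the category that most often ranks first.
top_per_cluster = (
    toy_labels.groupby('Cluster_Labels')['1st Most Common Venue']
    .agg(lambda venues: venues.value_counts().idxmax())
)
print(top_per_cluster.to_dict())
```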


### Cluster 2

In [65]:
label_df.loc[label_df['Cluster_Labels'] == 1, label_df.columns[[1] + list(range(5, label_df.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,1,Pizza Place,Intersection,Coffee Shop,Hockey Arena,Portuguese Restaurant,Discount Store,Dessert Shop,Dim Sum Restaurant,Diner,Dog Run
8,East York,1,Pizza Place,Athletics & Sports,Pharmacy,Bus Line,Breakfast Spot,Bank,Intersection,Pet Store,Gym / Fitness Center,Gastropub
17,Etobicoke,1,Pizza Place,Beer Store,Coffee Shop,Convenience Store,Café,Shopping Plaza,Liquor Store,Pharmacy,Gas Station,Dance Studio
50,North York,1,Pizza Place,Furniture / Home Store,Deli / Bodega,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center
70,Etobicoke,1,Pizza Place,Coffee Shop,Discount Store,Chinese Restaurant,Sandwich Place,Intersection,Department Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
72,North York,1,Grocery Store,Pharmacy,Coffee Shop,Butcher,Pizza Place,General Travel,General Entertainment,Drugstore,Donut Shop,Dog Run
77,Etobicoke,1,Pizza Place,Park,Bus Line,Sandwich Place,Deli / Bodega,Dumpling Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center
82,Scarborough,1,Pizza Place,Fried Chicken Joint,Noodle House,Chinese Restaurant,Fast Food Restaurant,Bank,Italian Restaurant,Intersection,Thai Restaurant,Pharmacy
89,Etobicoke,1,Grocery Store,Pizza Place,Pharmacy,Liquor Store,Fast Food Restaurant,Beer Store,Sandwich Place,Fried Chicken Joint,Construction & Landscaping,Diner
90,Scarborough,1,Fast Food Restaurant,Grocery Store,Bank,Gym,Indian Restaurant,Pharmacy,Coffee Shop,Chinese Restaurant,Sandwich Place,Breakfast Spot


### Cluster 3

In [66]:
label_df.loc[label_df['Cluster_Labels'] == 2, label_df.columns[[1] + list(range(5, label_df.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Scarborough,2,Fast Food Restaurant,Yoga Studio,Event Space,Escape Room,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Dog Run
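
The same selection is repeated once per cluster label, so it could be wrapped in a small helper. The following is only a sketch: the function name `cluster_view` and the toy frame are hypothetical, and the column order (Borough at position 1, `Cluster_Labels` at position 5) is an assumption inferred from the outputs shown in this notebook.

```python
import pandas as pd

def cluster_view(df, label, first_col=5, label_col="Cluster_Labels"):
    """Return the rows of one cluster, keeping column 1 and every column from first_col on."""
    cols = df.columns[[1] + list(range(first_col, df.shape[1]))]
    return df.loc[df[label_col] == label, cols]

# Toy stand-in for label_df (hypothetical values and column names, not the notebook's data)
toy = pd.DataFrame({
    "Postcode": ["A", "B", "C"],
    "Borough": ["North York", "Scarborough", "Etobicoke"],
    "Neighbourhood": ["n1", "n2", "n3"],
    "Latitude": [43.7, 43.8, 43.6],
    "Longitude": [-79.4, -79.2, -79.5],
    "Cluster_Labels": [1, 2, 1],
    "1st Most Common Venue": ["Pizza Place", "Fast Food Restaurant", "Grocery Store"],
})

view = cluster_view(toy, 1)
print(view)
```

With the real data, `cluster_view(label_df, 2)` would be expected to give the same result as the cell above, provided the column-order assumption holds.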


### Cluster 4

In [67]:
# Cluster 4: rows with label 3
label_df.loc[label_df['Cluster_Labels'] == 3, label_df.columns[[1] + list(range(5, label_df.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Etobicoke,3,Print Shop,Yoga Studio,Deli / Bodega,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center


### Cluster 5

In [68]:
# Cluster 5: rows with label 4
label_df.loc[label_df['Cluster_Labels'] == 4, label_df.columns[[1] + list(range(5, label_df.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,North York,4,Clothing Store,Accessories Store,Boutique,Gift Shop,Furniture / Home Store,Event Space,Coffee Shop,Women's Store,Vietnamese Restaurant,Airport Terminal
10,North York,4,Pizza Place,Park,Pub,Japanese Restaurant,Bakery,Department Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop
12,Scarborough,4,Construction & Landscaping,Bar,Yoga Studio,Dim Sum Restaurant,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
14,East York,4,Skating Rink,Intersection,Park,Athletics & Sports,Beer Store,Dance Studio,Curling Ice,Distribution Center,Diner,Discount Store
16,York,4,Tennis Court,Field,Hockey Arena,Trail,Dog Run,Dumpling Restaurant,Drugstore,Donut Shop,Deli / Bodega,Distribution Center
18,Scarborough,4,Medical Center,Mexican Restaurant,Restaurant,Rental Car Location,Intersection,Bank,Electronics Store,Breakfast Spot,Distribution Center,Dog Run
19,East Toronto,4,Pub,Health Food Store,Trail,Yoga Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run
25,Downtown Toronto,4,Grocery Store,Café,Park,Candy Store,Baby Store,Italian Restaurant,Nightclub,Coffee Shop,Athletics & Sports,Restaurant
26,Scarborough,4,Caribbean Restaurant,Athletics & Sports,Hakka Restaurant,Thai Restaurant,Bakery,Bank,Fried Chicken Joint,Gas Station,Discount Store,Dim Sum Restaurant
27,North York,4,Athletics & Sports,Golf Course,Pool,Dog Run,Mediterranean Restaurant,Department Store,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore


### Cluster 6

In [69]:
# Cluster 6: rows with label 5
label_df.loc[label_df['Cluster_Labels'] == 5, label_df.columns[[1] + list(range(5, label_df.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
62,Central Toronto,5,Music Venue,Garden,Yoga Studio,Deli / Bodega,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Dog Run


### Cluster 7

In [70]:
# Cluster 7: rows with label 6
label_df.loc[label_df['Cluster_Labels'] == 6, label_df.columns[[1] + list(range(5, label_df.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Downtown Toronto,6,Coffee Shop,Park,Bakery,Breakfast Spot,Theater,Distribution Center,Spa,Dessert Shop,Farmers Market,Café
4,Downtown Toronto,6,Coffee Shop,Yoga Studio,Café,Bar,Italian Restaurant,Beer Bar,Sandwich Place,Distribution Center,Diner,Mexican Restaurant
7,North York,6,Gym,Japanese Restaurant,Coffee Shop,Beer Store,Clothing Store,Sporting Goods Shop,Café,Dim Sum Restaurant,Restaurant,Caribbean Restaurant
9,Downtown Toronto,6,Café,Clothing Store,Coffee Shop,Theater,New American Restaurant,Thai Restaurant,Music Venue,Steakhouse,Bakery,Japanese Restaurant
13,North York,6,Gym,Japanese Restaurant,Coffee Shop,Beer Store,Clothing Store,Sporting Goods Shop,Café,Dim Sum Restaurant,Restaurant,Caribbean Restaurant
15,Downtown Toronto,6,Gastropub,Coffee Shop,Café,Restaurant,Farmers Market,Japanese Restaurant,Cosmetics Shop,Thai Restaurant,Camera Store,Diner
20,Downtown Toronto,6,Coffee Shop,Beer Bar,Farmers Market,Seafood Restaurant,Cocktail Bar,Thai Restaurant,Jazz Club,Bakery,Fish Market,Bistro
22,Scarborough,6,Coffee Shop,Korean BBQ Restaurant,Mexican Restaurant,Yoga Studio,Dessert Shop,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop
23,East York,6,Coffee Shop,Sporting Goods Shop,Bank,Burger Joint,Furniture / Home Store,Breakfast Spot,Supermarket,Sports Bar,Beer Store,Bike Shop
24,Downtown Toronto,6,Coffee Shop,Italian Restaurant,Café,Yoga Studio,Poke Place,Seafood Restaurant,Sandwich Place,Bubble Tea Shop,Ramen Restaurant,Portuguese Restaurant
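
Every row shown above lists Coffee Shop among its ranked venues. One way to quantify such a pattern is to count how often each category appears across the ranked venue columns of a cluster. This is a minimal sketch: the toy frame copies only the first three venue columns of rows 2, 7 and 23 above, and the variable names are illustrative.

```python
import pandas as pd

# Toy subset of the Cluster 7 output (rows 2, 7 and 23, first three venue columns only)
toy = pd.DataFrame({
    "Borough": ["Downtown Toronto", "North York", "East York"],
    "Cluster_Labels": [6, 6, 6],
    "1st Most Common Venue": ["Coffee Shop", "Gym", "Coffee Shop"],
    "2nd Most Common Venue": ["Park", "Japanese Restaurant", "Sporting Goods Shop"],
    "3rd Most Common Venue": ["Bakery", "Coffee Shop", "Bank"],
})

# Select the ranked venue columns by their shared suffix
venue_cols = [c for c in toy.columns if c.endswith("Most Common Venue")]

# Flatten all ranked venues into one Series and count each category
counts = toy[venue_cols].stack().value_counts()
print(counts.head())
```

Note that this counts appearances in the ranking, not the number of venues, so it only indicates which categories recur across the neighbourhoods of a cluster.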


### Cluster 8

In [71]:
# Cluster 8: rows with label 7
label_df.loc[label_df['Cluster_Labels'] == 7, label_df.columns[[1] + list(range(5, label_df.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
98,Etobicoke,7,Pool,River,Yoga Studio,Deli / Bodega,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Dog Run


### Cluster 9

In [72]:
# Cluster 9: rows with label 8
label_df.loc[label_df['Cluster_Labels'] == 8, label_df.columns[[1] + list(range(5, label_df.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
57,North York,8,Baseball Field,Yoga Studio,Falafel Restaurant,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop
101,Etobicoke,8,Construction & Landscaping,Baseball Field,Yoga Studio,Dim Sum Restaurant,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore


### Cluster 10

In [73]:
# Cluster 10: rows with label 9
label_df.loc[label_df['Cluster_Labels'] == 9, label_df.columns[[1] + list(range(5, label_df.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
45,North York,9,Martial Arts School,Yoga Studio,Department Store,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center
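
Several of the clusters above contain a single neighbourhood, so a per-cluster summary can make the comparison easier than reading each table in turn. The sketch below counts members per label and reports the most frequent first-ranked venue in each cluster. The toy frame and its values are hypothetical stand-ins for `label_df`, and only the two column names are taken from the outputs above.

```python
import pandas as pd

# Toy stand-in for label_df (hypothetical values)
toy = pd.DataFrame({
    "Cluster_Labels": [1, 1, 6, 6, 6, 2],
    "1st Most Common Venue": [
        "Pizza Place", "Pizza Place",
        "Coffee Shop", "Coffee Shop", "Gym",
        "Fast Food Restaurant",
    ],
})

# Number of neighbourhoods assigned to each cluster label
sizes = toy["Cluster_Labels"].value_counts().sort_index()

# Most frequent first-ranked venue within each cluster (first mode on ties)
top_venue = toy.groupby("Cluster_Labels")["1st Most Common Venue"].agg(
    lambda s: s.mode().iloc[0]
)

summary = pd.DataFrame({"size": sizes, "most_frequent_top_venue": top_venue})
print(summary)
```

On the real data, clusters of size 1 would stand out immediately in the `size` column, which may suggest trying a smaller number of clusters.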
