<a href="https://cognitiveclass.ai"><img src = "https://ibm.box.com/shared/static/9gegpsmnsoo25ikkbl4qzlvlyjbgxs5x.png" width = 400> </a>

<h1 align=center><font size = 5>Segmenting and Clustering Neighbourhoods in Toronto</font></h1>

<hr>

### **_Please note that all three parts are herein included. They are presented as you scroll down the code (numbered)!_**

## Table of Contents

1 - <a href="#item1">Part 1: Scraping and Cleaning Data</a>

2 - <a href="#item2">Part 2: Getting Coordinates</a>  

3 - <a href="#item3">Part 3: Neighbourhood Clustering and Exploring</a>   

<a id="item1"></a>

### **1 - Scraping and Cleaning Data**

##### Since all postal codes in the city of Toronto start with the letter M, we'll take the list of postal codes starting with M from its Wikipedia page.

#### Importing libraries

##### In order to scrape that website the libraries *urllib* and _BeautifulSoup **(bs4)**_ will be used.

In [2]:
# Installing bs4 package
!conda install -c anaconda beautifulsoup4 --yes
!conda install -c anaconda lxml --yes

Collecting package metadata (current_repodata.json): done
Solving environment: done


  current version: 4.8.2
  latest version: 4.8.3

Please update conda by running

    $ conda update -n base conda



## Package Plan ##

  environment location: /srv/conda/envs/notebook

  added / updated specs:
    - beautifulsoup4


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    beautifulsoup4-4.8.2       |           py37_0         161 KB  anaconda
    ca-certificates-2020.1.1   |                0         132 KB  anaconda
    certifi-2020.4.5.1         |           py37_0         159 KB  anaconda
    openssl-1.1.1              |       h7b6447c_0         5.0 MB  anaconda
    python_abi-3.7             |          1_cp37m           4 KB  conda-forge
    soupsieve-2.0              |             py_0          33 KB  anaconda
    ------------------------------------------------------------
                

In [3]:
from urllib.request import urlopen
from bs4 import BeautifulSoup

#### Downloading the web page

##### The HTML document is requested from the web.

In [4]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
page = urlopen(url) 
page

<http.client.HTTPResponse at 0x7fe32bce0710>

#### Parsing HTML

##### The HTML document downloaded above is parsed into a BeautifulSoup instance.

In [5]:
bs = BeautifulSoup(page, 'lxml')
bs

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
<head>
<meta charset="utf-8"/>
<title>List of postal codes of Canada: M - Wikipedia</title>
<script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"XourOApAIDAAABMB7dwAAADH","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"List_of_postal_codes_of_Canada:_M","wgTitle":"List of postal codes of Canada: M","wgCurRevisionId":949497198,"wgRevisionId":949497198,"wgArticleId":539066,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Articles with short description","Communications in Ontario","Postal codes in Canada","Toronto","Ontario-related lists"],"

##### The parsed document makes it easier to look for the desired table, which turns out to be _\<table class="wikitable"\>_

##### Next step is to get that table only.

In [6]:
table = bs.find('table', {"class": "wikitable"})
table

<table class="wikitable">
<tbody><tr>
<th>Postal code
</th>
<th>Borough
</th>
<th>Neighborhood
</th></tr>
<tr>
<td>M1A
</td>
<td>Not assigned
</td>
<td>
</td></tr>
<tr>
<td>M2A
</td>
<td>Not assigned
</td>
<td>
</td></tr>
<tr>
<td>M3A
</td>
<td>North York
</td>
<td>Parkwoods
</td></tr>
<tr>
<td>M4A
</td>
<td>North York
</td>
<td>Victoria Village
</td></tr>
<tr>
<td>M5A
</td>
<td>Downtown Toronto
</td>
<td>Regent Park / Harbourfront
</td></tr>
<tr>
<td>M6A
</td>
<td>North York
</td>
<td>Lawrence Manor / Lawrence Heights
</td></tr>
<tr>
<td>M7A
</td>
<td>Downtown Toronto
</td>
<td>Queen's Park / Ontario Provincial Government
</td></tr>
<tr>
<td>M8A
</td>
<td>Not assigned
</td>
<td>
</td></tr>
<tr>
<td>M9A
</td>
<td>Etobicoke
</td>
<td>Islington Avenue
</td></tr>
<tr>
<td>M1B
</td>
<td>Scarborough
</td>
<td>Malvern / Rouge
</td></tr>
<tr>
<td>M2B
</td>
<td>Not assigned
</td>
<td>
</td></tr>
<tr>
<td>M3B
</td>
<td>North York
</td>
<td>Don Mills
</td></tr>
<tr>
<td>M4B
</td>
<td>East York
<

##### Creating function for accessing data from rows.

In [7]:
def get_row_data(tr, tag='td'):
    # Return the stripped text of every <td> (data) or <th> (header) cell in a row
    return [cell.get_text(strip=True) for cell in tr.find_all(tag)]

##### Getting rows into a list.

In [8]:
rows = []
trs = table.find_all('tr')
header = get_row_data(trs[0], 'th')
for tr in trs:
    rows.append(get_row_data(tr, 'td'))

print("Header row: ", header)
print("Data row: ", rows[:5])

Header row:  ['Postal code', 'Borough', 'Neighborhood']
Data row:  [[], ['M1A', 'Not assigned', ''], ['M2A', 'Not assigned', ''], ['M3A', 'North York', 'Parkwoods'], ['M4A', 'North York', 'Victoria Village']]
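The empty list at the start of `rows` comes from the header row, which holds only `<th>` cells and therefore no `<td>` cells. A minimal sketch on an inline HTML snippet shows one way to avoid collecting it, by slicing past the header row. The snippet and the names `html` and `body_rows` are made up for illustration, and `html.parser` is used here only so the example does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row table in the same shape as the Wikipedia one
html = """
<table class="wikitable">
<tr><th>Postal code</th><th>Borough</th><th>Neighborhood</th></tr>
<tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""

def get_row_data(tr, tag='td'):
    return [cell.get_text(strip=True) for cell in tr.find_all(tag)]

table = BeautifulSoup(html, 'html.parser').find('table', {'class': 'wikitable'})
trs = table.find_all('tr')

header = get_row_data(trs[0], 'th')
# Start at the second <tr> so the header row is never read as data
body_rows = [get_row_data(tr) for tr in trs[1:]]

print(header)
print(body_rows)
```

With this variant the list can be passed to `pd.DataFrame` without slicing off its first element.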


#### Transforming into a _pandas.DataFrame_

##### Since _rows[0]_ is empty (the header row has no _td_ cells), the DataFrame is built from _rows[1]_ onwards.

In [9]:
import pandas as pd

df = pd.DataFrame(rows[1:], columns=header)
df.head()

Unnamed: 0,Postal code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront


#### Cleaning Data

##### - Only process the cells that have an assigned borough. Ignore cells with a borough that is **Not assigned**.

In [10]:
import numpy as np

df.replace(['Not assigned'], np.nan, inplace=True)
df.dropna(subset=['Borough'], axis='index', inplace=True)
df.reset_index(inplace=True, drop=True)
df.head()

Unnamed: 0,Postal code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Regent Park / Harbourfront
3,M6A,North York,Lawrence Manor / Lawrence Heights
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
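The same filter can also be written as a boolean mask on the `Borough` column alone, which leaves any other cell untouched. The sketch below uses a small made-up frame (`toy`) rather than the scraped data.

```python
import pandas as pd

# Toy rows in the same layout as the scraped table
toy = pd.DataFrame({
    'Postal code': ['M1A', 'M3A', 'M4A'],
    'Borough': ['Not assigned', 'North York', 'North York'],
    'Neighborhood': ['', 'Parkwoods', 'Victoria Village'],
})

# Keep only rows whose borough is assigned, then renumber the index from 0
assigned = toy[toy['Borough'] != 'Not assigned'].reset_index(drop=True)
print(assigned)
```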


##### - More than one neighborhood can exist in one postal code area. For example, **M5A** covers two neighborhoods: **Regent Park** and **Harbourfront**. In the scraped table they already share one cell, separated by a slash, so the slash is replaced with a comma.

In [11]:
df['Neighborhood'] = df['Neighborhood'].str.replace("/", ",")
df.head()

Unnamed: 0,Postal code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park , Harbourfront"
3,M6A,North York,"Lawrence Manor , Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park , Ontario Provincial Government"
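If the table instead listed a postal code on several rows, one per neighborhood, the rows could be combined with a group-by. This is a sketch under that assumption, on made-up rows; the table scraped here already has one row per postal code.

```python
import pandas as pd

# Hypothetical layout: M5A appears twice, once per neighborhood
toy = pd.DataFrame({
    'Postal code': ['M5A', 'M5A', 'M3A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighborhood': ['Harbourfront', 'Regent Park', 'Parkwoods'],
})

# Join the neighborhoods of each postal code into one comma-separated string
combined = (
    toy.groupby(['Postal code', 'Borough'], sort=False)['Neighborhood']
       .agg(', '.join)
       .reset_index()
)
print(combined)
```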


##### - If a cell has a borough but a **Not assigned** neighborhood, then the neighborhood will be the same as the borough.

In [12]:
# Series.replace returns a new Series, so the result has to be assigned back
df['Neighborhood'] = df['Neighborhood'].replace(['Not assigned', ''], np.nan)
df['Neighborhood'].isna().value_counts()

False    103
Name: Neighborhood, dtype: int64

##### No row has a missing or *Not assigned* Neighborhood, so there is nothing to fill in.
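Had some neighborhoods been unassigned, the rule above could be applied with `Series.mask`. The sketch below runs on a made-up frame, since the scraped data contains no such rows.

```python
import pandas as pd

# Hypothetical rows: the first one has a borough but no neighborhood
toy = pd.DataFrame({
    'Borough': ["Queen's Park", 'North York'],
    'Neighborhood': ['Not assigned', 'Parkwoods'],
})

# Where the neighborhood is unassigned or empty, fall back to the borough name
unassigned = toy['Neighborhood'].isin(['Not assigned', ''])
toy['Neighborhood'] = toy['Neighborhood'].mask(unassigned, toy['Borough'])
print(toy)
```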

##### - In the last cell of your notebook, use the **.shape** attribute to print the number of rows of your dataframe.

In [13]:
df.shape

(103, 3)

<a id="item2"></a>

<hr>

### **2 - Getting Neighbourhoods Coordinates**

### Postal Codes Coordinates

##### Coordinates are read from a CSV file instead of a geocoding API, because of billing issues with the API.

In [14]:
!wget -O 'postal_coords.csv' 'http://cocl.us/Geospatial_data'

--2020-04-11 22:28:59--  http://cocl.us/Geospatial_data
Resolving cocl.us (cocl.us)... 158.85.108.86, 169.48.113.194, 158.85.108.83
Connecting to cocl.us (cocl.us)|158.85.108.86|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://cocl.us/Geospatial_data [following]
--2020-04-11 22:28:59--  https://cocl.us/Geospatial_data
Connecting to cocl.us (cocl.us)|158.85.108.86|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2020-04-11 22:29:02--  https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv
Resolving ibm.box.com (ibm.box.com)... 107.152.26.197, 107.152.27.197
Connecting to ibm.box.com (ibm.box.com)|107.152.26.197|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /public/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2020-04-11 22:29:03--  https://ib

##### Reading from CSV file.

In [15]:
import pandas as pd

df_coords = pd.read_csv('postal_coords.csv')
df_coords.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


##### Checking if it has the same number of rows as _df_

In [16]:
print('#rows df: ', df.shape)
print('#rows df_coords', df_coords.shape)

#rows df:  (103, 3)
#rows df_coords (103, 3)


##### Renaming the postal code column so that both dataframes use the same header

In [17]:
df = df.rename(columns={'Postal code': 'Postal Code'})
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park , Harbourfront"
3,M6A,North York,"Lawrence Manor , Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park , Ontario Provincial Government"


##### Since both dataframes have the same length and the same key header, it's time to merge them into one.

In [18]:
df = pd.merge(df, df_coords, on='Postal Code')
df.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park , Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor , Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park , Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M1B,Scarborough,"Malvern , Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill , Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
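Equal row counts do not prove that both frames hold the same postal codes, and the default inner join drops unmatched rows without warning. A hedged sketch of a stricter merge follows, using made-up frames `left` and `coords`, where `M5A` is deliberately missing from the coordinates.

```python
import pandas as pd

left = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M5A'],
    'Borough': ['North York', 'North York', 'Downtown Toronto'],
})
coords = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# how='left' keeps every postal code; validate raises if a key is duplicated;
# indicator adds a '_merge' column telling where each row was found
merged = pd.merge(left, coords, on='Postal Code', how='left',
                  validate='one_to_one', indicator=True)

missing = merged.loc[merged['_merge'] == 'left_only', 'Postal Code'].tolist()
print(missing)
```

An empty `missing` list would confirm that every postal code received coordinates.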


##### _Dataframe all set up!_

<a id="item3"></a>

<hr>

### **3 - Neighbourhood Clustering and Exploring**

### Creating Toronto Map

##### We'll create a map visualization with the _Folium_ library. The Toronto coordinates were taken from <a href="https://www.latlong.net/place/toronto-on-canada-27230.html">https://www.latlong.net/place/toronto-on-canada-27230.html</a> on Apr 4th, 2020, when this part of the notebook was written.

In [21]:
#!conda install -c anaconda folium --yes

In [22]:
import folium

print(" folium imported!")

toronto_lat  = 43.651070
toronto_long = -79.347015

toronto_map = folium.Map(location=[toronto_lat, toronto_long], zoom_start=10)
toronto_map

 folium imported!


##### Now that the map of Toronto renders correctly, markers can be added to it. To keep the map uncluttered, only the boroughs with the most postal codes will be considered.

In [23]:
df['Borough'].value_counts()

North York          24
Downtown Toronto    19
Scarborough         17
Etobicoke           12
Central Toronto      9
West Toronto         6
East York            5
York                 5
East Toronto         5
Mississauga          1
Name: Borough, dtype: int64

##### Keeping the four boroughs with the most postal codes

In [24]:
# DataFrame.append was removed in pandas 2.0; pd.concat keeps the same row order
top_boroughs = ['Downtown Toronto', 'North York', 'Scarborough', 'Etobicoke']
df_clean = pd.concat([df[df['Borough'].str.contains(b)] for b in top_boroughs])
df_clean['Borough'].value_counts()

North York          24
Downtown Toronto    19
Scarborough         17
Etobicoke           12
Name: Borough, dtype: int64
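The borough names need not be typed by hand. A small sketch on a made-up frame shows how they could be taken from `value_counts()` and matched exactly with `isin`:

```python
import pandas as pd

# Toy data: three boroughs with 3, 2 and 1 postal codes
toy = pd.DataFrame({'Borough': ['North York'] * 3 + ['Scarborough'] * 2 + ['York']})

# Names of the two most frequent boroughs
top = toy['Borough'].value_counts().nlargest(2).index

# isin matches whole values, so 'York' is not picked up by 'North York'
subset = toy[toy['Borough'].isin(top)]
print(subset['Borough'].value_counts())
```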

##### Adding markers to the map

In [25]:
for lat, lng, label in zip(df_clean['Latitude'],df_clean['Longitude'], df_clean['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.5).add_to(toronto_map)
    
toronto_map

#### Accessing Toronto Venues Data

##### The Foursquare API will be used to get information on nearby venues, so the Foursquare credentials are set first.

In [27]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180604'
LIMIT=50 # limit of number of venues returned by Foursquare API
radius=500 # define radius

##### The function _getNearbyVenues()_ has been provided in the lab and will be very useful here.

In [28]:
import json
import requests
print("Libraries imported")

Libraries imported


In [29]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
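`getNearbyVenues` indexes straight into the JSON response, so a failed request or a venue without categories raises a `KeyError` or `IndexError`. The sketch below separates the parsing step into a helper that tolerates missing keys. The helper name `extract_venues` and the `sample` payload are invented for illustration; the payload only mirrors the keys the function above reads and is not real Foursquare output.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore response into rows, tolerating missing keys."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        # Fall back to an empty dict when the category list is missing or empty
        categories = venue.get('categories') or [{}]
        rows.append((name, lat, lng,
                     venue.get('name'),
                     location.get('lat'),
                     location.get('lng'),
                     categories[0].get('name')))
    return rows

# Invented payload with one complete venue and one without categories
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Cafe',
               'location': {'lat': 43.65, 'lng': -79.36},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Uncategorised Spot',
               'location': {'lat': 43.66, 'lng': -79.37},
               'categories': []}},
]}]}}

rows = extract_venues(sample, 'Regent Park , Harbourfront', 43.65426, -79.360636)
print(rows)
```

A response without a `response` key, such as an error payload, yields an empty list instead of an exception.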

##### Getting venues for each neighbourhood 

In [30]:
df_venues = getNearbyVenues(names=df_clean['Neighborhood'], latitudes=df_clean['Latitude'],
                                 longitudes=df_clean['Longitude'])

Regent Park , Harbourfront
Queen's Park , Ontario Provincial Government
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond , Adelaide , King
Harbourfront East , Union Station , Toronto Islands
Toronto Dominion Centre , Design Exchange
Commerce Court , Victoria Hotel
University of Toronto , Harbord
Kensington Market , Chinatown , Grange Park
CN Tower , King and Spadina , Railway Lands , Harbourfront West , BathurstQuay , South Niagara , Island airport
Rosedale
Stn A PO Boxes
St. James Town , Cabbagetown
First Canadian Place , Underground city
Church and Wellesley
Parkwoods
Victoria Village
Lawrence Manor , Lawrence Heights
Don Mills
Glencairn
Don Mills
Hillcrest Village
Bathurst Manor , Wilson Heights , Downsview North
Fairview , Henry Farm , Oriole
Northwood Park , York University
Bayview Village
Downsview
York Mills , Silver Hills
Downsview
North Park , Maple Leaf Park , Upwood Park
Humber Summit
Willowdale , Newtonbrook
Downsview
Bedford Park , L

In [31]:
df_venues.shape

(1199, 7)

##### Checking out how many venues for each neighbourhood were returned

In [32]:
df_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agincourt,5,5,5,5,5,5
"Alderwood , Long Branch",10,10,10,10,10,10
"Bathurst Manor , Wilson Heights , Downsview North",20,20,20,20,20,20
Bayview Village,4,4,4,4,4,4
"Bedford Park , Lawrence Manor East",28,28,28,28,28,28
Berczy Park,50,50,50,50,50,50
"Birch Cliff , Cliffside West",4,4,4,4,4,4
"CN Tower , King and Spadina , Railway Lands , Harbourfront West , BathurstQuay , South Niagara , Island airport",16,16,16,16,16,16
Cedarbrae,9,9,9,9,9,9
Central Bay Street,50,50,50,50,50,50


##### One Hot Encoding for Venues Category

In [33]:
df_onehot = pd.get_dummies(df_venues[['Venue Category']], prefix="", prefix_sep="")

##### Adding the neighbourhood names to the one-hot encoded dataframe as a _Neighbourhood_ column and moving it to the first position

In [34]:
df_onehot['Neighbourhood'] = df_venues['Neighborhood']
fix = [df_onehot.columns[-1]] + list(df_onehot.columns[:-1])
df_onehot = df_onehot[fix]
df_onehot.head()

Unnamed: 0,Neighbourhood,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Regent Park , Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Regent Park , Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Regent Park , Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Regent Park , Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Regent Park , Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [35]:
df_onehot.shape

(1199, 228)

##### Grouping rows by Neighbourhood

In [36]:
df_grouped = df_onehot.groupby('Neighbourhood').mean().reset_index()
df_grouped.head()

Unnamed: 0,Neighbourhood,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood , Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor , Wilson Heights , Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bedford Park , Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
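Averaging the one-hot columns per neighbourhood gives the share of each venue category among that neighbourhood's venues. A toy example with made-up venues makes the arithmetic visible:

```python
import pandas as pd

# Neighbourhood A has four venues, B has one
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B'],
    'Venue Category': ['Cafe', 'Cafe', 'Park', 'Bank', 'Park'],
})

# One indicator column per category (dtype=int keeps plain 0/1 values)
onehot = pd.get_dummies(toy['Venue Category'], dtype=int)
onehot.insert(0, 'Neighbourhood', toy['Neighborhood'])

# The mean of 0/1 indicators is the fraction of venues in each category
freq = onehot.groupby('Neighbourhood').mean()
print(freq)
```

Each row of `freq` sums to 1, so neighbourhoods with very different venue counts are still comparable.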


##### Comparing the original dataset with the one-hot encoded one. The grouped dataframe has fewer rows because it holds one row per distinct neighbourhood name, and only for the four selected boroughs.

In [37]:
print("Shape of df: " , df.shape)
print("Shape of df_grouped: ", df_grouped.shape)

Shape of df:  (103, 5)
Shape of df_grouped:  (65, 228)


#### Clustering

##### Now it is time for the clustering step.

##### Importing library

In [38]:
from sklearn.cluster import KMeans

##### The column _Neighbourhood_ must be dropped for clustering

In [39]:
df_cluster = df_grouped.drop('Neighbourhood', axis=1)
df_cluster.head()

Unnamed: 0,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


##### Training a KMeans model with k set to 4, matching the number of boroughs kept in the dataframe

In [40]:
k = 4

kmeans = KMeans(n_clusters = k, random_state = 0).fit(df_cluster)
kmeans.labels_

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 3, 0, 0, 0,
       0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0],
      dtype=int32)
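Nearly all labels above fall into cluster 0, which suggests that the number of boroughs may not be the most informative choice of k. One common check is to compare silhouette scores over several values of k. The sketch below does this on synthetic points; the array `X` and its three centres are made up, and in the notebook `df_cluster` would take the place of `X`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight, well-separated groups of 30 points each
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centres])

# Fit KMeans for each candidate k and record the mean silhouette score
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest score separates the points most cleanly
best_k = max(scores, key=scores.get)
print(scores)
print(best_k)
```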

#### Visualizing Clustered Data

##### Merging _df_grouped_ into _df_ 

In [41]:
df_toronto=pd.merge(df, df_grouped, how='left', left_on='Neighborhood', right_on='Neighbourhood')
df_toronto.drop('Neighbourhood', axis=1, inplace=True)
df_toronto.rename(columns={'Neighborhood_x':'Neighbourhood'}, inplace=True)

In [42]:
df_toronto.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,M3A,North York,Parkwoods,43.753259,-79.329656,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M4A,North York,Victoria Village,43.725882,-79.315572,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,M5A,Downtown Toronto,"Regent Park , Harbourfront",43.65426,-79.360636,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.022222
3,M6A,North York,"Lawrence Manor , Lawrence Heights",43.718518,-79.464763,0.083333,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0
4,M7A,Downtown Toronto,"Queen's Park , Ontario Provincial Government",43.662301,-79.389494,0.0,0.0,0.0,0.0,0.0,...,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.029412


##### Adding cluster labels to _df_cluster_

In [43]:
df_cluster['Cluster'] = kmeans.labels_
df_cluster.dtypes

Accessories Store                float64
Airport                          float64
Airport Food Court               float64
Airport Gate                     float64
Airport Lounge                   float64
Airport Service                  float64
Airport Terminal                 float64
American Restaurant              float64
Antique Shop                     float64
Aquarium                         float64
Art Gallery                      float64
Arts & Crafts Store              float64
Asian Restaurant                 float64
Athletics & Sports               float64
Auto Garage                      float64
BBQ Joint                        float64
Baby Store                       float64
Bagel Shop                       float64
Bakery                           float64
Bank                             float64
Bar                              float64
Baseball Field                   float64
Baseball Stadium                 float64
Basketball Court                 float64
Basketball Stadi

In [44]:
df_toronto.shape

(103, 232)
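The cluster labels follow the row order of `df_grouped` (65 rows), not that of `df_toronto` (103 rows), so they cannot be attached by position. One way to attach them is to look them up by neighbourhood name, sketched here on made-up frames where `labels` stands in for `kmeans.labels_`.

```python
import pandas as pd

# Stand-ins: `grouped` plays the role of df_grouped, `labels` of kmeans.labels_
grouped = pd.DataFrame({'Neighbourhood': ['Agincourt', 'Parkwoods']})
labels = [2, 0]

# Stand-in for df_toronto: different order, plus one name that was never clustered
toronto = pd.DataFrame({'Neighbourhood': ['Parkwoods', 'Rosedale', 'Agincourt']})

# Build a name -> label lookup and map it onto the larger frame
label_by_name = pd.Series(labels, index=grouped['Neighbourhood'])
toronto['Cluster'] = toronto['Neighbourhood'].map(label_by_name)
print(toronto)
```

Neighbourhoods without a label end up as `NaN`, which makes them easy to skip or drop before plotting.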

##### Creating a map for the clustered data

In [48]:
cluster_map = folium.Map(location=[toronto_lat, toronto_long], zoom_start=10)

##### Setting the colour scheme

In [46]:
colours = ['red', 'green', 'blue', 'yellow']

##### Finally adding the markers to the map

In [49]:
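# kmeans.labels_ follows the row order of df_grouped, not df_toronto,
# so each label is looked up by neighbourhood name instead of by position
cluster_of = dict(zip(df_grouped['Neighbourhood'], kmeans.labels_))

for latitude, longitude, neigh in zip(df_toronto['Latitude'], df_toronto['Longitude'],
                                      df_toronto['Neighbourhood']):
    cluster = cluster_of.get(neigh)
    if cluster is None:  # neighbourhood was not clustered
        continue
    label = folium.Popup(neigh, parse_html=True)
    folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup=label,
        color='black',
        fill=True,
        fill_color=colours[cluster],
        fill_opacity=0.7).add_to(cluster_map)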

cluster_map