# SEGMENTING AND CLUSTERING NEIGHBOURHOODS IN TORONTO

### Step 1

Download all libraries we will need in this task:

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

In [2]:
from bs4 import BeautifulSoup

print('Libraries imported.')

Libraries imported.


### Step 2

Scraping the Wikipedia page for Toronto neighbourhood data and creating a dataframe with Postal Codes, Boroughs, and Neighborhoods.

In [3]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

r = requests.get(url)
soup = BeautifulSoup(r.content, 'html5lib')

#print(soup.prettify())

In [4]:
table = soup.find('table')


table_contents=[]

for row in table.findAll('td'):
    cell = {}
    if row.span.text == 'Not assigned':
        pass
    else:
        cell['PostalCode'] = row.p.text[:3]
        cell['Borough'] = (row.span.text).split('(')[0]
        cell['Neighborhood'] = (((((row.span.text).split('(')[1]).strip(')')).replace(' /',',')).replace(')',' ')).strip(' ')
        table_contents.append(cell)
        
#print(table_contents)

df = pd.DataFrame(table_contents)
print(df.head(10))

df['Borough'] = df['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})

  PostalCode           Borough                      Neighborhood
0        M3A        North York                         Parkwoods
1        M4A        North York                  Victoria Village
2        M5A  Downtown Toronto         Regent Park, Harbourfront
3        M6A        North York  Lawrence Manor, Lawrence Heights
4        M7A      Queen's Park     Ontario Provincial Government
5        M9A         Etobicoke                  Islington Avenue
6        M1B       Scarborough                    Malvern, Rouge
7        M3B        North York                   Don Mills North
8        M4B         East York   Parkview Hill, Woodbine Gardens
9        M5B  Downtown Toronto          Garden District, Ryerson


### Step 3

Now that a dataframe of the postal code of each neighborhood along with the borough name and neighborhood name is built, in order to utilize the Foursquare location data, we need to get the latitude and the longitude coordinates of each neighborhood. 

In [5]:
geospacial_data = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs_v1/Geospatial_Coordinates.csv'
coordinates = pd.read_csv(geospacial_data)

df2 = pd.merge(df, coordinates, left_on = ['PostalCode'], right_on = ['Postal Code'], how = 'left')

In [6]:
df2.drop(columns = ['Postal Code'], inplace = True)
print(df2.head(10))

  PostalCode           Borough                      Neighborhood   Latitude  \
0        M3A        North York                         Parkwoods  43.753259   
1        M4A        North York                  Victoria Village  43.725882   
2        M5A  Downtown Toronto         Regent Park, Harbourfront  43.654260   
3        M6A        North York  Lawrence Manor, Lawrence Heights  43.718518   
4        M7A      Queen's Park     Ontario Provincial Government  43.662301   
5        M9A         Etobicoke                  Islington Avenue  43.667856   
6        M1B       Scarborough                    Malvern, Rouge  43.806686   
7        M3B        North York                   Don Mills North  43.745906   
8        M4B         East York   Parkview Hill, Woodbine Gardens  43.706397   
9        M5B  Downtown Toronto          Garden District, Ryerson  43.657162   

   Longitude  
0 -79.329656  
1 -79.315572  
2 -79.360636  
3 -79.464763  
4 -79.389494  
5 -79.532242  
6 -79.194353  
7 -79.3521

### Step 4

We now explore and cluster the neighborhoods in Toronto. 

But first, count the number of Boroughs and Neighborhoods.

In [7]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(df2['Borough'].unique()),
        df2.shape[0]
    )
)

The dataframe has 15 boroughs and 103 neighborhoods.


### 4.1
Use geopy library to get the latitude and longitude values of Toronto.

In [8]:
#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

In [9]:
address = 'Toronto'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.6534817, -79.3839347.


### 4.2
#### Create a map of Toronto with neighborhoods superimposed on top.

In [10]:
!conda install -c conda-forge folium=0.5.0 --yes # uncomment if necessary
import folium # map rendering library

Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python-3.7-main

  added / updated specs:
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    _libgcc_mutex-0.1          |      conda_forge           3 KB  conda-forge
    _openmp_mutex-4.5          |           1_llvm           5 KB  conda-forge
    _py-xgboost-mutex-2.0      |            cpu_0           8 KB  conda-forge
    _pytorch_select-0.2        |            gpu_0           2 KB
    absl-py-0.13.0             |     pyhd8ed1ab_0          97 KB  conda-forge
    aiohttp-3.7.4.post0        |   py37h5e8e339_0  

In [11]:
# create map of Toronto using latitude and longitude values
map_Toronto = folium.Map(location = [latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df2['Latitude'], df2['Longitude'], df2['Borough'], df2['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False).add_to(map_Toronto)  
    
map_Toronto

#### Define Foursquare Credentials and Version

In [12]:
CLIENT_ID = '3YSYZDN33OA2YAUZJSOAPVBVKNO1BJMY53IJGT4ZL3YK2G10' # your Foursquare ID
CLIENT_SECRET = 'BSRXIDKQFN1BXCB0XZR131L45Z32MAH4FY3RAB2JWCBUNEZS' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: 3YSYZDN33OA2YAUZJSOAPVBVKNO1BJMY53IJGT4ZL3YK2G10
CLIENT_SECRET:BSRXIDKQFN1BXCB0XZR131L45Z32MAH4FY3RAB2JWCBUNEZS


#### Explore the first neighbourhood in Toronto

In [13]:
print('First neighbourhood on the list is {}.'.format(df2.loc[0,'Neighborhood']))

neighborhood_latitude = df2.loc[0, 'Latitude'] # neighbourhood's latitude value
neighborhood_longitude = df2.loc[0, 'Longitude'] # neighbourhood's longitude value

neighborhood_name = df2.loc[0, 'Neighborhood'] # neighbourhood's name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

First neighbourhood on the list is Parkwoods.
Latitude and longitude values of Parkwoods are 43.7532586, -79.3296565.


Lets get the first 100 venues in this neighbourhood within a radius of 500 meters and examine the results, if needed.

In [14]:
radius = 500
url = 'http://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, neighborhood_latitude, neighborhood_longitude, radius, LIMIT)

results = requests.get(url).json()
#results

### Explore all neighbourhoods in Toronto

In [15]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Write the code to run the above function on each neighbourhood and create a new dataframe called _toronto_venues_:

In [16]:
toronto_venues = getNearbyVenues(names = df2['Neighborhood'], latitudes = df2['Latitude'], longitudes = df2['Longitude'])

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Ontario Provincial Government
Islington Avenue
Malvern, Rouge
Don Mills North
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills South
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
The Danforth  East
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmount Park
Bayview Village
Downsview East
The Danforth

In [17]:
print(toronto_venues.shape)
print(toronto_venues.head())

(1984, 7)
       Neighborhood  Neighborhood Latitude  Neighborhood Longitude  \
0         Parkwoods              43.753259              -79.329656   
1         Parkwoods              43.753259              -79.329656   
2         Parkwoods              43.753259              -79.329656   
3         Parkwoods              43.753259              -79.329656   
4  Victoria Village              43.725882              -79.315572   

                    Venue  Venue Latitude  Venue Longitude  \
0                     KFC       43.754387       -79.333021   
1         Brookbanks Park       43.751976       -79.332140   
2     Towns On The Ravine       43.754754       -79.332552   
3           Variety Store       43.751974       -79.333114   
4  Victoria Village Arena       43.723481       -79.315635   

         Venue Category  
0  Fast Food Restaurant  
1                  Park  
2                 Hotel  
3     Food & Drink Shop  
4          Hockey Arena  


Let's check how many venues were returned for each neighborhood.

In [21]:
print(toronto_venues.groupby('Neighborhood').count())

                                                    Neighborhood Latitude  \
Neighborhood                                                                
Agincourt                                                               4   
Alderwood, Long Branch                                                  7   
Bathurst Manor, Wilson Heights, Downsview North                        20   
Bayview Village                                                         4   
Bedford Park, Lawrence Manor East                                      21   
Berczy Park                                                            46   
Birch Cliff, Cliffside West                                             4   
Brockton, Parkdale Village, Exhibition Place                           22   
CN Tower, King and Spadina, Railway Lands, Harb...                     16   
Caledonia-Fairbanks                                                     4   
Cedarbrae                                                               7   

In [22]:
print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 256 uniques categories.


We are now analyzing each neighbourhood.

In [30]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

print(toronto_onehot.head())

   Yoga Studio  Accessories Store  Adult Boutique  Afghan Restaurant  Airport  \
0            0                  0               0                  0        0   
1            0                  0               0                  0        0   
2            0                  0               0                  0        0   
3            0                  0               0                  0        0   
4            0                  0               0                  0        0   

   Airport Food Court  Airport Gate  Airport Lounge  Airport Service  \
0                   0             0               0                0   
1                   0             0               0                0   
2                   0             0               0                0   
3                   0             0               0                0   
4                   0             0               0                0   

   Airport Terminal  American Restaurant  Antique Shop  Aquarium  Art Gallery  \

In [31]:
toronto_onehot.shape

(1984, 256)

Next, let's group rows by neighbourhood and by taking the mean of the frequency of occurrence of each category.

In [32]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
print(toronto_grouped)

                                         Neighborhood  Yoga Studio  \
0                                           Agincourt     0.000000   
1                              Alderwood, Long Branch     0.000000   
2     Bathurst Manor, Wilson Heights, Downsview North     0.000000   
3                                     Bayview Village     0.000000   
4                   Bedford Park, Lawrence Manor East     0.000000   
5                                         Berczy Park     0.000000   
6                         Birch Cliff, Cliffside West     0.000000   
7        Brockton, Parkdale Village, Exhibition Place     0.000000   
8   CN Tower, King and Spadina, Railway Lands, Har...     0.000000   
9                                 Caledonia-Fairbanks     0.000000   
10                                          Cedarbrae     0.000000   
11                                 Central Bay Street     0.015625   
12                                           Christie     0.000000   
13                  

In [33]:
toronto_grouped.shape

(99, 256)

Let's print each neighbourhood along with the top 5 most common venues.

In [59]:
# separate Etobicoke neighbourhood in the form of a small table that contains info about all venues and their frequencies
temp = toronto_grouped[toronto_grouped['Neighborhood'] == 'Agincourt'].T.reset_index()
temp = temp.iloc[1:] # remove the first line that contains the name of the neighbourhood
temp.columns = ['venue','freq'] # rename the columns into venue and freq
temp['freq'] = temp['freq'].astype(float) # convert frequencies values into floats
temp = temp.round({'freq': 2}) # round the frequencies to 2 digits after comma
t = temp.sort_values('freq', ascending=False).reset_index(drop=True).head(10) # order the frequesncies from high to low

print(t)

                       venue  freq
0                     Lounge  0.25
1               Skating Rink  0.25
2  Latin American Restaurant  0.25
3             Breakfast Spot  0.25
4                Yoga Studio  0.00
5        Moroccan Restaurant  0.00
6   Mediterranean Restaurant  0.00
7                Men's Store  0.00
8              Metro Station  0.00
9         Mexican Restaurant  0.00


In [46]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
                       venue  freq
0                     Lounge  0.25
1               Skating Rink  0.25
2  Latin American Restaurant  0.25
3             Breakfast Spot  0.25
4                Yoga Studio  0.00


----Alderwood, Long Branch----
            venue  freq
0     Pizza Place  0.29
1        Pharmacy  0.14
2    Skating Rink  0.14
3             Pub  0.14
4  Sandwich Place  0.14


----Bathurst Manor, Wilson Heights, Downsview North----
                       venue  freq
0                       Bank  0.10
1                Coffee Shop  0.10
2                Pizza Place  0.05
3                  Gift Shop  0.05
4  Middle Eastern Restaurant  0.05


----Bayview Village----
                 venue  freq
0                 Café  0.25
1  Japanese Restaurant  0.25
2                 Bank  0.25
3   Chinese Restaurant  0.25
4               Museum  0.00


----Bedford Park, Lawrence Manor East----
                     venue  freq
0              Coffee Shop  0.10
1               R

Let's put this info _pandas_ dataframe. First, write a function to sort the venues in descending order.

In [47]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [48]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind + 1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind + 1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns = columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

print(neighborhoods_venues_sorted.head())

                                      Neighborhood 1st Most Common Venue  \
0                                        Agincourt                Lounge   
1                           Alderwood, Long Branch           Pizza Place   
2  Bathurst Manor, Wilson Heights, Downsview North                  Bank   
3                                  Bayview Village    Chinese Restaurant   
4                Bedford Park, Lawrence Manor East        Sandwich Place   

  2nd Most Common Venue      3rd Most Common Venue 4th Most Common Venue  \
0        Breakfast Spot  Latin American Restaurant          Skating Rink   
1              Pharmacy               Skating Rink                   Pub   
2           Coffee Shop          Convenience Store        Ice Cream Shop   
3                  Bank                       Café   Japanese Restaurant   
4           Coffee Shop                 Restaurant                  Café   

     5th Most Common Venue 6th Most Common Venue 7th Most Common Venue  \
0           

#### Cluster neighbourhoods

Run k-means to cluster the neighborhood into 5 clusters.

In [53]:
# set number of clusters
kclusters = 7

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters = kclusters, random_state = 0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
print(len(kmeans.labels_))#[0:10] 

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 5, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
       3, 1, 1, 1, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1,
       1, 1, 1, 5, 1, 6, 1, 0, 4, 1, 1, 1, 1, 1, 1, 1, 1, 5, 1, 5, 1, 1,
       1, 1, 2, 1, 1, 1, 1, 5, 1, 1, 5], dtype=int32)

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [54]:
# add clustering labels
# uncomment for the first run
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df2

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on = 'Neighborhood')

  PostalCode           Borough                      Neighborhood   Latitude  \
0        M3A        North York                         Parkwoods  43.753259   
1        M4A        North York                  Victoria Village  43.725882   
2        M5A  Downtown Toronto         Regent Park, Harbourfront  43.654260   
3        M6A        North York  Lawrence Manor, Lawrence Heights  43.718518   
4        M7A      Queen's Park     Ontario Provincial Government  43.662301   

   Longitude  Cluster Labels 1st Most Common Venue  2nd Most Common Venue  \
0 -79.329656             1.0     Food & Drink Shop                   Park   
1 -79.315572             1.0           Coffee Shop  Portuguese Restaurant   
2 -79.360636             1.0           Coffee Shop                 Bakery   
3 -79.464763             1.0        Clothing Store            Coffee Shop   
4 -79.389494             1.0           Coffee Shop       Sushi Restaurant   

  3rd Most Common Venue   4th Most Common Venue  5th Most Comm

In [57]:
print(toronto_merged.head(10)) # check the last columns!

  PostalCode           Borough                      Neighborhood   Latitude  \
0        M3A        North York                         Parkwoods  43.753259   
1        M4A        North York                  Victoria Village  43.725882   
2        M5A  Downtown Toronto         Regent Park, Harbourfront  43.654260   
3        M6A        North York  Lawrence Manor, Lawrence Heights  43.718518   
4        M7A      Queen's Park     Ontario Provincial Government  43.662301   
5        M9A         Etobicoke                  Islington Avenue  43.667856   
6        M1B       Scarborough                    Malvern, Rouge  43.806686   
7        M3B        North York                   Don Mills North  43.745906   
8        M4B         East York   Parkview Hill, Woodbine Gardens  43.706397   
9        M5B  Downtown Toronto          Garden District, Ryerson  43.657162   

   Longitude  Cluster Labels 1st Most Common Venue  2nd Most Common Venue  \
0 -79.329656             1.0     Food & Drink Shop   

Finally, let's visualize the resulting clusters.

In [60]:
#print(toronto_merged['Cluster Labels'])
# remove neighborhoods that have no venues and were not clustered
toronto_merged.dropna(inplace = True)
toronto_merged['Cluster Labels']

0      1.0
1      1.0
2      1.0
3      1.0
4      1.0
6      1.0
7      1.0
8      1.0
9      1.0
10     0.0
11     2.0
12     6.0
13     1.0
14     1.0
15     1.0
16     1.0
17     1.0
18     1.0
19     1.0
20     1.0
21     5.0
22     1.0
23     1.0
24     1.0
25     0.0
26     1.0
27     1.0
28     1.0
29     1.0
30     1.0
31     1.0
32     4.0
33     1.0
34     1.0
35     5.0
36     1.0
37     1.0
38     1.0
39     1.0
40     5.0
41     1.0
42     1.0
43     1.0
44     1.0
46     0.0
47     1.0
48     1.0
49     1.0
50     1.0
51     1.0
52     5.0
53     1.0
54     1.0
55     1.0
56     1.0
57     3.0
58     1.0
59     1.0
60     1.0
61     1.0
62     1.0
63     0.0
65     1.0
66     5.0
67     1.0
68     1.0
69     1.0
70     1.0
71     1.0
72     1.0
73     1.0
74     1.0
75     1.0
76     1.0
77     5.0
78     1.0
79     1.0
80     1.0
81     1.0
82     1.0
83     1.0
84     1.0
85     1.0
86     1.0
87     1.0
88     1.0
89     1.0
90     1.0
91     5.0
92     1.0
93     1.0

In [61]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius = 5,
        popup = label,
        color = rainbow[int(cluster)-1],
        fill = True,
        fill_color = rainbow[int(cluster)-1],
        fill_opacity = 0.7).add_to(map_clusters)
       
map_clusters

### Examine the clusters
#### Cluster 1

In [62]:
print(toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]])

             Borough  Cluster Labels 1st Most Common Venue  \
10        North York             0.0    Italian Restaurant   
25  Downtown Toronto             0.0         Grocery Store   
46        North York             0.0         Grocery Store   
63              York             0.0              Bus Line   

   2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue  \
10            Playground                  Park        Discount Store   
25                  Café                  Park           Coffee Shop   
46         Shopping Mall                  Park                  Bank   
63         Grocery Store     Convenience Store               Brewery   

   5th Most Common Venue 6th Most Common Venue 7th Most Common Venue  \
10         Deli / Bodega      Department Store          Dessert Shop   
25             Nightclub            Baby Store    Italian Restaurant   
46      Department Store          Dessert Shop    Dim Sum Restaurant   
63        Discount Store      Department

#### Cluster 2

In [63]:
print(toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]])

                    Borough  Cluster Labels       1st Most Common Venue  \
0                North York             1.0           Food & Drink Shop   
1                North York             1.0                 Coffee Shop   
2          Downtown Toronto             1.0                 Coffee Shop   
3                North York             1.0              Clothing Store   
4              Queen's Park             1.0                 Coffee Shop   
6               Scarborough             1.0        Fast Food Restaurant   
7                North York             1.0                         Gym   
8                 East York             1.0                 Pizza Place   
9          Downtown Toronto             1.0                 Coffee Shop   
13               North York             1.0                         Gym   
14                East York             1.0                Dance Studio   
15         Downtown Toronto             1.0                 Coffee Shop   
16                     Yo

#### Cluster 3

In [41]:
print(toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]])

      Borough  Cluster Labels 1st Most Common Venue 2nd Most Common Venue  \
11  Etobicoke             2.0               Brewery         Women's Store   

   3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue  \
11    College Rec Center           Event Space  Ethiopian Restaurant   

   6th Most Common Venue 7th Most Common Venue        8th Most Common Venue  \
11           Escape Room     Electronics Store  Eastern European Restaurant   

   9th Most Common Venue 10th Most Common Venue  
11             Drugstore             Donut Shop  


#### Cluster 4

In [42]:
print(toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]])

        Borough  Cluster Labels 1st Most Common Venue   2nd Most Common Venue  \
53   North York             3.0        Baseball Field              Food Truck   
57   North York             3.0        Baseball Field  Furniture / Home Store   
101   Etobicoke             3.0        Baseball Field           Women's Store   

    3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue  \
53          Women's Store    Dim Sum Restaurant           Event Space   
57          Women's Store    Dim Sum Restaurant           Event Space   
101        Farmers Market           Event Space  Ethiopian Restaurant   

    6th Most Common Venue 7th Most Common Venue        8th Most Common Venue  \
53   Ethiopian Restaurant           Escape Room            Electronics Store   
57   Ethiopian Restaurant           Escape Room            Electronics Store   
101           Escape Room     Electronics Store  Eastern European Restaurant   

           9th Most Common Venue 10th Most Common Venue  
53 

#### Cluster 5

In [43]:
print(toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]])

         Borough  Cluster Labels 1st Most Common Venue 2nd Most Common Venue  \
1     North York             4.0          Intersection           Pizza Place   
6    Scarborough             4.0  Fast Food Restaurant    Falafel Restaurant   
8      East York             4.0           Pizza Place             Pet Store   
17     Etobicoke             4.0              Pharmacy           Coffee Shop   
29     East York             4.0     Indian Restaurant        Sandwich Place   
47  East Toronto             4.0                  Park  Fast Food Restaurant   
56          York             4.0  Fast Food Restaurant        Discount Store   
63          York             4.0     Convenience Store           Pizza Place   
70     Etobicoke             4.0           Pizza Place          Intersection   
72    North York             4.0              Pharmacy           Coffee Shop   
77     Etobicoke             4.0                  Park           Pizza Place   
82   Scarborough             4.0        