## Clustering data from Toronto Neighbourhoods

The data preparation steps below, from cleaning the scraped postal-code table to attaching coordinates to each borough, were explained in previous documents.

In [37]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np

In [2]:
html = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(html, 'html.parser')

In [3]:
table_contents=[]
table=soup.find('table')
for row in table.findAll('td'):
    cell = {}
    if row.span.text=='Not assigned':
        pass
    else:
        cell['PostalCode'] = row.p.text[:3]
        cell['Borough'] = (row.span.text).split('(')[0]
        cell['Neighborhood'] = (((((row.span.text).split('(')[1]).strip(')')).replace(' /',',')).replace(')',' ')).strip(' ')
        table_contents.append(cell)
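
The chained `split`/`strip`/`replace` calls above can be hard to follow. As a minimal sketch, assuming each cell's text has the form `Borough(Name / Name)` (which is what those calls imply), the same parsing can be written as a small helper. The name `parse_span_text` is hypothetical and not part of the original notebook, and the sample strings are assumed inputs.

```python
def parse_span_text(text):
    """Split 'Borough(Name / Name)' into (borough, 'Name, Name').

    Hypothetical helper; the input format is an assumption inferred
    from the string operations in the scraping cell.
    """
    borough, _, rest = text.partition('(')
    # drop the closing parenthesis and turn ' /' separators into commas
    neighborhood = rest.rstrip(')').replace(' /', ',').strip()
    return borough.strip(), neighborhood


print(parse_span_text('North York(Parkwoods)'))
print(parse_span_text('Downtown Toronto(Regent Park / Harbourfront)'))
```

Unlike the original cell, this sketch also strips stray whitespace around the borough name.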

In [4]:
df_toronto = pd.DataFrame(table_contents)
df_toronto['Borough']=df_toronto['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})

In [5]:
df_latlag = pd.read_csv('Geospatial_Coordinates.csv')
df_toronto = df_toronto.merge(df_latlag, left_on = 'PostalCode', right_on = 'Postal Code')
df_toronto.drop('Postal Code', axis = 1, inplace = True)

Using Folium, I plotted a map with one dot for each neighbourhood in Toronto. The map needs a starting location, so the coordinates of the first row in the dataframe are used as the reference point.

In [6]:
import folium

In [7]:
latitude = float(df_toronto['Latitude'][0])
longitude = float(df_toronto['Longitude'][0])
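
The first row is only one possible reference point. A hedged alternative, sketched here with the standard library, is to centre the map on the mean of the coordinates. The three sample coordinates are copied from rows shown elsewhere in this document, and `map_centre` is a hypothetical helper.

```python
from statistics import mean


def map_centre(coords):
    """Return the mean (latitude, longitude) of a list of coordinate pairs."""
    lats, lngs = zip(*coords)
    return mean(lats), mean(lngs)


sample = [
    (43.753259, -79.329656),  # Parkwoods
    (43.725882, -79.315572),  # Victoria Village
    (43.654260, -79.360636),  # Regent Park, Harbourfront
]
centre = map_centre(sample)
print(centre)
```

With the dataframe itself, the same idea would be `df_toronto['Latitude'].mean()` and `df_toronto['Longitude'].mean()`.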

This loop reads every row of the dataframe and adds a blue circle marker with a popup showing the neighbourhood and borough.

In [8]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Borough'], df_toronto['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=6,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#00008b',
        fill_opacity=0.6,
        parse_html=False).add_to(map_toronto)  

In [9]:
map_toronto

<h3>Understanding the data:</h3>

To look at the data more closely, I sampled a single borough, choosing the one with the most entries in the dataset. To find it, I grouped the data by borough and sorted the counts in descending order.

In [10]:
df_toronto.groupby('Borough').count().sort_values(by = ['PostalCode'], ascending = False)

Borough,PostalCode,Neighborhood,Latitude,Longitude
North York,24,24,24,24
Downtown Toronto,17,17,17,17
Scarborough,17,17,17,17
Etobicoke,11,11,11,11
Central Toronto,9,9,9,9
West Toronto,6,6,6,6
York,5,5,5,5
East Toronto,4,4,4,4
East York,4,4,4,4
Downtown Toronto Stn A,1,1,1,1


In [11]:
df_northyork = df_toronto[df_toronto['Borough'] == 'North York'].reset_index(drop = True)
df_northyork.head()

,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
3,M3B,North York,Don Mills North,43.745906,-79.352188
4,M6B,North York,Glencairn,43.709577,-79.445073


Using the same method as for the Toronto map, I explored the North York data and plotted every point in that subset.

In [12]:
latitude = float(df_northyork['Latitude'][3])
longitude = float(df_northyork['Longitude'][3])
map_northyork = folium.Map(location=[latitude, longitude], zoom_start=12)

for lat, lng, borough, neighborhood in zip(df_northyork['Latitude'], df_northyork['Longitude'], df_northyork['Borough'], df_northyork['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=6,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#00012b',
        fill_opacity=0.6,
        parse_html=False).add_to(map_northyork)
    
map_northyork

<h3>Foursquare API</h3>

From this point on, venue data is collected from the Foursquare API. For this first request I set the search radius to 1000 (metres), because smaller values were ineffective.

In [16]:
CLIENT_ID = ''  # your Foursquare ID
CLIENT_SECRET = ''  # your Foursquare Secret
VERSION = ''  # Foursquare API version

In [19]:
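LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 1000 # define radius

# coordinates of the neighbourhood to explore; these names were not defined in the original cell,
# and the first row (Parkwoods) is assumed because the response below is for 'Parkwoods - Donalda'
neighborhood_latitude = df_toronto.loc[0, 'Latitude']
neighborhood_longitude = df_toronto.loc[0, 'Longitude']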

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
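
Formatting the query string by hand works, but the standard library can assemble and escape the parameters. This is a sketch only: `build_explore_url` is a hypothetical helper, the credentials are placeholders, and the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the explore URL with properly escaped query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


# placeholder credentials, not real values
example_url = build_explore_url('ID', 'SECRET', 'VERSION', 43.753259, -79.329656, 1000, 100)
print(example_url)
```

The comma in `ll` is percent-encoded as `%2C`. Another option is to pass the same dictionary to `requests.get(url, params=params)` and let `requests` build the query string.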

In [20]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '60a1c90b813ea13b70e034ac'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Parkwoods - Donalda',
  'headerFullLocation': 'Parkwoods - Donalda, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 29,
  'suggestedBounds': {'ne': {'lat': 43.762258609000014,
    'lng': -79.31721997969855},
   'sw': {'lat': 43.74425859099999, 'lng': -79.34209302030145}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b8991cbf964a520814232e3',
       'name': "Allwyn's Bakery",
       'location': {'address': '81 Underhill drive',
        'lat': 43.75984035203157,
        'lng': -79.32471879917513,
        'labeledLatLngs': [{'label': 'display'

This helper extracts the name of the first category from a venue row, or returns None when the venue has no categories.

In [21]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
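
To illustrate what the helper returns, the sketch below applies the same logic to plain dictionaries standing in for dataframe rows. The sample rows are made up for illustration and cover the three cases: the `venue.categories` key, the `categories` key, and an empty list.

```python
def get_category_type(row):
    # same logic as the notebook helper, with the exception narrowed to KeyError
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']


# made-up rows; real rows come from pd.json_normalize(venues)
rows = [
    {'venue.categories': [{'name': 'Park'}]},
    {'categories': [{'name': 'Café'}, {'name': 'Bakery'}]},
    {'venue.categories': []},
]
category_names = [get_category_type(r) for r in rows]
print(category_names)  # ['Park', 'Café', None]
```

Only the first category is kept, so a venue tagged with several categories is represented by the first one in its list.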

The columns are then filtered and renamed to build a dataframe with the information needed for further analysis.

In [23]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = pd.json_normalize(venues)

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

,name,categories,lat,lng
0,Allwyn's Bakery,Caribbean Restaurant,43.75984,-79.324719
1,Brookbanks Park,Park,43.751976,-79.33214
2,Tim Hortons,Café,43.760668,-79.326368
3,Bruno's valu-mart,Grocery Store,43.746143,-79.32463
4,High Street Fish & Chips,Fish & Chips Shop,43.74526,-79.324949


In [24]:
nearby_venues.shape

(29, 4)

This function collects the venues near each location in Toronto, limited to 100 results per request to keep performance reasonable. Note that its default radius is 500, and the call below does not override it.

In [29]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
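
The list comprehension above assumes that every response contains `groups` and that every venue has at least one category; an empty `categories` list would raise an `IndexError`. A defensive variant of the extraction step might look like the sketch below. `extract_venues` is a hypothetical helper, and the sample payload is a trimmed, partly made-up response in the same shape as the one printed earlier.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore response dict."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # fall back to None instead of failing on venues without a category
        category = categories[0]['name'] if categories else None
        rows.append((venue['name'], venue['location']['lat'], venue['location']['lng'], category))
    return rows


sample_payload = {
    'response': {'groups': [{'items': [
        {'venue': {'name': "Allwyn's Bakery",
                   'location': {'lat': 43.75984, 'lng': -79.324719},
                   'categories': [{'name': 'Caribbean Restaurant'}]}},
        # made-up venue without categories
        {'venue': {'name': 'Uncategorised Place',
                   'location': {'lat': 43.75, 'lng': -79.33},
                   'categories': []}},
    ]}]}
}
extracted = extract_venues(sample_payload)
print(extracted)
```

A response with no `groups` yields an empty list rather than an exception, so the calling loop can simply continue with the next neighbourhood.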

In [30]:
toronto_venues = getNearbyVenues(names=df_toronto['Neighborhood'],
                                   latitudes=df_toronto['Latitude'],
                                   longitudes=df_toronto['Longitude']
                                  )

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Ontario Provincial Government
Islington Avenue
Malvern, Rouge
Don Mills North
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills South
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
The Danforth  East
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmount Park
Bayview Village
Downsview East
The Danforth

In [31]:
toronto_venues.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2122 entries, 0 to 2121
Data columns (total 7 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Neighborhood            2122 non-null   object 
 1   Neighborhood Latitude   2122 non-null   float64
 2   Neighborhood Longitude  2122 non-null   float64
 3   Venue                   2122 non-null   object 
 4   Venue Latitude          2122 non-null   float64
 5   Venue Longitude         2122 non-null   float64
 6   Venue Category          2122 non-null   object 
dtypes: float64(4), object(3)
memory usage: 116.2+ KB


The result is a dataset of venues and neighbourhoods, including coordinates for both.

In [32]:
toronto_venues.head()

,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant
4,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop


In [33]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,5,5,5,5,5,5
"Alderwood, Long Branch",6,6,6,6,6,6
"Bathurst Manor, Wilson Heights, Downsview North",22,22,22,22,22,22
Bayview Village,4,4,4,4,4,4
"Bedford Park, Lawrence Manor East",26,26,26,26,26,26
...,...,...,...,...,...,...
"Willowdale, Newtonbrook",1,1,1,1,1,1
Woburn,4,4,4,4,4,4
Woodbine Heights,5,5,5,5,5,5
York Mills West,3,3,3,3,3,3


After creating the dataset, the venue categories are transformed into dummy variables so that the frequency of each category can be analysed per neighbourhood.

In [34]:
toronto_dum = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")
toronto_dum['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_dum.columns[-1]] + list(toronto_dum.columns[:-1])
toronto_dum = toronto_dum[fixed_columns]
toronto_grouped = toronto_dum.groupby('Neighborhood').mean().reset_index()
toronto_grouped.head()

,Neighborhood,Yoga Studio,Accessories Store,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Train Station,Truck Stop,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
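
Taking the mean of the dummy columns per neighbourhood gives the share of that neighbourhood's venues that fall in each category. The sketch below reproduces that calculation with the standard library for the five rows shown in `toronto_venues.head()`; `category_shares` is a hypothetical helper, and because only five rows are used, the shares differ from those of the full dataset.

```python
from collections import Counter, defaultdict

# (neighbourhood, venue category) pairs from the first five rows of toronto_venues
venues = [
    ('Parkwoods', 'Park'),
    ('Parkwoods', 'Food & Drink Shop'),
    ('Victoria Village', 'Hockey Arena'),
    ('Victoria Village', 'Portuguese Restaurant'),
    ('Victoria Village', 'Coffee Shop'),
]


def category_shares(pairs):
    """Return {neighbourhood: {category: share of that neighbourhood's venues}}."""
    counts = defaultdict(Counter)
    for neighborhood, category in pairs:
        counts[neighborhood][category] += 1
    return {
        name: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for name, counter in counts.items()
    }


shares = category_shares(venues)
print(shares['Parkwoods'])
```

Categories that never occur in a neighbourhood are simply absent here, whereas the dummy-variable approach stores them explicitly as 0.0 so that every neighbourhood has the same columns for clustering.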


In [35]:
toronto_grouped.describe()

,Yoga Studio,Accessories Store,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Train Station,Truck Stop,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
count,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,...,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0
mean,0.003338,0.000714,0.000127,0.003088,0.000588,0.000588,0.001176,0.001765,0.001176,0.0103,...,0.0003,0.002,0.002429,0.000465,0.008023,0.000476,0.001187,0.000208,0.000714,0.0035
std,0.009805,0.007143,0.001266,0.025625,0.005882,0.005882,0.011765,0.017647,0.011765,0.051921,...,0.001714,0.02,0.008191,0.003515,0.031644,0.004762,0.004013,0.002083,0.007143,0.026052
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,0.047619,0.071429,0.012658,0.25,0.058824,0.058824,0.117647,0.176471,0.117647,0.5,...,0.01,0.2,0.051724,0.032258,0.2,0.047619,0.022727,0.020833,0.071429,0.25


In [39]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [44]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Agincourt,Lounge,Breakfast Spot,Clothing Store,Skating Rink,Latin American Restaurant
1,"Alderwood, Long Branch",Pizza Place,Sandwich Place,Coffee Shop,Pub,Gym
2,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Bank,Pet Store,Gas Station,Park
3,Bayview Village,Café,Japanese Restaurant,Bank,Chinese Restaurant,Yoga Studio
4,"Bedford Park, Lawrence Manor East",Italian Restaurant,Coffee Shop,Restaurant,Sandwich Place,Indian Restaurant
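
The `try`/`except` around `indicators[ind]` produces the right suffixes for the five columns used here, but it would label a 21st column as "21th". A general ordinal helper is sketched below; `ordinal` is a hypothetical name and not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th and the rest of the teens always take 'th'
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 6)]
print(columns)
```

For `num_top_venues = 5` this yields the same column names as the cell above.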


In [51]:
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

With the most common venues per neighbourhood in place, the next step is to run k-means clustering on all categories at once. I chose 6 clusters after testing other values, because it gave a clearer view of the clusters in Toronto.

In [136]:
# set number of clusters
kclusters = 6

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([5, 0, 5, 5, 0, 5, 5, 5, 5, 1])
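
To make the clustering step less of a black box, the sketch below implements plain Lloyd's algorithm (the basic k-means iteration) in pure Python on four made-up 2-D points. It is not scikit-learn's implementation, which by default uses k-means++ initialisation; here the first k points seed the centroids, and `simple_kmeans` is a hypothetical name. Comparing the inertia (the sum of squared distances to the assigned centroid) across several values of k is one common way to compare candidate cluster counts.

```python
import math


def simple_kmeans(points, k, iterations=20):
    """Cluster coordinate tuples with Lloyd's algorithm; returns (labels, inertia)."""
    centroids = [tuple(p) for p in points[:k]]  # naive seeding: first k points

    def nearest(p):
        # index of the closest centroid
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    for _ in range(iterations):
        # assignment step
        labels = [nearest(p) for p in points]
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))

    labels = [nearest(p) for p in points]
    inertia = sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))
    return labels, inertia


# made-up points forming two obvious groups
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
for k in (1, 2, 4):
    labels, inertia = simple_kmeans(points, k)
    print(k, labels, round(inertia, 2))
```

On this toy data the inertia falls sharply from k=1 to k=2 and reaches zero at k=4, where every point is its own cluster. The real feature matrix is far less clear-cut, so the choice of 6 clusters remains a judgment call.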

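The cell below exists only to allow repeated tests with different values of kclusters: the 'Cluster Labels' column is dropped here and inserted again in the next cell.

In [137]:
# errors='ignore' lets the cell run on a first pass, when the column does not exist yet
neighborhoods_venues_sorted.drop('Cluster Labels', axis = 1, inplace = True, errors = 'ignore')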

In [138]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [139]:
neighborhoods_venues_sorted['Cluster Labels']

0     5
1     0
2     5
3     5
4     0
     ..
95    1
96    5
97    5
98    1
99    2
Name: Cluster Labels, Length: 100, dtype: int32

In [140]:
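if 'toronto_merged' in globals():  # guard: the dataframe does not exist on a first run
    toronto_merged.drop('Cluster Labels', axis = 1, inplace = True, errors = 'ignore')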

In [141]:
toronto_merged = df_toronto
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head()

,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,1.0,Park,Food & Drink Shop,Yoga Studio,Mexican Restaurant,Molecular Gastronomy Restaurant
1,M4A,North York,Victoria Village,43.725882,-79.315572,5.0,Intersection,Pizza Place,Coffee Shop,Portuguese Restaurant,Hockey Arena
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,5.0,Coffee Shop,Park,Bakery,Theater,Pub
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,5.0,Clothing Store,Furniture / Home Store,Women's Store,Coffee Shop,Miscellaneous Shop
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494,5.0,Coffee Shop,Sushi Restaurant,Yoga Studio,Italian Restaurant,Burrito Place


After several attempts to plot the clusters, I realized that some data was missing: 3 of the 103 rows have no cluster label or common venues after the join. To simplify the process, I decided to drop the rows with NaN values.

In [142]:
toronto_merged.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 103 entries, 0 to 102
Data columns (total 11 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   PostalCode             103 non-null    object 
 1   Borough                103 non-null    object 
 2   Neighborhood           103 non-null    object 
 3   Latitude               103 non-null    float64
 4   Longitude              103 non-null    float64
 5   Cluster Labels         100 non-null    float64
 6   1st Most Common Venue  100 non-null    object 
 7   2nd Most Common Venue  100 non-null    object 
 8   3rd Most Common Venue  100 non-null    object 
 9   4th Most Common Venue  100 non-null    object 
 10  5th Most Common Venue  100 non-null    object 
dtypes: float64(3), object(8)
memory usage: 13.7+ KB


In [144]:
toronto_merged.dropna(subset=['Cluster Labels'], axis = 0, inplace = True)
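
The NaN rows most likely come from the left join: a neighbourhood for which no venues were collected never appears in `toronto_grouped`, so it has no cluster label to join. Before dropping such rows it can help to list which ones are affected. A minimal sketch follows, using a made-up neighbourhood name; with the real data the two collections would be `df_toronto['Neighborhood']` and `neighborhoods_venues_sorted['Neighborhood']`.

```python
# 'Example Heights' is a made-up name standing in for a neighbourhood without venues
all_neighborhoods = ['Parkwoods', 'Victoria Village', 'Example Heights']
clustered = {'Parkwoods', 'Victoria Village'}

# neighbourhoods present in the base table but absent from the clustered results
missing = [name for name in all_neighborhoods if name not in clustered]
print(missing)  # ['Example Heights']
```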

In [145]:
toronto_merged.head()

,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,1.0,Park,Food & Drink Shop,Yoga Studio,Mexican Restaurant,Molecular Gastronomy Restaurant
1,M4A,North York,Victoria Village,43.725882,-79.315572,5.0,Intersection,Pizza Place,Coffee Shop,Portuguese Restaurant,Hockey Arena
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,5.0,Coffee Shop,Park,Bakery,Theater,Pub
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,5.0,Clothing Store,Furniture / Home Store,Women's Store,Coffee Shop,Miscellaneous Shop
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494,5.0,Coffee Shop,Sushi Restaurant,Yoga Studio,Italian Restaurant,Burrito Place


Finally, this map shows how the clusters are distributed across Toronto.

In [146]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
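
In the loop above, `rainbow[int(cluster)-1]` shifts every label by one, so cluster 0 picks up the last colour through Python's negative indexing. Each cluster still gets its own colour, but a direct label-to-colour mapping is easier to read. The sketch below builds such a palette with the standard library's `colorsys` instead of matplotlib; `cluster_palette` is a hypothetical helper.

```python
import colorsys


def cluster_palette(n_clusters):
    """Return n_clusters hex colours evenly spaced around the hue wheel."""
    palette = []
    for i in range(n_clusters):
        r, g, b = colorsys.hsv_to_rgb(i / n_clusters, 1.0, 1.0)
        palette.append('#{:02x}{:02x}{:02x}'.format(round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = cluster_palette(6)
print(palette)
```

A marker for a given row could then use `color=palette[int(cluster)]`, so that cluster 0 maps to the first colour.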