# Segmenting and Clustering Neighborhoods in Toronto
In this section, we use the clean dataset ``result.csv`` built in the notebook ``data_science_cap.ipynb``. We first load the data, then generate maps to visualize the neighbourhoods, and finally analyze how they cluster.

## Import libraries

In [23]:
import numpy as np
import pandas as pd
import folium
import requests
import json

## Load data
We import the dataset built in the notebook ``data_science_cap.ipynb``.

In [3]:
result = pd.read_csv('result.csv')
result.drop(['Unnamed: 0'], axis=1, inplace=True)
print('Data dimension:', result.shape)
result.head(12)

Data dimension: (103, 5)


Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
3,M6A,North York,"Lawrence Heights,Lawrence Manor",43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
7,M3B,North York,Don Mills North,43.745906,-79.352188
8,M4B,East York,"Woodbine Gardens,Parkview Hill",43.706397,-79.309937
9,M5B,Downtown Toronto,"Ryerson,Garden District",43.657162,-79.378937


## Display the map of Toronto
To display the map of Toronto, we first look up the coordinates of the city (for example at https://www.latlong.net/place/toronto-on-canada-27230.html). Then we use the library ``folium`` to display the map.

In [5]:
latitude = 43.651070
longitude = -79.347015

toronto_map = folium.Map(location=[latitude, longitude], zoom_start=12)
toronto_map

The next step is to add the neighbourhoods from the dataset ``result`` to the map, using the ``Latitude`` and ``Longitude`` columns.

In [7]:
# neighbourhood on Toronto map
toronto_loader = zip(result['Latitude'], result['Longitude'], result['Borough'], result['Neighbourhood'])

for lat, long, bor, neigh in toronto_loader:
    label = '{}, {}'.format(neigh, bor)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(toronto_map) 
    
toronto_map

## Foursquare API and preprocess
The next step is to use the Foursquare API to explore the neighbourhoods and to preprocess the returned data.

In [12]:
# Foursquare credential
client_id = 'KC0ERLLNKX5LMBP2YGIE3JYDJWUHTLXCVIXQ10GCD5M0UUPU'
client_secret = 'H2SRB4BVVC1T332FFL3CRGK3JL520JR0R0JWGEVWMZQBCVMC'
version = '20180605'
print('Client ID:', client_id)
print('Client Secret:', client_secret)
print('Version:', version)
# get latitude and longitude of one Neighbourhood
neigh_lat = result.loc[0, 'Latitude']
neigh_long = result.loc[0, 'Longitude']
neigh_name = result.loc[0, 'Neighbourhood']
print('Neighbourhood: {}, Latitude: {}, Longitude: {}'.format(neigh_name, neigh_lat, neigh_long))

Client ID: KC0ERLLNKX5LMBP2YGIE3JYDJWUHTLXCVIXQ10GCD5M0UUPU
Client Secret: H2SRB4BVVC1T332FFL3CRGK3JL520JR0R0JWGEVWMZQBCVMC
Version: 20180605
Neighbourhood: Parkwoods, Latitude: 43.7532586, Longitude: -79.3296565


We request up to 100 venues within a radius of 500 meters of the neighbourhood.

In [15]:
limit = 100
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    client_id, 
    client_secret, 
    version, 
    neigh_lat, 
    neigh_long, 
    radius, 
    limit)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=KC0ERLLNKX5LMBP2YGIE3JYDJWUHTLXCVIXQ10GCD5M0UUPU&client_secret=H2SRB4BVVC1T332FFL3CRGK3JL520JR0R0JWGEVWMZQBCVMC&v=20180605&ll=43.7532586,-79.3296565&radius=500&limit=100'
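Formatting the query string by hand works, but separators and escaping are easy to get wrong. As a minimal sketch, the same URL can be assembled from a dictionary with the standard library. The helper name `build_explore_url` is hypothetical and the credentials are placeholders, not real keys.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Assemble the explore URL from named parameters (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in "lat,lng" readable instead of escaping it as %2C
    return f"{BASE_URL}?{urlencode(params, safe=',')}"

# Placeholder credentials; the coordinates are those of Parkwoods from the output above
example_url = build_explore_url(
    "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180605", 43.7532586, -79.3296565
)
print(example_url)
```

Keeping the parameters in a dictionary also makes it easier to read the credentials from environment variables instead of writing them into the notebook.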

In [19]:
# get the result from API
url_req = requests.get(url).json()

# extract category of the venues
def extract_category(row):
    # the category list lives under 'categories' or 'venue.categories'
    try:
        cat_list = row['categories']
    except KeyError:
        cat_list = row['venue.categories']
        
    if len(cat_list) == 0:
        return None
    else:
        return cat_list[0]['name']
    


In [22]:
venues = url_req['response']['groups'][0]['items']
venues

[{'reasons': {'count': 0,
   'items': [{'summary': 'This spot is popular',
     'type': 'general',
     'reasonName': 'globalInteractionReason'}]},
  'venue': {'id': '4e8d9dcdd5fbbbb6b3003c7b',
   'name': 'Brookbanks Park',
   'location': {'address': 'Toronto',
    'lat': 43.751976046055574,
    'lng': -79.33214044722958,
    'labeledLatLngs': [{'label': 'display',
      'lat': 43.751976046055574,
      'lng': -79.33214044722958}],
    'distance': 245,
    'cc': 'CA',
    'city': 'Toronto',
    'state': 'ON',
    'country': 'Canada',
    'formattedAddress': ['Toronto', 'Toronto ON', 'Canada']},
   'categories': [{'id': '4bf58dd8d48988d163941735',
     'name': 'Park',
     'pluralName': 'Parks',
     'shortName': 'Park',
     'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/parks_outdoors/park_',
      'suffix': '.png'},
     'primary': True}],
   'photos': {'count': 0, 'groups': []}},
  'referralId': 'e-0-4e8d9dcdd5fbbbb6b3003c7b-0'},
 {'reasons': {'count': 0,
   'items': [{'

In [27]:
# convert json to pandas dataframe
# (pd.json_normalize replaces the deprecated pandas.io.json.json_normalize import)
venues_df = pd.json_normalize(venues)
venues_df

Unnamed: 0,referralId,reasons.count,reasons.items,venue.id,venue.name,venue.location.address,venue.location.lat,venue.location.lng,venue.location.labeledLatLngs,venue.location.distance,venue.location.cc,venue.location.city,venue.location.state,venue.location.country,venue.location.formattedAddress,venue.categories,venue.photos.count,venue.photos.groups
0,e-0-4e8d9dcdd5fbbbb6b3003c7b-0,0,"[{'summary': 'This spot is popular', 'type': '...",4e8d9dcdd5fbbbb6b3003c7b,Brookbanks Park,Toronto,43.751976,-79.33214,"[{'label': 'display', 'lat': 43.75197604605557...",245,CA,Toronto,ON,Canada,"[Toronto, Toronto ON, Canada]","[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",0,[]
1,e-0-4cb11e2075ebb60cd1c4caad-1,0,"[{'summary': 'This spot is popular', 'type': '...",4cb11e2075ebb60cd1c4caad,Variety Store,29 Valley Woods Road,43.751974,-79.333114,"[{'label': 'display', 'lat': 43.75197441585782...",312,CA,Toronto,ON,Canada,"[29 Valley Woods Road, Toronto ON, Canada]","[{'id': '4bf58dd8d48988d1f9941735', 'name': 'F...",0,[]
2,e-0-53622a89498ed84d6853265e-2,0,"[{'summary': 'This spot is popular', 'type': '...",53622a89498ed84d6853265e,TTC stop - 44 Valley Woods,44 Valley Woods Rd,43.755402,-79.333741,"[{'label': 'display', 'lat': 43.75540238129278...",405,CA,Toronto,ON,Canada,"[44 Valley Woods Rd, Toronto ON, Canada]","[{'id': '52f2ab2ebcbc57f1066b8b4f', 'name': 'B...",0,[]


In [28]:
# choose specific columns
columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
venues_df = venues_df.loc[:, columns]
venues_df

Unnamed: 0,venue.name,venue.categories,venue.location.lat,venue.location.lng
0,Brookbanks Park,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",43.751976,-79.33214
1,Variety Store,"[{'id': '4bf58dd8d48988d1f9941735', 'name': 'F...",43.751974,-79.333114
2,TTC stop - 44 Valley Woods,"[{'id': '52f2ab2ebcbc57f1066b8b4f', 'name': 'B...",43.755402,-79.333741


In [29]:
# get the categories
venues_df['venue.categories'] = venues_df.apply(extract_category, axis=1)
venues_df

Unnamed: 0,venue.name,venue.categories,venue.location.lat,venue.location.lng
0,Brookbanks Park,Park,43.751976,-79.33214
1,Variety Store,Food & Drink Shop,43.751974,-79.333114
2,TTC stop - 44 Valley Woods,Bus Stop,43.755402,-79.333741


We observe that we have 3 venues returned by Foursquare.
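All three venues lie within the requested 500 m radius. As a rough sanity check, the `distance` value in the raw response (245 for Brookbanks Park) can be approximated with the haversine formula. This is a sketch under a spherical-Earth assumption. How Foursquare actually computes `distance` is not stated in this notebook, and the helper name `haversine_m` is hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

# Parkwoods centre and Brookbanks Park, both taken from the outputs above
parkwoods = (43.7532586, -79.3296565)
brookbanks_park = (43.751976046055574, -79.33214044722958)

distance_m = haversine_m(*parkwoods, *brookbanks_park)
print(round(distance_m))  # should land close to the reported distance of 245
```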

The next step is to apply this procedure to all neighbourhoods.

In [44]:
# repeat procedure for all neighbourhood
client_id = 'KC0ERLLNKX5LMBP2YGIE3JYDJWUHTLXCVIXQ10GCD5M0UUPU'
client_secret = 'H2SRB4BVVC1T332FFL3CRGK3JL520JR0R0JWGEVWMZQBCVMC'
radius = 500
limit = 100
latitude = result['Latitude']
longitude = result['Longitude']
neighbourhood = result['Neighbourhood']

venues_list = []

data_loader = zip(neighbourhood, latitude, longitude)

for neigh, lat, long in data_loader:
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    client_id, 
    client_secret, 
    version, 
    lat, 
    long, 
    radius, 
    limit)
    
    req = requests.get(url).json()
    
    venues = req['response']['groups'][0]['items']
    
    venues_list.append([(neigh, lat, long, v['venue']['name'], v['venue']['location']['lat'], v['venue']['location']['lng'],
                       v['venue']['categories'][0]['name']) for v in venues])
    
# build the dataframe once, after the loop has collected all venues
venues_df = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
venues_df.columns = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude', 
                  'Venue Category']
    
venues_df.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
4,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant
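The list comprehension in the loop indexes `categories[0]` directly, so a venue with an empty category list would raise an `IndexError`. The following sketch shows one way to flatten the raw items defensively. The helper name `flatten_items` is hypothetical, the first sample item is abbreviated from the response shown earlier, and the second one is made up to exercise the fallback.

```python
def flatten_items(neigh, lat, lng, items):
    """Turn the raw 'items' of one explore response into flat row tuples."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None when the category list is empty
        category = categories[0]["name"] if categories else None
        rows.append((
            neigh, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

sample_items = [
    {"venue": {"name": "Brookbanks Park",
               "location": {"lat": 43.751976046055574, "lng": -79.33214044722958},
               "categories": [{"name": "Park"}]}},
    # Made-up venue without categories, only to exercise the fallback
    {"venue": {"name": "Uncategorised Place",
               "location": {"lat": 43.75, "lng": -79.33},
               "categories": []}},
]

rows = flatten_items("Parkwoods", 43.7532586, -79.3296565, sample_items)
for row in rows:
    print(row)
```

Rows with a missing category can then be dropped or labelled explicitly before the one-hot encoding step.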


In [49]:
#shape of venues_df
venues_df.shape

(2223, 7)

In [46]:
venues_df.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide,King,Richmond",100,100,100,100,100,100
Agincourt,4,4,4,4,4,4
"Agincourt North,L'Amoreaux East,Milliken,Steeles East",2,2,2,2,2,2
"Albion Gardens,Beaumond Heights,Humbergate,Jamestown,Mount Olive,Silverstone,South Steeles,Thistletown",8,8,8,8,8,8
"Alderwood,Long Branch",9,9,9,9,9,9
...,...,...,...,...,...,...
Willowdale West,5,5,5,5,5,5
Woburn,4,4,4,4,4,4
"Woodbine Gardens,Parkview Hill",12,12,12,12,12,12
Woodbine Heights,9,9,9,9,9,9


In [48]:
# number of categories
len(venues_df['Venue Category'].unique())

268

In [50]:
# save to csv
venues_df.to_csv('venues.csv')

## Neighborhood analysis

In [51]:
# convert categorical values
venue_one_hot = pd.get_dummies(venues_df[['Venue Category']], prefix="", prefix_sep="")

# add to dataframe
venue_one_hot['Neighborhood'] = venues_df['Neighborhood']
venue_one_hot.head()

Unnamed: 0,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [52]:
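# move the last column to the front (intended to be 'Neighborhood')
# note: in the output below 'Yoga Studio' is moved instead, which suggests that
# 'Neighborhood' already existed as a venue category column and was overwritten above
fix_col = [venue_one_hot.columns[-1]] + list(venue_one_hot.columns[:-1])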
venue_one_hot = venue_one_hot[fix_col]
venue_one_hot.head()

Unnamed: 0,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [53]:
# mean of each one-hot column = relative frequency of that category per neighborhood
venue_group = venue_one_hot.groupby('Neighborhood').mean().reset_index()
venue_group.head()

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,"Adelaide,King,Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01
1,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Agincourt North,L'Amoreaux East,Milliken,Steel...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Alderwood,Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
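Taking the mean of a 0/1 indicator column within a group gives the share of that group's venues that belong to the category. The sketch below reproduces the idea in plain Python with `collections.Counter` rather than pandas. It uses only the five rows displayed by `venues_df.head()`, so the numbers are illustrative and do not match the full table.

```python
from collections import Counter, defaultdict

# The five rows shown by venues_df.head(): (neighbourhood, venue category)
rows = [
    ("Parkwoods", "Park"),
    ("Parkwoods", "Food & Drink Shop"),
    ("Victoria Village", "Hockey Arena"),
    ("Victoria Village", "Coffee Shop"),
    ("Victoria Village", "Portuguese Restaurant"),
]

counts = defaultdict(Counter)
for neigh, category in rows:
    counts[neigh][category] += 1

# Relative frequency = count / number of venues in that neighbourhood,
# which is what the mean of a 0/1 indicator column computes
frequencies = {
    neigh: {cat: n / sum(counter.values()) for cat, n in counter.items()}
    for neigh, counter in counts.items()
}

print(frequencies["Parkwoods"])
```

Each neighbourhood's frequencies sum to 1, which makes neighbourhoods with very different venue counts comparable.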


In [57]:
# save the dataset without neighborhood
venue_group.drop(['Neighborhood'], axis = 1, inplace=True)
venue_group.head()

Unnamed: 0,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,...,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [59]:
# save to csv-file
venue_group.to_csv('venue_group.csv')

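The next step is to explore the most common venue categories in each neighborhood. Note that this step needs the ``Neighborhood`` column of ``venue_group``, so these cells were run before that column was dropped (see the cell numbers).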

In [54]:
# return the names of the `top` most common venue categories in a row
def most_common_venue(row, top):
    row_cat = row.iloc[1:]
    row_cat_sort = row_cat.sort_values(ascending=False)
    return row_cat_sort.index.values[0:top]

In [55]:
# create dataframe with the 10 most common venue categories
top_venue = 10

columns = ['Neighborhood']

for i in range(top_venue):
    columns.append('{} most common venue'.format(i+1))
    
venues_sort = pd.DataFrame(columns=columns)

venues_sort['Neighborhood'] = venue_group['Neighborhood']

for i in range(venue_group.shape[0]):
    venues_sort.iloc[i,1:] = most_common_venue(venue_group.iloc[i,:], top_venue)

venues_sort.head()

Unnamed: 0,Neighborhood,1 most common venue,2 most common venue,3 most common venue,4 most common venue,5 most common venue,6 most common venue,7 most common venue,8 most common venue,9 most common venue,10 most common venue
0,"Adelaide,King,Richmond",Coffee Shop,Café,Bar,Thai Restaurant,Restaurant,Sushi Restaurant,Breakfast Spot,Clothing Store,Gastropub,Seafood Restaurant
1,Agincourt,Breakfast Spot,Skating Rink,Latin American Restaurant,Lounge,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run
2,"Agincourt North,L'Amoreaux East,Milliken,Steel...",Park,Playground,Women's Store,Distribution Center,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
3,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",Grocery Store,Pharmacy,Fried Chicken Joint,Sandwich Place,Pizza Place,Fast Food Restaurant,Beer Store,Doner Restaurant,Donut Shop,Deli / Bodega
4,"Alderwood,Long Branch",Pizza Place,Gym,Pharmacy,Sandwich Place,Skating Rink,Pub,Pool,Coffee Shop,Comic Shop,Comfort Food Restaurant
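Some neighbourhoods have fewer than 10 venues (Agincourt has 4 in the count table above), so the lower ranks in this table are filled with categories whose frequency is zero. A possible variant, sketched here in plain Python, keeps only non-zero categories and breaks ties by name. The helper name `top_categories` is hypothetical and the frequencies are illustrative values, not taken from `venue_group`.

```python
def top_categories(freqs, k=10):
    """Return up to k category names with non-zero frequency, most frequent first."""
    non_zero = [(cat, f) for cat, f in freqs.items() if f > 0]
    # Sort by descending frequency, then by name so that ties are deterministic
    non_zero.sort(key=lambda pair: (-pair[1], pair[0]))
    return [cat for cat, _ in non_zero[:k]]

# Illustrative frequencies for a neighbourhood with four venues
freqs = {
    "Breakfast Spot": 0.25,
    "Skating Rink": 0.25,
    "Latin American Restaurant": 0.25,
    "Lounge": 0.25,
    "Dim Sum Restaurant": 0.0,
    "Diner": 0.0,
}

print(top_categories(freqs))
```

With this variant a sparse neighbourhood yields a shorter list instead of ranks padded with categories it does not contain.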


As with the other dataframes, we save this one to a CSV file so that it can be reloaded for the next task.

In [56]:
# save dataframe
venues_sort.to_csv('venues_sort.csv')

## Cluster neighborhood
With the preprocessing done, the next step is to apply a clustering method to the dataset and display the clusters on the Toronto map. We first load the clean datasets and then fit a k-means model with 5 clusters.

In [60]:
# import the clustering model
from sklearn.cluster import KMeans

In [None]:
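# reload the datasets when re-running the notebook
# index_col=0 reads the saved index back instead of creating an 'Unnamed: 0' column
venue_group = pd.read_csv('venue_group.csv', index_col=0)
venues_sort = pd.read_csv('venues_sort.csv', index_col=0)
result = pd.read_csv('result.csv', index_col=0)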

In [62]:
# create cluster model and train the model
n_clust = 5

X = venue_group

kmeans = KMeans(n_clusters=n_clust, random_state=42).fit(X)
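To make clear what the fit does, here is a minimal pure-Python sketch of Lloyd's algorithm, the iteration behind k-means. It is an illustration only and not scikit-learn's implementation, which by default uses a smarter initialisation (k-means++). The toy points are made-up two-dimensional frequency vectors.

```python
def kmeans(points, k, n_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive init: the first k points
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
    return labels, centroids

# Made-up frequency vectors: (share of coffee shops, share of parks)
points = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

The label numbers themselves carry no meaning. Only the grouping matters, which is why the clusters still have to be inspected after fitting.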

The next step is to include the cluster labels in the dataset ``venues_sort``.

In [63]:
venues_sort['Cluster Labels'] = kmeans.labels_
venues_sort.head()

Unnamed: 0,Neighborhood,1 most common venue,2 most common venue,3 most common venue,4 most common venue,5 most common venue,6 most common venue,7 most common venue,8 most common venue,9 most common venue,10 most common venue,Cluster Labels
0,"Adelaide,King,Richmond",Coffee Shop,Café,Bar,Thai Restaurant,Restaurant,Sushi Restaurant,Breakfast Spot,Clothing Store,Gastropub,Seafood Restaurant,0
1,Agincourt,Breakfast Spot,Skating Rink,Latin American Restaurant,Lounge,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,0
2,"Agincourt North,L'Amoreaux East,Milliken,Steel...",Park,Playground,Women's Store,Distribution Center,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,1
3,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",Grocery Store,Pharmacy,Fried Chicken Joint,Sandwich Place,Pizza Place,Fast Food Restaurant,Beer Store,Doner Restaurant,Donut Shop,Deli / Bodega,0
4,"Alderwood,Long Branch",Pizza Place,Gym,Pharmacy,Sandwich Place,Skating Rink,Pub,Pool,Coffee Shop,Comic Shop,Comfort Food Restaurant,0


In [65]:
# merge venues_sort with result
result.rename({'Neighbourhood': 'Neighborhood'}, axis=1, inplace=True)
merge_data = pd.merge(result, venues_sort, on='Neighborhood')
merge_data.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,1 most common venue,2 most common venue,3 most common venue,4 most common venue,5 most common venue,6 most common venue,7 most common venue,8 most common venue,9 most common venue,10 most common venue,Cluster Labels
0,M3A,North York,Parkwoods,43.753259,-79.329656,Park,Food & Drink Shop,Women's Store,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Doner Restaurant,4
1,M4A,North York,Victoria Village,43.725882,-79.315572,Coffee Shop,Pizza Place,Hockey Arena,Portuguese Restaurant,Intersection,Donut Shop,Doner Restaurant,Drugstore,Dog Run,Dance Studio,0
2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636,Coffee Shop,Park,Café,Pub,Bakery,Mexican Restaurant,Breakfast Spot,Theater,Yoga Studio,Electronics Store,0
3,M6A,North York,"Lawrence Heights,Lawrence Manor",43.718518,-79.464763,Clothing Store,Furniture / Home Store,Women's Store,Event Space,Miscellaneous Shop,Athletics & Sports,Carpet Store,Arts & Crafts Store,Boutique,Accessories Store,0
4,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494,Coffee Shop,Park,Burger Joint,Gym,Fast Food Restaurant,Portuguese Restaurant,Nightclub,Music Venue,Mexican Restaurant,Juice Bar,0
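`pd.merge` performs an inner join by default, so any neighbourhood in ``result`` for which no venues were collected is silently dropped from ``merge_data``. A quick way to see which keys would be lost is sketched below in plain Python. The helper name `unmatched_keys` is hypothetical and the third neighbourhood name is a made-up placeholder.

```python
def unmatched_keys(left_keys, right_keys):
    """Return the keys of left_keys that have no match in right_keys, in original order."""
    right = set(right_keys)
    return [key for key in left_keys if key not in right]

# Two names from the tables above plus a made-up placeholder without venues
result_neighbourhoods = ["Parkwoods", "Victoria Village", "Placeholder Without Venues"]
venue_neighbourhoods = ["Parkwoods", "Victoria Village"]

dropped = unmatched_keys(result_neighbourhoods, venue_neighbourhoods)
print(dropped)
```

In pandas, the same check can be done by merging with `how='left'` and `indicator=True`, then filtering on the `_merge` column.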


After merging the datasets, we display the map with the clusters.

## Display map with clusters

In [75]:
import matplotlib.cm as cm
import matplotlib.colors as colors

In [80]:
# display the Toronto map with the clusters.
latitude = 43.651070
longitude = -79.347015

# one color per cluster, evenly spaced on the rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, n_clust))
rainbow = [colors.rgb2hex(i) for i in colors_array]


toronto_map = folium.Map(location=[latitude, longitude], zoom_start=12)

toronto_loader = zip(merge_data['Latitude'], merge_data['Longitude'], merge_data['Neighborhood'], merge_data['Cluster Labels'])

for lat, long, neigh, lab in toronto_loader:
    label = folium.Popup(str(neigh) + ' Cluster ' + str(lab), parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color=rainbow[lab],
        fill=True,
        fill_color=rainbow[lab],
        fill_opacity=0.7).add_to(toronto_map)
    
toronto_map
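The marker colours only need to be distinct per cluster. The notebook uses matplotlib's rainbow colormap for this. As a sketch of the same idea with the standard library only, evenly spaced hues can be converted to hex strings with `colorsys`. The helper name `cluster_palette` and the saturation and brightness values are arbitrary choices.

```python
import colorsys

def cluster_palette(n):
    """Return n hex colours with evenly spaced hues (hypothetical helper)."""
    palette = []
    for i in range(n):
        # Hue runs over [0, 1); saturation and value are fixed, arbitrary choices
        r, g, b = colorsys.hsv_to_rgb(i / n, 0.85, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = cluster_palette(5)
print(palette)
```

A marker for cluster label `lab` would then use `palette[lab]` as its colour.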