# Segmenting and Clustering Neighborhoods in Toronto

In this Coursera project, I scraped Toronto neighborhood data from Wikipedia and used the Foursquare API to explore the venues in each neighborhood. I then grouped the neighborhoods with the *k*-means clustering algorithm, based on how often each venue category occurs, and visualized the resulting clusters with Folium.

## Table of Contents
- Get Toronto Neighborhoods from Wikipedia
- Download venue data from Foursquare and explore it
- Cluster Neighborhoods
- Examine Clusters

In [130]:
#import packages
import requests, sys
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import folium # map rendering library

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

from geopy.geocoders import Nominatim

## Get Toronto Neighborhoods from Wikipedia

In [25]:
#get the wiki page as text
page = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
#parse the html with BeautifulSoup (naming the parser avoids the "no parser was explicitly specified" warning)
soup = BeautifulSoup(page, 'html.parser')

In [26]:
#take the first table on the page (find_all is the current name of findAll)
table = soup.find_all("table")[0]

In [27]:
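#html table to pandas dataframe (wrapped in StringIO because newer pandas versions deprecate passing raw html strings)
from io import StringIO
df = pd.read_html(StringIO(str(table)), header = 0)[0]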

In [28]:
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


In [29]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 289 entries, 0 to 288
Data columns (total 3 columns):
Postcode         289 non-null object
Borough          289 non-null object
Neighbourhood    289 non-null object
dtypes: object(3)
memory usage: 6.9+ KB


In [30]:
dfclean = df.query("Borough != 'Not assigned'").copy() #keep only rows with an assigned Borough; .copy() gives an independent frame to edit

In [31]:
#replace non assigned neighbourhoods with the values from the Borough column
unassigned = dfclean['Neighbourhood'] == "Not assigned"
dfclean.loc[unassigned, "Neighbourhood"] = dfclean.loc[unassigned, "Borough"]


In [32]:
#group by borough and postcode, joining the neighbourhoods of each postcode into one comma-separated string
dfgrouped = dfclean.groupby(['Borough', 'Postcode'], as_index = False).agg({'Neighbourhood': lambda x: ', '.join(set(x))})

In [33]:
dfgrouped.head()

Unnamed: 0,Borough,Postcode,Neighbourhood
0,Central Toronto,M4N,Lawrence Park
1,Central Toronto,M4P,Davisville North
2,Central Toronto,M4R,North Toronto West
3,Central Toronto,M4S,Davisville
4,Central Toronto,M4T,"Moore Park, Summerhill East"


In [34]:
dfgrouped.shape

(103, 3)
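The three cleaning steps above (drop unassigned boroughs, fall back to the borough name, group by postcode) can be tried on a small hand-made frame. This is only a sketch: the rows are invented for illustration, and `sorted(set(x))` is used in place of `set(x)` purely so that the joined string has a stable order.

```python
import pandas as pd

# toy rows in the same shape as the Wikipedia table (values are made up)
toy = pd.DataFrame({
    "Postcode": ["M1A", "M3A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "North York", "Downtown Toronto",
                "Downtown Toronto", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Harbourfront",
                      "Regent Park", "Not assigned"],
})

# 1) drop rows without a borough; .copy() gives an independent frame to edit
clean = toy[toy["Borough"] != "Not assigned"].copy()

# 2) fall back to the borough name where the neighbourhood is missing
missing = clean["Neighbourhood"] == "Not assigned"
clean.loc[missing, "Neighbourhood"] = clean.loc[missing, "Borough"]

# 3) one row per postcode, neighbourhoods joined into a single string
grouped = (clean.groupby(["Borough", "Postcode"], as_index=False)
                .agg({"Neighbourhood": lambda x: ", ".join(sorted(set(x)))}))
print(grouped)
```

Each postcode ends up on exactly one row, which is the property the later merge with the coordinate file relies on.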

In [35]:
import geocoder # import geocoder

In [36]:
def get_geo(postal_code, max_attempts=5):
    # the geocoder can return None (e.g. on a timeout), so retry a bounded number of times
    for _ in range(max_attempts):
        g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
        if g.latlng is not None:
            latitude, longitude = g.latlng
            return latitude, longitude

    # give up instead of looping forever
    return None, None

In [37]:
#get_geo('M4N')

In [38]:
#read the geo data from the provided file (the geocoder function gives a timeout)
geodf = pd.read_csv("https://cocl.us/Geospatial_data")
geodf.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [39]:
#Merging the two dataframes
df2 = pd.merge(dfgrouped, geodf, how='left', left_on=['Postcode'], right_on=['Postal Code'])
df2.drop('Postal Code', axis = 1, inplace = True)

In [40]:
df2.sample(3)

Unnamed: 0,Borough,Postcode,Neighbourhood,Latitude,Longitude
3,Central Toronto,M4S,Davisville,43.704324,-79.38879
88,Scarborough,M1T,"Sullivan, Tam O'Shanter, Clarks Corners",43.781638,-79.304302
49,Mississauga,M7R,Canada Post Gateway Processing Centre,43.636966,-79.615819


In [41]:
df2.shape

(103, 5)
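The shape confirms that no rows were lost, but a left merge can still leave coordinates empty when a postcode has no match. One way to check this, sketched here on toy data, is the `indicator=True` option of `pd.merge`. The postcode `M9Z` and its neighbourhood name are invented for the example.

```python
import pandas as pd

left = pd.DataFrame({"Postcode": ["M4N", "M4P", "M9Z"],
                     "Neighbourhood": ["Lawrence Park", "Davisville North", "Made-up Place"]})
right = pd.DataFrame({"Postal Code": ["M4N", "M4P"],
                      "Latitude": [43.728020, 43.712751],
                      "Longitude": [-79.388790, -79.390197]})

# indicator=True adds a "_merge" column saying where each row came from
checked = pd.merge(left, right, how="left",
                   left_on="Postcode", right_on="Postal Code",
                   indicator=True)

# rows that exist only on the left side have no coordinates
unmatched = checked.loc[checked["_merge"] == "left_only", "Postcode"].tolist()
print(unmatched)
```

An empty list here would mean every postcode received a latitude and longitude.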

In [45]:
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


In [44]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df2['Latitude'], df2['Longitude'], df2['Borough'], df2['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)  
    
map_toronto

## Download venue data from Foursquare and explore it

In [48]:
with open("foursquareid.txt") as f:
    content = f.readlines()
# you may also want to remove whitespace characters like `\n` at the end of each line
content = [x.strip() for x in content] 

CLIENT_ID = content[0] # your Foursquare ID
CLIENT_SECRET = content[1] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

# confirm the credentials were loaded without echoing them into the notebook output
print('Foursquare credentials loaded.')

Foursquare credentials loaded.
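API keys are easy to leak through a text file or printed output, so an alternative worth considering is to read them from environment variables and to mask them whenever they are logged. The sketch below is one possible approach, not part of the original lab. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) or raise if either is missing."""
    try:
        # hypothetical variable names, chosen for this example only
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("missing environment variable {}".format(missing)) from None

def mask(secret, visible=4):
    """Show only the first few characters of a secret when logging."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)

# demo with a fake environment; real values would come from os.environ
fake_env = {"FOURSQUARE_CLIENT_ID": "EXAMPLEID123",
            "FOURSQUARE_CLIENT_SECRET": "EXAMPLESECRET456"}
client_id, client_secret = load_foursquare_credentials(fake_env)
print("CLIENT_ID:", mask(client_id))
```

Passing the environment in as a parameter keeps the function easy to test without touching real credentials.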


In [49]:
#reusing function from the previous lab
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            100)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
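
`getNearbyVenues` indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, which raises if a response has no groups or a venue has no category. A more defensive parser could look like the sketch below. The helper name `extract_venues` and the sample payload are made up for illustration, and only the keys already used in the function above are assumed.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style response dict."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        # a venue without categories gets None instead of raising IndexError
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows

# hand-written payload mimicking only the keys used in getNearbyVenues
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Lake",
               "location": {"lat": 43.72791, "lng": -79.386857},
               "categories": [{"name": "Lake"}]}},
    {"venue": {"name": "No Category Place",
               "location": {"lat": 43.7, "lng": -79.4},
               "categories": []}},
]}]}}

print(extract_venues(sample))
print(extract_venues({"response": {}}))  # empty response -> empty list
```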

In [59]:
toronto_venues = getNearbyVenues(names=df2['Neighbourhood'],
                                   latitudes=df2['Latitude'],
                                   longitudes=df2['Longitude']
                                  )

Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park, Summerhill East
Summerhill West, South Hill, Rathnelly, Deer Park, Forest Hill SE
Roselawn
Forest Hill West, Forest Hill North
The Annex, Yorkville, North Midtown
Rosedale
St. James Town, Cabbagetown
Church and Wellesley
Harbourfront, Regent Park
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide, King, Richmond
Union Station, Harbourfront East, Toronto Islands
Design Exchange, Toronto Dominion Centre
Victoria Hotel, Commerce Court
University of Toronto, Harbord
Chinatown, Grange Park, Kensington Market
Bathurst Quay, Island airport, Harbourfront West, Railway Lands, South Niagara, CN Tower, King and Spadina
Stn A PO Boxes 25 The Esplanade
First Canadian Place, Underground city
Christie
The Beaches
The Danforth West, Riverdale
The Beaches West, India Bazaar
Studio District
Business Reply Mail Processing Centre 969 Eastern
Parkview Hill, Woodbine Gardens
Woodbine Heights
Leaside
Thorncl

Checking the output

In [60]:
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Lawrence Park,43.72802,-79.38879,Lawrence Park Ravine,43.726963,-79.394382,Park
1,Lawrence Park,43.72802,-79.38879,Lake,43.72791,-79.386857,Lake
2,Lawrence Park,43.72802,-79.38879,Dim Sum Deluxe,43.726953,-79.39426,Dim Sum Restaurant
3,Lawrence Park,43.72802,-79.38879,Zodiac Swim School,43.728532,-79.38286,Swim School
4,Lawrence Park,43.72802,-79.38879,TTC Bus #162 - Lawrence-Donway,43.728026,-79.382805,Bus Line
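
Since the request was made with `radius=500`, one quick sanity check on the output is the distance between a neighborhood centre and one of its venues. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km (an assumption of this example) on the first row of the table above. The result should come out a little under 500 m.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_m(lat1, lng1, lat2, lng2, earth_radius_m=6371000.0):
    """Great-circle distance in metres between two (lat, lng) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))

# Lawrence Park centre and "Lawrence Park Ravine" from the first output row
d = haversine_m(43.72802, -79.38879, 43.726963, -79.394382)
print(round(d), "m; within 500 m:", d <= 500)
```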


In [61]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide, King, Richmond",100,100,100,100,100,100
Agincourt,5,5,5,5,5,5
"Agincourt North, Milliken, L'Amoreaux East, Steeles East",2,2,2,2,2,2
"Bathurst Quay, Island airport, Harbourfront West, Railway Lands, South Niagara, CN Tower, King and Spadina",14,14,14,14,14,14
Bayview Village,4,4,4,4,4,4
Berczy Park,55,55,55,55,55,55
"Brockton, Exhibition Place, Parkdale Village",19,19,19,19,19,19
Business Reply Mail Processing Centre 969 Eastern,18,18,18,18,18,18
Caledonia-Fairbanks,6,6,6,6,6,6
Canada Post Gateway Processing Centre,11,11,11,11,11,11


In [103]:
#one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

In [104]:
# add neighborhood column back to dataframe
# (if Foursquare returns a venue category that is itself called "Neighborhood", this assignment overwrites that one-hot column)
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood']
# move neighborhood column to the first column
fixed_columns = ['Neighborhood'] + [col for col in toronto_onehot.columns if col != 'Neighborhood']
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()


Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Lawrence Park,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Lawrence Park,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Lawrence Park,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Lawrence Park,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Lawrence Park,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


#### Mean frequency of each venue category per neighborhood

In [105]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,"Adelaide, King, Richmond",0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.010000,0.000000,0.0000,0.000000,0.000000,0.010000,0.000000,0.010000,0.000000
1,Agincourt,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.0000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
2,"Agincourt North, Milliken, L'Amoreaux East, St...",0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.0000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
3,"Bathurst Quay, Island airport, Harbourfront We...",0.0,0.000000,0.000000,0.071429,0.071429,0.071429,0.142857,0.142857,0.142857,...,0.00,0.000000,0.000000,0.0000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
4,Bayview Village,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.0000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
5,Berczy Park,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.0000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
6,"Brockton, Exhibition Place, Parkdale Village",0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.0000,0.000000,0.000000,0.000000,0.000000,0.000000,0.052632
7,Business Reply Mail Processing Centre 969 Eastern,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.0000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
8,Caledonia-Fairbanks,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.0000,0.000000,0.000000,0.000000,0.000000,0.166667,0.000000
9,Canada Post Gateway Processing Centre,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.0000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
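
To see why the grouped means can be read as frequencies, it helps to run the same two steps on a handful of rows. The toy data below is invented. Because every venue has exactly one category, each one-hot row sums to 1, so the mean per neighborhood is the share of venues in each category.

```python
import pandas as pd

venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Park", "Café", "Bank", "Park", "Playground"],
})

# one column per category, 1 where the venue belongs to it
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# mean of the 0/1 columns = fraction of the neighborhood's venues in each category
freq = onehot.groupby("Neighborhood").mean().reset_index()
print(freq)
```

Neighborhood A has four venues, two of them parks, so its `Park` value is 0.5. Neighborhoods with very few venues therefore produce coarse frequencies such as 0.5 or 0.25, as in several rows of the real output.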


#### Top venues in the neighborhoods

In [107]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
                 venue  freq
0          Coffee Shop  0.06
1                 Café  0.05
2           Steakhouse  0.04
3      Thai Restaurant  0.04
4  American Restaurant  0.04


----Agincourt----
                venue  freq
0      Sandwich Place   0.2
1      Breakfast Spot   0.2
2  Chinese Restaurant   0.2
3              Lounge   0.2
4        Skating Rink   0.2


----Agincourt North, Milliken, L'Amoreaux East, Steeles East----
                       venue  freq
0                 Playground   0.5
1                       Park   0.5
2  Middle Eastern Restaurant   0.0
3                      Motel   0.0
4        Monument / Landmark   0.0


----Bathurst Quay, Island airport, Harbourfront West, Railway Lands, South Niagara, CN Tower, King and Spadina----
              venue  freq
0    Airport Lounge  0.14
1  Airport Terminal  0.14
2   Airport Service  0.14
3             Plane  0.07
4   Harbor / Marina  0.07


----Bayview Village----
                 venue  freq




----Lawrence Manor East, Bedford Park----
                  venue  freq
0           Coffee Shop  0.08
1    Italian Restaurant  0.08
2  Fast Food Restaurant  0.08
3        Cosmetics Shop  0.04
4      Greek Restaurant  0.04


----Lawrence Manor, Lawrence Heights----
                    venue  freq
0          Clothing Store   0.2
1       Accessories Store   0.1
2      Miscellaneous Shop   0.1
3                Boutique   0.1
4  Furniture / Home Store   0.1


----Lawrence Park----
                venue  freq
0         Swim School   0.2
1  Dim Sum Restaurant   0.2
2                Park   0.2
3            Bus Line   0.2
4                Lake   0.2


----Leaside----
                    venue  freq
0             Coffee Shop  0.09
1     Sporting Goods Shop  0.09
2            Burger Joint  0.06
3           Grocery Store  0.06
4  Furniture / Home Store  0.06


----Long Branch, Alderwood----
            venue  freq
0     Pizza Place  0.25
1             Gym  0.12
2    Skating Rink  0.12
3  Sandwic

         venue  freq
0  Coffee Shop  0.14
1     Aquarium  0.05
2        Hotel  0.05
3         Café  0.04
4  Pizza Place  0.04


----University of Toronto, Harbord----
                 venue  freq
0                 Café  0.11
1               Bakery  0.06
2           Restaurant  0.06
3  Japanese Restaurant  0.06
4            Bookstore  0.06


----Victoria Hotel, Commerce Court----
                 venue  freq
0          Coffee Shop  0.10
1                 Café  0.07
2           Restaurant  0.06
3                Hotel  0.06
4  American Restaurant  0.04


----Victoria Village----
                   venue  freq
0           Hockey Arena  0.25
1           Intersection  0.25
2  Portuguese Restaurant  0.25
3            Coffee Shop  0.25
4      Accessories Store  0.00


----Westmount----
                       venue  freq
0                Pizza Place  0.29
1             Sandwich Place  0.14
2               Intersection  0.14
3  Middle Eastern Restaurant  0.14
4                Coffee Shop  0.14



#### Put that information in a dataframe

In [117]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [143]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Thai Restaurant,Steakhouse,American Restaurant,Restaurant,Clothing Store,Gym,Asian Restaurant,Bar
1,Agincourt,Chinese Restaurant,Lounge,Sandwich Place,Skating Rink,Breakfast Spot,Yoga Studio,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
2,"Agincourt North, Milliken, L'Amoreaux East, St...",Playground,Park,Yoga Studio,Eastern European Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
3,"Bathurst Quay, Island airport, Harbourfront We...",Airport Lounge,Airport Terminal,Airport Service,Harbor / Marina,Plane,Sculpture Garden,Boutique,Boat or Ferry,Airport Gate,Airport
4,Bayview Village,Café,Japanese Restaurant,Chinese Restaurant,Bank,Yoga Studio,Eastern European Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Electronics Store
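
The column labels above rely on indexing into `['st', 'nd', 'rd']` and falling back to `th`, which is fine for ten columns but would produce labels such as "21th" if `num_top_venues` grew past 20. A small helper, sketched here under the hypothetical name `ordinal`, covers the general case.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th and the rest of the teens always take "th"
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

num_top_venues = 10
columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns[1], "|", columns[4], "|", columns[10])
```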


## Cluster Neighborhoods

In [141]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 0, 1, 3, 1, 1, 1, 1, 1])
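
For readers who want to see what the `KMeans` call does conceptually, here is a bare-bones NumPy sketch of Lloyd's algorithm: assign each point to its nearest centre, move each centre to the mean of its points, and repeat. It is a teaching aid under simplifying assumptions (random initial centres, no restarts), not a reimplementation of scikit-learn, and the function name `kmeans_sketch` and the toy points are invented.

```python
import numpy as np

def kmeans_sketch(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # start from k distinct data points picked at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # squared distance from every point to every centre, shape (n, k)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # move each centre to the mean of its points (keep it if it has none)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# two well-separated toy blobs
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels, centers = kmeans_sketch(X, k=2)
print(labels)
```

The label numbers themselves are arbitrary; only which rows share a label matters, which is also true of the `kmeans.labels_` array above.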

In [144]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df2

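# join the cluster labels and top venues onto df2, which already holds latitude/longitude for each neighbourhood
# (neighbourhoods for which no venues were returned have no match and get NaN, which is why the labels show as floats)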
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Borough,Postcode,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central Toronto,M4N,Lawrence Park,43.72802,-79.38879,1.0,Bus Line,Park,Dim Sum Restaurant,Swim School,Lake,Yoga Studio,Eastern European Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
1,Central Toronto,M4P,Davisville North,43.712751,-79.390197,1.0,Restaurant,Food & Drink Shop,Park,Gym,Breakfast Spot,Sandwich Place,Burger Joint,Hotel,Dog Run,Doner Restaurant
2,Central Toronto,M4R,North Toronto West,43.715383,-79.405678,1.0,Clothing Store,Coffee Shop,Sporting Goods Shop,Yoga Studio,Cosmetics Shop,Sandwich Place,Salon / Barbershop,Chinese Restaurant,Park,Dessert Shop
3,Central Toronto,M4S,Davisville,43.704324,-79.38879,1.0,Sandwich Place,Dessert Shop,Thai Restaurant,Coffee Shop,Italian Restaurant,Café,Pizza Place,Sushi Restaurant,Gym,Seafood Restaurant
4,Central Toronto,M4T,"Moore Park, Summerhill East",43.689574,-79.38316,1.0,Playground,Gym,Trail,Yoga Studio,Eastern European Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore
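
The `Cluster Labels` column shows `1.0` rather than `1` because a left join fills unmatched rows with NaN, and NaN forces an integer column to float. The sketch below reproduces this on two made-up rows and shows one way to separate the unclustered neighbourhoods before casting the labels back to integers.

```python
import pandas as pd

# placeholder names and coordinates, invented for the example
base = pd.DataFrame({"Neighbourhood": ["Has Venues", "No Venues"],
                     "Latitude": [43.70, 43.80],
                     "Longitude": [-79.40, -79.20]})
labels = pd.DataFrame({"Neighborhood": ["Has Venues"], "Cluster Labels": [1]})

merged = base.join(labels.set_index("Neighborhood"), on="Neighbourhood")
print(merged["Cluster Labels"].dtype)  # float, because of the NaN

# neighbourhoods that never made it into the clustering
unclustered = merged[merged["Cluster Labels"].isna()]

# keep only clustered rows and restore integer labels
clustered = (merged.dropna(subset=["Cluster Labels"])
                   .astype({"Cluster Labels": int}))
print(clustered)
```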


In [145]:
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


In [146]:
# create map still zoomed on Toronto
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    try:
        cluster = int(cluster)
        label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=rainbow[cluster-1],
            fill=True,
            fill_color=rainbow[cluster-1],
            fill_opacity=0.7).add_to(map_clusters)
    except ValueError:  # int() fails on NaN labels, i.e. neighbourhoods that were never clustered
        print('{} didnt work'.format(poi))
    
map_clusters

Islington Avenue didnt work
York Mills, Silver Hills didnt work
Willowdale, Newtonbrook didnt work
Upper Rouge didnt work


## Examine Clusters

#### Cluster 1
Parks & Playground

In [147]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Postcode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,M4W,0.0,Park,Playground,Trail,Yoga Studio,Eastern European Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore
36,M4J,0.0,Park,Metro Station,Convenience Store,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
39,M8X,0.0,River,Park,Yoga Studio,Dumpling Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant
80,M1J,0.0,Playground,Yoga Studio,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant
89,M1V,0.0,Playground,Park,Yoga Studio,Eastern European Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
102,M9N,0.0,Park,Convenience Store,Yoga Studio,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant


#### Cluster 2
Restaurants

In [148]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Postcode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4N,1.0,Bus Line,Park,Dim Sum Restaurant,Swim School,Lake,Yoga Studio,Eastern European Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
1,M4P,1.0,Restaurant,Food & Drink Shop,Park,Gym,Breakfast Spot,Sandwich Place,Burger Joint,Hotel,Dog Run,Doner Restaurant
2,M4R,1.0,Clothing Store,Coffee Shop,Sporting Goods Shop,Yoga Studio,Cosmetics Shop,Sandwich Place,Salon / Barbershop,Chinese Restaurant,Park,Dessert Shop
3,M4S,1.0,Sandwich Place,Dessert Shop,Thai Restaurant,Coffee Shop,Italian Restaurant,Café,Pizza Place,Sushi Restaurant,Gym,Seafood Restaurant
4,M4T,1.0,Playground,Gym,Trail,Yoga Studio,Eastern European Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore
5,M4V,1.0,Coffee Shop,Convenience Store,Pub,Supermarket,Bagel Shop,Sports Bar,Sushi Restaurant,American Restaurant,Pizza Place,Vietnamese Restaurant
6,M5N,1.0,Garden,Home Service,Yoga Studio,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Electronics Store
7,M5P,1.0,Mexican Restaurant,Sushi Restaurant,Jewelry Store,Trail,Discount Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Yoga Studio
8,M5R,1.0,Sandwich Place,Café,Coffee Shop,Pizza Place,Pub,Indian Restaurant,Burger Joint,American Restaurant,BBQ Joint,Liquor Store
10,M4X,1.0,Coffee Shop,Restaurant,Bakery,Café,Italian Restaurant,Pub,Pizza Place,Pet Store,Sandwich Place,Butcher


#### Cluster 3
Sports

In [149]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Postcode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
40,M8Y,2.0,Baseball Field,Construction & Landscaping,Empanada Restaurant,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Yoga Studio
73,M9M,2.0,Baseball Field,Empanada Restaurant,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Yoga Studio,Discount Store


#### Cluster 4
Banks

In [150]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Postcode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
43,M9B,3.0,Bank,Yoga Studio,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Discount Store
52,M2K,3.0,Café,Japanese Restaurant,Chinese Restaurant,Bank,Yoga Studio,Eastern European Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Electronics Store
56,M2P,3.0,Park,Bank,Bar,Yoga Studio,Electronics Store,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant
64,M3L,3.0,Grocery Store,Bank,Shopping Mall,Hotel,Yoga Studio,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore


#### Cluster 5
Fast Food

In [151]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Postcode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
75,M1B,4.0,Fast Food Restaurant,Yoga Studio,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant
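
The cluster names used in the headings above were chosen by reading the tables. A rough programmatic cross-check is to take the most frequent "1st Most Common Venue" within each cluster. The sketch below does this on a few invented rows; on the real data it would be applied to `toronto_merged`, and it is only a heuristic since it ignores the other nine venue columns.

```python
import pandas as pd

toy = pd.DataFrame({
    "Cluster Labels": [0, 0, 0, 2, 2, 3],
    "1st Most Common Venue": ["Park", "Park", "Playground",
                              "Baseball Field", "Baseball Field", "Bank"],
})

# most frequent top venue per cluster
summary = (toy.groupby("Cluster Labels")["1st Most Common Venue"]
              .agg(lambda s: s.value_counts().idxmax()))
print(summary)
```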
