# Segmenting and Clustering Neighborhoods in Toronto

## Section One

Import required libraries

In [1]:
import pandas as pd
import requests
from IPython.display import display, HTML

Fetch "List of postal codes of Canada: M" from Wikipedia, then parse it into a pandas DataFrame

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
r = requests.get(url)
wiki_table = pd.read_html(r.text, flavor='html5lib')
df = wiki_table[0]
df.columns = ['PostalCode', 'Borough', 'Neighborhood']
df

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront
...,...,...,...
175,M5Z,Not assigned,
176,M6Z,Not assigned,
177,M7Z,Not assigned,
178,M8Z,Etobicoke,Mimico NW / The Queensway West / South of Bloo...


Drop rows whose Borough is "Not assigned"

In [3]:
df.drop(df[df['Borough'] == 'Not assigned'].index, inplace=True) 
df.reset_index(drop=True, inplace=True)
df

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Regent Park / Harbourfront
3,M6A,North York,Lawrence Manor / Lawrence Heights
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
...,...,...,...
98,M8X,Etobicoke,The Kingsway / Montgomery Road / Old Mill North
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,Business reply mail Processing CentrE
101,M8Y,Etobicoke,Old Mill South / King's Mill Park / Sunnylea /...


Sort by `PostalCode`, `Borough`, and `Neighborhood`, then group by `PostalCode` and `Borough` and join the `Neighborhood` values of each group into one comma-separated string. Finally, check whether any neighborhood is still "Not assigned".

In [4]:
df.sort_values(['PostalCode', 'Borough', 'Neighborhood'], inplace=True)
df_grouped = df.groupby(['PostalCode', 'Borough'])['Neighborhood'].apply(', '.join).reset_index()
df_grouped[df_grouped['Neighborhood'] == 'Not assigned']

Unnamed: 0,PostalCode,Borough,Neighborhood
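
To make the group-and-join step concrete, here is a minimal sketch on a toy DataFrame. The three rows are made up for illustration and are not taken from the Wikipedia table; `toy` and `toy_grouped` are hypothetical names.

```python
import pandas as pd

# Toy rows (invented for illustration): two neighborhoods share postal code M5A
toy = pd.DataFrame({
    'PostalCode': ['M5A', 'M3A', 'M5A'],
    'Borough': ['Downtown Toronto', 'North York', 'Downtown Toronto'],
    'Neighborhood': ['Regent Park', 'Parkwoods', 'Harbourfront'],
})

# Sort first so the joined names come out in alphabetical order,
# then collapse each (PostalCode, Borough) group into one string
toy_grouped = (
    toy.sort_values(['PostalCode', 'Borough', 'Neighborhood'])
       .groupby(['PostalCode', 'Borough'])['Neighborhood']
       .agg(', '.join)
       .reset_index()
)
print(toy_grouped)
```

The two M5A rows collapse into a single row whose `Neighborhood` value lists both names.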


Final DataFrame

In [5]:
df_grouped

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,Malvern / Rouge
1,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek
2,M1E,Scarborough,Guildwood / Morningside / West Hill
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
...,...,...,...
98,M9N,York,Weston
99,M9P,Etobicoke,Westmount
100,M9R,Etobicoke,Kingsview Village / St. Phillips / Martin Grov...
101,M9V,Etobicoke,South Steeles / Silverstone / Humbergate / Jam...


In [6]:
df_grouped.shape

(103, 3)

## Section Two

Import required libraries

In [7]:
# !conda install -c conda-forge geopy --yes

In [8]:
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="Toronto Geolocator")

In [9]:
df_location = df_grouped.copy()
# geopy proved unreliable here, so the coordinate columns are not pre-created
# df_location['Latitude'] = ''
# df_location['Longitude'] = ''
df_location

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,Malvern / Rouge
1,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek
2,M1E,Scarborough,Guildwood / Morningside / West Hill
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
...,...,...,...
98,M9N,York,Weston
99,M9P,Etobicoke,Westmount
100,M9R,Etobicoke,Kingsview Village / St. Phillips / Martin Grov...
101,M9V,Etobicoke,South Steeles / Silverstone / Humbergate / Jam...


__Note__: The cell below demonstrates the unreliability. Each postal code gets at most 10 attempts, because every attempt is slow and the total time across all postal codes adds up quickly.

In [10]:
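lat_lon = []
for idx, row in df_location.iterrows():
    print(idx)
    try:
        postcode = df_location.at[idx, 'PostalCode']
        geo = None
        # Retry up to 10 times; geocode() returns None when nothing is found
        for i in range(10):
            geo = geolocator.geocode(f'{postcode}, Toronto, Ontario')
            if geo: break
        print(idx, postcode, geo)
        # Save the coordinates as one (index, latitude, longitude) tuple
        if geo:
            lat_lon.append((idx, geo.latitude, geo.longitude))
    except Exception:
        # Skip postal codes whose lookup raised (e.g. a timeout)
        continue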

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
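
The retry idea in the cell above can be isolated into a small helper. This is only a sketch: `geocode_with_retry`, `FakeLocation`, and `flaky_geocode` are hypothetical names, and the flaky function stands in for a real geocoder so the example runs without network access. It assumes the geocoder returns an object with `latitude` and `longitude` attributes, or `None` when nothing is found.

```python
import time

def geocode_with_retry(geocode, query, attempts=10, wait_seconds=1.0):
    """Call geocode(query) up to `attempts` times; return (lat, lon) or None."""
    for attempt in range(attempts):
        try:
            location = geocode(query)
        except Exception as exc:  # e.g. a timeout raised by the service
            print(f'attempt {attempt + 1} failed for {query!r}: {exc}')
            location = None
        if location is not None:
            return (location.latitude, location.longitude)
        time.sleep(wait_seconds)  # pause before the next attempt
    return None

# Stand-in for a geocoder result (hypothetical, for demonstration only)
class FakeLocation:
    latitude, longitude = 43.806686, -79.194353

calls = {'n': 0}

def flaky_geocode(query):
    # Fails twice, then succeeds, to mimic an unreliable service
    calls['n'] += 1
    if calls['n'] < 3:
        raise TimeoutError('service timed out')
    return FakeLocation()

result = geocode_with_retry(flaky_geocode, 'M1B, Toronto, Ontario', wait_seconds=0)
print(result)
```

Unlike a bare `except`, the helper reports why each attempt failed, which makes it easier to tell a timeout apart from a bug in the calling code.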


As the assignment page warns, the package is very unreliable, so fall back to the provided geospatial data.

In [11]:
# !wget -q -O geo_data.csv https://cocl.us/Geospatial_data

Parse the geo data

In [12]:
df_geo = pd.read_csv('geo_data.csv')
df_geo.columns = ['PostalCode', 'Latitude', 'Longitude']
df_geo

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
...,...,...,...
98,M9N,43.706876,-79.518188
99,M9P,43.696319,-79.532242
100,M9R,43.688905,-79.554724
101,M9V,43.739416,-79.588437


In [13]:
df_toronto = df_location.merge(df_geo, on='PostalCode')
df_toronto

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,Malvern / Rouge,43.806686,-79.194353
1,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek,43.784535,-79.160497
2,M1E,Scarborough,Guildwood / Morningside / West Hill,43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
...,...,...,...,...,...
98,M9N,York,Weston,43.706876,-79.518188
99,M9P,Etobicoke,Westmount,43.696319,-79.532242
100,M9R,Etobicoke,Kingsview Village / St. Phillips / Martin Grov...,43.688905,-79.554724
101,M9V,Etobicoke,South Steeles / Silverstone / Humbergate / Jam...,43.739416,-79.588437
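
An inner merge silently drops postal codes that have no match in the coordinate file. One way to check for that, sketched here on toy data, is a left merge with pandas' `indicator` and `validate` options. The postal code `M9Z` and the frames `left` and `right` are made up for illustration.

```python
import pandas as pd

# Toy frames: 'M9Z' is an invented code with no coordinates
left = pd.DataFrame({'PostalCode': ['M1B', 'M1C', 'M9Z'],
                     'Borough': ['Scarborough', 'Scarborough', 'Etobicoke']})
right = pd.DataFrame({'PostalCode': ['M1B', 'M1C'],
                      'Latitude': [43.806686, 43.784535],
                      'Longitude': [-79.194353, -79.160497]})

# validate raises if a postal code appears more than once on either side;
# indicator adds a '_merge' column saying where each row came from
checked = left.merge(right, on='PostalCode', how='left',
                     validate='one_to_one', indicator=True)
missing = checked.loc[checked['_merge'] == 'left_only', 'PostalCode'].tolist()
print(missing)
```

An empty `missing` list would mean every postal code received coordinates.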


## Section Three

Set Foursquare variables

In [14]:
CLIENT_ID = 'EM0NULKILDUZUGSXYVR1TWWDQHMCB3CPMMB3CS0EWOSBDKML' # your Foursquare ID
CLIENT_SECRET = '4OMQKSEUD2IPNSM2WQZ144IHJNMDEDZG2GL1OHZ2YDRB5PWC' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: EM0NULKILDUZUGSXYVR1TWWDQHMCB3CPMMB3CS0EWOSBDKML
CLIENT_SECRET: 4OMQKSEUD2IPNSM2WQZ144IHJNMDEDZG2GL1OHZ2YDRB5PWC
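
Because a notebook is often shared, it may be safer to keep the credentials out of the cell and its output. The sketch below reads them from environment variables and masks the secret when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions made for this example, not part of the Foursquare API.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the credentials from a mapping (the process environment by default)."""
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first')
    return client_id, client_secret

def mask(secret, visible=4):
    """Show only the first `visible` characters of a secret."""
    return secret[:visible] + '*' * (len(secret) - visible)

# Demonstration with placeholder values instead of real credentials
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id',
            'FOURSQUARE_CLIENT_SECRET': 'demo-secret'}
client_id, client_secret = load_foursquare_credentials(demo_env)
print('CLIENT_SECRET: ' + mask(client_secret))
```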


In [16]:
df_toronto['Borough'].value_counts()

North York          24
Downtown Toronto    19
Scarborough         17
Etobicoke           12
Central Toronto      9
West Toronto         6
York                 5
East York            5
East Toronto         5
Mississauga          1
Name: Borough, dtype: int64

In [17]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=200):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
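
The function above indexes straight into the JSON response, so one response without `groups` or one venue without a category would raise and stop the whole loop. A more defensive parsing step might look like the sketch below. `extract_venues` is a hypothetical helper, and `sample` is a hand-written payload that only mirrors the keys used above; it is not real API output.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore-style response into rows, tolerating missing pieces."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories', [])
        # Some venues may carry no category; record None instead of failing
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue.get('name'),
                     location.get('lat'), location.get('lng'), category))
    return rows

# Hand-written payload with the same shape as the keys accessed above
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Cafe',
               'location': {'lat': 43.80, 'lng': -79.20},
               'categories': [{'name': 'Cafe'}]}},
    {'venue': {'name': 'Unlabelled Spot',
               'location': {'lat': 43.81, 'lng': -79.21},
               'categories': []}},
]}]}}

rows = extract_venues(sample, 'Malvern / Rouge', 43.806686, -79.194353)
print(rows)
```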

Get up to 200 venues within a 500 m radius of each neighborhood (the function's default `LIMIT` and `radius`).

In [18]:
toronto_venues = getNearbyVenues(names=df_toronto['Neighborhood'],
                                latitudes=df_toronto['Latitude'],
                                longitudes=df_toronto['Longitude'])

Malvern / Rouge
Rouge Hill / Port Union / Highland Creek
Guildwood / Morningside / West Hill
Woburn
Cedarbrae
Scarborough Village
Kennedy Park / Ionview / East Birchmount Park
Golden Mile / Clairlea / Oakridge
Cliffside / Cliffcrest / Scarborough Village West
Birch Cliff / Cliffside West
Dorset Park / Wexford Heights / Scarborough Town Centre
Wexford / Maryvale
Agincourt
Clarks Corners / Tam O'Shanter / Sullivan
Milliken / Agincourt North / Steeles East / L'Amoreaux East
Steeles West / L'Amoreaux West
Upper Rouge
Hillcrest Village
Fairview / Henry Farm / Oriole
Bayview Village
York Mills / Silver Hills
Willowdale / Newtonbrook
Willowdale
York Mills West
Willowdale
Parkwoods
Don Mills
Don Mills
Bathurst Manor / Wilson Heights / Downsview North
Northwood Park / York University
Downsview
Downsview
Downsview
Downsview
Victoria Village
Parkview Hill / Woodbine Gardens
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto
The Danforth West / Riverdale
India Bazaar / The Beaches 

Save to CSV

In [19]:
toronto_venues.to_csv('toronto_venues.csv')

In [20]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agincourt,4,4,4,4,4,4
Alderwood / Long Branch,10,10,10,10,10,10
Bathurst Manor / Wilson Heights / Downsview North,19,19,19,19,19,19
Bayview Village,4,4,4,4,4,4
Bedford Park / Lawrence Manor East,26,26,26,26,26,26
...,...,...,...,...,...,...
Willowdale,40,40,40,40,40,40
Woburn,4,4,4,4,4,4
Woodbine Heights,8,8,8,8,8,8
York Mills / Silver Hills,2,2,2,2,2,2


In [21]:
len(toronto_venues['Venue Category'].unique())

268

In my case, venues whose `Venue Category` is `Neighborhood` must be removed. Otherwise the one-hot encoding produces a category column named `Neighborhood`, which clashes with the neighborhood label column and causes errors.

In [22]:
toronto_venues[toronto_venues['Venue Category'].str.contains('Nei')]

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
309,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
439,Studio District,43.659526,-79.340923,Leslieville,43.66207,-79.337856,Neighborhood
1026,Richmond / Adelaide / King,43.650571,-79.384568,Downtown Toronto,43.653232,-79.385296,Neighborhood
1108,Harbourfront East / Union Station / Toronto Is...,43.640816,-79.381752,Harbourfront,43.639526,-79.380688,Neighborhood


In [23]:
toronto_venues.drop(toronto_venues[toronto_venues['Venue Category'].str.contains('Nei')].index, inplace=True)
toronto_venues[toronto_venues['Venue Category'].str.contains('Nei')]

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
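
The substring test `str.contains('Nei')` would also match any other category that happens to contain "Nei". An exact comparison is a stricter alternative, sketched here on made-up rows (`venues` and `venues_clean` are hypothetical names).

```python
import pandas as pd

# Invented rows: only the first one has the category 'Neighborhood'
venues = pd.DataFrame({
    'Venue': ['Upper Beaches', 'Corner Cafe', 'Local Pub'],
    'Venue Category': ['Neighborhood', 'Cafe', 'Pub'],
})

# Boolean mask for an exact match, then keep everything else
is_neighborhood = venues['Venue Category'] == 'Neighborhood'
venues_clean = venues[~is_neighborhood].reset_index(drop=True)
print(venues_clean)
```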


In [24]:
toronto_venues['Venue Category'].value_counts()[0:20]

Coffee Shop             175
Café                    100
Restaurant               71
Italian Restaurant       52
Park                     49
Pizza Place              48
Japanese Restaurant      41
Hotel                    40
Sandwich Place           39
Bakery                   38
Clothing Store           34
Bar                      33
Gym                      32
American Restaurant      29
Fast Food Restaurant     29
Sushi Restaurant         28
Grocery Store            27
Breakfast Spot           25
Bank                     25
Pub                      24
Name: Venue Category, dtype: int64

Transform the venue categories to one-hot form so that the neighborhoods can be clustered.

In [25]:
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")
list_columns = list(filter(lambda x: x != 'Neighborhood', list(toronto_onehot.columns)))
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood']
new_columns = ['Neighborhood'] + list_columns
toronto_onehot = toronto_onehot[new_columns]
toronto_onehot

Unnamed: 0,Neighborhood,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Malvern / Rouge,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Rouge Hill / Port Union / Highland Creek,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Rouge Hill / Port Union / Highland Creek,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Guildwood / Morningside / West Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Guildwood / Morningside / West Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2119,South Steeles / Silverstone / Humbergate / Jam...,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2120,South Steeles / Silverstone / Humbergate / Jam...,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2121,South Steeles / Silverstone / Humbergate / Jam...,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2122,Northwest,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Group the rows by neighborhood name and average the one-hot columns. The data was originally keyed by postal code, and a large neighborhood may span several postal codes.

In [26]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.000000,0.000,0.0,0.0,0.0,0.0,0.0
1,Alderwood / Long Branch,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.000000,0.000,0.0,0.0,0.0,0.0,0.0
2,Bathurst Manor / Wilson Heights / Downsview North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.052632,0.000,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.000000,0.000,0.0,0.0,0.0,0.0,0.0
4,Bedford Park / Lawrence Manor East,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,...,0.0,0.0,0.0,0.000000,0.000,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
89,Willowdale,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.000000,0.025,0.0,0.0,0.0,0.0,0.0
90,Woburn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.000000,0.000,0.0,0.0,0.0,0.0,0.0
91,Woodbine Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.000000,0.000,0.0,0.0,0.0,0.0,0.0
92,York Mills / Silver Hills,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.000000,0.000,0.0,0.0,0.0,0.0,0.0
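
Averaging one-hot columns per neighborhood yields the share of venues in each category, which is why the values above are fractions. A toy example with two made-up neighborhoods, `A` and `B`, shows the arithmetic:

```python
import pandas as pd

# Invented venues: A has 4 venues (2 parks), B has 2 venues (both cafes)
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Park', 'Park', 'Cafe', 'Pub', 'Cafe', 'Cafe'],
})

# One column per category, 1.0 where the venue belongs to it
onehot = pd.get_dummies(toy_venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])

# The mean of a 0/1 column is the fraction of venues in that category
freq = onehot.groupby('Neighborhood').mean().reset_index()
print(freq)
```

For `A`, the `Park` value is 2 / 4 = 0.5; for `B`, the `Cafe` value is 2 / 2 = 1.0.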


In [27]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

top_venues = 10

columns = ['1st', '2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th']
columns = [i + ' most common' for i in columns]
columns = ['Neighborhood'] + columns
toronto_venues_sorted = pd.DataFrame(columns=columns)
toronto_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']
for idx, row in toronto_grouped.iterrows():
    # idx comes from the 0..n-1 index created by reset_index(),
    # so it can be used as a row position with iloc
    toronto_venues_sorted.iloc[idx, 1:] = return_most_common_venues(row, top_venues)
toronto_venues_sorted

Unnamed: 0,Neighborhood,1st most common,2nd most common,3rd most common,4th most common,5th most common,6th most common,7th most common,8th most common,9th most common,10th most common
0,Agincourt,Latin American Restaurant,Breakfast Spot,Skating Rink,Lounge,Yoga Studio,Dumpling Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore
1,Alderwood / Long Branch,Pizza Place,Pub,Pharmacy,Sandwich Place,Pool,Athletics & Sports,Skating Rink,Coffee Shop,Gym,Convenience Store
2,Bathurst Manor / Wilson Heights / Downsview North,Coffee Shop,Bank,Pizza Place,Bridal Shop,Sandwich Place,Restaurant,Diner,Ice Cream Shop,Supermarket,Sushi Restaurant
3,Bayview Village,Café,Bank,Chinese Restaurant,Japanese Restaurant,Yoga Studio,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
4,Bedford Park / Lawrence Manor East,Sandwich Place,Italian Restaurant,Restaurant,Coffee Shop,Sushi Restaurant,Pizza Place,Comfort Food Restaurant,Thai Restaurant,Juice Bar,Fast Food Restaurant
...,...,...,...,...,...,...,...,...,...,...,...
89,Willowdale,Pizza Place,Coffee Shop,Ramen Restaurant,Sushi Restaurant,Sandwich Place,Café,Restaurant,Grocery Store,Indonesian Restaurant,Steakhouse
90,Woburn,Coffee Shop,Convenience Store,Korean Restaurant,Yoga Studio,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
91,Woodbine Heights,Skating Rink,Spa,Athletics & Sports,Curling Ice,Cosmetics Shop,Beer Store,Pharmacy,Park,Empanada Restaurant,Ethiopian Restaurant
92,York Mills / Silver Hills,Park,Cafeteria,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Fast Food Restaurant,Dumpling Restaurant
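
When a neighborhood has only a few venues, the ranking above is padded with categories whose frequency is zero (Agincourt has 4 venues but 10 "most common" entries). The sketch below shows one way to rank only the categories that actually occur; `top_categories` is a hypothetical helper and the frequencies are made up.

```python
import pandas as pd

def top_categories(freq_row, n=10):
    """Return up to n category names with a positive frequency, highest first."""
    positive = freq_row[freq_row > 0]
    # nlargest keeps the original order among ties by default
    return positive.nlargest(n).index.tolist()

# Invented frequencies for one neighborhood
row = pd.Series({'Park': 0.5, 'Cafe': 0.25, 'Pub': 0.25, 'Gym': 0.0, 'Spa': 0.0})
top = top_categories(row, n=10)
print(top)
```

The result can be shorter than `n`, which makes it visible that the neighborhood has little venue data.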


In [28]:
from sklearn.cluster import KMeans
toronto_cluster = toronto_grouped.drop('Neighborhood', axis=1)
cluster_size = 5
kmeans = KMeans(n_clusters=cluster_size, random_state=42).fit(toronto_cluster)
kmeans.labels_[:10]

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)

In [29]:
toronto_data1 = df_toronto[['Neighborhood', 'Latitude', 'Longitude']].groupby('Neighborhood').mean()
toronto_data1

Unnamed: 0_level_0,Latitude,Longitude
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1
Agincourt,43.794200,-79.262029
Alderwood / Long Branch,43.602414,-79.543484
Bathurst Manor / Wilson Heights / Downsview North,43.754328,-79.442259
Bayview Village,43.786947,-79.385975
Bedford Park / Lawrence Manor East,43.733283,-79.419750
...,...,...
Willowdale / Newtonbrook,43.789053,-79.408493
Woburn,43.770992,-79.216917
Woodbine Heights,43.695344,-79.318389
York Mills / Silver Hills,43.757490,-79.374714


In [30]:
toronto_data2 = toronto_venues_sorted
toronto_data2

Unnamed: 0,Neighborhood,1st most common,2nd most common,3rd most common,4th most common,5th most common,6th most common,7th most common,8th most common,9th most common,10th most common
0,Agincourt,Latin American Restaurant,Breakfast Spot,Skating Rink,Lounge,Yoga Studio,Dumpling Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore
1,Alderwood / Long Branch,Pizza Place,Pub,Pharmacy,Sandwich Place,Pool,Athletics & Sports,Skating Rink,Coffee Shop,Gym,Convenience Store
2,Bathurst Manor / Wilson Heights / Downsview North,Coffee Shop,Bank,Pizza Place,Bridal Shop,Sandwich Place,Restaurant,Diner,Ice Cream Shop,Supermarket,Sushi Restaurant
3,Bayview Village,Café,Bank,Chinese Restaurant,Japanese Restaurant,Yoga Studio,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
4,Bedford Park / Lawrence Manor East,Sandwich Place,Italian Restaurant,Restaurant,Coffee Shop,Sushi Restaurant,Pizza Place,Comfort Food Restaurant,Thai Restaurant,Juice Bar,Fast Food Restaurant
...,...,...,...,...,...,...,...,...,...,...,...
89,Willowdale,Pizza Place,Coffee Shop,Ramen Restaurant,Sushi Restaurant,Sandwich Place,Café,Restaurant,Grocery Store,Indonesian Restaurant,Steakhouse
90,Woburn,Coffee Shop,Convenience Store,Korean Restaurant,Yoga Studio,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
91,Woodbine Heights,Skating Rink,Spa,Athletics & Sports,Curling Ice,Cosmetics Shop,Beer Store,Pharmacy,Park,Empanada Restaurant,Ethiopian Restaurant
92,York Mills / Silver Hills,Park,Cafeteria,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Fast Food Restaurant,Dumpling Restaurant


In [31]:
toronto_final_data = toronto_data1.merge(toronto_data2, left_on='Neighborhood', right_on='Neighborhood')
toronto_final_data['Cluster'] = kmeans.labels_
toronto_final_data

Unnamed: 0,Neighborhood,Latitude,Longitude,1st most common,2nd most common,3rd most common,4th most common,5th most common,6th most common,7th most common,8th most common,9th most common,10th most common,Cluster
0,Agincourt,43.794200,-79.262029,Latin American Restaurant,Breakfast Spot,Skating Rink,Lounge,Yoga Studio,Dumpling Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,1
1,Alderwood / Long Branch,43.602414,-79.543484,Pizza Place,Pub,Pharmacy,Sandwich Place,Pool,Athletics & Sports,Skating Rink,Coffee Shop,Gym,Convenience Store,1
2,Bathurst Manor / Wilson Heights / Downsview North,43.754328,-79.442259,Coffee Shop,Bank,Pizza Place,Bridal Shop,Sandwich Place,Restaurant,Diner,Ice Cream Shop,Supermarket,Sushi Restaurant,1
3,Bayview Village,43.786947,-79.385975,Café,Bank,Chinese Restaurant,Japanese Restaurant,Yoga Studio,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,1
4,Bedford Park / Lawrence Manor East,43.733283,-79.419750,Sandwich Place,Italian Restaurant,Restaurant,Coffee Shop,Sushi Restaurant,Pizza Place,Comfort Food Restaurant,Thai Restaurant,Juice Bar,Fast Food Restaurant,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
89,Willowdale,43.776428,-79.425376,Pizza Place,Coffee Shop,Ramen Restaurant,Sushi Restaurant,Sandwich Place,Café,Restaurant,Grocery Store,Indonesian Restaurant,Steakhouse,1
90,Woburn,43.770992,-79.216917,Coffee Shop,Convenience Store,Korean Restaurant,Yoga Studio,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,1
91,Woodbine Heights,43.695344,-79.318389,Skating Rink,Spa,Athletics & Sports,Curling Ice,Cosmetics Shop,Beer Store,Pharmacy,Park,Empanada Restaurant,Ethiopian Restaurant,1
92,York Mills / Silver Hills,43.757490,-79.374714,Park,Cafeteria,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Fast Food Restaurant,Dumpling Restaurant,3
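
Assigning `kmeans.labels_` after the merge works only if the merged rows are in the same order, and the same number, as the rows that were clustered. A sketch of a more robust pattern is to attach the labels to the neighborhood names first and merge afterwards. All frames below are toy stand-ins, and `labels` plays the role of `kmeans.labels_`.

```python
import numpy as np
import pandas as pd

# Stand-in for the clustered table: one row per neighborhood, in clustering order
grouped = pd.DataFrame({'Neighborhood': ['A', 'B', 'C'], 'Park': [1.0, 0.0, 0.5]})
labels = np.array([3, 1, 1])  # stand-in for kmeans.labels_

# Stand-in for the coordinates: different order, plus 'D' which was never clustered
coords = pd.DataFrame({'Neighborhood': ['C', 'A', 'B', 'D'],
                       'Latitude': [43.70, 43.79, 43.60, 43.65],
                       'Longitude': [-79.30, -79.26, -79.54, -79.40]})

# Pair each label with its neighborhood name, then join on the name
labelled = grouped[['Neighborhood']].assign(Cluster=labels)
final = coords.merge(labelled, on='Neighborhood', how='inner')
print(final)
```

Each neighborhood keeps its own label regardless of row order, and `D` is dropped because it has no label.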


In [32]:
# !conda install -c conda-forge folium --yes

In [33]:
import folium
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

latitude = 43.722365
longitude = -79.412422
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, cluster_size))
rainbow = [colors.rgb2hex(i) for i in colors_array]

for idx, row in toronto_final_data.iterrows():
    # access fields by column name rather than by position
    poi = row['Neighborhood']
    lat = row['Latitude']
    lon = row['Longitude']
    most_common = row['1st most common']
    cluster = int(row['Cluster'])
    label = folium.Popup(f'{poi} cluster {cluster} most common {most_common}', parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        # KMeans labels are 0-based, so they index the color list directly
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7
    ).add_to(map_clusters)
map_clusters

In [34]:
map_clusters.save('toronto_cluster_map.html')

If the map does not render, it can be viewed at [toronto_cluster_map.html](https://gpratama.github.io/toronto_cluster_map.html)

Based on the rendered map, the most dominant cluster, cluster 1, is concentrated around the city center and becomes sparser farther from it. A second sizable cluster, cluster 3, has no identifiable geographic center. The remaining clusters are negligible compared to these two, so the two interesting clusters are cluster 1 and cluster 3.

In [42]:
toronto_final_data[toronto_final_data['Cluster'] == 0]

Unnamed: 0,Neighborhood,Latitude,Longitude,1st most common,2nd most common,3rd most common,4th most common,5th most common,6th most common,7th most common,8th most common,9th most common,10th most common,Cluster
70,Scarborough Village,43.744734,-79.239476,Playground,Yoga Studio,Dumpling Restaurant,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant,0


In [43]:
toronto_final_data[toronto_final_data['Cluster'] == 1]

Unnamed: 0,Neighborhood,Latitude,Longitude,1st most common,2nd most common,3rd most common,4th most common,5th most common,6th most common,7th most common,8th most common,9th most common,10th most common,Cluster
0,Agincourt,43.794200,-79.262029,Latin American Restaurant,Breakfast Spot,Skating Rink,Lounge,Yoga Studio,Dumpling Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,1
1,Alderwood / Long Branch,43.602414,-79.543484,Pizza Place,Pub,Pharmacy,Sandwich Place,Pool,Athletics & Sports,Skating Rink,Coffee Shop,Gym,Convenience Store,1
2,Bathurst Manor / Wilson Heights / Downsview North,43.754328,-79.442259,Coffee Shop,Bank,Pizza Place,Bridal Shop,Sandwich Place,Restaurant,Diner,Ice Cream Shop,Supermarket,Sushi Restaurant,1
3,Bayview Village,43.786947,-79.385975,Café,Bank,Chinese Restaurant,Japanese Restaurant,Yoga Studio,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,1
4,Bedford Park / Lawrence Manor East,43.733283,-79.419750,Sandwich Place,Italian Restaurant,Restaurant,Coffee Shop,Sushi Restaurant,Pizza Place,Comfort Food Restaurant,Thai Restaurant,Juice Bar,Fast Food Restaurant,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
86,Westmount,43.696319,-79.532242,Pizza Place,Discount Store,Middle Eastern Restaurant,Intersection,Sandwich Place,Coffee Shop,Chinese Restaurant,Doner Restaurant,Distribution Center,Dog Run,1
88,Wexford / Maryvale,43.750072,-79.295849,Middle Eastern Restaurant,Breakfast Spot,Auto Garage,Sandwich Place,Bakery,Shopping Mall,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,1
89,Willowdale,43.776428,-79.425376,Pizza Place,Coffee Shop,Ramen Restaurant,Sushi Restaurant,Sandwich Place,Café,Restaurant,Grocery Store,Indonesian Restaurant,Steakhouse,1
90,Woburn,43.770992,-79.216917,Coffee Shop,Convenience Store,Korean Restaurant,Yoga Studio,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,1


Cluster 1 appears to have the widest variety of common venues.

In [44]:
toronto_final_data[toronto_final_data['Cluster'] == 1]['1st most common'].value_counts()

Coffee Shop                  16
Café                          8
Pizza Place                   6
Grocery Store                 6
Bakery                        3
Clothing Store                3
Fast Food Restaurant          3
Trail                         2
Indian Restaurant             2
Bar                           2
Park                          2
Airport Service               1
Discount Store                1
Hotel                         1
Hakka Restaurant              1
Latin American Restaurant     1
Skating Rink                  1
Pool                          1
Pub                           1
Dessert Shop                  1
Asian Restaurant              1
Greek Restaurant              1
Farmers Market                1
Garden                        1
College Stadium               1
Caribbean Restaurant          1
Restaurant                    1
Sandwich Place                1
Middle Eastern Restaurant     1
Mexican Restaurant            1
Field                         1
Drugstor

However, counting the most common venue per neighborhood shows that cluster 1 is dominated by Coffee Shop.

In [45]:
toronto_final_data[toronto_final_data['Cluster'] == 2]

Unnamed: 0,Neighborhood,Latitude,Longitude,1st most common,2nd most common,3rd most common,4th most common,5th most common,6th most common,7th most common,8th most common,9th most common,10th most common,Cluster
49,Malvern / Rouge,43.806686,-79.194353,Fast Food Restaurant,Diner,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,2


In [46]:
toronto_final_data[toronto_final_data['Cluster'] == 3]

Unnamed: 0,Neighborhood,Latitude,Longitude,1st most common,2nd most common,3rd most common,4th most common,5th most common,6th most common,7th most common,8th most common,9th most common,10th most common,Cluster
10,Caledonia-Fairbanks,43.689026,-79.453512,Park,Pool,Women's Store,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,3
26,East Toronto,43.685347,-79.338106,Park,Convenience Store,Metro Station,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,3
46,Lawrence Park,43.72802,-79.38879,Park,Swim School,Bus Line,Drugstore,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Diner,3
50,Milliken / Agincourt North / Steeles East / L'...,43.815252,-79.284577,Park,Playground,Dumpling Restaurant,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant,3
54,North Park / Maple Leaf Park / Upwood Park,43.713756,-79.490074,Park,Bakery,Construction & Landscaping,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Electronics Store,3
61,Parkwoods,43.753259,-79.329656,Park,Fireworks Store,Food & Drink Shop,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Farmers Market,Dumpling Restaurant,3
65,Rosedale,43.679563,-79.377529,Park,Trail,Playground,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,3
81,The Kingsway / Montgomery Road / Old Mill North,43.653654,-79.506944,Park,Pool,River,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Donut Shop,3
87,Weston,43.706876,-79.518188,Park,Convenience Store,Dumpling Restaurant,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant,3
92,York Mills / Silver Hills,43.75749,-79.374714,Park,Cafeteria,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Fast Food Restaurant,Dumpling Restaurant,3


In cluster 3, the most common venue in every neighborhood is Park.

In [47]:
toronto_final_data[toronto_final_data['Cluster'] == 4]

Unnamed: 0,Neighborhood,Latitude,Longitude,1st most common,2nd most common,3rd most common,4th most common,5th most common,6th most common,7th most common,8th most common,9th most common,10th most common,Cluster
39,Humberlea / Emery,43.724766,-79.532242,Food Service,Baseball Field,Yoga Studio,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Fast Food Restaurant,4


### Cluster Summary

| Cluster | Size | Most common          |
|---------|------|----------------------|
| 0       | 1    | Playground           |
| 1       | 80   | Coffee Shop          |
| 2       | 1    | Fast Food Restaurant |
| 3       | 10   | Park                 |
| 4       | 1    | Food Service         |
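
A summary like the table above can also be computed rather than typed by hand, which avoids counting mistakes. The sketch below does so on a toy frame; the rows are made up, and `toy_final` stands in for `toronto_final_data`.

```python
import pandas as pd

# Invented stand-in with the two columns the summary needs
toy_final = pd.DataFrame({
    '1st most common': ['Coffee Shop', 'Coffee Shop', 'Pizza Place', 'Park', 'Park'],
    'Cluster': [1, 1, 1, 3, 3],
})

# Per cluster: number of neighborhoods and the most frequent top venue
summary = (
    toy_final.groupby('Cluster')['1st most common']
             .agg(Size='size', Most_common=lambda s: s.value_counts().idxmax())
             .reset_index()
)
print(summary)
```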