## Segmenting and Clustering Neighborhoods in Toronto

### Part 1: Get and clean the Toronto postal codes, boroughs, and neighbourhoods.

#### First import required libraries.

In [5]:
import urllib.request
from bs4 import BeautifulSoup
import pandas as pd

#### Get a local copy of the Wikipedia article about the Toronto postal codes.

In [6]:

url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
req = urllib.request.urlopen(url)
article = req.read().decode()

with open('List_of_postal_codes_of_Canada:_M', 'w') as fo:
    fo.write(article)
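
Re-running the cell above downloads the article every time. A minimal sketch of a cache-first helper follows; `fetch_cached` and its `fetch` parameter are hypothetical names introduced here for illustration, not part of the original notebook.

```python
from pathlib import Path
from typing import Callable


def fetch_cached(url: str, cache_path: Path, fetch: Callable[[str], str]) -> str:
    """Return the cached page if it exists; otherwise fetch it once and cache it."""
    if cache_path.exists():
        return cache_path.read_text(encoding="utf-8")
    text = fetch(url)
    cache_path.write_text(text, encoding="utf-8")
    return text


# In this notebook, `fetch` could be (assumption):
#   lambda u: urllib.request.urlopen(u).read().decode()
```

Passing the downloader in as an argument keeps the helper easy to try out without network access.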

#### Parse the Toronto postal code table using Beautiful Soup and load the table data into a Pandas DataFrame.

In [7]:
# Load article, turn into soup and get the <table>s.
with open('List_of_postal_codes_of_Canada:_M') as fi:
    article = fi.read()
soup = BeautifulSoup(article, 'html.parser')
tables = soup.find_all('table', class_='sortable')

# Search through the tables for the one with the headings we want.
for table in tables:
    ths = table.find_all('th')
    headings = [th.text.strip() for th in ths]
    if headings[:3] == ['Postal Code', 'Borough', 'Neighbourhood']:
        break

# Extract the rows we want and load into a Pandas DataFrame.
cols = 'Postal_Code', 'Borough', 'Neighbourhood'
ls = []
for tr in table.find_all('tr'):
    tds = tr.find_all('td')
    if not tds:
        continue
    data = [td.text.strip() for td in tds[:3]]
    ls.append(data)

df_Toronto = pd.DataFrame(ls, columns = cols)

df_Toronto

Unnamed: 0,Postal_Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
...,...,...,...
175,M5Z,Not assigned,Not assigned
176,M6Z,Not assigned,Not assigned
177,M7Z,Not assigned,Not assigned
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."


#### Drop the rows where the Borough = 'Not assigned'.

In [8]:
# Keep only the rows with an assigned borough; .copy() gives an independent
# dataframe so new columns can be added later without warnings.
df_Toronto = df_Toronto[df_Toronto['Borough'] != 'Not assigned'].copy()

df_Toronto

Unnamed: 0,Postal_Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
160,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
165,M4Y,Downtown Toronto,Church and Wellesley
168,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
169,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


#### Check if any of the Neighbourhoods = 'Not assigned'.

In [9]:
df_Toronto.Neighbourhood[df_Toronto.Neighbourhood == 'Not assigned'].count()

0

#### No Neighbourhood value equals 'Not assigned', so there is no need to copy the Borough into the Neighbourhood column for any row.
##### Let's see what the final shape of the dataframe is.

In [10]:
df_Toronto.shape

(103, 3)
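
Had any Neighbourhood been 'Not assigned', the fallback of copying the Borough into it could be sketched as follows. The two-row `toy` dataframe is made up for illustration and is not data from the scraped table.

```python
import pandas as pd

# Made-up rows: the second one has an assigned borough but no neighbourhood.
toy = pd.DataFrame({
    "Borough": ["North York", "Downtown Toronto"],
    "Neighbourhood": ["Parkwoods", "Not assigned"],
})

# Select the unassigned rows and overwrite them with the borough name.
unassigned = toy["Neighbourhood"] == "Not assigned"
toy.loc[unassigned, "Neighbourhood"] = toy.loc[unassigned, "Borough"]
```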

### Part 2: Add the latitude and longitude coordinates.

#### First install the geocoder Python package.

In [11]:
pip install geocoder

Collecting geocoder
  Downloading geocoder-1.38.1-py2.py3-none-any.whl (98 kB)
Collecting ratelim
  Downloading ratelim-0.1.6-py2.py3-none-any.whl (4.0 kB)
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6
Note: you may need to restart the kernel to use updated packages.


#### Now import the geocoder Python package.

In [12]:
import geocoder

#### Use the geocoder ArcGIS API to get the latitudes and longitudes and add them to the dataframe.

In [13]:
# define latitude and longitude as lists
latitude = []
longitude = []

# loop through the postal codes in the dataframe and for each postal code loop until coordinates are obtained
for postal_code in df_Toronto['Postal_Code']:
    g = geocoder.arcgis('{}, Toronto, Ontario'.format(postal_code))
    print(postal_code, g.latlng)
    while(g.latlng is None):
        g = geocoder.arcgis('{}, Toronto, Ontario'.format(postal_code))
        print(postal_code, g.latlng)
    lat_lng_coords = g.latlng
    latitude.append(lat_lng_coords[0])
    longitude.append(lat_lng_coords[1])

df_Toronto['Latitude'] = latitude
df_Toronto['Longitude'] = longitude

df_Toronto


M3A [43.75245000000007, -79.32990999999998]
M4A [43.73057000000006, -79.31305999999995]
M5A [43.65512000000007, -79.36263999999994]
M6A [43.72327000000007, -79.45041999999995]
M7A [43.66253000000006, -79.39187999999996]
M9A [43.662630000000036, -79.52830999999998]
M1B [43.811390000000074, -79.19661999999994]
M3B [43.74923000000007, -79.36185999999998]
M4B [43.70718000000005, -79.31191999999999]
M5B [43.65739000000008, -79.37803999999994]
M6B [43.70687000000004, -79.44811999999996]
M9B [43.65034000000003, -79.55361999999997]
M1C [43.78574000000003, -79.15874999999994]
M3C [43.72168000000005, -79.34351999999996]
M4C [43.68970000000007, -79.30681999999996]
M5C [43.65215000000006, -79.37586999999996]
M6C [43.69211000000007, -79.43035999999995]
M9C [43.64857000000006, -79.57824999999997]
M1E [43.765750000000025, -79.17469999999997]
M4E [43.67709000000008, -79.29546999999997]
M5E [43.64536000000004, -79.37305999999995]
M6E [43.68784000000005, -79.45045999999996]
M1G [43.76812000000007, -79.2

Unnamed: 0,Postal_Code,Borough,Neighbourhood,Latitude,Longitude
2,M3A,North York,Parkwoods,43.75245,-79.32991
3,M4A,North York,Victoria Village,43.73057,-79.31306
4,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264
5,M6A,North York,"Lawrence Manor, Lawrence Heights",43.72327,-79.45042
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66253,-79.39188
...,...,...,...,...,...
160,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.65319,-79.51113
165,M4Y,Downtown Toronto,Church and Wellesley,43.66659,-79.38133
168,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.64869,-79.38544
169,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.63278,-79.48945
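
The `while` loop above retries forever if a postal code never resolves. A minimal sketch of a bounded retry follows; `geocode_with_retry`, `max_attempts` and `delay_seconds` are hypothetical names, and the lookup is passed in as a plain function so the sketch does not depend on the geocoder package.

```python
import time
from typing import Callable, Optional, Sequence


def geocode_with_retry(
    lookup: Callable[[str], Optional[Sequence[float]]],
    query: str,
    max_attempts: int = 5,
    delay_seconds: float = 1.0,
) -> Optional[Sequence[float]]:
    """Call `lookup` until it returns coordinates or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        latlng = lookup(query)
        if latlng is not None:
            return latlng
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # brief pause before trying again
    return None  # caller decides what to do with an unresolved code


# In this notebook, `lookup` could be (assumption):
#   lambda q: geocoder.arcgis(q).latlng
```

Returning `None` after the last attempt lets the caller record a missing coordinate instead of hanging the cell.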


### Part 3: Explore and cluster the Toronto neighbourhoods, following the approach of the New York City analysis, and show the clustered neighbourhoods on a map.

#### Install folium library for making maps.

In [14]:
pip install folium

Note: you may need to restart the kernel to use updated packages.


#### Import Folium.

In [15]:
import folium

#### Use ArcGIS API to get coordinates of Toronto and make a map of Toronto with the neighbourhoods superimposed on top.

In [19]:
address = 'Toronto, Ontario'

Toronto_location = geocoder.arcgis(address)
Toronto_coords = Toronto_location.latlng
Toronto_latitude = Toronto_coords[0]
Toronto_longitude = Toronto_coords[1]

print('The geographical coordinates of Toronto are {}, {}.'.format(Toronto_latitude, Toronto_longitude))

map_Toronto = folium.Map(location = [Toronto_latitude, Toronto_longitude], zoom_start = 11)

for lat, lng, borough, neighbourhood in zip(df_Toronto['Latitude'], df_Toronto['Longitude'], df_Toronto['Borough'], df_Toronto['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False).add_to(map_Toronto)

map_Toronto

The geographical coordinates of Toronto are 43.648690000000045, -79.38543999999996.


#### Use Foursquare to explore the Toronto neighbourhoods.

In [62]:
# The code was removed by Watson Studio for sharing.
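
The hidden cell presumably defines `CLIENT_ID`, `CLIENT_SECRET`, `VERSION` and `LIMIT`, which the later cells use. One way to supply them without putting secrets in the notebook is to read them from environment variables. This is a sketch only: the `FOURSQUARE_*` variable names and the default values are placeholders, not the values used in this notebook.

```python
import os
from typing import Mapping


def load_foursquare_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read API settings from a mapping such as os.environ."""
    return {
        # Required: a missing variable raises KeyError.
        "CLIENT_ID": env["FOURSQUARE_CLIENT_ID"],
        "CLIENT_SECRET": env["FOURSQUARE_CLIENT_SECRET"],
        # Placeholder defaults (assumptions): a YYYYMMDD version date
        # and a per-request venue limit.
        "VERSION": env.get("FOURSQUARE_VERSION", "20180605"),
        "LIMIT": int(env.get("FOURSQUARE_LIMIT", "100")),
    }
```

The returned values could then be assigned to the four names the later cells expect.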

#### Import libraries for handling requests and creating color categories later on.

In [21]:
import requests
import matplotlib.cm as cm
import matplotlib.colors as colors

#### Create a function to get the nearby venues.

In [26]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
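
The list comprehension in `getNearbyVenues` indexes `categories[0]`, which raises `IndexError` if a venue has an empty category list. Whether the API ever returns such a venue is an assumption here. A sketch of a more defensive row extractor follows; `extract_venue_rows` is a hypothetical helper, and the second sample item is made up.

```python
def extract_venue_rows(name, lat, lng, items):
    """Flatten explore-style items into tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


sample_items = [
    {"venue": {"name": "Brookbanks Park",
               "location": {"lat": 43.751976, "lng": -79.332140},
               "categories": [{"name": "Park"}]}},
    # Made-up venue with no category, to exercise the fallback.
    {"venue": {"name": "Uncategorized Place",
               "location": {"lat": 43.75, "lng": -79.33},
               "categories": []}},
]
rows = extract_venue_rows("Parkwoods", 43.75245, -79.32991, sample_items)
```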

#### Run the above function on each neighbourhood in Toronto and create a new dataframe called df_Toronto_venues.

In [27]:
df_Toronto_venues = getNearbyVenues(names = df_Toronto['Neighbourhood'],
                                   latitudes = df_Toronto['Latitude'],
                                   longitudes = df_Toronto['Longitude']
                                   )

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmo

#### Inspect the resulting venues dataframe.

In [28]:
df_Toronto_venues

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.75245,-79.32991,Brookbanks Park,43.751976,-79.332140,Park
1,Parkwoods,43.75245,-79.32991,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.73057,-79.31306,Wigmore Park,43.731023,-79.310771,Park
3,Victoria Village,43.73057,-79.31306,Memories of Africa,43.726602,-79.312427,Grocery Store
4,"Regent Park, Harbourfront",43.65512,-79.36264,Roselle Desserts,43.653447,-79.362017,Bakery
...,...,...,...,...,...,...,...
2367,"Mimico NW, The Queensway West, South of Bloor,...",43.62513,-79.52681,Kingsway Boxing Club,43.627254,-79.526684,Gym
2368,"Mimico NW, The Queensway West, South of Bloor,...",43.62513,-79.52681,Tactical Products Canada,43.626801,-79.529388,Miscellaneous Shop
2369,"Mimico NW, The Queensway West, South of Bloor,...",43.62513,-79.52681,Queensway Fish & Chips,43.621720,-79.524588,Fish & Chips Shop
2370,"Mimico NW, The Queensway West, South of Bloor,...",43.62513,-79.52681,Sleep Country,43.621340,-79.526708,Mattress Store


#### Check how many venues were returned for each neighbourhood.

In [29]:
df_Toronto_venues.groupby('Neighbourhood').count()

Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,13,13,13,13,13,13
"Alderwood, Long Branch",4,4,4,4,4,4
"Bathurst Manor, Wilson Heights, Downsview North",2,2,2,2,2,2
Bayview Village,4,4,4,4,4,4
"Bedford Park, Lawrence Manor East",21,21,21,21,21,21
...,...,...,...,...,...,...
"Willowdale, Willowdale West",5,5,5,5,5,5
Woburn,4,4,4,4,4,4
Woodbine Heights,16,16,16,16,16,16
York Mills West,4,4,4,4,4,4
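
`count()` repeats the same number in every column. When only the number of venues per neighbourhood matters, `size()` returns a single value per group. A small sketch on three rows taken from the venues table above:

```python
import pandas as pd

toy_venues = pd.DataFrame({
    "Neighbourhood": ["Parkwoods", "Parkwoods", "Victoria Village"],
    "Venue": ["Brookbanks Park", "Variety Store", "Wigmore Park"],
})

# One count per neighbourhood, largest first.
venue_counts = (
    toy_venues.groupby("Neighbourhood").size().sort_values(ascending=False)
)
```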


#### Find out how many unique categories can be curated from all the returned venues.

In [30]:
print('There are {} unique categories.'.format(len(df_Toronto_venues['Venue Category'].unique())))

There are 261 unique categories.


#### Analyze each neighbourhood.

In [31]:
# one hot encoding
df_Toronto_onehot = pd.get_dummies(df_Toronto_venues[['Venue Category']], prefix = "", prefix_sep = "")

# add neighbourhood column back to dataframe
df_Toronto_onehot['Neighbourhood'] = df_Toronto_venues['Neighbourhood']

# move neighbourhood column to the first column
fixed_columns = [df_Toronto_onehot.columns[-1]] + list(df_Toronto_onehot.columns[:-1])
df_Toronto_onehot = df_Toronto_onehot[fixed_columns]

df_Toronto_onehot.head()

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,Afghan Restaurant,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,...,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


#### Examine dataframe size.

In [32]:
df_Toronto_onehot.shape

(2372, 262)

#### Group rows by neighbourhood, taking the mean of the frequency of occurrence of each category.

In [33]:
df_Toronto_grouped = df_Toronto_onehot.groupby('Neighbourhood').mean().reset_index()
df_Toronto_grouped

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,Afghan Restaurant,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,...,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
92,"Willowdale, Willowdale West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
93,Woburn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
94,Woodbine Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
95,York Mills West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
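
To see why the mean of the one-hot columns is a category frequency, here is a sketch on a made-up set of six venues in two neighbourhoods, "A" and "B". Each row of the grouped result sums to 1.

```python
import pandas as pd

toy_venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Park", "Gym", "Pub", "Park", "Bakery"],
})

# One column per category, 1.0 where the venue belongs to it.
onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighbourhood", toy_venues["Neighbourhood"])

# Mean of the 0/1 indicators = share of venues in each category.
grouped = onehot.groupby("Neighbourhood").mean()
# For "A": Park 0.5, Gym 0.25, Pub 0.25, Bakery 0.0
```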


#### Print each neighbourhood along with the top 5 most common venues.

In [34]:
num_top_venues = 5

for hood in df_Toronto_grouped['Neighbourhood']:
    print('----' + hood + '----')
    temp = df_Toronto_grouped[df_Toronto_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue', 'freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending = False).reset_index(drop = True).head(num_top_venues))
    print('\n')

----Agincourt----
                  venue  freq
0  Hong Kong Restaurant  0.08
1         Grocery Store  0.08
2      Department Store  0.08
3        Discount Store  0.08
4    Chinese Restaurant  0.08


----Alderwood, Long Branch----
                   venue  freq
0      Convenience Store  0.25
1  Performing Arts Venue  0.25
2                    Pub  0.25
3                    Gym  0.25
4     Miscellaneous Shop  0.00


----Bathurst Manor, Wilson Heights, Downsview North----
                       venue  freq
0                Men's Store   0.5
1           Business Service   0.5
2    New American Restaurant   0.0
3         Mexican Restaurant   0.0
4  Middle Eastern Restaurant   0.0


----Bayview Village----
                        venue  freq
0                       Trail  0.50
1  Construction & Landscaping  0.25
2                        Park  0.25
3   Middle Eastern Restaurant  0.00
4          Miscellaneous Shop  0.00


----Bedford Park, Lawrence Manor East----
                venue  freq
0

#### Put that into a _pandas_ dataframe.

First, write a function to sort the venues in descending order.

In [35]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending = False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
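
Note that `return_most_common_venues` always returns `num_top_venues` names, so a neighbourhood with only a few venues gets padded with zero-frequency categories. A sketch of a variant that drops the zeros follows. `top_nonzero_categories` is a hypothetical name, and the row uses the Bayview Village frequencies printed earlier plus one zero entry.

```python
import pandas as pd


def top_nonzero_categories(row, num_top_venues):
    """Return up to num_top_venues category names with frequency > 0."""
    freqs = row.iloc[1:].astype(float)  # skip the Neighbourhood entry
    freqs = freqs[freqs > 0].sort_values(ascending=False)
    return list(freqs.index[:num_top_venues])


toy_row = pd.Series({
    "Neighbourhood": "Bayview Village",
    "Construction & Landscaping": 0.25,
    "Park": 0.25,
    "Trail": 0.50,
    "Yoga Studio": 0.0,
})
top = top_nonzero_categories(toy_row, 5)  # three names, "Trail" first
```

The order of tied categories (here the two at 0.25) is not something to rely on.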

The next cell uses NumPy, so import it first.

In [36]:
import numpy as np

Now create a new dataframe and display the top 10 venues for each neighbourhood.

In [38]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind + 1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind + 1))
        
# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns = columns)
neighbourhoods_venues_sorted['Neighbourhood'] = df_Toronto_grouped['Neighbourhood']

for ind in np.arange(df_Toronto_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(df_Toronto_grouped.iloc[ind, :], num_top_venues)
    
neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Chinese Restaurant,Discount Store,Skating Rink,Shopping Mall,Supermarket,Vietnamese Restaurant,Grocery Store,Hong Kong Restaurant,Bakery,Bubble Tea Shop
1,"Alderwood, Long Branch",Convenience Store,Performing Arts Venue,Gym,Pub,Yoga Studio,Event Space,Dry Cleaner,Eastern European Restaurant,Electronics Store,Escape Room
2,"Bathurst Manor, Wilson Heights, Downsview North",Men's Store,Business Service,Yoga Studio,Dry Cleaner,Fish Market,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market,Farm
3,Bayview Village,Trail,Construction & Landscaping,Park,Event Space,Dry Cleaner,Eastern European Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Falafel Restaurant
4,"Bedford Park, Lawrence Manor East",Italian Restaurant,Coffee Shop,Sandwich Place,Hobby Shop,Sushi Restaurant,Greek Restaurant,Indian Restaurant,Juice Bar,Liquor Store,Comfort Food Restaurant
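
The column names above come from indexing a three-item suffix list and falling back to "th", which works up to 20 but would label position 21 as "21th". A sketch of a general ordinal helper follows; `ordinal` is a hypothetical name, not part of the original notebook.

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 11th, 21st."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # covers 11th, 12th, 13th
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


columns = ["Neighbourhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, 11)
]
```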


#### Cluster the neighbourhoods.

First import k-means.

In [39]:
from sklearn.cluster import KMeans

Use k-means to cluster the neighbourhoods into 5 clusters.

In [40]:
# set number of clusters
kclusters = 5

df_Toronto_grouped_clustering = df_Toronto_grouped.drop(columns = 'Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters = kclusters, random_state = 0).fit(df_Toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 1, 1, 2, 1, 1, 1, 1, 1, 1], dtype=int32)
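
A quick way to check how evenly the neighbourhoods are spread across clusters is to count the labels. The sketch below uses only the ten labels printed above. In the notebook the same call could be applied to all of `kmeans.labels_`.

```python
import numpy as np

kclusters = 5
labels = np.array([1, 1, 1, 2, 1, 1, 1, 1, 1, 1])  # first ten labels shown above

# Number of rows assigned to each cluster label 0..kclusters-1.
cluster_sizes = np.bincount(labels, minlength=kclusters)
# → array([0, 9, 1, 0, 0])
```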

Create new dataframe that includes the cluster as well as the top 10 venues for each neighbourhood.

In [51]:
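# add clustering labels; the guard keeps the cell re-runnable, since a second
# insert raises "ValueError: cannot insert Cluster labels, already exists"
if 'Cluster labels' not in neighbourhoods_venues_sorted.columns:
    neighbourhoods_venues_sorted.insert(0, 'Cluster labels', kmeans.labels_)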

In [53]:
df_Toronto_merged = df_Toronto

df_Toronto_merged = df_Toronto_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on = 'Neighbourhood', how = 'right')

df_Toronto_merged.head()

Unnamed: 0,Postal_Code,Borough,Neighbourhood,Latitude,Longitude,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,M3A,North York,Parkwoods,43.75245,-79.32991,2,Food & Drink Shop,Park,Yoga Studio,Falafel Restaurant,Eastern European Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space,Farmers Market
3,M4A,North York,Victoria Village,43.73057,-79.31306,2,Grocery Store,Park,Farm,Eastern European Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Yoga Studio
4,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264,1,Coffee Shop,Breakfast Spot,Yoga Studio,Theater,Spa,Event Space,Food Truck,Electronics Store,Restaurant,Bakery
5,M6A,North York,"Lawrence Manor, Lawrence Heights",43.72327,-79.45042,1,Clothing Store,Women's Store,Food Court,Furniture / Home Store,Bookstore,Toy / Game Store,Restaurant,American Restaurant,Men's Store,Cosmetics Shop
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66253,-79.39188,1,Coffee Shop,Sandwich Place,Park,Theater,Mediterranean Restaurant,Falafel Restaurant,Café,Fried Chicken Joint,Bank,Moving Target
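
The `how = 'right'` join keeps only neighbourhoods that appear in the sorted-venues table, so a neighbourhood for which no venues were returned disappears from the merged dataframe. A neighbourhood name shared by several postal codes (Don Mills covers M3B and M3C) matches the same cluster row more than once. A sketch on made-up inputs, where the "M9Z" row is hypothetical:

```python
import pandas as pd

postal = pd.DataFrame({
    "Postal_Code": ["M3B", "M3C", "M9Z"],
    "Neighbourhood": ["Don Mills", "Don Mills", "No Venues Here"],
})
sorted_venues = pd.DataFrame({
    "Neighbourhood": ["Don Mills"],
    "Cluster labels": [1],
})

merged = postal.join(
    sorted_venues.set_index("Neighbourhood"), on="Neighbourhood", how="right"
)
# Both Don Mills rows survive with cluster 1; the M9Z row is dropped.
```

With `how = 'left'` the M9Z row would be kept with a missing cluster label instead.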


Visualize the resulting clusters.

In [54]:
# create map
map_clusters = folium.Map(location = [Toronto_latitude, Toronto_longitude], zoom_start = 11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i * x) ** 2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(df_Toronto_merged['Latitude'], df_Toronto_merged['Longitude'], df_Toronto_merged['Neighbourhood'], df_Toronto_merged['Cluster labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html = True)
    folium.CircleMarker(
        [lat, lon],
        radius = 5,
        popup = label,
        color = rainbow[cluster - 1],
        fill = True,
        fill_color = rainbow[cluster - 1],
        fill_opacity = 0.7).add_to(map_clusters)
    
map_clusters

### Examine clusters

#### Cluster 1: fast food & yoga

In [56]:
df_Toronto_merged.loc[df_Toronto_merged['Cluster labels'] == 0, df_Toronto_merged.columns[[2] + list(range(5, df_Toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,"Malvern, Rouge",0,Fast Food Restaurant,Yoga Studio,Flower Shop,Fish Market,Fish & Chips Shop,Field,Farmers Market,Farm,Falafel Restaurant,Event Space


#### Cluster 2: coffee shops & cafes

In [57]:
df_Toronto_merged.loc[df_Toronto_merged['Cluster labels'] == 1, df_Toronto_merged.columns[[2] + list(range(5, df_Toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,"Regent Park, Harbourfront",1,Coffee Shop,Breakfast Spot,Yoga Studio,Theater,Spa,Event Space,Food Truck,Electronics Store,Restaurant,Bakery
5,"Lawrence Manor, Lawrence Heights",1,Clothing Store,Women's Store,Food Court,Furniture / Home Store,Bookstore,Toy / Game Store,Restaurant,American Restaurant,Men's Store,Cosmetics Shop
6,"Queen's Park, Ontario Provincial Government",1,Coffee Shop,Sandwich Place,Park,Theater,Mediterranean Restaurant,Falafel Restaurant,Café,Fried Chicken Joint,Bank,Moving Target
8,"Islington Avenue, Humber Valley Village",1,Pharmacy,Grocery Store,Café,Skating Rink,Shopping Mall,Park,Bank,Farm,Falafel Restaurant,Farmers Market
11,Don Mills,1,Coffee Shop,Intersection,Park,Soccer Field,Spa,Supermarket,Beer Store,Bubble Tea Shop,Gas Station,Clothing Store
...,...,...,...,...,...,...,...,...,...,...,...,...
157,"First Canadian Place, Underground city",1,Coffee Shop,Hotel,Café,Restaurant,Gym,American Restaurant,Japanese Restaurant,Seafood Restaurant,Pizza Place,Bar
165,Church and Wellesley,1,Coffee Shop,Japanese Restaurant,Restaurant,Sushi Restaurant,Gay Bar,Café,Mediterranean Restaurant,Hotel,Fast Food Restaurant,Dance Studio
168,"Business reply mail Processing Centre, South C...",1,Coffee Shop,Hotel,Café,Restaurant,Asian Restaurant,Italian Restaurant,Bar,Steakhouse,Concert Hall,Sushi Restaurant
169,"Old Mill South, King's Mill Park, Sunnylea, Hu...",1,Coffee Shop,Flower Shop,Chinese Restaurant,Fast Food Restaurant,Italian Restaurant,Sushi Restaurant,Bank,Ethiopian Restaurant,Electronics Store,Escape Room


#### Cluster 3: parks

In [58]:
df_Toronto_merged.loc[df_Toronto_merged['Cluster labels'] == 2, df_Toronto_merged.columns[[2] + list(range(5, df_Toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Parkwoods,2,Food & Drink Shop,Park,Yoga Studio,Falafel Restaurant,Eastern European Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space,Farmers Market
3,Victoria Village,2,Grocery Store,Park,Farm,Eastern European Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Yoga Studio
27,"Guildwood, Morningside, West Hill",2,Construction & Landscaping,Gym / Fitness Center,Park,Falafel Restaurant,Eastern European Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space,Farmers Market
46,Hillcrest Village,2,Bus Stop,Park,Residential Building (Apartment / Condo),Yoga Studio,Eastern European Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant
57,"East Toronto, Broadview North (Old East York)",2,Park,Playground,Intersection,Yoga Studio,Event Space,Dry Cleaner,Eastern European Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant
64,Bayview Village,2,Trail,Construction & Landscaping,Park,Event Space,Dry Cleaner,Eastern European Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Falafel Restaurant
73,"York Mills, Silver Hills",2,Park,Yoga Studio,Donut Shop,Fish Market,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
77,"North Park, Maple Leaf Park, Upwood Park",2,Bakery,Basketball Court,Park,Farm,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Yoga Studio
103,"Forest Hill North & West, Forest Hill Road Park",2,Park,Yoga Studio,Donut Shop,Fish Market,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
104,"High Park, The Junction South",2,Convenience Store,Park,Donut Shop,Fish Market,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant


#### Cluster 4: pharmacy

In [59]:
df_Toronto_merged.loc[df_Toronto_merged['Cluster labels'] == 3, df_Toronto_merged.columns[[2] + list(range(5, df_Toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
135,"Milliken, Agincourt North, Steeles East, L'Amo...",3,Pharmacy,Intersection,Yoga Studio,Falafel Restaurant,Eastern European Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space,Farm


#### Cluster 5: pizza places

In [60]:
df_Toronto_merged.loc[df_Toronto_merged['Cluster labels'] == 4, df_Toronto_merged.columns[[2] + list(range(5, df_Toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,"West Deane Park, Princess Gardens, Martin Grov...",4,Pizza Place,Chinese Restaurant,Sandwich Place,Tea Room,Yoga Studio,Escape Room,Donut Shop,Dry Cleaner,Eastern European Restaurant,Electronics Store
107,Westmount,4,Pizza Place,Coffee Shop,Chinese Restaurant,Sandwich Place,Dry Cleaner,Eastern European Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Dog Run
116,"Kingsview Village, St. Phillips, Martin Grove ...",4,Pizza Place,Arts & Crafts Store,Bus Line,Eastern European Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm
