# Segmenting and Clustering Neighborhoods in Toronto

This notebook serves as my solution to the Applied Data Science Capstone Week 3 peer-graded assignment.

The first step is to read the webpage. Using the `BeautifulSoup` package, I'll parse the page with the `lxml` parser.

In [1]:
import urllib.request
from bs4 import BeautifulSoup


url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, "lxml")

Looking at the Wikipedia page, there's only one table in it, and its class attribute is `wikitable sortable`. Using that information and the `find` method of `BeautifulSoup`, we can locate the table element.

In [2]:
postal_code_table = soup.find('table', class_='wikitable sortable')

Knowing how HTML tables are structured, I loop through each `tr` of the table and follow the assignment instructions:

- The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
- Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
- More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.
- If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.
- Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
- In the last cell of your notebook, use the .shape method to print the number of rows of your dataframe.
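
The combining rules in the list above can be sketched without any libraries. This is only an illustration on a few hand-written rows, not the notebook's scraping code; the helper name `combine_rows`, the postal code `M0X`, and `Example Borough` are assumptions made up for the example.

```python
def combine_rows(rows):
    """Apply the assignment rules to (postal_code, borough, neighborhood) tuples."""
    combined = {}  # postal code -> (borough, [neighborhoods]); dicts keep insertion order
    for postal_code, borough, neighborhood in rows:
        if borough == 'Not assigned':
            continue  # ignore rows without an assigned borough
        if neighborhood == 'Not assigned':
            neighborhood = borough  # fall back to the borough name
        combined.setdefault(postal_code, (borough, []))[1].append(neighborhood)
    # join the neighborhoods of each postal code with a comma
    return [(code, borough, ', '.join(names))
            for code, (borough, names) in combined.items()]


# hand-written sample rows (hypothetical, for illustration only)
sample_rows = [
    ('M1A', 'Not assigned', 'Not assigned'),
    ('M5A', 'Downtown Toronto', 'Harbourfront'),
    ('M5A', 'Downtown Toronto', 'Regent Park'),
    ('M0X', 'Example Borough', 'Not assigned'),
]
result = combine_rows(sample_rows)
print(result)
```

In pandas, the same grouping step could be expressed with a `groupby` on the postal code followed by a string join, if the scraped table ever lists a postal code more than once.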

In [3]:
PostalCode = []
Borough = []
Neighborhood = []

for row in postal_code_table.findAll('tr')[1:]:
    cells = row.findAll('td')
    borough = cells[1].find(text=True).strip()
    
    if borough != 'Not assigned':
        PostalCode.append(cells[0].find(text=True).strip())
        Borough.append(borough)
        neighborhood = cells[2].find(text=True).strip()
        if neighborhood == 'Not assigned': 
            Neighborhood.append(borough)
        else:
            Neighborhood.append(neighborhood)

In [4]:
import pandas as pd

df = pd.DataFrame(PostalCode,columns=['PostalCode'])
df['Borough'] = Borough
df['Neighborhood'] = Neighborhood

In [5]:
df.shape

(103, 3)

## Getting the latitude and longitude

I tried using the `geocoder` package, but it kept returning `<[REQUEST_DENIED] Google - Geocode [empty]>`, so I'll use the geospatial data CSV file instead.

After reading the CSV file, I create a new column called `PostalCode`, matching the column name in the original dataframe `df`, drop the old `Postal Code` column, and use the `pandas.merge` function to combine the two dataframes.

In [6]:
geospatial_df = pd.read_csv('Geospatial_Coordinates.csv')
geospatial_df['PostalCode'] = geospatial_df['Postal Code']
geospatial_df.drop(['Postal Code'], axis=1, inplace=True)

df = pd.merge(df, geospatial_df, on='PostalCode')
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


## Exploring the Dataset

In [7]:
print("The dataframe has {} boroughs and {} neighborhoods".format(len(df['Borough'].unique()), df.shape[0]))

The dataframe has 10 boroughs and 103 neighborhoods


First, let's import the libraries we're going to use.

In [8]:
import numpy as np
import pandas as pd
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium
from geopy.geocoders import Nominatim
from sklearn.cluster import KMeans

print("Libraries imported.")

Libraries imported.


Using the `geopy` library, we can get the latitude and longitude of Toronto. We can use `toronto_explorer` as the user agent.

In [9]:
address = "Toronto, Canada"
geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


Let's visualize the map of Toronto with the neighborhoods superimposed on top.

In [10]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)
    

map_toronto

## Foursquare API

Using the Foursquare API, let's explore the types of venues in Toronto, Canada.

Note: Before running the next code cell, make sure to populate the `.env` file with your Foursquare API credentials.

In [11]:
%load_ext dotenv
%dotenv

import os


CLIENT_ID = os.environ['CLIENT_ID']
CLIENT_SECRET = os.environ['CLIENT_SECRET']
VERSION = os.environ['VERSION']
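
If one of the variables is missing from the `.env` file, `os.environ[...]` fails with a bare `KeyError`. A small check like the one below can give a clearer message. This is a sketch, not part of the original notebook: the helper name `load_credentials` and the placeholder values are assumptions for illustration.

```python
import os

REQUIRED_VARS = ('CLIENT_ID', 'CLIENT_SECRET', 'VERSION')


def load_credentials(environ=os.environ):
    """Return the required settings, or raise an error listing what is missing."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}


# placeholder values for illustration only, not real credentials
example_env = {'CLIENT_ID': 'my-id', 'CLIENT_SECRET': 'my-secret', 'VERSION': '20200101'}
credentials = load_credentials(example_env)
print(sorted(credentials))
```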

Now for the fun part! Let's get up to 100 venues within 500 meters of the Toronto coordinates found above.

In [12]:
import json

radius = 500
LIMIT = 100
url = "https://api.foursquare.com/v2/venues/explore?ll={},{}&radius={}&limit={}&client_id={}&client_secret={}&v={}".format(
    latitude, 
    longitude,
    radius,
    LIMIT,
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION
)

response = urllib.request.urlopen(url)
data = response.read()
encoding = response.info().get_content_charset('utf-8')
results = json.loads(data.decode(encoding))
results

{'meta': {'code': 200, 'requestId': '5f0016e4f1677070aeff3746'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Bay Street Corridor',
  'headerFullLocation': 'Bay Street Corridor, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 77,
  'suggestedBounds': {'ne': {'lat': 43.6579817045, 'lng': -79.37772678059432},
   'sw': {'lat': 43.6489816955, 'lng': -79.39014261940568}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '5227bb01498e17bf485e6202',
       'name': 'Downtown Toronto',
       'location': {'lat': 43.65323167517444,
        'lng': -79.38529600606677,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.65323167517444,
          'lng'
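
The request URL above is assembled with `str.format`. As an alternative sketch, the query string can be built with `urllib.parse.urlencode`, which escapes each value. The function name `build_explore_url` and the credential values are placeholders I made up; no request is sent here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_explore_url(lat, lng, radius, limit, client_id, client_secret, version):
    """Build the venues/explore URL with an escaped query string."""
    params = {
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)


# placeholder credentials, for illustration only
example_url = build_explore_url(43.6534817, -79.3839347, 500, 100,
                                'my-id', 'my-secret', '20200101')

# parse the URL back to confirm the parameters survive the round trip
query = parse_qs(urlsplit(example_url).query)
print(query['ll'], query['radius'], query['limit'])
```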

The next cell defines the `get_category_type` function, which was provided in the labs to help with the project.

In [13]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
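
To see what the helper returns, it can be tried on a few hand-made rows. In this sketch plain dicts stand in for DataFrame rows, the sample venues are made up, and the function is repeated so the snippet runs on its own.

```python
def get_category_type(row):
    # the category list lives under 'categories' or 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    # only the first category's name is kept
    return categories_list[0]['name']


# hypothetical rows for illustration
flat_row = {'categories': [{'name': 'Plaza'}]}
nested_row = {'venue.categories': [{'name': 'Bookstore'}, {'name': 'Café'}]}
empty_row = {'venue.categories': []}

for row in (flat_row, nested_row, empty_row):
    print(get_category_type(row))
```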

In [14]:
from pandas import json_normalize

venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues)

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Downtown Toronto,Neighborhood,43.653232,-79.385296
1,Nathan Phillips Square,Plaza,43.65227,-79.383516
2,Chatime 日出茶太,Bubble Tea Shop,43.655542,-79.384684
3,Textile Museum of Canada,Art Museum,43.654396,-79.3865
4,Indigo,Bookstore,43.653515,-79.380696


Let's see the number of venues returned by Foursquare.

In [15]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

77 venues were returned by Foursquare.


## Exploring the Neighborhoods in Toronto

The following function, again, came from the labs.

In [16]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        response = urllib.request.urlopen(url)
        data = response.read()
        encoding = response.info().get_content_charset('utf-8')
        results = json.loads(data.decode(encoding))["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
        
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough', 
                  'Borough Latitude', 
                  'Borough Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

In [17]:
# one entry per distinct borough, in order of first appearance
boroughs = df['Borough'].unique()

venues = getNearbyVenues(names=boroughs,latitudes=df['Latitude'],longitudes=df['Longitude'])
venues

North York
Downtown Toronto
Etobicoke
Scarborough
East York
York
East Toronto
West Toronto
Central Toronto
Mississauga


Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,North York,43.753259,-79.329656,Brookbanks Park,43.751976,-79.332140,Park
1,North York,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Downtown Toronto,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Downtown Toronto,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant
4,Downtown Toronto,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
...,...,...,...,...,...,...,...
209,Mississauga,43.657162,-79.378937,Starbucks,43.659080,-79.380562,Coffee Shop
210,Mississauga,43.657162,-79.378937,Shisha&Co,43.656748,-79.374337,Smoke Shop
211,Mississauga,43.657162,-79.378937,EB Games,43.655293,-79.380328,Video Game Store
212,Mississauga,43.657162,-79.378937,Good Earth Coffeehouse,43.656850,-79.374719,Coffee Shop


In [18]:
print(venues.shape)
venues.head()

(214, 7)


Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,North York,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,North York,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Downtown Toronto,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Downtown Toronto,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant
4,Downtown Toronto,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
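
One thing to keep in mind when reading these results: `zip` pairs its inputs by position and stops at the shortest one. The call above passes 10 unique borough names together with the full latitude and longitude columns, so each name is paired with the coordinates of one of the first 10 rows of `df`. That is why "Downtown Toronto" appears next to 43.725882, -79.315572, which the merged dataframe lists for M4A (Victoria Village, North York). The sketch below reproduces the effect on made-up rows and shows one possible aligned alternative, averaging the coordinates per borough. The sample values are hypothetical.

```python
from collections import defaultdict

# hypothetical rows: (borough, latitude, longitude), one per postal code
rows = [
    ('North York', 43.75, -79.33),
    ('North York', 43.73, -79.31),
    ('Downtown Toronto', 43.65, -79.36),
    ('Downtown Toronto', 43.66, -79.39),
]

# unique borough names in order of first appearance
unique_boroughs = list(dict.fromkeys(borough for borough, _, _ in rows))
latitudes = [lat for _, lat, _ in rows]

# zip pairs by position: the second borough gets the second row's latitude
mismatched = list(zip(unique_boroughs, latitudes))
print(mismatched)

# one aligned alternative: average the coordinates of each borough
sums = defaultdict(lambda: [0.0, 0.0, 0])
for borough, lat, lng in rows:
    sums[borough][0] += lat
    sums[borough][1] += lng
    sums[borough][2] += 1
centers = {b: (s[0] / s[2], s[1] / s[2]) for b, s in sums.items()}
print(centers)
```

In the notebook, the same centers could be computed with `df.groupby('Borough')[['Latitude', 'Longitude']].mean()`. Another option would be to pass `df['Borough']` itself as `names`, so that every postal code is queried with its own coordinates.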


Grouping by borough, we can see how many venues were returned for each one. York is absent because the request made for it returned no venues.

In [19]:
venues.groupby('Borough').count()

Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Central Toronto,11,11,11,11,11,11
Downtown Toronto,6,6,6,6,6,6
East Toronto,1,1,1,1,1,1
East York,32,32,32,32,32,32
Etobicoke,44,44,44,44,44,44
Mississauga,100,100,100,100,100,100
North York,2,2,2,2,2,2
Scarborough,14,14,14,14,14,14
West Toronto,4,4,4,4,4,4


In [20]:
print('There are {} unique categories.'.format(len(venues['Venue Category'].unique())))

There are 96 unique categories.


### Analyze the venue categories per borough

In [21]:
onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")

# add borough column back to dataframe
onehot['Borough'] = venues['Borough']

# move the borough column (currently last) to the front
fixed_columns = [onehot.columns[-1]] + list(onehot.columns[:-1])
onehot = onehot[fixed_columns]
onehot.head()

Unnamed: 0,Borough,Accessories Store,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bakery,Bank,Bar,...,Steakhouse,Sushi Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Theater,Video Game Store,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,North York,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,North York,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Downtown Toronto,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Downtown Toronto,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Downtown Toronto,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [22]:
onehot.shape

(214, 97)

Next, let's group the rows by borough and take the mean of the frequency of occurrence of each category.

In [23]:
grouped = onehot.groupby('Borough').mean().reset_index()
grouped

Unnamed: 0,Borough,Accessories Store,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bakery,Bank,Bar,...,Steakhouse,Sushi Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Theater,Video Game Store,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,Central Toronto,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.090909,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Downtown Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,East Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,East York,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.03125,0.03125,...,0.0,0.0625,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.03125
4,Etobicoke,0.0,0.022727,0.022727,0.0,0.022727,0.0,0.068182,0.022727,0.0,...,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.022727
5,Mississauga,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.01,0.01,...,0.01,0.0,0.01,0.02,0.01,0.02,0.01,0.01,0.01,0.0
6,North York,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Scarborough,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0
8,West Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [24]:
grouped.shape

(9, 97)
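
Because each one-hot column holds only 0s and 1s, its mean within a borough is simply the share of that borough's venues that fall in the category. The sketch below checks this on a few made-up (borough, category) pairs using only the standard library; the sample data is hypothetical.

```python
from collections import Counter, defaultdict

# hypothetical (borough, venue category) pairs
venue_rows = [
    ('East York', 'Coffee Shop'),
    ('East York', 'Coffee Shop'),
    ('East York', 'Diner'),
    ('East York', 'Gym'),
    ('North York', 'Park'),
    ('North York', 'Food & Drink Shop'),
]

# count the venues of each category per borough
counts = defaultdict(Counter)
for borough, category in venue_rows:
    counts[borough][category] += 1

# mean of a 0/1 column = number of ones / number of rows
frequencies = {
    borough: {cat: n / sum(counter.values()) for cat, n in counter.items()}
    for borough, counter in counts.items()
}
print(frequencies['East York'])
```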

In [25]:
num_top_venues = 5

for hood in grouped['Borough']:
    print("----"+hood+"----")
    temp = grouped[grouped['Borough'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Central Toronto----
                  venue  freq
0           Pizza Place  0.18
1                  Bank  0.09
2              Pharmacy  0.09
3             Pet Store  0.09
4  Fast Food Restaurant  0.09


----Downtown Toronto----
               venue  freq
0       Hockey Arena  0.17
1        Coffee Shop  0.17
2       Intersection  0.17
3  French Restaurant  0.17
4        Pizza Place  0.17


----East Toronto----
                        venue  freq
0        Fast Food Restaurant   1.0
1           Accessories Store   0.0
2  Modern European Restaurant   0.0
3                    Pharmacy   0.0
4                   Pet Store   0.0


----East York----
              venue  freq
0       Coffee Shop  0.25
1  Sushi Restaurant  0.06
2             Diner  0.06
3    Sandwich Place  0.03
4               Gym  0.03


----Etobicoke----
            venue  freq
0     Coffee Shop  0.16
1            Park  0.07
2          Bakery  0.07
3             Pub  0.07
4  Breakfast Spot  0.05


----Mississauga----
      

Next, let's create a dataframe to display the top 10 most common venues.

In [26]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [27]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Borough']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
borough_venues_sorted = pd.DataFrame(columns=columns)
borough_venues_sorted['Borough'] = grouped['Borough']

for ind in np.arange(grouped.shape[0]):
    borough_venues_sorted.iloc[ind, 1:] = return_most_common_venues(grouped.iloc[ind, :], num_top_venues)

borough_venues_sorted.head()

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central Toronto,Pizza Place,Pharmacy,Bank,Gastropub,Intersection,Fast Food Restaurant,Pet Store,Breakfast Spot,Gym / Fitness Center,Athletics & Sports
1,Downtown Toronto,Coffee Shop,Pizza Place,Hockey Arena,Portuguese Restaurant,Intersection,French Restaurant,Distribution Center,Comic Shop,Cosmetics Shop,Creperie
2,East Toronto,Fast Food Restaurant,Yoga Studio,Ethiopian Restaurant,Comic Shop,Cosmetics Shop,Creperie,Department Store,Dessert Shop,Diner,Discount Store
3,East York,Coffee Shop,Diner,Sushi Restaurant,College Auditorium,Italian Restaurant,Gym,Fried Chicken Joint,Distribution Center,Discount Store,Creperie
4,Etobicoke,Coffee Shop,Park,Bakery,Pub,Café,Breakfast Spot,Theater,Distribution Center,Performing Arts Venue,Cosmetics Shop
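
The column labels above are built by looking up a suffix in `indicators` and falling back to `th`. That works for the 10 columns used here, but it would produce `21th` if `num_top_venues` ever went above 20. A more general helper could look like the sketch below; the name `ordinal` is my own and not part of the lab code.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # covers 11th, 12th, 13th
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


# same column names as in the notebook, built with the helper
columns = ['Borough'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns)
```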


### Using k-means to cluster boroughs

In [28]:
# set number of clusters
kclusters = 4

grouped_clustering = grouped.drop(columns='Borough')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 1, 0, 0, 0, 2, 0, 3], dtype=int32)
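
For intuition about what the clustering step does, here is a simplified sketch of Lloyd's algorithm, the basic iteration behind k-means: assign each row to its nearest centroid, move each centroid to the mean of its rows, and repeat until the labels stop changing. This is not scikit-learn's implementation, which uses a more careful initialization among other things. The starting centroids and the sample rows are hand-picked assumptions.

```python
def assign(points, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each point."""
    return [
        min(range(len(centroids)),
            key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))
        for point in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of its points (unchanged if it has none)."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
        else:
            new_centroids.append(list(old))
    return new_centroids


def lloyd(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the labels stop changing."""
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# hypothetical category-frequency rows for four boroughs (two categories each)
points = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels, centroids = lloyd(points, centroids=[[1.0, 0.0], [0.0, 1.0]])
print(labels)
```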

Let's add the cluster labels to the borough data so we can visualize them on a map.

In [29]:
# add clustering labels
borough_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

merged = df

# join the per-borough cluster labels and top venues onto df, which holds the latitude/longitude of each postal code
merged = merged.join(borough_venues_sorted.set_index('Borough'), on='Borough')

merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,2.0,Park,Food & Drink Shop,Yoga Studio,Electronics Store,Comic Shop,Cosmetics Shop,Creperie,Department Store,Dessert Shop,Diner
1,M4A,North York,Victoria Village,43.725882,-79.315572,2.0,Park,Food & Drink Shop,Yoga Studio,Electronics Store,Comic Shop,Cosmetics Shop,Creperie,Department Store,Dessert Shop,Diner
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,0.0,Coffee Shop,Pizza Place,Hockey Arena,Portuguese Restaurant,Intersection,French Restaurant,Distribution Center,Comic Shop,Cosmetics Shop,Creperie
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,2.0,Park,Food & Drink Shop,Yoga Studio,Electronics Store,Comic Shop,Cosmetics Shop,Creperie,Department Store,Dessert Shop,Diner
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0.0,Coffee Shop,Pizza Place,Hockey Arena,Portuguese Restaurant,Intersection,French Restaurant,Distribution Center,Comic Shop,Cosmetics Shop,Creperie


Looks like there are some NaN values in the `Cluster Labels` column: York is missing from `grouped`, so its rows have no match in the join. Let's drop those.

In [30]:
merged.dropna(axis=0, inplace=True)

Let's also convert the `Cluster Labels` column to integers.

In [31]:
merged['Cluster Labels'] = merged['Cluster Labels'].astype('int32')
merged.dtypes

PostalCode                 object
Borough                    object
Neighborhood               object
Latitude                  float64
Longitude                 float64
Cluster Labels              int32
1st Most Common Venue      object
2nd Most Common Venue      object
3rd Most Common Venue      object
4th Most Common Venue      object
5th Most Common Venue      object
6th Most Common Venue      object
7th Most Common Venue      object
8th Most Common Venue      object
9th Most Common Venue      object
10th Most Common Venue     object
dtype: object

In [32]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(merged['Latitude'], merged['Longitude'], merged['Borough'], merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Examining the clusters

Cluster 1

In [33]:
merged.loc[merged['Cluster Labels'] == 0, merged.columns[[1] + list(range(5, merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Downtown Toronto,0,Coffee Shop,Pizza Place,Hockey Arena,Portuguese Restaurant,Intersection,French Restaurant,Distribution Center,Comic Shop,Cosmetics Shop,Creperie
4,Downtown Toronto,0,Coffee Shop,Pizza Place,Hockey Arena,Portuguese Restaurant,Intersection,French Restaurant,Distribution Center,Comic Shop,Cosmetics Shop,Creperie
5,Etobicoke,0,Coffee Shop,Park,Bakery,Pub,Café,Breakfast Spot,Theater,Distribution Center,Performing Arts Venue,Cosmetics Shop
6,Scarborough,0,Clothing Store,Accessories Store,Furniture / Home Store,Boutique,Event Space,Miscellaneous Shop,Coffee Shop,Vietnamese Restaurant,Athletics & Sports,Antique Shop
8,East York,0,Coffee Shop,Diner,Sushi Restaurant,College Auditorium,Italian Restaurant,Gym,Fried Chicken Joint,Distribution Center,Discount Store,Creperie
...,...,...,...,...,...,...,...,...,...,...,...,...
97,Downtown Toronto,0,Coffee Shop,Pizza Place,Hockey Arena,Portuguese Restaurant,Intersection,French Restaurant,Distribution Center,Comic Shop,Cosmetics Shop,Creperie
98,Etobicoke,0,Coffee Shop,Park,Bakery,Pub,Café,Breakfast Spot,Theater,Distribution Center,Performing Arts Venue,Cosmetics Shop
99,Downtown Toronto,0,Coffee Shop,Pizza Place,Hockey Arena,Portuguese Restaurant,Intersection,French Restaurant,Distribution Center,Comic Shop,Cosmetics Shop,Creperie
101,Etobicoke,0,Coffee Shop,Park,Bakery,Pub,Café,Breakfast Spot,Theater,Distribution Center,Performing Arts Venue,Cosmetics Shop


Cluster 2

In [34]:
merged.loc[merged['Cluster Labels'] == 1, merged.columns[[1] + list(range(5, merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,East Toronto,1,Fast Food Restaurant,Yoga Studio,Ethiopian Restaurant,Comic Shop,Cosmetics Shop,Creperie,Department Store,Dessert Shop,Diner,Discount Store
41,East Toronto,1,Fast Food Restaurant,Yoga Studio,Ethiopian Restaurant,Comic Shop,Cosmetics Shop,Creperie,Department Store,Dessert Shop,Diner,Discount Store
47,East Toronto,1,Fast Food Restaurant,Yoga Studio,Ethiopian Restaurant,Comic Shop,Cosmetics Shop,Creperie,Department Store,Dessert Shop,Diner,Discount Store
54,East Toronto,1,Fast Food Restaurant,Yoga Studio,Ethiopian Restaurant,Comic Shop,Cosmetics Shop,Creperie,Department Store,Dessert Shop,Diner,Discount Store
100,East Toronto,1,Fast Food Restaurant,Yoga Studio,Ethiopian Restaurant,Comic Shop,Cosmetics Shop,Creperie,Department Store,Dessert Shop,Diner,Discount Store


Cluster 3

In [35]:
merged.loc[merged['Cluster Labels'] == 2, merged.columns[[1] + list(range(5, merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,2,Park,Food & Drink Shop,Yoga Studio,Electronics Store,Comic Shop,Cosmetics Shop,Creperie,Department Store,Dessert Shop,Diner
1,North York,2,Park,Food & Drink Shop,Yoga Studio,Electronics Store,Comic Shop,Cosmetics Shop,Creperie,Department Store,Dessert Shop,Diner
3,North York,2,Park,Food & Drink Shop,Yoga Studio,Electronics Store,Comic Shop,Cosmetics Shop,Creperie,Department Store,Dessert Shop,Diner
7,North York,2,Park,Food & Drink Shop,Yoga Studio,Electronics Store,Comic Shop,Cosmetics Shop,Creperie,Department Store,Dessert Shop,Diner
10,North York,2,Park,Food & Drink Shop,Yoga Studio,Electronics Store,Comic Shop,Cosmetics Shop,Creperie,Department Store,Dessert Shop,Diner
13,North York,2,Park,Food & Drink Shop,Yoga Studio,Electronics Store,Comic Shop,Cosmetics Shop,Creperie,Department Store,Dessert Shop,Diner
27,North York,2,Park,Food & Drink Shop,Yoga Studio,Electronics Store,Comic Shop,Cosmetics Shop,Creperie,Department Store,Dessert Shop,Diner
28,North York,2,Park,Food & Drink Shop,Yoga Studio,Electronics Store,Comic Shop,Cosmetics Shop,Creperie,Department Store,Dessert Shop,Diner
33,North York,2,Park,Food & Drink Shop,Yoga Studio,Electronics Store,Comic Shop,Cosmetics Shop,Creperie,Department Store,Dessert Shop,Diner
34,North York,2,Park,Food & Drink Shop,Yoga Studio,Electronics Store,Comic Shop,Cosmetics Shop,Creperie,Department Store,Dessert Shop,Diner


Cluster 4

In [36]:
merged.loc[merged['Cluster Labels'] == 3, merged.columns[[1] + list(range(5, merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
31,West Toronto,3,Gym,Japanese Restaurant,Caribbean Restaurant,Café,Yoga Studio,Electronics Store,Cosmetics Shop,Creperie,Department Store,Dessert Shop
37,West Toronto,3,Gym,Japanese Restaurant,Caribbean Restaurant,Café,Yoga Studio,Electronics Store,Cosmetics Shop,Creperie,Department Store,Dessert Shop
43,West Toronto,3,Gym,Japanese Restaurant,Caribbean Restaurant,Café,Yoga Studio,Electronics Store,Cosmetics Shop,Creperie,Department Store,Dessert Shop
69,West Toronto,3,Gym,Japanese Restaurant,Caribbean Restaurant,Café,Yoga Studio,Electronics Store,Cosmetics Shop,Creperie,Department Store,Dessert Shop
75,West Toronto,3,Gym,Japanese Restaurant,Caribbean Restaurant,Café,Yoga Studio,Electronics Store,Cosmetics Shop,Creperie,Department Store,Dessert Shop
81,West Toronto,3,Gym,Japanese Restaurant,Caribbean Restaurant,Café,Yoga Studio,Electronics Store,Cosmetics Shop,Creperie,Department Store,Dessert Shop
