<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in Toronto - Peer Graded Assignment</font></h1>
<h1 align=center><font size = 3>Aakash Vasudevan</font></h1>


## Introduction
In this assignment, we explore, segment, and cluster the neighborhoods of Toronto. A Wikipedia page lists the postal codes, boroughs, and neighborhoods we need. We scrape that page, clean the data, and load it into a pandas dataframe so that it is in a structured format.


## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

0.  <a href="#item1">Install and Import all dependencies</a>

1.  <a href="#item1">Web Scrape the Toronto Neighborhood Dataset</a>

2.  <a href="#item2">Add Latitude and Longitude coordinates</a>

3.  <a href="#item3">Explore, Analyze, Cluster and Visualize</a>
    
    </font>
    </div>


## 0. Install and Import all dependencies

In [1]:
# Import all libraries
#!pip install numpy
import numpy as np # library to handle data in a vectorized manner

#!pip install pandas
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!pip install geopy 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

#!pip install pgeocode
import pgeocode

# Matplotlib and associated plotting modules
#!pip install -U matplotlib
#!pip install seaborn
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# import k-means from clustering stage
#!pip install -U scikit-learn
from sklearn.cluster import KMeans

#!pip install folium
import folium # map rendering library

# Import Beautiful Soup package for web scraping
#!pip install beautifulsoup4
#!pip install lxml
from urllib.request import urlopen
from bs4 import BeautifulSoup

print('Libraries imported.')

Libraries imported.


## 1. Web Scrape the Toronto Neighborhood Dataset

In [2]:
# Define the wikipedia url
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
html = urlopen(url)

Use the Beautiful Soup package to parse the page and scrape its contents

In [3]:
# Create Beautiful Soup Object
soup = BeautifulSoup(html, 'html.parser')
type(soup)

bs4.BeautifulSoup

In [4]:
# Scrape each row in table and store in list_rows
list_rows = []

for tr in soup.table('tr')[1:]:
    for tag in tr(['span', 'sup']):
        tag.decompose()
    list_rows.append([ td.text for td in tr('td') ])
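
The row-scraping loop above can be tried offline on a small inline table. The sketch below is only an illustration: `sample_html` is a hypothetical snippet shaped like the Wikipedia table, not the page's actual markup, and `sample_soup` and `sample_rows` are names introduced here.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet (assumption): a header row plus two data rows
sample_html = """
<table>
  <tr><th>Postal Code</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A<sup>[1]</sup></td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""

sample_soup = BeautifulSoup(sample_html, 'html.parser')

sample_rows = []
# Skip the header row, as in the cell above
for tr in sample_soup.table('tr')[1:]:
    # Remove footnote markers so they do not leak into the cell text
    for tag in tr(['span', 'sup']):
        tag.decompose()
    # strip() trims surrounding whitespace and newlines
    sample_rows.append([td.text.strip() for td in tr('td')])

print(sample_rows)
```

Stripping each cell while scraping is one possible design choice; the notebook instead removes the newlines later with `df.replace`.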


In [5]:
# Store in a Data Frame

df = pd.DataFrame(list_rows)

df.columns = ['PostalCode','Borough','Neighborhood']
df.drop(df[df.Borough == 'Not assigned\n'].index,inplace = True) # Drop all rows that have "Not assigned" boroughs
df.replace({'\n' : ''}, inplace = True, regex = True) # Clean up the cells
df.reset_index(drop = True,inplace = True)

df.head(11)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [6]:
# Print the shape of the Data Frame
print('Data Frame has {} rows and {} columns'.format(df.shape[0],df.shape[1]))

Data Frame has 103 rows and 3 columns


## 2. Add Latitude and Longitude coordinates

Use the pgeocode library to retrieve the latitude and longitude for each postal code

In [7]:
nomi = pgeocode.Nominatim('ca') #Create Object and Initialize country to "Canada"
lat_lng = nomi.query_postal_code(df[['PostalCode']].to_numpy()) #Obtain latitudes and longitudes

# Store in Data Frame
df['Latitude'] = lat_lng['latitude']
df['Longitude'] = lat_lng['longitude']
df.head()


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.7545,-79.33
1,M4A,North York,Victoria Village,43.7276,-79.3148
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6555,-79.3626
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.7223,-79.4504
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6641,-79.3889


In [8]:
# Check for any NaN values for Latitude and Longitude
df_NaN = df[df.isna().any(axis = 1)]
df_NaN


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
76,M7R,Mississauga,Canada Post Gateway Processing Centre,,


Row 76 (postal code M7R) does not have a valid latitude and longitude. This is likely an issue with the pgeocode library. Since only one row is affected, we manually obtain its latitude and longitude from Google and replace the NaN values.

In [9]:
# Obtain Latitude and Longitude from Google
M7R_Latitude = 43.6370
M7R_Longitude = -79.6158

# Replace NaN with the correct latitude and longitude values
df['Latitude'] = df['Latitude'].replace(np.nan,M7R_Latitude)
df['Longitude'] = df['Longitude'].replace(np.nan,M7R_Longitude)

df.isnull().values.any() # Check to make sure there aren't any other null values

False
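
Replacing every NaN with the same value is safe here only because a single row is missing coordinates. A more targeted variant is sketched below on a two-row toy frame that reuses values shown above; `toy` and `fix` are illustrative names, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing coordinate pair
toy = pd.DataFrame({
    'PostalCode': ['M3A', 'M7R'],
    'Latitude': [43.7545, np.nan],
    'Longitude': [-79.33, np.nan],
})

# Manually looked-up coordinates, keyed by postal code
fix = {'M7R': (43.6370, -79.6158)}

for code, (lat, lng) in fix.items():
    mask = toy['PostalCode'] == code
    # Fill only the matching row, leaving all other rows untouched
    toy.loc[mask, ['Latitude', 'Longitude']] = [lat, lng]

print(toy.isnull().values.any())
```

Keying the fix by postal code means a second missing row would stay NaN and be caught by the null check, rather than silently receiving the wrong coordinates.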

Since we will be grouping the dataset by the Neighborhood column later on for the clustering algorithm, we need to ensure that there aren't any duplicate neighborhoods.

In [10]:
# Check for duplicate neighborhoods
df_dup = df[df['Neighborhood'].duplicated()]

# Modify the four duplicate neighborhoods to make them unique
Replace_Duplicates = ['Don Mills (M3C)' , 'Downsview (M3L)' , 'Downsview (M3M)' , 'Downsview (M3N)']

# Replace the duplicate neighborhoods in the Data Frame
i = 0
for ind in df_dup.index:
    df.loc[ind,'Neighborhood'] = Replace_Duplicates[i]
    i = i + 1
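
The hard-coded replacement list works for the four duplicates found here. As a sketch of a more general alternative, the postal code can be appended to each repeated name programmatically; the toy frame below is an assumption made for illustration and is not the notebook's `df`.

```python
import pandas as pd

# Toy frame (illustrative): two names that each appear twice
toy = pd.DataFrame({
    'PostalCode': ['M3B', 'M3C', 'M3K', 'M3L'],
    'Neighborhood': ['Don Mills', 'Don Mills', 'Downsview', 'Downsview'],
})

# Mark every occurrence after the first one of each name
is_dup = toy['Neighborhood'].duplicated()

# Append the postal code to the repeated names only
toy.loc[is_dup, 'Neighborhood'] = (
    toy.loc[is_dup, 'Neighborhood'] + ' (' + toy.loc[is_dup, 'PostalCode'] + ')'
)

print(toy['Neighborhood'].tolist())
```

This prints `['Don Mills', 'Don Mills (M3C)', 'Downsview', 'Downsview (M3L)']`, the same naming pattern as the manual list above.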


## 3. Explore, Analyze, Cluster and Visualize

Let's visualize all the neighborhoods in Toronto on a map

In [11]:
# create map of Toronto using latitude and longitude values
latitude = df.loc[0,'Latitude']
longitude = df.loc[0,'Longitude']

map_Toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_Toronto)  
    
map_Toronto

<b><font size = 3>For the neighborhood segmentation, we will use the Foursquare API to retrieve popular venues near each neighborhood. Neighborhoods that have "similar" venues will be clustered together under a label.</font></b>

In [33]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 
CLIENT_SECRET:


In [13]:
# Function to return a Data Frame of all venues retrieved from the Foursquare API near a given latitude and longitude location. Radius is set to 500m by default.
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
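
The nested indexing in `getNearbyVenues` assumes that every response contains at least one group and that every venue has at least one category. A minimal sketch of a more defensive extraction follows. It runs against a hand-written payload that only mimics the fields the function reads; the helper `extract_venues` and the second sample venue are hypothetical, and nothing here is part of the Foursquare API itself.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style payload."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to None when a venue carries no category
        category = categories[0]['name'] if categories else None
        rows.append((venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Hand-written sample payload (assumption), shaped like the fields used above
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Brookbanks Park',
               'location': {'lat': 43.751976, 'lng': -79.33214},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Unnamed Spot',
               'location': {'lat': 43.75, 'lng': -79.33},
               'categories': []}},
]}]}}

print(extract_venues(sample))
print(extract_venues({'response': {}}))  # no groups: empty list
```

Returning an empty list for a neighborhood without results keeps the loop running instead of raising an `IndexError` or `KeyError` partway through the 103 requests.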

In [14]:
# Pass all neighborhoods in Toronto to the getNearbyVenues function and store the resulting dataframe containing all the venues within 500m of the neighborhood in Toronto_venues
Toronto_venues = getNearbyVenues(df['Neighborhood'] , df['Latitude'] , df['Longitude'])

Toronto_venues.head()

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills (M3C)
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East B

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.7545,-79.33,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.7545,-79.33,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.7276,-79.3148,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Victoria Village,43.7276,-79.3148,Tim Hortons,43.725517,-79.313103,Coffee Shop
4,Victoria Village,43.7276,-79.3148,Portugril,43.725819,-79.312785,Portuguese Restaurant


In [15]:
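# Drop any neighborhoods that did not retrieve any venues from the Foursquare API
# Neighborhood names in each frame (assumed definitions; the originals are not shown in this notebook)
venues_Neigh = Toronto_venues['Neighborhood'].unique()
df_Neigh = df['Neighborhood'].to_numpy()

ex = np.setxor1d(venues_Neigh , df_Neigh) # names present in only one of the two frames
i, = np.nonzero(np.isin(df_Neigh, ex)) # row positions of those names in df

df.drop(i,axis = 0,inplace = True)
df.reset_index(drop = True) # result is not assigned, so df keeps its original index labels
df.shape # Ensure the no. of rows of the dataframe matches the venues dataframe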


(101, 5)

In [16]:
print('There are {} unique categories.'.format(len(Toronto_venues['Venue Category'].unique())))

There are 260 unique categories.


We will now encode all the categories for each neighborhood using one-hot encoding. The result is a new dataframe containing all the categories as separate columns with either a "1" indicating that category is in proximity to the neighborhood or "0" otherwise.

In [18]:
# one hot encoding
Toronto_onehot = pd.get_dummies(Toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Toronto_onehot['Neighborhood'] = Toronto_venues['Neighborhood']

# move neighborhood column to the first column
cols = Toronto_onehot.columns.tolist()
old_index = cols.index('Neighborhood')
cols.insert(0,cols.pop(old_index))

Toronto_onehot = Toronto_onehot[cols]

Toronto_onehot.shape

(2174, 260)

In [20]:
# Group by neighborhood and average
Toronto_grouped = Toronto_onehot.groupby('Neighborhood').mean().reset_index()
Toronto_grouped.shape

(101, 260)
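
To make the encoding and grouping steps concrete, here is a small sketch on made-up data. The frame `toy_venues` and its values are assumptions for illustration only. After grouping, each value is no longer a 0/1 indicator but the share of a neighborhood's venues that fall into that category.

```python
import pandas as pd

# Made-up venues: neighborhood A has three venues, B has one
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Park', 'Park', 'Cafe', 'Cafe'],
})

# One indicator column per category, one row per venue
toy_onehot = pd.get_dummies(toy_venues['Venue Category'], dtype=int)
toy_onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])

# The mean of the indicators is the share of each category per neighborhood
toy_grouped = toy_onehot.groupby('Neighborhood').mean().reset_index()
print(toy_grouped)
```

Neighborhood A ends up with Cafe at 1/3 and Park at 2/3, while B has Cafe at 1.0 and Park at 0.0. These frequency rows are what the clustering step operates on.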

In [22]:
# Function to return a set number of most common venues near a neighborhood
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [23]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = Toronto_grouped['Neighborhood']

for ind in np.arange(Toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.shape


(101, 11)
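
The try/except scheme above produces correct column labels for the ten venues used here, but it would yield "21th" if `num_top_venues` were raised past 20. A hypothetical helper, `ordinal`, sketches one way to build the suffix for any positive integer; it is not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11, 12 and 13 take 'th' even though they end in 1, 2 and 3
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top = 10  # same count as used above
labels = ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_top + 1)]
print(labels[:4])
```

For the first ten positions the labels match the column names generated above.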

We are finally ready to implement the K-Means clustering algorithm on the Toronto_grouped Data Frame to obtain the clusters. We will pick k = 5 as the number of clusters for the segmentation.

In [24]:
# set number of clusters
kclusters = 5

Toronto_grouped_clustering = Toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, init = 'k-means++', n_init = 12, random_state=0).fit(Toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 3, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 2,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 2, 4, 0, 0, 0, 0, 0, 2, 0])
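
In the label array above, 90 of the 101 neighborhoods fall into cluster 0. Counting labels makes such an imbalance explicit. The sketch below shows the idea on a made-up feature matrix `X` with two well-separated groups; it reuses the same `KMeans` settings but is not a rerun of the Toronto data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy feature matrix (made-up): two tight, well-separated groups
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0]])

toy_kmeans = KMeans(n_clusters=2, init='k-means++', n_init=12, random_state=0).fit(X)

# Count how many rows fall into each cluster label
sizes = np.bincount(toy_kmeans.labels_, minlength=2)
print(sorted(sizes.tolist()))
```

Applied to the notebook's model, `np.bincount(kmeans.labels_)` would give the size of each of the five clusters in one line.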

In [25]:
# add clustering labels
#neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
neighborhoods_venues_sorted['Cluster Labels'] = kmeans.labels_
Toronto_merged = df

# merge neighborhoods_venues_sorted with df to add the cluster label and top venues for each neighborhood
Toronto_merged = Toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

Toronto_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
0,M3A,North York,Parkwoods,43.7545,-79.33,Food & Drink Shop,Park,Yoga Studio,Eastern European Restaurant,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop,Field,Fast Food Restaurant,2
1,M4A,North York,Victoria Village,43.7276,-79.3148,Hockey Arena,Pizza Place,Coffee Shop,Intersection,Portuguese Restaurant,French Restaurant,Park,Yoga Studio,Falafel Restaurant,Electronics Store,0
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6555,-79.3626,Coffee Shop,Breakfast Spot,Yoga Studio,Gym / Fitness Center,Restaurant,Event Space,Bakery,Thai Restaurant,Theater,Pub,0
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.7223,-79.4504,Clothing Store,Coffee Shop,Restaurant,Women's Store,Cosmetics Shop,Men's Store,Food Court,Sandwich Place,Sushi Restaurant,Bakery,0
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6641,-79.3889,Coffee Shop,Gym,Escape Room,Ethiopian Restaurant,Restaurant,Sushi Restaurant,Beer Bar,Ramen Restaurant,Café,Bubble Tea Shop,0


In [26]:
# Check to make sure that the Toronto_merged Data Frame doesn't contain any null values
df_NaN = df[Toronto_merged.isnull().any(axis = 1)]
df_NaN

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude


<b><font size = 3> Visualize the clusters on the Toronto map </font></b>

In [27]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Toronto_merged['Latitude'], Toronto_merged['Longitude'], Toronto_merged['Neighborhood'], Toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
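
The cell above indexes the palette with `rainbow[int(cluster)-1]`, so label 0 wraps around to the last color. Every cluster still gets a distinct color, but the mapping is easy to misread. A simpler construction, sketched here under the assumption that labels run from 0 to k-1, samples one color per cluster and indexes it directly; `palette` is a name introduced for this example.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

k = 5  # number of clusters, as above

# Sample k evenly spaced colors from the rainbow colormap
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, k))]

# Cluster label i maps directly to palette[i]
for label in range(k):
    print(label, palette[label])
```

With this palette, the marker color in the map loop would be `palette[int(cluster)]`.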

In [28]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 0, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
1,North York,Hockey Arena,Pizza Place,Coffee Shop,Intersection,Portuguese Restaurant,French Restaurant,Park,Yoga Studio,Falafel Restaurant,Electronics Store,0
2,Downtown Toronto,Coffee Shop,Breakfast Spot,Yoga Studio,Gym / Fitness Center,Restaurant,Event Space,Bakery,Thai Restaurant,Theater,Pub,0
3,North York,Clothing Store,Coffee Shop,Restaurant,Women's Store,Cosmetics Shop,Men's Store,Food Court,Sandwich Place,Sushi Restaurant,Bakery,0
4,Downtown Toronto,Coffee Shop,Gym,Escape Room,Ethiopian Restaurant,Restaurant,Sushi Restaurant,Beer Bar,Ramen Restaurant,Café,Bubble Tea Shop,0
5,Etobicoke,Pharmacy,Bank,Grocery Store,Park,Skating Rink,Farmers Market,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,0
7,North York,Pool,Gym,Gym / Fitness Center,Park,Yoga Studio,Falafel Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space,0
8,East York,Pizza Place,Breakfast Spot,Bus Line,Pharmacy,Pet Store,Gastropub,Intersection,Bank,Gym / Fitness Center,Farmers Market,0
9,Downtown Toronto,Coffee Shop,Clothing Store,Café,Japanese Restaurant,Cosmetics Shop,Theater,Pizza Place,Ramen Restaurant,Italian Restaurant,Fast Food Restaurant,0
10,North York,Pizza Place,Japanese Restaurant,Mediterranean Restaurant,Ice Cream Shop,Gas Station,Fast Food Restaurant,Bakery,Latin American Restaurant,Rental Car Location,Grocery Store,0
11,Etobicoke,Pizza Place,Tea Room,Sandwich Place,Chinese Restaurant,Print Shop,Construction & Landscaping,Field,Fast Food Restaurant,Farmers Market,Falafel Restaurant,0


In [29]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 1, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
6,Scarborough,Home Service,Yoga Studio,Eastern European Restaurant,Food & Drink Shop,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop,Field,Fast Food Restaurant,1


In [30]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 2, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
0,North York,Food & Drink Shop,Park,Yoga Studio,Eastern European Restaurant,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop,Field,Fast Food Restaurant,2
27,North York,Residential Building (Apartment / Condo),Park,Yoga Studio,Farmers Market,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Fast Food Restaurant,2
35,East York,Park,Convenience Store,Intersection,Field,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,2
61,Central Toronto,Park,Photography Studio,Fast Food Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Yoga Studio,2
64,York,Park,Yoga Studio,Eastern European Restaurant,Food & Drink Shop,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop,Field,Fast Food Restaurant,2
66,North York,Park,Convenience Store,Field,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Yoga Studio,2
69,West Toronto,Bowling Alley,Residential Building (Apartment / Condo),Park,Yoga Studio,Farmers Market,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Field,2


In [31]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 3, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
49,North York,Bakery,Eastern European Restaurant,Food Court,Food & Drink Shop,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop,Field,Fast Food Restaurant,3
65,Scarborough,Bakery,Asian Restaurant,Food Court,Food & Drink Shop,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop,Eastern European Restaurant,Field,3


In [32]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 4, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
71,Scarborough,Auto Garage,Yoga Studio,Fast Food Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Field,Eastern European Restaurant,4


The segmentation yields some interesting patterns in the clusters. The neighborhoods in cluster 2 all rank Park among their top three venue categories, so neighborhoods with a park close by are grouped together. The two neighborhoods in cluster 3 share most of their common venues, with Bakery ranked first in both. Clusters 1 and 4 each contain a single neighborhood and can be interpreted as anomalies with a fairly distinct combination of venues. Finally, cluster 0 contains the large majority of neighborhoods, whose venues are only somewhat similar; it likely has internal structure that could be decomposed further.