# Segmenting & Clustering Neighborhoods in Toronto

### Introduction

#### This notebook has steps to explore, segment and cluster the neighborhoods in the city of Toronto. There are steps to get the raw html data from web, scrape through the data and extract the required details. Once the details are available, 

###  1. Import all required libraries

In [1]:
# Import the required libraries
import pandas as pd
import requests
from bs4 import BeautifulSoup as Soup

### 2. Get HTML data into dataframe

##### This step will go out to the website and get the raw HTML code which will then be converted to readable format and eventually to a dataframe

In [2]:
# Initialize the path and get data
web_path = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
html_source = requests.get(web_path).text

#Use lxml parser
html_data = Soup(html_source,'lxml')

#Find all the table entries from the html source and create table list
table = html_data.find_all('table', class_="wikitable sortable")
table_ls = pd.read_html(str(table))


# Create Dataframe with the columns from the html source
columns = [table_ls[0][0][0],table_ls[0][1][0],table_ls[0][2][0]]
table_df = pd.DataFrame(columns = columns)

#Loop through the table list and append each entry to the dataframe
for i in range(len(table_ls[0])-1):
    data = {table_ls[0][0][0]:table_ls[0][0][i+1],table_ls[0][1][0]:table_ls[0][1][i+1],table_ls[0][2][0]:table_ls[0][2][i+1]}
    table_df = table_df.append(data,ignore_index = True)

table_df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


### 3. Pre-process the data

##### This step will remove the rows that has Borough value set as "Not Assigned". The duplicate rows in the dataframe are flagged using the _Is Duplicate_ value. Move the non-duplicate values to the final data frame

In [3]:
#Drop the rows that has Borough = Not Assigned
drop_idx = table_df[table_df["Borough"] == "Not assigned"].index
table_df.drop(drop_idx,inplace = True)
table_df.reset_index(drop=True,inplace = True)
table_df["Is_Duplicate"] = table_df.duplicated(subset = "Postcode",keep=False) 

#Create dataframe with only the duplicate values
table_df2 = table_df[table_df["Is_Duplicate"]==True]
table_df2.reset_index(drop=True,inplace = True)
table_df2.set_index("Postcode")
table_df2.sort_index(inplace=True)

#Move non-duplicates to final dataframe
neigh_df = table_df[table_df["Is_Duplicate"] == False ]
neigh_df.reset_index(drop=True,inplace = True)
neigh_df.set_index("Postcode")
neigh_df.sort_index(inplace=True)
neigh_df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Is_Duplicate
0,M3A,North York,Parkwoods,False
1,M4A,North York,Victoria Village,False
2,M7A,Queen's Park,Not assigned,False
3,M9A,Etobicoke,Islington Avenue,False
4,M3B,North York,Don Mills North,False


### 4. Process duplicates and combine values

##### This step will loop through the duplicates dataframe and combine the _Neighbourhood_ values. The combined values along with _Postcode_ and _Borough_ are appended to the final dataframe. 

In [4]:
#Get the values of the first row
code = table_df2.iloc[0]["Postcode"]
hood = table_df2.iloc[0]["Neighbourhood"]

#Loop through the data frame and combine neighbourhood values
for i in range(1,len(table_df2)):

    if (table_df2.iloc[i]["Postcode"] == str(code)):
        hood = hood + ',' + str(table_df2.iloc[i]["Neighbourhood"]) #Combine values for same postcode.
    else:
        #Set the values and append to dataframe
        code = str(table_df2.iloc[(i-1)]["Postcode"]) 
        boro = str(table_df2.iloc[(i-1)]["Borough"])
        dup = str(table_df2.iloc[(i-1)]["Is_Duplicate"])
        neigh_df =neigh_df.append({"Postcode":code,"Borough":boro,"Neighbourhood":hood,"Is_Duplicate":dup},ignore_index=True)
        
        #Move next postcode values
        hood = table_df2.iloc[i]["Neighbourhood"]
        code = table_df2.iloc[i]["Postcode"]

#Append last record to dataframe
code = str(table_df2.iloc[(i-1)]["Postcode"])
boro = str(table_df2.iloc[(i-1)]["Borough"])
dup = str(table_df2.iloc[(i-1)]["Is_Duplicate"])
neigh_df =neigh_df.append({"Postcode":code,"Borough":boro,"Neighbourhood":hood,"Is_Duplicate":dup},ignore_index=True)

neigh_df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Is_Duplicate
0,M3A,North York,Parkwoods,False
1,M4A,North York,Victoria Village,False
2,M7A,Queen's Park,Not assigned,False
3,M9A,Etobicoke,Islington Avenue,False
4,M3B,North York,Don Mills North,False


### 5. Final cleanup

##### Now the dataframe should have the final result. Reset the index, drop the _Is Duplicate_ column and display the result.

In [5]:
#Set/Reset indexes
neigh_df.reset_index(drop=True,inplace = True)
neigh_df.set_index("Postcode")

#Drop Duplicate flag column
del neigh_df["Is_Duplicate"]
neigh_df.sort_index(inplace=True)

neigh_df

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M7A,Queen's Park,Not assigned
3,M9A,Etobicoke,Islington Avenue
4,M3B,North York,Don Mills North
5,M6B,North York,Glencairn
6,M4C,East York,Woodbine Heights
7,M5C,Downtown Toronto,St. James Town
8,M6C,York,Humewood-Cedarvale
9,M4E,East Toronto,The Beaches


### 6. Display result size

In [6]:
neigh_df.shape

(103, 3)

### 7. Add location coordinates to the dataframe

##### Get the Latitude and Longitude for every postcode & merge that with the Results dataframe from above. The geo coordinates from the csv file will be used.
##### _Postcode_ column will be used as the key to merge both dataframes.

In [128]:
# Initialize path
geo_path = "http://cocl.us/Geospatial_data"
geo_df = pd.read_csv(geo_path) #Read csv to dataframe

#Merge the data
merge_df = neigh_df.merge(geo_df,left_on ="Postcode",right_on="Postal Code")
del merge_df["Postal Code"]
merge_df.sort_index(inplace=True)
merge_df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M7A,Queen's Park,Not assigned,43.662301,-79.389494
3,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
4,M3B,North York,Don Mills North,43.745906,-79.352188


## Toronto Neighbourhood Analysis

##### In this section, data from above will be used to perform analysis of places around Toronto.

In [141]:
#Import additional libraries
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import json
#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim
#!conda install -c conda-forge folium=0.5.0 --yes
import folium
%matplotlib inline
from pandas.io.json import json_normalize

### 1. Process one borough

##### Get the count of postal codes by _Borough_ and use the one that has maximum count.

In [179]:
#Find the count of postal codes under each borough
print(merge_df.groupby(["Borough"])["Postcode"].count())

Borough
Central Toronto      9
Downtown Toronto    18
East Toronto         5
East York            5
Etobicoke           12
Mississauga          1
North York          24
Queen's Park         1
Scarborough         17
West Toronto         6
York                 5
Name: Postcode, dtype: int64


##### _North York_ has the most postal codes. We will use this for further analysis.

In [133]:
#Create dataframe for North York data
northyork_data = merge_df[merge_df['Borough'] == 'North York'].reset_index(drop=True)
northyork_data.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M3B,North York,Don Mills North,43.745906,-79.352188
3,M6B,North York,Glencairn,43.709577,-79.445073
4,M2H,North York,Hillcrest Village,43.803762,-79.363452


### 2. Map the Borough

##### Get the coordinates of _North York_ and create a map highlighting the postal codes.

In [136]:
#Get the latitude and longitude
boro = 'North York, Canada'
geolocator = Nominatim(user_agent="ny_explorer")
lat_lon = geolocator.geocode(boro)
lat = lat_lon.latitude
lon = lat_lon.longitude
#longitude = location.longitude
print('The geograpical coordinate of North York are {}, {}.'.format(lat, lon))

The geograpical coordinate of North York are 43.7709163, -79.4124102.


In [137]:
# create map of North York using latitude and longitude values
map_northyork = folium.Map(location=[lat, lon], zoom_start=12)

# add markers to map
for lat, lng, label in zip(northyork_data['Latitude'], northyork_data['Longitude'], northyork_data['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_northyork)  
    
map_northyork

### 3. Explore Neighborhoods of _North York_

##### Here we will connect to Foursquare and get details of neighborhoods and venues in the neighborhoods of _North York_. Geo data of _North York_ will be passed to Foursquare for getting details. 

In [139]:
#Set the Client ID details 
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

##### Define functions for getting all venues around _North York_ and venue category type

In [142]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [143]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

##### Use latitue & longitude of _North York_ to get all venue names around that neighbourhood

In [144]:
northyork_venues = getNearbyVenues(names=northyork_data['Neighbourhood'],
                                   latitudes=northyork_data['Latitude'],
                                   longitudes=northyork_data['Longitude']
                                  )

Parkwoods
Victoria Village
Don Mills North
Glencairn
Hillcrest Village
Bayview Village
Downsview West
Humber Summit
Downsview Central
Willowdale South
Downsview Northwest
York Mills West
Willowdale West
Lawrence Heights,Lawrence Manor
Flemingdon Park,Don Mills South
Bathurst Manor,Downsview North,Wilson Heights
Fairview,Henry Farm,Oriole
Northwood Park,York University
CFB Toronto,Downsview East
Silver Hills,York Mills
Maple Leaf Park,North Park,Upwood Park
Newtonbrook,Willowdale
Bedford Park,Lawrence Manor East
Emery,Humberlea


### 4. Pre-process & Analyze Neighborhood data

##### The information received from Foursquare will require some cleaning in order to perform analysis and cluster them.

In [145]:
#Check how many venues are there and print the unique categories()
print(northyork_venues.shape)
print('There are {} uniques categories.'.format(len(northyork_venues['Venue Category'].unique())))

(233, 7)
There are 106 uniques categories.


##### Convert the Venue information to numeric and create a new dataframe with Neighborhood and Venue information.

In [146]:
# one hot encoding
northyork_onehot = pd.get_dummies(northyork_venues[['Venue Category']], prefix="", prefix_sep="")
# add neighborhood column back to dataframe
northyork_onehot['Neighborhood'] = northyork_venues['Neighborhood'] 
# move neighborhood column to the first column
fixed_columns = [northyork_onehot.columns[-1]] + list(northyork_onehot.columns[:-1])
northyork_onehot = northyork_onehot[fixed_columns]

northyork_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Airport,American Restaurant,Arcade,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bakery,Bank,...,Tailor Shop,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Video Game Store,Video Store,Vietnamese Restaurant,Wings Joint,Women's Store
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


##### Group the data by neighbourhood and use the mean to represent the precense of venue in a neighbourhood.

In [147]:
northyork_grouped = northyork_onehot.groupby('Neighborhood').mean().reset_index()
northyork_grouped.head()

Unnamed: 0,Neighborhood,Accessories Store,Airport,American Restaurant,Arcade,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bakery,Bank,...,Tailor Shop,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Video Game Store,Video Store,Vietnamese Restaurant,Wings Joint,Women's Store
0,"Bathurst Manor,Downsview North,Wilson Heights",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,...,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0
1,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bedford Park,Lawrence Manor East",0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"CFB Toronto,Downsview East",0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Don Mills North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


##### Create a dataframe to hold the top 10 venues of each neighbourhood

In [148]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [171]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = northyork_grouped['Neighborhood']

for ind in np.arange(northyork_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(northyork_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Bathurst Manor,Downsview North,Wilson Heights",Coffee Shop,Diner,Bank,Pharmacy,Pizza Place,Bridal Shop,Deli / Bodega,Restaurant,Sandwich Place,Shopping Mall
1,Bayview Village,Chinese Restaurant,Café,Bank,Japanese Restaurant,Dog Run,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop
2,"Bedford Park,Lawrence Manor East",Coffee Shop,Fast Food Restaurant,Italian Restaurant,Indian Restaurant,Comfort Food Restaurant,Liquor Store,Café,Butcher,Juice Bar,Pharmacy
3,"CFB Toronto,Downsview East",Electronics Store,Airport,Bus Stop,Park,Discount Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop
4,Don Mills North,Japanese Restaurant,Caribbean Restaurant,Gym / Fitness Center,Café,Basketball Court,Electronics Store,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega
5,Downsview Central,Home Service,Business Service,Baseball Field,Korean Restaurant,Women's Store,Dog Run,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop
6,Downsview Northwest,Grocery Store,Athletics & Sports,Liquor Store,Discount Store,Dog Run,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop
7,Downsview West,Grocery Store,Bank,Shopping Mall,Moving Target,Furniture / Home Store,Frozen Yogurt Shop,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping
8,"Emery,Humberlea",Baseball Field,Women's Store,Electronics Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store
9,"Fairview,Henry Farm,Oriole",Clothing Store,Fast Food Restaurant,Coffee Shop,Women's Store,Asian Restaurant,Food Court,Bakery,Shoe Store,Electronics Store,Toy / Game Store


### 5. Clustering the data

##### Use _kMeans_ clustering and cluster the neighbourhood data

In [172]:
# set number of clusters
kclusters = 4
northyork_grouped_clustering = northyork_grouped.drop('Neighborhood', 1)
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(northyork_grouped_clustering)
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0])

##### Add Clustering data and merge the Venue & Postal code information. Drop any rows that does not have a cluster label.

In [173]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
northyork_merged = northyork_data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
northyork_merged = northyork_merged.join(neighborhoods_venues_sorted)
northyork_merged.dropna(axis = "rows",inplace = True) #Drop any NaN
northyork_merged.head() 

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0.0,"Bathurst Manor,Downsview North,Wilson Heights",Coffee Shop,Diner,Bank,Pharmacy,Pizza Place,Bridal Shop,Deli / Bodega,Restaurant,Sandwich Place,Shopping Mall
1,M4A,North York,Victoria Village,43.725882,-79.315572,0.0,Bayview Village,Chinese Restaurant,Café,Bank,Japanese Restaurant,Dog Run,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop
2,M3B,North York,Don Mills North,43.745906,-79.352188,0.0,"Bedford Park,Lawrence Manor East",Coffee Shop,Fast Food Restaurant,Italian Restaurant,Indian Restaurant,Comfort Food Restaurant,Liquor Store,Café,Butcher,Juice Bar,Pharmacy
3,M6B,North York,Glencairn,43.709577,-79.445073,0.0,"CFB Toronto,Downsview East",Electronics Store,Airport,Bus Stop,Park,Discount Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop
4,M2H,North York,Hillcrest Village,43.803762,-79.363452,0.0,Don Mills North,Japanese Restaurant,Caribbean Restaurant,Gym / Fitness Center,Café,Basketball Court,Electronics Store,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega


##### Create a map with the clusters

In [174]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(northyork_merged['Latitude'], northyork_merged['Longitude'], northyork_merged['Neighborhood'], northyork_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### 6. Examine and Name the clusters

##### Review the data grouped under each cluster and determine whether there is any common attribute to provide the cluster a name

##### Cluster 1

In [175]:
northyork_merged.loc[northyork_merged['Cluster Labels'] == 0, northyork_merged.columns[[1] + list(range(5, northyork_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,0.0,"Bathurst Manor,Downsview North,Wilson Heights",Coffee Shop,Diner,Bank,Pharmacy,Pizza Place,Bridal Shop,Deli / Bodega,Restaurant,Sandwich Place,Shopping Mall
1,North York,0.0,Bayview Village,Chinese Restaurant,Café,Bank,Japanese Restaurant,Dog Run,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop
2,North York,0.0,"Bedford Park,Lawrence Manor East",Coffee Shop,Fast Food Restaurant,Italian Restaurant,Indian Restaurant,Comfort Food Restaurant,Liquor Store,Café,Butcher,Juice Bar,Pharmacy
3,North York,0.0,"CFB Toronto,Downsview East",Electronics Store,Airport,Bus Stop,Park,Discount Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop
4,North York,0.0,Don Mills North,Japanese Restaurant,Caribbean Restaurant,Gym / Fitness Center,Café,Basketball Court,Electronics Store,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega
5,North York,0.0,Downsview Central,Home Service,Business Service,Baseball Field,Korean Restaurant,Women's Store,Dog Run,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop
6,North York,0.0,Downsview Northwest,Grocery Store,Athletics & Sports,Liquor Store,Discount Store,Dog Run,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop
7,North York,0.0,Downsview West,Grocery Store,Bank,Shopping Mall,Moving Target,Furniture / Home Store,Frozen Yogurt Shop,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping
9,North York,0.0,"Fairview,Henry Farm,Oriole",Clothing Store,Fast Food Restaurant,Coffee Shop,Women's Store,Asian Restaurant,Food Court,Bakery,Shoe Store,Electronics Store,Toy / Game Store
10,North York,0.0,"Flemingdon Park,Don Mills South",Beer Store,Gym,Asian Restaurant,Coffee Shop,Grocery Store,Fast Food Restaurant,Italian Restaurant,Japanese Restaurant,Dim Sum Restaurant,Clothing Store


##### Cluster 2

In [176]:
northyork_merged.loc[northyork_merged['Cluster Labels'] == 1, northyork_merged.columns[[1] + list(range(5, northyork_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,North York,1.0,"Emery,Humberlea",Baseball Field,Women's Store,Electronics Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store


##### Cluster 3

In [177]:
northyork_merged.loc[northyork_merged['Cluster Labels'] == 2, northyork_merged.columns[[1] + list(range(5, northyork_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,North York,2.0,Humber Summit,Empanada Restaurant,Women's Store,Dog Run,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store


##### Cluster 4

In [178]:
northyork_merged.loc[northyork_merged['Cluster Labels'] == 3, northyork_merged.columns[[1] + list(range(5, northyork_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,North York,3.0,"Newtonbrook,Willowdale",Gym,Chinese Restaurant,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store,Dim Sum Restaurant


### Conclusion

##### Based on the clustering data, it looks like most of the neighbourhoods fall under _Cluster 1_. All the other three clusters only have one neighbourhood associated. We can probably have only two groupings here:
#####     1) Cluster 1 - HIGH ACTIVITY NEIGHBORHOOD
#####     2) Clusters 2,3 & 4 - LOW ACTIVITY NEIGHBORHOOD