## Neighborhood Segmentation in Toronto ##

This notebook segments Toronto neighborhoods into clusters with similar characteristics. <br>
This capstone project is part of the IBM Data Science certificate program on Coursera.

### Part I: Obtain Neighborhoods in Toronto ###
Step 1: Obtain raw data from Wikipedia <br>
Step 2: Create DataFrame of Postal Code, Borough, and Neighborhood in Toronto <br>
Step 3: Clean and format the data<br>

In [60]:
#library installation section
import sys
!{sys.executable} -m pip install beautifulsoup4
!{sys.executable} -m pip install geopy
!{sys.executable} -m pip install lxml
!{sys.executable} -m pip install geocoder



In [61]:
#import libraries
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
import json
import requests
from geopy.geocoders import Nominatim
from pandas import json_normalize #top-level import; the pandas.io.json path is deprecated
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium
import matplotlib.pyplot as plt
print('all libraries are imported')

all libraries are imported


**Step 1: Obtain raw data from Wikipedia** 

In [62]:
#Scrape Postal code, Borough, and Neighborhood name in Toronto from Wikipedia page
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
result = requests.get(url).text
soup = BeautifulSoup(result, 'lxml')
print(soup.title) #print only the page title; printing the whole soup dumps the full HTML

<title>List of postal codes of Canada: M - Wikipedia</title>

**Step 2: Create DataFrame of Postal Code, Borough, and Neighborhood in Toronto**
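
As a minimal sketch of the row-by-row extraction used in this step, the snippet below parses a small inline HTML table instead of the live Wikipedia page. The sample markup and the helper name `table_to_frame` are assumptions made for illustration; the real page layout may differ.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the Wikipedia table.
SAMPLE_HTML = """
<table>
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M5A</td><td>Downtown Toronto</td><td>Harbourfront</td></tr>
</table>
"""

def table_to_frame(html):
    """Collect the text of every three-cell <td> row into a DataFrame."""
    soup = BeautifulSoup(html, 'html.parser')
    records = []
    for row in soup.find_all('tr'):
        cells = [td.get_text(strip=True) for td in row.find_all('td')]
        # header rows contain <th> cells only, so they yield an empty list
        if len(cells) == 3 and cells[0].startswith('M'):
            records.append(cells)
    return pd.DataFrame(records, columns=['PostalCode', 'Borough', 'Neighborhood'])

sample_df = table_to_frame(SAMPLE_HTML)
print(sample_df)
```

Reading each cell separately avoids splitting the row text on commas, which would break any cell whose text itself contains a comma.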

In [63]:
finalList = []
#get all rows
rows = soup.find_all('tr')
#for each row in rows, get each column
for row in rows:
    row_td = row.find_all('td')
    str_cells = str(row_td)
    cleantext = BeautifulSoup(str_cells, 'lxml').get_text() #clean html text
    #postal codes in Toronto start with the letter M, so keep only those rows
    try:
        cleanlist = cleantext[1:-2].split(',') #drop the list brackets and split into fields
        if cleanlist[0][0] == 'M':
            #this is the row for a neighborhood in Toronto
            cleanlist = [x.strip() for x in cleanlist] #strip surrounding whitespace
            finalList.append(cleanlist)
    except IndexError:
        #rows without <td> cells (e.g. the header row) give an empty first field
        pass

#convert finalList to a DataFrame
df_raw = pd.DataFrame(finalList)
df_raw.columns = ['PostalCode', 'Borough', 'Neighborhood'] #rename the columns
print(df_raw.shape)
df_raw.head()

(288, 3)


| | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |


**Step 3: Clean and format the data**<br>
- Remove rows without a Borough name
- Use the Borough name for rows without a Neighborhood name
- Merge rows that share the same Postal Code
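
The three cleaning steps listed above can be sketched on a toy frame as follows. The rows are illustrative stand-ins for `df_raw`, and the vectorized `groupby` is one possible way to express the merge, not necessarily the approach taken in the cells below.

```python
import pandas as pd

# Toy rows mirroring the structure of df_raw (values are illustrative).
raw = pd.DataFrame({
    'PostalCode': ['M1A', 'M5A', 'M5A', 'M7A'],
    'Borough': ['Not assigned', 'Downtown Toronto', 'Downtown Toronto', "Queen's Park"],
    'Neighborhood': ['Not assigned', 'Harbourfront', 'Regent Park', 'Not assigned'],
})

# 1. drop rows without a borough
clean = raw[raw['Borough'] != 'Not assigned'].copy()

# 2. fall back to the borough name where the neighborhood is missing
missing = clean['Neighborhood'] == 'Not assigned'
clean.loc[missing, 'Neighborhood'] = clean.loc[missing, 'Borough']

# 3. join the neighborhoods that share a postal code
merged = (clean.groupby(['PostalCode', 'Borough'], sort=False)['Neighborhood']
               .agg(', '.join)
               .reset_index())
print(merged)
```

Passing `sort=False` keeps the postal codes in their order of first appearance rather than sorting them alphabetically.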

In [64]:
#remove rows without Borough
df_raw2 = df_raw[df_raw['Borough'] != 'Not assigned'].reset_index(drop = True)
df_raw2.head()

| | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront |
| 3 | M5A | Downtown Toronto | Regent Park |
| 4 | M6A | North York | Lawrence Heights |


In [65]:
#for rows without Neighborhood, use the Borough name
df_raw2 = df_raw2.copy() #work on an independent copy to avoid SettingWithCopyWarning
no_name = df_raw2['Neighborhood'] == 'Not assigned'
df_raw2.loc[no_name, 'Neighborhood'] = df_raw2.loc[no_name, 'Borough']


In [66]:
#merge Neighborhoods for rows with the same Postal Code
#(DataFrame.append was removed in pandas 2.0, so group and join instead)
df = (df_raw2.groupby(['PostalCode', 'Borough'], sort = False)['Neighborhood']
             .agg(', '.join)
             .reset_index())
df.head(10)

| | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront, Regent Park |
| 3 | M6A | North York | Lawrence Heights, Lawrence Manor |
| 4 | M7A | Queen's Park | Queen's Park |
| 5 | M9A | Etobicoke | Islington Avenue |
| 6 | M1B | Scarborough | Rouge, Malvern |
| 7 | M3B | North York | Don Mills North |
| 8 | M4B | East York | Woodbine Gardens, Parkview Hill |
| 9 | M5B | Downtown Toronto | Ryerson, Garden District |


In [67]:
df.shape

(103, 3)

### Part II: Obtain Latitude and Longitude ###

In [68]:
#add latitude, longitude
location = pd.read_csv('Geospatial_Coordinates.csv')
df = df.merge(location, how = 'left', left_on = 'PostalCode', right_on = 'Postal Code')
df.drop('Postal Code', inplace = True, axis = 1)
df.head()

| | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Harbourfront, Regent Park | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Heights, Lawrence Manor | 43.718518 | -79.464763 |
| 4 | M7A | Queen's Park | Queen's Park | 43.662301 | -79.389494 |
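
The coordinate lookup above is a left join, so a postal code missing from the CSV would silently receive empty coordinates. A minimal sketch of how to check for that, using toy frames and a made-up postal code `M9Z`, might look like this:

```python
import pandas as pd

# Toy frames; the column names follow the notebook, the values are illustrative.
codes = pd.DataFrame({'PostalCode': ['M3A', 'M4A', 'M9Z'],
                      'Borough': ['North York', 'North York', 'Example']})
coords = pd.DataFrame({'Postal Code': ['M3A', 'M4A'],
                       'Latitude': [43.753259, 43.725882],
                       'Longitude': [-79.329656, -79.315572]})

joined = codes.merge(coords, how='left', left_on='PostalCode',
                     right_on='Postal Code', indicator=True)

# rows marked 'left_only' had no coordinates in the lookup table
unmatched = joined.loc[joined['_merge'] == 'left_only', 'PostalCode'].tolist()
joined = joined.drop(columns=['Postal Code', '_merge'])
print(unmatched)
```

An empty `unmatched` list would confirm that every postal code received a latitude and longitude.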


In [69]:
#keep all boroughs; the commented line below would keep only boroughs whose name contains 'Toronto'
#toronto_data = df[list(map(lambda x: 'Toronto' in x, df.Borough))]
toronto_data = df
toronto_data.shape

(103, 5)

### Part III: Neighborhood Clustering ###
Step 1: Use Foursquare to explore each neighborhood <br>
Step 2: Use one-hot encoding to build the feature DataFrame for k-means clustering <br>
Step 3: Cluster the neighborhoods and visualize the result<br>

**Step 1: Use Foursquare to explore each neighborhood**<br>
- Define the user credentials
- Send a request to the Foursquare API to retrieve the popular venues in each neighborhood
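
To make the request step concrete, here is a small sketch that only builds the request URL and does not contact the API. The endpoint and parameter names are the ones used in this notebook; the helper name `build_explore_url` and the placeholder credentials are assumptions, and whether the endpoint is still served is not verified here.

```python
from urllib.parse import urlencode, urlparse, parse_qs

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Return the request URL for one neighborhood (hypothetical helper)."""
    query = urlencode({
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    })
    return '{}?{}'.format(EXPLORE_ENDPOINT, query)

# placeholder credentials, not real keys
url = build_explore_url('MY_ID', 'MY_SECRET', '20180605', 43.753259, -79.329656)
params = parse_qs(urlparse(url).query)
print(params['ll'])
```

Letting `urlencode` assemble the query string takes care of escaping, which is easy to get wrong when formatting the URL by hand.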


In [70]:
import os

#read the Foursquare credentials from environment variables instead of hard-coding them
#(the variable names below are placeholders; use whatever names you export)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Foursquare credentials loaded')

Foursquare credentials loaded


In [71]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

In [72]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # flattened responses store the list under 'venue.categories'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [73]:
# maximum number of venues requested per neighborhood
LIMIT = 100
toronto_venues = getNearbyVenues(names=toronto_data['Neighborhood'],
                                 latitudes=toronto_data['Latitude'],
                                 longitudes=toronto_data['Longitude'])



Parkwoods
Victoria Village
Harbourfront, Regent Park
Lawrence Heights, Lawrence Manor
Queen's Park
Islington Avenue
Rouge, Malvern
Don Mills North
Woodbine Gardens, Parkview Hill
Ryerson, Garden District
Glencairn
Cloverdale, Islington, Martin Grove, Princess Gardens, West Deane Park
Highland Creek, Rouge Hill, Port Union
Flemingdon Park, Don Mills South
Woodbine Heights
St. James Town
Humewood-Cedarvale
Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Downsview North, Wilson Heights
Thorncliffe Park
Adelaide, King, Richmond
Dovercourt Village, Dufferin
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto
Harbourfront East, Toronto Islands, Union Station
Little Portugal, Trinity
East Birchmount Park, Ionview, Kennedy Park
Bayview Village
CFB Toronto, Downsview East
The D

In [74]:
toronto_venues.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Parkwoods | 43.753259 | -79.329656 | Brookbanks Park | 43.751976 | -79.33214 | Park |
| 1 | Parkwoods | 43.753259 | -79.329656 | KFC | 43.754387 | -79.333021 | Fast Food Restaurant |
| 2 | Parkwoods | 43.753259 | -79.329656 | Variety Store | 43.751974 | -79.333114 | Food & Drink Shop |
| 3 | Victoria Village | 43.725882 | -79.315572 | Victoria Village Arena | 43.723481 | -79.315635 | Hockey Arena |
| 4 | Victoria Village | 43.725882 | -79.315572 | Tim Hortons | 43.725517 | -79.313103 | Coffee Shop |


**Step 2: Use one-hot encoding to build the feature DataFrame for k-means clustering**<br>
- Use the pandas get_dummies function to create a DataFrame with one dummy (0/1) column per venue category
- Group the venues by neighborhood and take the mean of each column to get the frequency of each category
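
The two bullets above can be sketched on toy data as shown below. The values are illustrative, and the `cat` prefix is an assumption added here so that a venue category that happens to be named "Neighborhood" cannot collide with the neighborhood key column.

```python
import pandas as pd

# Toy venue list (illustrative values); one category is literally "Neighborhood".
venues = pd.DataFrame({
    'Neighborhood': ['Parkwoods', 'Parkwoods', 'Victoria Village', 'Victoria Village'],
    'Venue Category': ['Park', 'Coffee Shop', 'Coffee Shop', 'Neighborhood'],
})

# one indicator column per category; the prefix keeps them distinct
# from the 'Neighborhood' key column added next
onehot = pd.get_dummies(venues['Venue Category'], prefix='cat', dtype=int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# the mean of the indicators is the share of venues in each category
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```

Because each venue contributes exactly one 1 across the indicator columns, the values in every row of `grouped` sum to 1.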

In [75]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues['Venue Category'])

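# add neighborhood column back to dataframe
# (note: if a venue category is itself named 'Neighborhood', this assignment overwrites that column)
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move the last column to the front; it is the neighborhood column only if it was newly appended above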
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns)[:-1]
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head(10)

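| | Yoga Studio | Accessories Store | Adult Boutique | Afghan Restaurant | Airport | Airport Food Court | Airport Gate | Airport Lounge | Airport Service | Airport Terminal | ... | Train Station | Vegetarian / Vegan Restaurant | Video Game Store | Video Store | Vietnamese Restaurant | Warehouse Store | Wine Bar | Wine Shop | Wings Joint | Women's Store |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |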


In [76]:
#groupby neighborhood to get frequency of each venue in each neighborhood
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

| | Neighborhood | Yoga Studio | Accessories Store | Adult Boutique | Afghan Restaurant | Airport | Airport Food Court | Airport Gate | Airport Lounge | Airport Service | ... | Train Station | Vegetarian / Vegan Restaurant | Video Game Store | Video Store | Vietnamese Restaurant | Warehouse Store | Wine Bar | Wine Shop | Wings Joint | Women's Store |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Adelaide, King, Richmond | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | ... | 0.0 | 0.010000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.010000 | 0.0 | 0.000000 | 0.000000 |
| 1 | Agincourt | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | ... | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 |
| 2 | Agincourt North, L'Amoreaux East, Milliken, St... | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | ... | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 |
| 3 | Albion Gardens, Beaumond Heights, Humbergate, ... | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | ... | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 |
| 4 | Alderwood, Long Branch | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | ... | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 |
| 5 | Bathurst Manor, Downsview North, Wilson Heights | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | ... | 0.0 | 0.000000 | 0.000000 | 0.058824 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 |
| 6 | Bayview Village | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | ... | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 |
| 7 | Bedford Park, Lawrence Manor East | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | ... | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 |
| 8 | Berczy Park | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | ... | 0.0 | 0.017857 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 |
| 9 | Birch Cliff, Cliffside West | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | ... | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 |


To make the results easier to explore and summarize, we rank the most common venue categories in each neighborhood.
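
A minimal sketch of this ranking on a toy frequency table is given below. The values, the helper name `ordinal`, and the use of `Series.nlargest` are assumptions for illustration; the cells that follow use their own sorting helper.

```python
import pandas as pd

def ordinal(n):
    """Return 1 -> '1st', 2 -> '2nd', 3 -> '3rd', 4 -> '4th', 11 -> '11th', ..."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# Toy frequency table (illustrative values).
freq = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    'Coffee Shop': [0.6, 0.1],
    'Park': [0.3, 0.7],
    'Pub': [0.1, 0.2],
})

top_n = 2
labels = ['{} Most Common Venue'.format(ordinal(i + 1)) for i in range(top_n)]

# rank the categories of each row by frequency, highest first
ranked = (freq.set_index('Neighborhood')
              .apply(lambda row: pd.Series(row.nlargest(top_n).index, index=labels), axis=1)
              .reset_index())
print(ranked)
```

Note that categories with a frequency of zero can still appear in the ranking when a neighborhood has fewer distinct categories than `top_n`.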

In [77]:
def return_most_common_venues(row, num):
    row_temp = row[1:].astype(float)
    row_temp_sorted = row_temp.sort_values(ascending = False)
    return row_temp_sorted.index.values[:num]

In [78]:
num = 10 #get only top 10 venue categories in each neighborhood
suffix = ['st', 'nd', 'rd']
columns_name = ['Neighborhood']
for i in range(num):
    try:
        columns_name.append('{}{} Most Common Venue'.format(i+1, suffix[i]))
    except IndexError:
        #ranks beyond the third all take the 'th' suffix
        columns_name.append('{}th Most Common Venue'.format(i+1))
        
neighborhoods_venues_sorted = pd.DataFrame(columns = columns_name)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']
for i,row in toronto_grouped.iterrows():
    neighborhoods_venues_sorted.iloc[i, 1:]=return_most_common_venues(row, num)
neighborhoods_venues_sorted.head()

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Adelaide, King, Richmond | Coffee Shop | Café | Thai Restaurant | American Restaurant | Steakhouse | Sushi Restaurant | Hotel | Bakery | Bar | Burger Joint |
| 1 | Agincourt | Lounge | Breakfast Spot | Sandwich Place | Skating Rink | Chinese Restaurant | Drugstore | Dumpling Restaurant | Donut Shop | Doner Restaurant | Deli / Bodega |
| 2 | Agincourt North, L'Amoreaux East, Milliken, St... | Park | Playground | Women's Store | Dive Bar | Deli / Bodega | Department Store | Dessert Shop | Dim Sum Restaurant | Diner | Discount Store |
| 3 | Albion Gardens, Beaumond Heights, Humbergate, ... | Grocery Store | Fast Food Restaurant | Pharmacy | Pizza Place | Fried Chicken Joint | Beer Store | Sandwich Place | Discount Store | Department Store | Dessert Shop |
| 4 | Alderwood, Long Branch | Pizza Place | Coffee Shop | Gym | Dance Studio | Pharmacy | Sandwich Place | Skating Rink | Pub | Dim Sum Restaurant | Deli / Bodega |


**Step 3: Cluster the neighborhoods and visualize the result**<br>
- Use KMeans from sklearn.cluster to cluster the neighborhoods into 4 groups
- Create a visualization with the folium library
- Explore the characteristics of each group
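
As a small illustration of the clustering step, the sketch below runs k-means on four toy frequency vectors. The data are made up, and `n_clusters=2` is chosen only because the toy set has two obvious groups; the notebook itself uses 4.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy frequency vectors (illustrative): rows 0-1 are café-heavy, rows 2-3 park-heavy.
features = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.0, 0.9, 0.1],
])

# n_init is set explicitly because its default differs between scikit-learn versions
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
labels = model.labels_
print(labels)
```

The label numbers themselves are arbitrary: only which rows share a label is meaningful, so cluster 0 in one run need not correspond to cluster 0 in another configuration.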

In [79]:
#import libraries for clustering
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 4

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis = 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 2, 0, 0, 0, 0, 0, 0, 0])

In [80]:
neighborhoods_venues_sorted['Cluster Labels'] = kmeans.labels_

toronto_final = df.merge(neighborhoods_venues_sorted, on = 'Neighborhood', how = 'inner')

toronto_final.head()

| | PostalCode | Borough | Neighborhood | Latitude | Longitude | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | Cluster Labels |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 | Park | Fast Food Restaurant | Food & Drink Shop | Women's Store | Dive Bar | Department Store | Dessert Shop | Dim Sum Restaurant | Diner | Discount Store | 3 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | Portuguese Restaurant | Intersection | Coffee Shop | Hockey Arena | Dive Bar | Department Store | Dessert Shop | Dim Sum Restaurant | Diner | Discount Store | 0 |
| 2 | M5A | Downtown Toronto | Harbourfront, Regent Park | 43.65426 | -79.360636 | Coffee Shop | Pub | Bakery | Park | Café | Theater | Breakfast Spot | Mexican Restaurant | Farmers Market | Restaurant | 0 |
| 3 | M6A | North York | Lawrence Heights, Lawrence Manor | 43.718518 | -79.464763 | Clothing Store | Furniture / Home Store | Boutique | Accessories Store | Miscellaneous Shop | Vietnamese Restaurant | Coffee Shop | Gift Shop | Event Space | Shoe Store | 0 |
| 4 | M7A | Queen's Park | Queen's Park | 43.662301 | -79.389494 | Coffee Shop | Diner | Gym | Japanese Restaurant | Yoga Studio | Café | Liquor Store | Sushi Restaurant | Mexican Restaurant | Portuguese Restaurant | 0 |


In [81]:
latitude = 43.662744
longitude = -79.321558
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(toronto_final['Latitude'], toronto_final['Longitude'], toronto_final['Neighborhood'], toronto_final['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster], # cluster labels run from 0 to kclusters-1
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

The cluster labels show that most neighborhoods fall into the first group (cluster 0). <br>

Let's explore the groups in more detail.
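
One way to back up such a statement is to count the neighborhoods in each cluster and look at the most frequent top venue per cluster. The sketch below does this on a toy stand-in for `toronto_final`; the rows are illustrative, not the real results.

```python
import pandas as pd

# Toy stand-in for toronto_final (illustrative values).
final = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D', 'E'],
    '1st Most Common Venue': ['Coffee Shop', 'Coffee Shop', 'Café', 'Park', 'Park'],
    'Cluster Labels': [0, 0, 0, 2, 3],
})

# number of neighborhoods per cluster
sizes = final['Cluster Labels'].value_counts().sort_index()

# most frequent top venue category within each cluster
top_venue = (final.groupby('Cluster Labels')['1st Most Common Venue']
                  .agg(lambda s: s.value_counts().idxmax()))
print(sizes)
print(top_venue)
```

Applied to the real `toronto_final` frame, the same two statements would give the cluster sizes and a one-word characterization of each group.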

### Group 0 ###
The areas in this group are mostly dominated by cafés and coffee shops.

In [84]:
toronto_final[toronto_final['Cluster Labels'] == 0].head(10)

| | PostalCode | Borough | Neighborhood | Latitude | Longitude | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | Cluster Labels |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | Portuguese Restaurant | Intersection | Coffee Shop | Hockey Arena | Dive Bar | Department Store | Dessert Shop | Dim Sum Restaurant | Diner | Discount Store | 0 |
| 2 | M5A | Downtown Toronto | Harbourfront, Regent Park | 43.65426 | -79.360636 | Coffee Shop | Pub | Bakery | Park | Café | Theater | Breakfast Spot | Mexican Restaurant | Farmers Market | Restaurant | 0 |
| 3 | M6A | North York | Lawrence Heights, Lawrence Manor | 43.718518 | -79.464763 | Clothing Store | Furniture / Home Store | Boutique | Accessories Store | Miscellaneous Shop | Vietnamese Restaurant | Coffee Shop | Gift Shop | Event Space | Shoe Store | 0 |
| 4 | M7A | Queen's Park | Queen's Park | 43.662301 | -79.389494 | Coffee Shop | Diner | Gym | Japanese Restaurant | Yoga Studio | Café | Liquor Store | Sushi Restaurant | Mexican Restaurant | Portuguese Restaurant | 0 |
| 6 | M3B | North York | Don Mills North | 43.745906 | -79.352188 | Gym / Fitness Center | Baseball Field | Café | Japanese Restaurant | Caribbean Restaurant | Basketball Court | Diner | Discount Store | Dive Bar | Dog Run | 0 |
| 7 | M4B | East York | Woodbine Gardens, Parkview Hill | 43.706397 | -79.309937 | Fast Food Restaurant | Pizza Place | Gym / Fitness Center | Gastropub | Breakfast Spot | Bank | Intersection | Rock Climbing Spot | Athletics & Sports | Pharmacy | 0 |
| 8 | M5B | Downtown Toronto | Ryerson, Garden District | 43.657162 | -79.378937 | Clothing Store | Coffee Shop | Café | Cosmetics Shop | Middle Eastern Restaurant | Thai Restaurant | Bookstore | Ramen Restaurant | Plaza | Bubble Tea Shop | 0 |
| 9 | M6B | North York | Glencairn | 43.709577 | -79.445073 | Asian Restaurant | Japanese Restaurant | Bakery | Hookah Bar | Pub | Dessert Shop | Dim Sum Restaurant | Diner | Discount Store | Dive Bar | 0 |
| 11 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union | 43.784535 | -79.160497 | Bar | Golf Course | Women's Store | Department Store | Dessert Shop | Dim Sum Restaurant | Diner | Discount Store | Dive Bar | Dog Run | 0 |
| 12 | M3C | North York | Flemingdon Park, Don Mills South | 43.7259 | -79.340923 | Beer Store | Gym | Coffee Shop | Asian Restaurant | Fast Food Restaurant | Dim Sum Restaurant | Italian Restaurant | Japanese Restaurant | Discount Store | Sporting Goods Shop | 0 |


As one might guess, the remaining three groups have far fewer cafés.

In [85]:
toronto_final[toronto_final['Cluster Labels'].isin([1,2,3])].head(10)

| | PostalCode | Borough | Neighborhood | Latitude | Longitude | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | Cluster Labels |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 | Park | Fast Food Restaurant | Food & Drink Shop | Women's Store | Dive Bar | Department Store | Dessert Shop | Dim Sum Restaurant | Diner | Discount Store | 3 |
| 5 | M1B | Scarborough | Rouge, Malvern | 43.806686 | -79.194353 | Fast Food Restaurant | Women's Store | Dog Run | Department Store | Dessert Shop | Dim Sum Restaurant | Diner | Discount Store | Dive Bar | Doner Restaurant | 3 |
| 10 | M9B | Etobicoke | Cloverdale, Islington, Martin Grove, Princess ... | 43.650943 | -79.554724 | Bank | Women's Store | Dog Run | Dessert Shop | Dim Sum Restaurant | Diner | Discount Store | Dive Bar | Doner Restaurant | Deli / Bodega | 2 |
| 20 | M6E | York | Caledonia-Fairbanks | 43.689026 | -79.453512 | Park | Women's Store | Fast Food Restaurant | Market | Pharmacy | Gift Shop | Dumpling Restaurant | Drugstore | Donut Shop | Doner Restaurant | 2 |
| 34 | M4J | East York | East Toronto | 43.685347 | -79.338106 | Park | Convenience Store | Metro Station | Women's Store | Department Store | Dessert Shop | Dim Sum Restaurant | Diner | Discount Store | Dive Bar | 2 |
| 55 | M9M | North York | Emery, Humberlea | 43.724766 | -79.532242 | Baseball Field | Construction & Landscaping | Doner Restaurant | Dessert Shop | Dim Sum Restaurant | Diner | Discount Store | Dive Bar | Dog Run | Women's Store | 1 |
| 59 | M4N | Central Toronto | Lawrence Park | 43.72802 | -79.38879 | Park | Bus Line | Swim School | Discount Store | Department Store | Dessert Shop | Dim Sum Restaurant | Diner | Dive Bar | Dance Studio | 2 |
| 63 | M2P | North York | York Mills West | 43.752758 | -79.400049 | Park | Bank | Women's Store | Dive Bar | Department Store | Dessert Shop | Dim Sum Restaurant | Diner | Discount Store | Dog Run | 2 |
| 74 | M9R | Etobicoke | Kingsview Village, Martin Grove Gardens, Richv... | 43.688905 | -79.554724 | Pizza Place | Park | Women's Store | Dive Bar | Deli / Bodega | Department Store | Dessert Shop | Dim Sum Restaurant | Diner | Discount Store | 2 |
| 82 | M1V | Scarborough | Agincourt North, L'Amoreaux East, Milliken, St... | 43.815252 | -79.284577 | Park | Playground | Women's Store | Dive Bar | Deli / Bodega | Department Store | Dessert Shop | Dim Sum Restaurant | Diner | Discount Store | 2 |
