# Peer-graded Assignment: Segmenting and Clustering Neighborhoods in Toronto

__I Wayan Nadiantara__

#### Introduction

In this assignment, the task is to explore, segment, and cluster the neighborhoods in the city of Toronto. A Wikipedia page contains all the information needed to explore and cluster the Toronto neighborhoods. This notebook shows how to scrape the data from that Wikipedia page, wrangle and clean it, and read it into a pandas dataframe so that it is in a structured format like the New York dataset in the previous lab session.


## Table of Contents

1. <a href="#item1">Importing Dependencies and Web Scraping</a>
2. <a href="#item2">Data Cleaning and Wrangling</a>  
3. <a href="#item3">Clustering and Visualization</a>  

## 1. Importing Dependencies and Web Scraping

In [2]:
# installing module
!conda install -c conda-forge geopy --yes 
!pip install beautifulsoup4
!pip install lxml
!conda install -c conda-forge folium=0.5.0 --yes
print("module installed....")

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

module installed....


In [136]:
# importing dependencies
# basic module
import requests # handling requests
import pandas as pd # data management
import numpy as np # python array 
import random # random number generator

# geopy module
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# notebook display helpers
from IPython.display import Image, HTML, display_html

# JSON flattening, clustering, and color utilities
from pandas import json_normalize
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors


import folium 
from bs4 import BeautifulSoup

print('Libraries successfully imported.')

Libraries successfully imported.


### Web scraping 
__Data description:__ This dataset contains a list of postal codes in Canada where the first letter is M. Postal codes beginning with M are located within the city of Toronto in the province of Ontario. Only the first three characters are listed, corresponding to the Forward Sortation Area.


__Data source:__ https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M 

In [137]:
print("Here the raw table before cleaning...")
src = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup_src =BeautifulSoup(src,'lxml')
tab_raw = str(soup_src.table)
display_html(tab_raw,raw=True)


Here the raw table before cleaning...


Postal Code,Borough,Neighbourhood
M1A,Not assigned,Not assigned
M2A,Not assigned,Not assigned
M3A,North York,Parkwoods
M4A,North York,Victoria Village
M5A,Downtown Toronto,"Regent Park, Harbourfront"
M6A,North York,"Lawrence Manor, Lawrence Heights"
M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
M8A,Not assigned,Not assigned
M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
M1B,Scarborough,"Malvern, Rouge"


## 2. Data Cleaning and Wrangling

In [138]:
# converting table to pandas dataframe
toronto_df_raw = pd.read_html(tab_raw)
toronto_df = toronto_df_raw[0]
toronto_df

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
...,...,...,...
175,M5Z,Not assigned,Not assigned
176,M6Z,Not assigned,Not assigned
177,M7Z,Not assigned,Not assigned
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."


In [139]:
# Data cleaning

# Drop rows whose 'Borough' column has the value 'Not assigned'
indexNull = toronto_df[ toronto_df['Borough'] == 'Not assigned' ].index
toronto_df.drop(indexNull , inplace=True)
toronto_df.reset_index(drop =True, inplace=True)
toronto_df

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


### Joining the data with Toronto coordinates by postal code
__Data description:__ A table of latitude and longitude coordinates in the Toronto area, keyed by postal code.

__Data source:__ http://cocl.us/Geospatial_data

In [140]:
# load the coordinates table
torontoloc_df = pd.read_csv('https://cocl.us/Geospatial_data')

# join the two tables on postal code
toronto_new_df = pd.merge(toronto_df,torontoloc_df, on='Postal Code')
toronto_new_df

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509
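`pd.merge` performs an inner join by default, so a postal code missing from the coordinates table would be dropped silently. A minimal sketch of how one might check for that, assuming toy data (the `boroughs` and `coords` frames and the postal code `M9Z` are hypothetical):

```python
import pandas as pd

# Hypothetical inputs: one postal code ("M9Z") has no coordinates
boroughs = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M9Z"],
    "Borough": ["North York", "North York", "Etobicoke"],
})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# A left join keeps every borough row; indicator=True adds a "_merge" column
checked = pd.merge(boroughs, coords, on="Postal Code", how="left", indicator=True)

# Rows marked "left_only" found no match in the coordinates table
unmatched = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```

An empty `unmatched` list would indicate that the inner join lost no rows.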


## 3. Clustering and Visualization

In [141]:
# Initializing location data from Toronto
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="Toronto")
location = geolocator.geocode(address)
latitude_toronto = location.latitude
longitude_toronto = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude_toronto, longitude_toronto))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [142]:
# Plot each neighbourhood on a Toronto map with the folium module
toronto_map = folium.Map(location=[latitude_toronto, longitude_toronto], zoom_start=10)

# add markers to map
for lat, lng, borough, Neighbourhood in zip(toronto_new_df['Latitude'], toronto_new_df['Longitude'], toronto_new_df['Borough'], toronto_new_df['Neighbourhood']):
    label = '{}, {}'.format(Neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='#eb8109',
        fill=True,
        fill_color='#eb8109',
        fill_opacity=0.7,
        parse_html=False).add_to(toronto_map)  
    
toronto_map

In [147]:
# Credentials (placeholders: supply your own Foursquare keys and keep real ones out of shared notebooks)
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # Foursquare API client ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # Foursquare API client secret
VERSION = '20200210' # version

# radius and limit
radius=500
LIMIT=100

### Retrieving venue data for Toronto neighborhoods with the Foursquare API

In [149]:
# Helper function that fetches nearby venues for each neighborhood

def get_nearby_venues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # API request with credential
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # send the request and pull the venue items out of the JSON response
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # extract the relevant fields from each venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
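The function above indexes `['groups'][0]` and `['categories'][0]` directly, so a response with no groups or a venue with no category would raise an exception. A minimal sketch of a more defensive extraction step, tested here on a hand-written dictionary rather than a real API call (the helper name `extract_venue_rows` and the `fake_payload` contents are hypothetical; the response shape is assumed from the indexing used in the notebook):

```python
def extract_venue_rows(name, lat, lng, payload):
    """Flatten one explore-style response into a list of row tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None when a venue carries no category
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows


# Hand-written stand-in for a response (not real Foursquare data)
fake_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Example Park",
                   "location": {"lat": 43.75, "lng": -79.33},
                   "categories": [{"name": "Park"}]}},
        {"venue": {"name": "Unlabelled Spot",
                   "location": {"lat": 43.76, "lng": -79.34},
                   "categories": []}},
    ]}]}
}

rows = extract_venue_rows("Parkwoods", 43.753259, -79.329656, fake_payload)
print(rows)
```

Keeping the parsing in a separate pure function also makes it possible to test it without network access or credentials.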

In [150]:
# Fetch the venues for every neighborhood and collect them in a pandas dataframe
toronto_venues = get_nearby_venues(names=toronto_new_df['Neighbourhood'],
                                   latitudes=toronto_new_df['Latitude'],
                                   longitudes=toronto_new_df['Longitude']
                                  )

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmo

In [153]:
# check the data frame

print(toronto_venues.shape)
toronto_venues.head(15)

(2141, 7)


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Parkwoods,43.753259,-79.329656,Corrosion Service Company Limited,43.752432,-79.334661,Construction & Landscaping
3,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant
5,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
6,Victoria Village,43.725882,-79.315572,The Frig,43.727051,-79.317418,French Restaurant
7,Victoria Village,43.725882,-79.315572,Eglinton Ave E & Sloane Ave/Bermondsey Rd,43.726086,-79.31362,Intersection
8,Victoria Village,43.725882,-79.315572,Pizza Nova,43.725824,-79.31286,Pizza Place
9,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery


We need to encode each venue category as one-hot (indicator) variables.

Read: https://en.wikipedia.org/wiki/One-hot

In [155]:
# encoding each venue category
venue_encode = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
venue_encode['Neighbourhood'] = toronto_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_encode = [venue_encode.columns[-1]] + list(venue_encode.columns[:-1])
venue_encode = venue_encode[fixed_encode]
venue_encode.head()

Unnamed: 0,Neighbourhood,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [156]:
# group the venues by neighborhood and take the mean frequency of each category
venue_grouped = venue_encode.groupby('Neighbourhood').mean().reset_index()
venue_grouped

Unnamed: 0,Neighbourhood,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
91,"Willowdale, Willowdale East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0
92,"Willowdale, Willowdale West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
93,Woburn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
94,Woodbine Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
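Taking the mean of one-hot columns per neighborhood yields the share of that neighborhood's venues falling in each category. A minimal sketch on toy data (the neighborhoods `A` and `B` and their venues are hypothetical):

```python
import pandas as pd

# Hypothetical venues: four in neighborhood A, two in neighborhood B
venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Park", "Cafe", "Bank", "Gym", "Gym"],
})

# One indicator column per category (integer 0/1 values)
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighbourhood", venues["Neighbourhood"])

# The mean of 0/1 indicators is the fraction of venues in each category
shares = onehot.groupby("Neighbourhood").mean()
print(shares)
```

Each row of `shares` sums to 1, so neighborhoods with very different venue counts become comparable before clustering.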


#### The most common venues in each neighborhood

In [157]:
# helper function: return the top venue categories for one neighborhood row
def get_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [158]:
# computing 5 most common venues in each neighborhood
top_venues_n = 5

# ordinal indicators for first three venues
ord_indicator = ['st', 'nd', 'rd']

# generate one column name per rank
columns = ['Neighbourhood']
for ind in np.arange(top_venues_n):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, ord_indicator[ind]))
    except IndexError:
        # ranks beyond the third take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
common_venues_sorted = pd.DataFrame(columns=columns)
common_venues_sorted['Neighbourhood'] = venue_grouped['Neighbourhood']

for ind in np.arange(venue_grouped.shape[0]):
    common_venues_sorted.iloc[ind, 1:] = get_most_common_venues(venue_grouped.iloc[ind, :], top_venues_n)

common_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Agincourt,Lounge,Latin American Restaurant,Skating Rink,Breakfast Spot,Dog Run
1,"Alderwood, Long Branch",Pizza Place,Pharmacy,Gym,Coffee Shop,Pub
2,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Bank,Gift Shop,Fried Chicken Joint,Sandwich Place
3,Bayview Village,Café,Bank,Japanese Restaurant,Chinese Restaurant,Deli / Bodega
4,"Bedford Park, Lawrence Manor East",Restaurant,Coffee Shop,Sandwich Place,Italian Restaurant,Liquor Store
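Rank-based column labels need an English ordinal suffix that depends on the number. A minimal sketch of a helper that derives the suffix for any positive integer, including the 11th to 13th exceptions (the name `ordinal` is hypothetical and not part of the notebook):

```python
def ordinal(n):
    """Return n followed by its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th (and 111th, ...) always take 'th'
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


# Build the column names for the five most common venues
columns = ["Neighbourhood"] + [
    f"{ordinal(rank)} Most Common Venue" for rank in range(1, 6)
]
print(columns)
```

This removes the need for a `try`/`except` around the suffix lookup and stays correct if the number of ranked venues grows.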


### Clustering using K-means

In [160]:
# set number of clusters
k_n = 5

# drop the neighborhood name so that only numeric features remain
venue_grouped_clusters = venue_grouped.drop('Neighbourhood', axis=1)

#  clustering with kmeans
Km = KMeans(n_clusters=k_n, random_state=0).fit(venue_grouped_clusters)

# check cluster labels 
Km.labels_


array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 1, 0, 3, 1, 3, 3, 3, 2, 3, 3, 3, 3, 0, 3, 3, 3, 0,
       3, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 3,
       0, 3, 1, 3, 3, 3, 3, 0])
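The label array above is dominated by a single value, and counting the members of each cluster makes that imbalance explicit. A minimal sketch using a short hypothetical label array (in the notebook, `Km.labels_` and `k_n` would take the place of `labels` and `k`):

```python
import numpy as np

# Hypothetical labels for ten neighborhoods with k = 5 clusters
labels = np.array([3, 3, 0, 3, 1, 3, 2, 3, 4, 3])
k = 5

# bincount tallies how often each label occurs; minlength keeps empty clusters
sizes = np.bincount(labels, minlength=k)
for cluster, size in enumerate(sizes):
    print(f"cluster {cluster}: {size} neighbourhoods")
```

A strongly skewed size distribution can be a hint that a different number of clusters, or different features, is worth trying.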

In [161]:
# insert the cluster labels from the fitted model
common_venues_sorted.insert(0, 'Cluster_Labels', Km.labels_)

common_venues_merged = toronto_new_df

# join the tables to get latitude, longitude, and cluster labels
common_venues_merged = common_venues_merged.join(common_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

common_venues_merged

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0.0,Park,Food & Drink Shop,Construction & Landscaping,Yoga Studio,Dog Run
1,M4A,North York,Victoria Village,43.725882,-79.315572,3.0,French Restaurant,Pizza Place,Coffee Shop,Portuguese Restaurant,Intersection
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636,3.0,Coffee Shop,Pub,Bakery,Park,Breakfast Spot
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,3.0,Clothing Store,Furniture / Home Store,Accessories Store,Coffee Shop,Miscellaneous Shop
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,3.0,Coffee Shop,Diner,Yoga Studio,Hobby Shop,Distribution Center
...,...,...,...,...,...,...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944,3.0,River,Distribution Center,Deli / Bodega,Department Store,Dessert Shop
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160,3.0,Coffee Shop,Sushi Restaurant,Japanese Restaurant,Gay Bar,Restaurant
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558,3.0,Light Rail Station,Garden Center,Burrito Place,Skate Park,Auto Workshop
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509,3.0,Deli / Bodega,Baseball Field,Dessert Shop,Dim Sum Restaurant,Diner


In [162]:
# count the missing values in each column
common_venues_merged.isnull().sum()

Postal Code              0
Borough                  0
Neighbourhood            0
Latitude                 0
Longitude                0
Cluster_Labels           3
1st Most Common Venue    3
2nd Most Common Venue    3
3rd Most Common Venue    3
4th Most Common Venue    3
5th Most Common Venue    3
dtype: int64
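The missing values presumably belong to neighborhoods that have no entry in the grouped venue table, so the join has nothing to attach. A minimal sketch, on toy data, of listing those rows before dropping them and restoring an integer label type (all frames and names here are hypothetical):

```python
import pandas as pd

# Hypothetical data: neighborhood "B" has no cluster label
base = pd.DataFrame({"Neighbourhood": ["A", "B", "C"]})
labels = pd.DataFrame({"Neighbourhood": ["A", "C"], "Cluster_Labels": [3, 0]})

# Same join pattern as in the notebook; unmatched rows receive NaN
merged = base.join(labels.set_index("Neighbourhood"), on="Neighbourhood")

# List the neighborhoods that received no label
missing = merged.loc[merged["Cluster_Labels"].isna(), "Neighbourhood"].tolist()
print(missing)

# Drop them and convert the labels (floats, because of NaN) back to integers
clean = merged.dropna(subset=["Cluster_Labels"]).astype({"Cluster_Labels": int})
print(clean)
```

Inspecting `missing` first shows which neighborhoods are being excluded from the map rather than discarding them unseen.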

In [164]:
# clean the missing data
common_venues_merged=common_venues_merged.dropna()

In [165]:
# make sure the cluster labels have an integer data type
common_venues_merged[["Cluster_Labels"]] = common_venues_merged[["Cluster_Labels"]].astype("int")
common_venues_merged.dtypes

Postal Code               object
Borough                   object
Neighbourhood             object
Latitude                 float64
Longitude                float64
Cluster_Labels             int32
1st Most Common Venue     object
2nd Most Common Venue     object
3rd Most Common Venue     object
4th Most Common Venue     object
5th Most Common Venue     object
dtype: object

In [166]:
# show the clustered neighborhoods on a map
venues_map_clusters = folium.Map(location=[latitude_toronto, longitude_toronto], zoom_start=11)

# set color 
x = np.arange(k_n)
ys = [i + x + (i*x)**2 for i in range(k_n)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# set markers for each cluster
markers_colors = []
for lat, lon, poi, cluster in zip(common_venues_merged['Latitude'], common_venues_merged['Longitude'],
                                  common_venues_merged['Neighbourhood'], common_venues_merged['Cluster_Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(venues_map_clusters)
       
venues_map_clusters

### Checking each cluster

In [167]:
# cluster 1
common_venues_merged.loc[common_venues_merged['Cluster_Labels'] == 0, common_venues_merged.columns[[1] + list(range(5, common_venues_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,North York,0,Park,Food & Drink Shop,Construction & Landscaping,Yoga Studio,Dog Run
21,York,0,Park,Women's Store,Pool,Yoga Studio,Distribution Center
35,East York,0,Convenience Store,Park,Metro Station,Yoga Studio,Dessert Shop
64,York,0,Convenience Store,Park,Yoga Studio,Dessert Shop,Dim Sum Restaurant
66,North York,0,Convenience Store,Park,Flower Shop,Dog Run,Dessert Shop
85,Scarborough,0,Park,Playground,Yoga Studio,Distribution Center,Deli / Bodega
91,Downtown Toronto,0,Park,Playground,Trail,Yoga Studio,Distribution Center


In [168]:
# cluster 2
common_venues_merged.loc[common_venues_merged['Cluster_Labels'] == 1, common_venues_merged.columns[[1] + list(range(5, common_venues_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
6,Scarborough,1,Fast Food Restaurant,Yoga Studio,Dog Run,Department Store,Dessert Shop
50,North York,1,Gym,Pizza Place,Dog Run,Department Store,Dessert Shop
52,North York,1,Gym,Dog Run,Department Store,Dessert Shop,Dim Sum Restaurant
83,Central Toronto,1,Gym,Tennis Court,Distribution Center,Department Store,Dessert Shop


In [169]:
# cluster 3
common_venues_merged.loc[common_venues_merged['Cluster_Labels'] == 2, common_venues_merged.columns[[1] + list(range(5, common_venues_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
12,Scarborough,2,Bar,Yoga Studio,Dog Run,Dessert Shop,Dim Sum Restaurant
94,Etobicoke,2,Rental Car Location,Drugstore,Bar,Yoga Studio,Distribution Center


In [170]:
# cluster 4
common_venues_merged.loc[common_venues_merged['Cluster_Labels'] == 3, common_venues_merged.columns[[1] + list(range(5, common_venues_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,North York,3,French Restaurant,Pizza Place,Coffee Shop,Portuguese Restaurant,Intersection
2,Downtown Toronto,3,Coffee Shop,Pub,Bakery,Park,Breakfast Spot
3,North York,3,Clothing Store,Furniture / Home Store,Accessories Store,Coffee Shop,Miscellaneous Shop
4,Downtown Toronto,3,Coffee Shop,Diner,Yoga Studio,Hobby Shop,Distribution Center
7,North York,3,Gym,Restaurant,Japanese Restaurant,Coffee Shop,Beer Store
...,...,...,...,...,...,...,...
98,Etobicoke,3,River,Distribution Center,Deli / Bodega,Department Store,Dessert Shop
99,Downtown Toronto,3,Coffee Shop,Sushi Restaurant,Japanese Restaurant,Gay Bar,Restaurant
100,East Toronto,3,Light Rail Station,Garden Center,Burrito Place,Skate Park,Auto Workshop
101,Etobicoke,3,Deli / Bodega,Baseball Field,Dessert Shop,Dim Sum Restaurant,Diner


In [171]:
# cluster 5
common_venues_merged.loc[common_venues_merged['Cluster_Labels'] == 4, common_venues_merged.columns[[1] + list(range(5, common_venues_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
11,Etobicoke,4,Filipino Restaurant,Yoga Studio,Dog Run,Department Store,Dessert Shop


### No rows for cluster 6, because k is set to 5

In [172]:
# cluster 6
common_venues_merged.loc[common_venues_merged['Cluster_Labels'] == 5, common_venues_merged.columns[[1] + list(range(5, common_venues_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
