# Segmenting and Clustering Neighborhoods in Toronto

In this assignment, we will perform the following activities:
  - Data collection
  - Data formatting
  - Data normalization
  - Clustering
    

## Data Collection

I will collect the data from the following Wikipedia page:
 - https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

All postal codes are considered, except the ones that are not yet assigned to a borough.

To extract the necessary data from this page, I need to scrape it. I will use Python's beautifulsoup4 package for the web scraping.
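The BeautifulSoup calls used later (`find`, `find_all` with an attribute filter, `get_text`) can be tried out offline first. This sketch runs them on a small made-up HTML fragment shaped like two cells of the postal-code grid. The markup is an assumption for illustration; the live Wikipedia page may be structured differently.

```python
from bs4 import BeautifulSoup

# Made-up fragment: one assigned and one unassigned postal code (illustrative markup only)
html = """
<table rules="all"><tbody><tr>
  <td><p><b>M3A</b><br/><span>North York<br/>(Parkwoods)</span></p></td>
  <td><p><b>M1A</b><br/><span><i>Not assigned</i></span></p></td>
</tr></tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")
cells = []
# locate the table by its rules="all" attribute, then walk its cells
for td in soup.find("table", {"rules": "all"}).find_all("td"):
    code = td.find("b").get_text(strip=True)
    # join the text pieces around <br/> with a space and trim surrounding whitespace
    text = td.find("span").get_text(" ", strip=True)
    if "Not assigned" not in text:
        cells.append((code, text))

print(cells)
```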



In [12]:
!pip install geopy



### Import necessary packages

In [13]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
from collections import OrderedDict
from geopy.geocoders import Nominatim
from pandas import json_normalize  # transform JSON into a pandas DataFrame (pandas >= 1.0)

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

import folium

## Web Scraping

First, retrieve the complete page content.

In [6]:
URL = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
page = requests.get(URL)
page.raise_for_status()  # stop early if the page could not be downloaded

## Create a dataframe of coordinates from the given CSV file

At first I tried to use the geocoder API to get the coordinates, but the geocoder API kept becoming unresponsive.

After trying the geocoder API a couple of times, I decided to get the coordinates from the CSV file at the following URL:
 - https://cocl.us/Geospatial_data/Geospatial_Coordinates.csv


Read the CSV from the URL using pandas' read_csv method.

In [7]:
cord_df = pd.read_csv("https://cocl.us/Geospatial_data/Geospatial_Coordinates.csv")
cord_df.head()

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


### Write a utility method to get the coordinates from the above dataframe
This method takes a Toronto postal code and returns a (latitude, longitude) tuple.

In [8]:
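def getCoordinatesFromCSVData(postal_code):
    """Return (latitude, longitude) for a postal code from cord_df.

    Raises IndexError if the postal code is not present in the CSV.
    """
    temp_df = cord_df.loc[cord_df['Postal Code'] == postal_code]
    latitude = temp_df["Latitude"].values[0]
    longitude = temp_df["Longitude"].values[0]
    return latitude, longitude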
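As an alternative to looking up one postal code at a time, a left merge attaches the coordinates to every row in one step and yields NaN instead of raising IndexError when a code is missing. The sketch below uses the first three rows of the CSV shown above and a hypothetical `scraped` frame; `validate="many_to_one"` makes pandas raise an error if the CSV ever contains a duplicate postal code.

```python
import pandas as pd

# First rows of Geospatial_Coordinates.csv, as shown in the output above
cord_df = pd.DataFrame({
    "Postal Code": ["M1B", "M1C", "M1E"],
    "Latitude": [43.806686, 43.784535, 43.763573],
    "Longitude": [-79.194353, -79.160497, -79.188711],
})

# Hypothetical scraped rows that still need coordinates
scraped = pd.DataFrame({"PostalCode": ["M1E", "M1B"], "Borough": ["Scarborough", "Scarborough"]})

coords = cord_df.rename(columns={"Postal Code": "PostalCode"})
merged = scraped.merge(
    coords,
    on="PostalCode",
    how="left",               # keep every scraped row; unknown codes get NaN coordinates
    validate="many_to_one",   # each postal code must appear at most once in the CSV
)
print(merged)
```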

Our utility method getCoordinatesFromCSVData is now ready, so we can start extracting the data to build the required dataframe.

We will use BeautifulSoup's APIs to parse the HTML and extract the data from the relevant elements.

First, we need to locate the table that holds the postal code information.

In many cells (the content of the span element), the values are not in a standard format:
 - Some postal codes are not assigned to any borough. Those need to be ignored.
 - Parentheses are not used consistently.
 - In some neighborhood names, a <br/> appears between two words.
All of these need to be cleaned up to collect the list of neighborhoods for each postal code.
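The parenthesis and slash rules can be sketched as a small string helper. `clean_neighborhoods` is a hypothetical name, and the input format "(A / B)" is an assumption about what a cell's neighborhood text looks like once the HTML is stripped; the <br/> case is handled separately in the scraping cell below by name-specific replacements.

```python
def clean_neighborhoods(raw):
    """Hypothetical helper: normalize one cell's neighborhood text.

    Removes parentheses, treats "/" as a separator, and rejoins the
    names with ", " after trimming stray whitespace.
    """
    text = raw.replace("(", "").replace(")", "").replace("/", ",")
    names = [part.strip() for part in text.split(",") if part.strip()]
    return ", ".join(names)


print(clean_neighborhoods("(Regent Park / Harbourfront)"))
```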

Once the data is cleaned up, for each postal code we:
 - Call the getCoordinatesFromCSVData() method to get the latitude and longitude
 - Get the name of the borough
 - Get the list of neighborhoods
 - Create an OrderedDict with the postal code's data and append it to a list

Please note: I had to use an OrderedDict because, without it, the columns of the target DataFrame were not in the desired order.

Finally, convert the whole list into a pandas DataFrame to get the desired result.
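Passing `columns=` to `pd.DataFrame` is another way to pin the column order, independent of the key order inside each record. A minimal sketch with two records taken from the output below, deliberately written with their keys in a different order:

```python
import pandas as pd

records = [
    {"Borough": "North York", "PostalCode": "M3A", "Neighborhood": "Parkwoods"},
    {"Neighborhood": "Victoria Village", "PostalCode": "M4A", "Borough": "North York"},
]

# columns= fixes the column order explicitly, whatever the key order in each dict
df = pd.DataFrame(records, columns=["PostalCode", "Borough", "Neighborhood"])
print(df)
```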

In [9]:
soup = BeautifulSoup(page.content, 'html.parser')
temp_toronto_data = []

# the postal codes are laid out in table(s) with the attribute rules="all"
tables = soup.find_all('table', {'rules': 'all'})
for table in tables:
    tbody = table.find('tbody')
    trs = tbody.find_all('tr')
    for tr in trs:
        tds = tr.find_all('td')
        for td in tds:
            # each cell holds the postal code in <b> and borough/neighborhoods in <span>
            para = td.find('p')
            b_tag = para.find('b')
            zipCode = b_tag.get_text()
            span_tag = para.find('span')
            # the first <br/> separates the borough from the neighborhoods ("####");
            # any further <br/> is a line break inside the neighborhood list ("|||||")
            span_tag_text = str(span_tag).replace("<br/>", "####", 1)
            span_tag_text = span_tag_text.replace("<br/>", "|||||")
            span_tag = BeautifulSoup(span_tag_text, 'html.parser')

            # skip postal codes that are not assigned to a borough
            if "Not assigned" not in span_tag.get_text():
                # drop parentheses and turn "/" separators into commas
                span_tag_text = span_tag.text
                span_tag_text = span_tag_text.replace('(', '')
                span_tag_text = span_tag_text.replace(')', '')
                span_tag_text = span_tag_text.replace('/', ',')
                info_array = span_tag_text.split("####")
                borough = info_array[0]
                temp_neighbors = info_array[1]
                # re-join names that a <br/> split across two lines
                temp_neighbors = temp_neighbors.replace("Downsview|||||", "Downsview ")
                temp_neighbors = temp_neighbors.replace("Don Mills|||||", "Don Mills ")
                temp_neighbors = temp_neighbors.replace("Willowdale|||||", "Willowdale ")
                temp_neighbors = temp_neighbors.replace("Northwest|||||Clairville", "Northwest Clairville")
                temp_neighbors = temp_neighbors.replace("Danforth ||||| East", "Danforth East")
                # every remaining line break separates two neighborhoods
                temp_neighbors = temp_neighbors.replace("|||||", ',')
                temp_neighbors = temp_neighbors.replace(" ,", ',')
                if temp_neighbors.startswith(","):
                    temp_neighbors = temp_neighbors.replace(",", '', 1).strip()
                latitude, longitude = getCoordinatesFromCSVData(zipCode)
                # a list of pairs keeps the column order explicit
                temp_toronto_data.append(OrderedDict([
                    ("PostalCode", zipCode),
                    ("Borough", borough),
                    ("Neighborhood", temp_neighbors),
                    ("Latitude", latitude),
                    ("Longitude", longitude),
                ]))

toronto_df = pd.DataFrame(temp_toronto_data)
toronto_df.head(5)

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Queen's Park | Ontario Provincial Government | 43.662301 | -79.389494 |


In [10]:
toronto_df.shape

(103, 5)

In [23]:
toronto_df.groupby('Neighborhood').count()

| Neighborhood | PostalCode | Borough | Latitude | Longitude |
|---|---|---|---|---|
| Agincourt | 1 | 1 | 1 | 1 |
| Alderwood, Long Branch | 1 | 1 | 1 | 1 |
| Bathurst Manor, Wilson Heights, Downsview North | 1 | 1 | 1 | 1 |
| Bayview Village | 1 | 1 | 1 | 1 |
| Bedford Park, Lawrence Manor East | 1 | 1 | 1 | 1 |
| Berczy Park | 1 | 1 | 1 | 1 |
| Birch Cliff, Cliffside West | 1 | 1 | 1 | 1 |
| Brockton, Parkdale Village, Exhibition Place | 1 | 1 | 1 | 1 |
| Business reply mail Processing Centre,969 Eastern,Enclave of M4L | 1 | 1 | 1 | 1 |
| CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport | 1 | 1 | 1 | 1 |


In [11]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        toronto_df['Borough'].nunique(),
        toronto_df['Neighborhood'].nunique()
    )
)

The dataframe has 11 boroughs and 103 neighborhoods.


Now let's put the neighborhoods on the Toronto map. For that, we will do the following:
 - Use the geopy package to get the coordinates of Toronto, Canada
 - Use the folium package to create a map of Toronto, Canada
 - Add the neighborhoods retrieved from Wikipedia to the same map

In [15]:
city_name = "Toronto, Ontario"

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(city_name)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


Now, let's mark the neighborhoods on the Toronto map.

In [16]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_df['Latitude'], toronto_df['Longitude'], toronto_df['Borough'], toronto_df['Neighborhood']):
    # popup text: "<neighborhood>, <borough>"
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto

### Clustering the neighborhoods

Now let's start clustering the neighborhoods. We will perform the following activities:
 - Use the Foursquare API to retrieve the venues near each neighborhood
 - Wrangle the data to convert the venue categories into numerical columns
 - Use scikit-learn's KMeans clustering algorithm to identify the clusters

Before proceeding further, let's initialize the Foursquare credentials in a hidden cell:
 - CLIENT_ID
 - CLIENT_SECRET
 - VERSION

In [17]:
# The code was removed by Watson Studio for sharing.
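The hidden cell is not reproduced here. One common way to keep credentials out of a shared notebook is to read them from environment variables. The sketch below assumes hypothetical variable names FOURSQUARE_CLIENT_ID, FOURSQUARE_CLIENT_SECRET and FOURSQUARE_VERSION; `load_foursquare_credentials` is likewise a hypothetical helper.

```python
import os

# Hypothetical environment variable names; adjust to your own setup
REQUIRED_VARS = ("FOURSQUARE_CLIENT_ID", "FOURSQUARE_CLIENT_SECRET", "FOURSQUARE_VERSION")


def load_foursquare_credentials():
    """Return (CLIENT_ID, CLIENT_SECRET, VERSION) read from the environment."""
    missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
    if missing:
        # fail loudly instead of sending requests with empty credentials
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(os.environ[name] for name in REQUIRED_VARS)
```

Usage in the notebook would then be `CLIENT_ID, CLIENT_SECRET, VERSION = load_foursquare_credentials()`.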

In [18]:
LIMIT = 100

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # rows flattened with json_normalize name the field 'venue.categories'
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

#### Let's create a function to retrieve nearby venues for neighborhoods in Toronto, Canada
The function loops over every neighborhood and collects up to LIMIT venues within the given radius (radius=500 by default) of each one.

In [50]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query the Foursquare explore endpoint around each (lat, lng) and return one row per venue."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):

        # query-string parameters for the API request
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request and keep the recommended venue items
        response = requests.get('https://api.foursquare.com/v2/venues/explore', params=params)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']
        print("Neighborhood Name : [" + name + "] Number of Venues returned : %d" % (len(results)))

        if len(results) == 0:
            # keep a placeholder row so that the neighborhood is not dropped from the data
            venues_list.append([(
                name,
                lat,
                lng,
                'NOT AVAILABLE',
                0.0,
                0.0,
                'NOT AVAILABLE')])
        else:
            # keep only the relevant information for each nearby venue
            venues_list.append([(
                name,
                lat,
                lng,
                v['venue']['name'],
                v['venue']['location']['lat'],
                v['venue']['location']['lng'],
                v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into one DataFrame
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                  'Neighborhood Latitude',
                  'Neighborhood Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues
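Each request is a GET to the explore endpoint with the credentials and search parameters in the query string. The sketch below assembles that URL with the standard library without sending it, which makes it easy to inspect. `build_explore_url` is a hypothetical helper, and its parameter names simply mirror the ones used in getNearbyVenues.

```python
from urllib.parse import urlencode, urlparse, parse_qs

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Hypothetical helper: assemble the explore request URL without sending it."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_URL + "?" + urlencode(params)


url = build_explore_url("ID", "SECRET", "20200101", 43.753259, -79.329656)
print(url)
```

`urlencode` percent-encodes the comma in `ll` as `%2C`, which decodes back to the same value when the query string is parsed.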

In [51]:
toronto_venues = getNearbyVenues(names=toronto_df['Neighborhood'],
                                   latitudes=toronto_df['Latitude'],
                                   longitudes=toronto_df['Longitude']
                                  )


Neighborhood Name : [Parkwoods] Number of Venues returned : 2
Neighborhood Name : [Victoria Village] Number of Venues returned : 4
Neighborhood Name : [Regent Park, Harbourfront] Number of Venues returned : 47
Neighborhood Name : [Lawrence Manor, Lawrence Heights] Number of Venues returned : 10
Neighborhood Name : [Ontario Provincial Government] Number of Venues returned : 38
Neighborhood Name : [Islington Avenue] Number of Venues returned : 0
Neighborhood Name : [Malvern, Rouge] Number of Venues returned : 1
Neighborhood Name : [Don Mills North] Number of Venues returned : 5
Neighborhood Name : [Parkview Hill, Woodbine Gardens] Number of Venues returned : 12
Neighborhood Name : [Garden District, Ryerson] Number of Venues returned : 100
Neighborhood Name : [Glencairn] Number of Venues returned : 4
Neighborhood Name : [West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale] Number of Venues returned : 0
Neighborhood Name : [Rouge Hill, Port Union, Highland Creek] Number 

In [88]:
print(toronto_venues.shape)
toronto_venues.head()

(2229, 7)


|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Parkwoods | 43.753259 | -79.329656 | Brookbanks Park | 43.751976 | -79.33214 | Park |
| 1 | Parkwoods | 43.753259 | -79.329656 | Variety Store | 43.751974 | -79.333114 | Food & Drink Shop |
| 2 | Victoria Village | 43.725882 | -79.315572 | Victoria Village Arena | 43.723481 | -79.315635 | Hockey Arena |
| 3 | Victoria Village | 43.725882 | -79.315572 | Tim Hortons | 43.725517 | -79.313103 | Coffee Shop |
| 4 | Victoria Village | 43.725882 | -79.315572 | Portugril | 43.725819 | -79.312785 | Portuguese Restaurant |


In [53]:
toronto_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Agincourt | 4 | 4 | 4 | 4 | 4 | 4 |
| Alderwood, Long Branch | 9 | 9 | 9 | 9 | 9 | 9 |
| Bathurst Manor, Wilson Heights, Downsview North | 20 | 20 | 20 | 20 | 20 | 20 |
| Bayview Village | 4 | 4 | 4 | 4 | 4 | 4 |
| Bedford Park, Lawrence Manor East | 23 | 23 | 23 | 23 | 23 | 23 |
| Berczy Park | 54 | 54 | 54 | 54 | 54 | 54 |
| Birch Cliff, Cliffside West | 4 | 4 | 4 | 4 | 4 | 4 |
| Brockton, Parkdale Village, Exhibition Place | 23 | 23 | 23 | 23 | 23 | 23 |
| Business reply mail Processing Centre,969 Eastern,Enclave of M4L | 16 | 16 | 16 | 16 | 16 | 16 |
| CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport | 16 | 16 | 16 | 16 | 16 | 16 |


### Please note 

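We retrieved a total of 103 unique neighborhoods, but for 4 of them the Foursquare API did not return any venue. getNearbyVenues keeps a single 'NOT AVAILABLE' placeholder row for each of these, so all 103 neighborhoods stay in toronto_venues; counting only neighborhoods with real venues gives 99 (103 - 4). Below are the neighborhoods without any venue: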

 - Upper Rouge
 - Islington Avenue
 - West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
 - Willowdale, Newtonbrook
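The same list can be derived from the data itself by looking for the 'NOT AVAILABLE' placeholder category. A sketch on a few illustrative rows (the Parkwoods venues are taken from the output above; the Islington Avenue row mimics the placeholder):

```python
import pandas as pd

# Illustrative rows: two real venues and one placeholder row
venues = pd.DataFrame({
    "Neighborhood": ["Parkwoods", "Parkwoods", "Islington Avenue"],
    "Venue": ["Brookbanks Park", "Variety Store", "NOT AVAILABLE"],
    "Venue Category": ["Park", "Food & Drink Shop", "NOT AVAILABLE"],
})

is_placeholder = venues["Venue Category"] == "NOT AVAILABLE"
# neighborhoods whose only row is the placeholder
no_venue_hoods = sorted(venues.loc[is_placeholder, "Neighborhood"].unique())
# neighborhoods with at least one real venue
real_hood_count = venues.loc[~is_placeholder, "Neighborhood"].nunique()
print(no_venue_hoods, real_hood_count)
```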
 

Now let's see how many unique venue categories we got in Toronto, Canada

In [54]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 269 unique categories.


## Now let's transform the categories into features using one-hot encoding

In [95]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")
toronto_onehot.shape

(2229, 269)
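The one-hot step, and the name clash handled in the next cell, can be reproduced on a tiny made-up frame. With `prefix=""` and `prefix_sep=""` the dummy columns are named after the categories themselves, so a venue category called "Neighborhood" collides with the key column, and `DataFrame.insert` refuses to add a column that already exists.

```python
import pandas as pd

venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "B"],
    # in this made-up sample, "Neighborhood" is also one of the venue categories
    "Venue Category": ["Park", "Neighborhood", "NOT AVAILABLE"],
})

onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="")
# drop the placeholder column and the category column that clashes with the key name
# (errors="ignore" skips names that are not present)
onehot = onehot.drop(columns=["NOT AVAILABLE", "Neighborhood"], errors="ignore")
onehot.insert(0, "Neighborhood", venues["Neighborhood"])
print(onehot)
```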

### Now add the "Neighborhood" column back from the toronto_venues dataframe

In [96]:
# "Neighborhood" is also one of the Foursquare venue categories, so get_dummies produced a
# category column with that name; drop it (and the "NOT AVAILABLE" placeholder column)
# before inserting the actual neighborhood names, otherwise insert() raises a ValueError
print("Length of toronto_venues :: %d" % (len(toronto_venues['Neighborhood'])))
print("Number of rows in toronto_onehot :: %d " % (toronto_onehot.shape[0]))
print("Number of columns in toronto_onehot before removing two columns :: %d " % (len(toronto_onehot.columns)))
toronto_onehot = toronto_onehot.drop(columns=["NOT AVAILABLE", "Neighborhood"])
print("Number of columns in toronto_onehot after removing two columns :: %d " % (len(toronto_onehot.columns)))
# add the neighborhood names back as the first column
toronto_onehot.insert(0, 'Neighborhood', toronto_venues['Neighborhood'])
print("Number of columns in toronto_onehot after adding one column :: %d " % (len(toronto_onehot.columns)))


Length of toronto_venues :: 2229
Number of rows in toronto_onehot :: 2229 
Number of columns in toronto_onehot before removing two columns :: 269 
Number of columns in toronto_onehot after removing two columns :: 267 
Number of columns in toronto_onehot after adding one column :: 268 


In [97]:
toronto_onehot.shape

(2229, 268)

#### Next, let's group the rows by neighborhood and take the mean of each one-hot column, which gives the frequency of occurrence of each category

In [99]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped.shape

(103, 268)
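Because each one-hot row marks exactly one category per venue, the per-neighborhood mean is the share of that neighborhood's venues falling in each category, so a row sums to 1 unless some of its venues fell into a dropped column. A neighborhood whose only row was the 'NOT AVAILABLE' placeholder therefore ends up with all zeros. A sketch on made-up numbers:

```python
import pandas as pd

# Made-up one-hot rows: one row per venue, one column per category
onehot = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B"],
    "Park":        [1, 1, 1, 0, 0],
    "Coffee Shop": [0, 0, 0, 1, 1],
})

# mean of 0/1 indicators = fraction of the neighborhood's venues in each category
freq = onehot.groupby("Neighborhood").mean().reset_index()
print(freq)
```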

#### Let's print each neighborhood along with the top 5 most common venues

In [100]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt ----
                       venue  freq
0             Breakfast Spot  0.25
1  Latin American Restaurant  0.25
2               Skating Rink  0.25
3                     Lounge  0.25
4          Accessories Store  0.00


----Alderwood, Long Branch----
                venue  freq
0         Pizza Place  0.22
1            Pharmacy  0.11
2                 Pub  0.11
3  Athletics & Sports  0.11
4      Sandwich Place  0.11


----Bathurst Manor, Wilson Heights, Downsview North----
                       venue  freq
0                       Bank  0.10
1                Coffee Shop  0.10
2                Bridal Shop  0.05
3             Sandwich Place  0.05
4  Middle Eastern Restaurant  0.05


----Bayview Village----
                 venue  freq
0                 Bank  0.25
1                 Café  0.25
2   Chinese Restaurant  0.25
3  Japanese Restaurant  0.25
4    Accessories Store  0.00


----Bedford Park, Lawrence Manor East----
                     venue  freq
0           Sandwich Plac

#### Let's put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [101]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
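return_most_common_venues always returns num_top_venues names, so for neighborhoods with fewer than ten venue categories the remaining slots are filled with zero-frequency ties in no guaranteed order; this is why names such as Discount Store, Distribution Center and Dog Run show up so often in the tables below. For example, the Etobicoke row with index 5 in Cluster 2 appears to be Islington Avenue, one of the neighborhoods with no venues, so all ten of its entries are zero-frequency ties. A variant that drops zero frequencies first is sketched here; `top_venues` is a hypothetical helper, and because it can return fewer than n names, its result would need padding (for example with None) before being written into the fixed-width neighborhoods_venues_sorted frame.

```python
import pandas as pd


def top_venues(row, n, skip_zero=True):
    """Hypothetical helper: up to n category names with the highest frequency."""
    freqs = row.drop("Neighborhood").astype(float)
    if skip_zero:
        # ignore categories with no venues in this neighborhood
        freqs = freqs[freqs > 0]
    return freqs.nlargest(n).index.tolist()


row = pd.Series({"Neighborhood": "A", "Park": 0.5, "Café": 0.3, "Bank": 0.2, "Pub": 0.0})
print(top_venues(row, 10))
```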

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [102]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # 4th, 5th, ... all use the "th" suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt | Latin American Restaurant | Skating Rink | Breakfast Spot | Lounge | Drugstore | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Donut Shop |
| 1 | Alderwood, Long Branch | Pizza Place | Gym | Pub | Skating Rink | Coffee Shop | Pharmacy | Athletics & Sports | Sandwich Place | Yoga Studio | Distribution Center |
| 2 | Bathurst Manor, Wilson Heights, Downsview North | Coffee Shop | Bank | Frozen Yogurt Shop | Bridal Shop | Sandwich Place | Restaurant | Diner | Supermarket | Ice Cream Shop | Sushi Restaurant |
| 3 | Bayview Village | Japanese Restaurant | Café | Bank | Chinese Restaurant | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Donut Shop | Yoga Studio |
| 4 | Bedford Park, Lawrence Manor East | Italian Restaurant | Sandwich Place | Restaurant | Coffee Shop | Juice Bar | Sushi Restaurant | Indian Restaurant | Butcher | Fast Food Restaurant | Thai Restaurant |
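The `indicators` list with the try/except fallback gives correct suffixes up to 20, but it would produce "21th" or "22th" if num_top_venues were raised past 20. A small general-purpose suffix function avoids that; `ordinal` is a hypothetical helper name.

```python
def ordinal(n):
    """Return n with its English ordinal suffix: 1st, 2nd, 3rd, 4th, ..., 11th, ..., 21st."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 10th-20th, 110th-120th, ... always take "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


columns = ["Neighborhood"] + ["{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)]
print(columns)
```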


## 4. Cluster Neighborhoods

In [103]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)

In [104]:
len(kmeans.labels_)

103

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [106]:
# add clustering labels (drop a stale column first so the cell can be re-run safely)
if 'Cluster Labels' in neighborhoods_venues_sorted.columns:
    neighborhoods_venues_sorted = neighborhoods_venues_sorted.drop(columns='Cluster Labels')
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_df

# join the cluster label and top-10 venues onto toronto_df (which already holds latitude/longitude) by neighborhood name
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head()

In [109]:
colors = ['red', 'blue', 'green', 'purple', 'orange']  # one color per cluster label 0-4

Let's plot the neighborhoods on the Toronto map with a different color for each cluster:
 - Red for Cluster 1
 - Blue for Cluster 2
 - Green for Cluster 3
 - Purple for Cluster 4
 - Orange for Cluster 5
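The fixed five-color list only works for kclusters = 5. To try other values of k, a palette can be generated from a matplotlib colormap instead. This is a sketch: `cluster_palette` is a hypothetical helper, the "rainbow" colormap is an arbitrary choice, and `matplotlib.colormaps` requires matplotlib 3.5 or newer. It imports `matplotlib.colors` as `mcolors` because the notebook reuses the name `colors` for the color list.

```python
import numpy as np
import matplotlib
import matplotlib.colors as mcolors


def cluster_palette(k):
    """Return k hex colors spread evenly across the 'rainbow' colormap."""
    cmap = matplotlib.colormaps["rainbow"]
    # sample the colormap at k evenly spaced points in [0, 1] and convert RGBA to '#rrggbb'
    return [mcolors.rgb2hex(cmap(x)) for x in np.linspace(0, 1, k)]


print(cluster_palette(5))
```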

In [121]:
# create map of Toronto using latitude and longitude values
map_toronto_clustered = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Borough'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    # cluster labels are 0-based; show them as 1-based in the popup
    label = '{}, {}'.format(neighborhood, "Cluster %d" % (cluster + 1))
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=colors[cluster],
        fill=True,
        fill_color=colors[cluster],
        fill_opacity=0.7).add_to(map_toronto_clustered)

map_toronto_clustered

## 5. Examine Clusters

Now, you can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, you can then assign a name to each cluster. I will leave this exercise to you.
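One way to start that exercise is to compare each cluster's average category frequencies with the average over all neighborhoods and list the categories with the largest positive difference. The sketch below does this on a made-up four-neighborhood frame; in the notebook, the inputs would be toronto_grouped and kmeans.labels_. Scoring by "difference from the overall mean" is one simple choice among many, not a method used elsewhere in this notebook.

```python
import pandas as pd

# Made-up stand-in for toronto_grouped (category frequencies per neighborhood)
grouped = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D"],
    "Park":        [0.6, 0.5, 0.0, 0.1],
    "Coffee Shop": [0.1, 0.0, 0.7, 0.6],
    "Playground":  [0.3, 0.5, 0.3, 0.3],
})
labels = [0, 0, 1, 1]  # stand-in for kmeans.labels_

features = grouped.drop(columns="Neighborhood")
cluster_means = features.groupby(labels).mean()
# how much more frequent each category is in a cluster than across all neighborhoods
lift = cluster_means - features.mean()
top_per_cluster = {c: lift.loc[c].nlargest(1).index.tolist() for c in lift.index}
print(top_per_cluster)
```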

#### Members of Cluster 1

In [113]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | North York | 0 | Park | Food & Drink Shop | Doner Restaurant | Dim Sum Restaurant | Diner | Discount Store | Distribution Center | Dog Run | Donut Shop | Falafel Restaurant |
| 21 | York | 0 | Park | Women's Store | Market | Ethiopian Restaurant | Empanada Restaurant | Electronics Store | Eastern European Restaurant | Dumpling Restaurant | Dessert Shop | Drugstore |
| 35 | East York | 0 | Park | Coffee Shop | Convenience Store | Donut Shop | Diner | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Dumpling Restaurant |
| 40 | North York | 0 | Park | Airport | Drugstore | Diner | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Donut Shop | Dumpling Restaurant |
| 61 | Central Toronto | 0 | Park | Swim School | Bus Line | Dog Run | Dim Sum Restaurant | Diner | Discount Store | Distribution Center | Doner Restaurant | Event Space |
| 64 | York | 0 | Park | Convenience Store | Donut Shop | Dim Sum Restaurant | Diner | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Drugstore |
| 66 | North York | 0 | Park | Bank | Convenience Store | Drugstore | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Donut Shop | Dumpling Restaurant |
| 85 | Scarborough | 0 | Park | Playground | Doner Restaurant | Dessert Shop | Dim Sum Restaurant | Diner | Discount Store | Distribution Center | Dog Run | Donut Shop |
| 91 | Downtown Toronto | 0 | Park | Playground | Trail | Dog Run | Dessert Shop | Dim Sum Restaurant | Diner | Discount Store | Distribution Center | Doner Restaurant |


#### Members of Cluster 2

In [114]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | North York | 1 | Intersection | Coffee Shop | Portuguese Restaurant | Hockey Arena | Yoga Studio | Doner Restaurant | Diner | Discount Store | Distribution Center | Dog Run |
| 2 | Downtown Toronto | 1 | Coffee Shop | Park | Bakery | Pub | Café | Theater | Mexican Restaurant | Farmers Market | Event Space | Shoe Store |
| 3 | North York | 1 | Clothing Store | Accessories Store | Coffee Shop | Boutique | Miscellaneous Shop | Furniture / Home Store | Event Space | Vietnamese Restaurant | Dim Sum Restaurant | Comic Shop |
| 4 | Queen's Park | 1 | Coffee Shop | Park | Yoga Studio | Café | Bar | Beer Bar | Italian Restaurant | Seafood Restaurant | Juice Bar | Sandwich Place |
| 5 | Etobicoke | 1 | Yoga Studio | Dessert Shop | Event Space | Ethiopian Restaurant | Empanada Restaurant | Electronics Store | Eastern European Restaurant | Dumpling Restaurant | Drugstore | Donut Shop |
| 6 | Scarborough | 1 | Fast Food Restaurant | Dessert Shop | Event Space | Ethiopian Restaurant | Empanada Restaurant | Electronics Store | Eastern European Restaurant | Dumpling Restaurant | Drugstore | Donut Shop |
| 7 | North York | 1 | Gym / Fitness Center | Café | Caribbean Restaurant | Baseball Field | Japanese Restaurant | Electronics Store | Eastern European Restaurant | Dumpling Restaurant | Drugstore | Empanada Restaurant |
| 8 | East York | 1 | Pizza Place | Gym / Fitness Center | Café | Intersection | Athletics & Sports | Bus Line | Gastropub | Bank | Pharmacy | Pet Store |
| 9 | Downtown Toronto | 1 | Coffee Shop | Clothing Store | Middle Eastern Restaurant | Cosmetics Shop | Japanese Restaurant | Café | Restaurant | Fast Food Restaurant | Italian Restaurant | Bubble Tea Shop |
| 10 | North York | 1 | Park | Pizza Place | Pub | Japanese Restaurant | Dumpling Restaurant | Eastern European Restaurant | Drugstore | Donut Shop | Doner Restaurant | Deli / Bodega |


#### Members of Cluster 3

In [115]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 57 | North York | 2 | Baseball Field | Yoga Studio | Donut Shop | Diner | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Drugstore | Farmers Market |
| 101 | Etobicoke | 2 | Baseball Field | Yoga Studio | Donut Shop | Diner | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Drugstore | Farmers Market |


#### Members of Cluster 4

In [116]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 62 | Central Toronto | 3 | Garden | Yoga Studio | Doner Restaurant | Dim Sum Restaurant | Diner | Discount Store | Distribution Center | Dog Run | Donut Shop | Department Store |


#### Members of Cluster 5

In [117]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 32 | Scarborough | 4 | Playground | Yoga Studio | Donut Shop | Dim Sum Restaurant | Diner | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Drugstore |
| 83 | Central Toronto | 4 | Playground | Yoga Studio | Donut Shop | Dim Sum Restaurant | Diner | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Drugstore |
