# Capstone Project for Applied Data Science IBM Coursera

Chang Che \
Nov 2019

This notebook will be mainly used for the capstone project.

## Section one: Preparation

### 1. Import related libraries

In [1]:
# Import required libraries
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import requests
import csv
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

### 2. Web Scraping -- Canada Postal Codes

Scrape the postal code table from the Wikipedia page below:

In [2]:
wiki = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
website_url = requests.get(wiki).text
soup = BeautifulSoup(website_url,'lxml')
# print(soup)
table = soup.find('table',{'class':'wikitable sortable'})
# print(table)
nrow = len(table.find_all('tr'))

Set up the number of rows and the header for the dataframe:

In [3]:
rows = table.select('tbody > tr')
header = [th.text.rstrip() for th in rows[0].find_all('th')]

The following code writes the scraped table to a csv file, applying two rules: (1) rows whose borough is Not assigned are skipped; (2) if a row has a borough but its neighborhood is Not assigned, the neighborhood is set to the borough name.

In [4]:
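# newline='' stops the csv module from writing blank lines between rows on Windows
with open('torontopostalcode.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(header)
    for row in rows[1:]:
        # each data row holds three <td> cells: postcode, borough, neighbourhood
        data = [td.text.rstrip() for td in row.find_all('td')]
        if data[1].lower() == 'not assigned':
            continue
        if data[2].lower() == 'not assigned':
            data[2] = data[1]
        writer.writerow(data)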
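
The two cleaning rules can also be isolated in a small helper, which makes them easy to check without touching the network. This is only a sketch: `clean_row` is a hypothetical name, and it assumes every row has exactly three cells in the order Postcode, Borough, Neighbourhood.

```python
def clean_row(cells):
    """Apply the two 'Not assigned' rules to one scraped table row.

    `cells` is assumed to be [postcode, borough, neighbourhood].
    Returns the cleaned row, or None when the row should be skipped.
    """
    postcode, borough, neighbourhood = (c.strip() for c in cells)
    # Rule 1: skip rows whose borough is not assigned.
    if borough.lower() == 'not assigned':
        return None
    # Rule 2: fall back to the borough name when the neighbourhood is not assigned.
    if neighbourhood.lower() == 'not assigned':
        neighbourhood = borough
    return [postcode, borough, neighbourhood]


print(clean_row(['M1A', 'Not assigned', 'Not assigned']))
print(clean_row(['M7A', "Queen's Park", 'Not assigned']))
```

Keeping the rules in one function means the csv-writing loop only has to skip rows for which the helper returns `None`.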

In [5]:
df = pd.read_csv('torontopostalcode.csv')
header

['Postcode', 'Borough', 'Neighbourhood']

Now we need to combine the duplicate postal codes into the same row:

In [6]:
df[['Postcode','Neighbourhood']] = df.groupby('Postcode')['Neighbourhood'].apply(lambda x: ', '.join(x)).reset_index()
df.head(12)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,North York,"Rouge, Malvern"
1,M1C,North York,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Downtown Toronto,"Guildwood, Morningside, West Hill"
3,M1G,North York,Woburn
4,M1H,North York,Cedarbrae
5,M1J,Queen's Park,Scarborough Village
6,M1K,Queen's Park,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,North York,"Birch Cliff, Cliffside West"
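
One thing to watch when combining rows: assigning a grouped result back onto columns of the original frame aligns on the row index, which can leave the Borough column paired with a different postal code. A possible alternative, sketched here on toy rows rather than the scraped data, is to group by both Postcode and Borough and build a new frame from the result. The name `combined` and the sample rows are illustrative assumptions.

```python
import pandas as pd

# Toy rows for illustration only; the real input is the scraped csv file.
toy = pd.DataFrame({
    'Postcode': ['M5A', 'M1B', 'M5A', 'M1B'],
    'Borough': ['Downtown Toronto', 'Scarborough', 'Downtown Toronto', 'Scarborough'],
    'Neighbourhood': ['Harbourfront', 'Rouge', 'Regent Park', 'Malvern'],
})

# Group on both keys so each postal code keeps its own borough,
# then join the neighbourhood names of each group into one string.
combined = (
    toy.groupby(['Postcode', 'Borough'])['Neighbourhood']
       .agg(', '.join)
       .reset_index()
)
print(combined)
```

Because the result is a new frame with one row per postal code, its shape directly reflects the number of unique postal codes.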


Use the `.shape` attribute to check the number of rows and columns of the dataframe:

In [7]:
df.shape

(210, 3)

### 3. Use the Geocoder package or the csv file to create the following dataframe:

Read the online csv file as another data frame first:

In [8]:
url2 = "https://cocl.us/Geospatial_data"
df2 = pd.read_csv(url2)

In [9]:
df2.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Merge the two data frames:

In [10]:
df3 = pd.merge(left=df,right=df2, left_on='Postcode', right_on='Postal Code')
df3 = df3.drop('Postal Code',axis=1)
df3.head(12)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,North York,"Rouge, Malvern",43.806686,-79.194353
1,M1C,North York,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Downtown Toronto,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,North York,Woburn,43.770992,-79.216917
4,M1H,North York,Cedarbrae,43.773136,-79.239476
5,M1J,Queen's Park,Scarborough Village,43.744734,-79.239476
6,M1K,Queen's Park,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
9,M1N,North York,"Birch Cliff, Cliffside West",43.692657,-79.264848
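
`pd.merge` performs an inner join by default, so a postal code that has no match in the coordinates file would be dropped without warning. A quick way to check for that, sketched on toy frames, is a left join with `indicator=True`. The frames below are illustrative, and `M9Z` is a made-up code chosen so that one row has no match.

```python
import pandas as pd

# Toy frames for illustration; 'M9Z' is a made-up code with no coordinates.
codes = pd.DataFrame({'Postcode': ['M1B', 'M1C', 'M9Z']})
coords = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
})

# A left join keeps every postal code; the _merge column records
# whether a matching row was found in the coordinates frame.
checked = codes.merge(coords, how='left', left_on='Postcode',
                      right_on='Postal Code', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Postcode'].tolist()
print(unmatched)
```

If `unmatched` is empty for the real data, the inner join used above loses nothing.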


## Section two: Explore and cluster the neighborhoods in Toronto

I will work only with boroughs whose names contain the word Toronto, and then replicate the analysis we did on the New York City data.

### 1. Select the rows whose borough contains the word Toronto

In [11]:
address = 'Toronto'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


Select the locations with Toronto in Borough:

In [12]:
df4 = df3[df3['Borough'].apply(lambda x: 'Toronto' in x)].reset_index(drop=True)
df4.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1E,Downtown Toronto,"Guildwood, Morningside, West Hill",43.763573,-79.188711
1,M1S,Downtown Toronto,Agincourt,43.7942,-79.262029
2,M1T,Downtown Toronto,"Clarks Corners, Sullivan, Tam O'Shanter",43.781638,-79.304302
3,M3B,Downtown Toronto,Don Mills North,43.745906,-79.352188
4,M4B,East Toronto,"Woodbine Gardens, Parkview Hill",43.706397,-79.309937


Generate a map of Toronto with the neighborhoods superimposed on top (not clustered yet):

In [13]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df4['Latitude'], df4['Longitude'], df4['Borough'], df4['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### 2. Utilize the Foursquare API to obtain venue information

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

In [14]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20191130' # Foursquare API version

Let's create a function that repeats the same process for all the neighborhoods in Toronto, fetching up to 100 venues within a radius of 500 meters of each one.

In [15]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
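
The function above assumes every venue has at least one category and builds the query string by hand. A slightly more defensive variant is sketched below, using only the standard library. The helper names `build_explore_url` and `extract_venue` are hypothetical, and the response layout (`venue`, `location`, `categories`) is taken from the code above rather than from the API documentation.

```python
from urllib.parse import urlencode

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'  # endpoint used above


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the explore URL with properly encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_URL + '?' + urlencode(params)


def extract_venue(item):
    """Return (name, lat, lng, category) for one item; category is None if missing."""
    venue = item['venue']
    categories = venue.get('categories') or []
    category = categories[0]['name'] if categories else None
    return (venue['name'], venue['location']['lat'], venue['location']['lng'], category)


# Hand-written item shaped like the entries the function above iterates over.
sample = {'venue': {'name': 'Marina Spa',
                    'location': {'lat': 43.766, 'lng': -79.191},
                    'categories': [{'name': 'Spa'}]}}
print(extract_venue(sample))
```

Returning `None` for a missing category avoids an `IndexError` in the middle of a long run over all neighborhoods; such rows can be filtered out afterwards.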

In [16]:
venues = getNearbyVenues(names=df4['Neighbourhood'],latitudes=df4['Latitude'],longitudes=df4['Longitude'])

Guildwood, Morningside, West Hill
Agincourt
Clarks Corners, Sullivan, Tam O'Shanter
Don Mills North
Woodbine Gardens, Parkview Hill
Woodbine Heights
East Toronto
The Danforth West, Riverdale
Moore Park, Summerhill East
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Bedford Park, Lawrence Manor East
Roselawn
Forest Hill North, Forest Hill West
Lawrence Heights, Lawrence Manor
Glencairn
Humewood-Cedarvale
Caledonia-Fairbanks
Christie
Dovercourt Village, Dufferin
Little Portugal, Trinity
Runnymede, Swansea
Queen's Park
Canada Post Gateway Processing Centre
Business Reply Mail Processing Centre 969 Eastern
Weston


Let's check the size of the resulting dataframe:

In [17]:
venues.shape

(700, 7)

In [18]:
venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Swiss Chalet Rotisserie & Grill,43.767697,-79.189914,Pizza Place
1,"Guildwood, Morningside, West Hill",43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store
2,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Marina Spa,43.766,-79.191,Spa
3,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Big Bite Burrito,43.766299,-79.19072,Mexican Restaurant
4,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Enterprise Rent-A-Car,43.764076,-79.193406,Rental Car Location
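
As a sanity check on the 500 meter radius, the distance between a neighborhood centre and a returned venue can be computed with the haversine formula. The sketch below uses the coordinates from the first row of the table above; `haversine_m` is a hypothetical helper, and the Earth radius is the usual mean-radius approximation.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    earth_radius_m = 6371000.0  # mean Earth radius, an approximation
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


# Neighborhood centre and venue coordinates from the first row of venues.head().
distance = haversine_m(43.763573, -79.188711, 43.767697, -79.189914)
print(round(distance))
```

The result for this pair should come out a little under 500 meters, which is consistent with the radius passed to the API.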


Let's check how many venues were returned for each neighborhood:

In [19]:
venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,4,4,4,4,4,4
"Bedford Park, Lawrence Manor East",21,21,21,21,21,21
Business Reply Mail Processing Centre 969 Eastern,17,17,17,17,17,17
"Cabbagetown, St. James Town",44,44,44,44,44,44
Caledonia-Fairbanks,5,5,5,5,5,5
Canada Post Gateway Processing Centre,10,10,10,10,10,10
Christie,17,17,17,17,17,17
Church and Wellesley,83,83,83,83,83,83
"Clarks Corners, Sullivan, Tam O'Shanter",11,11,11,11,11,11
"Commerce Court, Victoria Hotel",100,100,100,100,100,100


Let's find out how many unique categories can be curated from all the returned venues:

In [20]:
len(venues['Venue Category'].unique())

169

### 3. Analyze Each Neighborhood

In [21]:
# one hot encoding
onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
onehot['Neighborhood'] = venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [onehot.columns[-1]] + list(onehot.columns[:-1])
onehot = onehot[fixed_columns]
onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Afghan Restaurant,American Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,...,Theme Restaurant,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [22]:
onehot.shape

(700, 170)

In [23]:
toronto_grouped = onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighborhood,Accessories Store,Afghan Restaurant,American Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,...,Theme Restaurant,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Bedford Park, Lawrence Manor East",0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Business Reply Mail Processing Centre 969 Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824
3,"Cabbagetown, St. James Town",0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Caledonia-Fairbanks,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0


In [24]:
toronto_grouped.shape

(30, 170)
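
Taking the mean of the one-hot columns per neighborhood is the same as computing each category's share of that neighborhood's venues. For example, the value 0.047619 shown for Bedford Park is 1/21, matching its 21 venues. The sketch below reproduces the idea with the standard library on toy rows; `category_frequencies` is a hypothetical helper, not part of the notebook.

```python
from collections import Counter, defaultdict


def category_frequencies(rows):
    """Map each neighborhood to {category: share of its venues}.

    `rows` is an iterable of (neighborhood, category) pairs.
    """
    counts = defaultdict(Counter)
    for neighborhood, category in rows:
        counts[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(cats.values()) for cat, n in cats.items()}
        for hood, cats in counts.items()
    }


# Toy rows for illustration only (not the Foursquare results).
toy_rows = [
    ('A', 'Park'), ('A', 'Park'), ('A', 'Café'), ('A', 'Gym'),
    ('B', 'Coffee Shop'), ('B', 'Park'),
]
freqs = category_frequencies(toy_rows)
print(freqs['A']['Park'])  # 2 of the 4 venues in 'A' are parks
```

Because the values are shares, every row sums to 1, so neighborhoods with few venues and neighborhoods with many venues are on the same scale when clustered.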

Define a function that sorts the venue categories of one neighborhood by frequency in descending order and returns the top ones:

In [25]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create a new dataframe and display the top 10 venues for each neighborhood.

In [26]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']
# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Breakfast Spot,Lounge,Skating Rink,Latin American Restaurant,Ethiopian Restaurant,Flower Shop,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market
1,"Bedford Park, Lawrence Manor East",Coffee Shop,Italian Restaurant,Pub,Butcher,Sandwich Place,Liquor Store,Café,Juice Bar,Restaurant,Sushi Restaurant
2,Business Reply Mail Processing Centre 969 Eastern,Yoga Studio,Spa,Gym / Fitness Center,Garden Center,Garden,Light Rail Station,Fast Food Restaurant,Farmers Market,Park,Pizza Place
3,"Cabbagetown, St. James Town",Coffee Shop,Pizza Place,Bakery,Italian Restaurant,Café,Restaurant,Pub,Flower Shop,Jewelry Store,Sandwich Place
4,Caledonia-Fairbanks,Park,Women's Store,Fast Food Restaurant,Market,Yoga Studio,Electronics Store,Flower Shop,Fish & Chips Shop,Field,Farmers Market
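
The `try`/`except` approach to the column suffixes works for the 10 columns used here, but it would label a 21st column as "21th". A small helper that handles any positive integer is sketched below; `ordinal` is a hypothetical name, not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 10th through 20th, including 11th, 12th and 13th
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


# Same column layout as above: 'Neighborhood' followed by the top 10 venue columns.
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns[:4])
```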


### 4. Run *k*-means to cluster the neighborhoods into 3 clusters

Using the per-neighborhood category frequencies as features, run *k*-means with `kclusters = 3`.

In [27]:
# set number of clusters
kclusters = 3

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
print(kmeans.labels_)

[1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 0 2 1 1 0 1 1]
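
Before interpreting the clusters it helps to see how many neighborhoods fall under each label. The short sketch below counts the labels printed above; the list is copied by hand from that output.

```python
from collections import Counter

# Labels copied from the k-means output printed above.
labels = [1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
          0, 1, 1, 0, 1, 1, 1, 1, 0, 2, 1, 1, 0, 1, 1]

sizes = Counter(labels)
for label in sorted(sizes):
    print('cluster label {}: {} neighborhoods'.format(label, sizes[label]))
```

The clusters are quite unbalanced: label 1 covers 24 of the 30 neighborhoods, label 0 covers 5, and label 2 covers a single one.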


Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [28]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# merge df4 with neighborhoods_venues_sorted to add latitude/longitude for each neighborhood
toronto_merged = df4.merge(neighborhoods_venues_sorted, left_on='Neighbourhood',right_on='Neighborhood')
toronto_merged=toronto_merged.drop('Neighborhood',axis=1)

In [29]:
toronto_merged.head(30) # check the last columns!

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1E,Downtown Toronto,"Guildwood, Morningside, West Hill",43.763573,-79.188711,1,Breakfast Spot,Electronics Store,Pizza Place,Medical Center,Rental Car Location,Mexican Restaurant,Intersection,Spa,Field,Fast Food Restaurant
1,M1S,Downtown Toronto,Agincourt,43.7942,-79.262029,1,Breakfast Spot,Lounge,Skating Rink,Latin American Restaurant,Ethiopian Restaurant,Flower Shop,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market
2,M1T,Downtown Toronto,"Clarks Corners, Sullivan, Tam O'Shanter",43.781638,-79.304302,1,Pizza Place,Pharmacy,Fast Food Restaurant,Noodle House,Chinese Restaurant,Thai Restaurant,Fried Chicken Joint,Italian Restaurant,Bank,Yoga Studio
3,M3B,Downtown Toronto,Don Mills North,43.745906,-79.352188,1,Gym / Fitness Center,Caribbean Restaurant,Basketball Court,Baseball Field,Café,Japanese Restaurant,Dessert Shop,Dance Studio,Flower Shop,Fish & Chips Shop
4,M4B,East Toronto,"Woodbine Gardens, Parkview Hill",43.706397,-79.309937,1,Pizza Place,Fast Food Restaurant,Intersection,Pharmacy,Pet Store,Bus Line,Bank,Gastropub,Gym / Fitness Center,Breakfast Spot
5,M4C,Downtown Toronto,Woodbine Heights,43.695344,-79.318389,1,Skating Rink,Bus Stop,Cosmetics Shop,Pharmacy,Video Store,Curling Ice,Park,Dance Studio,Beer Store,Yoga Studio
6,M4J,Downtown Toronto,East Toronto,43.685347,-79.338106,1,Convenience Store,Coffee Shop,Intersection,Park,Dog Run,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market,Falafel Restaurant
7,M4K,Downtown Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,1,Greek Restaurant,Coffee Shop,Ice Cream Shop,Italian Restaurant,Furniture / Home Store,Restaurant,Bookstore,Pizza Place,Brewery,Bubble Tea Shop
8,M4T,Downtown Toronto,"Moore Park, Summerhill East",43.689574,-79.38316,1,Gym,Playground,Trail,Tennis Court,Diner,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market,Falafel Restaurant
9,M4V,Downtown Toronto,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",43.686412,-79.400049,1,Pub,Coffee Shop,Supermarket,Sushi Restaurant,Fried Chicken Joint,Bagel Shop,Sports Bar,Restaurant,Light Rail Station,Liquor Store


Finally, let's visualize the resulting clusters:

In [30]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
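
The colour lookup above relies on `rainbow[cluster-1]`, which for label 0 wraps around to the last entry of the list. A more direct mapping from label to colour is sketched below. Note that it swaps in the standard library's `colorsys` module instead of matplotlib's rainbow colormap, and `cluster_palette` is a hypothetical helper.

```python
import colorsys


def cluster_palette(k):
    """Return k evenly spaced hex colours, one per cluster label 0..k-1."""
    palette = []
    for i in range(k):
        # Spread the hues evenly around the colour wheel.
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(int(r * 255), int(g * 255), int(b * 255)))
    return palette


palette = cluster_palette(3)
print(palette)
```

With such a list, a marker's colour can be looked up as `palette[cluster]`, so label 0 takes the first colour, label 1 the second, and so on.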

### 5. Examine Clusters

Now, we examine each cluster and determine the discriminating venue categories that distinguish each cluster. 

#### Cluster 1

In [31]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Downtown Toronto,0,Park,Playground,Trail,Yoga Studio,Dog Run,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market,Falafel Restaurant
17,West Toronto,0,Park,Trail,Sushi Restaurant,Jewelry Store,Yoga Studio,Dog Run,Field,Fast Food Restaurant,Farmers Market,Falafel Restaurant
20,Downtown Toronto,0,Hockey Arena,Field,Trail,Park,Yoga Studio,Dog Run,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant
21,Downtown Toronto,0,Park,Women's Store,Fast Food Restaurant,Market,Yoga Studio,Electronics Store,Flower Shop,Fish & Chips Shop,Field,Farmers Market
29,East Toronto,0,Park,Convenience Store,Dog Run,Flower Shop,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Ethiopian Restaurant


Based on the information above, I name this cluster "Park and Entertainment".

#### Cluster 2

In [32]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,1,Breakfast Spot,Electronics Store,Pizza Place,Medical Center,Rental Car Location,Mexican Restaurant,Intersection,Spa,Field,Fast Food Restaurant
1,Downtown Toronto,1,Breakfast Spot,Lounge,Skating Rink,Latin American Restaurant,Ethiopian Restaurant,Flower Shop,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market
2,Downtown Toronto,1,Pizza Place,Pharmacy,Fast Food Restaurant,Noodle House,Chinese Restaurant,Thai Restaurant,Fried Chicken Joint,Italian Restaurant,Bank,Yoga Studio
3,Downtown Toronto,1,Gym / Fitness Center,Caribbean Restaurant,Basketball Court,Baseball Field,Café,Japanese Restaurant,Dessert Shop,Dance Studio,Flower Shop,Fish & Chips Shop
4,East Toronto,1,Pizza Place,Fast Food Restaurant,Intersection,Pharmacy,Pet Store,Bus Line,Bank,Gastropub,Gym / Fitness Center,Breakfast Spot
5,Downtown Toronto,1,Skating Rink,Bus Stop,Cosmetics Shop,Pharmacy,Video Store,Curling Ice,Park,Dance Studio,Beer Store,Yoga Studio
6,Downtown Toronto,1,Convenience Store,Coffee Shop,Intersection,Park,Dog Run,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market,Falafel Restaurant
7,Downtown Toronto,1,Greek Restaurant,Coffee Shop,Ice Cream Shop,Italian Restaurant,Furniture / Home Store,Restaurant,Bookstore,Pizza Place,Brewery,Bubble Tea Shop
8,Downtown Toronto,1,Gym,Playground,Trail,Tennis Court,Diner,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market,Falafel Restaurant
9,Downtown Toronto,1,Pub,Coffee Shop,Supermarket,Sushi Restaurant,Fried Chicken Joint,Bagel Shop,Sports Bar,Restaurant,Light Rail Station,Liquor Store


Based on the information above, I name this cluster "Urban Living Area".

#### Cluster 3

In [33]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,West Toronto,2,Garden,Yoga Studio,Dog Run,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Ethiopian Restaurant,Electronics Store


This cluster contains a single neighborhood, and its venues look like a mix of clusters 1 and 2.