# Neighborhoods in Toronto

*This notebook scrapes the page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, retrieves the latitude and longitude of each postal code with a geocoder, and finally uses the Foursquare API to compare the different neighborhoods in Toronto. All three parts of the assignment are in this notebook, in numbered sections.*

## 1. Web scraping postal codes and neighborhoods

In [10]:
#If lxml isn't installed, that needs to be done:
%pip install lxml

Collecting lxml
  Downloading lxml-4.5.0-cp38-cp38-macosx_10_9_x86_64.whl (4.6 MB)
Installing collected packages: lxml
Successfully installed lxml-4.5.0
Note: you may need to restart the kernel to use updated packages.


In [None]:
#Import libraries
import pandas as pd
import requests
from bs4 import BeautifulSoup
import urllib.request, urllib.parse, urllib.error
print('Libraries imported')

In [92]:
#Retrieve webpage and create a 'soup':
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html,'html.parser')

In [93]:
#Find the table (as there is only 1) and convert it into a pandas dataframe
table = soup.find_all('table')[0] 
df = pd.read_html(str(table))[0]
df.head()

Unnamed: 0,Postal code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront


In [94]:
#Drop rows where Borough is not assigned with a conditioned drop function 
df.drop(df[df['Borough'] == 'Not assigned'].index, inplace = True) 
df.head()  

Unnamed: 0,Postal code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront
5,M6A,North York,Lawrence Manor / Lawrence Heights
6,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government


In [95]:
#Separate neighborhoods with commas instead of slashes, using the replace function with a regular expression:
df.replace(regex=r'/', value=',', inplace=True)
df.head()

Unnamed: 0,Postal code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park , Harbourfront"
5,M6A,North York,"Lawrence Manor , Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park , Ontario Provincial Government"


In [96]:
#Checking for neighborhoods without a value (missing or empty string)
check = df[df['Neighborhood'].isna() | (df['Neighborhood'] == '')]
check
#Looks like there are none. Good.

Unnamed: 0,Postal code,Borough,Neighborhood


In [97]:
#Resetting index to start from 0:
df.reset_index(drop=True, inplace=True)

#Checking shape
df.shape

(103, 3)
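
The cleaning steps in this section can be tried on a small hand-made dataframe. The sketch below is illustrative only: the three rows are made up to mirror the columns of the scraped table, and it replaces ` / ` with `, ` (rather than only `/`), which is an assumption about the desired result, so that no stray space is left before the comma.

```python
import pandas as pd

# Toy rows mirroring the columns of the scraped table (made up for illustration)
toy = pd.DataFrame({
    'Postal code': ['M1A', 'M3A', 'M5A'],
    'Borough': ['Not assigned', 'North York', 'Downtown Toronto'],
    'Neighborhood': [None, 'Parkwoods', 'Regent Park / Harbourfront'],
})

# Keep only rows with an assigned borough and renumber the index from 0
clean = toy[toy['Borough'] != 'Not assigned'].reset_index(drop=True)

# Turn ' / ' separators into ', ' using a plain (non-regex) replacement
clean['Neighborhood'] = clean['Neighborhood'].str.replace(' / ', ', ', regex=False)

print(clean)
```

Filtering with a boolean mask and reassigning avoids `inplace=True`, which makes each step easier to rerun in a notebook.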

## 2. Retrieving latitudes and longitudes

The next task is to add the coordinates (latitude and longitude) for each postal code.

In [37]:
#Then import the module for geocoding
from geopy.geocoders import Nominatim # import geocoder

In [89]:
#Create a loop that goes through every row of the dataframe and gets the coordinates
count = 0
success = 0
lat_list = []
lon_list = []
geolocator = Nominatim(user_agent="foursquare_agent") # create the geocoder once

while count < len(df):
    postal_code = df.at[count,'Postal code']
    count += 1
    address = '{}, Toronto, Ontario'.format(postal_code)
    try:
        location = geolocator.geocode(address)
        latitude = location.latitude
        longitude = location.longitude
        success += 1
    except Exception: # no match (location is None) or a failed request
        latitude = ''
        longitude = ''
    lat_list.append(latitude)
    lon_list.append(longitude)

print(count) # 103
print(success) # 26
#Considering the poor result, we'll use the link to a CSV file containing the data instead

103
26
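
Only 26 of the 103 lookups succeeded. One way to make such a loop easier to test and to rerun is to pass the geocoding function in as an argument. The sketch below is an assumption about how this could be structured, not the code used above: `geocode_all` and `fake_geocode` are hypothetical names, and the stand-in geocoder never touches the network.

```python
from types import SimpleNamespace


def geocode_all(postal_codes, geocode):
    """Geocode each postal code; return (latitudes, longitudes, n_success).

    `geocode` is any callable that takes an address string and returns an
    object with .latitude and .longitude, or None when nothing is found.
    """
    lats, lons, success = [], [], 0
    for code in postal_codes:
        address = '{}, Toronto, Ontario'.format(code)
        try:
            location = geocode(address)
        except Exception:  # network error, timeout, rate limit, ...
            location = None
        if location is None:
            # Record the failure explicitly instead of an empty string
            lats.append(None)
            lons.append(None)
        else:
            lats.append(location.latitude)
            lons.append(location.longitude)
            success += 1
    return lats, lons, success


# Stand-in geocoder for demonstration only (hypothetical, no network access)
def fake_geocode(address):
    known = {
        'M5A, Toronto, Ontario': SimpleNamespace(latitude=43.65426,
                                                 longitude=-79.360636),
    }
    return known.get(address)


lats, lons, success = geocode_all(['M5A', 'M1A'], fake_geocode)
print(success, lats, lons)
```

With geopy, the same helper could presumably be called as `geocode_all(df['Postal code'], geolocator.geocode)`. Storing `None` rather than `''` keeps the coordinate columns numeric once they are put into a dataframe.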


In [87]:
#Getting CSV file from url
import requests
url = 'http://cocl.us/Geospatial_data'
r = requests.get(url)
with open("Geospatial_Coordinates.csv", "wb") as code:
    code.write(r.content)
print('Data downloaded!')

Data downloaded!


In [91]:
#Reading the CSV file into a dataframe
df_geo = pd.read_csv("Geospatial_Coordinates.csv")

#Renaming 'Postal Code' to 'Postal code' (lowercase c) so it matches the column name in df
df_geo.rename(columns={"Postal Code":"Postal code"}, errors="raise", inplace=True)

df_geo.head()

Unnamed: 0,Postal code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [98]:
#Merging the two datasets by postal code
df_full = pd.merge(df, df_geo, on='Postal code')
df_full.head()

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park , Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor , Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park , Ontario Provincial Government",43.662301,-79.389494
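
The default inner merge silently drops postal codes that are missing from either table. A quick check for that, sketched on made-up toy data (the code `M9Z` and its borough are invented for illustration), is a left merge with `indicator=True`:

```python
import pandas as pd

# Toy tables: 'M9Z' is a made-up code with no coordinates
left = pd.DataFrame({
    'Postal code': ['M3A', 'M4A', 'M9Z'],
    'Borough': ['North York', 'North York', 'Made-up Borough'],
})
right = pd.DataFrame({
    'Postal code': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# how='left' keeps every row of `left`; the '_merge' column says where each row came from
checked = pd.merge(left, right, on='Postal code', how='left', indicator=True)

# Postal codes that found no match in the coordinates table
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Postal code'].tolist()
print(unmatched)
```

An empty `unmatched` list would confirm that every postal code received coordinates.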


## 3. Exploring and clustering neighborhoods in Toronto 

For the purpose of this assignment, I have chosen to explore only the neighborhoods of Downtown Toronto.

In [100]:
#Creating a new dataframe containing only the neighborhoods located in Downtown Toronto
df_dt = df_full[df_full['Borough'] == 'Downtown Toronto']
df_dt.shape
#This returns a dataframe with 19 neighborhoods. Let's go.

(19, 5)

### Creating a map of Downtown Toronto

In [101]:
#Let's get the coordinates for Toronto
address = 'Toronto, Ontario'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

43.6534817 -79.3839347


In [104]:
import folium

# create map of Toronto using latitude and longitude values
map_dt = folium.Map(location=[latitude, longitude], zoom_start=13)

# add markers to map
for lat, lng, label in zip(df_dt['Latitude'], df_dt['Longitude'], df_dt['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.5).add_to(map_dt)
    
map_dt

### Getting information about the neighborhoods with the Foursquare API

In [106]:
#Preparing API credentials (placeholders shown here; real keys should not be published)
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET


In [109]:
#Defining a function to get nearby venues in a neighborhood
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    LIMIT = 100
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
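
The function above indexes straight into the JSON response, so a venue without a category or a response without groups would raise an exception. The sketch below shows one possible defensive version of the flattening step. `flatten_venues` is a hypothetical helper, and the payload is hand-written to match only the fields the function above reads; it is not a real API response.

```python
def flatten_venues(name, lat, lng, payload):
    """Turn one explore-style JSON payload into a list of row tuples."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to None when a venue has no category
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows


# Hand-written payload shaped like the fields read above
sample = {
    'response': {
        'groups': [{
            'items': [
                {'venue': {'name': 'Roselle Desserts',
                           'location': {'lat': 43.653447, 'lng': -79.362017},
                           'categories': [{'name': 'Bakery'}]}},
                # Made-up venue with no category, to exercise the fallback
                {'venue': {'name': 'Unnamed Kiosk',
                           'location': {'lat': 43.654, 'lng': -79.361},
                           'categories': []}},
            ]
        }]
    }
}

rows = flatten_venues('Regent Park , Harbourfront', 43.65426, -79.360636, sample)
for row in rows:
    print(row)
```

Keeping the parsing separate from the HTTP request also means it can be tested without credentials.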

In [110]:
#Running the above function on the Downtown Toronto neighborhoods:
dt_venues = getNearbyVenues(names=df_dt['Neighborhood'],
                                   latitudes=df_dt['Latitude'],
                                   longitudes=df_dt['Longitude']
                                  )

Regent Park , Harbourfront
Queen's Park , Ontario Provincial Government
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond , Adelaide , King
Harbourfront East , Union Station , Toronto Islands
Toronto Dominion Centre , Design Exchange
Commerce Court , Victoria Hotel
University of Toronto , Harbord
Kensington Market , Chinatown , Grange Park
CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst  Quay , South Niagara , Island airport
Rosedale
Stn A PO Boxes
St. James Town , Cabbagetown
First Canadian Place , Underground city
Church and Wellesley


In [111]:
#Checking the results
print(dt_venues.shape)
dt_venues.head()

(1280, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park , Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park , Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park , Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,"Regent Park , Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,"Regent Park , Harbourfront",43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot


### Analyzing each neighborhood

In [127]:
# one hot encoding
dt_onehot = pd.get_dummies(dt_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
dt_onehot['Neighborhood'] = dt_venues['Neighborhood'] 

#The new column is appended last, after the alphabetically ordered category columns. Move it to the first position.
cols = list(dt_onehot.columns.values)
cols.remove('Neighborhood')
cols.insert(0,'Neighborhood')
dt_onehot = dt_onehot.reindex(columns=cols)

dt_onehot.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,"Regent Park , Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Regent Park , Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Regent Park , Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Regent Park , Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Regent Park , Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


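Next, I group the rows by neighborhood and take the mean of each one-hot column. Because every value is 0 or 1, the mean is the share of a neighborhood's venues that falls into each category, which makes neighborhoods with different numbers of venues comparable. After this step, we have a dataset we can use for cluster analysis!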

In [128]:
dt_grouped = dt_onehot.groupby('Neighborhood').mean().reset_index()
dt_grouped

Unnamed: 0,Neighborhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0
1,"CN Tower , King and Spadina , Railway Lands , ...",0.0,0.0625,0.0625,0.0625,0.125,0.125,0.125,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,...,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.012987,0.0,0.012987
3,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Church and Wellesley,0.012658,0.0,0.0,0.0,0.0,0.0,0.0,0.012658,0.0,...,0.012658,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025316
5,"Commerce Court , Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,...,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0
6,"First Canadian Place , Underground city",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,...,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0
7,"Garden District, Ryerson",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,...,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0
8,"Harbourfront East , Union Station , Toronto Is...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0
9,"Kensington Market , Chinatown , Grange Park",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.039474,0.0,0.052632,0.013158,0.0,0.0
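
To see why the mean of one-hot columns is a share, here is a minimal sketch on made-up data (the neighborhoods `A` and `B` and their venues are invented for illustration):

```python
import pandas as pd

# Toy venue list: neighborhood A has 4 venues, neighborhood B has 2
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Park', 'Bakery',
                       'Park', 'Park'],
})

# One 0/1 column per category (dtype=float keeps the means numeric)
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# Mean of a 0/1 column = fraction of the neighborhood's venues in that category
shares = onehot.groupby('Neighborhood').mean()
print(shares)
```

Two of the four venues in `A` are coffee shops, so its `Coffee Shop` value is 0.5, and each row sums to 1.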


While not necessary for the cluster analysis, it can be nice to have a table with an overview of the most common venues for each neighborhood. To do this, we'll create a new dataframe with the top ten results for each neighborhood.

In [157]:
#Importing numpy library
import numpy as np

#Defining a function for putting the results into a dataframe
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

#Defining that I want the 10 most popular venues for each neighborhood
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError: # no special suffix beyond 'st', 'nd', 'rd'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = dt_grouped['Neighborhood']

for ind in np.arange(dt_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dt_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Cocktail Bar,Seafood Restaurant,Café,Farmers Market,Beer Bar,Bakery,Restaurant,Cheese Shop,Hotel
1,"CN Tower , King and Spadina , Railway Lands , ...",Airport Lounge,Airport Service,Airport Terminal,Boat or Ferry,Boutique,Rental Car Location,Coffee Shop,Sculpture Garden,Harbor / Marina,Airport Gate
2,Central Bay Street,Coffee Shop,Italian Restaurant,Sandwich Place,Burger Joint,Thai Restaurant,Japanese Restaurant,Spa,Salad Place,Gym / Fitness Center,Bubble Tea Shop
3,Christie,Grocery Store,Café,Coffee Shop,Park,Restaurant,Baby Store,Italian Restaurant,Athletics & Sports,Diner,Candy Store
4,Church and Wellesley,Coffee Shop,Japanese Restaurant,Gay Bar,Sushi Restaurant,Restaurant,Yoga Studio,Burger Joint,Hotel,Café,Mediterranean Restaurant
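
The column names above rely on a three-item suffix list with a fallback to `th`, which works for the top ten but would produce labels such as "21th" for longer lists. A small helper, sketched here under the hypothetical name `ordinal`, covers the general case:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th (and 111th, 112th, ...) are irregular
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


# Rebuild the same column names as above for the top ten venues
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)
]
print(columns[1], '|', columns[3], '|', columns[10])
```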


### K-means cluster analysis

In [135]:
#Import the k-means implementation from scikit-learn
from sklearn.cluster import KMeans



In [158]:
# I want to group the neighborhoods into four clusters
kclusters = 4

dt_grouped_clustering = dt_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dt_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 2, 1, 3, 1, 1, 1, 1, 1, 1], dtype=int32)
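
A quick way to see how evenly the neighborhoods are spread over the clusters is to count the labels. The sketch below uses only the ten labels printed above, copied by hand, so it covers the first ten neighborhoods rather than all nineteen.

```python
from collections import Counter

# First ten labels, copied from the output above
labels = [1, 2, 1, 3, 1, 1, 1, 1, 1, 1]

# Count how many neighborhoods fall into each label
sizes = Counter(labels)
for label, size in sorted(sizes.items()):
    print('label {}: {} neighborhoods'.format(label, size))
```

On the full result, `Counter(kmeans.labels_.tolist())` should give the same kind of summary. One dominant label, as here, suggests that most of downtown looks alike by venue mix and that the other clusters are outliers.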

Having created the clusters, I will now add them to the dataset with the most common venues.

In [159]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

dt_merged = df_dt

dt_merged = dt_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

dt_merged.head() # check the last columns!

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,M5A,Downtown Toronto,"Regent Park , Harbourfront",43.65426,-79.360636,1,Coffee Shop,Bakery,Pub,Park,Mexican Restaurant,Breakfast Spot,Restaurant,Café,Yoga Studio,Performing Arts Venue
4,M7A,Downtown Toronto,"Queen's Park , Ontario Provincial Government",43.662301,-79.389494,1,Coffee Shop,Yoga Studio,Creperie,Mexican Restaurant,Juice Bar,Italian Restaurant,Hobby Shop,Fried Chicken Joint,Distribution Center,Discount Store
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,1,Clothing Store,Coffee Shop,Cosmetics Shop,Middle Eastern Restaurant,Café,Japanese Restaurant,Bubble Tea Shop,Diner,Electronics Store,Ramen Restaurant
15,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,1,Coffee Shop,Café,Restaurant,American Restaurant,Beer Bar,Cosmetics Shop,Japanese Restaurant,Italian Restaurant,Cocktail Bar,Diner
20,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,1,Coffee Shop,Cocktail Bar,Seafood Restaurant,Café,Farmers Market,Beer Bar,Bakery,Restaurant,Cheese Shop,Hotel


### Clusters

Now, we're almost done! To visualize the location of the clusters, I'll generate a color-coded map.

In [160]:
import folium 
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(dt_merged['Latitude'], dt_merged['Longitude'], dt_merged['Neighborhood'], dt_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Cluster 1: Fresh air and a healthy body
The headings below number the clusters 1 to 4, which correspond to the k-means labels 0 to 3. This first cluster is apparently the place to be in Downtown if you love nature and are the sporty type. Parks, trails and playgrounds make up the top three, with yoga studios, dance studios and a dog run also represented, leaving little room for restaurants.

In [161]:
dt_merged.loc[dt_merged['Cluster Labels'] == 0, dt_merged.columns[[2] + list(range(5, dt_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
91,Rosedale,0,Park,Trail,Playground,Yoga Studio,Dance Studio,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center


### Cluster 2: City life joys
This is the largest cluster, and the obvious common feature is the abundance of coffee shops. In these areas, you get to enjoy city life to the fullest, with restaurants, cafés and gyms en masse.

In [162]:
dt_merged.loc[dt_merged['Cluster Labels'] == 1, dt_merged.columns[[2] + list(range(5, dt_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,"Regent Park , Harbourfront",1,Coffee Shop,Bakery,Pub,Park,Mexican Restaurant,Breakfast Spot,Restaurant,Café,Yoga Studio,Performing Arts Venue
4,"Queen's Park , Ontario Provincial Government",1,Coffee Shop,Yoga Studio,Creperie,Mexican Restaurant,Juice Bar,Italian Restaurant,Hobby Shop,Fried Chicken Joint,Distribution Center,Discount Store
9,"Garden District, Ryerson",1,Clothing Store,Coffee Shop,Cosmetics Shop,Middle Eastern Restaurant,Café,Japanese Restaurant,Bubble Tea Shop,Diner,Electronics Store,Ramen Restaurant
15,St. James Town,1,Coffee Shop,Café,Restaurant,American Restaurant,Beer Bar,Cosmetics Shop,Japanese Restaurant,Italian Restaurant,Cocktail Bar,Diner
20,Berczy Park,1,Coffee Shop,Cocktail Bar,Seafood Restaurant,Café,Farmers Market,Beer Bar,Bakery,Restaurant,Cheese Shop,Hotel
24,Central Bay Street,1,Coffee Shop,Italian Restaurant,Sandwich Place,Burger Joint,Thai Restaurant,Japanese Restaurant,Spa,Salad Place,Gym / Fitness Center,Bubble Tea Shop
30,"Richmond , Adelaide , King",1,Coffee Shop,Restaurant,Café,Bar,Hotel,Bakery,Thai Restaurant,Gym,Steakhouse,Lounge
36,"Harbourfront East , Union Station , Toronto Is...",1,Coffee Shop,Aquarium,Restaurant,Hotel,Café,Italian Restaurant,Fried Chicken Joint,Scenic Lookout,Brewery,Sporting Goods Shop
42,"Toronto Dominion Centre , Design Exchange",1,Coffee Shop,Café,Hotel,Restaurant,American Restaurant,Gastropub,Seafood Restaurant,Bar,Japanese Restaurant,Sporting Goods Shop
48,"Commerce Court , Victoria Hotel",1,Coffee Shop,Café,Restaurant,Hotel,American Restaurant,Gym,Gastropub,Seafood Restaurant,Japanese Restaurant,Vegetarian / Vegan Restaurant


### Cluster 3: Airport
There is not much to say here. The island airport dominates this cluster: airport lounges, services, terminals and gates all appear in the top ten, which makes it unlike any other part of downtown.

In [163]:
dt_merged.loc[dt_merged['Cluster Labels'] == 2, dt_merged.columns[[2] + list(range(5, dt_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
87,"CN Tower , King and Spadina , Railway Lands , ...",2,Airport Lounge,Airport Service,Airport Terminal,Boat or Ferry,Boutique,Rental Car Location,Coffee Shop,Sculpture Garden,Harbor / Marina,Airport Gate


### Cluster 4: Family friendly
What sets Christie apart from the rest of downtown is the abundance of grocery stores, candy stores and baby stores. However, you can still get your daily latte kick or a romantic Italian dinner experience.

In [164]:
dt_merged.loc[dt_merged['Cluster Labels'] == 3, dt_merged.columns[[2] + list(range(5, dt_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
25,Christie,3,Grocery Store,Café,Coffee Shop,Park,Restaurant,Baby Store,Italian Restaurant,Athletics & Sports,Diner,Candy Store


## Wrap-up
This concludes my little downtown Toronto analysis. 