# 1. Creating and cleaning the DataFrame from the list in Wikipedia

#### The first thing to do is import all relevant dependencies

In [3]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup

#### Then we download the web page that contains the table and create the BeautifulSoup object to parse it. This lets us inspect the HTML code of the page and locate the table we need.

In [4]:
html_doc = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
page = requests.get(html_doc) #download the web page where the table is located

soup = BeautifulSoup(page.content, 'html.parser') #create the BeautifulSoup object to parse the HTML

soup

<!DOCTYPE html>

<html class="client-nojs" dir="ltr" lang="en">
<head>
<meta charset="utf-8"/>
<title>List of postal codes of Canada: M - Wikipedia</title>

#### As we can see, the body of the table is enclosed in a `<tbody>` tag, the headers (column names) are in `<th>` tags, and the data cells are in `<td>` tags. Note that `soup.tbody` returns the first `<tbody>` on the page, which here is the postal code table. We use the `find_all()` method of the BeautifulSoup object to build one list with the column names and another with the data, and we remove the newline characters (`\n`) because we do not need them in the dataframe.

In [159]:
column_pc = [] #names of the columns of the table
data_pc = [] #data of the table

for elem in soup.tbody.find_all("th"): #iterate only over the header cells (<th>) of the table
    column_pc.append(elem.text.replace("\n","")) 
    
for elem in soup.tbody.find_all("td"): #iterate only over the data cells (<td>) of the table
    data_pc.append(elem.text.replace("\n",""))
    
show = 10

print("Column names: {}".format(column_pc))
print("First {} elements of data: {}".format(show,data_pc[0:show]))

Column names: ['Postal Code', 'Borough', 'Neighbourhood']
First 10 elements of data: ['M1A', 'Not assigned', 'Not assigned', 'M2A', 'Not assigned', 'Not assigned', 'M3A', 'North York', 'Parkwoods', 'M4A']
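To see the `<th>`/`<td>` extraction in isolation, here is a minimal sketch on a small, hard-coded HTML table. The markup is made up for illustration (it is not copied from the Wikipedia page), and the `demo_` names are hypothetical so they do not overwrite the notebook's `soup`. It uses `get_text(strip=True)`, which strips all surrounding whitespace (not just `\n`), as an alternative to `replace("\n", "")`.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened table with the same three columns as the Wikipedia list
demo_html = """
<table>
<tbody>
<tr><th>Postal Code</th><th>Borough</th><th>Neighbourhood</th></tr>
<tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
<tr><td>M3A</td><td>North York</td><td>Parkwoods
</td></tr>
</tbody>
</table>
"""

demo_soup = BeautifulSoup(demo_html, "html.parser")

# <th> cells hold the column names; <td> cells hold the data as one flat list
demo_columns = [th.get_text(strip=True) for th in demo_soup.tbody.find_all("th")]
demo_cells = [td.get_text(strip=True) for td in demo_soup.tbody.find_all("td")]

print(demo_columns)
print(demo_cells)
```

As in the notebook, the data comes back as a single flat list, row after row, which is why it still has to be split into columns.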


#### Now we only have to create the dataframe with the column names from the appropriate list and fill it with the data. Note that the cells of all three columns are interleaved in a single flat list (postal code, borough, neighbourhood, postal code, ...), so we have to separate them into the columns of the dataframe.

In [6]:
pc_df = pd.DataFrame(columns=column_pc) #Create the dataframe with the names of the columns from the original table
n_cols = len(column_pc) #number of columns, used as the step when splitting the flat list

for j in range(n_cols):
    pc_df[column_pc[j]] = [data_pc[i] for i in range(j, len(data_pc), n_cols)] #Column j gets every n_cols-th element of data_pc, starting at position j

pc_df.head(10)

,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
7,M8A,Not assigned,Not assigned
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"
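Because the cells arrive as one flat, row-major list, the same split can also be written as a reshape into a (rows × columns) array. A minimal sketch on toy data (a few values copied from the output above), assuming the list length is an exact multiple of the number of columns; the `toy_` names are hypothetical:

```python
import numpy as np
import pandas as pd

toy_columns = ["Postal Code", "Borough", "Neighbourhood"]
toy_cells = ["M1A", "Not assigned", "Not assigned",
             "M3A", "North York", "Parkwoods",
             "M4A", "North York", "Victoria Village"]

n_columns = len(toy_columns)
# Guard: a ragged list would mean a cell was missed while scraping
assert len(toy_cells) % n_columns == 0

# Row-major reshape: every consecutive group of n_columns cells becomes one row
toy_df = pd.DataFrame(np.array(toy_cells).reshape(-1, n_columns), columns=toy_columns)
print(toy_df)
```

The explicit length check is the main advantage over slicing with a fixed step: a missing cell fails loudly instead of silently shifting values into the wrong columns.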


#### We have to drop all the rows in which the Borough is listed as *Not assigned*, so we create a new dataframe that keeps only the rows with an assigned borough.

In [7]:
pc_df_proc = pc_df[pc_df["Borough"] != "Not assigned"].reset_index(drop = True) #Drop every row in which Borough has a value of "Not assigned"

pc_df_proc.head(10)

,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


#### To have only one row for each unique postal code, we loop over the postal codes. For each one, we select all the rows with that postal code, append the neighbourhoods of the other rows to the first row (separated by commas), and keep only that first row. Finally, we concatenate the combined rows into the desired dataframe.

In [8]:
unique_pc = pc_df_proc["Postal Code"].unique() #unique postal codes, in order of first appearance
combined_rows = [] #one single-row dataframe per postal code

for code in unique_pc:
    df_aux = pc_df_proc[pc_df_proc["Postal Code"] == code].reset_index(drop = True)
    
    # append the neighbourhoods of every other row (including the last one) to the first row
    for i in range(1, df_aux.shape[0]):
        df_aux.loc[0,"Neighbourhood"] = df_aux.loc[0,"Neighbourhood"] + ", " + df_aux.loc[i,"Neighbourhood"]
    
    combined_rows.append(df_aux.iloc[[0]]) #keep only the first (combined) row
    
# DataFrame.append was removed in pandas 2.0, so concatenate all rows once at the end
pc_df_proc = pd.concat(combined_rows, ignore_index=True)

print("There are {} rows in the processed dataframe, in which {} have a unique postal code.".format(pc_df_proc.shape[0], pc_df_proc["Postal Code"].nunique()))

There are 103 rows in the processed dataframe, in which 103 have a unique postal code.
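The same combination can be expressed without an explicit loop, using `groupby` with `", ".join` as the aggregation. A minimal sketch on a toy dataframe (rows invented for illustration); `sort=False` keeps the postal codes in their order of first appearance, as the loop does, and the borough is taken from the first row of each group:

```python
import pandas as pd

toy = pd.DataFrame({
    "Postal Code": ["M5A", "M5A", "M3A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York"],
    "Neighbourhood": ["Regent Park", "Harbourfront", "Parkwoods"],
})

# One row per postal code: keep the first borough, join all neighbourhoods with ", "
combined = (
    toy.groupby("Postal Code", sort=False, as_index=False)
       .agg({"Borough": "first", "Neighbourhood": ", ".join})
)
print(combined)
```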


#### We replace every neighbourhood listed as *Not assigned* with the name of its borough. Every remaining borough is assigned, because we already removed the rows whose borough was *Not assigned*.

In [9]:
not_assigned = pc_df_proc["Neighbourhood"] == "Not assigned" #rows whose neighbourhood is missing
pc_df_proc.loc[not_assigned, "Neighbourhood"] = pc_df_proc.loc[not_assigned, "Borough"] #use the borough name instead

print("There are {} rows in the dataframe, in which {} have a Neighborhood as Not assigned.".format(pc_df_proc.shape[0], pc_df_proc[pc_df_proc["Neighbourhood"] == "Not assigned"].shape[0]))

There are 103 rows in the dataframe, in which 0 have a Neighborhood as Not assigned.
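The replacement can also be written as a single vectorized step with `Series.mask`, which takes the value from another Series wherever the condition is true. A sketch on toy rows (invented for illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    "Borough": ["Queen's Park", "North York"],
    "Neighbourhood": ["Not assigned", "Parkwoods"],
})

# Where the neighbourhood is "Not assigned", take the borough name instead
toy["Neighbourhood"] = toy["Neighbourhood"].mask(
    toy["Neighbourhood"] == "Not assigned", toy["Borough"]
)
print(toy["Neighbourhood"].tolist())
```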


#### We can see the number of rows of the resulting dataframe and compare it to the original one

In [10]:
print("The number of rows of the original raw dataframe is: {}".format(pc_df.shape[0]))
print("The number of rows of the processed dataframe is: {}".format(pc_df_proc.shape[0]))

The number of rows of the original raw dataframe is: 180
The number of rows of the processed dataframe is: 103


# 2. Getting the Latitude and Longitude of each Postal Code

#### We import all relevant dependencies

In [12]:
!pip install geocoder

import geocoder


#### We try to use the geocoder package first. Since it allows only a limited number of requests per day, we test it on a single postal code to see whether it can return the coordinates reliably enough to be used for every postal code.

In [13]:
postal_code = "M1A"
tries = 30 #how many iterations are tried before the while loop stops

# initialize the variable to None
lat_lng_coords = None

aux = 0
# loop until it gets the coordinates or tried enough times
while(lat_lng_coords is None):
    g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
    lat_lng_coords = g.latlng
    if aux == tries: 
        lat_lng_coords = ("Error","Error")
    aux += 1

latitude = lat_lng_coords[0]
longitude = lat_lng_coords[1]

print(latitude, longitude)

Error Error
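The retry loop above can be written as a small reusable helper that stops after a fixed number of attempts and can pause between them. This is a minimal sketch: `geocode_with_retries` and the stand-in `flaky_lookup` are hypothetical names, and `flaky_lookup` only simulates a call such as `geocoder.google(...).latlng` (its coordinates are the M3A values from the course CSV shown below).

```python
import time

def geocode_with_retries(lookup, query, max_tries=30, delay=0.0):
    """Call lookup(query) until it returns something other than None.

    Returns None if every attempt fails, instead of a placeholder like ("Error", "Error").
    """
    for _ in range(max_tries):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(delay)  # optional pause between attempts, in seconds
    return None

# Hypothetical stand-in for a geocoding call: fails twice, then succeeds
attempts = {"n": 0}

def flaky_lookup(query):
    attempts["n"] += 1
    return [43.753259, -79.329656] if attempts["n"] >= 3 else None

found = geocode_with_retries(flaky_lookup, "M3A, Toronto, Ontario")
not_found = geocode_with_retries(lambda q: None, "M1A, Toronto, Ontario", max_tries=5)
print(found, not_found)
```

Returning `None` rather than a pair of `"Error"` strings keeps failed lookups easy to detect later (for example with `isna()` once the results are in a dataframe).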


#### Since the geocoder package did not return any coordinates, we use the CSV file provided in the course instead. First, we download the CSV file.

In [15]:
!wget -O latlong.csv http://cocl.us/Geospatial_data

--2020-08-30 17:35:38--  http://cocl.us/Geospatial_data
Resolving cocl.us (cocl.us)... 169.55.161.7
Connecting to cocl.us (cocl.us)|169.55.161.7|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://cocl.us/Geospatial_data [following]
--2020-08-30 17:35:39--  https://cocl.us/Geospatial_data
Connecting to cocl.us (cocl.us)|169.55.161.7|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2020-08-30 17:35:40--  https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv
Resolving ibm.box.com (ibm.box.com)... 103.116.4.197
Connecting to ibm.box.com (ibm.box.com)|103.116.4.197|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /public/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2020-08-30 17:35:41--  https://ibm.box.com/public/static/9afzr83pps4pwf2smjjcf1y5mvgb

#### Then, we convert it to a dataframe.

In [16]:
df_latlong = pd.read_csv("latlong.csv")

df_latlong.head()

,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


#### We check that it has the same number of rows as our processed dataframe.

In [17]:
print("The number of rows of the provided dataframe is: {}".format(df_latlong.shape[0]))

The number of rows of the provided dataframe is: 103
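Matching row counts do not guarantee that the same postal codes appear in both dataframes. A stricter check compares the sets of keys. The sketch below uses two tiny hypothetical frames so it runs on its own; in the notebook, `pc_df_proc` and `df_latlong` would take their place.

```python
import pandas as pd

# Hypothetical stand-ins for pc_df_proc and df_latlong (only the key column matters here)
left = pd.DataFrame({"Postal Code": ["M3A", "M4A", "M5A"]})
right = pd.DataFrame({"Postal Code": ["M5A", "M3A", "M6A"]})

left_keys = set(left["Postal Code"])
right_keys = set(right["Postal Code"])

print(len(left) == len(right))         # same row count...
print(sorted(left_keys - right_keys))  # ...but codes without coordinates
print(sorted(right_keys - left_keys))  # ...and coordinates without a matching code
```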


#### We merge both dataframes using the *Postal Code* column as reference, and we get the desired dataframe.

In [18]:
pc_df_proc = pc_df_proc.merge(df_latlong, on="Postal Code")

pc_df_proc.head(12)

,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
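`DataFrame.merge` can also enforce the one-row-per-postal-code assumption itself: `validate="one_to_one"` raises `pandas.errors.MergeError` if a key is duplicated on either side, and `indicator=True` adds a `_merge` column showing whether each row found a match. A sketch with toy frames (coordinates copied from the table above); it uses `how="left"` so that unmatched rows would show up as `left_only` instead of being dropped, as the default inner merge does:

```python
import pandas as pd

codes = pd.DataFrame({"Postal Code": ["M3A", "M4A"],
                      "Borough": ["North York", "North York"]})
coords = pd.DataFrame({"Postal Code": ["M3A", "M4A"],
                       "Latitude": [43.753259, 43.725882],
                       "Longitude": [-79.329656, -79.315572]})

merged = codes.merge(coords, on="Postal Code", how="left",
                     validate="one_to_one", indicator=True)

# Every row should report "both"; "left_only" would mean missing coordinates
print(merged["_merge"].value_counts())

# A duplicated key on one side makes the validation fail
dup = pd.concat([coords, coords.iloc[[0]]], ignore_index=True)
raised = False
try:
    codes.merge(dup, on="Postal Code", validate="one_to_one")
except pd.errors.MergeError as err:
    raised = True
    print("MergeError:", err)
```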


# 3. Cluster analysis

#### We import all relevant dependencies

In [19]:
import json # library to handle JSON files
from pandas import json_normalize # transform JSON into a pandas dataframe (the pandas.io.json import path is deprecated)

from geopy.geocoders import Nominatim 

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes  #Only if it hasn't been installed before
import folium # map rendering library

#### We define a user_agent, called *tor_explorer*, to create an instance of the Nominatim geocoder, and we use it to get the geographical coordinates of Toronto.

In [20]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="tor_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


#### We create a map of Toronto with the Neighborhoods superimposed.

In [160]:
# create map of Toronto using latitude and longitude values
map_toronto_full = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighbourhood in zip(pc_df_proc['Latitude'], pc_df_proc['Longitude'], pc_df_proc['Borough'], pc_df_proc['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto_full)
    
map_toronto_full

#### To reduce the complexity, we only keep the neighbourhoods whose borough name contains the word "Toronto".

In [22]:
toronto_df = pc_df_proc[pc_df_proc["Borough"].str.contains("Toronto")].reset_index(drop = True)

print(toronto_df.shape[0]) #number of selected neighbourhoods
toronto_df.head()

39


,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M4E,East Toronto,The Beaches,43.676357,-79.293031


#### We show the new selection of neighbourhoods on the map.

In [24]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighbourhood in zip(toronto_df['Latitude'], toronto_df['Longitude'], toronto_df['Borough'], toronto_df['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)
    
map_toronto

#### We use the Foursquare API to explore the neighbourhoods. We enter the Foursquare credentials.

In [None]:
# @hidden_cell

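from getpass import getpass # reads input without echoing it in the notebook

CLIENT_ID = input("Enter Foursquare CLIENT ID: ") # Foursquare ID
CLIENT_SECRET = getpass("Enter Foursquare CLIENT SECRET: ") # Foursquare Secret (hidden while typing)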
VERSION = '20180605' # Foursquare API version

#### We create a function that, for each neighbourhood, requests up to `LIMIT` venues within a given radius (in metres) of its latitude and longitude. Since the neighbourhoods are located close together, a short radius of 500 metres is used by default.

In [43]:
def getNearbyVenues(names, latitudes, longitudes, LIMIT, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
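The nested list comprehension inside `getNearbyVenues` pulls a few fields out of each item of the response and then flattens the per-neighbourhood lists into one table. The sketch below runs the same extraction on a hand-written mock of one response's `items` list. The mock contains only the keys the function reads (venue values copied from the first rows of `toronto_venues`); it is not a complete Foursquare response, and the `mock_` names are hypothetical.

```python
import pandas as pd

# Mock of response["groups"][0]["items"], keeping only the keys getNearbyVenues reads
mock_items = [
    {"venue": {"name": "Roselle Desserts",
               "location": {"lat": 43.653447, "lng": -79.362017},
               "categories": [{"name": "Bakery"}]}},
    {"venue": {"name": "Tandem Coffee",
               "location": {"lat": 43.653559, "lng": -79.361809},
               "categories": [{"name": "Coffee Shop"}]}},
]

mock_name, mock_lat, mock_lng = "Regent Park, Harbourfront", 43.65426, -79.360636

# One list of tuples per neighbourhood, as in venues_list
mock_venues_list = [[(mock_name, mock_lat, mock_lng,
                      v["venue"]["name"],
                      v["venue"]["location"]["lat"],
                      v["venue"]["location"]["lng"],
                      v["venue"]["categories"][0]["name"]) for v in mock_items]]

# Flatten the per-neighbourhood lists into a single list of rows
rows = [item for venue_list in mock_venues_list for item in venue_list]
mock_venues = pd.DataFrame(rows, columns=["Neighborhood", "Neighborhood Latitude",
                                          "Neighborhood Longitude", "Venue",
                                          "Venue Latitude", "Venue Longitude",
                                          "Venue Category"])
print(mock_venues[["Venue", "Venue Category"]])
```

Note that `categories[0]` assumes every venue has at least one category; an item with an empty `categories` list would raise an `IndexError` at that point.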

#### Now we simply run the function on the dataframe with the neighbourhoods of Toronto.

In [60]:
limit = 100

toronto_venues = getNearbyVenues(names=toronto_df['Neighbourhood'],
                                   latitudes=toronto_df['Latitude'],
                                   longitudes=toronto_df['Longitude'],
                                   LIMIT = limit
                                  )

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
The Danforth West, Riverdale
Toronto Dominion Centre, Design Exchange
Brockton, Parkdale Village, Exhibition Place
India Bazaar, The Beaches West
Commerce Court, Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North & West, Forest Hill Road Park
High Park, The Junction South
North Toronto West,  Lawrence Park
The Annex, North Midtown, Yorkville
Parkdale, Roncesvalles
Davisville
University of Toronto, Harbord
Runnymede, Swansea
Moore Park, Summerhill East
Kensington Market, Chinatown, Grange Park
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport


#### We rename the *Neighborhood* column to the Canadian spelling, *Neighbourhood*, to match the other dataframes.

In [61]:
toronto_venues.rename(columns={'Neighborhood':'Neighbourhood'}, inplace=True)

In [106]:
toronto_venues.head()

,Neighbourhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,"Regent Park, Harbourfront",43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant


#### We drop all the venues whose category is *Neighborhood*, because a neighbourhood is not considered a venue in this particular study.

In [114]:
toronto_venues[toronto_venues['Venue Category'] == "Neighborhood"]

,Neighbourhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
266,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
417,"Richmond, Adelaide, King",43.650571,-79.384568,Downtown Toronto,43.653232,-79.385296,Neighborhood
524,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752,Harbourfront,43.639526,-79.380688,Neighborhood
957,Studio District,43.659526,-79.340923,Leslieville,43.66207,-79.337856,Neighborhood


In [119]:
toronto_venues = toronto_venues.drop(toronto_venues[toronto_venues['Venue Category'] == "Neighborhood"].index)

toronto_venues[toronto_venues['Venue Category'] == "Neighborhood"]

,Neighbourhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category


#### We check the total number of venues to make sure we have enough data for meaningful results.

In [120]:
toronto_venues.shape[0]

1630

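#### We also check how many venues we have for each neighbourhood. Note that each request returns at most `limit` (100) venues, so a count of 100 means the result was capped.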

In [121]:
toronto_venues.groupby('Neighbourhood').count()

Neighbourhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Berczy Park,57,57,57,57,57,57
"Brockton, Parkdale Village, Exhibition Place",22,22,22,22,22,22
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto",14,14,14,14,14,14
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",17,17,17,17,17,17
Central Bay Street,64,64,64,64,64,64
Christie,18,18,18,18,18,18
Church and Wellesley,76,76,76,76,76,76
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
Davisville,32,32,32,32,32,32
Davisville North,9,9,9,9,9,9


#### And how many unique categories there are.

In [122]:
print('There are {} unique categories.'.format(toronto_venues['Venue Category'].nunique()))

There are 235 unique categories.


#### We consider this enough data, so we one-hot encode the venue categories (one dummy column per category) and move the Neighbourhood column to the front.

In [123]:
# one hot encoding (dtype=int keeps 0/1 values; pandas 2.0 and later return booleans by default)
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)

# add neighborhood column back to dataframe
toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood'] 

# move neighborhood column to the first column
ind_minus_neig = list(toronto_onehot.columns)
ind_minus_neig.remove("Neighbourhood")

fixed_columns = ["Neighbourhood"] + ind_minus_neig
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

,Neighbourhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,...,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### We group the rows by neighbourhood and take the mean of each dummy column, which gives the frequency of each category among the venues of that neighbourhood.

In [124]:
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()
toronto_grouped

,Neighbourhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,...,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.017544,0.035088,0.0,0.0,0.0,0.017544,0.017544,0.0,0.035088,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,...,0.0,0.0,0.035088,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.017544,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0
1,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Business reply mail Processing Centre, South C...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.058824,0.058824,0.058824,0.117647,0.176471,0.117647,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.03125,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.015625,0.0,0.015625
5,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.055556,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Church and Wellesley,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.013158,0.0,0.026316,0.0,0.013158,...,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.013158,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316
7,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.01,...,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0
8,Davisville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,...,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Davisville North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
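To make these numbers concrete: after one-hot encoding, the mean of a dummy column within a neighbourhood is the fraction of that neighbourhood's venues that fall in that category. A toy example with two invented neighbourhoods, "A" and "B" (the `toy_`, `onehot` and `grouped` names are hypothetical):

```python
import pandas as pd

toy_venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Park", "Bakery", "Park", "Park"],
})

# One 0/1 column per category, named after the category itself
onehot = pd.get_dummies(toy_venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
onehot.insert(0, "Neighbourhood", toy_venues["Neighbourhood"])

# Mean of each 0/1 column = share of the neighbourhood's venues in that category
grouped = onehot.groupby("Neighbourhood").mean().reset_index()
print(grouped)
```

Neighbourhood A has four venues, two of them cafés, so its Café value is 0.5 and its Park and Bakery values are 0.25 each; both venues of B are parks, so its Park value is 1.0.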


#### We create a function that sorts the venue categories of a neighbourhood by frequency and returns the most common ones.

In [125]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
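An equivalent way to pick the top categories is `Series.nlargest`, which returns the n largest values already in descending order. A sketch on a toy row shaped like a row of `toronto_grouped` (the first entry is the neighbourhood name, the rest are frequencies; values invented). `top_categories` is a hypothetical name, and ties may come out in a different order than with `sort_values`.

```python
import pandas as pd

# Toy row: neighbourhood name followed by category frequencies (mixed types -> object dtype)
toy_row = pd.Series({"Neighbourhood": "A", "Bakery": 0.25, "Café": 0.5, "Park": 0.2, "Pub": 0.05})

def top_categories(row, n):
    # Skip the name, convert the frequencies to float (nlargest rejects object dtype), keep the n largest
    return row.iloc[1:].astype(float).nlargest(n).index.values

print(top_categories(toy_row, 2))
```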

#### We create a dataframe with the top 10 venues per Neighborhood.

In [126]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError: # positions after the 3rd use the "th" suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Pub,Cheese Shop,Beer Bar,Restaurant,Bakery,Cocktail Bar,Café,Seafood Restaurant,Farmers Market
1,"Brockton, Parkdale Village, Exhibition Place",Café,Breakfast Spot,Coffee Shop,Nightclub,Stadium,Intersection,Bakery,Italian Restaurant,Climbing Gym,Restaurant
2,"Business reply mail Processing Centre, South C...",Light Rail Station,Garden,Park,Pizza Place,Restaurant,Burrito Place,Brewery,Skate Park,Farmers Market,Fast Food Restaurant
3,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Airport Lounge,Airport Terminal,Harbor / Marina,Coffee Shop,Boutique,Boat or Ferry,Rental Car Location,Plane,Sculpture Garden
4,Central Bay Street,Coffee Shop,Sandwich Place,Café,Italian Restaurant,Salad Place,Japanese Restaurant,Bubble Tea Shop,Burger Joint,Juice Bar,Thai Restaurant
5,Christie,Grocery Store,Café,Park,Diner,Candy Store,Baby Store,Restaurant,Athletics & Sports,Italian Restaurant,Bank
6,Church and Wellesley,Coffee Shop,Sushi Restaurant,Gay Bar,Japanese Restaurant,Restaurant,Yoga Studio,Pub,Bubble Tea Shop,Hotel,Café
7,"Commerce Court, Victoria Hotel",Coffee Shop,Restaurant,Café,Hotel,Gym,American Restaurant,Japanese Restaurant,Seafood Restaurant,Gastropub,Vegetarian / Vegan Restaurant
8,Davisville,Dessert Shop,Sandwich Place,Café,Italian Restaurant,Gym,Sushi Restaurant,Pizza Place,Coffee Shop,Brewery,Diner
9,Davisville North,Park,Pizza Place,Breakfast Spot,Sandwich Place,Food & Drink Shop,Dance Studio,Hotel,Department Store,Gym / Fitness Center,Cosmetics Shop
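The `indicators` list only covers 1st to 3rd, which is enough for ten columns but would produce "21th" or "22th" for longer lists. A general ordinal helper, sketched below as a hypothetical `ordinal` function, also handles the 11th to 13th exception:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 10th-20th (including 11th, 12th, 13th) always use "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

# Column names for the top 10 venues, built with the helper
demo_columns = ["Neighbourhood"] + ["{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)]
print(demo_columns)
```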


In [127]:
print(toronto_grouped.shape)

(39, 236)


#### Now it is finally time to create the clusters. We use five clusters: too few clusters would leave too much error by grouping dissimilar neighbourhoods together, and too many would split them into overly specialized groups. Several values were tested, and 5 gave the best results.

In [141]:
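from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

# keep only the venue-frequency columns; the neighbourhood name is not a feature
toronto_grouped_clustering = toronto_grouped.drop(columns='Neighbourhood')

# run k-means clustering (n_init=10 matches the older scikit-learn default, keeping the labels reproducible)
kmeans = KMeans(n_clusters=kclusters, n_init=10, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_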

array([2, 2, 1, 2, 2, 2, 2, 2, 1, 1, 1, 2, 3, 2, 2, 2, 1, 2, 1, 2, 3, 2,
       2, 2, 2, 2, 3, 4, 2, 2, 2, 2, 2, 2, 1, 0, 2, 2, 2], dtype=int32)
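Counting how many neighbourhoods received each label makes the balance between clusters visible at a glance. The sketch copies the labels printed above so it runs on its own; in the notebook, `np.bincount(kmeans.labels_)` gives the same result.

```python
import numpy as np

# Labels copied from the kmeans.labels_ output above
labels = np.array([2, 2, 1, 2, 2, 2, 2, 2, 1, 1, 1, 2, 3, 2, 2, 2, 1, 2, 1, 2, 3, 2,
                   2, 2, 2, 2, 3, 4, 2, 2, 2, 2, 2, 2, 1, 0, 2, 2, 2])

# bincount returns how many rows carry each label 0..max(label)
sizes = np.bincount(labels)
print(sizes.tolist())  # → [1, 7, 27, 3, 1]
```

Cluster 2 holds 27 of the 39 neighbourhoods, while clusters 0 and 4 contain a single neighbourhood each.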

#### We assign the cluster label to each neighbourhood.

In [142]:
# add clustering labels; first drop a "Cluster Labels" column left over from a previous run
# (useful when re-running with a different number of clusters to find the optimal one)
if 'Cluster Labels' in neighborhoods_venues_sorted.columns:
    neighborhoods_venues_sorted.drop(columns='Cluster Labels', inplace=True)
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_df

# join the cluster labels and top venues onto toronto_df, which already holds latitude/longitude for each neighbourhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

toronto_merged.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,2,Coffee Shop,Park,Bakery,Café,Pub,Breakfast Spot,Theater,Cosmetics Shop,Shoe Store,Restaurant
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,2,Coffee Shop,Yoga Studio,Bank,Beer Bar,Smoothie Shop,Sandwich Place,Café,Portuguese Restaurant,Chinese Restaurant,Persian Restaurant
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,2,Clothing Store,Coffee Shop,Bubble Tea Shop,Japanese Restaurant,Café,Cosmetics Shop,Hotel,Electronics Store,Fast Food Restaurant,Pizza Place
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,2,Coffee Shop,Café,Cocktail Bar,American Restaurant,Italian Restaurant,Beer Bar,Seafood Restaurant,Hotel,Restaurant,Park
4,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Pub,Health Food Store,Coffee Shop,Trail,Yoga Studio,Donut Shop,Discount Store,Distribution Center,Dog Run,Doner Restaurant
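`DataFrame.join` performs a left join by default, so every row of `toronto_df` is kept even if its neighbourhood has no entry in `neighborhoods_venues_sorted` (for example, if no venues were returned for it). Such rows get `NaN` in `Cluster Labels`, which also turns the column into floats and would break using the label as a list index when coloring the map. In this run every label came through as an integer, but the toy frames below (hypothetical, including the made-up "Nowhere Park") show the behaviour and one way to guard against it.

```python
import pandas as pd

# Hypothetical miniature versions of toronto_df and neighborhoods_venues_sorted;
# "Nowhere Park" stands for a neighbourhood with no venues, hence no cluster label
toy_toronto_df = pd.DataFrame({
    "Borough": ["Downtown Toronto", "Downtown Toronto", "East Toronto"],
    "Neighbourhood": ["Christie", "Rosedale", "Nowhere Park"],
})
toy_venues_sorted = pd.DataFrame({
    "Cluster Labels": [2, 3],
    "Neighbourhood": ["Christie", "Rosedale"],
    "1st Most Common Venue": ["Grocery Store", "Park"],
})

# join() is a left join: every row of the left frame is kept, unmatched rows get NaN
merged = toy_toronto_df.join(toy_venues_sorted.set_index("Neighbourhood"), on="Neighbourhood")
raw_dtype = merged["Cluster Labels"].dtype  # float64, because NaN forces a float column

# Drop neighbourhoods without a cluster and restore integer labels
merged = merged.dropna(subset=["Cluster Labels"]).astype({"Cluster Labels": int})
print(raw_dtype)
print(merged)
```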


#### To visualize the results better, we assign each neighbourhood on the map a color according to its cluster label.

In [143]:
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, colored by cluster label
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
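The palette cell above samples `kclusters` evenly spaced colors from Matplotlib's `rainbow` colormap and converts them to hex strings, a format Folium accepts for `color` and `fill_color`. A minimal sketch of that step as a reusable function (`cluster_palette` is a hypothetical name), indexing the result directly by cluster label so that label 0 gets the first color:

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors


def cluster_palette(n_clusters):
    """Return n_clusters hex colors sampled evenly from Matplotlib's 'rainbow' colormap."""
    rgba = cm.rainbow(np.linspace(0, 1, n_clusters))  # array of shape (n_clusters, 4)
    return [colors.rgb2hex(c) for c in rgba]  # alpha channel is dropped by default


palette = cluster_palette(5)
# Index by the label itself: label 0 -> palette[0], ..., label 4 -> palette[4]
label_to_color = {label: palette[label] for label in range(5)}
print(label_to_color)
```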

#### Finally, we show which neighbourhoods correspond to each cluster.
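The cells below select columns by position (`[1] + [2] + list(range(5, ...))`), which silently returns the wrong columns if the column order ever changes. A small helper that drops the unwanted columns by name is one alternative; `show_cluster` is a hypothetical name, and the two-row frame is copied from the `toronto_merged.head()` output above (first venue column only).

```python
import pandas as pd


def show_cluster(df, label, drop=("Postal Code", "Latitude", "Longitude")):
    """Return one cluster's rows without the postal-code and coordinate columns (hypothetical helper)."""
    rows = df[df["Cluster Labels"] == label]
    return rows.drop(columns=[c for c in drop if c in rows.columns]).reset_index(drop=True)


# Two rows copied from toronto_merged.head() above, first venue column only
toy_merged = pd.DataFrame({
    "Postal Code": ["M5A", "M4E"],
    "Borough": ["Downtown Toronto", "East Toronto"],
    "Neighbourhood": ["Regent Park, Harbourfront", "The Beaches"],
    "Latitude": [43.65426, 43.676357],
    "Longitude": [-79.360636, -79.293031],
    "Cluster Labels": [2, 0],
    "1st Most Common Venue": ["Coffee Shop", "Pub"],
})

beaches = show_cluster(toy_merged, 0)
print(beaches)
```

Assuming the column layout shown above, `show_cluster(toronto_merged, 0)` would return the same columns as the positional selection: Borough, Neighbourhood, Cluster Labels and the venue columns.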

#### **Cluster 0.** The name of the neighbourhood clearly indicates the presence of beaches, so it's understandable that it belongs to a different category than the other neighbourhoods.

In [154]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + [2] + list(range(5, toronto_merged.shape[1]))]].reset_index(drop = True)

Unnamed: 0,Borough,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,East Toronto,The Beaches,0,Pub,Health Food Store,Coffee Shop,Trail,Yoga Studio,Donut Shop,Discount Store,Distribution Center,Dog Run,Doner Restaurant


#### **Cluster 1.** These neighbourhoods are located on the outskirts of the area we are analyzing. The presence of pharmacies, sandwich places, restaurants and pizza places indicates that they are residential areas.

In [155]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + [2] + list(range(5, toronto_merged.shape[1]))]].reset_index(drop = True)

Unnamed: 0,Borough,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,West Toronto,"Dufferin, Dovercourt Village",1,Pharmacy,Bakery,Park,Liquor Store,Café,Bar,Bank,Supermarket,Middle Eastern Restaurant,Pizza Place
1,East Toronto,"India Bazaar, The Beaches West",1,Liquor Store,Restaurant,Steakhouse,Ice Cream Shop,Food & Drink Shop,Sushi Restaurant,Brewery,Fish & Chips Shop,Italian Restaurant,Fast Food Restaurant
2,Central Toronto,Lawrence Park,1,Park,Jewelry Store,Bus Line,Swim School,Dim Sum Restaurant,Discount Store,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store
3,Central Toronto,Davisville North,1,Park,Pizza Place,Breakfast Spot,Sandwich Place,Food & Drink Shop,Dance Studio,Hotel,Department Store,Gym / Fitness Center,Cosmetics Shop
4,Central Toronto,"The Annex, North Midtown, Yorkville",1,Sandwich Place,Café,Coffee Shop,Park,Pizza Place,Donut Shop,Burger Joint,Indian Restaurant,Middle Eastern Restaurant,Pub
5,Central Toronto,Davisville,1,Dessert Shop,Sandwich Place,Café,Italian Restaurant,Gym,Sushi Restaurant,Pizza Place,Coffee Shop,Brewery,Diner
6,East Toronto,"Business reply mail Processing Centre, South C...",1,Light Rail Station,Garden,Park,Pizza Place,Restaurant,Burrito Place,Brewery,Skate Park,Farmers Market,Fast Food Restaurant


#### **Cluster 2.** Because the neighbourhoods at the core of the area are close to one another, a fairly homogeneous result was expected there, which is why this cluster has so many members. The heavy presence of coffee shops and cafés indicates an area oriented towards business and going out.

In [156]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + [2] + list(range(5, toronto_merged.shape[1]))]].reset_index(drop = True)

Unnamed: 0,Borough,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,"Regent Park, Harbourfront",2,Coffee Shop,Park,Bakery,Café,Pub,Breakfast Spot,Theater,Cosmetics Shop,Shoe Store,Restaurant
1,Downtown Toronto,"Queen's Park, Ontario Provincial Government",2,Coffee Shop,Yoga Studio,Bank,Beer Bar,Smoothie Shop,Sandwich Place,Café,Portuguese Restaurant,Chinese Restaurant,Persian Restaurant
2,Downtown Toronto,"Garden District, Ryerson",2,Clothing Store,Coffee Shop,Bubble Tea Shop,Japanese Restaurant,Café,Cosmetics Shop,Hotel,Electronics Store,Fast Food Restaurant,Pizza Place
3,Downtown Toronto,St. James Town,2,Coffee Shop,Café,Cocktail Bar,American Restaurant,Italian Restaurant,Beer Bar,Seafood Restaurant,Hotel,Restaurant,Park
4,Downtown Toronto,Berczy Park,2,Coffee Shop,Pub,Cheese Shop,Beer Bar,Restaurant,Bakery,Cocktail Bar,Café,Seafood Restaurant,Farmers Market
5,Downtown Toronto,Central Bay Street,2,Coffee Shop,Sandwich Place,Café,Italian Restaurant,Salad Place,Japanese Restaurant,Bubble Tea Shop,Burger Joint,Juice Bar,Thai Restaurant
6,Downtown Toronto,Christie,2,Grocery Store,Café,Park,Diner,Candy Store,Baby Store,Restaurant,Athletics & Sports,Italian Restaurant,Bank
7,Downtown Toronto,"Richmond, Adelaide, King",2,Coffee Shop,Café,Restaurant,Gym,Clothing Store,Bar,Hotel,Thai Restaurant,Lounge,Sushi Restaurant
8,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",2,Coffee Shop,Aquarium,Café,Hotel,Restaurant,Brewery,Scenic Lookout,Fried Chicken Joint,Bar,Sporting Goods Shop
9,West Toronto,"Little Portugal, Trinity",2,Bar,Asian Restaurant,Vietnamese Restaurant,Coffee Shop,Men's Store,Restaurant,Café,Yoga Studio,Malay Restaurant,Bakery
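Statements such as "the heavy presence of coffee shops" can be checked by counting how often each category appears among a cluster's top venues. A minimal sketch, assuming the venue columns are named as in the table above; `venue_counts` is a hypothetical helper, and the data is a three-row subset of cluster 2 copied from the output (first three venue columns only), not the full cluster.

```python
import pandas as pd


def venue_counts(df, label, top_n=3):
    """Count venue categories across the first top_n 'Most Common Venue' columns of one cluster."""
    venue_cols = [c for c in df.columns if c.endswith("Most Common Venue")][:top_n]
    rows = df.loc[df["Cluster Labels"] == label, venue_cols]
    # melt() turns the rows x columns block into a single column of venue names
    return rows.melt(value_name="Venue")["Venue"].value_counts()


# Three rows of cluster 2 copied from the output above (first three venue columns only)
cluster2 = pd.DataFrame({
    "Neighbourhood": ["Regent Park, Harbourfront", "Berczy Park", "Central Bay Street"],
    "Cluster Labels": [2, 2, 2],
    "1st Most Common Venue": ["Coffee Shop", "Coffee Shop", "Coffee Shop"],
    "2nd Most Common Venue": ["Park", "Pub", "Sandwich Place"],
    "3rd Most Common Venue": ["Bakery", "Cheese Shop", "Café"],
})

counts = venue_counts(cluster2, 2)
print(counts)
```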


#### **Cluster 3.** This cluster corresponds to the north of our study area, a part of town with more parks than the rest.

In [157]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + [2] + list(range(5, toronto_merged.shape[1]))]].reset_index(drop = True)

Unnamed: 0,Borough,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central Toronto,"Forest Hill North & West, Forest Hill Road Park",3,Park,Jewelry Store,Trail,Sushi Restaurant,Yoga Studio,Diner,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant
1,Central Toronto,"Moore Park, Summerhill East",3,Park,Trail,Tennis Court,Dim Sum Restaurant,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
2,Downtown Toronto,Rosedale,3,Park,Playground,Trail,Dim Sum Restaurant,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
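Another way to read a cluster, beyond its member rows, is to look at its centroid: `kmeans.cluster_centers_` holds the mean frequency of each venue category within each cluster, so its largest entries show what characterizes the cluster (for example, whether "Park" dominates cluster 3). The sketch uses a tiny hypothetical frequency table in place of `toronto_grouped_clustering`; `top_categories` is a hypothetical helper name.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical venue-frequency table (rows = neighbourhoods, columns = venue categories)
freq = pd.DataFrame(
    [[0.6, 0.1, 0.3], [0.5, 0.2, 0.3], [0.0, 0.9, 0.1], [0.1, 0.8, 0.1]],
    columns=["Park", "Coffee Shop", "Trail"],
)

toy_kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(freq)


def top_categories(model, feature_names, n=2):
    """Return, for each cluster label, the n categories with the highest centroid weight."""
    centers = pd.DataFrame(model.cluster_centers_, columns=feature_names)
    return {label: row.nlargest(n).index.tolist() for label, row in centers.iterrows()}


top = top_categories(toy_kmeans, freq.columns)
print(top)
```

Which cluster receives label 0 is arbitrary, so the result is best read alongside `labels_` rather than by assuming a fixed numbering.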


#### **Cluster 4.** This cluster contains only one neighbourhood, whose most common venue is a "Garden". A quick Google search indicates that this corresponds to a large cemetery.

In [158]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + [2] + list(range(5, toronto_merged.shape[1]))]].reset_index(drop = True)

Unnamed: 0,Borough,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central Toronto,Roselawn,4,Garden,Home Service,Dim Sum Restaurant,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop
