# Coursera Capstone Project - Week 3 Notebook

This notebook contains the Week 3 activities. It is split into three assignments: data wrangling, geocoding, and clustering and analysis (the last one in two parts).

## Assignment 1: Obtaining postal codes for the Toronto area

In Part 1, the goal is to obtain postal data for the Boroughs and Neighborhoods in the Toronto area by scraping a table found on Wikipedia.

In [1]:
# Importing tools for Part 1

#DataFrame Library
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

#Numerical analysis Library
import numpy as np

#Request management Library
import requests

#HTML parsing Library
from bs4 import BeautifulSoup

To obtain the data, we will use the BeautifulSoup Library:

In [2]:
#Importing raw data

website_url = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text #Downloading the HTML of the target page
soup = BeautifulSoup(website_url, 'lxml') #Creating a parser object called 'soup'

PostalCode_table = soup.find('table', {'class':'wikitable'}) #Passing the table object to variable 'PostalCode_table'
PostalCode_table_data = PostalCode_table.tbody.find_all('td') #Passing the objects (td-tags) inside the table to variable 'PostalCode_table_data'

#Looping through td-objects and stripping the text to a new list, 'PostalCode_data'

PostalCode_data = [] #Object initialization 

for td in PostalCode_table_data: #Looping and stripping
    PostalCode_data.append(td.text.strip())
    
PostalCode_data[0:6] #Checkpoint

['M1A', 'Not assigned', '', 'M2A', 'Not assigned', '']

Once the data has been obtained, we will organize it into separate lists and then assemble them into a pandas DataFrame:

In [3]:
#Shaping the Data

#Initializing the container lists that will hold the data
Postcode = []
Borough = []
Neighborhood = []

#Appending the data to the container lists, removing PostalCodes that are not associated with a Borough
for i in range(1, len(PostalCode_data)-1, 3):
    if PostalCode_data[i] != 'Not assigned':
        Postcode.append(PostalCode_data[i-1])
        Borough.append(PostalCode_data[i])
        Neighborhood.append(PostalCode_data[i+1])
        
print(Postcode[:3], Borough[:3], Neighborhood[:3]) #Checkpoint

['M3A', 'M4A', 'M5A'] ['North York', 'North York', 'Downtown Toronto'] ['Parkwoods', 'Victoria Village', 'Regent Park / Harbourfront']


In [5]:
#Assembling the main DataFrame 'df' with three columns: PostalCode, Borough, and Neighborhood

#Converting the container lists to numpy arrays and reshaping them into column vectors
Postcode = np.array(Postcode)
Postcode = Postcode.reshape(Postcode.shape[0],1)
Borough = np.array(Borough)
Borough = Borough.reshape(Borough.shape[0],1)
Neighborhood = np.array(Neighborhood)
Neighborhood = Neighborhood.reshape(Neighborhood.shape[0],1)

In [10]:
#Combining the arrays into a single matrix 'Combine'
Combine = np.concatenate((Postcode, Borough, Neighborhood), axis=1)

#Initializing the DataFrame
df = pd.DataFrame(Combine)
df.columns = ['PostalCode', 'Borough', 'Neighborhood']

#Checkpoint
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Regent Park / Harbourfront
3,M6A,North York,Lawrence Manor / Lawrence Heights
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government


In [11]:
print(df.shape)

(103, 3)
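
The list-shaping and array steps above can also be condensed. The following is a minimal sketch, assuming the scraped cells arrive as a flat list in groups of three (postal code, borough, neighborhood). The sample list and the helper name `rows_from_flat` are illustrative and are not part of the original notebook:

```python
def rows_from_flat(flat, width=3):
    """Group a flat list into consecutive rows of `width` items."""
    return [tuple(flat[i:i + width]) for i in range(0, len(flat) - width + 1, width)]

# Illustrative sample in the same layout as PostalCode_data
sample = ['M1A', 'Not assigned', '', 'M3A', 'North York', 'Parkwoods',
          'M4A', 'North York', 'Victoria Village']

# Keep only the rows whose borough is assigned
assigned = [row for row in rows_from_flat(sample) if row[1] != 'Not assigned']
print(assigned)
```

A list of tuples like `assigned` can be passed to `pd.DataFrame(assigned, columns=['PostalCode', 'Borough', 'Neighborhood'])`, which would make the reshape and concatenate steps unnecessary.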


## Assignment 2: Geocoding

In Part 2, the goal is to use Geocoding to obtain Latitude and Longitude values for each Borough/Neighborhood combination.

In [12]:
# Importing tools for Part 2

#For geolocation, we will use Library pgeocode
! pip install pgeocode
import pgeocode

#For spatial data visualization, we will use Library Folium
!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

Collecting pgeocode
  Downloading https://files.pythonhosted.org/packages/86/44/519e3db3db84acdeb29e24f2e65991960f13464279b61bde5e9e96909c9d/pgeocode-0.2.1-py2.py3-none-any.whl
Installing collected packages: pgeocode
Successfully installed pgeocode-0.2.1
Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    ca-certificates-2020.4.5.1 |       hecc5488_0         146 KB  conda-forge
    branca-0.4.0               |             py_0          26 KB  conda-forge
    altair-4.1.0               |             py_1         614 KB  conda-forge
    certifi-2020.4.5.1         |   py36h9f0ad1d_0         151 KB  c

Although there are many geolocation libraries available, in this activity we will employ pgeocode, a library specialized in postal codes:

In [13]:
location = pgeocode.Nominatim('CA')

In [14]:
# Obtaining Latitude/Longitude data for each entry on 'df' 

location = pgeocode.Nominatim('CA') #The pgeocode library requires the country of query to be specified, in this case, Canada

#Initializing the containers that will host geolocation data
latitude_list = []
longitude_list = []

#Querying location data
for index, row in df.iterrows():
    loc = location.query_postal_code(row['PostalCode']) #Looking up each postal code by column name
    latitude_list.append(loc['latitude'])
    longitude_list.append(loc['longitude'])

print("Latitude/Longitude import complete")

print(latitude_list[:5], longitude_list[:5]) #Checkpoint

Latitude/Longitude import complete
[43.7545, 43.7276, 43.6555, 43.7223, 43.6641] [-79.33, -79.3148, -79.3626, -79.4504, -79.3889]


In [15]:
#Incorporating the obtained data to the main dataframe 'df'

#1-D arrays can be assigned to DataFrame columns directly, so no reshape is needed
latitude_list = np.array(latitude_list)
longitude_list = np.array(longitude_list)

df['Latitude'] = latitude_list
df['Longitude'] = longitude_list

df.head() #Checkpoint

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.7545,-79.33
1,M4A,North York,Victoria Village,43.7276,-79.3148
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.6555,-79.3626
3,M6A,North York,Lawrence Manor / Lawrence Heights,43.7223,-79.4504
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.6641,-79.3889


## Assignment 3: Exploring and clustering

In [16]:
#Checking data integrity

bool_series = pd.isnull(df["Longitude"])
df[bool_series]

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
76,M7R,Mississauga,Canada Post Gateway Processing Centre,,


One of the Postal Codes failed to return Latitude and Longitude values. On closer inspection, that postal code does not correspond to a neighborhood but to a postal service processing center. Therefore, we will drop that row from the DataFrame and then check the integrity of the data again.

In [17]:
df.dropna(inplace=True) #Drops entire rows containing null values
df.reset_index(drop=True, inplace=True) #Resets the index of the DF to prevent problems during index iterations

bool_series = pd.isnull(df["Longitude"])
df[bool_series]

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude


In [18]:
#Searching for duplicated results

dupl = df.duplicated('PostalCode', keep=False)

for i in range(0, len(dupl)):
    if dupl[i]:
        print ('Duplicated Postal Code!')

dupl = df.duplicated('Neighborhood', keep=False)        
        
for i in range(0, len(dupl)):
    if dupl[i]:
        print ('Duplicated Neighborhood! Index= {}'.format(i))

Duplicated Neighborhood! Index= 7
Duplicated Neighborhood! Index= 13
Duplicated Neighborhood! Index= 40
Duplicated Neighborhood! Index= 46
Duplicated Neighborhood! Index= 53
Duplicated Neighborhood! Index= 59
Duplicated Neighborhood! Index= 60
Duplicated Neighborhood! Index= 72
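
The index printout shows where the repeats are, but not which names are shared. As a minimal sketch, assuming a DataFrame with `PostalCode` and `Neighborhood` columns, the repeated names can be grouped together with the postal codes that share them. The rows below are toy data with placeholder postal codes, used only for illustration:

```python
import pandas as pd

# Toy rows for illustration only; the postal codes are placeholders
toy = pd.DataFrame({
    'PostalCode': ['X1A', 'X2A', 'X3A', 'X4A'],
    'Neighborhood': ['Don Mills', 'Parkwoods', 'Don Mills', 'Glencairn'],
})

# Boolean mask marking every row whose Neighborhood name occurs more than once
mask = toy.duplicated('Neighborhood', keep=False)

# One entry per repeated name, listing the postal codes that share it
shared = toy[mask].groupby('Neighborhood')['PostalCode'].apply(list)
print(shared)
```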


As we can see, there are no duplicate Postal Codes, but some Neighborhood names appear under more than one Postal Code. Let's look at those neighborhoods on the map:

In [20]:
map_toronto = folium.Map(location=[df['Latitude'][0], df['Longitude'][0]], zoom_start=12) #Map Initialization

i=0
is_duplicated = df.duplicated('Neighborhood', keep=False) #Computed once, outside the loop

#Plotting each datapoint from df on the map: blue for duplicated Neighborhood names, red for unique ones
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = "{}, {}".format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    if is_duplicated[i]:
        folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)
    else:
        folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)
    i+=1
    
map_toronto #Render the map

As we can see in the map above, the duplicated Neighborhoods with different Postal Codes are far enough apart that we should treat them as different neighborhoods. Let's differentiate them based on the postal code:

In [21]:
#Finding and renaming duplicated Neighborhoods

duplicate_locations = df.duplicated('Neighborhood', keep=False) #Generates a boolean Series marking the rows with duplicated Neighborhoods

#Appends the Postal Code to every duplicated Neighborhood name; .loc avoids chained assignment and covers every row
df.loc[duplicate_locations, 'Neighborhood'] = df['Neighborhood'] + " - " + df['PostalCode']

df.head(10) #Checkpoint


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.7545,-79.33
1,M4A,North York,Victoria Village,43.7276,-79.3148
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.6555,-79.3626
3,M6A,North York,Lawrence Manor / Lawrence Heights,43.7223,-79.4504
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.6641,-79.3889
5,M9A,Etobicoke,Islington Avenue,43.6662,-79.5282
6,M1B,Scarborough,Malvern / Rouge,43.8113,-79.193
7,M3B,North York,Don Mills - M3B,43.745,-79.359
8,M4B,East York,Parkview Hill / Woodbine Gardens,43.7063,-79.3094
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.6572,-79.3783


Now that our data is curated, we can move to data analysis.

### Part 3.1 Neighborhood analysis

In Part 3, we will obtain venue data for each neighborhood, compare their composition, and cluster them based on that data.

In [22]:
#Importing the tools necessary for this segment

#Importing json libraries to handle the query results from the Foursquare API
import json
from pandas.io.json import json_normalize

Before we begin, let's look at the area coverage we get when the same search radius is used for every postal code:

In [24]:
map_toronto = folium.Map(location=[df['Latitude'][0], df['Longitude'][0]], zoom_start=11) #Map Initialization

#Plotting each datapoint from df on the map, with real radius of search in meters
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = "{}, {}".format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.Circle(
    [lat, lng],
    radius=500, #When the function Circle is used, radius takes a value in meters.
    popup=label,
    color='red',
    fill=True,
    fill_color='red',
    fill_opacity=0.7,
    parse_html=False).add_to(map_toronto)
    
map_toronto #Render the map

It is clear from the map above that a radius of 500 meters results in substantial overlap in downtown Toronto, average coverage in the areas immediately surrounding downtown, and poor coverage in the peripheral areas. Let's see what happens when the radius is 800 meters:

In [25]:
map_toronto = folium.Map(location=[df['Latitude'][0], df['Longitude'][0]], zoom_start=12) #Map Initialization

#Plotting each datapoint from df on the map, with real radius of search in meters
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = "{}, {}".format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.Circle(
    [lat, lng],
    radius=800, #When the function Circle is used, radius takes a value in meters.
    popup=label,
    color='red',
    fill=True,
    fill_color='red',
    fill_opacity=0.7,
    parse_html=False).add_to(map_toronto)
    
map_toronto #Render the map
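
The overlap seen on these maps can also be checked numerically. The sketch below uses the haversine formula, which is not part of the original notebook, to measure the distance between two markers and compare it with twice the search radius. The helper names `haversine_m` and `circles_overlap` are illustrative; the two coordinate pairs are the values this notebook's data lists for postal codes M5B and M5C:

```python
import math

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def circles_overlap(p1, p2, radius):
    """True when two search circles of equal radius (in meters) intersect."""
    return haversine_m(*p1, *p2) < 2 * radius

m5b = (43.6572, -79.3783)
m5c = (43.6513, -79.3756)

print(round(haversine_m(*m5b, *m5c)))
print(circles_overlap(m5b, m5c, 500), circles_overlap(m5b, m5c, 300))
```

For this pair, the two circles intersect at a radius of 500 meters but not at 300 meters, which is one way to reason about choosing a smaller radius where markers are densely packed.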

As we can see, a radius of 800 meters works well for the area surrounding Downtown Toronto, but offers only moderate coverage for the more distant areas. To solve that problem, let's separate our dataset into three major areas:

In [26]:
#To start, let's obtain a df with only Downtown Toronto

df_downtown = df.loc[df['Borough'] == 'Downtown Toronto'].copy() #Assign all entries for 'Downtown Toronto' to a new DF 'df_downtown'; .copy() avoids modifying a slice of 'df'
df_downtown.reset_index(drop=True, inplace=True) #Resets the Index

df_downtown.head(10) #Checkpoint

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,Regent Park / Harbourfront,43.6555,-79.3626
1,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.6641,-79.3889
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.6572,-79.3783
3,M5C,Downtown Toronto,St. James Town,43.6513,-79.3756
4,M5E,Downtown Toronto,Berczy Park,43.6456,-79.3754
5,M5G,Downtown Toronto,Central Bay Street,43.6564,-79.386
6,M6G,Downtown Toronto,Christie,43.6683,-79.4205
7,M5H,Downtown Toronto,Richmond / Adelaide / King,43.6496,-79.3833
8,M5J,Downtown Toronto,Harbourfront East / Union Station / Toronto Is...,43.623,-79.3936
9,M5K,Downtown Toronto,Toronto Dominion Centre / Design Exchange,43.6469,-79.3823


In [27]:
#Next, we will generate a df that has Toronto Boroughs excluding Downtown

#Assign all rows whose Borough contains the word 'Toronto', except 'Downtown Toronto', to a new DF 'df_toronto'
df_toronto = df[df['Borough'].str.contains("Toronto") & (df['Borough'] != 'Downtown Toronto')].copy() #.copy() avoids modifying a slice of 'df'
df_toronto.reset_index(drop=True, inplace=True) #Resets the Index

df_toronto.head(10) #Checkpoint


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.6784,-79.2941
1,M6H,West Toronto,Dufferin / Dovercourt Village,43.6655,-79.4378
2,M6J,West Toronto,Little Portugal / Trinity,43.648,-79.4177
3,M4K,East Toronto,The Danforth West / Riverdale,43.6803,-79.3538
4,M6K,West Toronto,Brockton / Parkdale Village / Exhibition Place,43.6383,-79.4301
5,M4L,East Toronto,India Bazaar / The Beaches West,43.6693,-79.3155
6,M4M,East Toronto,Studio District,43.6561,-79.3406
7,M4N,Central Toronto,Lawrence Park,43.7301,-79.3935
8,M5N,Central Toronto,Roselawn,43.7113,-79.4195
9,M4P,Central Toronto,Davisville North,43.7135,-79.3887


In [28]:
#And finally, let's assign the remaining Boroughs to a DF of Suburbs

df_suburbs_boolean = df['Borough'].str.contains("Toronto") #Boolean Series that is True for every row where the Borough contains 'Toronto'

df_suburbs = df[~df_suburbs_boolean].copy() #Keeps only the rows whose Borough does not contain 'Toronto'
df_suburbs.reset_index(drop=True, inplace=True) #Resets the Index

df_suburbs.head(10) #Checkpoint

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.7545,-79.33
1,M4A,North York,Victoria Village,43.7276,-79.3148
2,M6A,North York,Lawrence Manor / Lawrence Heights,43.7223,-79.4504
3,M9A,Etobicoke,Islington Avenue,43.6662,-79.5282
4,M1B,Scarborough,Malvern / Rouge,43.8113,-79.193
5,M3B,North York,Don Mills - M3B,43.745,-79.359
6,M4B,East York,Parkview Hill / Woodbine Gardens,43.7063,-79.3094
7,M6B,North York,Glencairn,43.7081,-79.4479
8,M9B,Etobicoke,West Deane Park / Princess Gardens / Martin Gr...,43.6505,-79.5517
9,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek,43.7878,-79.1564


Let's evaluate the total coverage with our new custom datasets:

In [30]:
#Plotting the custom search radius for each of the three areas:

map_toronto = folium.Map(location=[df['Latitude'][0], df['Longitude'][0]], zoom_start=11) #Map Initialization

#Plotting each datapoint from downtown on the map, with real radius of search in meters
for lat, lng, borough, neighborhood in zip(df_downtown['Latitude'], df_downtown['Longitude'], df_downtown['Borough'], df_downtown['Neighborhood']):
    label = "{}, {}".format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.Circle(
    [lat, lng],
    radius=300,
    popup=label,
    color='red',
    fill=True,
    fill_color='red',
    fill_opacity=0.7,
    parse_html=False).add_to(map_toronto)
    
#Plotting each datapoint from Toronto on the map, with real radius of search in meters
for lat, lng, borough, neighborhood in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Borough'], df_toronto['Neighborhood']):
    label = "{}, {}".format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.Circle(
    [lat, lng],
    radius=750,
    popup=label,
    color='blue',
    fill=True,
    fill_color='blue',
    fill_opacity=0.7,
    parse_html=False).add_to(map_toronto)

#Plotting each datapoint from the suburbs on the map, with real radius of search in meters
for lat, lng, borough, neighborhood in zip(df_suburbs['Latitude'], df_suburbs['Longitude'], df_suburbs['Borough'], df_suburbs['Neighborhood']):
    label = "{}, {}".format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.Circle(
    [lat, lng],
    radius=1000,
    popup=label,
    color='green',
    fill=True,
    fill_color='green',
    fill_opacity=0.7,
    parse_html=False).add_to(map_toronto)
    
map_toronto #Render the map
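
The three-way split and the per-area radii can also be described as one classification rule. This is a minimal sketch: the radii are the ones used for the three map layers above, while the names `AREA_RADIUS` and `area_for` are hypothetical helpers that do not appear in the original notebook:

```python
# Radii in meters, as used for the three map layers above
AREA_RADIUS = {'downtown': 300, 'toronto': 750, 'suburbs': 1000}

def area_for(borough):
    """Classify a Borough name into one of the three areas."""
    if borough == 'Downtown Toronto':
        return 'downtown'
    if 'Toronto' in borough:
        return 'toronto'
    return 'suburbs'

for b in ['Downtown Toronto', 'East Toronto', 'North York']:
    area = area_for(b)
    print(b, '->', area, AREA_RADIUS[area])
```

Applied to the main DataFrame, `df['Borough'].map(area_for)` would give one area label per row, so a single loop could plot or query every postal code with the radius of its area.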

Although the map reveals that the custom radii give much better coverage while also minimizing overlaps, three problematic datapoints remain in the analysis:

* As the map above shows, one data point doesn't match the others. That data point corresponds to a postal service processing center rather than a neighborhood, so let's remove it from the analysis.

* One of the postal codes in Downtown Toronto addresses Underground city, overlapping very significantly with other markers. Since the geolocation cannot differentiate between the underground and the surface venues, we should remove that marker from the analysis as well.

* The Neighborhood Commerce Court/Victoria Hotel also overlaps significantly with other downtown neighborhoods, and therefore must be removed.

In [31]:
#Removing excessive datapoints

df_toronto.drop(df_toronto.index[df_toronto['Neighborhood'] == "Business reply mail Processing CentrE"], inplace=True)
df_toronto.reset_index(drop=True, inplace=True)

df_downtown.drop(df_downtown.index[df_downtown['PostalCode'] == "M5X"], inplace=True)
df_downtown.drop(df_downtown.index[df_downtown['PostalCode'] == "M5L"], inplace=True)
df_downtown.reset_index(drop=True, inplace=True)

df_downtown #Checkpoint

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,Regent Park / Harbourfront,43.6555,-79.3626
1,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.6641,-79.3889
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.6572,-79.3783
3,M5C,Downtown Toronto,St. James Town,43.6513,-79.3756
4,M5E,Downtown Toronto,Berczy Park,43.6456,-79.3754
5,M5G,Downtown Toronto,Central Bay Street,43.6564,-79.386
6,M6G,Downtown Toronto,Christie,43.6683,-79.4205
7,M5H,Downtown Toronto,Richmond / Adelaide / King,43.6496,-79.3833
8,M5J,Downtown Toronto,Harbourfront East / Union Station / Toronto Is...,43.623,-79.3936
9,M5K,Downtown Toronto,Toronto Dominion Centre / Design Exchange,43.6469,-79.3823


Let's see if the problems were solved:

In [33]:
#Re-plotting the search radius for each area after removing the problematic datapoints:

map_toronto = folium.Map(location=[df['Latitude'][0], df['Longitude'][0]], zoom_start=11) #Map Initialization

#Plotting each datapoint from downtown on the map, with real radius of search in meters
for lat, lng, borough, neighborhood in zip(df_downtown['Latitude'], df_downtown['Longitude'], df_downtown['Borough'], df_downtown['Neighborhood']):
    label = "{}, {}".format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.Circle(
    [lat, lng],
    radius=300,
    popup=label,
    color='red',
    fill=True,
    fill_color='red',
    fill_opacity=0.7,
    parse_html=False).add_to(map_toronto)
    
#Plotting each datapoint from Toronto on the map, with real radius of search in meters
for lat, lng, borough, neighborhood in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Borough'], df_toronto['Neighborhood']):
    label = "{}, {}".format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.Circle(
    [lat, lng],
    radius=750,
    popup=label,
    color='blue',
    fill=True,
    fill_color='blue',
    fill_opacity=0.7,
    parse_html=False).add_to(map_toronto)

#Plotting each datapoint from the suburbs on the map, with real radius of search in meters
for lat, lng, borough, neighborhood in zip(df_suburbs['Latitude'], df_suburbs['Longitude'], df_suburbs['Borough'], df_suburbs['Neighborhood']):
    label = "{}, {}".format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.Circle(
    [lat, lng],
    radius=1000,
    popup=label,
    color='green',
    fill=True,
    fill_color='green',
    fill_opacity=0.7,
    parse_html=False).add_to(map_toronto)
    
map_toronto #Render the map

Now that our coverage problems have been solved, we can set up the Foursquare API calls for each area. The following cell will be hidden in the final code to protect the Foursquare credentials.

In [42]:
# The code was removed by Watson Studio for sharing.

In [55]:
#Defining Functions that will help obtain data from Foursquare

#Function to return the 'venue type' from each query result

def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError: #Falls back to the flattened column name when 'categories' is absent
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

#Function to return all nearby venues to a location
    
def getNearbyVenues(names, latitudes, longitudes, radius, limit=100):
    
    venues_list=[] #Initialize the container
    
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # Assemble the URL for endpoint request
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request and skip any LAT/LONG pair whose request does not succeed
        results = requests.get(url).json()
        if results['meta']['code'] == 200:
            results = results["response"]['groups'][0]['items']
        
            # return only relevant information for each nearby venue
            venues_list.append([(
                name, 
                lat, 
                lng, 
                v['venue']['name'], 
                v['venue']['location']['lat'], 
                v['venue']['location']['lng'],  
                v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
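
The request URL above is assembled with string formatting. As an alternative sketch, the query string can be built from a dictionary with the standard library's `urlencode`, which also takes care of escaping. The helper name `build_explore_url` and the credential values are placeholders introduced for illustration; the endpoint and parameter names are the ones used in `getNearbyVenues`:

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit=100):
    """Assemble the explore URL with the same parameters used in getNearbyVenues."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

# Placeholder credentials and version string, for illustration only
print(build_explore_url('ID', 'SECRET', '20200101', 43.6555, -79.3626, 300))
```

Passing the same dictionary as `requests.get(EXPLORE_ENDPOINT, params=params)` should have an equivalent effect, since `requests` encodes the query string itself.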

We will now perform three separate calls with custom Radius: Downtown, Toronto, Suburbs:

In [66]:
#Requesting the API and saving the relevant information to separate DF

Downtown_raw_data = getNearbyVenues(names=df_downtown['Neighborhood'], latitudes=df_downtown['Latitude'], longitudes=df_downtown['Longitude'], radius=300) #See function defined above

Toronto_raw_data = getNearbyVenues(names=df_toronto['Neighborhood'], latitudes=df_toronto['Latitude'], longitudes=df_toronto['Longitude'], radius=750) #See function defined above

Suburbs_raw_data = getNearbyVenues(names=df_suburbs['Neighborhood'], latitudes=df_suburbs['Latitude'], longitudes=df_suburbs['Longitude'], radius=1000) #See function defined above

Downtown_raw_data.head() #Checkpoint

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Regent Park / Harbourfront,43.6555,-79.3626,Roselle Desserts,43.653447,-79.362017,Bakery
1,Regent Park / Harbourfront,43.6555,-79.3626,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,Regent Park / Harbourfront,43.6555,-79.3626,Figs Breakfast & Lunch,43.655675,-79.364503,Breakfast Spot
3,Regent Park / Harbourfront,43.6555,-79.3626,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot
4,Regent Park / Harbourfront,43.6555,-79.3626,Cocina Economica,43.654959,-79.365657,Mexican Restaurant


Let's examine the total number of venues obtained for each zone:

In [67]:
#Starting with Downtown

Downtown_raw_data.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Berczy Park,17,17,17,17,17,17
CN Tower / King and Spadina / Railway Lands / Harbourfront West / Bathurst Quay / South Niagara / Island airport,18,18,18,18,18,18
Central Bay Street,12,12,12,12,12,12
Christie,2,2,2,2,2,2
Church and Wellesley,55,55,55,55,55,55
"Garden District, Ryerson",50,50,50,50,50,50
Harbourfront East / Union Station / Toronto Islands,3,3,3,3,3,3
Kensington Market / Chinatown / Grange Park,19,19,19,19,19,19
Queen's Park / Ontario Provincial Government,4,4,4,4,4,4
Regent Park / Harbourfront,17,17,17,17,17,17


In [68]:
#Let's check the number of neighborhoods returned to make sure all neighborhoods returned a match
print(len(df_downtown), len(Downtown_raw_data.groupby('Neighborhood').count()))

17 17


In [69]:
#Followed by the remainder of Toronto

Toronto_raw_data.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Brockton / Parkdale Village / Exhibition Place,100,100,100,100,100,100
Davisville,36,36,36,36,36,36
Davisville North,18,18,18,18,18,18
Dufferin / Dovercourt Village,62,62,62,62,62,62
Forest Hill North & West,6,6,6,6,6,6
High Park / The Junction South,55,55,55,55,55,55
India Bazaar / The Beaches West,53,53,53,53,53,53
Lawrence Park,4,4,4,4,4,4
Little Portugal / Trinity,100,100,100,100,100,100
Moore Park / Summerhill East,32,32,32,32,32,32


In [63]:
#Let's check the number of neighborhoods returned to make sure all neighborhoods returned a match
print(len(df_toronto), len(Toronto_raw_data.groupby('Neighborhood').count()))

19 19


In [70]:
#And finally, the Suburbs

Suburbs_raw_data.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agincourt,44,44,44,44,44,44
Alderwood / Long Branch,29,29,29,29,29,29
Bathurst Manor / Wilson Heights / Downsview North,30,30,30,30,30,30
Bayview Village,8,8,8,8,8,8
Bedford Park / Lawrence Manor East,42,42,42,42,42,42
Birch Cliff / Cliffside West,14,14,14,14,14,14
Caledonia-Fairbanks,25,25,25,25,25,25
Cedarbrae,21,21,21,21,21,21
Clarks Corners / Tam O'Shanter / Sullivan,37,37,37,37,37,37
Cliffside / Cliffcrest / Scarborough Village West,14,14,14,14,14,14


In [65]:
#Let's check the number of neighborhoods returned to make sure all neighborhoods returned a match
print(len(df_suburbs), len(Suburbs_raw_data.groupby('Neighborhood').count()))

63 61


In [71]:
#In this group, the input DF and the venue results do not match. Let's identify which Neighborhoods are missing.

returned_suburbs = pd.DataFrame(Suburbs_raw_data.groupby('Neighborhood').size().reset_index(name='Group Count')) #Turns the venue count object in a DF

df_merge = returned_suburbs.merge(df_suburbs, how='outer', on='Neighborhood') #Merges counts and the original suburb DF

df_merge

Unnamed: 0,Neighborhood,Group Count,PostalCode,Borough,Latitude,Longitude
0,Agincourt,44.0,M1S,Scarborough,43.7946,-79.2644
1,Alderwood / Long Branch,29.0,M8W,Etobicoke,43.6021,-79.5402
2,Bathurst Manor / Wilson Heights / Downsview North,30.0,M3H,North York,43.7535,-79.4472
3,Bayview Village,8.0,M2K,North York,43.7797,-79.3813
4,Bedford Park / Lawrence Manor East,42.0,M5M,North York,43.7335,-79.4177
5,Birch Cliff / Cliffside West,14.0,M1N,Scarborough,43.6952,-79.2646
6,Caledonia-Fairbanks,25.0,M6E,York,43.6889,-79.4507
7,Cedarbrae,21.0,M1H,Scarborough,43.7686,-79.2389
8,Clarks Corners / Tam O'Shanter / Sullivan,37.0,M1T,Scarborough,43.7812,-79.3036
9,Cliffside / Cliffcrest / Scarborough Village West,14.0,M1M,Scarborough,43.7247,-79.2312
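
Scanning the merged table by eye works, but the unmatched rows can also be isolated directly. A minimal sketch with toy stand-ins for `df_suburbs` and the venue counts, using the `indicator` option of `DataFrame.merge`; the variable names are illustrative:

```python
import pandas as pd

# Toy stand-ins for df_suburbs and the per-neighborhood venue counts
suburbs = pd.DataFrame({'Neighborhood': ['Agincourt', 'Cedarbrae', 'Upper Rouge']})
counts = pd.DataFrame({'Neighborhood': ['Agincourt', 'Cedarbrae'],
                       'Group Count': [44, 21]})

# indicator=True adds a '_merge' column stating which side each row came from
merged = suburbs.merge(counts, how='left', on='Neighborhood', indicator=True)

# Rows present in the input but absent from the venue results
missing = merged.loc[merged['_merge'] == 'left_only', 'Neighborhood'].tolist()
print(missing)
```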


As we can see in the table above, no venues were returned for the Upper Rouge Neighborhood by the Foursquare API. Let's confirm that is the case:

In [72]:
lat = df_suburbs[df_suburbs['Neighborhood'] == 'Upper Rouge']['Latitude'].values[0]
lng = df_suburbs[df_suburbs['Neighborhood'] == 'Upper Rouge']['Longitude'].values[0]
radius = 1000
limit = 100

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ea3034e963d29001b89c445'},
 'response': {'headerLocation': 'Rouge',
  'headerFullLocation': 'Rouge, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 0,
  'suggestedBounds': {'ne': {'lat': 43.843000009000015,
    'lng': -79.19444666564566},
   'sw': {'lat': 43.82499999099999, 'lng': -79.21935333435435}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': []}]}}
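
An empty `items` list like this one, or a venue without categories, is worth handling explicitly when parsing. The sketch below assumes the response layout that `getNearbyVenues` indexes (`results['response']['groups'][0]['items']`). The helper name `extract_venues` is hypothetical, and the sample dictionaries are trimmed stand-ins rather than real API output:

```python
def extract_venues(results):
    """Return (name, lat, lng, category) tuples from an explore response dict."""
    if results.get('meta', {}).get('code') != 200:
        return []
    groups = results.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    venues = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # Some venues may carry no category; record None instead of failing
        category = categories[0]['name'] if categories else None
        venues.append((venue['name'], venue['location']['lat'],
                       venue['location']['lng'], category))
    return venues

# Trimmed stand-in with the same shape as the empty response
empty = {'meta': {'code': 200},
         'response': {'totalResults': 0,
                      'groups': [{'type': 'Recommended Places',
                                  'name': 'recommended',
                                  'items': []}]}}
print(extract_venues(empty))
```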

We have now confirmed that there are no venue hits for Upper Rouge in the Foursquare Database. Let's remove it from the df_suburbs dataframe:

In [74]:
#Dropping the Upper Rouge Neighborhood

df_suburbs.drop(df_suburbs.index[df_suburbs['Neighborhood'] == "Upper Rouge"], inplace=True)
df_suburbs.reset_index(drop=True, inplace=True)

As we can see, with our approach we were able to attain substantial coverage of venues in the examined area, particularly in the suburbs and greater Toronto. Downtown Toronto has the poorest coverage due to the geographical proximity of many of its neighborhoods. For analysis purposes, let's create a fourth DF containing the combined data of the three areas:

In [75]:
df_total = pd.concat([df_downtown, df_toronto, df_suburbs]).reset_index(inplace=False)
Total_raw_data = pd.concat([Downtown_raw_data, Toronto_raw_data, Suburbs_raw_data])

Total_raw_data.groupby('Neighborhood').count() #Checkpoint

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,44,44,44,44,44,44
Alderwood / Long Branch,29,29,29,29,29,29
Bathurst Manor / Wilson Heights / Downsview North,30,30,30,30,30,30
Bayview Village,8,8,8,8,8,8
Bedford Park / Lawrence Manor East,42,42,42,42,42,42
Berczy Park,17,17,17,17,17,17
Birch Cliff / Cliffside West,14,14,14,14,14,14
Brockton / Parkdale Village / Exhibition Place,100,100,100,100,100,100
CN Tower / King and Spadina / Railway Lands / Harbourfront West / Bathurst Quay / South Niagara / Island airport,18,18,18,18,18,18
Caledonia-Fairbanks,25,25,25,25,25,25
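
Note that `reset_index()` without `drop=True` keeps the old row labels in a new `index` column. A small sketch on made-up frames (the frame names and values are illustrative only) contrasts this with `ignore_index=True`:

```python
import pandas as pd

# Two toy frames standing in for the per-zone DataFrames
zone_a = pd.DataFrame({'Neighborhood': ['A1', 'A2']})
zone_b = pd.DataFrame({'Neighborhood': ['B1']})

# Keeps the old row labels in an extra 'index' column
kept = pd.concat([zone_a, zone_b]).reset_index()

# Discards the old row labels and numbers the rows 0..n-1
fresh = pd.concat([zone_a, zone_b], ignore_index=True)

print(list(kept.columns))   # ['index', 'Neighborhood']
print(list(fresh.columns))  # ['Neighborhood']
```

Either form works for the analysis; the second simply avoids carrying an extra column.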


Now that all four datasets are ready, we can start preparing the data for clustering. The first step is to one-hot encode the venue data.

## Part 3.2: Neighborhood analysis, Part 2

Now that the data has been imported, we can start adjusting it for k-means cluster analysis:

In [76]:
#Defining the one-hot encoding function that will be used for clustering

def venuesHotEncoder(venue_df):
    """Return a DF with the relative frequency of each venue category per neighborhood."""
    #One-hot encodes the venue category of every venue
    hot_encoded = pd.get_dummies(venue_df[['Venue Category']], prefix="", prefix_sep="")
    #Inserts the neighborhood names. Obs: there is a venue category called 'Neighborhood',
    #so the label column is named ' Neighborhood' (with a leading space) to avoid a clash
    hot_encoded[' Neighborhood'] = venue_df['Neighborhood']

    #Moves the ' Neighborhood' column to the first position of the table
    fixed_columns = [hot_encoded.columns[-1]] + list(hot_encoded.columns[:-1])
    hot_encoded = hot_encoded[fixed_columns]

    #Groups by neighborhood and replaces the 0/1 indicators with their mean,
    #i.e. the relative frequency of each venue category in that neighborhood
    hot_encoded = hot_encoded.groupby(' Neighborhood').mean().reset_index()

    return hot_encoded
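
To make the encoding step concrete, here is a minimal sketch on a made-up venue list (the neighborhoods `A` and `B` and their venues are invented). It shows that averaging the one-hot indicators per neighborhood yields relative frequencies that sum to 1 in each row.

```python
import pandas as pd

# Toy venue list: neighborhood 'A' has 3 venues, 'B' has 1
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Bakery', 'Bakery', 'Park', 'Park'],
})

# One indicator column per category, one row per venue
dummies = pd.get_dummies(toy['Venue Category']).astype(float)

# Averaging the indicators per neighborhood gives relative frequencies
freq = dummies.groupby(toy['Neighborhood']).mean()

print(freq)
```

Because each row is a frequency profile, neighborhoods with very few venues (such as `B` here) end up with coarse profiles, which is worth keeping in mind when interpreting the clusters.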

In [77]:
#Applying the one-hot encoding function to each DF

downtown_encoded = venuesHotEncoder(Downtown_raw_data)
toronto_encoded = venuesHotEncoder(Toronto_raw_data)
suburbs_encoded = venuesHotEncoder(Suburbs_raw_data)
total_encoded = venuesHotEncoder(Total_raw_data)

total_encoded.head() #Checkpoint

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport,American Restaurant,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Workshop,Automotive Shop,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Basketball Stadium,Beach Bar,Beer Bar,Beer Store,Big Box Store,Bike Shop,Bistro,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Brewery,Bridal Shop,Bridge,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Café,Cajun / Creole Restaurant,Camera Store,Cantonese Restaurant,Caribbean Restaurant,Castle,Cemetery,Cheese Shop,Chinese Restaurant,Chiropractor,Chocolate Shop,Churrascaria,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Gym,College Rec Center,College Stadium,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Costume Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Curling Ice,Dance Studio,Daycare,Deli / Bodega,Dentist's Office,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Doctor's Office,Dog Run,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Service,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,Frame Store,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Go Kart Track,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hakka Restaurant,Halal Restaurant,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hockey Arena,Home Service,Hong Kong Restaurant,Hookah Bar,Hostel,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Movie Theater,Indonesian Restaurant,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Laundromat,Laundry Service,Leather Goods Store,Light Rail Station,Liquor Store,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motorcycle Shop,Movie Theater,Moving Target,Museum,Music Store,Music Venue,Nail Salon,Neighborhood.1,New American Restaurant,Nightclub,Noodle House,North Indian Restaurant,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Paintball Field,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Store,Pharmacy,Pilates Studio,Pizza Place,Platform,Playground,Plaza,Poke Place,Pool,Pool Hall,Portuguese Restaurant,Poutine Place,Print Shop,Pub,Racetrack,Ramen Restaurant,Record Shop,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,River,Road,Rock Climbing Spot,Rock Club,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Skating Rink,Snack Place,Soccer Field,Soccer Stadium,Social Club,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Stationery Store,Steakhouse,Storage Facility,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swiss Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Track,Trail,Train Station,Transportation Service,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo Exhibit
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.045455,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.022727,0.0,0.0,0.0,0.159091,0.0,0.0,0.0,0.0,0.022727,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.022727,0.022727,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.045455,0.0,0.0,0.0,0.0,0.022727,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.022727,0.0,0.0,0.090909,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Alderwood / Long Branch,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.034483,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.103448,0.0,0.0,0.068966,0.0,0.068966,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bathurst Manor / Wilson Heights / Downsview North,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.033333,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bedford Park / Lawrence Manor East,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.02381,0.02381,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.02381,0.0,0.071429,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.02381,0.02381,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0


In [78]:
# import k-means for clustering
from sklearn.cluster import KMeans

In [79]:
#Initializing the k-means object

kclusters = 5 #Arbitrary number to start the analysis

#Removing the neighborhood label column, which is not a feature, from the analysis DF
total_encoded_clustering = total_encoded.drop(columns=' Neighborhood')

#Creating and fitting the k-means object
total_kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(total_encoded_clustering)

#Checkpoint
print(len(df_total), len(total_kmeans.labels_))

98 98
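
Since the number of clusters above is chosen arbitrarily, one common way to sanity-check it is to compare the mean silhouette score over a few candidate values of k. The sketch below is only an illustration under stated assumptions: it runs on synthetic, well-separated blobs rather than on the encoded venue table, and the candidate range 2 to 6 is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the encoded feature table: three tight blobs
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# Fit k-means for several candidate k and record the mean silhouette
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest mean silhouette is the best-separated candidate
best_k = max(scores, key=scores.get)
print(best_k)
```

On real venue frequencies the scores are usually much less clear-cut than on this toy data, so the result is best read as a hint rather than a definitive answer.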


In [80]:
#Incorporating Clusters to the DF

#Cloning the DFs
df_total_clustered = df_total.copy()
df_total_clustered.sort_values('Neighborhood', inplace=True)

#Inserting Cluster Labels
df_total_clustered.insert(len(df_total_clustered.columns), 'Cluster Labels', total_kmeans.labels_)

df_total_clustered.head(10) #Checkpoint

Unnamed: 0,index,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels
87,51,M1S,Scarborough,Agincourt,43.7946,-79.2644,0
93,57,M8W,Etobicoke,Alderwood / Long Branch,43.6021,-79.5402,0
56,20,M3H,North York,Bathurst Manor / Wilson Heights / Downsview North,43.7535,-79.4472,0
63,27,M2K,North York,Bayview Village,43.7797,-79.3813,2
73,37,M5M,North York,Bedford Park / Lawrence Manor East,43.7335,-79.4177,0
4,4,M5E,Downtown Toronto,Berczy Park,43.6456,-79.3754,0
76,40,M1N,Scarborough,Birch Cliff / Cliffside West,43.6952,-79.2646,0
21,4,M6K,West Toronto,Brockton / Parkdale Village / Exhibition Place,43.6383,-79.4301,0
12,12,M5V,Downtown Toronto,CN Tower / King and Spadina / Railway Lands / ...,43.6404,-79.3995,0
51,15,M6E,York,Caledonia-Fairbanks,43.6889,-79.4507,0
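
Inserting the labels by position works only because the location table is sorted by neighborhood name in the same order that `groupby` produced for the encoded table. A less order-dependent alternative is to attach the labels by name with a merge. The sketch below uses made-up stand-ins for the two tables and is not the notebook's own method.

```python
import pandas as pd

# Toy stand-ins: the encoded table is sorted by name (groupby output),
# while the location table is in its original, unsorted order
encoded = pd.DataFrame({' Neighborhood': ['Agincourt', 'Berczy Park', 'Christie']})
labels = [0, 2, 1]  # one k-means label per row of `encoded`

locations = pd.DataFrame({
    'Neighborhood': ['Christie', 'Agincourt', 'Berczy Park'],
    'Latitude': [43.6683, 43.7946, 43.6456],
})

# Pair each label with its neighborhood name, then join on that name
label_table = pd.DataFrame({
    'Neighborhood': encoded[' Neighborhood'],
    'Cluster Labels': labels,
})
clustered = locations.merge(label_table, on='Neighborhood', how='left')

print(clustered)
```

With `how='left'`, a neighborhood that has no label shows up as a missing value instead of silently shifting every label after it.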


Now that every neighborhood in the combined area has a cluster label, we can plot the clusters and analyze their consistency:

In [81]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

In [82]:
#Plotting the clusters obtained when the analysis is performed on the whole area

#Row label 0 is the first downtown neighborhood; its coordinates are used to center the map regardless of the analysis zone, so the map opens showing the whole area
map_clusters = folium.Map(location=[df_total_clustered['Latitude'][0], df_total_clustered['Longitude'][0]], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, bor, cluster in zip(df_total_clustered['Latitude'], df_total_clustered['Longitude'], df_total_clustered['Neighborhood'], df_total_clustered['Borough'], df_total_clustered['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - ' + str(bor) + ' - ' + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
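
The cluster ids present in each zone can also be tabulated rather than read off the map. The following is a sketch on made-up rows shaped like `df_total_clustered`. The borough-to-zone mapping in `zone_of` is a hypothetical helper and an assumption made for illustration; it should be adjusted to however the zones were actually defined.

```python
import pandas as pd

def zone_of(borough):
    """Hypothetical mapping from borough name to analysis zone."""
    if borough == 'Downtown Toronto':
        return 'Downtown'
    if borough.endswith('Toronto'):
        return 'Toronto'
    return 'Suburbs'

# Made-up rows in the shape of df_total_clustered
toy = pd.DataFrame({
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'West Toronto',
                'Scarborough', 'North York'],
    'Cluster Labels': [0, 3, 0, 2, 0],
})

# Sorted list of distinct cluster ids found in each zone
clusters_by_zone = (
    toy.assign(Zone=toy['Borough'].map(zone_of))
       .groupby('Zone')['Cluster Labels']
       .apply(lambda s: sorted(s.unique().tolist()))
)

print(clusters_by_zone.to_dict())
```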

From the map above, we can see that Downtown Toronto has been segmented into four clusters (0, 1, 2 and 3), Greater Toronto into only two clusters (0 and 2), and the Suburbs into four clusters (0, 2, 3 and 4). If we analyze the zones individually, do we obtain the same clustering?

In [83]:
#Initializing the k-means objects

#Number of clusters based on the clusters defined by analysis of the whole zone
kcluster_downtown = 4
kcluster_toronto = 2
kcluster_suburbs = 4

#Removing the neighborhood label column, which is not a feature, from the analysis DFs
downtown_encoded_clustering = downtown_encoded.drop(columns=' Neighborhood')
toronto_encoded_clustering = toronto_encoded.drop(columns=' Neighborhood')
suburbs_encoded_clustering = suburbs_encoded.drop(columns=' Neighborhood')

#Creating and fitting the k-means objects
downtown_kmeans = KMeans(n_clusters=kcluster_downtown, random_state=0).fit(downtown_encoded_clustering)
toronto_kmeans = KMeans(n_clusters=kcluster_toronto, random_state=0).fit(toronto_encoded_clustering)
suburbs_kmeans = KMeans(n_clusters=kcluster_suburbs, random_state=0).fit(suburbs_encoded_clustering)

#Checkpoint
print(len(df_downtown), len(downtown_kmeans.labels_))
print(len(df_toronto), len(toronto_kmeans.labels_))
print(len(df_suburbs),len(suburbs_kmeans.labels_))

17 17
19 19
62 62


In [84]:
#Incorporating Clusters to the DF

#Cloning the DFs
df_downtown_clustered = df_downtown.copy()
df_downtown_clustered.sort_values('Neighborhood', inplace=True) #Sorting datapoints so they match the cluster labels
df_toronto_clustered = df_toronto.copy()
df_toronto_clustered.sort_values('Neighborhood', inplace=True) #Sorting datapoints so they match the cluster labels
df_suburbs_clustered = df_suburbs.copy()
df_suburbs_clustered.sort_values('Neighborhood', inplace=True) #Sorting datapoints so they match the cluster labels

#Inserting Cluster Labels
df_downtown_clustered.insert(len(df_downtown_clustered.columns), 'Cluster Labels', downtown_kmeans.labels_)
df_toronto_clustered.insert(len(df_toronto_clustered.columns), 'Cluster Labels', toronto_kmeans.labels_)
df_suburbs_clustered.insert(len(df_suburbs_clustered.columns), 'Cluster Labels', suburbs_kmeans.labels_)

df_downtown_clustered.head() #Checkpoint

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels
4,M5E,Downtown Toronto,Berczy Park,43.6456,-79.3754,0
12,M5V,Downtown Toronto,CN Tower / King and Spadina / Railway Lands / ...,43.6404,-79.3995,0
5,M5G,Downtown Toronto,Central Bay Street,43.6564,-79.386,0
6,M6G,Downtown Toronto,Christie,43.6683,-79.4205,3
16,M4Y,Downtown Toronto,Church and Wellesley,43.6656,-79.383,0


In [85]:
# Plotting the clusters when only downtown is analyzed

map_clusters = folium.Map(location=[df_downtown_clustered['Latitude'][0], df_downtown_clustered['Longitude'][0]], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kcluster_downtown)
ys = [i + x + (i*x)**2 for i in range(kcluster_downtown)]
colors_array = cm.rainbow(np.linspace(0.1, 1.1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, bor, cluster in zip(df_downtown_clustered['Latitude'], df_downtown_clustered['Longitude'], df_downtown_clustered['Neighborhood'], df_downtown_clustered['Borough'], df_downtown_clustered['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - ' + str(bor) + ' - ' + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [86]:
# Plotting the clusters when only Greater Toronto (excluding downtown) is analyzed

map_clusters = folium.Map(location=[df_downtown_clustered['Latitude'][0], df_downtown_clustered['Longitude'][0]], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kcluster_toronto)
ys = [i + x + (i*x)**2 for i in range(kcluster_toronto)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, bor, cluster in zip(df_toronto_clustered['Latitude'], df_toronto_clustered['Longitude'], df_toronto_clustered['Neighborhood'], df_toronto_clustered['Borough'], df_toronto_clustered['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - ' + str(bor) + ' - ' + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [87]:
# Plotting the clusters when only the suburbs are analyzed
map_clusters = folium.Map(location=[df_downtown_clustered['Latitude'][0], df_downtown_clustered['Longitude'][0]], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kcluster_suburbs)
ys = [i + x + (i*x)**2 for i in range(kcluster_suburbs)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, bor, cluster in zip(df_suburbs_clustered['Latitude'], df_suburbs_clustered['Longitude'], df_suburbs_clustered['Neighborhood'], df_suburbs_clustered['Borough'], df_suburbs_clustered['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - ' + str(bor) + ' - ' + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
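
When comparing maps from separate fits, keep in mind that k-means assigns cluster ids independently in each run, so cluster 0 of a zone-level model need not correspond to cluster 0 of the whole-area model. A cross-tabulation compares the groupings regardless of numbering. The label values below are invented for illustration and do not come from the notebook's results.

```python
import pandas as pd

# Invented labels for the same six neighborhoods from two separate fits
whole_area = pd.Series([0, 0, 2, 2, 3, 0], name='whole-area')
zone_only = pd.Series([1, 1, 0, 0, 2, 1], name='zone-only')

# Rows: whole-area cluster id; columns: zone-only cluster id
table = pd.crosstab(whole_area, zone_only)

# The partitions agree (up to renaming) when every row and every
# column of the table has exactly one non-zero cell
same_partition = bool(((table > 0).sum(axis=1) == 1).all()
                      and ((table > 0).sum(axis=0) == 1).all())

print(table)
print(same_partition)
```

Off-diagonal mass spread over several cells in a row would indicate neighborhoods that one fit groups together and the other splits apart.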

As we can see, zone-based clustering produced results that are very similar to the whole-area clustering, in particular for the Toronto area. This is likely due to the high number of venue hits obtained for those neighborhoods, which strengthens intracluster relationships. Let's look at the composition of the different clusters:

In [88]:
#Defining a function that returns the 'num_top_venues' most common venue types for a certain neighborhood 'row'

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:] #Skips the first entry, which holds the neighborhood name
    row_categories_sorted = row_categories.sort_values(ascending=False)
    row_categories_names = []
    
    for i in range(0, num_top_venues):
        if row_categories_sorted.iloc[i] > 0: #Positional access; the index holds category names
            row_categories_names.append(row_categories_sorted.index.values[i])
        else:
            row_categories_names.append('---') #Placeholder when fewer categories are present

    return row_categories_names
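
To see what this function returns, here is a small self-contained sketch. It restates the same logic compactly so it can run on its own, and applies it to a made-up encoded row (the neighborhood name and frequencies are invented).

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Same logic as above, restated so this sketch runs on its own
    ranked = row.iloc[1:].astype(float).sort_values(ascending=False)
    return [ranked.index[i] if ranked.iloc[i] > 0 else '---'
            for i in range(num_top_venues)]

# Made-up encoded row: name first, then relative frequencies
row = pd.Series({' Neighborhood': 'Toyville',
                 'Bakery': 0.25, 'Park': 0.75, 'Zoo Exhibit': 0.0})

print(return_most_common_venues(row, 3))  # ['Park', 'Bakery', '---']
```

The `'---'` placeholder appears whenever a neighborhood has fewer distinct venue categories than the number of ranks requested.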

In [89]:
#Defining a function that uses 'return_most_common_venues' and returns a complete DF, including Neighborhood name, cluster, # of hits and the top venues
 
def topVenueGenerator(df, kmeans, raw_data, num_top_venues=5):
    
    indicators = ['st', 'nd', 'rd']

    # create columns according to number of top venues
    columns = ['Neighborhood']
    for ind in np.arange(num_top_venues):
        try:
            columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
        except IndexError: #No 'st'/'nd'/'rd' suffix left, so fall back to 'th'
            columns.append('{}th Most Common Venue'.format(ind+1))

    # create a new dataframe
    neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
    neighborhoods_venues_sorted['Neighborhood'] = df[' Neighborhood']

    for ind in np.arange(df.shape[0]):
        neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(df.iloc[ind, :], num_top_venues)
        
    # both the labels and the hit counts follow the alphabetical neighborhood order of df
    neighborhoods_venues_sorted.insert(1, 'Cluster Labels', kmeans.labels_)
    hits = raw_data.groupby('Neighborhood').size().reset_index(drop=True)
    neighborhoods_venues_sorted.insert(1, 'Number of Hits', hits)
    neighborhoods_venues_sorted.sort_values('Cluster Labels', inplace=True)

    return neighborhoods_venues_sorted
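
The `indicators` lookup with a `th` fallback is fine for the ten columns used here, but it would label a 21st column as "21th". If more columns were ever needed, a general ordinal helper could be used instead. `ordinal` below is a hypothetical helper sketched for illustration, not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11, 12 and 13 (and 111, 112, ...) always take 'th'
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# Column names in the same style as the function above
columns = ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns[0], '|', columns[9])
```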

In [91]:
#Generate Top10 lists

total_top10 = topVenueGenerator(total_encoded, total_kmeans, Total_raw_data, num_top_venues=10)
downtown_top10 = topVenueGenerator(downtown_encoded, downtown_kmeans, Downtown_raw_data, num_top_venues=10)
toronto_top10 = topVenueGenerator(toronto_encoded, toronto_kmeans, Toronto_raw_data, num_top_venues=10)
suburbs_top10 = topVenueGenerator(suburbs_encoded, suburbs_kmeans, Suburbs_raw_data, num_top_venues=10)

In [92]:
#Top venues for all the neighborhoods analyzed

total_top10

Unnamed: 0,Neighborhood,Number of Hits,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,44,0,Chinese Restaurant,Shopping Mall,Coffee Shop,Bakery,Pizza Place,Sandwich Place,Bank,Motorcycle Shop,Badminton Court,Lounge
70,Runnymede / The Junction North,28,0,Coffee Shop,Brewery,Gas Station,Park,Pizza Place,Discount Store,Indian Restaurant,Burger Joint,Pharmacy,Sandwich Place
69,Runnymede / Swansea,61,0,Coffee Shop,Café,Pub,Bakery,Pizza Place,Italian Restaurant,Sushi Restaurant,Gastropub,Bank,Restaurant
68,Rouge Hill / Port Union / Highland Creek,5,0,Park,Home Service,Gym / Fitness Center,Paper / Office Supplies Store,Event Service,---,---,---,---,---
67,Roselawn,6,0,Playground,Bank,Pharmacy,Garden,Café,---,---,---,---,---
65,Richmond / Adelaide / King,68,0,Coffee Shop,Hotel,Asian Restaurant,Restaurant,Café,Salad Place,Steakhouse,American Restaurant,Seafood Restaurant,Japanese Restaurant
64,Regent Park / Harbourfront,17,0,Breakfast Spot,Chinese Restaurant,Greek Restaurant,Sandwich Place,Sporting Goods Shop,Furniture / Home Store,Electronics Store,Ethiopian Restaurant,Coffee Shop,Event Space
62,Parkwoods,18,0,Bus Stop,Park,Pizza Place,Train Station,Pharmacy,Food & Drink Shop,Supermarket,Laundry Service,Shopping Mall,Café
61,Parkview Hill / Woodbine Gardens,20,0,Pizza Place,Brewery,Bakery,Fast Food Restaurant,Intersection,Coffee Shop,Athletics & Sports,Bank,Rock Climbing Spot,Gastropub
60,Parkdale / Roncesvalles,73,0,Coffee Shop,Restaurant,Bakery,Café,Eastern European Restaurant,Pizza Place,Sushi Restaurant,Thai Restaurant,Gift Shop,Playground


In [93]:
#Top venues for toronto_downtown

downtown_top10

Unnamed: 0,Neighborhood,Number of Hits,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,17,0,Italian Restaurant,Park,Tailor Shop,Museum,Cocktail Bar,Mexican Restaurant,Restaurant,Café,Japanese Restaurant,Concert Hall
1,CN Tower / King and Spadina / Railway Lands / ...,18,0,Café,Park,Restaurant,Coffee Shop,Gym,Market,Caribbean Restaurant,Diner,Intersection,Ramen Restaurant
2,Central Bay Street,12,0,Coffee Shop,Bubble Tea Shop,Japanese Restaurant,Sushi Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Spa,Hotel,Ramen Restaurant,Gastropub
14,Stn A PO Boxes,48,0,Coffee Shop,Steakhouse,Sporting Goods Shop,Pub,Hotel,Italian Restaurant,Sandwich Place,Sports Bar,Fast Food Restaurant,New American Restaurant
4,Church and Wellesley,55,0,Gay Bar,Coffee Shop,Japanese Restaurant,Dessert Shop,Burger Joint,Gym / Fitness Center,Hobby Shop,Ice Cream Shop,Italian Restaurant,Juice Bar
5,"Garden District, Ryerson",50,0,Coffee Shop,Middle Eastern Restaurant,Café,Sandwich Place,Restaurant,Bar,Burrito Place,Diner,Ramen Restaurant,Pub
6,Harbourfront East / Union Station / Toronto Is...,3,0,Park,Athletics & Sports,Music Venue,---,---,---,---,---,---,---
7,Kensington Market / Chinatown / Grange Park,19,0,Café,Vietnamese Restaurant,Mexican Restaurant,Grocery Store,Bar,Farmers Market,Caribbean Restaurant,Cheese Shop,Cocktail Bar,Coffee Shop
15,Toronto Dominion Centre / Design Exchange,76,0,Coffee Shop,Restaurant,Deli / Bodega,Bakery,Salad Place,Café,Japanese Restaurant,Bar,Tea Room,Thai Restaurant
9,Regent Park / Harbourfront,17,0,Breakfast Spot,Yoga Studio,Furniture / Home Store,Event Space,Ethiopian Restaurant,Electronics Store,Mexican Restaurant,Coffee Shop,Chinese Restaurant,Sandwich Place


In [94]:
#Top venues for Greater Toronto, excluding downtown
toronto_top10

Unnamed: 0,Neighborhood,Number of Hits,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Brockton / Parkdale Village / Exhibition Place,100,0,Café,Coffee Shop,Restaurant,Bar,Pharmacy,Bakery,Tibetan Restaurant,Lounge,Gift Shop,Italian Restaurant
16,The Annex / North Midtown / Yorkville,61,0,Italian Restaurant,Coffee Shop,Pub,Café,Sandwich Place,History Museum,American Restaurant,Grocery Store,Vegetarian / Vegan Restaurant,Park
15,Summerhill West / Rathnelly / South Hill / For...,51,0,Coffee Shop,Italian Restaurant,Sushi Restaurant,Pub,Grocery Store,Gym,Skating Rink,Restaurant,Bagel Shop,Pizza Place
14,Studio District,56,0,Coffee Shop,Café,American Restaurant,Bar,Brewery,Sandwich Place,Gastropub,Bakery,Clothing Store,Seafood Restaurant
13,Runnymede / Swansea,61,0,Coffee Shop,Café,Bakery,Pub,Pizza Place,Italian Restaurant,Sushi Restaurant,Bank,Restaurant,Gastropub
11,Parkdale / Roncesvalles,73,0,Coffee Shop,Restaurant,Bakery,Café,Eastern European Restaurant,Pharmacy,Bookstore,Sushi Restaurant,Playground,Pizza Place
10,North Toronto West,45,0,Coffee Shop,Clothing Store,Café,Sporting Goods Shop,Restaurant,Italian Restaurant,Diner,Dessert Shop,Bakery,Yoga Studio
17,The Beaches,13,0,Pub,Caribbean Restaurant,Health Food Store,Neighborhood,Cheese Shop,Sandwich Place,Bakery,Gastropub,Trail,Coffee Shop
9,Moore Park / Summerhill East,32,0,Grocery Store,Coffee Shop,Gym,Café,Thai Restaurant,Park,Sushi Restaurant,Yoga Studio,Cantonese Restaurant,Sandwich Place
6,India Bazaar / The Beaches West,53,0,Indian Restaurant,Fast Food Restaurant,Brewery,Donut Shop,Coffee Shop,Sandwich Place,Café,Restaurant,Gym,Grocery Store


In [95]:
#Top venues for the suburbs

suburbs_top10

Unnamed: 0,Neighborhood,Number of Hits,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
36,New Toronto / Mimico South / Humber Bay Shores,24,0,Park,Convenience Store,Italian Restaurant,Dessert Shop,Coffee Shop,Music Venue,Restaurant,Café,Bus Stop,Mexican Restaurant
43,Rouge Hill / Port Union / Highland Creek,5,0,Home Service,Gym / Fitness Center,Event Service,Park,Paper / Office Supplies Store,---,---,---,---,---
25,Humber Summit,6,0,Electronics Store,Construction & Landscaping,Skating Rink,Arts & Crafts Store,General Entertainment,---,---,---,---,---
16,Downsview - M3M,8,0,Bank,Spa,Restaurant,Park,Supermarket,Vietnamese Restaurant,Baseball Field,BBQ Joint,---,---
42,Parkwoods,18,0,Park,Bus Stop,Pharmacy,Train Station,Shopping Mall,Supermarket,Laundry Service,Food & Drink Shop,Café,Tennis Court
14,Downsview - M3K,23,0,Athletics & Sports,Racetrack,Turkish Restaurant,Coffee Shop,Skating Rink,Soccer Field,Sandwich Place,Food Court,Latin American Restaurant,Chinese Restaurant
51,West Deane Park / Princess Gardens / Martin Gr...,18,0,Park,Pizza Place,Convenience Store,Hotel,Clothing Store,Mexican Restaurant,Café,Theater,Bank,Restaurant
28,Islington Avenue,20,0,Pharmacy,Grocery Store,Bank,Park,Bakery,Spa,Shopping Mall,Liquor Store,Bus Stop,Café
22,Golden Mile / Clairlea / Oakridge,25,0,Park,Convenience Store,Bakery,Intersection,Bus Line,Grocery Store,Pub,Soccer Field,Filipino Restaurant,Bus Station
6,Caledonia-Fairbanks,25,0,Pizza Place,Beer Store,Mexican Restaurant,Park,Portuguese Restaurant,Japanese Restaurant,Food Truck,Discount Store,Bank,Bus Line


With that, we complete our clustering of neighborhoods in the Toronto area.