# Part 1

We will use Beautiful Soup to scrape the table containing the Toronto postal codes, boroughs, and neighborhoods from the Wikipedia link provided in the submission instructions.

In [1]:
#!pip install bs4    # uncomment if bs4 not already installed
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
print('All libraries for part 1 installed and imported!')

All libraries for part 1 installed and imported!


In [2]:
# loading the webpage containing the table as text and creating the BeautifulSoup object

table_url = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(table_url,'lxml')
print(soup.prettify())

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   List of postal codes of Canada: M - Wikipedia
  </title>
  <script>
   document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"X-suCQpAMNAAAK2erk0AAABJ","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"List_of_postal_codes_of_Canada:_M","wgTitle":"List of postal codes of Canada: M","wgCurRevisionId":995657573,"wgRevisionId":995657573,"wgArticleId":539066,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Articles with short description","Short description is different from Wikidata","Communications in Ontar

Now that we have the webpage that contains the table, let's extract it.

In [3]:
# Let's find the html tag that contains the table using beautiful soup's find method

Toronto_table = soup.find('table',{'class':'wikitable sortable'})
Toronto_table

<table class="wikitable sortable">
<tbody><tr>
<th>Postal Code
</th>
<th>Borough
</th>
<th>Neighbourhood
</th></tr>
<tr>
<td>M1A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M2A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3A
</td>
<td>North York
</td>
<td>Parkwoods
</td></tr>
<tr>
<td>M4A
</td>
<td>North York
</td>
<td>Victoria Village
</td></tr>
<tr>
<td>M5A
</td>
<td>Downtown Toronto
</td>
<td>Regent Park, Harbourfront
</td></tr>
<tr>
<td>M6A
</td>
<td>North York
</td>
<td>Lawrence Manor, Lawrence Heights
</td></tr>
<tr>
<td>M7A
</td>
<td>Downtown Toronto
</td>
<td>Queen's Park, Ontario Provincial Government
</td></tr>
<tr>
<td>M8A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M9A
</td>
<td>Etobicoke
</td>
<td>Islington Avenue, Humber Valley Village
</td></tr>
<tr>
<td>M1B
</td>
<td>Scarborough
</td>
<td>Malvern, Rouge
</td></tr>
<tr>
<td>M2B
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3B
</td>
<td

In [4]:
# Finding the individual elements of the table including the title and rows 

tr_list = Toronto_table.find('tbody').findAll('tr')
tr_list

[<tr>
 <th>Postal Code
 </th>
 <th>Borough
 </th>
 <th>Neighbourhood
 </th></tr>,
 <tr>
 <td>M1A
 </td>
 <td>Not assigned
 </td>
 <td>Not assigned
 </td></tr>,
 <tr>
 <td>M2A
 </td>
 <td>Not assigned
 </td>
 <td>Not assigned
 </td></tr>,
 <tr>
 <td>M3A
 </td>
 <td>North York
 </td>
 <td>Parkwoods
 </td></tr>,
 <tr>
 <td>M4A
 </td>
 <td>North York
 </td>
 <td>Victoria Village
 </td></tr>,
 <tr>
 <td>M5A
 </td>
 <td>Downtown Toronto
 </td>
 <td>Regent Park, Harbourfront
 </td></tr>,
 <tr>
 <td>M6A
 </td>
 <td>North York
 </td>
 <td>Lawrence Manor, Lawrence Heights
 </td></tr>,
 <tr>
 <td>M7A
 </td>
 <td>Downtown Toronto
 </td>
 <td>Queen's Park, Ontario Provincial Government
 </td></tr>,
 <tr>
 <td>M8A
 </td>
 <td>Not assigned
 </td>
 <td>Not assigned
 </td></tr>,
 <tr>
 <td>M9A
 </td>
 <td>Etobicoke
 </td>
 <td>Islington Avenue, Humber Valley Village
 </td></tr>,
 <tr>
 <td>M1B
 </td>
 <td>Scarborough
 </td>
 <td>Malvern, Rouge
 </td></tr>,
 <tr>
 <td>M2B
 </td>
 <td>Not assigned
 </td>

In the HTML above, the table data is organized row by row (one `<tr>` element per postal code), whereas we want one column each for postal codes, boroughs, and neighborhoods. So, we will extract each cell from the HTML and then arrange the values as columns in a pandas dataframe.
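As a small illustration of the row-to-column idea, the sketch below transposes a few row tuples into column lists. The sample rows are hypothetical stand-ins copied from the table above, and `zip(*rows)` is only one possible way to do this; the notebook itself uses list slicing further down.

```python
# Hypothetical sample rows shaped like the (postal code, borough, neighbourhood)
# triples that appear in the table above.
rows = [
    ("M3A", "North York", "Parkwoods"),
    ("M4A", "North York", "Victoria Village"),
    ("M5A", "Downtown Toronto", "Regent Park, Harbourfront"),
]

# zip(*rows) transposes the row tuples into one tuple per column.
postal_codes, boroughs, neighbourhoods = (list(column) for column in zip(*rows))

print(postal_codes)
print(boroughs)
print(neighbourhoods)
```

Each resulting list can then serve as one column of a dataframe.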

In [5]:
# Removing the titles 

tr_list.pop(0)
tr_list

[<tr>
 <td>M1A
 </td>
 <td>Not assigned
 </td>
 <td>Not assigned
 </td></tr>,
 <tr>
 <td>M2A
 </td>
 <td>Not assigned
 </td>
 <td>Not assigned
 </td></tr>,
 <tr>
 <td>M3A
 </td>
 <td>North York
 </td>
 <td>Parkwoods
 </td></tr>,
 <tr>
 <td>M4A
 </td>
 <td>North York
 </td>
 <td>Victoria Village
 </td></tr>,
 <tr>
 <td>M5A
 </td>
 <td>Downtown Toronto
 </td>
 <td>Regent Park, Harbourfront
 </td></tr>,
 <tr>
 <td>M6A
 </td>
 <td>North York
 </td>
 <td>Lawrence Manor, Lawrence Heights
 </td></tr>,
 <tr>
 <td>M7A
 </td>
 <td>Downtown Toronto
 </td>
 <td>Queen's Park, Ontario Provincial Government
 </td></tr>,
 <tr>
 <td>M8A
 </td>
 <td>Not assigned
 </td>
 <td>Not assigned
 </td></tr>,
 <tr>
 <td>M9A
 </td>
 <td>Etobicoke
 </td>
 <td>Islington Avenue, Humber Valley Village
 </td></tr>,
 <tr>
 <td>M1B
 </td>
 <td>Scarborough
 </td>
 <td>Malvern, Rouge
 </td></tr>,
 <tr>
 <td>M2B
 </td>
 <td>Not assigned
 </td>
 <td>Not assigned
 </td></tr>,
 <tr>
 <td>M3B
 </td>
 <td>North York
 </td>
 <td>

In [6]:
# Looping through the html elements to remove the <tr> and </tr> tags. 

inter_list = []
for words in tr_list:
    for word in words:
        inter_list.append(word)
inter_list

['\n',
 <td>M1A
 </td>,
 '\n',
 <td>Not assigned
 </td>,
 '\n',
 <td>Not assigned
 </td>,
 '\n',
 <td>M2A
 </td>,
 '\n',
 <td>Not assigned
 </td>,
 '\n',
 <td>Not assigned
 </td>,
 '\n',
 <td>M3A
 </td>,
 '\n',
 <td>North York
 </td>,
 '\n',
 <td>Parkwoods
 </td>,
 '\n',
 <td>M4A
 </td>,
 '\n',
 <td>North York
 </td>,
 '\n',
 <td>Victoria Village
 </td>,
 '\n',
 <td>M5A
 </td>,
 '\n',
 <td>Downtown Toronto
 </td>,
 '\n',
 <td>Regent Park, Harbourfront
 </td>,
 '\n',
 <td>M6A
 </td>,
 '\n',
 <td>North York
 </td>,
 '\n',
 <td>Lawrence Manor, Lawrence Heights
 </td>,
 '\n',
 <td>M7A
 </td>,
 '\n',
 <td>Downtown Toronto
 </td>,
 '\n',
 <td>Queen's Park, Ontario Provincial Government
 </td>,
 '\n',
 <td>M8A
 </td>,
 '\n',
 <td>Not assigned
 </td>,
 '\n',
 <td>Not assigned
 </td>,
 '\n',
 <td>M9A
 </td>,
 '\n',
 <td>Etobicoke
 </td>,
 '\n',
 <td>Islington Avenue, Humber Valley Village
 </td>,
 '\n',
 <td>M1B
 </td>,
 '\n',
 <td>Scarborough
 </td>,
 '\n',
 <td>Malvern, Rouge
 </td>,
 '\n',
 

In [7]:
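# Further cleaning the strings to extract the text of each cell

refined_list=[]
for words in inter_list:
    if words!='\n':
        for word in words:
            # strip() with no argument trims surrounding whitespace, including '\n';
            # strip('<td>') would instead remove any of the characters '<', 't', 'd', '>'
            word=word.strip()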
            refined_list.append(word)
refined_list

['M1A',
 'Not assigned',
 'Not assigned',
 'M2A',
 'Not assigned',
 'Not assigned',
 'M3A',
 'North York',
 'Parkwoods',
 'M4A',
 'North York',
 'Victoria Village',
 'M5A',
 'Downtown Toronto',
 'Regent Park, Harbourfront',
 'M6A',
 'North York',
 'Lawrence Manor, Lawrence Heights',
 'M7A',
 'Downtown Toronto',
 "Queen's Park, Ontario Provincial Government",
 'M8A',
 'Not assigned',
 'Not assigned',
 'M9A',
 'Etobicoke',
 'Islington Avenue, Humber Valley Village',
 'M1B',
 'Scarborough',
 'Malvern, Rouge',
 'M2B',
 'Not assigned',
 'Not assigned',
 'M3B',
 'North York',
 'Don Mills',
 'M4B',
 'East York',
 'Parkview Hill, Woodbine Gardens',
 'M5B',
 'Downtown Toronto',
 'Garden District, Ryerson',
 'M6B',
 'North York',
 'Glencairn',
 'M7B',
 'Not assigned',
 'Not assigned',
 'M8B',
 'Not assigned',
 'Not assigned',
 'M9B',
 'Etobicoke',
 'West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale',
 'M1C',
 'Scarborough',
 'Rouge Hill, Port Union, Highland Creek',
 'M2C',
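When cleaning cell text like this, it helps to remember that `str.strip(chars)` treats its argument as a set of characters to remove from both ends, not as a literal substring. The sketch below uses a hypothetical lowercase cell value to show how a leading letter can be lost; it is an illustration of the pitfall, not output from the Wikipedia table.

```python
# Hypothetical cell text; the trailing newline mimics the scraped cells.
raw_cell = "downtown\n"

# strip("<td>") removes any of '<', 't', 'd', '>' from both ends.
# The trailing '\n' shields the right-hand side, but the leading 'd' is removed.
risky = raw_cell.strip("<td>").strip("</td>").strip("\n")

# strip() with no argument only trims surrounding whitespace.
safe = raw_cell.strip()

print(repr(risky))
print(repr(safe))
```

The scraped values above start with capital letters, which is why they come through intact either way.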


In [8]:
# Creating the lists to populate each column of the table

postal_list=refined_list[::3]
borough_list=refined_list[1::3]
neigh_list=refined_list[2::3]

In [9]:
# Assigning titles and creating a dataframe

Tor_dict = {'PostalCode':postal_list, 'Borough':borough_list, 'Neighborhood':neigh_list}
Tor_df = pd.DataFrame(Tor_dict)
Tor_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [10]:
# Dropping the rows whose Borough value is 'Not assigned'

Tor_df.drop(Tor_df[Tor_df['Borough']=='Not assigned'].index, inplace = True)
Tor_df.reset_index(drop=True, inplace=True)
Tor_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [11]:
# Checking for any Neighborhood with the value 'Not assigned'

Tor_df[Tor_df['Neighborhood']=='Not assigned'].value_counts()

Series([], dtype: int64)

No neighborhood has the value 'Not assigned'. Now, let's check for any unusual patterns in the data. Since there should be fewer boroughs than neighborhoods or postal codes, let's group by borough and count the postal codes and neighborhoods associated with each one.

In [12]:
Tor_df.groupby('Borough').count()

Unnamed: 0_level_0,PostalCode,Neighborhood
Borough,Unnamed: 1_level_1,Unnamed: 2_level_1
Central Toronto,9,9
Downtown Toronto,19,19
East Toronto,5,5
East York,5,5
Etobicoke,12,12
Mississauga,1,1
North York,24,24
Scarborough,17,17
West Toronto,6,6
York,5,5


Interesting, the borough of Mississauga is the only one in the dataframe that has a single neighborhood and postal code associated with it. Let's take a look at it. 

In [13]:
Tor_df.loc[Tor_df['Borough']=='Mississauga']

Unnamed: 0,PostalCode,Borough,Neighborhood
76,M7R,Mississauga,Canada Post Gateway Processing Centre


Since the only neighborhood listed for the borough of Mississauga is a postal processing centre, this row is not relevant for our analysis, so let's remove it from the dataframe.

In [14]:
Tor_df.drop(index=76, inplace=True)
Tor_df.shape

(102, 3)
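Dropping by a hard-coded index label works here, but it depends on the row staying at label 76. A minimal sketch of an alternative, assuming the same column names, filters on the borough value instead; `sample_df` is a hypothetical miniature of the table, not the real data.

```python
import pandas as pd

# Hypothetical three-row miniature of the postal code table.
sample_df = pd.DataFrame({
    "PostalCode": ["M5A", "M7R", "M9A"],
    "Borough": ["Downtown Toronto", "Mississauga", "Etobicoke"],
    "Neighborhood": [
        "Regent Park, Harbourfront",
        "Canada Post Gateway Processing Centre",
        "Islington Avenue, Humber Valley Village",
    ],
})

# Keep every row whose borough is not Mississauga, then renumber the index.
filtered_df = sample_df[sample_df["Borough"] != "Mississauga"].reset_index(drop=True)

print(filtered_df)
```

Filtering by value keeps working if the source table gains or loses rows, and resetting the index avoids a gap in the row labels.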

This table has 102 rows (neighborhoods) and does not contain any borough or neighborhood with the value 'Not assigned'. Therefore, it is ready for processing in parts 2 and 3.

# Part 2

In this part, we will use the `Nominatim` class from pgeocode to look up latitudes and longitudes from postal codes offline. pgeocode is a geocoding library available under a BSD license. We use it because multiple calls to the geocoder package suggested in the assignment instructions returned `None`. The pgeocode documentation is available here: https://pgeocode.readthedocs.io/en/latest/index.html

In [15]:
# !pip install pgeocode ---Uncomment if pgeocode is not installed

from pgeocode import Nominatim
print('Nominatim from pgeocode imported')

Nominatim from pgeocode imported


In [16]:
# Retrieving the latitude and longitude from the returned dataframe using Nominatim 

pcode = Tor_df['PostalCode'].values

nomi = Nominatim('Ca')
Tor_latlong_df = nomi.query_postal_code(pcode)[['postal_code','latitude','longitude']]
Tor_latlong_df.rename(columns={'postal_code':'PostalCode','latitude':'Latitude','longitude':'Longitude'}, inplace=True)

In [17]:
# Merging the main table with the one containing the corresponding latitudes and longitudes

Tor_df_complete = Tor_df.merge(Tor_latlong_df, how='inner', on='PostalCode')
Tor_df_complete.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.7545,-79.33
1,M4A,North York,Victoria Village,43.7276,-79.3148
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6555,-79.3626
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.7223,-79.4504
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6641,-79.3889


In [18]:
# Checking the number of rows and columns of the merged dataframe before processing.
Tor_df_complete.shape

(102, 5)

Now, this table is ready to be used in part 3.

# Part 3.1

Now, let's obtain the number of boroughs and neighborhoods in Toronto from the table above.

In [19]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(Tor_df_complete['Borough'].unique()),
        Tor_df_complete.shape[0]
    )
)

The dataframe has 9 boroughs and 102 neighborhoods.


### Let's use geopy and folium to locate Toronto and map these boroughs and neighborhoods on it.

In [20]:
# import all necessary libraries

import json 
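from geopy.geocoders import Nominatim  # note: rebinds the name Nominatim imported from pgeocode in Part 2
from pandas import json_normalize  # top-level import; pandas.io.json.json_normalize is deprecated since pandas 1.0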
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium 

print('Libraries imported.')

Libraries imported.


Let's use geopy to obtain the latitude and longitude of Toronto.

In order to define an instance of the geocoder, we will define a user agent as *tor_explorer*.

In [21]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent='tor_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


Now, let's create a map of Toronto with neighborhoods superimposed on it.

In [22]:
# create map of toronto using latitude and longitude values

map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(Tor_df_complete['Latitude'], Tor_df_complete['Longitude'], Tor_df_complete['Borough'], Tor_df_complete['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### Now, let's consider a hypothetical case: Heather has just secured an internal medicine residency at the Toronto General Hospital and is exploring neighborhoods around her future workplace. So, let's create a subset for the borough in which the hospital is located, Downtown Toronto.

In [23]:
downTor_df = Tor_df_complete[Tor_df_complete['Borough']=='Downtown Toronto'].reset_index(drop=True)
downTor_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6555,-79.3626
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6641,-79.3889
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.6572,-79.3783
3,M5C,Downtown Toronto,St. James Town,43.6513,-79.3756
4,M5E,Downtown Toronto,Berczy Park,43.6456,-79.3754


Let's get the geographical coordinates of Downtown Toronto to visualize the neighborhoods in it.

In [24]:
address = 'Downtown Toronto, Ontario'

geolocator = Nominatim(user_agent="tor_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Downtown Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Downtown Toronto are 43.6563221, -79.3809161.


In [25]:
# create map of Downtown Toronto using latitude and longitude values
map_downtown = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(downTor_df['Latitude'], downTor_df['Longitude'], downTor_df['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_downtown)  
    
map_downtown

Now, let's narrow down the search for suitable neighborhoods by their distance from the neighborhood in which the Toronto General Hospital is located. We can focus on the 10 closest neighborhoods, measured from the center of the hospital's neighborhood. Let's begin by finding that neighborhood from the hospital's postal code prefix, 'M5G' (retrieved from a quick Google search).

In [26]:
# Retrieving the neighborhood associated with the postal code 'M5G'

downTor_df.loc[downTor_df['PostalCode']=='M5G']

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
5,M5G,Downtown Toronto,Central Bay Street,43.6564,-79.386


#### Use pgeocode to obtain the 9 other neighborhoods closest to the Central Bay Street neighborhood.

First, let's get the neighborhoods with distances from Central Bay Street.

In [27]:
from pgeocode import GeoDistance

distance = GeoDistance('Ca')
tempdists = []

for code in downTor_df['PostalCode']:
    dist = distance.query_postal_code('M5G',code)
    tempdists.append(dist)
downTor_df['Distances (Km) from Central Bay Street'] = tempdists  # assign the list directly; one distance per row
downTor_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Distances (Km) from Central Bay Street
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6555,-79.3626,1.885175
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6641,-79.3889,0.887415
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.6572,-79.3783,0.625807
3,M5C,Downtown Toronto,St. James Town,43.6513,-79.3756,1.010777
4,M5E,Downtown Toronto,Berczy Park,43.6456,-79.3754,1.472924
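To sanity-check these distances, one can compute a great-circle distance by hand from the latitudes and longitudes in the table. The sketch below assumes the haversine formula and a mean Earth radius of 6371 km; both are assumptions for illustration, not pgeocode's actual implementation, so the result may differ slightly from the values above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Coordinates taken from the table: M5G (Central Bay Street) and M5B (Garden District, Ryerson).
distance_m5g_m5b = haversine_km(43.6564, -79.3860, 43.6572, -79.3783)
print(round(distance_m5g_m5b, 3))
```

The printed value should land close to the distance reported for M5B in the table, which suggests the column is a straight-line distance rather than a travel distance.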


Now, let's select the 10 neighborhoods closest to Central Bay Street, including Central Bay Street itself.

In [28]:
# Create a dataframe containing the 10 closest neighborhoods to Central Bay Street including itself
pcodes_closest = downTor_df.sort_values(by='Distances (Km) from Central Bay Street').iloc[0:10]
pcodes_closest.reset_index(drop=True, inplace=True)

# Getting the latitude and longitude for the Central Bay Street neighborhood
neigh_lat = pcodes_closest.loc[0, 'Latitude']
neigh_lon = pcodes_closest.loc[0, 'Longitude']
print(neigh_lat, neigh_lon)
pcodes_closest

43.6564 -79.38600000000002


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Distances (Km) from Central Bay Street
0,M5G,Downtown Toronto,Central Bay Street,43.6564,-79.386,0.0
1,M5B,Downtown Toronto,"Garden District, Ryerson",43.6572,-79.3783,0.625807
2,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.6496,-79.3833,0.786711
3,M5L,Downtown Toronto,"Commerce Court, Victoria Hotel",43.6492,-79.3823,0.854155
4,M5X,Downtown Toronto,"First Canadian Place, Underground city",43.6492,-79.3823,0.854155
5,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6641,-79.3889,0.887415
6,M5T,Downtown Toronto,"Kensington Market, Chinatown, Grange Park",43.6541,-79.3978,0.983161
7,M5C,Downtown Toronto,St. James Town,43.6513,-79.3756,1.010777
8,M4Y,Downtown Toronto,Church and Wellesley,43.6656,-79.383,1.051074
9,M5K,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.6469,-79.3823,1.097496


Now, let's start utilizing the Foursquare API to explore the neighborhoods and segment them.

#### Define Foursquare Credentials and Version

In [29]:
CLIENT_ID = '0WTINNBRW1VPI02ZOKOMZ1DMEJMZPF1CDGNK52OTDR1KINGT' 
CLIENT_SECRET = 'BK042E1H1BZ41BWFNA3HKI5YN5NU0MTSWBEVIXTNLK1VRTJ0' 
VERSION = '20210115' 
LIMIT = 100
radius = 500

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 0WTINNBRW1VPI02ZOKOMZ1DMEJMZPF1CDGNK52OTDR1KINGT
CLIENT_SECRET:BK042E1H1BZ41BWFNA3HKI5YN5NU0MTSWBEVIXTNLK1VRTJ0
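Hard-coding and printing API keys in a shared notebook exposes them to anyone who reads it. One common alternative is to read them from environment variables and print only a masked form. The sketch below is a minimal illustration; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical choices, not part of the Foursquare API.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the credentials from a mapping (by default the process environment)."""
    # The variable names below are hypothetical; pick whatever suits your setup.
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set in the environment.")
    return client_id, client_secret

def mask(secret, visible=4):
    """Show only the first few characters of a secret."""
    return secret[:visible] + "*" * (len(secret) - visible)

# Demonstration with a fake mapping instead of the real environment.
demo_env = {"FOURSQUARE_CLIENT_ID": "DEMOID123456", "FOURSQUARE_CLIENT_SECRET": "DEMOSECRET99"}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID: " + mask(demo_id))
print("CLIENT_SECRET: " + mask(demo_secret))
```

In a real session, calling `load_foursquare_credentials()` with no argument would read from the environment of the running kernel.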


#### Let's start our exploration with Central Bay Street since the hospital is in this neighborhood.

Let's get the top 100 venues within a radius of 500 meters of Central Bay Street, beginning with the URL for the GET request to the Foursquare API.
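As a side note, the query string for such a request can also be assembled from a dictionary, which keeps each parameter labelled and handles URL encoding. The sketch below assumes the same endpoint and parameter names used in this notebook; `build_explore_url` is a hypothetical helper and the credentials are placeholders, so no request is sent.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore URL from named parameters (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # urlencode percent-encodes the values, so the comma in "ll" becomes %2C.
    return EXPLORE_ENDPOINT + "?" + urlencode(params)

example_url = build_explore_url("YOUR_ID", "YOUR_SECRET", "20210115", 43.6564, -79.386, 500, 100)
print(example_url)
```

`requests.get` also accepts such a dictionary directly through its `params` argument, which avoids building the string by hand.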

In [30]:
url='https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    neigh_lat,
    neigh_lon,
    radius,
    LIMIT
)
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '60038fa64c759760aca0b271'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Bay Street Corridor',
  'headerFullLocation': 'Bay Street Corridor, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 59,
  'suggestedBounds': {'ne': {'lat': 43.6609000045, 'lng': -79.37979177890104},
   'sw': {'lat': 43.651899995499996, 'lng': -79.39220822109901}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '537d4d6d498ec171ba22e7fe',
       'name': "Jimmy's Coffee",
       'location': {'address': '82 Gerrard Street W',
        'crossStreet': 'Gerrard & LaPlante',
        'lat': 43.65842123574496,
        'lng': -79.38561319551111,
        '

Now, let's borrow the get_category_type function from the Foursquare lab.

In [31]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now, let's clean the json and structure it into a *pandas* dataframe.

In [32]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()



Unnamed: 0,name,categories,lat,lng
0,Jimmy's Coffee,Coffee Shop,43.658421,-79.385613
1,Japango,Sushi Restaurant,43.655268,-79.385165
2,The Queen and Beaver Public House,Gastropub,43.657472,-79.383524
3,Chatime 日出茶太,Bubble Tea Shop,43.655542,-79.384684
4,The Elm Tree Restaurant,Modern European Restaurant,43.657397,-79.383761


Let's find out how many venues were returned by Foursquare.

In [33]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

59 venues were returned by Foursquare.


# Part 3.2 Explore Neighborhoods in Downtown Toronto

#### Let's borrow a function from the NYC neighborhoods lab to repeat the same process with all the neighborhoods in Downtown Toronto. 

In [34]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
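One fragile spot in a function like this is indexing `categories[0]`, which raises an `IndexError` if a venue comes back with an empty category list. The sketch below shows a defensive lookup; `first_category_name` is a hypothetical helper and the sample venues are made up to mirror the shape of the response shown earlier.

```python
def first_category_name(venue, default=None):
    """Return the name of the venue's first category, or a default if it has none."""
    categories = venue.get("categories") or []
    return categories[0]["name"] if categories else default

# Hypothetical venues shaped like the 'venue' entries in the explore response.
venue_with_category = {"name": "Jimmy's Coffee", "categories": [{"name": "Coffee Shop"}]}
venue_without_category = {"name": "Unnamed Spot", "categories": []}

print(first_category_name(venue_with_category))
print(first_category_name(venue_without_category, default="Unknown"))
```

Inside the list comprehension, this lookup could stand in for the direct `[0]['name']` access.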

#### Now, let's run the above function on each neighborhood in the *pcodes_closest* dataframe and create a new dataframe called *cenbayst_venues*.

In [35]:
cenbayst_venues = getNearbyVenues(names=pcodes_closest['Neighborhood'],
                                  latitudes=pcodes_closest['Latitude'],
                                  longitudes=pcodes_closest['Longitude']
                                  )

print(cenbayst_venues.shape)
cenbayst_venues.head()

Central Bay Street
Garden District, Ryerson
Richmond, Adelaide, King
Commerce Court, Victoria Hotel
First Canadian Place, Underground city
Queen's Park, Ontario Provincial Government
Kensington Market, Chinatown, Grange Park
St. James Town
Church and Wellesley
Toronto Dominion Centre, Design Exchange
(807, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Central Bay Street,43.6564,-79.386,Jimmy's Coffee,43.658421,-79.385613,Coffee Shop
1,Central Bay Street,43.6564,-79.386,Japango,43.655268,-79.385165,Sushi Restaurant
2,Central Bay Street,43.6564,-79.386,The Queen and Beaver Public House,43.657472,-79.383524,Gastropub
3,Central Bay Street,43.6564,-79.386,Chatime 日出茶太,43.655542,-79.384684,Bubble Tea Shop
4,Central Bay Street,43.6564,-79.386,The Elm Tree Restaurant,43.657397,-79.383761,Modern European Restaurant


Now, let's check how many venues were returned for each neighborhood.

In [36]:
cenbayst_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Central Bay Street,59,59,59,59,59,59
Church and Wellesley,79,79,79,79,79,79
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
"First Canadian Place, Underground city",100,100,100,100,100,100
"Garden District, Ryerson",100,100,100,100,100,100
"Kensington Market, Chinatown, Grange Park",54,54,54,54,54,54
"Queen's Park, Ontario Provincial Government",28,28,28,28,28,28
"Richmond, Adelaide, King",100,100,100,100,100,100
St. James Town,87,87,87,87,87,87
"Toronto Dominion Centre, Design Exchange",100,100,100,100,100,100


Now, let's find out how many unique categories are associated with all of these venues. 

In [37]:
print('There are {} unique categories.'.format(len(cenbayst_venues['Venue Category'].unique())))

There are 156 unique categories.


In [38]:
cenbayst_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Central Bay Street,43.6564,-79.386,Jimmy's Coffee,43.658421,-79.385613,Coffee Shop
1,Central Bay Street,43.6564,-79.386,Japango,43.655268,-79.385165,Sushi Restaurant
2,Central Bay Street,43.6564,-79.386,The Queen and Beaver Public House,43.657472,-79.383524,Gastropub
3,Central Bay Street,43.6564,-79.386,Chatime 日出茶太,43.655542,-79.384684,Bubble Tea Shop
4,Central Bay Street,43.6564,-79.386,The Elm Tree Restaurant,43.657397,-79.383761,Modern European Restaurant


# Part 3.3 Analyze Each Neighborhood

In [39]:
# one hot encoding
cenbayst_onehot = pd.get_dummies(cenbayst_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column to the dataframe
cenbayst_onehot['Neighborhood'] = cenbayst_venues['Neighborhood']

# Determine the position of this column in the dataframe 
print(cenbayst_onehot.columns.get_loc('Neighborhood'))
print(cenbayst_onehot.shape)

106
(807, 156)


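Now, let's move the Neighborhood column to the first position. Note that `get_loc` reports position 106 rather than the last position, and the column count stays at 156: one of the Foursquare venue categories is itself named 'Neighborhood', so the assignment above overwrote that one-hot column instead of appending a new one.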

In [40]:
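# move neighborhood column to the first position, selecting it by name rather than by hard-coded index
# (the slices [:105] and [107:] used when the outputs below were recorded also dropped the category at position 105, hence 155 columns there)
fixed_columns = ['Neighborhood'] + [col for col in cenbayst_onehot.columns if col != 'Neighborhood']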
cenbayst_onehot = cenbayst_onehot[fixed_columns]
cenbayst_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bakery,Bank,Bar,...,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,Central Bay Street,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Central Bay Street,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Central Bay Street,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Central Bay Street,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Central Bay Street,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


#### Now, let's figure out the most common venue categories by grouping the rows by neighborhood and taking the mean of the frequency of each category.

In [41]:
cenbayst_grouped = cenbayst_onehot.groupby('Neighborhood').mean().reset_index()
cenbayst_grouped

Unnamed: 0,Neighborhood,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bakery,Bank,Bar,...,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,Central Bay Street,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.016949,0.016949,0.0
1,Church and Wellesley,0.012658,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012658,...,0.012658,0.012658,0.012658,0.012658,0.0,0.0,0.0,0.012658,0.0,0.025316
2,"Commerce Court, Victoria Hotel",0.03,0.01,0.0,0.0,0.03,0.0,0.01,0.0,0.02,...,0.01,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0
3,"First Canadian Place, Underground city",0.03,0.01,0.0,0.0,0.03,0.0,0.01,0.0,0.02,...,0.01,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0
4,"Garden District, Ryerson",0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.01,...,0.01,0.01,0.02,0.0,0.0,0.0,0.01,0.01,0.01,0.0
5,"Kensington Market, Chinatown, Grange Park",0.0,0.018519,0.0,0.037037,0.0,0.0,0.037037,0.0,0.018519,...,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.037037,0.018519,0.0
6,"Queen's Park, Ontario Provincial Government",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Richmond, Adelaide, King",0.03,0.01,0.0,0.0,0.03,0.0,0.01,0.0,0.02,...,0.01,0.03,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0
8,St. James Town,0.034483,0.011494,0.0,0.0,0.011494,0.011494,0.022989,0.0,0.011494,...,0.0,0.011494,0.011494,0.0,0.0,0.011494,0.0,0.0,0.011494,0.0
9,"Toronto Dominion Centre, Design Exchange",0.03,0.01,0.0,0.0,0.02,0.0,0.01,0.01,0.01,...,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.0
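To see why the mean of one-hot columns can be read as a frequency, the sketch below repeats the same steps on a tiny hypothetical table. The neighborhood labels "A" and "B" and the venue rows are made up for illustration. Using `DataFrame.insert` is an assumption of this sketch rather than the notebook's approach; it raises an error if a column with the same name already exists, which guards against a category that is also called 'Neighborhood'.

```python
import pandas as pd

# Hypothetical venues for two made-up neighborhoods.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Café", "Hotel", "Coffee Shop"],
})

# One column per category, with 1.0 where the venue belongs to it.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)

# insert() fails loudly if 'Neighborhood' is already a column name.
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of a 0/1 column within a group is the share of venues in that category.
grouped = onehot.groupby("Neighborhood").mean()
print(grouped)
```

Each row of the grouped table sums to 1, since every venue contributes to exactly one category.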


#### Let's confirm the size of this new grouped dataframe.

In [42]:
cenbayst_grouped.shape

(10, 155)

#### Let's print the top 5 most common venues for each neighborhood

In [43]:
num_top_venues = 5

for hood in cenbayst_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = cenbayst_grouped[cenbayst_grouped['Neighborhood'] == hood].T.reset_index() # .T transposes the one-row dataframe so that categories become rows
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Central Bay Street----
                       venue  freq
0                Coffee Shop  0.22
1                       Café  0.07
2  Middle Eastern Restaurant  0.05
3         Italian Restaurant  0.03
4             Sandwich Place  0.03


----Church and Wellesley----
                  venue  freq
0           Coffee Shop  0.06
1   Japanese Restaurant  0.06
2      Sushi Restaurant  0.05
3  Fast Food Restaurant  0.04
4            Restaurant  0.04


----Commerce Court, Victoria Hotel----
         venue  freq
0  Coffee Shop  0.10
1        Hotel  0.07
2         Café  0.06
3   Restaurant  0.05
4          Gym  0.04


----First Canadian Place, Underground city----
         venue  freq
0  Coffee Shop  0.10
1        Hotel  0.07
2         Café  0.06
3   Restaurant  0.05
4          Gym  0.04


----Garden District, Ryerson----
                       venue  freq
0                Coffee Shop  0.11
1             Clothing Store  0.07
2                       Café  0.04
3  Middle Eastern Restaurant  0.03


#### Now, let's put these results into a dataframe.

First, let's write a function to sort the venues in descending order.

In [44]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
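
As a quick check of how this helper behaves, here is a minimal sketch that applies it to a single row. The `toy` frame and its frequencies are made up for illustration and are not part of the Foursquare data; the function body is repeated so the snippet runs on its own.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name), then rank the remaining frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# hypothetical one-row frame with made-up, tie-free frequencies
toy = pd.DataFrame(
    [["Toy Neighborhood", 0.05, 0.22, 0.03, 0.07]],
    columns=["Neighborhood", "Middle Eastern Restaurant", "Coffee Shop", "Pizza Place", "Café"],
)

top3 = list(return_most_common_venues(toy.iloc[0], 3))
print(top3)
```

The function returns venue names (the column labels), not the frequencies themselves. When two categories share the same frequency, their relative order depends on the sort and should not be read as a ranking.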

Now, let's create the new dataframe and display the top 10 venues for each neighborhood.

In [45]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # only 'st', 'nd', 'rd' are listed; every later rank uses 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = cenbayst_grouped['Neighborhood']

for ind in np.arange(cenbayst_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(cenbayst_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central Bay Street,Coffee Shop,Café,Middle Eastern Restaurant,Bubble Tea Shop,Italian Restaurant,Pizza Place,Sandwich Place,Discount Store,Donut Shop,Comic Shop
1,Church and Wellesley,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Restaurant,Gay Bar,Fast Food Restaurant,Café,Grocery Store,Hotel,Yoga Studio
2,"Commerce Court, Victoria Hotel",Coffee Shop,Hotel,Café,Restaurant,Gym,Japanese Restaurant,Steakhouse,Salad Place,Seafood Restaurant,American Restaurant
3,"First Canadian Place, Underground city",Coffee Shop,Hotel,Café,Restaurant,Gym,Japanese Restaurant,Steakhouse,Salad Place,Seafood Restaurant,American Restaurant
4,"Garden District, Ryerson",Coffee Shop,Clothing Store,Café,Cosmetics Shop,Hotel,Middle Eastern Restaurant,Japanese Restaurant,Bookstore,Lingerie Store,Diner


#### Before running cluster analysis, let's insert the distances of each neighborhood to Central Bay St.
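
One way such distances could be computed is the haversine (great-circle) formula. The sketch below is an assumption about the method, not necessarily how the `Distances (Km) from Central Bay Street` column was produced; `haversine_km` and `EARTH_RADIUS_KM` are hypothetical names, and the coordinates are the rounded values shown in the tables of this notebook.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# rounded coordinates as displayed in the notebook's tables
central_bay = (43.6564, -79.3860)
garden_district = (43.6572, -79.3783)

d = haversine_km(*central_bay, *garden_district)
print(round(d, 2))
```

Because the displayed coordinates are rounded to four decimals, the result only approximates the value stored in the dataframe.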

# Part 3.4 Cluster Neighborhoods

Now, let's run k-means to group the neighborhoods into 5 clusters.

In [46]:
# set number of clusters
kclusters = 5

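# keep only the numeric venue-frequency columns (a positional axis argument no longer works in pandas 2.x)
cenbayst_grouped_clustering = cenbayst_grouped.drop(columns='Neighborhood')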

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(cenbayst_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 4, 1, 1, 0, 3, 2, 1, 1, 1])
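
With only 10 neighborhoods and 5 clusters, it is worth checking how many members each cluster has before interpreting them. A minimal sketch, with the labels copied by hand from the `kmeans.labels_` output above:

```python
from collections import Counter

# labels copied from the kmeans.labels_ output above
labels = [0, 4, 1, 1, 0, 3, 2, 1, 1, 1]

cluster_sizes = Counter(labels)
for cluster, size in sorted(cluster_sizes.items()):
    print(f"cluster {cluster}: {size} neighborhood(s)")
```

Clusters 2, 3, and 4 each contain a single neighborhood, so they describe individual outliers rather than groups.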

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [47]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
cenbayst_merged = downTor_df

# merge downTor_df with neighborhoods_venues_sorted to add the cluster label and top venues for each neighborhood
cenbayst_merged = cenbayst_merged.merge(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
cenbayst_merged.sort_values(by='Distances (Km) from Central Bay Street', inplace=True)
cenbayst_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Distances (Km) from Central Bay Street,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,M5G,Downtown Toronto,Central Bay Street,43.6564,-79.386,0.0,0,Coffee Shop,Café,Middle Eastern Restaurant,Bubble Tea Shop,Italian Restaurant,Pizza Place,Sandwich Place,Discount Store,Donut Shop,Comic Shop
1,M5B,Downtown Toronto,"Garden District, Ryerson",43.6572,-79.3783,0.625807,0,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Hotel,Middle Eastern Restaurant,Japanese Restaurant,Bookstore,Lingerie Store,Diner
4,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.6496,-79.3833,0.786711,1,Café,Coffee Shop,Gym,Restaurant,Hotel,Salad Place,Steakhouse,Sushi Restaurant,American Restaurant,Asian Restaurant
6,M5L,Downtown Toronto,"Commerce Court, Victoria Hotel",43.6492,-79.3823,0.854155,1,Coffee Shop,Hotel,Café,Restaurant,Gym,Japanese Restaurant,Steakhouse,Salad Place,Seafood Restaurant,American Restaurant
8,M5X,Downtown Toronto,"First Canadian Place, Underground city",43.6492,-79.3823,0.854155,1,Coffee Shop,Hotel,Café,Restaurant,Gym,Japanese Restaurant,Steakhouse,Salad Place,Seafood Restaurant,American Restaurant


Now, let's visualize the resulting clusters.

In [48]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(cenbayst_merged['Latitude'], cenbayst_merged['Longitude'], cenbayst_merged['Neighborhood'], cenbayst_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # index the palette directly by the 0-based cluster label
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [49]:
cenbayst_merged

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Distances (Km) from Central Bay Street,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,M5G,Downtown Toronto,Central Bay Street,43.6564,-79.386,0.0,0,Coffee Shop,Café,Middle Eastern Restaurant,Bubble Tea Shop,Italian Restaurant,Pizza Place,Sandwich Place,Discount Store,Donut Shop,Comic Shop
1,M5B,Downtown Toronto,"Garden District, Ryerson",43.6572,-79.3783,0.625807,0,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Hotel,Middle Eastern Restaurant,Japanese Restaurant,Bookstore,Lingerie Store,Diner
4,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.6496,-79.3833,0.786711,1,Café,Coffee Shop,Gym,Restaurant,Hotel,Salad Place,Steakhouse,Sushi Restaurant,American Restaurant,Asian Restaurant
6,M5L,Downtown Toronto,"Commerce Court, Victoria Hotel",43.6492,-79.3823,0.854155,1,Coffee Shop,Hotel,Café,Restaurant,Gym,Japanese Restaurant,Steakhouse,Salad Place,Seafood Restaurant,American Restaurant
8,M5X,Downtown Toronto,"First Canadian Place, Underground city",43.6492,-79.3823,0.854155,1,Coffee Shop,Hotel,Café,Restaurant,Gym,Japanese Restaurant,Steakhouse,Salad Place,Seafood Restaurant,American Restaurant
0,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6641,-79.3889,0.887415,2,Sushi Restaurant,Hobby Shop,Café,Ramen Restaurant,College Cafeteria,Portuguese Restaurant,College Theater,Creperie,Dance Studio,Nightclub
7,M5T,Downtown Toronto,"Kensington Market, Chinatown, Grange Park",43.6541,-79.3978,0.983161,3,Café,Mexican Restaurant,Vegetarian / Vegan Restaurant,Coffee Shop,Caribbean Restaurant,Gaming Cafe,Vietnamese Restaurant,Arts & Crafts Store,Grocery Store,Bakery
2,M5C,Downtown Toronto,St. James Town,43.6513,-79.3756,1.010777,1,Coffee Shop,Seafood Restaurant,Café,American Restaurant,Italian Restaurant,Cocktail Bar,Gastropub,Clothing Store,Farmers Market,Department Store
9,M4Y,Downtown Toronto,Church and Wellesley,43.6656,-79.383,1.051074,4,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Restaurant,Gay Bar,Fast Food Restaurant,Café,Grocery Store,Hotel,Yoga Studio
5,M5K,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.6469,-79.3823,1.097496,1,Coffee Shop,Hotel,Café,American Restaurant,Japanese Restaurant,Salad Place,Seafood Restaurant,Breakfast Spot,Italian Restaurant,Restaurant


# 3.5 Examine Clusters

Now, let's examine each cluster while keeping in mind the distance of each neighborhood in the cluster from Central Bay Street. 

In [56]:
cenbayst_merged.loc[cenbayst_merged['Cluster Labels'] == 0, cenbayst_merged.columns[[2] + list(range(5, cenbayst_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Distances (Km) from Central Bay Street,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Central Bay Street,0.0,0,Coffee Shop,Café,Middle Eastern Restaurant,Bubble Tea Shop,Italian Restaurant,Pizza Place,Sandwich Place,Discount Store,Donut Shop,Comic Shop
1,"Garden District, Ryerson",0.625807,0,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Hotel,Middle Eastern Restaurant,Japanese Restaurant,Bookstore,Lingerie Store,Diner


In [57]:
cenbayst_merged.loc[cenbayst_merged['Cluster Labels'] == 1, cenbayst_merged.columns[[2] + list(range(5, cenbayst_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Distances (Km) from Central Bay Street,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,"Richmond, Adelaide, King",0.786711,1,Café,Coffee Shop,Gym,Restaurant,Hotel,Salad Place,Steakhouse,Sushi Restaurant,American Restaurant,Asian Restaurant
6,"Commerce Court, Victoria Hotel",0.854155,1,Coffee Shop,Hotel,Café,Restaurant,Gym,Japanese Restaurant,Steakhouse,Salad Place,Seafood Restaurant,American Restaurant
8,"First Canadian Place, Underground city",0.854155,1,Coffee Shop,Hotel,Café,Restaurant,Gym,Japanese Restaurant,Steakhouse,Salad Place,Seafood Restaurant,American Restaurant
2,St. James Town,1.010777,1,Coffee Shop,Seafood Restaurant,Café,American Restaurant,Italian Restaurant,Cocktail Bar,Gastropub,Clothing Store,Farmers Market,Department Store
5,"Toronto Dominion Centre, Design Exchange",1.097496,1,Coffee Shop,Hotel,Café,American Restaurant,Japanese Restaurant,Salad Place,Seafood Restaurant,Breakfast Spot,Italian Restaurant,Restaurant


In [58]:
cenbayst_merged.loc[cenbayst_merged['Cluster Labels'] == 2, cenbayst_merged.columns[[2] + list(range(5, cenbayst_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Distances (Km) from Central Bay Street,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Queen's Park, Ontario Provincial Government",0.887415,2,Sushi Restaurant,Hobby Shop,Café,Ramen Restaurant,College Cafeteria,Portuguese Restaurant,College Theater,Creperie,Dance Studio,Nightclub


In [59]:
cenbayst_merged.loc[cenbayst_merged['Cluster Labels'] == 3, cenbayst_merged.columns[[2] + list(range(5, cenbayst_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Distances (Km) from Central Bay Street,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,"Kensington Market, Chinatown, Grange Park",0.983161,3,Café,Mexican Restaurant,Vegetarian / Vegan Restaurant,Coffee Shop,Caribbean Restaurant,Gaming Cafe,Vietnamese Restaurant,Arts & Crafts Store,Grocery Store,Bakery


In [60]:
cenbayst_merged.loc[cenbayst_merged['Cluster Labels'] == 4, cenbayst_merged.columns[[2] + list(range(5, cenbayst_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Distances (Km) from Central Bay Street,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Church and Wellesley,1.051074,4,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Restaurant,Gay Bar,Fast Food Restaurant,Café,Grocery Store,Hotel,Yoga Studio
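
The five cells above repeat the same selection with a different label each time. As a sketch of a more compact alternative, `groupby` can produce one table per cluster in a single pass. The `demo` frame is a small stand-in for `cenbayst_merged`, using a few values from the tables above; `cluster_tables` is a hypothetical name.

```python
import pandas as pd

# small stand-in for cenbayst_merged, using a few values from the tables above
demo = pd.DataFrame({
    "Neighborhood": ["Central Bay Street", "Garden District, Ryerson",
                     "St. James Town", "Church and Wellesley"],
    "Distances (Km) from Central Bay Street": [0.0, 0.625807, 1.010777, 1.051074],
    "Cluster Labels": [0, 0, 1, 4],
    "1st Most Common Venue": ["Coffee Shop"] * 4,
})

# one sub-table per cluster label, with the now-redundant label column dropped
cluster_tables = {
    label: group.drop(columns="Cluster Labels")
    for label, group in demo.groupby("Cluster Labels")
}

for label, table in cluster_tables.items():
    print(f"--- Cluster {label} ---")
    print(table.to_string(index=False))
```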


#### So, it appears that the neighborhoods in cluster 0 (Central Bay Street and Garden District, Ryerson) offer highly recommended, diverse cuisine such as Italian, Japanese, and Middle Eastern, as well as convenience food such as pizza, sandwiches, and donuts, so Heather can grab a meal on her way home after a late-night or overnight shift at the hospital. For precisely these reasons, however, rent may be high in these neighborhoods. In that case, the neighborhoods in cluster 1 may be a good alternative: they offer similarly diverse cuisine and lie roughly 0.8 to 1.1 km from the center of Central Bay Street, the neighborhood in which her hospital is located. The next step in helping her hunt for an apartment would be to add rent prices for each neighborhood to the analysis.