## Segmenting and Clustering Neighborhoods in Toronto | Part 1-3


### Table of contents:
<a href="#item1">__Part 1__ of this notebook will create a pandas dataframe with Toronto's postal codes, boroughs, and neighborhoods.</a><br>
<a href="#item2">__Part 2__ of this notebook will get the latitude and the longitude coordinates for each of Toronto's neighborhoods.</a><br>
<a href="#item3">__Part 3__ of this notebook will explore and cluster Toronto's neighborhoods.</a><br>

<a id='item1'></a>

### __Part 1:__ Create dataframe of Toronto's neighborhoods

First, I scrape the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
to obtain the table of postal codes and load it into a pandas dataframe.

Note: I could have scraped the table with BeautifulSoup (click __here__ to see the code), but found a much easier solution using the pandas read_html function (see next cell).

<!-- 
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
html = requests.get(url).text

soup = BeautifulSoup(html, 'html.parser')
#print(soup.prettify())
table = soup.find(lambda tag: tag.name=='table')
df = pd.read_html(str(table))[0]
df.rename(columns={'Postcode':'PostalCode', 'Neighbourhood':'Neighborhood'}, inplace=True)
df.head()
-->

In [1]:
# Get the table with pandas 'read_html' method
import pandas as pd

df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
df.rename(columns={'Postcode':'PostalCode', 'Neighbourhood':'Neighborhood'}, inplace=True)
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


### Now, I clean the dataframe as instructed.

__1)__ Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.

In [2]:
df.Borough.value_counts()  # 77 rows have a borough that is not assigned
print('Not assigned boroughs:',(df.Borough=='Not assigned').sum())

df.drop(df[df.Borough=='Not assigned'].index, inplace=True)
df.reset_index(drop=True, inplace=True)
df.head()

Not assigned boroughs: 77


Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights


__2)__ If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.

In [3]:
i_na = df[df.Neighborhood=='Not assigned'].index  # only observation #6 has a non-assigned Neighborhood

df.loc[i_na,'Neighborhood'] = df.loc[i_na,'Borough']
df.loc[i_na,:]

Unnamed: 0,PostalCode,Borough,Neighborhood
6,M7A,Queen's Park,Queen's Park
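Steps 1 and 2 can also be written without in-place mutation. The following is a minimal sketch on a small made-up frame (the rows are illustrative, not the scraped table); `raw`, `assigned`, and `missing` are hypothetical names that do not appear elsewhere in this notebook.

```python
import pandas as pd

# Toy rows that mimic the scraped table; the values are illustrative only
raw = pd.DataFrame({
    'PostalCode':   ['M1A', 'M3A', 'M7A'],
    'Borough':      ['Not assigned', 'North York', "Queen's Park"],
    'Neighborhood': ['Not assigned', 'Parkwoods', 'Not assigned'],
})

# Step 1: keep only rows with an assigned borough
assigned = raw[raw['Borough'] != 'Not assigned'].reset_index(drop=True)

# Step 2: where the neighborhood is not assigned, fall back to the borough
missing = assigned['Neighborhood'] == 'Not assigned'
assigned['Neighborhood'] = assigned['Neighborhood'].mask(missing, assigned['Borough'])

print(assigned)
```

Series.mask replaces the values where the condition is True, so only the unassigned neighborhoods are overwritten.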


Now, let's quickly summarize the number of unique boroughs and neighborhoods in the resulting dataframe.

In [4]:
print('The resulting dataframe has {} boroughs and {} neighborhoods.'.format(
        len(df['Borough'].unique()),len(df['Neighborhood'].unique())))

The resulting dataframe has 11 boroughs and 210 neighborhoods.


__3)__ More than one neighborhood can exist in one postal code area. Rows that share a postal code will be combined into one row, with the neighborhoods separated by commas.

In [5]:
# loop over unique postal codes and join all boroughs and neighborhoods for each postal code
# (rows are collected in a list because DataFrame.append was removed in pandas 2.0)
pcodes = df.PostalCode.unique()
rows = []

for pcode in pcodes:
    boroughs = ', '.join(df[df.PostalCode==pcode].Borough.unique())
    neighborhoods = ', '.join(df[df.PostalCode==pcode].Neighborhood.unique())
    rows.append({'PostalCode': pcode, 'Borough': boroughs, 'Neighborhood': neighborhoods})

# build the new dataframe in one step from the collected rows
df_Tor = pd.DataFrame(rows, columns=df.columns)
df_Tor.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Harbourfront, Regent Park"
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Queen's Park,Queen's Park
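The same combination can be expressed as a single groupby. This is a sketch on made-up rows, not the code that produced the table above; `rows` and `combined` are hypothetical names.

```python
import pandas as pd

# Toy rows; values are illustrative only
rows = pd.DataFrame({
    'PostalCode':   ['M5A', 'M5A', 'M6A'],
    'Borough':      ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighborhood': ['Harbourfront', 'Regent Park', 'Lawrence Heights'],
})

# sort=False keeps postal codes in order of first appearance
combined = (rows.groupby('PostalCode', sort=False)[['Borough', 'Neighborhood']]
                .agg(lambda s: ', '.join(s.unique()))
                .reset_index())

print(combined)
```

Joining only the unique values per column means a borough that repeats within a postal code is listed once.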


__4)__ In the last cell of your notebook, use the .shape attribute to print the number of rows of your dataframe.

In [6]:
df_Tor.shape

(103, 3)

<a id='item2'></a>

### __Part 2:__ Get coordinates of Toronto's neighborhoods

First, I tried getting the coordinates with geocoder, but that did not work because Google denied the request (click __here__ to see my attempt). Hence, I used the CSV file to get the coordinates.

<!-- 
#!conda install -c conda-forge geocoder --yes  # uncomment this line if you haven't installed geocoder yet
import geocoder # import geocoder

# initialize your variable to None
lat_lng_coords = None
postal_code = 'M5G'
i = 0  # to make sure the test loop ends at some point in case no result can be obtained

# loop until you get the coordinates
while(lat_lng_coords is None) and (i<=20):
    g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
    lat_lng_coords = g.latlng
    print(i,':',g)
    i=i+1

if (lat_lng_coords != None):
    latitude = lat_lng_coords[0]
    longitude = lat_lng_coords[1]

    print(latitude)
    print(longitude)
-->

In [7]:
!wget -q -O 'Toronto_Lat_Lng.csv' http://cocl.us/Geospatial_data
print('Data downloaded!')

Data downloaded!


Now load the data into a pandas dataframe.

In [8]:
df_LL = pd.read_csv('Toronto_Lat_Lng.csv')
df_LL.rename(columns={'Postal Code':'PostalCode'}, inplace=True)
df_LL.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [9]:
# Before merging both data frames by PostalCode, check if they contain the same PostalCodes 
list(df_LL.PostalCode.sort_values()) == list(df_Tor.PostalCode.sort_values())

True
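If the comparison above had returned False, it would be useful to know which postal codes are missing on which side. One way to find out, sketched here on made-up frames (`left`, `right`, `check`, and `unmatched` are hypothetical names), is an outer merge with pandas' indicator option.

```python
import pandas as pd

# Toy frames; the postal codes and coordinates are illustrative only
left = pd.DataFrame({'PostalCode': ['M3A', 'M4A', 'M9Z'],
                     'Borough': ['North York', 'North York', 'Etobicoke']})
right = pd.DataFrame({'PostalCode': ['M3A', 'M4A', 'M1B'],
                      'Latitude': [43.75, 43.73, 43.81],
                      'Longitude': [-79.33, -79.32, -79.19]})

# An outer merge with indicator=True adds a '_merge' column that says
# whether each key was found in the left frame, the right frame, or both
check = pd.merge(left, right, on='PostalCode', how='outer', indicator=True)
unmatched = check.loc[check['_merge'] != 'both', ['PostalCode', '_merge']]

print(unmatched)
```

An empty `unmatched` frame would mean the two tables cover exactly the same postal codes.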

In [10]:
# Merge both data frames by PostalCode
df_TorLL = pd.merge(df_Tor, df_LL, on='PostalCode')
df_TorLL.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494


<a id='item3'></a>

### __Part 3:__ Explore and cluster Toronto's neighborhoods

In this part, I will use the Foursquare API to explore Toronto's neighborhoods. I will use the _explore_ function to get the most common venue categories in each neighborhood, and then use this feature to group the neighborhoods into clusters with the *k*-means clustering algorithm. Finally, I will use the _Folium_ library to visualize the neighborhoods in Toronto and their emerging clusters.

First, import all the libraries that we will need and that have not been imported yet.

In [11]:
import numpy as np # library to handle data in a vectorized manner
import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't installed geopy yet
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't installed folium yet
import folium # map rendering library

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

print('Libraries imported.')

Libraries imported.


Now we will use the geopy library to get the latitude and longitude values of Toronto. In order to define an instance of the geocoder, we need to define a user_agent, which we name <em>toronto_explorer</em>.

In [12]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


Now we create a map of Toronto with neighborhoods superimposed on top.

In [13]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers for each neighborhood to map
for lat, lng, borough, neighborhood, pcode in zip(df_TorLL['Latitude'], df_TorLL['Longitude'], df_TorLL['Borough'], 
                                                  df_TorLL['Neighborhood'], df_TorLL['PostalCode']):
    label = '{} ({}, {})'.format(neighborhood, borough, pcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

Wow, that is a lot of markers. Let's now look only at the neighborhoods that are closest to Toronto's central coordinates. First, we make a new dataframe <b>df_Central</b> that includes only the central neighborhoods.

In [14]:
lat_min = latitude - 0.1
lat_max = latitude + 0.1
lon_min = longitude - 0.1
lon_max = longitude + 0.1
print(lat_min, lat_max, lon_min, lon_max)

# keep only the rows whose coordinates fall inside the bounding box
# (between() is inclusive at both ends, so points on the edge are kept)
in_box = (df_TorLL.Latitude.between(lat_min, lat_max) &
          df_TorLL.Longitude.between(lon_min, lon_max))
df_Central = df_TorLL[in_box].copy()

df_Central.reset_index(drop=True, inplace=True)
print(df_Central.shape)
df_Central.head()

43.553962999999996 43.753963 -79.487207 -79.28720700000001
(57, 5)


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494
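To get a feeling for how large a box of ±0.1 degrees is, the offsets can be converted to kilometers. This is a rough sketch that assumes a spherical Earth with a mean radius of 6371 km; `degrees_to_km` is a hypothetical helper, not part of any library used here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation

def degrees_to_km(delta_deg, latitude_deg):
    """Return (north-south km, east-west km) spanned by delta_deg at a latitude."""
    km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360
    north_south = delta_deg * km_per_degree
    # meridians converge toward the poles, so longitude degrees shrink by cos(latitude)
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west

ns_km, ew_km = degrees_to_km(0.1, 43.653963)
print('0.1 deg latitude  ~ {:.1f} km'.format(ns_km))
print('0.1 deg longitude ~ {:.1f} km'.format(ew_km))
```

Under these assumptions the box is roughly 22 km from north to south and about 16 km from east to west, so it is not square on the ground.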


Now we create another map of Toronto, this time with markers only for these central neighborhoods.

In [15]:
map_central = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers for the central neighborhoods to map
for lat, lng, borough, neighborhood, pcode in zip(df_Central['Latitude'], df_Central['Longitude'], df_Central['Borough'], 
                                                  df_Central['Neighborhood'], df_Central['PostalCode']):
    label = '{} ({}, {})'.format(neighborhood, borough, pcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#ffd000',
        fill_opacity=0.7,
        parse_html=False).add_to(map_central)  
    
map_central

By exploring the map and clicking on the markers, we can see that the postal codes with the numbers 4, 5, and 6 are the most central areas. Hence, we will focus our following exploration on these central areas. Let's subset the dataframe accordingly.
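Such a subset could be written as in the sketch below. It is an assumption about how the filter might look, shown on made-up rows; `toy`, `central_digits`, and `toy_central` are hypothetical names.

```python
import pandas as pd

# Toy frame; only the PostalCode format (letter, digit, letter) matters here
toy = pd.DataFrame({'PostalCode': ['M3A', 'M4A', 'M5A', 'M6A', 'M9V'],
                    'Borough': ['North York', 'North York', 'Downtown Toronto',
                                'North York', 'Etobicoke']})

# The second character of the postal code is the digit we filter on
central_digits = ['4', '5', '6']
toy_central = toy[toy['PostalCode'].str[1].isin(central_digits)].reset_index(drop=True)

print(toy_central)
```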

### Explore Toronto's neighborhoods with Foursquare
In the following, we are going to use the Foursquare API to explore Toronto's neighborhoods and to cluster them.

First, I define my Foursquare credentials and version.

In [19]:
CLIENT_ID = 'AALGIC4UUYADR440BQ1QKMYAV3GG3P4HGEUEILXNCUORHUOR' # my Foursquare ID
CLIENT_SECRET = '0VL1YHW1YTJWWHOB1NSXYYBSZV50XEHHZRPTYPGZRVCA1VXQ' # my Foursquare Secret
VERSION = '20190302'  # Foursquare API version

print('My credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: XXXX will keep this secret XXXX')  # the secret itself is not printed

My credentials:
CLIENT_ID: AALGIC4UUYADR440BQ1QKMYAV3GG3P4HGEUEILXNCUORHUOR
CLIENT_SECRET: XXXX will keep this secret XXXX


Then, let's define a function that will output the top venues in each neighborhood.

In [20]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    print('Wait...')
    venues_list=[]
    
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
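The function above indexes straight into the JSON response, so a response without groups or a venue without a category would raise an error. A more defensive way to flatten one response is sketched here. `extract_venues` is a hypothetical helper, and the payload shape is inferred only from the keys the function above uses, not from the Foursquare documentation.

```python
def extract_venues(name, lat, lng, payload):
    """Flatten one explore-style response into (neighborhood, ..., category) rows."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []

    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # fall back to None when a venue carries no category
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# Hand-written payload in the shape the function above indexes into
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Brookbanks Park',
               'location': {'lat': 43.751976, 'lng': -79.33214},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Uncategorized Spot',
               'location': {'lat': 43.75, 'lng': -79.33},
               'categories': []}},
]}]}}

print(extract_venues('Parkwoods', 43.753259, -79.329656, sample))
print(extract_venues('Nowhere', 0.0, 0.0, {'response': {}}))
```

The second call returns an empty list instead of failing, which keeps a loop over many neighborhoods running when one request comes back empty.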

Now we will run the above function on each central neighborhood and create a new dataframe called toronto_venues.

In [21]:
toronto_venues = getNearbyVenues(names=df_Central['Neighborhood'],
                                 latitudes=df_Central['Latitude'],
                                 longitudes=df_Central['Longitude'])
print('Done!')

Wait...
Done!


Let's check the size and first 5 rows of the resulting dataframe.

In [22]:
print(toronto_venues.shape)
toronto_venues.head()

(1909, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,KFC,43.754387,-79.333021,Fast Food Restaurant
2,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
3,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop


Let's also check how many unique venue categories were returned.

In [23]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 251 unique categories.


#### Build a dataframe with each neighborhood's mean frequencies of venue categories for clustering.

In [24]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

# group rows by neighborhood, taking the mean frequency of occurrence of each category
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
print(toronto_grouped.shape)
toronto_grouped.head()

(57, 251)


Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01
1,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Brockton, Exhibition Place, Parkdale Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Business Reply Mail Processing Centre 969 Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
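The mean of the one-hot columns per neighborhood is simply the share of that neighborhood's venues in each category. The sketch below checks this on made-up data by comparing the one-hot route with a row-normalized cross-tabulation; `venues`, `by_onehot`, and `by_crosstab` are hypothetical names.

```python
import pandas as pd

# Toy venue list; names and categories are illustrative only
venues = pd.DataFrame({
    'Neighborhood':   ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Park', 'Café', 'Café', 'Pub', 'Park', 'Park'],
})

# Route 1: one-hot encode, then average the indicator columns per neighborhood
onehot = pd.get_dummies(venues['Venue Category']).astype(float)
by_onehot = onehot.groupby(venues['Neighborhood']).mean()

# Route 2: a row-normalized cross-tabulation gives the same frequencies
by_crosstab = pd.crosstab(venues['Neighborhood'], venues['Venue Category'],
                          normalize='index')

print(by_crosstab)
```

Because the cross-tabulation keeps the neighborhood names in the index, they cannot collide with a venue category that happens to be called Neighborhood, should one occur in the data.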


Now let's create the new dataframe and display the top 10 venue categories for each neighborhood. To do so, we first define a function that sorts the venue categories in descending order of frequency.

In [25]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now we use this function to build a new dataframe with the top 10 venue categories for each neighborhood.

In [26]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no 'st'/'nd'/'rd' suffix left, so fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Steakhouse,Bar,Thai Restaurant,Burger Joint,Hotel,Restaurant,Cosmetics Shop,Gym
1,"Bedford Park, Lawrence Manor East",Coffee Shop,Pizza Place,Fast Food Restaurant,Italian Restaurant,Juice Bar,Sushi Restaurant,Pharmacy,Pub,Café,Butcher
2,Berczy Park,Coffee Shop,Cocktail Bar,Italian Restaurant,Farmers Market,Café,Seafood Restaurant,Cheese Shop,Bakery,Beer Bar,Steakhouse
3,"Brockton, Exhibition Place, Parkdale Village",Café,Coffee Shop,Breakfast Spot,Grocery Store,Climbing Gym,Restaurant,Bar,Stadium,Caribbean Restaurant,Burrito Place
4,Business Reply Mail Processing Centre 969 Eastern,Park,Garden Center,Auto Workshop,Smoke Shop,Burrito Place,Fast Food Restaurant,Farmers Market,Garden,Brewery,Comic Shop
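A compact alternative for picking the top categories of one row is Series.nlargest. The following sketch uses a made-up frequency row (`freqs` is a hypothetical name) and also shows how to leave out categories that never occur.

```python
import pandas as pd

# Toy frequency row; category names and values are illustrative only
freqs = pd.Series({'Coffee Shop': 0.40, 'Park': 0.10, 'Café': 0.30,
                   'Pub': 0.0, 'Bakery': 0.20})

# nlargest returns the n biggest values in descending order
top3 = freqs.nlargest(3).index.tolist()
print(top3)  # ['Coffee Shop', 'Café', 'Bakery']

# dropping zero frequencies first avoids listing categories that never occur
nonzero_top = freqs[freqs > 0].nlargest(10).index.tolist()
print(nonzero_top)  # ['Coffee Shop', 'Café', 'Bakery', 'Park']
```

This matters for neighborhoods with only a few venues. Parkwoods, for example, has just three venues in the venue table shown earlier, so the later entries of its top 10 list appear to be categories with a frequency of zero.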


### Finally: Clustering Toronto's neighborhoods

Run *k*-means to group the neighborhoods into 5 clusters.

In [27]:
# set number of clusters
kclusters = 5
toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([0, 1, 1, 0, 1, 1, 1, 1, 3, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 3,
       0, 1, 3, 0, 0, 1, 1, 0, 4, 1, 1, 1, 0, 1, 4, 1, 1, 3, 1, 3, 2, 0,
       1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 3], dtype=int32)
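Before interpreting the clusters, it helps to know how many neighborhoods each label received. This small sketch counts the labels printed above; the list is copied by hand from that output.

```python
from collections import Counter

# Labels copied from the k-means output above
labels = [0, 1, 1, 0, 1, 1, 1, 1, 3, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 3,
          0, 1, 3, 0, 0, 1, 1, 0, 4, 1, 1, 1, 0, 1, 4, 1, 1, 3, 1, 3, 2, 0,
          1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 3]

sizes = Counter(labels)
for cluster in sorted(sizes):
    print('label {}: {} neighborhoods'.format(cluster, sizes[cluster]))
```

With these labels, more than half of the 57 neighborhoods fall under label 1, while label 2 holds a single neighborhood.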

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [28]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# join df_Central with neighborhoods_venues_sorted to add the cluster label and top venues to each neighborhood's coordinates
toronto_merged = df_Central.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
toronto_merged.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,3,Fast Food Restaurant,Park,Food & Drink Shop,Women's Store,Department Store,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant
1,M4A,North York,Victoria Village,43.725882,-79.315572,1,Pizza Place,Hockey Arena,Intersection,Coffee Shop,Portuguese Restaurant,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar
2,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636,1,Coffee Shop,Café,Pub,Bakery,Park,Gym / Fitness Center,Theater,Mexican Restaurant,Breakfast Spot,Restaurant
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763,1,Clothing Store,Accessories Store,Miscellaneous Shop,Event Space,Vietnamese Restaurant,Coffee Shop,Furniture / Home Store,Boutique,Women's Store,Dive Bar
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494,1,Coffee Shop,Burger Joint,Japanese Restaurant,Gym,Diner,Restaurant,Fast Food Restaurant,Italian Restaurant,Smoothie Shop,Seafood Restaurant


### Let's visualize the resulting clusters

In [29]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' (Cluster ' + str(cluster) + ')', parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Finally, examine clusters

Now we can examine each cluster and determine the venue categories that distinguish it. Based on these defining categories, we try to assign a name to each cluster (despite the remaining diversity of venue categories within each cluster).

#### __Cluster 1:__ Coffee Shops, Cafés & Restaurants

In [30]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,North York,0,Gym / Fitness Center,Café,Japanese Restaurant,Basketball Court,Caribbean Restaurant,Dog Run,Diner,Discount Store,Dive Bar,Doner Restaurant
8,North York,0,Pizza Place,Japanese Restaurant,Sushi Restaurant,Pub,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Deli / Bodega
18,Downtown Toronto,0,Café,Grocery Store,Park,Baby Store,Coffee Shop,Italian Restaurant,Convenience Store,Restaurant,Nightclub,Diner
20,Downtown Toronto,0,Coffee Shop,Café,Steakhouse,Bar,Thai Restaurant,Burger Joint,Hotel,Restaurant,Cosmetics Shop,Gym
21,West Toronto,0,Pharmacy,Supermarket,Discount Store,Bakery,Liquor Store,Middle Eastern Restaurant,Music Venue,Park,Pool,Café
24,West Toronto,0,Bar,Coffee Shop,Asian Restaurant,Restaurant,Men's Store,Cocktail Bar,Pizza Place,Café,Bakery,Vietnamese Restaurant
27,Downtown Toronto,0,Coffee Shop,Hotel,Café,American Restaurant,Restaurant,Gastropub,Deli / Bodega,Bar,Italian Restaurant,Concert Hall
28,West Toronto,0,Café,Coffee Shop,Breakfast Spot,Grocery Store,Climbing Gym,Restaurant,Bar,Stadium,Caribbean Restaurant,Burrito Place
30,Downtown Toronto,0,Coffee Shop,Café,Hotel,Restaurant,American Restaurant,Gym,Deli / Bodega,Italian Restaurant,Steakhouse,Bakery
31,East Toronto,0,Café,Coffee Shop,American Restaurant,Bakery,Italian Restaurant,Gastropub,Gym / Fitness Center,Diner,Park,New American Restaurant


#### __Cluster 2:__ Cafés & Pizza Places  
_Those would be my favorite neighborhoods :)_

In [31]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,1,Pizza Place,Hockey Arena,Intersection,Coffee Shop,Portuguese Restaurant,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar
2,Downtown Toronto,1,Coffee Shop,Café,Pub,Bakery,Park,Gym / Fitness Center,Theater,Mexican Restaurant,Breakfast Spot,Restaurant
3,North York,1,Clothing Store,Accessories Store,Miscellaneous Shop,Event Space,Vietnamese Restaurant,Coffee Shop,Furniture / Home Store,Boutique,Women's Store,Dive Bar
4,Queen's Park,1,Coffee Shop,Burger Joint,Japanese Restaurant,Gym,Diner,Restaurant,Fast Food Restaurant,Italian Restaurant,Smoothie Shop,Seafood Restaurant
6,East York,1,Pizza Place,Fast Food Restaurant,Athletics & Sports,Gastropub,Intersection,Pet Store,Pharmacy,Rock Climbing Spot,Breakfast Spot,Bank
7,Downtown Toronto,1,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Juice Bar,Theater,Bubble Tea Shop,Pizza Place,Plaza,Fast Food Restaurant
9,North York,1,Grocery Store,Gym,Coffee Shop,Beer Store,Asian Restaurant,Chinese Restaurant,Restaurant,Italian Restaurant,Japanese Restaurant,Discount Store
10,East York,1,Beer Store,Cosmetics Shop,Skating Rink,Bus Stop,Asian Restaurant,Dance Studio,Curling Ice,Park,Pharmacy,Video Store
11,Downtown Toronto,1,Coffee Shop,Restaurant,Hotel,Café,Breakfast Spot,Italian Restaurant,Park,Cocktail Bar,Cosmetics Shop,Clothing Store
13,East Toronto,1,Coffee Shop,Health Food Store,Pub,Women's Store,Dive Bar,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Doner Restaurant


#### __Cluster 3:__ Gardens

In [32]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
35,Central Toronto,2,Garden,Women's Store,Deli / Bodega,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run


#### __Cluster 4:__ Fast Food, Parks & Women's Stores

In [33]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,3,Fast Food Restaurant,Park,Food & Drink Shop,Women's Store,Department Store,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant
15,York,3,Park,Women's Store,Fast Food Restaurant,Market,Pharmacy,German Restaurant,Curling Ice,Gluten-free Restaurant,Donut Shop,Doner Restaurant
22,East York,3,Park,Coffee Shop,Convenience Store,Women's Store,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar
36,North York,3,Park,Bank,Women's Store,Department Store,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant
38,Central Toronto,3,Trail,Park,Sushi Restaurant,Jewelry Store,Women's Store,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Doner Restaurant
51,Downtown Toronto,3,Park,Trail,Playground,Women's Store,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run


#### __Cluster 5:__ Parks

In [34]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,York,4,Hockey Arena,Trail,Playground,Field,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar
47,Central Toronto,4,Gym,Playground,Trail,Restaurant,Colombian Restaurant,Deli / Bodega,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop


### __To conclude...__

Although I have (unfortunately) never been to Toronto, this notebook already gives me some ideas about the characteristics of certain neighborhoods in this city. I would love to go there someday and check if my cluster labels are appropriate :). <br>
#### _Thanks for completing this notebook! Karima_<br><br>