# Segmenting and Clustering Neighborhoods in Toronto: part 3

In this third part, we're going to finish the *Segmenting and Clustering Neighborhoods in Toronto* project by exploring the neighborhoods, collecting their venues, and clustering them.

The following libraries will be required:

In [1]:
import requests

import pandas as pd 
import numpy as np
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

from geopy.geocoders import Nominatim
from pandas import json_normalize
from sklearn.cluster import KMeans

## 1. Loading the dataset

Let's begin by opening the dataframe I updated in [part 2](https://github.com/anaflvss/Coursera-Capstone/blob/master/03_neighborhoods_toronto_part_2.ipynb). This [CSV file](https://github.com/anaflvss/Coursera-Capstone/blob/master/data/post_code_toronto_2.csv) was also included in the repository.

In [2]:
df_o = pd.read_csv('../Coursera-Capstone/data/post_code_toronto_2.csv')
df_o.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


Here, I decided to keep only the boroughs whose name contains the word **'Toronto'**: Central, Downtown, East and West Toronto.

In [3]:
# filtering and resetting the index in one step avoids modifying a slice of df_o in place
df = df_o[df_o['Borough'].isin(['Central Toronto', 'Downtown Toronto', 'East Toronto', 'West Toronto'])].reset_index(drop=True)
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
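
Since the goal is "every borough with 'Toronto' in its name", the same filter can also be written as a substring match instead of a hard-coded list. The sketch below uses a small made-up dataframe (not the real `df_o`), and whether it selects exactly the same rows on the full dataset depends on the borough names present there.

```python
import pandas as pd

# Toy stand-in for df_o; these rows are made up for illustration.
sample = pd.DataFrame({
    'Borough': ['Scarborough', 'East Toronto', 'Downtown Toronto', 'North York'],
    'Neighborhood': ['Woburn', 'The Beaches', 'Rosedale', 'Hillcrest Village'],
})

# Keep every borough whose name contains the word 'Toronto'.
# na=False treats missing borough names as non-matching.
mask = sample['Borough'].str.contains('Toronto', na=False)
toronto_only = sample[mask].reset_index(drop=True)

print(toronto_only)
```

The explicit `isin` list used above is stricter, which can be preferable if a borough name could contain 'Toronto' without belonging to the city core.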


Let's count how many neighborhoods there are in each borough.

In [4]:
df.groupby('Borough')['Neighborhood'].count()

Borough
Central Toronto      9
Downtown Toronto    19
East Toronto         5
West Toronto         6
Name: Neighborhood, dtype: int64

Next, I'm gonna get the coordinates of Toronto, explore it and visualize it using the **folium** library.

In [5]:
# specifying the address
address = 'Toronto, ON'

# creating a geolocator with the 'geopy.geocoders' library
geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(f'The geographical coordinates of Toronto are {latitude}, {longitude}.')

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [6]:
# creating a folium map
toronto_map = folium.Map(location=[latitude, longitude],
                        zoom_start=10)

# adding markers for the neighborhoods
for lat, long, bor, nbhd in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(nbhd, bor)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=2,
        popup=label,
        color='black',
        fill=True,
        fill_color='blue').add_to(toronto_map)
    
toronto_map

## 2. Testing Foursquare API

Next, I'm gonna reproduce the same analysis we did in the **New York clustering notebook** from Coursera.

* To begin, let's test the Foursquare API.

In [7]:
# Foursquare API credentials 
CLIENT_ID = '2M0X3KDXQBWI1ZOICSOXVJQRURFMSVJSTP0WYAGIUNFKAL4Z' 
CLIENT_SECRET = 'KDBT0H4RFHPQUJF1CH4MLKGM1BSZUQRHLAOQSJ0NSZTVUHVL' 
VERSION = '20200711' 

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 2M0X3KDXQBWI1ZOICSOXVJQRURFMSVJSTP0WYAGIUNFKAL4Z
CLIENT_SECRET: KDBT0H4RFHPQUJF1CH4MLKGM1BSZUQRHLAOQSJ0NSZTVUHVL


In [8]:
# I'm using the first neighborhood in the dataset: The Beaches

neighborhood_lat = df.loc[0, 'Latitude'] # neighborhood latitude value from the df
neighborhood_long = df.loc[0, 'Longitude'] # neighborhood longitude value from the df

neighborhood_name = df.loc[0, 'Neighborhood'] # neighborhood name from the df

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_lat, 
                                                               neighborhood_long))

Latitude and longitude values of The Beaches are 43.67635739999999, -79.2930312.


* Now, let's make a request to get the venues from this neighborhood.

In [9]:
# I create the url for the request using Foursquare API
LIMIT = 100
radius = 500

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_lat, 
    neighborhood_long, 
    radius, 
    LIMIT)

url

'https://api.foursquare.com/v2/venues/explore?&client_id=2M0X3KDXQBWI1ZOICSOXVJQRURFMSVJSTP0WYAGIUNFKAL4Z&client_secret=KDBT0H4RFHPQUJF1CH4MLKGM1BSZUQRHLAOQSJ0NSZTVUHVL&v=20200711&ll=43.67635739999999,-79.2930312&radius=500&limit=100'
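
As an alternative to formatting the query string by hand, the parameters can be kept in a dictionary and encoded automatically. This is only a sketch: the credentials are placeholders, and `base_url`, `params` and `url_from_params` are names introduced here for illustration.

```python
from urllib.parse import urlencode

# Placeholder credentials (assumption): replace with your own values.
params = {
    'client_id': 'YOUR_CLIENT_ID',
    'client_secret': 'YOUR_CLIENT_SECRET',
    'v': '20200711',
    'll': '43.676357,-79.293031',  # latitude,longitude of The Beaches
    'radius': 500,
    'limit': 100,
}

base_url = 'https://api.foursquare.com/v2/venues/explore'

# urlencode joins the key=value pairs with '&' and percent-encodes
# special characters (the comma in 'll' becomes %2C).
url_from_params = base_url + '?' + urlencode(params)
print(url_from_params)
```

With the **requests** library, the same dictionary can be passed directly as `requests.get(base_url, params=params)`, which performs the encoding for you.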

* Using the **requests** library, I make the request with the url created in the previous cell.

In [10]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f653d2fa1f69843e1db1b48'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'The Beaches',
  'headerFullLocation': 'The Beaches, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 4,
  'suggestedBounds': {'ne': {'lat': 43.680857404499996,
    'lng': -79.28682091449052},
   'sw': {'lat': 43.67185739549999, 'lng': -79.29924148550948}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4bd461bc77b29c74a07d9282',
       'name': 'Glen Manor Ravine',
       'location': {'address': 'Glen Manor',
        'crossStreet': 'Queen St.',
        'lat': 43.67682094413784,
        'lng': -79.29394208780985,
        'labeledLatLngs': [{'labe

* The request was successful! Now, I'm gonna extract the category of the venues, using a function learned in the **New York clustering notebook**.

In [11]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
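
To see what this function does, here is a small sketch that calls it on hand-made rows (not real API output). The function is repeated so the example runs on its own; plain dictionaries stand in for dataframe rows, since both raise `KeyError` on a missing key.

```python
def get_category_type(row):
    # Same logic as the cell above, repeated so this sketch is self-contained.
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']


# Hand-made rows covering the three cases the function handles.
plain_row = {'categories': [{'name': 'Trail'}, {'name': 'Park'}]}
flattened_row = {'venue.categories': [{'name': 'Pub'}]}
empty_row = {'venue.categories': []}

print(get_category_type(plain_row))      # first category of a plain row
print(get_category_type(flattened_row))  # first category of a flattened row
print(get_category_type(empty_row))      # no category at all
```

Only the first category of each venue is kept, so a venue tagged with several categories is counted under one of them.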

In [12]:
# getting the json file with venues from our request results
venues = results['response']['groups'][0]['items']

# flattening the json file
nearby_venues = json_normalize(venues) 

# filtering columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filtering the category for each row, using the function created in the previous cell
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# cleaning columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]


nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Glen Manor Ravine,Trail,43.676821,-79.293942
1,The Big Carrot Natural Food Market,Health Food Store,43.678879,-79.297734
2,Grover Pub and Grub,Pub,43.679181,-79.297215
3,Upper Beaches,Neighborhood,43.680563,-79.292869
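
The dotted column names such as `venue.location.lat` come from the flattening step: `json_normalize` turns each level of nesting into a dot-separated column. A minimal sketch with one hand-made item, trimmed to the fields used in this notebook:

```python
import pandas as pd

# One item shaped like an entry of results['response']['groups'][0]['items'],
# reduced to the fields used here.
items = [{
    'venue': {
        'name': 'Glen Manor Ravine',
        'categories': [{'name': 'Trail'}],
        'location': {'lat': 43.676821, 'lng': -79.293942},
    }
}]

flat = pd.json_normalize(items)

# Nested dictionaries become dotted columns; lists are left as they are.
print(sorted(flat.columns))
print(flat.loc[0, 'venue.categories'])
```

Because the `categories` list is not flattened, a helper like `get_category_type` is still needed to pull the category name out of it.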


In [13]:
# Let's check how many venues were returned.
print(f'{nearby_venues.shape[0]} venues were returned by Foursquare.')

4 venues were returned by Foursquare.


## 3. Applying Foursquare API to the whole dataset

Since the test was successful, I can now apply the Foursquare API to all the neighborhoods in the **df**.

* The next cell wraps the whole process from **section 2** in a function.

In [14]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # creating the URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # making the request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
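
One thing to keep in mind: the list comprehension above indexes `categories[0]` directly, so a venue with an empty category list would raise an `IndexError`. `get_category_type` in section 2 guards against that case, which suggests it can happen. Below is a sketch of a more defensive row builder; `venue_to_row` is a hypothetical helper, not part of the original notebook, and the item is made up.

```python
def venue_to_row(name, lat, lng, item):
    """Build one output row from a single 'items' entry (hypothetical helper)."""
    venue = item['venue']
    categories = venue.get('categories', [])
    # Fall back to None when the venue comes back without any category.
    category = categories[0]['name'] if categories else None
    return (
        name,
        lat,
        lng,
        venue['name'],
        venue['location']['lat'],
        venue['location']['lng'],
        category,
    )


# Made-up item without categories, to exercise the fallback.
item_without_category = {
    'venue': {
        'name': 'Some Venue',
        'categories': [],
        'location': {'lat': 43.0, 'lng': -79.0},
    }
}

row = venue_to_row('The Beaches', 43.676357, -79.293031, item_without_category)
print(row)
```

Inside `getNearbyVenues`, the comprehension could then become `[venue_to_row(name, lat, lng, v) for v in results]`.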

* So here, we can apply the function to all the neighborhoods from the df to get the venues from each one of them.

In [15]:
# The result will be a dataframe with the neighborhoods venues 
toronto_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude'])

The Beaches
The Danforth West, Riverdale
India Bazaar, The Beaches West
Studio District
Lawrence Park
Davisville North
North Toronto West,  Lawrence Park
Davisville
Moore Park, Summerhill East
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
Rosedale
St. James Town, Cabbagetown
Church and Wellesley
Regent Park, Harbourfront
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
Roselawn
Forest Hill North & West, Forest Hill Road Park
The Annex, North Midtown, Yorkville
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Stn A PO Boxes
First Canadian Place, Underground city
Christie
Dufferin, Dovercourt Village
Little Portugal, Trinity
Brockton, Parkdale Village, Exhibition Place
High

* Now, I can check how many rows and columns this dataframe has, and take a look at its structure.

In [16]:
print(toronto_venues.shape)
toronto_venues.head()

(1634, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
3,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
4,"The Danforth West, Riverdale",43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant


* Let's also check how many venues there are in each neighborhood.

In [17]:
toronto_venues.groupby('Neighborhood')['Venue'].count()

Neighborhood
Berczy Park                                                                                                    55
Brockton, Parkdale Village, Exhibition Place                                                                   23
Business reply mail Processing Centre, South Central Letter Processing Plant Toronto                           16
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport     16
Central Bay Street                                                                                             71
Christie                                                                                                       16
Church and Wellesley                                                                                           77
Commerce Court, Victoria Hotel                                                                                100
Davisville                                                                 

In [18]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 238 unique categories.


* I'm gonna one-hot encode the venue categories (dummy variables), so we can compute the frequency of each kind of venue and find the most common ones later on.

In [19]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Theater,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [20]:
toronto_onehot.shape

(1634, 238)
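
A detail worth noticing: the header above starts with `Yoga Studio` rather than `Neighborhood`, and the dataframe has 238 columns, exactly the number of unique categories. A likely explanation is that one of the venue categories is itself called `Neighborhood` (see the *Upper Beaches* venue in section 2), so assigning `toronto_onehot['Neighborhood']` overwrites that dummy column instead of adding a new one. The sketch below reproduces the effect on made-up data and shows one possible workaround. Renaming the category is my assumption, not something the original notebook does.

```python
import pandas as pd

# Toy venues; note that one category is literally called 'Neighborhood'.
venues = pd.DataFrame({
    'Neighborhood': ['The Beaches', 'The Beaches', 'Rosedale'],
    'Venue Category': ['Trail', 'Neighborhood', 'Park'],
})

onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='')
columns_before = list(onehot.columns)

# This assignment replaces the existing 'Neighborhood' dummy column,
# so the number of columns stays the same.
onehot['Neighborhood'] = venues['Neighborhood']
print(columns_before, onehot.shape)

# Possible workaround: rename the clashing category before encoding,
# then insert the real neighborhood names as the first column.
safe = venues.replace({'Venue Category': {'Neighborhood': 'Neighborhood (venue)'}})
safe_onehot = pd.get_dummies(safe[['Venue Category']], prefix='', prefix_sep='')
safe_onehot.insert(0, 'Neighborhood', safe['Neighborhood'])
print(list(safe_onehot.columns), safe_onehot.shape)
```

In this notebook the clash only affects one category, so the clustering results are unlikely to change much, but the `Neighborhood` venue category is silently dropped from the frequencies.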

* Next, we're gonna group the rows by neighborhood and take the mean of each column, which gives the relative frequency of each venue category.

In [21]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
#toronto_grouped

In [22]:
toronto_grouped.shape

(39, 238)
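
Why the mean? Each one-hot column only contains 0s and 1s, so its mean within a neighborhood is the share of that neighborhood's venues that belong to the category. A small sketch on made-up data:

```python
import pandas as pd

# Made-up venues: neighborhood A has 4 venues, neighborhood B has 2.
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Pub', 'Pub', 'Park', 'Trail', 'Park', 'Park'],
})

# dtype=float makes the 0/1 encoding explicit, whatever the pandas version.
toy_onehot = pd.get_dummies(toy['Venue Category'], dtype=float)
toy_onehot.insert(0, 'Neighborhood', toy['Neighborhood'])

# Mean of each 0/1 column per neighborhood = relative frequency.
toy_grouped = toy_onehot.groupby('Neighborhood').mean()
print(toy_grouped)
```

Here half of A's venues are pubs (2 out of 4), while every venue in B is a park, so each row of the grouped dataframe sums to 1.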

* Now, let's visualize the top 5 most frequent venues from each neighborhood.

In [23]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
                venue  freq
0         Coffee Shop  0.09
1  Seafood Restaurant  0.04
2            Beer Bar  0.04
3         Cheese Shop  0.04
4        Cocktail Bar  0.04


----Brockton, Parkdale Village, Exhibition Place----
            venue  freq
0            Café  0.13
1     Coffee Shop  0.09
2       Nightclub  0.09
3  Breakfast Spot  0.09
4   Grocery Store  0.04


----Business reply mail Processing Centre, South Central Letter Processing Plant Toronto----
              venue  freq
0       Pizza Place  0.06
1     Auto Workshop  0.06
2  Recording Studio  0.06
3        Restaurant  0.06
4           Butcher  0.06


----CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport----
              venue  freq
0    Airport Lounge  0.12
1   Airport Service  0.12
2       Coffee Shop  0.06
3   Harbor / Marina  0.06
4  Sculpture Garden  0.06


----Central Bay Street----
                venue  freq
0         Coffee Shop  0.17
1   

* Let's highlight the most common venues for each neighborhood! For that, I'm gonna create a function that returns the most common venues and then reorganize the dataframe. I'll keep the top 10 venues.

In [24]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [25]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# creating columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# creating a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

# Let's visualize the new dataframe 
neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Bakery,Cheese Shop,Beer Bar,Seafood Restaurant,Cocktail Bar,Restaurant,Farmers Market,Pharmacy,Breakfast Spot
1,"Brockton, Parkdale Village, Exhibition Place",Café,Coffee Shop,Breakfast Spot,Nightclub,Pet Store,Bar,Burrito Place,Restaurant,Climbing Gym,Performing Arts Venue
2,"Business reply mail Processing Centre, South C...",Recording Studio,Farmers Market,Brewery,Skate Park,Comic Shop,Park,Gym / Fitness Center,Butcher,Garden Center,Garden
3,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Lounge,Airport Service,Coffee Shop,Harbor / Marina,Boutique,Boat or Ferry,Rental Car Location,Bar,Plane,Sculpture Garden
4,Central Bay Street,Coffee Shop,Café,Sandwich Place,Italian Restaurant,Salad Place,Bubble Tea Shop,Burger Joint,Juice Bar,Japanese Restaurant,Thai Restaurant
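
A caveat about this table: a neighborhood with only a handful of venues (The Beaches returned four in section 2) still gets ten "most common" venues, because the remaining slots are filled with categories whose frequency is 0. The sketch below shows one way to leave those slots empty instead; `top_venues_nonzero` is a hypothetical variant of `return_most_common_venues`, and the row is made up.

```python
import pandas as pd


def top_venues_nonzero(row, num_top_venues):
    """Like return_most_common_venues, but ignores categories with frequency 0."""
    freqs = row.iloc[1:].astype(float)  # skip the 'Neighborhood' entry
    freqs = freqs[freqs > 0].sort_values(ascending=False)
    top = list(freqs.index[:num_top_venues])
    # Pad with None so every neighborhood yields the same number of columns.
    return top + [None] * (num_top_venues - len(top))


# Made-up row in the same layout as toronto_grouped: name first, then frequencies.
toy_row = pd.Series({'Neighborhood': 'Toy Hood', 'Park': 0.5, 'Pub': 0.0,
                     'Trail': 0.3, 'Zoo': 0.2})

print(top_venues_nonzero(toy_row, 4))
```

Whether to pad with `None` or keep the zero-frequency categories is a design choice; padding makes it obvious which neighborhoods have too few venues to support a full ranking.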


## 4. Clustering

Now, let's cluster these neighborhoods.

In [26]:
# Selecting only the columns that will be used for the clustering
toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# creating the model
kmeans = KMeans(n_clusters=3, random_state=12).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int32)
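
The first ten labels are all 2, which hints that the clusters may be very uneven in size. A quick way to check is to count how many neighborhoods fall into each cluster. The sketch below uses a made-up label array as a stand-in for `kmeans.labels_`:

```python
import numpy as np

# Stand-in for kmeans.labels_ (made-up values, for illustration only).
labels = np.array([2, 2, 0, 2, 1, 2, 1, 2, 2, 2])

# np.unique with return_counts gives each cluster id and its size.
cluster_ids, sizes = np.unique(labels, return_counts=True)

for cluster_id, size in zip(cluster_ids, sizes):
    print(f'Cluster {cluster_id}: {size} neighborhoods')
```

On the real model, the same two lines can be run with `kmeans.labels_` in place of `labels`.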

In [27]:
# add clustering labels to the dataframe
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df

# join df with neighborhoods_venues_sorted to add the cluster label and top venues to each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() 

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,2,Health Food Store,Pub,Trail,Distribution Center,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Women's Store
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,2,Greek Restaurant,Coffee Shop,Italian Restaurant,Restaurant,Furniture / Home Store,Bookstore,Ice Cream Shop,Pub,Pizza Place,Lounge
2,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572,2,Park,Sandwich Place,Fast Food Restaurant,Fish & Chips Shop,Pub,Steakhouse,Sushi Restaurant,Burrito Place,Pizza Place,Italian Restaurant
3,M4M,East Toronto,Studio District,43.659526,-79.340923,2,Coffee Shop,Bakery,Gastropub,Brewery,Café,American Restaurant,Convenience Store,Seafood Restaurant,Cheese Shop,Pet Store
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,1,Park,Bus Line,Swim School,College Gym,College Rec Center,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run


* To finish this part, let's visualize the clusters using a **folium** map.

In [28]:
# creating map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# setting color scheme for the clusters
x = np.arange(3)
ys = [i + x + (i*x)**2 for i in range(3)]
colors_array = cm.Blues_r(np.linspace(0, 1, len(ys)))
blues = [colors.rgb2hex(i) for i in colors_array]

# adding markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=blues[cluster-1],
        fill=True,
        fill_color=blues[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 5. Examine clusters 

Lastly, I'm gonna examine the three clusters I created.

#### 1. Cluster 1

In [29]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,Central Toronto,0,Music Venue,Garden,Women's Store,Deli / Bodega,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run


This first cluster contains a single neighborhood, whose top venues are a music venue, a garden, stores and food-related businesses.

#### 2. Cluster 2

In [30]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Central Toronto,1,Park,Bus Line,Swim School,College Gym,College Rec Center,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run
8,Central Toronto,1,Trail,Playground,Women's Store,Dance Studio,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center
10,Downtown Toronto,1,Park,Trail,Playground,Dance Studio,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center
23,Central Toronto,1,Park,Trail,Jewelry Store,Sushi Restaurant,College Rec Center,Colombian Restaurant,Eastern European Restaurant,Dumpling Restaurant,College Cafeteria,Donut Shop


The second cluster contains four neighborhoods, where the most common venues are parks and trails.

#### 3. Cluster 3

In [31]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,East Toronto,2,Health Food Store,Pub,Trail,Distribution Center,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Women's Store
1,East Toronto,2,Greek Restaurant,Coffee Shop,Italian Restaurant,Restaurant,Furniture / Home Store,Bookstore,Ice Cream Shop,Pub,Pizza Place,Lounge
2,East Toronto,2,Park,Sandwich Place,Fast Food Restaurant,Fish & Chips Shop,Pub,Steakhouse,Sushi Restaurant,Burrito Place,Pizza Place,Italian Restaurant
3,East Toronto,2,Coffee Shop,Bakery,Gastropub,Brewery,Café,American Restaurant,Convenience Store,Seafood Restaurant,Cheese Shop,Pet Store
5,Central Toronto,2,Park,Hotel,Breakfast Spot,Dog Run,Sandwich Place,Food & Drink Shop,Department Store,Dance Studio,Gym / Fitness Center,Concert Hall
6,Central Toronto,2,Coffee Shop,Clothing Store,Yoga Studio,Mexican Restaurant,Salon / Barbershop,Diner,Restaurant,Fast Food Restaurant,Spa,Sporting Goods Shop
7,Central Toronto,2,Pizza Place,Sandwich Place,Dessert Shop,Sushi Restaurant,Coffee Shop,Italian Restaurant,Gym,Café,Restaurant,Seafood Restaurant
9,Central Toronto,2,Coffee Shop,Sports Bar,Vietnamese Restaurant,Bagel Shop,Pub,Light Rail Station,Bank,Restaurant,Liquor Store,American Restaurant
11,Downtown Toronto,2,Coffee Shop,Restaurant,Café,Pizza Place,Market,Pub,Bakery,Park,Chinese Restaurant,Italian Restaurant
12,Downtown Toronto,2,Coffee Shop,Gay Bar,Japanese Restaurant,Sushi Restaurant,Restaurant,Yoga Studio,Café,Men's Store,Hotel,Bubble Tea Shop


The last cluster contains all the remaining neighborhoods and shows a broader variety of venues, dominated by coffee shops, bars and restaurants.