
<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in Toronto</font></h1>

# Part 1
## Create Initial Dataframe with Toronto Postal Codes, Boroughs, and Neighborhoods
Scrape the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M to extract its table of postal codes, then load that data into a pandas DataFrame.

Import dependencies

In [14]:
import pandas as pd # library for data analysis

# !conda install -c conda-forge beautifulsoup4 --yes
from bs4 import BeautifulSoup

import requests # library to handle requests

Fetch the wiki page's HTML as a string and parse it.

In [15]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
html_file = requests.get(url).text
soup = BeautifulSoup(html_file, 'lxml')

Drill down into the table to extract headings and rows, then create DataFrame with headings.

In [16]:
table = soup.find('table', class_='wikitable sortable') # Get the Postcode/Borough/Neighbourhood table.
headings = table.find_all('th') # Extract the 3 column headings.
rows = table.find_all('tr') # Get all rows in the table
# Create a DataFrame with the column headings, removing the newline character from the 3rd heading.
df = pd.DataFrame(columns=[headings[0].text, headings[1].text, headings[2].text.split('\n')[0]])

Loop through all rows to build the DataFrame one row at a time.

In [17]:
for row in rows[1:]: # Skip the 1st (header) row.
    # Get the Postcode, Borough, and Neighbourhood for the current row,
    # removing the trailing newline from the neighborhood.
    columns = row.find_all('td')
    postcode = columns[0].text
    borough = columns[1].text
    neighborhood = columns[2].text.split('\n')[0]
    if borough != 'Not assigned': # Skip any rows without a borough.
        if neighborhood == 'Not assigned': # Unassigned neighborhoods take on the name of their borough.
            neighborhood = borough
        if postcode in df['Postcode'].values:
            # Group neighborhoods within the same postcode into a single postcode row.
            # Assumption: A postcode includes only one borough
            # (though boroughs may span multiple postcodes).
            df.loc[df['Postcode'] == postcode, 'Neighbourhood'] = \
            neighborhood + ", " + df[df['Postcode'] == postcode]['Neighbourhood']
        else: # Add row for new postcode, borough, and neighborhood.
            # DataFrame.append was removed in pandas 2.0, so add the row by label instead.
            df.loc[len(df)] = [postcode, borough, neighborhood]
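The row-by-row loop above can also be expressed with vectorized pandas operations. The following is a minimal sketch on a small hand-made sample (not real scrape output), assuming the same three column names. It drops unassigned boroughs, fills unassigned neighbourhoods with the borough name, and joins neighbourhoods that share a postcode.

```python
import pandas as pd

# Hand-made sample standing in for the scraped table rows (illustrative only).
raw = pd.DataFrame({
    'Postcode': ['M1A', 'M3A', 'M5A', 'M6A', 'M6A', 'M7A'],
    'Borough': ['Not assigned', 'North York', 'Downtown Toronto',
                'North York', 'North York', "Queen's Park"],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Harbourfront',
                      'Lawrence Heights', 'Lawrence Manor', 'Not assigned'],
})

# Skip any rows without a borough.
cleaned = raw[raw['Borough'] != 'Not assigned'].copy()

# Unassigned neighbourhoods take on the name of their borough.
unassigned = cleaned['Neighbourhood'] == 'Not assigned'
cleaned.loc[unassigned, 'Neighbourhood'] = cleaned.loc[unassigned, 'Borough']

# Collapse rows sharing a postcode into one comma-separated row,
# keeping postcodes in their order of first appearance.
grouped = (cleaned.groupby(['Postcode', 'Borough'], sort=False)['Neighbourhood']
                  .agg(', '.join)
                  .reset_index())
print(grouped)
```

In this sketch the names within a postcode are joined in source order, whereas the loop prepends each new name to the existing ones.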

In [18]:
df.head(12) # Display the first 12 rows of the DataFrame

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,Queen's Park
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills North
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [19]:
df.shape

(103, 3)

# Part 2
## Get the latitude and longitude coordinates of each neighborhood
Add Latitude and Longitude columns to the DataFrame.

Import dependencies

In [20]:
# !conda install -c conda-forge geocoder --yes
import geocoder # import geocoder

Loop through each postal code to get latitude and longitude, then add to DataFrame as new columns

In [21]:
latitudes = []
longitudes = []
for postal_code in df['Postcode']:
    # using geocoder.arcgis rather than geocoder.google as the latter did not work
    g = geocoder.arcgis('{}, Toronto, Ontario'.format(postal_code))

    latitudes.append(g.latlng[0])
    longitudes.append(g.latlng[1])

df['Latitude'] = latitudes
df['Longitude'] = longitudes
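The loop above indexes `g.latlng` directly, which would fail if a lookup came back empty. A minimal retry sketch follows. The `lookup` argument is a hypothetical stand-in for a call such as `lambda q: geocoder.arcgis(q).latlng`, and a fake lookup is used here so the example runs without network access.

```python
import time

def geocode_with_retry(lookup, query, max_attempts=3, pause_seconds=0.0):
    """Call lookup(query) until it yields coordinates or attempts run out.

    `lookup` is a hypothetical callable returning [lat, lng], or an
    empty value (None or []) when the geocoding service has no answer.
    """
    for _ in range(max_attempts):
        latlng = lookup(query)
        if latlng:  # treats both None and [] as "no result"
            return latlng[0], latlng[1]
        time.sleep(pause_seconds)
    return None, None

# Fake lookup that fails twice before succeeding, to exercise the retry path.
calls = {'count': 0}

def flaky_lookup(query):
    calls['count'] += 1
    return None if calls['count'] < 3 else [43.75242, -79.329242]

lat, lng = geocode_with_retry(flaky_lookup, 'M3A, Toronto, Ontario')
print(lat, lng)
```

Returning `(None, None)` after the last attempt keeps the latitude and longitude lists the same length as the DataFrame, so failed lookups show up as missing values rather than raising.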

In [22]:
df.head(12) # Display the first 12 rows of the DataFrame

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.75242,-79.329242
1,M4A,North York,Victoria Village,43.7306,-79.313265
2,M5A,Downtown Toronto,Harbourfront,43.650295,-79.359166
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.72327,-79.451286
4,M7A,Downtown Toronto,Queen's Park,43.66115,-79.391715
5,M9A,Etobicoke,Islington Avenue,43.662299,-79.528195
6,M1B,Scarborough,"Malvern, Rouge",43.811525,-79.195517
7,M3B,North York,Don Mills North,43.749055,-79.362227
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.707535,-79.311773
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657363,-79.37818


In [23]:
df.shape

(103, 5)

# Part 3
## Explore and cluster the neighborhoods in Toronto

Import dependencies

In [24]:
import numpy as np # library to handle data in a vectorized manner

from sklearn.cluster import KMeans # k-means clustering from scikit-learn

# !conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

#### Given a list of location names, latitudes, and longitudes, use the Foursquare API to get nearby venues for each location within a default radius of 500 meters, and return a DataFrame of all nearby venues for all locations

In [25]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
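    import os # local import so this cell stays self-contained

    # Read the Foursquare credentials from the environment instead of hard-coding them.
    # The variable names below are assumptions; set them before running this cell.
    CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # Foursquare ID
    CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # Foursquare Secret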
    VERSION = '20180605' # Foursquare API version
    LIMIT = 100 # Max venues to return

    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # Get the Foursquare items info for each of the nearby venues
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # Append a row for each nearby venue, including the location name, latitude, and longitude
        # along with the venue name, latitude, longitude, and category name.
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postcode', 
                  'Postcode Latitude', 
                  'Postcode Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
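The function above indexes straight into the JSON response, so a response with no groups or a venue with no categories would raise an error. The sketch below shows one defensive way to flatten a payload, assuming only the response shape that the function already relies on. The helper name `extract_venues` and the sample payload are hypothetical.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore-style payload into per-venue tuples.

    Each tuple holds the location name, latitude, and longitude followed
    by the venue name, latitude, longitude, and first category name.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # Fall back to None when a venue carries no category.
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows

# Hand-made payload mimicking the nesting used above (illustrative only).
sample_payload = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Brookbanks Park',
                   'location': {'lat': 43.751976, 'lng': -79.33214},
                   'categories': [{'name': 'Park'}]}},
        {'venue': {'name': 'Uncategorized Spot',  # hypothetical venue
                   'location': {'lat': 43.75, 'lng': -79.33},
                   'categories': []}},
    ]}]}
}

rows = extract_venues(sample_payload, 'M3A', 43.75242, -79.329242)
for row in rows:
    print(row)
```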

Call getNearbyVenues on each Postcode to create a new dataframe with a row for each venue within each Postcode.

In [26]:
toronto_venues = getNearbyVenues(names=df['Postcode'], latitudes=df['Latitude'], longitudes=df['Longitude'])

Check out how many venues were returned and look at the first few rows of the resultant DataFrame.

In [27]:
print(toronto_venues.shape)
toronto_venues.head()

(2437, 7)


Unnamed: 0,Postcode,Postcode Latitude,Postcode Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,M3A,43.75242,-79.329242,Brookbanks Park,43.751976,-79.33214,Park
1,M3A,43.75242,-79.329242,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,M4A,43.7306,-79.313265,Wigmore Park,43.731023,-79.310771,Park
3,M4A,43.7306,-79.313265,Memories of Africa,43.726602,-79.312427,Grocery Store
4,M4A,43.7306,-79.313265,Guardian Drug,43.730584,-79.307432,Pharmacy


In [28]:
print('There are {} unique categories and {} Postcodes with venues.'.format \
      (len(toronto_venues['Venue Category'].unique()), len(toronto_venues['Postcode'].unique())))

There are 253 unique categories and 97 Postcodes with venues.


Of 103 Postcodes, venues were found for 97 of them, so 6 are without a single venue (as of this March 7, 2020 run).
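To see which Postcodes came back without venues, a set difference is enough. The sketch below uses short hand-made lists in place of `df['Postcode']` and `toronto_venues['Postcode']`; the code `M1X` is a made-up placeholder.

```python
# Stand-ins for df['Postcode'] and toronto_venues['Postcode'] (illustrative only).
all_postcodes = ['M3A', 'M4A', 'M5A', 'M1X']
venue_postcodes = ['M3A', 'M3A', 'M4A', 'M5A']

# Postcodes present in the full list but absent from the venue results.
missing = sorted(set(all_postcodes) - set(venue_postcodes))
print(missing)
```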

We'll use one hot encoding to create a DataFrame suitable for k-means clustering.

In [29]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add Postcode column back to dataframe
toronto_onehot['Postcode'] = toronto_venues['Postcode'] 

# move Postcode column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Postcode,Afghan Restaurant,Airport,American Restaurant,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,...,Tram Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,M3A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,M3A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M4A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M4A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,M4A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Now we examine the new DataFrame size.

In [30]:
toronto_onehot.shape

(2437, 254)

Next, let's group rows by Postcode, taking the mean of the frequency of occurrence of each venue category within each Postcode.

In [31]:
toronto_grouped = toronto_onehot.groupby('Postcode').mean().reset_index()
toronto_grouped

Unnamed: 0,Postcode,Afghan Restaurant,Airport,American Restaurant,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,...,Tram Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,M1C,0.0,0.00,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0000,0.0,0.000000,0.000000
1,M1E,0.0,0.00,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0000,0.0,0.000000,0.000000
2,M1G,0.0,0.00,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0000,0.0,0.000000,0.000000
3,M1H,0.0,0.00,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0000,0.0,0.000000,0.000000
4,M1J,0.0,0.00,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0000,0.0,0.000000,0.000000
5,M1K,0.0,0.00,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0000,0.0,0.000000,0.000000
6,M1L,0.0,0.00,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0000,0.0,0.000000,0.000000
7,M1M,0.0,0.00,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0000,0.0,0.000000,0.000000
8,M1N,0.0,0.00,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0000,0.0,0.000000,0.000000
9,M1P,0.0,0.00,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0000,0.0,0.000000,0.000000
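The one-hot and group-mean steps are easier to follow on a tiny example. The sketch below uses a hand-made venue list (not the Foursquare results) and shows that each grouped row holds the share of a Postcode's venues falling into each category, so every row sums to 1.

```python
import pandas as pd

# Hand-made venue list (illustrative only).
venues = pd.DataFrame({
    'Postcode': ['M3A', 'M3A', 'M4A', 'M4A', 'M4A', 'M4A'],
    'Venue Category': ['Park', 'Food & Drink Shop', 'Park',
                       'Grocery Store', 'Pharmacy', 'Park'],
})

# One column per category, with 1.0 marking the venue's category.
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='', dtype=float)
onehot.insert(0, 'Postcode', venues['Postcode'])

# The mean of the indicator columns gives each category's relative frequency.
grouped = onehot.groupby('Postcode').mean().reset_index()
print(grouped)
```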


Define a helper that sorts a Postcode's venue categories by frequency in descending order and returns the most common ones.

In [32]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Create a new DataFrame with the top 10 venues for each Postcode.

In [33]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Postcode']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError: # ranks past 3rd have no entry in indicators, so use 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
postcodes_venues_sorted = pd.DataFrame(columns=columns)
postcodes_venues_sorted['Postcode'] = toronto_grouped['Postcode']

for ind in np.arange(toronto_grouped.shape[0]):
    postcodes_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

postcodes_venues_sorted.head()

Unnamed: 0,Postcode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1C,History Museum,Bar,Cosmetics Shop,Costume Shop,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
1,M1E,Construction & Landscaping,Park,Gym / Fitness Center,Event Space,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant,Farm
2,M1G,Park,Business Service,Coffee Shop,Korean Restaurant,Yoga Studio,Ethiopian Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Falafel Restaurant
3,M1H,Playground,Trail,Yoga Studio,Ethiopian Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Event Space
4,M1J,Restaurant,Train Station,Indian Restaurant,Grocery Store,Dog Run,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School
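The column labels above are built by indexing a three-item suffix list and falling back to 'th' on failure, which is fine for ten columns but would label an 11th column "11th" only by accident and a 21st column "21th". A small explicit helper, offered here as an optional sketch (the name `ordinal` is an assumption), handles those cases.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['Postcode'] + ['{} Most Common Venue'.format(ordinal(i))
                          for i in range(1, num_top_venues + 1)]
print(columns)
```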


### Clustering
I settled on 5 as the optimal number of clusters because higher numbers of clusters resulted in too many very small clusters (multiple clusters of 1) and lower numbers of clusters were not granular enough. For example, 10 clusters resulted in 5 clusters with just 1 member each and 3 clusters resulted in 1 cluster with 92 members (95% of the members!).
#### Clustering counts summary
Each entry is a cluster size; "n x m" means m clusters of n members each (for example, "1 x 5" is five single-member clusters).
* 10 clusters: 1 x 5, 2, 4, 8, 15, 63
* 7 clusters: 1 x 2, 2, 3 x 2, 25, 62
* 6 clusters: 1 x 3, 2, 17, 75
* 5 clusters: 1, 2, 7, 18, 69
* 4 clusters: 1, 3, 13, 80
* 3 clusters: 1, 4, 92

Run k-means to group the Postcodes into 5 clusters.

In [34]:
kclusters = 5 # set number of clusters

toronto_grouped_clustering = toronto_grouped.drop(columns='Postcode') # positional axis argument is rejected by pandas 2.0+

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row (Postcode) in the dataframe
kmeans.labels_

array([2, 1, 0, 1, 2, 2, 2, 2, 1, 2, 1, 2, 2, 3, 2, 2, 2, 1, 2, 2, 1, 2,
       0, 1, 2, 2, 0, 2, 2, 1, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 1, 2, 1,
       2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 2,
       2, 1, 1, 2, 2, 2, 2, 0, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 2,
       4, 2, 4, 0, 2, 2, 2, 2, 2], dtype=int32)
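The cluster sizes quoted in the counts summary for 5 clusters can be checked directly from the labels printed above. The sketch below copies that label array into a plain list and tallies it with `collections.Counter`; in the notebook the same tally could be taken from `kmeans.labels_`.

```python
from collections import Counter

# Labels copied from the kmeans.labels_ output shown above.
labels = [2, 1, 0, 1, 2, 2, 2, 2, 1, 2, 1, 2, 2, 3, 2, 2, 2, 1, 2, 2, 1, 2,
          0, 1, 2, 2, 0, 2, 2, 1, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 1, 2, 1,
          2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 2,
          2, 1, 1, 2, 2, 2, 2, 0, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 2,
          4, 2, 4, 0, 2, 2, 2, 2, 2]

# Count how many Postcodes fall under each cluster label.
sizes = Counter(labels)
for label in sorted(sizes):
    print('label {}: {} Postcodes'.format(label, sizes[label]))

# Sorted sizes, comparable to the "5 clusters" line in the summary.
print(sorted(sizes.values()))
```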

Now let's create a new DataFrame that includes the cluster label as well as the top 10 venues for each Postcode. We use a right join because the clustered venues DataFrame has fewer rows (Postcodes) than the original DataFrame (6 fewer as of the March 7, 2020 run): not every Postcode had a venue within 500 meters.

In [35]:
# add clustering labels
postcodes_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df

# join df with postcodes_venues_sorted so each clustered Postcode keeps its borough, neighbourhood, and latitude/longitude
toronto_merged = toronto_merged.join(postcodes_venues_sorted.set_index('Postcode'), on='Postcode', how='right')

toronto_merged.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.75242,-79.329242,0,Food & Drink Shop,Park,Yoga Studio,Donut Shop,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
1,M4A,North York,Victoria Village,43.7306,-79.313265,1,Grocery Store,Pharmacy,Park,Event Space,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant,Yoga Studio
2,M5A,Downtown Toronto,Harbourfront,43.650295,-79.359166,2,Coffee Shop,Bakery,Café,Theater,Boat or Ferry,Gym Pool,Ice Cream Shop,Performing Arts Venue,Gastropub,Distribution Center
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.72327,-79.451286,2,Clothing Store,Furniture / Home Store,Men's Store,American Restaurant,Restaurant,Cosmetics Shop,Women's Store,Sushi Restaurant,Toy / Game Store,Metro Station
4,M7A,Downtown Toronto,Queen's Park,43.66115,-79.391715,2,Coffee Shop,Café,Sandwich Place,Italian Restaurant,Bookstore,Pharmacy,Portuguese Restaurant,Indian Restaurant,College Auditorium,Sushi Restaurant
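The effect of the right join is easiest to see next to a left join on toy data. In this sketch, with hand-made frames standing in for `df` and `postcodes_venues_sorted`, the right join keeps only Postcodes that received a cluster label, while a left join would keep every Postcode and leave the label missing where no venues were found.

```python
import pandas as pd

# Stand-in for df: every Postcode (illustrative only).
postcodes = pd.DataFrame({
    'Postcode': ['M3A', 'M4A', 'M5A'],
    'Borough': ['North York', 'North York', 'Downtown Toronto'],
})

# Stand-in for postcodes_venues_sorted: only Postcodes that had venues.
clustered = pd.DataFrame({
    'Postcode': ['M3A', 'M5A'],
    'Cluster Labels': [0, 2],
})

right = postcodes.join(clustered.set_index('Postcode'), on='Postcode', how='right')
left = postcodes.join(clustered.set_index('Postcode'), on='Postcode', how='left')

print(len(right), 'rows with a right join')
print(len(left), 'rows with a left join')
print(left)
```

With the left join the label column contains a missing value, which would need handling before the labels are used as list indices when coloring the map.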


Get latitude and longitude for Toronto.

In [36]:
g = geocoder.arcgis('Toronto, Ontario')
toronto_latitude = g.latlng[0]
toronto_longitude = g.latlng[1]

### Create map to visualize the resulting clusters.
Each cluster is represented by a separate color. Cluster examination and overviews are in sections subsequent to the map.

In [37]:
# create map
map_clusters = folium.Map(location=[toronto_latitude, toronto_longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Postcode'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
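Because the markers are colored with `rainbow[cluster-1]`, label 0 wraps around to the last color in the palette. The sketch below illustrates that mapping with stand-in color names taken from the cluster headings in the next section; the notebook itself uses hex codes produced by `cm.rainbow`, so the names are approximations.

```python
kclusters = 5

# Stand-in names in palette order (approximations of the cm.rainbow hex codes).
palette = ['violet', 'blue', 'green', 'orange', 'red']

# Same indexing as the map code: label - 1, so label 0 picks palette[-1].
label_to_color = {label: palette[label - 1] for label in range(kclusters)}
print(label_to_color)
```

This is why the cluster with label 0 appears in red even though red is the final entry of the palette.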

### Finally, let's examine each of the 5 clusters.
The heading for each cluster below indicates the color of the corresponding dots on the above map, along with a general summary of the type of neighborhoods within the cluster, based on the predominant venues within that cluster.

#### Cluster 1 (red): Airport, businesses, and restaurants

In [38]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[0] + [1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Postcode,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,0,Food & Drink Shop,Park,Yoga Studio,Donut Shop,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
22,M1G,Scarborough,0,Park,Business Service,Coffee Shop,Korean Restaurant,Yoga Studio,Ethiopian Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Falafel Restaurant
40,M3K,North York,0,Airport,Food Court,Park,Coffee Shop,Yoga Studio,Event Space,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant
49,M6L,North York,0,Bakery,Park,Yoga Studio,Event Space,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant,Falafel Restaurant
57,M9M,North York,0,Coffee Shop,Nightclub,Park,Yoga Studio,Ethiopian Restaurant,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Event Space
68,M5P,Central Toronto,0,Park,Yoga Studio,Dog Run,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant,Event Space
98,M8X,Etobicoke,0,Pool,Park,Yoga Studio,Ethiopian Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Falafel Restaurant


#### Cluster 2 (violet): Residences, groceries, restaurants, and recreation

In [39]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[0] + [1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Postcode,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,M4A,North York,1,Grocery Store,Pharmacy,Park,Event Space,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant,Yoga Studio
7,M3B,North York,1,Gas Station,Burger Joint,Soccer Field,Park,Coffee Shop,Ethiopian Restaurant,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School
16,M6C,York,1,Grocery Store,Trail,Field,Hockey Arena,Park,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant
18,M1E,Scarborough,1,Construction & Landscaping,Park,Gym / Fitness Center,Event Space,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant,Farm
21,M6E,York,1,Park,Market,Bakery,Mexican Restaurant,Gym,Beer Store,Sporting Goods Shop,Women's Store,Fast Food Restaurant,Farmers Market
26,M1H,Scarborough,1,Playground,Trail,Yoga Studio,Ethiopian Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Event Space
35,M4J,East York,1,Bar,Italian Restaurant,Farmers Market,Park,Ethiopian Restaurant,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Falafel Restaurant
36,M5J,Downtown Toronto,1,Pier,Harbor / Marina,Park,Thrift / Vintage Store,Ethiopian Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School
39,M2K,North York,1,Construction & Landscaping,Park,Trail,Dog Run,Ethiopian Restaurant,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Falafel Restaurant
41,M4K,East Toronto,1,Bus Line,Discount Store,Park,Grocery Store,Ice Cream Shop,Yoga Studio,Event Space,Eastern European Restaurant,Electronics Store,Elementary School


#### Cluster 3 (blue): Shopping, hotels, bars, and restaurants

In [40]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[0] + [1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Postcode,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,M5A,Downtown Toronto,2,Coffee Shop,Bakery,Café,Theater,Boat or Ferry,Gym Pool,Ice Cream Shop,Performing Arts Venue,Gastropub,Distribution Center
3,M6A,North York,2,Clothing Store,Furniture / Home Store,Men's Store,American Restaurant,Restaurant,Cosmetics Shop,Women's Store,Sushi Restaurant,Toy / Game Store,Metro Station
4,M7A,Downtown Toronto,2,Coffee Shop,Café,Sandwich Place,Italian Restaurant,Bookstore,Pharmacy,Portuguese Restaurant,Indian Restaurant,College Auditorium,Sushi Restaurant
5,M9A,Etobicoke,2,Pharmacy,Bank,Park,Skating Rink,Shopping Mall,Grocery Store,Café,Elementary School,Dumpling Restaurant,Eastern European Restaurant
8,M4B,East York,2,Pizza Place,Rock Climbing Spot,Gastropub,Pet Store,Pharmacy,Gym / Fitness Center,Café,Bus Line,Athletics & Sports,Breakfast Spot
9,M5B,Downtown Toronto,2,Coffee Shop,Clothing Store,Japanese Restaurant,Middle Eastern Restaurant,Café,Theater,Bubble Tea Shop,Restaurant,Bookstore,Burger Joint
10,M6B,North York,2,Pizza Place,Latin American Restaurant,Ice Cream Shop,Fast Food Restaurant,Mediterranean Restaurant,Japanese Restaurant,Grocery Store,Asian Restaurant,Italian Restaurant,Gas Station
12,M1C,Scarborough,2,History Museum,Bar,Cosmetics Shop,Costume Shop,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
13,M3C,North York,2,Beer Store,Bubble Tea Shop,Coffee Shop,Gym,Grocery Store,Intersection,Supermarket,Ethiopian Restaurant,Dumpling Restaurant,Eastern European Restaurant
14,M4C,East York,2,Bus Line,Bakery,Café,Market,Metro Station,Middle Eastern Restaurant,Breakfast Spot,Coffee Shop,Pub,Doctor's Office


#### Cluster 4 (green): Most rural

In [41]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[0] + [1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Postcode,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
85,M1V,Scarborough,3,Pharmacy,Yoga Studio,Dog Run,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant,Event Space


#### Cluster 5 (orange): More rural

In [42]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[0] + [1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Postcode,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,M9B,Etobicoke,4,Print Shop,Yoga Studio,Dog Run,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant,Event Space
50,M9L,North York,4,Print Shop,Auto Garage,Yoga Studio,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant,Event Space,Dog Run
