# Segmenting and Clustering Neighborhoods of Toronto

### 1. Fetching data to define neighborhood

To cluster the neighborhoods of the city of Toronto, we first need to define their geospatial locations and boundaries.  We can obtain this data from `geopy`.  Before we can do that, however, we need to know more about each neighborhood, such as its name and postal code.

We can achieve this by scraping the necessary data from a website.  The information we need is available at `https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M`

In [117]:
#!conda install -c conda-forge beautifulsoup4  # remove the leading hashtag if beautifulsoup4 is not installed.

from bs4 import BeautifulSoup # the beautiful soup library will be used to scrape the data from wikipedia.

#!conda install -c conda-forge lxml # remove leading hashtag if lxml parser is not installed

#!conda install -c conda-forge requests # remove leading hashtag if requests library is not installed
import requests
import csv


In [118]:
# The contents of the webpage are fetched and stored in the variable 'source'.  'source' is then passed to Beautiful Soup, which parses it and returns a BeautifulSoup object.

source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(source, 'lxml')

# To work only in the portion of the parse tree that we are concerned with, the body of the table containing the data of interest is stored in the variable 'table'.
table = soup.table.tbody 

# A csv file is created so that the table contents can be written to it (newline='' lets the csv module control the line endings).
csv_file = open('toronto_hood.csv', 'w', newline='')
csv_writer = csv.writer(csv_file)

# Each row of the table can be looped through and written to the csv file 'toronto_hood.csv'.
for table_row in table.find_all('tr'):
    field_one = table_row.next_element.next_element
    column_one = field_one.text
    
    field_two = field_one.next_sibling.next_sibling
    column_two = field_two.text

    # The trailing '\n' line ending must be removed from the neighbourhood, and the 'Not assigned' values changed to NaN so that pandas can recognize them.
    field_three = field_two.next_sibling.next_sibling
    column_three = field_three.text.rstrip('\n')

    if column_three == "Not assigned":
        column_three = 'NaN'

    csv_writer.writerow([column_one, column_two, column_three])

csv_file.close()
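
The sibling-hopping in the cell above depends on the exact whitespace between tags. As a minimal sketch of an alternative, assuming each data row holds exactly three `<td>` cells, the cells can be collected with `find_all`. `SAMPLE_HTML` and `parse_table_rows` are illustrative names, and the inline table is a small stand-in for the page, not its real markup.

```python
from bs4 import BeautifulSoup

# A tiny stand-in for the Wikipedia table (assumed layout: three <td> cells per row).
SAMPLE_HTML = """
<table>
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned
</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods
</td></tr>
</table>
"""


def parse_table_rows(html):
    """Return one list of stripped cell texts per data row of the first table."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for table_row in soup.find("table").find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in table_row.find_all("td")]
        if len(cells) == 3:  # the header row has <th> cells only and is skipped
            rows.append(cells)
    return rows


rows = parse_table_rows(SAMPLE_HTML)
print(rows)
```

Because `get_text(strip=True)` removes surrounding whitespace, no separate step is needed to drop the trailing line ending.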

In [119]:
# import libraries

import pandas as pd

In [120]:
# The dataframe is created from the csv.
df = pd.read_csv('toronto_hood.csv')

df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


In [121]:
# The rows containing NaN are recognized by pandas and can be dropped and the index reset.

df.dropna(axis=0, inplace=True)
df.reset_index(inplace=True, drop=True)
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights


In [122]:
# The neighbourhoods are grouped by postcode and concatenated into a pandas Series, which is assigned to a variable.
build_list = df.groupby('Postcode')['Neighbourhood'].apply(', '.join)

# An array containing the distinct postcodes is created so that it can be looped through to write the new values for 'Neighbourhood' into 'df'.
pc_frame = df['Postcode'].unique()

In [123]:
# By iterating through the postcodes in 'pc_frame', the concatenated neighbourhoods can be written into the 'Neighbourhood' column of the original dataframe.

for n_hood in pc_frame:
    # A single .loc call avoids chained assignment, and looping over 'pc_frame' avoids a hard-coded row count.
    df.loc[df['Postcode'] == n_hood, 'Neighbourhood'] = build_list[n_hood]


In [124]:
# As each postcode had one row for each neighbourhood belonging to it, the for-loop above produced many duplicate rows.  These duplicates are dropped to obtain the final dataframe.
df.drop_duplicates(inplace=True)
df.reset_index(inplace=True, drop=True)
df.shape

df.tail()

Unnamed: 0,Postcode,Borough,Neighbourhood
97,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
98,M4Y,Downtown Toronto,Church and Wellesley
99,M7Y,East Toronto,Business Reply Mail Processing Centre 969 Eastern
100,M8Y,Etobicoke,"Humber Bay, King's Mill Park, Kingsway Park So..."
101,M8Z,Etobicoke,"Kingsway Park South West, Mimico NW, The Queen..."
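
The group, write-back, and de-duplicate steps above can also be expressed as one aggregation. The following is a minimal sketch on a hand-made sample, assuming each postcode belongs to a single borough; `sample` and `collapsed` are illustrative names, not part of the notebook.

```python
import pandas as pd

# A small sample in the same shape as 'df' after the NaN rows are dropped.
sample = pd.DataFrame({
    "Postcode": ["M3A", "M5A", "M5A", "M6A", "M6A"],
    "Borough": ["North York", "Downtown Toronto", "Downtown Toronto",
                "North York", "North York"],
    "Neighbourhood": ["Parkwoods", "Harbourfront", "Regent Park",
                      "Lawrence Heights", "Lawrence Manor"],
})

# Group on postcode and borough, join the neighbourhood names, and keep
# the groups in order of first appearance (sort=False).
collapsed = (
    sample.groupby(["Postcode", "Borough"], sort=False)["Neighbourhood"]
    .agg(", ".join)
    .reset_index()
)
print(collapsed)
```

This yields one row per postcode directly, so no later `drop_duplicates` call is needed.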


### 2. Adding longitude and latitude data to dataframe

There was an issue in trying to obtain the geocoder library from Anaconda.  However, the geospatial data for Toronto is available from another source.

In [125]:
path = 'https://cocl.us/Geospatial_data'

lat_long_df = pd.read_csv(path, index_col=False)
lat_long_df.rename(columns={'Postal Code': 'Postcode'}, inplace=True)
lat_long_df.head()

Unnamed: 0,Postcode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [126]:
# The two frames are joined on their shared 'Postcode' column.
toronto_data = df.merge(lat_long_df, on='Postcode')

toronto_data.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763
4,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
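
An inner merge silently drops postcodes that have no coordinates. As a hedged sketch, a left merge with `indicator=True` makes any such rows visible. The frames below are hand-made samples, and the postcode `M9Z` is a hypothetical entry added only to show an unmatched row.

```python
import pandas as pd

hoods = pd.DataFrame({
    "Postcode": ["M3A", "M4A", "M9Z"],  # 'M9Z' is hypothetical
    "Borough": ["North York", "North York", "Etobicoke"],
})
coords = pd.DataFrame({
    "Postcode": ["M3A", "M4A", "M5A"],
    "Latitude": [43.753259, 43.725882, 43.65426],
    "Longitude": [-79.329656, -79.315572, -79.360636],
})

# how='left' keeps every neighbourhood row; '_merge' records where each row matched.
checked = hoods.merge(coords, on="Postcode", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "Postcode"].tolist()
print(unmatched)
```

If `unmatched` is empty, the inner merge used above loses nothing.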


### 3. Map of Toronto with Neighbourhoods

To dive deeper into the data, additional libraries are necessary for data manipulation and plotting.

In [128]:
import seaborn as sns

# !conda install -c conda-forge folium
import folium
import numpy as np

import matplotlib.cm as cm
import matplotlib.colors as colors

# json_normalize is exposed at the top level in pandas 1.0 and later.
from pandas import json_normalize

To begin, the map of Toronto is created.  As a starting point, the Downtown area will serve as the center of the map.

In [129]:
start_lat = toronto_data.at[2,'Latitude']
start_long = toronto_data.at[2, 'Longitude']

map_toronto = folium.Map(location=[start_lat, start_long], zoom_start=11)

# add the markers for each postcode
for lat, lng, borough, neighborhood in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Borough'], toronto_data['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

With the map ready, it's now time to call the Foursquare API.

In [130]:
# The code was removed by Watson Studio for sharing.

The resulting json is parsed for the venues by each neighborhood group.

In [131]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
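
The function above indexes `['groups'][0]` and `['categories'][0]` directly, which raises an error if either list is empty. As a minimal sketch, the same flattening can be done defensively on an already-decoded response. `extract_venues` is a hypothetical helper, the payload mirrors only the keys used above, and the second venue is invented to show the empty-category case.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore-style response into per-venue tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None when a venue carries no category.
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows


sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Brookbanks Park",
               "location": {"lat": 43.751976, "lng": -79.33214},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Unlabelled Venue",  # hypothetical venue without a category
               "location": {"lat": 43.7520, "lng": -79.3330},
               "categories": []}},
]}]}}

venue_rows = extract_venues(sample_payload, "Parkwoods", 43.753259, -79.329656)
print(venue_rows)
```

An empty or malformed payload returns an empty list rather than stopping the whole loop over neighborhoods.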

In [132]:
toronto_venues = getNearbyVenues(names=toronto_data['Neighbourhood'],
                                   latitudes=toronto_data['Latitude'],
                                   longitudes=toronto_data['Longitude']
                                  )

Parkwoods
Victoria Village
Harbourfront, Regent Park
Lawrence Heights, Lawrence Manor
Islington Avenue
Rouge, Malvern
Don Mills North
Woodbine Gardens, Parkview Hill
Ryerson, Garden District
Glencairn
Cloverdale, Islington, Martin Grove, Princess Gardens, West Deane Park
Highland Creek, Rouge Hill, Port Union
Flemingdon Park, Don Mills South
Woodbine Heights
St. James Town
Humewood-Cedarvale
Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Downsview North, Wilson Heights
Thorncliffe Park
Adelaide, King, Richmond
Dovercourt Village, Dufferin
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto
Harbourfront East, Toronto Islands, Union Station
Little Portugal, Trinity
East Birchmount Park, Ionview, Kennedy Park
Bayview Village
CFB Toronto, Downsview East
The Danforth West,

In [133]:
venue_count = toronto_venues.shape[0]

int(venue_count)
toronto_venues.head()

print(venue_count)

2208


Our resulting data contains up to 120 venues found within 500 meters of the center of each of the 102 postcode areas.  This could mean that neighbourhoods in larger postcodes have no representation within the data.  Without neighbourhood-specific longitude and latitude data, this cannot be explored or confirmed.  The opposite of this "scarcity" issue may exist as well: areas with a higher density of both population and venues are likely to be covered by "smaller" postcodes, so a venue may lie within 500 meters of the geographic center of more than one postcode.  This can be evaluated by checking for duplicate venues within the data. 

In [134]:
toronto_venues.duplicated().value_counts()

False    2208
dtype: int64

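The value counts return 'False' for every row, so no fully identical rows exist in the data fetched from the source.  Note that `duplicated()` compares entire rows, including the neighborhood columns, so this check would not flag a venue that is listed under two different neighborhoods.  In an attempt to force duplicate venues to occur, the data was fetched again with LIMIT increased from 120 to 250 and the radius increased from 500 to 800.  The result consisted of the same number of venues.  This would suggest that all venues available from the source for these locations have been captured, though a cap on the API side cannot be ruled out (the busiest area in the counts below returns exactly 100 venues).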
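
A minimal sketch of a duplicate check restricted to the venue columns, so that a venue shared by two postcode areas is detected even though its neighborhood columns differ. The sample frame and its venue names are invented for illustration.

```python
import pandas as pd

# Hypothetical sample: 'Corner Cafe' falls within range of two postcode centers.
venues = pd.DataFrame({
    "Neighborhood": ["Area A", "Area B", "Area B"],
    "Venue": ["Corner Cafe", "Corner Cafe", "Side Street Park"],
    "Venue Latitude": [43.6500, 43.6500, 43.6512],
    "Venue Longitude": [-79.3800, -79.3800, -79.3795],
})

# Whole-row comparison: the differing 'Neighborhood' values hide the overlap.
full_row_dupes = int(venues.duplicated().sum())

# Comparison on the venue columns only; keep=False marks every copy.
venue_cols = ["Venue", "Venue Latitude", "Venue Longitude"]
shared = venues[venues.duplicated(subset=venue_cols, keep=False)]

print(full_row_dupes)
print(shared)
```

Applied to `toronto_venues`, an empty `shared` frame would support the conclusion that no venue is counted for more than one postcode.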

In [135]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide, King, Richmond",100,100,100,100,100,100
Agincourt,4,4,4,4,4,4
"Agincourt North, L'Amoreaux East, Milliken, Steeles East",2,2,2,2,2,2
"Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown",11,11,11,11,11,11
"Alderwood, Long Branch",9,9,9,9,9,9
"Bathurst Manor, Downsview North, Wilson Heights",18,18,18,18,18,18
Bayview Village,4,4,4,4,4,4
"Bedford Park, Lawrence Manor East",22,22,22,22,22,22
Berczy Park,57,57,57,57,57,57
"Birch Cliff, Cliffside West",4,4,4,4,4,4


Seeing as we are healthy eaters, we are not interested in fast food restaurants.  These will be removed so as to not impact the clustering results.

In [136]:
# Keep every row whose category is not 'Fast Food Restaurant', then reset the index.
toronto_venues = toronto_venues[toronto_venues['Venue Category'] != 'Fast Food Restaurant'].reset_index(drop=True)

toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
4,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant


In [137]:
print(toronto_venues.shape)

(2171, 7)


In [138]:
# one hot encoding
# the line below considers the venue categories in the new dataframe.  The neighborhoods are dropped and will need to be added back in.
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood']
# print(toronto_onehot.columns[0])

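# The assignment above places the neighborhood column at the end of the frame, or overwrites an existing column of the same name in place.  The loop below rotates the last column to the front until 'Neighborhood' comes first; the 'Afghan Restaurant' check guards against looping forever.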
while toronto_onehot.columns[0] != 'Neighborhood':
    fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
    toronto_onehot = toronto_onehot[fixed_columns]
    if toronto_onehot.columns[0] == 'Afghan Restaurant':
        break

toronto_onehot.head(12)

Unnamed: 0,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Park,...,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Motel,Movie Theater,Moving Target,Museum,Music Store,Music Venue
0,Parkwoods,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,"Harbourfront, Regent Park",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,"Harbourfront, Regent Park",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,"Harbourfront, Regent Park",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
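
As a minimal sketch of the encoding step, the neighborhood column can be placed first with `DataFrame.insert`, which also raises a `ValueError` if a venue category already uses the name 'Neighborhood' instead of overwriting it. The sample rows are taken from the venue table above; `onehot` and `grouped` are illustrative names.

```python
import pandas as pd

venues = pd.DataFrame({
    "Neighborhood": ["Parkwoods", "Parkwoods", "Victoria Village",
                     "Victoria Village", "Victoria Village"],
    "Venue Category": ["Park", "Food & Drink Shop", "Hockey Arena",
                       "Coffee Shop", "Portuguese Restaurant"],
})

# One indicator column per category; dtype=int gives 0/1 instead of booleans.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)

# Put the neighborhood names in the first position.
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Summing the indicators gives the venue count per category and neighborhood.
grouped = onehot.groupby("Neighborhood").sum().reset_index()
print(grouped)
```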


In [139]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').sum().reset_index(drop=False)

print(toronto_grouped.shape)

fun_hoods = toronto_grouped.shape[0]
print(fun_hoods)

(98, 274)
98


Before ranking which venue types occur most often in each neighborhood, any neighborhood with fewer than 8 venues will be dropped.  For instance, if we want to cluster neighborhoods by the 5 most frequently occurring venue types but a neighborhood has only 3 venues, its 4th and 5th ranked venue types will have zero venues of that type.

In [140]:
for i in range(fun_hoods):
    if toronto_grouped.agg(np.sum, axis=1)[i] <8:
        toronto_grouped.drop([i], inplace=True)
toronto_grouped.reset_index(inplace=True, drop=True)

toronto_grouped.head()

IndexError: index out of bounds
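
The cell above stops with an `IndexError`, and the later tables still list 98 neighborhoods, so the filter does not appear to have taken effect. Dropping rows inside a loop over a fixed range is fragile because the frame shrinks while the loop runs. A minimal sketch of a vectorized alternative follows; the sample frame, its area names, and `MIN_VENUES` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical counts per neighborhood, in the same shape as 'toronto_grouped'.
grouped = pd.DataFrame({
    "Neighborhood": ["Area A", "Area B", "Area C"],
    "Coffee Shop": [5, 1, 0],
    "Park": [4, 2, 9],
})

MIN_VENUES = 8

# Total venues per row, computed on the numeric columns only.
totals = grouped.drop(columns="Neighborhood").sum(axis=1)

# Keep the rows that meet the threshold and renumber the index.
filtered = grouped[totals >= MIN_VENUES].reset_index(drop=True)
print(filtered)
```

Applying this to the real data would change the row counts and cluster assignments reported in the sections that follow.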

In [141]:
def return_most_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [142]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns for number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the third use the 'th' suffix
        columns.append('{}th Most Venue'.format(ind+1))

neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_venues(toronto_grouped.iloc[ind, :], num_top_venues)

print(neighborhoods_venues_sorted.shape)
neighborhoods_venues_sorted.head()

(98, 6)


Unnamed: 0,Neighborhood,1st Most Venue,2nd Most Venue,3rd Most Venue,4th Most Venue,5th Most Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Bar,Steakhouse,Thai Restaurant
1,Agincourt,Clothing Store,Lounge,Skating Rink,Breakfast Spot,Music Venue
2,"Agincourt North, L'Amoreaux East, Milliken, St...",Park,Playground,Accessories Store,Airport,Airport Food Court
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",Grocery Store,Beer Store,Fried Chicken Joint,Discount Store,Sandwich Place
4,"Alderwood, Long Branch",Pizza Place,Pool,Pharmacy,Gym,Coffee Shop
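
Some rows above end in categories such as 'Accessories Store' and 'Airport', which look like zero-count categories filling the remaining ranks. As a hedged sketch, a ranking helper can ignore categories with no venues and pad the result with `None`. `top_categories` is a hypothetical variant of `return_most_venues`, and the counts are invented.

```python
import pandas as pd


def top_categories(counts, n):
    """Return the n most frequent categories, ignoring those with zero venues."""
    present = counts[counts > 0].sort_values(ascending=False, kind="mergesort")
    top = list(present.index[:n])
    # Pad with None so every neighborhood yields exactly n entries.
    return top + [None] * (n - len(top))


# Hypothetical counts for one neighborhood (category columns only).
row = pd.Series({"Accessories Store": 0, "Airport": 0, "Coffee Shop": 2, "Park": 3})
ranked = top_categories(row, 3)
print(ranked)
```

With the notebook's frame, the helper would be called on `toronto_grouped.iloc[ind, 1:]` so that the neighborhood name is excluded from the comparison.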


### 4. Clustering of Neighborhoods

In [143]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each of first 10 rows
kmeans.labels_[0:10]

array([0, 1, 1, 1, 1, 1, 1, 1, 2, 1], dtype=int32)
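
The clustering above runs on raw venue counts, so areas with many venues can dominate the distances. One variation a reader might try, which is not what this notebook does, is to convert each row to category shares before fitting. The sketch below uses invented counts and area names.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical counts: two coffee-dominated areas and two park-dominated areas.
counts = pd.DataFrame(
    {"Coffee Shop": [10, 1, 0, 0], "Park": [0, 0, 8, 2]},
    index=["Area A", "Area B", "Area C", "Area D"],
)

# Divide each row by its total so that every row sums to 1.
shares = counts.div(counts.sum(axis=1), axis=0)

# n_init is set explicitly; label numbers are arbitrary, only the grouping matters.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(shares)
labels = dict(zip(counts.index, model.labels_))
print(labels)
```

Here the areas group by venue mix rather than by venue volume, since 'Area A' and 'Area B' have identical shares despite very different totals.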

In [144]:
# add clusters labels to the dataframe
neighborhoods_venues_sorted.insert(0, 'Cluster Label', kmeans.labels_)

toronto_merged = toronto_data

# merge toronto_grouped with toronto_data to add latitude/longitude
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), how='right', on='Neighbourhood')

toronto_merged.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Label,1st Most Venue,2nd Most Venue,3rd Most Venue,4th Most Venue,5th Most Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,1,Park,Food & Drink Shop,Accessories Store,Airport,Airport Food Court
1,M4A,North York,Victoria Village,43.725882,-79.315572,1,French Restaurant,Coffee Shop,Pizza Place,Hockey Arena,Portuguese Restaurant
2,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636,2,Coffee Shop,Park,Bakery,Café,Theater
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763,1,Miscellaneous Shop,Boutique,Furniture / Home Store,Event Space,Vietnamese Restaurant
6,M3B,North York,Don Mills North,43.745906,-79.352188,1,Café,Baseball Field,Japanese Restaurant,Caribbean Restaurant,Gym / Fitness Center


In [145]:
toronto_merged.tail()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Label,1st Most Venue,2nd Most Venue,3rd Most Venue,4th Most Venue,5th Most Venue
97,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944,1,Park,River,American Restaurant,Airport,Airport Food Court
98,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316,2,Coffee Shop,Japanese Restaurant,Gay Bar,Sushi Restaurant,Restaurant
99,M7Y,East Toronto,Business Reply Mail Processing Centre 969 Eastern,43.662744,-79.321558,1,Light Rail Station,Garden,Burrito Place,Brewery,Comic Shop
100,M8Y,Etobicoke,"Humber Bay, King's Mill Park, Kingsway Park So...",43.636258,-79.498509,1,Baseball Field,Pool,Asian Restaurant,Arts & Crafts Store,Art Museum
101,M8Z,Etobicoke,"Kingsway Park South West, Mimico NW, The Queen...",43.628841,-79.520999,1,Flower Shop,Supplement Shop,Bakery,Hardware Store,Wings Joint


In [146]:
toronto_merged.shape

(98, 11)

In [147]:
map_clusters = folium.Map(location=[start_lat, start_long], zoom_start=11)

# set color scheme for clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Label']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
    [lat, lon],
    radius=5,
    popup=label,
    color=rainbow[cluster-1],
    fill=True,
    fill_color=rainbow[cluster-1],
    fill_opacity=0.7).add_to(map_clusters)


In [148]:
map_clusters

Each of the 5 clusters can be evaluated in closer detail to illustrate the differences between them.

In [149]:
toronto_merged.loc[toronto_merged['Cluster Label'] == 0, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1])) ]]

Unnamed: 0,Neighbourhood,Cluster Label,1st Most Venue,2nd Most Venue,3rd Most Venue,4th Most Venue,5th Most Venue
14,St. James Town,0,Coffee Shop,Hotel,Café,Restaurant,Italian Restaurant
23,Central Bay Street,0,Coffee Shop,Italian Restaurant,Café,Ice Cream Shop,Burger Joint
29,"Adelaide, King, Richmond",0,Coffee Shop,Café,Bar,Steakhouse,Thai Restaurant
35,"Harbourfront East, Toronto Islands, Union Station",0,Coffee Shop,Hotel,Aquarium,Café,Italian Restaurant
41,"Design Exchange, Toronto Dominion Centre",0,Coffee Shop,Café,Hotel,Italian Restaurant,Restaurant
47,"Commerce Court, Victoria Hotel",0,Coffee Shop,Hotel,Café,American Restaurant,Restaurant
91,Stn A PO Boxes 25 The Esplanade,0,Coffee Shop,Café,Restaurant,Cocktail Bar,Hotel
96,"First Canadian Place, Underground city",0,Coffee Shop,Café,Steakhouse,Hotel,Restaurant


In [150]:
toronto_merged.loc[toronto_merged['Cluster Label'] == 1, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1])) ]]

Unnamed: 0,Neighbourhood,Cluster Label,1st Most Venue,2nd Most Venue,3rd Most Venue,4th Most Venue,5th Most Venue
0,Parkwoods,1,Park,Food & Drink Shop,Accessories Store,Airport,Airport Food Court
1,Victoria Village,1,French Restaurant,Coffee Shop,Pizza Place,Hockey Arena,Portuguese Restaurant
3,"Lawrence Heights, Lawrence Manor",1,Miscellaneous Shop,Boutique,Furniture / Home Store,Event Space,Vietnamese Restaurant
6,Don Mills North,1,Café,Baseball Field,Japanese Restaurant,Caribbean Restaurant,Gym / Fitness Center
7,"Woodbine Gardens, Parkview Hill",1,Pizza Place,Gym / Fitness Center,Breakfast Spot,Intersection,Bank
9,Glencairn,1,Sushi Restaurant,Japanese Restaurant,Pub,Pizza Place,Park
10,"Cloverdale, Islington, Martin Grove, Princess ...",1,Bank,Music Venue,Airport Terminal,Afghan Restaurant,Airport
11,"Highland Creek, Rouge Hill, Port Union",1,Bar,Music Venue,Airport Terminal,Afghan Restaurant,Airport
12,"Flemingdon Park, Don Mills South",1,Gym,Coffee Shop,Asian Restaurant,Beer Store,Chinese Restaurant
13,Woodbine Heights,1,Asian Restaurant,Park,Athletics & Sports,Cosmetics Shop,Pharmacy


In [151]:
toronto_merged.loc[toronto_merged['Cluster Label'] == 2, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1])) ]]

Unnamed: 0,Neighbourhood,Cluster Label,1st Most Venue,2nd Most Venue,3rd Most Venue,4th Most Venue,5th Most Venue
2,"Harbourfront, Regent Park",2,Coffee Shop,Park,Bakery,Café,Theater
19,Berczy Park,2,Coffee Shop,Cocktail Bar,Seafood Restaurant,Bakery,Café
22,Leaside,2,Coffee Shop,Sporting Goods Shop,Burger Joint,Grocery Store,Furniture / Home Store
36,"Little Portugal, Trinity",2,Bar,Coffee Shop,Asian Restaurant,Boutique,Bakery
40,"The Danforth West, Riverdale",2,Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Furniture / Home Store
53,Studio District,2,Café,Coffee Shop,American Restaurant,Gastropub,Bakery
58,Willowdale South,2,Coffee Shop,Ramen Restaurant,Café,Japanese Restaurant,Restaurant
73,"The Annex, North Midtown, Yorkville",2,Coffee Shop,Café,Sandwich Place,Pizza Place,Burger Joint
78,Davisville,2,Pizza Place,Dessert Shop,Sandwich Place,Sushi Restaurant,Thai Restaurant
79,"Harbord, University of Toronto",2,Café,Bar,Bookstore,Bakery,Japanese Restaurant


In [152]:
toronto_merged.loc[toronto_merged['Cluster Label'] == 3, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1])) ]]

Unnamed: 0,Neighbourhood,Cluster Label,1st Most Venue,2nd Most Venue,3rd Most Venue,4th Most Venue,5th Most Venue
83,"Chinatown, Grange Park, Kensington Market",3,Café,Vegetarian / Vegan Restaurant,Chinese Restaurant,Vietnamese Restaurant,Dumpling Restaurant


In [153]:
toronto_merged.loc[toronto_merged['Cluster Label'] == 4, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1])) ]]

Unnamed: 0,Neighbourhood,Cluster Label,1st Most Venue,2nd Most Venue,3rd Most Venue,4th Most Venue,5th Most Venue
8,"Ryerson, Garden District",4,Coffee Shop,Clothing Store,Cosmetics Shop,Café,Bookstore
32,"Fairview, Henry Farm, Oriole",4,Clothing Store,Coffee Shop,Toy / Game Store,Japanese Restaurant,Bus Station
