# Clustering Neighborhoods in Toronto

This document corresponds to the assignment for Week 3 of the IBM Data Science Professional Certificate Capstone Course.
## Table of Contents  
* [Problem 1 (Generate DataFrame from Wikipedia)](#one)
* [Problem 2 (Determine Coordinates for Each Neighborhood)](#two)
* [Problem 3 (Cluster Neighborhoods Based on Venues)](#three)

In [1]:
import requests
import lxml.html as lh
import pandas as pd
import numpy as np
import folium
import seaborn as sns
from sklearn.cluster import KMeans
import json
from pandas import json_normalize  # top-level import; the pandas.io.json path is deprecated since pandas 1.0
import matplotlib.cm as cm
import matplotlib.colors as colors

We use the requests and lxml libraries to parse the HTML table from Wikipedia. We first fetch the page and load it into an lxml document, which lets us select the table rows by their HTML tag. The table has three columns, so we read the column names from the first row and then collect the data from the remaining rows. Finally, we convert the result to a dictionary that pandas can load.

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
page = requests.get(url)
doc = lh.fromstring(page.content)

tr_elements = doc.xpath('//tr')
col = []
# The first row holds the column names; start an empty list for each one
for t in tr_elements[0]:
    name = t.text_content()
    name = name.replace('\n', '')
    col.append((name, []))

# Remaining rows hold the data; stop at the first row that is not 3 cells wide
for j in range(1, len(tr_elements)):
    row = tr_elements[j]
    if len(row) != 3:
        break
    i = 0

    for t in row.iterchildren():
        data = t.text_content()
        col[i][1].append(data)
        i += 1

neighbor_dict = {title: column for title, column in col}

Next, we load the dictionary into a DataFrame and apply several transformations to make the data workable. First, we strip the newline characters. We then replace 'Not assigned' with NaN so that missing values can be handled uniformly. Series.combine_first fills each missing Neighborhood with its Borough name. Finally, we drop any rows that still contain NaN. We also replace the 'Neighbourhood' column with one named 'Neighborhood', the American spelling used in the rest of this notebook.
<a name="one"> </a>
## First Dataframe

In [3]:
nbhd_df = pd.DataFrame(neighbor_dict)
# Clean up newlines and switch to the American spelling (eliminates future key errors)
nbhd_df['Neighborhood'] = nbhd_df['Neighbourhood'].apply(lambda x: x.replace('\n', ''))
nbhd_df = nbhd_df.drop(['Neighbourhood'], axis=1)
# Treat 'Not assigned' as missing
nbhd_df = nbhd_df.replace('Not assigned', np.nan)
# Fall back to the borough name when the neighborhood is missing
nbhd_df.Neighborhood = nbhd_df.Neighborhood.combine_first(nbhd_df.Borough)
# Drop rows that still have missing values
nbhd_df = nbhd_df.dropna(axis=0, how='any')
nbhd_df = nbhd_df.reset_index(drop=True)
nbhd_df.head(12)

Unnamed: 0,Postcode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights
5,M6A,North York,Lawrence Manor
6,M7A,Queen's Park,Queen's Park
7,M9A,Etobicoke,Islington Avenue
8,M1B,Scarborough,Rouge
9,M1B,Scarborough,Malvern


In [4]:
nbhd_df.shape

(211, 3)
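
The cleaning steps can be tried on a toy table. This is only a sketch: the three rows below are made up to mimic the scraped data, and `str.strip` stands in for the `replace('\n', '')` call used in the notebook.

```python
import numpy as np
import pandas as pd

# Made-up rows that mimic the scraped table (not real data).
toy = pd.DataFrame({
    'Postcode': ['M1A', 'M7A', 'M3A'],
    'Borough': ['Not assigned', "Queen's Park", 'North York'],
    'Neighbourhood': ['Not assigned\n', 'Not assigned\n', 'Parkwoods\n'],
})

# Strip the trailing newline and switch to the American spelling.
toy['Neighborhood'] = toy['Neighbourhood'].str.strip()
toy = toy.drop(columns='Neighbourhood')

# Mark missing values, fall back to the borough, then drop rows with no borough.
toy = toy.replace('Not assigned', np.nan)
toy['Neighborhood'] = toy['Neighborhood'].combine_first(toy['Borough'])
toy = toy.dropna(how='any').reset_index(drop=True)

print(toy)
```

The row with no borough is dropped, and the row with a borough but no neighborhood ends up with the borough name in both columns, which matches the Queen's Park row in the output above.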

In [5]:
# from geopy.geocoders import GoogleV3
# from geopy.extra.rate_limiter import RateLimiter

Get the coordinates of each neighborhood from the Google geocoding API (through geopy), for which I already had a key. The code is commented out but left as an example. The results were written to Neighborhoods.csv and are loaded from that file below.

In [6]:
# geolocator = GoogleV3(api_key='', domain='maps.google.ca')
# geo = RateLimiter(geolocator.geocode, min_delay_seconds=.1, max_retries=5)
# locations = nbhd_df[['Neighborhood', 'Borough']].apply(lambda x: x[0] if x[0] == x[1] else x[0] + ' ' + x[1], axis=1)
# locations = locations.apply(lambda x: x + ' Toronto' if not 'toronto' in x.lower() else x)

In [7]:
# nbhd_df['loc'] = locations.apply(geo)
# nbhd_df['latitude'] = nbhd_df['loc'].apply(lambda x: x.latitude if x else None)
# nbhd_df['longitude'] = nbhd_df['loc'].apply(lambda x: x.longitude if x else None)
# nbhd_df.to_csv('Neighborhoods.csv')
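
The search strings passed to the geocoder can be checked without an API key. The sketch below restates the two commented-out lambdas as one helper; `build_query` is a hypothetical name that does not appear in the original code, and the rows are taken from the table shown earlier.

```python
import pandas as pd

def build_query(neighborhood, borough):
    """Combine neighborhood and borough into one geocoder search string."""
    # Avoid repeating the name when neighborhood and borough are identical.
    query = neighborhood if neighborhood == borough else neighborhood + ' ' + borough
    # Append the city unless it is already mentioned.
    if 'toronto' not in query.lower():
        query += ' Toronto'
    return query

toy = pd.DataFrame({
    'Neighborhood': ['Parkwoods', "Queen's Park", 'Harbourfront'],
    'Borough': ['North York', "Queen's Park", 'Downtown Toronto'],
})
queries = [build_query(n, b) for n, b in zip(toy['Neighborhood'], toy['Borough'])]
print(queries)
```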

From here on, we load the data set from the CSV file saved in the notebook's folder.

In [9]:
nbhd_df = pd.read_csv('Neighborhoods.csv')
nbhd_df = nbhd_df.set_index(['Unnamed: 0'])
nbhd_df = nbhd_df.reset_index(drop=True)
nbhd_df = nbhd_df.drop('loc', axis=1)
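
The `set_index` and `reset_index` pair above removes the 'Unnamed: 0' column that `to_csv` creates when it writes the index. A possible shortcut, sketched here on an in-memory stand-in for the file, is to pass `index_col=0` to `read_csv`. The `loc` values are placeholders, since the real file contents are not shown.

```python
import io
import pandas as pd

# Stand-in for Neighborhoods.csv: to_csv wrote the index as an unnamed first column.
csv_text = (
    ",Postcode,Borough,Neighborhood,loc,latitude,longitude\n"
    "0,M3A,North York,Parkwoods,placeholder,43.755361,-79.32684\n"
    "1,M4A,North York,Victoria Village,placeholder,43.735735,-79.312418\n"
)

# index_col=0 reads the saved index back as the index, so no 'Unnamed: 0' column appears.
df = pd.read_csv(io.StringIO(csv_text), index_col=0).drop(columns='loc')
print(df.columns.tolist())
```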

<a name="two"> </a>
## Second Dataframe

In [10]:
nbhd_df.head(12)

Unnamed: 0,Postcode,Borough,Neighborhood,latitude,longitude
0,M3A,North York,Parkwoods,43.755361,-79.32684
1,M4A,North York,Victoria Village,43.735735,-79.312418
2,M5A,Downtown Toronto,Harbourfront,43.640552,-79.378937
3,M5A,Downtown Toronto,Regent Park,43.660323,-79.362044
4,M6A,North York,Lawrence Heights,43.722774,-79.450928
5,M6A,North York,Lawrence Manor,43.728011,-79.439446
6,M7A,Queen's Park,Queen's Park,43.664366,-79.392328
7,M9A,Etobicoke,Islington Avenue,43.682467,-79.540162
8,M1B,Scarborough,Rouge,43.804929,-79.165842
9,M1B,Scarborough,Malvern,43.80916,-79.22169


I assign colors to neighborhoods by borough for visualization. I generate a 'Paired' color palette, which alternates light and dark shades, so that each marker can be outlined in the darker shade and filled with its lighter partner. I then create a dictionary that holds the feature group and color pair for each borough so the colors stay consistent.

In [11]:
boroughs = nbhd_df['Borough'].unique()
palette = sns.color_palette('Paired', 2 * len(boroughs))
p1 = palette.as_hex()[::2]
p2 = palette.as_hex()[1::2]
layer_names = {}
for name, c1, c2 in zip(boroughs, p1, p2):
    layer_names[name] = (folium.map.FeatureGroup(name=name), (c1, c2))
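
The slicing that splits the palette into light and dark halves can be seen on a short list. In this sketch the hex codes are typed in by hand as the first six 'Paired' colors (an assumption made to avoid importing seaborn), and only three boroughs are used.

```python
# Hand-typed hex codes in light, dark, light, dark order (illustrative assumption).
paired = ['#a6cee3', '#1f78b4', '#b2df8a', '#33a02c', '#fb9a99', '#e31a1c']
boroughs = ['North York', 'Downtown Toronto', 'Scarborough']

light = paired[::2]   # every even position: the lighter shade of each pair
dark = paired[1::2]   # every odd position: the darker shade of each pair

# Map each borough to its (light, dark) pair.
borough_colors = {name: (c1, c2) for name, c1, c2 in zip(boroughs, light, dark)}
print(borough_colors['North York'])
```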

Create markers for each neighborhood with popups. Add each marker to a borough FeatureGroup so they can be toggled in layers.

In [12]:
map_toronto = folium.Map(location=[43.761539, -79.411079], zoom_start=10)
nbhd_df = nbhd_df.dropna(axis=0, how='any')
nbhd_df.loc[nbhd_df['Neighborhood'] == 'Dufferin', 'latitude'] = 43.6557992
nbhd_df.loc[nbhd_df['Neighborhood'] == 'Dufferin', 'longitude'] = -79.43668
for lat, lng, label, bor in zip(nbhd_df['latitude'], nbhd_df['longitude'], nbhd_df['Neighborhood'], nbhd_df['Borough']):
    label = folium.Popup(label + ', ' + bor, parse_html=True)
    fgroup = layer_names[bor][0]
    c1 = layer_names[bor][1][1]
    c2 = layer_names[bor][1][0]
    fgroup.add_child(folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=c1,
        fill=True,
        fill_color=c2,
        fill_opacity=.7,
        parse_html=False
        ))
for x in layer_names.values():
    map_toronto.add_child(x[0])
map_toronto.add_child(folium.map.LayerControl())
map_toronto

The markers are roughly color coded by borough, with popups added. This set of locations is my last revision of the map. Dufferin was the last misbehaving point, so its coordinates were set manually.

### Exploring Neighborhoods

In [13]:
# CLIENT_ID = ''
# CLIENT_SECRET = ''
# VERSION = 20190417
# LIMIT = 100

Code to pull data from Foursquare, commented out in favor of locally stored data to reduce the number of API requests.

In [14]:
# def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
#     venues_list=[]
#     for name, lat, lng in zip(names, latitudes, longitudes):
            
#         # create the API request URL
#         url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
#             CLIENT_ID, 
#             CLIENT_SECRET, 
#             VERSION, 
#             lat, 
#             lng, 
#             radius, 
#             LIMIT)
            
#         # make the GET request
#         results = requests.get(url).json()["response"]['groups'][0]['items']
        
#         # return only relevant information for each nearby venue
#         venues_list.append([(
#             name, 
#             lat, 
#             lng, 
#             v['venue']['name'], 
#             v['venue']['location']['lat'], 
#             v['venue']['location']['lng'],  
#             v['venue']['categories'][0]['name']) for v in results])

#     nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
#     nearby_venues.columns = ['Neighborhood', 
#                   'Neighborhood Latitude', 
#                   'Neighborhood Longitude', 
#                   'Venue', 
#                   'Venue Latitude', 
#                   'Venue Longitude', 
#                   'Venue Category']
    
#     return(nearby_venues)
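
As an alternative to formatting the request URL by hand, the query string can be assembled with `urllib.parse.urlencode`. This is only a sketch: `build_explore_url` is a hypothetical helper, the credentials are placeholders, and the endpoint and parameter names are copied from the commented-out code above. No request is sent.

```python
from urllib.parse import urlencode

def build_explore_url(lat, lng, client_id, client_secret, version, radius=500, limit=100):
    """Return the explore URL for one location (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # urlencode escapes special characters, such as the comma in 'll'.
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

url = build_explore_url(43.735735, -79.312418, 'YOUR_ID', 'YOUR_SECRET', '20190417')
print(url)
```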

Generate list of venues

In [15]:
#venues = getNearbyVenues(nbhd_df['Neighborhood'], nbhd_df['latitude'], nbhd_df['longitude'])
venues = pd.read_csv('venues.csv')
venues = venues.set_index(['Unnamed: 0'])
venues = venues.reset_index(drop=True)
venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Victoria Village,43.735735,-79.312418,Jatujak,43.736208,-79.307668,Thai Restaurant
1,Victoria Village,43.735735,-79.312418,house of contractors,43.736574,-79.311743,Outdoor Supply Store
2,Victoria Village,43.735735,-79.312418,Artistic Nails,43.73612,-79.30808,Spa
3,Victoria Village,43.735735,-79.312418,The Prince - Shisha Lounge,43.736603,-79.307812,Middle Eastern Restaurant
4,Victoria Village,43.735735,-79.312418,Tagpuan,43.735943,-79.307651,Asian Restaurant


In [16]:
print(venues.shape)
print('There are {} unique categories.'.format(len(venues['Venue Category'].unique())))

(5357, 7)
There are 328 unique categories.


One-hot encode the venue categories

In [17]:
tor_onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='')
tor_onehot['Name'] = venues['Neighborhood']

fixed_columns = [tor_onehot.columns[-1]] + list(tor_onehot.columns[:-1])
tor_onehot = tor_onehot[fixed_columns]

tor_onehot.head()

Unnamed: 0,Name,Accessories Store,Adult Boutique,African Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo
0,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Compute the frequency of each category per neighborhood

In [18]:
tor_grp = tor_onehot.groupby('Name').mean().reset_index()
tor_grp.head()

Unnamed: 0,Name,Accessories Store,Adult Boutique,African Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo
0,Adelaide,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0
1,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Agincourt North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Albion Gardens,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Alderwood,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [19]:
tor_grp.shape

(205, 329)
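
The one-hot and group-by steps are easier to follow on a tiny made-up table. In the sketch below, two fictional neighborhoods 'A' and 'B' stand in for the real data, and `dtype=int` is passed so the dummy columns are 0/1 integers regardless of pandas version.

```python
import pandas as pd

# Made-up venues: four in neighborhood A, two in neighborhood B.
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Park', 'Cafe', 'Cafe', 'Cafe', 'Park', 'Zoo'],
})

# One column per category, one row per venue.
onehot = pd.get_dummies(toy['Venue Category'], dtype=int)
onehot.insert(0, 'Name', toy['Neighborhood'])

# The mean of a 0/1 column is the share of venues in that category.
freq = onehot.groupby('Name').mean().reset_index()
print(freq)
```

Each row of `freq` sums to 1, so neighborhoods with many venues and neighborhoods with few are compared on the same scale.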

In [20]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Build a table of the ten most common venue categories in each neighborhood, which we use later to describe the clusters.

In [21]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = tor_grp['Name']

for ind in np.arange(tor_grp.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(tor_grp.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adelaide,Coffee Shop,Hotel,Restaurant,Thai Restaurant,Sushi Restaurant,Pizza Place,Bar,Sporting Goods Shop,Theater,French Restaurant
1,Agincourt,Food Court,Coffee Shop,Food & Drink Shop,Falafel Restaurant,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Space
2,Agincourt North,Park,Zoo,Exhibit,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Space
3,Albion Gardens,Bank,Pharmacy,Zoo,Falafel Restaurant,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Space
4,Alderwood,Pizza Place,Pool,Gym,Pub,Pharmacy,Coffee Shop,Dance Studio,Donut Shop,Skating Rink,Convenience Store
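
Several rows above end with the same run of categories (for example 'Egyptian Restaurant', 'Electronics Store', 'Empanada Restaurant'). One possible reading, which is an inference rather than something checked in this notebook, is that neighborhoods with fewer than ten distinct categories have their remaining slots filled by zero-frequency categories. The sketch below shows a variant of `return_most_common_venues` that keeps only categories that actually occur. `top_categories` is a hypothetical name and the frequencies are made up.

```python
import pandas as pd

def top_categories(row, n):
    """Return up to n category names with non-zero frequency, most common first."""
    present = row[row > 0].sort_values(ascending=False)
    return list(present.index[:n])

# Made-up frequency row for one neighborhood.
row = pd.Series({'Park': 0.5, 'Zoo': 0.0, 'Cafe': 0.3, 'Gym': 0.2, 'Bank': 0.0})
print(top_categories(row, 5))
```

With this variant a neighborhood can return fewer than `n` names, so the caller would need to pad the result before writing it into a fixed-width table.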


Cluster the neighborhoods

In [22]:
k = 5
tor_grp_clustering = tor_grp.drop('Name', axis=1)

kmeans = KMeans(n_clusters=k, random_state=0).fit(tor_grp_clustering)

kmeans.labels_[0:10]

array([0, 0, 3, 0, 0, 2, 0, 0, 0, 0])
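
To see what the clustering step does, here is a minimal sketch on four made-up neighborhoods described by two category frequencies. The data and column meanings are assumptions for illustration. `n_init` is set explicitly because its default differs between scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Rows are made-up neighborhoods; columns are frequencies of (Cafe, Park).
X = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = model.labels_
print(labels)
```

The first two rows land in one cluster and the last two in the other. The label numbers themselves carry no meaning, so cluster 0 in one run is not guaranteed to be the same group in another configuration.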

In [23]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

tor_merged = nbhd_df

tor_merged = tor_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

In [24]:
tor_merged.head(10)

Unnamed: 0,Postcode,Borough,Neighborhood,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.755361,-79.32684,,,,,,,,,,,
1,M4A,North York,Victoria Village,43.735735,-79.312418,0.0,Bus Line,Middle Eastern Restaurant,Wings Joint,Spa,Asian Restaurant,Chinese Restaurant,Outdoor Supply Store,Thai Restaurant,Ethiopian Restaurant,Eastern European Restaurant
2,M5A,Downtown Toronto,Harbourfront,43.640552,-79.378937,0.0,Coffee Shop,Boat or Ferry,Café,Pizza Place,Restaurant,Hotel,Sports Bar,Park,Fried Chicken Joint,Bakery
3,M5A,Downtown Toronto,Regent Park,43.660323,-79.362044,0.0,Coffee Shop,Thai Restaurant,Pub,Sushi Restaurant,Auto Dealership,Restaurant,Electronics Store,Grocery Store,Beer Store,Food Truck
4,M6A,North York,Lawrence Heights,43.722774,-79.450928,0.0,Clothing Store,Coffee Shop,American Restaurant,Accessories Store,Fast Food Restaurant,Men's Store,Toy / Game Store,Electronics Store,Sporting Goods Shop,Cosmetics Shop
5,M6A,North York,Lawrence Manor,43.728011,-79.439446,2.0,Playground,Gym / Fitness Center,Park,Skating Rink,Zoo,Ethiopian Restaurant,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Electronics Store
6,M7A,Queen's Park,Queen's Park,43.664366,-79.392328,0.0,Coffee Shop,Sushi Restaurant,Café,Park,Sandwich Place,Theater,Gym / Fitness Center,Chinese Restaurant,Office,Museum
7,M9A,Etobicoke,Islington Avenue,43.682467,-79.540162,0.0,Sandwich Place,Park,Smoothie Shop,Coffee Shop,Baseball Field,Zoo,Event Space,Eastern European Restaurant,Egyptian Restaurant,Electronics Store
8,M1B,Scarborough,Rouge,43.804929,-79.165842,2.0,Park,Fast Food Restaurant,Zoo,Exhibit,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant
9,M1B,Scarborough,Malvern,43.80916,-79.22169,0.0,Pizza Place,Pharmacy,Park,Sandwich Place,Grocery Store,Fast Food Restaurant,Bubble Tea Shop,Zoo,Dumpling Restaurant,Eastern European Restaurant
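
The Parkwoods row has no cluster label, and the labels appear as floats such as 0.0. Both follow from how a left join treats unmatched keys, as this small sketch with made-up frames shows.

```python
import pandas as pd

left = pd.DataFrame({'Neighborhood': ['Parkwoods', 'Victoria Village']})
right = pd.DataFrame({'Neighborhood': ['Victoria Village'], 'Cluster Labels': [0]})

# Rows of `left` with no match in `right` get NaN, which forces a float column.
merged = left.join(right.set_index('Neighborhood'), on='Neighborhood')
print(merged)

# Dropping the unmatched rows allows the labels to be cast back to integers.
clean = merged.dropna().astype({'Cluster Labels': int})
print(clean)
```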


<a name="three"> </a>
## Cluster Map
Build the map and color the markers by cluster.

In [25]:
cl_map_toronto = folium.Map(location=[43.761539, -79.411079], zoom_start=10)
tor_merged = tor_merged.dropna(axis=0, how='any')

# One color per cluster, sampled evenly from the rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, k))
rainbow = [colors.rgb2hex(c) for c in colors_array]

for lat, lng, poi, cluster in zip(tor_merged['latitude'], tor_merged['longitude'], tor_merged['Neighborhood'], tor_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(int(cluster)), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(cl_map_toronto)
cl_map_toronto.add_child(folium.map.LayerControl())
cl_map_toronto

In [26]:
tor_merged.loc[tor_merged['Cluster Labels'] == 0, tor_merged.columns[[2] + list(range(5, tor_merged.shape[1]))]].head(10)

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Victoria Village,0.0,Bus Line,Middle Eastern Restaurant,Wings Joint,Spa,Asian Restaurant,Chinese Restaurant,Outdoor Supply Store,Thai Restaurant,Ethiopian Restaurant,Eastern European Restaurant
2,Harbourfront,0.0,Coffee Shop,Boat or Ferry,Café,Pizza Place,Restaurant,Hotel,Sports Bar,Park,Fried Chicken Joint,Bakery
3,Regent Park,0.0,Coffee Shop,Thai Restaurant,Pub,Sushi Restaurant,Auto Dealership,Restaurant,Electronics Store,Grocery Store,Beer Store,Food Truck
4,Lawrence Heights,0.0,Clothing Store,Coffee Shop,American Restaurant,Accessories Store,Fast Food Restaurant,Men's Store,Toy / Game Store,Electronics Store,Sporting Goods Shop,Cosmetics Shop
6,Queen's Park,0.0,Coffee Shop,Sushi Restaurant,Café,Park,Sandwich Place,Theater,Gym / Fitness Center,Chinese Restaurant,Office,Museum
7,Islington Avenue,0.0,Sandwich Place,Park,Smoothie Shop,Coffee Shop,Baseball Field,Zoo,Event Space,Eastern European Restaurant,Egyptian Restaurant,Electronics Store
9,Malvern,0.0,Pizza Place,Pharmacy,Park,Sandwich Place,Grocery Store,Fast Food Restaurant,Bubble Tea Shop,Zoo,Dumpling Restaurant,Eastern European Restaurant
10,Don Mills North,0.0,Japanese Restaurant,Caribbean Restaurant,Café,Exhibit,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Space
11,Woodbine Gardens,0.0,Chinese Restaurant,Arts & Crafts Store,Pet Store,Burger Joint,Zoo,Falafel Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant
12,Parkview Hill,0.0,Pizza Place,Bank,Fast Food Restaurant,Rock Climbing Spot,Athletics & Sports,Intersection,Café,Gastropub,Pet Store,Pharmacy


In [27]:
tor_merged.loc[tor_merged['Cluster Labels'] == 1, tor_merged.columns[[2] + list(range(5, tor_merged.shape[1]))]].head(10)

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
33,Guildwood,1.0,Baseball Field,Train Station,Zoo,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Space,Exhibit
46,Downsview North,1.0,Baseball Field,Health & Beauty Service,Zoo,Falafel Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Space,Exhibit
71,Downsview East,1.0,Baseball Field,Health & Beauty Service,Zoo,Falafel Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Space,Exhibit
89,Downsview,1.0,Baseball Field,Health & Beauty Service,Zoo,Falafel Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Space,Exhibit
98,Downsview Central,1.0,Baseball Field,Health & Beauty Service,Zoo,Falafel Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Space,Exhibit
111,Downsview Northwest,1.0,Baseball Field,Health & Beauty Service,Zoo,Falafel Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Space,Exhibit


In [28]:
tor_merged.loc[tor_merged['Cluster Labels'] == 2, tor_merged.columns[[2] + list(range(5, tor_merged.shape[1]))]].head(10)

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Lawrence Manor,2.0,Playground,Gym / Fitness Center,Park,Skating Rink,Zoo,Ethiopian Restaurant,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Electronics Store
8,Rouge,2.0,Park,Fast Food Restaurant,Zoo,Exhibit,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant
20,West Deane Park,2.0,Scenic Lookout,Tennis Court,Park,Sushi Restaurant,Event Space,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant
28,Humewood-Cedarvale,2.0,Park,Hockey Arena,Field,Trail,Zoo,Event Space,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Electronics Store
34,Morningside,2.0,Park,Convenience Store,Tennis Court,Sandwich Place,Beer Store,Fast Food Restaurant,Supermarket,Coffee Shop,Discount Store,Zoo
39,Woburn,2.0,Coffee Shop,Korean Restaurant,Business Service,Park,Dumpling Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Space
45,Bathurst Manor,2.0,Convenience Store,Playground,Park,Baseball Field,Exhibit,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant
54,Scarborough Village,2.0,Gym,Shopping Mall,Park,Coffee Shop,Zoo,Event Space,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant
56,Henry Farm,2.0,Park,Restaurant,Zoo,Event Space,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant
66,East Birchmount Park,2.0,Park,Skating Rink,Gym,College Stadium,Gym Pool,General Entertainment,Zoo,Eastern European Restaurant,Egyptian Restaurant,Electronics Store


In [29]:
tor_merged.loc[tor_merged['Cluster Labels'] == 3, tor_merged.columns[[2] + list(range(5, tor_merged.shape[1]))]].head(10)

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,Port Union,3.0,Park,Zoo,Exhibit,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Space
137,Kingsview Village,3.0,Park,Zoo,Exhibit,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Space
146,Swansea,3.0,Park,Pilates Studio,Dance Studio,Zoo,Falafel Restaurant,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant
155,Agincourt North,3.0,Park,Zoo,Exhibit,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Space


In [30]:
tor_merged.loc[tor_merged['Cluster Labels'] == 4, tor_merged.columns[[2] + list(range(5, tor_merged.shape[1]))]].head(10)

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
169,Railway Lands,4.0,Art Gallery,Zoo,Falafel Restaurant,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Space,Exhibit


Cluster 0: primarily restaurants and cafes  
Cluster 1: Downsview and Guildwood; baseball fields and zoo  
Cluster 2: parks, fields, gyms, etc.  
Cluster 3: park, zoo, exhibit  
Cluster 4: a single member (Railway Lands). Should it be grouped with Cluster 3?  
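
A single-member cluster is easier to spot with a count of members per cluster. The sketch below uses only the first ten labels printed earlier, so the counts are illustrative. Applying the same call to the full `kmeans.labels_` array would give the real cluster sizes.

```python
import numpy as np

# First ten labels shown by kmeans.labels_[0:10]; the full array has one entry per neighborhood.
sample_labels = np.array([0, 0, 3, 0, 0, 2, 0, 0, 0, 0])

# minlength=5 keeps a slot for every cluster, including ones absent from the sample.
sizes = np.bincount(sample_labels, minlength=5)
for cluster, size in enumerate(sizes):
    print('Cluster {}: {} neighborhoods'.format(cluster, size))
```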