# Segmenting and Clustering Neighborhoods in Toronto

## TOC:
* [1. Getting neighbourhood data for Toronto](#first-question)
* [2a. Get geographical data for each neighbourhood](#second-question)
* [2b. Cluster analysis of neighbourhoods using Foursquare](#third-question)

## 1. Getting neighbourhood data for Toronto<a class="anchor" id="first-question"></a>
We will scrape data from Wikipedia in order to get the list of neighbourhoods, their boroughs and postal codes.


In [1]:
import pandas as pd

Read in the data table from Wikipedia.

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
postalcodes=pd.read_html(url,match='Postcode',header=0)[0]
postalcodes=pd.DataFrame(postalcodes)
postalcodes.columns = ['PostalCode','Borough','Neighbourhood']

In [3]:
postalcodes.head(10)

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Not assigned
9,M8A,Not assigned,Not assigned


Drop all rows whose borough is not assigned, and build a table of postal codes and boroughs with no duplicates.

In [4]:
postalcodes = postalcodes[postalcodes['Borough']!='Not assigned']

In [5]:
boroughs = postalcodes[['PostalCode','Borough']].drop_duplicates()

In [6]:
boroughs.head(5)

Unnamed: 0,PostalCode,Borough
2,M3A,North York
3,M4A,North York
4,M5A,Downtown Toronto
6,M6A,North York
8,M7A,Queen's Park


Group neighbourhoods that share the same postal code into a single comma-separated entry.

In [7]:
neighbourhoods = pd.DataFrame(postalcodes[['PostalCode','Neighbourhood']].groupby('PostalCode')['Neighbourhood'].apply(lambda x : ', '.join(x)))

In [8]:
neighbourhoods.reset_index(inplace=True)
neighbourhoods.head(10)

Unnamed: 0,PostalCode,Neighbourhood
0,M1B,"Rouge, Malvern"
1,M1C,"Highland Creek, Rouge Hill, Port Union"
2,M1E,"Guildwood, Morningside, West Hill"
3,M1G,Woburn
4,M1H,Cedarbrae
5,M1J,Scarborough Village
6,M1K,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,"Clairlea, Golden Mile, Oakridge"
8,M1M,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,"Birch Cliff, Cliffside West"
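The grouping step can also be written as a single chain. The following is a minimal sketch on toy data (the sample rows are illustrative and are not the scraped table); it uses `agg` followed by `reset_index`, which should give the same result as the `apply` and `reset_index` cells above.

```python
import pandas as pd

# Toy rows mimicking the scraped table (illustrative values only)
sample = pd.DataFrame({
    'PostalCode': ['M5A', 'M5A', 'M6A', 'M3A'],
    'Neighbourhood': ['Harbourfront', 'Regent Park',
                      'Lawrence Heights', 'Parkwoods'],
})

# One row per postal code; neighbourhood names joined with ', '
grouped = (sample
           .groupby('PostalCode')['Neighbourhood']
           .agg(lambda names: ', '.join(names))
           .reset_index())

print(grouped)
```

Note that `groupby` sorts by postal code, so the row order differs from the order in which the postal codes first appear in the scraped table.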


Merge our neighbourhoods and boroughs tables into one.

In [9]:
postalcodes = boroughs.merge(neighbourhoods,on='PostalCode',how='outer')

In [10]:
postalcodes.head(10)

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Harbourfront, Regent Park"
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Queen's Park,Not assigned
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Rouge, Malvern"
7,M3B,North York,Don Mills North
8,M4B,East York,"Woodbine Gardens, Parkview Hill"
9,M5B,Downtown Toronto,"Ryerson, Garden District"


Where a neighbourhood is "Not assigned", use the borough name as the neighbourhood name instead.

In [11]:
# use .loc with a boolean mask to avoid chained assignment
unassigned = postalcodes['Neighbourhood'] == "Not assigned"
postalcodes.loc[unassigned, 'Neighbourhood'] = postalcodes.loc[unassigned, 'Borough']

In [12]:
postalcodes.head(10)

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Harbourfront, Regent Park"
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Queen's Park,Queen's Park
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Rouge, Malvern"
7,M3B,North York,Don Mills North
8,M4B,East York,"Woodbine Gardens, Parkview Hill"
9,M5B,Downtown Toronto,"Ryerson, Garden District"


Print the number of columns and rows of the dataframe.

In [13]:
print("The dataframe 'postalcodes' has {0} columns and {1} rows."
      .format(postalcodes.shape[1],postalcodes.shape[0]))

The dataframe 'postalcodes' has 3 columns and 103 rows.


## 2a. Get geographical data for each neighbourhood<a class="anchor" id="second-question"></a>

The code for finding the latitudes and longitudes with geocoder is below, but I didn't use it because it took too long to get a response.

In [14]:
# import geocoder 
# from tqdm import tqdm

# latitudes = []
# longitudes = []
# for postal_code in tqdm(postalcodes['PostalCode']):
#     lat_lng_coords = None  # reset for every postal code, otherwise the lookup runs only once
#     while(lat_lng_coords is None):
#         g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
#         lat_lng_coords = g.latlng
#     latitudes += [lat_lng_coords[0]]
#     longitudes += [lat_lng_coords[1]]
#
# # Assuming that geocoder returns latitudes and longitudes as floats in units of degrees:
# postalcodes['Latitude'] = latitudes
# postalcodes['Longitude'] = longitudes
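The `while` loop above retries forever when the service never answers, which is one way a lookup can appear to hang. A bounded retry is one possible alternative. The sketch below is an assumption rather than part of the original analysis: `geocode_with_retry` and `fake_lookup` are hypothetical names, and `lookup` stands in for any callable that returns `[lat, lng]` or `None`.

```python
import time

def geocode_with_retry(lookup, query, max_attempts=5, delay_seconds=1.0):
    """Call lookup(query) until it returns coordinates or attempts run out.

    `lookup` is a hypothetical callable returning [lat, lng] or None.
    Returns None if every attempt fails.
    """
    for attempt in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(delay_seconds)  # wait before trying again
    return None

# Stand-in lookup that fails twice before succeeding (illustration only)
calls = {'n': 0}

def fake_lookup(query):
    calls['n'] += 1
    return [43.65, -79.38] if calls['n'] >= 3 else None

result = geocode_with_retry(fake_lookup, 'M5A, Toronto, Ontario', delay_seconds=0)
print(result)
```

Postal codes for which the function returns `None` could then be collected and filled in from the CSV file.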

Instead, read the coordinates from the csv file, and then merge the two dataframes.

In [15]:
coordinates = pd.read_csv('Geospatial_Coordinates.csv')

In [16]:
coordinates.columns = ['PostalCode','Latitude','Longitude']

In [17]:
postalcodes = postalcodes.merge(coordinates,on='PostalCode',how='left')

In [18]:
postalcodes.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494
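Because this is a left merge, a postal code that is missing from the CSV file would silently end up with `NaN` coordinates. The following sketch, on toy data with made-up gaps, shows one way to check for that using the `indicator` option of `merge`.

```python
import pandas as pd

# Toy tables: 'M5A' is deliberately missing from the coordinates
codes = pd.DataFrame({'PostalCode': ['M3A', 'M4A', 'M5A']})
coords = pd.DataFrame({
    'PostalCode': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# indicator=True adds a '_merge' column telling where each row came from
merged = codes.merge(coords, on='PostalCode', how='left', indicator=True)

# Rows marked 'left_only' had no matching coordinates
missing = merged.loc[merged['_merge'] == 'left_only', 'PostalCode'].tolist()
print(missing)
```

An empty list would mean that every postal code received coordinates.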


## 2b. Cluster analysis of neighbourhoods using Foursquare<a class="anchor" id="third-question"></a>

In [19]:
import requests
import numpy as np
from sklearn.cluster import KMeans
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

Use the functions from the New York analysis. Modify getNearbyVenues to skip a neighbourhood if no Foursquare data is found for it.

In [20]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    
def getNearbyVenues(neighbourhood, latitudes, longitudes, radius=500):

    venues_list=[]
    for neighbourhood, lat, lng in zip(
        neighbourhood, latitudes, longitudes):
            
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        try:
            results = requests.get(url).json()["response"]['groups'][0]['items']
        except Exception:
            # report the neighbourhood being processed and move on to the next one
            print('No data found for neighbourhood {0}.'.format(neighbourhood))
        else:
            venues_list.append([(
                neighbourhood, 
                v['venue']['name'], 
                v['venue']['location']['lat'], 
                v['venue']['location']['lng'],  
                v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood',
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
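The request URL in `getNearbyVenues` is assembled with string formatting. As a sketch of an alternative, the query string can be built from a dictionary with the standard library, which also takes care of escaping. The endpoint and parameter names are the ones used above; `build_explore_url` is a hypothetical helper, and the default `limit` of 100 is an assumption.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Build the venues/explore URL from named parameters (hypothetical helper)."""
    base = 'https://api.foursquare.com/v2/venues/explore'
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude
        'radius': radius,
        'limit': limit,
    }
    # urlencode escapes special characters (the comma becomes %2C)
    return base + '?' + urlencode(params)

url = build_explore_url('ID', 'SECRET', '20180605', 43.65, -79.38)
print(url)
```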

Define Foursquare credentials

In [104]:
CLIENT_ID = '#####' # your Foursquare ID
CLIENT_SECRET = '#####' # your Foursquare Secret
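VERSION = '20180605' # Foursquare API version
LIMIT = 100 # maximum venues per request (assumed value; getNearbyVenues uses LIMIT but it is not defined elsewhere in this notebook)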

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: #####
CLIENT_SECRET:#####
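To keep the credentials out of the notebook altogether, they could be read from environment variables instead. This is a sketch under assumptions: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `mask` are hypothetical and not part of the original analysis.

```python
import os

# Hypothetical variable names; set them in the shell before starting the notebook
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '')

def mask(value, visible=4):
    """Show only the first few characters of a secret."""
    return value[:visible] + '*' * max(len(value) - visible, 0)

# Print masked values so the secrets never appear in saved output
print('CLIENT_ID: ' + mask(CLIENT_ID))
print('CLIENT_SECRET: ' + mask(CLIENT_SECRET))
```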


Run the code from the New York analysis on all Toronto neighbourhoods. We will use the neighbourhood as the unique identifier for each region in Toronto, since the boroughs are too coarse and a single neighbourhood can span several postal codes. However, this means some neighbourhoods are already grouped together because they share a postal code.

In [23]:
toronto_venues=getNearbyVenues(
    postalcodes['Neighbourhood'],postalcodes['Latitude'],postalcodes['Longitude'])

Check the venues dataframe that we've created.

In [24]:
print(toronto_venues.shape)
toronto_venues.head()

(2251, 5)


Unnamed: 0,Neighbourhood,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,KFC,43.754387,-79.333021,Fast Food Restaurant
2,Parkwoods,Variety Store,43.751974,-79.333114,Food & Drink Shop
3,Victoria Village,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,Victoria Village,Portugril,43.725819,-79.312785,Portuguese Restaurant


Check the number of venues for each neighbourhood.

In [25]:
toronto_venues.groupby('Neighbourhood').count()

Neighbourhood,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide, King, Richmond",100,100,100,100
Agincourt,4,4,4,4
"Agincourt North, L'Amoreaux East, Milliken, Steeles East",2,2,2,2
"Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown",10,10,10,10
"Alderwood, Long Branch",10,10,10,10
"Bathurst Manor, Downsview North, Wilson Heights",17,17,17,17
Bayview Village,4,4,4,4
"Bedford Park, Lawrence Manor East",25,25,25,25
Berczy Park,54,54,54,54
"Birch Cliff, Cliffside West",4,4,4,4


Look at the number of unique Toronto venue categories.

In [26]:
print('There are {} unique venue categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 276 unique venue categories.


Prepare the data for cluster analysis.  
First, change the category data to one-hot encoding.

In [27]:
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")
toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood'] 
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Now, group the data by neighbourhood and find the frequency of the different venue categories in each neighbourhood.

In [28]:
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,"Adelaide, King, Richmond",0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.0,0.010000,0.000000,0.000000,0.000000,0.0000,0.010000,0.000000,0.010000,0.000000
1,Agincourt,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.0000,0.000000,0.000000,0.000000,0.000000
2,"Agincourt North, L'Amoreaux East, Milliken, St...",0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.0000,0.000000,0.000000,0.000000,0.000000
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.0000,0.000000,0.000000,0.000000,0.000000
4,"Alderwood, Long Branch",0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.0000,0.000000,0.000000,0.000000,0.000000
5,"Bathurst Manor, Downsview North, Wilson Heights",0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.0,0.000000,0.000000,0.058824,0.000000,0.0000,0.000000,0.000000,0.000000,0.000000
6,Bayview Village,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.0000,0.000000,0.000000,0.000000,0.000000
7,"Bedford Park, Lawrence Manor East",0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.0000,0.000000,0.000000,0.000000,0.000000
8,Berczy Park,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.0000,0.000000,0.000000,0.000000,0.000000
9,"Birch Cliff, Cliffside West",0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.0000,0.000000,0.000000,0.000000,0.000000
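To see why the mean of the one-hot columns gives a frequency, here is a small sketch on toy data (the neighbourhood names and venues are made up). Each row is one venue, so averaging a 0/1 column within a neighbourhood gives the share of that neighbourhood's venues that fall in the category. A row-normalised `pd.crosstab` is shown as an equivalent route.

```python
import pandas as pd

# Toy venue list: neighbourhood 'A' has 4 venues, 'B' has 2
venues = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Park', 'Café', 'Café', 'Bank', 'Park', 'Park'],
})

# One column per category, 1 where the venue belongs to it
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighbourhood', venues['Neighbourhood'])

# Mean of 0/1 values = fraction of venues in each category
freq = onehot.groupby('Neighbourhood').mean()

# Equivalent: contingency table normalised so each row sums to 1
freq_crosstab = pd.crosstab(venues['Neighbourhood'],
                            venues['Venue Category'],
                            normalize='index')

print(freq)
```

Each row of `freq` sums to 1, so neighbourhoods with very different numbers of venues remain comparable in the clustering step.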


Look at the top 5 venues in each Neighbourhood.  We will do this for the first 5, since there are too many neighbourhoods to print this out for all of them.

In [29]:
num_top_venues = 5

for neighbourhood in toronto_grouped['Neighbourhood'][0:5]:
    print("----{0}----".format(neighbourhood))
    temp = toronto_grouped[toronto_grouped['Neighbourhood'] == neighbourhood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
                 venue  freq
0          Coffee Shop  0.06
1                 Café  0.05
2      Thai Restaurant  0.04
3  American Restaurant  0.04
4           Steakhouse  0.04


----Agincourt----
                        venue  freq
0              Sandwich Place  0.25
1              Breakfast Spot  0.25
2                      Lounge  0.25
3                Skating Rink  0.25
4  Modern European Restaurant  0.00


----Agincourt North, L'Amoreaux East, Milliken, Steeles East----
               venue  freq
0         Playground   0.5
1               Park   0.5
2  Mobile Phone Shop   0.0
3      Moving Target   0.0
4      Movie Theater   0.0


----Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown----
                  venue  freq
0         Grocery Store   0.2
1              Pharmacy   0.1
2           Pizza Place   0.1
3  Fast Food Restaurant   0.1
4           Coffee Shop   0.1


----Alderwood, Long Branch

Save the top 5 venues for each neighbourhood.

In [80]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :],
                                                                          num_top_venues)
neighbourhoods_venues_sorted

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Steakhouse,Thai Restaurant,American Restaurant
1,Agincourt,Sandwich Place,Lounge,Skating Rink,Breakfast Spot,Yoga Studio
2,"Agincourt North, L'Amoreaux East, Milliken, St...",Playground,Park,Yoga Studio,Eastern European Restaurant,Discount Store
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",Grocery Store,Pharmacy,Coffee Shop,Beer Store,Liquor Store
4,"Alderwood, Long Branch",Pizza Place,Coffee Shop,Pharmacy,Gym,Dance Studio
5,"Bathurst Manor, Downsview North, Wilson Heights",Coffee Shop,Frozen Yogurt Shop,Fast Food Restaurant,Fried Chicken Joint,Bank
6,Bayview Village,Japanese Restaurant,Café,Bank,Chinese Restaurant,Dumpling Restaurant
7,"Bedford Park, Lawrence Manor East",Coffee Shop,Italian Restaurant,Sushi Restaurant,Juice Bar,Fast Food Restaurant
8,Berczy Park,Coffee Shop,Cocktail Bar,Restaurant,Bakery,Cheese Shop
9,"Birch Cliff, Cliffside West",College Stadium,General Entertainment,Skating Rink,Café,Concert Hall
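The column names above rely on a `try`/`except` around a three-element list, which works for five venues but would produce labels such as "21th" for larger values. A small helper is one way to make the suffix explicit. This is only a sketch, and `ordinal` is a hypothetical name.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 5
columns = ['Neighbourhood'] + [
    '{} Most Common Venue'.format(ordinal(i))
    for i in range(1, num_top_venues + 1)
]
print(columns)
```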


Perform k-means clustering on our data.

In [93]:
# set number of clusters
kclusters = 20

# First, remove columns we don't want to use in the clustering
toronto_grouped_clustering = toronto_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=1, n_init=20).fit(toronto_grouped_clustering)
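The number of clusters is fixed at 20 above. One common way to compare candidate values is the silhouette score. The sketch below is not part of the original analysis: it runs on synthetic data with three well-separated groups, purely to illustrate the procedure, and the same loop could be tried on `toronto_grouped_clustering`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight groups far apart (illustration only)
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centres])

# Fit k-means for several k and record the mean silhouette score
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=1, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# A higher silhouette score means tighter, better separated clusters
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

On real venue data the scores are likely to be much less clear-cut, so this is a guide rather than a rule.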

Merge all of our datasets together to create one with neighbourhood, latitude, longitude, cluster label, and most common venues.

In [94]:
toronto_merged = pd.DataFrame(toronto_grouped['Neighbourhood'])
toronto_merged = toronto_merged.merge(postalcodes[['Neighbourhood','Latitude','Longitude']],on='Neighbourhood',how='left')

# add clustering labels
toronto_merged['Cluster Labels'] = kmeans.labels_

# join the most common venues for each neighbourhood onto the coordinates and cluster labels
toronto_merged = toronto_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

toronto_merged.head() 

Unnamed: 0,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,"Adelaide, King, Richmond",43.650571,-79.384568,15,Coffee Shop,Café,Steakhouse,Thai Restaurant,American Restaurant
1,Agincourt,43.7942,-79.262029,15,Sandwich Place,Lounge,Skating Rink,Breakfast Spot,Yoga Studio
2,"Agincourt North, L'Amoreaux East, Milliken, St...",43.815252,-79.284577,4,Playground,Park,Yoga Studio,Eastern European Restaurant,Discount Store
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",43.739416,-79.588437,1,Grocery Store,Pharmacy,Coffee Shop,Beer Store,Liquor Store
4,"Alderwood, Long Branch",43.602414,-79.543484,1,Pizza Place,Coffee Shop,Pharmacy,Gym,Dance Studio


Let's look at the distribution of our clusters on the map.

In [105]:
# create map
longitude  = -79.3832
latitude = 43.700
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], 
                                  toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
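The colour scheme used for the markers can be inspected on its own. The sketch below rebuilds the palette with the same matplotlib calls and stores it in a dictionary keyed by cluster label; `label_to_colour` is a hypothetical name, and the dictionary is only one possible way to look up a colour.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 20

# One evenly spaced colour from the rainbow colormap per cluster, as hex strings
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# Map each cluster label (0 .. kclusters-1) to its colour
label_to_colour = dict(enumerate(palette))

print(label_to_colour[0], label_to_colour[kclusters - 1])
```

With 20 clusters, neighbouring colours are close to each other, so the popup label is probably the more reliable way to tell clusters apart on the map.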

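Now, let's look at the first 8 of our clusters. Note that the column selection used in these cells shows each neighbourhood's latitude and its 2nd to 5th most common venues; the neighbourhood name and the 1st most common venue are not included.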

In [96]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, 
               toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
36,43.728496,Korean Restaurant,Food Truck,Electronics Store,Doner Restaurant


In [97]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, 
               toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
3,43.739416,Pharmacy,Coffee Shop,Beer Store,Liquor Store
4,43.602414,Coffee Shop,Pharmacy,Gym,Dance Studio
5,43.754328,Frozen Yogurt Shop,Fast Food Restaurant,Fried Chicken Joint,Bank
10,43.643515,Convenience Store,Beer Store,Liquor Store,Café
24,43.781638,Fast Food Restaurant,Thai Restaurant,Noodle House,Chinese Restaurant
60,43.799525,Chinese Restaurant,Pharmacy,Japanese Restaurant,Grocery Store
91,43.696319,Chinese Restaurant,Intersection,Coffee Shop,Sandwich Place
94,43.782736,Coffee Shop,Butcher,Grocery Store,Pharmacy
96,43.706397,Pizza Place,Pet Store,Café,Bank


In [98]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, 
               toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
23,43.711112,Bakery,Soccer Field,Fast Food Restaurant,Park
57,43.693781,Field,Hockey Arena,Park,Eastern European Restaurant
62,43.72802,Bus Line,Park,Swim School,Dumpling Restaurant
65,43.713756,Construction & Landscaping,Park,Bakery,Electronics Store


In [99]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, 
               toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
40,43.685347,Convenience Store,Yoga Studio,Eastern European Restaurant,Discount Store
92,43.706876,Convenience Store,Yoga Studio,Eastern European Restaurant,Discount Store


In [100]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, 
               toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
2,43.815252,Park,Yoga Studio,Eastern European Restaurant,Discount Store
13,43.737473,Airport,Playground,Park,Eastern European Restaurant
67,43.689574,Gym,Park,Tennis Court,Yoga Studio
74,43.679563,Playground,Trail,Yoga Studio,Dumpling Restaurant


In [101]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 5, 
               toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
17,43.636966,Hotel,American Restaurant,Gym / Fitness Center,Mediterranean Restaurant
30,43.686412,Pub,Sushi Restaurant,Sports Bar,Fried Chicken Joint
31,43.691116,Restaurant,Coffee Shop,Convenience Store,Check Cashing Service
39,43.727929,Department Store,Coffee Shop,Chinese Restaurant,Train Station
70,43.76798,Miscellaneous Shop,Massage Studio,Coffee Shop,Bar
73,43.662301,Gym,Sushi Restaurant,Japanese Restaurant,Diner
84,43.676357,Pub,Neighborhood,Gym / Fitness Center,Yoga Studio
90,43.725882,Hockey Arena,Intersection,Portuguese Restaurant,Yoga Studio
95,43.770992,Pharmacy,Korean Restaurant,Dumpling Restaurant,Discount Store


In [102]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 6, 
               toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
58,43.688905,Bus Line,Park,Mobile Phone Shop,Drugstore
87,43.673185,Bus Line,Grocery Store,Convenience Store,Eastern European Restaurant


In [103]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 7, 
               toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
52,43.784535,Moving Target,Yoga Studio,Dog Run,Doner Restaurant
