# Applied Data Science Capstone

## Week 3: Peer-graded Assignment: Segmenting and Clustering Neighborhoods in Toronto

### First Question

Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe.

In [1]:
# following this youtube tutorial: https://www.youtube.com/watch?v=ng2o98k983k
from bs4 import BeautifulSoup
import requests

import random # library for random number generation
import numpy as np # library for vectorized computation
import pandas as pd # library to process data as dataframes

import matplotlib.pyplot as plt # plotting library
# backend for rendering plots within the browser
%matplotlib inline 

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

import json # library to handle JSON files

from sklearn.cluster import KMeans 
from sklearn.datasets import make_blobs # samples_generator was removed in newer scikit-learn releases

print('Libraries imported.')

Libraries imported.


In [2]:
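# install first if needed: pip install folium (for example from the Anaconda prompt)
import folium # map rendering library
from pandas import json_normalize # top-level import; pandas.io.json.json_normalize is deprecated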
print('Folium imported.')

Folium imported.


In [3]:
source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

soup = BeautifulSoup(source,'lxml')
#print(soup.prettify())

In [4]:
table = soup.find('table')
#print(table.prettify())

In [5]:
output_rows = []
for table_row in table.find_all('tr'):

    output_row = []
    # headings will be defined in the data frame with "columns=[...]"
    #columns = table_row.find_all('th')
    #for column in columns:
    #    output_row.append(column.text.replace('\n', ' ').strip())
    
    # find_all is the current name; findAll is a deprecated alias
    columns = table_row.find_all('td')
    for column in columns:
        output_row.append(column.text.replace('\n', ' ').strip())
    output_rows.append(output_row)

output = pd.DataFrame(output_rows,columns = ['PostalCode','Borough','Neighborhood'])

# delete all rows where the borough is not assigned
output = output[output.Borough != 'Not assigned'].copy()

# drop the header row: it has no <td> cells, so its fields are null
output = output.dropna(subset=["PostalCode"])

# replace "Not assigned" neighborhoods with their borough name
# (this affects Queen's Park, for example)
not_assigned = output["Neighborhood"] == "Not assigned"
output.loc[not_assigned, "Neighborhood"] = output.loc[not_assigned, "Borough"]

# combine neighborhoods that share a postal code
# (this affects M5A, for example: more than one neighborhood per postal code)
output = output.groupby(['PostalCode','Borough'])['Neighborhood'].apply(','.join).reset_index()

#first 20 records
output.head(20)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park"
7,M1L,Scarborough,"Clairlea,Golden Mile,Oakridge"
8,M1M,Scarborough,"Cliffcrest,Cliffside,Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff,Cliffside West"


In [6]:
output.shape

(103, 3)
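
The cleaning rules used in cell 5 can be checked on a small hand-made frame before running them on the scraped table. The sketch below is illustrative only: the rows in `raw` are made up (they are not taken from the Wikipedia page), and `clean_postal_codes` is a hypothetical helper name, not part of the original notebook.

```python
import pandas as pd

def clean_postal_codes(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the notebook's cleaning rules to a raw postal-code table."""
    # keep only rows with an assigned borough (this also drops all-null rows)
    df = raw[raw["Borough"].notna() & (raw["Borough"] != "Not assigned")].copy()
    # a neighborhood that is "Not assigned" takes the borough name
    missing = df["Neighborhood"] == "Not assigned"
    df.loc[missing, "Neighborhood"] = df.loc[missing, "Borough"]
    # one row per postal code, neighborhoods joined with commas
    return (df.groupby(["PostalCode", "Borough"])["Neighborhood"]
              .apply(",".join)
              .reset_index())

# made-up rows for illustration only
raw = pd.DataFrame(
    [
        [None, None, None],                       # header row parsed as nulls
        ["X1A", "Not assigned", "Not assigned"],  # dropped: no borough
        ["X2B", "Borough A", "Hood 1"],
        ["X2B", "Borough A", "Hood 2"],           # merged with the row above
        ["X3C", "Borough B", "Not assigned"],     # neighborhood falls back to borough
    ],
    columns=["PostalCode", "Borough", "Neighborhood"],
)
cleaned = clean_postal_codes(raw)
print(cleaned)
```

Wrapping the rules in a function makes it easy to assert on the expected row count, which is what `output.shape` is used for above.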

### Second Question 

Add the latitude and longitude coordinates of each neighborhood, using the CSV file at http://cocl.us/Geospatial_data.

In [7]:
filename = "http://cocl.us/Geospatial_data"
headers = ["PostalCode","Latitude","Longitude"]
coordinates = pd.read_csv(filename, names = headers,skiprows=1)
coordinates.head(10)

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


In [8]:
coordinates.shape

(103, 3)

In [9]:
new_output = pd.merge(output, coordinates, on='PostalCode')
new_output.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea,Golden Mile,Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest,Cliffside,Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff,Cliffside West",43.692657,-79.264848


In [10]:
new_output.shape

(103, 5)
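
`pd.merge` defaults to an inner join, so a postal code without coordinates would be dropped silently. Here the shapes match (103 rows before and after), so nothing was lost. As a hedged sketch of how that could be checked explicitly, the example below uses made-up rows and the `how`, `validate` and `indicator` parameters of `pd.merge`; the frame names `left` and `right` are placeholders.

```python
import pandas as pd

# made-up rows for illustration only
left = pd.DataFrame({"PostalCode": ["X1A", "X2B", "X3C"],
                     "Borough": ["A", "A", "B"]})
right = pd.DataFrame({"PostalCode": ["X1A", "X2B"],
                      "Latitude": [43.1, 43.2],
                      "Longitude": [-79.1, -79.2]})

# how="left" keeps every postal code from the left frame;
# validate raises MergeError if a key repeats on either side;
# indicator adds a "_merge" column saying where each row came from
merged = pd.merge(left, right, on="PostalCode", how="left",
                  validate="one_to_one", indicator=True)

# postal codes that found no coordinates
unmatched = merged.loc[merged["_merge"] == "left_only", "PostalCode"].tolist()
print(unmatched)
```

An empty `unmatched` list would confirm that every postal code received coordinates.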

### Third Question

Explore and cluster the neighborhoods in Toronto. 
You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data. It is up to you.

### Exploring Central Toronto

In [11]:
# selecting only "Central Toronto"
subset = new_output[new_output.Borough =='Central Toronto'].reset_index()

In [12]:
# there are 9 postal codes in "Central Toronto"
subset.shape

(9, 6)

In [13]:
subset

Unnamed: 0,index,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,44,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
1,45,M4P,Central Toronto,Davisville North,43.712751,-79.390197
2,46,M4R,Central Toronto,North Toronto West,43.715383,-79.405678
3,47,M4S,Central Toronto,Davisville,43.704324,-79.38879
4,48,M4T,Central Toronto,"Moore Park,Summerhill East",43.689574,-79.38316
5,49,M4V,Central Toronto,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",43.686412,-79.400049
6,63,M5N,Central Toronto,Roselawn,43.711695,-79.416936
7,64,M5P,Central Toronto,"Forest Hill North,Forest Hill West",43.696948,-79.411307
8,65,M5R,Central Toronto,"The Annex,North Midtown,Yorkville",43.67271,-79.405678


In [14]:
#toronto coordinates: https://gps-coordinates.org/toronto-latitude.php
tor_lat = 43.6529
tor_long = -79.3849

# create map
map_toronto = folium.Map(location=[tor_lat, tor_long], zoom_start=11)

# add markers to the map
# label each marker with its neighborhood (the borough is the same for every row of subset)
for lat, lon, poi in zip(subset['Latitude'], subset['Longitude'], subset['Neighborhood']):
    label = folium.Popup(str(poi) , parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto

The map is also available at https://github.com/AnaMariaAnaMaria/Coursera_Capstone/blob/master/map.JPG.

In [15]:
# credentials are read from a local secrets.json, as suggested in https://www.coursera.org/learn/applied-data-science-capstone/discussions/weeks/3/threads/VCjKK35VEemkuBJz3kVAHA
with open('secrets.json') as f: # the with block closes the file after reading
    secrets = json.load(f)
CLIENT_ID = secrets['CLIENT_ID']
CLIENT_SECRET = secrets['CLIENT_SECRET']
VERSION = secrets['VERSION']

print('Credentials loaded')


Credentials loaded
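
A missing key in `secrets.json` only surfaces as a `KeyError` on the line that reads it. The sketch below shows one possible way to fail early with a clearer message. `load_secrets` and `REQUIRED_KEYS` are hypothetical names, and the demo file holds placeholder values, not real credentials.

```python
import json
import os
import tempfile

# the three keys read by the notebook
REQUIRED_KEYS = ("CLIENT_ID", "CLIENT_SECRET", "VERSION")

def load_secrets(path):
    """Read a JSON credentials file and fail early if a key is missing."""
    with open(path) as f:
        secrets = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in secrets]
    if missing:
        raise KeyError("missing keys in {}: {}".format(path, missing))
    return secrets

# demo with a throwaway file holding placeholder values
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "secrets.json")
    with open(demo_path, "w") as f:
        json.dump({"CLIENT_ID": "id", "CLIENT_SECRET": "secret",
                   "VERSION": "YYYYMMDD"}, f)
    demo = load_secrets(demo_path)
print(sorted(demo))
```

Keeping the file out of version control is what makes this pattern safe to share in a public notebook.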


In [16]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError: # flat responses use 'categories', nested ones use 'venue.categories'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [17]:
def getNearbyVenues(names, latitudes, longitudes, radius=500,LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        # (a venue without categories gets None instead of raising IndexError)
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
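
The parsing step inside `getNearbyVenues` can be tried without calling the API. The sketch below is a minimal illustration under stated assumptions: `venues_from_response` is a hypothetical helper, and `fake_payload` is a hand-written dictionary that only mimics the nesting indexed in the function above (`response` → `groups` → `items` → `venue`); it is not real Foursquare output.

```python
def venues_from_response(payload, name, lat, lng):
    """Flatten an explore-style response into rows, tolerating missing pieces."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # a venue with no category is kept, with None as its category
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows

# hand-written payload for illustration only (not real API output)
fake_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Venue A", "location": {"lat": 1.0, "lng": 2.0},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 1.5, "lng": 2.5},
               "categories": []}},
]}]}}

rows = venues_from_response(fake_payload, "Demo", 1.0, 2.0)
empty = venues_from_response({"response": {}}, "Demo", 1.0, 2.0)
print(rows)
```

Separating the parsing from the HTTP request also means a failed or empty response yields an empty list rather than an exception in the middle of the loop.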

In [18]:
toronto_venues = getNearbyVenues(names=subset['Neighborhood'],
                                   latitudes=subset['Latitude'],
                                   longitudes=subset['Longitude']
                                  )


Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park,Summerhill East
Deer Park,Forest Hill SE,Rathnelly,South Hill,Summerhill West
Roselawn
Forest Hill North,Forest Hill West
The Annex,North Midtown,Yorkville


In [19]:
print(toronto_venues.shape)
toronto_venues.head()

(116, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Lawrence Park,43.72802,-79.38879,Lawrence Park Ravine,43.726963,-79.394382,Park
1,Lawrence Park,43.72802,-79.38879,The Photo School – Toronto,43.730429,-79.388767,Photography Studio
2,Lawrence Park,43.72802,-79.38879,Zodiac Swim School,43.728532,-79.38286,Swim School
3,Lawrence Park,43.72802,-79.38879,TTC Bus #162 - Lawrence-Donway,43.728026,-79.382805,Bus Line
4,Davisville North,43.712751,-79.390197,Sherwood Park,43.716551,-79.387776,Park


In [20]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Davisville,34,34,34,34,34,34
Davisville North,10,10,10,10,10,10
"Deer Park,Forest Hill SE,Rathnelly,South Hill,Summerhill West",15,15,15,15,15,15
"Forest Hill North,Forest Hill West",4,4,4,4,4,4
Lawrence Park,4,4,4,4,4,4
"Moore Park,Summerhill East",3,3,3,3,3,3
North Toronto West,20,20,20,20,20,20
Roselawn,3,3,3,3,3,3
"The Annex,North Midtown,Yorkville",23,23,23,23,23,23


In [21]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 62 unique categories.


Analyzing each neighborhood

In [22]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Asian Restaurant,BBQ Joint,Bagel Shop,Breakfast Spot,Brewery,Burger Joint,Bus Line,Café,...,Summer Camp,Supermarket,Sushi Restaurant,Swim School,Thai Restaurant,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Yoga Studio
0,Lawrence Park,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Lawrence Park,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Lawrence Park,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0
3,Lawrence Park,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
4,Davisville North,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [23]:
toronto_onehot.shape

(116, 63)

Grouping rows by neighborhood and taking the mean of the frequency of occurrence of each category

In [24]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,American Restaurant,Asian Restaurant,BBQ Joint,Bagel Shop,Breakfast Spot,Brewery,Burger Joint,Bus Line,Café,...,Summer Camp,Supermarket,Sushi Restaurant,Swim School,Thai Restaurant,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Yoga Studio
0,Davisville,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.058824,...,0.0,0.0,0.058824,0.0,0.029412,0.029412,0.0,0.0,0.0,0.0
1,Davisville North,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",0.066667,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,...,0.0,0.066667,0.066667,0.0,0.0,0.0,0.0,0.0,0.066667,0.0
3,"Forest Hill North,Forest Hill West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0
4,Lawrence Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,...,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0
5,"Moore Park,Summerhill East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.333333,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0
6,North Toronto West,0.0,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05
7,Roselawn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,"The Annex,North Midtown,Yorkville",0.043478,0.0,0.043478,0.0,0.0,0.0,0.043478,0.0,0.130435,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0


In [25]:
toronto_grouped.shape

(9, 63)
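
The mean of a one-hot column within a neighborhood is the share of that neighborhood's venues that fall in the category, which is why each row of `toronto_grouped` sums to 1. A small sketch with made-up venues (the names `N1`, `N2` and the categories are placeholders) shows the two steps and checks them against normalized counts from `pd.crosstab`:

```python
import pandas as pd

# made-up venues for illustration only
venues = pd.DataFrame({
    "Neighborhood": ["N1", "N1", "N1", "N1", "N2", "N2"],
    "Venue Category": ["Park", "Park", "Bakery", "Gym", "Gym", "Gym"],
})

# one indicator column per category, then the neighborhood label in front
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# mean of 0/1 indicators = relative frequency of each category
freq = onehot.groupby("Neighborhood").mean()

# the same table computed directly as row-normalized counts
direct = pd.crosstab(venues["Neighborhood"], venues["Venue Category"], normalize="index")
print(freq)
```

In this toy table, `N1` has two parks out of four venues, so its `Park` frequency is 0.5.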

Top 5 most common venues for each neighborhood

In [26]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Davisville----
                venue  freq
0         Pizza Place  0.12
1      Sandwich Place  0.09
2        Dessert Shop  0.09
3  Italian Restaurant  0.06
4                Café  0.06


----Davisville North----
                  venue  freq
0     Food & Drink Shop   0.1
1                 Hotel   0.1
2         Grocery Store   0.1
3                   Gym   0.1
4  Gym / Fitness Center   0.1


----Deer Park,Forest Hill SE,Rathnelly,South Hill,Summerhill West----
                 venue  freq
0                  Pub  0.13
1          Coffee Shop  0.13
2  American Restaurant  0.07
3          Supermarket  0.07
4         Liquor Store  0.07


----Forest Hill North,Forest Hill West----
              venue  freq
0     Jewelry Store  0.25
1             Trail  0.25
2  Sushi Restaurant  0.25
3              Park  0.25
4        Restaurant  0.00


----Lawrence Park----
                 venue  freq
0             Bus Line  0.25
1          Swim School  0.25
2                 Park  0.25
3   Photography Stu

Function to sort venues in descending order

In [27]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Dataframe with top 5 venues for each neighborhood

In [28]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError: # no entry in indicators beyond the third, so fall back to "th"
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Davisville,Pizza Place,Dessert Shop,Sandwich Place,Italian Restaurant,Sushi Restaurant
1,Davisville North,Hotel,Gym,Sandwich Place,Park,Clothing Store
2,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",Pub,Coffee Shop,Sports Bar,Vietnamese Restaurant,Light Rail Station
3,"Forest Hill North,Forest Hill West",Trail,Jewelry Store,Park,Sushi Restaurant,Yoga Studio
4,Lawrence Park,Bus Line,Photography Studio,Park,Swim School,Yoga Studio


Clustering the neighborhoods

In [29]:
# set number of clusters
kclusters = 4

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood') # positional axis argument was removed in pandas 2.0

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 3, 1, 0, 2, 0])
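
With `kclusters = 4`, six of the nine neighborhoods land in cluster 0 and the other three clusters hold one neighborhood each. One common way to compare choices of k is the silhouette score. The sketch below is an illustration on synthetic data, not a rerun of the notebook: `X` is random, and the resulting scores say nothing about the Toronto data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# synthetic frequency-like rows (9 samples, 6 features), for illustration only
rng = np.random.default_rng(0)
X = rng.random((9, 6))
X = X / X.sum(axis=1, keepdims=True)  # each row sums to 1, like category frequencies

scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # silhouette_score needs at least 2 clusters and fewer clusters than samples
    scores[k] = silhouette_score(X, labels)
    print(k, np.bincount(labels), round(scores[k], 3))

# k with the highest average silhouette
best_k = max(scores, key=scores.get)
```

To apply the same loop to the notebook's data, `X` would be replaced by `toronto_grouped_clustering`. With only nine rows, any such score should be read as a rough guide.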

Dataframe that includes the clusters

In [30]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = subset

# join the cluster labels and top venues onto subset, which holds latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,index,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,44,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,3,Bus Line,Photography Studio,Park,Swim School,Yoga Studio
1,45,M4P,Central Toronto,Davisville North,43.712751,-79.390197,0,Hotel,Gym,Sandwich Place,Park,Clothing Store
2,46,M4R,Central Toronto,North Toronto West,43.715383,-79.405678,0,Clothing Store,Coffee Shop,Sporting Goods Shop,Yoga Studio,Gift Shop
3,47,M4S,Central Toronto,Davisville,43.704324,-79.38879,0,Pizza Place,Dessert Shop,Sandwich Place,Italian Restaurant,Sushi Restaurant
4,48,M4T,Central Toronto,"Moore Park,Summerhill East",43.689574,-79.38316,1,Playground,Summer Camp,Trail,Diner,Farmers Market


Map with Clusters

In [31]:
# create map
map_clusters = folium.Map(location=[tor_lat, tor_long], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

The map is also available at https://github.com/AnaMariaAnaMaria/Coursera_Capstone/blob/master/Clusters.JPG.

# Examining Clusters

### Cluster 1 - Stores, Restaurants

In [32]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,PostalCode,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,M4P,-79.390197,0,Hotel,Gym,Sandwich Place,Park,Clothing Store
2,M4R,-79.405678,0,Clothing Store,Coffee Shop,Sporting Goods Shop,Yoga Studio,Gift Shop
3,M4S,-79.38879,0,Pizza Place,Dessert Shop,Sandwich Place,Italian Restaurant,Sushi Restaurant
5,M4V,-79.400049,0,Pub,Coffee Shop,Sports Bar,Vietnamese Restaurant,Light Rail Station
7,M5P,-79.411307,0,Trail,Jewelry Store,Park,Sushi Restaurant,Yoga Studio
8,M5R,-79.405678,0,Coffee Shop,Café,Sandwich Place,Pizza Place,American Restaurant


### Cluster 2 - The Playground and the Summer Camp

In [33]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,PostalCode,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
4,M4T,-79.38316,1,Playground,Summer Camp,Trail,Diner,Farmers Market


### Cluster 3 - The Home Service

In [34]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,PostalCode,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
6,M5N,-79.416936,2,Home Service,Music Venue,Garden,Yoga Studio,Diner


### Cluster 4 - Bus Line, Park, Swim School and Studios

In [35]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,PostalCode,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M4N,-79.38879,3,Bus Line,Photography Studio,Park,Swim School,Yoga Studio
