# Segmenting and Clustering Neighborhoods in Toronto

Let's import the required libraries.

In [64]:
import pandas as pd
import numpy as np

import requests

from pandas.io.html import read_html

from sklearn.cluster import KMeans

import matplotlib.cm as cm
import matplotlib.colors as colors

from geopy.geocoders import Nominatim

import folium

print('Libraries imported')

Libraries imported


## 1. Download and Prepare the Dataset

The data needed to segment and cluster the neighbourhoods of Toronto is not readily available as a download. However, Wikipedia has a page that lists the postal codes, boroughs and neighbourhoods we need.

The source is: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

We have to scrape the data from this webpage and prepare it for further use.

We will be using pandas to scrape the data. Let's first prepare the given link to the webpage.

In [65]:
page = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

page

'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

Now we will read the tables from this page. To select only the table we want, we pass the dictionary `{'class': 'wikitable'}` to the `attrs` argument of the `read_html` function.

Note: tables on a Wikipedia page have the class `wikitable`.

In [66]:
wikitable = read_html(page, attrs={'class':'wikitable'})

print('Extracted object is of type {}'.format(type(wikitable)))

Extracted object is of type <class 'list'>


`read_html` returns a list of pandas data frames, one per matching table. Let's take the first one as our data frame.

In [67]:
df = pd.DataFrame(wikitable[0])

df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


Let's get the shape of the data frame.

In [68]:
df.shape

(180, 3)

From the data frame we can see that some postal codes are not assigned to any borough. Let's get rid of those rows.

In [69]:
new_df = df.drop(df.index[df['Borough'] == 'Not assigned'], axis = 0)

new_df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
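
The same filtering can also be written as a boolean mask, which some readers find easier to scan than `drop` with an index lookup. The following is a minimal sketch on a small hand-made frame; `sample_df` and `assigned_df` are illustrative names that do not appear in the notebook, and the rows are copied from the tables above.

```python
import pandas as pd

# Toy frame with the same columns as the scraped table (illustrative data only).
sample_df = pd.DataFrame({
    'Postal Code': ['M1A', 'M2A', 'M3A', 'M4A'],
    'Borough': ['Not assigned', 'Not assigned', 'North York', 'North York'],
    'Neighbourhood': ['Not assigned', 'Not assigned', 'Parkwoods', 'Victoria Village'],
})

# Keep only the rows whose borough is assigned, then renumber the index.
assigned_df = sample_df[sample_df['Borough'] != 'Not assigned'].reset_index(drop=True)

print(assigned_df)
```

The mask version leaves `sample_df` untouched and returns a new frame, so the original table stays available for later checks.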


Now, let's check whether any 'Neighbourhood' has the value 'Not assigned'.

In [70]:
x = new_df.index[new_df.loc[:, 'Neighbourhood'] =='Not assigned']
len(x)

0

The length of this index may be 0, as it is here. If it is not, we set the neighbourhood of each such row to the name of its borough.

In [71]:
for i in x:
    new_df.loc[i, 'Neighbourhood'] = new_df.loc[i, 'Borough']
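
The row-by-row loop above can probably be replaced by a single vectorised assignment. The sketch below shows the idea on a toy frame; the postal code `M9Z` and its row are made up for illustration, since the real table has no unassigned neighbourhoods left at this point.

```python
import pandas as pd

# Toy frame; the second row is a made-up case with an unassigned neighbourhood.
sample_df = pd.DataFrame({
    'Postal Code': ['M3A', 'M9Z'],
    'Borough': ['North York', 'Etobicoke'],
    'Neighbourhood': ['Parkwoods', 'Not assigned'],
})

# Boolean mask of the rows that need fixing.
missing = sample_df['Neighbourhood'] == 'Not assigned'

# Copy the borough name into the neighbourhood column for those rows only.
sample_df.loc[missing, 'Neighbourhood'] = sample_df.loc[missing, 'Borough']

print(sample_df)
```

When the mask is all `False`, the assignment is a no-op, so no separate length check is needed.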

Now, let's reset the index of the data frame.

In [72]:
new_df = new_df.reset_index(drop=True)
new_df.head(10)

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


Finally, let's print the shape of the final table.

## Data Frame Shape (1st Problem)

In [73]:
new_df.shape

(103, 3)

Now, let's get the location coordinates for each postal code.

We'll read it from a csv file.

In [74]:
path = 'Geospatial_Coordinates.csv'

location_df = pd.read_csv(path)
location_df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Now, let's merge the two data frames together into a single dataframe

## Data Frame Created (2nd Problem)

In [99]:
toronto_df = new_df.merge(location_df, on='Postal Code')

toronto_df.head(12)

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


Now, check if the data frame is of the right shape.

In [76]:
toronto_df.shape

(103, 5)
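
Checking the shape confirms that no rows were lost, but it does not say which postal codes would have failed to match. As a hedged sketch, `merge` can report this itself through its `validate` and `indicator` parameters. The frames below are toy inputs: the coordinates are copied from the tables above, while the postal code `M0X` and the borough `Nowhere` are made up to force a mismatch.

```python
import pandas as pd

# Toy inputs; 'M0X' is a made-up postal code with no coordinates.
areas = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M0X'],
    'Borough': ['North York', 'North York', 'Nowhere'],
})
coords = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# how='left' keeps every postal code from `areas`;
# validate raises MergeError if a postal code is duplicated on either side;
# indicator adds a '_merge' column that flags rows without a match.
checked = areas.merge(coords, on='Postal Code', how='left',
                      validate='one_to_one', indicator=True)

unmatched = checked.loc[checked['_merge'] == 'left_only', 'Postal Code'].tolist()
print(unmatched)
```

An empty `unmatched` list on the real data would mean every postal code found its coordinates.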

Now, let's check how many boroughs we are left with and which ones they are.

In [77]:
print('The dataframe has {} boroughs.'.format(
        len(toronto_df['Borough'].unique())
    )
)

The dataframe has 10 boroughs.


In [78]:
toronto_boroughs = toronto_df['Borough'].unique().tolist()

toronto_boroughs

['North York',
 'Downtown Toronto',
 'Etobicoke',
 'Scarborough',
 'East York',
 'York',
 'East Toronto',
 'West Toronto',
 'Central Toronto',
 'Mississauga']

__Use geopy library to get the latitude and longitude values of Toronto.__

In [79]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [80]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighbourhood in zip(toronto_df['Latitude'], toronto_df['Longitude'], toronto_df['Borough'], toronto_df['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)
    
map_toronto

Now, we will start using the Foursquare API to explore the neighbourhoods.

Let's define the Foursquare credentials.

In [81]:
LIMIT = 150

CLIENT_ID = 'your-foursquare-client-id'  # placeholder: never publish real credentials
CLIENT_SECRET = 'your-foursquare-client-secret'  # placeholder
VERSION = '20180605'
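
Credentials written directly into a notebook are easy to leak when the notebook is shared. One common alternative is to read them from environment variables. The sketch below assumes the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`; both names and the helper `load_foursquare_credentials` are hypothetical and chosen only for this example.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from environment variables.

    The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET
    are an assumption made for this sketch.
    """
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Missing environment variable: {}'.format(missing)) from None


# Usage with a stand-in mapping; in the notebook it would be called with no arguments.
demo_id, demo_secret = load_foursquare_credentials(
    {'FOURSQUARE_CLIENT_ID': 'demo-id', 'FOURSQUARE_CLIENT_SECRET': 'demo-secret'}
)
```

In the notebook, `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` would then replace the two literal assignments.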

Now, let's reuse the `getNearbyVenues` function from the previous lab to fetch the venues near each neighbourhood.

In [82]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        
        # optional (disabled): keep only the last neighbourhood when several
        # neighbourhoods share a single postal code
        # try:
        #     name = name.split(',')[-1].strip()
        # except:
        #     continue
        
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
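
`getNearbyVenues` indexes straight into the JSON response, so a failed request or a venue without a category would raise a `KeyError` or `IndexError` and stop the whole loop. The following sketch shows one possible defensive parsing step. It assumes the same nesting the function above relies on (`response`, then `groups`, then `items`, then `venue`); the helper name `extract_venues`, the `'Unknown'` fallback and the second venue in the hand-written payload are illustrative and not part of the notebook.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore response into rows of
    (neighbourhood, lat, lng, venue, venue lat, venue lng, category)."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # Fall back to a placeholder when a venue has no category.
        category = categories[0]['name'] if categories else 'Unknown'
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows


# Hand-written payload that mimics the nesting used in getNearbyVenues.
fake_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Brookbanks Park',
               'location': {'lat': 43.751976, 'lng': -79.332140},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Unnamed Spot',
               'location': {'lat': 43.75, 'lng': -79.33},
               'categories': []}},
]}]}}

rows = extract_venues(fake_payload, 'Parkwoods', 43.753259, -79.329656)
empty = extract_venues({'response': {}}, 'Parkwoods', 43.753259, -79.329656)
```

With a helper like this, a neighbourhood whose request returns no groups simply contributes no rows instead of aborting the run.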

Now, let's get the nearby venues for each neighbourhood in different boroughs.

In [83]:
boroughwise_venues={}

for borough in toronto_boroughs:
    
    temp_df = toronto_df[toronto_df['Borough'] == borough] #data frame of the borough
    
    # build the dictionary key, e.g. 'North York' -> 'north_york_venues'
    borough = borough.lower().replace(' ', '_') + '_venues'
        
    boroughwise_venues[borough] = getNearbyVenues(names=temp_df['Neighbourhood'],
                                                  latitudes=temp_df['Latitude'],
                                                  longitudes=temp_df['Longitude']
                                  )
    
    print('{} acquired'.format(borough))

Parkwoods
Victoria Village
Lawrence Manor, Lawrence Heights
Don Mills
Glencairn
Don Mills
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Fairview, Henry Farm, Oriole
Northwood Park, York University
Bayview Village
Downsview
York Mills, Silver Hills
Downsview
North Park, Maple Leaf Park, Upwood Park
Humber Summit
Willowdale, Newtonbrook
Downsview
Bedford Park, Lawrence Manor East
Humberlea, Emery
Willowdale, Willowdale East
Downsview
York Mills West
Willowdale, Willowdale West
north_york_venues acquired
Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airp

In [84]:
boroughwise_venues.keys()

dict_keys(['north_york_venues', 'downtown_toronto_venues', 'etobicoke_venues', 'scarborough_venues', 'east_york_venues', 'york_venues', 'east_toronto_venues', 'west_toronto_venues', 'central_toronto_venues', 'mississauga_venues'])

In [85]:
north_york_venues = boroughwise_venues['north_york_venues']
north_york_venues

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.332140,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Parkwoods,43.753259,-79.329656,Corrosion Service Company Limited,43.752432,-79.334661,Construction & Landscaping
3,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
...,...,...,...,...,...,...,...
237,"Willowdale, Willowdale West",43.782736,-79.442259,Shoppers Drug Mart,43.784847,-79.446028,Pharmacy
238,"Willowdale, Willowdale West",43.782736,-79.442259,RBC Royal Bank,43.783894,-79.446603,Bank
239,"Willowdale, Willowdale West",43.782736,-79.442259,Tim Hortons,43.780940,-79.444231,Coffee Shop
240,"Willowdale, Willowdale West",43.782736,-79.442259,Price Chopper,43.783237,-79.446339,Grocery Store


In [86]:
north_york_venues.groupby('Neighbourhood').count()

Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Bathurst Manor, Wilson Heights, Downsview North",22,22,22,22,22,22
Bayview Village,4,4,4,4,4,4
"Bedford Park, Lawrence Manor East",21,21,21,21,21,21
Don Mills,25,25,25,25,25,25
Downsview,16,16,16,16,16,16
"Fairview, Henry Farm, Oriole",65,65,65,65,65,65
Glencairn,6,6,6,6,6,6
Hillcrest Village,5,5,5,5,5,5
Humber Summit,2,2,2,2,2,2
"Humberlea, Emery",1,1,1,1,1,1


In [87]:
print('There are {} unique categories.'.format(len(north_york_venues['Venue Category'].unique())))

There are 104 unique categories.


## 2. Analyzing Neighbourhoods

One-hot encoding is needed before the venue categories can be fed into a clustering algorithm.

Let's apply one-hot encoding to the venues of every borough.

In [88]:
borough_grouped = {}
for borough_venues in boroughwise_venues.keys():
    temp_venues = boroughwise_venues[borough_venues]
    # one hot encoding
    temp_onehot = pd.get_dummies(temp_venues[['Venue Category']], prefix="", prefix_sep="")

    # add neighborhood column back to dataframe
    temp_onehot['Neighbourhood'] = temp_venues['Neighbourhood'] 

    # move neighborhood column to the first column
    fixed_columns = [temp_onehot.columns[-1]] + list(temp_onehot.columns[:-1])
    temp_onehot = temp_onehot[fixed_columns]
    
    temp_grouped = temp_onehot.groupby('Neighbourhood').mean().reset_index()

    borough_grouped[borough_venues+'_grouped'] = temp_grouped
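
To make the effect of the encode-then-average step concrete, here is a minimal sketch on a toy venue list. The neighbourhood names `A` and `B` and their venues are made up; the point is that the mean of each indicator column equals the share of a neighbourhood's venues that fall into that category.

```python
import pandas as pd

# Toy venue list: neighbourhood A has a park and a coffee shop, B has two parks.
venues = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'B', 'B'],
    'Venue Category': ['Park', 'Coffee Shop', 'Park', 'Park'],
})

# One indicator column per category; dtype=float keeps the averages numeric.
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighbourhood', venues['Neighbourhood'])

# The mean of each indicator column is the share of venues in that category.
grouped = onehot.groupby('Neighbourhood').mean().reset_index()
print(grouped)
```

Each row of `grouped` sums to 1 across the category columns, which is why neighbourhoods with very different venue counts can still be compared.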

In [89]:
borough_grouped.keys()

dict_keys(['north_york_venues_grouped', 'downtown_toronto_venues_grouped', 'etobicoke_venues_grouped', 'scarborough_venues_grouped', 'east_york_venues_grouped', 'york_venues_grouped', 'east_toronto_venues_grouped', 'west_toronto_venues_grouped', 'central_toronto_venues_grouped', 'mississauga_venues_grouped'])

In [90]:
north_york_venues_grouped = borough_grouped['north_york_venues_grouped']

north_york_venues_grouped.head()
north_york_venues_grouped.shape

(20, 105)

First, let's write a function that returns the most frequent venue categories of a row, sorted in descending order of frequency.

In [91]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
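
A quick way to see what this function returns is to call it on a single hand-made row. The frequencies below are made up; the function body is repeated only so that the snippet runs on its own.

```python
import pandas as pd


def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighbourhood name) and rank the rest.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


# One made-up row of category frequencies.
row = pd.Series({'Neighbourhood': 'A', 'Bank': 0.1, 'Coffee Shop': 0.6, 'Park': 0.3})
top_two = list(return_most_common_venues(row, 2))
print(top_two)
```

Note that the function returns category names, not frequencies, and that the order among categories with equal frequency is not guaranteed.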

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [92]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = north_york_venues_grouped['Neighbourhood']

for ind in np.arange(north_york_venues_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(north_york_venues_grouped.iloc[ind, :], num_top_venues)
    

neighbourhoods_venues_sorted

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Bank,Diner,Park,Pizza Place,Bridal Shop,Ice Cream Shop,Deli / Bodega,Restaurant,Pharmacy
1,Bayview Village,Chinese Restaurant,Café,Bank,Japanese Restaurant,Diner,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop
2,"Bedford Park, Lawrence Manor East",Sandwich Place,Italian Restaurant,Coffee Shop,Greek Restaurant,Grocery Store,Indian Restaurant,Juice Bar,Liquor Store,Comfort Food Restaurant,Café
3,Don Mills,Gym,Coffee Shop,Asian Restaurant,Japanese Restaurant,Restaurant,Beer Store,Discount Store,Clothing Store,Chinese Restaurant,Caribbean Restaurant
4,Downsview,Park,Grocery Store,Food Truck,Baseball Field,Discount Store,Playground,Shopping Mall,Hotel,Home Service,Business Service
5,"Fairview, Henry Farm, Oriole",Clothing Store,Coffee Shop,Fast Food Restaurant,Restaurant,Bank,Bakery,Chinese Restaurant,Japanese Restaurant,Juice Bar,Shoe Store
6,Glencairn,Pizza Place,Metro Station,Asian Restaurant,Sushi Restaurant,Pub,Japanese Restaurant,Women's Store,Dessert Shop,Clothing Store,Coffee Shop
7,Hillcrest Village,Golf Course,Pool,Athletics & Sports,Mediterranean Restaurant,Dog Run,Women's Store,Dim Sum Restaurant,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping
8,Humber Summit,Pizza Place,Home Service,Women's Store,Diner,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop
9,"Humberlea, Emery",Baseball Field,Women's Store,Discount Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store
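
The `try`/`except` in the column-naming loop above works for ten columns, but it would label a 21st column as `21th`. If more columns are ever needed, a small helper could produce the suffixes instead. This is a sketch; the function name `ordinal` is illustrative and not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11, 12 and 13 take 'th' even though they end in 1, 2 and 3.
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


# Same column names as in the cell above, built without try/except.
columns = ['Neighbourhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)
]
```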


## 3. Cluster Neighborhoods

Now, let's run k-means clustering on the North York data set.

In [93]:
# set number of clusters
kclusters = 3

north_york_venues_grouped_clustering = north_york_venues_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(north_york_venues_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_.shape 

(20,)
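
The notebook fixes the number of clusters at 3 without comparing alternatives. One common way to sanity-check that choice is the silhouette score, which the notebook itself does not use. The sketch below runs it on six made-up frequency rows that form two obvious groups; on the real data, `X` would be `north_york_venues_grouped_clustering`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Six made-up frequency rows forming two obvious groups.
X = np.array([
    [1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
    [0.0, 1.0], [0.1, 0.9], [0.2, 0.8],
])

scores = {}
for k in range(2, 5):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)
    # Silhouette lies in [-1, 1]; higher means tighter, better separated clusters.
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k, scores)
```

With only 20 neighbourhoods in North York the scores will be noisy, so they are better read as a rough guide than as a definitive answer.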

In [94]:
# add clustering labels
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# start from the North York rows of toronto_df, which carry latitude/longitude
north_york_venues_merged = toronto_df[toronto_df['Borough'] == 'North York'].reset_index(drop=True)

# merge in the cluster labels and top venues for each neighbourhood
north_york_venues_merged = north_york_venues_merged.merge(neighbourhoods_venues_sorted, on='Neighbourhood')

north_york_venues_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0,Park,Food & Drink Shop,Construction & Landscaping,Women's Store,Dim Sum Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Convenience Store,Cosmetics Shop
1,M4A,North York,Victoria Village,43.725882,-79.315572,2,Pizza Place,Coffee Shop,Hockey Arena,Portuguese Restaurant,Intersection,Women's Store,Dim Sum Restaurant,Clothing Store,Comfort Food Restaurant,Construction & Landscaping
2,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,2,Clothing Store,Accessories Store,Furniture / Home Store,Event Space,Coffee Shop,Boutique,Vietnamese Restaurant,Food & Drink Shop,Fast Food Restaurant,Gas Station
3,M3B,North York,Don Mills,43.745906,-79.352188,2,Gym,Coffee Shop,Asian Restaurant,Japanese Restaurant,Restaurant,Beer Store,Discount Store,Clothing Store,Chinese Restaurant,Caribbean Restaurant
4,M3C,North York,Don Mills,43.7259,-79.340923,2,Gym,Coffee Shop,Asian Restaurant,Japanese Restaurant,Restaurant,Beer Store,Discount Store,Clothing Store,Chinese Restaurant,Caribbean Restaurant


In [95]:
address = 'North York, ON'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of North York are {}, {}.'.format(latitude, longitude))

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(north_york_venues_merged['Latitude'], north_york_venues_merged['Longitude'], north_york_venues_merged['Neighbourhood'], north_york_venues_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    # cluster labels run from 0 to kclusters - 1, so they index rainbow directly
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

The geographical coordinates of North York are 43.7543263, -79.44911696639593.


## 4. Examine the clusters

__Cluster 1 (label 0)__

In [96]:
north_york_venues_merged.loc[north_york_venues_merged['Cluster Labels'] == 0, north_york_venues_merged.columns[[2] + list(range(6, north_york_venues_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Parkwoods,Park,Food & Drink Shop,Construction & Landscaping,Women's Store,Dim Sum Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Convenience Store,Cosmetics Shop
18,"Willowdale, Newtonbrook",Park,Women's Store,Golf Course,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega
22,York Mills West,Park,Convenience Store,Women's Store,Golf Course,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Cosmetics Shop,Deli / Bodega


__Cluster 2 (label 1)__

In [97]:
north_york_venues_merged.loc[north_york_venues_merged['Cluster Labels'] == 1, north_york_venues_merged.columns[[2] + list(range(6, north_york_venues_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,"York Mills, Silver Hills",Cafeteria,Women's Store,Chocolate Shop,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store


__Cluster 3 (label 2)__

In [98]:
north_york_venues_merged.loc[north_york_venues_merged['Cluster Labels'] == 2, north_york_venues_merged.columns[[2] + list(range(6, north_york_venues_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Victoria Village,Pizza Place,Coffee Shop,Hockey Arena,Portuguese Restaurant,Intersection,Women's Store,Dim Sum Restaurant,Clothing Store,Comfort Food Restaurant,Construction & Landscaping
2,"Lawrence Manor, Lawrence Heights",Clothing Store,Accessories Store,Furniture / Home Store,Event Space,Coffee Shop,Boutique,Vietnamese Restaurant,Food & Drink Shop,Fast Food Restaurant,Gas Station
3,Don Mills,Gym,Coffee Shop,Asian Restaurant,Japanese Restaurant,Restaurant,Beer Store,Discount Store,Clothing Store,Chinese Restaurant,Caribbean Restaurant
4,Don Mills,Gym,Coffee Shop,Asian Restaurant,Japanese Restaurant,Restaurant,Beer Store,Discount Store,Clothing Store,Chinese Restaurant,Caribbean Restaurant
5,Glencairn,Pizza Place,Metro Station,Asian Restaurant,Sushi Restaurant,Pub,Japanese Restaurant,Women's Store,Dessert Shop,Clothing Store,Coffee Shop
6,Hillcrest Village,Golf Course,Pool,Athletics & Sports,Mediterranean Restaurant,Dog Run,Women's Store,Dim Sum Restaurant,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping
7,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Bank,Diner,Park,Pizza Place,Bridal Shop,Ice Cream Shop,Deli / Bodega,Restaurant,Pharmacy
8,"Fairview, Henry Farm, Oriole",Clothing Store,Coffee Shop,Fast Food Restaurant,Restaurant,Bank,Bakery,Chinese Restaurant,Japanese Restaurant,Juice Bar,Shoe Store
9,"Northwood Park, York University",Coffee Shop,Furniture / Home Store,Caribbean Restaurant,Miscellaneous Shop,Falafel Restaurant,Bar,Massage Studio,Dim Sum Restaurant,Comfort Food Restaurant,Construction & Landscaping
10,Bayview Village,Chinese Restaurant,Café,Bank,Japanese Restaurant,Diner,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop


__Now that the data is prepared, we can segment and cluster the neighbourhoods of any borough in the same way.__
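
As a rough sketch of how the North York steps could be reused for another borough, the clustering can be wrapped in a small function. The name `cluster_borough` and the toy frame are illustrative; the function only assumes a frame shaped like the entries of `borough_grouped`, with a `Neighbourhood` column followed by one frequency column per venue category.

```python
import pandas as pd
from sklearn.cluster import KMeans


def cluster_borough(grouped, kclusters=3, random_state=0):
    """Return a copy of `grouped` with a 'Cluster Labels' column at position 0.

    `grouped` is expected to have a 'Neighbourhood' column followed by one
    frequency column per venue category, like the frames in borough_grouped.
    """
    features = grouped.drop(columns='Neighbourhood')
    kmeans = KMeans(n_clusters=kclusters, random_state=random_state, n_init=10)
    labelled = grouped.copy()
    labelled.insert(0, 'Cluster Labels', kmeans.fit_predict(features))
    return labelled


# Made-up frequencies for four neighbourhoods.
toy_grouped = pd.DataFrame({
    'Neighbourhood': ['A', 'B', 'C', 'D'],
    'Coffee Shop': [0.9, 0.8, 0.1, 0.0],
    'Park': [0.1, 0.2, 0.9, 1.0],
})
result = cluster_borough(toy_grouped, kclusters=2)
print(result[['Cluster Labels', 'Neighbourhood']])
```

In the notebook this would be called as, for example, `cluster_borough(borough_grouped['etobicoke_venues_grouped'])`. A borough with fewer neighbourhoods than `kclusters` would make `KMeans` raise an error, so `kclusters` may need to be lowered for small boroughs.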