# Segmenting and Clustering Toronto Neighbourhoods

## Obtaining the data... ##

First, import the libraries we will require in this notebook.
We use:
* <b>urllib.request</b> to open the Wikipedia URL provided (https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M)
* <b>BeautifulSoup</b> to parse our HTML file and scrape the table
* <b>pandas</b> to work with the data we have in the Wikipedia table more easily, in a dataframe

In [1]:
import urllib.request
from bs4 import BeautifulSoup
import pandas as pd

So now we can use urlopen() from the urllib.request library to open the URL provided for the Canadian Postal Codes Wikipedia page...

In [2]:
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
page = urllib.request.urlopen(url)

... and we use BeautifulSoup to parse the HTML file into our notebook. \
<b>Note:</b> soup.prettify() formats the parse tree so that it is easier to read and work with.

In [3]:
soup = BeautifulSoup(page, "lxml")
print(soup.prettify())

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   List of postal codes of Canada: M - Wikipedia
  </title>
  <script>
   document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"628d4537-c294-4879-8495-90635801e84b","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"List_of_postal_codes_of_Canada:_M","wgTitle":"List of postal codes of Canada: M","wgCurRevisionId":969048979,"wgRevisionId":969048979,"wgArticleId":539066,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Articles with short description","Communications in Ontario","Postal codes in Canada","Toron

In the cell above we can see that we have parsed the HTML for the full Wikipedia page. We don't need the whole page; we only want to work with the 'Postal Codes' table. All tables within an HTML file are coded using the 'table' tag, so let's first find <i>all</i> tables in the Wikipedia page. As you can see in the output cell below, this gives us four tables:

In [4]:
table=soup.find_all("table")
table

[<table class="wikitable sortable">
 <tbody><tr>
 <th>Postal Code
 </th>
 <th>Borough
 </th>
 <th>Neighbourhood
 </th></tr>
 <tr>
 <td>M1A
 </td>
 <td>Not assigned
 </td>
 <td>Not assigned
 </td></tr>
 <tr>
 <td>M2A
 </td>
 <td>Not assigned
 </td>
 <td>Not assigned
 </td></tr>
 <tr>
 <td>M3A
 </td>
 <td>North York
 </td>
 <td>Parkwoods
 </td></tr>
 <tr>
 <td>M4A
 </td>
 <td>North York
 </td>
 <td>Victoria Village
 </td></tr>
 <tr>
 <td>M5A
 </td>
 <td>Downtown Toronto
 </td>
 <td>Regent Park, Harbourfront
 </td></tr>
 <tr>
 <td>M6A
 </td>
 <td>North York
 </td>
 <td>Lawrence Manor, Lawrence Heights
 </td></tr>
 <tr>
 <td>M7A
 </td>
 <td>Downtown Toronto
 </td>
 <td>Queen's Park, Ontario Provincial Government
 </td></tr>
 <tr>
 <td>M8A
 </td>
 <td>Not assigned
 </td>
 <td>Not assigned
 </td></tr>
 <tr>
 <td>M9A
 </td>
 <td>Etobicoke
 </td>
 <td>Islington Avenue, Humber Valley Village
 </td></tr>
 <tr>
 <td>M1B
 </td>
 <td>Scarborough
 </td>
 <td>Malvern, Rouge
 </td></tr>
 <tr>
 <td>M2B

The table we want to work with is the only table with class 'wikitable sortable', so let's use this distinguishing feature to save the Toronto Postal Codes table as the 'right_table'...

In [5]:
right_table=soup.find('table', class_='wikitable sortable')
right_table

<table class="wikitable sortable">
<tbody><tr>
<th>Postal Code
</th>
<th>Borough
</th>
<th>Neighbourhood
</th></tr>
<tr>
<td>M1A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M2A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3A
</td>
<td>North York
</td>
<td>Parkwoods
</td></tr>
<tr>
<td>M4A
</td>
<td>North York
</td>
<td>Victoria Village
</td></tr>
<tr>
<td>M5A
</td>
<td>Downtown Toronto
</td>
<td>Regent Park, Harbourfront
</td></tr>
<tr>
<td>M6A
</td>
<td>North York
</td>
<td>Lawrence Manor, Lawrence Heights
</td></tr>
<tr>
<td>M7A
</td>
<td>Downtown Toronto
</td>
<td>Queen's Park, Ontario Provincial Government
</td></tr>
<tr>
<td>M8A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M9A
</td>
<td>Etobicoke
</td>
<td>Islington Avenue, Humber Valley Village
</td></tr>
<tr>
<td>M1B
</td>
<td>Scarborough
</td>
<td>Malvern, Rouge
</td></tr>
<tr>
<td>M2B
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3B
</td>
<td

Perfect! Now we have to start looping through the rows to get the data for every postal code in the table. This is fairly straightforward as each postal code has its own defined row in the table.


We set up three empty lists (A, B and C) to store our data in, one for each column in the table. We use the Beautiful Soup ‘find_all’ function again (‘findAll’ in the code below is an older alias for it) and set it to look for ‘tr’ tags, each of which marks the start of a new row. We then set up a FOR loop over that array so that Python works through the rows one by one.

Within the loop we use find_all again to search each row for td tags. We store these in a variable called ‘cells’ and check that there are 3 items in our ‘cells’ array (i.e. one for each column). If there are, we use the find(text=True) option to extract the content string from within each <td> element in that row and append it to the A-C lists we created at the start of this step...

In [6]:
A=[]
B=[]
C=[]

for row in right_table.findAll('tr'):
    cells=row.findAll('td')
    if len(cells)==3:
        A.append(cells[0].find(text=True))
        B.append(cells[1].find(text=True))
        C.append(cells[2].find(text=True))
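The row-by-row extraction described above can be tried out without hitting Wikipedia. The following is a minimal sketch, assuming a tiny hand-written HTML table as a stand-in for the real page and the built-in 'html.parser' backend instead of lxml. It also uses get_text(strip=True) rather than find(text=True), which is a swapped-in choice that removes the trailing newlines at extraction time.

```python
from bs4 import BeautifulSoup

# Hypothetical sample mimicking the structure of the Wikipedia table
html = """
<table class="wikitable sortable">
<tr><th>Postal Code</th><th>Borough</th><th>Neighbourhood</th></tr>
<tr><td>M3A
</td><td>North York
</td><td>Parkwoods
</td></tr>
<tr><td>M4A
</td><td>North York
</td><td>Victoria Village
</td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', class_='wikitable sortable')

rows = []
for tr in table.find_all('tr'):
    cells = tr.find_all('td')
    if len(cells) == 3:  # the header row only contains <th>, so it is skipped
        # get_text(strip=True) drops the newline that follows each value
        rows.append([cell.get_text(strip=True) for cell in cells])

print(rows)
```

Collecting whole rows in one list, instead of three parallel lists, keeps each postal code together with its borough and neighbourhood.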

Now we use our lists to create a dataframe, assigning each of the lists A-C to a column named after the corresponding source table column, i.e. Postal Code, Borough, Neighbourhood...

In [7]:
df=pd.DataFrame(A,columns=['Postal Code'])
df['Borough']=B
df['Neighbourhood']=C
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


Now we must ignore all postal codes without an assigned borough. To do this, first we should ensure the data type of values in the 'Borough' column is 'string'...

In [8]:
df['Borough']=df['Borough'].astype('str')
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned\n,Not assigned
1,M2A,Not assigned\n,Not assigned
2,M3A,North York\n,Parkwoods
3,M4A,North York\n,Victoria Village
4,M5A,Downtown Toronto\n,"Regent Park, Harbourfront"


Each value in the table is followed by a newline character, shown as '\n'. We can clean our table by removing every '\n' from the whole dataframe (not just a single column)...

In [9]:
df = df.replace('\n','', regex=True)
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


That's better! Now we can drop all postal codes which have 'Not assigned' as their borough...

In [10]:
df.drop(df[df.Borough=='Not assigned'].index, axis=0, inplace=True)
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
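An alternative to drop() with inplace=True is to keep only the rows that pass a boolean mask. The sketch below assumes a three-row sample dataframe (the variable names sample and assigned are illustrative only) and also resets the index so that row labels start from 0 again.

```python
import pandas as pd

# Small illustrative sample with the same columns as df
sample = pd.DataFrame({
    'Postal Code': ['M1A', 'M3A', 'M4A'],
    'Borough': ['Not assigned', 'North York', 'North York'],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Victoria Village'],
})

# Keep only rows whose borough is assigned, then renumber the index
assigned = sample[sample['Borough'] != 'Not assigned'].reset_index(drop=True)
print(assigned)
```

Whether to reset the index is a matter of taste; the notebook keeps the original row labels (2, 3, 4, ...) at this point.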


If more than one neighbourhood exists in one postal code area, we want to combine all rows for that postal code into one row, with the neighbourhoods separated by commas. We can do this using the code below. Note that the result is only displayed here; it is not assigned back to df...

In [11]:
df.groupby('Postal Code').agg({'Borough' : 'first', 'Neighbourhood' : ','.join})

Unnamed: 0_level_0,Borough,Neighbourhood
Postal Code,Unnamed: 1_level_1,Unnamed: 2_level_1
M1B,Scarborough,"Malvern, Rouge"
M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
M1E,Scarborough,"Guildwood, Morningside, West Hill"
M1G,Scarborough,Woburn
M1H,Scarborough,Cedarbrae
M1J,Scarborough,Scarborough Village
M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park"
M1L,Scarborough,"Golden Mile, Clairlea, Oakridge"
M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West"
M1N,Scarborough,"Birch Cliff, Cliffside West"
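To see what the aggregation does when a postal code really is split across rows, here is a minimal sketch on made-up input in which M5A appears twice. The sample data and the combined variable are assumptions for illustration; the point is that the grouped result has to be assigned to a name if it is to be used later.

```python
import pandas as pd

# Hypothetical input where one postal code is spread over two rows
sample = pd.DataFrame({
    'Postal Code': ['M5A', 'M5A', 'M3A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighbourhood': ['Regent Park', 'Harbourfront', 'Parkwoods'],
})

combined = (
    sample.groupby('Postal Code', sort=False)            # keep first-seen order
          .agg({'Borough': 'first', 'Neighbourhood': ', '.join})
          .reset_index()                                  # 'Postal Code' back to a column
)
print(combined)
```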


We have dropped all postal codes without a borough, but let's check to see whether any neighbourhoods are left unassigned. First we should clean the 'Neighbourhood' column by converting values to type 'string' and removing all '\n's, just as we did with the Borough column...

In [12]:
df['Neighbourhood']=df['Neighbourhood'].astype('str')
df = df.replace('\n','', regex=True)
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


Now we can see if any of the remaining postal codes in our table have 'Not assigned' as the value in the 'Neighbourhood' column...

In [13]:
df[df.Neighbourhood == 'Not assigned'].count()

Postal Code      0
Borough          0
Neighbourhood    0
dtype: int64

Perfect! Now that we have cleaned our dataframe, let's see how many rows we are left with...

In [14]:
df.shape

(103, 3)

# 103 rows and 3 columns!

We will read the following csv file, which contains a list of postal codes and the corresponding co-ordinates, into a pandas dataframe. Our new dataframe will be called 'Coords'...

In [15]:
Coords=pd.read_csv('http://cocl.us/Geospatial_data')

In [16]:
Coords.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Now we can use the 'set_index' function to set the 'Postal Code' column in both dataframes ('Coords' and 'df') as the index, then join the two dataframes into one.
Our new dataframe will be called 'df_Toronto' and will contain a list of all Toronto postal codes along with their corresponding borough, neighbourhoods, latitude and longitude...

In [17]:
df_Toronto=df.set_index('Postal Code').join(Coords.set_index('Postal Code'))
df_Toronto.head()

Unnamed: 0_level_0,Borough,Neighbourhood,Latitude,Longitude
Postal Code,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
M3A,North York,Parkwoods,43.753259,-79.329656
M4A,North York,Victoria Village,43.725882,-79.315572
M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
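The same combination can also be expressed with merge(), which matches on a shared column without setting an index first. This is a sketch on two small sample frames (boroughs and coords are illustrative names), using a left join so that every row of the first frame is kept even if no co-ordinates are found.

```python
import pandas as pd

boroughs = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Borough': ['North York', 'North York'],
    'Neighbourhood': ['Parkwoods', 'Victoria Village'],
})

# Deliberately in a different row order to show that matching is by key
coords = pd.DataFrame({
    'Postal Code': ['M4A', 'M3A'],
    'Latitude': [43.725882, 43.753259],
    'Longitude': [-79.315572, -79.329656],
})

merged = boroughs.merge(coords, on='Postal Code', how='left')
print(merged)
```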


# Success!

## Now we want to explore and cluster our Toronto neighbourhoods. 
So that we can visualize our neighbourhoods on a map, we need to import some more libraries...
* We install <b>Geopy</b> and import <b>Nominatim</b> so that we can convert addresses into latitude and longitude
* We install and import <b>Folium</b> so that we can render maps
* We import <b>matplotlib</b> so that we can visualize our data
* We import <b>KMeans</b> from the <b>sklearn</b> library so that we can cluster our neighbourhoods
* We import <b>requests</b> so that we can handle requests
* We import <b>json_normalize</b> so that we can transform JSON files into pandas dataframes
* We import <b>numpy</b> to handle data

In [18]:
!pip install geopy
from geopy.geocoders import Nominatim



In [19]:
!pip install folium
import folium

Collecting folium
  Downloading https://files.pythonhosted.org/packages/a4/f0/44e69d50519880287cc41e7c8a6acc58daa9a9acf5f6afc52bcc70f69a6d/folium-0.11.0-py2.py3-none-any.whl (93kB)
Collecting branca>=0.3.0 (from folium)
  Downloading https://files.pythonhosted.org/packages/13/fb/9eacc24ba3216510c6b59a4ea1cd53d87f25ba76237d7f4393abeaf4c94e/branca-0.4.1-py3-none-any.whl
Installing collected packages: branca, folium
Successfully installed branca-0.4.1 folium-0.11.0


In [20]:
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import requests
from pandas import json_normalize  # pandas >= 1.0; the pandas.io.json path is deprecated
import numpy as np

Now we use the geopy library to get the latitude and longitude values of Toronto, first defining our user_agent as 'toronto_explorer'...

In [21]:
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical co-ordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical co-ordinates of Toronto are 43.6534817, -79.3839347.


## Ok, great! Now we can create a map of Toronto with the neighbourhoods superimposed on top...

In [24]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

for lat, lng, borough, neighbourhood in zip(df_Toronto['Latitude'], df_Toronto['Longitude'], df_Toronto['Borough'], df_Toronto['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

## Nice! Now let's explore our neighbourhoods using the Foursquare API...

In [25]:
CLIENT_ID = 'H0GR3YVC2EI2B2JKASDR0EVPDDCHOTGY3YLBQNWWL1WCKQJA' 
CLIENT_SECRET = '05MBKP205HIXRH2IUPPCNLT2XFOHFNX2VP0QLWI1JPHIVUEL'
VERSION = '20180605'

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: H0GR3YVC2EI2B2JKASDR0EVPDDCHOTGY3YLBQNWWL1WCKQJA
CLIENT_SECRET:05MBKP205HIXRH2IUPPCNLT2XFOHFNX2VP0QLWI1JPHIVUEL
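Hard-coding and printing API credentials makes them easy to leak when a notebook is shared. One common alternative, sketched below, is to read them from environment variables and only ever print a masked form. The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET and the helpers load_credentials and mask are assumptions made for this example, not something the Foursquare API requires.

```python
import os

def load_credentials(env=os.environ):
    """Read the (hypothetically named) credential variables from a mapping."""
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Foursquare credentials are not set in the environment')
    return client_id, client_secret

def mask(secret, visible=4):
    """Show only the first few characters of a secret."""
    return secret[:visible] + '*' * (len(secret) - visible)

# Demonstration with a stand-in mapping instead of the real environment
demo_env = {
    'FOURSQUARE_CLIENT_ID': 'EXAMPLEID123',
    'FOURSQUARE_CLIENT_SECRET': 'EXAMPLESECRET456',
}
client_id, client_secret = load_credentials(demo_env)
print('CLIENT_ID: ' + mask(client_id))
print('CLIENT_SECRET: ' + mask(client_secret))
```

In a real session, load_credentials() would be called without arguments so that it reads from os.environ.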


Let's reset the index for our dataframe, then get the latitude and longitude of the first neighbourhood in our dataframe, which is Parkwoods...

In [26]:
df_Toronto=df_Toronto.reset_index()

In [27]:
neighbourhood_latitude = df_Toronto.loc[0, 'Latitude']
neighbourhood_longitude = df_Toronto.loc[0, 'Longitude'] 

neighbourhood_name = df_Toronto.loc[0, 'Neighbourhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighbourhood_name, 
                                                               neighbourhood_latitude, 
                                                               neighbourhood_longitude))

Latitude and longitude values of Parkwoods are 43.7532586, -79.3296565.


Now let's get the top 100 venues within 800 metres of Parkwoods...

In [28]:
LIMIT=100
radius=800
url='https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID,CLIENT_SECRET,VERSION,neighbourhood_latitude,neighbourhood_longitude,radius,LIMIT)
results=requests.get(url).json()

results

{'meta': {'code': 200, 'requestId': '5f1947c2803a4b4e51034118'},
 'response': {'headerLocation': 'Parkwoods - Donalda',
  'headerFullLocation': 'Parkwoods - Donalda, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 5,
  'suggestedBounds': {'ne': {'lat': 43.76045860720001,
    'lng': -79.31970728375885},
   'sw': {'lat': 43.7460585928, 'lng': -79.33960571624115}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4e8d9dcdd5fbbbb6b3003c7b',
       'name': 'Brookbanks Park',
       'location': {'address': 'Toronto',
        'lat': 43.751976046055574,
        'lng': -79.33214044722958,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.751976046055574,
          'lng': -79.33214044722958}],
        'distance': 245,
        'cc': 'CA',
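The request URL above is assembled with one long format string. As a sketch of a more readable alternative, the query string can be built from a dictionary with urllib.parse.urlencode, which also takes care of escaping. The build_explore_url helper and the placeholder credentials 'ID' and 'SECRET' are hypothetical; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore URL from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude pair
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

url = build_explore_url('ID', 'SECRET', '20180605', 43.7532586, -79.3296565, 800, 100)
print(url)
```

Note that urlencode percent-encodes the comma in the ll value as %2C, which is the standard encoding for that character in a query string.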

The below function extracts the category of the venue...

In [29]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

We can now use this function to add a category to each of the venues in our dataframe...

In [30]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues)
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Brookbanks Park,Park,43.751976,-79.33214
1,Variety Store,Food & Drink Shop,43.751974,-79.333114
2,DVP at York Mills,Road,43.758899,-79.334099
3,TTC Stop #09083,Bus Stop,43.759655,-79.332223
4,TTC Stop 9083,Bus Stop,43.759251,-79.334
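To make the flattening step easier to follow, here is a minimal sketch of json_normalize on a hand-written, trimmed-down list shaped like the 'items' entries in the response. The items list is an assumption for illustration and omits most of the fields a real response contains.

```python
import pandas as pd

# Hypothetical, trimmed-down stand-in for results['response']['groups'][0]['items']
items = [
    {'venue': {'name': 'Brookbanks Park',
               'location': {'lat': 43.751976, 'lng': -79.33214}}},
    {'venue': {'name': 'Variety Store',
               'location': {'lat': 43.751974, 'lng': -79.333114}}},
]

# Nested keys become dotted column names such as 'venue.location.lat'
flat = pd.json_normalize(items)
flat = flat.loc[:, ['venue.name', 'venue.location.lat', 'venue.location.lng']]

# Keep only the last part of each dotted name
flat.columns = [col.split('.')[-1] for col in flat.columns]
print(flat)
```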


Now let's see how many venues are within 800m of Parkwoods...

In [92]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

5 venues were returned by Foursquare.


Let's now find the top 100 venues within 800m of all Toronto neighbourhoods...

In [32]:
def getNearbyVenues(names, latitudes, longitudes, radius=800):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
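One fragile spot in the function above is v['venue']['categories'][0]['name'], which raises an IndexError if a venue comes back with an empty category list. A small guard, sketched here with a hypothetical helper first_category_name and a made-up default label, would sidestep that.

```python
def first_category_name(venue, default='Uncategorized'):
    """Return the name of the venue's first category, or a default if none exist."""
    categories = venue.get('categories') or []
    return categories[0]['name'] if categories else default

# Illustrative venue dictionaries shaped like the API's 'venue' objects
with_category = {'name': 'Brookbanks Park', 'categories': [{'name': 'Park'}]}
without_category = {'name': 'Unnamed spot', 'categories': []}

print(first_category_name(with_category))
print(first_category_name(without_category))
```

Inside the list comprehension, the last tuple element could then be written as first_category_name(v['venue']).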

Then run through all neighbourhoods to create a <i>new dataframe</i> called Toronto_venues which has the co-ordinates of each venue in each Toronto neighbourhood...

In [35]:
Toronto_venues = getNearbyVenues(names=df_Toronto['Neighbourhood'],
                                   latitudes=df_Toronto['Latitude'],
                                   longitudes=df_Toronto['Longitude']
                                  )

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmo

In [36]:
print(Toronto_venues.shape)
Toronto_venues.head()

(3952, 7)


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Parkwoods,43.753259,-79.329656,DVP at York Mills,43.758899,-79.334099,Road
3,Parkwoods,43.753259,-79.329656,TTC Stop #09083,43.759655,-79.332223,Bus Stop
4,Parkwoods,43.753259,-79.329656,TTC Stop 9083,43.759251,-79.334,Bus Stop


## So we have 3952 venues in total. Now let's see how many venues there are in each of our neighbourhoods...

In [37]:
Toronto_venues.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agincourt,19,19,19,19,19,19
"Alderwood, Long Branch",14,14,14,14,14,14
"Bathurst Manor, Wilson Heights, Downsview North",24,24,24,24,24,24
Bayview Village,10,10,10,10,10,10
"Bedford Park, Lawrence Manor East",40,40,40,40,40,40
Berczy Park,100,100,100,100,100,100
"Birch Cliff, Cliffside West",6,6,6,6,6,6
"Brockton, Parkdale Village, Exhibition Place",96,96,96,96,96,96
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto",58,58,58,58,58,58
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",32,32,32,32,32,32


Now to analyze each neighbourhood...

In [40]:
Toronto_onehot = pd.get_dummies(Toronto_venues[['Venue Category']], prefix="", prefix_sep="")

Toronto_onehot['Neighbourhood'] = Toronto_venues['Neighbourhood'] 

fixed_columns = [Toronto_onehot.columns[-1]] + list(Toronto_onehot.columns[:-1])
Toronto_onehot = Toronto_onehot[fixed_columns]

Toronto_onehot.head()

Unnamed: 0,Neighbourhood,ATM,Accessories Store,Afghan Restaurant,African Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,...,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [41]:
Toronto_onehot.shape

(3952, 330)

Next, let's group rows by neighbourhood and take the mean of the frequency of occurrence of each category...

In [42]:
Toronto_grouped = Toronto_onehot.groupby('Neighbourhood').mean().reset_index()
Toronto_grouped

Unnamed: 0,Neighbourhood,ATM,Accessories Store,Afghan Restaurant,African Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,...,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0.000000,0.000000,0.000000,0.00,0.000000,0.00000,0.00000,0.0000,0.00000,...,0.000000,0.00,0.000000,0.000000,0.0,0.000000,0.000000,0.000,0.000000,0.000000
1,"Alderwood, Long Branch",0.000000,0.000000,0.000000,0.00,0.000000,0.00000,0.00000,0.0000,0.00000,...,0.000000,0.00,0.000000,0.000000,0.0,0.000000,0.000000,0.000,0.000000,0.000000
2,"Bathurst Manor, Wilson Heights, Downsview North",0.000000,0.000000,0.000000,0.00,0.000000,0.00000,0.00000,0.0000,0.00000,...,0.000000,0.00,0.000000,0.000000,0.0,0.000000,0.000000,0.000,0.000000,0.000000
3,Bayview Village,0.000000,0.000000,0.000000,0.00,0.000000,0.00000,0.00000,0.0000,0.00000,...,0.000000,0.00,0.000000,0.000000,0.0,0.000000,0.000000,0.000,0.000000,0.000000
4,"Bedford Park, Lawrence Manor East",0.000000,0.000000,0.000000,0.00,0.000000,0.00000,0.00000,0.0000,0.00000,...,0.000000,0.00,0.000000,0.000000,0.0,0.000000,0.000000,0.025,0.000000,0.000000
5,Berczy Park,0.000000,0.000000,0.000000,0.00,0.000000,0.00000,0.00000,0.0000,0.00000,...,0.000000,0.00,0.000000,0.000000,0.0,0.000000,0.000000,0.000,0.000000,0.000000
6,"Birch Cliff, Cliffside West",0.000000,0.000000,0.000000,0.00,0.000000,0.00000,0.00000,0.0000,0.00000,...,0.000000,0.00,0.000000,0.000000,0.0,0.000000,0.000000,0.000,0.000000,0.000000
7,"Brockton, Parkdale Village, Exhibition Place",0.000000,0.010417,0.000000,0.00,0.000000,0.00000,0.00000,0.0000,0.00000,...,0.000000,0.00,0.000000,0.000000,0.0,0.000000,0.000000,0.000,0.000000,0.000000
8,"Business reply mail Processing Centre, South C...",0.000000,0.000000,0.000000,0.00,0.000000,0.00000,0.00000,0.0000,0.00000,...,0.000000,0.00,0.000000,0.000000,0.0,0.000000,0.000000,0.000,0.000000,0.017241
9,"CN Tower, King and Spadina, Railway Lands, Har...",0.000000,0.000000,0.000000,0.00,0.031250,0.03125,0.03125,0.0625,0.09375,...,0.000000,0.00,0.000000,0.000000,0.0,0.000000,0.000000,0.000,0.000000,0.000000
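The one-hot encoding followed by a grouped mean can be hard to picture on 330 columns, so here is a sketch on six made-up venues in two neighbourhoods labelled 'A' and 'B'. All names and values are illustrative. Each resulting row gives the share of a neighbourhood's venues that fall into each category, so the values in a row sum to 1.

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Park', 'Park', 'Coffee Shop', 'Bus Stop',
                       'Coffee Shop', 'Coffee Shop'],
})

# One column per category, 1 where the venue belongs to it, 0 otherwise
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighbourhood', venues['Neighbourhood'])

# Mean of the 0/1 indicators = relative frequency of each category
grouped = onehot.groupby('Neighbourhood').mean().reset_index()
print(grouped)
```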


The following function sorts the venue categories into descending order of frequency...

In [44]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

... so that we can put this information into a dataframe (top 10 venue categories in each neighbourhood)...

In [219]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = Toronto_grouped['Neighbourhood']

for ind in np.arange(Toronto_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Toronto_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Chinese Restaurant,Fabric Shop,Lounge,Sandwich Place,Discount Store,Seafood Restaurant,Latin American Restaurant,Sushi Restaurant,Motorcycle Shop,Supermarket
1,"Alderwood, Long Branch",Convenience Store,Pizza Place,Park,Coffee Shop,Pub,Gym,Sandwich Place,Discount Store,Pharmacy,Donut Shop
2,"Bathurst Manor, Wilson Heights, Downsview North",Pizza Place,Coffee Shop,Bank,Park,Ice Cream Shop,Mediterranean Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Bridal Shop,Supermarket
3,Bayview Village,Japanese Restaurant,Bank,Skating Rink,Dog Run,Park,Grocery Store,Chinese Restaurant,Café,Donut Shop,Discount Store
4,"Bedford Park, Lawrence Manor East",Coffee Shop,Italian Restaurant,Pizza Place,Cosmetics Shop,Restaurant,Sandwich Place,Grocery Store,Locksmith,Bagel Shop,Bakery
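The column names above are built by indexing into a three-element suffix list and falling back to 'th' when the index runs out. That works for a top 10, but would label a 21st column as '21th'. A small helper, sketched here under the hypothetical name ordinal, handles the general case.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['Neighbourhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns)
```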


## Ok, now we can use this information to cluster our neighbourhoods by venue category...

In [220]:
kclusters = 7

Toronto_grouped_clustering = Toronto_grouped.drop('Neighbourhood', axis=1)

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Toronto_grouped_clustering)

kmeans.labels_[0:10] 

array([1, 2, 2, 0, 1, 1, 1, 1, 1, 1], dtype=int32)
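The notebook fixes the number of clusters at 7 without showing how that value was chosen. As a sketch of how KMeans behaves and how different values of k might be compared, the example below clusters four made-up category-frequency profiles and prints the inertia (within-cluster sum of squares) for a few values of k. The profiles array and the choice of n_init=10 are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Four illustrative neighbourhood profiles over three venue categories
profiles = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.9],
])

model = KMeans(n_clusters=2, random_state=0, n_init=10).fit(profiles)
labels = model.labels_
print(labels)

# Inertia for several k; a sharp drop followed by a plateau hints at a sensible k
inertias = [
    KMeans(n_clusters=k, random_state=0, n_init=10).fit(profiles).inertia_
    for k in (1, 2, 3)
]
print(inertias)
```

The numeric label given to each cluster is arbitrary; only the grouping is meaningful.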

Now we can add this to our dataframe...

In [221]:
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

neighbourhoods_venues_sorted.head()

Unnamed: 0,Cluster Labels,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,Agincourt,Chinese Restaurant,Fabric Shop,Lounge,Sandwich Place,Discount Store,Seafood Restaurant,Latin American Restaurant,Sushi Restaurant,Motorcycle Shop,Supermarket
1,2,"Alderwood, Long Branch",Convenience Store,Pizza Place,Park,Coffee Shop,Pub,Gym,Sandwich Place,Discount Store,Pharmacy,Donut Shop
2,2,"Bathurst Manor, Wilson Heights, Downsview North",Pizza Place,Coffee Shop,Bank,Park,Ice Cream Shop,Mediterranean Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Bridal Shop,Supermarket
3,0,Bayview Village,Japanese Restaurant,Bank,Skating Rink,Dog Run,Park,Grocery Store,Chinese Restaurant,Café,Donut Shop,Discount Store
4,1,"Bedford Park, Lawrence Manor East",Coffee Shop,Italian Restaurant,Pizza Place,Cosmetics Shop,Restaurant,Sandwich Place,Grocery Store,Locksmith,Bagel Shop,Bakery


Now we want to merge this dataframe with df_Toronto, the original dataframe we created containing geographical co-ordinates and borough information. To do this, we set 'Neighbourhood' as the index of neighbourhoods_venues_sorted and join it to df_Toronto on its 'Neighbourhood' column.

In [222]:
Toronto_merged = df_Toronto.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

In [223]:
Toronto_merged.head()

Unnamed: 0,index,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,M3A,North York,Parkwoods,43.753259,-79.329656,0.0,Bus Stop,Road,Park,Food & Drink Shop,Dumpling Restaurant,Discount Store,Distribution Center,Dive Bar,Dog Run,Doner Restaurant
1,1,M4A,North York,Victoria Village,43.725882,-79.315572,1.0,Portuguese Restaurant,Coffee Shop,Sporting Goods Shop,French Restaurant,Park,Pizza Place,Café,Hockey Arena,Fabric Shop,Diner
2,2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,1.0,Coffee Shop,Park,Bakery,Café,Pub,Theater,Restaurant,Thai Restaurant,Breakfast Spot,Performing Arts Venue
3,3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,1.0,Furniture / Home Store,Fast Food Restaurant,Fried Chicken Joint,Dessert Shop,Clothing Store,Vietnamese Restaurant,Coffee Shop,Athletics & Sports,Café,Arts & Crafts Store
4,4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,1.0,Coffee Shop,Italian Restaurant,Bubble Tea Shop,Bookstore,Gastropub,Park,French Restaurant,Japanese Restaurant,Ice Cream Shop,Sushi Restaurant


In Toronto_merged, we can see that the data type of the 'Cluster Labels' values has changed from integer to float. If we want to use the cluster labels to categorize our clusters then they should be integers; however, we can't convert the column from float to integer while it contains a NaN value ('Upper Rouge' has not been assigned a cluster as there are no venues within 800m).\
Let's drop that row...

In [224]:
Toronto_merged.isna().sum()

index                     0
Postal Code               0
Borough                   0
Neighbourhood             0
Latitude                  0
Longitude                 0
Cluster Labels            1
1st Most Common Venue     1
2nd Most Common Venue     1
3rd Most Common Venue     1
4th Most Common Venue     1
5th Most Common Venue     1
6th Most Common Venue     1
7th Most Common Venue     1
8th Most Common Venue     1
9th Most Common Venue     1
10th Most Common Venue    1
dtype: int64

In [225]:
Toronto_merged=Toronto_merged.dropna()
Toronto_merged['Cluster Labels']=Toronto_merged['Cluster Labels'].astype(int)
Toronto_merged.head()

Unnamed: 0,index,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,M3A,North York,Parkwoods,43.753259,-79.329656,0,Bus Stop,Road,Park,Food & Drink Shop,Dumpling Restaurant,Discount Store,Distribution Center,Dive Bar,Dog Run,Doner Restaurant
1,1,M4A,North York,Victoria Village,43.725882,-79.315572,1,Portuguese Restaurant,Coffee Shop,Sporting Goods Shop,French Restaurant,Park,Pizza Place,Café,Hockey Arena,Fabric Shop,Diner
2,2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,1,Coffee Shop,Park,Bakery,Café,Pub,Theater,Restaurant,Thai Restaurant,Breakfast Spot,Performing Arts Venue
3,3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,1,Furniture / Home Store,Fast Food Restaurant,Fried Chicken Joint,Dessert Shop,Clothing Store,Vietnamese Restaurant,Coffee Shop,Athletics & Sports,Café,Arts & Crafts Store
4,4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,1,Coffee Shop,Italian Restaurant,Bubble Tea Shop,Bookstore,Gastropub,Park,French Restaurant,Japanese Restaurant,Ice Cream Shop,Sushi Restaurant
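
The float-to-integer step above can be reproduced on a small scale. The sketch below uses a hypothetical three-row frame (not the real Toronto_merged) to show why a single NaN turns the column into floats, and compares the drop-then-cast approach used in this notebook with pandas' nullable `Int64` dtype, which would keep the unlabelled row instead of dropping it.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of Toronto_merged: one neighbourhood has no cluster label.
df = pd.DataFrame({
    "Neighbourhood": ["Parkwoods", "Victoria Village", "Upper Rouge"],
    "Cluster Labels": [0, 1, np.nan],
})
print(df["Cluster Labels"].dtype)  # float64, because NaN is a float value

# Option 1 (the approach taken in this notebook): drop incomplete rows, then cast.
cleaned = df.dropna(subset=["Cluster Labels"]).copy()
cleaned["Cluster Labels"] = cleaned["Cluster Labels"].astype(int)

# Option 2 (an alternative, not used here): the nullable integer dtype keeps
# every row and stores the missing label as <NA>.
nullable = df["Cluster Labels"].astype("Int64")
```

Dropping the row is the simpler choice here because a neighbourhood without a cluster label cannot be coloured on the map anyway.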


## Perfect! Now we can plot the neighbourhood clusters onto our map of Toronto...

In [226]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow colour per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(Toronto_merged['Latitude'], Toronto_merged['Longitude'], Toronto_merged['Neighbourhood'], Toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
    
map_clusters
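
The markers above are coloured with `rainbow[cluster-1]`. Because Python lists accept negative indices, label 0 resolves to index -1 (the last colour in the list) and every other label is shifted down by one position. The sketch below illustrates this with a hypothetical palette of placeholder names rather than the real hex values, and assumes seven clusters since labels 0 to 6 appear in this notebook.

```python
# Hypothetical stand-in for the `rainbow` list: placeholder names, not real hex codes.
palette = ["c0", "c1", "c2", "c3", "c4", "c5", "c6"]
labels = range(len(palette))  # cluster labels 0..6 (assumed)

# Indexing as in the map cell: label 0 wraps around to the last entry.
shifted = [palette[label - 1] for label in labels]

# Direct indexing, shown for comparison: label i gets entry i.
direct = [palette[label] for label in labels]

for label in labels:
    print(label, shifted[label], direct[label])
```

Either way each label still receives a distinct colour, so the clusters remain distinguishable on the map; only the correspondence between label and colour differs.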

## Finally, let's review the venues we have assigned to each cluster...

<b>Cluster 0</b> is represented by blue dots. Neighbourhoods in this cluster are slightly out of town with parks, trails, leisure facilities and some grocery stores.

In [227]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 0, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Postal Code,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,-79.329656,0,Bus Stop,Road,Park,Food & Drink Shop,Dumpling Restaurant,Discount Store,Distribution Center,Dive Bar,Dog Run,Doner Restaurant
21,M6E,-79.453512,0,Pharmacy,Café,Grocery Store,Park,Japanese Restaurant,Beer Store,Cosmetics Shop,Mexican Restaurant,Gym,Bank
39,M2K,-79.385975,0,Japanese Restaurant,Bank,Skating Rink,Dog Run,Park,Grocery Store,Chinese Restaurant,Café,Donut Shop,Discount Store
57,M9M,-79.532242,0,Gas Station,Convenience Store,Discount Store,Italian Restaurant,Park,Baseball Field,Food Service,Shipping Store,Ethiopian Restaurant,Event Space
66,M2P,-79.400049,0,Park,Tennis Court,Intersection,Convenience Store,Restaurant,Pet Store,Yoga Studio,Donut Shop,Discount Store,Distribution Center
68,M5P,-79.411307,0,Italian Restaurant,Park,Sushi Restaurant,Bagel Shop,Bakery,Bank,Japanese Restaurant,Coffee Shop,Pharmacy,Gastropub
83,M4T,-79.38316,0,Park,Grocery Store,Candy Store,Gym,Sandwich Place,Japanese Restaurant,Café,Thai Restaurant,Gym / Fitness Center,Playground
91,M4W,-79.377529,0,Trail,Park,Playground,Candy Store,Grocery Store,Bank,Yoga Studio,Donut Shop,Discount Store,Distribution Center
101,M8Y,-79.498509,0,Poutine Place,Construction & Landscaping,Park,Business Service,Falafel Restaurant,Doner Restaurant,Diner,Discount Store,Distribution Center,Farmers Market
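
A description like the one given for this cluster can be checked by tallying how often each venue category appears across the "Most Common Venue" columns. The sketch below is one possible way to do that, using a three-row excerpt (only the first three venue columns) copied from the tables in this notebook; the helper name `top_categories` is hypothetical.

```python
import pandas as pd

# Small excerpt: postal codes M3A and M2P (cluster 0) and M5A (cluster 1).
sample = pd.DataFrame({
    "Cluster Labels": [0, 0, 1],
    "1st Most Common Venue": ["Bus Stop", "Park", "Coffee Shop"],
    "2nd Most Common Venue": ["Road", "Tennis Court", "Park"],
    "3rd Most Common Venue": ["Park", "Intersection", "Bakery"],
})

venue_cols = [c for c in sample.columns if c.endswith("Most Common Venue")]

def top_categories(df, cluster, n=3):
    """Count venue categories over all venue columns for one cluster."""
    rows = df.loc[df["Cluster Labels"] == cluster, venue_cols]
    # Flatten the selected cells into one Series so value_counts can tally them.
    flat = pd.Series(rows.to_numpy().ravel())
    return flat.value_counts().head(n)

counts = top_categories(sample, 0)
print(counts)
```

Note that this treats a 1st-place and a 10th-place venue equally; weighting by rank would be a natural refinement.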


<b>Cluster 1</b> is represented by purple dots on the map. Neighbourhoods in this cluster are close to the centre of town, with lots of coffee shops, restaurants, eateries, banks and hotels.

In [228]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 1, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Postal Code,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,M4A,-79.315572,1,Portuguese Restaurant,Coffee Shop,Sporting Goods Shop,French Restaurant,Park,Pizza Place,Café,Hockey Arena,Fabric Shop,Diner
2,M5A,-79.360636,1,Coffee Shop,Park,Bakery,Café,Pub,Theater,Restaurant,Thai Restaurant,Breakfast Spot,Performing Arts Venue
3,M6A,-79.464763,1,Furniture / Home Store,Fast Food Restaurant,Fried Chicken Joint,Dessert Shop,Clothing Store,Vietnamese Restaurant,Coffee Shop,Athletics & Sports,Café,Arts & Crafts Store
4,M7A,-79.389494,1,Coffee Shop,Italian Restaurant,Bubble Tea Shop,Bookstore,Gastropub,Park,French Restaurant,Japanese Restaurant,Ice Cream Shop,Sushi Restaurant
6,M1B,-79.194353,1,Fast Food Restaurant,Coffee Shop,Trail,Spa,Hobby Shop,African Restaurant,Paper / Office Supplies Store,Chinese Restaurant,Dumpling Restaurant,Discount Store
7,M3B,-79.352188,1,Japanese Restaurant,Gym,Restaurant,Beer Store,Coffee Shop,Bike Shop,Sporting Goods Shop,Supermarket,Café,Middle Eastern Restaurant
9,M5B,-79.378937,1,Coffee Shop,Clothing Store,Japanese Restaurant,Plaza,Gastropub,Diner,Italian Restaurant,Pizza Place,Cosmetics Shop,Theater
11,M9B,-79.554724,1,Pizza Place,Hotel,Convenience Store,Café,Theater,Coffee Shop,Gym,Mexican Restaurant,Bank,Restaurant
12,M1C,-79.160497,1,Breakfast Spot,Bar,Italian Restaurant,Burger Joint,Yoga Studio,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant
13,M3C,-79.340923,1,Japanese Restaurant,Gym,Restaurant,Beer Store,Coffee Shop,Bike Shop,Sporting Goods Shop,Supermarket,Café,Middle Eastern Restaurant


<b>Cluster 2</b> is represented by red dots. Neighbourhoods in this cluster are further out of town with takeaways, gas stations, and larger commercial buildings such as grocery stores and shopping malls.

In [229]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 2, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Postal Code,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,M9A,-79.532242,2,Pharmacy,Park,Playground,Café,Shopping Mall,Grocery Store,Bank,Skating Rink,Yoga Studio,Discount Store
8,M4B,-79.309937,2,Fast Food Restaurant,Pizza Place,Brewery,Pet Store,Athletics & Sports,Gastropub,Intersection,Rock Climbing Spot,Bakery,Bank
10,M6B,-79.445073,2,Grocery Store,Fast Food Restaurant,Pizza Place,Coffee Shop,Gas Station,Playground,Restaurant,Mediterranean Restaurant,Fish Market,Flower Shop
14,M4C,-79.318389,2,Park,Pharmacy,Bus Stop,Athletics & Sports,Bus Line,Skating Rink,Curling Ice,Sandwich Place,Beer Store,Pizza Place
16,M6C,-79.428191,2,Pizza Place,Grocery Store,Field,Park,Sandwich Place,Bagel Shop,Frozen Yogurt Shop,Middle Eastern Restaurant,Italian Restaurant,Dance Studio
18,M1E,-79.188711,2,Pizza Place,Restaurant,Fast Food Restaurant,Coffee Shop,Fried Chicken Joint,Rental Car Location,Beer Store,Supermarket,Bank,Sports Bar
27,M2H,-79.363452,2,Pharmacy,Park,Coffee Shop,Ice Cream Shop,Recreation Center,Residential Building (Apartment / Condo),Restaurant,Chinese Restaurant,Sandwich Place,Shopping Mall
28,M3H,-79.442259,2,Pizza Place,Coffee Shop,Bank,Park,Ice Cream Shop,Mediterranean Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Bridal Shop,Supermarket
29,M4H,-79.349372,2,Indian Restaurant,Coffee Shop,Sandwich Place,Turkish Restaurant,Bank,Afghan Restaurant,Grocery Store,Gas Station,Fried Chicken Joint,Fast Food Restaurant
32,M1J,-79.239476,2,Ice Cream Shop,Coffee Shop,Fast Food Restaurant,Sandwich Place,Pizza Place,Restaurant,Convenience Store,Dumpling Restaurant,Discount Store,Distribution Center


The remaining clusters (3 to 6) are further out of town and are much smaller, each containing only one or two neighbourhoods...

In [230]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 3, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Postal Code,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
50,M9L,-79.565963,3,Bakery,Arts & Crafts Store,Pizza Place,Yoga Studio,Eastern European Restaurant,Distribution Center,Dive Bar,Dog Run,Doner Restaurant,Donut Shop


In [231]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 4, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Postal Code,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
45,M2L,-79.374714,4,Pool,Yoga Studio,Electronics Store,Discount Store,Distribution Center,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant


In [232]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 5, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Postal Code,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
94,M9W,-79.594054,5,Rental Car Location,Yoga Studio,Eastern European Restaurant,Diner,Discount Store,Distribution Center,Dive Bar,Dog Run,Doner Restaurant,Donut Shop


In [233]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 6, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Postal Code,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,M1G,-79.216917,6,Coffee Shop,Park,Business Service,Convenience Store,Eastern European Restaurant,Discount Store,Distribution Center,Dive Bar,Dog Run,Doner Restaurant
49,M6L,-79.490074,6,Park,Bakery,Yoga Studio,Eastern European Restaurant,Distribution Center,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant
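
To compare cluster sizes at a glance rather than by printing each cluster in turn, the labels can be counted directly. This is a minimal sketch on a hypothetical list of labels; in the notebook the same call would be applied to `Toronto_merged['Cluster Labels']`.

```python
import pandas as pd

# Hypothetical cluster labels, standing in for Toronto_merged['Cluster Labels'].
labels = pd.Series([0, 1, 1, 2, 1, 3, 2], name="Cluster Labels")

# Number of neighbourhoods per cluster, ordered by cluster label.
sizes = labels.value_counts().sort_index()
print(sizes)
```

Very small clusters are often a sign that a few neighbourhoods have unusual venue profiles, which may be worth keeping in mind when interpreting them.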


# Thank you for reviewing this notebook!