# Clustering of Toronto's Neighbourhoods

In [1]:
import pandas as pd
import numpy as np

## Part 1 - Scraping and Cleaning Data

Import Canadian postal codes from url using Pandas

In [2]:
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
df

[    Postcode           Borough          Neighbourhood
 0        M1A      Not assigned           Not assigned
 1        M2A      Not assigned           Not assigned
 2        M3A        North York              Parkwoods
 3        M4A        North York       Victoria Village
 4        M5A  Downtown Toronto           Harbourfront
 ..       ...               ...                    ...
 283      M8Z         Etobicoke              Mimico NW
 284      M8Z         Etobicoke     The Queensway West
 285      M8Z         Etobicoke  Royal York South West
 286      M8Z         Etobicoke         South of Bloor
 287      M9Z      Not assigned           Not assigned
 
 [288 rows x 3 columns],
                                                   0   \
 0                                                NaN   
 1  NL NS PE NB QC ON MB SK AB BC NU/NT YT A B C E...   
 2                                                 NL   
 3                                                  A   
 
                          

Only the first table in the dataframe is of interest

In [3]:
df = df[0]
df

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
...,...,...,...
283,M8Z,Etobicoke,Mimico NW
284,M8Z,Etobicoke,The Queensway West
285,M8Z,Etobicoke,Royal York South West
286,M8Z,Etobicoke,South of Bloor


This dataframe has the requested three columns: PostalCode, Borough, and Neighbourhood

Only interested in cells that have an assigned borough. Cells with a borough that is Not assigned are dropped.

In [4]:
indexes_to_drop = df.index[df['Borough'] == 'Not assigned']
indexes_to_drop

Int64Index([  0,   1,   9,  13,  20,  21,  30,  36,  37,  45,  46,  50,  51,
             52,  54,  55,  59,  60,  61,  73,  74,  75,  88,  89,  90, 104,
            105, 106, 120, 121, 136, 137, 148, 149, 155, 161, 162, 167, 175,
            181, 182, 188, 189, 190, 194, 195, 201, 202, 203, 204, 209, 210,
            223, 224, 237, 238, 241, 242, 247, 248, 253, 254, 258, 259, 260,
            261, 263, 264, 274, 275, 276, 277, 278, 279, 280, 281, 287],
           dtype='int64')

In [5]:
df.drop(indexes_to_drop, inplace=True)
df

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
...,...,...,...
282,M8Z,Etobicoke,Kingsway Park South West
283,M8Z,Etobicoke,Mimico NW
284,M8Z,Etobicoke,The Queensway West
285,M8Z,Etobicoke,Royal York South West


When more than one Neighbourhood exist in one postal code area the rows are to be combined into one row with the Neighbourhoods separated by a comma.

In [6]:
df_grouped = df.groupby(['Postcode'])

In [7]:
Neighbourhoods = df_grouped.agg(list)
Neighbourhoods

Unnamed: 0_level_0,Borough,Neighbourhood
Postcode,Unnamed: 1_level_1,Unnamed: 2_level_1
M1B,"[Scarborough, Scarborough]","[Rouge, Malvern]"
M1C,"[Scarborough, Scarborough, Scarborough]","[Highland Creek, Rouge Hill, Port Union]"
M1E,"[Scarborough, Scarborough, Scarborough]","[Guildwood, Morningside, West Hill]"
M1G,[Scarborough],[Woburn]
M1H,[Scarborough],[Cedarbrae]
...,...,...
M9N,[York],[Weston]
M9P,[Etobicoke],[Westmount]
M9R,"[Etobicoke, Etobicoke, Etobicoke, Etobicoke]","[Kingsview Village, Martin Grove Gardens, Rich..."
M9V,"[Etobicoke, Etobicoke, Etobicoke, Etobicoke, E...","[Albion Gardens, Beaumond Heights, Humbergate,..."


avoid the repeating lists in Borough by replacing the column with the first list value

In [8]:
Neighbourhoods['Borough'] = Neighbourhoods['Borough'].apply(lambda x: x[0])
Neighbourhoods

Unnamed: 0_level_0,Borough,Neighbourhood
Postcode,Unnamed: 1_level_1,Unnamed: 2_level_1
M1B,Scarborough,"[Rouge, Malvern]"
M1C,Scarborough,"[Highland Creek, Rouge Hill, Port Union]"
M1E,Scarborough,"[Guildwood, Morningside, West Hill]"
M1G,Scarborough,[Woburn]
M1H,Scarborough,[Cedarbrae]
...,...,...
M9N,York,[Weston]
M9P,Etobicoke,[Westmount]
M9R,Etobicoke,"[Kingsview Village, Martin Grove Gardens, Rich..."
M9V,Etobicoke,"[Albion Gardens, Beaumond Heights, Humbergate,..."


change the Neighbourhood column content from a list to a joined string

In [9]:
Neighbourhoods['Neighbourhood'] = Neighbourhoods['Neighbourhood'].apply(lambda x: ', '.join(x))
Neighbourhoods

Unnamed: 0_level_0,Borough,Neighbourhood
Postcode,Unnamed: 1_level_1,Unnamed: 2_level_1
M1B,Scarborough,"Rouge, Malvern"
M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
M1E,Scarborough,"Guildwood, Morningside, West Hill"
M1G,Scarborough,Woburn
M1H,Scarborough,Cedarbrae
...,...,...
M9N,York,Weston
M9P,Etobicoke,Westmount
M9R,Etobicoke,"Kingsview Village, Martin Grove Gardens, Richv..."
M9V,Etobicoke,"Albion Gardens, Beaumond Heights, Humbergate, ..."


If a cell has a borough but a 'Not assigned' Neighbourhood, then the Neighbourhood should be set equal the borough.

Check for incidences of Neighbourhood = 'Not assigned':

In [10]:
NA_indexes = Neighbourhoods.index[Neighbourhoods['Neighbourhood'] == 'Not assigned']
NA_indexes

Index(['M7A'], dtype='object', name='Postcode')

In [11]:
Neighbourhoods.loc[NA_indexes, :]

Unnamed: 0_level_0,Borough,Neighbourhood
Postcode,Unnamed: 1_level_1,Unnamed: 2_level_1
M7A,Queen's Park,Not assigned


One incidence was found and is being fixed manually below

In [12]:
Neighbourhoods.at['M7A', 'Neighbourhood'] = "Queen's Park"
Neighbourhoods.loc[['M7A'], :]

Unnamed: 0_level_0,Borough,Neighbourhood
Postcode,Unnamed: 1_level_1,Unnamed: 2_level_1
M7A,Queen's Park,Queen's Park


The requested dataframe have now been generated. However due to row indexing differences the following extract is made for easy comparision

In [13]:
Neighbourhoods.loc[['M5G', 'M2H', 'M4B', 'M1J', 'M4G', 'M4M', 'M1R', 'M9V', 'M9L', 'M5V', 'M1B', 'M5A'], :].reset_index()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M5G,Downtown Toronto,Central Bay Street
1,M2H,North York,Hillcrest Village
2,M4B,East York,"Woodbine Gardens, Parkview Hill"
3,M1J,Scarborough,Scarborough Village
4,M4G,East York,Leaside
5,M4M,East Toronto,Studio District
6,M1R,Scarborough,"Maryvale, Wexford"
7,M9V,Etobicoke,"Albion Gardens, Beaumond Heights, Humbergate, ..."
8,M9L,North York,Humber Summit
9,M5V,Downtown Toronto,"CN Tower, Bathurst Quay, Island airport, Harbo..."


as requested please find the shape of the dataframe below

In [14]:
Neighbourhoods.reset_index().shape

(103, 3)

## Part 2 - Add Geospatial Coordinates

Load the Geospatial Coordinates into a Pandas dataframe via the provided .csv file

In [15]:
df = pd.read_csv('Geospatial_Coordinates.csv')
df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


rename 'Postal Code' column so it matches

In [16]:
df.rename(columns={'Postal Code':'Postcode'}, inplace=True)

add the coordinates to the dataframe by joining the two tables

In [17]:
Neighbourhoods = Neighbourhoods.join(df.set_index('Postcode'))

The requested dataframe have now been generated. However due to row indexing differences the following extract is made for easy comparision

In [18]:
Neighbourhoods.loc[['M5G', 'M2H', 'M4B', 'M1J', 'M4G', 'M4M', 'M1R', 'M9V', 'M9L', 'M5V', 'M1B', 'M5A'], :].reset_index()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
1,M2H,North York,Hillcrest Village,43.803762,-79.363452
2,M4B,East York,"Woodbine Gardens, Parkview Hill",43.706397,-79.309937
3,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
4,M4G,East York,Leaside,43.70906,-79.363452
5,M4M,East Toronto,Studio District,43.659526,-79.340923
6,M1R,Scarborough,"Maryvale, Wexford",43.750072,-79.295849
7,M9V,Etobicoke,"Albion Gardens, Beaumond Heights, Humbergate, ...",43.739416,-79.588437
8,M9L,North York,Humber Summit,43.756303,-79.565963
9,M5V,Downtown Toronto,"CN Tower, Bathurst Quay, Island airport, Harbo...",43.628947,-79.39442


In [19]:
Neighbourhoods.reset_index(inplace=True)

## Part 3 - Cluster Analysis

Exploration and clustering of the neighbourhoods in Toronto. The analysis is by far a replicate of the NY analysis in the exercise module.

#### imports

In [20]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium # map rendering library
import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe
from sklearn.cluster import KMeans # import k-means from clustering stage
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

#### Use geopy library to get the latitude and longitude values of Toronto.

In [21]:
address = 'Toronto City, CA'

geolocator = Nominatim(user_agent="city_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.7189883, -79.44157.


#### Map of Toronto with Neighbourhoods superimposed on top

In [22]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, Neighbourhood in zip(Neighbourhoods['Latitude'], Neighbourhoods['Longitude'], Neighbourhoods['Borough'], Neighbourhoods['Neighbourhood']):
    label = '{}, {}'.format(Neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

For now all the neighbourhoods are considered.

In [23]:
#toronto_data = Neighbourhoods[Neighbourhoods['Borough'].str.contains("Toronto")].reset_index(drop=True)
#toronto_data.head()

#### Define Foursquare Credentials and Version

In [24]:
CLIENT_ID = 'QUYUMKRM1VQVFX1G0I5G5SJFJB4PHR3T1VQMHUW1COOD5UEJ' # your Foursquare ID
CLIENT_SECRET = 'KCN02PP1RIXEKIGXSHITULMECMZJ1Q3LDJVBICDX1L1GKPR5' # your Foursquare Secret
VERSION = '20190825' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: QUYUMKRM1VQVFX1G0I5G5SJFJB4PHR3T1VQMHUW1COOD5UEJ
CLIENT_SECRET:KCN02PP1RIXEKIGXSHITULMECMZJ1Q3LDJVBICDX1L1GKPR5


#### Get the top 100 venues that are within a radius of 500 meters

input parameters

In [25]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

defines the function that extracts the 100 venues for each neighbourhoods

In [26]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

excecute the above function on each Neighbourhood and create a new dataframe called *toronto_venues*.

In [27]:
toronto_venues = getNearbyVenues(names=Neighbourhoods['Neighbourhood'],
                                   latitudes=Neighbourhoods['Latitude'],
                                   longitudes=Neighbourhoods['Longitude']
                                  )

Rouge, Malvern
Highland Creek, Rouge Hill, Port Union
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
East Birchmount Park, Ionview, Kennedy Park
Clairlea, Golden Mile, Oakridge
Cliffcrest, Cliffside, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Scarborough Town Centre, Wexford Heights
Maryvale, Wexford
Agincourt
Clarks Corners, Sullivan, Tam O'Shanter
Agincourt North, L'Amoreaux East, Milliken, Steeles East
L'Amoreaux West
Upper Rouge
Hillcrest Village
Fairview, Henry Farm, Oriole
Bayview Village
Silver Hills, York Mills
Newtonbrook, Willowdale
Willowdale South
York Mills West
Willowdale West
Parkwoods
Don Mills North
Flemingdon Park, Don Mills South
Bathurst Manor, Downsview North, Wilson Heights
Northwood Park, York University
CFB Toronto, Downsview East
Downsview West
Downsview Central
Downsview Northwest
Victoria Village
Woodbine Gardens, Parkview Hill
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto
The Danforth West, 

#### Evaluate the dataframe content

In [28]:
print(toronto_venues.shape)
toronto_venues.head()

(2248, 7)


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Rouge, Malvern",43.806686,-79.194353,Wendy's,43.807448,-79.199056,Fast Food Restaurant
1,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,RIGHT WAY TO GOLF,43.785177,-79.161108,Golf Course
2,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
3,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Swiss Chalet Rotisserie & Grill,43.767697,-79.189914,Pizza Place
4,"Guildwood, Morningside, West Hill",43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store


check if all Neighbourhood are still represented

In [29]:
toronto_venues['Neighbourhood'].unique().shape

(100,)

Three neighbourhood are missing because they did not have any venues within the specified radius.

Let's check how many venues were returned for each Neighbourhood

In [30]:
sorted_count = toronto_venues.groupby('Neighbourhood').count().sort_values(by='Venue', ascending=False)
sorted_count

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide, King, Richmond",100,100,100,100,100,100
"Chinatown, Grange Park, Kensington Market",100,100,100,100,100,100
St. James Town,100,100,100,100,100,100
"Ryerson, Garden District",100,100,100,100,100,100
"First Canadian Place, Underground city",100,100,100,100,100,100
"Design Exchange, Toronto Dominion Centre",100,100,100,100,100,100
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
"Harbourfront East, Toronto Islands, Union Station",100,100,100,100,100,100
Stn A PO Boxes 25 The Esplanade,98,98,98,98,98,98
Church and Wellesley,86,86,86,86,86,86


For simplicity only neighbourhoods with at least 20 nearby venues are considered from now on.

In [31]:
# indexes to be kept / which have more than 20 venues
neighbourhood_indexes_to_keep = sorted_count[sorted_count['Venue'] > 20].index
neighbourhood_indexes_to_keep.shape

(29,)

In [32]:
# reduce the number of condidered neighbourhoods
toronto_venues = toronto_venues.set_index('Neighbourhood').loc[neighbourhood_indexes_to_keep].reset_index()
toronto_venues.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Adelaide, King, Richmond",43.650571,-79.384568,Four Seasons Centre for the Performing Arts,43.650592,-79.385806,Concert Hall
1,"Adelaide, King, Richmond",43.650571,-79.384568,The Keg Steakhouse & Bar,43.649937,-79.384196,Steakhouse
2,"Adelaide, King, Richmond",43.650571,-79.384568,Nathan Phillips Square,43.65227,-79.383516,Plaza
3,"Adelaide, King, Richmond",43.650571,-79.384568,Rosalinda,43.650252,-79.385156,Vegetarian / Vegan Restaurant
4,"Adelaide, King, Richmond",43.650571,-79.384568,Shangri-La Toronto,43.649129,-79.386557,Hotel


In [33]:
# reduce the number of condidered neighbourhoods
Neighbourhoods = Neighbourhoods.set_index('Neighbourhood').loc[neighbourhood_indexes_to_keep].sort_values(by='Postcode').reset_index()
Neighbourhoods.head()

Unnamed: 0,Neighbourhood,Postcode,Borough,Latitude,Longitude
0,"Fairview, Henry Farm, Oriole",M2J,North York,43.778517,-79.346556
1,Willowdale South,M2N,North York,43.77012,-79.408493
2,"Flemingdon Park, Don Mills South",M3C,North York,43.7259,-79.340923
3,Leaside,M4G,East York,43.70906,-79.363452
4,"The Danforth West, Riverdale",M4K,East Toronto,43.679557,-79.352188


#### Find out how many unique categories can be curated from all the returned venues

In [34]:
print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 220 uniques categories.


#### Analyze the Neighbourhoods

In [35]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add Neighbourhood column back to dataframe
toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood'] 

# move Neighbourhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Neighbourhood,Afghan Restaurant,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Stadium,Beach,Beer Bar,Beer Store,Belgian Restaurant,Bike Shop,Bistro,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Station,Butcher,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Auditorium,College Gym,College Rec Center,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Gastropub,Gay Bar,General Entertainment,General Travel,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Health & Beauty Service,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hostel,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indonesian Restaurant,Indoor Play Area,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Korean Restaurant,Lake,Latin American Restaurant,Lingerie Store,Liquor Store,Lounge,Mac & Cheese Joint,Market,Martial Arts Dojo,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Movie Theater,Museum,Music Store,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Park,Performing Arts Venue,Persian Restaurant,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Poke Place,Polish Restaurant,Portuguese Restaurant,Poutine Place,Pub,Ramen Restaurant,Record Shop,Rental Car Location,Restaurant,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stationery Store,Steakhouse,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,"Adelaide, King, Richmond",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Adelaide, King, Richmond",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Adelaide, King, Richmond",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Adelaide, King, Richmond",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
4,"Adelaide, King, Richmond",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [36]:
toronto_onehot.shape

(1757, 221)

#### Next, let's group rows by Neighbourhood and by taking the mean of the frequency of occurrence of each category

In [37]:
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighbourhood,Afghan Restaurant,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Stadium,Beach,Beer Bar,Beer Store,Belgian Restaurant,Bike Shop,Bistro,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Station,Butcher,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Auditorium,College Gym,College Rec Center,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Gastropub,Gay Bar,General Entertainment,General Travel,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Health & Beauty Service,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hostel,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indonesian Restaurant,Indoor Play Area,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Korean Restaurant,Lake,Latin American Restaurant,Lingerie Store,Liquor Store,Lounge,Mac & Cheese Joint,Market,Martial Arts Dojo,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Movie Theater,Museum,Music Store,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Park,Performing Arts Venue,Persian Restaurant,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Poke Place,Polish Restaurant,Portuguese Restaurant,Poutine Place,Pub,Ramen Restaurant,Record Shop,Rental Car Location,Restaurant,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stationery Store,Steakhouse,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,"Adelaide, King, Richmond",0.0,0.03,0.0,0.0,0.01,0.01,0.0,0.03,0.0,0.0,0.02,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.03,0.0,0.0,0.01,0.03,0.01,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.07,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0
1,"Bedford Park, Lawrence Manor East",0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Berczy Park,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.017544,0.017544,0.035088,0.0,0.0,0.0,0.0,0.017544,0.017544,0.035088,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.035088,0.0,0.0,0.0,0.0,0.035088,0.0,0.0,0.0,0.017544,0.052632,0.087719,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.017544,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.035088,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.035088,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035088,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.017544,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0
3,"Cabbagetown, St. James Town",0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040816,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.040816,0.0,0.0,0.0,0.020408,0.0,0.020408,0.0,0.0,0.0,0.0,0.061224,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.020408,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.020408,0.0,0.020408,0.0,0.0,0.0,0.040816,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.040816,0.020408,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040816,0.0,0.0,0.020408,0.020408,0.061224,0.020408,0.0,0.0,0.0,0.0,0.0,0.040816,0.0,0.0,0.0,0.040816,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.0,0.011765,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.011765,0.0,0.023529,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.0,0.023529,0.0,0.035294,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.035294,0.0,0.0,0.011765,0.0,0.141176,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.011765,0.0,0.011765,0.011765,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.011765,0.0,0.011765,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.047059,0.023529,0.0,0.0,0.0,0.0,0.047059,0.023529,0.0,0.0,0.0,0.011765,0.011765,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023529,0.011765,0.011765,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.011765,0.0,0.011765,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.0,0.023529,0.0,0.035294,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.023529,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.023529,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,0.011765,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.011765,0.0,0.0,0.011765


#### Let's confirm the new size

In [38]:
toronto_grouped.shape

(29, 221)

#### Let's print each Neighbourhood along with the top 5 most common venues

In [39]:
num_top_venues = 5

for hood in toronto_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
          venue  freq
0   Coffee Shop  0.07
1          Café  0.05
2    Steakhouse  0.04
3           Bar  0.04
4  Burger Joint  0.03


----Bedford Park, Lawrence Manor East----
                venue  freq
0         Coffee Shop  0.09
1  Italian Restaurant  0.09
2           Juice Bar  0.05
3      Sandwich Place  0.05
4    Sushi Restaurant  0.05


----Berczy Park----
            venue  freq
0     Coffee Shop  0.09
1    Cocktail Bar  0.05
2          Bakery  0.04
3     Cheese Shop  0.04
4  Farmers Market  0.04


----Cabbagetown, St. James Town----
         venue  freq
0  Pizza Place  0.06
1  Coffee Shop  0.06
2         Café  0.04
3         Park  0.04
4          Pub  0.04


----Central Bay Street----
                venue  freq
0         Coffee Shop  0.14
1                Café  0.06
2      Ice Cream Shop  0.05
3  Italian Restaurant  0.05
4      Sandwich Place  0.04


----Chinatown, Grange Park, Kensington Market----
                           venue  freq
0    

#### Let's put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [40]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each Neighbourhood.

In [41]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
Neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
Neighbourhoods_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    Neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

Neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Steakhouse,Bar,Hotel,American Restaurant,Burger Joint,Restaurant,Breakfast Spot,Cosmetics Shop
1,"Bedford Park, Lawrence Manor East",Italian Restaurant,Coffee Shop,Comfort Food Restaurant,Pharmacy,Café,Sandwich Place,Restaurant,Pub,Pizza Place,Cosmetics Shop
2,Berczy Park,Coffee Shop,Cocktail Bar,Cheese Shop,Italian Restaurant,Steakhouse,Seafood Restaurant,Beer Bar,Café,Bakery,Farmers Market
3,"Cabbagetown, St. James Town",Coffee Shop,Pizza Place,Restaurant,Pub,Italian Restaurant,Park,Café,Bakery,Grocery Store,Farmers Market
4,Central Bay Street,Coffee Shop,Café,Ice Cream Shop,Italian Restaurant,Burger Joint,Sandwich Place,Chinese Restaurant,Bar,Bubble Tea Shop,Salad Place


#### Cluster Neighbourhoods

Run *k*-means to cluster the Neighbourhood into 5 clusters.

In [42]:
# set number of clusters
kclusters = 3

toronto_grouped_clustering = toronto_grouped.drop('Neighbourhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([0, 0, 0, 0, 1, 2, 1, 0, 1, 0, 1, 0, 1, 2, 0, 0, 2, 1, 2, 1, 1, 0,
       1, 0, 0, 0, 0, 1, 1])

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighbourhood.

In [43]:
# drop Cluster Labels if the column already exist
try: 
    Neighbourhoods_venues_sorted.drop(columns='Cluster Labels', inplace=True)
except:
    pass

In [44]:
# add clustering labels
Neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [45]:
# merge toronto_grouped with toronto_data to add latitude/longitude for each Neighbourhood
toronto_merged = Neighbourhoods.join(Neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')
toronto_merged.head()

Unnamed: 0,Neighbourhood,Postcode,Borough,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Fairview, Henry Farm, Oriole",M2J,North York,43.778517,-79.346556,1,Clothing Store,Fast Food Restaurant,Coffee Shop,Women's Store,Bus Station,Sporting Goods Shop,Asian Restaurant,Bakery,Japanese Restaurant,Burrito Place
1,Willowdale South,M2N,North York,43.77012,-79.408493,1,Ramen Restaurant,Coffee Shop,Restaurant,Café,Sandwich Place,Japanese Restaurant,Sushi Restaurant,Pizza Place,Ice Cream Shop,Middle Eastern Restaurant
2,"Flemingdon Park, Don Mills South",M3C,North York,43.7259,-79.340923,1,Gym,Beer Store,Asian Restaurant,Coffee Shop,Discount Store,Sandwich Place,Bike Shop,Restaurant,Clothing Store,Fast Food Restaurant
3,Leaside,M4G,East York,43.70906,-79.363452,1,Coffee Shop,Sporting Goods Shop,Burger Joint,Furniture / Home Store,Grocery Store,Shopping Mall,Food & Drink Shop,Sports Bar,Breakfast Spot,Brewery
4,"The Danforth West, Riverdale",M4K,East Toronto,43.679557,-79.352188,1,Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Furniture / Home Store,Yoga Studio,Pub,Spa,Brewery,Bubble Tea Shop


In [46]:
#toronto_merged[toronto_merged['Cluster Labels'].isnull()]

Finally, let's visualize the resulting clusters

In [47]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

the clusters appear to have a very distinct geographical pattern

#### Examine the Clusters

Finally, each of the clusters are examined and the different categories are attempted described.

#### Cluster 1

In [48]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Postcode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,M4M,0,Café,Coffee Shop,American Restaurant,Italian Restaurant,Bakery,Yoga Studio,Coworking Space,Seafood Restaurant,Sandwich Place,Cheese Shop
8,M4X,0,Coffee Shop,Pizza Place,Restaurant,Pub,Italian Restaurant,Park,Café,Bakery,Grocery Store,Farmers Market
10,M5A,0,Coffee Shop,Café,Pub,Park,Bakery,Gym / Fitness Center,Mexican Restaurant,Breakfast Spot,Theater,Event Space
12,M5C,0,Coffee Shop,Restaurant,Café,Hotel,Italian Restaurant,Beer Bar,Cosmetics Shop,Gastropub,Cocktail Bar,Clothing Store
13,M5E,0,Coffee Shop,Cocktail Bar,Cheese Shop,Italian Restaurant,Steakhouse,Seafood Restaurant,Beer Bar,Café,Bakery,Farmers Market
15,M5H,0,Coffee Shop,Café,Steakhouse,Bar,Hotel,American Restaurant,Burger Joint,Restaurant,Breakfast Spot,Cosmetics Shop
16,M5J,0,Coffee Shop,Aquarium,Hotel,Italian Restaurant,Café,Bakery,Pizza Place,Fried Chicken Joint,Sporting Goods Shop,Scenic Lookout
17,M5K,0,Coffee Shop,Hotel,Café,Restaurant,Italian Restaurant,Bar,Deli / Bodega,Gastropub,American Restaurant,Gym
18,M5L,0,Coffee Shop,Café,Hotel,Restaurant,American Restaurant,Gym,Seafood Restaurant,Deli / Bodega,Steakhouse,Bakery
19,M5M,0,Italian Restaurant,Coffee Shop,Comfort Food Restaurant,Pharmacy,Café,Sandwich Place,Restaurant,Pub,Pizza Place,Cosmetics Shop


Cluster 1 looks like it's an area with high density of Coffee shops and cafes.

#### Cluster 2

In [49]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Postcode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M2J,1,Clothing Store,Fast Food Restaurant,Coffee Shop,Women's Store,Bus Station,Sporting Goods Shop,Asian Restaurant,Bakery,Japanese Restaurant,Burrito Place
1,M2N,1,Ramen Restaurant,Coffee Shop,Restaurant,Café,Sandwich Place,Japanese Restaurant,Sushi Restaurant,Pizza Place,Ice Cream Shop,Middle Eastern Restaurant
2,M3C,1,Gym,Beer Store,Asian Restaurant,Coffee Shop,Discount Store,Sandwich Place,Bike Shop,Restaurant,Clothing Store,Fast Food Restaurant
3,M4G,1,Coffee Shop,Sporting Goods Shop,Burger Joint,Furniture / Home Store,Grocery Store,Shopping Mall,Food & Drink Shop,Sports Bar,Breakfast Spot,Brewery
4,M4K,1,Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Furniture / Home Store,Yoga Studio,Pub,Spa,Brewery,Bubble Tea Shop
6,M4R,1,Coffee Shop,Sporting Goods Shop,Clothing Store,Yoga Studio,Mexican Restaurant,Park,Spa,Café,Salon / Barbershop,Chinese Restaurant
7,M4S,1,Dessert Shop,Sandwich Place,Pizza Place,Restaurant,Sushi Restaurant,Thai Restaurant,Café,Italian Restaurant,Coffee Shop,Pharmacy
9,M4Y,1,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Gay Bar,Restaurant,Hotel,Fast Food Restaurant,Burger Joint,Café,Pub
11,M5B,1,Coffee Shop,Clothing Store,Cosmetics Shop,Café,Middle Eastern Restaurant,Fast Food Restaurant,Pizza Place,Bookstore,Plaza,Italian Restaurant
14,M5G,1,Coffee Shop,Café,Ice Cream Shop,Italian Restaurant,Burger Joint,Sandwich Place,Chinese Restaurant,Bar,Bubble Tea Shop,Salad Place


Cluster 2 looks like an area with more clothing stores

#### Cluster 3

In [50]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Postcode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,M5S,2,Café,Restaurant,Bookstore,Japanese Restaurant,Bakery,Bar,College Arts Building,Sandwich Place,Chinese Restaurant,Pub
22,M5T,2,Café,Vegetarian / Vegan Restaurant,Chinese Restaurant,Bar,Bakery,Vietnamese Restaurant,Dumpling Restaurant,Mexican Restaurant,Coffee Shop,Cocktail Bar
25,M6J,2,Bar,Coffee Shop,Asian Restaurant,Cocktail Bar,Restaurant,Bakery,Pizza Place,Men's Store,French Restaurant,Café
26,M6P,2,Café,Mexican Restaurant,Bar,Fast Food Restaurant,Flea Market,Speakeasy,Bowling Alley,Bookstore,Fried Chicken Joint,Furniture / Home Store


Cluster 3 looks like an area with a lot of bars and cafes.

End of assignment