# Segmenting and Clustering Neighborhoods in Toronto

## Task 1 Transform the Wikipedia page table into a data frame and clean it

**1. Scrape the following Wikipedia page**

In [1]:
## install libraries and packages
import pandas as pd
%pip install lxml

Note: you may need to restart the kernel to use updated packages.


In [2]:
## Read HTML table into a list of DataFrame objects
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
df_list=pd.read_html(url) 
df_list

[    Postal Code           Borough  \
 0           M1A      Not assigned   
 1           M2A      Not assigned   
 2           M3A        North York   
 3           M4A        North York   
 4           M5A  Downtown Toronto   
 ..          ...               ...   
 175         M5Z      Not assigned   
 176         M6Z      Not assigned   
 177         M7Z      Not assigned   
 178         M8Z         Etobicoke   
 179         M9Z      Not assigned   
 
                                          Neighbourhood  
 0                                         Not assigned  
 1                                         Not assigned  
 2                                            Parkwoods  
 3                                     Victoria Village  
 4                            Regent Park, Harbourfront  
 ..                                                 ...  
 175                                       Not assigned  
 176                                       Not assigned  
 177                

In [3]:
## select the target table
df=df_list[0]
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


We are required to set the neighbourhood equal to the borough when a cell has an assigned borough but a Not assigned neighbourhood. However, the following lines of code show that whenever a neighbourhood is Not assigned, the corresponding borough is also Not assigned, so no such cell exists.

In [4]:
## select the Boroughs where the neighbourhood is not assigned and store them in df1
df1=df[df['Neighbourhood']=='Not assigned']
df1.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
7,M8A,Not assigned,Not assigned
10,M2B,Not assigned,Not assigned
15,M7B,Not assigned,Not assigned


In [5]:
## select assigned boroughs that have a not assigned neighbourhood
df2=df1[df1['Borough']!='Not assigned']
df2.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
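
The check above returns no rows, so no fallback is needed for this table. If the page ever did list an assigned borough with an unassigned neighbourhood, the requirement could be met roughly as in the following minimal sketch. The three-row frame and the name `toy` are made up for illustration and are not taken from the Wikipedia table.

```python
import pandas as pd

# Hypothetical rows, not from the Wikipedia page: the second one has a borough but no neighbourhood
toy = pd.DataFrame({
    "Postal Code": ["M3A", "M7A", "M1A"],
    "Borough": ["North York", "Queen's Park", "Not assigned"],
    "Neighbourhood": ["Parkwoods", "Not assigned", "Not assigned"],
})

# Rows with an assigned borough but an unassigned neighbourhood
needs_fill = (toy["Neighbourhood"] == "Not assigned") & (toy["Borough"] != "Not assigned")

# Copy the borough name into the neighbourhood for those rows only
toy.loc[needs_fill, "Neighbourhood"] = toy.loc[needs_fill, "Borough"]
print(toy)
```

Rows where both columns are Not assigned are left alone, since they are dropped in the next step anyway.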


In [6]:
## Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned
## Read HTML tables into a list of DataFrame objects and convert Not assigned to NaN
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
df_list=pd.read_html(url,na_values=["Not assigned"]) 
df_list

[    Postal Code           Borough  \
 0           M1A               NaN   
 1           M2A               NaN   
 2           M3A        North York   
 3           M4A        North York   
 4           M5A  Downtown Toronto   
 ..          ...               ...   
 175         M5Z               NaN   
 176         M6Z               NaN   
 177         M7Z               NaN   
 178         M8Z         Etobicoke   
 179         M9Z               NaN   
 
                                          Neighbourhood  
 0                                                  NaN  
 1                                                  NaN  
 2                                            Parkwoods  
 3                                     Victoria Village  
 4                            Regent Park, Harbourfront  
 ..                                                 ...  
 175                                                NaN  
 176                                                NaN  
 177                

**2. Transform the data into a pandas dataframe which consists of three columns: PostalCode, Borough and Neighborhood**

In [7]:
## select the target table
df=df_list[0]
df

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,,
1,M2A,,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
...,...,...,...
175,M5Z,,
176,M6Z,,
177,M7Z,,
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."


**3. Ignore cells with a borough that is Not assigned. Reset the index and group the data by PostalCode so that two neighborhoods with the same PostalCode are combined into one row, with the neighborhoods separated by a comma.**

In [8]:
df.dropna(subset=["Borough"],axis=0, inplace=True)

In [9]:
df.reset_index(drop=True,inplace=True)

In [10]:
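# groupby alone changes nothing; aggregate so neighbourhoods sharing a postal code are joined with commas
df=df.groupby(["Postal Code","Borough"],sort=False)["Neighbourhood"].agg(", ".join).reset_index()
df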

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


**4. Use the .shape attribute to print the number of rows of the data frame**

In [11]:
print('Data Frame has {} rows.'.format(df.shape[0]))

Data Frame has 103 rows.


## Task 2 Get the Latitude and Longitude of each Postal Code and add them to the dataframe

I tried the Geocoder Python package, calling it in a while loop many times to get the coordinates of each postal code, but it did not work. I decided to use the CSV file at http://cocl.us/Geospatial_data to get the geographical coordinates instead.

In [12]:
data_url='http://cocl.us/Geospatial_data'
df_Coor=pd.read_csv(data_url)
df_Coor.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [13]:
df_merge=pd.merge(df,df_Coor,on='Postal Code')
df_merge.head(11)

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
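
`pd.merge` defaults to an inner join, so a postal code missing from the coordinates file would silently drop out of `df_merge`. One hedged way to check for that is a left join with `indicator=True`. The two small frames below are invented stand-ins for `df` and `df_Coor`, and `M0X` is a made-up code that deliberately has no coordinates.

```python
import pandas as pd

# Invented stand-ins for df and df_Coor (M0X is a made-up code with no coordinates)
left = pd.DataFrame({"Postal Code": ["M3A", "M4A", "M0X"],
                     "Borough": ["North York", "North York", "Example Borough"]})
right = pd.DataFrame({"Postal Code": ["M3A", "M4A"],
                      "Latitude": [43.753259, 43.725882],
                      "Longitude": [-79.329656, -79.315572]})

# A left join keeps every row of `left`; `_merge` records where each row was found
checked = pd.merge(left, right, on="Postal Code", how="left", indicator=True)

# Postal codes that did not find coordinates
unmatched = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```

An empty `unmatched` list would mean the inner join used above loses no rows.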


## Task 3 Explore and cluster the neighborhoods in Toronto

I will segment and cluster only the boroughs whose names contain the word Toronto.

In [14]:
# select data of Boroughs that contain the word Toronto
df_Toronto=df_merge[df_merge['Borough'].str.contains("Toronto")]
df_Toronto.head()


Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
15,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
19,M4E,East Toronto,The Beaches,43.676357,-79.293031


In [15]:
# count the postal-code rows whose Borough contains the word Toronto
print('There are {} postal codes in Boroughs that contain the word Toronto.'.format(df_Toronto.shape[0])) 

There are 39 postal codes in Boroughs that contain the word Toronto.


In [16]:
# reindex the data frame (reassign rather than modify the filtered frame in place)
df_Toronto=df_Toronto.reset_index(drop=True)
df_Toronto.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M4E,East Toronto,The Beaches,43.676357,-79.293031


Get the geographical coordinates of Toronto

In [17]:
## import libraries and packages
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into Latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0; older versions used pandas.io.json)

# import Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print('Libraries imported')

In [18]:
address='Toronto, Canada'
geolocator=Nominatim(user_agent="ca_explorer")
location=geolocator.geocode(address)
latitude=location.latitude
longitude=location.longitude
print('The geographical coordinates of Toronto are {},{}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817,-79.3839347.


Visualize the neighbourhoods of the boroughs whose names contain the word Toronto.

In [19]:
# create map of Toronto using latitude and longitude values
map_toronto=folium.Map(location=[latitude, longitude],zoom_start=11)

# add markers to map
for lat, lng, label in zip(df_Toronto['Latitude'], df_Toronto['Longitude'], df_Toronto['Neighbourhood']):
    label=folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat,lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)

map_toronto

Utilizing the Foursquare API to explore the neighbourhoods and segment them

In [20]:
CLIENT_ID='PHXA1F5GG3YMULB2ZTRIFNGFAX2Y1Y4LDTCJSXN0ROJO30S0'
CLIENT_SECRET='0LLYYF3HXS4DDBKK2KCHKPB33IKCRJUGSZZBC2TXKCXDGJDY'
VERSION='20180605'
LIMIT=10
print('Your credentials:')
print('CLIENT_ID:' + CLIENT_ID)
print('CLIENT_SECRET:'+CLIENT_SECRET)

Your credentials:
CLIENT_ID:PHXA1F5GG3YMULB2ZTRIFNGFAX2Y1Y4LDTCJSXN0ROJO30S0
CLIENT_SECRET:0LLYYF3HXS4DDBKK2KCHKPB33IKCRJUGSZZBC2TXKCXDGJDY
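
Hard-coding and printing API credentials makes them easy to leak when a notebook is shared. One common alternative, sketched here, is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_credentials` and `mask` are assumptions for illustration, not names defined by Foursquare or used elsewhere in this notebook.

```python
import os

def load_credentials(env=os.environ):
    """Return (client_id, client_secret) from a mapping of environment variables.

    The variable names are hypothetical; pick whatever names suit your setup.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("Missing environment variable: {}".format(missing)) from None

def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)

# Demo with a plain dict so the sketch runs without real credentials
demo_env = {"FOURSQUARE_CLIENT_ID": "demo-id-123", "FOURSQUARE_CLIENT_SECRET": "demo-secret-456"}
client_id, client_secret = load_credentials(demo_env)
print("CLIENT_ID:", mask(client_id))
```

In the notebook itself, calling `load_credentials()` with no argument would read from the real environment.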


In [21]:
def getNearbyVenues(borough, neighbourhood, latitudes, longitudes, radius=500):
    venues_list=[]
    # borough and neighbourhood are sequences; distinct loop names avoid shadowing them
    for boro, hood, lat, lng in zip(borough, neighbourhood, latitudes, longitudes):
        print(boro, hood)

        # create the API request URL
        url='https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request and keep only the list of recommended items
        results=requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            boro,
            hood,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighbourhood lists into a single data frame
    nearby_venues=pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns=['Borough',
                           'Neighbourhood',
                           'Neighbourhood Latitude',
                           'Neighbourhood Longitude',
                           'Venue',
                           'Venue Latitude',
                           'Venue Longitude',
                           'Venue Category']
    return nearby_venues
        

    

In [22]:
Toronto_venues=getNearbyVenues(borough=df_Toronto['Borough'],
                               neighbourhood=df_Toronto['Neighbourhood'],
                               latitudes=df_Toronto['Latitude'],
                               longitudes=df_Toronto['Longitude'])
Toronto_venues.head()

Downtown Toronto Regent Park, Harbourfront
Downtown Toronto Queen's Park, Ontario Provincial Government
Downtown Toronto Garden District, Ryerson
Downtown Toronto St. James Town
East Toronto The Beaches
Downtown Toronto Berczy Park
Downtown Toronto Central Bay Street
Downtown Toronto Christie
Downtown Toronto Richmond, Adelaide, King
West Toronto Dufferin, Dovercourt Village
Downtown Toronto Harbourfront East, Union Station, Toronto Islands
West Toronto Little Portugal, Trinity
East Toronto The Danforth West, Riverdale
Downtown Toronto Toronto Dominion Centre, Design Exchange
West Toronto Brockton, Parkdale Village, Exhibition Place
East Toronto India Bazaar, The Beaches West
Downtown Toronto Commerce Court, Victoria Hotel
East Toronto Studio District
Central Toronto Lawrence Park
Central Toronto Roselawn
Central Toronto Davisville North
Central Toronto Forest Hill North & West, Forest Hill Road Park
West Toronto High Park, The Junction South
Central Toronto North Toronto West, Lawrenc

Unnamed: 0,Borough,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant
4,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
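
`getNearbyVenues` indexes straight into the JSON response, so a response without `groups`, or a venue without categories, would raise an exception part-way through the loop. The sketch below shows one defensive way to pull the same fields out of a response dictionary. The helper name `extract_venues` and the sample payload are made up; the payload only mirrors the keys the function already reads (`response`, `groups`, `items`, `venue`, `location`, `categories`) and is not a real Foursquare reply.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style response dict."""
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []  # nothing recommended, or an error response

    rows = []
    for item in groups[0].get("items", []):
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        # Fall back to None when a venue has no category instead of raising IndexError
        category = categories[0]["name"] if categories else None
        rows.append((venue.get("name"), location.get("lat"), location.get("lng"), category))
    return rows

# Made-up payload for illustration only
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Cafe", "location": {"lat": 43.65, "lng": -79.36},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Uncategorised Place", "location": {"lat": 43.66, "lng": -79.37},
               "categories": []}},
]}]}}

print(extract_venues(sample))
print(extract_venues({"response": {}}))
```

Returning an empty list for a neighbourhood with no results lets the loop continue, at the cost of that neighbourhood being absent from the final data frame.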


Check the size of the DataFrame Toronto_venues

In [23]:
print(Toronto_venues.shape)
Toronto_venues.head()

(348, 8)


Unnamed: 0,Borough,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant
4,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa


Check the number of venues that were returned for each neighbourhood.

In [25]:
Toronto_venues.groupby(['Borough','Neighbourhood']).count()

Unnamed: 0_level_0,Unnamed: 1_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Borough,Neighbourhood,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Central Toronto,Davisville,10,10,10,10,10,10
Central Toronto,Davisville North,9,9,9,9,9,9
Central Toronto,"Forest Hill North & West, Forest Hill Road Park",4,4,4,4,4,4
Central Toronto,Lawrence Park,3,3,3,3,3,3
Central Toronto,"Moore Park, Summerhill East",2,2,2,2,2,2
Central Toronto,"North Toronto West, Lawrence Park",10,10,10,10,10,10
Central Toronto,Roselawn,2,2,2,2,2,2
Central Toronto,"Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park",10,10,10,10,10,10
Central Toronto,"The Annex, North Midtown, Yorkville",10,10,10,10,10,10
Downtown Toronto,Berczy Park,10,10,10,10,10,10


Find out how many unique categories can be curated from all the returned venues

In [26]:
print('There are {} unique categories.'.format(len(Toronto_venues['Venue Category'].unique())))

There are 120 unique categories.


One hot encoding

In [49]:
# one hot encoding
Toronto_onehot=pd.get_dummies(Toronto_venues[['Venue Category']],prefix="",prefix_sep="")
Toronto_onehot.head()

Unnamed: 0,Airport,Airport Food Court,Airport Gate,Airport Lounge,American Restaurant,Antique Shop,Arts & Crafts Store,Asian Restaurant,Auto Workshop,BBQ Joint,Bakery,Bank,Bar,Beer Bar,Beer Store,Bookstore,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Line,Café,Candy Store,Caribbean Restaurant,Chinese Restaurant,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comic Shop,Concert Hall,Cosmetics Shop,Creperie,Cuban Restaurant,Dance Studio,Department Store,Dessert Shop,Diner,Distribution Center,Dog Run,Donut Shop,Eastern European Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Food & Drink Shop,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Garden,Garden Center,Gastropub,Gay Bar,General Entertainment,Gift Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health Food Store,Historic Site,History Museum,Hotel,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Jewelry Store,Korean Restaurant,Lake,Liquor Store,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Movie Theater,Museum,Music Venue,Neighborhood,Organic Grocery,Park,Performing Arts Venue,Pet Store,Pizza Place,Plane,Playground,Plaza,Portuguese Restaurant,Pub,Ramen Restaurant,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Skate Park,Skating Rink,Spa,Speakeasy,Sporting Goods Shop,Steakhouse,Supermarket,Sushi Restaurant,Swim School,Tailor Shop,Thai Restaurant,Theater,Theme Restaurant,Trail,Vegetarian / Vegan Restaurant,Wine Bar,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [50]:
# add the borough and neighbourhood columns back to the dataframe
Toronto_onehot['Borough']=Toronto_venues['Borough']
Toronto_onehot['Neighbourhood']=Toronto_venues['Neighbourhood']

# move the borough and neighbourhood columns to the front
fixed_columns=[Toronto_onehot.columns[-2]]+[Toronto_onehot.columns[-1]]+list(Toronto_onehot.columns[:-2])
Toronto_onehot=Toronto_onehot[fixed_columns]
Toronto_onehot.head()

Unnamed: 0,Borough,Neighbourhood,Airport,Airport Food Court,Airport Gate,Airport Lounge,American Restaurant,Antique Shop,Arts & Crafts Store,Asian Restaurant,Auto Workshop,BBQ Joint,Bakery,Bank,Bar,Beer Bar,Beer Store,Bookstore,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Line,Café,Candy Store,Caribbean Restaurant,Chinese Restaurant,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comic Shop,Concert Hall,Cosmetics Shop,Creperie,Cuban Restaurant,Dance Studio,Department Store,Dessert Shop,Diner,Distribution Center,Dog Run,Donut Shop,Eastern European Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Food & Drink Shop,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Garden,Garden Center,Gastropub,Gay Bar,General Entertainment,Gift Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health Food Store,Historic Site,History Museum,Hotel,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Jewelry Store,Korean Restaurant,Lake,Liquor Store,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Movie Theater,Museum,Music Venue,Neighborhood,Organic Grocery,Park,Performing Arts Venue,Pet Store,Pizza Place,Plane,Playground,Plaza,Portuguese Restaurant,Pub,Ramen Restaurant,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Skate Park,Skating Rink,Spa,Speakeasy,Sporting Goods Shop,Steakhouse,Supermarket,Sushi Restaurant,Swim School,Tailor Shop,Thai Restaurant,Theater,Theme Restaurant,Trail,Vegetarian / Vegan Restaurant,Wine Bar,Yoga Studio
0,Downtown Toronto,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Downtown Toronto,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Downtown Toronto,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Downtown Toronto,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Downtown Toronto,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Examine the new dataframe size.

In [56]:
Toronto_onehot.shape

(348, 122)

In [57]:
Toronto_grouped=Toronto_onehot.groupby(['Borough','Neighbourhood'])[Toronto_onehot.columns[2:]].mean().reset_index()
Toronto_grouped.head(15)

Unnamed: 0,Borough,Neighbourhood,Airport,Airport Food Court,Airport Gate,Airport Lounge,American Restaurant,Antique Shop,Arts & Crafts Store,Asian Restaurant,Auto Workshop,BBQ Joint,Bakery,Bank,Bar,Beer Bar,Beer Store,Bookstore,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Line,Café,Candy Store,Caribbean Restaurant,Chinese Restaurant,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comic Shop,Concert Hall,Cosmetics Shop,Creperie,Cuban Restaurant,Dance Studio,Department Store,Dessert Shop,Diner,Distribution Center,Dog Run,Donut Shop,Eastern European Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Food & Drink Shop,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Garden,Garden Center,Gastropub,Gay Bar,General Entertainment,Gift Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health Food Store,Historic Site,History Museum,Hotel,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Jewelry Store,Korean Restaurant,Lake,Liquor Store,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Movie Theater,Museum,Music Venue,Neighborhood,Organic Grocery,Park,Performing Arts Venue,Pet Store,Pizza Place,Plane,Playground,Plaza,Portuguese Restaurant,Pub,Ramen Restaurant,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Skate Park,Skating Rink,Spa,Speakeasy,Sporting Goods Shop,Steakhouse,Supermarket,Sushi Restaurant,Swim School,Tailor Shop,Thai Restaurant,Theater,Theme Restaurant,Trail,Vegetarian / Vegan Restaurant,Wine Bar,Yoga Studio
0,Central Toronto,Davisville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Central Toronto,Davisville North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.111111,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Central Toronto,"Forest Hill North & West, Forest Hill Road Park",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0
3,Central Toronto,Lawrence Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Toronto,"Moore Park, Summerhill East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0
5,Central Toronto,"North Toronto West, Lawrence Park",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.1,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.1,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1
6,Central Toronto,Roselawn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Central Toronto,"Summerhill West, Rathnelly, South Hill, Forest...",0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.1,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Central Toronto,"The Annex, North Midtown, Yorkville",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Downtown Toronto,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.1,0.0,0.0
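
Because each one-hot row has a single 1 in the column of its venue category, the per-neighbourhood mean of those columns is the share of that neighbourhood's venues in each category. A small sketch with invented venues (the neighbourhood names `A` and `B` are placeholders) shows the arithmetic:

```python
import pandas as pd

# Invented venues: neighbourhood A has 2 cafés and 2 parks, B has 1 park
toy = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "A", "B"],
    "Venue Category": ["Café", "Café", "Park", "Park", "Park"],
})

# One indicator column per category, as in the one-hot cell above
onehot = pd.get_dummies(toy[["Venue Category"]], prefix="", prefix_sep="").astype(float)
onehot.insert(0, "Neighbourhood", toy["Neighbourhood"])

# Mean of the indicators = frequency of each category within a neighbourhood
grouped = onehot.groupby("Neighbourhood").mean().reset_index()
print(grouped)
```

Each row of `grouped` sums to 1 across the category columns, which is why values such as 0.1 or 0.111111 appear for neighbourhoods with 10 or 9 returned venues.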


Confirm the new size of Toronto_grouped.

In [58]:
Toronto_grouped.shape

Print each borough and neighbourhood along with the top 5 most common venues.

In [77]:
num_top_venues=5

for hood in Toronto_grouped['Neighbourhood']:
    print("----"+hood+"----")
    # transpose the neighbourhood's row so that each venue category becomes a row
    temp=Toronto_grouped[Toronto_grouped['Neighbourhood']==hood].T.reset_index()
    temp.columns=['venue','freq']
    # drop the Borough and Neighbourhood label rows
    temp=temp.iloc[2:]
    temp=temp.reset_index(drop=True)
    temp['freq']=temp['freq'].astype(float)
    temp=temp.round({'freq':2})
    print(temp.sort_values('freq',ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')


----Davisville----
              venue  freq
0      Dessert Shop   0.2
1       Coffee Shop   0.1
2  Sushi Restaurant   0.1
3              Park   0.1
4              Café   0.1


----Davisville North----
               venue  freq
0               Park  0.11
1   Department Store  0.11
2     Sandwich Place  0.11
3  Food & Drink Shop  0.11
4            Dog Run  0.11


----Forest Hill North & West, Forest Hill Road Park----
                 venue  freq
0                 Park  0.25
1                Trail  0.25
2     Sushi Restaurant  0.25
3        Jewelry Store  0.25
4  Japanese Restaurant  0.00


----Lawrence Park----
           venue  freq
0           Park  0.33
1    Swim School  0.33
2       Bus Line  0.33
3        Airport  0.00
4  Jewelry Store  0.00


----Moore Park, Summerhill East----
                 venue  freq
0                Trail   0.5
1           Playground   0.5
2              Airport   0.0
3  Japanese Restaurant   0.0
4      Organic Grocery   0.0


----North Toronto West, Lawr

The next step is to put this data into a pandas DataFrame.


In [78]:
## sort the venues in descending order.

def return_most_common_venues(row, num_top_venues):
    row_categories=row.iloc[2:]
    row_categories_sorted=row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]
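
To make the behaviour of this helper concrete, here is a minimal sketch that applies it to a single hand-made row. The row `toy_row` and its frequencies are hypothetical and are not taken from the venue data; only the function body matches the cell above.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # skip the first two entries (Borough, Neighbourhood), then rank by frequency
    row_categories = row.iloc[2:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# hypothetical one-row example (assumed values, for illustration only)
toy_row = pd.Series({
    'Borough': 'Central Toronto',
    'Neighbourhood': 'Example',
    'Café': 0.1,
    'Park': 0.5,
    'Trail': 0.3,
    'Airport': 0.0,
})

top3 = return_most_common_venues(toy_row, 3)
print(list(top3))
```

Note that the function always returns `num_top_venues` names. When a neighbourhood has fewer nonzero categories than that, as with Lawrence Park above where only three categories are nonzero, the remaining slots are filled with zero-frequency categories in no meaningful order.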

In [84]:
num_top_venues=10

indicators=['st','nd','rd']

# create columns according to number of top venues
columns=['Borough','Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))
        
# create a new dataframe
neighbourhoods_venues_sorted=pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Borough']=Toronto_grouped['Borough']
neighbourhoods_venues_sorted['Neighbourhood']=Toronto_grouped['Neighbourhood']

for ind in np.arange(Toronto_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 2:]=return_most_common_venues(Toronto_grouped.iloc[ind,:], num_top_venues)
    
neighbourhoods_venues_sorted.head()


Unnamed: 0,Borough,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central Toronto,Davisville,Dessert Shop,Park,Sushi Restaurant,Pizza Place,Café,Italian Restaurant,Indian Restaurant,Seafood Restaurant,Coffee Shop,Fish Market
1,Central Toronto,Davisville North,Park,Food & Drink Shop,Dog Run,Hotel,Department Store,Gym / Fitness Center,Sandwich Place,Dance Studio,Breakfast Spot,Cuban Restaurant
2,Central Toronto,"Forest Hill North & West, Forest Hill Road Park",Park,Sushi Restaurant,Trail,Jewelry Store,Distribution Center,Diner,Dessert Shop,Department Store,Dance Studio,Cuban Restaurant
3,Central Toronto,Lawrence Park,Park,Swim School,Bus Line,Donut Shop,Coffee Shop,Comic Shop,Concert Hall,Cosmetics Shop,Creperie,Cuban Restaurant
4,Central Toronto,"Moore Park, Summerhill East",Trail,Playground,Yoga Studio,Eastern European Restaurant,Comic Shop,Concert Hall,Cosmetics Shop,Creperie,Cuban Restaurant,Dance Studio
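
The loop above fills the table one row at a time. As an alternative sketch, the same ranking can be computed for all rows at once with `numpy.argsort`. The frame `toy_grouped`, its values, and the `Top n Venue` column labels are hypothetical stand-ins for illustration, not the notebook's data.

```python
import numpy as np
import pandas as pd

# hypothetical stand-in for Toronto_grouped (assumed values)
toy_grouped = pd.DataFrame({
    'Borough': ['B1', 'B2'],
    'Neighbourhood': ['N1', 'N2'],
    'Café': [0.2, 0.0],
    'Park': [0.5, 0.1],
    'Trail': [0.3, 0.9],
})
num_top = 2

# keep only the venue-frequency columns
freqs = toy_grouped.drop(columns=['Borough', 'Neighbourhood'])

# argsort on the negated values ranks categories from most to least frequent;
# a stable sort keeps ties in their original column order
order = np.argsort(-freqs.to_numpy(), axis=1, kind='stable')[:, :num_top]
top_names = freqs.columns.to_numpy()[order]

labels = ['Top {} Venue'.format(i + 1) for i in range(num_top)]
toy_sorted = pd.concat(
    [toy_grouped[['Borough', 'Neighbourhood']],
     pd.DataFrame(top_names, columns=labels, index=toy_grouped.index)],
    axis=1,
)
print(toy_sorted)
```

With a stable sort, tied categories keep their column order, which makes repeated runs easier to compare.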


**Cluster Neighborhoods**
Run k-means to cluster the neighbourhoods into 5 clusters.

In [85]:
## set number of clusters
kclusters=5

# drop the text columns so that only the numeric venue frequencies remain
Toronto_grouped_clustering=Toronto_grouped.drop(['Borough','Neighbourhood'],axis=1)
Toronto_grouped_clustering.head()

Unnamed: 0,Airport,Airport Food Court,Airport Gate,Airport Lounge,American Restaurant,Antique Shop,Arts & Crafts Store,Asian Restaurant,Auto Workshop,BBQ Joint,Bakery,Bank,Bar,Beer Bar,Beer Store,Bookstore,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Line,Café,Candy Store,Caribbean Restaurant,Chinese Restaurant,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comic Shop,Concert Hall,Cosmetics Shop,Creperie,Cuban Restaurant,Dance Studio,Department Store,Dessert Shop,Diner,Distribution Center,Dog Run,Donut Shop,Eastern European Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Food & Drink Shop,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Garden,Garden Center,Gastropub,Gay Bar,General Entertainment,Gift Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health Food Store,Historic Site,History Museum,Hotel,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Jewelry Store,Korean Restaurant,Lake,Liquor Store,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Movie Theater,Museum,Music Venue,Neighborhood,Organic Grocery,Park,Performing Arts Venue,Pet Store,Pizza Place,Plane,Playground,Plaza,Portuguese Restaurant,Pub,Ramen Restaurant,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Skate Park,Skating Rink,Spa,Speakeasy,Sporting Goods Shop,Steakhouse,Supermarket,Sushi Restaurant,Swim School,Tailor Shop,Thai Restaurant,Theater,Theme Restaurant,Trail,Vegetarian / Vegan Restaurant,Wine Bar,Yoga Studio
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.111111,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0


In [95]:
## run k-means clustering
kmeans=KMeans(n_clusters=kclusters, random_state=0).fit(Toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 3, 3, 3, 0, 1, 4, 1, 2, 3], dtype=int32)
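
The numbers in `labels_` are identifiers only: rows with the same label belong to the same cluster, but the numbering itself carries no order. A minimal sketch on a hypothetical four-row frequency matrix (`toy_freqs`, assumed values) illustrates this. Here `n_init` is set explicitly, which is a choice made for this sketch and not something the notebook does.

```python
import numpy as np
from sklearn.cluster import KMeans

# hypothetical venue-frequency rows: two dominated by the first category,
# two dominated by the last one
toy_freqs = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.9],
])

# fit k-means with two clusters; random_state fixes the initialisation
toy_kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(toy_freqs)
toy_labels = toy_kmeans.labels_
print(toy_labels)
```

The first two rows share one label and the last two share the other. Which group is called 0 and which is called 1 depends on the initialisation.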

In [96]:
## add clustering labels into the dataframe
neighbourhoods_venues_sorted.insert(0,'Cluster Labels',kmeans.labels_)

In [111]:
neighbourhoods_venues_sorted=neighbourhoods_venues_sorted.drop(columns=['Borough'])
neighbourhoods_venues_sorted.head()

Unnamed: 0,Cluster Labels,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,Davisville,Dessert Shop,Park,Sushi Restaurant,Pizza Place,Café,Italian Restaurant,Indian Restaurant,Seafood Restaurant,Coffee Shop,Fish Market
1,3,Davisville North,Park,Food & Drink Shop,Dog Run,Hotel,Department Store,Gym / Fitness Center,Sandwich Place,Dance Studio,Breakfast Spot,Cuban Restaurant
2,3,"Forest Hill North & West, Forest Hill Road Park",Park,Sushi Restaurant,Trail,Jewelry Store,Distribution Center,Diner,Dessert Shop,Department Store,Dance Studio,Cuban Restaurant
3,3,Lawrence Park,Park,Swim School,Bus Line,Donut Shop,Coffee Shop,Comic Shop,Concert Hall,Cosmetics Shop,Creperie,Cuban Restaurant
4,0,"Moore Park, Summerhill East",Trail,Playground,Yoga Studio,Eastern European Restaurant,Comic Shop,Concert Hall,Cosmetics Shop,Creperie,Cuban Restaurant,Dance Studio


In [113]:
neighbourhoods_venues_sorted=neighbourhoods_venues_sorted.set_index('Neighbourhood')
neighbourhoods_venues_sorted.head()

Unnamed: 0_level_0,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
Davisville,1,Dessert Shop,Park,Sushi Restaurant,Pizza Place,Café,Italian Restaurant,Indian Restaurant,Seafood Restaurant,Coffee Shop,Fish Market
Davisville North,3,Park,Food & Drink Shop,Dog Run,Hotel,Department Store,Gym / Fitness Center,Sandwich Place,Dance Studio,Breakfast Spot,Cuban Restaurant
"Forest Hill North & West, Forest Hill Road Park",3,Park,Sushi Restaurant,Trail,Jewelry Store,Distribution Center,Diner,Dessert Shop,Department Store,Dance Studio,Cuban Restaurant
Lawrence Park,3,Park,Swim School,Bus Line,Donut Shop,Coffee Shop,Comic Shop,Concert Hall,Cosmetics Shop,Creperie,Cuban Restaurant
"Moore Park, Summerhill East",0,Trail,Playground,Yoga Studio,Eastern European Restaurant,Comic Shop,Concert Hall,Cosmetics Shop,Creperie,Cuban Restaurant,Dance Studio


In [114]:
Toronto_merged=df_Toronto.iloc[:,1:]
Toronto_merged.head()

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude
0,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,Downtown Toronto,St. James Town,43.651494,-79.375418
4,East Toronto,The Beaches,43.676357,-79.293031


In [115]:
Toronto_merged=Toronto_merged.join(neighbourhoods_venues_sorted, on='Neighbourhood')
Toronto_merged.head()

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,2,Park,Gym / Fitness Center,Pub,Distribution Center,Restaurant,Breakfast Spot,Historic Site,Spa,Bakery,Coffee Shop
1,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,1,Coffee Shop,Yoga Studio,Sushi Restaurant,Italian Restaurant,Distribution Center,Creperie,Portuguese Restaurant,Beer Bar,Park,Asian Restaurant
2,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,2,Music Venue,Pizza Place,Ramen Restaurant,Burger Joint,Burrito Place,Plaza,Café,Theater,Comic Shop,Clothing Store
3,Downtown Toronto,St. James Town,43.651494,-79.375418,1,Coffee Shop,Gastropub,Middle Eastern Restaurant,Restaurant,Italian Restaurant,Gym,Food Truck,Creperie,Japanese Restaurant,Cuban Restaurant
4,East Toronto,The Beaches,43.676357,-79.293031,0,Neighborhood,Trail,Health Food Store,Pub,Dance Studio,Distribution Center,Diner,Dessert Shop,Department Store,Cuban Restaurant
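
`DataFrame.join` with `on='Neighbourhood'` is a left join against the index of the second frame, so any neighbourhood without a match would receive NaN and the cluster labels would be converted to floats. The following sketch, using hypothetical frames `toy_locations` and `toy_clusters`, shows how one might check for that case and restore integer labels.

```python
import pandas as pd

# hypothetical location table with one neighbourhood (N3) that has no cluster
toy_locations = pd.DataFrame({
    'Neighbourhood': ['N1', 'N2', 'N3'],
    'Latitude': [43.65, 43.66, 43.67],
})

# hypothetical cluster table indexed by neighbourhood
toy_clusters = pd.DataFrame(
    {'Cluster Labels': [1, 0]},
    index=pd.Index(['N1', 'N2'], name='Neighbourhood'),
)

# left join: every row of toy_locations is kept
toy_merged = toy_locations.join(toy_clusters, on='Neighbourhood')

# count rows that did not find a match
n_missing = toy_merged['Cluster Labels'].isna().sum()
print(n_missing)

# drop unmatched rows and convert the labels back to integers
toy_clean = toy_merged.dropna(subset=['Cluster Labels']).copy()
toy_clean['Cluster Labels'] = toy_clean['Cluster Labels'].astype(int)
print(toy_clean)
```

Integer labels matter in the mapping step, where the label is used to index a list of colors.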


Visualize the resulting clusters

In [117]:
# create map
map_clusters=folium.Map(location=[latitude,longitude],zoom_start=11)

# set color scheme for the clusters
x=np.arange(kclusters)
ys=[i+x+(i*x)**2 for i in range(kclusters)]
colors_array=cm.rainbow(np.linspace(0,1,len(ys)))
rainbow=[colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors=[]
for lat, lon, nei, cluster in zip(Toronto_merged['Latitude'],Toronto_merged['Longitude'],Toronto_merged['Neighbourhood'],Toronto_merged['Cluster Labels']):
    label=folium.Popup(str(nei)+' Cluster '+str(cluster),parse_html=True)
    folium.CircleMarker(
        [lat,lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
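
The color scheme above only depends on the number of clusters. As a simplified sketch (not the notebook's exact code), the palette can be built directly from `kclusters` and indexed by the zero-based cluster label. The dictionary `label_to_color` is a hypothetical helper introduced here for illustration.

```python
import numpy as np
from matplotlib import cm, colors

kclusters = 5  # same number of clusters as in the notebook

# sample kclusters evenly spaced colors from the rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))

# convert each RGBA tuple to a hex string usable by folium
rainbow = [colors.rgb2hex(c) for c in colors_array]

# map each zero-based cluster label to its own color
label_to_color = {label: rainbow[label] for label in range(kclusters)}
print(label_to_color)
```

Indexing with `cluster-1`, as in the cell above, also gives each cluster a distinct color, because label 0 wraps around to the last entry of the list.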

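**Examine Clusters**

The cluster labels are zero-based, so Cluster 1 below corresponds to label 0, Cluster 2 to label 1, and so on.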


**Cluster 1**

In [122]:
Toronto_merged.loc[Toronto_merged['Cluster Labels']==0, Toronto_merged.columns[[1]+list(range(5,Toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,The Beaches,Neighborhood,Trail,Health Food Store,Pub,Dance Studio,Distribution Center,Diner,Dessert Shop,Department Store,Cuban Restaurant
29,"Moore Park, Summerhill East",Trail,Playground,Yoga Studio,Eastern European Restaurant,Comic Shop,Concert Hall,Cosmetics Shop,Creperie,Cuban Restaurant,Dance Studio


**Cluster 2**

In [124]:
Toronto_merged.loc[Toronto_merged['Cluster Labels']==1, Toronto_merged.columns[[1]+list(range(5,Toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,"Queen's Park, Ontario Provincial Government",Coffee Shop,Yoga Studio,Sushi Restaurant,Italian Restaurant,Distribution Center,Creperie,Portuguese Restaurant,Beer Bar,Park,Asian Restaurant
3,St. James Town,Coffee Shop,Gastropub,Middle Eastern Restaurant,Restaurant,Italian Restaurant,Gym,Food Truck,Creperie,Japanese Restaurant,Cuban Restaurant
6,Central Bay Street,Coffee Shop,Gastropub,Sushi Restaurant,Italian Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Park,Arts & Crafts Store,Donut Shop,Airport Lounge
12,"The Danforth West, Riverdale",Greek Restaurant,Ice Cream Shop,Yoga Studio,Brewery,Italian Restaurant,Cosmetics Shop,Fruit & Vegetable Store,Creperie,Cuban Restaurant,Donut Shop
14,"Brockton, Parkdale Village, Exhibition Place",Coffee Shop,Bar,Pet Store,Climbing Gym,Café,Breakfast Spot,Italian Restaurant,Furniture / Home Store,Gym,Food & Drink Shop
15,"India Bazaar, The Beaches West",Park,Fish & Chips Shop,Ice Cream Shop,Pub,Fast Food Restaurant,Gym,Italian Restaurant,Sushi Restaurant,Liquor Store,Brewery
17,Studio District,Coffee Shop,Fish Market,Pet Store,Café,Italian Restaurant,Ice Cream Shop,Bakery,Bookstore,Gay Bar,Food & Drink Shop
23,"North Toronto West, Lawrence Park",Yoga Studio,Fast Food Restaurant,Chinese Restaurant,Mexican Restaurant,Restaurant,Diner,Salon / Barbershop,Coffee Shop,Spa,Clothing Store
25,"Parkdale, Roncesvalles",Gift Shop,Eastern European Restaurant,Italian Restaurant,Movie Theater,Cuban Restaurant,Dessert Shop,Restaurant,Dog Run,Coffee Shop,Concert Hall
26,Davisville,Dessert Shop,Park,Sushi Restaurant,Pizza Place,Café,Italian Restaurant,Indian Restaurant,Seafood Restaurant,Coffee Shop,Fish Market


**Cluster 3**

In [125]:
Toronto_merged.loc[Toronto_merged['Cluster Labels']==2, Toronto_merged.columns[[1]+list(range(5,Toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Regent Park, Harbourfront",Park,Gym / Fitness Center,Pub,Distribution Center,Restaurant,Breakfast Spot,Historic Site,Spa,Bakery,Coffee Shop
2,"Garden District, Ryerson",Music Venue,Pizza Place,Ramen Restaurant,Burger Joint,Burrito Place,Plaza,Café,Theater,Comic Shop,Clothing Store
7,Christie,Café,Grocery Store,Restaurant,Candy Store,Italian Restaurant,Coffee Shop,Cuban Restaurant,Creperie,Cosmetics Shop,Donut Shop
9,"Dufferin, Dovercourt Village",Bakery,Supermarket,Music Venue,Middle Eastern Restaurant,Grocery Store,Bar,Bank,Brewery,Café,Dog Run
13,"Toronto Dominion Centre, Design Exchange",Coffee Shop,Café,Gym / Fitness Center,Bakery,Restaurant,Hotel,Pub,Gym,Diner,Dessert Shop
16,"Commerce Court, Victoria Hotel",Café,Gastropub,Coffee Shop,Restaurant,Pub,Bakery,Gym,Gym / Fitness Center,Museum,Cuban Restaurant
22,"High Park, The Junction South",Gastropub,Speakeasy,Flea Market,Italian Restaurant,Mexican Restaurant,Café,Bar,Park,Arts & Crafts Store,Antique Shop
24,"The Annex, North Midtown, Yorkville",Café,Park,BBQ Joint,Middle Eastern Restaurant,Burger Joint,Donut Shop,Indian Restaurant,History Museum,Coffee Shop,Fish Market
30,"Kensington Market, Chinatown, Grange Park",Café,Cocktail Bar,Arts & Crafts Store,Mexican Restaurant,Organic Grocery,Caribbean Restaurant,Bakery,Wine Bar,Asian Restaurant,Cosmetics Shop
35,"St. James Town, Cabbagetown",Café,Italian Restaurant,Diner,Japanese Restaurant,Restaurant,Jewelry Store,General Entertainment,Bakery,Indian Restaurant,Distribution Center


**Cluster 4**

In [126]:
Toronto_merged.loc[Toronto_merged['Cluster Labels']==3, Toronto_merged.columns[[1]+list(range(5,Toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Berczy Park,Park,Farmers Market,Concert Hall,Liquor Store,Restaurant,Cocktail Bar,Beer Bar,Museum,Vegetarian / Vegan Restaurant,Thai Restaurant
8,"Richmond, Adelaide, King",Restaurant,Speakeasy,Vegetarian / Vegan Restaurant,Neighborhood,Concert Hall,Gym / Fitness Center,Asian Restaurant,Hotel,Steakhouse,Plaza
10,"Harbourfront East, Union Station, Toronto Islands",Hotel,Neighborhood,Salad Place,Park,Performing Arts Venue,Skating Rink,Lake,Dessert Shop,Plaza,Sporting Goods Shop
11,"Little Portugal, Trinity",Cocktail Bar,Korean Restaurant,Wine Bar,Ice Cream Shop,Cuban Restaurant,Greek Restaurant,Pizza Place,Asian Restaurant,Brewery,Beer Store
18,Lawrence Park,Park,Swim School,Bus Line,Donut Shop,Coffee Shop,Comic Shop,Concert Hall,Cosmetics Shop,Creperie,Cuban Restaurant
20,Davisville North,Park,Food & Drink Shop,Dog Run,Hotel,Department Store,Gym / Fitness Center,Sandwich Place,Dance Studio,Breakfast Spot,Cuban Restaurant
21,"Forest Hill North & West, Forest Hill Road Park",Park,Sushi Restaurant,Trail,Jewelry Store,Distribution Center,Diner,Dessert Shop,Department Store,Dance Studio,Cuban Restaurant
33,Rosedale,Park,Trail,Playground,Dog Run,Cocktail Bar,Coffee Shop,Comic Shop,Concert Hall,Cosmetics Shop,Creperie
34,Stn A PO Boxes,Park,Thai Restaurant,Cocktail Bar,Restaurant,Food Truck,Tailor Shop,Beer Bar,Fountain,Vegetarian / Vegan Restaurant,Museum
37,Church and Wellesley,Park,Salon / Barbershop,Restaurant,Beer Bar,Ramen Restaurant,Bubble Tea Shop,Dance Studio,Mexican Restaurant,Theme Restaurant,Breakfast Spot


**Cluster 5**

In [127]:
Toronto_merged.loc[Toronto_merged['Cluster Labels']==4, Toronto_merged.columns[[1]+list(range(5,Toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,Roselawn,Garden,Music Venue,Garden Center,Comic Shop,Concert Hall,Cosmetics Shop,Creperie,Cuban Restaurant,Dance Studio,Department Store
