## Toronto neighborhoods segmentation and clustering

In this notebook we will scrape data about Toronto neighborhoods from Wikipedia, keep the relevant information, and divide the neighborhoods into clusters.

In [158]:
import random
import numpy as np
import pandas as pd

import requests
from bs4 import BeautifulSoup

import matplotlib.pyplot as plt

# backend for rendering plots within the browser
%matplotlib inline 

from sklearn.cluster import KMeans 
from sklearn.datasets import make_blobs

from geopy.geocoders import Nominatim # import geocoder

import folium # map rendering library

CLIENT_ID = 'T333BSMX3WEFKRBFKD2APTRQWOXPQ4TFZGA52RBUBT5XBOTL'
CLIENT_SECRET = '1NK2F4NNK3LLFITZ5CG3G0PXISBXBE3TH2T4VTI4YX04DNEF'
VERSION = '20190531' # Foursquare API version

print('Libraries imported.')

Libraries imported.


First we need to scrape the data from the Wikipedia page. To do so, we'll use the requests and bs4 libraries.

In [159]:
website = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

soup = BeautifulSoup(website, "lxml")

wiki_table = soup.find('table',{'class':'wikitable sortable'})
wiki_table = str(wiki_table)

Then we create a dataframe with the pandas library.

In [160]:

nbh_table = pd.read_html(wiki_table)[0]
nbh_table.head(10)

,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Not assigned
9,M8A,Not assigned,Not assigned


Let's drop the rows whose borough is not assigned.

In [161]:
nbh_table.drop(nbh_table[nbh_table.Borough == "Not assigned"].index, inplace=True)
nbh_table.reset_index(drop=True, inplace=True)
nbh_table.head(10)

,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights
5,M6A,North York,Lawrence Manor
6,M7A,Queen's Park,Not assigned
7,M9A,Etobicoke,Islington Avenue
8,M1B,Scarborough,Rouge
9,M1B,Scarborough,Malvern
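
The same filter can also be written without `inplace=True`, by keeping only the rows that pass a boolean mask. The following is a minimal sketch on a made-up three-row frame (`toy` is illustrative data, not the scraped table):

```python
import pandas as pd

# Hypothetical sample that mimics the columns of the scraped table.
toy = pd.DataFrame({
    "Postcode": ["M1A", "M3A", "M7A"],
    "Borough": ["Not assigned", "North York", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Not assigned"],
})

# Keep only rows whose borough is assigned, then renumber the index from 0.
filtered = toy[toy["Borough"] != "Not assigned"].reset_index(drop=True)
print(filtered)
```

Assigning the result to a new name leaves the original frame untouched, which can make it easier to re-run a cell.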


Next we replace the Not assigned neighbourhood values. For this purpose we use the name of the corresponding borough.

In [162]:
nbh_table.loc[nbh_table.Neighbourhood == "Not assigned", "Neighbourhood"] = nbh_table["Borough"]
nbh_table.head(10)

,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights
5,M6A,North York,Lawrence Manor
6,M7A,Queen's Park,Queen's Park
7,M9A,Etobicoke,Islington Avenue
8,M1B,Scarborough,Rouge
9,M1B,Scarborough,Malvern
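
The assignment above relies on pandas aligning the right-hand `Borough` series with the selected rows by index. An equivalent formulation, sketched here on made-up data with `Series.mask`, spells out the row-by-row replacement:

```python
import pandas as pd

# Hypothetical two-row sample; the second row has no neighbourhood name.
toy = pd.DataFrame({
    "Borough": ["North York", "Queen's Park"],
    "Neighbourhood": ["Parkwoods", "Not assigned"],
})

# Where the neighbourhood is unassigned, take the borough from the same row.
unassigned = toy["Neighbourhood"] == "Not assigned"
toy["Neighbourhood"] = toy["Neighbourhood"].mask(unassigned, toy["Borough"])
print(toy)
```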


Finally, let's combine the neighbourhoods that share a postcode into a single row.

In [163]:
nbh_table = nbh_table.groupby(['Postcode', 'Borough']).Neighbourhood.aggregate(lambda x: ", ".join(x)).reset_index()
nbh_table.head(10)

,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"
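
To see what the group-and-join step does in isolation, here is a small sketch on a hypothetical frame in which two rows share the postcode M5A:

```python
import pandas as pd

# Hypothetical sample: two neighbourhoods share the postcode M5A.
toy = pd.DataFrame({
    "Postcode": ["M5A", "M5A", "M1G"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "Scarborough"],
    "Neighbourhood": ["Harbourfront", "Regent Park", "Woburn"],
})

# One row per (Postcode, Borough); neighbourhood names are joined with ", ".
grouped = (
    toy.groupby(["Postcode", "Borough"])["Neighbourhood"]
       .agg(", ".join)
       .reset_index()
)
print(grouped)
```

Note that `groupby` sorts by the group keys, which is why the resulting table starts at M1B rather than M3A.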


We get a table with dimensions:

In [164]:
print(nbh_table.shape)

(103, 3)


The next task is to add latitude and longitude coordinates. We could use the geocoder library for this, but a CSV file with the geolocations was provided for this purpose.

In [165]:
geodata = requests.get('http://cocl.us/Geospatial_data').content.decode()
geodata_table = pd.DataFrame([x.split(",") for x in geodata.split('\r\n')])
geodata_table.set_axis(["Postcode", "Latitude", "Longitude"], axis=1, inplace=True)
geodata_table = geodata_table.iloc[1:]
geodata_table.head(10)

,Postcode,Latitude,Longitude
1,M1B,43.8066863,-79.1943534
2,M1C,43.7845351,-79.1604971
3,M1E,43.7635726,-79.1887115
4,M1G,43.7709921,-79.2169174
5,M1H,43.773136,-79.2394761
6,M1J,43.7447342,-79.2394761
7,M1K,43.7279292,-79.2620294
8,M1L,43.7111117,-79.2845772
9,M1M,43.716316,-79.2394761
10,M1N,43.692657,-79.2648481
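
Splitting the text by hand leaves the coordinates as strings. As an alternative sketch, `pd.read_csv` can parse the same text and infer numeric columns. The inline `csv_text` below stands in for the downloaded file, and its header row is an assumption about the file's layout:

```python
import io

import pandas as pd

# Inline stand-in for the downloaded CSV; the header names are assumed.
csv_text = (
    "Postal Code,Latitude,Longitude\r\n"
    "M1B,43.8066863,-79.1943534\r\n"
    "M1C,43.7845351,-79.1604971\r\n"
)

# read_csv handles the header row and line endings and infers float columns.
geo = pd.read_csv(io.StringIO(csv_text))
geo.columns = ["Postcode", "Latitude", "Longitude"]
print(geo.dtypes)
```

With numeric coordinates, later steps such as distance calculations or clustering on location would not need an explicit conversion.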


Then we merge the two tables on their shared Postcode column.

In [166]:
nbh_table = pd.merge(nbh_table, geodata_table)
nbh_table.head(10)

,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.8066863,-79.1943534
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.7845351,-79.1604971
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.7635726,-79.1887115
3,M1G,Scarborough,Woburn,43.7709921,-79.2169174
4,M1H,Scarborough,Cedarbrae,43.773136,-79.2394761
5,M1J,Scarborough,Scarborough Village,43.7447342,-79.2394761
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.7279292,-79.2620294
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.7111117,-79.2845772
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.2394761
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.2648481
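
Called without `on`, `pd.merge` joins on the columns the two frames have in common, here `Postcode`. A more explicit variant is sketched below on made-up frames. The `how` and `validate` arguments are optional extras, not part of the original call, and the M9Z row is invented to show that unmatched postcodes on the right are dropped by a left join:

```python
import pandas as pd

# Hypothetical neighbourhood table.
left = pd.DataFrame({
    "Postcode": ["M1B", "M1C"],
    "Borough": ["Scarborough", "Scarborough"],
})

# Hypothetical coordinates table; M9Z is an invented postcode with no match.
right = pd.DataFrame({
    "Postcode": ["M1B", "M1C", "M9Z"],
    "Latitude": [43.8066863, 43.7845351, 0.0],
    "Longitude": [-79.1943534, -79.1604971, 0.0],
})

# Name the key explicitly and check that each postcode occurs once per side.
merged = pd.merge(left, right, on="Postcode", how="left", validate="one_to_one")
print(merged)
```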


### Clustering neighbourhoods

Let's visualize the neighbourhoods on a map of Toronto. To do so, we first get the location of Toronto, which is where the center of the map is going to be.

In [167]:
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


In [168]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(nbh_table['Latitude'], nbh_table['Longitude'], nbh_table['Borough'], nbh_table['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

To simplify the segmentation, and because the Foursquare API limits the number of requests,
we'll process only the neighbourhoods that include Toronto in their name. Let's drop the others.

In [133]:
nbh_table = nbh_table[nbh_table["Neighbourhood"].str.contains("Toronto")].reset_index(drop=True)
nbh_table.head()

,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3K,North York,"CFB Toronto, Downsview East",43.7374732,-79.4647633
1,M4J,East York,East Toronto,43.685347,-79.3381065
2,M4R,Central Toronto,North Toronto West,43.7153834,-79.4056784
3,M5J,Downtown Toronto,"Harbourfront East, Toronto Islands, Union Station",43.6408157,-79.3817523
4,M5K,Downtown Toronto,"Design Exchange, Toronto Dominion Centre",43.6471768,-79.3815764
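
By default `str.contains` treats its pattern as a regular expression and returns missing values for missing names. The sketch below, on made-up data, shows the same filter with `regex=False` and `na=False`. Both are optional arguments that the original call does not use:

```python
import pandas as pd

# Hypothetical sample of neighbourhood names.
toy = pd.DataFrame({
    "Neighbourhood": ["East Toronto", "Woburn", "CFB Toronto, Downsview East"],
})

# Literal substring match; rows with a missing name count as "no match".
mask = toy["Neighbourhood"].str.contains("Toronto", regex=False, na=False)
subset = toy[mask].reset_index(drop=True)
print(subset)
```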


And visualize the data once again.

In [169]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(nbh_table['Latitude'], nbh_table['Longitude'], nbh_table['Borough'], nbh_table['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

To analyze these neighbourhoods we will use the Foursquare API. Let's define two functions: one returns the category of a venue, the other collects all venues near a given location. Both functions were provided by the Capstone tutors.

In [170]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=100):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
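
Two details of `getNearbyVenues` are worth a closer look: the query string is assembled by hand, and `categories[0]` assumes every venue has at least one category. The sketch below shows one possible way to handle both using only the standard library. `build_explore_url` and `first_category_name` are hypothetical helpers, not part of the code provided by the tutors, and the credentials are placeholders:

```python
from urllib.parse import urlencode

# Endpoint taken from the request URL used in getNearbyVenues.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Hypothetical helper: build the explore URL with encoded parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_URL + "?" + urlencode(params)


def first_category_name(venue):
    """Hypothetical helper: first category name, or None if there is none."""
    categories = venue.get("categories") or []
    return categories[0]["name"] if categories else None


# Placeholder credentials; no request is sent here.
url = build_explore_url("ID", "SECRET", "20190531", 43.65, -79.38)
print(url)
print(first_category_name({"categories": []}))
```

Returning `None` for a venue without categories avoids an `IndexError` in the middle of a long run over many neighbourhoods.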

And call the function.

In [171]:
toronto_venues = getNearbyVenues(names=nbh_table['Neighbourhood'],
                                   latitudes=nbh_table['Latitude'],
                                   longitudes=nbh_table['Longitude']
                                  )

Rouge, Malvern
Highland Creek, Rouge Hill, Port Union
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
East Birchmount Park, Ionview, Kennedy Park
Clairlea, Golden Mile, Oakridge
Cliffcrest, Cliffside, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Scarborough Town Centre, Wexford Heights
Maryvale, Wexford
Agincourt
Clarks Corners, Sullivan, Tam O'Shanter
Agincourt North, L'Amoreaux East, Milliken, Steeles East
L'Amoreaux West
Upper Rouge
Hillcrest Village
Fairview, Henry Farm, Oriole
Bayview Village
Silver Hills, York Mills
Newtonbrook, Willowdale
Willowdale South
York Mills West
Willowdale West
Parkwoods
Don Mills North
Flemingdon Park, Don Mills South
Bathurst Manor, Downsview North, Wilson Heights
Northwood Park, York University
CFB Toronto, Downsview East
Downsview West
Downsview Central
Downsview Northwest
Victoria Village
Woodbine Gardens, Parkview Hill
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto
The Danforth West, Riverdale
The Beaches West, India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park, Summerhill East
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Harbourfront, Regent Park
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Bedford Park, Lawrence Manor East
Roselawn
Forest Hill North, Forest Hill West
The Annex, North Midtown, Yorkville
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place, Underground city
Lawrence Heights, Lawrence Manor
Glencairn
Humewood-Cedarvale
Caledonia-Fairbanks
Christie
Dovercourt Village, Dufferin
Little Portugal, Trinity
Brockton, Exhibition Place, Parkdale Village
Downsview, North Park, Upwood Park
Del Ray, Keelesdale, Mount Dennis, Silverthorn
The Junction North, Runnymede
High Park, The Junction South
Parkdale, Roncesvalles
Runnymede, Swansea
Queen's Park
Canada Post Gateway Processing Centre
Business Reply Mail Processing Centre 969 Eastern
Humber Bay Shores, Mimico South, New Toronto
Alderwood, Long Branch
The Kingsway, Montgomery Road, Old Mill North
Humber Bay, King's Mill Park, Kingsway Park South East, Mimico NE, Old Mill South, The Queensway East, Royal York South East, Sunnylea
Kingsway Park South West, Mimico NW, The Queensway West, Royal York South West, South of Bloor
Islington Avenue
Cloverdale, Islington, Martin Grove, Princess Gardens, West Deane Park
Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe
Humber Summit
Emery, Humberlea
Weston
Westmount
Kingsview Village, Martin Grove Gardens, Richview Gardens, St. Phillips
Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown
Northwest


Now we have a table of nearby venues. Let's build a table that holds, for each neighbourhood, the share of its venues that fall into each category (the mean of the one-hot encoded categories).

In [218]:
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
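
To make the one-hot and averaging steps concrete, here is a small sketch on invented data with two neighbourhoods, `A` and `B`. It uses `DataFrame.insert` to place the `Neighborhood` column first, which is a variation on the column reordering above rather than the same code:

```python
import pandas as pd

# Hypothetical venues: four in neighbourhood A, one in neighbourhood B.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B"],
    "Venue Category": ["Park", "Park", "Café", "Bank", "Park"],
})

# One indicator column per category (1.0 where the venue has that category).
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)

# Put the neighbourhood name in front of the indicator columns.
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of each indicator is the share of venues in that category.
freq = onehot.groupby("Neighborhood").mean().reset_index()
print(freq)
```

Each row of `freq` sums to 1 across the category columns. As far as I know, `insert` raises a `ValueError` when a column with the same name already exists, so a venue category that happens to be called `Neighborhood` would surface as an error instead of being silently overwritten.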

And print the five most common venue categories in each neighbourhood.

In [220]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
             venue  freq
0      Coffee Shop  0.06
1             Café  0.05
2  Thai Restaurant  0.04
3       Steakhouse  0.04
4              Bar  0.04


----Agincourt----
            venue  freq
0          Lounge  0.25
1  Sandwich Place  0.25
2  Breakfast Spot  0.25
3    Skating Rink  0.25
4   Metro Station  0.00


----Agincourt North, L'Amoreaux East, Milliken, Steeles East----
                             venue  freq
0                       Playground   0.5
1                             Park   0.5
2                      Yoga Studio   0.0
3                      Men's Store   0.0
4  Molecular Gastronomy Restaurant   0.0


----Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown----
                 venue  freq
0        Grocery Store  0.18
1          Pizza Place  0.09
2  Japanese Restaurant  0.09
3          Coffee Shop  0.09
4       Discount Store  0.09


----Alderwood, Long Branch----
            venue  freq
0     Pizza Place  0.22
1             Pub  0.11
2        Pharmacy  0.11
3             Gym  0.11
4  Sandwich Place  0.11


----Bathurst Manor, Downsview North, Wilson Heights----
                  venue  freq
0           Coffee Shop  0.11
1                  Bank  0.06
2    Frozen Yogurt Shop  0.06
3  Fast Food Restaurant  0.06
4            Restaurant  0.06


----Bayview Village----
                 venue  freq
0                 Café  0.25
1  Japanese Restaurant  0.25
2                 Bank  0.25
3   Chinese Restaurant  0.25
4        Movie Theater  0.00


----Bedford Park, Lawrence Manor East----
                  venue  freq
0             Juice Bar  0.08
1    Italian Restaurant  0.08
2           Coffee Shop  0.08
3  Fast Food Restaurant  0.08
4   Japanese Restaurant  0.04


----Berczy Park----
                venue  freq
0         Coffee Shop  0.07
1        Cocktail Bar  0.05
2            Beer Bar  0.04
3  Seafood Restaurant  0.04
4          Steakhouse  0.04


----Birch Cliff, Cliffside West----
                   venue  freq
0        College Stadium   0.2
1                   Farm   0.2
2           Skating Rink   0.2
3  General Entertainment   0.2
4                   Café   0.2


----Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe----
               venue  freq
0        Pizza Place  0.14
1       Liquor Store  0.14
2  Convenience Store  0.14
3               Café  0.14
4     Shopping Plaza  0.14


----Brockton, Exhibition Place, Parkdale Village----
                    venue  freq
0             Coffee Shop  0.11
1                    Café  0.11
2          Breakfast Spot  0.11
3               Pet Store  0.05
4  Furniture / Home Store  0.05


----Business Reply Mail Processing Centre 969 Eastern----
                venue  freq
0  Light Rail Station  0.11
1         Yoga Studio  0.06
2       Auto Workshop  0.06
3          Comic Shop  0.06
4         Pizza Place  0.06


----CFB Toronto, Downsview East----
               venue  freq
0            Airport  0.33
1               Park  0.33
2  Other Repair Shop  0.33
3        Yoga Studio  0.00
4        Men's Store  0.00


----CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara----
              venue  freq
0   Airport Service  0.21
1    Airport Lounge  0.14
2  Airport Terminal  0.14
3     Boat or Ferry  0.07
4          Boutique  0.07


----Cabbagetown, St. James Town----
         venue  freq
0  Coffee Shop  0.10
1         Café  0.05
2          Pub  0.05
3       Bakery  0.05
4  Pizza Place  0.05


----Caledonia-Fairbanks----
                  venue  freq
0                  Park  0.33
1         Women's Store  0.17
2  Fast Food Restaurant  0.17
3                Market  0.17
4              Pharmacy  0.17


----Canada Post Gateway Processing Centre----
                 venue  freq
0          Coffee Shop   0.2
1                Hotel   0.2
2  American Restaurant   0.1
3        Burrito Place   0.1
4       Sandwich Place   0.1


----Cedarbrae----
                  venue  freq
0  Caribbean Restaurant  0.14
1      Hakka Restaurant  0.14
2                Bakery  0.14
3                  Bank  0.14
4    Athletics & Sports  0.14


----Central Bay Street----
                venue  freq
0         Coffee Shop  0.16
1                Café  0.06
2  Italian Restaurant  0.05
3        Burger Joint  0.03
4      Sandwich Place  0.03


----Chinatown, Grange Park, Kensington Market----
                           venue  freq
0                           Café  0.08
1  Vegetarian / Vegan Restaurant  0.06
2                    Coffee Shop  0.04
3            Dumpling Restaurant  0.04
4                         Bakery  0.04


----Christie----
           venue  freq
0  Grocery Store  0.19
1           Café  0.19
2           Park  0.12
3          Diner  0.06
4      Nightclub  0.06


----Church and Wellesley----
                 venue  freq
0          Coffee Shop  0.07
1  Japanese Restaurant  0.07
2     Sushi Restaurant  0.06
3           Restaurant  0.03
4              Gay Bar  0.03


----Clairlea, Golden Mile, Oakridge----
                  venue  freq
0              Bus Line  0.22
1                Bakery  0.22
2         Metro Station  0.11
3           Bus Station  0.11
4  Fast Food Restaurant  0.11


----Clarks Corners, Sullivan, Tam O'Shanter----
                  venue  freq
0           Pizza Place   0.2
1       Thai Restaurant   0.1
2          Noodle House   0.1
3    Chinese Restaurant   0.1
4  Fast Food Restaurant   0.1


----Cliffcrest, Cliffside, Scarborough Village West----
                             venue  freq
0                            Motel   0.5
1              American Restaurant   0.5
2                    Movie Theater   0.0
3              Monument / Landmark   0.0
4  Molecular Gastronomy Restaurant   0.0


----Cloverdale, Islington, Martin Grove, Princess Gardens, West Deane Park----
                             venue  freq
0                             Bank   1.0
1                      Yoga Studio   0.0
2                    Metro Station   0.0
3  Molecular Gastronomy Restaurant   0.0


----Commerce Court, Victoria Hotel----
                 venue  freq
0          Coffee Shop  0.11
1                 Café  0.07
2                Hotel  0.06
3           Restaurant  0.05
4  American Restaurant  0.04


----Davisville----
              venue  freq
0       Pizza Place  0.09
1    Sandwich Place  0.09
2      Dessert Shop  0.09
3   Thai Restaurant  0.06
4  Sushi Restaurant  0.06


----Davisville North----
            venue  freq
0  Clothing Store  0.12
1            Park  0.12
2  Breakfast Spot  0.12
3             Gym  0.12
4   Grocery Store  0.12


----Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West----
                 venue  freq
0          Coffee Shop  0.14
1                  Pub  0.14
2           Bagel Shop  0.07
3  Fried Chicken Joint  0.07
4          Pizza Place  0.07


----Del Ray, Keelesdale, Mount Dennis, Silverthorn----
                   venue  freq
0         Discount Store  0.17
1             Restaurant  0.17
2         Sandwich Place  0.17
3      Convenience Store  0.17
4  Check Cashing Service  0.17


----Design Exchange, Toronto Dominion Centre----
         venue  freq
0  Coffee Shop  0.12
1         Café  0.07
2        Hotel  0.07
3   Restaurant  0.05
4       Bakery  0.03


----Don Mills North----
                  venue  freq
0   Japanese Restaurant  0.17
1  Caribbean Restaurant  0.17
2  Gym / Fitness Center  0.17
3        Baseball Field  0.17
4                  Café  0.17


----Dorset Park, Scarborough Town Centre, Wexford Heights----
                       venue  freq
0          Indian Restaurant  0.29
1         Chinese Restaurant  0.14
2                  Pet Store  0.14
3      Vietnamese Restaurant  0.14
4  Latin American Restaurant  0.14


----Dovercourt Village, Dufferin----
         venue  freq
0     Pharmacy  0.11
1  Supermarket  0.11
2       Bakery  0.11
3  Music Venue  0.05
4      Brewery  0.05


----Downsview Central----
                venue  freq
0          Food Truck  0.33
1        Home Service  0.33
2      Baseball Field  0.33
3         Yoga Studio  0.00
4  Mexican Restaurant  0.00


----Downsview Northwest----
                venue  freq
0  Athletics & Sports  0.25
1        Liquor Store  0.25
2      Discount Store  0.25
3       Grocery Store  0.25
4         Yoga Studio  0.00


----Downsview West----
           venue  freq
0  Grocery Store  0.33
1  Shopping Mall  0.17
2           Park  0.17
3           Bank  0.17
4          Hotel  0.17


----Downsview, North Park, Upwood Park----
                        venue  freq
0                        Park  0.25
1  Construction & Landscaping  0.25
2                      Bakery  0.25
3            Basketball Court  0.25
4                 Yoga Studio  0.00


----East Birchmount Park, Ionview, Kennedy Park----
                       venue  freq
0             Discount Store  0.50
1           Department Store  0.25
2                Coffee Shop  0.25
3                Yoga Studio  0.00
4  Middle Eastern Restaurant  0.00


----East Toronto----
                venue  freq
0                Park  0.50
1         Coffee Shop  0.25
2   Convenience Store  0.25
3         Yoga Studio  0.00
4  Mexican Restaurant  0.00


----Emery, Humberlea----
                             venue  freq
0                   Baseball Field   0.5
1           Furniture / Home Store   0.5
2                      Yoga Studio   0.0
3              Monument / Landmark   0.0
4  Molecular Gastronomy Restaurant   0.0


----Fairview, Henry Farm, Oriole----
                  venue  freq
0        Clothing Store  0.09
1           Coffee Shop  0.08
2  Fast Food Restaurant  0.08
3        Cosmetics Shop  0.05
4         Women's Store  0.05


----First Canadian Place, Underground city----
          venue  freq
0          Café  0.09
1   Coffee Shop  0.09
2         Hotel  0.04
3  Burger Joint  0.03
4     Gastropub  0.03


----Flemingdon Park, Don Mills South----
              venue  freq
0        Beer Store  0.10
1               Gym  0.10
2       Coffee Shop  0.10
3  Asian Restaurant  0.10
4    Discount Store  0.05


----Forest Hill North, Forest Hill West----
                venue  freq
0    Sushi Restaurant  0.25
1  Mexican Restaurant  0.25
2               Trail  0.25
3       Jewelry Store  0.25
4         Yoga Studio  0.00


----Glencairn----
                 venue  freq
0          Pizza Place  0.25
1               Bakery  0.25
2  Japanese Restaurant  0.25
3                  Pub  0.25
4       Massage Studio  0.00


----Guildwood, Morningside, West Hill----
                 venue  freq
0          Pizza Place  0.12
1  Rental Car Location  0.12
2                  Spa  0.12
3       Medical Center  0.12
4       Breakfast Spot  0.12


----Harbord, University of Toronto----
                venue  freq
0                Café  0.11
1              Bakery  0.06
2  Italian Restaurant  0.06
3          Restaurant  0.06
4                 Bar  0.06


----Harbourfront East, Toronto Islands, Union Station----
                venue  freq
0         Coffee Shop  0.12
1            Aquarium  0.05
2               Hotel  0.05
3                Café  0.04
4  Italian Restaurant  0.04


----Harbourfront, Regent Park----
         venue  freq
0  Coffee Shop  0.16
1         Park  0.07
2       Bakery  0.07
3         Café  0.04
4          Pub  0.04


----High Park, The Junction South----
                venue  freq
0                Café  0.09
1                 Bar  0.09
2  Mexican Restaurant  0.09
3               Diner  0.04
4           Gastropub  0.04


----Highland Creek, Rouge Hill, Port Union----
                             venue  freq
0                              Bar   1.0
1                      Yoga Studio   0.0
2                    Metro Station   0.0
3  Molecular Gastronomy Restaurant   0.0
4       Modern European Restaurant   0.0


----Hillcrest Village----
                      venue  freq
0  Mediterranean Restaurant   0.2
1      Fast Food Restaurant   0.2
2                   Dog Run   0.2
3                      Pool   0.2
4               Golf Course   0.2


----Humber Bay Shores, Mimico South, New Toronto----
                 venue  freq
0                 Café  0.11
1           Restaurant  0.06
2  American Restaurant  0.06
3  Fried Chicken Joint  0.06
4       Sandwich Place  0.06


----Humber Bay, King's Mill Park, Kingsway Park South East, Mimico NE, Old Mill South, The Queensway East, Royal York South East, Sunnylea----
                             venue  freq
0                        Locksmith  0.33
1                   Baseball Field  0.33
2                     Home Service  0.33
3              Monument / Landmark  0.00
4  Molecular Gastronomy Restaurant  0.00


----Humber Summit----
                             venue  freq
0                      Pizza Place   0.5
1              Empanada Restaurant   0.5
2                      Yoga Studio   0.0
3                    Metro Station   0.0
4  Molecular Gastronomy Restaurant   0.0


----Humewood-Cedarvale----
                             venue  freq
0                            Field  0.25
1                            Trail  0.25
2            

                             venue  freq
0                      Pizza Place  0.33
1                             Park  0.33
2                         Bus Line  0.33
3                    Metro Station  0.00
4  Molecular Gastronomy Restaurant  0.00


----Kingsway Park South West, Mimico NW, The Queensway West, Royal York South West, South of Bloor----
                    venue  freq
0             Flower Shop  0.07
1  Thrift / Vintage Store  0.07
2          Discount Store  0.07
3          Sandwich Place  0.07
4           Burrito Place  0.07


----L'Amoreaux West----
                  venue  freq
0  Fast Food Restaurant  0.15
1    Chinese Restaurant  0.15
2              Gym Pool  0.08
3        Breakfast Spot  0.08
4         Grocery Store  0.08


----Lawrence Heights, Lawrence Manor----
                    venue  freq
0          Clothing Store  0.29
1  Furniture / Home Store  0.21
2             Event Space  0.07
3      Miscellaneous Shop  0.07
4             Coffee Shop  0.07


----Lawrence Park----
           venue  freq
0           Park  0.33
1       Bus Line  0.33
2    Swim School  0.33
3    Yoga Studio  0.00
4  Metro Station  0.00


----Leaside----
                    venue  freq
0             Coffee Shop  0.09
1  Furniture / Home Store  0.09
2     Sporting Goods Shop  0.09
3           Grocery Store  0.06
4            Burger Joint  0.06


----Little Portugal, Trinity----
                     venue  freq
0                      Bar  0.13
1              Men's Store  0.05
2         Asian Restaurant  0.05
3              Coffee Shop  0.05
4  New American Restaurant  0.03


----Maryvale, Wexford----
            venue  freq
0     Auto Garage  0.17
1   Shopping Mall  0.17
2  Sandwich Place  0.17
3  Breakfast Spot  0.17
4          Bakery  0.17


----Moore Park, Summerhill East----
                             venue  freq
0                              Gym   0.5
1                       Playground   0.5
2                      Yoga Studio   0.0
3                      Men's Store   0.0
4  Molecular Gastronomy Restaurant   0.0


----North Toronto West----
            venue  freq
0  Clothing Store  0.15
1     Coffee Shop  0.10
2     Yoga Studio  0.05
3       Gift Shop  0.05
4    Dessert Shop  0.05


----Northwest----
                 venue  freq
0  Rental Car Location  0.33
1            Drugstore  0.33
2                  Bar  0.33
3          Yoga Studio  0.00
4                Motel  0.00


----Northwood Park, York University----
                  venue  freq
0                   Bar   0.2
1        Massage Studio   0.2
2    Falafel Restaurant   0.2
3           Coffee Shop   0.2
4  Caribbean Restaurant   0.2


----Parkdale, Roncesvalles----
            venue  freq
0  Breakfast Spot  0.13
1       Gift Shop  0.13
2       Bookstore  0.07
3   Movie Theater  0.07
4             Bar  0.07


----Parkwoods----
                  venue  freq
0     Food & Drink Shop  0.25
1                  Park  0.25
2              Bus Stop  0.25
3  Fast Food Restaurant  0.25
4                Market  0.00


----Queen's Park----
                 venue  freq
0          Coffee Shop  0.22
1  Japanese Restaurant  0.05
2     Sushi Restaurant  0.05
3                 Park  0.05
4                  Gym  0.05


----Rosedale----
                             venue  freq
0                             Park  0.50
1                            Trail  0.25
2                       Playground  0.25
3                      Yoga Studio  0.00
4  Molecular Gastronomy Restaurant  0.00


----Roselawn----
                             venue  freq
0                           Garden   1.0
1                    Metro Station   0.0
2              Monument / Landmark   0.0
3  Molecular Gastronomy Restaurant   0.0
4       Modern European Restaurant   0.0


----Rouge, Malvern----
                             venue  freq
0             Fast Food Restaurant   1.0
1                      Men's Store   0.0
2  Molecular Gastronomy Restaurant   0.0
3       Modern European Restaurant   0.0
4               Miscellaneous Shop   0.0


----Runnymede, Swansea----
                venue  freq
0         Coffee Shop  0.08
1                Café  0.08
2         Pizza Place  0.08
3    Sushi Restaurant  0.05
4  Italian Restaurant  0.05


----Ryerson, Garden District----
                       venue  freq
0                Coffee Shop  0.10
1             Clothing Store  0.06
2                       Café  0.04
3             Cosmetics Shop  0.04
4  Middle Eastern Restaurant  0.03


----Scarborough Village----
                             venue  freq
0                       Playground   1.0
1                      Yoga Studio   0.0
2                      Men's Store   0.0
3  Molecular Gastronomy Restaurant   0.0
4       Modern European Restaurant   0.0


----Silver Hills, York Mills----
                             venue  freq
0                        Cafeteria   1.0
1               Mexican Restaurant   0.0
2                            Motel   0.0
3              Monument / Landmark   0.0
4  Molecular Gastronomy Restaurant   0.0


----St. James Town----
            venue  freq
0     Coffee Shop  0.07
1            Café  0.06
2      Restaurant  0.05
3           Hotel  0.05
4  Cosmetics Shop  0.04


----Stn A PO Boxes 25 The Esplanade----
          venue  freq
0   Coffee Shop  0.11
1    Restaurant  0.04
2          Café  0.04
3         Hotel  0.03
4  Cocktail Bar  0.03


----Studio District----
                venue  freq
0                Café  0.11
1         Coffee Shop  0.08
2              Bakery  0.05
3  Italian Restaurant  0.05
4           Gastropub  0.05


----The Annex, North Midtown, Yorkville----
                 venue  freq
0          Coffee Shop  0.12
1       Sandwich Place  0.12
2                 Café  0.12
3          Pizza Place  0.08
4  American Restaurant  0.04


----The Beaches----
               venue  freq
0                Pub   0.2
1  Health Food Store   0.2
2      Grocery Store   0.2
3           Boutique   0.2
4        Yoga Studio   0.0


----The Beaches West, India Bazaar----
                venue  freq
0      Sandwich Place  0.11
1         Pizza Place  0.05
2  Italian Restaurant  0.05
3           Pet Store  0.05
4                 Pub  0.05


----The Danforth West, Riverdale----
                    venue  freq
0        Greek Restaurant  0.21
1             Coffee Shop  0.10
2          Ice Cream Shop  0.07
3      Italian Restaurant  0.07
4  Furniture / Home Store  0.05


----The Junction North, Runnymede----
                       venue  freq
0                Pizza Place  0.25
1       Caribbean Restaurant  0.25
2              Grocery Store  0.25
3                   Bus Line  0.25
4  Middle Eastern Restaurant  0.00


----The Kingsway, Montgomery Road, Old Mill North----
           venue  freq
0           Park  0.33
1          River  0.33
2           Pool  0.33
3    Yoga Studio  0.00
4  Metro Station  0.00


----Thorncliffe Park----
               venue  freq
0  Indian Restaurant  0.12
1        Yoga Studio  0.06
2           Pharmacy  0.06
3        Pizza Place  0.06
4        Coffee Shop  0.06


----Victoria Village----
                   venue  freq
0            Pizza Place  0.25
1            Coffee Shop  0.25
2           Hockey Arena  0.25
3  Portuguese Restaurant  0.25
4          Metro Station  0.00


----Westmount----
                venue  freq
0         Pizza Place   0.2
1         Coffee Shop   0.2
2      Sandwich Place   0.2
3  Chinese Restaurant   0.2
4        Intersection   0.2


----Willowdale South----
                 venue  freq
0     Ramen Restaurant  0.08
1          Coffee Shop  0.08
2           Restaurant  0.08
3     Sushi Restaurant  0.06
4  Japanese Restaurant  0.06


----Willowdale West----
            venue  freq
0        Pharmacy   0.2
1     Pizza Place   0.2
2  Discount Store   0.2
3   Grocery Store   0.2
4     Coffee Shop   0.2


----Woburn----
                             venue  freq
0                      Coffee Shop  0.67
1                Korean Restaurant  0.33
2                    Metro Station  0.00
3              Monument / Landmark  0.00
4  Molecular Gastronomy Restaurant  0.00


----Woodbine Gardens, Parkview Hill----
                  venue  freq
0  Fast Food Restaurant  0.15
1           Pizza Place  0.15
2             Gastropub  0.08
3                  Café  0.08
4                  Bank  0.08


----Woodbine Heights----
            venue  freq
0    Skating Rink  0.22
1     Video Store  0.11
2    Dance Studio  0.11
3     Curling Ice  0.11
4  Cosmetics Shop  0.11


----York Mills West----
               venue  freq
0               Park  0.50
1               Bank  0.25
2  Convenience Store  0.25
3        Yoga Studio  0.00
4      Metro Station  0.00




In [221]:
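def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name); the rest are venue frequencies
    row_categories = row.iloc[1:]
    # sort venue categories from most to least frequent
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    # return the names of the top categories
    return row_categories_sorted.index.values[0:num_top_venues]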
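
To see what this function returns, here is a minimal sketch on a made-up row. The row `toy_row` and its values are hypothetical and only mimic the layout we assume for `toronto_grouped`: a neighborhood name first, then one frequency per venue category.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # drop the leading neighborhood name, then sort frequencies in descending order
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# hypothetical row: a neighborhood name followed by venue frequencies
toy_row = pd.Series({'Neighborhood': 'Demo', 'Bar': 0.1, 'Café': 0.3, 'Park': 0.6})

top_two = list(return_most_common_venues(toy_row, 2))
print(top_two)  # ['Park', 'Café']
```

The function returns category names, not frequencies, so a category with frequency 0.0 can still appear in the result when a neighborhood has fewer venues than `num_top_venues`.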

Finally, we put this information into a pandas dataframe.

In [222]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # only 'st', 'nd', 'rd' are listed, so fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Steakhouse,Bar,Thai Restaurant,American Restaurant,Bakery,Burger Joint,Hotel,Cosmetics Shop
1,Agincourt,Breakfast Spot,Lounge,Skating Rink,Sandwich Place,Donut Shop,Diner,Discount Store,Dog Run,Doner Restaurant,Women's Store
2,"Agincourt North, L'Amoreaux East, Milliken, St...",Park,Playground,Women's Store,Doner Restaurant,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",Grocery Store,Coffee Shop,Discount Store,Pharmacy,Pizza Place,Beer Store,Fried Chicken Joint,Sandwich Place,Fast Food Restaurant,Japanese Restaurant
4,"Alderwood, Long Branch",Pizza Place,Coffee Shop,Gym,Skating Rink,Pharmacy,Pub,Sandwich Place,Pool,Diner,Deli / Bodega
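
The column names above rely on a three-element suffix list with a fallback to 'th'. That is enough for ten columns, but it would label a 21st column as '21th'. As a sketch only, a small helper could handle any positive integer; the name `ordinal` is our own and does not come from a library.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11, 12 and 13 (and 111, 112, ...) always take 'th'
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10

# build the same column names as in the cell above
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns[1], columns[-1])
```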


All that remains is to cluster the neighborhoods with k-means. We chose 5 clusters, since that number appeared to give the best results.

In [223]:
# set number of clusters
kclusters = 5

# keep only the numeric venue frequencies; the name column cannot be clustered
toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 3, 0, 1, 1, 3, 3, 3, 3, 3])
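
This notebook does not record how the number of clusters was compared. One common way to do it, shown here only as a sketch, is to fit k-means for several values of k and look at the inertia and the silhouette score. The data below is synthetic (`make_blobs`), so the numbers say nothing about Toronto; with the real data one would pass `toronto_grouped_clustering` instead of `X`.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# synthetic stand-in for the venue-frequency matrix (assumption, not the real data)
X, _ = make_blobs(n_samples=200, centers=4, cluster_std=0.6, random_state=0)

scores = {}
for k in range(2, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia always falls as k grows; the silhouette score rewards compact, well-separated clusters
    scores[k] = silhouette_score(X, model.labels_)
    print(k, round(model.inertia_, 1), round(scores[k], 3))

# candidate k with the highest silhouette score
best_k = max(scores, key=scores.get)
print('best k by silhouette:', best_k)
```

Neither measure is decisive on its own, so it is worth checking the resulting clusters by eye as well.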

In [215]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = nbh_table

# join nbh_table with neighborhoods_venues_sorted to add the cluster label and top venues for each neighbourhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhood')
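
Because `join` performs a left join by default, a neighbourhood with no matching row in the right-hand table gets NaN in the added columns, and the integer cluster labels become floats. The sketch below uses two made-up frames (`left` and `right` are hypothetical stand-ins for `nbh_table` and `neighborhoods_venues_sorted`) to show the effect and one possible way to handle it.

```python
import pandas as pd

# hypothetical stand-ins for nbh_table and neighborhoods_venues_sorted
left = pd.DataFrame({'Neighbourhood': ['A', 'B', 'C']})
right = pd.DataFrame({'Cluster Labels': [3, 0], 'Neighborhood': ['A', 'C']})

# same call pattern as above: left join on the neighbourhood name
merged = left.join(right.set_index('Neighborhood'), on='Neighbourhood')
print(merged)  # 'B' has no match, so its cluster label is NaN

# one option: drop unlabelled rows and restore the integer dtype
labelled = merged.dropna(subset=['Cluster Labels']).copy()
labelled['Cluster Labels'] = labelled['Cluster Labels'].astype(int)
print(labelled)
```

Dropping the unlabelled rows is a design choice; keeping them and drawing them in a neutral color on the map would work too.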

The last step is to create a map on which each neighbourhood is colored according to the cluster it belongs to.

In [217]:
# create map
from matplotlib import colors, cm

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    if np.isnan(cluster):
        # rows without a cluster label (NaN after the join) reuse the color at index 2
        num = 2
    else:
        # cluster 0 maps to index -1, i.e. the last color in the list
        num = int(cluster-1)
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[num],
        fill=True,
        fill_color=rainbow[num],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters