### Toronto City Analytics: Segmenting and Clustering Neighborhoods in Toronto
#### Capstone Project for Applied Data Science on Coursera (part 1)

[1. Download and explore data set](#1.-Download-and-explore-data-set)

[2. Explore neighborhoods in Toronto](#2.-Explore-neighborhoods-in-Toronto)

[3. Cluster neighborhoods in Toronto](#3.-Cluster-neighborhoods-in-Toronto)




### 1. Download and explore data set

First we need to get the data and explore it. Let's start by importing the dependencies:

In [34]:
from bs4 import BeautifulSoup
import requests
print('--> BeautifulSoup & requests imported')

ModuleNotFoundError: No module named 'bs4'

Importing data from the web (Wikipedia):
1. We fetch the page and parse its HTML with the lxml parser.
2. We pull the data out of the parsed page: find the relevant table and print all of its rows.

In [43]:
source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(source, 'lxml')

table = soup.find('table',class_='wikitable sortable').tbody
#table1 = table.tr
table=table.find_all('tr')

for rows in table:
    row = rows.text
    print(row)
    print('--------')



Postal Code

Borough

Neighborhood

--------

M1A

Not assigned



--------

M2A

Not assigned



--------

M3A

North York

Parkwoods

--------

M4A

North York

Victoria Village

--------

M5A

Downtown Toronto

Regent Park, Harbourfront

--------

M6A

North York

Lawrence Manor, Lawrence Heights

--------

M7A

Downtown Toronto

Queen's Park, Ontario Provincial Government

--------

M8A

Not assigned



--------

M9A

Etobicoke

Islington Avenue

--------

M1B

Scarborough

Malvern, Rouge

--------

M2B

Not assigned



--------

M3B

North York

Don Mills

--------

M4B

East York

Parkview Hill, Woodbine Gardens

--------

M5B

Downtown Toronto

Garden District, Ryerson

--------

M6B

North York

Glencairn

--------

M7B

Not assigned



--------

M8B

Not assigned



--------

M9B

Etobicoke

West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale

--------

M1C

Scarborough

Rouge Hill, Port Union, Highland Creek

--------

M2C

Not assigned



--------

M3C

N

--------
---> As we can see, the data contains postal code, borough and name for Toronto neighborhoods.
To store this data, we create a corresponding pandas dataframe and print the first 5 rows. 

In [44]:
import pandas as pd

columns=[]
for x in table[0].find_all('th'):
    columns.append(x.text.replace('\n',''))
df=pd.DataFrame(columns=columns)
df

Unnamed: 0,Postal Code,Borough,Neighborhood


In [45]:
for i in range(1,len(table)):
    row=[]
    for x in table[i].find_all('td'):
        row.append(x.text.replace('\n',''))
    df.loc[i-1,:]=row
df.head()


Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
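
Filling the dataframe one row at a time with `df.loc` works for a table of this size. As a minimal sketch of an alternative, the rows can be collected in a plain list first and handed to the `DataFrame` constructor in one call. The `rows` list below is a hand-copied sample of the output above, not the scraped table itself, and `sample_df` is a name introduced only for this illustration.

```python
import pandas as pd

# Sample rows copied from the table output above; in the notebook these
# values would come from the <td> cells of each scraped <tr>.
columns = ['Postal Code', 'Borough', 'Neighborhood']
rows = [
    ['M1A', 'Not assigned', ''],
    ['M2A', 'Not assigned', ''],
    ['M3A', 'North York', 'Parkwoods'],
    ['M4A', 'North York', 'Victoria Village'],
    ['M5A', 'Downtown Toronto', 'Regent Park, Harbourfront'],
]

# Build the dataframe in a single call instead of assigning row by row.
sample_df = pd.DataFrame(rows, columns=columns)
print(sample_df.shape)
```

Collecting rows first avoids growing the dataframe repeatedly, which tends to matter more as tables get larger.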


We want to ignore the "Not assigned" boroughs and rename the column PostalCode:

In [46]:
for i in range(df.shape[0]):
    if df.loc[i,'Borough']=='Not assigned':
        df.drop([i],inplace=True)
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
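
The row-by-row drop above can also be written as a boolean mask. The following is a small sketch on a hand-made sample (the `sample` and `assigned` names are illustrative only), assuming the same column names as the scraped table.

```python
import pandas as pd

# Hand-made sample using values from the table above.
sample = pd.DataFrame({
    'Postal Code': ['M1A', 'M2A', 'M3A', 'M4A'],
    'Borough': ['Not assigned', 'Not assigned', 'North York', 'North York'],
    'Neighborhood': ['', '', 'Parkwoods', 'Victoria Village'],
})

# Keep only rows whose borough is assigned, then renumber the index from 0.
assigned = sample[sample['Borough'] != 'Not assigned'].reset_index(drop=True)
print(assigned)
```

The mask filters all rows in one step, and `reset_index(drop=True)` gives the same consecutive index that the next cell sets by hand.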


In [47]:
df.rename(columns={'Postal Code':'PostalCode'}, inplace=True)
df.index=range(df.shape[0])
df


Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


-----
Checking if there are neighborhoods that are "Not assigned" and using the name of the related borough where necessary.

In [48]:
for i in range(df.shape[0]):
    # `'' or 'Not assigned'` evaluates to 'Not assigned', so test membership instead
    if df.iloc[i,2] in ('', 'Not assigned'):
        df.iloc[i,2]=df.iloc[i,1]
df

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"
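
As a hedged sketch of the same fallback without an explicit loop, `isin` can flag the rows whose neighborhood is empty or "Not assigned", and `loc` can copy the borough into them. The sample rows are hypothetical (the real table may contain no such rows). One Python detail worth knowing here: the expression `'' or 'Not assigned'` evaluates to `'Not assigned'`, so a membership test is the safer way to check for both values.

```python
import pandas as pd

# Hypothetical rows: two of them lack a usable neighborhood name.
sample = pd.DataFrame({
    'PostalCode': ['M3A', 'M7A', 'M9A'],
    'Borough': ['North York', 'Downtown Toronto', 'Etobicoke'],
    'Neighborhood': ['Parkwoods', '', 'Not assigned'],
})

# Flag rows with a missing neighborhood and fall back to the borough name.
missing = sample['Neighborhood'].isin(['', 'Not assigned'])
sample.loc[missing, 'Neighborhood'] = sample.loc[missing, 'Borough']
print(sample)
```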


**For practice**:
Since the neighborhoods are already grouped by postal code, we take one row (row 96, chosen arbitrarily) and split it into two separate entries, one per neighborhood. We can then test the code that groups the neighborhoods by postal code:


In [49]:
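# split on ', ' so the second name does not keep a leading space
test=df.iloc[96,2].split(', ')
df.iloc[96,2]=test[0]
# DataFrame.append was removed in pandas 2.0; pd.concat adds the copied row instead
df=pd.concat([df,df.iloc[[96]]],ignore_index=True)
df.iloc[103,2]=test[1]
df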





Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [50]:
a=[]
for k in range(df.shape[0]):
    # compare row k with every later row that shares its postal code
    for i in range(k+1, df.shape[0]):
        if df.iloc[k,0]==df.iloc[i,0]:
            print('found:',i,k,df.iloc[i,0])
            a.append(i)
            df.iloc[k,2]=df.iloc[k,2]+', '+df.iloc[i,2]

# drop the rows that were merged into an earlier row
for i in a:
    df.drop([i],inplace=True)

df


found: 103 96 M4X


Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


----
The grouping worked: we again have the grouped dataframe with 103 rows and 3 columns.

In [51]:
df.shape

(103, 3)
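
The nested loop compares every pair of rows. A sketch of the same grouping with `groupby` is shown below, joining the neighborhood names of rows that share a postal code. The three sample rows are illustrative only (the two M4X names are an assumption about what row 96 contained), and `sort=False` is used so the groups keep their order of first appearance.

```python
import pandas as pd

# Illustrative rows: postal code M4X appears twice, as after the split above.
sample = pd.DataFrame({
    'PostalCode': ['M4X', 'M5A', 'M4X'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'Downtown Toronto'],
    'Neighborhood': ['St. James Town', 'Regent Park, Harbourfront', 'Cabbagetown'],
})

# Join the neighborhood names of all rows that share a postal code and borough.
grouped = (
    sample.groupby(['PostalCode', 'Borough'], sort=False)['Neighborhood']
    .agg(', '.join)
    .reset_index()
)
print(grouped)
```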

-----
Reading the csv with geodata for our neighborhoods and merging the two tables:

In [52]:
geodata = pd.read_csv('https://cocl.us/Geospatial_data')
# keep 'Postal Code' as a regular column: it is the merge key below
geodata.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [55]:
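df_geo=pd.merge(df,geodata, left_on='PostalCode',right_on='Postal Code',how='left')
# drop() returns a copy, so this only hides the duplicate key column in the display;
# df_geo itself keeps 'Postal Code'
df_geo.drop(['Postal Code'],axis=1)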

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
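
With a left merge, postal codes that have no entry in the geodata end up with empty coordinates. As a small sketch, the `indicator=True` option of `pd.merge` can list such codes before the helper columns are dropped. The frames below are hand-made; 'M9Z' is a made-up code included only to produce an unmatched row.

```python
import pandas as pd

left = pd.DataFrame({
    'PostalCode': ['M3A', 'M4A', 'M9Z'],  # 'M9Z' is made up and has no coordinates
    'Borough': ['North York', 'North York', 'Etobicoke'],
})
right = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# indicator=True adds a '_merge' column telling where each row was found.
merged = pd.merge(left, right, left_on='PostalCode', right_on='Postal Code',
                  how='left', indicator=True)
unmatched = merged.loc[merged['_merge'] == 'left_only', 'PostalCode'].tolist()

# drop() returns a new frame, so the result is assigned back.
merged = merged.drop(columns=['Postal Code', '_merge'])
print(unmatched)
```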



### 2. Explore neighborhoods in Toronto

First we look up the geographical coordinates of Toronto:

In [107]:
#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim 
#!conda install -c conda-forge folium=0.5.0 --yes
import folium

address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto, CAN are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto, CAN are 43.6534817, -79.3839347.


-------
Now we display Toronto on a map, including markers for all the neighborhoods:

In [57]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)


# add markers to map
for lat, lng, label in zip(df_geo['Latitude'], df_geo['Longitude'], df_geo['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
     
    
map_toronto



Setting up Foursquare API

In [58]:
CLIENT_ID = 'your client_id' # your Foursquare ID
CLIENT_SECRET = 'your client_secret' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version




We define a function that gets the venues near a given location with the help of the Foursquare API.
The GET request returns a JSON response. We collect the venues of each neighborhood in a list, then flatten the list and convert it to a dataframe.


In [59]:
# limit of number of venues returned by Foursquare API
LIMIT = 100  

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
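
The function above indexes straight into the response (`["response"]['groups'][0]['items']` and `categories[0]`), which raises an error if any of these parts is missing. The following is a sketch of a more defensive parser that can be tried without network access. The `payload` dictionary is hand-written and mimics only the fields accessed above, the second venue is hypothetical, and `parse_venues` is a helper name introduced for this illustration, not part of any Foursquare library.

```python
import pandas as pd

# Hand-written payload with only the fields the function above reads.
payload = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Brookbanks Park',
                   'location': {'lat': 43.751976, 'lng': -79.33214},
                   'categories': [{'name': 'Park'}]}},
        {'venue': {'name': 'Uncategorized Spot',  # hypothetical venue
                   'location': {'lat': 43.75, 'lng': -79.33},
                   'categories': []}},
    ]}]}
}

def parse_venues(payload):
    """Flatten an explore-style payload into (name, lat, lng, category) tuples."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # A venue without categories gets None instead of raising IndexError.
        category = categories[0]['name'] if categories else None
        rows.append((venue['name'], venue['location']['lat'],
                     venue['location']['lng'], category))
    return rows

venues = pd.DataFrame(parse_venues(payload),
                      columns=['Venue', 'Venue Latitude', 'Venue Longitude',
                               'Venue Category'])
print(venues)
```

An empty or error payload yields an empty list, so a neighborhood without results simply contributes no rows.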


Let's use the function to get the nearby venues for the Toronto neighborhoods in the dataframe defined above:

In [60]:
toronto_venues = getNearbyVenues(names=df_geo['Neighborhood'],
                                   latitudes=df_geo['Latitude'],
                                   longitudes=df_geo['Longitude']
                                  )
toronto_venues.head()

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmount Park
Bayview Village
Downsview
The Danforth West, Ri

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,KFC,43.754387,-79.333021,Fast Food Restaurant
2,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
3,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop


----
The following cluster analysis of neighborhoods will be based on the venue categories. We therefore need to prepare a dataframe for the input X: we use one-hot encoding to get dummy variables for the venue categories.

In [61]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()


Unnamed: 0,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


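Now we group the dataframe by neighborhood and take the mean of each dummy column, which gives the relative frequency of each venue category per neighborhood: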

In [62]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,Agincourt,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00000,0.00,0.000000,0.000000,0.000000,0.0000,0.000000,0.000000,0.0,0.000000
1,"Alderwood, Long Branch",0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00000,0.00,0.000000,0.000000,0.000000,0.0000,0.000000,0.000000,0.0,0.000000
2,"Bathurst Manor, Wilson Heights, Downsview North",0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00000,0.00,0.000000,0.000000,0.052632,0.0000,0.000000,0.000000,0.0,0.000000
3,Bayview Village,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00000,0.00,0.000000,0.000000,0.000000,0.0000,0.000000,0.000000,0.0,0.000000
4,"Bedford Park, Lawrence Manor East",0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00000,0.00,0.000000,0.000000,0.000000,0.0000,0.000000,0.000000,0.0,0.000000
5,Berczy Park,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00000,0.00,0.017544,0.000000,0.000000,0.0000,0.000000,0.000000,0.0,0.000000
6,"Birch Cliff, Cliffside West",0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00000,0.00,0.000000,0.000000,0.000000,0.0000,0.000000,0.000000,0.0,0.000000
7,"Brockton, Parkdale Village, Exhibition Place",0.043478,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00000,0.00,0.000000,0.000000,0.000000,0.0000,0.000000,0.000000,0.0,0.000000
8,Business reply mail Processing Centre,0.055556,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00000,0.00,0.000000,0.000000,0.000000,0.0000,0.000000,0.000000,0.0,0.000000
9,"CN Tower, King and Spadina, Railway Lands, Har...",0.000000,0.0,0.000000,0.058824,0.058824,0.058824,0.117647,0.176471,0.058824,...,0.00000,0.00,0.000000,0.000000,0.000000,0.0000,0.000000,0.000000,0.0,0.000000
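
To see why the grouped values are fractions, here is a small worked sketch using the five venues from the `toronto_venues.head()` output above. The names `venues`, `onehot` and `freq` are used only in this example.

```python
import pandas as pd

# The five venues shown in the head() output above.
venues = pd.DataFrame({
    'Neighborhood': ['Parkwoods', 'Parkwoods', 'Parkwoods',
                     'Victoria Village', 'Victoria Village'],
    'Venue Category': ['Park', 'Fast Food Restaurant', 'Food & Drink Shop',
                       'Hockey Arena', 'Coffee Shop'],
})

# One column per category, 1.0 where the venue belongs to it.
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# The mean of a 0/1 column is the share of venues in that category.
freq = onehot.groupby('Neighborhood').mean()
print(freq)
```

Parkwoods has three venues in three different categories, so each of them gets a frequency of 1/3, and every row of `freq` sums to 1.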


We define a function to get the most common venues for a given neighborhood.
To do so, we sort a row in descending order of frequency and take the top (e.g. 10) venue categories.

In [63]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
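
As a quick check of the function, it can be called on a single hand-made row. The row below is hypothetical: its first entry plays the role of the 'Neighborhood' column and the remaining entries are category frequencies.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name) and sort the frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical row with distinct frequencies.
row = pd.Series({'Neighborhood': 'Example', 'Coffee Shop': 0.5,
                 'Park': 0.3, 'Pub': 0.2, 'Bank': 0.0})
top = return_most_common_venues(row, 3)
print(list(top))
```

When several categories share the same frequency (for example many zeros), their relative order in the result is not meaningful, which is worth keeping in mind when reading the lower ranks of the top 10.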

Apply the function to the Toronto neighborhoods. The result is a dataframe with one row per neighborhood and its 10 most common venue types in the columns:

In [91]:
import numpy as np
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Lounge,Latin American Restaurant,Clothing Store,Breakfast Spot,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run
1,"Alderwood, Long Branch",Pizza Place,Gym,Athletics & Sports,Pharmacy,Pool,Pub,Dance Studio,Sandwich Place,Skating Rink,Coffee Shop
2,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Bank,Gift Shop,Fried Chicken Joint,Sandwich Place,Diner,Bridal Shop,Restaurant,Deli / Bodega,Ice Cream Shop
3,Bayview Village,Japanese Restaurant,Café,Bank,Chinese Restaurant,Department Store,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Women's Store
4,"Bedford Park, Lawrence Manor East",Italian Restaurant,Coffee Shop,Sandwich Place,Restaurant,Juice Bar,Butcher,Café,Indian Restaurant,Sushi Restaurant,Pizza Place



### 3. Cluster neighborhoods in Toronto

Now we cluster the neighborhoods with KMeans, based on the relative frequency of each venue category per neighborhood.

In [95]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)
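
To illustrate what the clustering step does with the frequency rows, here is a minimal NumPy sketch of the basic k-means loop (assign each point to its nearest center, then move each center to the mean of its points). It is not scikit-learn's implementation, which by default uses k-means++ initialization and other refinements. The `kmeans_sketch` function and the four data rows are hypothetical.

```python
import numpy as np

def kmeans_sketch(X, init_centers, n_iter=10):
    """Plain k-means iterations starting from the given centers."""
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(n_iter):
        # squared Euclidean distance from every point to every center
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(len(centers)):
            if np.any(labels == k):  # leave empty clusters where they are
                centers[k] = X[labels == k].mean(axis=0)
    return labels, centers

# Hypothetical frequency rows: two "coffee-heavy" and two "park-heavy" neighborhoods.
X = np.array([[0.8, 0.2], [0.7, 0.3], [0.1, 0.9], [0.2, 0.8]])
labels, centers = kmeans_sketch(X, init_centers=X[[0, 2]])

# bincount shows how many points fall into each cluster.
print(labels, np.bincount(labels))
```

Counting the labels with `np.bincount` is also a quick way to check whether most neighborhoods end up in a single cluster, as the first ten labels above suggest.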

In [96]:
neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Lounge,Latin American Restaurant,Clothing Store,Breakfast Spot,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run
1,"Alderwood, Long Branch",Pizza Place,Gym,Athletics & Sports,Pharmacy,Pool,Pub,Dance Studio,Sandwich Place,Skating Rink,Coffee Shop
2,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Bank,Gift Shop,Fried Chicken Joint,Sandwich Place,Diner,Bridal Shop,Restaurant,Deli / Bodega,Ice Cream Shop
3,Bayview Village,Japanese Restaurant,Café,Bank,Chinese Restaurant,Department Store,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Women's Store
4,"Bedford Park, Lawrence Manor East",Italian Restaurant,Coffee Shop,Sandwich Place,Restaurant,Juice Bar,Butcher,Café,Indian Restaurant,Sushi Restaurant,Pizza Place
5,Berczy Park,Coffee Shop,Cocktail Bar,Cheese Shop,Restaurant,Café,Beer Bar,Bakery,Seafood Restaurant,Comfort Food Restaurant,Shopping Mall
6,"Birch Cliff, Cliffside West",College Stadium,Skating Rink,Café,General Entertainment,Women's Store,Distribution Center,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
7,"Brockton, Parkdale Village, Exhibition Place",Café,Breakfast Spot,Coffee Shop,Gym,Grocery Store,Pet Store,Performing Arts Venue,Nightclub,Italian Restaurant,Intersection
8,Business reply mail Processing Centre,Light Rail Station,Yoga Studio,Smoke Shop,Restaurant,Auto Workshop,Fast Food Restaurant,Farmers Market,Spa,Pizza Place,Recording Studio
9,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Airport Lounge,Plane,Sculpture Garden,Bar,Harbor / Marina,Boat or Ferry,Airport Terminal,Coffee Shop,Airport Gate


We merge this dataframe with the initial neighborhood dataframe so that all the information is in one dataframe.

In [97]:
# add clustering labels
neighborhoods_venues_sorted['Cluster Labels'] = kmeans.labels_

toronto_merged = df_geo

# join the sorted venues and cluster labels onto df_geo, which holds latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Postal Code,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
0,M3A,North York,Parkwoods,M3A,43.753259,-79.329656,Park,Fast Food Restaurant,Food & Drink Shop,Distribution Center,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,2.0
1,M4A,North York,Victoria Village,M4A,43.725882,-79.315572,French Restaurant,Portuguese Restaurant,Coffee Shop,Pizza Place,Hockey Arena,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,1.0
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",M5A,43.65426,-79.360636,Coffee Shop,Park,Pub,Bakery,Breakfast Spot,Café,Theater,Restaurant,Yoga Studio,Hotel,1.0
3,M6A,North York,"Lawrence Manor, Lawrence Heights",M6A,43.718518,-79.464763,Women's Store,Furniture / Home Store,Clothing Store,Coffee Shop,Boutique,Miscellaneous Shop,Athletics & Sports,Event Space,Accessories Store,Vietnamese Restaurant,1.0
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",M7A,43.662301,-79.389494,Coffee Shop,Sushi Restaurant,Distribution Center,Restaurant,Park,Mexican Restaurant,Juice Bar,Japanese Restaurant,Italian Restaurant,Hobby Shop,1.0


Exclude neighborhoods without results/venues from the API call.

In [100]:
toronto_merged.dropna(inplace=True)
toronto_merged

Unnamed: 0,PostalCode,Borough,Neighborhood,Postal Code,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
0,M3A,North York,Parkwoods,M3A,43.753259,-79.329656,Park,Fast Food Restaurant,Food & Drink Shop,Distribution Center,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,2.0
1,M4A,North York,Victoria Village,M4A,43.725882,-79.315572,French Restaurant,Portuguese Restaurant,Coffee Shop,Pizza Place,Hockey Arena,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,1.0
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",M5A,43.654260,-79.360636,Coffee Shop,Park,Pub,Bakery,Breakfast Spot,Café,Theater,Restaurant,Yoga Studio,Hotel,1.0
3,M6A,North York,"Lawrence Manor, Lawrence Heights",M6A,43.718518,-79.464763,Women's Store,Furniture / Home Store,Clothing Store,Coffee Shop,Boutique,Miscellaneous Shop,Athletics & Sports,Event Space,Accessories Store,Vietnamese Restaurant,1.0
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",M7A,43.662301,-79.389494,Coffee Shop,Sushi Restaurant,Distribution Center,Restaurant,Park,Mexican Restaurant,Juice Bar,Japanese Restaurant,Italian Restaurant,Hobby Shop,1.0
6,M1B,Scarborough,"Malvern, Rouge",M1B,43.806686,-79.194353,Fast Food Restaurant,Women's Store,Dance Studio,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,2.0
7,M3B,North York,Don Mills,M3B,43.745906,-79.352188,Gym,Restaurant,Coffee Shop,Asian Restaurant,Japanese Restaurant,Beer Store,Sporting Goods Shop,Italian Restaurant,Shopping Mall,Sandwich Place,1.0
8,M4B,East York,"Parkview Hill, Woodbine Gardens",M4B,43.706397,-79.309937,Pizza Place,Gym / Fitness Center,Breakfast Spot,Fast Food Restaurant,Pharmacy,Bank,Gastropub,Athletics & Sports,Intersection,Women's Store,1.0
9,M5B,Downtown Toronto,"Garden District, Ryerson",M5B,43.657162,-79.378937,Clothing Store,Coffee Shop,Café,Bubble Tea Shop,Middle Eastern Restaurant,Restaurant,Japanese Restaurant,Italian Restaurant,Cosmetics Shop,Hotel,1.0
10,M6B,North York,Glencairn,M6B,43.709577,-79.445073,Japanese Restaurant,Sushi Restaurant,Pizza Place,Pub,Discount Store,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Women's Store,1.0
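
The cluster labels appear as floats (1.0, 2.0) because the join leaves empty values for neighborhoods without venues, and a column containing empty values is stored as float. A small sketch of dropping those rows and restoring integer labels follows. The frames are hand-made, with Islington Avenue standing in for a neighborhood that has no match.

```python
import pandas as pd

neighborhoods = pd.DataFrame({
    'Neighborhood': ['Parkwoods', 'Islington Avenue', 'Victoria Village'],
})
labels = pd.DataFrame({
    'Neighborhood': ['Parkwoods', 'Victoria Village'],
    'Cluster Labels': [2, 1],
})

# The neighborhood without a match gets NaN, which turns the column into float.
merged = neighborhoods.join(labels.set_index('Neighborhood'), on='Neighborhood')
dtype_after_join = merged['Cluster Labels'].dtype

# Drop the unmatched rows and cast the labels back to integers.
merged = merged.dropna(subset=['Cluster Labels']).astype({'Cluster Labels': int})
print(merged)
```

Integer labels can then be used directly as list indices, for example when picking a color per cluster.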


----
Finally we plot the results: we create a map of Toronto (using folium) with a marker for each neighborhood, colored by its cluster label (matplotlib supplies the color palette). The cluster is also displayed in the marker's popup label.

In [106]:
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters