# Segmenting and Clustering Neighborhoods in the City of Toronto, Canada
## 3. Explore and Cluster 

### a. Import libraries

Explore and cluster the neighborhoods in Toronto. You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data. It is up to you.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.

Libraries imported.


In [2]:
#Import dataframe
dfCoords = pd.read_csv('dfCoords.csv',sep=',')

In [3]:
#Get Geographical location from Toronto
address = 'Toronto, CA'

geolocator = Nominatim()
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

  after removing the cwd from sys.path.


The geograpical coordinates of Toronto are 43.7170226, -79.4197830350134.


In [4]:
# create map of Toronto using latitude and longitude values
map_Toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(dfCoords['Latitude'], dfCoords['Longitude'], dfCoords['Borough'], dfCoords['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Toronto)  
    
map_Toronto

#### Define Foursquare Credentials and Version

In [5]:
CLIENT_ID = 'OK3AEGZMAH5Z0SEJER13ISCD421CYD4ME1ZFHAUXG0YNX525' # your Foursquare ID
CLIENT_SECRET = 'ZYG0IYC3CSGIPYK154RISAEGG3ZOWGVEWJ5M41VBDWFX5CLS' # your Foursquare Secret
VERSION = '20181224' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: OK3AEGZMAH5Z0SEJER13ISCD421CYD4ME1ZFHAUXG0YNX525
CLIENT_SECRET:ZYG0IYC3CSGIPYK154RISAEGG3ZOWGVEWJ5M41VBDWFX5CLS


# Explore Neighborhoods with word Toronto

### Using the function provided in Clustering Lab

In [12]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

### Select 10 Neigborhoods with word Toronto on it

In [13]:
dfToronto=dfCoords[dfCoords['Borough'].str.contains('Toronto')]
dfToronto=dfToronto.sample(10)
dfToronto=dfToronto.reset_index(drop=True)
dfToronto.shape

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4M,East Toronto,Studio District,43.659526,-79.340923
1,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
2,M6R,West Toronto,"Parkdale,Roncesvalles",43.64896,-79.456325
3,M7Y,East Toronto,Business reply mail Processing Centre969 Eastern,43.662744,-79.321558
4,M5K,Downtown Toronto,"Design Exchange,Toronto Dominion Centre",43.647177,-79.381576
5,M6P,West Toronto,"High Park,The Junction South",43.661608,-79.464763
6,M5V,Downtown Toronto,"CN Tower,Bathurst Quay,Island airport,Harbourf...",43.628947,-79.39442
7,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
8,M5L,Downtown Toronto,"Commerce Court,Victoria Hotel",43.648198,-79.379817
9,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306


#### Run the above function on each neighborhood and create a new dataframe called *toronto_venues*.

In [14]:
toronto_venues = getNearbyVenues(names=dfToronto['Neighborhood'],
                                   latitudes=dfCoords['Latitude'],
                                   longitudes=dfCoords['Longitude']
                                  )

Studio District
Lawrence Park
Parkdale,Roncesvalles
Business reply mail Processing Centre969 Eastern
Design Exchange,Toronto Dominion Centre
High Park,The Junction South
CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara
Central Bay Street
Commerce Court,Victoria Hotel
Berczy Park


Get the size of the corresponding Dataframe

In [15]:
toronto_venues.shape

(47, 7)

Let's check how many venues were returned for each neighborhood

In [16]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Berczy Park,4,4,4,4,4,4
Business reply mail Processing Centre969 Eastern,5,5,5,5,5,5
"CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara",8,8,8,8,8,8
Central Bay Street,9,9,9,9,9,9
"Commerce Court,Victoria Hotel",2,2,2,2,2,2
"Design Exchange,Toronto Dominion Centre",7,7,7,7,7,7
"High Park,The Junction South",2,2,2,2,2,2
Lawrence Park,2,2,2,2,2,2
"Parkdale,Roncesvalles",6,6,6,6,6,6
Studio District,2,2,2,2,2,2


# Analyze Each Neighbourhood

In [17]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['zNeighborhood'] = toronto_venues['Neighborhood'] 
toronto_onehot.shape
# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]
toronto_onehot.shape

(47, 38)

In [18]:
toronto_grouped = toronto_onehot.groupby('zNeighborhood').mean().reset_index()
toronto_grouped.shape

(10, 38)

In [19]:
num_top_venues = 5

for hood in toronto_grouped['zNeighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['zNeighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
                   venue  freq
0  General Entertainment  0.25
1           Skating Rink  0.25
2                   Café  0.25
3        College Stadium  0.25
4                   Park  0.00


----Business reply mail Processing Centre969 Eastern----
                venue  freq
0         Coffee Shop   0.4
1            Pharmacy   0.2
2   Korean Restaurant   0.2
3  Mexican Restaurant   0.2
4                Park   0.0


----CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara----
               venue  freq
0     Discount Store  0.25
1      Train Station  0.12
2        Bus Station  0.12
3   Department Store  0.12
4  Convenience Store  0.12


----Central Bay Street----
                  venue  freq
0                Bakery  0.22
1              Bus Line  0.22
2          Soccer Field  0.11
3  Fast Food Restaurant  0.11
4           Bus Station  0.11


----Commerce Court,Victoria Hotel----
                   venue  freq
0    American 

In [20]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [21]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['zNeighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.shape

(10, 11)

# 4. Cluster Neighborhoods

In [36]:
# set number of clusters
kclusters = 3

toronto_grouped_clustering = toronto_grouped.drop('zNeighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([0, 0, 0, 0, 2, 0, 0, 0, 0, 1], dtype=int32)

In [37]:
toronto_merged = dfToronto

# add clustering labels
toronto_merged['Cluster Labels'] = kmeans.labels_

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4M,East Toronto,Studio District,43.659526,-79.340923,0,Print Shop,Fast Food Restaurant,Train Station,Caribbean Restaurant,Discount Store,Department Store,Convenience Store,College Stadium,Coffee Shop,Chinese Restaurant
1,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,0,History Museum,Bar,Train Station,Caribbean Restaurant,Discount Store,Department Store,Convenience Store,College Stadium,Coffee Shop,Chinese Restaurant
2,M6R,West Toronto,"Parkdale,Roncesvalles",43.64896,-79.456325,0,Electronics Store,Medical Center,Rental Car Location,Pizza Place,Breakfast Spot,Mexican Restaurant,Department Store,Convenience Store,College Stadium,Coffee Shop
3,M7Y,East Toronto,Business reply mail Processing Centre969 Eastern,43.662744,-79.321558,0,Coffee Shop,Mexican Restaurant,Korean Restaurant,Pharmacy,Train Station,Department Store,Convenience Store,College Stadium,Chinese Restaurant,Caribbean Restaurant
4,M5K,Downtown Toronto,"Design Exchange,Toronto Dominion Centre",43.647177,-79.381576,2,Fried Chicken Joint,Bank,Hakka Restaurant,Caribbean Restaurant,Thai Restaurant,Bakery,Athletics & Sports,Bar,Breakfast Spot,Bus Line
5,M6P,West Toronto,"High Park,The Junction South",43.661608,-79.464763,0,Convenience Store,Playground,Train Station,Café,Discount Store,Department Store,College Stadium,Coffee Shop,Chinese Restaurant,Caribbean Restaurant
6,M5V,Downtown Toronto,"CN Tower,Bathurst Quay,Island airport,Harbourf...",43.628947,-79.39442,0,Discount Store,Train Station,Department Store,Convenience Store,Coffee Shop,Chinese Restaurant,Bus Station,Caribbean Restaurant,College Stadium,Café
7,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,0,Bus Line,Bakery,Bus Station,Metro Station,Park,Fast Food Restaurant,Soccer Field,Bar,Breakfast Spot,Bank
8,M5L,Downtown Toronto,"Commerce Court,Victoria Hotel",43.648198,-79.379817,0,American Restaurant,Motel,Discount Store,Department Store,Convenience Store,College Stadium,Coffee Shop,Chinese Restaurant,Caribbean Restaurant,Café
9,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,1,Café,General Entertainment,Skating Rink,College Stadium,Train Station,Department Store,Convenience Store,Coffee Shop,Chinese Restaurant,Caribbean Restaurant


In [38]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Examine Clusters

Cluster 1

In [39]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,East Toronto,0,Print Shop,Fast Food Restaurant,Train Station,Caribbean Restaurant,Discount Store,Department Store,Convenience Store,College Stadium,Coffee Shop,Chinese Restaurant
1,Central Toronto,0,History Museum,Bar,Train Station,Caribbean Restaurant,Discount Store,Department Store,Convenience Store,College Stadium,Coffee Shop,Chinese Restaurant
2,West Toronto,0,Electronics Store,Medical Center,Rental Car Location,Pizza Place,Breakfast Spot,Mexican Restaurant,Department Store,Convenience Store,College Stadium,Coffee Shop
3,East Toronto,0,Coffee Shop,Mexican Restaurant,Korean Restaurant,Pharmacy,Train Station,Department Store,Convenience Store,College Stadium,Chinese Restaurant,Caribbean Restaurant
5,West Toronto,0,Convenience Store,Playground,Train Station,Café,Discount Store,Department Store,College Stadium,Coffee Shop,Chinese Restaurant,Caribbean Restaurant
6,Downtown Toronto,0,Discount Store,Train Station,Department Store,Convenience Store,Coffee Shop,Chinese Restaurant,Bus Station,Caribbean Restaurant,College Stadium,Café
7,Downtown Toronto,0,Bus Line,Bakery,Bus Station,Metro Station,Park,Fast Food Restaurant,Soccer Field,Bar,Breakfast Spot,Bank
8,Downtown Toronto,0,American Restaurant,Motel,Discount Store,Department Store,Convenience Store,College Stadium,Coffee Shop,Chinese Restaurant,Caribbean Restaurant,Café


Cluster 2

In [40]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Downtown Toronto,1,Café,General Entertainment,Skating Rink,College Stadium,Train Station,Department Store,Convenience Store,Coffee Shop,Chinese Restaurant,Caribbean Restaurant


Cluster 3

In [41]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Downtown Toronto,2,Fried Chicken Joint,Bank,Hakka Restaurant,Caribbean Restaurant,Thai Restaurant,Bakery,Athletics & Sports,Bar,Breakfast Spot,Bus Line


### Analysis

After looking at the clustering results, we can see that in Cluster 1 we find a mixture of services. 
Cluster 2 Coffee shop as first venue
Cluster 3 Fast food as first venue