# Segmenting and Clustering Neighborhoods in Toronto

The key techniques of this notebook are
- Converting addresses into their equivalent latitude and longitude values. 
- Implementing Foursquare API to explore neighborhoods.
- Web scraping using beautifulsoap
- Exploring the most common venue categories in each neighborhood
- Grouping the neighborhoods into clusters.
- Using Folium library to visualize the neighborhoods and their emerging clusters.

-- TODO
- Load Toronto Neighbourhood data from CSV file
- Map Toronto
- Analyse a Neighbourhood
- Get top venues in the neighbourhood
- Categorise Top 10 Venues for neighbourhood
- Cluster the categories
- Map the Clusters

In [1]:
# importing new libraries
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# !conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests

# !conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans
print('Libraries imported.')

Libraries imported.


### Load Toronto Neighbourhood data

In [3]:
Neighbourhoods = pd.read_csv('toronto_base_Coords.csv', index_col=0)
print(Neighbourhoods.shape)
Neighbourhoods.head(5)

(103, 5)


Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Port Union, Rouge Hill, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


### Filter Data on Boroughs containing Toronto

In [12]:
# Prepare data set for Analysis, Filter Boroughs containing '%Toronto%'
df = Neighbourhoods[Neighbourhoods['Borough'].str.contains('Toronto')].reset_index(drop=True)
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


### Create Toronto Map

In [20]:
map_Toronto = folium.Map(location=[43.6532, -79.3832], zoom_start=11)

# add markers to map
for lat, lng, borough, neighbourhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_Toronto)  
    
map_Toronto

In [14]:
# Initialise FourSquare
CLIENT_ID = 'HVU4OJEHX40I4IXTU2NJRXPBUZFR4W5V3MKTI2LBGBWPOVP0' # your Foursquare ID
CLIENT_SECRET = 'SDRUSAJWFUC2HA4NDIGUWCP5DGEXYA22SW5JCRERXNRAXGVD' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

In [15]:
# Function to get venues from FourSquare
def getNearbyVenues(names, latitudes, longitudes, radius=500):

    LIMIT = 100 # limit of number of venues returned by Foursquare API
    radius = 500 # define radius
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

## Analyse Neighbourhood by Venue Categories

In [16]:
toronto_venues = getNearbyVenues(names=df['Neighbourhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )
#print(toronto_venues.shape)
#print(toronto_venues.head())
toronto_venues.groupby('Neighbourhood').count()
print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 228 uniques categories.


In [17]:
Toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

Toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood'] 

fixed_columns = [Toronto_onehot.columns[-1]] + list(Toronto_onehot.columns[:-1])
Toronto_onehot = Toronto_onehot[fixed_columns]

# Toronto_onehot.head()
Toronto_grouped = Toronto_onehot.groupby('Neighbourhood').mean().reset_index()
#Toronto_grouped

In [18]:
# Iterate Toronto_grouped and list all Neighbourhood
for hood in Toronto_grouped['Neighbourhood']:
    #print("----"+hood+"----")
    temp = Toronto_grouped[Toronto_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})

In [21]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [22]:
# Analysing Most common venues per Neighbourhood

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = Toronto_grouped['Neighbourhood']

for ind in np.arange(Toronto_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Toronto_grouped.iloc[ind, :], num_top_venues)

In [23]:
neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, Richmond, King",Coffee Shop,Café,Steakhouse,American Restaurant,Thai Restaurant,Bakery,Clothing Store,Asian Restaurant,Gym,Bar
1,Berczy Park,Coffee Shop,Cocktail Bar,Restaurant,Farmers Market,Bakery,Italian Restaurant,Steakhouse,Cheese Shop,Pub,Café
2,Business reply mail Processing Centre969 Eastern,Comic Shop,Auto Workshop,Smoke Shop,Park,Light Rail Station,Spa,Farmers Market,Fast Food Restaurant,Brewery,Burrito Place
3,"Cabbagetown, St. James Town",Coffee Shop,Restaurant,Café,Pizza Place,Italian Restaurant,Pub,Indian Restaurant,Park,Bakery,Beer Store
4,Central Bay Street,Coffee Shop,Italian Restaurant,Sandwich Place,Bar,Ice Cream Shop,Café,Burger Joint,Bubble Tea Shop,Chinese Restaurant,Spa


## Cluster Toronto Common Venues

In [25]:
print(Toronto_grouped.shape)
Toronto_grouped_clustering = Toronto_grouped.drop('Neighbourhood', 1)
print(Toronto_grouped_clustering.shape)

(38, 229)
(38, 228)


In [26]:
# Use KMeans clustering with 5 clusters
kclusters = 5

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Toronto_grouped_clustering)

#kmeans.labels_[0:10]

In [28]:
sorted(df['Neighbourhood'])
sorted(neighbourhoods_venues_sorted['Neighbourhood'])

['Adelaide, Richmond, King',
 'Berczy Park',
 'Business reply mail Processing Centre969 Eastern',
 'Cabbagetown, St. James Town',
 'Central Bay Street',
 'Christie',
 'Church and Wellesley',
 'Commerce Court, Victoria Hotel',
 'Davisville',
 'Davisville North',
 'Dovercourt Village, Dufferin',
 'Exhibition Place, Brockton, Parkdale Village',
 'Forest Hill West, Forest Hill North',
 'Harbord, University of Toronto',
 'Harbourfront East, Union Station, Toronto Islands',
 'High Park, The Junction South',
 'Kensington Market, Grange Park, Chinatown',
 'King and Spadina, CN Tower, South Niagara, Bathurst Quay, Island airport, Railway Lands, Harbourfront West',
 'Lawrence Park',
 'Moore Park, Summerhill East',
 'North Midtown, The Annex, Yorkville',
 'North Toronto West',
 'Regent Park, Harbourfront',
 'Roncesvalles, Parkdale',
 'Rosedale',
 'Roselawn',
 'Runnymede, Swansea',
 'Ryerson, Garden District',
 'South Hill, Rathnelly, Forest Hill SE, Deer Park, Summerhill West',
 'St. James Town',

In [29]:
print(len(kmeans.labels_))
print(df.shape)

38
(38, 5)


In [31]:
Toronto_merged = df

Toronto_merged['Cluster Labels'] = kmeans.labels_

Toronto_merged = Toronto_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

print(Toronto_merged.shape)
Toronto_merged

(38, 16)


Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Coffee Shop,Neighborhood,Park,Pub,Dessert Shop,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,0,Greek Restaurant,Ice Cream Shop,Coffee Shop,Italian Restaurant,Bookstore,Bubble Tea Shop,Bakery,Spa,Juice Bar,Liquor Store
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572,0,Pizza Place,Sandwich Place,Park,Steakhouse,Sushi Restaurant,Food & Drink Shop,Ice Cream Shop,Pub,Movie Theater,Fish & Chips Shop
3,M4M,East Toronto,Studio District,43.659526,-79.340923,0,Café,Coffee Shop,Gastropub,Italian Restaurant,Bakery,American Restaurant,Yoga Studio,Coworking Space,Seafood Restaurant,Sandwich Place
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,0,Dim Sum Restaurant,Park,Swim School,Bus Line,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant
5,M4P,Central Toronto,Davisville North,43.712751,-79.390197,0,Grocery Store,Clothing Store,Burger Joint,Food & Drink Shop,Dance Studio,Hotel,Sandwich Place,Breakfast Spot,Park,Discount Store
6,M4R,Central Toronto,North Toronto West,43.715383,-79.405678,0,Coffee Shop,Sporting Goods Shop,Clothing Store,Gym / Fitness Center,Fast Food Restaurant,Mexican Restaurant,Diner,Dessert Shop,Park,Chinese Restaurant
7,M4S,Central Toronto,Davisville,43.704324,-79.38879,0,Dessert Shop,Sandwich Place,Pizza Place,Café,Italian Restaurant,Coffee Shop,Sushi Restaurant,Seafood Restaurant,Fried Chicken Joint,Diner
8,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316,0,Restaurant,Yoga Studio,Dim Sum Restaurant,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop
9,M4V,Central Toronto,"South Hill, Rathnelly, Forest Hill SE, Deer Pa...",43.686412,-79.400049,0,Coffee Shop,Pub,Pizza Place,Supermarket,Bagel Shop,Fried Chicken Joint,Sports Bar,American Restaurant,Convenience Store,Vietnamese Restaurant


### Map Clusters

In [32]:
map_clusters = folium.Map(location=[43.72, -79.3832], zoom_start=11)

x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

markers_colors = []
for lat, lon, poi, cluster in zip(Toronto_merged['Latitude'], Toronto_merged['Longitude'], Toronto_merged['Neighbourhood'], Toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Examine Clusters

In [33]:
print('There are {} clusters.'.format(len(Toronto_merged['Cluster Labels'].unique())))

There are 5 clusters.


#### Display Cluster data for each of the 5 clusters

In [34]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 4, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Downtown Toronto,4,Sushi Restaurant,Coffee Shop,Japanese Restaurant,Gay Bar,Restaurant,Burger Joint,Pub,Gastropub,Fast Food Restaurant,Men's Store


In [35]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 3, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
24,Central Toronto,3,Coffee Shop,Sandwich Place,Café,Pizza Place,Park,Liquor Store,Burger Joint,Jewish Restaurant,Indian Restaurant,BBQ Joint


In [36]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 2, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,Downtown Toronto,2,Coffee Shop,Hotel,Aquarium,Café,Pizza Place,Scenic Lookout,Restaurant,Bakery,Brewery,Italian Restaurant


In [37]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 1, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
25,Downtown Toronto,1,Café,Coffee Shop,Bar,Japanese Restaurant,Bookstore,Restaurant,Bakery,Poutine Place,Pub,Chinese Restaurant


In [38]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 0, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,East Toronto,0,Coffee Shop,Neighborhood,Park,Pub,Dessert Shop,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
1,East Toronto,0,Greek Restaurant,Ice Cream Shop,Coffee Shop,Italian Restaurant,Bookstore,Bubble Tea Shop,Bakery,Spa,Juice Bar,Liquor Store
2,East Toronto,0,Pizza Place,Sandwich Place,Park,Steakhouse,Sushi Restaurant,Food & Drink Shop,Ice Cream Shop,Pub,Movie Theater,Fish & Chips Shop
3,East Toronto,0,Café,Coffee Shop,Gastropub,Italian Restaurant,Bakery,American Restaurant,Yoga Studio,Coworking Space,Seafood Restaurant,Sandwich Place
4,Central Toronto,0,Dim Sum Restaurant,Park,Swim School,Bus Line,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant
5,Central Toronto,0,Grocery Store,Clothing Store,Burger Joint,Food & Drink Shop,Dance Studio,Hotel,Sandwich Place,Breakfast Spot,Park,Discount Store
6,Central Toronto,0,Coffee Shop,Sporting Goods Shop,Clothing Store,Gym / Fitness Center,Fast Food Restaurant,Mexican Restaurant,Diner,Dessert Shop,Park,Chinese Restaurant
7,Central Toronto,0,Dessert Shop,Sandwich Place,Pizza Place,Café,Italian Restaurant,Coffee Shop,Sushi Restaurant,Seafood Restaurant,Fried Chicken Joint,Diner
8,Central Toronto,0,Restaurant,Yoga Studio,Dim Sum Restaurant,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop
9,Central Toronto,0,Coffee Shop,Pub,Pizza Place,Supermarket,Bagel Shop,Fried Chicken Joint,Sports Bar,American Restaurant,Convenience Store,Vietnamese Restaurant
