# Segmenting Toronto Neighbourhoods

This current project shows how data on Toronto neighbourhoods is collected, processed and analyzed.

## Housekeeping

We initially begin by importing all the needed libraries for the following analysis.

In [116]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib as plt
import seaborn as sns
import json
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium 
import lxml

## Data Collection

We start by reading-into the data for all the postal codes in Toronto, using the pandas function `pd.read_html` which reads all the tables in a given web page. We then access the first table (the one with the codes) and use it for the following analysis.

In [117]:
# Read all the tables onto the page and then select only the one with relevant information
tables = pd.read_html("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
bor = pd.DataFrame(tables[0])

# Drop boroughs that are not assigned
index = bor[bor["Borough"]=="Not assigned"].index
bor.drop(index, inplace = True)
bor[bor["Neighbourhood"]=="Not assigned"]
bor.loc[bor["Neighbourhood"]=="Not assigned", "Neighbourhood"] = bor.loc[bor["Neighbourhood"]=="Not assigned", "Borough"]

We can now see the result of the successfully imported and processed table.

In [118]:
bor.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor
7,M7A,Downtown Toronto,Queen's Park
9,M9A,Etobicoke,Islington Avenue
10,M1B,Scarborough,Rouge
11,M1B,Scarborough,Malvern
13,M3B,North York,Don Mills North


We can also see how the different neighbourhoods are grouped per borough (and also per postal code). To illustrate, we check out the York borough.

In [119]:
# Make a grouping of neighbourhoods per borough and display a specific one
borg = bor[["Neighbourhood", "Borough"]].groupby(bor["Borough"])
borg.get_group("York")

Unnamed: 0,Neighbourhood,Borough
34,Humewood-Cedarvale,York
48,Caledonia-Fairbanks,York
131,Del Ray,York
132,Keelesdale,York
133,Mount Dennis,York
134,Silverthorn,York
145,The Junction North,York
146,Runnymede,York
149,Weston,York


The `geocoder` package continuously failed and we thus use pandas to read directly the table with latitudes and longitudes from the provided web address.

In [120]:
# Read-in the postal codes after geocoder continuously failed
postal = pd.read_csv("http://cocl.us/Geospatial_data")

# Merge the datafarme with borough information with the one with latitude and longitude
data = pd.merge(bor, postal, left_on='Postcode', right_on='Postal Code')
data.drop("Postal Code", axis = 1, inplace = True)

We now have the consolidated information that can be used for further analysis.

In [121]:
data.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
3,M6A,North York,Lawrence Heights,43.718518,-79.464763
4,M6A,North York,Lawrence Manor,43.718518,-79.464763


For some reason and no particular logic, the dataframe that needs to be created has a column which lists a number of neighbourhoods that fall under the same postal code. This is bad practice as each cell should contain only one piece of data, it is difficult to analyze, and needs further processing. At any rate, this can be achieved by grouping data per postal code, and then formatting the names of the neighbourhoods as strings. The new data frame is merged with other relevant data and irrelevant columns are dropped, thus reaching the desired result.

In [122]:
# Consolidate all the neighbourhoods per single postal code
cons = data.groupby('Postcode')['Neighbourhood'].apply(", ".join)
cons = pd.merge(cons, postal, left_on='Postcode', right_on='Postal Code')
data2 = pd.merge(cons, data, left_on='Postal Code', right_on='Postcode')

# Drop unnecessary columns after merging dataframes
data2.drop("Postal Code", axis = 1, inplace = True)
data2.drop_duplicates("Postcode", inplace=True)
data2.drop(["Neighbourhood_y", "Latitude_y", "Longitude_y"], axis = 1, inplace = True)

# Reorder and rename columns so that all is picture-perfect
data2 = data2[['Postcode', 'Borough', 'Neighbourhood_x', 'Latitude_x', 'Longitude_x']]
data2.columns = ['Postcode', 'Borough', 'Neighbourhood', 'Latitude', 'Longitude']

We can now display the final dataframe, sorted by postal code.

In [123]:
data2.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
2,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
5,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
8,M1G,Scarborough,Woburn,43.770992,-79.216917
9,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
10,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
11,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029
14,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
17,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
20,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848


## Geospatial Visualization

We next get the coordiantes of Toronto using the `geolocator`, as follows:

In [124]:
address = 'Toronto, Ontario'
geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.653963, -79.387207.


Let's now create and view a map of Toronto.

In [125]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)
map_toronto

We next add a marker for each neighborhood in Toronto.

In [126]:
# add markers to map
for lat, lng, borough, neighborhood in zip(data2['Latitude'], data2['Longitude'], data2['Borough'], data2['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

Put credentials for the Foursquare API

In [127]:
CLIENT_ID = 'D3EFN25BTLMQDBKPWWCK2NZC0AW23BVIBAA2X2D2LFHKJPXN' # your Foursquare ID
CLIENT_SECRET = 'CBARVRGDBEK243FNOR4FGAARFDQQAHOHIWRJ0NCOSK551NKT' # your Foursquare Secret
VERSION = '20200227' # Foursquare API version

## Analysis of Scarborough Borough and Its Neighborhoods Per Postal Code

We limit the analysis to a single borough in Toronto, namely Scarborough. Its data and specific are visualized below.

In [128]:
data3 = data2[data2['Borough'] == 'Scarborough'].reset_index(drop=True)
data3.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


We test the code on a single postal code, selecting for clarity the first one in the data frame.

In [129]:
neighborhood_latitude = data3.loc[0,"Latitude"]
neighborhood_longitude = data3.loc[0,"Longitude"]
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=D3EFN25BTLMQDBKPWWCK2NZC0AW23BVIBAA2X2D2LFHKJPXN&client_secret=CBARVRGDBEK243FNOR4FGAARFDQQAHOHIWRJ0NCOSK551NKT&v=20200227&ll=43.806686299999996,-79.19435340000001&radius=500&limit=100'

In [130]:
import requests
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e5849b6c8cff2001b06d9cb'},
  'headerLocation': 'Malvern',
  'headerFullLocation': 'Malvern, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 1,
  'suggestedBounds': {'ne': {'lat': 43.8111863045, 'lng': -79.18812958073042},
   'sw': {'lat': 43.80218629549999, 'lng': -79.2005772192696}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4bb6b9446edc76b0d771311c',
       'name': "Wendy's",
       'location': {'crossStreet': 'Morningside & Sheppard',
        'lat': 43.80744841934756,
        'lng': -79.19905558052072,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.80744841934756,
          'lng': -79.19905558052072}],
        'distance': 387,
        'cc': 'CA',
        'city': 'Toronto',
    

Then we used the defined function that allows us to get all the nearby venues given the postal codes.

In [131]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Get nearby venues for all the neighbourhoods in Toronto

In [132]:
scar_venues = getNearbyVenues(names=data3['Neighbourhood'],
                                   latitudes=data3['Latitude'],
                                   longitudes=data3['Longitude']
                                  )

Rouge, Malvern
Highland Creek, Rouge Hill, Port Union
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
East Birchmount Park, Ionview, Kennedy Park
Clairlea, Golden Mile, Oakridge
Cliffcrest, Cliffside, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Scarborough Town Centre, Wexford Heights
Maryvale, Wexford
Agincourt
Clarks Corners, Sullivan, Tam O'Shanter
Agincourt North, L'Amoreaux East, Milliken, Steeles East
L'Amoreaux West
Upper Rouge


In [133]:
scar_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Rouge, Malvern",43.806686,-79.194353,Wendy's,43.807448,-79.199056,Fast Food Restaurant
1,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
2,"Guildwood, Morningside, West Hill",43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store
3,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Marina Spa,43.766,-79.191,Spa
4,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Big Bite Burrito,43.766299,-79.19072,Mexican Restaurant


In [134]:
scar_venues[["Venue", "Neighborhood"]].groupby('Neighborhood').count()

Unnamed: 0_level_0,Venue
Neighborhood,Unnamed: 1_level_1
Agincourt,5
"Agincourt North, L'Amoreaux East, Milliken, Steeles East",3
"Birch Cliff, Cliffside West",4
Cedarbrae,8
"Clairlea, Golden Mile, Oakridge",9
"Clarks Corners, Sullivan, Tam O'Shanter",14
"Cliffcrest, Cliffside, Scarborough Village West",3
"Dorset Park, Scarborough Town Centre, Wexford Heights",5
"East Birchmount Park, Ionview, Kennedy Park",5
"Guildwood, Morningside, West Hill",7


## Clustering of Scarborough Neighborhoods

We then proceed to cluster the different area postal codes in Scarborough, with the neighborhoods combined at the postal code level. This is done as the available latitude and longitude coordinates are also at the postal code level.

In [135]:
# one hot encoding
scar_onehot = pd.get_dummies(scar_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
scar_onehot['Neighborhood'] = scar_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [scar_onehot.columns[-1]] + list(scar_onehot.columns[:-1])
scar_onehot = scar_onehot[fixed_columns]

scar_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Athletics & Sports,Bakery,Bank,Bar,Breakfast Spot,Bubble Tea Shop,Burger Joint,Bus Line,...,Rental Car Location,Sandwich Place,Shopping Mall,Skating Rink,Smoke Shop,Soccer Field,Spa,Supermarket,Thai Restaurant,Vietnamese Restaurant
0,"Rouge, Malvern",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Highland Creek, Rouge Hill, Port Union",0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
4,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


As a next step we group the different venues and take into account the mean occurrence of each category according to the post code. Recall that a single post code (which is the unit of analysis here) may contain one or more neighborhoods.

In [136]:
scar_grouped = scar_onehot.groupby('Neighborhood').mean().reset_index()
scar_grouped

Unnamed: 0,Neighborhood,American Restaurant,Athletics & Sports,Bakery,Bank,Bar,Breakfast Spot,Bubble Tea Shop,Burger Joint,Bus Line,...,Rental Car Location,Sandwich Place,Shopping Mall,Skating Rink,Smoke Shop,Soccer Field,Spa,Supermarket,Thai Restaurant,Vietnamese Restaurant
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,...,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0
1,"Agincourt North, L'Amoreaux East, Milliken, St...",0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Birch Cliff, Cliffside West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0
3,Cedarbrae,0.0,0.125,0.125,0.125,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0
4,"Clairlea, Golden Mile, Oakridge",0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.222222,...,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0
5,"Clarks Corners, Sullivan, Tam O'Shanter",0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.071429,0.0
6,"Cliffcrest, Cliffside, Scarborough Village West",0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0
7,"Dorset Park, Scarborough Town Centre, Wexford ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2
8,"East Birchmount Park, Ionview, Kennedy Park",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Guildwood, Morningside, West Hill",0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,...,0.142857,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0


We now proceed to see the most common venues in Scarbourough and create a sorted data frame that we will later use for visualization.

In [137]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
scar_venues_sorted = pd.DataFrame(columns=columns)
scar_venues_sorted['Neighborhood'] = scar_grouped['Neighborhood']

for ind in np.arange(scar_grouped.shape[0]):
    scar_venues_sorted.iloc[ind, 1:] = return_most_common_venues(scar_grouped.iloc[ind, :], num_top_venues)

To make sure the operation proceeded as planned we also visualize this data frame, as we have already done before.

In [138]:
scar_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Clothing Store,Skating Rink,Breakfast Spot,Latin American Restaurant,Lounge,Vietnamese Restaurant,Gas Station,Fried Chicken Joint,Fast Food Restaurant,Electronics Store
1,"Agincourt North, L'Amoreaux East, Milliken, St...",Park,Bakery,Playground,Caribbean Restaurant,Gas Station,Fried Chicken Joint,Fast Food Restaurant,Electronics Store,Discount Store,Department Store
2,"Birch Cliff, Cliffside West",General Entertainment,Skating Rink,College Stadium,Café,Vietnamese Restaurant,Chinese Restaurant,Gas Station,Fried Chicken Joint,Fast Food Restaurant,Electronics Store
3,Cedarbrae,Thai Restaurant,Athletics & Sports,Bakery,Bank,Gas Station,Fried Chicken Joint,Caribbean Restaurant,Hakka Restaurant,College Stadium,Grocery Store
4,"Clairlea, Golden Mile, Oakridge",Bus Line,Ice Cream Shop,Intersection,Metro Station,Bus Station,Park,Soccer Field,Bakery,Bar,Coffee Shop


We are now going to cluster the neighborhoods in Scarborough neighborhood in Toronto into 3 different clusters, using the grouped data.

In [139]:
# set number of clusters
kclusters = 4

scar_grouped_clustering = scar_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

In [140]:
scar_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Clothing Store,Skating Rink,Breakfast Spot,Latin American Restaurant,Lounge,Vietnamese Restaurant,Gas Station,Fried Chicken Joint,Fast Food Restaurant,Electronics Store
1,"Agincourt North, L'Amoreaux East, Milliken, St...",Park,Bakery,Playground,Caribbean Restaurant,Gas Station,Fried Chicken Joint,Fast Food Restaurant,Electronics Store,Discount Store,Department Store
2,"Birch Cliff, Cliffside West",General Entertainment,Skating Rink,College Stadium,Café,Vietnamese Restaurant,Chinese Restaurant,Gas Station,Fried Chicken Joint,Fast Food Restaurant,Electronics Store
3,Cedarbrae,Thai Restaurant,Athletics & Sports,Bakery,Bank,Gas Station,Fried Chicken Joint,Caribbean Restaurant,Hakka Restaurant,College Stadium,Grocery Store
4,"Clairlea, Golden Mile, Oakridge",Bus Line,Ice Cream Shop,Intersection,Metro Station,Bus Station,Park,Soccer Field,Bakery,Bar,Coffee Shop


In [141]:
# add clustering labels
scar_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

This has generated a useful data frame that can be visualized to gain insight.

## Cluster Visualization

Next, we merge the dataframe with most common venues and its cluster labels so that we are able to properly visualize it.

In [142]:
# merge scar_grouped with scar_data to add latitude/longitude for each neighborhood
scar_merged = pd.merge(scar_venues_sorted, data3, left_on='Neighborhood', right_on='Neighbourhood')
scar_merged.drop("Neighbourhood", axis = 1, inplace = True)
scar_merged.head()

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Postcode,Borough,Latitude,Longitude
0,0,Agincourt,Clothing Store,Skating Rink,Breakfast Spot,Latin American Restaurant,Lounge,Vietnamese Restaurant,Gas Station,Fried Chicken Joint,Fast Food Restaurant,Electronics Store,M1S,Scarborough,43.7942,-79.262029
1,0,"Agincourt North, L'Amoreaux East, Milliken, St...",Park,Bakery,Playground,Caribbean Restaurant,Gas Station,Fried Chicken Joint,Fast Food Restaurant,Electronics Store,Discount Store,Department Store,M1V,Scarborough,43.815252,-79.284577
2,0,"Birch Cliff, Cliffside West",General Entertainment,Skating Rink,College Stadium,Café,Vietnamese Restaurant,Chinese Restaurant,Gas Station,Fried Chicken Joint,Fast Food Restaurant,Electronics Store,M1N,Scarborough,43.692657,-79.264848
3,0,Cedarbrae,Thai Restaurant,Athletics & Sports,Bakery,Bank,Gas Station,Fried Chicken Joint,Caribbean Restaurant,Hakka Restaurant,College Stadium,Grocery Store,M1H,Scarborough,43.773136,-79.239476
4,0,"Clairlea, Golden Mile, Oakridge",Bus Line,Ice Cream Shop,Intersection,Metro Station,Bus Station,Park,Soccer Field,Bakery,Bar,Coffee Shop,M1L,Scarborough,43.711112,-79.284577


The final step is to put the cluster labels on a map so that we can observe the different clusters of neighborhoods.

In [143]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(scar_merged['Latitude'], scar_merged['Longitude'], scar_merged['Neighborhood'], scar_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Conclusions

The current exercise focused on undestanding how the different neibghbourhoods in the Scarborough bourough of Toronto are different or similar from one another. Essentially this has boiled down to the task of clustering them according to the different venues that lie within each neighbourhood. We used location data from Foursquare and based on the types of venues created a number of cluster. Over the course of the project, a number of experiments were performed with different numbers ranging from 3 to 7, but it seems that four clusters fit the data best.

The major conlcusion is that **within Scarbourough the western part of the borough is largely homogenous, and residents willing to switch neighborhoods will probably find little difference**. However, the western part is dominated by neighborhoods with highly specific conditions that are entirely unlike each other. Woburn has its own specifics, and this holds true for the Rouge and Malvern neighbourhoods that are different. Another cluster is formed by Highland Creek, Port Union, and Rouge Hill. Each of those three will have a different feel and movers will have to evaluate them on their own to see which cluster best fits their desires. **In short, Scarbourough is very simiar in its west side, but rather diverse in the east.**