# Toronto Neighborhood Segmenting and Clustering
***
For this notebook I will be exploring the neighborhoods and boroughs of Toronto to determine clusters of similar neighborhoods based on venues in the different areas.

### Scraping data from a table into a DataFrame

First, I need to install the table parser, which will allow me to easily get the table from a website. I also import other libraries that will help read the table and create the dataframe.

In [1]:
! pip install html-table-parser-python3

import urllib.request

from html_table_parser import HTMLTableParser
import pandas as pd
import numpy as np

Collecting html-table-parser-python3
  Downloading html_table_parser_python3-0.1.5-py3-none-any.whl (3.5 kB)
Installing collected packages: html-table-parser-python3
Successfully installed html-table-parser-python3-0.1.5


Here I define a function that takes the URL of a web page and returns the raw contents of that page.

In [2]:
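def url_get_contents(url):
    # Open the URL and return the raw response body as bytes;
    # the with-block closes the connection once the body is read
    req = urllib.request.Request(url=url)
    with urllib.request.urlopen(req) as f:
        return f.read()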

I then use the function to get the contents of the webpage that has the table I need. The parser returns the header as the first data row, so I set the column names from that row and drop it before starting to clean the data.

In [3]:
xhtml = url_get_contents('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').decode('utf-8')

parser = HTMLTableParser()

parser.feed(xhtml)

toronto_df = pd.DataFrame(parser.tables[0])

toronto_df.columns = toronto_df.iloc[0]
toronto_df.drop(0, inplace=True)

toronto_df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
1,M1A,Not assigned,Not assigned
2,M2A,Not assigned,Not assigned
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,"Regent Park, Harbourfront"
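
As a small illustration of the header fix above, the same result can be reached by passing the first parsed row as the column names when the frame is built. This is only a sketch on made-up rows standing in for `parser.tables[0]`; `raw_rows` and `demo_df` are hypothetical names, not part of the notebook.

```python
import pandas as pd

# Toy rows standing in for parser.tables[0]; the first row holds the headers.
raw_rows = [
    ["Postal Code", "Borough", "Neighbourhood"],
    ["M1A", "Not assigned", "Not assigned"],
    ["M3A", "North York", "Parkwoods"],
]

# Use the first row as column names and the remaining rows as data.
# The index then starts at 0, so no row has to be dropped afterwards.
demo_df = pd.DataFrame(raw_rows[1:], columns=raw_rows[0])
print(demo_df)
```

Building the frame this way also avoids the leftover column-index name that comes from assigning `iloc[0]` to `columns`.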


This next line drops every row where the Borough is 'Not assigned', since the borough is needed later to select the areas for clustering. The index is reset so that it is still sequential, starting at 0.

In [4]:
toronto_df = toronto_df[toronto_df['Borough'] != 'Not assigned'].reset_index(drop=True)

toronto_df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


Then any Neighbourhood that is still 'Not assigned' is set to the name of its Borough, so that every row has a usable label.

### Part 1 - Setting Up Dataframe

In [74]:
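# Assigning to the row yielded by iterrows() may only change a copy,
# so write through .loc with a boolean mask instead.
not_assigned = toronto_df['Neighbourhood'] == 'Not assigned'
toronto_df.loc[not_assigned, 'Neighbourhood'] = toronto_df.loc[not_assigned, 'Borough']
toronto_df[1:30]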

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
1,M4A,North York,Victoria Village,43.73057,-79.31306
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.72327,-79.45042
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66253,-79.39188
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.66263,-79.52831
6,M1B,Scarborough,"Malvern, Rouge",43.81139,-79.19662
7,M3B,North York,Don Mills,43.74923,-79.36186
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.70718,-79.31192
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.65739,-79.37804
10,M6B,North York,Glencairn,43.70687,-79.44812
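
To make the 'Not assigned' fill concrete, here is a minimal sketch on a toy frame. Writes to the row handed out by `iterrows()` are not guaranteed to reach the DataFrame, so the sketch assigns through `.loc` with a boolean mask. The 'Not assigned' value on the M7A row is invented for the example and does not come from the scraped table.

```python
import pandas as pd

# Toy data; the 'Not assigned' neighbourhood is made up for illustration.
demo_df = pd.DataFrame({
    "Postal Code": ["M7A", "M3A"],
    "Borough": ["Downtown Toronto", "North York"],
    "Neighbourhood": ["Not assigned", "Parkwoods"],
})

# Select the rows that need fixing, then copy the borough name into them.
missing = demo_df["Neighbourhood"] == "Not assigned"
demo_df.loc[missing, "Neighbourhood"] = demo_df.loc[missing, "Borough"]
print(demo_df)
```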


In [6]:
toronto_df.shape

(103, 3)

### Getting Latitude and Longitude Coordinates

Installing geocoder for latitude and longitude retrieval.

In [7]:
! pip install geocoder

Collecting geocoder
  Downloading geocoder-1.38.1-py2.py3-none-any.whl (98 kB)
Collecting ratelim
  Downloading ratelim-0.1.6-py2.py3-none-any.whl (4.0 kB)
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6


### Part 2 - Latitudes and Longitudes

Getting latitude and longitude for each postal code.

In [8]:
import geocoder

for index, row in toronto_df.iterrows():
    lat_lng_coords = None
    postal_code = row['Postal Code']
    
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Toronto, Ontario'.format(postal_code))
        lat_lng_coords = g.latlng
        
    latitude = lat_lng_coords[0]
    longitude = lat_lng_coords[1]

    toronto_df.loc[toronto_df.index[index], 'Latitude'] = latitude
    toronto_df.loc[toronto_df.index[index], 'Longitude'] = longitude
    
toronto_df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.75245,-79.32991
1,M4A,North York,Victoria Village,43.73057,-79.31306
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.72327,-79.45042
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66253,-79.39188
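
The `while` loop above keeps calling the geocoder until it answers, which would never finish if a postal code could not be resolved. One possible guard is a bounded retry, sketched below. `geocode_with_retry` and `fake_lookup` are hypothetical helpers, and the lookup is passed in as a plain callable so the sketch runs without network access; in the notebook it could be something like `lambda q: geocoder.arcgis(q).latlng`.

```python
import time

def geocode_with_retry(lookup, query, max_attempts=5, delay_seconds=1.0):
    """Call lookup(query) until it returns coordinates or attempts run out.

    lookup is any callable returning a [lat, lng] pair or None.
    Returns None if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        coords = lookup(query)
        if coords is not None:
            return coords
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # brief pause before the next try
    return None

# Stand-in lookup that fails twice and then succeeds.
calls = {"count": 0}

def fake_lookup(query):
    calls["count"] += 1
    return None if calls["count"] < 3 else [43.75245, -79.32991]

result = geocode_with_retry(fake_lookup, "M3A, Toronto, Ontario", delay_seconds=0.0)
print(result, calls["count"])
```

Rows that come back as `None` can then be reported or dropped instead of blocking the whole loop.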


### Visualizing and Analyzing Data

In [9]:
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

! pip install folium
import folium

! pip install geopy
from geopy.geocoders import Nominatim

Collecting folium
  Downloading folium-0.11.0-py2.py3-none-any.whl (93 kB)
Collecting branca>=0.3.0
  Downloading branca-0.4.1-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.1 folium-0.11.0


Getting the coordinates for Toronto to center the map.

In [10]:
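# Note: Canada's ISO country code is 'CA', not 'CN'; a query such as
# 'Toronto, ON, Canada' would be less ambiguous.
address = 'Toronto, CN'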

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6425637, -79.38708718320467.


Creating a map of Toronto to visualize where all the neighbourhoods are. Each marker is one postal code area, and when you click on it, it will show the neighbourhoods in that area along with their borough.

In [75]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, borough, neighbourhood in zip(toronto_df['Latitude'], toronto_df['Longitude'], toronto_df['Borough'], toronto_df['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False
    ).add_to(map_toronto)

map_toronto

Here, I've decided to limit the analysis to the central boroughs, which all contain the word 'Toronto' in their name.

In [76]:
toronto_boroughs = toronto_df[toronto_df['Borough'].str.contains('Toronto')].reset_index(drop=True)
toronto_boroughs.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66253,-79.39188
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.65739,-79.37804
3,M5C,Downtown Toronto,St. James Town,43.65215,-79.37587
4,M4E,East Toronto,The Beaches,43.67709,-79.29547


At this point, I want to get the venues around each neighborhood. This data will be used to build the clusters. I am using the getNearbyVenues function from the exercises.

In [13]:
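# @hidden_cell
import os

# Credentials are read from the environment so they are not stored in the notebook.
# The environment variable names below are an assumption; use whatever names you set.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20180605'
LIMIT = 100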

In [14]:
import requests

In [77]:
def getNearbyVenues(names, latitudes, longitudes, radius=400):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
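
The list comprehension in `getNearbyVenues` reads `categories[0]` for every venue, so a venue that came back with an empty category list would raise an `IndexError`. Below is a minimal sketch of the same flattening step with that case guarded. It assumes the item layout used in the function above (`item['venue']` with `name`, `location`, and `categories`). `flatten_items` is a hypothetical helper and 'Unnamed Spot' is an invented record.

```python
def flatten_items(name, lat, lng, items):
    """Turn one neighbourhood's venue items into flat tuples."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None when a venue carries no category at all.
        category = categories[0]["name"] if categories else None
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# Hand-written items shaped like the response used above.
fake_items = [
    {"venue": {"name": "Tandem Coffee",
               "location": {"lat": 43.653559, "lng": -79.361809},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Unnamed Spot",  # invented record with no category
               "location": {"lat": 43.655, "lng": -79.364},
               "categories": []}},
]

rows = flatten_items("Regent Park, Harbourfront", 43.65512, -79.36264, fake_items)
for row in rows:
    print(row)
```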

In [78]:
toronto_venues = getNearbyVenues(names=toronto_boroughs['Neighbourhood'], latitudes=toronto_boroughs['Latitude'], longitudes=toronto_boroughs['Longitude'])
toronto_venues.head()

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
The Danforth West, Riverdale
Toronto Dominion Centre, Design Exchange
Brockton, Parkdale Village, Exhibition Place
India Bazaar, The Beaches West
Commerce Court, Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North & West, Forest Hill Road Park
High Park, The Junction South
North Toronto West,  Lawrence Park
The Annex, North Midtown, Yorkville
Parkdale, Roncesvalles
Davisville
University of Toronto, Harbord
Runnymede, Swansea
Moore Park, Summerhill East
Kensington Market, Chinatown, Grange Park
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65512,-79.36264,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront",43.65512,-79.36264,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront",43.65512,-79.36264,Figs Breakfast & Lunch,43.655675,-79.364503,Breakfast Spot
3,"Regent Park, Harbourfront",43.65512,-79.36264,Berkeley Church,43.655123,-79.365873,Event Space
4,"Regent Park, Harbourfront",43.65512,-79.36264,The Yoga Lounge,43.655515,-79.364955,Yoga Studio
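
Since the radius is one of the settings that shapes the results, it can help to check how far a returned venue actually is from the neighbourhood coordinates. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km (an assumption of the sketch) on the first row of the table above. `haversine_m` is a hypothetical helper name.

```python
import math

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two lat/lng points."""
    earth_radius_m = 6371000.0  # mean Earth radius, assumed for this sketch
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(a))

# Neighbourhood centre vs. Roselle Desserts (values from the table above).
distance = haversine_m(43.65512, -79.36264, 43.653447, -79.362017)
print(round(distance), "metres")
```

The result comes out a little under 200 m, comfortably inside the 400 m default radius.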


In [79]:
toronto_venues.shape

(1408, 7)

In [80]:
toronto_venues.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Berczy Park,44,44,44,44,44,44
"Brockton, Parkdale Village, Exhibition Place",40,40,40,40,40,40
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto",100,100,100,100,100,100
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",27,27,27,27,27,27
Central Bay Street,32,32,32,32,32,32
Christie,7,7,7,7,7,7
Church and Wellesley,69,69,69,69,69,69
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
Davisville,24,24,24,24,24,24
Davisville North,4,4,4,4,4,4


In [81]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 208 unique categories.


To get the one-hot encoding for each type of venue together with the neighborhood into one dataframe, I first compute the one-hot encodings, then add the Neighbourhood column from the venues dataframe. The new column is appended at the end, so I move it to the front before continuing.

In [82]:
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")
toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood']
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]
toronto_onehot.head()

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wings Joint,Yoga Studio
0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1


In [83]:
toronto_onehot.shape

(1408, 209)

In [84]:
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wings Joint,Yoga Studio
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0
1,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025
2,"Business reply mail Processing Centre, South C...",0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,...,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0
3,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.03125,0.03125,0.0,0.03125,0.0,0.0
5,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Church and Wellesley,0.0,0.014493,0.014493,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.014493,0.0,0.0,0.014493,0.0
7,"Commerce Court, Victoria Hotel",0.0,0.0,0.04,0.0,0.01,0.0,0.0,0.01,0.0,...,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01
8,Davisville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Davisville North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
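
To see why the grouped values read as frequencies, here is a small sketch on invented data: each venue becomes a row of 0/1 indicators, and the mean of an indicator column within a neighbourhood is the share of that neighbourhood's venues in that category. The neighbourhood names "A" and "B" are placeholders.

```python
import pandas as pd

# Invented venues: neighbourhood A has 4 venues, B has 2.
venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Bakery", "Park", "Park", "Park"],
})

# One indicator column per category (integers rather than booleans).
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighbourhood", venues["Neighbourhood"])

# Mean of 0/1 indicators = fraction of venues in each category.
freq = onehot.groupby("Neighbourhood").mean()
print(freq)
```

Each row of `freq` sums to 1, so neighbourhoods with very different venue counts are still compared on the same scale.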


In [85]:
num_top_venues = 5

for neighbourhood in toronto_grouped['Neighbourhood']:
    print("----"+neighbourhood+"----")
    temp = toronto_grouped[toronto_grouped['Neighbourhood'] == neighbourhood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
            venue  freq
0     Coffee Shop  0.09
1    Cocktail Bar  0.07
2          Bakery  0.05
3  Farmers Market  0.05
4     Cheese Shop  0.05


----Brockton, Parkdale Village, Exhibition Place----
         venue  freq
0          Bar  0.08
1         Café  0.08
2  Coffee Shop  0.08
3   Restaurant  0.08
4  Supermarket  0.05


----Business reply mail Processing Centre, South Central Letter Processing Plant Toronto----
             venue  freq
0      Coffee Shop  0.08
1             Café  0.05
2       Restaurant  0.04
3  Thai Restaurant  0.04
4            Hotel  0.03


----CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport----
               venue  freq
0               Park  0.07
1               Café  0.07
2  French Restaurant  0.07
3       Intersection  0.04
4      Grocery Store  0.04


----Central Bay Street----
                       venue  freq
0                Coffee Shop  0.09
1  Middle Eastern Restaurant  0.

In [86]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [87]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Cocktail Bar,Restaurant,Farmers Market,Pharmacy,Beer Bar,Seafood Restaurant,Cheese Shop,Bakery,Breakfast Spot
1,"Brockton, Parkdale Village, Exhibition Place",Bar,Restaurant,Café,Coffee Shop,Nightclub,Breakfast Spot,Sandwich Place,Supermarket,Ethiopian Restaurant,Cocktail Bar
2,"Business reply mail Processing Centre, South C...",Coffee Shop,Café,Restaurant,Thai Restaurant,Salad Place,Hotel,Steakhouse,Gym,Sushi Restaurant,Bakery
3,"CN Tower, King and Spadina, Railway Lands, Har...",Café,Park,French Restaurant,Restaurant,Speakeasy,Ramen Restaurant,Pub,Italian Restaurant,Caribbean Restaurant,Intersection
4,Central Bay Street,Coffee Shop,Middle Eastern Restaurant,Bubble Tea Shop,Neighborhood,Clothing Store,Sushi Restaurant,Plaza,Poke Place,Italian Restaurant,Spa
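
Many categories share the same frequency, and `sort_values` does not promise a particular order among ties by default, which may be why the top-five printout and this table rank some tied categories differently. If a reproducible order is wanted, one option is to break ties alphabetically, as in the sketch below. `top_categories` is a hypothetical helper and the frequencies are invented.

```python
import pandas as pd

def top_categories(freq_row, num_top_venues):
    """Return the most frequent categories, ties broken alphabetically."""
    table = freq_row.rename_axis("venue").reset_index(name="freq")
    # Highest frequency first; among equal frequencies, A-Z by name.
    table = table.sort_values(by=["freq", "venue"], ascending=[False, True])
    return table["venue"].head(num_top_venues).tolist()

# Invented frequencies with a three-way tie at 0.05.
freq_row = pd.Series({
    "Park": 0.05,
    "Coffee Shop": 0.09,
    "Cheese Shop": 0.05,
    "Bakery": 0.05,
    "Pub": 0.01,
})

top3 = top_categories(freq_row, 3)
print(top3)
```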


In [88]:
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int32)
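
The value of `kclusters` is picked by hand here. One common way to compare candidate values is the silhouette score, sketched below on synthetic 2-D points rather than on the Toronto data. The three blob centres, the range of k, and `n_init=10` are all assumptions of the sketch, not settings from this notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three well-separated blobs of 20 points each.
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Fit k-means for several k and record the mean silhouette score of each.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

On the real data the same loop could be run over `toronto_grouped_clustering`; with only a few dozen rows the scores would be noisy, so they are a hint rather than a verdict.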

In [89]:
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_boroughs

toronto_merged = toronto_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

toronto_merged

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264,2,Breakfast Spot,Coffee Shop,Thrift / Vintage Store,Pub,Electronics Store,Event Space,Spa,Bakery,Theater,Yoga Studio
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66253,-79.39188,2,Coffee Shop,Park,General Entertainment,Bookstore,Salad Place,Restaurant,Bar,College Auditorium,Thai Restaurant,Theater
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.65739,-79.37804,2,Coffee Shop,Clothing Store,Hotel,Café,Sandwich Place,Middle Eastern Restaurant,Ramen Restaurant,Bar,Diner,Fast Food Restaurant
3,M5C,Downtown Toronto,St. James Town,43.65215,-79.37587,2,Coffee Shop,Cosmetics Shop,Café,Japanese Restaurant,Gastropub,Theater,Middle Eastern Restaurant,Food Truck,Lingerie Store,Restaurant
4,M4E,East Toronto,The Beaches,43.67709,-79.29547,2,Health Food Store,Pub,Trail,Yoga Studio,Eastern European Restaurant,Flea Market,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
5,M5E,Downtown Toronto,Berczy Park,43.64536,-79.37306,2,Coffee Shop,Cocktail Bar,Restaurant,Farmers Market,Pharmacy,Beer Bar,Seafood Restaurant,Cheese Shop,Bakery,Breakfast Spot
6,M5G,Downtown Toronto,Central Bay Street,43.65609,-79.38493,2,Coffee Shop,Middle Eastern Restaurant,Bubble Tea Shop,Neighborhood,Clothing Store,Sushi Restaurant,Plaza,Poke Place,Italian Restaurant,Spa
7,M6G,Downtown Toronto,Christie,43.66869,-79.42071,2,Café,Grocery Store,Coffee Shop,Candy Store,Baby Store,Yoga Studio,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop
8,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.6497,-79.38258,2,Coffee Shop,Café,Restaurant,Salad Place,Gym,Japanese Restaurant,Hotel,Seafood Restaurant,Breakfast Spot,Steakhouse
9,M6H,West Toronto,"Dufferin, Dovercourt Village",43.66505,-79.43891,2,Park,Pet Store,Bakery,Pharmacy,Smoke Shop,Brazilian Restaurant,Café,Bank,Bus Line,Pool
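
One thing to watch with this join: a neighbourhood for which no venues were returned has no row in the sorted-venues table, so its cluster label would come back as NaN and the label column would turn into floats. The sketch below shows that effect on placeholder data and one way to clean it up before using the labels as list indices for the map colors.

```python
import pandas as pd

# Placeholder data: neighbourhood C has no venues, hence no cluster label.
areas = pd.DataFrame({"Neighbourhood": ["A", "B", "C"]})
labels = pd.DataFrame({"Neighbourhood": ["A", "B"], "Cluster Labels": [2, 0]})

merged = areas.join(labels.set_index("Neighbourhood"), on="Neighbourhood")
print(merged)  # C gets NaN, and the label column is float

# Drop unlabeled rows and restore integer labels.
clean = merged.dropna(subset=["Cluster Labels"]).copy()
clean["Cluster Labels"] = clean["Cluster Labels"].astype(int)

# Cluster sizes give a quick view of how balanced the clustering is.
sizes = clean["Cluster Labels"].value_counts().sort_index()
print(sizes)
```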


### Part 3 - Map of Clusters and Analysis of the Clusters

In [90]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [93]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,Central Toronto,0,Accessories Store,Furniture / Home Store,Food & Drink Shop,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant


In [95]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,West Toronto,1,Park,Residential Building (Apartment / Condo),Yoga Studio,Eastern European Restaurant,Flea Market,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant
23,Central Toronto,1,Park,Yoga Studio,Electronics Store,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant


In [92]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,2,Breakfast Spot,Coffee Shop,Thrift / Vintage Store,Pub,Electronics Store,Event Space,Spa,Bakery,Theater,Yoga Studio
1,Downtown Toronto,2,Coffee Shop,Park,General Entertainment,Bookstore,Salad Place,Restaurant,Bar,College Auditorium,Thai Restaurant,Theater
2,Downtown Toronto,2,Coffee Shop,Clothing Store,Hotel,Café,Sandwich Place,Middle Eastern Restaurant,Ramen Restaurant,Bar,Diner,Fast Food Restaurant
3,Downtown Toronto,2,Coffee Shop,Cosmetics Shop,Café,Japanese Restaurant,Gastropub,Theater,Middle Eastern Restaurant,Food Truck,Lingerie Store,Restaurant
4,East Toronto,2,Health Food Store,Pub,Trail,Yoga Studio,Eastern European Restaurant,Flea Market,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
5,Downtown Toronto,2,Coffee Shop,Cocktail Bar,Restaurant,Farmers Market,Pharmacy,Beer Bar,Seafood Restaurant,Cheese Shop,Bakery,Breakfast Spot
6,Downtown Toronto,2,Coffee Shop,Middle Eastern Restaurant,Bubble Tea Shop,Neighborhood,Clothing Store,Sushi Restaurant,Plaza,Poke Place,Italian Restaurant,Spa
7,Downtown Toronto,2,Café,Grocery Store,Coffee Shop,Candy Store,Baby Store,Yoga Studio,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop
8,Downtown Toronto,2,Coffee Shop,Café,Restaurant,Salad Place,Gym,Japanese Restaurant,Hotel,Seafood Restaurant,Breakfast Spot,Steakhouse
9,West Toronto,2,Park,Pet Store,Bakery,Pharmacy,Smoke Shop,Brazilian Restaurant,Café,Bank,Bus Line,Pool


In [94]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
29,Central Toronto,3,Park,Playground,Yoga Studio,Electronics Store,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
33,Downtown Toronto,3,Playground,Park,Bike Trail,Tennis Court,Cupcake Shop,Dance Studio,Flea Market,Fish Market,Fish & Chips Shop,Fast Food Restaurant


## Conclusion
***
As you can see above, most of the neighborhoods were clustered into cluster 2. Most of them have coffee shops or cafés among their top three venues, which is likely why they are grouped together. There aren't many neighborhoods in the other clusters, but clusters 1 and 3 both appear to be led by parks and playgrounds and are then differentiated from each other by their next most common venues.

For a project like this, I would want to go back and include more neighborhoods instead of limiting the analysis to the ones in the Toronto boroughs, to get more data and perhaps find more comparable clusters. For right now, this is enough to show the different neighborhoods and their similarities within Toronto. Other things that may need changing are the radius used when finding the venues and the number of clusters.