# Segmenting and Clustering Neighborhoods in Toronto

I explored, segmented, and clustered the neighborhoods in the city of Toronto.

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

<a href="#item1">Question 1</a>


<a href="#item2">Question 2</a>


<a href="#item3">Question 3</a>
 
</font>
</div>

## Import necessary Libraries

In [1]:
import requests # library to handle requests
import pandas as pd # library for data analsysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation


# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# tranforming json file into a pandas dataframe library
from pandas.io.json import json_normalize

from geopy.geocoders import Nominatim

import folium # plotting library

print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


<a id='item1'></a>

## Question 1


Use pandas, or the BeautifulSoup package, or any other way you are comfortable with to transform the data in the table on the Wikipedia page into the above pandas dataframe.

In [2]:
from bs4 import BeautifulSoup
import requests

In [3]:
link = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M").text

In [4]:
soup = BeautifulSoup(link, 'lxml')
#print(soup.prettify())

In [5]:
table = soup.find("table", class_="wikitable sortable")
#print(table.prettify())

In [6]:
print(table.find("tr").text)


Postcode
Borough
Neighbourhood



In [7]:
headers = ["Postcode","Borough","Neighbourhood"]

In [8]:
table1=""
for tr in table.find_all('tr'):
    row1=""
    for tds in tr.find_all('td'):
        row1=row1+","+tds.text
    table1=table1+row1[1:]
#print(table1)

In [9]:
file=open("toronto.csv","wb")
file.write(bytes(table1,encoding="ascii",errors="ignore"))

8708

The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood

In [10]:
df = pd.read_csv('toronto.csv',header=None)
df.columns = headers

In [11]:
df.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor
7,M7A,Queen's Park,Not assigned
8,M8A,Not assigned,Not assigned
9,M9A,Downtown Toronto,Queen's Park


Only process the cells that have an __assigned borough__. Ignore cells with a borough that is __Not assigned__.

In [12]:
df.drop(df[df["Borough"] == "Not assigned"].index, inplace = True) 

More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: __Harbourfront and Regent Park__. These two rows will be combined into one row with the neighborhoods __separated with a comma__ as shown in row 11 in the above table.

In [13]:
df = df.groupby(['Postcode','Borough'], sort=False).agg( ', '.join)
df = df.reset_index()

If a cell has a borough but a __Not assigned__ neighborhood, then the neighborhood will be the __same as the borough__. So for the __9th cell__ in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park.

In [14]:
df[df["Neighbourhood"] == "Not assigned"]

Unnamed: 0,Postcode,Borough,Neighbourhood
4,M7A,Queen's Park,Not assigned


In [15]:
df[df["Neighbourhood"] == "Not assigned"] = "Queen's Park"

In [16]:
df.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,Queen's Park,Queen's Park,Queen's Park
5,M9A,Downtown Toronto,Queen's Park
6,M1B,Scarborough,"Rouge, Malvern"
7,M3B,North York,Don Mills North
8,M4B,East York,"Woodbine Gardens, Parkview Hill"
9,M5B,Downtown Toronto,"Ryerson, Garden District"


Use the __.shape__ method to print the number of rows of your dataframe.

In [17]:
df.shape

(103, 3)

<a id='item2'></a>

## Question 2


I used the http://cocl.us/Geospatial_data to create dataframe with longitude and latitude values.

In [18]:
geo = pd.read_csv("http://cocl.us/Geospatial_data")

In [19]:
geo.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [20]:
df = df.join(geo, how = "left")
df.drop("Postal Code", axis=1,inplace=True)
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.806686,-79.194353
1,M4A,North York,Victoria Village,43.784535,-79.160497
2,M5A,Downtown Toronto,Harbourfront,43.763573,-79.188711
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.770992,-79.216917
4,Queen's Park,Queen's Park,Queen's Park,43.773136,-79.239476


<a id='item3'></a>

## Question 3


Explore and cluster the neighborhoods in Toronto

In [21]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(df['Borough'].unique()),
        df.shape[0]
    )
)

The dataframe has 11 boroughs and 103 neighborhoods.


In [22]:
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="Segmenting and Clustering Neighborhoods in Toronto")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.653963, -79.387207.


In [23]:
# create map of New York using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

In [24]:
# add markers to map
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df["Neighbourhood"]):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [25]:
#Private information deleted
CLIENT_ID = # your Foursquare ID
CLIENT_SECRET = # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

In [26]:
# defining radius and limit of venues to get
radius=500
LIMIT=100

In [27]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [28]:
toronto_venues = getNearbyVenues(names=df['Neighbourhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

Parkwoods
Victoria Village
Harbourfront
Lawrence Heights, Lawrence Manor
Queen's Park
Queen's Park
Rouge, Malvern
Don Mills North
Woodbine Gardens, Parkview Hill
Ryerson, Garden District
Glencairn
Cloverdale, Islington, Martin Grove, Princess Gardens, West Deane Park
Highland Creek, Rouge Hill, Port Union
Flemingdon Park, Don Mills South
Woodbine Heights
St. James Town
Humewood-Cedarvale
Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Downsview North, Wilson Heights
Thorncliffe Park
Adelaide, King, Richmond
Dovercourt Village, Dufferin
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto
Harbourfront East, Toronto Islands, Union Station
Little Portugal, Trinity
East Birchmount Park, Ionview, Kennedy Park
Bayview Village
CFB Toronto, Downsview East
The Danforth West, Riv

In [29]:
toronto_venues.head(10)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.806686,-79.194353,Wendy's,43.807448,-79.199056,Fast Food Restaurant
1,Victoria Village,43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
2,Harbourfront,43.763573,-79.188711,Swiss Chalet Rotisserie & Grill,43.767697,-79.189914,Pizza Place
3,Harbourfront,43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store
4,Harbourfront,43.763573,-79.188711,Big Bite Burrito,43.766299,-79.19072,Mexican Restaurant
5,Harbourfront,43.763573,-79.188711,Enterprise Rent-A-Car,43.764076,-79.193406,Rental Car Location
6,Harbourfront,43.763573,-79.188711,Woburn Medical Centre,43.766631,-79.192286,Medical Center
7,Harbourfront,43.763573,-79.188711,Lawrence Ave E & Kingston Rd,43.767704,-79.18949,Intersection
8,Harbourfront,43.763573,-79.188711,Eggsmart,43.7678,-79.190466,Breakfast Spot
9,"Lawrence Heights, Lawrence Manor",43.770992,-79.216917,Starbucks,43.770037,-79.221156,Coffee Shop


In [30]:
toronto_venues.shape

(2213, 7)

In [31]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide, King, Richmond",2,2,2,2,2,2
Agincourt,22,22,22,22,22,22
"Agincourt North, L'Amoreaux East, Milliken, Steeles East",39,39,39,39,39,39
"Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown",8,8,8,8,8,8
"Bathurst Manor, Downsview North, Wilson Heights",20,20,20,20,20,20
Bayview Village,19,19,19,19,19,19
"Bedford Park, Lawrence Manor East",100,100,100,100,100,100
Berczy Park,1,1,1,1,1,1
"Birch Cliff, Cliffside West",100,100,100,100,100,100
"Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe",4,4,4,4,4,4


## Analyze Each Neighborhood

In [32]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot.head()

Unnamed: 0,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [33]:
toronto_onehot.shape

(2213, 273)

In [34]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,"Adelaide, King, Richmond",0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.000000,0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.0,0.000000
1,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.000000,0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.0,0.000000
2,"Agincourt North, L'Amoreaux East, Milliken, St...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.025641,0.000000,0.00,0.000000,0.000000,0.000000,0.025641,0.0,0.025641
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.000000,0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.0,0.000000
4,"Bathurst Manor, Downsview North, Wilson Heights",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.000000,0.000000,0.05,0.000000,0.000000,0.000000,0.000000,0.0,0.000000
5,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.000000,0.000000,0.00,0.000000,0.052632,0.000000,0.000000,0.0,0.052632
6,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030000,...,0.0,0.010000,0.000000,0.00,0.000000,0.000000,0.010000,0.000000,0.0,0.000000
7,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.000000,0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.0,0.000000
8,"Birch Cliff, Cliffside West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020000,...,0.0,0.020000,0.000000,0.00,0.000000,0.000000,0.010000,0.000000,0.0,0.000000
9,"Bloordale Gardens, Eringate, Markland Wood, Ol...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.000000,0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.0,0.000000


In [35]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
                 venue  freq
0              Airport   0.5
1                 Park   0.5
2    Accessories Store   0.0
3   Mexican Restaurant   0.0
4  Monument / Landmark   0.0


----Agincourt----
                    venue  freq
0                    Café  0.14
1             Coffee Shop  0.09
2          Breakfast Spot  0.09
3                 Stadium  0.05
4  Furniture / Home Store  0.05


----Agincourt North, L'Amoreaux East, Milliken, Steeles East----
         venue  freq
0  Coffee Shop  0.23
1          Gym  0.05
2        Diner  0.05
3         Park  0.05
4  Yoga Studio  0.03


----Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown----
            venue  freq
0     Pizza Place  0.25
1        Pharmacy  0.12
2             Pub  0.12
3             Gym  0.12
4  Sandwich Place  0.12


----Bathurst Manor, Downsview North, Wilson Heights----
                       venue  freq
0                Coffee Shop  0

                    venue  freq
0             Coffee Shop  0.12
1     Sporting Goods Shop  0.09
2            Burger Joint  0.06
3  Furniture / Home Store  0.06
4                 Brewery  0.03


----East Toronto----
                  venue  freq
0  Fast Food Restaurant  0.18
1           Pizza Place  0.18
2    Athletics & Sports  0.09
3                  Bank  0.09
4          Intersection  0.09


----Emery, Humberlea----
                venue  freq
0         Coffee Shop  0.14
1  Italian Restaurant  0.05
2      Ice Cream Shop  0.04
3                Café  0.04
4      Sandwich Place  0.04


----Fairview, Henry Farm, Oriole----
                  venue  freq
0          Liquor Store  0.17
1        Discount Store  0.17
2  Gym / Fitness Center  0.17
3                   Gym  0.17
4         Grocery Store  0.17


----First Canadian Place, Underground city----
                             venue  freq
0                   Baseball Field   1.0
1                Accessories Store   0.0
2        Middle Eas

               venue  freq
0  Convenience Store  0.07
1                Gym  0.07
2             Bakery  0.07
3     Sandwich Place  0.07
4    Supplement Shop  0.07


----Studio District----
                       venue  freq
0                Coffee Shop  0.09
1             Clothing Store  0.07
2             Cosmetics Shop  0.04
3                       Café  0.03
4  Middle Eastern Restaurant  0.03


----The Annex, North Midtown, Yorkville----
                  venue  freq
0                  Park   0.4
1  Fast Food Restaurant   0.2
2         Women's Store   0.2
3                Market   0.2
4    Mexican Restaurant   0.0


----The Beaches----
                 venue  freq
0                 Café  0.25
1                 Bank  0.25
2  Japanese Restaurant  0.25
3   Chinese Restaurant  0.25
4    Accessories Store  0.00


----The Beaches West, India Bazaar----
                venue  freq
0        Dessert Shop  0.09
1         Pizza Place  0.09
2      Sandwich Place  0.09
3    Sushi Restaurant  0.06

In [36]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [37]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Airport,Park,Yoga Studio,Drugstore,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop
1,Agincourt,Café,Breakfast Spot,Coffee Shop,Music Venue,Restaurant,Bar,Bakery,Stadium,Italian Restaurant,Burrito Place
2,"Agincourt North, L'Amoreaux East, Milliken, St...",Coffee Shop,Diner,Park,Gym,Fried Chicken Joint,Portuguese Restaurant,Smoothie Shop,Seafood Restaurant,Sandwich Place,Burger Joint
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",Pizza Place,Gym,Coffee Shop,Pharmacy,Sandwich Place,Skating Rink,Pub,Yoga Studio,Discount Store,Dessert Shop
4,"Bathurst Manor, Downsview North, Wilson Heights",Coffee Shop,Chinese Restaurant,Sandwich Place,Bank,Sushi Restaurant,Middle Eastern Restaurant,Restaurant,Deli / Bodega,Fast Food Restaurant,Fried Chicken Joint


## Cluster Neighborhoods

In [38]:
from sklearn.cluster import KMeans

In [39]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_ 
# to change use .astype()

array([1, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
       0, 4, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0])

In [43]:
# add clustering labels
neighbourhoods_venues_sorted.insert(0, 'Cluster_Labels', kmeans.labels_)

toronto_merged = df

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighbourhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.806686,-79.194353,0.0,Fast Food Restaurant,Drugstore,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Women's Store
1,M4A,North York,Victoria Village,43.784535,-79.160497,0.0,Bar,Yoga Studio,Drugstore,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Fast Food Restaurant
2,M5A,Downtown Toronto,Harbourfront,43.763573,-79.188711,0.0,Electronics Store,Pizza Place,Breakfast Spot,Rental Car Location,Medical Center,Mexican Restaurant,Intersection,Yoga Studio,Dog Run,Diner
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.770992,-79.216917,0.0,Coffee Shop,Korean Restaurant,Yoga Studio,Drugstore,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Eastern European Restaurant
4,Queen's Park,Queen's Park,Queen's Park,43.773136,-79.239476,0.0,Bank,Hakka Restaurant,Fried Chicken Joint,Playground,Caribbean Restaurant,Athletics & Sports,Thai Restaurant,Bakery,Gas Station,Discount Store


In [44]:
toronto_merged=toronto_merged.dropna()

In [45]:
toronto_merged['Cluster_Labels'] = toronto_merged.Cluster_Labels.astype(int)

In [46]:
import matplotlib.cm as cm
import matplotlib.colors as colors

In [47]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster_Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Examine Clusters

__Cluster 1__

In [48]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,0,Fast Food Restaurant,Drugstore,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Women's Store
1,North York,0,Bar,Yoga Studio,Drugstore,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Fast Food Restaurant
2,Downtown Toronto,0,Electronics Store,Pizza Place,Breakfast Spot,Rental Car Location,Medical Center,Mexican Restaurant,Intersection,Yoga Studio,Dog Run,Diner
3,North York,0,Coffee Shop,Korean Restaurant,Yoga Studio,Drugstore,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Eastern European Restaurant
4,Queen's Park,0,Bank,Hakka Restaurant,Fried Chicken Joint,Playground,Caribbean Restaurant,Athletics & Sports,Thai Restaurant,Bakery,Gas Station,Discount Store
5,Downtown Toronto,0,Bank,Hakka Restaurant,Fried Chicken Joint,Playground,Caribbean Restaurant,Athletics & Sports,Thai Restaurant,Bakery,Gas Station,Discount Store
6,Scarborough,0,Discount Store,Convenience Store,Hobby Shop,Department Store,Coffee Shop,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dessert Shop
7,North York,0,Bakery,Park,Metro Station,Bus Station,Bus Line,Intersection,Soccer Field,Fast Food Restaurant,Dumpling Restaurant,Drugstore
8,East York,0,Motel,American Restaurant,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Department Store
9,Downtown Toronto,0,Café,Skating Rink,General Entertainment,College Stadium,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Department Store


__Cluster 2__

In [49]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,East York,1,Park,Playground,Asian Restaurant,Yoga Studio,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant
25,Downtown Toronto,1,Park,Food & Drink Shop,Yoga Studio,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Drugstore
30,Downtown Toronto,1,Airport,Park,Yoga Studio,Drugstore,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop
40,North York,1,Convenience Store,Coffee Shop,Park,Dumpling Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore
50,North York,1,Park,Trail,Playground,Yoga Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant
74,Central Toronto,1,Park,Women's Store,Market,Fast Food Restaurant,Comfort Food Restaurant,Comic Shop,Event Space,Ethiopian Restaurant,Colombian Restaurant,Empanada Restaurant
90,Scarborough,1,Park,River,Smoke Shop,Yoga Studio,Doner Restaurant,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run
98,Etobicoke,1,Park,Yoga Studio,Drugstore,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant


__Cluster 3__

In [50]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
91,Downtown Toronto,2,Pool,Baseball Field,Yoga Studio,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Drugstore
97,Downtown Toronto,2,Baseball Field,Yoga Studio,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Fast Food Restaurant


__Cluster 4__

In [51]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
20,Downtown Toronto,3,Cafeteria,Drugstore,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,College Rec Center


__Cluster 5__

In [52]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
94,Etobicoke,4,Jewelry Store,Yoga Studio,Drugstore,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant


## Thank you!