# Segmenting and Clustering Neighborhoods in Toronto

## Part 1: Scrape Postal Codes from Wikipedia page

The first part of this project requires scraping the table of postal codes in Canada from the following website: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M.

In [97]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import requests

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

from sklearn.cluster import KMeans

import matplotlib.cm as cm
import matplotlib.colors as colors

print("Done Importing Libraries")

Done Importing Libraries


The code below reads the table from the website into a pandas dataframe.

In [2]:
postal_code_url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
response = requests.get(postal_code_url)
soup = BeautifulSoup(response.text, 'html.parser')  # name the parser explicitly to avoid a bs4 warning

post_codes = []
for tr in soup.find_all('tr'):
    td = tr.find_all('td')
    if len(td) == 3:
        if 'Not assigned' not in td[1].text:
            if 'Not assigned' in td[2].text:
                post_codes.append([td[0].text, td[1].text, td[1].text])
            else:
                post_codes.append([td[0].text, td[1].text, td[2].text.rstrip("\n\r")])

post_codes = pd.DataFrame(post_codes, columns=['Postalcode', 'Borough', 'Neighborhood'])
print(post_codes.head())

  Postalcode           Borough      Neighborhood
0        M3A        North York         Parkwoods
1        M4A        North York  Victoria Village
2        M5A  Downtown Toronto      Harbourfront
3        M6A        North York  Lawrence Heights
4        M6A        North York    Lawrence Manor
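The two rules in the loop can be checked on a few hand-written rows. The sketch below is illustrative only: `clean_rows` is a hypothetical helper and the sample rows are made up rather than scraped. It drops rows whose borough is 'Not assigned' and falls back to the borough name when the neighborhood is 'Not assigned'. Unlike the loop above, it also strips whitespace from every field.

```python
import pandas as pd

def clean_rows(rows):
    """Apply the scraping loop's rules to (code, borough, neighborhood) tuples."""
    cleaned = []
    for code, borough, neighborhood in rows:
        if 'Not assigned' in borough:
            continue  # skip postal codes without a borough
        if 'Not assigned' in neighborhood:
            neighborhood = borough  # reuse the borough as the neighborhood name
        cleaned.append([code.strip(), borough.strip(), neighborhood.strip()])
    return pd.DataFrame(cleaned, columns=['Postalcode', 'Borough', 'Neighborhood'])

# Illustrative rows only (not taken from the Wikipedia table)
sample_rows = [
    ('M1A', 'Not assigned', 'Not assigned'),
    ('M3A', 'North York', 'Parkwoods\n'),
    ('M7A', "Queen's Park", 'Not assigned'),
]
sample_df = clean_rows(sample_rows)
print(sample_df)
```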


The following code combines neighborhoods by borough and post code.

In [3]:
borough_df = post_codes.groupby(['Postalcode','Borough'])['Neighborhood'].apply(lambda x: ', '.join(x)).reset_index()
borough_df.head(10)

Unnamed: 0,Postalcode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"


In [4]:
borough_df.shape

(103, 3)
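To see what the grouping step does, here is a minimal sketch on a toy frame that reuses the duplicated M6A rows from the earlier output. The variable names `toy` and `combined` are placeholders for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    'Postalcode': ['M6A', 'M6A', 'M3A'],
    'Borough': ['North York', 'North York', 'North York'],
    'Neighborhood': ['Lawrence Heights', 'Lawrence Manor', 'Parkwoods'],
})

# One row per (postal code, borough); neighborhoods are joined with ', '
combined = (
    toy.groupby(['Postalcode', 'Borough'])['Neighborhood']
       .agg(', '.join)
       .reset_index()
)
print(combined)
```

Because `groupby` sorts the keys by default, the result is ordered by postal code, which matches the ordering seen in `borough_df`.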

## Part 2: Attach Lat. and Long. To Postal Code Data

Read in the coordinate data from the provided file.

In [10]:
#http://cocl.us/Geospatial_data
!wget -q -O 'toronto_data.csv' http://cocl.us/Geospatial_data
print('Data downloaded!')

with open('toronto_data.csv') as csv_data:
    toronto_data = pd.read_csv(csv_data)
toronto_data.columns = ['Postalcode', 'Latitude', 'Longitude']
print(toronto_data.shape)
toronto_data.head()

Data downloaded!
(103, 3)


Unnamed: 0,Postalcode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Attach the coordinate data to the existing borough data frame.  This new data frame should still have 103 rows, but now an additional 2 columns.

In [19]:
all_data = borough_df.set_index('Postalcode').join(toronto_data.set_index('Postalcode'))
all_data.reset_index(inplace=True)
print(all_data.shape)
all_data.head()

(103, 5)


Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
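The row-count expectation stated above can also be enforced in code. The following is a sketch of an alternative to the index-based `join`, using `merge` with `validate='one_to_one'` on two small frames built from the first rows shown earlier. It is an assumption that both tables hold each postal code exactly once.

```python
import pandas as pd

boroughs = pd.DataFrame({
    'Postalcode': ['M1B', 'M1C'],
    'Borough': ['Scarborough', 'Scarborough'],
})
coords = pd.DataFrame({
    'Postalcode': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
})

# validate='one_to_one' raises MergeError if a postal code repeats on either side
merged = boroughs.merge(coords, on='Postalcode', how='left', validate='one_to_one')

# A left join keeps every borough row; unmatched codes would show up as NaN
missing = int(merged[['Latitude', 'Longitude']].isna().any(axis=1).sum())
print(merged.shape, 'rows missing coordinates:', missing)
```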


Select only Boroughs that contain 'Toronto'.

In [35]:
toronto_boroughs = all_data[all_data['Borough'].str.contains('Toronto')]
print(toronto_boroughs.shape)
toronto_boroughs.head()

(39, 5)


Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude
37,M4E,East Toronto,The Beaches,43.676357,-79.293031
41,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
42,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572
43,M4M,East Toronto,Studio District,43.659526,-79.340923
44,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


The remainder of this script follows the example from the New York data set.
The latitude and longitude of Toronto were taken from a Google search and are hardcoded here (as I don't expect the location of Toronto to change).

In [38]:
latitude = 43.6532
longitude = -79.3832

print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6532, -79.3832.


The code below generates a map of Toronto and shows each postal code marked by a dot.  This only includes boroughs that contain 'Toronto'.
Clicking on a dot displays the postal code, borough and neighborhoods.

In [39]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, postcode, borough, neighborhood in zip(toronto_boroughs['Latitude'], toronto_boroughs['Longitude'], toronto_boroughs['Postalcode'], toronto_boroughs['Borough'], toronto_boroughs['Neighborhood']):
    label = '{}: {}: {}'.format(postcode, borough, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

## Part 3: Explore and Cluster Neighborhoods
The 'Neighborhoods' here are actually the postal codes.  The postal code is used to define a 'Neighborhood', which may in fact contain many neighborhoods as defined by the Wikipedia page. I just want to be clear that the definition of a neighborhood from here forward is not the 'Neighborhood' column on the Wikipedia page, but the unique postal code.  Using the postal code gives us a unique latitude and longitude around which to look for venues.

First define my Foursquare credentials and other predefined values.

In [43]:
CLIENT_ID = 'XCTNIO5R0HK4SSHFYQDU3231DQ1AGJTGDOYNQ4QTAGCWIDVP' # your Foursquare ID
CLIENT_SECRET = 'TSPNYBOQ23VNN5EXUHGL0E3JNIJQ1COGPZPKAZLXW5A4IH3H' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

LIMIT = 100
radius = 500
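Credentials pasted into a notebook end up wherever the notebook is shared. One common alternative is to read them from environment variables. The sketch below assumes hypothetical variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, and the helper `load_foursquare_credentials` is not part of any Foursquare library.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read credentials from a mapping; the variable names are assumptions."""
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first')
    return client_id, client_secret

# A plain dict stands in for the real environment in this example
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id', 'FOURSQUARE_CLIENT_SECRET': 'demo-secret'}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print(demo_id)
```

In the notebook, calling `load_foursquare_credentials()` with no argument would read from the real environment instead.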

Define functions to get venue category and get nearby venues.  These are taken straight from the New York lab.

In [44]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
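Note that `getNearbyVenues` indexes `v['venue']['categories'][0]` directly, which would raise an `IndexError` for any venue returned with an empty category list. A minimal sketch of a more tolerant flattening step follows. `flatten_items` is a hypothetical helper, and the mocked payload only mirrors the keys the function above reads; it is not real API output.

```python
def flatten_items(name, lat, lng, items):
    """Turn explore-style items into rows, tolerating venues with no category."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# Mocked payload shaped like the keys indexed in getNearbyVenues
mock_items = [
    {'venue': {'name': 'Glen Manor Ravine',
               'location': {'lat': 43.676821, 'lng': -79.293942},
               'categories': [{'name': 'Trail'}]}},
    {'venue': {'name': 'Uncategorized Spot',
               'location': {'lat': 43.0, 'lng': -79.0},
               'categories': []}},
]
rows = flatten_items('The Beaches', 43.676357, -79.293031, mock_items)
print(rows)
```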

In [45]:
toronto_venues = getNearbyVenues(names=toronto_boroughs['Neighborhood'],
                                   latitudes=toronto_boroughs['Latitude'],
                                   longitudes=toronto_boroughs['Longitude']
                                  )

The Beaches
The Danforth West, Riverdale
The Beaches West, India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park, Summerhill East
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Harbourfront
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Roselawn
Forest Hill North, Forest Hill West
The Annex, North Midtown, Yorkville
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place, Underground city
Christie
Dovercourt Village, Dufferin
Little Portugal, Trinity
Brockton, Exhibition Place, Parkdale Village
High Park, The Junction Sout

Just visually inspect the venue data to make sure it looks correct.

In [74]:
print(toronto_venues.shape)
toronto_venues.head()

(1718, 7)
['Trail' 'Other Great Outdoors' 'Health Food Store' 'Pub' 'Pizza Place'
 'Neighborhood' 'Greek Restaurant' 'Cosmetics Shop' 'Italian Restaurant'
 'Ice Cream Shop' 'Brewery' 'Yoga Studio' 'Fruit & Vegetable Store'
 'Dessert Shop' 'Bookstore' 'Restaurant' 'Juice Bar' 'Diner' 'Spa'
 'Bubble Tea Shop' 'Grocery Store' 'Furniture / Home Store' 'Coffee Shop'
 'Café' 'Bakery' 'Caribbean Restaurant' 'Frozen Yogurt Shop'
 'American Restaurant' 'Liquor Store' 'Gym' 'Fish & Chips Shop'
 'Burger Joint' 'Park' 'Sushi Restaurant' 'Burrito Place' 'Pet Store'
 'Steakhouse' 'Movie Theater' 'Sandwich Place' 'Fish Market' 'Gay Bar'
 'Cheese Shop' 'Middle Eastern Restaurant' 'Seafood Restaurant'
 'Comfort Food Restaurant' 'Thai Restaurant' 'Stationery Store' 'Wine Bar'
 'Coworking Space' 'Bar' 'Gym / Fitness Center'
 'Latin American Restaurant' 'Gastropub' 'Bank' 'Convenience Store'
 'Clothing Store' 'Thrift / Vintage Store' 'Swim School' 'Bus Line'
 'Food & Drink Shop' 'Breakfast Spot' 'Departme

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,The Beaches,43.676357,-79.293031,Glen Stewart Ravine,43.6763,-79.294784,Other Great Outdoors
2,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
3,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
4,The Beaches,43.676357,-79.293031,Domino's Pizza,43.679058,-79.297382,Pizza Place


Inspect the number of venues returned for each neighborhood by grouping on the 'Neighborhood' column and counting. Each column reports the same value, since `count()` tallies the non-null entries per column.

In [71]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide, King, Richmond",100,100,100,100,100,100
Berczy Park,56,56,56,56,56,56
"Brockton, Exhibition Place, Parkdale Village",22,22,22,22,22,22
Business Reply Mail Processing Centre 969 Eastern,17,17,17,17,17,17
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara",18,18,18,18,18,18
"Cabbagetown, St. James Town",44,44,44,44,44,44
Central Bay Street,83,83,83,83,83,83
"Chinatown, Grange Park, Kensington Market",85,85,85,85,85,85
Christie,18,18,18,18,18,18
Church and Wellesley,83,83,83,83,83,83
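Since every column in the table above carries the same number, a single Series of counts is often enough. As a sketch on made-up rows, both `value_counts` and `groupby(...).size()` give one count per neighborhood:

```python
import pandas as pd

# Toy data; venue names are placeholders
toy_venues = pd.DataFrame({
    'Neighborhood': ['Christie', 'Christie', 'Berczy Park'],
    'Venue': ['Venue A', 'Venue B', 'Venue C'],
})

counts_vc = toy_venues['Neighborhood'].value_counts()       # sorted by count, descending
counts_size = toy_venues.groupby('Neighborhood').size()     # sorted by neighborhood name
print(counts_vc)
```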


In [72]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 234 unique categories.


### Analyze each neighborhood and prepare the data for K-means.
One thing to be aware of here is that there is already a Venue Category called Neighborhood, so I will need to rename the column that actually contains the neighborhood strings to 'TorontoNeighborhood'.

In [76]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe, rename to 'TorontoNeighborhood' as there is already a venue category called 'Neighborhood'
toronto_onehot['TorontoNeighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

print(toronto_onehot.shape)
toronto_onehot.head()

(1718, 235)


Unnamed: 0,TorontoNeighborhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,The Beaches,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
1,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Look at the average occurrence of each category by neighborhood.

In [78]:
toronto_grouped = toronto_onehot.groupby('TorontoNeighborhood').mean().reset_index()
print(toronto_grouped.shape)
toronto_grouped

(39, 235)


Unnamed: 0,TorontoNeighborhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,...,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0
2,"Brockton, Exhibition Place, Parkdale Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business Reply Mail Processing Centre 969 Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.0,0.055556,0.055556,0.055556,0.111111,0.166667,0.111111,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Cabbagetown, St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,...,0.0,0.0,0.0,0.012048,0.0,0.0,0.012048,0.0,0.0,0.012048
7,"Chinatown, Grange Park, Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.047059,0.0,0.058824,0.011765,0.0,0.0,0.0
8,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Church and Wellesley,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,...,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.012048,0.0,0.012048
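Taking the mean of one-hot columns within a group yields the share of that neighborhood's venues in each category, so each row sums to 1. A small sketch with made-up data illustrates this; `dtype=float` is passed here only so the indicator columns are numeric regardless of pandas version.

```python
import pandas as pd

toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B'],
    'Venue Category': ['Cafe', 'Park', 'Park'],
})

# One indicator column per category
onehot = pd.get_dummies(toy_venues['Venue Category'], dtype=float)
onehot.insert(0, 'TorontoNeighborhood', toy_venues['Neighborhood'])

# Mean of the indicators = fraction of the neighborhood's venues in each category
freq = onehot.groupby('TorontoNeighborhood').mean()
print(freq)
```

This is why a neighborhood with 100 venues shows values such as 0.02, while one with 18 venues shows multiples of 1/18 such as 0.055556.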


Function to return the most common X-number of venues for each neighborhood.  Copied from New York lab.

In [79]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
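One consequence of sorting all frequencies is that a neighborhood with fewer distinct categories than `num_top_venues` has its remaining slots filled by categories whose frequency is 0. The sketch below shows one possible variant that leaves those out. `top_categories` is a hypothetical name, and returning a shorter list is a design choice, not what the lab function does.

```python
import pandas as pd

def top_categories(row, num_top_venues):
    """Return the most frequent category names, ignoring zero-frequency ones."""
    frequencies = row.iloc[1:].astype(float)   # skip the neighborhood label
    nonzero = frequencies[frequencies > 0]
    return nonzero.nlargest(num_top_venues).index.tolist()

toy_row = pd.Series({'TorontoNeighborhood': 'A',
                     'Cafe': 0.5, 'Park': 0.3, 'Pub': 0.2, 'Zoo': 0.0})
top_four = top_categories(toy_row, 4)
print(top_four)
```

A shorter list would need padding (for example with `None`) before being written into a fixed-width data frame.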

Create a data frame of the 10 most common venues in each neighborhood.  This will be used to draw conclusions later.

In [81]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['TorontoNeighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['TorontoNeighborhood'] = toronto_grouped['TorontoNeighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,TorontoNeighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Bar,Café,Thai Restaurant,Bakery,Gym,Steakhouse,Sushi Restaurant,Burger Joint,Restaurant
1,Berczy Park,Coffee Shop,Cocktail Bar,Steakhouse,Cheese Shop,Café,Farmers Market,Beer Bar,Bakery,Seafood Restaurant,Museum
2,"Brockton, Exhibition Place, Parkdale Village",Café,Bakery,Coffee Shop,Breakfast Spot,Grocery Store,Performing Arts Venue,Pet Store,Climbing Gym,Restaurant,Burrito Place
3,Business Reply Mail Processing Centre 969 Eastern,Pizza Place,Auto Workshop,Comic Shop,Recording Studio,Restaurant,Burrito Place,Brewery,Skate Park,Smoke Shop,Farmers Market
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Service,Airport Lounge,Airport Terminal,Harbor / Marina,Bar,Coffee Shop,Rental Car Location,Sculpture Garden,Boutique,Boat or Ferry
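The column names above are built by trying a short suffix list and falling back to 'th'. That works for 1 through 10; as a sketch, a small hypothetical helper `ordinal` produces the same names without relying on an exception and also handles larger numbers such as 11 or 22.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['TorontoNeighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns[:4])
```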


### Create clusters
Group the neighborhoods into 5 clusters, as we had done in the New York lab.

In [84]:
num_clusters = 5
toronto_grouped_cluster = toronto_grouped.drop(columns='TorontoNeighborhood')  # positional axis is not accepted by recent pandas

# Run k-means
kmeans = KMeans(n_clusters = num_clusters, random_state = 0)
kmeans.fit(toronto_grouped_cluster)

# Check the first few labels
kmeans.labels_[0:10]

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
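The value 5 is carried over from the New York lab rather than derived from the Toronto data. One common way to compare candidate values of k is the silhouette score. The sketch below runs on synthetic blobs, not the venue matrix, so it only shows the mechanics; `X`, `scores`, and `best_k` are illustrative names.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic, well-separated blobs stand in for the venue-frequency matrix
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)   # higher means tighter, better-separated clusters

best_k = max(scores, key=scores.get)
print(best_k)
```

On the real data, `toronto_grouped_cluster` would take the place of `X`; the resulting scores may be far less clear-cut than on these blobs.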

In [92]:
# Attach the cluster labels to the original data.
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# Merge the lat/long data with the venue data and cluster results.  Make sure to merge Neighborhood with TorontoNeighborhood
toronto_merged = toronto_boroughs
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('TorontoNeighborhood'), on='Neighborhood')

toronto_merged.head()

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
37,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Pizza Place,Health Food Store,Pub,Trail,Neighborhood,Other Great Outdoors,Concert Hall,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant
41,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,0,Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Bookstore,Furniture / Home Store,Frozen Yogurt Shop,Grocery Store,Brewery,Bubble Tea Shop
42,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572,0,Park,Pizza Place,Ice Cream Shop,Movie Theater,Burger Joint,Sandwich Place,Burrito Place,Fish & Chips Shop,Italian Restaurant,Steakhouse
43,M4M,East Toronto,Studio District,43.659526,-79.340923,0,Café,Coffee Shop,Brewery,Gastropub,Bakery,Italian Restaurant,American Restaurant,Yoga Studio,Convenience Store,Sandwich Place
44,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,1,Park,Bus Line,Swim School,Yoga Studio,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant


In [98]:
map_clusters = folium.Map(location=[latitude,longitude], zoom_start=12)

# Set the color scheme for the clusters
x = np.arange(num_clusters)
ys = [i + x + (i*x)**2 for i in range(num_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### Let's examine the clusters a little more.
My initial reaction looking at the map is that we chose too few clusters.  This is because almost all neighborhoods belong to one cluster, while the others only have 1 or 2 neighborhoods each.  I will look at each cluster a little more in depth to see if they do indeed have similar top 10 venue categories (implying that 5 clusters is correct).
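The imbalance can be quantified before reading through each cluster. As a sketch, counting the labels gives the size of every cluster; the label array below is hypothetical, and in this notebook `kmeans.labels_` would be used instead.

```python
import numpy as np

# Hypothetical labels for illustration only
labels = np.array([0, 0, 0, 0, 1, 0, 0, 2, 3, 0, 4, 0, 2])
num_clusters = 5

sizes = np.bincount(labels, minlength=num_clusters)   # neighborhoods per cluster
largest_share = sizes.max() / sizes.sum()             # fraction in the biggest cluster
print(sizes, round(largest_share, 2))
```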

#### Cluster 1

In [99]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
37,East Toronto,0,Pizza Place,Health Food Store,Pub,Trail,Neighborhood,Other Great Outdoors,Concert Hall,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant
41,East Toronto,0,Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Bookstore,Furniture / Home Store,Frozen Yogurt Shop,Grocery Store,Brewery,Bubble Tea Shop
42,East Toronto,0,Park,Pizza Place,Ice Cream Shop,Movie Theater,Burger Joint,Sandwich Place,Burrito Place,Fish & Chips Shop,Italian Restaurant,Steakhouse
43,East Toronto,0,Café,Coffee Shop,Brewery,Gastropub,Bakery,Italian Restaurant,American Restaurant,Yoga Studio,Convenience Store,Sandwich Place
45,Central Toronto,0,Gym,Food & Drink Shop,Sandwich Place,Breakfast Spot,Park,Department Store,Dance Studio,Hotel,Doner Restaurant,Donut Shop
46,Central Toronto,0,Clothing Store,Coffee Shop,Yoga Studio,Sporting Goods Shop,Chinese Restaurant,Dessert Shop,Café,Restaurant,Miscellaneous Shop,Diner
47,Central Toronto,0,Pizza Place,Dessert Shop,Sandwich Place,Coffee Shop,Café,Italian Restaurant,Sushi Restaurant,Gym,Gas Station,Park
49,Central Toronto,0,Coffee Shop,Pub,Pizza Place,American Restaurant,Restaurant,Light Rail Station,Fried Chicken Joint,Sports Bar,Supermarket,Sushi Restaurant
51,Downtown Toronto,0,Coffee Shop,Bakery,Café,Italian Restaurant,Pizza Place,Market,Restaurant,Pub,General Entertainment,Snack Place
52,Downtown Toronto,0,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Gay Bar,Restaurant,Gastropub,Fast Food Restaurant,Grocery Store,Gym,Hotel


#### Cluster 2

In [101]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
44,Central Toronto,1,Park,Bus Line,Swim School,Yoga Studio,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant


#### Cluster 3

In [102]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
50,Downtown Toronto,2,Park,Playground,Trail,Yoga Studio,Dessert Shop,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
64,Central Toronto,2,Park,Jewelry Store,Trail,Sushi Restaurant,Yoga Studio,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store


#### Cluster 4

In [103]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
63,Central Toronto,3,Home Service,Garden,Yoga Studio,Dim Sum Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant


#### Cluster 5

In [105]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
48,Central Toronto,4,Restaurant,Playground,Trail,Department Store,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop


### Results/Observations
Looking through the clusters, it seems as if 5 is adequate.  I was initially skeptical, because almost all of the neighborhoods fell into the first cluster.  After looking through all of the neighborhoods in this group, most of them have a coffee shop, cafe, bakery or restaurant among their top 3 venues. This grouping is the restaurant/coffee shop area, which explains why these neighborhoods were grouped together.

The second group only contains one neighborhood.  This neighborhood seems to be a park and a transit center (based on the popular venues).  The lack of trails has led to it not being grouped with the third group.

Group 3 contains neighborhoods that have a park and trails. This is similar to group 2, but the trail system distinguishes these two neighborhoods.

Group 4 contains only one neighborhood and seems to be on the outskirts of the other Toronto neighborhoods (based on the map).

The last group seems to be a mix of restaurants and a playground, but no official park.  This is the reason that it is distinct and not grouped into the restaurant or the parks group.