# Segmenting and Clustering Neighborhoods in Toronto

In this assignment, I explore, segment, and cluster the neighborhoods of Toronto.
The neighborhood data comes from a [Wikipedia page](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M) that has all the information needed to explore and cluster the neighborhoods in Toronto. I scrape the page with [Beautiful Soup](http://beautiful-soup-4.readthedocs.io/en/latest/), clean and wrangle the data, and read it into a pandas dataframe.
I then cluster the neighborhoods and plot the clusters on a map using the Folium library.

# Step 1 : Data Collection
### This will be done in two steps:
- Scrape the data from the website and store it in a dataframe.
- Get the latitude and longitude of each place based on its postcode.

## Step 1.a
In this step, the Beautiful Soup and requests libraries are used to scrape the data from the website. The data is then formatted and stored in a data frame with three columns: Postcode, Borough, and Neighbourhood.

In [94]:
# import requests to fetch the website and BeautifulSoup to scrape the data
import requests
from bs4 import BeautifulSoup

In [95]:
# store the url of the data website
tor_data_url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
# get the website in HTML format
source_data = requests.get(tor_data_url).text
# read the data
soup = BeautifulSoup(source_data,'lxml')

In [96]:
# every table row uses the tr tag; collect all of them in a list
table_row = soup.find_all('tr')

# the first element in the table row is the table header
table_header = table_row[0].text
table_header = table_header.split('\n')[1:-1]
print(table_header)

# rest of the elements excluding the last five elements are row data of the table
table_body = table_row[1:-5]
# create a list of rows, all the rows will be appended to this list
rows = []
for row in table_body:
    row = row.text.split('\n')[1:-1]
    rows.append(row)
print(rows[0])

['Postcode', 'Borough', 'Neighbourhood']
['M1A', 'Not assigned', 'Not assigned']
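
The `split('\n')[1:-1]` step can be easier to follow on a single row. The sketch below uses a hypothetical string shaped like what `row.text` appears to return for this table (inferred from the output above, not captured from the page):

```python
# Hypothetical row text, shaped like what row.text appears to return for this table
sample_row_text = "\nM1A\nNot assigned\nNot assigned\n"

# Splitting on newlines leaves an empty string at each end
parts = sample_row_text.split("\n")
print(parts)   # ['', 'M1A', 'Not assigned', 'Not assigned', '']

# [1:-1] drops those two empty strings and keeps the three cell values
cells = parts[1:-1]
print(cells)   # ['M1A', 'Not assigned', 'Not assigned']
```

If the page layout changes so that cells no longer sit on separate lines, this slicing would need to be revisited.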


In [97]:
# import pandas and numpy to store the data in a dataframe and format it for further data collection
import pandas as pd
import numpy as np

### The following conditions are used to create the dataframe:
- The dataframe consists of three columns: Postcode, Borough, and Neighbourhood.
- Only cells that have an assigned borough are processed. Cells whose borough is Not assigned are ignored.
- More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.
- If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. So for the 9th cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park.
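
As a rough illustration of these conditions, the sketch below applies them to a tiny hand-made table. The rows are hypothetical stand-ins for the scraped data, and `fillna` is used here as one possible way to express the fallback rule:

```python
import numpy as np
import pandas as pd

# Hypothetical rows that mimic the scraped table; values are for illustration only
toy = pd.DataFrame(
    [
        ["M1A", "Not assigned", "Not assigned"],
        ["M5A", "Downtown Toronto", "Harbourfront"],
        ["M5A", "Downtown Toronto", "Regent Park"],
        ["M7A", "Queen's Park", "Not assigned"],
    ],
    columns=["Postcode", "Borough", "Neighbourhood"],
)

# Treat 'Not assigned' as missing and drop rows without a borough
toy = toy.replace("Not assigned", np.nan).dropna(subset=["Borough"])

# A missing neighbourhood falls back to the borough name
toy["Neighbourhood"] = toy["Neighbourhood"].fillna(toy["Borough"])

# One row per postcode, with its neighbourhoods joined by a comma
toy = (
    toy.groupby(["Postcode", "Borough"])["Neighbourhood"]
    .apply(", ".join)
    .reset_index()
)
print(toy)
```

The M1A row disappears, the two M5A rows collapse into one, and the M7A neighbourhood takes the borough name.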

In [98]:
# creating data frame with three columns: Postcode, Borough, Neighbourhood
df = pd.DataFrame(data = rows, columns = table_header)

# replace 'Not assigned' values with np.nan
df.replace({'Not assigned':np.nan}, inplace=True)

# Dropping all the rows which have null values in Borough
df.dropna(subset=['Borough'], inplace=True)

# if the neighbourhood value is null, it takes the borough value (np.where does the substitution)
n_is_null = df['Neighbourhood'].isnull()
df['Neighbourhood'] = np.where(n_is_null,df['Borough'],df['Neighbourhood'])

print(df.describe())
df.head()

       Postcode    Borough Neighbourhood
count       211        211           211
unique      103         11           209
top         M9V  Etobicoke     Runnymede
freq          8         45             2


Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights


From the above result it is now clear that there are no missing values.

In [99]:
# group the data frame by postcode and borough, joining the neighbourhoods with a comma
df = df.groupby(['Postcode','Borough'])['Neighbourhood'].apply(', '.join).reset_index()
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


The result above shows that the grouping worked as intended.

## Step 1.b
In this step, two new columns, latitude and longitude, are added to the data frame. The Bing geocoding API, accessed through the geocoder library, is used to get the coordinates for each row.

In [100]:
# use the geocoder library; if it is missing, install it with: !conda install -c conda-forge geocoder
import geocoder
# the geocoder library needs an API key; save the key in an OS environment variable
# and then read that key here instead of hard-coding it
import os
# Bing geocoding is used here instead of Google geocoding, so the key is stored as BING_API_KEY
BING_API_KEY = 'API_KEY' # os.environ['BING_API_KEY']

In [101]:
# This function takes an address and returns the [latitude, longitude] of that address
def get_latlng(address):
    # using the Bing geocoder API, which the author preferred over the Google geocoder
    g = geocoder.bing(address, key = BING_API_KEY)
    return pd.Series(g.latlng)

In [102]:
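# using the get_latlng function to define the latitude and longitude columns of the data frame
# note: the three fields are concatenated without separators, which may confuse the geocoder
# (row 2 in the output below resolves to coordinates far outside Toronto)
df[['Latitude','Longitude']] = df.apply(lambda x: get_latlng(x.Postcode + x.Borough + x.Neighbourhood), axis=1)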
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.79749,-79.23609
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.778969,-79.131088
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",33.74557,-84.758945
3,M1G,Scarborough,Woburn,43.76725,-79.21761
4,M1H,Scarborough,Cedarbrae,43.74807,-79.23531
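
Some of the returned coordinates are clearly not in Toronto (M1E resolves to roughly 33.7, -84.8). One possible sanity check, sketched below, flags rows that fall outside a rough bounding box. The box limits are assumptions chosen for illustration, not official city boundaries:

```python
import pandas as pd

# The five rows shown in the output above
geo = pd.DataFrame(
    {
        "Postcode": ["M1B", "M1C", "M1E", "M1G", "M1H"],
        "Latitude": [43.79749, 43.778969, 33.74557, 43.76725, 43.74807],
        "Longitude": [-79.23609, -79.131088, -84.758945, -79.21761, -79.23531],
    }
)

# Assumed rough bounding box around Toronto; tune these limits before relying on them
LAT_MIN, LAT_MAX = 43.5, 43.9
LON_MIN, LON_MAX = -79.7, -79.0

in_box = geo["Latitude"].between(LAT_MIN, LAT_MAX) & geo["Longitude"].between(LON_MIN, LON_MAX)

# Rows outside the box are candidates for re-geocoding or removal
suspect = geo.loc[~in_box, "Postcode"].tolist()
print(suspect)   # ['M1E']
```

Running a check like this before clustering would keep misplaced points off the map and out of the venue queries.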


In [103]:
# cleaning the dataframe
df.dropna(inplace=True)
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 102 entries, 0 to 102
Data columns (total 5 columns):
Postcode         102 non-null object
Borough          102 non-null object
Neighbourhood    102 non-null object
Latitude         102 non-null float64
Longitude        102 non-null float64
dtypes: float64(2), object(3)
memory usage: 4.8+ KB


In [145]:
df.shape # rows with NaN coordinates were dropped rather than retried in a while loop, which would take a long time

(102, 5)
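
An open-ended while loop is not the only alternative to dropping rows. The sketch below shows a bounded retry, assuming a lookup callable that returns a `[lat, lng]` pair or `None`. The names `geocode_with_retry` and `flaky_lookup` are hypothetical, and the coordinates are made up:

```python
import time

def geocode_with_retry(lookup, address, attempts=3, pause=0.0):
    """Call lookup(address) up to `attempts` times and return the first non-empty result.

    `lookup` is any callable that returns a [lat, lng] pair or None.
    """
    for attempt in range(attempts):
        result = lookup(address)
        if result:
            return result
        # wait briefly before asking again, except after the final attempt
        if attempt < attempts - 1:
            time.sleep(pause)
    return None

# Stand-in for a geocoder call: fails twice, then succeeds (made-up coordinates)
calls = {"n": 0}

def flaky_lookup(address):
    calls["n"] += 1
    return [43.65, -79.38] if calls["n"] >= 3 else None

coords = geocode_with_retry(flaky_lookup, "M5A, Downtown Toronto")
print(coords, calls["n"])   # [43.65, -79.38] 3
```

Capping the number of attempts keeps the total run time predictable, at the cost of still losing rows that never resolve.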

# Step 2 : Data preparation
Use the Foursquare API to get popular venues for each neighbourhood.

In [104]:
# import KMeans from scikit-learn
from sklearn.cluster import KMeans
#!conda install -c conda-forge folium --yes # uncomment this line if folium is missing
import folium

In [105]:
# extract the rows for the East, Downtown, and West Toronto boroughs
toronto_boroughs = ['East Toronto', 'Downtown Toronto', 'West Toronto']
toronto_data = df[df['Borough'].isin(toronto_boroughs)].reset_index(drop=True)
toronto_data.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.674129,-79.29644
1,M4K,East Toronto,"The Danforth West, Riverdale",38.96241,-76.942863
2,M4L,East Toronto,"The Beaches West, India Bazaar",22.493118,79.727013
3,M4M,East Toronto,Studio District,43.628719,-79.412827
4,M4W,Downtown Toronto,Rosedale,43.681911,-79.379372


In [106]:
# initialize the latitude and longitude of Toronto
latitude = 43.6532
longitude = -79.3832

In [107]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# data to be used for map
data = toronto_data.copy()

# add markers to map
for lat, lng, borough, neighborhood in zip(data['Latitude'], data['Longitude'], data['Borough'], data['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### Foursquare API will be used to get the venue data for each neighbourhood

In [108]:
# @hidden
# initializing foursquare API credentials
CLIENT_ID = 'insert-client-id' # your Foursquare ID
CLIENT_SECRET = 'insert-client-secret' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100

In [109]:
# define a function to get all the venues near each neighbourhood
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
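
The list comprehension above assumes every venue has at least one category. A more defensive variant of that flattening step might look like the sketch below. The helper name `extract_venue_rows`, the `"Unknown"` placeholder, and the sample payload are all hypothetical; only the field layout is taken from the function above:

```python
def extract_venue_rows(name, lat, lng, items):
    """Flatten Foursquare-style items into tuples, tolerating venues with no category."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # fall back to a placeholder instead of raising IndexError on an empty list
        category = categories[0]["name"] if categories else "Unknown"
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# Made-up payload in the same shape the function above reads
fake_items = [
    {"venue": {"name": "Cafe A", "location": {"lat": 43.67, "lng": -79.29},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Place B", "location": {"lat": 43.68, "lng": -79.30},
               "categories": []}},
]
rows = extract_venue_rows("The Beaches", 43.674129, -79.29644, fake_items)
print(rows[1][-1])   # Unknown
```

Keeping the parsing separate from the HTTP request also makes it possible to test it without calling the API.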

In [110]:
# use the function above to get the venues near each Toronto neighbourhood
# this call takes a long time
toronto_venues = getNearbyVenues(names=toronto_data['Neighbourhood'],
                                   latitudes=toronto_data['Latitude'],
                                   longitudes=toronto_data['Longitude']
                                  )

The Beaches
The Danforth West, Riverdale
The Beaches West, India Bazaar
Studio District
Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Harbourfront, Regent Park
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place, Underground city
Christie
Dovercourt Village, Dufferin
Little Portugal, Trinity
Brockton, Exhibition Place, Parkdale Village
High Park, The Junction South
Parkdale, Roncesvalles
Runnymede, Swansea
Business Reply Mail Processing Centre 969 Eastern


In [111]:
# examine the dataframe
toronto_venues.shape

(1546, 7)

In [112]:
# see number of venues per neighbourhood
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide, King, Richmond",55,55,55,55,55,55
Berczy Park,100,100,100,100,100,100
"Brockton, Exhibition Place, Parkdale Village",30,30,30,30,30,30
Business Reply Mail Processing Centre 969 Eastern,12,12,12,12,12,12
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara",81,81,81,81,81,81
"Cabbagetown, St. James Town",28,28,28,28,28,28
Central Bay Street,100,100,100,100,100,100
"Chinatown, Grange Park, Kensington Market",100,100,100,100,100,100
Christie,4,4,4,4,4,4
Church and Wellesley,82,82,82,82,82,82


### Analyzing each neighbourhood

In [113]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Adult Boutique,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [114]:
# after creating dummy columns check shape
toronto_onehot.shape

(1546, 222)

In [115]:
# group the data per neighborhood
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Adult Boutique,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store
0,"Adelaide, King, Richmond",0.0,0.0,0.054545,0.0,0.0,0.018182,0.0,0.0,0.018182,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Berczy Park,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,...,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0
2,"Brockton, Exhibition Place, Parkdale Village",0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business Reply Mail Processing Centre 969 Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.012346,0.0,0.024691,0.0,0.037037,0.0,0.0,0.0,0.0,...,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.012346,0.0,0.0
5,"Cabbagetown, St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Central Bay Street,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,...,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.0
7,"Chinatown, Grange Park, Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,...,0.01,0.0,0.0,0.07,0.0,0.0,0.03,0.01,0.0,0.0
8,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Church and Wellesley,0.0,0.012195,0.012195,0.0,0.0,0.0,0.0,0.012195,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.012195,0.0,0.012195,0.0


In [116]:
# check the grouped data shape
toronto_grouped.shape

(28, 222)
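
To see why the grouped values can be read as frequencies, the sketch below repeats the one-hot and mean steps on a tiny made-up venue table. The neighbourhood names and venues are hypothetical:

```python
import pandas as pd

# Hypothetical venues for two made-up neighbourhoods
venues = pd.DataFrame(
    {
        "Neighborhood": ["A", "A", "A", "A", "B", "B"],
        "Venue Category": ["Café", "Café", "Park", "Hotel", "Park", "Park"],
    }
)

# One indicator column per category (dtype=float keeps the means numeric)
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of 0/1 indicators is the share of venues in each category
shares = onehot.groupby("Neighborhood").mean()
print(shares)
```

Neighbourhood A ends up with 0.5 for Café and 0.25 each for Hotel and Park, so each row sums to 1 regardless of how many venues the neighbourhood has.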

In [117]:
# print top 5 most common venues
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
                 venue  freq
0       Sandwich Place  0.09
1                Hotel  0.09
2       History Museum  0.07
3  American Restaurant  0.05
4                 Café  0.05


----Berczy Park----
                venue  freq
0         Coffee Shop  0.10
1                Café  0.06
2          Restaurant  0.05
3               Hotel  0.04
4  Italian Restaurant  0.04


----Brockton, Exhibition Place, Parkdale Village----
                     venue  freq
0   Furniture / Home Store  0.10
1              Coffee Shop  0.10
2                     Café  0.07
3  Comfort Food Restaurant  0.07
4                     Park  0.07


----Business Reply Mail Processing Centre 969 Eastern----
                venue  freq
0                Park  0.17
1         Music Venue  0.17
2  Athletics & Sports  0.08
3               Hotel  0.08
4       Movie Theater  0.08


----CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara----
     

In [118]:
# First, let's write a function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [119]:
# Now let's create the new dataframe and display the top 10 venues for each neighborhood.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Sandwich Place,Hotel,History Museum,American Restaurant,Café,Lounge,Fast Food Restaurant,Coffee Shop,Food Truck,Ramen Restaurant
1,Berczy Park,Coffee Shop,Café,Restaurant,Bakery,Italian Restaurant,Hotel,Gastropub,Japanese Restaurant,Gym,Beer Bar
2,"Brockton, Exhibition Place, Parkdale Village",Coffee Shop,Furniture / Home Store,Comfort Food Restaurant,Park,Café,Grocery Store,Poke Place,Pub,Nail Salon,New American Restaurant
3,Business Reply Mail Processing Centre 969 Eastern,Music Venue,Park,Bar,Athletics & Sports,Lounge,Movie Theater,Theme Park,Fast Food Restaurant,Harbor / Marina,Hotel
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",Coffee Shop,Hotel,Italian Restaurant,Pizza Place,Gym,Baseball Stadium,Aquarium,Scenic Lookout,Brewery,Café
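
The column labels above rely on a three-item suffix list plus an exception for everything else, which is enough for ten columns. If more columns were ever needed, a small helper could cover cases such as 11th or 21st. The function name `ordinal` is hypothetical:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

# Build the same column names as above for the top 10 venues
columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, 11)]
print(columns[1], "|", columns[4], "|", columns[10])
```

For ten venues this produces the same labels as the loop above.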


# Step 3 : Modelling
K-means clustering is used to cluster the Toronto neighborhoods on the basis of their venues.

In [135]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 0, 1, 1, 1, 1, 4, 1])
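
For intuition about what the clustering step does, the sketch below runs a few plain Lloyd's iterations in NumPy on made-up category shares. It is a simplified illustration, not scikit-learn's implementation, and the function name `lloyd_kmeans` and the data are hypothetical:

```python
import numpy as np

def lloyd_kmeans(X, init_centers, n_iter=10):
    """Plain Lloyd's iterations: assign points to the nearest centre, then recompute centres."""
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(n_iter):
        # squared Euclidean distance from every point to every centre
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(len(centers)):
            members = X[labels == k]
            if len(members):  # leave an empty cluster's centre where it is
                centers[k] = members.mean(axis=0)
    return labels, centers

# Made-up category shares for four neighbourhoods (columns: coffee shop, park)
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
labels, centers = lloyd_kmeans(X, init_centers=X[[0, 2]])
print(labels)   # [0 0 1 1]
```

The two coffee-heavy rows land in one cluster and the two park-heavy rows in the other, which mirrors how neighbourhoods with similar venue profiles are grouped here.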

In [137]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# the column is named 'Neighbourhood' in toronto_data, so it is renamed to 'Neighborhood' for the join
toronto_data = toronto_data.rename(columns = {'Neighbourhood':'Neighborhood'})

toronto_data.head()

toronto_merged = toronto_data

# join the sorted venue table (with its cluster labels) onto toronto_data, which holds latitude/longitude
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

# drop the rows with NaN values after the join
toronto_merged.dropna(inplace=True)

toronto_merged.head() # check the last columns!

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.674129,-79.29644,1.0,Bar,Japanese Restaurant,Thai Restaurant,Sandwich Place,Breakfast Spot,Café,Bookstore,Chocolate Shop,Juice Bar,Supermarket
1,M4K,East Toronto,"The Danforth West, Riverdale",38.96241,-76.942863,1.0,Mexican Restaurant,Convenience Store,Asian Restaurant,Pizza Place,Ethiopian Restaurant,Locksmith,Drugstore,Latin American Restaurant,Bank,Intersection
3,M4M,East Toronto,Studio District,43.628719,-79.412827,0.0,Music Venue,Park,Bar,Athletics & Sports,Lounge,Movie Theater,Theme Park,Fast Food Restaurant,Harbor / Marina,Hotel
4,M4W,Downtown Toronto,Rosedale,43.681911,-79.379372,2.0,Park,Playground,Gym / Fitness Center,Women's Store,Doner Restaurant,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant
5,M4X,Downtown Toronto,"Cabbagetown, St. James Town",43.670811,-79.373482,1.0,Coffee Shop,Pizza Place,Grocery Store,Convenience Store,Bike Rental / Bike Share,Diner,Sandwich Place,Caribbean Restaurant,Bus Stop,Filipino Restaurant
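
The cluster labels appear as 1.0, 0.0, and so on because the join introduces NaN for neighbourhoods without a venue profile, which forces the column to float. A minimal sketch of casting the labels back to integers after dropping those rows, using made-up frames:

```python
import pandas as pd

# Made-up frames: neighbourhood "C" has no venue profile, so it gets NaN after the join
left = pd.DataFrame({"Neighborhood": ["A", "B", "C"]})
right = pd.DataFrame({"Neighborhood": ["A", "B"], "Cluster Labels": [1, 0]})

merged = left.join(right.set_index("Neighborhood"), on="Neighborhood")
print(merged["Cluster Labels"].dtype)   # float64, because NaN cannot be stored in an int column

# After dropping unmatched rows the labels can be cast back to integers
merged = merged.dropna(subset=["Cluster Labels"]).astype({"Cluster Labels": int})
print(merged["Cluster Labels"].tolist())   # [1, 0]
```

With integer labels, later code would not need `int(cluster)` when indexing the colour list.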


In [127]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

In [138]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Examine cluster

In [139]:
# print the first cluster
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,East Toronto,0.0,Music Venue,Park,Bar,Athletics & Sports,Lounge,Movie Theater,Theme Park,Fast Food Restaurant,Harbor / Marina,Hotel
28,East Toronto,0.0,Music Venue,Park,Bar,Athletics & Sports,Lounge,Movie Theater,Theme Park,Fast Food Restaurant,Harbor / Marina,Hotel


In [140]:
# second cluster
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,East Toronto,1.0,Bar,Japanese Restaurant,Thai Restaurant,Sandwich Place,Breakfast Spot,Café,Bookstore,Chocolate Shop,Juice Bar,Supermarket
1,East Toronto,1.0,Mexican Restaurant,Convenience Store,Asian Restaurant,Pizza Place,Ethiopian Restaurant,Locksmith,Drugstore,Latin American Restaurant,Bank,Intersection
5,Downtown Toronto,1.0,Coffee Shop,Pizza Place,Grocery Store,Convenience Store,Bike Rental / Bike Share,Diner,Sandwich Place,Caribbean Restaurant,Bus Stop,Filipino Restaurant
6,Downtown Toronto,1.0,Coffee Shop,Japanese Restaurant,Dance Studio,Gay Bar,Sushi Restaurant,Gym,Burger Joint,Ramen Restaurant,Fast Food Restaurant,Restaurant
7,Downtown Toronto,1.0,Coffee Shop,Hotel,Café,Steakhouse,Bar,American Restaurant,Restaurant,Pub,Italian Restaurant,Japanese Restaurant
8,Downtown Toronto,1.0,Coffee Shop,Hotel,Café,Steakhouse,Bar,American Restaurant,Restaurant,Pub,Italian Restaurant,Japanese Restaurant
9,Downtown Toronto,1.0,Brewery,Women's Store,Bowling Alley,Clothing Store,Food Truck,Food & Drink Shop,Southern / Soul Food Restaurant,Flower Shop,Chinese Restaurant,Cheese Shop
10,Downtown Toronto,1.0,Coffee Shop,Café,Restaurant,Bakery,Italian Restaurant,Hotel,Gastropub,Japanese Restaurant,Gym,Beer Bar
11,Downtown Toronto,1.0,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Tea Room,Middle Eastern Restaurant,Fast Food Restaurant,Sandwich Place,Spa,Restaurant
12,Downtown Toronto,1.0,Sandwich Place,Hotel,History Museum,American Restaurant,Café,Lounge,Fast Food Restaurant,Coffee Shop,Food Truck,Ramen Restaurant


In [141]:
# third cluster
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Downtown Toronto,2.0,Park,Playground,Gym / Fitness Center,Women's Store,Doner Restaurant,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant


In [142]:
# Fourth cluster
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Downtown Toronto,3.0,Pier,Harbor / Marina,Boat or Ferry,Café,Women's Store,Drugstore,Filipino Restaurant,Fast Food Restaurant,Farmers Market,Falafel Restaurant


In [143]:
# Fifth cluster
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,Downtown Toronto,4.0,Farmers Market,Event Space,Italian Restaurant,Park,Women's Store,Doner Restaurant,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant


### The neighbourhoods were clustered by their most common venues and listed per cluster.
#### Observations
- Coffee shops are the most common venue in many neighbourhoods, and those neighbourhoods all fall into one cluster (cluster 1).
- Pier is the most common venue in only one Downtown Toronto neighbourhood (cluster 3), and Farmers Market is the most common venue in only one Downtown Toronto neighbourhood (cluster 4).
- Downtown Toronto has the most venues, and East Toronto has the fewest.