# Objective

In this assignment, we explore, segment, and cluster the neighborhoods in the city of Toronto.


There are three tasks involved in this assignment:

1. For the Toronto neighborhood data, a Wikipedia page exists that has all the information we need to explore and cluster the neighborhoods in Toronto: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M. Scrape the page, wrangle and clean the data, and then read it into a pandas dataframe so that it is in a structured format.

2. Get the latitude and the longitude coordinates of each neighborhood.

3. Explore and cluster the neighborhoods in Toronto.


# Assignment Part I

## Scrape the Wikipedia page and prepare a pandas dataframe that contains information on neighborhoods in Toronto

In [196]:

#Importing required libraries and packages

import pandas as pd
import numpy as np

import requests
from bs4 import BeautifulSoup

In [197]:

# Using Beautiful soup to scrape data from the given Wikipedia page
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
# the first table on the page holds the postal code data
table = soup.find_all('table')[0]
table_rows = table.find_all('tr')

res = []
for tr in table_rows:
    td = tr.find_all('td')
    row = [d.text.strip() for d in td]
    #print(row)
    if row:
        res.append(row)


df = pd.DataFrame(res, columns=["PostalCode","Borough","Neighborhood"])
print(df.head())
print(df.shape)


  PostalCode           Borough      Neighborhood
0        M1A      Not assigned      Not assigned
1        M2A      Not assigned      Not assigned
2        M3A        North York         Parkwoods
3        M4A        North York  Victoria Village
4        M5A  Downtown Toronto      Harbourfront
(287, 3)


In the steps below, we clean the dataframe.
The assignment gives the following instructions:
1. Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
2. More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.
3. If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. So, for the 9th cell in the table on the Wikipedia page, the value of both the Borough and the Neighborhood columns will be Queen's Park.
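
Before applying the three rules above to the scraped data, it may help to see them on a toy table. The sketch below is only an illustration: the rows are made up for the example, and the variable names `toy` and `cleaned` are not part of the assignment.

```python
import numpy as np
import pandas as pd

# Toy rows, made up for illustration (not the scraped data)
toy = pd.DataFrame(
    [
        ["M1A", "Not assigned", "Not assigned"],
        ["M5A", "Downtown Toronto", "Harbourfront"],
        ["M5A", "Downtown Toronto", "Regent Park"],
        ["M9A", "Queen's Park", "Not assigned"],
    ],
    columns=["PostalCode", "Borough", "Neighborhood"],
)

# Rule 1: drop rows whose borough is not assigned
cleaned = toy.replace("Not assigned", np.nan).dropna(subset=["Borough"]).copy()

# Rule 3: a missing neighborhood falls back to the borough name
cleaned["Neighborhood"] = cleaned["Neighborhood"].fillna(cleaned["Borough"])

# Rule 2: one row per postal code, neighborhoods joined with commas
cleaned = (
    cleaned.groupby(["PostalCode", "Borough"])["Neighborhood"]
    .apply(",".join)
    .reset_index()
)
print(cleaned)
```

Applying rule 3 before rule 2 matters here: joining strings with `','.join` would fail on a missing value, so the gaps are filled first.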

In [198]:
#Replace string 'Not assigned' with np.nan for uniformity
df.replace('Not assigned', np.nan, inplace=True)

#Drop all rows that do not have a Borough.
df.dropna(subset=['Borough'], inplace=True)
print(df.head())

  PostalCode           Borough      Neighborhood
2        M3A        North York         Parkwoods
3        M4A        North York  Victoria Village
4        M5A  Downtown Toronto      Harbourfront
5        M6A        North York  Lawrence Heights
6        M6A        North York    Lawrence Manor


In [199]:
#Replace a missing neighborhood name with the corresponding Borough name
df['Neighborhood'] = df['Neighborhood'].fillna(df['Borough'])
#print(df.iloc[9])
df[df['Borough'] == 'Queen\'s Park']

Unnamed: 0,PostalCode,Borough,Neighborhood
9,M9A,Queen's Park,Queen's Park


In [200]:
#Rolling up all neighborhoods belonging to the same postal code to the same row. 
df = df.groupby(['PostalCode', 'Borough'])['Neighborhood'].apply(','.join).reset_index()
#df.columns = ['PostalCode','Borough','Neighborhood']

#Verify the result for postal code 'M2L' (Borough 'North York'), which has two neighborhoods: 'Silver Hills' and 'York Mills'
df[df['PostalCode'] == 'M2L']
#df.head(5)

Unnamed: 0,PostalCode,Borough,Neighborhood
20,M2L,North York,"Silver Hills,York Mills"


In [201]:
df.shape

(103, 3)

In [202]:
df.to_csv(r'df_can.csv')

# Assignment Part II 

## Append the dataframe with latitude and longitude information


In [203]:
loc_df = pd.read_csv("http://cocl.us/Geospatial_data",header = None, names = ['PostalCode', 'Latitude', 'Longitude'], skiprows =1)
loc_df.head()


Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [204]:
loc_df.shape

(103, 3)

In [205]:
df_new = pd.merge(df, loc_df, on='PostalCode')
df_new.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
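
The default inner merge silently drops postal codes that appear in only one of the two frames. A hedged way to check for that is sketched below on toy frames; the postal code `M9Z` and the names `areas`, `coords`, and `checked` are hypothetical and exist only for this example.

```python
import pandas as pd

# Toy frames with illustrative values; "M9Z" is a made-up code with no coordinates
areas = pd.DataFrame({
    "PostalCode": ["M1B", "M1C", "M9Z"],
    "Borough": ["Scarborough", "Scarborough", "Etobicoke"],
})
coords = pd.DataFrame({
    "PostalCode": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# how="left" keeps every area, validate raises if a postal code is duplicated,
# and indicator=True adds a "_merge" column that records where each row came from
checked = areas.merge(coords, on="PostalCode", how="left",
                      validate="one_to_one", indicator=True)

# Postal codes that found no coordinates
unmatched = checked.loc[checked["_merge"] == "left_only", "PostalCode"].tolist()
print(unmatched)
```

In this notebook both frames have 103 rows, so an empty `unmatched` list would confirm that nothing was lost in the merge.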


# Assignment Part III

## Analysis and Clustering the Toronto Neighborhoods

In [206]:
import json # library to handle JSON files


from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans


import folium # map rendering library

In [207]:
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of the city of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of the city of Toronto are 43.653963, -79.387207.


In [208]:
# First let's create a map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_new['Latitude'], df_new['Longitude'], df_new['Borough'], df_new['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [209]:
# Define Foursquare credentials
CLIENT_ID = '0ZTQODSXDWEQYWADGXYIDEN3I3XVJZYEZLUDIEMHMOXU0GRH' # your Foursquare ID
CLIENT_SECRET = 'DUMZXRMGJIIKYSOF4GXT4HKXH5V0Z3JH3BBV113Y35ELLXWZ' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 0ZTQODSXDWEQYWADGXYIDEN3I3XVJZYEZLUDIEMHMOXU0GRH
CLIENT_SECRET:DUMZXRMGJIIKYSOF4GXT4HKXH5V0Z3JH3BBV113Y35ELLXWZ


In [210]:

#Define a function that obtains nearby venues for a neighborhood
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
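
The function above indexes straight into the JSON response, so a reply without `groups`, or a venue without categories, would raise an error. The sketch below shows one possible defensive way to parse a reply of the same shape. The helper `extract_venues` and the sample payload are hypothetical; only the key names are taken from the function above.

```python
def extract_venues(payload, name, lat, lng):
    """Return one tuple per venue, or an empty list if the payload has none."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        # A venue without categories gets None instead of raising IndexError
        category = categories[0].get("name") if categories else None
        rows.append((name, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows


# Hand-written sample payload that mimics the keys used in getNearbyVenues
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Cafe",
               "location": {"lat": 43.65, "lng": -79.38},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Uncategorised Spot",
               "location": {"lat": 43.66, "lng": -79.39},
               "categories": []}},
]}]}}

rows = extract_venues(sample, "Toy Neighborhood", 43.65, -79.38)
empty = extract_venues({}, "Toy Neighborhood", 43.65, -79.38)
print(len(rows), empty)
```

A neighborhood that yields an empty list simply contributes no rows, which is also why some neighborhoods can end up without venues later in the analysis.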

In [211]:
toronto_venues = getNearbyVenues(names=df_new['Neighborhood'],
                                   latitudes=df_new['Latitude'],
                                   longitudes=df_new['Longitude']
                                  )


In [212]:
print(toronto_venues.shape)
toronto_venues.head()

(1335, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Rouge,Malvern",43.806686,-79.194353,Wendy's,43.807448,-79.199056,Fast Food Restaurant
1,"Rouge,Malvern",43.806686,-79.194353,Interprovincial Group,43.80563,-79.200378,Print Shop
2,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497,Chris Effects Painting,43.784343,-79.163742,Construction & Landscaping
3,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
4,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497,Affordable Toronto Movers,43.787919,-79.162977,Moving Target


In [213]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide,King,Richmond",30,30,30,30,30,30
Agincourt,5,5,5,5,5,5
"Agincourt North,L'Amoreaux East,Milliken,Steeles East",2,2,2,2,2,2
"Albion Gardens,Beaumond Heights,Humbergate,Jamestown,Mount Olive,Silverstone,South Steeles,Thistletown",10,10,10,10,10,10
"Alderwood,Long Branch",9,9,9,9,9,9
...,...,...,...,...,...,...
Willowdale West,6,6,6,6,6,6
Woburn,3,3,3,3,3,3
"Woodbine Gardens,Parkview Hill",13,13,13,13,13,13
Woodbine Heights,9,9,9,9,9,9


In [214]:
#Number of unique categories
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 236 unique categories.


#### Analyse each neighborhood for the type of venues

In [215]:

# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [216]:
toronto_onehot.shape

(1335, 236)

In [217]:
#Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,"Adelaide,King,Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,...,0.0,0.0,0.033333,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
1,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
2,"Agincourt North,L'Amoreaux East,Milliken,Steel...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
3,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
4,"Alderwood,Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,Willowdale West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
97,Woburn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
98,"Woodbine Gardens,Parkview Hill",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
99,Woodbine Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.0,0.111111,0.0,0.0,0.0,0.0,0.0


In [218]:
#Let's confirm the new size
toronto_grouped.shape

(101, 236)
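
To make the one-hot and group-by-mean steps concrete, here is a minimal sketch on four made-up venues. The neighborhood names `A` and `B` and the variables `venues`, `onehot`, and `freq` are illustrative only.

```python
import pandas as pd

# Four made-up venues in two toy neighborhoods
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B"],
    "Venue Category": ["Park", "Café", "Park", "Café"],
})

# One column per category, 1.0 where the venue belongs to it
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)

# Put the neighborhood name in front of the category columns
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of a 0/1 column is the share of venues in that category
freq = onehot.groupby("Neighborhood").mean().reset_index()
print(freq)
```

Neighborhood `A` has two parks out of three venues, so its `Park` frequency is about 0.67. One design note: `DataFrame.insert` raises a `ValueError` if a column with the same name already exists, so a venue category that happened to be called `Neighborhood` would be reported rather than silently overwritten.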

In [219]:
#Let's print each neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide,King,Richmond----
                venue  freq
0          Steakhouse  0.10
1                Café  0.10
2  Seafood Restaurant  0.07
3               Hotel  0.07
4    Asian Restaurant  0.07


----Agincourt----
                       venue  freq
0                     Lounge   0.2
1             Breakfast Spot   0.2
2  Latin American Restaurant   0.2
3               Skating Rink   0.2
4             Clothing Store   0.2


----Agincourt North,L'Amoreaux East,Milliken,Steeles East----
                        venue  freq
0                        Park   0.5
1                  Playground   0.5
2                 Yoga Studio   0.0
3  Modern European Restaurant   0.0
4                      Market   0.0


----Albion Gardens,Beaumond Heights,Humbergate,Jamestown,Mount Olive,Silverstone,South Steeles,Thistletown----
                 venue  freq
0        Grocery Store   0.2
1             Pharmacy   0.1
2  Fried Chicken Joint   0.1
3          Pizza Place   0.1
4          Coffee Shop   0.1


--

                venue  freq
0         Pizza Place  0.14
1      Breakfast Spot  0.14
2      Medical Center  0.14
3   Electronics Store  0.14
4  Mexican Restaurant  0.14


----Harbord,University of Toronto----
            venue  freq
0            Café  0.13
1             Bar  0.07
2  Sandwich Place  0.07
3       Bookstore  0.07
4      Restaurant  0.07


----Harbourfront----
            venue  freq
0     Coffee Shop  0.20
1            Park  0.10
2          Bakery  0.10
3             Pub  0.07
4  Breakfast Spot  0.07


----Harbourfront East,Toronto Islands,Union Station----
         venue  freq
0         Park  0.07
1         Café  0.07
2        Plaza  0.07
3        Hotel  0.07
4  Salad Place  0.03


----High Park,The Junction South----
                venue  freq
0     Thai Restaurant  0.08
1                Café  0.08
2  Mexican Restaurant  0.08
3                 Bar  0.08
4                Park  0.04


----Highland Creek,Rouge Hill,Port Union----
                        venue  freq
0  Cons

            venue  freq
0     Pizza Place  0.17
1         Butcher  0.17
2     Coffee Shop  0.17
3  Discount Store  0.17
4        Pharmacy  0.17


----Woburn----
                        venue  freq
0                 Coffee Shop  0.67
1           Korean Restaurant  0.33
2                 Yoga Studio  0.00
3  Modern European Restaurant  0.00
4                      Market  0.00


----Woodbine Gardens,Parkview Hill----
                  venue  freq
0  Fast Food Restaurant  0.15
1           Pizza Place  0.15
2    Athletics & Sports  0.08
3              Bus Line  0.08
4                  Café  0.08


----Woodbine Heights----
              venue  freq
0          Pharmacy  0.11
1  Asian Restaurant  0.11
2    Cosmetics Shop  0.11
3       Curling Ice  0.11
4      Dance Studio  0.11


----York Mills West----
                        venue  freq
0                        Park  0.50
1           Convenience Store  0.25
2                        Bank  0.25
3                 Yoga Studio  0.00
4  Modern Eur

In [220]:
#function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
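
As a quick illustration of what this function returns, the sketch below applies the same idea to a single made-up row. The helper name `top_categories` and the frequencies are assumptions for the example.

```python
import pandas as pd


def top_categories(row, n):
    """Same idea as return_most_common_venues: skip the name, sort the rest."""
    frequencies = row.iloc[1:].astype(float)
    return frequencies.sort_values(ascending=False).index[:n].tolist()


# One made-up row: the first entry is the name, the rest are category frequencies
row = pd.Series({"Neighborhood": "Toy", "Bank": 0.2, "Café": 0.0,
                 "Park": 0.5, "Pub": 0.3})
top = top_categories(row, 2)
print(top)
```

Note that categories with equal frequency, including the many categories at 0.0 in sparse neighborhoods, may come back in an order that carries no meaning, so the lower ranks of a top-10 list should be read with care.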

In [221]:

#Now let's create the new dataframe and display the top 10 venues for each neighborhood.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",Steakhouse,Café,Hotel,Asian Restaurant,Seafood Restaurant,Opera House,Pizza Place,Speakeasy,Smoke Shop,Plaza
1,Agincourt,Skating Rink,Breakfast Spot,Lounge,Clothing Store,Latin American Restaurant,Women's Store,Diner,Ethiopian Restaurant,Empanada Restaurant,Electronics Store
2,"Agincourt North,L'Amoreaux East,Milliken,Steel...",Park,Playground,Dessert Shop,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
3,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",Grocery Store,Fried Chicken Joint,Liquor Store,Coffee Shop,Pharmacy,Pizza Place,Fast Food Restaurant,Sandwich Place,Beer Store,Drugstore
4,"Alderwood,Long Branch",Pizza Place,Gym,Skating Rink,Pharmacy,Pool,Pub,Sandwich Place,Coffee Shop,Garden Center,Gay Bar


#### Cluster Neighborhoods

In [222]:
#Run k-means to cluster the neighborhood into 5 clusters.
# set number of clusters
kclusters = 5

# drop the name column so that only the numeric category frequencies are clustered
toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])
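
For intuition about what k-means does with the frequency vectors, here is a simplified NumPy sketch of the basic assign-and-update loop. It is an illustration only and not scikit-learn's implementation: `KMeans` does more, for example it uses k-means++ initialisation by default. The function `lloyd_kmeans` and the toy points are made up for this example.

```python
import numpy as np


def lloyd_kmeans(points, k, n_iter=20, seed=0):
    """Plain k-means loop: assign points to the nearest center, then re-average."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Distance of every point to every center, shape (n_points, k)
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    # Final assignment against the final centers
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1), centers


# Toy frequency vectors: two park-heavy rows and two café-heavy rows
points = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
labels, centers = lloyd_kmeans(points, k=2)
print(labels)
```

The label numbers themselves are arbitrary; what matters is which rows share a label. That is also why the cluster numbering in this notebook has no inherent order.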

In [223]:
#create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df_new

# join the cluster labels and top venues onto df_new, which holds latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353,0.0,Fast Food Restaurant,Print Shop,Dessert Shop,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497,0.0,Construction & Landscaping,Moving Target,Bar,Women's Store,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711,0.0,Rental Car Location,Breakfast Spot,Electronics Store,Pizza Place,Intersection,Medical Center,Mexican Restaurant,Women's Store,Empanada Restaurant,Eastern European Restaurant
3,M1G,Scarborough,Woburn,43.770992,-79.216917,0.0,Coffee Shop,Korean Restaurant,Women's Store,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,0.0,Gas Station,Bank,Fried Chicken Joint,Caribbean Restaurant,Athletics & Sports,Thai Restaurant,Bakery,Hakka Restaurant,Eastern European Restaurant,Dumpling Restaurant


In [224]:
print(toronto_merged['Cluster Labels'].unique().tolist())
print(toronto_merged['Cluster Labels'].isnull().sum(axis = 0))
print(toronto_merged.shape)
toronto_merged.dropna(subset=['Cluster Labels'],inplace=True)
print(toronto_merged.shape)

[0.0, 1.0, nan, 2.0, 4.0, 3.0]
1
(103, 16)
(102, 16)
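
The `nan` label and the float values such as `0.0` both come from the join: a neighborhood with no match on the right-hand side receives a missing value, and a column that contains missing values is stored as float. The toy sketch below reproduces the effect; the frames `areas` and `labels` are made up for the example.

```python
import pandas as pd

# Three toy neighborhoods, but cluster labels for only two of them
areas = pd.DataFrame({"Neighborhood": ["A", "B", "C"]})
labels = pd.DataFrame({"Neighborhood": ["A", "B"], "Cluster Labels": [0, 1]})

# "C" has no match, so its label is NaN and the column becomes float64
joined = areas.join(labels.set_index("Neighborhood"), on="Neighborhood")
print(joined["Cluster Labels"].dtype)

# After dropping the unmatched row, the labels can be cast back to integers
clean = joined.dropna(subset=["Cluster Labels"]).copy()
clean["Cluster Labels"] = clean["Cluster Labels"].astype(int)
print(clean)
```

Casting back to integers is optional here, since the map code below already converts each label with `int(cluster)`.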


In [225]:
#Visualise Clusters

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [226]:
#Cluster 1
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[ [2] +list(range(5, toronto_merged.shape[1]))]]


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Rouge,Malvern",0.0,Fast Food Restaurant,Print Shop,Dessert Shop,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
1,"Highland Creek,Rouge Hill,Port Union",0.0,Construction & Landscaping,Moving Target,Bar,Women's Store,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store
2,"Guildwood,Morningside,West Hill",0.0,Rental Car Location,Breakfast Spot,Electronics Store,Pizza Place,Intersection,Medical Center,Mexican Restaurant,Women's Store,Empanada Restaurant,Eastern European Restaurant
3,Woburn,0.0,Coffee Shop,Korean Restaurant,Women's Store,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant
4,Cedarbrae,0.0,Gas Station,Bank,Fried Chicken Joint,Caribbean Restaurant,Athletics & Sports,Thai Restaurant,Bakery,Hakka Restaurant,Eastern European Restaurant,Dumpling Restaurant
...,...,...,...,...,...,...,...,...,...,...,...,...
96,Humber Summit,0.0,Empanada Restaurant,Pizza Place,Shopping Mall,Women's Store,Dessert Shop,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
99,Westmount,0.0,Chinese Restaurant,Coffee Shop,Pizza Place,Intersection,Middle Eastern Restaurant,Sandwich Place,Women's Store,Diner,Discount Store,Dog Run
100,"Kingsview Village,Martin Grove Gardens,Richvie...",0.0,Pizza Place,Sandwich Place,Bus Line,Mobile Phone Shop,Women's Store,Dessert Shop,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
101,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",0.0,Grocery Store,Fried Chicken Joint,Liquor Store,Coffee Shop,Pharmacy,Pizza Place,Fast Food Restaurant,Sandwich Place,Beer Store,Drugstore


##### Cluster 1 is the largest cluster, with 83 neighborhoods. These neighborhoods offer many food and beverage options: restaurants, coffee shops, pizza and sandwich places, and a variety of ethnic cuisines. They appear to be mostly the retail and business centres of Toronto.

In [227]:
#Cluster 2
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Scarborough Village,1.0,Playground,Women's Store,Dessert Shop,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
14,"Agincourt North,L'Amoreaux East,Milliken,Steel...",1.0,Park,Playground,Dessert Shop,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore


##### Cluster 2 has two neighborhoods. These look like residential neighborhoods with facilities that families need, such as parks, playgrounds, drugstores, event spaces, and electronics stores.

In [228]:
#Cluster 3
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,"Newtonbrook,Willowdale",2.0,Gym,Dim Sum Restaurant,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore


##### Cluster 3 has a single neighborhood in the output above. Gym is its most common venue category, which makes it stand out from the other neighborhoods.

In [229]:
#Cluster 4
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,Downsview Central,3.0,Food Truck,Baseball Field,Home Service,Women's Store,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store
91,"Humber Bay,King's Mill Park,Kingsway Park Sout...",3.0,Baseball Field,Home Service,Women's Store,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant
97,"Emery,Humberlea",3.0,Construction & Landscaping,Baseball Field,Women's Store,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant


##### Cluster 4 looks like a shopping neighborhood offering services, outlets, event spaces and restaurants

In [230]:
#Cluster 5
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,York Mills West,4.0,Park,Convenience Store,Bank,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant
25,Parkwoods,4.0,Park,Food & Drink Shop,Farmers Market,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
30,"CFB Toronto,Downsview East",4.0,Park,Snack Place,Airport,Construction & Landscaping,Dumpling Restaurant,Discount Store,Dog Run,Donut Shop,Drugstore,Electronics Store
31,Downsview West,4.0,Grocery Store,Bank,Shopping Mall,Park,Gluten-free Restaurant,Event Space,Empanada Restaurant,Gourmet Shop,Electronics Store,Eastern European Restaurant
40,East Toronto,4.0,Park,Coffee Shop,Convenience Store,Dim Sum Restaurant,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant
44,Lawrence Park,4.0,Park,Bus Line,Swim School,Lake,Dim Sum Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant
48,"Moore Park,Summerhill East",4.0,Park,Playground,Tennis Court,Restaurant,Department Store,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
50,Rosedale,4.0,Park,Playground,Trail,Department Store,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
74,Caledonia-Fairbanks,4.0,Park,Fast Food Restaurant,Market,Women's Store,Gift Shop,Department Store,Golf Course,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
79,"Downsview,North Park,Upwood Park",4.0,Park,Construction & Landscaping,Basketball Court,Bakery,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store


##### Cluster 5 also lists neighborhoods with numerous venue options, from restaurants to recreation, so they could be ideal for residences.
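
The cluster sizes quoted in the notes above can be checked directly instead of being counted by hand. A minimal sketch on a made-up frame follows; the frame `merged` stands in for `toronto_merged` and its values are illustrative.

```python
import pandas as pd

# Made-up stand-in for toronto_merged with float cluster labels
merged = pd.DataFrame({
    "Neighborhood": list("ABCDEF"),
    "Cluster Labels": [0.0, 0.0, 0.0, 1.0, 4.0, 4.0],
})

# Number of neighborhoods per cluster, ordered by cluster label
sizes = merged["Cluster Labels"].astype(int).value_counts().sort_index()
print(sizes)
```

Running the same two lines on `toronto_merged` after the rows without a label have been dropped would give the size of every cluster in one table.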

# Conclusion

The exercise above analyses the neighborhoods of Toronto in terms of the top venues and facilities available in each of them. This can help a Toronto resident decide which neighborhoods are more attractive than others. The analysis could also give some insight into which neighborhoods are the most expensive and sought after.