<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in Toronto</font></h1>

 #### All three tasks (web scraping, cleaning, and clustering) are implemented in this one notebook for ease of evaluation.

 - Installing and Importing the required Libraries

In [1]:
!pip install beautifulsoup4
!pip install lxml
from bs4 import BeautifulSoup

import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images and HTML
from IPython.display import Image, HTML, display_html

# transforming a JSON file into a pandas dataframe
from pandas import json_normalize # on pandas < 1.0 use: from pandas.io.json import json_normalize

# !conda install -c conda-forge folium=0.5.0 --yes
# import folium # plotting library
!pip install folium==0.5.0
print('Folium installed')

from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

print('Libraries imported.')

Folium installed
Libraries imported.


 #### Part 1: Scrape the List of postal codes of Canada

In [2]:
source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup=BeautifulSoup(source,'lxml')

#taking out the table content and storing it in local variable
table = soup.table
#create a dataframe out of table
df = pd.read_html(str(table))[0]

#display_html(str(table),raw=True)

 - The dataframe consists of three columns: Postal Code, Borough, and Neighbourhood

In [3]:
print(df.columns)
print('The dataframe has {} unique borough values and {} rows'.format(len(df['Borough'].unique()),df.shape[0]))
df


Index(['Postal Code', 'Borough', 'Neighbourhood'], dtype='object')
The dataframe has 11 unique borough values and 180 rows


Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
...,...,...,...
175,M5Z,Not assigned,Not assigned
176,M6Z,Not assigned,Not assigned
177,M7Z,Not assigned,Not assigned
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."


 - Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.

In [4]:
df = df[df.Borough != 'Not assigned'].reset_index(drop=True)
df

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


 - More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row, with the neighborhoods separated by a comma.

In [5]:
# Check whether M5A is really listed twice in the scraped table
df2 = df[df['Postal Code'] == 'M5A']
len(df2)

1

In [6]:
# M5A appears only once, so the neighbourhoods are already combined; group anyway and keep the result
df = df.groupby(['Postal Code','Borough'], sort=False)['Neighbourhood'].apply(', '.join).reset_index()
df

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."
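
The scraped table already has one row per postal code, so the grouping step has nothing to combine here. As a minimal sketch of what it would do if a code were listed twice, the toy dataframe below (hypothetical rows, not the scraped data) splits M5A across two rows and joins them back together:

```python
import pandas as pd

# Hypothetical input in which M5A is listed on two separate rows
toy = pd.DataFrame({
    'Postal Code': ['M5A', 'M5A', 'M3A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighbourhood': ['Harbourfront', 'Regent Park', 'Parkwoods'],
})

# One row per (Postal Code, Borough); neighbourhood names joined with ", "
# sort=False keeps the groups in order of first appearance
combined = (toy.groupby(['Postal Code', 'Borough'], sort=False)['Neighbourhood']
               .agg(', '.join)
               .reset_index())
print(combined)
```

Note that the result has to be assigned to a variable; `groupby` does not modify the dataframe in place.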


In [7]:
# use the .shape attribute to print the number of rows and columns of the dataframe
df.shape

(103, 3)

#### Part 2: Latitude and longitude data

In [8]:
# Importing the csv file containing the latitudes and longitudes
lat_lon = pd.read_csv('https://cocl.us/Geospatial_data')
print ("Geospatial dataframe's shape: ", lat_lon.shape)
lat_lon.head()

Geospatial dataframe's shape:  (103, 3)


Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [9]:
# Combine the two dataframes so that the latitude and longitude values from 'lat_lon' are assigned to their corresponding rows in the original dataframe 'df'.
df1 = pd.merge(df, lat_lon, on="Postal Code")
df1.rename(columns={'Postal Code':'Postalcode'},inplace=True)
df1.head()

Unnamed: 0,Postalcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


In [10]:
df1.shape

(103, 5)
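
The row count staying at 103 suggests every postal code found its coordinates. One possible way to check this explicitly is sketched below on hypothetical miniature versions of `df` and `lat_lon` (the names `boroughs`, `coords`, and `checked` are illustrative only). It relies on the `validate` and `indicator` options of `pd.merge`:

```python
import pandas as pd

# Hypothetical miniature inputs; M4A deliberately has no coordinates
boroughs = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M5A'],
    'Borough': ['North York', 'North York', 'Downtown Toronto'],
})
coords = pd.DataFrame({
    'Postal Code': ['M5A', 'M3A'],
    'Latitude': [43.65426, 43.753259],
    'Longitude': [-79.360636, -79.329656],
})

# how='left' keeps every borough row; validate raises if a key is duplicated;
# indicator adds a '_merge' column telling where each row was found
checked = pd.merge(boroughs, coords, on='Postal Code', how='left',
                   validate='one_to_one', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Postal Code'].tolist()
print('Postal codes without coordinates:', unmatched)
```

With the default inner join used in the notebook, such unmatched rows would be dropped silently, so a check like this is mainly useful when the two sources may be out of sync.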

 ### Explore and cluster the neighborhoods in Toronto. 

 - Pick the neighborhoods in boroughs whose name contains the word "Toronto".

In [11]:
# Create dataframe with boroughs containing 'Toronto'
# filter and reset the index in one step to avoid modifying a slice of df1 in place
toronto_data = df1[df1.Borough.str.contains('Toronto')].reset_index(drop=True)
print ("Toronto dataframe shape: ", toronto_data.shape)
toronto_data

Toronto dataframe shape:  (39, 5)


Unnamed: 0,Postalcode,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M4E,East Toronto,The Beaches,43.676357,-79.293031
5,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
6,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
7,M6G,Downtown Toronto,Christie,43.669542,-79.422564
8,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
9,M6H,West Toronto,"Dufferin, Dovercourt Village",43.669005,-79.442259


#### Getting geographical coordinates of Toronto

In [12]:
from geopy.geocoders import Nominatim
print('Geopy installed and Nominatim imported')

Geopy installed and Nominatim imported


In [13]:
city = 'Toronto, Ontario'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(city)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


#### Creating a map of Toronto city

In [14]:
! pip install folium==0.5.0
import folium # plotting library

print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


In [15]:
#create map of Toronto
Toronto_map=folium.Map([latitude,longitude],zoom_start=10)

# add markers to map
for lat, lng, label in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(Toronto_map)  
    
Toronto_map

In [16]:
# The code was removed by Watson Studio for sharing.

Your credentials:
CLIENT_ID: <redacted>
CLIENT_SECRET: <redacted>


#### We create a function to extract dataframe of top 100 venues within 500m radius of each of the postal codes in Toronto

In [17]:
limit=100 # maximum number of venues requested per postal code
def getNearbyVenues(codes,names,latitudes,longitudes,radius=500):
    """Return one row per venue found within `radius` metres of each postal code."""
    venues_list=[]
    for code,name,lat,lng in zip(codes,names,latitudes,longitudes):
        # CLIENT_ID, CLIENT_SECRET and VERSION are assumed to come from the hidden credentials cell
        url='https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,CLIENT_SECRET,VERSION,lat,lng,radius,limit)
        results=requests.get(url).json()['response']['groups'][0]['items']
        # keep the name, coordinates and first listed category of each venue
        venues_list.append([(code,name,lat,lng,v['venue']['name'],v['venue']['location']['lat'],
                          v['venue']['location']['lng'],
                          v['venue']['categories'][0]['name']) for v in results])
    # flatten the per-postal-code lists into a single dataframe
    nearby_venues=pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns=['PostalCode','Neighbourhood','Neighbourhood Latitude',
                          'Neighbourhood Longitude','Venue name','Venue Latitude','Venue Longitude','Venue Category']
    return nearby_venues
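
The function above indexes straight into the JSON response, so an empty result or a venue without categories would raise an error. The sketch below shows the response shape the code relies on, using a hand-written payload rather than a real API call, together with a more tolerant extraction helper. `extract_venues` and `sample_payload` are hypothetical names; the first venue reuses values from the output shown in this notebook and the second is invented to exercise the missing-category case.

```python
# Hand-written stand-in for requests.get(url).json(); only the fields that
# getNearbyVenues reads are included
sample_payload = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Roselle Desserts',
                   'location': {'lat': 43.653447, 'lng': -79.362017},
                   'categories': [{'name': 'Bakery'}]}},
        # invented venue with no category, to show the fallback
        {'venue': {'name': 'Uncategorised place',
                   'location': {'lat': 43.65, 'lng': -79.36},
                   'categories': []}},
    ]}]}
}

def extract_venues(payload):
    """Return (name, lat, lng, category) tuples, tolerating missing pieces."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

rows = extract_venues(sample_payload)
for row in rows:
    print(row)
```

Whether the live API ever returns such incomplete records is not established here; the helper only illustrates a defensive way to read the same fields.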

In [18]:
Toronto_Venues=getNearbyVenues(codes=toronto_data['Postalcode'],
                               names=toronto_data['Neighbourhood'],
                               latitudes=toronto_data['Latitude'],
                               longitudes=toronto_data['Longitude'],radius=500)
Toronto_Venues.head() 

Unnamed: 0,PostalCode,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue name,Venue Latitude,Venue Longitude,Venue Category
0,M5A,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,M5A,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,M5A,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,M5A,"Regent Park, Harbourfront",43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot
4,M5A,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa


 - check the shape of Toronto_Venues

In [19]:
Toronto_Venues.shape

(1619, 8)

 - check the number of unique venue categories 

In [20]:
print('There are {} unique venue categories'.format(len(Toronto_Venues['Venue Category'].unique())))

There are 235 unique venue categories


### Analyze Each Borough Neighborhood

In [21]:
Toronto_onehot=pd.get_dummies(Toronto_Venues['Venue Category'],prefix='',prefix_sep='') #one hot encoding

# Add the PostalCode column at the front and the Neighbourhood column at the end
# ('Neighbourhood' does not clash with the venue category named 'Neighborhood')
Toronto_onehot.insert(0,'PostalCode',Toronto_Venues['PostalCode'])
Toronto_onehot['Neighbourhood']=Toronto_Venues['Neighbourhood']

Toronto_onehot.head()


Unnamed: 0,PostalCode,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Yoga Studio,Neighbourhood
0,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront"
1,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront"
2,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront"
3,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront"
4,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront"


In [22]:
col=[Toronto_onehot.columns[0]]+[Toronto_onehot.columns[-1]]+list(Toronto_onehot.columns[1:-1])
Toronto_onehot=Toronto_onehot[col]
Toronto_onehot.head()

Unnamed: 0,PostalCode,Neighbourhood,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Theme Restaurant,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,M5A,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,M5A,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M5A,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M5A,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,M5A,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


 #### Group rows by postal code and neighbourhood, taking the mean frequency of occurrence of each category

In [23]:
Toronto_grouped=Toronto_onehot.groupby(['PostalCode','Neighbourhood'],sort=False).mean().reset_index()
Toronto_grouped.head()

Unnamed: 0,PostalCode,Neighbourhood,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Theme Restaurant,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,M5A,"Regent Park, Harbourfront",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833
1,M7A,"Queen's Park, Ontario Provincial Government",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.03125
2,M5B,"Garden District, Ryerson",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0
3,M5C,St. James Town,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0125,0.0
4,M4E,The Beaches,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0
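
Because each venue contributes exactly one 1 to its one-hot row, the group mean of a category column is the share of that postal code's venues that fall in the category. A minimal sketch on made-up data (the postal codes `A` and `B` and their venues are hypothetical):

```python
import pandas as pd

# Hypothetical venue list: three venues in A, one in B
venues = pd.DataFrame({
    'PostalCode': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Park', 'Park'],
})

# One column per category, 1.0 where the venue belongs to it
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'PostalCode', venues['PostalCode'])

# Mean of the 0/1 columns per postal code = relative frequency of each category
freq = onehot.groupby('PostalCode', sort=False).mean()
print(freq.round(2))
```

Each row of `freq` sums to 1, which is why a postal code with only five venues, such as M4E above, shows 0.2 for each of them.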


#### Print each neighborhood along with its top 5 most common venues, for the first 10 postal codes

In [24]:
num=5
for code,nbd in zip(Toronto_grouped['PostalCode'][0:10],Toronto_grouped['Neighbourhood'][0:10]):
    print('----'+code+': '+nbd+'----')
    temp=Toronto_grouped[Toronto_grouped['PostalCode']==code].T.reset_index()
    temp=temp.iloc[2:]
    temp.columns=['venue category','freq']
    temp['freq']=temp['freq'].astype(float)
    temp.sort_values(by=['freq'],ascending=False,inplace=True)
    temp=temp.round({'freq':2})
    print(temp.reset_index(drop=True).head(num))

----M5A: Regent Park, Harbourfront----
  venue category  freq
0    Coffee Shop  0.17
1           Café  0.06
2            Pub  0.06
3           Park  0.06
4         Bakery  0.06
----M7A: Queen's Park, Ontario Provincial Government----
          venue category  freq
0            Coffee Shop  0.22
1       Sushi Restaurant  0.06
2            Yoga Studio  0.03
3  Portuguese Restaurant  0.03
4          Smoothie Shop  0.03
----M5B: Garden District, Ryerson----
    venue category  freq
0   Clothing Store  0.10
1      Coffee Shop  0.10
2             Café  0.04
3   Cosmetics Shop  0.03
4  Bubble Tea Shop  0.03
----M5C: St. James Town----
        venue category  freq
0          Coffee Shop  0.08
1                 Café  0.05
2         Cocktail Bar  0.04
3       Cosmetics Shop  0.04
4  American Restaurant  0.04
----M4E: The Beaches----
      venue category  freq
0  Health Food Store   0.2
1       Neighborhood   0.2
2   Asian Restaurant   0.2
3              Trail   0.2
4                Pub   0.2
---

####  Create a dataframe to contain top 10 venues of each postal code

 - define a function to sort values of the venue in descending order

In [25]:
def most_common_venues(row,num_of_values):
    row_sorted=row.iloc[2:].sort_values(ascending=False)
    return row_sorted.index.values[0:num_of_values]

 - now create a dataframe to contain top 10 venues of each postal code

In [26]:
# Create the name of columns
num_of_values=10
indicators=['st','nd','rd']
columns=['PostalCode','Neighbourhood']
for ind in np.arange(num_of_values):
    try:
        columns.append('{}{} Most common venue'.format(ind+1,indicators[ind]))
    except IndexError:
        columns.append('{}th Most common venue'.format(ind+1))
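
The try/except above is enough for ranks 1 to 10. If `num_of_values` were ever raised past 20, ranks such as 21 or 22 would be labelled "21th" and "22th". A small helper along the following lines (the name `ordinal` is hypothetical, not part of the notebook) would cover the general case:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    # 11, 12 and 13 (and 111, 112, ...) take 'th' despite ending in 1, 2, 3
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# Same column names as in the cell above, built without try/except
example_columns = ['PostalCode', 'Neighbourhood']
example_columns += ['{} Most common venue'.format(ordinal(i)) for i in range(1, 11)]
print(example_columns)
```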

 - Creating dataframe

In [27]:
Toronto_top=pd.DataFrame(columns=columns)

Toronto_top['PostalCode']=Toronto_grouped['PostalCode']
Toronto_top['Neighbourhood']=Toronto_grouped['Neighbourhood']

for ind in np.arange(Toronto_grouped.shape[0]):
    Toronto_top.iloc[ind,2:]=most_common_venues(Toronto_grouped.iloc[ind],num_of_values)
    
Toronto_top.head()

Unnamed: 0,PostalCode,Neighbourhood,1st Most common venue,2nd Most common venue,3rd Most common venue,4th Most common venue,5th Most common venue,6th Most common venue,7th Most common venue,8th Most common venue,9th Most common venue,10th Most common venue
0,M5A,"Regent Park, Harbourfront",Coffee Shop,Pub,Bakery,Park,Café,Breakfast Spot,Theater,Yoga Studio,French Restaurant,Health Food Store
1,M7A,"Queen's Park, Ontario Provincial Government",Coffee Shop,Sushi Restaurant,Yoga Studio,Bank,Smoothie Shop,Beer Bar,Sandwich Place,Burrito Place,Café,Portuguese Restaurant
2,M5B,"Garden District, Ryerson",Coffee Shop,Clothing Store,Café,Japanese Restaurant,Middle Eastern Restaurant,Bubble Tea Shop,Cosmetics Shop,Lingerie Store,Ramen Restaurant,Bookstore
3,M5C,St. James Town,Coffee Shop,Café,Cocktail Bar,Clothing Store,American Restaurant,Cosmetics Shop,Lingerie Store,Department Store,Creperie,Bakery
4,M4E,The Beaches,Asian Restaurant,Pub,Health Food Store,Trail,Neighborhood,Dog Run,Dessert Shop,Diner,Discount Store,Distribution Center
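
For M5A, the categories tied at 0.06 appear in a different order here than in the top-5 printout earlier. Tied values carry no inherent ranking, so their order depends on the sort used. If a reproducible order is wanted, one option is `Series.nlargest`, which by default keeps tied values in their original order. The sketch below uses the rounded M5A frequencies from the printout; `top_n` is a hypothetical helper name.

```python
import pandas as pd

# Rounded frequencies for M5A, in the order printed earlier in the notebook
freqs = pd.Series({
    'Coffee Shop': 0.17,
    'Café': 0.06,
    'Pub': 0.06,
    'Park': 0.06,
    'Bakery': 0.06,
})

def top_n(row, n):
    """Return the n largest categories; ties keep their original order."""
    return row.nlargest(n).index.tolist()

print(top_n(freqs, 3))
```

Rows taken from a dataframe that also holds text columns may have `object` dtype, in which case they would need `.astype(float)` before calling `nlargest`.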


### Cluster Neighborhoods

 - Use K-means clustering to group similar postal codes/neighbourhoods

In [28]:
# K-means works on numeric features only, so we drop the PostalCode and Neighbourhood columns from the Toronto_grouped dataframe
Toronto_grouped.drop(['PostalCode','Neighbourhood'],axis=1,inplace=True)
Toronto_grouped.head()

Unnamed: 0,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Theme Restaurant,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.03125
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0375,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0125,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0


In [29]:
# run k-means to cluster the neighborhood into 5 clusters
# set number of clusters
kclusters = 5

toronto_grouped_clustering = Toronto_grouped
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 1, 2, 3,
       2, 2, 2, 2, 2, 2, 2, 4, 2, 2, 2, 4, 2, 2, 2, 2, 2], dtype=int32)
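
The labels are heavily concentrated in one cluster. A quick way to quantify that is sketched below, using only the label array printed above (copied by hand, since `kmeans` itself is not available outside the notebook):

```python
import numpy as np

# Cluster labels copied from the kmeans.labels_ output above
labels = np.array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 1, 2, 3,
                   2, 2, 2, 2, 2, 2, 2, 4, 2, 2, 2, 4, 2, 2, 2, 2, 2])

# Number of postal codes assigned to each of the 5 clusters
sizes = np.bincount(labels, minlength=5)
for cluster, size in enumerate(sizes):
    print('cluster label {}: {} postal codes'.format(cluster, size))

largest_share = sizes.max() / labels.size
print('largest cluster holds {:.0%} of the postal codes'.format(largest_share))
```

Label 2 covers 34 of the 39 postal codes, three clusters contain a single postal code, and one contains two, which matches the per-cluster tables further down.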

In [30]:
# Now, we add the cluster labels column to the Toronto_top dataframe 
Toronto_top.insert(0,'Cluster labels',kmeans.labels_)

In [31]:
toronto_merged = toronto_data

# join Toronto_top (cluster labels and top venues) onto toronto_data, which holds latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(Toronto_top.set_index('Neighbourhood'), on='Neighbourhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Postalcode,Borough,Neighbourhood,Latitude,Longitude,Cluster labels,PostalCode,1st Most common venue,2nd Most common venue,3rd Most common venue,4th Most common venue,5th Most common venue,6th Most common venue,7th Most common venue,8th Most common venue,9th Most common venue,10th Most common venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,2,M5A,Coffee Shop,Pub,Bakery,Park,Café,Breakfast Spot,Theater,Yoga Studio,French Restaurant,Health Food Store
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,2,M7A,Coffee Shop,Sushi Restaurant,Yoga Studio,Bank,Smoothie Shop,Beer Bar,Sandwich Place,Burrito Place,Café,Portuguese Restaurant
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,2,M5B,Coffee Shop,Clothing Store,Café,Japanese Restaurant,Middle Eastern Restaurant,Bubble Tea Shop,Cosmetics Shop,Lingerie Store,Ramen Restaurant,Bookstore
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,2,M5C,Coffee Shop,Café,Cocktail Bar,Clothing Store,American Restaurant,Cosmetics Shop,Lingerie Store,Department Store,Creperie,Bakery
4,M4E,East Toronto,The Beaches,43.676357,-79.293031,2,M4E,Asian Restaurant,Pub,Health Food Store,Trail,Neighborhood,Dog Run,Dessert Shop,Diner,Discount Store,Distribution Center


In [32]:
# visualize the resulting clusters
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### Examine cluster 1

In [33]:
toronto_merged.loc[toronto_merged['Cluster labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]].head()

Unnamed: 0,Borough,Cluster labels,PostalCode,1st Most common venue,2nd Most common venue,3rd Most common venue,4th Most common venue,5th Most common venue,6th Most common venue,7th Most common venue,8th Most common venue,9th Most common venue,10th Most common venue
18,Central Toronto,0,M4N,Park,Construction & Landscaping,Bus Line,Swim School,Dessert Shop,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant,Donut Shop


#### Examine cluster 2

In [34]:
toronto_merged.loc[toronto_merged['Cluster labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]].head()

Unnamed: 0,Borough,Cluster labels,PostalCode,1st Most common venue,2nd Most common venue,3rd Most common venue,4th Most common venue,5th Most common venue,6th Most common venue,7th Most common venue,8th Most common venue,9th Most common venue,10th Most common venue
19,Central Toronto,1,M5N,Ice Cream Shop,Home Service,Music Venue,Garden,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Department Store


#### Examine cluster 3

In [35]:
toronto_merged.loc[toronto_merged['Cluster labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]].head()

Unnamed: 0,Borough,Cluster labels,PostalCode,1st Most common venue,2nd Most common venue,3rd Most common venue,4th Most common venue,5th Most common venue,6th Most common venue,7th Most common venue,8th Most common venue,9th Most common venue,10th Most common venue
0,Downtown Toronto,2,M5A,Coffee Shop,Pub,Bakery,Park,Café,Breakfast Spot,Theater,Yoga Studio,French Restaurant,Health Food Store
1,Downtown Toronto,2,M7A,Coffee Shop,Sushi Restaurant,Yoga Studio,Bank,Smoothie Shop,Beer Bar,Sandwich Place,Burrito Place,Café,Portuguese Restaurant
2,Downtown Toronto,2,M5B,Coffee Shop,Clothing Store,Café,Japanese Restaurant,Middle Eastern Restaurant,Bubble Tea Shop,Cosmetics Shop,Lingerie Store,Ramen Restaurant,Bookstore
3,Downtown Toronto,2,M5C,Coffee Shop,Café,Cocktail Bar,Clothing Store,American Restaurant,Cosmetics Shop,Lingerie Store,Department Store,Creperie,Bakery
4,East Toronto,2,M4E,Asian Restaurant,Pub,Health Food Store,Trail,Neighborhood,Dog Run,Dessert Shop,Diner,Discount Store,Distribution Center


#### Examine cluster 4

In [36]:
toronto_merged.loc[toronto_merged['Cluster labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]].head()

Unnamed: 0,Borough,Cluster labels,PostalCode,1st Most common venue,2nd Most common venue,3rd Most common venue,4th Most common venue,5th Most common venue,6th Most common venue,7th Most common venue,8th Most common venue,9th Most common venue,10th Most common venue
21,Central Toronto,3,M5P,Trail,Jewelry Store,Mexican Restaurant,Sushi Restaurant,Yoga Studio,Dessert Shop,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant


#### Examine cluster 5

In [37]:
toronto_merged.loc[toronto_merged['Cluster labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]].head()

Unnamed: 0,Borough,Cluster labels,PostalCode,1st Most common venue,2nd Most common venue,3rd Most common venue,4th Most common venue,5th Most common venue,6th Most common venue,7th Most common venue,8th Most common venue,9th Most common venue,10th Most common venue
29,Central Toronto,4,M4T,Gym,Park,Dessert Shop,Falafel Restaurant,Event Space,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant,Donut Shop
33,Downtown Toronto,4,M4W,Park,Playground,Trail,Department Store,Event Space,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant,Donut Shop
