 Segmenting and Clustering Neighborhoods in Toronto

In [85]:
#importing libraries
import pandas as pd
import numpy as np
import requests #To handle HTTPS requests
from bs4 import BeautifulSoup #To extract data out of HTML/XML
#pandas.io.json.json_normalize is deprecated; pd.json_normalize is used below to turn JSON into a pandas Data Frame
from sklearn.cluster import KMeans #import k-means from clustering
import matplotlib.cm as cm#Matplotlib and its plotting modules
import matplotlib.colors as colors

Installing the geopy and Folium libraries

In [2]:
#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # Convert an address to Latitude and Longitude

In [3]:
#!conda install -c conda-forge folium
import folium #Map Rendering Library

Extracting data from Wikipedia and using the BeautifulSoup parser to parse the response 

In [4]:
#Extracting data from Wikipedia
web_url=requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text #Ping a website and return the HTML
#Parse the response in lxml
soup=BeautifulSoup(web_url,'lxml')

Inspecting the page's elements shows that the desired table has the HTML class "wikitable sortable". The rows of that table are extracted from the HTML and later loaded into a DataFrame.

In [5]:
#Inspecting the page's elements shows that the desired table has the class "wikitable sortable"
#Extracting the table from the page
Table=soup.find('table',{'class':'wikitable sortable'})
Table_rows=Table.find_all("tr")
l=[]
#extracting rows from Table
for rows in Table_rows:
    l.append(rows.text.split("\n"))
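Splitting each row's text on newlines works, but it leaves empty columns that then have to be dropped by position. A minimal sketch of an alternative, assuming every row is made of th/td cells: read the cells one by one. The inline HTML below is a hypothetical miniature of the Wikipedia table so the example runs offline, and "html.parser" is used here instead of lxml only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the Wikipedia table, used so the example runs offline.
sample_html = """
<table class="wikitable sortable">
  <tr><th>Postal Code</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")
sample_table = sample_soup.find("table", {"class": "wikitable sortable"})

# Read each row cell by cell, stripping surrounding whitespace.
parsed_rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in sample_table.find_all("tr")
]
print(parsed_rows)
```

With rows parsed this way, the first list can serve directly as the column header and no empty columns need to be dropped.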

Data PreProcessing and Cleaning

In [6]:
#Converting List to DF
df=pd.DataFrame(l)
#Dropping empty columns
df=df.drop(columns=[0,2,4,6])
#Making First Row as the column Header
df=df.rename(columns=df.iloc[0])
df=df.iloc[1:]
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
1,M1A,Not assigned,Not assigned
2,M2A,Not assigned,Not assigned
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,"Regent Park, Harbourfront"


Checking for the constraints mentioned:
-Rows where the Borough is not assigned
-Rows with a duplicated Postal Code
-Rows where the Neighbourhood is not assigned

In [7]:
#removing rows where no value is assigned to the Borough
df=df[df.Borough!='Not assigned']
#Checking for rows with the same postal code
print(set(df.duplicated(['Postal Code'])))
#As no rows in the Data Frame share a Postal Code, further processing is not required
print(set(df.Neighbourhood=='Not assigned'))
#As no Neighbourhood has the value 'Not assigned', further processing is not required
#reset_index returns a new Data Frame, so assign it back to keep the new index
df=df.reset_index(drop=True)
df

{False}
{False}


Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."
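Both checks above came back {False}, so nothing had to be fixed in this scrape. If a later version of the page did violate the constraints, the cleanup could look like the sketch below. The rows are hypothetical, and falling back to the borough name for an unassigned neighbourhood is an assumption, not something the notebook does.

```python
import pandas as pd

# Hypothetical rows that exhibit all three problems; not taken from the scraped table.
raw = pd.DataFrame({
    "Postal Code": ["M1A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Regent Park", "Harbourfront", "Not assigned"],
})

# 1. Drop rows whose borough is not assigned.
clean = raw[raw["Borough"] != "Not assigned"].copy()

# 2. Where the neighbourhood is not assigned, fall back to the borough name (assumed rule).
missing = clean["Neighbourhood"] == "Not assigned"
clean.loc[missing, "Neighbourhood"] = clean.loc[missing, "Borough"]

# 3. Collapse duplicated postal codes into one row with comma-separated neighbourhoods.
clean = (
    clean.groupby(["Postal Code", "Borough"], as_index=False)["Neighbourhood"]
    .agg(", ".join)
)
print(clean)
```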


Final Data Frame shape after pre processing

In [8]:
df.shape

(103, 3)

Importing the CSV file for the Geographical coordinates for different postal codes

In [9]:
GeoDF=pd.read_csv('http://cocl.us/Geospatial_data')
GeoDF.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Merging the neighbourhood Data Frame with the geospatial Data Frame on the postal code.

In [10]:
Toronto_DF=pd.merge(df,GeoDF,on='Postal Code')
Toronto_DF.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
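The default inner merge silently drops any postal code that is missing from the coordinates file. A small sketch of how the merge could be checked, using hypothetical miniature frames (the "M9Z" row and "Example Borough" are invented for illustration):

```python
import pandas as pd

# Hypothetical miniature frames; the coordinates for M3A and M4A are copied from the output above.
areas = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M9Z"],
    "Borough": ["North York", "North York", "Example Borough"],
})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# A left join keeps every area; validate raises if a postal code repeats on either side,
# and indicator adds a _merge column that flags rows without coordinates.
checked = pd.merge(areas, coords, on="Postal Code", how="left",
                   validate="one_to_one", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```

An empty `unmatched` list would confirm that every neighbourhood row received coordinates.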


Using the geopy library to get latitude and longitude of Toronto City.

In [11]:
geolocator=Nominatim(user_agent="toronto_explorer")
location=geolocator.geocode('Toronto')
latitude=location.latitude
longitude=location.longitude
print('The geographical coordinates of toronto are {},{}'.format(latitude,longitude))

The geographical coordinates of toronto are 43.6534817,-79.3839347


Creating a map of Toronto with the neighbourhoods superimposed on top

In [12]:
#create a map of Toronto
map_toronto=folium.Map(location=[latitude,longitude],zoom_start=10)
#add markers on top
for lat,lng,borough,neighbourhood in zip(Toronto_DF['Latitude'],Toronto_DF['Longitude'],Toronto_DF['Borough'],Toronto_DF['Neighbourhood']):
    label='{},{}'.format(neighbourhood,borough)
    label=folium.Popup(label,parse_html=True)
    folium.CircleMarker(
        [lat,lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False
    ).add_to(map_toronto)
map_toronto

Restricting the analysis to boroughs that have Toronto in their name

In [13]:
newTDF=Toronto_DF[Toronto_DF['Borough'].str.contains('Toronto')]
newTDF

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
15,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
19,M4E,East Toronto,The Beaches,43.676357,-79.293031
20,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
24,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
25,M6G,Downtown Toronto,Christie,43.669542,-79.422564
30,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
31,M6H,West Toronto,"Dufferin, Dovercourt Village",43.669005,-79.442259


Visualising the above boroughs on the map

In [14]:
#create a map of Toronto
toronto_map=folium.Map(location=[latitude,longitude],zoom_start=11)
#add markers on top
for lat,lng,borough,neighbourhood in zip(newTDF['Latitude'],newTDF['Longitude'],newTDF['Borough'],newTDF['Neighbourhood']):
    label='{},{}'.format(neighbourhood,borough)
    label=folium.Popup(label,parse_html=True)
    folium.CircleMarker(
        [lat,lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False
    ).add_to(toronto_map)
toronto_map

Defining Foursquare Credentials and Version

In [15]:
#Replace the placeholders with your own Foursquare credentials; avoid publishing real keys
Client_id='YOUR_FOURSQUARE_CLIENT_ID'
Client_secret='YOUR_FOURSQUARE_CLIENT_SECRET'
version='20210101'
limit='100'
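A common alternative to writing credentials into a cell is to read them from environment variables. The sketch below assumes two variable names, FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET; both names and the helper function are hypothetical and not part of the notebook.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    The variable names are hypothetical; use whatever your setup defines.
    """
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set in the environment")
    return client_id, client_secret

# Example with a plain dict standing in for os.environ.
demo_env = {"FOURSQUARE_CLIENT_ID": "demo-id", "FOURSQUARE_CLIENT_SECRET": "demo-secret"}
print(load_foursquare_credentials(demo_env))
```

In the notebook, the returned pair could be assigned to Client_id and Client_secret so that the remaining cells stay unchanged.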

Exploring the First Neighbourhood in the Toronto Data Frame

In [16]:
#Exploring the first Neighbourhood in the Toronto DataFrame
newTDF.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
15,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
19,M4E,East Toronto,The Beaches,43.676357,-79.293031


In [17]:
neigh_lat=newTDF['Latitude'].iloc[0]
neigh_lat
neigh_lng=newTDF['Longitude'].iloc[0]
neigh_name=newTDF['Neighbourhood'].iloc[0]
print(neigh_lat,neigh_lng,neigh_name)

43.6542599 -79.3606359 Regent Park, Harbourfront


Requesting up to 100 venues within 500 meters of Regent Park, Harbourfront

In [18]:
LIMIT=100
radius=500
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    Client_id, 
    Client_secret, 
    version, 
    neigh_lat, 
    neigh_lng, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=YOUR_FOURSQUARE_CLIENT_ID&client_secret=YOUR_FOURSQUARE_CLIENT_SECRET&v=20210101&ll=43.6542599,-79.3606359&radius=500&limit=100'
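The URL above is assembled with str.format, which leaves the query string unencoded. A minimal sketch of the same request URL built from a dictionary with the standard library's urlencode; the helper name build_explore_url is hypothetical and the credentials are placeholders.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the venues/explore URL used above, with the query string encoded by urlencode."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Placeholder credentials, not real ones.
demo_url = build_explore_url("ID", "SECRET", "20210101", 43.6542599, -79.3606359, 500, 100)
print(demo_url)
```

Note that urlencode percent-encodes the comma in the ll parameter as %2C, which is standard query-string encoding. Passing the same dictionary as `params=` to `requests.get` is another option.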

In [19]:
results=requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '60329ad577c0b13d348d2b04'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Corktown',
  'headerFullLocation': 'Corktown, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 44,
  'suggestedBounds': {'ne': {'lat': 43.6587599045, 'lng': -79.3544279001486},
   'sw': {'lat': 43.6497598955, 'lng': -79.36684389985142}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '54ea41ad498e9a11e9e13308',
       'name': 'Roselle Desserts',
       'location': {'address': '362 King St E',
        'crossStreet': 'Trinity St',
        'lat': 43.653446723052674,
        'lng': -79.3620167174383,
        'labeledLatLngs': [{'label': 'display',
 

In [20]:
# function to extract the category of the venue
def get_category_type(row):
    try:
        categories_list=row['categories']
    except KeyError:
        categories_list=row['venue.categories']
    if len(categories_list)==0:
        return None
    else:
        return categories_list[0]['name']

Clean the JSON data and structure it into a pandas DataFrame

In [21]:
venues=results['response']['groups'][0]['items']
nearby_venues=pd.json_normalize(venues)
nearby_venues
#filter columns
filtered_columns=['venue.name','venue.categories','venue.location.lat','venue.location.lng']
nearby_venues=nearby_venues.loc[:,filtered_columns]
#filter the category of each row
nearby_venues['venue.categories']=nearby_venues.apply(get_category_type,axis=1)
#clean columns
nearby_venues.columns=[col.split('.')[-1] for col in nearby_venues.columns]
nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Roselle Desserts,Bakery,43.653447,-79.362017
1,Tandem Coffee,Coffee Shop,43.653559,-79.361809
2,Cooper Koo Family YMCA,Distribution Center,43.653249,-79.358008
3,Body Blitz Spa East,Spa,43.654735,-79.359874
4,Impact Kitchen,Restaurant,43.656369,-79.35698
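To make the flattening step easier to follow, here is a small sketch of what pd.json_normalize does to nested venue records. The items are hypothetical and only shaped like the Foursquare response; the first venue's values are copied from the output above, the second is invented to show a venue without a category.

```python
import pandas as pd

# Hypothetical items shaped like results['response']['groups'][0]['items'].
sample_items = [
    {"venue": {"name": "Roselle Desserts",
               "categories": [{"name": "Bakery"}],
               "location": {"lat": 43.653447, "lng": -79.362017}}},
    {"venue": {"name": "Unnamed Spot",
               "categories": [],
               "location": {"lat": 43.65, "lng": -79.36}}},
]

# Nested dictionaries become dotted column names such as 'venue.location.lat'.
flat = pd.json_normalize(sample_items)
print(sorted(flat.columns))

# Lists are not expanded, so the category name still has to be pulled out per row.
flat["category"] = flat["venue.categories"].apply(lambda c: c[0]["name"] if c else None)
print(flat[["venue.name", "category"]])
```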


Exploring all the neighbourhoods in Toronto. The function below repeats the same process for every neighbourhood.

In [53]:
def getNearbyVenues(names,latitudes,longitudes,radius=500):
    venues_list=[]
    for name,lat,lng in zip(names,latitudes,longitudes):
        #print(name)
        #create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            Client_id,
            Client_secret,
            version,
            lat,
            lng,
            radius,
            LIMIT)
        #make the GET request and keep only the list of venue items
        results=requests.get(url).json()['response']['groups'][0]['items']
        #keep only the relevant information for each nearby venue
        venues_list.append([(name,lat,lng,v['venue']['name'],v['venue']['location']['lat'],v['venue']['location']['lng'],v['venue']['categories'][0]['name']) for v in results])
    #flatten the per-neighbourhood lists into a single Data Frame
    nearby_venues=pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns=['Neighbourhood','Neighbourhood Latitude','Neighbourhood Longitude','Venue','Venue Latitude','Venue Longitude', 'Venue Category']
    return nearby_venues
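The list comprehension in the function indexes `v['venue']['categories'][0]`, which would raise IndexError for a venue that has no category. A hedged sketch of a more defensive extraction step follows; the helper name extract_venue_rows and the demo items are hypothetical.

```python
def extract_venue_rows(name, lat, lng, items):
    """Turn the 'items' list of one explore response into flat tuples.

    Venues without any category get None instead of raising IndexError.
    """
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

# Hypothetical response items; the second one has no category.
demo_items = [
    {"venue": {"name": "A Cafe", "location": {"lat": 1.0, "lng": 2.0},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Mystery Place", "location": {"lat": 1.5, "lng": 2.5},
               "categories": []}},
]
print(extract_venue_rows("Demo Neighbourhood", 1.2, 2.2, demo_items))
```

Inside getNearbyVenues, a call such as `venues_list.append(extract_venue_rows(name, lat, lng, results))` could replace the comprehension.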

        

Calling the above function on each neighbourhood to create a new DataFrame of Toronto venues

In [54]:
toronto_venues=getNearbyVenues(names=newTDF['Neighbourhood'],latitudes=newTDF['Latitude'],longitudes=newTDF['Longitude'])
#size of resulting Dataframe
print(toronto_venues.shape)
toronto_venues.head()

(1600, 7)


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,"Regent Park, Harbourfront",43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant


Analyse Each Neighbourhood

In [57]:
toronto_venues.groupby('Neighbourhood').count()

Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Berczy Park,57,57,57,57,57,57
"Brockton, Parkdale Village, Exhibition Place",24,24,24,24,24,24
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto",15,15,15,15,15,15
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",17,17,17,17,17,17
Central Bay Street,61,61,61,61,61,61
Christie,16,16,16,16,16,16
Church and Wellesley,81,81,81,81,81,81
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
Davisville,37,37,37,37,37,37
Davisville North,8,8,8,8,8,8


In [59]:
print('There are {} unique categories'.format(len(toronto_venues['Venue Category'].unique())))

There are 233 unique categories


In [60]:
#one hot encoding
Toronto_onehot=pd.get_dummies(toronto_venues[['Venue Category']],prefix="",prefix_sep="")
#Adding the neighbourhood column to the one hot encoded DF
Toronto_onehot['Neighbourhood']=toronto_venues['Neighbourhood']
#move the neighbourhood column to the first position
fixed_columns=[Toronto_onehot.columns[-1]]+list(Toronto_onehot.columns[:-1])
Toronto_onehot=Toronto_onehot[fixed_columns]
Toronto_onehot.shape

(1600, 234)

Grouping the rows by neighbourhood and taking the mean frequency of occurrence of each category

In [62]:
Toronto_grouped=Toronto_onehot.groupby('Neighbourhood').mean().reset_index()
Toronto_grouped

Unnamed: 0,Neighbourhood,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Theme Restaurant,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0
1,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Business reply mail Processing Centre, South C...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.058824,0.058824,0.058824,0.117647,0.176471,0.117647,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.016393,0.016393
5,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Church and Wellesley,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,...,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.024691
7,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,...,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0
8,Davisville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,...,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Davisville North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
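Why the mean of one-hot columns is a frequency can be seen on a tiny example. The sketch below uses four hypothetical venues; each value in the grouped frame is the share of a neighbourhood's venues that fall in that category.

```python
import pandas as pd

# Hypothetical venues: three in neighbourhood A, one in neighbourhood B.
toy_venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "B"],
    "Venue Category": ["Cafe", "Cafe", "Park", "Park"],
})

# One column per category, 1.0 where the venue belongs to it.
toy_onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=float)
toy_onehot.insert(0, "Neighbourhood", toy_venues["Neighbourhood"])

# The mean of a 0/1 column is the share of venues in that category.
toy_grouped = toy_onehot.groupby("Neighbourhood").mean().reset_index()
print(toy_grouped)
```

Neighbourhood A has two cafes out of three venues, so its Cafe value is 2/3 and its Park value is 1/3; every row sums to 1.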


Each Neighbourhood along with the top 5 most common venues 

In [78]:
num_top_venues=5
for N in Toronto_grouped['Neighbourhood']:
    print("----"+N+"----")
    temp=Toronto_grouped[Toronto_grouped['Neighbourhood']==N].T.reset_index()
    temp.columns=['venue','freq']
    temp=temp.iloc[1:]
    temp['freq']=temp['freq'].astype(float)
    temp=temp.round({'freq':2})
    print(temp.sort_values('freq',ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
            venue  freq
0     Coffee Shop  0.09
1    Cocktail Bar  0.07
2          Bakery  0.05
3        Beer Bar  0.04
4  Farmers Market  0.04


----Brockton, Parkdale Village, Exhibition Place----
                   venue  freq
0                   Café  0.12
1         Breakfast Spot  0.08
2  Performing Arts Venue  0.08
3            Coffee Shop  0.08
4                    Gym  0.04


----Business reply mail Processing Centre, South Central Letter Processing Plant Toronto----
                venue  freq
0  Light Rail Station  0.13
1          Comic Shop  0.07
2       Auto Workshop  0.07
3                Park  0.07
4          Restaurant  0.07


----CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport----
              venue  freq
0   Airport Service  0.18
1    Airport Lounge  0.12
2  Airport Terminal  0.12
3             Plane  0.06
4           Airport  0.06


----Central Bay Street----
                venue  freq
0 

Sorting the Venues in descending order and making a pandas DataFrame

In [79]:
def return_most_common_venues(row, num_top_venues):
    row_categories=row.iloc[1:]
    row_categories_sorted=row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

In [83]:
#Creating DataFrame for top 10 venues in the Neighbourhood
num_top_venues=10
indicators=['st','nd','rd']
#create columns according to number of top venues
columns=['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1,indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
#Creating A new Data Frame
N_venues_sorted=pd.DataFrame(columns=columns)
N_venues_sorted['Neighbourhood']=Toronto_grouped['Neighbourhood']
for ind in np.arange(Toronto_grouped.shape[0]):
    N_venues_sorted.iloc[ind,1:]=return_most_common_venues(Toronto_grouped.iloc[ind,:],num_top_venues)
N_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Cocktail Bar,Bakery,Beer Bar,Farmers Market,Cheese Shop,Seafood Restaurant,Restaurant,Steakhouse,Breakfast Spot
1,"Brockton, Parkdale Village, Exhibition Place",Café,Breakfast Spot,Performing Arts Venue,Coffee Shop,Gym,Italian Restaurant,Stadium,Furniture / Home Store,Climbing Gym,Bar
2,"Business reply mail Processing Centre, South C...",Light Rail Station,Comic Shop,Auto Workshop,Park,Restaurant,Farmers Market,Burrito Place,Fast Food Restaurant,Skate Park,Brewery
3,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Airport Lounge,Airport Terminal,Plane,Airport,Boutique,Boat or Ferry,Sculpture Garden,Coffee Shop,Rental Car Location
4,Central Bay Street,Coffee Shop,Café,Italian Restaurant,Sandwich Place,Thai Restaurant,Bubble Tea Shop,Burger Joint,Salad Place,French Restaurant,Seafood Restaurant
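The column labels above are built by catching an exception when the suffix list runs out. A sketch of an alternative that computes the suffix directly, together with the top-n selection on a plain Series; the helper names ordinal and top_categories are hypothetical, and the three frequencies are taken from the Berczy Park output above.

```python
import pandas as pd

def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

def top_categories(freqs, n):
    """Return the n category names with the highest frequency."""
    return freqs.sort_values(ascending=False).index[:n].tolist()

# Frequencies taken from the Berczy Park output above.
berczy = pd.Series({"Bakery": 0.05, "Coffee Shop": 0.09, "Cocktail Bar": 0.07})
labels = ["{} Most Common Venue".format(ordinal(i + 1)) for i in range(3)]
print(dict(zip(labels, top_categories(berczy, 3))))
```

When several categories share the same frequency, which is common for the many zero-valued columns, their relative order depends on the sort and carries no meaning.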


Clustering the neighbourhoods: running k-means to group them into 5 clusters

In [115]:
#set number of clusters
kclusters=5
Tg_cluster=Toronto_grouped.drop('Neighbourhood',axis=1)
#run k-means clustering
kmeans=KMeans(n_clusters=kclusters,random_state=0).fit(Tg_cluster)
kmeans.labels_[0:10]

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
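The first ten labels are all 0, which says little about how the neighbourhoods are spread over the clusters. Counting the members of each cluster is a quick check. The sketch below runs k-means on hypothetical two-column frequency vectors (not the notebook's data) and counts labels with np.bincount; n_init=10 is set explicitly as an assumption to keep the behaviour the same across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical frequency vectors: two cafe-heavy rows and two park-heavy rows.
toy_features = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
])

toy_kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(toy_features)

# bincount gives the number of rows assigned to each cluster label.
cluster_sizes = np.bincount(toy_kmeans.labels_, minlength=2)
print(toy_kmeans.labels_, cluster_sizes)
```

Applied to the notebook's model, `np.bincount(kmeans.labels_, minlength=kclusters)` would show at a glance whether one cluster holds most of the neighbourhoods.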

Create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood

In [116]:
#add the cluster labels as the first column; drop any earlier copy so the cell can be re-run
N_venues_sorted=N_venues_sorted.drop(columns='Cluster Labels',errors='ignore')
N_venues_sorted.insert(0,'Cluster Labels',kmeans.labels_)
Toronto_merged=newTDF
#join N_venues_sorted with newTDF to add the cluster label and top venues for each neighbourhood
Toronto_merged=Toronto_merged.join(N_venues_sorted.set_index('Neighbourhood'),on='Neighbourhood')
Toronto_merged.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,0,Coffee Shop,Bakery,Park,Breakfast Spot,Pub,Café,Theater,Brewery,Spa,Shoe Store
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0,Coffee Shop,Sushi Restaurant,Gym,Fried Chicken Joint,Burger Joint,Burrito Place,Café,College Auditorium,Creperie,Diner
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,0,Clothing Store,Coffee Shop,Italian Restaurant,Japanese Restaurant,Café,Cosmetics Shop,Middle Eastern Restaurant,Bubble Tea Shop,Ramen Restaurant,Theater
15,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Café,Coffee Shop,Cocktail Bar,American Restaurant,Gastropub,Moroccan Restaurant,Seafood Restaurant,Clothing Store,Restaurant,Cosmetics Shop
19,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Neighborhood,Health Food Store,Trail,Pub,Museum,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant


In [117]:
#Creating Map
#create a map of Toronto
cluster_map=folium.Map(location=[latitude,longitude],zoom_start=11)
#Set color scheme for the clusters
x=np.arange(kclusters)
ys=[i+x+(i*x)**2 for i in range(kclusters)]
colors_array=cm.rainbow(np.linspace(0,1,len(ys)))
rainbow=[colors.rgb2hex(i) for i in colors_array]

#add markers on top
for lat,lng,poi,cluster in zip(Toronto_merged['Latitude'],Toronto_merged['Longitude'],Toronto_merged['Neighbourhood'],Toronto_merged['Cluster Labels']):
   # label='{},{}'.format(neighbourhood,borough)
    label=folium.Popup(str(poi)+' Cluster '+str(cluster),parse_html=True)
    folium.CircleMarker(
        [lat,lng],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7,
        parse_html=False
    ).add_to(cluster_map)
cluster_map
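In the cell above, the `ys` list is only used for its length, and `rainbow[cluster-1]` gives label 0 the last colour in the list. A simpler sketch of the colour setup, with one colour per label and direct indexing; the helper name cluster_palette is hypothetical.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

def cluster_palette(n_clusters):
    """Return one hex colour per cluster label, sampled evenly from the rainbow colormap."""
    rgba = cm.rainbow(np.linspace(0, 1, n_clusters))
    return [colors.rgb2hex(c) for c in rgba]

palette = cluster_palette(5)
# Index with the label itself, so label 0 gets the first colour.
colour_for_label_3 = palette[3]
print(palette)
```

In the map loop, `palette[int(cluster)]` could then be passed as both color and fill_color.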

Examine Each cluster

Cluster 1

In [100]:
Toronto_merged.loc[Toronto_merged['Cluster Labels']==0, Toronto_merged.columns[[1]+list(range(5,Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Downtown Toronto,0,Coffee Shop,Bakery,Park,Breakfast Spot,Pub,Café,Theater,Brewery,Spa,Shoe Store
4,Downtown Toronto,0,Coffee Shop,Sushi Restaurant,Gym,Fried Chicken Joint,Burger Joint,Burrito Place,Café,College Auditorium,Creperie,Diner
9,Downtown Toronto,0,Clothing Store,Coffee Shop,Italian Restaurant,Japanese Restaurant,Café,Cosmetics Shop,Middle Eastern Restaurant,Bubble Tea Shop,Ramen Restaurant,Theater
15,Downtown Toronto,0,Café,Coffee Shop,Cocktail Bar,American Restaurant,Gastropub,Moroccan Restaurant,Seafood Restaurant,Clothing Store,Restaurant,Cosmetics Shop
19,East Toronto,0,Neighborhood,Health Food Store,Trail,Pub,Museum,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant
20,Downtown Toronto,0,Coffee Shop,Cocktail Bar,Bakery,Beer Bar,Farmers Market,Cheese Shop,Seafood Restaurant,Restaurant,Steakhouse,Breakfast Spot
24,Downtown Toronto,0,Coffee Shop,Café,Italian Restaurant,Sandwich Place,Thai Restaurant,Bubble Tea Shop,Burger Joint,Salad Place,French Restaurant,Seafood Restaurant
25,Downtown Toronto,0,Grocery Store,Café,Park,Italian Restaurant,Candy Store,Baby Store,Nightclub,Athletics & Sports,Restaurant,Coffee Shop
30,Downtown Toronto,0,Coffee Shop,Café,Restaurant,Gym,Thai Restaurant,Clothing Store,Deli / Bodega,Bakery,Salad Place,Burrito Place
31,West Toronto,0,Pharmacy,Bakery,Music Venue,Middle Eastern Restaurant,Café,Supermarket,Bar,Bank,Brewery,Park


Cluster 2

In [118]:
Toronto_merged.loc[Toronto_merged['Cluster Labels']==1, Toronto_merged.columns[[1]+list(range(5,Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
83,Central Toronto,1,Park,Adult Boutique,Museum,Martial Arts School,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop


Cluster 3

In [119]:
Toronto_merged.loc[Toronto_merged['Cluster Labels']==2, Toronto_merged.columns[[1]+list(range(5,Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
68,Central Toronto,2,Park,Trail,Jewelry Store,Sushi Restaurant,Adult Boutique,Moroccan Restaurant,Martial Arts School,Massage Studio,Mediterranean Restaurant,Men's Store
91,Downtown Toronto,2,Park,Playground,Trail,Adult Boutique,Movie Theater,Martial Arts School,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant


Cluster 4

In [123]:
Toronto_merged.loc[Toronto_merged['Cluster Labels']==3, Toronto_merged.columns[[1]+list(range(5,Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
61,Central Toronto,3,Park,Bus Line,Swim School,Adult Boutique,Moroccan Restaurant,Martial Arts School,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant


Cluster 5

In [121]:
Toronto_merged.loc[Toronto_merged['Cluster Labels']==4, Toronto_merged.columns[[1]+list(range(5,Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
62,Central Toronto,4,Music Venue,Home Service,Garden,Museum,Martial Arts School,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant
