# Segmenting and Clustering Neighborhoods in Toronto - PART 3

### Exploring and clustering the neighborhoods in Toronto

### Introduction

This assignment will be using the Foursquare API to explore neighborhoods in some selected cities of Toronto. The Foursquare explore function will be used to get the most common venue categories in each neighborhood, and then use this feature to group the neighborhoods into clusters. The k-means clustering algorithm will also be used for the analysis. Finally, using the Folium library to visualize the neighborhoods in Toronto and their emerging clusters.

importing the required libraries, both pandas and numpy are loaded into the jupyter notebook.

In [1]:
import pandas as pd
import numpy as np

The dataframe was read from the provided Wikipedia website, and any Borough's with the name 'Not assigned' were ommitted, this produce the table below.

In [2]:
tables=pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M',header=0)
df1=tables[0]
df2=df1[df1['Borough']!='Not assigned']
df2.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor


The Neighborhoods with common Boroughs and Postcodes were combined and seperated by a comma, using the aggregate join command.

In [3]:
df3=df2.groupby(['Postcode','Borough']).agg(','.join)

Any Neighbourhood with the 'Not Assigned' ID were repaced with the Borough name.

In [4]:
print(df3[df3['Neighbourhood']=='Not assigned'])

df4=df3.reset_index()
df4.loc[df4['Neighbourhood']=='Not assigned','Neighbourhood']=df4.loc[df4['Neighbourhood']=='Not assigned']['Borough']
df4[df4['Postcode']=='M7A']

Empty DataFrame
Columns: [Neighbourhood]
Index: []


Unnamed: 0,Postcode,Borough,Neighbourhood
85,M7A,Downtown Toronto,Queen's Park


 The 'shape' command was used to print the number of rows in the dataframe.

In [5]:
df4.shape[0]

103

The Geocoder package was successfully read directly from the given 'Geospatial_Coordinates' link provided in the question, and was downloaded into a dataframe.

In [7]:
geodf=pd.read_csv('http://cocl.us/Geospatial_data')
geodf

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


Merging the dataframes using the postal code in both dataframes. The first 5 rows and the last 5 rows are previewed using the 'dfAll.head()' and 'dfAll.tail()'.

In [8]:
dfAll=pd.merge(df4,geodf,left_on='Postcode',right_on='Postal Code')
dfAll.drop(columns='Postal Code',inplace=True)
dfAll.rename(columns={'Postcode':'PostalCode'},inplace=True)
dfAll.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [9]:
dfAll=pd.merge(df4,geodf,left_on='Postcode',right_on='Postal Code')
dfAll.drop(columns='Postal Code',inplace=True)
dfAll.rename(columns={'Postcode':'PostalCode'},inplace=True)
dfAll.tail()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
98,M9N,York,Weston,43.706876,-79.518188
99,M9P,Etobicoke,Westmount,43.696319,-79.532242
100,M9R,Etobicoke,"Kingsview Village,Martin Grove Gardens,Richvie...",43.688905,-79.554724
101,M9V,Etobicoke,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",43.739416,-79.588437
102,M9W,Etobicoke,Northwest,43.706748,-79.594054


## Creating a map of Toronto with neighbourhoods superimposed on top.

In [10]:
import folium

tlat=43.6532
tlong=-79.3832
torontoMap=folium.Map(location=[tlat,tlong],zoom_start=10.7)

for lat,long,borough,neighborhood in zip(dfAll['Latitude'],dfAll['Longitude'],dfAll['Borough'],dfAll['Neighbourhood']):
    label='{}, {}'.format(neighborhood,borough)
    label=folium.Popup(label,parse_html=True)
    folium.CircleMarker([lat,long],
                       radius=5,
                       popup=label,
                       color='blue',
                       fill=True,
                       fill_color='#3186cc',
                       fill_opacity=0.7,
                       parse_html=False).add_to(torontoMap)
    
torontoMap

Finding the 'Boroughs' that contain the word 'Toronto'.

In [11]:
torontoData=dfAll.loc[dfAll.Borough.str.contains('Toronto')].reset_index(drop=True)
torontoData.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West,Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"The Beaches West,India Bazaar",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


Showing the geographical coordinates of Toronto on the world map

In [12]:
print('The geographical coordinates of Toronto are {}, {}'.format(tlat,tlong))

The geographical coordinates of Toronto are 43.6532, -79.3832


Now, we create a visualization of the Boroughs that have Toronto in their names.

In [13]:
scarboroughMap=folium.Map(location=[tlat,tlong],zoom_start=12)
for lat,long,label in zip(torontoData['Latitude'],torontoData['Longitude'],torontoData['Neighbourhood']):
    label=folium.Popup(label,parse_html=True)
    folium.CircleMarker([lat,long],
                       radius=5,
                       popup=label,
                       color='blue',
                       fill=True,
                       fill_color='#3186cc',
                       fil_opacity=0.7,
                       parse_html=False).add_to(scarboroughMap)
scarboroughMap

Defining Foursquare Credentials and Version from the Foursquare account created.

In [14]:
clientID='PP3GRPZRCU0IM4JN5V1Y4EMHM014QYMNGJYI2UQOR512QGO0'
clientSecret='1AOHHWXW5EDEW5EIA2WOS1DQHA5UAEEP4C01VTBNHD4HA4M3'
version='20180605'
print('Your credentials:')

Your credentials:


Let explore the first neighborhood in the dataframe.

In [15]:
nName=torontoData.loc[0,'Neighbourhood']
nLat=torontoData.loc[0,'Latitude']
nLong=torontoData.loc[0,'Longitude']
print('Latitude and longitude values of {} are {}, {}.'.format(nName,nLat,nLong))

Latitude and longitude values of The Beaches are 43.67635739999999, -79.2930312.


Obtaining the top 100 venues that are located within Toronto's Borough having the word 'Toronto' in the name within a radius of 500 meters.

In [16]:
LIMIT = 100
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(clientID,
                                                                                                                           clientSecret,
                                                                                                                           tlat,
                                                                                                                           tlong,
                                                                                                                           version,
                                                                                                                           radius,
                                                                                                                           LIMIT)

Sending the 'get request' and examine the results.

In [17]:
import requests
import json
results=requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e623b32c546f3001b1694f7'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Bay Street Corridor',
  'headerFullLocation': 'Bay Street Corridor, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 128,
  'suggestedBounds': {'ne': {'lat': 43.6577000045, 'lng': -79.37699210971401},
   'sw': {'lat': 43.648699995499996, 'lng': -79.389407890286}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '5227bb01498e17bf485e6202',
       'name': 'Downtown Toronto',
       'location': {'lat': 43.65323167517444,
        'lng': -79.38529600606677,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.65323167517444,
          '

Let use the 'get_category_type' function we learnt from the IBM Foursquare lab. Then the function extracts the category of the venues.

In [18]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Cleaning the 'json' and structuring it into a 'pandas' dataframe.

In [19]:
from pandas.io.json import json_normalize
venues=results['response']['groups'][0]['items']
venuesNearBy=json_normalize(venues)

# filter columns
filtered_columns=['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
venuesNearBy=venuesNearBy.loc[:, filtered_columns]

# clean columns
venuesNearBy.columns = [col.split(".")[-1] for col in venuesNearBy.columns]

venuesNearBy.head()

Unnamed: 0,name,categories,lat,lng
0,Downtown Toronto,"[{'id': '4f2a25ac4b909258e854f55f', 'name': 'N...",43.653232,-79.385296
1,Nathan Phillips Square,"[{'id': '4bf58dd8d48988d164941735', 'name': 'P...",43.65227,-79.383516
2,Indigo,"[{'id': '4bf58dd8d48988d114951735', 'name': 'B...",43.653515,-79.380696
3,LUSH,"[{'id': '4bf58dd8d48988d10c951735', 'name': 'C...",43.653557,-79.3804
4,Eggspectation Bell Trinity Square,"[{'id': '4bf58dd8d48988d143941735', 'name': 'B...",43.653144,-79.38198


Now let check to be sure that Foursquare returned 100 venues.

In [20]:
print('{} venues were returned by Foursquare.'.format(venuesNearBy.shape[0]))

100 venues were returned by Foursquare.


## Exploring Neighborhoods in Toronto.

Let create a function that will repeat the same process to all the neighborhoods in Toronto.

In [21]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            clientID, 
            clientSecret, 
            version, 
            tlat, 
            tlong, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Writing the code to run the above function on each neighborhood and create a new dataframe which we name 'torontoVenues'.

In [22]:
torontoVenues=getNearbyVenues(names=torontoData['Neighbourhood'],
                                   latitudes=torontoData['Latitude'],
                                   longitudes=torontoData['Longitude']
                                  )

The Beaches
The Danforth West,Riverdale
The Beaches West,India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park,Summerhill East
Deer Park,Forest Hill SE,Rathnelly,South Hill,Summerhill West
Rosedale
Cabbagetown,St. James Town
Church and Wellesley
Harbourfront
Ryerson,Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide,King,Richmond
Harbourfront East,Toronto Islands,Union Station
Design Exchange,Toronto Dominion Centre
Commerce Court,Victoria Hotel
Roselawn
Forest Hill North,Forest Hill West
The Annex,North Midtown,Yorkville
Harbord,University of Toronto
Chinatown,Grange Park,Kensington Market
CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place,Underground city
Christie
Dovercourt Village,Dufferin
Little Portugal,Trinity
Brockton,Exhibition Place,Parkdale Village
High Park,The Junction South
Parkdale,Roncesvalles
Runnymede

Checking the size of the result dataframe.

In [23]:
print(torontoVenues.shape)
torontoVenues.head()

(3900, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Downtown Toronto,43.653232,-79.385296,Neighborhood
1,The Beaches,43.676357,-79.293031,Nathan Phillips Square,43.65227,-79.383516,Plaza
2,The Beaches,43.676357,-79.293031,Indigo,43.653515,-79.380696,Bookstore
3,The Beaches,43.676357,-79.293031,LUSH,43.653557,-79.3804,Cosmetics Shop
4,The Beaches,43.676357,-79.293031,Eggspectation Bell Trinity Square,43.653144,-79.38198,Breakfast Spot


Checking the venues that were returned for each neighborhood.

In [24]:
torontoVenues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide,King,Richmond",100,100,100,100,100,100
Berczy Park,100,100,100,100,100,100
"Brockton,Exhibition Place,Parkdale Village",100,100,100,100,100,100
Business Reply Mail Processing Centre 969 Eastern,100,100,100,100,100,100
"CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara",100,100,100,100,100,100
"Cabbagetown,St. James Town",100,100,100,100,100,100
Central Bay Street,100,100,100,100,100,100
"Chinatown,Grange Park,Kensington Market",100,100,100,100,100,100
Christie,100,100,100,100,100,100
Church and Wellesley,100,100,100,100,100,100


Finding out number of unique categories that can be curated from all the returned venues

In [25]:
print('There are {} uniques categories.'.format(len(torontoVenues['Venue Category'].unique())))

There are 65 uniques categories.


## Analyzing Each Neighborhood.

In [26]:
# one hot encoding
torontoOnehot=pd.get_dummies(torontoVenues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
torontoOnehot['Neighborhood'] = torontoVenues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [torontoOnehot.columns[-1]] + list(torontoOnehot.columns[:-1])
torontoOnehot=torontoOnehot[fixed_columns]

torontoOnehot.head()

Unnamed: 0,Women's Store,American Restaurant,Asian Restaurant,Bakery,Bank,Bar,Beer Bar,Bookstore,Breakfast Spot,Bubble Tea Shop,...,Shopping Mall,Speakeasy,Sporting Goods Shop,Steakhouse,Sushi Restaurant,Tanning Salon,Tea Room,Theater,Toy / Game Store,Vegetarian / Vegan Restaurant
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0


Let examine the new dataframe size.

In [27]:
torontoOnehot.shape

(3900, 65)

Grouping the rows by 'neighborhood' and by taking the mean of the frequency of occurrence of each category.

In [28]:
torontoGrouped=torontoOnehot.groupby('Neighborhood').mean().reset_index()
torontoGrouped.head()

Unnamed: 0,Neighborhood,Women's Store,American Restaurant,Asian Restaurant,Bakery,Bank,Bar,Beer Bar,Bookstore,Breakfast Spot,...,Shopping Mall,Speakeasy,Sporting Goods Shop,Steakhouse,Sushi Restaurant,Tanning Salon,Tea Room,Theater,Toy / Game Store,Vegetarian / Vegan Restaurant
0,"Adelaide,King,Richmond",0.01,0.03,0.01,0.02,0.01,0.01,0.01,0.01,0.03,...,0.01,0.01,0.01,0.01,0.02,0.01,0.01,0.02,0.01,0.02
1,Berczy Park,0.01,0.03,0.01,0.02,0.01,0.01,0.01,0.01,0.03,...,0.01,0.01,0.01,0.01,0.02,0.01,0.01,0.02,0.01,0.02
2,"Brockton,Exhibition Place,Parkdale Village",0.01,0.03,0.01,0.02,0.01,0.01,0.01,0.01,0.03,...,0.01,0.01,0.01,0.01,0.02,0.01,0.01,0.02,0.01,0.02
3,Business Reply Mail Processing Centre 969 Eastern,0.01,0.03,0.01,0.02,0.01,0.01,0.01,0.01,0.03,...,0.01,0.01,0.01,0.01,0.02,0.01,0.01,0.02,0.01,0.02
4,"CN Tower,Bathurst Quay,Island airport,Harbourf...",0.01,0.03,0.01,0.02,0.01,0.01,0.01,0.01,0.03,...,0.01,0.01,0.01,0.01,0.02,0.01,0.01,0.02,0.01,0.02


Checking the new size.

In [29]:
torontoGrouped.shape

(39, 65)

Printing each neighborhood along with the top 5 most common venues.

In [30]:
num_top_venues = 5

for hood in torontoGrouped['Neighborhood']:
    print("----"+hood+"----")
    temp = torontoGrouped[torontoGrouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide,King,Richmond----
                 venue  freq
0       Clothing Store  0.07
1           Restaurant  0.04
2       Breakfast Spot  0.03
3  American Restaurant  0.03
4          Coffee Shop  0.03


----Berczy Park----
                 venue  freq
0       Clothing Store  0.07
1           Restaurant  0.04
2       Breakfast Spot  0.03
3  American Restaurant  0.03
4          Coffee Shop  0.03


----Brockton,Exhibition Place,Parkdale Village----
                 venue  freq
0       Clothing Store  0.07
1           Restaurant  0.04
2       Breakfast Spot  0.03
3  American Restaurant  0.03
4          Coffee Shop  0.03


----Business Reply Mail Processing Centre 969 Eastern----
                 venue  freq
0       Clothing Store  0.07
1           Restaurant  0.04
2       Breakfast Spot  0.03
3  American Restaurant  0.03
4          Coffee Shop  0.03


----CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara----
                 venue  fre

Putting the above into a pandas dataframe.

In [31]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Lets create a new dataframe and display the top 10 venues for each neighbourhood.

In [32]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted=pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood']=torontoGrouped['Neighborhood']

for ind in np.arange(torontoGrouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:]=return_most_common_venues(torontoGrouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
1,Berczy Park,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
2,"Brockton,Exhibition Place,Parkdale Village",Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
3,Business Reply Mail Processing Centre 969 Eastern,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
4,"CN Tower,Bathurst Quay,Island airport,Harbourf...",Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
5,"Cabbagetown,St. James Town",Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
6,Central Bay Street,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
7,"Chinatown,Grange Park,Kensington Market",Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
8,Christie,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
9,Church and Wellesley,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant


In [47]:
neighborhoods_venues_sorted.shape

(39, 11)

## Cluster Neighborhoods

Running k-means to cluster the neighborhood into 5 clusters.

In [78]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

torontoGroupedClustering=torontoGrouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans=KMeans(n_clusters=kclusters, random_state=0).fit(torontoGroupedClustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

  return_n_iter=True)


array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)

Creating a new dataframe which includes the 'cluster' as well as the top 10 venues for each neighborhood. First previewing the first 5 rows, afterward all the rows.

In [34]:
torontoMerged=torontoData
torontoMerged.rename(columns={'Neighbourhood':'Neighborhood'},inplace=True)

# add clustering labels
torontoMerged['Cluster Labels']=kmeans.labels_

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
torontoMerged=torontoMerged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

torontoMerged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
1,M4K,East Toronto,"The Danforth West,Riverdale",43.679557,-79.352188,0,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
2,M4L,East Toronto,"The Beaches West,India Bazaar",43.668999,-79.315572,0,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
3,M4M,East Toronto,Studio District,43.659526,-79.340923,0,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,0,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant


In [35]:
torontoMerged

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
1,M4K,East Toronto,"The Danforth West,Riverdale",43.679557,-79.352188,0,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
2,M4L,East Toronto,"The Beaches West,India Bazaar",43.668999,-79.315572,0,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
3,M4M,East Toronto,Studio District,43.659526,-79.340923,0,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,0,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
5,M4P,Central Toronto,Davisville North,43.712751,-79.390197,0,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
6,M4R,Central Toronto,North Toronto West,43.715383,-79.405678,0,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
7,M4S,Central Toronto,Davisville,43.704324,-79.38879,0,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
8,M4T,Central Toronto,"Moore Park,Summerhill East",43.689574,-79.38316,0,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
9,M4V,Central Toronto,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",43.686412,-79.400049,0,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant


Lastly, visualizing the resulting clusters.

In [36]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters=folium.Map(location=[tlat,tlong], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array=cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(torontoMerged['Latitude'],torontoMerged['Longitude'],torontoMerged['Neighborhood'],torontoMerged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Examining the Clusters

We can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, you can then assign a name to each cluster.

In [37]:
torontoMerged.loc[torontoMerged['Cluster Labels'] == 0,torontoMerged.columns[[1] + list(range(5,torontoMerged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,East Toronto,0,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
1,East Toronto,0,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
2,East Toronto,0,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
3,East Toronto,0,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
4,Central Toronto,0,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
5,Central Toronto,0,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
6,Central Toronto,0,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
7,Central Toronto,0,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
8,Central Toronto,0,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
9,Central Toronto,0,Clothing Store,Restaurant,Hotel,Plaza,Seafood Restaurant,Coffee Shop,Italian Restaurant,Burger Joint,Breakfast Spot,American Restaurant
