Before we get the data and start exploring it, let's download all the dependencies that we will need.


In [16]:
!pip install geocoder

Collecting geocoder
  Downloading geocoder-1.38.1-py2.py3-none-any.whl (98 kB)
[K     |████████████████████████████████| 98 kB 7.5 MB/s  eta 0:00:01
[?25hCollecting ratelim
  Downloading ratelim-0.1.6-py2.py3-none-any.whl (4.0 kB)
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6


In [3]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis

import io

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


## 1. Webscraping and Explore Dataset


In [4]:
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
df = pd.read_html(url)[0]
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


Clear the rows with unassigned postal code 

In [5]:
df=df.drop(df[df.Borough=='Not assigned'].index)
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


## 2. Download Geographical coordinate using geocoder API


We have tried the geocode but it cannot obtain any result even using the while loop. Then I abandon this api and directly download the csv file.

In [6]:
url="http://cocl.us/Geospatial_data"
s=requests.get(url).content
postal=pd.read_csv(io.StringIO(s.decode('utf-8')))
postal.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Using Join to connect the two tables by matching column `Postal Code`.

In [106]:
df

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
160,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
165,M4Y,Downtown Toronto,Church and Wellesley
168,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
169,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


In [107]:
postal

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
...,...,...,...
98,M9N,43.706876,-79.518188
99,M9P,43.696319,-79.532242
100,M9R,43.688905,-79.554724
101,M9V,43.739416,-79.588437


In [79]:
result = pd.merge(df, postal, how='left', on=['Postal Code'])

In [71]:
result

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509


## 3. Expand the Neibourhood column to allow each row represents one neighbourhood


In [80]:
data=result['Neighbourhood'].str.split(',',expand=True).stack()
data=data.reset_index(level=1)
data.columns=['L1','NB']
result=result.join(data)
result.drop('L1',axis=1,inplace=True)
result.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,NB
0,M3A,North York,Parkwoods,43.753259,-79.329656,Parkwoods
1,M4A,North York,Victoria Village,43.725882,-79.315572,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Regent Park
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Harbourfront
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,Lawrence Manor


In [81]:
result.shape

(219, 6)

In [82]:
result.drop('Neighbourhood',axis=1,inplace=True)
result.rename(columns = {'NB':'Neighbourhood'},inplace = True)

In [84]:
result.head()

Unnamed: 0,Postal Code,Borough,Latitude,Longitude,Neighbourhood
0,M3A,North York,43.753259,-79.329656,Parkwoods
1,M4A,North York,43.725882,-79.315572,Victoria Village
2,M5A,Downtown Toronto,43.65426,-79.360636,Regent Park
2,M5A,Downtown Toronto,43.65426,-79.360636,Harbourfront
3,M6A,North York,43.718518,-79.464763,Lawrence Manor


In [85]:
result=result[['Neighbourhood','Borough','Postal Code','Latitude','Longitude']]

In [86]:
result.head()

Unnamed: 0,Neighbourhood,Borough,Postal Code,Latitude,Longitude
0,Parkwoods,North York,M3A,43.753259,-79.329656
1,Victoria Village,North York,M4A,43.725882,-79.315572
2,Regent Park,Downtown Toronto,M5A,43.65426,-79.360636
2,Harbourfront,Downtown Toronto,M5A,43.65426,-79.360636
3,Lawrence Manor,North York,M6A,43.718518,-79.464763


#### Use geopy library to get the latitude and longitude values of Toronto.

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>ny_explorer</em>, as shown below.

In [88]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.6534817, -79.3839347.


#### Create a map of Toronto with neighborhoods superimposed on top.

In [95]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(result['Latitude'], result['Longitude'], result['Borough'], result['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.
#### Define Foursquare Credentials and Version


In [96]:
CLIENT_ID = '5XIMYHOCOR30GGNRYTHQQVJ1UOOQXGLVJBWBS44JN43O4RG4' # your Foursquare ID
CLIENT_SECRET = 'R1UPQH33UDN5ATR44JJBH2EFIIIT4R4QWFAPKUWTIVNY0TL1' # your Foursquare Secret
VERSION = '20210312' # Foursquare API version
LIMIT = 200 # A default Foursquare API limit value

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: 5XIMYHOCOR30GGNRYTHQQVJ1UOOQXGLVJBWBS44JN43O4RG4
CLIENT_SECRET:R1UPQH33UDN5ATR44JJBH2EFIIIT4R4QWFAPKUWTIVNY0TL1


#### Let's explore the first neighborhood in our dataframe.
Get the neighborhood's name.


In [97]:
result.loc[0, 'Neighbourhood']

'Parkwoods'

Get the neighborhood's latitude and longitude values.

In [100]:
neighborhood_latitude = result.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = result.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = result.loc[0, 'Neighbourhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Parkwoods are 43.7532586, -79.3296565.


#### Now, let's get the top 200 venues that are in Parkwoods within a radius of 500 meters.
First, let's create the GET request URL. Name your URL **url**.

In [102]:
LIMIT = 200 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=5XIMYHOCOR30GGNRYTHQQVJ1UOOQXGLVJBWBS44JN43O4RG4&client_secret=R1UPQH33UDN5ATR44JJBH2EFIIIT4R4QWFAPKUWTIVNY0TL1&v=20210312&ll=43.7532586,-79.3296565&radius=500&limit=200'

Send the GET request and examine the resutls

In [103]:
results = requests.get(url).json()

From the Foursquare lab in the previous module, we know that all the information is in the _items_ key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [104]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a _pandas_ dataframe.

In [105]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Brookbanks Park,Park,43.751976,-79.33214
1,Brookbanks Pool,Pool,43.751389,-79.332184
2,Variety Store,Food & Drink Shop,43.751974,-79.333114
3,Corrosion Service Company Limited,Construction & Landscaping,43.752432,-79.334661


And how many venues were returned by Foursquare?


In [106]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

4 venues were returned by Foursquare.


# 4. Explore Neighborhoods in Toronto

### Let's create a function to repeat the same process to all the neighborhoods in Grate Toronto Area


In [107]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

#### Now write the code to run the above function on each neighborhood and create a new dataframe called _toronto_venues_.

In [108]:
toronto_venues = getNearbyVenues(names=result['Neighbourhood'],
                                   latitudes=result['Latitude'],
                                   longitudes=result['Longitude']
                                  )

Parkwoods
Victoria Village
Regent Park
 Harbourfront
Lawrence Manor
 Lawrence Heights
Queen's Park
 Ontario Provincial Government
Islington Avenue
 Humber Valley Village
Malvern
 Rouge
Don Mills
Parkview Hill
 Woodbine Gardens
Garden District
 Ryerson
Glencairn
West Deane Park
 Princess Gardens
 Martin Grove
 Islington
 Cloverdale
Rouge Hill
 Port Union
 Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate
 Bloordale Gardens
 Old Burnhamthorpe
 Markland Wood
Guildwood
 Morningside
 West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor
 Wilson Heights
 Downsview North
Thorncliffe Park
Richmond
 Adelaide
 King
Dufferin
 Dovercourt Village
Scarborough Village
Fairview
 Henry Farm
 Oriole
Northwood Park
 York University
East Toronto
 Broadview North (Old East York)
Harbourfront East
 Union Station
 Toronto Islands
Little Portugal
 Trinity
Kennedy Park
 Ionview
 East Birchmo

#### Let's check the size of the resulting dataframe


In [109]:
print(toronto_venues.shape)
toronto_venues.head()

(4298, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Brookbanks Pool,43.751389,-79.332184,Pool
2,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
3,Parkwoods,43.753259,-79.329656,Corrosion Service Company Limited,43.752432,-79.334661,Construction & Landscaping
4,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena


Let's check how many venues were returned for each neighborhood


In [110]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Adelaide,94,94,94,94,94,94
Agincourt North,3,3,3,3,3,3
Albion Gardens,10,10,10,10,10,10
Bathurst Quay,16,16,16,16,16,16
Beaumond Heights,10,10,10,10,10,10
...,...,...,...,...,...,...
Wexford,7,7,7,7,7,7
Willowdale,40,40,40,40,40,40
Woburn,5,5,5,5,5,5
Woodbine Heights,5,5,5,5,5,5


#### Let's find out how many unique categories can be curated from all the returned venues


In [111]:
print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 274 uniques categories.


# 5. Analyze Each Neighborhood


In [115]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,Adult Boutique,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Trail,Train Station,Truck Stop,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.


In [117]:
toronto_onehot.shape

(4298, 274)

#### Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category


In [119]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Adult Boutique,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Trail,Train Station,Truck Stop,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,Adelaide,0.0,0.0,0.0,0.0000,0.0000,0.0000,0.0000,0.000,0.021277,...,0.0,0.0,0.0,0.010638,0.0,0.000,0.0,0.0,0.0,0.0
1,Agincourt North,0.0,0.0,0.0,0.0000,0.0000,0.0000,0.0000,0.000,0.000000,...,0.0,0.0,0.0,0.000000,0.0,0.000,0.0,0.0,0.0,0.0
2,Albion Gardens,0.0,0.0,0.0,0.0000,0.0000,0.0000,0.0000,0.000,0.000000,...,0.0,0.0,0.0,0.000000,0.0,0.000,0.0,0.0,0.0,0.0
3,Bathurst Quay,0.0,0.0,0.0,0.0625,0.0625,0.0625,0.1875,0.125,0.000000,...,0.0,0.0,0.0,0.000000,0.0,0.000,0.0,0.0,0.0,0.0
4,Beaumond Heights,0.0,0.0,0.0,0.0000,0.0000,0.0000,0.0000,0.000,0.000000,...,0.0,0.0,0.0,0.000000,0.0,0.000,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
201,Wexford,0.0,0.0,0.0,0.0000,0.0000,0.0000,0.0000,0.000,0.000000,...,0.0,0.0,0.0,0.000000,0.0,0.000,0.0,0.0,0.0,0.0
202,Willowdale,0.0,0.0,0.0,0.0000,0.0000,0.0000,0.0000,0.000,0.000000,...,0.0,0.0,0.0,0.000000,0.0,0.025,0.0,0.0,0.0,0.0
203,Woburn,0.0,0.0,0.0,0.0000,0.0000,0.0000,0.0000,0.000,0.000000,...,0.0,0.0,0.0,0.000000,0.0,0.000,0.0,0.0,0.0,0.0
204,Woodbine Heights,0.0,0.0,0.0,0.0000,0.0000,0.0000,0.0000,0.000,0.000000,...,0.0,0.0,0.0,0.000000,0.0,0.000,0.0,0.0,0.0,0.0


#### Let's confirm the new size


In [120]:
toronto_grouped.shape

(206, 274)

### Let's skip printing each neighborhood along with the top 5 most common venues, since it is not a necessaary step in the problem demends and there are over 200 neighborhoods in this case. 


### But let's put that into a _pandas_ dataframe

First, let's write a function to sort the venues in descending order.


In [121]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.


In [211]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adelaide,Coffee Shop,Café,Restaurant,Clothing Store,Hotel,Deli / Bodega,Gym,Bakery,Thai Restaurant,Cosmetics Shop
1,Agincourt North,Intersection,Park,Playground,Metro Station,Molecular Gastronomy Restaurant,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant
2,Albion Gardens,Grocery Store,Pizza Place,Japanese Restaurant,Coffee Shop,Sandwich Place,Fast Food Restaurant,Beer Store,Fried Chicken Joint,Pharmacy,Hostel
3,Bathurst Quay,Airport Service,Airport Terminal,Harbor / Marina,Bar,Rental Car Location,Coffee Shop,Sculpture Garden,Boutique,Boat or Ferry,Plane
4,Beaumond Heights,Grocery Store,Pizza Place,Japanese Restaurant,Coffee Shop,Sandwich Place,Fast Food Restaurant,Beer Store,Fried Chicken Joint,Pharmacy,Hostel


# 6. Cluster Neighborhoods


Run _k_-means to cluster the neighborhood into 5 clusters.

In [212]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=4).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 1, 4, 4, 4, 4, 1, 4, 4, 4], dtype=int32)

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.


In [213]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = result

# merge manhattan_grouped with manhattan_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Neighbourhood,Borough,Postal Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Parkwoods,North York,M3A,43.753259,-79.329656,1.0,Park,Construction & Landscaping,Pool,Food & Drink Shop,Yoga Studio,Metro Station,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant
1,Victoria Village,North York,M4A,43.725882,-79.315572,4.0,Pizza Place,Hockey Arena,Coffee Shop,Portuguese Restaurant,Massage Studio,Medical Center,Martial Arts School,Mediterranean Restaurant,Market,Moroccan Restaurant
2,Regent Park,Downtown Toronto,M5A,43.65426,-79.360636,4.0,Coffee Shop,Park,Bakery,Café,Pub,Breakfast Spot,Theater,Distribution Center,Brewery,Performing Arts Venue
2,Harbourfront,Downtown Toronto,M5A,43.65426,-79.360636,4.0,Coffee Shop,Park,Bakery,Café,Pub,Breakfast Spot,Theater,Distribution Center,Brewery,Performing Arts Venue
3,Lawrence Manor,North York,M6A,43.718518,-79.464763,4.0,Clothing Store,Furniture / Home Store,Boutique,Accessories Store,Vietnamese Restaurant,Athletics & Sports,Gift Shop,Coffee Shop,Yoga Studio,Middle Eastern Restaurant


There are some neighbourhoods does not have any venue record. These neighbourhoods cannot be clustered since we do not have any information other than the coordinate of theneighbourhood. We manually catagory it to the 6th catagory.

In [214]:
toronto_merged['Cluster Labels'].fillna(5,inplace=True)
toronto_merged['Cluster Labels']=toronto_merged['Cluster Labels'].astype('int32')
toronto_merged.head()

Unnamed: 0,Neighbourhood,Borough,Postal Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Parkwoods,North York,M3A,43.753259,-79.329656,1,Park,Construction & Landscaping,Pool,Food & Drink Shop,Yoga Studio,Metro Station,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant
1,Victoria Village,North York,M4A,43.725882,-79.315572,4,Pizza Place,Hockey Arena,Coffee Shop,Portuguese Restaurant,Massage Studio,Medical Center,Martial Arts School,Mediterranean Restaurant,Market,Moroccan Restaurant
2,Regent Park,Downtown Toronto,M5A,43.65426,-79.360636,4,Coffee Shop,Park,Bakery,Café,Pub,Breakfast Spot,Theater,Distribution Center,Brewery,Performing Arts Venue
2,Harbourfront,Downtown Toronto,M5A,43.65426,-79.360636,4,Coffee Shop,Park,Bakery,Café,Pub,Breakfast Spot,Theater,Distribution Center,Brewery,Performing Arts Venue
3,Lawrence Manor,North York,M6A,43.718518,-79.464763,4,Clothing Store,Furniture / Home Store,Boutique,Accessories Store,Vietnamese Restaurant,Athletics & Sports,Gift Shop,Coffee Shop,Yoga Studio,Middle Eastern Restaurant


Finally, let's visualize the resulting clusters


In [217]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters+1)
ys = [i + x + (i*x)**2 for i in range(kclusters+1)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

# 7. Examine Clusters
Now, we can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, we can then assign a name to each cluster.

### Cluster 1

In [196]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Scarborough,0,Bar,Yoga Studio,Metro Station,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Men's Store,Monument / Landmark
12,Scarborough,0,Bar,Yoga Studio,Metro Station,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Men's Store,Monument / Landmark
12,Scarborough,0,Bar,Yoga Studio,Metro Station,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Men's Store,Monument / Landmark
77,Etobicoke,0,Sandwich Place,Mobile Phone Shop,Yoga Studio,Metro Station,Modern European Restaurant,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Men's Store,Pet Store
77,Etobicoke,0,Sandwich Place,Mobile Phone Shop,Yoga Studio,Metro Station,Modern European Restaurant,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Men's Store,Pet Store
77,Etobicoke,0,Sandwich Place,Mobile Phone Shop,Yoga Studio,Metro Station,Modern European Restaurant,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Men's Store,Pet Store
77,Etobicoke,0,Sandwich Place,Mobile Phone Shop,Yoga Studio,Metro Station,Modern European Restaurant,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Men's Store,Pet Store


### Cluster 2

In [197]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,1,Park,Construction & Landscaping,Pool,Food & Drink Shop,Yoga Studio,Metro Station,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant
21,York,1,Park,Women's Store,Pool,Men's Store,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Metro Station
35,East York,1,Park,Convenience Store,Yoga Studio,Men's Store,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Metro Station
35,East York,1,Park,Convenience Store,Yoga Studio,Men's Store,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Metro Station
49,North York,1,Bakery,Park,Construction & Landscaping,Massage Studio,Yoga Studio,Mexican Restaurant,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant
49,North York,1,Bakery,Park,Construction & Landscaping,Massage Studio,Yoga Studio,Mexican Restaurant,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant
49,North York,1,Bakery,Park,Construction & Landscaping,Massage Studio,Yoga Studio,Mexican Restaurant,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant
52,North York,1,Park,Yoga Studio,Men's Store,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Metro Station,Mediterranean Restaurant
61,Central Toronto,1,Park,Swim School,Bus Line,Yoga Studio,Metro Station,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Men's Store
64,York,1,Park,Yoga Studio,Men's Store,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Metro Station,Mediterranean Restaurant


### Cluster 3

In [198]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
57,North York,2,Paper / Office Supplies Store,Baseball Field,Yoga Studio,Metro Station,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Men's Store
57,North York,2,Paper / Office Supplies Store,Baseball Field,Yoga Studio,Metro Station,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Men's Store
101,Etobicoke,2,Baseball Field,Yoga Studio,Metro Station,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Men's Store,Monument / Landmark
101,Etobicoke,2,Baseball Field,Yoga Studio,Metro Station,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Men's Store,Monument / Landmark
101,Etobicoke,2,Baseball Field,Yoga Studio,Metro Station,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Men's Store,Monument / Landmark
101,Etobicoke,2,Baseball Field,Yoga Studio,Metro Station,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Men's Store,Monument / Landmark
101,Etobicoke,2,Baseball Field,Yoga Studio,Metro Station,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Men's Store,Monument / Landmark
101,Etobicoke,2,Baseball Field,Yoga Studio,Metro Station,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Men's Store,Monument / Landmark
101,Etobicoke,2,Baseball Field,Yoga Studio,Metro Station,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Men's Store,Monument / Landmark
101,Etobicoke,2,Baseball Field,Yoga Studio,Metro Station,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Men's Store,Monument / Landmark


### Cluster 4

In [199]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Etobicoke,3,Bakery,Yoga Studio,Metro Station,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Men's Store,Monument / Landmark
11,Etobicoke,3,Bakery,Yoga Studio,Metro Station,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Men's Store,Monument / Landmark
11,Etobicoke,3,Bakery,Yoga Studio,Metro Station,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Men's Store,Monument / Landmark
11,Etobicoke,3,Bakery,Yoga Studio,Metro Station,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Men's Store,Monument / Landmark
11,Etobicoke,3,Bakery,Yoga Studio,Metro Station,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Men's Store,Monument / Landmark


### Cluster 5

In [200]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,4,Pizza Place,Hockey Arena,Coffee Shop,Portuguese Restaurant,Massage Studio,Medical Center,Martial Arts School,Mediterranean Restaurant,Market,Moroccan Restaurant
2,Downtown Toronto,4,Coffee Shop,Park,Bakery,Café,Pub,Breakfast Spot,Theater,Distribution Center,Brewery,Performing Arts Venue
2,Downtown Toronto,4,Coffee Shop,Park,Bakery,Café,Pub,Breakfast Spot,Theater,Distribution Center,Brewery,Performing Arts Venue
3,North York,4,Clothing Store,Furniture / Home Store,Boutique,Accessories Store,Vietnamese Restaurant,Athletics & Sports,Gift Shop,Coffee Shop,Yoga Studio,Middle Eastern Restaurant
3,North York,4,Clothing Store,Furniture / Home Store,Boutique,Accessories Store,Vietnamese Restaurant,Athletics & Sports,Gift Shop,Coffee Shop,Yoga Studio,Middle Eastern Restaurant
4,Downtown Toronto,4,Coffee Shop,Diner,Sushi Restaurant,Yoga Studio,Italian Restaurant,Mexican Restaurant,Smoothie Shop,Café,Fried Chicken Joint,Sandwich Place
4,Downtown Toronto,4,Coffee Shop,Diner,Sushi Restaurant,Yoga Studio,Italian Restaurant,Mexican Restaurant,Smoothie Shop,Café,Fried Chicken Joint,Sandwich Place
6,Scarborough,4,Fast Food Restaurant,Yoga Studio,Men's Store,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Metro Station,Mediterranean Restaurant
6,Scarborough,4,Fast Food Restaurant,Yoga Studio,Men's Store,Modern European Restaurant,Mobile Phone Shop,Miscellaneous Shop,Middle Eastern Restaurant,Mexican Restaurant,Metro Station,Mediterranean Restaurant
7,North York,4,Gym,Beer Store,Coffee Shop,Restaurant,Sporting Goods Shop,Asian Restaurant,Athletics & Sports,Clothing Store,Chinese Restaurant,Bike Shop


### Cluster 6 (No venue information from Foursquare)

In [218]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 5, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Etobicoke,5,,,,,,,,,,
5,Etobicoke,5,,,,,,,,,,
45,North York,5,,,,,,,,,,
45,North York,5,,,,,,,,,,
95,Scarborough,5,,,,,,,,,,
