# Segmenting and Clustering Neighborhoods of Boroughs in Toronto

## Introduction

In this project, I scraped Wikipedia to get data on Toronto's postal codes, boroughs, and neighborhoods. I then used the Foursquare API to explore those neighborhoods: the **explore** function returns the most common venue categories in each neighborhood, and I used this feature to group the neighborhoods into clusters with the *k*-means clustering algorithm. Finally, I used the Folium library to visualize the neighborhoods in Toronto and their emerging clusters.

In [1]:
# Importing libraries
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')


Libraries imported.


## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Download and Explore Dataset</a>

2. <a href="#item2">Employing Foursquare API</a>

3. <a href="#item3">Analyze Each Neighborhood</a>

4. <a href="#item4">Cluster Neighborhoods</a>

5. <a href="#item5">Examine Clusters</a>    
</font>
</div>

## 1. Download and Explore Dataset

Toronto has a total of 6 boroughs. To segment the neighborhoods and explore them, I need a dataset that contains the boroughs and the neighborhoods in each borough, as well as the latitude and longitude coordinates of each neighborhood.

Luckily, this dataset exists for free on the web, and here is the link to the dataset: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

To get the latitude and longitudes, here is a link to a csv file that has the geographical coordinates of each postal code: http://cocl.us/Geospatial_data

**Note:** I would have used the Geocoder package, but this package can be very unreliable.

## Scraping the table from Wikipedia

In [2]:
# Importing requests
import requests

In [3]:
# Scraping the table from Wikipedia
website_url = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

In [4]:
from bs4 import BeautifulSoup
soup = BeautifulSoup(website_url,'lxml')
print(soup.prettify())

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   List of postal codes of Canada: M - Wikipedia
  </title>
  <script>
   document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgMonthNamesShort":["","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"],"wgRequestId":"XfMDOwpAAEMAAChR@1cAAACQ","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"List_of_postal_codes_of_Canada:_M","wgTitle":"List of postal codes of Canada: M","wgCurRevisionId":930529633,"wgRevisionId":930529633,"wgArticleId":539066,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Communi

In [5]:
table = soup.find('table',{'class':'wikitable sortable'})
table_rows = table.find_all('tr')

### Creating the dataframe from the scraped data

In [6]:
data = []
for row in table_rows:
    data.append([t.text.strip() for t in row.find_all('td')])

df = pd.DataFrame(data, columns=['PostalCode', 'Borough', 'Neighborhood'])
df = df[~df['PostalCode'].isnull()]  # to filter out bad rows
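
The `isnull` filter matters because of how the header row is parsed. The following is a minimal sketch that uses hand-written rows in place of the scraped ones (the `rows` list and the `demo` name are illustrative and not part of the notebook). It assumes the header row contains only `<th>` cells, so `find_all('td')` returns an empty list for it and pandas pads that row with missing values.

```python
import pandas as pd

# Stand-in for the scraped rows: the first (header) row has no <td> cells,
# so the list comprehension produces an empty list for it.
rows = [
    [],
    ['M1A', 'Not assigned', 'Not assigned'],
    ['M3A', 'North York', 'Parkwoods'],
]

demo = pd.DataFrame(rows, columns=['PostalCode', 'Borough', 'Neighborhood'])

# The empty row is padded with missing values, so filtering on PostalCode drops it.
demo = demo[~demo['PostalCode'].isnull()]
print(demo)
```

This would also explain why the index of `df` starts at 1 rather than 0 in the output that follows.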

In [7]:
# Viewing the first 5 rows of the data frame
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
1,M1A,Not assigned,Not assigned
2,M2A,Not assigned,Not assigned
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,Harbourfront


In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 287 entries, 1 to 287
Data columns (total 3 columns):
PostalCode      287 non-null object
Borough         287 non-null object
Neighborhood    287 non-null object
dtypes: object(3)
memory usage: 5.6+ KB


### Observations

Clearly, some boroughs are listed as 'Not assigned'. I decided to remove those rows.

In [9]:
df[df['Borough'] == 'Not assigned'].describe()

Unnamed: 0,PostalCode,Borough,Neighborhood
count,77,77,77
unique,77,1,1
top,M7W,Not assigned,Not assigned
freq,1,77,77


### Observations 

The summary from `df.info()` shows 287 rows, of which 77 have the borough listed as 'Not assigned'.

Therefore, I dropped those rows, which leaves 210 rows.

In [10]:
# delete all rows where the 'Borough' column has the value 'Not assigned'
names = df[df['Borough'] == 'Not assigned'].index
df.drop(names, inplace=True)
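
As an alternative to collecting index labels and dropping them in place, the same filter can be written as a boolean mask. This is a small sketch on made-up rows (the `demo` and `assigned` names are illustrative), not the notebook's own approach.

```python
import pandas as pd

demo = pd.DataFrame({
    'PostalCode': ['M1A', 'M3A', 'M4A'],
    'Borough': ['Not assigned', 'North York', 'North York'],
    'Neighborhood': ['Not assigned', 'Parkwoods', 'Victoria Village'],
})

# Keep only rows whose borough is assigned; the original frame is left untouched.
assigned = demo[demo['Borough'] != 'Not assigned']
print(len(demo), '->', len(assigned))
```

Because nothing is modified in place, the cell can be re-run without side effects.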

In [11]:
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,Harbourfront
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor


## Joining Neighborhoods with same Postal Code Area

For example, in the table on the Wikipedia page, one can see that **M5A** is listed twice and has two neighborhoods: **Harbourfront** and **Regent Park.** These two rows will be combined into one row with the neighborhoods separated with a comma.
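
To illustrate the idea on a tiny scale, here is a sketch with three hand-written rows (the `demo` and `combined` names are illustrative). It groups by postal code, keeps the first borough in each group, and joins the neighborhood names with a comma.

```python
import pandas as pd

demo = pd.DataFrame({
    'PostalCode': ['M5A', 'M5A', 'M3A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighborhood': ['Harbourfront', 'Regent Park', 'Parkwoods'],
})

# One row per postal code: first borough, comma-joined neighborhoods.
combined = demo.groupby('PostalCode', as_index=False).agg(
    {'Borough': 'first', 'Neighborhood': ','.join}
)
print(combined)
```

Taking the `'first'` borough assumes every row of a postal code carries the same borough, which appears to hold for this table.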

In [12]:
#Combining two rows into one row with the neighborhoods separated with a comma
d = {'Borough': 'first','Neighborhood': ','.join}
df_new = df.groupby('PostalCode', as_index=False).aggregate(d).reindex(columns=df.columns)
print(df_new)

    PostalCode           Borough  \
0          M1B       Scarborough   
1          M1C       Scarborough   
2          M1E       Scarborough   
3          M1G       Scarborough   
4          M1H       Scarborough   
5          M1J       Scarborough   
6          M1K       Scarborough   
7          M1L       Scarborough   
8          M1M       Scarborough   
9          M1N       Scarborough   
10         M1P       Scarborough   
11         M1R       Scarborough   
12         M1S       Scarborough   
13         M1T       Scarborough   
14         M1V       Scarborough   
15         M1W       Scarborough   
16         M1X       Scarborough   
17         M2H        North York   
18         M2J        North York   
19         M2K        North York   
20         M2L        North York   
21         M2M        North York   
22         M2N        North York   
23         M2P        North York   
24         M2R        North York   
25         M3A        North York   
26         M3B        North 

In [13]:
# Putting in a dataframe
df_new = pd.DataFrame(df_new)

In [14]:
df_new

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park"
7,M1L,Scarborough,"Clairlea,Golden Mile,Oakridge"
8,M1M,Scarborough,"Cliffcrest,Cliffside,Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff,Cliffside West"


In [15]:
# Checking the shape of the data
df_new.shape

(103, 3)

## Merging the Latitudes and Longitudes to the dataframe above

Now that my scraped data is properly cleaned, I can add the CSV file containing the latitudes and longitudes.

In [16]:
# Loading the CSV file
geo_data = pd.read_csv('Geospatial_Coordinates.csv')

In [17]:
# Dropping the postal code column in the CSV file
geo_data.drop('Postal Code', axis=1, inplace = True)

In [18]:
# This shows that the CSV file was properly loaded
geo_data.head()

Unnamed: 0,Latitude,Longitude
0,43.806686,-79.194353
1,43.784535,-79.160497
2,43.763573,-79.188711
3,43.770992,-79.216917
4,43.773136,-79.239476


In [19]:
# I checked the shape to make sure it has the same number of rows (103) as the data I scraped and cleaned.
# Since the row counts match, I can combine the two dataframes.

geo_data.shape

(103, 2)

In [20]:
# Merging the cleaned scraped data and the cleaned geospatial data
geo_data_new = pd.concat([df_new, geo_data], axis =1)

In [21]:
# Viewing the first five rows of the data
geo_data_new.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
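
The positional `pd.concat` above works only if both frames list the postal codes in exactly the same order. A more defensive option is to keep the CSV's `Postal Code` column and join on it. The sketch below uses two hand-written rows, with the coordinates deliberately in a different order. The frame names `scraped`, `coords`, and `merged` are illustrative.

```python
import pandas as pd

scraped = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C'],
    'Borough': ['Scarborough', 'Scarborough'],
    'Neighborhood': ['Rouge,Malvern', 'Highland Creek,Rouge Hill,Port Union'],
})

# Coordinates deliberately listed in a different order than `scraped`.
coords = pd.DataFrame({
    'Postal Code': ['M1C', 'M1B'],
    'Latitude': [43.784535, 43.806686],
    'Longitude': [-79.160497, -79.194353],
})

# Join on the postal code; validate raises if either side has duplicate keys.
merged = scraped.merge(
    coords,
    left_on='PostalCode',
    right_on='Postal Code',
    how='left',
    validate='one_to_one',
).drop(columns='Postal Code')
print(merged)
```

With a key-based join, a reordered or incomplete CSV produces missing coordinates or an error instead of silently misaligned rows.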


In [22]:
geo_data_new.Borough.unique()

array(['Scarborough', 'North York', 'East York', 'East Toronto',
       'Central Toronto', 'Downtown Toronto', 'York', 'West Toronto',
       "Queen's Park", 'Mississauga', 'Etobicoke'], dtype=object)

### Create a map of Toronto with neighborhoods superimposed on top.

In [23]:
# Toronto Latitude and Longitude
latitude = 43.6532
longitude = -79.3832

In [24]:
# create map of Toronto using latitude and longitude values
map_toron = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(geo_data_new['Latitude'], geo_data_new['Longitude'], geo_data_new['Borough'], 
                                           geo_data_new['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toron)  
    
map_toron
#map_toron.save('toronto_neigh_map.html') # Saving the map for viewing later 

**Folium** is a great visualization library. Clicking on each circle marker reveals the name of the neighborhood and its respective borough.

However, for illustration purposes, I simplified the above map and segmented and clustered only the neighborhoods in Scarborough. To do so, I sliced the original dataframe and created a new dataframe of the Scarborough data.

In [25]:
scar_data = geo_data_new[geo_data_new['Borough'] == 'Scarborough'].reset_index(drop=True)
scar_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


As I did with all of Toronto, I visualize Scarborough and the neighborhoods in it.

In [26]:
# Scarborough Latitude and Longitude
latitude = 43.7777
longitude = -79.2332

In [27]:
# create map of Scarborough using latitude and longitude values
map_scar = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(scar_data['Latitude'], scar_data['Longitude'], scar_data['Borough'], 
                                           scar_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_scar)  
    
map_scar
#map_scar.save('Scarborough_neigh_map.html') # Saving the map of Scarborough for viewing later.

## 2. Employing Foursquare API

Next, I start utilizing the Foursquare API to explore the neighborhoods and segment them.

#### Define my Foursquare Credentials and Version

In [28]:
CLIENT_ID = 'KS0GCILNWWLOOOOTCLGKSVSEWDHCMNLSDDNS1TX1MEITC1C3' # my Foursquare ID
CLIENT_SECRET = 'ACHJPQK2UUDDBV4S1R3JBCIOLDSMFEZN3TI3CFV33ZXLGU3F' # my Foursquare Secret
VERSION = '20180604'
LIMIT = 30
radius = 500
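
Credentials written directly in a notebook are visible to anyone who can read the file. One common alternative is to read them from environment variables. The sketch below is only an illustration: the function name `load_foursquare_credentials` and the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are hypothetical choices, not anything Foursquare prescribes.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of environment variables.

    The variable names used here are hypothetical; pick whatever suits your setup.
    """
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError(
            'Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET before running.'
        )
    return client_id, client_secret
```

In the notebook this could be used as `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()`, which keeps the secret out of the saved file and its outputs.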

#### Exploring the first neighborhood in our dataframe.

In [29]:
# Get the Neighbourhood's name
scar_data.loc[0, 'Neighborhood']

'Rouge,Malvern'

In [30]:
# Get the neighbourhood's latitude and longitude values
neighborhood_latitude = scar_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = scar_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = scar_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Rouge,Malvern are 43.806686299999996, -79.19435340000001.


#### Next, I get the top 100 venues within a radius of 5,000 meters of Rouge,Malvern.

In [31]:
# Getting the URL
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 5000 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=KS0GCILNWWLOOOOTCLGKSVSEWDHCMNLSDDNS1TX1MEITC1C3&client_secret=ACHJPQK2UUDDBV4S1R3JBCIOLDSMFEZN3TI3CFV33ZXLGU3F&v=20180604&ll=43.806686299999996,-79.19435340000001&radius=5000&limit=100'
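
The same URL can also be assembled from a dictionary of query parameters, which avoids long format strings and handles escaping. This is a sketch using only the standard library. The helper name `build_explore_url` is hypothetical, and the placeholder credentials `'ID'` and `'SECRET'` stand in for real values.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the explore URL used above from a dict of query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in the ll value instead of encoding it as %2C
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params, safe=',')


print(build_explore_url('ID', 'SECRET', '20180604', 43.8067, -79.1944, 5000, 100))
```

The `requests` library accepts a similar dictionary through its `params` argument, so the query string does not have to be built by hand at all.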

In [32]:
# Send the GET request and examine the results
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5dff42a9006dce001b662e75'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-542858a0498e22b7cfa91070-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/shops/sports_outdoors_',
          'suffix': '.png'},
         'id': '4f4528bc4b90abdf24c9de85',
         'name': 'Athletics & Sports',
         'pluralName': 'Athletics & Sports',
         'primary': True,
         'shortName': 'Athletics & Sports'}],
       'id': '542858a0498e22b7cfa91070',
       'location': {'address': '875 Morningside Ave',
        'cc': 'CA',
        'city': 'Toronto',
        'country': 'Canada',
        'distance': 1788,
        'formattedAddress': ['875 Morningside Ave',
         'Toronto ON M1C 0C7',
         'Canada'],
        'labeledLatLngs': [{'label': 'disp

In [33]:
# function that extracts the category of the venue
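def get_category_type(row):
    # the column is named 'categories' or 'venue.categories',
    # depending on how the JSON was flattened
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']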

Cleaning the JSON and structuring it into a *pandas* dataframe.

In [34]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Toronto Pan Am Sports Centre,Athletics & Sports,43.790623,-79.193869
1,African Rainforest Pavilion,Zoo Exhibit,43.817725,-79.183433
2,Toronto Zoo,Zoo,43.820582,-79.181551
3,Polar Bear Exhibit,Zoo,43.823372,-79.185145
4,Australasia Pavillion,Zoo Exhibit,43.822563,-79.183286
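
To make the flattening step more concrete, the sketch below runs the same idea on two hand-written items shaped like the response above. The second venue is made up to show the empty-category case. It uses `pd.json_normalize`, the top-level function available since pandas 1.0, rather than the older import path used in this notebook.

```python
import pandas as pd

# Hand-written items in the same shape as the API response (values abbreviated).
items = [
    {'venue': {'name': 'Toronto Zoo',
               'categories': [{'name': 'Zoo'}],
               'location': {'lat': 43.820582, 'lng': -79.181551}}},
    {'venue': {'name': 'Example venue without a category',
               'categories': [],
               'location': {'lat': 43.8, 'lng': -79.2}}},
]

# Nested dictionaries become dotted column names; lists are kept as they are.
flat = pd.json_normalize(items)
print(sorted(flat.columns))

# Reduce each category list to the name of its first entry, or None if it is empty.
flat['venue.categories'] = flat['venue.categories'].apply(
    lambda cats: cats[0]['name'] if cats else None
)
print(flat[['venue.name', 'venue.categories']])
```

The dotted names are why the notebook later splits each column name on `"."` and keeps only the last part.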


### And how many venues were returned by Foursquare?

In [35]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


## Explore Neighborhoods in Scarborough

#### Creating a function to repeat the same process to all the neighborhoods in Scarborough

In [36]:
def getNearbyVenues(names, latitudes, longitudes, radius=5000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
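
One fragile spot in the function above is `v['venue']['categories'][0]`, which would raise an `IndexError` for a venue that has no category. The sketch below shows one way the row-building step could tolerate that case. The helper name `venue_rows` is hypothetical and the items are made up.

```python
def venue_rows(name, lat, lng, items):
    """Turn explore items into flat tuples, tolerating venues with no category."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((
            name,
            lat,
            lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            category,
        ))
    return rows


# Made-up items: one with a category, one without.
example_items = [
    {'venue': {'name': 'A', 'location': {'lat': 1.0, 'lng': 2.0},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'B', 'location': {'lat': 3.0, 'lng': 4.0},
               'categories': []}},
]
print(venue_rows('Woburn', 43.77, -79.21, example_items))
```

Rows with a missing category could then be dropped or labeled before the one-hot encoding step.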

#### Running the function on each neighborhood to create a new dataframe called *scar_venues*.

In [37]:

scar_venues = getNearbyVenues(names=scar_data['Neighborhood'],
                                   latitudes=scar_data['Latitude'],
                                   longitudes=scar_data['Longitude']
                                  )


Rouge,Malvern
Highland Creek,Rouge Hill,Port Union
Guildwood,Morningside,West Hill
Woburn
Cedarbrae
Scarborough Village
East Birchmount Park,Ionview,Kennedy Park
Clairlea,Golden Mile,Oakridge
Cliffcrest,Cliffside,Scarborough Village West
Birch Cliff,Cliffside West
Dorset Park,Scarborough Town Centre,Wexford Heights
Maryvale,Wexford
Agincourt
Clarks Corners,Sullivan,Tam O'Shanter
Agincourt North,L'Amoreaux East,Milliken,Steeles East
L'Amoreaux West
Upper Rouge


In [38]:
# Checking how many venues were returned for each neighborhood
scar_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,100,100,100,100,100,100
"Agincourt North,L'Amoreaux East,Milliken,Steeles East",100,100,100,100,100,100
"Birch Cliff,Cliffside West",100,100,100,100,100,100
Cedarbrae,100,100,100,100,100,100
"Clairlea,Golden Mile,Oakridge",100,100,100,100,100,100
"Clarks Corners,Sullivan,Tam O'Shanter",100,100,100,100,100,100
"Cliffcrest,Cliffside,Scarborough Village West",100,100,100,100,100,100
"Dorset Park,Scarborough Town Centre,Wexford Heights",100,100,100,100,100,100
"East Birchmount Park,Ionview,Kennedy Park",100,100,100,100,100,100
"Guildwood,Morningside,West Hill",100,100,100,100,100,100


#### How many unique categories can be curated from all the returned venues?

In [39]:
print('There are {} unique categories.'.format(len(scar_venues['Venue Category'].unique())))

There are 139 unique categories.


<a id='item3'></a>

## 3. Analyze Each Neighborhood

In [40]:
# one hot encoding
scar_onehot = pd.get_dummies(scar_venues[['Venue Category']], prefix="", prefix_sep="")

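# add neighborhood column back to dataframe
# note: the venue categories include one that is itself named 'Neighborhood', so this
# assignment overwrites that dummy column in place rather than appending a new last column
scar_onehot['Neighborhood'] = scar_venues['Neighborhood']

# move the last column to the front; this was meant to move 'Neighborhood', but because of
# the overwrite above the last column is 'Zoo Exhibit', as the output below shows
fixed_columns = [scar_onehot.columns[-1]] + list(scar_onehot.columns[:-1])
scar_onehot = scar_onehot[fixed_columns]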

scar_onehot.head()

Unnamed: 0,Zoo Exhibit,Afghan Restaurant,American Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Beach,Beer Store,Big Box Store,Bistro,Bookstore,Breakfast Spot,Brewery,Bubble Tea Shop,Buffet,Burger Joint,Burrito Place,Bus Station,Butcher,Café,Campground,Cantonese Restaurant,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Cosmetics Shop,Cupcake Shop,Deli / Bodega,Design Studio,Dessert Shop,Diner,Discount Store,Dog Run,Dumpling Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Food,Food & Drink Shop,French Restaurant,Fried Chicken Joint,Gas Station,Gastropub,General Entertainment,Gift Shop,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hakka Restaurant,Hardware Store,Health Food Store,History Museum,Hockey Arena,Hong Kong Restaurant,Hotel,Hotpot Restaurant,Hungarian Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Juice Bar,Korean Restaurant,Liquor Store,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Movie Theater,Music Store,Nail Salon,National Park,Neighborhood,Noodle House,Optical Shop,Organic Grocery,Paper / Office Supplies Store,Park,Peruvian Restaurant,Pet Store,Pharmacy,Pizza Place,Playground,Pool Hall,Portuguese Restaurant,Pub,Restaurant,Rock Climbing Spot,Sandwich Place,Science Museum,Seafood Restaurant,Shopping Mall,Shopping Plaza,Skating Rink,Smoothie Shop,Snack Place,Soccer Field,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Steakhouse,Supermarket,Sushi Restaurant,Szechuan Restaurant,Tattoo Parlor,Tea Room,Thai Restaurant,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wings Joint,Zoo
0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Rouge,Malvern",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Rouge,Malvern",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Rouge,Malvern",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Rouge,Malvern",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
4,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Rouge,Malvern",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [41]:
# Examining the new dataframe size.
scar_onehot.shape

(1682, 139)
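
The column count of 139 equals the number of unique venue categories, and the header above shows `Zoo Exhibit` first with `Neighborhood` sitting between `National Park` and `Noodle House`. Together these suggest that one venue category is itself called `Neighborhood`, so assigning the label column overwrote that dummy column instead of adding a new one. The sketch below reproduces the situation on three made-up rows and shows one way to avoid it. The label name `Neighborhood Name` is a hypothetical choice.

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['Woburn', 'Woburn', 'Cedarbrae'],
    'Venue Category': ['Neighborhood', 'Park', 'Zoo'],
})

onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='')
print(list(onehot.columns))  # includes a dummy column literally named 'Neighborhood'

# Insert the label under a name that cannot collide with a category,
# directly at position 0, so no column reordering is needed afterwards.
onehot.insert(0, 'Neighborhood Name', venues['Neighborhood'])
print(list(onehot.columns))
```

With this variant the later `groupby` would use `'Neighborhood Name'`, and the `Neighborhood` venue category would stay available as a feature.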

#### Next, grouping rows by neighborhood and taking the mean of the frequency of occurrence of each category

In [42]:
scar_grouped = scar_onehot.groupby('Neighborhood').mean().reset_index()
scar_grouped

Unnamed: 0,Neighborhood,Zoo Exhibit,Afghan Restaurant,American Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Beach,Beer Store,Big Box Store,Bistro,Bookstore,Breakfast Spot,Brewery,Bubble Tea Shop,Buffet,Burger Joint,Burrito Place,Bus Station,Butcher,Café,Campground,Cantonese Restaurant,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Cosmetics Shop,Cupcake Shop,Deli / Bodega,Design Studio,Dessert Shop,Diner,Discount Store,Dog Run,Dumpling Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Food,Food & Drink Shop,French Restaurant,Fried Chicken Joint,Gas Station,Gastropub,General Entertainment,Gift Shop,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hakka Restaurant,Hardware Store,Health Food Store,History Museum,Hockey Arena,Hong Kong Restaurant,Hotel,Hotpot Restaurant,Hungarian Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Juice Bar,Korean Restaurant,Liquor Store,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Movie Theater,Music Store,Nail Salon,National Park,Noodle House,Optical Shop,Organic Grocery,Paper / Office Supplies Store,Park,Peruvian Restaurant,Pet Store,Pharmacy,Pizza Place,Playground,Pool Hall,Portuguese Restaurant,Pub,Restaurant,Rock Climbing Spot,Sandwich Place,Science Museum,Seafood Restaurant,Shopping Mall,Shopping Plaza,Skating Rink,Smoothie Shop,Snack Place,Soccer Field,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Steakhouse,Supermarket,Sushi Restaurant,Szechuan Restaurant,Tattoo Parlor,Tea Room,Thai Restaurant,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wings Joint,Zoo
0,Agincourt,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.02,0.05,0.1,0.0,0.01,0.0,0.04,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.02,0.01,0.03,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.05,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.03,0.0,0.01,0.02,0.03,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.03,0.03,0.0,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0
1,"Agincourt North,L'Amoreaux East,Milliken,Steel...",0.0,0.0,0.01,0.01,0.02,0.0,0.01,0.01,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.06,0.1,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.04,0.0,0.01,0.03,0.01,0.02,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.03,0.02,0.0,0.01,0.02,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.03,0.0,0.0,0.0
2,"Birch Cliff,Cliffside West",0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.01,0.04,0.0,0.01,0.08,0.0,0.0,0.0,0.0,0.04,0.02,0.0,0.0,0.02,0.01,0.0,0.01,0.03,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.06,0.0,0.01,0.01,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.02,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.01,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.02,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.02,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0
3,Cedarbrae,0.0,0.0,0.01,0.01,0.02,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.03,0.0,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.0,0.02,0.04,0.03,0.0,0.01,0.0,0.09,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.01,0.03,0.05,0.0,0.0,0.01,0.01,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.01,0.02,0.02,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0
4,"Clairlea,Golden Mile,Oakridge",0.0,0.01,0.03,0.0,0.02,0.0,0.0,0.02,0.01,0.02,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.02,0.01,0.0,0.01,0.03,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.06,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.02,0.02,0.02,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.01,0.01,0.01,0.01,0.01,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.0
5,"Clarks Corners,Sullivan,Tam O'Shanter",0.0,0.0,0.01,0.01,0.02,0.0,0.0,0.01,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.02,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.01,0.07,0.08,0.0,0.0,0.0,0.04,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.01,0.02,0.0,0.03,0.01,0.0,0.02,0.0,0.05,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.05,0.02,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0
6,"Cliffcrest,Cliffside,Scarborough Village West",0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.02,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.04,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.03,0.0,0.03,0.0,0.08,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.02,0.0,0.04,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.05,0.0,0.01,0.05,0.03,0.0,0.0,0.0,0.02,0.01,0.0,0.05,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0
7,"Dorset Park,Scarborough Town Centre,Wexford He...",0.0,0.0,0.01,0.01,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.03,0.02,0.0,0.0,0.0,0.0,0.01,0.05,0.03,0.0,0.01,0.0,0.06,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.02,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.02,0.01,0.0,0.02,0.02,0.0,0.01,0.01,0.05,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.03,0.03,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.04,0.02,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0
8,"East Birchmount Park,Ionview,Kennedy Park",0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.02,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.0,0.04,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.08,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.02,0.04,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.01,0.01,0.06,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.07,0.0,0.02,0.02,0.01,0.0,0.0,0.0,0.02,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0
9,"Guildwood,Morningside,West Hill",0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.05,0.0,0.0,0.03,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.0,0.01,0.0,0.13,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.03,0.0,0.0,0.0,0.01,0.0,0.02,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.04,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.07,0.04,0.0,0.0,0.0,0.02,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0


In [43]:
# Confirming the new size
scar_grouped.shape

(17, 139)
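
The shape above reads as one row per neighborhood and one column per venue category, plus the `Neighborhood` column. Assuming `scar_grouped` was built the same way as `eto_grouped` later in this notebook (one-hot encode the categories, then take the per-neighborhood mean), each cell is the share of a neighborhood's venues that fall in a category. Below is a minimal plain-Python sketch of that computation. The venue rows are made up for illustration and the helper name `category_shares` is hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (neighborhood, venue category) pairs; not real Foursquare results.
venue_rows = [
    ("Agincourt", "Chinese Restaurant"),
    ("Agincourt", "Chinese Restaurant"),
    ("Agincourt", "Bakery"),
    ("Agincourt", "Coffee Shop"),
    ("Cedarbrae", "Coffee Shop"),
    ("Cedarbrae", "Pizza Place"),
]

def category_shares(rows):
    """Return {neighborhood: {category: share of that neighborhood's venues}}."""
    counts = defaultdict(Counter)
    for neighborhood, category in rows:
        counts[neighborhood][category] += 1
    shares = {}
    for neighborhood, counter in counts.items():
        total = sum(counter.values())
        shares[neighborhood] = {cat: n / total for cat, n in counter.items()}
    return shares

shares = category_shares(venue_rows)
print(shares["Agincourt"])
```

The shares within one neighborhood always add up to 1, which is why the grouped rows can be compared even when neighborhoods return different numbers of venues.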

#### Printing each neighborhood along with the top 5 most common venues

In [44]:
num_top_venues = 5

for hood in scar_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = scar_grouped[scar_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
                  venue  freq
0    Chinese Restaurant  0.10
1     Indian Restaurant  0.05
2  Caribbean Restaurant  0.05
3           Coffee Shop  0.04
4                Bakery  0.04


----Agincourt North,L'Amoreaux East,Milliken,Steeles East----
                  venue  freq
0    Chinese Restaurant  0.10
1                Bakery  0.06
2  Caribbean Restaurant  0.06
3     Indian Restaurant  0.04
4          Noodle House  0.03


----Birch Cliff,Cliffside West----
            venue  freq
0           Beach  0.08
1     Coffee Shop  0.06
2            Park  0.06
3          Bakery  0.04
4  Breakfast Spot  0.04


----Cedarbrae----
                  venue  freq
0           Coffee Shop  0.09
1     Indian Restaurant  0.06
2        Sandwich Place  0.06
3           Pizza Place  0.05
4  Fast Food Restaurant  0.05


----Clairlea,Golden Mile,Oakridge----
                       venue  freq
0                       Park  0.07
1                Coffee Shop  0.06
2                      Beach  0.

#### Putting that into a *pandas* dataframe
First, write a function that sorts the venue categories of one row in descending order of frequency and returns the top entries.

In [45]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
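
To make the ranking step concrete, here is a small plain-Python sketch of the same idea: sort one neighborhood's category frequencies from highest to lowest and keep the first few names. The function name `top_categories` is hypothetical, and the frequencies are the Cedarbrae values printed earlier in this section.

```python
def top_categories(frequencies, num_top_venues):
    """Return the category names with the highest frequencies, highest first."""
    ranked = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)
    return [category for category, _ in ranked[:num_top_venues]]

# Frequencies copied from the Cedarbrae printout above.
cedarbrae = {
    "Coffee Shop": 0.09,
    "Indian Restaurant": 0.06,
    "Sandwich Place": 0.06,
    "Pizza Place": 0.05,
    "Fast Food Restaurant": 0.05,
}

top_three = top_categories(cedarbrae, 3)
print(top_three)
```

Python's `sorted` is stable, so in this sketch categories with equal frequency keep their input order. The pandas version above makes no such promise, so tied categories may appear in a different order there.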

Now, create the new dataframe and display the top 10 venues for each neighborhood.

In [46]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no 'st'/'nd'/'rd' suffix left, so fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = scar_grouped['Neighborhood']

for ind in np.arange(scar_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(scar_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Chinese Restaurant,Indian Restaurant,Caribbean Restaurant,Coffee Shop,Bakery,Park,Sushi Restaurant,Noodle House,Hakka Restaurant,Supermarket
1,"Agincourt North,L'Amoreaux East,Milliken,Steel...",Chinese Restaurant,Bakery,Caribbean Restaurant,Indian Restaurant,Japanese Restaurant,Vietnamese Restaurant,Bubble Tea Shop,Noodle House,Breakfast Spot,Supermarket
2,"Birch Cliff,Cliffside West",Beach,Coffee Shop,Park,Breakfast Spot,Bakery,Café,BBQ Joint,Gastropub,Ice Cream Shop,Fish & Chips Shop
3,Cedarbrae,Coffee Shop,Indian Restaurant,Sandwich Place,Fast Food Restaurant,Pizza Place,Caribbean Restaurant,Breakfast Spot,Pharmacy,Park,Chinese Restaurant
4,"Clairlea,Golden Mile,Oakridge",Park,Coffee Shop,Beach,Middle Eastern Restaurant,Café,American Restaurant,Thai Restaurant,Skating Rink,Breakfast Spot,Gym / Fitness Center
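
The column labels above rely on a three-item suffix list with a fallback to `th`. That is enough for ten columns, but the same logic would produce labels such as "21th" if `num_top_venues` were raised past 20. A possible general helper is sketched below. The name `ordinal` is hypothetical and is not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

# Same column layout as the notebook: 'Neighborhood' plus ten ranked venue columns.
columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, 11)]
print(columns[:4])
```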


<a id='item4'></a>

## 4. Cluster Neighborhoods

Run *k*-means to group the neighborhoods into 5 clusters.

In [47]:
# set number of clusters
kclusters = 5

# keep only the numeric frequency columns (axis must be passed by keyword in recent pandas)
scar_grouped_clustering = scar_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(scar_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 3, 1, 2, 1, 3, 2, 4, 4, 2])
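
For readers who want to see what the clustering step does, the sketch below runs the basic assign-and-update loop of *k*-means (Lloyd's algorithm) in plain Python on four made-up two-column frequency rows. It is only an illustration under simplifying assumptions: the starting centroids are passed in by hand, whereas scikit-learn's `KMeans` chooses its own initialization, and the function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lloyd_kmeans(points, centroids, max_iterations=20):
    """Alternate between assigning points and recomputing centroids until stable."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point takes the index of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: squared_distance(point, centroids[c]))
            for point in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid where it was
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

# Made-up rows: (coffee shop share, Chinese restaurant share) per neighborhood.
toy_rows = [(0.09, 0.01), (0.08, 0.02), (0.01, 0.10), (0.02, 0.10)]
labels, centers = lloyd_kmeans(toy_rows, centroids=[toy_rows[0], toy_rows[2]])
print(labels)
```

The label numbers themselves carry no meaning. They only say which rows ended up together, which is why the clusters are interpreted by inspecting their venues later on.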

#### Creating a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [48]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

scar_merged = scar_data

# join the sorted venues (with cluster labels) onto scar_data, which holds latitude/longitude for each neighborhood
scar_merged = scar_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

scar_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353,0,Zoo Exhibit,Coffee Shop,Pharmacy,Sandwich Place,Gas Station,Fast Food Restaurant,Pizza Place,Burger Joint,Fried Chicken Joint,Bank
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497,0,Zoo Exhibit,Park,Coffee Shop,Pharmacy,Pizza Place,Beer Store,Breakfast Spot,Gas Station,Smoothie Shop,Fast Food Restaurant
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711,2,Coffee Shop,Pharmacy,Sandwich Place,Bank,Fast Food Restaurant,Indian Restaurant,Park,Pizza Place,Gym,Gas Station
3,M1G,Scarborough,Woburn,43.770992,-79.216917,2,Coffee Shop,Fast Food Restaurant,Pizza Place,Pharmacy,Indian Restaurant,Breakfast Spot,Burger Joint,Sandwich Place,Caribbean Restaurant,Grocery Store
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,2,Coffee Shop,Indian Restaurant,Sandwich Place,Fast Food Restaurant,Pizza Place,Caribbean Restaurant,Breakfast Spot,Pharmacy,Park,Chinese Restaurant
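
The `join` call above matches rows on the `Neighborhood` value and, by default, keeps every row of the left table. The plain-Python sketch below mimics that left-join behavior with dictionaries to show what happens to a row that has no match. The helper `left_join` and the third row are hypothetical. The two cluster labels are taken from the table above.

```python
def left_join(rows, lookup, key, columns):
    """Copy the requested columns from lookup[row[key]] onto each row (None if no match)."""
    joined = []
    for row in rows:
        match = lookup.get(row[key], {})
        joined.append({**row, **{col: match.get(col) for col in columns}})
    return joined

scar_rows = [
    {"PostalCode": "M1B", "Neighborhood": "Rouge,Malvern"},
    {"PostalCode": "M1H", "Neighborhood": "Cedarbrae"},
    {"PostalCode": "X0X", "Neighborhood": "Hypothetical unmatched area"},  # illustration only
]
labels_by_neighborhood = {
    "Rouge,Malvern": {"Cluster Labels": 0},
    "Cedarbrae": {"Cluster Labels": 2},
}

merged = left_join(scar_rows, labels_by_neighborhood, "Neighborhood", ["Cluster Labels"])
for row in merged:
    print(row)
```

In pandas the unmatched value would show up as `NaN` rather than `None`, so it is worth checking for missing cluster labels before using them as list indices when drawing the map.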


### Finally, visualizing the resulting clusters

In [49]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(scar_merged['Latitude'], scar_merged['Longitude'], scar_merged['Neighborhood'], scar_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
#map_clusters.save('Clustered_Scarborough_Neighborhood_Map.html') # Saving the map of clustered Scarborough for viewing later.
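
In the map cell above, `ys` contributes only its length to the palette, and the palette is indexed with `cluster-1`, so label 0 picks up the last color. A simpler way to think about the color scheme is one distinct color per label. The sketch below builds such a palette with the standard library. It is an assumption-laden alternative, not the matplotlib `rainbow` colormap used above, and `cluster_palette` is a hypothetical name.

```python
import colorsys

def cluster_palette(n_clusters):
    """Return n_clusters hex colours with evenly spaced hues."""
    palette = []
    for i in range(n_clusters):
        r, g, b = colorsys.hsv_to_rgb(i / n_clusters, 0.8, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = cluster_palette(5)
# Index directly by label so that label 0 takes the first colour.
colour_for_label = {label: palette[label] for label in range(5)}
print(colour_for_label)
```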

<a id='item5'></a>

## 5. Examine Clusters

Now, one can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, one can then assign a name to each cluster.

#### Cluster 1

In [50]:
scar_merged.loc[scar_merged['Cluster Labels'] == 0, scar_merged.columns[[1] + list(range(5, scar_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Scarborough,0,Zoo Exhibit,Coffee Shop,Pharmacy,Sandwich Place,Gas Station,Fast Food Restaurant,Pizza Place,Burger Joint,Fried Chicken Joint,Bank
1,Scarborough,0,Zoo Exhibit,Park,Coffee Shop,Pharmacy,Pizza Place,Beer Store,Breakfast Spot,Gas Station,Smoothie Shop,Fast Food Restaurant
16,Scarborough,0,Zoo Exhibit,Coffee Shop,Fast Food Restaurant,Sandwich Place,Pizza Place,Pharmacy,Gas Station,Grocery Store,Hakka Restaurant,Burger Joint


For cluster one, **Zoo Exhibit** is the most common venue in every neighborhood. This suggests that people can go on excursions here and still find places to relax, such as parks and coffee shops.

#### Cluster 2

In [51]:
scar_merged.loc[scar_merged['Cluster Labels'] == 1, scar_merged.columns[[1] + list(range(5, scar_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Scarborough,1,Park,Coffee Shop,Beach,Middle Eastern Restaurant,Café,American Restaurant,Thai Restaurant,Skating Rink,Breakfast Spot,Gym / Fitness Center
9,Scarborough,1,Beach,Coffee Shop,Park,Breakfast Spot,Bakery,Café,BBQ Joint,Gastropub,Ice Cream Shop,Fish & Chips Shop


Cluster 2 does not have a single clearly distinctive characteristic. However, I name this cluster **Relaxation Venues** because most of its top venues are places to relax (parks, beaches, etc.).

#### Cluster 3

In [52]:
scar_merged.loc[scar_merged['Cluster Labels'] == 2, scar_merged.columns[[1] + list(range(5, scar_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Scarborough,2,Coffee Shop,Pharmacy,Sandwich Place,Bank,Fast Food Restaurant,Indian Restaurant,Park,Pizza Place,Gym,Gas Station
3,Scarborough,2,Coffee Shop,Fast Food Restaurant,Pizza Place,Pharmacy,Indian Restaurant,Breakfast Spot,Burger Joint,Sandwich Place,Caribbean Restaurant,Grocery Store
4,Scarborough,2,Coffee Shop,Indian Restaurant,Sandwich Place,Fast Food Restaurant,Pizza Place,Caribbean Restaurant,Breakfast Spot,Pharmacy,Park,Chinese Restaurant
8,Scarborough,2,Coffee Shop,Pharmacy,Sandwich Place,Park,Burger Joint,Grocery Store,Gym,Bank,Fast Food Restaurant,Pizza Place



In cluster three, as seen above, the most common venue in all four neighborhoods is the **Coffee Shop**, which is likely why these neighborhoods were grouped together. Coffee shops appear to be popular with people in these neighborhoods.

#### Cluster 4

In [53]:
scar_merged.loc[scar_merged['Cluster Labels'] == 3, scar_merged.columns[[1] + list(range(5, scar_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Scarborough,3,Chinese Restaurant,Indian Restaurant,Caribbean Restaurant,Coffee Shop,Bakery,Park,Sushi Restaurant,Noodle House,Hakka Restaurant,Supermarket
13,Scarborough,3,Chinese Restaurant,Caribbean Restaurant,Bakery,Middle Eastern Restaurant,Supermarket,Coffee Shop,Indian Restaurant,Breakfast Spot,Noodle House,Korean Restaurant
14,Scarborough,3,Chinese Restaurant,Bakery,Caribbean Restaurant,Indian Restaurant,Japanese Restaurant,Vietnamese Restaurant,Bubble Tea Shop,Noodle House,Breakfast Spot,Supermarket
15,Scarborough,3,Chinese Restaurant,Caribbean Restaurant,Bubble Tea Shop,Bakery,Japanese Restaurant,Supermarket,Middle Eastern Restaurant,Noodle House,Cantonese Restaurant,Pharmacy



Cluster four can be named **Chinese Restaurant** because the most common venue in every neighborhood of this cluster is the **Chinese Restaurant**. This may reflect a sizeable Chinese community living in or around those neighborhoods.

#### Cluster 5

In [54]:
scar_merged.loc[scar_merged['Cluster Labels'] == 4, scar_merged.columns[[1] + list(range(5, scar_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Scarborough,4,Coffee Shop,Park,Indian Restaurant,Burger Joint,Pharmacy,Fast Food Restaurant,Supermarket,Chinese Restaurant,Pet Store,Burrito Place
6,Scarborough,4,Coffee Shop,Park,Middle Eastern Restaurant,Burger Joint,Indian Restaurant,Gym,Chinese Restaurant,Bakery,Pharmacy,Pet Store
10,Scarborough,4,Coffee Shop,Middle Eastern Restaurant,Indian Restaurant,Caribbean Restaurant,Supermarket,Chinese Restaurant,Pizza Place,Pharmacy,Burger Joint,Bookstore
11,Scarborough,4,Middle Eastern Restaurant,Coffee Shop,Supermarket,Restaurant,Mediterranean Restaurant,Chinese Restaurant,Burrito Place,Caribbean Restaurant,Indian Restaurant,Burger Joint


Cluster five appears to be dominated by restaurants and other eateries, much like cluster three.
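
The naming above was done by eye. As a rough cross-check, the sketch below counts the first-ranked venue of each neighborhood per cluster, using the values shown in the five tables of this section. It only looks at the "1st Most Common Venue" column, so it is a simplification of what *k*-means actually used.

```python
from collections import Counter

# First-ranked venue of each neighborhood, copied from the five cluster tables above
# and keyed by the value in the 'Cluster Labels' column.
first_venues = {
    0: ["Zoo Exhibit", "Zoo Exhibit", "Zoo Exhibit"],
    1: ["Park", "Beach"],
    2: ["Coffee Shop"] * 4,
    3: ["Chinese Restaurant"] * 4,
    4: ["Coffee Shop", "Coffee Shop", "Coffee Shop", "Middle Eastern Restaurant"],
}

# Most frequent first-ranked venue per cluster as a (venue, count) pair.
summary = {label: Counter(venues).most_common(1)[0] for label, venues in first_venues.items()}

for label, (venue, count) in summary.items():
    print(f"Cluster {label + 1}: {venue} ranks first in {count} of {len(first_venues[label])} neighborhoods")
```

Clusters 3 and 5 share the same leading venue here, so telling them apart requires looking further down the ranked columns.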

# Working on Etobicoke Borough

Next, I repeat the analysis for a second borough, segmenting and clustering only the neighborhoods in Etobicoke. I slice the original dataframe to create a new dataframe containing only the Etobicoke data.

In [55]:
eto_data = geo_data_new[geo_data_new['Borough'] == 'Etobicoke'].reset_index(drop=True)
eto_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M8V,Etobicoke,"Humber Bay Shores,Mimico South,New Toronto",43.605647,-79.501321
1,M8W,Etobicoke,"Alderwood,Long Branch",43.602414,-79.543484
2,M8X,Etobicoke,"The Kingsway,Montgomery Road,Old Mill North",43.653654,-79.506944
3,M8Y,Etobicoke,"Humber Bay,King's Mill Park,Kingsway Park Sout...",43.636258,-79.498509
4,M8Z,Etobicoke,"Kingsway Park South West,Mimico NW,The Queensw...",43.628841,-79.520999


As I did with all of Toronto and with Scarborough, I visualize Etobicoke and its neighborhoods.

In [56]:
# Etobicoke Latitude and Longitude
latitude = 43.6205
longitude = -79.5132
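
The map center above is typed in by hand. One alternative, sketched here as an assumption rather than the method used in this notebook, is to center the map on the mean of the neighborhood coordinates. The example uses only the five rows visible in `eto_data.head()`, so the result is illustrative rather than the center of the whole borough.

```python
from statistics import mean

# Coordinates of the five Etobicoke rows shown in eto_data.head() above.
coords = [
    (43.605647, -79.501321),
    (43.602414, -79.543484),
    (43.653654, -79.506944),
    (43.636258, -79.498509),
    (43.628841, -79.520999),
]

center_lat = mean(lat for lat, _ in coords)
center_lon = mean(lon for _, lon in coords)
print(round(center_lat, 4), round(center_lon, 4))
```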

In [57]:
# create map of Etobicoke using latitude and longitude values
map_eto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(eto_data['Latitude'], eto_data['Longitude'], eto_data['Borough'], 
                                           eto_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_eto)  
    
map_eto
#map_eto.save('Etobicoke_neigh_map.html') # Saving the map of Etobicoke for viewing later.

## 2. Employing Foursquare API

Next, I start utilizing the Foursquare API to explore the neighborhoods and segment them.

#### Define my Foursquare Credentials and Version

In [58]:
CLIENT_ID = 'KS0GCILNWWLOOOOTCLGKSVSEWDHCMNLSDDNS1TX1MEITC1C3' # my Foursquare ID
CLIENT_SECRET = 'ACHJPQK2UUDDBV4S1R3JBCIOLDSMFEZN3TI3CFV33ZXLGU3F' # my Foursquare Secret
VERSION = '20180604'
LIMIT = 30
radius = 500
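
Hard-coding credentials as in the cell above exposes them to anyone who can read the notebook. A common alternative is to read them from environment variables. The sketch below shows one way to do that. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the function name are assumptions made for this example, not something Foursquare prescribes.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the client id and secret from a mapping (the process environment by default)."""
    client_id = env.get("FOURSQUARE_CLIENT_ID")          # hypothetical variable name
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")  # hypothetical variable name
    if not client_id or not client_secret:
        raise RuntimeError("Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first.")
    return client_id, client_secret

# Demonstration with a plain dict so the example runs without touching the real environment.
demo_env = {"FOURSQUARE_CLIENT_ID": "demo-id", "FOURSQUARE_CLIENT_SECRET": "demo-secret"}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print(demo_id)
```

In the notebook, the call would be `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` after the two variables have been set in the shell that starts Jupyter.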

#### Exploring the first neighborhood in our dataframe.

In [59]:
# Get the Neighbourhood's name
eto_data.loc[0, 'Neighborhood']

'Humber Bay Shores,Mimico South,New Toronto'

In [60]:
# Get the neighbourhood's latitude and longitude values
neighborhood_latitude = eto_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = eto_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = eto_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Humber Bay Shores,Mimico South,New Toronto are 43.6056466, -79.50132070000001.


#### Now, I get the top 100 venues within a radius of 5000 meters of Humber Bay Shores,Mimico South,New Toronto.

In [61]:
# Getting the URL
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 5000 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=KS0GCILNWWLOOOOTCLGKSVSEWDHCMNLSDDNS1TX1MEITC1C3&client_secret=ACHJPQK2UUDDBV4S1R3JBCIOLDSMFEZN3TI3CFV33ZXLGU3F&v=20180604&ll=43.6056466,-79.50132070000001&radius=5000&limit=100'
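
The URL above is assembled with string formatting. An equivalent and somewhat safer approach is to let the standard library encode the query string, which also avoids the stray `?&` at the start of the query. The sketch below uses placeholder credentials and a hypothetical helper name. The endpoint and parameter names are the ones already used in the cell above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"  # endpoint used in the cell above

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the explore URL with properly encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"

# Placeholder credentials; the coordinates are those of the first Etobicoke neighborhood.
demo_url = build_explore_url("demo-id", "demo-secret", "20180604", 43.6056466, -79.5013207, 5000, 100)
query = parse_qs(urlsplit(demo_url).query)
print(query["ll"], query["radius"], query["limit"])
```

With `requests`, the same dictionary could instead be passed through the `params` argument of `requests.get`, which performs the encoding itself.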

In [62]:
#Send the GET requests and examine the results
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5dff42e11835dd001bb39901'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-5395d784498e085ff3c18198-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/mexican_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d1c1941735',
         'name': 'Mexican Restaurant',
         'pluralName': 'Mexican Restaurants',
         'primary': True,
         'shortName': 'Mexican'}],
       'id': '5395d784498e085ff3c18198',
       'location': {'address': '2888 Lakeshore Blvd. W.',
        'cc': 'CA',
        'city': 'Etobicoke',
        'country': 'Canada',
        'distance': 532,
        'formattedAddress': ['2888 Lakeshore Blvd. W.',
         'Etobicoke ON',
         'Canada'],
        'labeledLatLngs': [{'label': 'display',
          

In [63]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # after json_normalize the column is named 'venue.categories'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Clean the JSON and structure it into a *pandas* dataframe.

In [64]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Huevos Gourmet,Mexican Restaurant,43.601188,-79.503717
1,LCBO,Liquor Store,43.602281,-79.499302
2,Sweet Olenka's,Dessert Shop,43.601099,-79.500325
3,Kitchen on 6th,Breakfast Spot,43.601396,-79.504563
4,SanRemo Bakery,Bakery,43.618542,-79.499485
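
The dotted column names such as `venue.location.lat` come from flattening the nested JSON. The plain-Python sketch below shows the idea on a trimmed-down item modeled on the first table row above. The item keeps only a few fields and the helper `flatten` is hypothetical, so treat it as an illustration of the naming scheme rather than a replacement for `json_normalize`.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one dict with dotted keys; lists are left untouched."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat

# Trimmed-down item; only the fields used in this notebook are kept.
item = {
    "venue": {
        "name": "Huevos Gourmet",
        "categories": [{"name": "Mexican Restaurant", "shortName": "Mexican"}],
        "location": {"lat": 43.601188, "lng": -79.503717},
    },
}

flat_item = flatten(item)
print(sorted(flat_item))
```

Because `venue.categories` stays a list of dictionaries after flattening, a helper such as `get_category_type` is still needed to pull out the category name.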


### And how many venues were returned by Foursquare?

In [65]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


## Explore Neighborhoods in Etobicoke

#### Creating a function to repeat the same process to all the neighborhoods in Etobicoke

In [66]:
def getNearbyVenues(names, latitudes, longitudes, radius=10000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
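
The list comprehension inside `getNearbyVenues` indexes `categories[0]` directly, which would raise an `IndexError` for a venue that has no category. The sketch below shows one defensive way to extract the same seven fields from a list of response items. The helper name `venue_rows` and the sample items are hypothetical.

```python
def venue_rows(name, lat, lng, items):
    """Turn Foursquare 'items' into tuples, tolerating missing or empty fields."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows

# Made-up items: the second venue has no category at all.
sample_items = [
    {"venue": {"name": "Venue A", "location": {"lat": 43.60, "lng": -79.50},
               "categories": [{"name": "Bakery"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 43.61, "lng": -79.51},
               "categories": []}},
]

rows = venue_rows("Sample neighborhood", 43.6056, -79.5013, sample_items)
for row in rows:
    print(row)
```

Rows with a `None` category can then be dropped or labeled explicitly before the one-hot encoding step.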

#### Running the function on each neighborhood to create a new dataframe called *eto_venues*.

In [67]:
eto_venues = getNearbyVenues(names=eto_data['Neighborhood'],
                                   latitudes=eto_data['Latitude'],
                                   longitudes=eto_data['Longitude']
                                  )

Humber Bay Shores,Mimico South,New Toronto
Alderwood,Long Branch
The Kingsway,Montgomery Road,Old Mill North
Humber Bay,King's Mill Park,Kingsway Park South East,Mimico NE,Old Mill South,The Queensway East,Royal York South East,Sunnylea
Kingsway Park South West,Mimico NW,The Queensway West,Royal York South West,South of Bloor
Cloverdale,Islington,Martin Grove,Princess Gardens,West Deane Park
Bloordale Gardens,Eringate,Markland Wood,Old Burnhamthorpe
Westmount
Kingsview Village,Martin Grove Gardens,Richview Gardens,St. Phillips
Albion Gardens,Beaumond Heights,Humbergate,Jamestown,Mount Olive,Silverstone,South Steeles,Thistletown
Northwest


In [68]:
# Checking how many venues were returned for each neighborhood
eto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Albion Gardens,Beaumond Heights,Humbergate,Jamestown,Mount Olive,Silverstone,South Steeles,Thistletown",100,100,100,100,100,100
"Alderwood,Long Branch",100,100,100,100,100,100
"Bloordale Gardens,Eringate,Markland Wood,Old Burnhamthorpe",100,100,100,100,100,100
"Cloverdale,Islington,Martin Grove,Princess Gardens,West Deane Park",100,100,100,100,100,100
"Humber Bay Shores,Mimico South,New Toronto",100,100,100,100,100,100
"Humber Bay,King's Mill Park,Kingsway Park South East,Mimico NE,Old Mill South,The Queensway East,Royal York South East,Sunnylea",100,100,100,100,100,100
"Kingsview Village,Martin Grove Gardens,Richview Gardens,St. Phillips",100,100,100,100,100,100
"Kingsway Park South West,Mimico NW,The Queensway West,Royal York South West,South of Bloor",100,100,100,100,100,100
Northwest,100,100,100,100,100,100
"The Kingsway,Montgomery Road,Old Mill North",100,100,100,100,100,100


#### How many unique categories can be curated from all the returned venues?

In [69]:
print('There are {} unique categories.'.format(len(eto_venues['Venue Category'].unique())))

There are 116 unique categories.


<a id='item3'></a>

## 3. Analyze Each Neighborhood

In [70]:
# one hot encoding
eto_onehot = pd.get_dummies(eto_venues[['Venue Category']], prefix ="", prefix_sep ="")

# add neighborhood column back to dataframe
eto_onehot['Neighborhood'] = eto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [eto_onehot.columns[-1]] + list(eto_onehot.columns[:-1])
eto_onehot = eto_onehot[fixed_columns]

eto_onehot.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,Airport Lounge,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Automotive Shop,BBQ Joint,Bakery,Bar,Beer Bar,Bookstore,Bowling Alley,Breakfast Spot,Brewery,Bridge,Burger Joint,Burrito Place,Butcher,Café,Caribbean Restaurant,Chinese Restaurant,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Convenience Store,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Dog Run,Donut Shop,Eastern European Restaurant,Electronics Store,Farmers Market,Field,Fish & Chips Shop,Flower Shop,Food,Food Court,French Restaurant,Furniture / Home Store,Garden,Gastropub,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Hardware Store,Historic Site,Hobby Shop,Hockey Arena,Hookah Bar,Hotel,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Kebab Restaurant,Korean Restaurant,Latin American Restaurant,Liquor Store,Massage Studio,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Movie Theater,Music Store,New American Restaurant,Nightclub,Outdoor Supply Store,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pizza Place,Pool Hall,Portuguese Restaurant,Poutine Place,Racetrack,Recreation Center,Restaurant,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Skating Rink,Snack Place,Soccer Stadium,South American Restaurant,Spa,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Tapas Restaurant,Tea Room,Thai Restaurant,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wings Joint,Yoga Studio
0,"Humber Bay Shores,Mimico South,New Toronto",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Humber Bay Shores,Mimico South,New Toronto",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Humber Bay Shores,Mimico South,New Toronto",0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Humber Bay Shores,Mimico South,New Toronto",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Humber Bay Shores,Mimico South,New Toronto",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [71]:
# Examining the new dataframe size.
eto_onehot.shape

(1100, 117)
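
The shape above is consistent with 1100 venue rows (11 neighborhoods with 100 venues each) and 116 category columns plus the `Neighborhood` column. To illustrate what one-hot encoding does to each venue row, here is a minimal plain-Python sketch. The helper `one_hot` is hypothetical, and the five category names are simply sample values taken from the `nearby_venues` table earlier in this section.

```python
def one_hot(categories):
    """Return (column names, rows) where each row has a single 1 in its category's column."""
    columns = sorted(set(categories))
    rows = [[1 if category == column else 0 for column in columns] for category in categories]
    return columns, rows

# Sample category values; one entry per venue.
sample = ["Mexican Restaurant", "Liquor Store", "Dessert Shop", "Breakfast Spot", "Bakery"]
columns, rows = one_hot(sample)

print(columns)
for row in rows:
    print(row)
```

Every encoded row contains exactly one 1, so averaging the rows of a neighborhood gives the share of its venues in each category.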

#### Next, grouping rows by neighborhood and taking the mean of the frequency of occurrence of each category

In [72]:
eto_grouped = eto_onehot.groupby('Neighborhood').mean().reset_index()
eto_grouped

Unnamed: 0,Neighborhood,Afghan Restaurant,Airport Lounge,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Automotive Shop,BBQ Joint,Bakery,Bar,Beer Bar,Bookstore,Bowling Alley,Breakfast Spot,Brewery,Bridge,Burger Joint,Burrito Place,Butcher,Café,Caribbean Restaurant,Chinese Restaurant,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Convenience Store,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Dog Run,Donut Shop,Eastern European Restaurant,Electronics Store,Farmers Market,Field,Fish & Chips Shop,Flower Shop,Food,Food Court,French Restaurant,Furniture / Home Store,Garden,Gastropub,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Hardware Store,Historic Site,Hobby Shop,Hockey Arena,Hookah Bar,Hotel,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Kebab Restaurant,Korean Restaurant,Latin American Restaurant,Liquor Store,Massage Studio,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Movie Theater,Music Store,New American Restaurant,Nightclub,Outdoor Supply Store,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pizza Place,Pool Hall,Portuguese Restaurant,Poutine Place,Racetrack,Recreation Center,Restaurant,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Skating Rink,Snack Place,Soccer Stadium,South American Restaurant,Spa,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Tapas Restaurant,Tea Room,Thai Restaurant,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wings Joint,Yoga Studio
0,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",0.01,0.01,0.02,0.0,0.0,0.02,0.0,0.01,0.04,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.03,0.0,0.01,0.01,0.02,0.04,0.01,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.02,0.0,0.0,0.02,0.02,0.01,0.01,0.0,0.01,0.0,0.06,0.02,0.02,0.04,0.01,0.0,0.0,0.01,0.03,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.01,0.04,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.05,0.01,0.04,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.02,0.02,0.0,0.0,0.01,0.0
1,"Alderwood,Long Branch",0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.04,0.01,0.0,0.01,0.0,0.03,0.02,0.01,0.04,0.02,0.0,0.03,0.0,0.01,0.01,0.0,0.0,0.07,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.01,0.02,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.01,0.01,0.01,0.0,0.0,0.03,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.05,0.02,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.04,0.01,0.01,0.0,0.0,0.0,0.01,0.02,0.01,0.01,0.02,0.02,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.01
2,"Bloordale Gardens,Eringate,Markland Wood,Old B...",0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.05,0.0,0.0,0.01,0.0,0.01,0.02,0.0,0.04,0.03,0.0,0.03,0.0,0.01,0.01,0.0,0.0,0.04,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.04,0.0,0.01,0.02,0.01,0.01,0.04,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.03,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.04,0.01,0.01,0.0,0.0,0.0,0.01,0.02,0.01,0.03,0.01,0.01,0.01,0.01,0.0,0.01,0.0,0.01,0.01,0.02,0.01,0.0,0.0,0.0,0.01
3,"Cloverdale,Islington,Martin Grove,Princess Gar...",0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.04,0.01,0.0,0.01,0.0,0.01,0.02,0.01,0.03,0.02,0.0,0.05,0.0,0.01,0.01,0.0,0.0,0.04,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.03,0.0,0.02,0.01,0.01,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.02,0.01,0.01,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.01,0.04,0.01,0.01,0.0,0.0,0.0,0.01,0.02,0.0,0.03,0.01,0.01,0.01,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01
4,"Humber Bay Shores,Mimico South,New Toronto",0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.04,0.02,0.02,0.01,0.0,0.01,0.03,0.01,0.03,0.02,0.0,0.04,0.0,0.0,0.01,0.0,0.01,0.03,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.02,0.0,0.0,0.02,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.01,0.03,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.01,0.01,0.04,0.01,0.01,0.01,0.01,0.0,0.01,0.02,0.0,0.01,0.01,0.01,0.01,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01
5,"Humber Bay,King's Mill Park,Kingsway Park Sout...",0.0,0.0,0.02,0.01,0.0,0.02,0.0,0.0,0.05,0.04,0.02,0.02,0.0,0.0,0.02,0.01,0.01,0.0,0.01,0.08,0.0,0.0,0.0,0.0,0.06,0.03,0.0,0.0,0.0,0.02,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.01,0.02,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.01,0.01,0.0,0.0,0.01,0.06,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.04,0.01,0.02,0.0,0.03,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.01
6,"Kingsview Village,Martin Grove Gardens,Richvie...",0.01,0.01,0.01,0.0,0.0,0.01,0.01,0.0,0.02,0.02,0.01,0.01,0.0,0.01,0.02,0.0,0.01,0.0,0.0,0.04,0.01,0.03,0.0,0.01,0.0,0.05,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.02,0.01,0.0,0.02,0.01,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.01,0.01,0.01,0.01,0.01,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.05,0.02,0.01,0.01,0.04,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.04,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.01,0.0,0.03,0.02,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.04,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01
7,"Kingsway Park South West,Mimico NW,The Queensw...",0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.01,0.0,0.01,0.0,0.02,0.02,0.01,0.03,0.02,0.0,0.05,0.0,0.01,0.01,0.0,0.0,0.05,0.0,0.0,0.01,0.01,0.0,0.02,0.0,0.02,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.02,0.0,0.02,0.01,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.04,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.09,0.0,0.0,0.04,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.01,0.04,0.01,0.01,0.01,0.0,0.0,0.01,0.02,0.0,0.01,0.01,0.02,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01
8,Northwest,0.01,0.01,0.01,0.0,0.01,0.02,0.0,0.01,0.04,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.02,0.0,0.0,0.03,0.02,0.03,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.02,0.01,0.02,0.0,0.0,0.04,0.0,0.01,0.0,0.01,0.0,0.02,0.06,0.0,0.03,0.01,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.0,0.03,0.02,0.03,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.01,0.0,0.0,0.0,0.01
9,"The Kingsway,Montgomery Road,Old Mill North",0.0,0.0,0.02,0.01,0.0,0.02,0.0,0.0,0.02,0.04,0.02,0.01,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.05,0.04,0.0,0.0,0.0,0.02,0.01,0.02,0.0,0.02,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.01,0.02,0.03,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.08,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.03,0.01,0.01,0.01,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01


In [73]:
# Confirming the new size
eto_grouped.shape

(11, 117)

#### Printing each neighborhood along with the top 5 most common venues

In [74]:
num_top_venues = 5

for hood in eto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = eto_grouped[eto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Albion Gardens,Beaumond Heights,Humbergate,Jamestown,Mount Olive,Silverstone,South Steeles,Thistletown----
                venue  freq
0               Hotel  0.06
1          Steakhouse  0.05
2          Restaurant  0.04
3              Bakery  0.04
4  Chinese Restaurant  0.04


----Alderwood,Long Branch----
                venue  freq
0         Coffee Shop  0.07
1                Park  0.05
2  Seafood Restaurant  0.04
3              Bakery  0.04
4        Burger Joint  0.04


----Bloordale Gardens,Eringate,Markland Wood,Old Burnhamthorpe----
                    venue  freq
0                  Bakery  0.05
1            Burger Joint  0.04
2  Furniture / Home Store  0.04
3      Seafood Restaurant  0.04
4             Coffee Shop  0.04


----Cloverdale,Islington,Martin Grove,Princess Gardens,West Deane Park----
           venue  freq
0           Café  0.05
1           Park  0.05
2         Bakery  0.04
3    Coffee Shop  0.04
4  Grocery Store  0.04


----Humber Bay Shores,Mimico South,New Toro

#### Putting that into a *pandas* dataframe
First, writing a function that sorts a neighborhood's venue categories by frequency in descending order and returns the most common ones.

In [75]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
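
As a quick check of the helper above, the sketch below applies it to a single made-up row. The neighborhood name and the frequencies are illustrative assumptions, not values from the dataset, and the function body is repeated only so that the snippet runs on its own.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name), keep only the frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# hypothetical row: a neighborhood name followed by venue-category frequencies
toy_row = pd.Series({
    'Neighborhood': 'Example Hood',
    'Bakery': 0.10,
    'Coffee Shop': 0.40,
    'Hotel': 0.05,
    'Park': 0.25,
})

top3 = return_most_common_venues(toy_row, 3)
print(list(top3))
```

The function returns the column names (venue categories), not the frequencies, which is why its result can be written straight into the "Most Common Venue" columns.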

Now, creating the new dataframe and displaying the top 10 venues for each neighborhood.

In [76]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no suffix left in `indicators`, so fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = eto_grouped['Neighborhood']

for ind in np.arange(eto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(eto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",Hotel,Steakhouse,Restaurant,Bakery,Italian Restaurant,Sushi Restaurant,Chinese Restaurant,Liquor Store,Pizza Place,Coffee Shop
1,"Alderwood,Long Branch",Coffee Shop,Park,Burger Joint,Seafood Restaurant,Bakery,Furniture / Home Store,Pizza Place,Breakfast Spot,Liquor Store,Grocery Store
2,"Bloordale Gardens,Eringate,Markland Wood,Old B...",Bakery,Grocery Store,Seafood Restaurant,Coffee Shop,Burger Joint,Furniture / Home Store,Japanese Restaurant,Café,Middle Eastern Restaurant,Burrito Place
3,"Cloverdale,Islington,Martin Grove,Princess Gar...",Café,Park,Seafood Restaurant,Grocery Store,Coffee Shop,Bakery,Furniture / Home Store,Ice Cream Shop,Steakhouse,Burger Joint
4,"Humber Bay Shores,Mimico South,New Toronto",Park,Ice Cream Shop,Café,Restaurant,Seafood Restaurant,Bakery,Pizza Place,Burger Joint,Coffee Shop,Italian Restaurant
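
The column labels above come from a three-element suffix list with a fallback to "th", which is fine for 10 columns but would produce "21th" if `num_top_venues` were ever raised past 20. The helper below is a minimal sketch of a more general alternative; the name `ordinal` is hypothetical and not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11, 12 and 13 (and 111, 112, ...) take 'th' even though they end in 1, 2, 3
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns[1:4], columns[-1])
```

For the 10 columns used here, this yields the same labels as the `try`/`except` version.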


<a id='item4'></a>

## 4. Cluster Neighborhoods

Run *k*-means to group the neighborhoods into 5 clusters.

In [77]:
# set number of clusters
kclusters = 5

eto_grouped_clustering = eto_grouped.drop(columns='Neighborhood')  # k-means needs numeric features only

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(eto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 4, 4, 2, 2, 0, 3, 2, 1, 0])
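
With only 11 neighborhoods split into 5 clusters, each cluster holds just a few rows, so it can help to count the members per cluster before interpreting them. The sketch below shows the idea on a made-up frequency matrix; `toy_freqs` and its values are assumptions for illustration, and `n_init` is set explicitly only to avoid version-dependent defaults in scikit-learn.

```python
import numpy as np
from sklearn.cluster import KMeans

# hypothetical frequency matrix: 4 neighborhoods x 3 venue categories
toy_freqs = np.array([
    [0.90, 0.05, 0.05],
    [0.85, 0.10, 0.05],
    [0.05, 0.05, 0.90],
    [0.10, 0.05, 0.85],
])

toy_kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(toy_freqs)
labels = toy_kmeans.labels_

# number of neighborhoods assigned to each cluster label
sizes = np.bincount(labels, minlength=2)
print(labels, sizes)
```

The numeric label given to each group is arbitrary; only the grouping itself is meaningful. The same `np.bincount` call could be applied to `kmeans.labels_` with `minlength=kclusters`.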

#### Creating a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [78]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

eto_merged = eto_data

# join neighborhoods_venues_sorted onto eto_data so each neighborhood keeps its latitude/longitude and gains its cluster label and top venues
eto_merged = eto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

eto_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M8V,Etobicoke,"Humber Bay Shores,Mimico South,New Toronto",43.605647,-79.501321,2,Park,Ice Cream Shop,Café,Restaurant,Seafood Restaurant,Bakery,Pizza Place,Burger Joint,Coffee Shop,Italian Restaurant
1,M8W,Etobicoke,"Alderwood,Long Branch",43.602414,-79.543484,4,Coffee Shop,Park,Burger Joint,Seafood Restaurant,Bakery,Furniture / Home Store,Pizza Place,Breakfast Spot,Liquor Store,Grocery Store
2,M8X,Etobicoke,"The Kingsway,Montgomery Road,Old Mill North",43.653654,-79.506944,0,Park,Café,Cocktail Bar,Coffee Shop,Bar,Ice Cream Shop,Brewery,Grocery Store,Seafood Restaurant,Japanese Restaurant
3,M8Y,Etobicoke,"Humber Bay,King's Mill Park,Kingsway Park Sout...",43.636258,-79.498509,0,Café,Cocktail Bar,Park,Bakery,Restaurant,Bar,Ice Cream Shop,Seafood Restaurant,Coffee Shop,Pizza Place
4,M8Z,Etobicoke,"Kingsway Park South West,Mimico NW,The Queensw...",43.628841,-79.520999,2,Park,Café,Coffee Shop,Pizza Place,Seafood Restaurant,Italian Restaurant,Grocery Store,Restaurant,Bakery,Burger Joint
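
Because `join` keeps every row of the left dataframe, any neighborhood without a match in `neighborhoods_venues_sorted` (for example, one for which no venues were returned) would end up with `NaN` in `Cluster Labels`. The sketch below illustrates one way to detect and handle that case; the toy frames and their values are hypothetical stand-ins, not the notebook's data.

```python
import pandas as pd

# hypothetical stand-ins for eto_data and neighborhoods_venues_sorted
toy_data = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'Latitude': [43.60, 43.61, 43.62],
    'Longitude': [-79.50, -79.51, -79.52],
})
toy_sorted = pd.DataFrame({
    'Cluster Labels': [1, 0],
    'Neighborhood': ['A', 'B'],
    '1st Most Common Venue': ['Park', 'Hotel'],
})

toy_merged = toy_data.join(toy_sorted.set_index('Neighborhood'), on='Neighborhood')

# rows with no match ('C' here) get NaN in the joined columns
unmatched = toy_merged[toy_merged['Cluster Labels'].isna()]
print(unmatched['Neighborhood'].tolist())

# drop them and restore an integer dtype before using the labels as list indices
toy_merged = (
    toy_merged.dropna(subset=['Cluster Labels'])
              .astype({'Cluster Labels': int})
)
```

In the output shown above every neighborhood received a label, so this check is a precaution rather than a required step.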


### Finally, visualizing the resulting clusters

In [79]:
# create map
eto_map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(eto_merged['Latitude'], eto_merged['Longitude'], eto_merged['Neighborhood'],  eto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(eto_map_clusters)
       
eto_map_clusters
#eto_map_clusters.save('Clustered_Etobicoke_Neighborhood_Map.html') # Saving the map of clustered Etobicoke for viewing later.
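
The color scheme in the cell above only needs one distinct color per cluster. The following is a minimal sketch of a shorter way to build such a palette, indexed directly by the 0-based cluster label; the names `palette` and `cluster_color` are hypothetical, and the colors it assigns to each cluster will differ from those on the map above.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5

# sample kclusters evenly spaced colors from the rainbow colormap
palette = [colors.to_hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

def cluster_color(label):
    # look up the hex color for a 0-based cluster label
    return palette[int(label)]

print(palette)
```

The result of `cluster_color(cluster)` could then be passed as `color` and `fill_color` to `folium.CircleMarker`.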

<a id='item5'></a>

## 5. Examine Clusters

Now one can examine each cluster and determine the venue categories that distinguish it. Based on these defining categories, one can then assign a name to each cluster.
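
One way to support this inspection with numbers is to count how often each venue category appears among the top-venue columns of each cluster. The sketch below does this on a made-up frame; `toy` and its contents are assumptions for illustration and would be replaced by `eto_merged` in practice.

```python
import pandas as pd

# hypothetical stand-in for eto_merged with only two "top venue" columns
toy = pd.DataFrame({
    'Cluster Labels': [0, 0, 1, 1],
    '1st Most Common Venue': ['Park', 'Café', 'Hotel', 'Coffee Shop'],
    '2nd Most Common Venue': ['Café', 'Park', 'Steakhouse', 'Hotel'],
})

venue_cols = [c for c in toy.columns if c.endswith('Most Common Venue')]

# reshape to one row per (cluster, venue) pair, then count appearances per cluster
long = toy.melt(id_vars='Cluster Labels', value_vars=venue_cols, value_name='Venue')
counts = long.groupby(['Cluster Labels', 'Venue']).size()

print(counts.sort_values(ascending=False))
```

Categories with high counts in one cluster and low counts in the others are the natural candidates for naming that cluster.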

#### Cluster 1

In [80]:
eto_merged.loc[eto_merged['Cluster Labels'] == 0, eto_merged.columns[[1] + list(range(5, eto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Etobicoke,0,Park,Café,Cocktail Bar,Coffee Shop,Bar,Ice Cream Shop,Brewery,Grocery Store,Seafood Restaurant,Japanese Restaurant
3,Etobicoke,0,Café,Cocktail Bar,Park,Bakery,Restaurant,Bar,Ice Cream Shop,Seafood Restaurant,Coffee Shop,Pizza Place


Cluster 1 does not have clearly distinctive characteristics. However, I name this cluster **Relaxation Venues** because most of its venues appear to be places for relaxation (parks, cafés, bars, etc.).

#### Cluster 2

In [81]:
eto_merged.loc[eto_merged['Cluster Labels'] == 1, eto_merged.columns[[1] + list(range(5, eto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Etobicoke,1,Hotel,Steakhouse,Restaurant,Bakery,Italian Restaurant,Sushi Restaurant,Chinese Restaurant,Liquor Store,Pizza Place,Coffee Shop
10,Etobicoke,1,Coffee Shop,Hotel,Bakery,Grocery Store,Middle Eastern Restaurant,Chinese Restaurant,Café,Sushi Restaurant,Indian Restaurant,Steakhouse


Cluster 2 does not have clearly distinctive characteristics. However, I name this cluster **Housing Venues** because most of its venues appear to cater to people staying in the area: hotels, steakhouses, coffee shops, etc. These neighbourhoods seem to house people, possibly travellers or visitors.

#### Cluster 3

In [82]:
eto_merged.loc[eto_merged['Cluster Labels'] == 2, eto_merged.columns[[1] + list(range(5, eto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Etobicoke,2,Park,Ice Cream Shop,Café,Restaurant,Seafood Restaurant,Bakery,Pizza Place,Burger Joint,Coffee Shop,Italian Restaurant
4,Etobicoke,2,Park,Café,Coffee Shop,Pizza Place,Seafood Restaurant,Italian Restaurant,Grocery Store,Restaurant,Bakery,Burger Joint
5,Etobicoke,2,Café,Park,Seafood Restaurant,Grocery Store,Coffee Shop,Bakery,Furniture / Home Store,Ice Cream Shop,Steakhouse,Burger Joint


Cluster 3 does not have clearly distinctive characteristics. However, I name this cluster **Leisure Place** because most of its venues appear to be places for passing leisure time (parks, ice cream shops, cafés, etc.).

#### Cluster 4

In [83]:
eto_merged.loc[eto_merged['Cluster Labels'] == 3, eto_merged.columns[[1] + list(range(5, eto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Etobicoke,3,Park,Hotel,Steakhouse,Coffee Shop,Grocery Store,Japanese Restaurant,Café,Eastern European Restaurant,Chinese Restaurant,Restaurant
8,Etobicoke,3,Hotel,Coffee Shop,Japanese Restaurant,Grocery Store,Park,Café,Steakhouse,Chinese Restaurant,Middle Eastern Restaurant,Seafood Restaurant


Cluster 4 looks somewhat like Cluster 1 for Etobicoke. It is possible that the neighbourhoods in these two clusters are alike in terms of their environment, topography, and features.

#### Cluster 5

In [84]:
eto_merged.loc[eto_merged['Cluster Labels'] == 4, eto_merged.columns[[1] + list(range(5, eto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Etobicoke,4,Coffee Shop,Park,Burger Joint,Seafood Restaurant,Bakery,Furniture / Home Store,Pizza Place,Breakfast Spot,Liquor Store,Grocery Store
6,Etobicoke,4,Bakery,Grocery Store,Seafood Restaurant,Coffee Shop,Burger Joint,Furniture / Home Store,Japanese Restaurant,Café,Middle Eastern Restaurant,Burrito Place


Cluster 5 appears to consist mostly of restaurants and other eateries.