## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

    
<font size = 2>

0. <a href="#0.-Imports-and-variable-initialization">Imports and Variable Initialization</a>
    
</font>
<font size = 3>

1. <a href="#1.-Scraping-Wikipedia">Scraping Wikipedia</a>
    
    1.1. <a href="#Final-size-after-scraping-the-Wikipedia">DataFrame final size after scraping</a>
    
    1.2. <a href="#Enrich-neighbourhood-data-with-GPS-coordinates">Enrich neighbourhood data with GPS coordinates</a>

2. <a href="#2.-Use-Foursquare-API-to-get-information-about-venues">Use Foursquare API to get information about venues</a>
    
3. <a href="#3.-Explore-Neighborhoods-in-Toronto">Explore Neighbourhoods in Toronto</a>

4. <a href="#4.-Analyze-Each-Neighborhood">Analyze Each Neighborhood</a>

5. <a href="#5.-Cluster-Neighborhoods">Cluster Neighborhoods</a>

6. <a href="#6.-Examine-Clusters">Examine Clusters</a>
</font>
</div>

# 0. Imports and variable initialization

Run this cell if the __folium__ library cannot be found.

In [3]:
!pip install --user folium

Collecting folium
  Downloading https://files.pythonhosted.org/packages/fd/a0/ccb3094026649cda4acd55bf2c3822bb8c277eb11446d13d384e5be35257/folium-0.10.1-py2.py3-none-any.whl (91kB)
Collecting branca>=0.3.0 (from folium)
  Downloading https://files.pythonhosted.org/packages/81/6d/31c83485189a2521a75b4130f1fee5364f772a0375f81afff619004e5237/branca-0.4.0-py3-none-any.whl
Installing collected packages: branca, folium
Successfully installed branca-0.4.0 folium-0.10.1


In [5]:
# geocoder has to be installed before it can be imported
!pip install geocoder

Collecting geocoder
  Downloading https://files.pythonhosted.org/packages/4f/6b/13166c909ad2f2d76b929a4227c952630ebaf0d729f6317eb09cbceccbab/geocoder-1.38.1-py2.py3-none-any.whl (98kB)
Collecting ratelim (from geocoder)
  Downloading https://files.pythonhosted.org/packages/f2/98/7e6d147fd16a10a5f821db6e25f192265d6ecca3d82957a4fdd592cad49c/ratelim-0.1.6-py2.py3-none-any.whl
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6


In [4]:
#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab

All necessary imports

In [19]:
import requests
from bs4 import BeautifulSoup

import pandas as pd
try:
    from pandas import json_normalize  # transform JSON into a pandas DataFrame (pandas >= 1.0)
except ImportError:
    from pandas.io.json import json_normalize  # older pandas versions
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import numpy as np
import folium

# import k-means from clustering stage
from sklearn.cluster import KMeans

# geocoder library for retrieving coordinates using postal codes
import geocoder
#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim, GoogleV3 # convert an address into latitude and longitude values
import time
import random
import math

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

Initialize project storage for files

In [7]:
# The code was removed by Watson Studio for sharing.

# 1. Scraping Wikipedia

<font size=2><a href="#Table-of-Contents">go to toc</a></font>

To scrape the Wikipedia page with the postal codes of Toronto, I will use the __*BeautifulSoup*__ Python library.

### Scrape the Wikipedia page using the requests and BeautifulSoup libraries

In [2]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

In [3]:
# function which returns a soup object
def get_soup(url):
    """Return a BeautifulSoup object for the provided URL."""
    response = requests.get(url)
    response.raise_for_status()  # fail early on HTTP errors instead of parsing an error page
    soup = BeautifulSoup(response.text, 'lxml')
    return soup

In [4]:
soup = get_soup(url)

Find the table that contains the postal code data

In [15]:
find = soup.find(class_='wikitable').tbody

In [16]:
rows = find.find_all('tr')[1:]  # skip the header row
rows[:5]

[<tr>
 <td>M1A</td>
 <td>Not assigned</td>
 <td>Not assigned
 </td></tr>, <tr>
 <td>M2A</td>
 <td>Not assigned</td>
 <td>Not assigned
 </td></tr>, <tr>
 <td>M3A</td>
 <td><a href="/wiki/North_York" title="North York">North York</a></td>
 <td><a href="/wiki/Parkwoods" title="Parkwoods">Parkwoods</a>
 </td></tr>, <tr>
 <td>M4A</td>
 <td><a href="/wiki/North_York" title="North York">North York</a></td>
 <td><a href="/wiki/Victoria_Village" title="Victoria Village">Victoria Village</a>
 </td></tr>, <tr>
 <td>M5A</td>
 <td><a href="/wiki/Downtown_Toronto" title="Downtown Toronto">Downtown Toronto</a></td>
 <td><a href="/wiki/Regent_Park" title="Regent Park">Harbourfront</a>
 </td></tr>]

### Parse the table data row by row

In [54]:
postals = []
boroughs = []
neighs = {}  # map: postal code -> comma-separated neighbourhood names
for row in rows:
    cells = row.find_all('td')
    postal = cells[0].get_text(strip=True)
    borough = cells[1].get_text(strip=True)
    neigh = cells[2].get_text(strip=True)
    if borough != 'Not assigned':
        # if the neighbourhood is 'Not assigned', use the borough name instead
        name = borough if neigh == 'Not assigned' else neigh
        if postal in neighs:
            # postal code already seen: append this neighbourhood after a comma
            neighs[postal] += ',' + name
        else:
            # first occurrence of this postal code: create a new entry
            postals.append(postal)
            boroughs.append(borough)
            neighs[postal] = name
print('We have found {} neighbourhoods, {} postal codes.'.format(len(neighs), len(postals)))

We have found 103 neighbourhoods, 103 postal codes.


In [55]:
# Just to double check that all the data lists have the same size
len(postals), len(boroughs), len(neighs)

(103, 103, 103)
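The parsing rules above (skip rows whose borough is 'Not assigned', fall back to the borough name when the neighbourhood is 'Not assigned', join neighbourhoods sharing a postal code with commas) can be checked without hitting Wikipedia. A minimal sketch, assuming a hypothetical `group_neighbourhoods` helper and a few illustrative rows (the `M9Z`/`Example Borough` row is made up, not taken from the real table):

```python
# Illustrative rows only: (postal code, borough, neighbourhood); not the full Wikipedia table
sample_rows = [
    ('M1A', 'Not assigned', 'Not assigned'),
    ('M3A', 'North York', 'Parkwoods'),
    ('M5A', 'Downtown Toronto', 'Harbourfront'),
    ('M5A', 'Downtown Toronto', 'Regent Park'),
    ('M9Z', 'Example Borough', 'Not assigned'),  # hypothetical row
]


def group_neighbourhoods(rows):
    """Hypothetical helper applying the same rules as the parsing loop above.

    Returns (postals, boroughs, neighs), where neighs maps each postal code
    to a comma-separated string of neighbourhood names.
    """
    postals, boroughs, neighs = [], [], {}
    for postal, borough, neigh in rows:
        if borough == 'Not assigned':
            continue  # postal code not in use
        name = borough if neigh == 'Not assigned' else neigh
        if postal in neighs:
            neighs[postal] += ',' + name  # same postal code: join with a comma
        else:
            postals.append(postal)
            boroughs.append(borough)
            neighs[postal] = name
    return postals, boroughs, neighs


postals_demo, boroughs_demo, neighs_demo = group_neighbourhoods(sample_rows)
print(neighs_demo)
```

Keeping the rules in a pure function like this makes it easy to test them on hand-written rows before running the scraper.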

### Create DataFrame with all the scraped data

In [56]:
# Fill only the postal_code and borough columns for now
toronto = pd.DataFrame(list(zip(postals, boroughs)), columns=['postal_code', 'borough'])
toronto.head()

|   | postal_code | borough |
|---|---|---|
| 0 | M3A | North York |
| 1 | M4A | North York |
| 2 | M5A | Downtown Toronto |
| 3 | M6A | North York |
| 4 | M7A | Downtown Toronto |


#### Fill the neighbourhoods column

In [57]:
# Create the neighbourhood column: replace every postal code with its
# neighbourhood names from the neighs dictionary
toronto['neighbourhood'] = toronto['postal_code'].replace(neighs)
toronto.head()

|   | postal_code | borough | neighbourhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront |
| 3 | M6A | North York | Lawrence Heights,Lawrence Manor |
| 4 | M7A | Downtown Toronto | Queen's Park |


In [59]:
# just to make sure that all postal codes were replaced
toronto[toronto.postal_code == toronto.neighbourhood]

|   | postal_code | borough | neighbourhood |
|---|---|---|---|


### Final size after scraping the Wikipedia

In [60]:
toronto.shape

(103, 3)

### Save the resulting DataFrame for future use

In [62]:
#project.save_data('toronto_data.csv', toronto.to_csv())

{'file_name': 'toronto_data.csv',
 'message': 'File saved to project storage.',
 'bucket_name': 'applieddatasciencecapstoneproject-donotdelete-pr-4kvrowiaaunvu2',
 'asset_id': '686bddf8-9fea-4647-a155-3c93f0616099'}

### Read the saved result from file

In [8]:
toronto = pd.read_csv(project.get_file('toronto_data.csv'), index_col=0)
toronto.head()

|   | postal_code | borough | neighbourhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront |
| 3 | M6A | North York | Lawrence Heights,Lawrence Manor |
| 4 | M7A | Downtown Toronto | Queen's Park |


## Enrich neighbourhood data with GPS coordinates

Make empty columns for coordinates

In [9]:
toronto['latitude'] = np.nan
toronto['longitude'] = np.nan
toronto.head()

|   | postal_code | borough | neighbourhood | latitude | longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | NaN | NaN |
| 1 | M4A | North York | Victoria Village | NaN | NaN |
| 2 | M5A | Downtown Toronto | Harbourfront | NaN | NaN |
| 3 | M6A | North York | Lawrence Heights,Lawrence Manor | NaN | NaN |
| 4 | M7A | Downtown Toronto | Queen's Park | NaN | NaN |


In [84]:
# The code was removed by Watson Studio for sharing.

In [None]:
#g = geocoder.arcgis('M3A, Toronto, Ontario')
#g = geocoder.google('M3A, Toronto, Ontario', key=GOOGLE_KEY)
#g = geocoder.geolytica('M1B, Toronto, Canada')

In [35]:
g = geocoder.google('M1B, Toronto, Ontario', components="country:CA")
g

<[REQUEST_DENIED] Google - Geocode [empty]>

After the latest changes to the Google Maps Geocoding API, geocoder's Google provider no longer works here: the request above, made without an API key, returns `REQUEST_DENIED`.
I will use __*ArcGIS*__ for geocoding instead.

In [11]:
MAX_TRIES = 2
found = 0

for index, item in toronto.iterrows():
    # skip postal codes that already have coordinates
    if pd.notna(item.latitude):
        continue

    location = None
    tries = 0

    # geocoder returns a result object even when the lookup fails, so check .ok rather than None
    while (location is None or not location.ok) and tries <= MAX_TRIES:
        # ArcGIS
        location = geocoder.arcgis('{}, Toronto, Ontario'.format(item.postal_code))
        # Google (requires an API key)
        # location = geocoder.google('{}, Toronto, Ontario'.format(item.postal_code), key=GOOGLE_KEY)
        tries += 1
        if not location.ok:
            # delay before the next retry
            time.sleep(0.3 + .2 * random.random())

    if location is not None and location.ok:
        toronto.loc[index, 'latitude'] = location.json['lat']
        toronto.loc[index, 'longitude'] = location.json['lng']
        found += 1
    else:
        print('%s skipped' % item.postal_code)

    # small delay before the next request so as not to overload the server
    time.sleep(1 + .3 * random.random())
print('%i coordinates found' % found)

103 coordinates found


In [14]:
print("{} out of {} coordinates found ({}%)".format(found, toronto.shape[0], round(100*found/toronto.shape[0], 2)))

103 out of 103 coordinates found (100.0%)
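The retry-and-wait logic in the ArcGIS loop above can be factored into a small reusable helper. A minimal sketch, assuming a hypothetical `call_with_retries` helper (not part of geocoder or the original notebook) and a stand-in lookup function instead of a real geocoding request:

```python
import random
import time


def call_with_retries(func, max_tries=3, base_delay=0.3, jitter=0.2):
    """Hypothetical helper: call func() until it returns a truthy result.

    Makes at most max_tries attempts and sleeps base_delay + jitter * random.random()
    seconds between failed attempts. Returns the last result, which may be falsy.
    """
    result = None
    for attempt in range(max_tries):
        result = func()
        if result:
            return result
        if attempt < max_tries - 1:
            time.sleep(base_delay + jitter * random.random())
    return result


# Stand-in lookup that fails twice and then succeeds (no network involved)
calls = {'n': 0}


def flaky_lookup():
    calls['n'] += 1
    return (43.75, -79.33) if calls['n'] >= 3 else None


coords = call_with_retries(flaky_lookup, max_tries=3, base_delay=0.01, jitter=0.0)
print(coords, calls['n'])
```

For ArcGIS, `func` could be a small wrapper that calls `geocoder.arcgis(...)` and returns the result only when its `.ok` attribute is true, so a failed lookup counts as a miss.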


#### Nominatim

Please skip this section of the document. It was used for geocoding during the first attempt: I could not get all the coordinates from ArcGIS, so I used Nominatim instead.

In [22]:
locator = Nominatim(user_agent="toronto_explorer_coursera")

In [23]:
from geopy.extra.rate_limiter import RateLimiter
geocode = RateLimiter(locator.geocode, min_delay_seconds=1)
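geopy's `RateLimiter` keeps consecutive geocoding calls at least `min_delay_seconds` apart. The core idea can be illustrated with a hand-rolled wrapper; this is a simplified sketch with a hypothetical `MinDelayLimiter` class, not geopy's implementation (it has none of RateLimiter's error handling or retries), and it wraps `str.upper` as a stand-in for a geocoder:

```python
import time


class MinDelayLimiter:
    """Hypothetical wrapper: consecutive calls are at least min_delay_seconds apart."""

    def __init__(self, func, min_delay_seconds=1.0):
        self.func = func
        self.min_delay_seconds = min_delay_seconds
        self._last_call = None  # time.monotonic() of the previous call

    def __call__(self, *args, **kwargs):
        if self._last_call is not None:
            # sleep only for the part of the delay that has not already elapsed
            wait = self.min_delay_seconds - (time.monotonic() - self._last_call)
            if wait > 0:
                time.sleep(wait)
        self._last_call = time.monotonic()
        return self.func(*args, **kwargs)


# Usage with a stand-in function instead of a real geocoder
limited_upper = MinDelayLimiter(str.upper, min_delay_seconds=0.05)
start = time.monotonic()
limited_results = [limited_upper(code) for code in ['m1b', 'm1c', 'm1e']]
elapsed = time.monotonic() - start
print(limited_results, round(elapsed, 2))
```

Using `time.monotonic()` rather than `time.time()` keeps the delay correct even if the system clock is adjusted while the loop runs.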

In [31]:
MAX_TRIES = 5
found = 0

for index, item in toronto.iterrows():
    if not np.isnan(item.latitude):
        continue
        
    location = None
    tries = 0
    
    while((location is None) and (tries <= MAX_TRIES)):
        location = geocode('{}, Toronto, Ontario'.format(item.postal_code))
        tries += 1
    
    if location:
        toronto.loc[index, 'latitude'] = location.latitude
        toronto.loc[index, 'longitude'] = location.longitude
        found += 1
    else:
        print('%s skipped' % item.postal_code)
        
    # make small delay before next request not to overload the server
    #time.sleep(1 + .3 * random.random() )
print('%i coordinates found' % found)

M4A skipped
M5A skipped
M6A skipped
M9A skipped
M3B skipped
M4B skipped
M5B skipped
M6B skipped
M4C skipped
M5C skipped
M6C skipped
M1E skipped
M4E skipped
M6E skipped
M4G skipped
M5G skipped
M6G skipped
M1H skipped
M2H skipped
M4H skipped
M6H skipped
M1J skipped
M4J skipped
M1K skipped
M2K skipped
M3K skipped
M4K skipped
M5K skipped
M1L skipped


RateLimiter caught an error, retrying (0/2 tries). Called with (*('M2L, Toronto, Ontario',), **{}).
Traceback (most recent call last):
  File "/opt/conda/envs/Python36/lib/python3.6/urllib/request.py", line 1318, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/opt/conda/envs/Python36/lib/python3.6/http/client.py", line 1254, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/opt/conda/envs/Python36/lib/python3.6/http/client.py", line 1300, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/opt/conda/envs/Python36/lib/python3.6/http/client.py", line 1249, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/conda/envs/Python36/lib/python3.6/http/client.py", line 1036, in _send_output
    self.send(msg)
  File "/opt/conda/envs/Python36/lib/python3.6/http/client.py", line 974, in send
    self.connect()
  File "/opt/conda/envs/Python36/lib/python3.6/http

M2L skipped
M3L skipped
M5L skipped
M6L skipped
M9L skipped
M1M skipped
M3M skipped
M4M skipped
M5M skipped
M6M skipped
M9M skipped
M1N skipped
M3N skipped
M4N skipped
M5N skipped
M9N skipped
M1P skipped
M2P skipped
M4P skipped
M5P skipped
M9P skipped
M1R skipped
M2R skipped
M4R skipped
M5R skipped
M6R skipped
M7R skipped
M1S skipped
M4S skipped
M5S skipped
M1T skipped
M4T skipped
M5T skipped
M1V skipped
M4V skipped
M8V skipped
M9V skipped
M4W skipped
M5W skipped
M8W skipped
M9W skipped
M1X skipped
M5X skipped
M8X skipped
M4Y skipped
M7Y skipped
M8Y skipped
M8Z skipped
26 coordinates found


### Save file with coordinates

In [15]:
project.save_data('toronto_enriched_arcgis_data.csv', toronto.to_csv())

{'file_name': 'toronto_enriched_arcgis_data.csv',
 'message': 'File saved to project storage.',
 'bucket_name': 'applieddatasciencecapstoneproject-donotdelete-pr-4kvrowiaaunvu2',
 'asset_id': 'c1ad7320-f88b-4183-9de2-529656a81a20'}

## Let's verify the coordinates from the assignment file against those requested from ArcGIS & Nominatim

To make sure that the coordinates received from ArcGIS are accurate, I want to check them against the document from the assignment instructions.

I downloaded the document with coordinates from https://cocl.us/Geospatial_data and saved it to the project.

In [17]:
geo_df = pd.read_csv(project.get_file('Geospatial_Coordinates.csv'), index_col=0)
geo_df.head()

| Postal Code | Latitude | Longitude |
|---|---|---|
| M1B | 43.806686 | -79.194353 |
| M1C | 43.784535 | -79.160497 |
| M1E | 43.763573 | -79.188711 |
| M1G | 43.770992 | -79.216917 |
| M1H | 43.773136 | -79.239476 |


In [18]:
geo_df.columns=['latitude', 'longitude']
geo_df.index.name = 'postal_code'
geo_df.head()

| postal_code | latitude | longitude |
|---|---|---|
| M1B | 43.806686 | -79.194353 |
| M1C | 43.784535 | -79.160497 |
| M1E | 43.763573 | -79.188711 |
| M1G | 43.770992 | -79.216917 |
| M1H | 43.773136 | -79.239476 |


Let's prepare the data to calculate the distance between the ArcGIS coordinates and the reference data (downloaded from the submission link).

In [24]:
geo_df.sort_values(by='postal_code', inplace=True)
geo_df.shape, geo_df.head(), geo_df.tail()

((103, 2),               latitude  longitude
 postal_code                      
 M1B          43.806686 -79.194353
 M1C          43.784535 -79.160497
 M1E          43.763573 -79.188711
 M1G          43.770992 -79.216917
 M1H          43.773136 -79.239476,               latitude  longitude
 postal_code                      
 M9N          43.706876 -79.518188
 M9P          43.696319 -79.532242
 M9R          43.688905 -79.554724
 M9V          43.739416 -79.588437
 M9W          43.706748 -79.594054)

In [22]:
toronto_geo = toronto.loc[:, ['postal_code', 'latitude', 'longitude']]
toronto_geo.set_index('postal_code', inplace=True)
toronto_geo.sort_values(by='postal_code', inplace=True)
toronto_geo.shape, toronto_geo.head(), toronto_geo.tail()

((103, 2),               latitude  longitude
 postal_code                      
 M1B          43.811525 -79.195517
 M1C          43.785665 -79.158725
 M1E          43.765815 -79.175193
 M1G          43.768369 -79.217590
 M1H          43.769688 -79.239440,               latitude  longitude
 postal_code                      
 M9N          43.704845 -79.517546
 M9P          43.696505 -79.530252
 M9R          43.686810 -79.557284
 M9V          43.743145 -79.584664
 M9W          43.711740 -79.579181)

To calculate the distance between two points, I use the following function (haversine formula).
Adapted from: https://stackoverflow.com/questions/365826/calculate-distance-between-2-gps-coordinates
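For reference, this is the haversine formula that the function implements, with latitudes $\varphi_1, \varphi_2$ and longitudes $\lambda_1, \lambda_2$ in radians, $\Delta\varphi = \varphi_2 - \varphi_1$, $\Delta\lambda = \lambda_2 - \lambda_1$, and Earth radius $R = 6371$ km:

$$
a = \sin^2\!\left(\frac{\Delta\varphi}{2}\right) + \sin^2\!\left(\frac{\Delta\lambda}{2}\right)\cos\varphi_1\cos\varphi_2,
\qquad
c = 2\,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1-a}\right),
\qquad
d = R\,c
$$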

In [26]:
def degToRad(degrees):
    """Convert degrees to radians."""
    return degrees * math.pi / 180


def distanceInKmBetweenEarthCoordinates(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees (haversine formula)."""
    earthRadiusKm = 6371

    dLat = degToRad(lat2 - lat1)
    dLon = degToRad(lon2 - lon1)

    lat1 = degToRad(lat1)
    lat2 = degToRad(lat2)

    a = math.sin(dLat / 2) ** 2 + \
        math.sin(dLon / 2) ** 2 * math.cos(lat1) * math.cos(lat2)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return earthRadiusKm * c
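Because `geo_df.apply(..., axis=1)` calls the function once per row, a vectorised NumPy version of the same formula can compute all distances in one call. A minimal sketch; `haversine_km` and `EARTH_RADIUS_KM` are names introduced here for illustration:

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # same radius as distanceInKmBetweenEarthCoordinates


def haversine_km(lat1, lon1, lat2, lon2):
    """Vectorised haversine distance in km; accepts scalars or array-likes in degrees."""
    lat1, lon1, lat2, lon2 = (np.radians(np.asarray(x, dtype=float))
                              for x in (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arctan2(np.sqrt(a), np.sqrt(1 - a))


# M1B: reference coordinates vs ArcGIS coordinates from the tables in this notebook
d = haversine_km(43.806686, -79.194353, 43.811525, -79.195517)
print(round(float(d), 3))
```

On the full data this would be `geo_df['distance'] = haversine_km(geo_df.latitude, geo_df.longitude, geo_df.lat2, geo_df.lon2)`, which should give the same values as the row-wise `apply`.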

In [27]:
geo_df['lat2'] = toronto_geo.latitude
geo_df['lon2'] = toronto_geo.longitude
geo_df.head()

| postal_code | latitude | longitude | lat2 | lon2 |
|---|---|---|---|---|
| M1B | 43.806686 | -79.194353 | 43.811525 | -79.195517 |
| M1C | 43.784535 | -79.160497 | 43.785665 | -79.158725 |
| M1E | 43.763573 | -79.188711 | 43.765815 | -79.175193 |
| M1G | 43.770992 | -79.216917 | 43.768369 | -79.21759 |
| M1H | 43.773136 | -79.239476 | 43.769688 | -79.23944 |


In [28]:
geo_df['distance'] = geo_df.apply(lambda i: distanceInKmBetweenEarthCoordinates(i.latitude, i.longitude, i.lat2, i.lon2), axis=1)
geo_df.head()

| postal_code | latitude | longitude | lat2 | lon2 | distance |
|---|---|---|---|---|---|
| M1B | 43.806686 | -79.194353 | 43.811525 | -79.195517 | 0.546087 |
| M1C | 43.784535 | -79.160497 | 43.785665 | -79.158725 | 0.189821 |
| M1E | 43.763573 | -79.188711 | 43.765815 | -79.175193 | 1.113854 |
| M1G | 43.770992 | -79.216917 | 43.768369 | -79.21759 | 0.29662 |
| M1H | 43.773136 | -79.239476 | 43.769688 | -79.23944 | 0.383411 |


In [29]:
geo_df.describe()

|   | latitude | longitude | lat2 | lon2 | distance |
|---|---|---|---|---|---|
| count | 103.0 | 103.0 | 103.0 | 103.0 | 103.0 |
| mean | 43.704608 | -79.397153 | 43.704649 | -79.394625 | 0.690483 |
| std | 0.052463 | 0.097146 | 0.052485 | 0.094421 | 1.884545 |
| min | 43.602414 | -79.615819 | 43.601131 | -79.584664 | 0.009285 |
| 25% | 43.660567 | -79.464763 | 43.658649 | -79.451152 | 0.193943 |
| 50% | 43.696948 | -79.38879 | 43.69677 | -79.385964 | 0.385005 |
| 75% | 43.74532 | -79.340923 | 43.74552 | -79.345634 | 0.615509 |
| max | 43.836125 | -79.160497 | 43.834215 | -79.158725 | 18.583662 |


In [48]:
# let's show the top 10 differences
distances = geo_df.sort_values('distance', ascending=False).head(10)
distances

| postal_code | latitude | longitude | lat2 | lon2 | distance |
|---|---|---|---|---|---|
| M7R | 43.636966 | -79.615819 | 43.64869 | -79.38544 | 18.583662 |
| M7Y | 43.662744 | -79.321558 | 43.64869 | -79.38544 | 5.371623 |
| M3L | 43.739015 | -79.506944 | 43.72014 | -79.51698 | 2.248349 |
| M5J | 43.640816 | -79.381752 | 43.63021 | -79.362433 | 1.951398 |
| M5V | 43.628947 | -79.39442 | 43.640815 | -79.399538 | 1.382475 |
| M9W | 43.706748 | -79.594054 | 43.71174 | -79.579181 | 1.318036 |
| M1M | 43.716316 | -79.239476 | 43.724235 | -79.227925 | 1.279452 |
| M6A | 43.718518 | -79.464763 | 43.72327 | -79.451286 | 1.205087 |
| M4C | 43.695344 | -79.318389 | 43.68964 | -79.306874 | 1.122214 |
| M1E | 43.763573 | -79.188711 | 43.765815 | -79.175193 | 1.113854 |


Let's visualize the differences on the map

In [47]:
latitude = 43.717899
longitude = -79.395

# create map of Toronto using latitude and longitude values
toronto_map = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map: green = reference coordinates, red = ArcGIS coordinates
for lat, lng, postal_code, lat2, lon2 in zip(distances['latitude'], distances['longitude'], distances.index, distances['lat2'], distances['lon2']):
    label = '{}'.format(postal_code)
    
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='green',
        fill=True,
        fill_color='#53cc31',
        fill_opacity=0.7,
        parse_html=False).add_to(toronto_map)  

    label = folium.Popup(postal_code + ' arcgis', parse_html=True)
    folium.CircleMarker(
        [lat2, lon2],
        radius=3,
        popup=label,
        color='red',
        fill=True,
        fill_color='#cc1f1f',
        fill_opacity=0.7,
        parse_html=False).add_to(toronto_map)
    
    folium.vector_layers.PolyLine([(lat, lng), (lat2, lon2)], color='black', opacity=0.4).add_to(toronto_map)

toronto_map

__The coordinates returned by ArcGIS do not look like a reliable source: their mean distance from the reference coordinates is about 690 meters (0.69 km), which is a large error, and even the median is about 385 meters. For further analysis I will use the coordinates provided in the assignment description (https://cocl.us/Geospatial_data).__
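The mean in `describe()` is pulled up by a few large outliers such as M7R (about 18.6 km), so the median and the share of large errors give a complementary picture. A minimal sketch on a small Series whose values are copied from the distance tables above (five postal codes plus the M7R outlier); on the full data the same calls would apply to `geo_df['distance']`:

```python
import pandas as pd

# Distances (km) copied from the tables above: the first five postal codes plus the M7R outlier
sample_distances = pd.Series(
    [0.546087, 0.189821, 1.113854, 0.29662, 0.383411, 18.583662],
    index=['M1B', 'M1C', 'M1E', 'M1G', 'M1H', 'M7R'])

print('mean   (km):', round(sample_distances.mean(), 3))
print('median (km):', round(sample_distances.median(), 3))
print('share > 1 km:', round((sample_distances > 1).mean(), 3))
```

A single outlier is enough to move the mean far above the median, which is why both numbers are worth reporting.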

In [56]:
geospatial_data = pd.read_csv(project.get_file('Geospatial_Coordinates.csv'))
geospatial_data.head()

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [50]:
#toronto = pd.read_csv(project.get_file('toronto_data.csv'), index_col=0)
#toronto.head()

|   | postal_code | borough | neighbourhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront |
| 3 | M6A | North York | Lawrence Heights,Lawrence Manor |
| 4 | M7A | Downtown Toronto | Queen's Park |


In [52]:
toronto.drop(columns=['latitude', 'longitude'], inplace=True) # drop columns with previous coordinates

In [53]:
toronto.shape, toronto.head() # toronto neighbourhoods scraped from wikipedia

((103, 3),   postal_code           borough                    neighbourhood
 0         M3A        North York                        Parkwoods
 1         M4A        North York                 Victoria Village
 2         M5A  Downtown Toronto                     Harbourfront
 3         M6A        North York  Lawrence Heights,Lawrence Manor
 4         M7A  Downtown Toronto                     Queen's Park)

In [57]:
geospatial_data.shape # referential coordinates from coursera

(103, 3)

In [58]:
toronto_data = toronto.merge(geospatial_data, left_on='postal_code', right_on='Postal Code', how='left')
toronto_data.head()

|   | postal_code | borough | neighbourhood | Postal Code | Latitude | Longitude |
|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | M3A | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | M4A | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Harbourfront | M5A | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Heights,Lawrence Manor | M6A | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park | M7A | 43.662301 | -79.389494 |


In [59]:
toronto_data.tail()

|   | postal_code | borough | neighbourhood | Postal Code | Latitude | Longitude |
|---|---|---|---|---|---|---|
| 98 | M8X | Etobicoke | The Kingsway,Montgomery Road,Old Mill North | M8X | 43.653654 | -79.506944 |
| 99 | M4Y | Downtown Toronto | Church and Wellesley | M4Y | 43.66586 | -79.38316 |
| 100 | M7Y | East Toronto | Business Reply Mail Processing Centre 969 Eastern | M7Y | 43.662744 | -79.321558 |
| 101 | M8Y | Etobicoke | Humber Bay,King's Mill Park,Kingsway Park Sout... | M8Y | 43.636258 | -79.498509 |
| 102 | M8Z | Etobicoke | Kingsway Park South West,Mimico NW,The Queensw... | M8Z | 43.628841 | -79.520999 |


In [60]:
toronto_data.shape

(103, 6)
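The merge above relies on every postal code appearing exactly once in both tables. pandas can check this directly: `validate='one_to_one'` raises a `MergeError` on duplicate keys, and `indicator=True` adds a `_merge` column that flags rows without a match. A minimal sketch on hypothetical miniature versions of `toronto` and `geospatial_data` (M5A is deliberately left without coordinates):

```python
import pandas as pd

# Hypothetical miniature versions of toronto and geospatial_data
left = pd.DataFrame({'postal_code': ['M3A', 'M4A', 'M5A'],
                     'borough': ['North York', 'North York', 'Downtown Toronto']})
right = pd.DataFrame({'Postal Code': ['M3A', 'M4A'],
                      'Latitude': [43.753259, 43.725882],
                      'Longitude': [-79.329656, -79.315572]})

merged = left.merge(right, left_on='postal_code', right_on='Postal Code',
                    how='left', validate='one_to_one', indicator=True)

# postal codes that found no coordinates in the reference file
unmatched = merged.loc[merged['_merge'] == 'left_only', 'postal_code'].tolist()
print(unmatched)
```

With the real data, an empty `unmatched` list together with the unchanged row count (103) would confirm that every neighbourhood received coordinates.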

In [61]:
# Save intermediate results to file:
# list of neighbourhoods scraped from Wikipedia with coordinates provided by the course supervisors
project.save_data('toronto_full.csv', toronto_data.to_csv())

{'file_name': 'toronto_full.csv',
 'message': 'File saved to project storage.',
 'bucket_name': 'applieddatasciencecapstoneproject-donotdelete-pr-4kvrowiaaunvu2',
 'asset_id': '554631f3-8ea5-40d7-bcb4-8ee096b601c2'}

In [58]:
toronto_data = pd.read_csv(project.get_file('toronto_full.csv'), index_col = 0)
toronto_data.head()

|   | postal_code | borough | neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Heights,Lawrence Manor | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park | 43.662301 | -79.389494 |


In [61]:
toronto_data.shape

(103, 6)

# Visualize a map of Toronto with neighbourhoods

In [62]:
latitude = 43.717899
longitude = -79.395

# create map of Toronto using latitude and longitude values
toronto_map = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['borough'], toronto_data['neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(toronto_map)  
    
toronto_map

### Filter out neighbourhoods outside Toronto

In [63]:
toronto_data['borough'].unique()

array(['North York', 'Downtown Toronto', 'Etobicoke', 'Scarborough',
       'East York', 'York', 'East Toronto', 'West Toronto',
       'Central Toronto', 'Mississauga'], dtype=object)

In [66]:
toronto_boroughs = ['North York',  
                    'Etobicoke', 
                    'East York',
                    'York',
                    'Scarborough',
                    'Downtown Toronto',
                    'East Toronto',
                    'West Toronto',
                    'Central Toronto']

According to https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Toronto, only the 9 boroughs listed above are part of Toronto, so let's filter out the rest (Mississauga).

In [76]:
toronto_data = toronto_data[toronto_data['borough'].isin(toronto_boroughs)].reset_index(drop=True)
toronto_data.shape

(102, 6)

In [75]:
toronto_data.head()

|   | postal_code | borough | neighbourhood | Postal Code | Latitude | Longitude |
|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | M3A | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | M4A | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Harbourfront | M5A | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Heights,Lawrence Manor | M6A | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park | M7A | 43.662301 | -79.389494 |


In [77]:
# create map of Toronto using latitude and longitude values
toronto_map = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['borough'], toronto_data['neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(toronto_map)  
    
toronto_map

# 2. Use Foursquare API to get information about venues

<font size=2><a href="#Table-of-Contents">go to toc</a></font>

#### Define Foursquare credentials

In [78]:
# The code was removed by Watson Studio for sharing.

Your credentials set


In [82]:
neigh_lat = toronto_data.loc[0, 'Latitude'] # neighborhood latitude value
neigh_lon = toronto_data.loc[0, 'Longitude'] # neighborhood longitude value

neigh_name = toronto_data.loc[0, 'neighbourhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neigh_name, 
                                                               neigh_lat, 
                                                               neigh_lon))

Latitude and longitude values of Parkwoods are 43.7532586, -79.3296565.


In [84]:
url = ("https://api.foursquare.com/v2/venues/explore?"
       "client_id={cid}&client_secret={csecret}&v={version}&ll={lat},{lon}&radius={rad}&limit={limit}"
       ).format(cid=CLIENT_ID, csecret=CLIENT_SECRET, version=VERSION,
                lat=neigh_lat, lon=neigh_lon,
                rad=500,
                limit=100)

print('Url set')
#url

Url set
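Building the query string by hand makes it easy to miss URL encoding. The standard library's `urllib.parse.urlencode` builds it from a dict instead; `requests.get(base_url, params=...)` does the same encoding internally. A minimal sketch with placeholder credentials (`YOUR_CLIENT_ID`, `YOUR_CLIENT_SECRET` and the version date `20180605` are assumptions, not the values from the removed credentials cell):

```python
from urllib.parse import urlencode

# Placeholder values; the real CLIENT_ID, CLIENT_SECRET and VERSION are in the removed credentials cell
params = {
    'client_id': 'YOUR_CLIENT_ID',
    'client_secret': 'YOUR_CLIENT_SECRET',
    'v': '20180605',
    'll': '{},{}'.format(43.7532586, -79.3296565),  # Parkwoods coordinates from above
    'radius': 500,
    'limit': 100,
}
explore_url = 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)
print(explore_url)
```

Note that `urlencode` percent-encodes the comma in `ll` as `%2C`; the server decodes it back to `43.7532586,-79.3296565`.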


In [85]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e667560949393001b71f66f'},
  'headerLocation': 'Parkwoods - Donalda',
  'headerFullLocation': 'Parkwoods - Donalda, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 3,
  'suggestedBounds': {'ne': {'lat': 43.757758604500005,
    'lng': -79.32343823984928},
   'sw': {'lat': 43.7487585955, 'lng': -79.33587476015072}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4e8d9dcdd5fbbbb6b3003c7b',
       'name': 'Brookbanks Park',
       'location': {'address': 'Toronto',
        'lat': 43.751976046055574,
        'lng': -79.33214044722958,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.751976046055574,
          'lng': -79.33214044722958}],
        'distance': 245,
        'cc': 'CA',
        'c

In [86]:
# function that extracts the category of the venue
def get_category_type(row):
    # the column name depends on how the JSON was flattened
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [87]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

|   | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Brookbanks Park | Park | 43.751976 | -79.33214 |
| 1 | Variety Store | Food & Drink Shop | 43.751974 | -79.333114 |
| 2 | TTC stop - 44 Valley Woods | Bus Stop | 43.755402 | -79.333741 |
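The same flattening steps (normalize the nested JSON, reduce `venue.categories` to the first category name, strip the column prefixes) can be tried offline on a hand-written item. A minimal sketch assuming pandas >= 1.0 for `pd.json_normalize`; the items are trimmed to the fields used here, the category content is assumed for illustration, and the second venue is hypothetical:

```python
import pandas as pd

# Trimmed-down items in the shape of the explore response (only the fields used here)
items = [
    {'venue': {'name': 'Brookbanks Park',
               'location': {'lat': 43.751976046055574, 'lng': -79.33214044722958},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Uncategorised Spot',  # hypothetical venue without a category
               'location': {'lat': 43.75, 'lng': -79.33},
               'categories': []}},
]

flat = pd.json_normalize(items)  # nested keys become 'venue.name', 'venue.location.lat', ...
# keep only the first category name, or None when the list is empty
flat['venue.categories'] = flat['venue.categories'].apply(
    lambda cats: cats[0]['name'] if len(cats) > 0 else None)
# 'venue.location.lat' -> 'lat', etc.
flat.columns = [col.split('.')[-1] for col in flat.columns]
print(flat[['name', 'categories', 'lat', 'lng']])
```

Lists such as `categories` are not expanded by `json_normalize`; they stay as Python lists in a single column, which is why the extra `apply` step is needed.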


# 3. Explore Neighborhoods in Toronto

<font size=2><a href="#Table-of-Contents">go to toc</a></font>

### 3.1 Get neighbourhood venues data using Foursquare API

Let's use the same function as in the hands-on lab, with added error handling, to explore the neighbourhoods in Toronto.

In [88]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return a DataFrame with one row per venue found around each neighbourhood.

    Uses CLIENT_ID, CLIENT_SECRET, VERSION and LIMIT, which are assumed to be
    defined in the (removed) credentials cell. Stops after 5 failed requests.
    """
    err_count = 0
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # give up after too many failed requests (e.g. when the API quota is exhausted)
        if err_count >= 5:
            break
        print(name, end=' : ')

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request
        response = requests.get(url)
        if response:
            try:
                results = response.json()['response']['groups'][0]['items']
            except KeyError as err:
                print("PARSING ERROR: {}:\nRESPONSE: {}".format(err, response.text))
                err_count += 1
                continue
        else:
            print("ERROR: {}".format(response.text))
            err_count += 1
            continue

        # keep only the relevant information for each nearby venue;
        # venues without a category get None instead of raising IndexError
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None)
            for v in results])

        print(len(venues_list[-1]))

    # flatten the per-neighbourhood lists; passing columns also works when nothing was found
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list],
                                 columns=['Neighborhood',
                                          'Neighborhood Latitude',
                                          'Neighborhood Longitude',
                                          'Venue',
                                          'Venue Latitude',
                                          'Venue Longitude',
                                          'Venue Category'])
    return nearby_venues

In [89]:
# get info about venues in Toronto
toronto_venues = getNearbyVenues(names=toronto_data['neighbourhood'],
                                latitudes=toronto_data['Latitude'],
                                longitudes=toronto_data['Longitude'],
                                radius=1000)

Parkwoods : 29
Victoria Village : 13
Harbourfront : 100
Lawrence Heights,Lawrence Manor : 49
Queen's Park : 100
Islington Avenue : 12
Rouge,Malvern : 16
Don Mills North : 30
Woodbine Gardens,Parkview Hill : 19
Ryerson,Garden District : 100
Glencairn : 31
Cloverdale,Islington,Martin Grove,Princess Gardens,West Deane Park : 17
Highland Creek,Rouge Hill,Port Union : 5
Flemingdon Park,Don Mills South : 44
Woodbine Heights : 27
St. James Town : 100
Humewood-Cedarvale : 26
Bloordale Gardens,Eringate,Markland Wood,Old Burnhamthorpe : 17
Guildwood,Morningside,West Hill : 25
The Beaches : 81
Berczy Park : 100
Caledonia-Fairbanks : 22
Woburn : 8
Leaside : 62
Central Bay Street : 100
Christie : 100
Cedarbrae : 32
Hillcrest Village : 21
Bathurst Manor,Downsview North,Wilson Heights : 29
Thorncliffe Park : 50
Adelaide,King,Richmond : 100
Dovercourt Village,Dufferin : 71
Scarborough Village : 12
Fairview,Henry Farm,Oriole : 44
Northwood Park,York University : 22
East Toronto : 100
Harbourfront East,

In [90]:
toronto_venues.shape

(4842, 7)

In [91]:
toronto_venues.tail()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
4837,"Kingsway Park South West,Mimico NW,The Queensw...",43.628841,-79.520999,High Seas Restaurant,43.636058,-79.520502,Mediterranean Restaurant
4838,"Kingsway Park South West,Mimico NW,The Queensw...",43.628841,-79.520999,Mr.Sub,43.636174,-79.520655,Restaurant
4839,"Kingsway Park South West,Mimico NW,The Queensw...",43.628841,-79.520999,Queensway Fish & Chips,43.62172,-79.524588,Fish & Chips Shop
4840,"Kingsway Park South West,Mimico NW,The Queensw...",43.628841,-79.520999,Pet Valu,43.624136,-79.511427,Pet Store
4841,"Kingsway Park South West,Mimico NW,The Queensw...",43.628841,-79.520999,Sleep Country,43.62134,-79.526708,Mattress Store


In [93]:
# what is the mean number of venues per neighbourhood?
toronto_venues['Neighborhood'].value_counts().mean()

47.94059405940594
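`value_counts()` yields one count $n_k$ per neighbourhood, so its mean is the total number of venue rows $N$ divided by the number of distinct neighbourhoods $K$. `toronto_venues.shape` reports 4842 rows, and the grouped table later in this section has 101 rows (one per neighbourhood). The figure checks out:

$$\bar{n} = \frac{1}{K}\sum_{k=1}^{K} n_k = \frac{N}{K} = \frac{4842}{101} \approx 47.94$$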

In [95]:
# toronto_venues_1000.csv: top picks for all neighbourhoods within a 1000 m radius
# (the first run used a 500 m radius)
#project.save_data('toronto_venues_1000.csv', toronto_venues.to_csv(), overwrite=True)

{'file_name': 'toronto_venues_1000.csv',
 'message': 'File saved to project storage.',
 'bucket_name': 'applieddatasciencecapstoneproject-donotdelete-pr-4kvrowiaaunvu2',
 'asset_id': '312dadcf-b334-4a72-90ab-8f77896d6ef9'}

In [37]:
toronto_venues = pd.read_csv(project.get_file('toronto_venues_1000.csv'), index_col=0)
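`to_csv()` writes the DataFrame index as an unnamed first column. That is why the file is read back with `index_col=0`; without it, the old index reappears as an extra `Unnamed: 0` column. A small round trip on a hypothetical two-row frame shows the difference:

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for toronto_venues.
df = pd.DataFrame({'Neighborhood': ['Parkwoods', 'Parkwoods'],
                   'Venue Category': ['Park', 'Café']})

# With no path argument, to_csv() returns the CSV text, index included.
csv_text = df.to_csv()

# index_col=0 turns the first column back into the index.
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
# Without it, the saved index becomes an extra 'Unnamed: 0' column.
without_index = pd.read_csv(io.StringIO(csv_text))

print(with_index.shape, without_index.shape)
print(list(without_index.columns))
```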

In [97]:
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Allwyn's Bakery,43.75984,-79.324719,Caribbean Restaurant
1,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
2,Parkwoods,43.753259,-79.329656,Tim Hortons,43.760668,-79.326368,Café
3,Parkwoods,43.753259,-79.329656,A&W,43.760643,-79.326865,Fast Food Restaurant
4,Parkwoods,43.753259,-79.329656,Bruno's valu-mart,43.746143,-79.32463,Grocery Store


## 4. Analyze Each Neighborhood

<font size=2><a href="#Table-of-Contents">go to toc</a></font>

#### Remove venues with Category = 'Neighborhood'

Some results have the category `Neighborhood`. These are areas such as Downtown Toronto or Summerhill, not actual venues, so let's remove them before proceeding with the analysis.

In [99]:
neighbourhood_category = toronto_venues[toronto_venues["Venue Category"] == 'Neighborhood']
neighbourhood_category

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
428,"Ryerson,Garden District",43.657162,-79.378937,Downtown Toronto,43.653232,-79.385296,Neighborhood
819,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
921,Berczy Park,43.644771,-79.373306,Harbourfront,43.639526,-79.380688,Neighborhood
1046,Central Bay Street,43.657952,-79.387383,Downtown Toronto,43.653232,-79.385296,Neighborhood
1371,"Adelaide,King,Richmond",43.650571,-79.384568,Downtown Toronto,43.653232,-79.385296,Neighborhood
1715,"Harbourfront East,Toronto Islands,Union Station",43.640816,-79.381752,Harbourfront,43.639526,-79.380688,Neighborhood
2206,"Brockton,Exhibition Place,Parkdale Village",43.636847,-79.428191,Parkdale,43.640524,-79.4322,Neighborhood
2568,Studio District,43.659526,-79.340923,Leslieville,43.66207,-79.337856,Neighborhood
3988,"Moore Park,Summerhill East",43.689574,-79.38316,Summerhill,43.682976,-79.389123,Neighborhood
4204,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",43.686412,-79.400049,Summerhill,43.682976,-79.389123,Neighborhood


In [100]:
# removing items
toronto_venues.drop(neighbourhood_category.index.values, axis=0, inplace=True)

In [101]:
# rebuild a contiguous index (reset_index returns a new DataFrame, so assign the result)
toronto_venues = toronto_venues.reset_index(drop=True)
toronto_venues.shape

(4831, 7)
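Instead of dropping rows by index label, you can keep only the real venues in one step with a boolean mask. This sketch uses a small hypothetical frame. Note that `reset_index` returns a new DataFrame, so you must assign its result (or pass `inplace=True`) for the new index to take effect.

```python
import pandas as pd

# Hypothetical venues table; 'Neighborhood' rows are areas, not venues.
venues = pd.DataFrame({
    'Neighborhood': ['The Beaches', 'The Beaches', 'Leaside'],
    'Venue': ['Upper Beaches', 'Venue A', 'Venue B'],
    'Venue Category': ['Neighborhood', 'Ice Cream Shop', 'Park'],
})

# Keep only rows whose category is not 'Neighborhood', then rebuild a 0..n-1 index.
# The result of reset_index is assigned back; calling it alone would change nothing.
venues = venues[venues['Venue Category'] != 'Neighborhood'].reset_index(drop=True)
print(venues)
```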

#### One-hot encode venue categories into columns

In [102]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")
toronto_onehot.head()

Unnamed: 0,Accessories Store,Afghan Restaurant,Airport,Airport Lounge,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Stadium,Beach,Beach Bar,Beer Bar,Beer Store,Belgian Restaurant,Bike Shop,Bistro,Board Shop,Bookstore,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bridge,Bubble Tea Shop,Buffet,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Castle,Cemetery,Cheese Shop,Chinese Restaurant,Chiropractor,Chocolate Shop,Church,Churrascaria,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,College Quad,College Rec Center,College Stadium,College Theater,Comedy Club,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Curling Ice,Dance Studio,Deli / Bodega,Dentist's Office,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fireworks Store,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Golf Course,Golf Driving Range,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hakka Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hockey Arena,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Laundry Service,Light Rail Station,Lighting Store,Lingerie Store,Liquor Store,Lounge,Mac & Cheese Joint,Malay Restaurant,Market,Massage Studio,Mattress Store,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Motorcycle Shop,Movie Theater,Museum,Music School,Music Store,Music Venue,Nail Salon,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Other Repair Shop,Outdoors & Recreation,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pastry Shop,Performing Arts Venue,Persian Restaurant,Pet Store,Pharmacy,Photography Lab,Photography Studio,Pide Place,Pie Shop,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Pool Hall,Portuguese Restaurant,Poutine Place,Print Shop,Pub,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,River,Road,Rock Climbing Spot,Rock Club,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,School,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Ski Area,Ski Chalet,Smoke Shop,Snack Place,Soccer Field,Soccer Stadium,Soup Place,South American Restaurant,Souvlaki Shop,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stationery Store,Steakhouse,Storage Facility,Street Art,Supermarket,Supplement Shop,Sushi Restaurant,Syrian Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Tech Startup,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Track,Trail,Train Station,Transportation Service,Turkish Restaurant,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

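The 329-column output above makes the shape of the one-hot table hard to see, so here is a sketch on three hypothetical venues. It uses the same `prefix=""` and `prefix_sep=""` arguments, which keep the raw category names as column headers. It also passes `dtype=int`: recent pandas versions return boolean columns by default, not the 0/1 integers shown above.

```python
import pandas as pd

# Three hypothetical venues, just to show the shape get_dummies produces.
venues = pd.DataFrame({'Neighborhood': ['A', 'A', 'B'],
                       'Venue Category': ['Park', 'Café', 'Park']})

# Empty prefix/prefix_sep keep the raw category names as column headers;
# dtype=int forces 0/1 integers instead of booleans.
onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)
print(onehot)
```

Each row contains exactly one 1, in the column of that venue's category. This property is what the group-wise mean in the next steps relies on.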

In [103]:
# add neighborhood column back to dataframe
toronto_onehot.insert(0, 'Neighborhood', toronto_venues['Neighborhood'])
toronto_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Afghan Restaurant,Airport,Airport Lounge,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Stadium,Beach,Beach Bar,Beer Bar,Beer Store,Belgian Restaurant,Bike Shop,Bistro,Board Shop,Bookstore,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bridge,Bubble Tea Shop,Buffet,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Castle,Cemetery,Cheese Shop,Chinese Restaurant,Chiropractor,Chocolate Shop,Church,Churrascaria,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,College Quad,College Rec Center,College Stadium,College Theater,Comedy Club,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Curling Ice,Dance Studio,Deli / Bodega,Dentist's Office,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fireworks Store,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Golf Course,Golf Driving Range,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hakka Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hockey Arena,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Laundry Service,Light Rail Station,Lighting Store,Lingerie Store,Liquor Store,Lounge,Mac & Cheese Joint,Malay Restaurant,Market,Massage Studio,Mattress Store,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Motorcycle Shop,Movie Theater,Museum,Music School,Music Store,Music Venue,Nail Salon,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Other Repair Shop,Outdoors & Recreation,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pastry Shop,Performing Arts Venue,Persian Restaurant,Pet Store,Pharmacy,Photography Lab,Photography Studio,Pide Place,Pie Shop,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Pool Hall,Portuguese Restaurant,Poutine Place,Print Shop,Pub,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,River,Road,Rock Climbing Spot,Rock Club,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,School,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Ski Area,Ski Chalet,Smoke Shop,Snack Place,Soccer Field,Soccer Stadium,Soup Place,South American Restaurant,Souvlaki Shop,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stationery Store,Steakhouse,Storage Facility,Street Art,Supermarket,Supplement Shop,Sushi Restaurant,Syrian Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Tech Startup,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Track,Trail,Train Station,Transportation Service,Turkish Restaurant,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo
0,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [104]:
toronto_onehot.shape

(4831, 330)

#### Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [105]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighborhood,Accessories Store,Afghan Restaurant,Airport,Airport Lounge,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Stadium,Beach,Beach Bar,Beer Bar,Beer Store,Belgian Restaurant,Bike Shop,Bistro,Board Shop,Bookstore,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bridge,Bubble Tea Shop,Buffet,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Castle,Cemetery,Cheese Shop,Chinese Restaurant,Chiropractor,Chocolate Shop,Church,Churrascaria,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,College Quad,College Rec Center,College Stadium,College Theater,Comedy Club,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Curling Ice,Dance Studio,Deli / Bodega,Dentist's Office,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fireworks Store,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Golf Course,Golf Driving Range,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hakka Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hockey Arena,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Laundry Service,Light Rail Station,Lighting Store,Lingerie Store,Liquor Store,Lounge,Mac & Cheese Joint,Malay Restaurant,Market,Massage Studio,Mattress Store,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Motorcycle Shop,Movie Theater,Museum,Music School,Music Store,Music Venue,Nail Salon,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Other Repair Shop,Outdoors & Recreation,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pastry Shop,Performing Arts Venue,Persian Restaurant,Pet Store,Pharmacy,Photography Lab,Photography Studio,Pide Place,Pie Shop,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Pool Hall,Portuguese Restaurant,Poutine Place,Print Shop,Pub,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,River,Road,Rock Climbing Spot,Rock Club,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,School,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Ski Area,Ski Chalet,Smoke Shop,Snack Place,Soccer Field,Soccer Stadium,Soup Place,South American Restaurant,Souvlaki Shop,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stationery Store,Steakhouse,Storage Facility,Street Art,Supermarket,Supplement Shop,Sushi Restaurant,Syrian Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Tech Startup,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Track,Trail,Train Station,Transportation Service,Turkish Restaurant,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo
0,"Adelaide,King,Richmond",0.0,0.0,0.0,0.0,0.020202,0.0,0.0,0.0,0.0,0.0,0.010101,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020202,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.020202,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.010101,0.020202,0.0,0.0,0.0,0.010101,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.060606,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.060606,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020202,0.0,0.0,0.020202,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.020202,0.0,0.0,0.0,0.0,0.020202,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.010101,0.0,0.020202,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.050505,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.020202,0.010101,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.020202,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020202,0.0,0.010101,0.010101,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.010101,0.0,0.0,0.0,0.040404,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.020202,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.010101,0.010101,0.010101,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.020202,0.0,0.010101,0.0,0.0,0.0,0.0,0.020202,0.0,0.0,0.020202,0.040404,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.020202,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0
1,Agincourt,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.039216,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.019608,0.039216,0.0,0.0,0.0,0.137255,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.039216,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.019608,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.039216,0.0,0.0,0.0,0.019608,0.019608,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.039216,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.039216,0.0,0.0,0.0,0.019608,0.019608,0.0,0.0,0.058824,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.019608,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Agincourt North,L'Amoreaux East,Milliken,Steel...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.233333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.176471,0.0,0.0,0.058824,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.176471,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Alderwood,Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.12,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.12,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [106]:
toronto_grouped.shape

(101, 330)

#### Let's print each neighborhood along with the top 5 most common venues

In [107]:
num_top_venues = 5

for neigh in toronto_grouped['Neighborhood']:
    print("----"+neigh+"----")
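    # turn this neighbourhood's row into a two-column (venue, freq) table
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == neigh].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]  # drop the Neighborhood row left over from the transpose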
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide,King,Richmond----
         venue  freq
0  Coffee Shop  0.06
1         Café  0.06
2        Hotel  0.05
3      Theater  0.04
4   Restaurant  0.04


----Agincourt----
                  venue  freq
0    Chinese Restaurant  0.14
1         Shopping Mall  0.06
2           Coffee Shop  0.04
3  Caribbean Restaurant  0.04
4            Restaurant  0.04


----Agincourt North,L'Amoreaux East,Milliken,Steeles East----
                venue  freq
0  Chinese Restaurant  0.23
1   Korean Restaurant  0.07
2         Pizza Place  0.07
3              Bakery  0.07
4                Park  0.07


----Albion Gardens,Beaumond Heights,Humbergate,Jamestown,Mount Olive,Silverstone,South Steeles,Thistletown----
                  venue  freq
0         Grocery Store  0.18
1           Pizza Place  0.18
2   Fried Chicken Joint  0.06
3  Fast Food Restaurant  0.06
4          Liquor Store  0.06


----Alderwood,Long Branch----
            venue  freq
0        Pharmacy  0.12
1  Discount Store  0.12
2     Pizza Pl
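The frequencies in `toronto_grouped` are shares of a neighbourhood's venues. In the Agincourt North row, for instance, the smallest non-zero value is 0.033333 (presumably 1/30), which would make the 0.233333 for Chinese Restaurant 7 of 30 venues. The minimal sketch below reproduces that arithmetic and the top-N selection in plain Python. It assumes `toronto_grouped` was built by averaging one-hot venue columns per neighbourhood; the helpers `venue_frequencies` and `top_venues` and the four-venue input are hypothetical.

```python
from collections import Counter


def venue_frequencies(categories):
    """Share of each venue category among one neighbourhood's venues.

    Same result as one-hot encoding the categories and taking the column means.
    """
    counts = Counter(categories)
    total = len(categories)
    return {category: n / total for category, n in counts.items()}


def top_venues(freqs, n):
    """Return the n (category, frequency) pairs with the highest frequency."""
    return sorted(freqs.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical neighbourhood with four venues (not data from this notebook)
freqs = venue_frequencies(["Park", "Café", "Park", "Bakery"])
print(top_venues(freqs, 2))  # → [('Park', 0.5), ('Café', 0.25)]
```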

#### Let's put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [108]:
def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the Neighborhood name) so only venue frequencies are sorted
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
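`sort_values` uses quicksort by default, which is not guaranteed to keep tied values in any particular order. This shows up for Agincourt North: five categories share 0.066667, and they appear in a different order in the printout above than in the table below, with Noodle House left out of the printed top 5. If a reproducible ranking matters, one option is to break ties by name. Here is a plain-Python sketch of that idea; `most_common_venues` is a hypothetical helper, and the frequencies only mimic the shape of that row (one value at 7/30, ties at 2/30).

```python
def most_common_venues(freqs, num_top_venues):
    """Venue names by descending frequency; ties are broken alphabetically."""
    ranked = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:num_top_venues]]


# Illustrative frequencies shaped like the Agincourt North row
freqs = {
    "Chinese Restaurant": 7 / 30,
    "Pizza Place": 2 / 30,
    "Park": 2 / 30,
    "Korean Restaurant": 2 / 30,
    "Noodle House": 2 / 30,
    "Bakery": 2 / 30,
    "Gym": 1 / 30,
}
print(most_common_venues(freqs, 4))
# → ['Chinese Restaurant', 'Bakery', 'Korean Restaurant', 'Noodle House']
```

In pandas, passing `kind='mergesort'` to `sort_values` makes the sort stable, so tied venues would at least keep their column order.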

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [109]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # 4th, 5th, ... have no entry in indicators
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",Coffee Shop,Café,Hotel,Theater,Restaurant,Bakery,Tea Room,Sushi Restaurant,Beer Bar,Breakfast Spot
1,Agincourt,Chinese Restaurant,Shopping Mall,Coffee Shop,Bakery,Restaurant,Caribbean Restaurant,Sandwich Place,Pizza Place,Malay Restaurant,Latin American Restaurant
2,"Agincourt North,L'Amoreaux East,Milliken,Steel...",Chinese Restaurant,Bakery,Pizza Place,Noodle House,Korean Restaurant,Park,Gym,Fast Food Restaurant,Event Space,Malay Restaurant
3,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",Grocery Store,Pizza Place,Gym Pool,Bus Line,Liquor Store,Caribbean Restaurant,Fast Food Restaurant,Sandwich Place,Fried Chicken Joint,Hardware Store
4,"Alderwood,Long Branch",Discount Store,Pharmacy,Pizza Place,Pool,Coffee Shop,Convenience Store,Skating Rink,Shopping Mall,Donut Shop,Liquor Store
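The `indicators` list plus the `th` fallback labels venues 1 to 20 correctly, but from 21 on it would produce "21th", "22th" and so on. If `num_top_venues` ever grows past 20, a general ordinal helper avoids that. The sketch below is one way to write it; `ordinal` is a hypothetical name, not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 11 <= n % 100 <= 13:
        suffix = "th"  # 11th, 12th, 13th (and 111th, 112th, ...)
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns[1], columns[-1])  # → 1st Most Common Venue 10th Most Common Venue
```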


## 5. Cluster Neighborhoods

<font size=2><a href="#Table-of-Contents">go to toc</a></font>

Run *k*-means to cluster the neighborhoods into 5 clusters.

In [110]:
# set number of clusters
kclusters = 5

# drop the Neighborhood column: it holds names, and k-means needs only the numeric venue frequencies
toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering with 20 initialisations; random_state is not fixed, so label numbering may change between runs
kmeans = KMeans(n_clusters=kclusters, n_init=20).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 0, 0, 0, 0, 1, 2, 1, 1, 2], dtype=int32)
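To show what `KMeans` is doing, here is a minimal pure-Python sketch of Lloyd's algorithm with restarts, in the spirit of `n_init=20`. Each run assigns every point to its nearest centroid, moves each centroid to the mean of its points, and repeats until the assignments stop changing; the run with the lowest inertia (sum of squared distances) is kept. It is a stand-in, not scikit-learn's implementation: scikit-learn seeds centroids with k-means++ by default, whereas this sketch samples random points, and the toy 2-D points are made up. Because no `random_state` is passed in the cell above, the cluster numbers themselves are arbitrary and may be permuted on a re-run.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points, k, rng, max_iter=100):
    """One k-means run from k randomly chosen points; returns (labels, inertia)."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, inertia


def kmeans(points, k, n_init=20, seed=0):
    """Run Lloyd's algorithm n_init times and keep the lowest-inertia result."""
    rng = random.Random(seed)
    runs = [lloyd(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[1])


# Toy data: two well-separated groups of three points each
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels, inertia = kmeans(points, k=2)
print(labels, inertia)
```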

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [111]:
# add clustering labels (insert raises ValueError if the column already exists, e.g. when this cell is re-run)
try:
    neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
except ValueError:
    neighborhoods_venues_sorted['Cluster Labels'] = kmeans.labels_
neighborhoods_venues_sorted.head()

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,"Adelaide,King,Richmond",Coffee Shop,Café,Hotel,Theater,Restaurant,Bakery,Tea Room,Sushi Restaurant,Beer Bar,Breakfast Spot
1,0,Agincourt,Chinese Restaurant,Shopping Mall,Coffee Shop,Bakery,Restaurant,Caribbean Restaurant,Sandwich Place,Pizza Place,Malay Restaurant,Latin American Restaurant
2,0,"Agincourt North,L'Amoreaux East,Milliken,Steel...",Chinese Restaurant,Bakery,Pizza Place,Noodle House,Korean Restaurant,Park,Gym,Fast Food Restaurant,Event Space,Malay Restaurant
3,0,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",Grocery Store,Pizza Place,Gym Pool,Bus Line,Liquor Store,Caribbean Restaurant,Fast Food Restaurant,Sandwich Place,Fried Chicken Joint,Hardware Store
4,0,"Alderwood,Long Branch",Discount Store,Pharmacy,Pizza Place,Pool,Coffee Shop,Convenience Store,Skating Rink,Shopping Mall,Donut Shop,Liquor Store


In [112]:
toronto_data.shape

(102, 6)

In [113]:
neighborhoods_venues_sorted.shape

(101, 12)

In [118]:
toronto_data.head(1), toronto_data.shape

(  postal_code     borough neighbourhood Postal Code   Latitude  Longitude
 0         M3A  North York     Parkwoods         M3A  43.753259 -79.329656,
 (102, 6))

In [119]:
toronto_merged = toronto_data.copy()
toronto_merged.rename(columns={'neighbourhood': 'Neighborhood'}, inplace=True)

# join the venue ranking and cluster labels onto toronto_data by neighbourhood name; toronto_data supplies latitude/longitude
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the first rows!

Unnamed: 0,postal_code,borough,Neighborhood,Postal Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,M3A,43.753259,-79.329656,2.0,Park,Shopping Mall,Convenience Store,Pharmacy,Bus Stop,Fish & Chips Shop,Supermarket,Food & Drink Shop,Cosmetics Shop,Fast Food Restaurant
1,M4A,North York,Victoria Village,M4A,43.725882,-79.315572,2.0,Coffee Shop,Portuguese Restaurant,Boxing Gym,Lounge,Park,Golf Course,Men's Store,Gym / Fitness Center,Pizza Place,Playground
2,M5A,Downtown Toronto,Harbourfront,M5A,43.65426,-79.360636,1.0,Coffee Shop,Café,Diner,Park,Theater,Pub,Breakfast Spot,Restaurant,Bakery,Italian Restaurant
3,M6A,North York,"Lawrence Heights,Lawrence Manor",M6A,43.718518,-79.464763,1.0,Furniture / Home Store,Fast Food Restaurant,Restaurant,Coffee Shop,Arts & Crafts Store,Sushi Restaurant,Women's Store,Vietnamese Restaurant,Dessert Shop,Fried Chicken Joint
4,M7A,Downtown Toronto,Queen's Park,M7A,43.662301,-79.389494,1.0,Coffee Shop,Burger Joint,Park,Gastropub,Sushi Restaurant,Italian Restaurant,Japanese Restaurant,Men's Store,Seafood Restaurant,Bookstore


In [120]:
toronto_merged[toronto_merged["Cluster Labels"].isna()]

Unnamed: 0,postal_code,borough,Neighborhood,Postal Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
94,M1X,Scarborough,Upper Rouge,M1X,43.836125,-79.205636,,,,,,,,,,,


#### One neighbourhood (Upper Rouge) has no venues at all, so drop it

In [129]:
toronto_merged.dropna(subset=["Cluster Labels"], inplace=True) # remove rows without a cluster label
toronto_merged.reset_index(inplace=True, drop=True)
toronto_merged.shape

(101, 17)
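`DataFrame.join` performs a left join by default (`how='left'`), so every row of `toronto_data` is kept and a neighbourhood without a match, such as Upper Rouge, gets NaN in the joined columns. That NaN is also why `Cluster Labels` appeared as floats (2.0, 1.0, ...) and why the next cell casts it back with `astype(int)`. The plain-Python sketch below mimics the left join and the `dropna` step on a hypothetical two-row miniature; `left_join` is an illustrative helper, and `None` stands in for NaN.

```python
def left_join(left_rows, right_by_key, key, right_fields):
    """Keep every left row and attach right_fields; missing matches get None."""
    joined = []
    for row in left_rows:
        match = right_by_key.get(row[key], {})
        joined.append({**row, **{field: match.get(field) for field in right_fields}})
    return joined


# Hypothetical miniature of toronto_data and the clustered venue table
toronto = [{"Neighborhood": "Parkwoods"}, {"Neighborhood": "Upper Rouge"}]
clustered = {"Parkwoods": {"Cluster Labels": 2}}

merged = left_join(toronto, clustered, "Neighborhood", ["Cluster Labels"])
# Equivalent of dropna(subset=["Cluster Labels"]): discard unmatched rows
kept = [row for row in merged if row["Cluster Labels"] is not None]
print(len(merged), len(kept))  # → 2 1
```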

In [130]:
toronto_merged["Cluster Labels"] = toronto_merged["Cluster Labels"].astype(int)
toronto_merged.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 101 entries, 0 to 100
Data columns (total 17 columns):
postal_code               101 non-null object
borough                   101 non-null object
Neighborhood              101 non-null object
Postal Code               101 non-null object
Latitude                  101 non-null float64
Longitude                 101 non-null float64
Cluster Labels            101 non-null int64
1st Most Common Venue     101 non-null object
2nd Most Common Venue     101 non-null object
3rd Most Common Venue     101 non-null object
4th Most Common Venue     101 non-null object
5th Most Common Venue     101 non-null object
6th Most Common Venue     101 non-null object
7th Most Common Venue     101 non-null object
8th Most Common Venue     101 non-null object
9th Most Common Venue     101 non-null object
10th Most Common Venue    101 non-null object
dtypes: float64(2), int64(1), object(14)
memory usage: 13.5+ KB


Finally, let's visualize the resulting clusters.

In [132]:
# map centre: a point in central Toronto
latitude = 43.717899
longitude = -79.395

In [131]:
toronto_merged['Cluster Labels'].value_counts()

1    55
0    25
2    19
4     1
3     1
Name: Cluster Labels, dtype: int64

In [133]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow colour per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + '\n Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
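The colour scheme needs one distinct colour per cluster label, and `rainbow[cluster]` indexes that list by label. This is one reason `Cluster Labels` had to be converted to integers first: a float such as 2.0 cannot index a Python list. The sketch below builds a comparable palette with the standard library instead of matplotlib. It swaps `cm.rainbow` for evenly spaced HSV hues from `colorsys`, so the actual colours differ from the notebook's map, and `cluster_palette` is a hypothetical name.

```python
import colorsys


def cluster_palette(k, saturation=0.8, value=0.9):
    """Return k '#rrggbb' colours with hues spaced evenly around the colour wheel."""
    palette = []
    for i in range(k):
        # i / k (not i / (k - 1)) so the last hue does not wrap back onto the first
        r, g, b = colorsys.hsv_to_rgb(i / k, saturation, value)
        palette.append(
            "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
        )
    return palette


rainbow = cluster_palette(5)
cluster = 3
print(rainbow[cluster])  # colour used for every marker in cluster 3
```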

## 6. Examine Clusters

<font size=2><a href="#Table-of-Contents">go to toc</a></font>

In [134]:
toronto_merged.head()

Unnamed: 0,postal_code,borough,Neighborhood,Postal Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,M3A,43.753259,-79.329656,2,Park,Shopping Mall,Convenience Store,Pharmacy,Bus Stop,Fish & Chips Shop,Supermarket,Food & Drink Shop,Cosmetics Shop,Fast Food Restaurant
1,M4A,North York,Victoria Village,M4A,43.725882,-79.315572,2,Coffee Shop,Portuguese Restaurant,Boxing Gym,Lounge,Park,Golf Course,Men's Store,Gym / Fitness Center,Pizza Place,Playground
2,M5A,Downtown Toronto,Harbourfront,M5A,43.65426,-79.360636,1,Coffee Shop,Café,Diner,Park,Theater,Pub,Breakfast Spot,Restaurant,Bakery,Italian Restaurant
3,M6A,North York,"Lawrence Heights,Lawrence Manor",M6A,43.718518,-79.464763,1,Furniture / Home Store,Fast Food Restaurant,Restaurant,Coffee Shop,Arts & Crafts Store,Sushi Restaurant,Women's Store,Vietnamese Restaurant,Dessert Shop,Fried Chicken Joint
4,M7A,Downtown Toronto,Queen's Park,M7A,43.662301,-79.389494,1,Coffee Shop,Burger Joint,Park,Gastropub,Sushi Restaurant,Italian Restaurant,Japanese Restaurant,Men's Store,Seafood Restaurant,Bookstore


In [135]:
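def cluster_df(df, cluster_n):
    """Rows of one cluster, keeping column 2 (Neighborhood) and columns 5 onward
    (Longitude, Cluster Labels and the ten most common venues)."""
    return df.loc[df['Cluster Labels'] == cluster_n, df.columns[[2] + list(range(5, df.shape[1]))]]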

In [136]:
cluster_df(toronto_merged, 0)

Unnamed: 0,Neighborhood,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,"Woodbine Gardens,Parkview Hill",-79.309937,0,Pizza Place,Brewery,Bus Line,Intersection,Bank,Bakery,Coffee Shop,Rock Climbing Spot,Restaurant,Gym / Fitness Center
16,Humewood-Cedarvale,-79.428191,0,Pizza Place,Coffee Shop,Playground,Trail,Middle Eastern Restaurant,Field,Farmers Market,Frozen Yogurt Shop,Chinese Restaurant,Optical Shop
18,"Guildwood,Morningside,West Hill",-79.188711,0,Pizza Place,Coffee Shop,Fast Food Restaurant,Bus Line,Beer Store,Liquor Store,Supermarket,Fried Chicken Joint,Sports Bar,Burger Joint
26,Cedarbrae,-79.239476,0,Bakery,Coffee Shop,Pharmacy,Indian Restaurant,Gas Station,Burger Joint,Intersection,Chinese Restaurant,Lounge,Bank
27,Hillcrest Village,-79.363452,0,Coffee Shop,Pharmacy,Park,Convenience Store,Grocery Store,Sandwich Place,Korean Restaurant,Bank,Bakery,Shopping Mall
29,Thorncliffe Park,-79.349372,0,Coffee Shop,Pizza Place,Indian Restaurant,Grocery Store,Supermarket,Afghan Restaurant,Burger Joint,Brewery,Gym,Turkish Restaurant
32,Scarborough Village,-79.239476,0,Ice Cream Shop,Sandwich Place,Grocery Store,Coffee Shop,Japanese Restaurant,Restaurant,Fast Food Restaurant,Bowling Alley,Train Station,Pizza Place
34,"Northwood Park,York University",-79.487262,0,Furniture / Home Store,Coffee Shop,Pizza Place,Bar,Restaurant,Caribbean Restaurant,Bank,Sandwich Place,Sushi Restaurant,Falafel Restaurant
38,"East Birchmount Park,Ionview,Kennedy Park",-79.262029,0,Chinese Restaurant,Coffee Shop,Fast Food Restaurant,Discount Store,Grocery Store,Sandwich Place,Pharmacy,Light Rail Station,Bank,Asian Restaurant
44,"Clairlea,Golden Mile,Oakridge",-79.284577,0,Intersection,Bus Line,Bakery,Coffee Shop,Metro Station,Mexican Restaurant,Pizza Place,Fast Food Restaurant,Sandwich Place,Beer Store


In [137]:
cluster_df(toronto_merged, 1)

Unnamed: 0,Neighborhood,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Harbourfront,-79.360636,1,Coffee Shop,Café,Diner,Park,Theater,Pub,Breakfast Spot,Restaurant,Bakery,Italian Restaurant
3,"Lawrence Heights,Lawrence Manor",-79.464763,1,Furniture / Home Store,Fast Food Restaurant,Restaurant,Coffee Shop,Arts & Crafts Store,Sushi Restaurant,Women's Store,Vietnamese Restaurant,Dessert Shop,Fried Chicken Joint
4,Queen's Park,-79.389494,1,Coffee Shop,Burger Joint,Park,Gastropub,Sushi Restaurant,Italian Restaurant,Japanese Restaurant,Men's Store,Seafood Restaurant,Bookstore
6,"Rouge,Malvern",-79.194353,1,Coffee Shop,Fast Food Restaurant,Trail,Bakery,Arts & Crafts Store,Sandwich Place,Paper / Office Supplies Store,Caribbean Restaurant,Gym,Fruit & Vegetable Store
7,Don Mills North,-79.352188,1,Japanese Restaurant,Pizza Place,Burger Joint,Coffee Shop,Bar,Caribbean Restaurant,Sandwich Place,Supermarket,Office,Liquor Store
9,"Ryerson,Garden District",-79.378937,1,Coffee Shop,Clothing Store,Restaurant,Middle Eastern Restaurant,Japanese Restaurant,Diner,Italian Restaurant,Gym,Electronics Store,Ramen Restaurant
13,"Flemingdon Park,Don Mills South",-79.340923,1,Restaurant,Gym,Coffee Shop,Japanese Restaurant,Supermarket,Beer Store,Asian Restaurant,History Museum,Bank,Bar
15,St. James Town,-79.375418,1,Coffee Shop,Café,Restaurant,Bakery,Hotel,Clothing Store,Breakfast Spot,Seafood Restaurant,Gym,Cosmetics Shop
17,"Bloordale Gardens,Eringate,Markland Wood,Old B...",-79.577201,1,Coffee Shop,Cosmetics Shop,College Rec Center,Farmers Market,Café,Shopping Plaza,Shopping Mall,Beer Store,Liquor Store,Gas Station
19,The Beaches,-79.293031,1,Pub,Coffee Shop,Pizza Place,Breakfast Spot,Japanese Restaurant,Beach,Caribbean Restaurant,Café,Indian Restaurant,Bar


In [138]:
cluster_df(toronto_merged, 2)

Unnamed: 0,Neighborhood,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Parkwoods,-79.329656,2,Park,Shopping Mall,Convenience Store,Pharmacy,Bus Stop,Fish & Chips Shop,Supermarket,Food & Drink Shop,Cosmetics Shop,Fast Food Restaurant
1,Victoria Village,-79.315572,2,Coffee Shop,Portuguese Restaurant,Boxing Gym,Lounge,Park,Golf Course,Men's Store,Gym / Fitness Center,Pizza Place,Playground
5,Islington Avenue,-79.532242,2,Pharmacy,Convenience Store,Café,Skating Rink,Shopping Mall,Bank,Golf Course,Park,Grocery Store,Bakery
10,Glencairn,-79.445073,2,Grocery Store,Fast Food Restaurant,Park,Pizza Place,Coffee Shop,Italian Restaurant,Gas Station,Mediterranean Restaurant,Japanese Restaurant,Trail
11,"Cloverdale,Islington,Martin Grove,Princess Gar...",-79.554724,2,Park,Hotel,Pizza Place,Convenience Store,Gym,Fish & Chips Shop,Mexican Restaurant,Café,Bank,Clothing Store
12,"Highland Creek,Rouge Hill,Port Union",-79.160497,2,Breakfast Spot,Burger Joint,Playground,Park,Italian Restaurant,Zoo,Farmers Market,Electronics Store,Empanada Restaurant,Ethiopian Restaurant
14,Woodbine Heights,-79.318389,2,Coffee Shop,Park,Thai Restaurant,Pizza Place,Sandwich Place,Pastry Shop,Spa,Farmers Market,Skating Rink,Café
21,Caledonia-Fairbanks,-79.453512,2,Pharmacy,Park,Bus Stop,Construction & Landscaping,Japanese Restaurant,Pizza Place,Coffee Shop,Discount Store,Falafel Restaurant,Fast Food Restaurant
22,Woburn,-79.216917,2,Park,Coffee Shop,Fast Food Restaurant,Indian Restaurant,Chinese Restaurant,Mobile Phone Shop,Fish Market,Fish & Chips Shop,Eastern European Restaurant,Electronics Store
39,Bayview Village,-79.385975,2,Bank,Gas Station,Japanese Restaurant,Intersection,Café,Park,Grocery Store,Trail,Shopping Mall,Restaurant


In [139]:
cluster_df(toronto_merged, 3)

Unnamed: 0,Neighborhood,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
53,Downsview Central,-79.495697,3,Vietnamese Restaurant,Thai Restaurant,Baseball Field,Zoo,Farm,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Space


In [140]:
cluster_df(toronto_merged, 4)

Unnamed: 0,Neighborhood,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
45,"Silver Hills,York Mills",-79.374714,4,Park,Pool,Zoo,Farmers Market,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Space,Falafel Restaurant
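To put a rough label on each cluster, one option is to count which venue most often comes first in its rows. The sketch below does this in plain Python for the ten rows of clusters 0 and 1 displayed above. Those listings are truncated (cluster 0 has 25 members and cluster 1 has 55), so the counts describe only the visible rows, and `dominant_first_venue` is a hypothetical helper. With pandas, the same count for the full cluster would be `cluster_df(toronto_merged, n)['1st Most Common Venue'].value_counts()`.

```python
from collections import Counter


def dominant_first_venue(first_venues):
    """Return the most frequent '1st Most Common Venue' value and its count."""
    return Counter(first_venues).most_common(1)[0]


# 1st Most Common Venue of the rows shown above (first 10 rows per cluster only)
shown = {
    0: ["Pizza Place", "Pizza Place", "Pizza Place", "Bakery", "Coffee Shop",
        "Coffee Shop", "Ice Cream Shop", "Furniture / Home Store",
        "Chinese Restaurant", "Intersection"],
    1: ["Coffee Shop", "Furniture / Home Store", "Coffee Shop", "Coffee Shop",
        "Japanese Restaurant", "Coffee Shop", "Restaurant", "Coffee Shop",
        "Coffee Shop", "Pub"],
}
for label, venues in shown.items():
    print(label, dominant_first_venue(venues))
# → 0 ('Pizza Place', 3)
# → 1 ('Coffee Shop', 6)
```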
