# Battle of the Neighborhoods: Vancouver vs Toronto

## 1. Introduction

This project will analyze neighborhoods between Vancouver, BC and Toronto, ON Canada. An IT startup company is looking to move its office from New York to either Vancouver or Toronto. The company needs to determine which city better suits the living standards of its employees, and alongside with  office rentals it is considering the overall attractiveness of the  neighborhoods and local businesses in the cities.  In this project we will explore and compare the central neighborhoods in the two cities, and determine which neighborhoods best fit the culture of the IT company’s employees. The highest-rated by the employees is the easy access to cafe/restaurants, gyms and parks.

## 2. Data


The data used for the project is acquired through the wikipedia and the respective cities' web-sites. We will be using geopy to find the coordinates for each neighborhood. Foursquare API will be used then to collect information on the neighborhood venues. 

The following data sources have been used in this project:

Vancouver data: https://opendata.vancouver.ca/explore/dataset/local-area-boundary/export/                    
Toronto data: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M' ; http://cocl.us/Geospatial_data


## 3. Methodology

After the data is collected, it will be processed into pandas dataframes and explored. 
Folium Python visualisation lubrary will be used to visualize the neighborhoods distribution over the maps of the two cities. 
Extensive comparative analysis of the central neighborhoods of the two cities will be performed, Downtown Vancouver and Central Toronto, respectively. 
Finally, unsupervised machine learning algorithm k-means clustering will be applied to form the clusters of different categories of places in the above neighborhoods and visualize the data.  

#### Download libraries and dependencies

In [2]:
import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

!conda install -c conda-forge lxml --yes 

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done


  current version: 4.8.3
  latest version: 4.8.4

Please update conda by running

    $ conda update -n base -c defaults conda



# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done


  current version: 4.8.3
  latest version: 4.8.4

Please update conda by running

    $ conda update -n base -c defaults conda



# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done


  current version: 4.8.3
  latest version: 4.8.4

Please update conda by running

    $ conda update -n base -c defaults conda



# All requested packages already installed.

Libraries imported.


In [2]:
pip install beautifulsoup4


Collecting beautifulsoup4
[?25l  Downloading https://files.pythonhosted.org/packages/66/25/ff030e2437265616a1e9b25ccc864e0371a0bc3adb7c5a404fd661c6f4f6/beautifulsoup4-4.9.1-py3-none-any.whl (115kB)
[K     |████████████████████████████████| 122kB 4.9MB/s eta 0:00:01
[?25hCollecting soupsieve>1.2 (from beautifulsoup4)
  Downloading https://files.pythonhosted.org/packages/6f/8f/457f4a5390eeae1cc3aeab89deb7724c965be841ffca6cfca9197482e470/soupsieve-2.0.1-py3-none-any.whl
Installing collected packages: soupsieve, beautifulsoup4
Successfully installed beautifulsoup4-4.9.1 soupsieve-2.0.1
Note: you may need to restart the kernel to use updated packages.


In [3]:
from bs4 import BeautifulSoup as bsoup
from urllib.request import urlopen as uReq
import requests
import lxml
import pandas as pd
from pandas import DataFrame
import numpy as np

#### Download Vancouver data

In [62]:
with open('local-area-boundary.json') as json_data:
    vancouver_data = json.load(json_data)

In [63]:
vancouver_data

[{'datasetid': 'local-area-boundary',
  'recordid': '69e8688669554e4a50ff15bea41cdac9d088901b',
  'fields': {'mapid': 'DS',
   'geom': {'type': 'Polygon',
    'coordinates': [[[-123.17016601562499, 49.24789047240798],
      [-123.17024993896482, 49.23470306396071],
      [-123.17870330810545, 49.23472213744702],
      [-123.17909240722653, 49.216804504390375],
      [-123.17908477783202, 49.215557098384515],
      [-123.17910003662107, 49.215557098384515],
      [-123.17975616455075, 49.21558761596264],
      [-123.18041229248044, 49.2156372070271],
      [-123.1810607910156, 49.21571350097242],
      [-123.18170166015622, 49.215812683101305],
      [-123.18232727050778, 49.21593475341382],
      [-123.18295288085935, 49.21607971190991],
      [-123.18355560302733, 49.2162475585896],
      [-123.18415069580075, 49.21643447875562],
      [-123.1847229003906, 49.21664428710522],
      [-123.18527984619139, 49.21687316894117],
      [-123.18581390380857, 49.21712112426342],
      [-123.18

#### Create the dataframe

In [64]:
column_names = ['Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [65]:
for data in vancouver_data:
    
    neighborhood_name = data['fields']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    neighborhoods = neighborhoods.append({'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)

In [66]:
neighborhoods

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Dunbar-Southlands,49.237962,-123.189547
1,Kerrisdale,49.223655,-123.159576
2,Killarney,49.217022,-123.037647
3,Kitsilano,49.26754,-123.163295
4,South Cambie,49.245556,-123.121801
5,Victoria-Fraserview,49.220012,-123.064135
6,Kensington-Cedar Cottage,49.246686,-123.072885
7,Mount Pleasant,49.263065,-123.098513
8,Oakridge,49.226403,-123.123025
9,Renfrew-Collingwood,49.247343,-123.040166


In [67]:
neighborhoods.shape

(22, 3)

#### Use geopy library to get the latitude and longitude values of Vancouver, CA

In [68]:
address = 'Vancouver, CA'

geolocator = Nominatim(user_agent="my_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinates of Vancouver are {}, {}.'.format(latitude, longitude))

The geograpical coordinates of Vancouver are 49.2608724, -123.1139529.


#### Use Folium to create a map of Vancouver with neoghborhoods superimposed

In [69]:
map_vancouver = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_vancouver)  
    
map_vancouver

#### Create Vancouver downtown dataframe with neighborhoods in central Vancouver only

In [70]:
vancouver_downtown = neighborhoods.drop([0,1,2,4,5,6,8,9,10,12,16,17,18,19], axis=0)
vancouver_downtown

Unnamed: 0,Neighborhood,Latitude,Longitude
3,Kitsilano,49.26754,-123.163295
7,Mount Pleasant,49.263065,-123.098513
11,West Point Grey,49.268401,-123.203467
13,Downtown,49.280747,-123.116567
14,Fairview,49.26454,-123.131049
15,Grandview-Woodland,49.27644,-123.066728
20,Strathcona,49.27822,-123.088235
21,West End,49.285011,-123.135438


#### Explore Downton Vancouver neighborhoods  with Foursquare API

In [71]:
CLIENT_ID = '*****' # your Foursquare ID
CLIENT_SECRET = '*******' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

#print('Your credentails:')
#print('CLIENT_ID: ' + CLIENT_ID)
#print('CLIENT_SECRET:' + CLIENT_SECRET)

In [72]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            100)       
# make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
        nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    print('Found {} venues in {} neighborhoods.'.format(nearby_venues.shape[0], len(venues_list)))
        
    return(nearby_venues) 

In [73]:
vancouver_venues = getNearbyVenues(names=vancouver_downtown['Neighborhood'],
                                   latitudes=vancouver_downtown['Latitude'],
                                   longitudes=vancouver_downtown['Longitude']
                                  )

Found 306 venues in 8 neighborhoods.


In [74]:
vancouver_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Kitsilano,49.26754,-123.163295,Cafe Lokal,49.268174,-123.16471,Coffee Shop
1,Kitsilano,49.26754,-123.163295,The Only Cafe,49.268197,-123.165536,Café
2,Kitsilano,49.26754,-123.163295,Guanaco Salvadoran Cuisine food truck,49.268251,-123.161749,Food Truck
3,Kitsilano,49.26754,-123.163295,Terra Breads,49.268139,-123.159275,Bakery
4,Kitsilano,49.26754,-123.163295,Raisu,49.268244,-123.15843,Japanese Restaurant


#### Use onehot encoding to analyze each neighborhood

In [75]:
# use one hot encoding
vancouver_onehot = pd.get_dummies(vancouver_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
vancouver_onehot['Neighborhood'] = vancouver_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [vancouver_onehot.columns[-1]] + list(vancouver_onehot.columns[:-1])
vancouver_onehot = vancouver_onehot[fixed_columns]

vancouver_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Beer Bar,Belgian Restaurant,Board Shop,Boat or Ferry,Bookstore,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Stop,Café,Cajun / Creole Restaurant,Camera Store,Cheese Shop,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Cosmetics Shop,Cuban Restaurant,Dance Studio,Deli / Bodega,Dessert Shop,Diner,Disc Golf,Dive Bar,Donut Shop,Electronics Store,Ethiopian Restaurant,Falafel Restaurant,Farmers Market,Fish & Chips Shop,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gastropub,Gay Bar,Gift Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health Food Store,Hot Dog Joint,Hotel,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Library,Lingerie Store,Liquor Store,Lounge,Mac & Cheese Joint,Men's Store,Mexican Restaurant,Museum,Music Store,Music Venue,Nail Salon,New American Restaurant,Noodle House,Optical Shop,Outdoor Sculpture,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pizza Place,Plaza,Poke Place,Pool Hall,Pub,Ramen Restaurant,Record Shop,Restaurant,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Seafood Restaurant,Shopping Mall,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Kitsilano,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Kitsilano,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Kitsilano,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Kitsilano,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Kitsilano,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [76]:
vancouver_grouped = vancouver_onehot.groupby('Neighborhood').mean().reset_index()

vancouver_grouped

Unnamed: 0,Neighborhood,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Beer Bar,Belgian Restaurant,Board Shop,Boat or Ferry,Bookstore,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Stop,Café,Cajun / Creole Restaurant,Camera Store,Cheese Shop,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Cosmetics Shop,Cuban Restaurant,Dance Studio,Deli / Bodega,Dessert Shop,Diner,Disc Golf,Dive Bar,Donut Shop,Electronics Store,Ethiopian Restaurant,Falafel Restaurant,Farmers Market,Fish & Chips Shop,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gastropub,Gay Bar,Gift Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health Food Store,Hot Dog Joint,Hotel,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Library,Lingerie Store,Liquor Store,Lounge,Mac & Cheese Joint,Men's Store,Mexican Restaurant,Museum,Music Store,Music Venue,Nail Salon,New American Restaurant,Noodle House,Optical Shop,Outdoor Sculpture,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pizza Place,Plaza,Poke Place,Pool Hall,Pub,Ramen Restaurant,Record Shop,Restaurant,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Seafood Restaurant,Shopping Mall,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Downtown,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.013514,0.0,0.0,0.013514,0.027027,0.0,0.0,0.013514,0.013514,0.0,0.054054,0.0,0.0,0.0,0.013514,0.013514,0.0,0.040541,0.040541,0.0,0.013514,0.0,0.013514,0.0,0.013514,0.0,0.0,0.0,0.013514,0.013514,0.0,0.013514,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.027027,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.013514,0.013514,0.094595,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.013514,0.0,0.0,0.013514,0.013514,0.013514,0.0,0.0,0.0,0.013514,0.013514,0.0,0.0,0.013514,0.0,0.0,0.013514,0.013514,0.013514,0.013514,0.0,0.0,0.013514,0.0,0.040541,0.0,0.027027,0.0,0.040541,0.013514,0.0,0.013514,0.0,0.013514,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.013514,0.027027,0.0,0.013514,0.013514,0.0,0.0,0.0
1,Fairview,0.0,0.0,0.038462,0.076923,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.115385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.076923,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.038462,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.038462,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0
2,Grandview-Woodland,0.0,0.0,0.0,0.027778,0.027778,0.027778,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.055556,0.027778,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.027778,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.027778,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.027778,0.0,0.027778,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.027778,0.055556,0.0,0.027778,0.055556,0.0,0.0,0.0,0.0,0.0,0.0
3,Kitsilano,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.019231,0.0,0.019231,0.019231,0.0,0.0,0.0,0.019231,0.0,0.0,0.096154,0.0,0.0,0.0,0.0,0.0,0.019231,0.019231,0.019231,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.038462,0.038462,0.0,0.019231,0.0,0.019231,0.0,0.0,0.019231,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.057692,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.057692,0.0,0.0,0.0,0.019231,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.019231,0.019231,0.0,0.019231,0.0,0.038462,0.019231,0.038462
4,Mount Pleasant,0.0,0.0,0.025,0.0125,0.0,0.0,0.0125,0.0,0.0,0.025,0.0125,0.0,0.0125,0.0125,0.0125,0.0375,0.025,0.0125,0.0125,0.0125,0.0,0.0125,0.0,0.0,0.0,0.0125,0.0125,0.0125,0.0875,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0125,0.0125,0.0125,0.0125,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0125,0.0125,0.0,0.0,0.0,0.0,0.0125,0.0125,0.025,0.0,0.0,0.0,0.0125,0.0,0.0125,0.0,0.0125,0.025,0.0,0.0125,0.0125,0.0,0.0,0.0125,0.0,0.0125,0.0125,0.0,0.0125,0.0,0.0,0.0125,0.0,0.0125,0.0125,0.0,0.0,0.0125,0.0125,0.0125,0.0125,0.0125,0.0,0.0375,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0375,0.0125,0.0125,0.0125,0.0,0.0,0.0125,0.0,0.0,0.0125,0.025,0.0,0.0,0.0,0.0125
5,Strathcona,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,West End,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.136364,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.045455,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,West Point Grey,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [77]:
num_top_venues = 10

for hood in vancouver_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = vancouver_grouped[vancouver_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Downtown----
                venue  freq
0               Hotel  0.09
1                Café  0.05
2  Seafood Restaurant  0.04
3          Restaurant  0.04
4        Concert Hall  0.04
5         Coffee Shop  0.04
6          Steakhouse  0.03
7      Breakfast Spot  0.03
8             Theater  0.03
9         Art Gallery  0.03


----Fairview----
                    venue  freq
0             Coffee Shop  0.12
1          Breakfast Spot  0.08
2     Japanese Restaurant  0.08
3        Asian Restaurant  0.08
4                    Park  0.08
5              Restaurant  0.04
6       Indian Restaurant  0.04
7                     Gym  0.04
8              Nail Salon  0.04
9  Furniture / Home Store  0.04


----Grandview-Woodland----
                           venue  freq
0                    Coffee Shop  0.11
1                    Pizza Place  0.08
2                  Deli / Bodega  0.06
3  Vegetarian / Vegan Restaurant  0.06
4                        Theater  0.06
5              Indian Restaurant  0.06
6 

In [78]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]



In [79]:

import numpy as np
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))
        # create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)

neighborhoods_venues_sorted['Neighborhood'] = vancouver_grouped['Neighborhood']

for ind in np.arange(vancouver_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(vancouver_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted



Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown,Hotel,Café,Coffee Shop,Restaurant,Seafood Restaurant,Concert Hall,Vegetarian / Vegan Restaurant,Gastropub,Theater,Art Gallery
1,Fairview,Coffee Shop,Park,Asian Restaurant,Breakfast Spot,Japanese Restaurant,Gym,Pharmacy,Camera Store,Spa,Korean Restaurant
2,Grandview-Woodland,Coffee Shop,Pizza Place,Vegetarian / Vegan Restaurant,Theater,Indian Restaurant,Café,Deli / Bodega,Grocery Store,Cajun / Creole Restaurant,Park
3,Kitsilano,Coffee Shop,Pizza Place,Japanese Restaurant,Grocery Store,French Restaurant,Food Truck,Yoga Studio,Electronics Store,Wine Shop,Bakery
4,Mount Pleasant,Coffee Shop,Diner,Breakfast Spot,Sushi Restaurant,Sandwich Place,Brewery,Lounge,Arts & Crafts Store,Vietnamese Restaurant,Indian Restaurant
5,Strathcona,Park,Brewery,Deli / Bodega,Food Truck,Sandwich Place,Cheese Shop,Restaurant,Pub,Coffee Shop,Ethiopian Restaurant
6,West End,Café,Coffee Shop,Gay Bar,Farmers Market,Grocery Store,Sushi Restaurant,Noodle House,Pub,Restaurant,Sandwich Place
7,West Point Grey,Harbor / Marina,Gym / Fitness Center,Gym,Disc Golf,Performing Arts Venue,Park,Yoga Studio,Diner,Electronics Store,Donut Shop


In [85]:
vancouver_venues[vancouver_venues["Venue Category"] == "Park"]["Venue"].count()

7

In [86]:
vancouver_venues[vancouver_venues["Venue Category"] == "Gym"]["Venue"].count()

3

In [80]:
kclusters = 3

vancouver_grouped_clustering = vancouver_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=2).fit(vancouver_grouped_clustering)

# check cluster labels generated for each row in the dataframe
#kmeans.labels_[0:10] 
kmeans.labels_

array([0, 0, 0, 0, 0, 2, 0, 1], dtype=int32)

In [81]:
vancouver_merged = vancouver_downtown

# add clustering labels
vancouver_merged['Cluster Labels'] = kmeans.labels_

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
vancouver_merged = vancouver_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

vancouver_merged.head() 

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Kitsilano,49.26754,-123.163295,0,Coffee Shop,Pizza Place,Japanese Restaurant,Grocery Store,French Restaurant,Food Truck,Yoga Studio,Electronics Store,Wine Shop,Bakery
7,Mount Pleasant,49.263065,-123.098513,0,Coffee Shop,Diner,Breakfast Spot,Sushi Restaurant,Sandwich Place,Brewery,Lounge,Arts & Crafts Store,Vietnamese Restaurant,Indian Restaurant
11,West Point Grey,49.268401,-123.203467,0,Harbor / Marina,Gym / Fitness Center,Gym,Disc Golf,Performing Arts Venue,Park,Yoga Studio,Diner,Electronics Store,Donut Shop
13,Downtown,49.280747,-123.116567,0,Hotel,Café,Coffee Shop,Restaurant,Seafood Restaurant,Concert Hall,Vegetarian / Vegan Restaurant,Gastropub,Theater,Art Gallery
14,Fairview,49.26454,-123.131049,0,Coffee Shop,Park,Asian Restaurant,Breakfast Spot,Japanese Restaurant,Gym,Pharmacy,Camera Store,Spa,Korean Restaurant


In [82]:
# create map
map_vancouver_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(vancouver_merged['Latitude'], vancouver_merged['Longitude'], vancouver_merged['Neighborhood'], vancouver_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_vancouver_clusters)
       
map_vancouver_clusters

In [83]:
vancouver_merged.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Kitsilano,49.26754,-123.163295,0,Coffee Shop,Pizza Place,Japanese Restaurant,Grocery Store,French Restaurant,Food Truck,Yoga Studio,Electronics Store,Wine Shop,Bakery
7,Mount Pleasant,49.263065,-123.098513,0,Coffee Shop,Diner,Breakfast Spot,Sushi Restaurant,Sandwich Place,Brewery,Lounge,Arts & Crafts Store,Vietnamese Restaurant,Indian Restaurant
11,West Point Grey,49.268401,-123.203467,0,Harbor / Marina,Gym / Fitness Center,Gym,Disc Golf,Performing Arts Venue,Park,Yoga Studio,Diner,Electronics Store,Donut Shop
13,Downtown,49.280747,-123.116567,0,Hotel,Café,Coffee Shop,Restaurant,Seafood Restaurant,Concert Hall,Vegetarian / Vegan Restaurant,Gastropub,Theater,Art Gallery
14,Fairview,49.26454,-123.131049,0,Coffee Shop,Park,Asian Restaurant,Breakfast Spot,Japanese Restaurant,Gym,Pharmacy,Camera Store,Spa,Korean Restaurant


In [96]:
vancouver_merged.loc[vancouver_merged['Cluster Labels'] == 0, vancouver_merged.columns[[0] + list(range(4, vancouver_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
3,Kitsilano,Coffee Shop,Pizza Place,Japanese Restaurant,Grocery Store,French Restaurant,Food Truck,Yoga Studio,Electronics Store,Wine Shop,Bakery,Bank,Gastropub,Pharmacy,Mac & Cheese Joint,Liquor Store
7,Mount Pleasant,Coffee Shop,Diner,Breakfast Spot,Sushi Restaurant,Sandwich Place,Brewery,Lounge,Arts & Crafts Store,Vietnamese Restaurant,Indian Restaurant,Bar,Grocery Store,Donut Shop,Electronics Store,Ethiopian Restaurant
11,West Point Grey,Harbor / Marina,Gym / Fitness Center,Gym,Disc Golf,Performing Arts Venue,Park,Yoga Studio,Diner,Electronics Store,Donut Shop,Dive Bar,Dessert Shop,Falafel Restaurant,Deli / Bodega,Dance Studio
13,Downtown,Hotel,Café,Coffee Shop,Restaurant,Seafood Restaurant,Concert Hall,Vegetarian / Vegan Restaurant,Gastropub,Theater,Art Gallery,Breakfast Spot,Bar,Steakhouse,Sandwich Place,Burrito Place
14,Fairview,Coffee Shop,Park,Asian Restaurant,Breakfast Spot,Japanese Restaurant,Gym,Pharmacy,Camera Store,Spa,Korean Restaurant,Restaurant,Indian Restaurant,Pet Store,Thai Restaurant,Furniture / Home Store
20,Strathcona,Park,Brewery,Deli / Bodega,Food Truck,Sandwich Place,Cheese Shop,Restaurant,Pub,Coffee Shop,Ethiopian Restaurant,Dessert Shop,Dance Studio,Diner,Disc Golf,Dive Bar


In [95]:
vancouver_merged.loc[vancouver_merged['Cluster Labels'] == 1, vancouver_merged.columns[[0] + list(range(4, vancouver_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
21,West End,Café,Coffee Shop,Gay Bar,Farmers Market,Grocery Store,Sushi Restaurant,Noodle House,Pub,Restaurant,Sandwich Place,Falafel Restaurant,Bookstore,Lingerie Store,Spanish Restaurant,Park


In [97]:
vancouver_merged.loc[vancouver_merged['Cluster Labels'] == 2, vancouver_merged.columns[[0] + list(range(4, vancouver_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
15,Grandview-Woodland,Coffee Shop,Pizza Place,Vegetarian / Vegan Restaurant,Theater,Indian Restaurant,Café,Deli / Bodega,Grocery Store,Cajun / Creole Restaurant,Park,Pub,Record Shop,Scandinavian Restaurant,Brewery,Cuban Restaurant


## Download and explore Toronto dataset

In [4]:
#Obtain Postal Code, Borough, and Neighborhood information from Wikipedia
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M', header = 0)

df_toronto = df[0]
df_toronto.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


#### Transform the Data

In [5]:
df_toronto.rename(columns = {'Postal Code': 'PostalCode', 'Neighbourhood': 'Neighborhood'}, inplace = True)

#Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
df_toronto.drop(df_toronto[df_toronto.Borough == 'Not assigned'].index, inplace=True)

#Combine the neighborhoods 
df_toronto = df_toronto.groupby(['PostalCode', 'Borough'])['Neighborhood'].apply(lambda x: ','.join(x)).reset_index()

#Change unassigned Neighborhoods to its Borough's names
df_toronto.loc[85,'Neighborhood'] = 'Queen\'s Park'

print (df_toronto.shape)

df_toronto.head()

(103, 3)


Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


#### Obtain longitude and latitude information and join with neighborhood data

In [6]:
#Create a dataframe of the latitude and longitudes 
df_latlong = pd.read_csv("http://cocl.us/Geospatial_data")
df_latlong.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [7]:
df_latlong.rename(columns = {"Postal Code": "PostalCode"}, inplace = True)
df_latlong.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [8]:
df_latlong.shape

(103, 3)

#### Join neighborhood dataframe with latlong dataframe

In [10]:
df_toronto.set_index("PostalCode")
df_latlong.set_index("PostalCode")
df_neighbor=pd.merge(df_toronto, df_latlong)

print (df_neighbor.shape)

df_neighbor.head()

(103, 5)


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


#### Use geopy library to get the latitude and longitude values of Toronto

In [11]:
address = 'Toronto, CA'

geolocator = Nominatim(user_agent="my_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinates of Toronto are 43.6534817, -79.3839347.


#### Create a map of Toronto with neighborhoods superimposed on top

In [12]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_neighbor['Latitude'], df_neighbor['Longitude'], df_neighbor['Borough'], df_neighbor['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

#### Segmenting and clustering only the neighborhoods in Central Toronto. 

In [14]:
# let's slice the original dataframe and create a new dataframe of the Central Toronto Neighborhood data.

central_toronto = df_neighbor[df_neighbor['Borough'] == 'Central Toronto'].reset_index(drop=True)
central_toronto

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
1,M4P,Central Toronto,Davisville North,43.712751,-79.390197
2,M4R,Central Toronto,"North Toronto West, Lawrence Park",43.715383,-79.405678
3,M4S,Central Toronto,Davisville,43.704324,-79.38879
4,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316
5,M4V,Central Toronto,"Summerhill West, Rathnelly, South Hill, Forest...",43.686412,-79.400049
6,M5N,Central Toronto,Roselawn,43.711695,-79.416936
7,M5P,Central Toronto,"Forest Hill North & West, Forest Hill Road Park",43.696948,-79.411307
8,M5R,Central Toronto,"The Annex, North Midtown, Yorkville",43.67271,-79.405678


In [15]:
central_toronto.shape

(9, 5)

In [16]:
# get the geographical coordinates of Central Toronto

address = 'Central Toronto, CA'

geolocator = Nominatim(user_agent="my_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinates of Central Toronto, CA are {}, {}.'.format(latitude, longitude))

The geograpical coordinates of Central Toronto, CA are 43.6534817, -79.3839347.


In [17]:
map_central_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(central_toronto['Latitude'], central_toronto['Longitude'], central_toronto['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_central_toronto)  
    
map_central_toronto  
    

#### Function to explore Central Toronto neighborhoods

In [19]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            100)
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    print('Found {} venues in {} neighborhoods.'.format(nearby_venues.shape[0], len(venues_list)))
    
    return(nearby_venues)

In [20]:
central_toronto_venues = getNearbyVenues(names=central_toronto['Neighborhood'],
                                   latitudes=central_toronto['Latitude'],
                                   longitudes=central_toronto['Longitude']
                                  )

Found 109 venues in 9 neighborhoods.


In [21]:
central_toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Lawrence Park,43.72802,-79.38879,Lawrence Park Ravine,43.726963,-79.394382,Park
1,Lawrence Park,43.72802,-79.38879,Zodiac Swim School,43.728532,-79.38286,Swim School
2,Lawrence Park,43.72802,-79.38879,TTC Bus #162 - Lawrence-Donway,43.728026,-79.382805,Bus Line
3,Davisville North,43.712751,-79.390197,Sherwood Park,43.716551,-79.387776,Park
4,Davisville North,43.712751,-79.390197,Summerhill Market North,43.715499,-79.392881,Food & Drink Shop


In [22]:
print('There are {} distinct venues in {} categories.'.format(
    len(central_toronto_venues['Venue'].unique()),len(central_toronto_venues['Venue Category'].unique())))


There are 97 distinct venues in 61 categories.


#### Analyze each neighborhood

In [23]:
# use one hot encoding
central_toronto_onehot = pd.get_dummies(central_toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
central_toronto_onehot['Neighborhood'] = central_toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [central_toronto_onehot.columns[-1]] + list(central_toronto_onehot.columns[:-1])
central_toronto_onehot = central_toronto_onehot[fixed_columns]

central_toronto_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,BBQ Joint,Bagel Shop,Bank,Breakfast Spot,Brewery,Burger Joint,Bus Line,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Department Store,Dessert Shop,Diner,Donut Shop,Farmers Market,Fast Food Restaurant,Flower Shop,Food & Drink Shop,Fried Chicken Joint,Garden,Gas Station,Gift Shop,Gourmet Shop,Greek Restaurant,Gym,Gym / Fitness Center,History Museum,Hotel,Indian Restaurant,Indoor Play Area,Italian Restaurant,Jewelry Store,Light Rail Station,Liquor Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Park,Pet Store,Pharmacy,Pizza Place,Pool,Pub,Restaurant,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Spa,Sporting Goods Shop,Sports Bar,Supermarket,Sushi Restaurant,Swim School,Thai Restaurant,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Yoga Studio
0,Lawrence Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Lawrence Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
2,Lawrence Park,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Davisville North,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Davisville North,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### Group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [27]:
central_toronto_grouped = central_toronto_onehot.groupby('Neighborhood').mean().reset_index()
central_toronto_grouped

Unnamed: 0,Neighborhood,American Restaurant,BBQ Joint,Bagel Shop,Bank,Breakfast Spot,Brewery,Burger Joint,Bus Line,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Department Store,Dessert Shop,Diner,Donut Shop,Farmers Market,Fast Food Restaurant,Flower Shop,Food & Drink Shop,Fried Chicken Joint,Garden,Gas Station,Gift Shop,Gourmet Shop,Greek Restaurant,Gym,Gym / Fitness Center,History Museum,Hotel,Indian Restaurant,Indoor Play Area,Italian Restaurant,Jewelry Store,Light Rail Station,Liquor Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Park,Pet Store,Pharmacy,Pizza Place,Pool,Pub,Restaurant,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Spa,Sporting Goods Shop,Sports Bar,Supermarket,Sushi Restaurant,Swim School,Thai Restaurant,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Yoga Studio
0,Davisville,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.060606,0.0,0.0,0.060606,0.0,0.090909,0.030303,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.030303,0.030303,0.060606,0.0,0.0,0.0,0.030303,0.030303,0.060606,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.030303,0.090909,0.0,0.0,0.030303,0.0,0.090909,0.030303,0.0,0.0,0.0,0.0,0.060606,0.0,0.030303,0.030303,0.0,0.0,0.0,0.0
1,Davisville North,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Forest Hill North & West, Forest Hill Road Park",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0
3,Lawrence Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0
4,"Moore Park, Summerhill East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"North Toronto West, Lawrence Park",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.055556,0.111111,0.111111,0.0,0.0,0.055556,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.055556,0.055556,0.0,0.0,0.0,0.0,0.055556,0.055556,0.0,0.055556,0.055556,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556
6,Roselawn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Summerhill West, Rathnelly, South Hill, Forest...",0.055556,0.0,0.055556,0.055556,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.111111,0.055556,0.0,0.0,0.0,0.0,0.0,0.055556,0.055556,0.055556,0.0,0.0,0.0,0.0,0.0,0.055556,0.0
8,"The Annex, North Midtown, Yorkville",0.0,0.045455,0.0,0.0,0.0,0.0,0.045455,0.0,0.136364,0.0,0.0,0.090909,0.0,0.0,0.0,0.045455,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.045455,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.045455,0.045455,0.0,0.045455,0.045455,0.0,0.045455,0.0,0.0,0.136364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0


#### Explore the top 10 venues of each neighborhood

In [25]:
num_top_venues = 10

for hood in central_toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = central_toronto_grouped[central_toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Davisville----
                venue  freq
0      Sandwich Place  0.09
1         Pizza Place  0.09
2        Dessert Shop  0.09
3  Italian Restaurant  0.06
4                 Gym  0.06
5                Café  0.06
6    Sushi Restaurant  0.06
7         Coffee Shop  0.06
8   Indian Restaurant  0.03
9               Diner  0.03


----Davisville North----
                  venue  freq
0        Sandwich Place  0.12
1  Gym / Fitness Center  0.12
2           Pizza Place  0.12
3      Department Store  0.12
4                  Park  0.12
5     Food & Drink Shop  0.12
6                 Hotel  0.12
7        Breakfast Spot  0.12
8    Mexican Restaurant  0.00
9            Restaurant  0.00


----Forest Hill North & West, Forest Hill Road Park----
                 venue  freq
0        Jewelry Store  0.25
1                Trail  0.25
2     Sushi Restaurant  0.25
3                 Park  0.25
4  American Restaurant  0.00
5           Restaurant  0.00
6   Light Rail Station  0.00
7         Liquor Store  0.

#### Create pandas dataframe for the top 10 venues 

In [34]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]


In [35]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
central_toronto_venues_sorted = pd.DataFrame(columns=columns)

central_toronto_venues_sorted['Neighborhood'] = central_toronto_grouped['Neighborhood']

for ind in np.arange(central_toronto_grouped.shape[0]):
    central_toronto_venues_sorted.iloc[ind, 1:] = return_most_common_venues(central_toronto_grouped.iloc[ind, :], num_top_venues)

central_toronto_venues_sorted


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Davisville,Pizza Place,Dessert Shop,Sandwich Place,Gym,Italian Restaurant,Sushi Restaurant,Coffee Shop,Café,Gas Station,Greek Restaurant
1,Davisville North,Hotel,Food & Drink Shop,Gym / Fitness Center,Department Store,Breakfast Spot,Sandwich Place,Park,Pizza Place,Fast Food Restaurant,Flower Shop
2,"Forest Hill North & West, Forest Hill Road Park",Trail,Park,Jewelry Store,Sushi Restaurant,Yoga Studio,Fried Chicken Joint,Farmers Market,Fast Food Restaurant,Flower Shop,Food & Drink Shop
3,Lawrence Park,Swim School,Bus Line,Park,Yoga Studio,Donut Shop,Gym,Greek Restaurant,Gourmet Shop,Gift Shop,Gas Station
4,"Moore Park, Summerhill East",Gym,Yoga Studio,Hotel,Gym / Fitness Center,Greek Restaurant,Gourmet Shop,Gift Shop,Gas Station,Garden,Fried Chicken Joint
5,"North Toronto West, Lawrence Park",Coffee Shop,Clothing Store,Seafood Restaurant,Gift Shop,Fast Food Restaurant,Mexican Restaurant,Diner,Park,Pet Store,Restaurant
6,Roselawn,Garden,Pool,Yoga Studio,Donut Shop,Gym / Fitness Center,Gym,Greek Restaurant,Gourmet Shop,Gift Shop,Gas Station
7,"Summerhill West, Rathnelly, South Hill, Forest...",Coffee Shop,Pub,Burger Joint,Vietnamese Restaurant,Fried Chicken Joint,Light Rail Station,Liquor Store,Pizza Place,Restaurant,American Restaurant
8,"The Annex, North Midtown, Yorkville",Sandwich Place,Café,Coffee Shop,Indian Restaurant,Donut Shop,Park,Metro Station,Pharmacy,Pizza Place,Liquor Store


In [44]:
central_toronto_venues[central_toronto_venues["Venue Category"] == "Park"]["Venue"].count()

6

In [84]:
central_toronto_venues[central_toronto_venues["Venue Category"] == "Gym"]["Venue"].count()

3

#### Clustering the Central Toronto neighborhoods uding k-means

In [51]:
kclusters = 3

central_toronto_grouped_clustering = central_toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=2).fit(central_toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
#kmeans.labels_[0:10] 
kmeans.labels_

array([0, 0, 0, 0, 2, 0, 1, 0, 0], dtype=int32)

#### Create a new dataframe that includes the clusters as well as the top 10 venues for each neighborhood

In [56]:

central_toronto_merged = central_toronto

# add clustering labels
central_toronto_merged['Cluster Labels'] = kmeans.labels_

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
central_toronto_merged = central_toronto_merged.join(central_toronto_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

central_toronto_merged.head() 

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,0,Swim School,Bus Line,Park,Yoga Studio,Donut Shop,Gym,Greek Restaurant,Gourmet Shop,Gift Shop,Gas Station
1,M4P,Central Toronto,Davisville North,43.712751,-79.390197,0,Hotel,Food & Drink Shop,Gym / Fitness Center,Department Store,Breakfast Spot,Sandwich Place,Park,Pizza Place,Fast Food Restaurant,Flower Shop
2,M4R,Central Toronto,"North Toronto West, Lawrence Park",43.715383,-79.405678,0,Coffee Shop,Clothing Store,Seafood Restaurant,Gift Shop,Fast Food Restaurant,Mexican Restaurant,Diner,Park,Pet Store,Restaurant
3,M4S,Central Toronto,Davisville,43.704324,-79.38879,0,Pizza Place,Dessert Shop,Sandwich Place,Gym,Italian Restaurant,Sushi Restaurant,Coffee Shop,Café,Gas Station,Greek Restaurant
4,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316,2,Gym,Yoga Studio,Hotel,Gym / Fitness Center,Greek Restaurant,Gourmet Shop,Gift Shop,Gas Station,Garden,Fried Chicken Joint


In [54]:
map_central_toronto_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(central_toronto_merged['Latitude'], central_toronto_merged['Longitude'], central_toronto_merged['Neighborhood'], central_toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_central_toronto_clusters)
       
map_central_toronto_clusters

#### Explore the clusters

In [59]:
central_toronto_merged.loc[central_toronto_merged['Cluster Labels'] == 0, central_toronto_merged.columns[[0] + list(range(2, central_toronto_merged.shape[1]))]]

Unnamed: 0,PostalCode,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4N,Lawrence Park,43.72802,-79.38879,0,Swim School,Bus Line,Park,Yoga Studio,Donut Shop,Gym,Greek Restaurant,Gourmet Shop,Gift Shop,Gas Station
1,M4P,Davisville North,43.712751,-79.390197,0,Hotel,Food & Drink Shop,Gym / Fitness Center,Department Store,Breakfast Spot,Sandwich Place,Park,Pizza Place,Fast Food Restaurant,Flower Shop
2,M4R,"North Toronto West, Lawrence Park",43.715383,-79.405678,0,Coffee Shop,Clothing Store,Seafood Restaurant,Gift Shop,Fast Food Restaurant,Mexican Restaurant,Diner,Park,Pet Store,Restaurant
3,M4S,Davisville,43.704324,-79.38879,0,Pizza Place,Dessert Shop,Sandwich Place,Gym,Italian Restaurant,Sushi Restaurant,Coffee Shop,Café,Gas Station,Greek Restaurant
5,M4V,"Summerhill West, Rathnelly, South Hill, Forest...",43.686412,-79.400049,0,Coffee Shop,Pub,Burger Joint,Vietnamese Restaurant,Fried Chicken Joint,Light Rail Station,Liquor Store,Pizza Place,Restaurant,American Restaurant
7,M5P,"Forest Hill North & West, Forest Hill Road Park",43.696948,-79.411307,0,Trail,Park,Jewelry Store,Sushi Restaurant,Yoga Studio,Fried Chicken Joint,Farmers Market,Fast Food Restaurant,Flower Shop,Food & Drink Shop
8,M5R,"The Annex, North Midtown, Yorkville",43.67271,-79.405678,0,Sandwich Place,Café,Coffee Shop,Indian Restaurant,Donut Shop,Park,Metro Station,Pharmacy,Pizza Place,Liquor Store


In [60]:
central_toronto_merged.loc[central_toronto_merged['Cluster Labels'] == 1, central_toronto_merged.columns[[0] + list(range(2, central_toronto_merged.shape[1]))]]

Unnamed: 0,PostalCode,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,M5N,Roselawn,43.711695,-79.416936,1,Garden,Pool,Yoga Studio,Donut Shop,Gym / Fitness Center,Gym,Greek Restaurant,Gourmet Shop,Gift Shop,Gas Station


In [61]:
central_toronto_merged.loc[central_toronto_merged['Cluster Labels'] == 2, central_toronto_merged.columns[[0] + list(range(2, central_toronto_merged.shape[1]))]]

Unnamed: 0,PostalCode,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,M4T,"Moore Park, Summerhill East",43.689574,-79.38316,2,Gym,Yoga Studio,Hotel,Gym / Fitness Center,Greek Restaurant,Gourmet Shop,Gift Shop,Gas Station,Garden,Fried Chicken Joint


## 4. Results

### 4.1. Downtown Vancouver Neighborhoods

I used the k-means algorithm to cluster the Downtown Vancouver neighborhoods in 3 clusters. Cluster_0 has 6 neighborhoods where the most common values are coffee-shops, different food places and parks. Cluster_1 has only one neighborhood which has a lot of places to eat, but no gym. Cluster_2 also has only one neighborhood with prevailing coffee-shops and restaurants, but no gyms. Altogether, there are total of 306 venues in 8 Downtown Vancouver neighborhoods. Obviously, there are plenty of places to eat in all three clusters, but only 7 parks and 3 gyms, all of which are concentrated in cluster_0.

### 4.2. Central Toronto Neighborhoods

The clustering in Central Toronto is very similar to the one in Downtown Vancouver. I used the k-means algorithm to cluster the Central Toronto neighborhoods into 3 clusters. Cluster_0 has 7 neighborhoods where the most common values are coffee-shops, different food places and restaurants. Cluster_1 has only one neighborhood which has a garden and a pool and no places to eat and no gyms. Cluster_2 also has only one neighborhood with one gym, but no food places. Altogether, there are total of 109 venues in 9 Central Toronto neighborhoods, out of which most are different restaurants and food places; there are 6 parks and 3 gyms.

## 5. Discussion

The segmentation and clustering of the cities of Vancouver and Toronto provided very similar results. In the central parts of the cities, 8 neighborhoods were compared and analyzed in Vancouver and 9 in Toronto. According to the data provided by Foursquare, there are 306 venues in Vancouver and 109 in Toronto, the venue categories being very similar, with prevailing coffee shops, different kinds of restaurants and places to eat, entertainment, parks etc. The clustering of the two cities also showed similar results, where most of the neighborhoods were clustered in one cluster (6 in Vancouver and 7 in Toronto) with the remaining two clusters having one each. 

## 6. Conclusion and Recommendations

As already mentioned, the data analysis has produced very similar results for the two cities in terms of variety of venues and homogenity. However, since there are considerably more venues in Downtown Vancouver (306) as compared to Central Toronto (109), our recommnedation to IT startup will be to move the office to one of the neighborhoods shown in cluster_0 in Downtown Vancouver, since they are most closely fitting the requirements of having an easy access to places to eat, gyms and parks. 