# Toronto. Battle of neighborhoods

This notebook presents research on the similarity of Toronto neighborhoods, based on Foursquare data about places of public interest.

## Table of Contents

<div style="margin-top: 30px">

<font size = 4>

1. <a href="#item1">Preparing dataset</a>

2. <a href="#item2">Merging Borough names and geolocation info</a>

3. <a href="#item3">Research</a>
   
</font>
</div>

## 1. Preparing dataset <a class="anchor" id="item1"></a>

We need to prepare a dataset from the Wikipedia table 'Toronto - FSAs', which lists Toronto neighbourhoods and their postal codes.

First, let's load the .csv file.

In [1]:
# import pandas, numpy

import pandas as pd
import numpy as np

In [2]:
# The code was removed by Watson Studio for sharing.

In [3]:
# the dataset with Toronto postal codes was imported as 'body' from IBM Watson asset storage in the hidden cell above

df = pd.read_csv(body)
df.head(12)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Not assigned
9,M8A,Not assigned,Not assigned


In [4]:
df.shape

(288, 3)

Now we will drop the rows with 'Not assigned' in the 'Borough' column and replace 'Not assigned' in the 'Neighbourhood' column with the corresponding borough name.

In [5]:
df = df[df.Borough != 'Not assigned']
df.reset_index(drop = True, inplace = True)

df.head(12)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights
5,M6A,North York,Lawrence Manor
6,M7A,Queen's Park,Not assigned
7,M9A,Etobicoke,Islington Avenue
8,M1B,Scarborough,Rouge
9,M1B,Scarborough,Malvern


In [6]:
# Get the index of 'Not assigned' values in Neighbourhood

df.index[df.Neighbourhood =='Not assigned'].tolist()

[6]

In [7]:
# Replace 'Not assigned' neighbourhoods with the borough name from the same row

not_assigned = df['Neighbourhood'] == 'Not assigned'
df = df.assign(Neighbourhood=df['Neighbourhood'].where(~not_assigned, df['Borough']))
df.head(12)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights
5,M6A,North York,Lawrence Manor
6,M7A,Queen's Park,Queen's Park
7,M9A,Etobicoke,Islington Avenue
8,M1B,Scarborough,Rouge
9,M1B,Scarborough,Malvern


In [8]:
df.shape

(211, 3)
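The cleaning steps in this section can be tried on a small, made-up table. The sketch below is only an illustration: the `toy` dataframe and its rows are assumptions, not the real Wikipedia data. It uses a boolean mask, so no row position has to be hard-coded.

```python
import pandas as pd

# Hypothetical miniature of the postal code table (not the real data)
toy = pd.DataFrame({
    'Postcode': ['M1A', 'M3A', 'M7A'],
    'Borough': ['Not assigned', 'North York', "Queen's Park"],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Not assigned'],
})

# 1. Drop rows whose borough is unknown
toy = toy[toy['Borough'] != 'Not assigned'].reset_index(drop=True)

# 2. Where the neighbourhood is unknown, fall back to the borough of the same row
mask = toy['Neighbourhood'] == 'Not assigned'
toy = toy.assign(Neighbourhood=toy['Neighbourhood'].where(~mask, toy['Borough']))

print(toy)
```

Because the replacement is taken row by row from 'Borough', the same code would still work if several boroughs had unnamed neighbourhoods.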

## 2. Merging Borough names and geolocation info <a class="anchor" id="item2"></a>

Let's import a new table with the coordinates of the postal codes

In [9]:
# The code was removed by Watson Studio for sharing.

In [10]:
df_geo = pd.read_csv(body)
df_geo.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [11]:
#Merge both dataframes based on Postal codes values

df_geo = df_geo.rename(columns = {'Postal Code' : 'Postcode'})
df_neigh = pd.merge(df, df_geo, on = 'Postcode')
df_neigh.head(12)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
3,M5A,Downtown Toronto,Regent Park,43.65426,-79.360636
4,M6A,North York,Lawrence Heights,43.718518,-79.464763
5,M6A,North York,Lawrence Manor,43.718518,-79.464763
6,M7A,Queen's Park,Queen's Park,43.662301,-79.389494
7,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
8,M1B,Scarborough,Rouge,43.806686,-79.194353
9,M1B,Scarborough,Malvern,43.806686,-79.194353
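By default `pd.merge` performs an inner join, so any postcode missing from the coordinates table would be dropped without notice. A minimal sketch of how this could be checked, using hypothetical stand-ins `left` and `right` for `df` and `df_geo` (the postcode 'M9Z' and the borough 'Nowhere' are invented for the example):

```python
import pandas as pd

# Hypothetical stand-ins for df and df_geo; 'M9Z' has no coordinates on purpose
left = pd.DataFrame({'Postcode': ['M3A', 'M4A', 'M9Z'],
                     'Borough': ['North York', 'North York', 'Nowhere']})
right = pd.DataFrame({'Postcode': ['M3A', 'M4A'],
                      'Latitude': [43.753259, 43.725882],
                      'Longitude': [-79.329656, -79.315572]})

# how='left' keeps every row of the left table; indicator=True adds a
# '_merge' column that records whether a matching postcode was found
checked = pd.merge(left, right, on='Postcode', how='left', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Postcode'].tolist()

print(unmatched)
```

An empty `unmatched` list would mean that the inner join used in this notebook loses no rows.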


## 3. Research <a class="anchor" id="item3"></a>

In [12]:
# get all boroughs names

df_neigh.Borough.unique()

array(['North York', 'Downtown Toronto', "Queen's Park", 'Etobicoke',
       'Scarborough', 'East York', 'York', 'East Toronto', 'West Toronto',
       'Central Toronto', 'Mississauga'], dtype=object)

Let's select only the neighbourhoods located in the Toronto area (boroughs containing the word 'Toronto')

In [13]:
df_neigh_tor = df_neigh.loc[df_neigh['Borough'].str.contains('Toronto')].reset_index(drop = True)
df_neigh_tor = df_neigh_tor.rename(columns = {'Neighbourhood' : 'Neighborhood'})
df_neigh_tor.head(8)

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
1,M5A,Downtown Toronto,Regent Park,43.65426,-79.360636
2,M5B,Downtown Toronto,Ryerson,43.657162,-79.378937
3,M5B,Downtown Toronto,Garden District,43.657162,-79.378937
4,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
5,M4E,East Toronto,The Beaches,43.676357,-79.293031
6,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
7,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383


In [14]:
df_neigh_tor.shape

(74, 5)

### Create a map of Toronto with neighborhoods superimposed on top.

In [15]:
!pip install folium
import folium

!pip install geopy
from geopy.geocoders import Nominatim

print('Folium installed and imported')

Folium installed and imported


In [16]:
address = 'Toronto'

geolocator = Nominatim(user_agent="torontox")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


In [20]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_neigh_tor['Latitude'], df_neigh_tor['Longitude'], df_neigh_tor['Borough'], df_neigh_tor['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [21]:
# The code was removed by Watson Studio for sharing.

**Let's explore the first neighborhood in our dataframe.** 
Get the neighborhood's name.

In [23]:
df_neigh_tor.loc[0, 'Neighborhood']

'Harbourfront'

Get the neighborhood's latitude and longitude values.

In [24]:
neigh_lat = df_neigh_tor.Latitude.iloc[0]
neigh_long = df_neigh_tor.Longitude.iloc[0]

print('The coordinates of the neighborhood {} are {}, {}.'.format(df_neigh_tor.loc[0, 'Neighborhood'], neigh_lat, neigh_long))

The coordinates of the neighborhood Harbourfront are 43.6542599, -79.3606359.


#### Let's get the top 100 venues that are in Harbourfront within a radius of 500 meters.

In [25]:
# create the GET request URL and store it in `url`

radius = 500
LIMIT = 100

url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, neigh_lat, neigh_long, VERSION, radius, LIMIT)
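The request URL above is assembled with one long `format` call, where the order of the arguments is easy to mix up. As a sketch only, the same URL could be built from named parameters with the standard library. The helper name `build_explore_url` and the credential and version values below are placeholders, not part of the original notebook.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Assemble the explore URL used in this notebook from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in 'll' readable instead of encoding it as %2C
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params, safe=',')

# Placeholder credentials and version string, for illustration only
demo_url = build_explore_url('MY_ID', 'MY_SECRET', '20180605', 43.65426, -79.360636)
print(demo_url)
```

Named parameters make it harder to swap, for example, the version and the coordinates by accident.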

Send the GET request and examine the results

In [26]:
import requests                                # library for sending HTTP requests
import json                                    # to work with the JSON format
from pandas import json_normalize              # transform JSON into a pandas dataframe (pandas >= 1.0)

In [27]:
results = requests.get(url).json()
#results

In [28]:
# function that extracts the category of the venue

def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Clean the JSON and structure it into a pandas dataframe

In [29]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Roselle Desserts,Bakery,43.653447,-79.362017
1,Tandem Coffee,Coffee Shop,43.653559,-79.361809
2,Toronto Cooper Koo Family Cherry St YMCA Centre,Gym / Fitness Center,43.653191,-79.357947
3,Body Blitz Spa East,Spa,43.654735,-79.359874
4,Morning Glory Cafe,Breakfast Spot,43.653947,-79.361149


In [30]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

49 venues were returned by Foursquare.
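To see where column names such as 'venue.location.lat' come from, the flattening step can be run on a tiny hand-written payload. This is a sketch under assumptions: the two venues below are invented, and only the nesting mirrors `results['response']['groups'][0]['items']`. It uses `pd.json_normalize`, which is available in pandas 1.0 and later.

```python
import pandas as pd

# Hypothetical payload with the same nesting as the Foursquare 'items' list
items = [
    {'venue': {'name': 'Example Cafe',
               'categories': [{'name': 'Coffee Shop'}],
               'location': {'lat': 43.6535, 'lng': -79.3620}}},
    {'venue': {'name': 'Unlabelled Spot',
               'categories': [],
               'location': {'lat': 43.6540, 'lng': -79.3600}}},
]

# Nested keys become dotted column names; lists are kept as they are
flat = pd.json_normalize(items)

# Keep only the name of the first category, or None if the list is empty
flat['venue.categories'] = flat['venue.categories'].apply(
    lambda cats: cats[0]['name'] if cats else None)

# Shorten 'venue.location.lat' to 'lat', and so on
flat.columns = [col.split('.')[-1] for col in flat.columns]

print(flat)
```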


#### Let's create a function to repeat the same process for all the neighborhoods in Toronto

In [31]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
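The function above reads `v['venue']['categories'][0]`, which raises `IndexError` for a venue that comes back with an empty category list. A possible, more defensive variant of the row extraction is sketched below. The helper `venue_rows` and the sample items are hypothetical and make no network call.

```python
def venue_rows(name, lat, lng, items):
    """Turn one neighborhood's 'items' list into flat tuples.

    Venues without a category get None instead of raising IndexError.
    """
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Hypothetical response items (not real Foursquare output)
sample_items = [
    {'venue': {'name': 'Example Bakery',
               'categories': [{'name': 'Bakery'}],
               'location': {'lat': 43.6534, 'lng': -79.3620}}},
    {'venue': {'name': 'No Category Place',
               'categories': [],
               'location': {'lat': 43.6536, 'lng': -79.3618}}},
]

rows = venue_rows('Harbourfront', 43.65426, -79.360636, sample_items)
for row in rows:
    print(row)
```

Each tuple has the same seven fields, in the same order, as the columns assigned in `getNearbyVenues`.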

#### Now write the code to run the above function on each neighborhood and create a new dataframe called toronto_venues

In [32]:
toronto_venues = getNearbyVenues(names=df_neigh_tor['Neighborhood'],
                                   latitudes=df_neigh_tor['Latitude'],
                                   longitudes=df_neigh_tor['Longitude']
                                  )


Harbourfront
Regent Park
Ryerson
Garden District
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Adelaide
King
Richmond
Dovercourt Village
Dufferin
Harbourfront East
Toronto Islands
Union Station
Little Portugal
Trinity
The Danforth West
Riverdale
Design Exchange
Toronto Dominion Centre
Brockton
Exhibition Place
Parkdale Village
The Beaches West
India Bazaar
Commerce Court
Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North
Forest Hill West
High Park
The Junction South
North Toronto West
The Annex
North Midtown
Yorkville
Parkdale
Roncesvalles
Davisville
Harbord
University of Toronto
Runnymede
Swansea
Moore Park
Summerhill East
Chinatown
Grange Park
Kensington Market
Deer Park
Forest Hill SE
Rathnelly
South Hill
Summerhill West
CN Tower
Bathurst Quay
Island airport
Harbourfront West
King and Spadina
Railway Lands
South Niagara
Rosedale
Stn A PO Boxes 25 The Esplanade
Cabbagetown
St. James Town
First Canadian Place
Underground city


In [33]:
print(toronto_venues.shape)
toronto_venues.head()

(3285, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Harbourfront,43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,Harbourfront,43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,Harbourfront,43.65426,-79.360636,Toronto Cooper Koo Family Cherry St YMCA Centre,43.653191,-79.357947,Gym / Fitness Center
3,Harbourfront,43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,Harbourfront,43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot


In [34]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Adelaide,100,100,100,100,100,100
Bathurst Quay,14,14,14,14,14,14
Berczy Park,57,57,57,57,57,57
Brockton,21,21,21,21,21,21
Business Reply Mail Processing Centre 969 Eastern,19,19,19,19,19,19
CN Tower,14,14,14,14,14,14
Cabbagetown,45,45,45,45,45,45
Central Bay Street,84,84,84,84,84,84
Chinatown,100,100,100,100,100,100
Christie,15,15,15,15,15,15


In [35]:
t_v_count = toronto_venues.groupby('Venue Category').count()
t_v_count.sort_values(by = 'Neighborhood', ascending = False)

Unnamed: 0_level_0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Coffee Shop,276,276,276,276,276,276
Café,170,170,170,170,170,170
Restaurant,95,95,95,95,95,95
Bakery,85,85,85,85,85,85
Bar,82,82,82,82,82,82
Italian Restaurant,82,82,82,82,82,82
Hotel,73,73,73,73,73,73
Pizza Place,71,71,71,71,71,71
Park,58,58,58,58,58,58
Gym,49,49,49,49,49,49


Let's find out how many unique categories can be curated from all the returned venues

In [36]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))
print(toronto_venues['Venue Category'].count())

There are 239 unique categories.
3285


#### Analyze Each Neighborhood

In [37]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

In [38]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

toronto_onehot.head()

Unnamed: 0,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [39]:
toronto_onehot.shape

(3285, 239)

Let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [40]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,Adelaide,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.030000,0.000000,...,0.000000,0.00000,0.00,0.010000,0.000000,0.000000,0.000000,0.010000,0.01,0.000000
1,Bathurst Quay,0.000000,0.071429,0.071429,0.071429,0.142857,0.142857,0.142857,0.000000,0.000000,...,0.000000,0.00000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000
2,Berczy Park,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.00000,0.00,0.017544,0.000000,0.000000,0.000000,0.000000,0.00,0.000000
3,Brockton,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.00000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000
4,Business Reply Mail Processing Centre 969 Eastern,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.00000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.052632
5,CN Tower,0.000000,0.071429,0.071429,0.071429,0.142857,0.142857,0.142857,0.000000,0.000000,...,0.000000,0.00000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000
6,Cabbagetown,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.00000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000
7,Central Bay Street,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.011905,0.000000,...,0.000000,0.00000,0.00,0.011905,0.000000,0.011905,0.000000,0.011905,0.00,0.011905
8,Chinatown,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.00000,0.00,0.060000,0.000000,0.000000,0.040000,0.010000,0.00,0.000000
9,Christie,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.00000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.00,0.000000


In [41]:
toronto_grouped.shape

(73, 239)
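The one-hot encoding and the group mean can be followed by hand on a few rows. The sketch below uses an invented `toy_venues` table. After grouping, each value is the share of a neighborhood's venues that fall into that category, so every row sums to 1.

```python
import pandas as pd

# Hypothetical venues: neighborhood 'A' has 4 venues, 'B' has 2
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Park', 'Bar', 'Park', 'Park'],
})

# One column per category, 1.0 where the venue belongs to it
onehot = pd.get_dummies(toy_venues['Venue Category'], dtype=float)

# insert() raises ValueError if a column called 'Neighborhood' already exists,
# which would reveal a venue category that happens to carry that name
onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])

# Mean of the 0/1 columns = relative frequency of each category
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```

In the notebook, 239 unique categories and a 239-column one-hot table that already includes `Neighborhood` suggest that one Foursquare category may itself be called 'Neighborhood' and was overwritten by the assignment. Using `insert` as in this sketch would surface such a clash as an error.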

Let's print each neighborhood along with its top 3 most common venues

In [42]:
num_top_venues = 3

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide----
         venue  freq
0  Coffee Shop  0.08
1         Café  0.05
2          Bar  0.04


----Bathurst Quay----
              venue  freq
0    Airport Lounge  0.14
1   Airport Service  0.14
2  Airport Terminal  0.14


----Berczy Park----
            venue  freq
0     Coffee Shop  0.09
1    Cocktail Bar  0.05
2  Farmers Market  0.04


----Brockton----
            venue  freq
0     Coffee Shop   0.1
1            Café   0.1
2  Breakfast Spot   0.1


----Business Reply Mail Processing Centre 969 Eastern----
                venue  freq
0  Light Rail Station  0.11
1         Yoga Studio  0.05
2                Park  0.05


----CN Tower----
              venue  freq
0    Airport Lounge  0.14
1   Airport Service  0.14
2  Airport Terminal  0.14


----Cabbagetown----
                venue  freq
0                Park  0.07
1         Coffee Shop  0.07
2  Italian Restaurant  0.04


----Central Bay Street----
                venue  freq
0         Coffee Shop  0.14
1  Italian Restaurant  0

#### Let's put that into a pandas dataframe. First, let's write a function to sort the venues in descending order

In [43]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Build a dataframe with the sorted most common venues for each neighborhood

In [44]:
num_top_venues = 15

# create columns according to number of top venues

# set indicators of position (1st, 2nd ...)
indicators = ['st', 'nd', 'rd']

columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head(10)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,Adelaide,Coffee Shop,Café,Thai Restaurant,Bar,Steakhouse,Breakfast Spot,Gym,Asian Restaurant,Hotel,American Restaurant,Restaurant,Bakery,Pizza Place,Sushi Restaurant,Concert Hall
1,Bathurst Quay,Airport Lounge,Airport Service,Airport Terminal,Plane,Airport,Airport Food Court,Airport Gate,Coffee Shop,Harbor / Marina,Sculpture Garden,Boat or Ferry,Falafel Restaurant,Electronics Store,Ethiopian Restaurant,Event Space
2,Berczy Park,Coffee Shop,Cocktail Bar,Steakhouse,Farmers Market,Seafood Restaurant,Beer Bar,Café,Cheese Shop,Bakery,Park,Concert Hall,Greek Restaurant,Comfort Food Restaurant,Nightclub,Restaurant
3,Brockton,Breakfast Spot,Coffee Shop,Café,Italian Restaurant,Pet Store,Climbing Gym,Restaurant,Caribbean Restaurant,Burrito Place,Stadium,Furniture / Home Store,Bar,Bakery,Convenience Store,Performing Arts Venue
4,Business Reply Mail Processing Centre 969 Eastern,Light Rail Station,Yoga Studio,Garden Center,Farmers Market,Smoke Shop,Fast Food Restaurant,Brewery,Park,Burrito Place,Spa,Restaurant,Garden,Auto Workshop,Recording Studio,Skate Park
5,CN Tower,Airport Lounge,Airport Service,Airport Terminal,Plane,Airport,Airport Food Court,Airport Gate,Coffee Shop,Harbor / Marina,Sculpture Garden,Boat or Ferry,Falafel Restaurant,Electronics Store,Ethiopian Restaurant,Event Space
6,Cabbagetown,Park,Coffee Shop,Bakery,Café,Convenience Store,Restaurant,Italian Restaurant,Pizza Place,Pub,Beer Store,Market,Dive Bar,Japanese Restaurant,Pet Store,Jewelry Store
7,Central Bay Street,Coffee Shop,Ice Cream Shop,Italian Restaurant,Sandwich Place,Burger Joint,Café,Bakery,Bubble Tea Shop,Salad Place,Restaurant,Indian Restaurant,Middle Eastern Restaurant,Spa,Chinese Restaurant,Bar
8,Chinatown,Café,Vegetarian / Vegan Restaurant,Chinese Restaurant,Vietnamese Restaurant,Bar,Mexican Restaurant,Dumpling Restaurant,Bakery,Coffee Shop,Donut Shop,Record Shop,Burger Joint,Arts & Crafts Store,Farmers Market,Comfort Food Restaurant
9,Christie,Café,Grocery Store,Park,Convenience Store,Coffee Shop,Baby Store,Restaurant,Italian Restaurant,Diner,Nightclub,Dim Sum Restaurant,Dessert Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant
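The column labels above rely on an exception to fall back to the 'th' suffix. As an alternative sketch, a small helper can compute the suffix directly. The function name `ordinal` and the sample frequencies are assumptions made for this example.

```python
import pandas as pd

def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# Same labels as in the notebook for 15 top venues
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i))
                              for i in range(1, 16)]

# Hypothetical frequency row: pick the three most common categories
freqs = pd.Series({'Coffee Shop': 0.50, 'Park': 0.25, 'Bar': 0.15, 'Gym': 0.10})
top = freqs.sort_values(ascending=False).index[:3].tolist()

print(columns[:4])
print(top)
```

For 15 columns the labels are identical to those produced in the notebook. The helper only behaves differently beyond 20, where it yields '21st' rather than '21th'.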


We'll use the k-means algorithm to group the neighborhoods into 4 clusters. The number 4 matches the number of boroughs in the selection and was chosen just to check for a possible correspondence between boroughs and clusters.

In [45]:
# set number of clusters
kclusters = 4

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 0, 1, 1, 1, 0, 1, 1, 1, 1], dtype=int32)
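A toy example may help to read the labels above. In the sketch below the matrix `X` is invented: each row plays the role of one neighborhood's category frequencies. The label numbers themselves are arbitrary identifiers. What matters is which rows share a label. Setting `n_init` explicitly is an assumption made here to keep the behavior the same across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical frequency rows; columns stand for [coffee shop, park, airport]
X = np.array([
    [0.8, 0.2, 0.0],
    [0.7, 0.3, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.1, 0.0, 0.9],
])

# random_state fixes the initialisation so that reruns give the same labels
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels = model.labels_

print(labels)
```

Rows with a similar venue mix end up with the same label, which is the same mechanism that groups the airport neighborhoods together in this notebook.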

Let's create a new dataframe that includes the cluster as well as the top 15 venues for each neighborhood.

In [46]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# merge neighborhoods_venues_sorted with toronto neighborhood data (df_neigh_tor) to add latitude/longitude for each neighborhood
toronto_merged = df_neigh_tor.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head()

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,...,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636,1,Coffee Shop,Café,Park,Pub,...,Restaurant,Breakfast Spot,Mexican Restaurant,Theater,Spa,Italian Restaurant,Beer Store,French Restaurant,Electronics Store,Brewery
1,M5A,Downtown Toronto,Regent Park,43.65426,-79.360636,1,Coffee Shop,Café,Park,Pub,...,Restaurant,Breakfast Spot,Mexican Restaurant,Theater,Spa,Italian Restaurant,Beer Store,French Restaurant,Electronics Store,Brewery
2,M5B,Downtown Toronto,Ryerson,43.657162,-79.378937,1,Coffee Shop,Clothing Store,Cosmetics Shop,Middle Eastern Restaurant,...,Italian Restaurant,Pizza Place,Plaza,Tea Room,Theater,Bookstore,Diner,Ice Cream Shop,Bubble Tea Shop,Fast Food Restaurant
3,M5B,Downtown Toronto,Garden District,43.657162,-79.378937,1,Coffee Shop,Clothing Store,Cosmetics Shop,Middle Eastern Restaurant,...,Italian Restaurant,Pizza Place,Plaza,Tea Room,Theater,Bookstore,Diner,Ice Cream Shop,Bubble Tea Shop,Fast Food Restaurant
4,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,1,Coffee Shop,Italian Restaurant,Restaurant,Café,...,Bakery,Park,Gastropub,Breakfast Spot,Japanese Restaurant,Cocktail Bar,Cosmetics Shop,Pizza Place,Diner,Beer Bar
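Since the number of clusters was chosen to match the number of boroughs, one simple way to look for a correspondence is a contingency table of boroughs against cluster labels. The sketch below uses a made-up `toy_merged` table with the same two column names as `toronto_merged`. The values are not taken from the real result.

```python
import pandas as pd

# Hypothetical borough/cluster pairs (not the notebook's real output)
toy_merged = pd.DataFrame({
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'Central Toronto',
                'Central Toronto', 'East Toronto'],
    'Cluster Labels': [1, 0, 1, 3, 1],
})

# Rows are boroughs, columns are cluster labels, cells are neighborhood counts
table = pd.crosstab(toy_merged['Borough'], toy_merged['Cluster Labels'])
print(table)
```

If boroughs and clusters corresponded closely, each row of such a table would have most of its count in a single column.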


Let's visualize the clustered neighborhoods on the map

In [47]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
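The color scheme in the cell above builds one hex color per cluster and then looks it up with `rainbow[cluster-1]`, so label 0 takes the last color in the list. A more direct mapping from label to color could look like the sketch below. The names `palette` and `color_of` are introduced here for illustration only.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 4

# One evenly spaced color from the rainbow colormap per cluster, as hex strings
palette = [colors.rgb2hex(rgba) for rgba in cm.rainbow(np.linspace(0, 1, kclusters))]

# Explicit lookup: cluster label -> color
color_of = {cluster: palette[cluster] for cluster in range(kclusters)}

print(color_of)
```

With a dictionary like this, a marker for cluster label `c` would simply use `color_of[c]`.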

The map shows that almost the whole area of central Toronto falls into a single cluster, so its neighborhoods look very similar in terms of venue mix. Clustering might give different results if the dataset included other information, for example building types, schools, and residential and commercial real estate prices. As it stands, the result mostly gives a quick picture of what Toronto looks like from a traveler's point of view.

### Let's examine each cluster

**Cluster 1 (0)**

In [48]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
59,CN Tower,0,Airport Lounge,Airport Service,Airport Terminal,Plane,Airport,Airport Food Court,Airport Gate,Coffee Shop,Harbor / Marina,Sculpture Garden,Boat or Ferry,Falafel Restaurant,Electronics Store,Ethiopian Restaurant,Event Space
60,Bathurst Quay,0,Airport Lounge,Airport Service,Airport Terminal,Plane,Airport,Airport Food Court,Airport Gate,Coffee Shop,Harbor / Marina,Sculpture Garden,Boat or Ferry,Falafel Restaurant,Electronics Store,Ethiopian Restaurant,Event Space
61,Island airport,0,Airport Lounge,Airport Service,Airport Terminal,Plane,Airport,Airport Food Court,Airport Gate,Coffee Shop,Harbor / Marina,Sculpture Garden,Boat or Ferry,Falafel Restaurant,Electronics Store,Ethiopian Restaurant,Event Space
62,Harbourfront West,0,Airport Lounge,Airport Service,Airport Terminal,Plane,Airport,Airport Food Court,Airport Gate,Coffee Shop,Harbor / Marina,Sculpture Garden,Boat or Ferry,Falafel Restaurant,Electronics Store,Ethiopian Restaurant,Event Space
63,King and Spadina,0,Airport Lounge,Airport Service,Airport Terminal,Plane,Airport,Airport Food Court,Airport Gate,Coffee Shop,Harbor / Marina,Sculpture Garden,Boat or Ferry,Falafel Restaurant,Electronics Store,Ethiopian Restaurant,Event Space
64,Railway Lands,0,Airport Lounge,Airport Service,Airport Terminal,Plane,Airport,Airport Food Court,Airport Gate,Coffee Shop,Harbor / Marina,Sculpture Garden,Boat or Ferry,Falafel Restaurant,Electronics Store,Ethiopian Restaurant,Event Space
65,South Niagara,0,Airport Lounge,Airport Service,Airport Terminal,Plane,Airport,Airport Food Court,Airport Gate,Coffee Shop,Harbor / Marina,Sculpture Garden,Boat or Ferry,Falafel Restaurant,Electronics Store,Ethiopian Restaurant,Event Space


**Cluster 2 (1)**

In [49]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,Harbourfront,1,Coffee Shop,Café,Park,Pub,Bakery,Restaurant,Breakfast Spot,Mexican Restaurant,Theater,Spa,Italian Restaurant,Beer Store,French Restaurant,Electronics Store,Brewery
1,Regent Park,1,Coffee Shop,Café,Park,Pub,Bakery,Restaurant,Breakfast Spot,Mexican Restaurant,Theater,Spa,Italian Restaurant,Beer Store,French Restaurant,Electronics Store,Brewery
2,Ryerson,1,Coffee Shop,Clothing Store,Cosmetics Shop,Middle Eastern Restaurant,Café,Italian Restaurant,Pizza Place,Plaza,Tea Room,Theater,Bookstore,Diner,Ice Cream Shop,Bubble Tea Shop,Fast Food Restaurant
3,Garden District,1,Coffee Shop,Clothing Store,Cosmetics Shop,Middle Eastern Restaurant,Café,Italian Restaurant,Pizza Place,Plaza,Tea Room,Theater,Bookstore,Diner,Ice Cream Shop,Bubble Tea Shop,Fast Food Restaurant
4,St. James Town,1,Coffee Shop,Italian Restaurant,Restaurant,Café,Hotel,Bakery,Park,Gastropub,Breakfast Spot,Japanese Restaurant,Cocktail Bar,Cosmetics Shop,Pizza Place,Diner,Beer Bar
5,The Beaches,1,Trail,Pub,Health Food Store,Coffee Shop,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Yoga Studio,Dog Run,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant
6,Berczy Park,1,Coffee Shop,Cocktail Bar,Steakhouse,Farmers Market,Seafood Restaurant,Beer Bar,Café,Cheese Shop,Bakery,Park,Concert Hall,Greek Restaurant,Comfort Food Restaurant,Nightclub,Restaurant
7,Central Bay Street,1,Coffee Shop,Ice Cream Shop,Italian Restaurant,Sandwich Place,Burger Joint,Café,Bakery,Bubble Tea Shop,Salad Place,Restaurant,Indian Restaurant,Middle Eastern Restaurant,Spa,Chinese Restaurant,Bar
8,Christie,1,Café,Grocery Store,Park,Convenience Store,Coffee Shop,Baby Store,Restaurant,Italian Restaurant,Diner,Nightclub,Dim Sum Restaurant,Dessert Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant
9,Adelaide,1,Coffee Shop,Café,Thai Restaurant,Bar,Steakhouse,Breakfast Spot,Gym,Asian Restaurant,Hotel,American Restaurant,Restaurant,Bakery,Pizza Place,Sushi Restaurant,Concert Hall


**Cluster 3 (2)**

In [50]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
32,Roselawn,2,Garden,Yoga Studio,Flower Shop,Fish Market,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop


**Cluster 4 (3)**

In [51]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
34,Forest Hill North,3,Trail,Park,Sushi Restaurant,Jewelry Store,Fast Food Restaurant,Filipino Restaurant,Farmers Market,Falafel Restaurant,Doner Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Fish & Chips Shop,Dumpling Restaurant
35,Forest Hill West,3,Trail,Park,Sushi Restaurant,Jewelry Store,Fast Food Restaurant,Filipino Restaurant,Farmers Market,Falafel Restaurant,Doner Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Fish & Chips Shop,Dumpling Restaurant
66,Rosedale,3,Park,Playground,Building,Trail,Dog Run,Filipino Restaurant,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop


In conclusion, almost the whole central part of Toronto is represented by Cluster 2 (1) and provides nearly the same experience. Cluster 1 (0), the airport area, lies apart from it. Clusters 3 (2) and 4 (3) differ from each other and from Cluster 2. Further research could show how exactly they differ. At first glance, the last two clusters are dominated by recreational venues such as parks, trails and gardens, which favours a wholesome lifestyle.