#**Toronto Neighbourhood clustering**

##In this notebook, we will look into clustering of neighborhood in Toronto, based on the venues and places in the neighborhood

*The steps involve data from wikipedia and getting data from foursquare api and clustering them using K means clustering algortihm*

In [1]:
#import the require libraries 
import pandas as pd 
from bs4 import BeautifulSoup
import requests
import numpy as np
from google.colab import drive 

#install geopy library
!pip install geopy
from geopy.geocoders import Nominatim 

#install folium 
!pip install folium 
import folium 

import json
from pandas.io.json import json_normalize

drive.mount('drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at drive


###**Get the dataset from the wikipedia website** 

In [11]:
res=requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')#request the url

#web scraping using BeautifulSoup
soup=BeautifulSoup(res.content,'lxml')

#get the table data
table=soup.find_all('table')[0]

#read the table into pandas dataframe
df=pd.read_html(str(table))[0]

#get the first five rows
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


###**Replace 'Not assigned' string in Borough to NaN values, so that we can use .dropna() method**

In [12]:
#convert not assigned values to NaN values 
df['Borough'].replace(to_replace='Not assigned',value=np.NaN,inplace=True)

#print the shape of the dataframe
print(df.shape)

#display first five rows
df.head()

(180, 3)


Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,,Not assigned
1,M2A,,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


##**Check the number of NaN values in the dataframe, there are 77 NaN entries i.e 'Not assigned' value in Borough column**

In [13]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 180 entries, 0 to 179
Data columns (total 3 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Postal Code    180 non-null    object
 1   Borough        103 non-null    object
 2   Neighbourhood  180 non-null    object
dtypes: object(3)
memory usage: 4.3+ KB


In [14]:
#drop borough with nan values
df.dropna(axis=0,inplace=True)

#reset the index
df.reset_index(drop=True,inplace=True)

#display first five rows
df.head()


Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


***There were no rows with a assigned borough and not assigned Neighbourhood, hecne dropping all rows with borough as not assigned creates a dataframe with assigned borough and assigned neighbourhood***

In [15]:
#get the count of  Not assigned in neighbourhood column 
df[df['Neighbourhood']=='Not assigned'].head()

#there are no neighbourhood with not assigned as value

Unnamed: 0,Postal Code,Borough,Neighbourhood


In [16]:
df.shape

(103, 3)

##**Get the location dataset from http://cocl.us/Geospatial_data**

In [17]:
#get the location coordinates
location=pd.read_csv('drive/My Drive/Geospatial_Coordinates.csv')

location.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


###**Each location must be given its latitude and longitude position. The location data is downloaded from the http://cocl.us/Geospatial_data website, the resulting dataframe is joined with our postal code dataframe, using the postal code column**

In [18]:
#join the two dataframe
df=df.join(location.set_index('Postal Code'),on='Postal Code')

#get the first five rows
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


In [19]:
df.shape

(103, 5)

###**Get the neighbourhood data of Toronto alone**##

In [20]:
#get the neighbourhood in toronto 
toronto_neighbourhood=df[df['Borough'].str.contains('Toronto')].reset_index(drop=True)

toronto_neighbourhood.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M4E,East Toronto,The Beaches,43.676357,-79.293031


In [21]:
toronto_neighbourhood.shape

(39, 5)

###**Let's create a map of toronto with its neighbourhood**

In [22]:
address='Toronto'

geolocator=Nominatim(user_agent='toronto_explorer')
location=geolocator.geocode(address)
latitude=location.latitude
longitude=location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.6534817, -79.3839347.


In [23]:
map_toronto=folium.Map(location=[latitude,longitude],zoom_start=10)

for lat,lon,borough,neighbourhood in zip(df['Latitude'],df['Longitude'],df['Borough'],df['Neighbourhood']):
  label='{},{}'.format(neighbourhood,borough)
  label=folium.Popup(label, parse_html=True)
  folium.CircleMarker(
      [lat,lon],
      radius=5,
      popup=label,
      color='blue',
      fill=True,
      fill_color='#3186cc',
      fill_opacity=0.7,
      parse_html=False).add_to(map_toronto)

map_toronto

###**Let's use foursquare API to search for venues in each neighbourhood**

***First define foursquare credentials***</br>
Client ID</br>
Client secret

In [24]:
CLIENT_ID = 'A1QUV4XTJYU1B4S3VEN3U2TWX2QMURG5TXBCCM53Y3H0DUUP' # your Foursquare ID
CLIENT_SECRET = 'KVYQ3K4KDGHOBAUDO2RCR0HMK3SSMWYU1YH3TAFN4AYWZHQP' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: A1QUV4XTJYU1B4S3VEN3U2TWX2QMURG5TXBCCM53Y3H0DUUP
CLIENT_SECRET:KVYQ3K4KDGHOBAUDO2RCR0HMK3SSMWYU1YH3TAFN4AYWZHQP


###Let's explore the first neighbourhood 

In [25]:
latitude=toronto_neighbourhood.loc[0,'Latitude']
longitude=toronto_neighbourhood.loc[0,'Longitude']
radius=500
LIMIT=100

url= 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?client_id=A1QUV4XTJYU1B4S3VEN3U2TWX2QMURG5TXBCCM53Y3H0DUUP&client_secret=KVYQ3K4KDGHOBAUDO2RCR0HMK3SSMWYU1YH3TAFN4AYWZHQP&ll=43.6542599,-79.3606359&v=20180605&radius=500&limit=100'

In [26]:
results=requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f1e42d8fa27bf3620f60127'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-54ea41ad498e9a11e9e13308-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/bakery_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d16a941735',
         'name': 'Bakery',
         'pluralName': 'Bakeries',
         'primary': True,
         'shortName': 'Bakery'}],
       'id': '54ea41ad498e9a11e9e13308',
       'location': {'address': '362 King St E',
        'cc': 'CA',
        'city': 'Toronto',
        'country': 'Canada',
        'crossStreet': 'Trinity St',
        'distance': 143,
        'formattedAddress': ['362 King St E (Trinity St)',
         'Toronto ON M5A 1K9',
         'Canada'],
        'labeledLatLngs': [{'label': 'display',
 

###Let's get the category of each venue from the results json acquired with Foursquare API

In [28]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [29]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

  This is separate from the ipykernel package so we can avoid doing imports until


Unnamed: 0,name,categories,lat,lng
0,Roselle Desserts,Bakery,43.653447,-79.362017
1,Tandem Coffee,Coffee Shop,43.653559,-79.361809
2,Cooper Koo Family YMCA,Distribution Center,43.653249,-79.358008
3,Body Blitz Spa East,Spa,43.654735,-79.359874
4,Impact Kitchen,Restaurant,43.656369,-79.35698


In [30]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

46 venues were returned by Foursquare.


##**Let's get the venues in each neighbourhood**

This function gets the venues in each neighbourhood and appends it to the venues_list list 

In [31]:
def get_nearby_venues(toronto_neighbourhood,radius=500,LIMIT=100):

    venues_list=[]
    for bor in toronto_neighbourhood['Borough']:
        neighbourhood=toronto_neighbourhood[toronto_neighbourhood['Borough']==bor]
        for nam,lat,lon in zip(neighbourhood['Neighbourhood'],neighbourhood['Latitude'],neighbourhood['Longitude']):
            print(nam,bor)

            #API request to get the desired data
            url='https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
                CLIENT_ID,
                CLIENT_SECRET,
                VERSION,
                lat,
                lon,
                radius,
                LIMIT)
            
            #request to the API
            results = requests.get(url).json()["response"]['groups'][0]['items']

            #add venues to the list
            venues_list.append([(
                bor,
                nam,
                lat,
                lon,
                v['venue']['name'],
                v['venue']['location']['lat'],
                v['venue']['location']['lng'],
                v['venue']['categories'][0]['name']) for v in results ])
        
        #create a dataframe of the venues list
        nearby_venues=pd.DataFrame([item for venue_list in venues_list for item in venue_list])
        nearby_venues.columns=['Borough',
                               'Neighbourhood',
                               'Neighbourhood Latitude',
                               'Neighbourhood Longitude',
                               'Venue',
                               'Venue latitude',
                               'Venue longitude',
                               'Venue Category']
        
        return nearby_venues
            

In [32]:
#get the venue dataframe
toronto_venues=get_nearby_venues(toronto_neighbourhood)

Regent Park, Harbourfront Downtown Toronto
Queen's Park, Ontario Provincial Government Downtown Toronto
Garden District, Ryerson Downtown Toronto
St. James Town Downtown Toronto
Berczy Park Downtown Toronto
Central Bay Street Downtown Toronto
Christie Downtown Toronto
Richmond, Adelaide, King Downtown Toronto
Harbourfront East, Union Station, Toronto Islands Downtown Toronto
Toronto Dominion Centre, Design Exchange Downtown Toronto
Commerce Court, Victoria Hotel Downtown Toronto
University of Toronto, Harbord Downtown Toronto
Kensington Market, Chinatown, Grange Park Downtown Toronto
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport Downtown Toronto
Rosedale Downtown Toronto
Stn A PO Boxes Downtown Toronto
St. James Town, Cabbagetown Downtown Toronto
First Canadian Place, Underground city Downtown Toronto
Church and Wellesley Downtown Toronto


In [33]:
#view the first five rows
toronto_venues.head()

Unnamed: 0,Borough,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue latitude,Venue longitude,Venue Category
0,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant


In [34]:
toronto_venues.shape

(1236, 8)

###**Let's drop the Borough column and group by the Neighbourhood columns and get the number of venues in the neighbourhood**

In [35]:
#drop the borough column
toronto_venues.drop('Borough',axis=1,inplace=True)

#group by neighbourhood and get the number of venues
toronto_venues.groupby('Neighbourhood').count().reset_index()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue latitude,Venue longitude,Venue Category
0,Berczy Park,58,58,58,58,58,58
1,"CN Tower, King and Spadina, Railway Lands, Har...",15,15,15,15,15,15
2,Central Bay Street,64,64,64,64,64,64
3,Christie,17,17,17,17,17,17
4,Church and Wellesley,76,76,76,76,76,76
5,"Commerce Court, Victoria Hotel",100,100,100,100,100,100
6,"First Canadian Place, Underground city",100,100,100,100,100,100
7,"Garden District, Ryerson",100,100,100,100,100,100
8,"Harbourfront East, Union Station, Toronto Islands",100,100,100,100,100,100
9,"Kensington Market, Chinatown, Grange Park",64,64,64,64,64,64


**Lets find the number of unique categories in the dataframe**

In [36]:
print('There are {} unique categories'.format(len(toronto_venues['Venue Category'].unique())))

There are 206 unique categories


##**Analyze each neighbourhood**

In [37]:
#one hot encoding 
toronto_onehot=pd.get_dummies(toronto_venues[['Venue Category']],prefix="",prefix_sep="")

#add neighbourhood column back to one hot encoded dataframe
toronto_onehot['Neighbourhood']=toronto_venues['Neighbourhood']

#move neighbourhood column to the first column
columns=[toronto_onehot.columns[-1]]+list(toronto_onehot.columns[:-1])
toronto_onehot=toronto_onehot[columns]

#view the first five rows
toronto_onehot.head()

Unnamed: 0,Neighbourhood,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Boat or Ferry,Bookstore,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Butcher,Café,...,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Let's group by neighbourhood and by taking mean of frequency of occurence of each category

In [59]:
toronto_grouped=toronto_onehot.groupby('Neighbourhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighbourhood,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Boat or Ferry,Bookstore,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Butcher,Café,...,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.017241,0.0,0.017241,0.034483,0.0,0.0,0.0,0.017241,0.017241,0.0,0.034483,0.0,0.0,0.017241,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.017241,0.0,0.017241,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0
1,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.066667,0.066667,0.133333,0.2,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.03125,0.0,0.0,0.0625,...,0.0,0.0,0.03125,0.0,0.046875,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.015625,0.0,0.015625
3,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.176471,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Church and Wellesley,0.013158,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.013158,0.0,0.013158,0.0,0.026316,0.0,0.013158,0.013158,0.0,0.013158,...,0.0,0.013158,0.0,0.013158,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.013158,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.013158,0.013158,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316


*Let's print the top five venues of each neighbourhood*

In [39]:
num_top_venues = 5

for hood in toronto_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
            venue  freq
0     Coffee Shop  0.09
1  Farmers Market  0.03
2        Beer Bar  0.03
3            Café  0.03
4     Cheese Shop  0.03


----CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport----
             venue  freq
0  Airport Service  0.20
1   Airport Lounge  0.13
2            Plane  0.07
3  Harbor / Marina  0.07
4    Boat or Ferry  0.07


----Central Bay Street----
                venue  freq
0         Coffee Shop  0.19
1  Italian Restaurant  0.06
2                Café  0.06
3      Sandwich Place  0.05
4    Department Store  0.03


----Christie----
                venue  freq
0       Grocery Store  0.24
1                Café  0.18
2                Park  0.12
3          Baby Store  0.06
4  Italian Restaurant  0.06


----Church and Wellesley----
                 venue  freq
0          Coffee Shop  0.11
1     Sushi Restaurant  0.05
2  Japanese Restaurant  0.05
3           Restaurant  0.04
4        

###Function to sort the venues in descending order

In [40]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

###**Let's create a dataframe with top 10 venues in each neighbourhood**

In [67]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Café,Cocktail Bar,Seafood Restaurant,Farmers Market,Bakery,Restaurant,Cheese Shop,Beer Bar,Liquor Store
1,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Airport Lounge,Harbor / Marina,Bar,Plane,Rental Car Location,Sculpture Garden,Boat or Ferry,Coffee Shop,Airport Terminal
2,Central Bay Street,Coffee Shop,Café,Italian Restaurant,Sandwich Place,Japanese Restaurant,Department Store,Burger Joint,Bubble Tea Shop,Salad Place,Seafood Restaurant
3,Christie,Grocery Store,Café,Park,Candy Store,Diner,Italian Restaurant,Restaurant,Baby Store,Athletics & Sports,Nightclub
4,Church and Wellesley,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Restaurant,Gay Bar,Yoga Studio,Men's Store,Mediterranean Restaurant,Hotel,Dance Studio


###**Cluster the neighbourhood** 

In [68]:
#import KMeans from sklearn.cluster to cluster the neighbourhood
from sklearn.cluster import KMeans

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors


In [69]:
clusters=5 #lets group the neighbourhood into 5 clusters 

#drop the neighbourhod column from the toronto_grouped dataframe 
toronto_clustering=toronto_grouped.drop('Neighbourhood',axis=1) 

#run KMeans
kmeans=KMeans(n_clusters=clusters,random_state=42).fit(toronto_clustering)


In [70]:
#add the cluster labels to the dataframe
neighborhoods_venues_sorted.insert(0,'Cluster labels',kmeans.labels_)

#lets add the labelled neighbourhood dataframe to the toronto_neighbourhood data
toronto_merged=toronto_neighbourhood

#merge to toronto merged
toronto_merged=toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighbourhood',drop=True),on='Neighbourhood')

#view to first five columns 
toronto_merged.head(10)

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,4.0,Coffee Shop,Park,Bakery,Pub,Breakfast Spot,Café,Theater,Dessert Shop,Restaurant,Chocolate Shop
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,4.0,Coffee Shop,Diner,College Auditorium,Smoothie Shop,Beer Bar,Sandwich Place,Burrito Place,Café,Park,Creperie
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,0.0,Clothing Store,Coffee Shop,Italian Restaurant,Café,Japanese Restaurant,Cosmetics Shop,Bubble Tea Shop,Hotel,Pizza Place,Plaza
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0.0,Coffee Shop,Café,Clothing Store,Restaurant,American Restaurant,Cocktail Bar,Cosmetics Shop,Department Store,Gym,Moroccan Restaurant
4,M4E,East Toronto,The Beaches,43.676357,-79.293031,,,,,,,,,,,
5,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,0.0,Coffee Shop,Café,Cocktail Bar,Seafood Restaurant,Farmers Market,Bakery,Restaurant,Cheese Shop,Beer Bar,Liquor Store
6,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,0.0,Coffee Shop,Café,Italian Restaurant,Sandwich Place,Japanese Restaurant,Department Store,Burger Joint,Bubble Tea Shop,Salad Place,Seafood Restaurant
7,M6G,Downtown Toronto,Christie,43.669542,-79.422564,3.0,Grocery Store,Café,Park,Candy Store,Diner,Italian Restaurant,Restaurant,Baby Store,Athletics & Sports,Nightclub
8,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568,0.0,Coffee Shop,Café,Hotel,Restaurant,Clothing Store,Gym,Steakhouse,Deli / Bodega,Thai Restaurant,Cosmetics Shop
9,M6H,West Toronto,"Dufferin, Dovercourt Village",43.669005,-79.442259,,,,,,,,,,,


###**NaN values in the above dataframe tells that, foursquare API doesnt have data for those Neighbourhoods**

*Let's drop the rows with NaN entries*

In [72]:
#drop the rows with NaN entries
toronto_merged=toronto_merged.dropna(axis=0).reset_index(drop=True)


#view the dataframe
toronto_merged

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,4.0,Coffee Shop,Park,Bakery,Pub,Breakfast Spot,Café,Theater,Dessert Shop,Restaurant,Chocolate Shop
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,4.0,Coffee Shop,Diner,College Auditorium,Smoothie Shop,Beer Bar,Sandwich Place,Burrito Place,Café,Park,Creperie
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,0.0,Clothing Store,Coffee Shop,Italian Restaurant,Café,Japanese Restaurant,Cosmetics Shop,Bubble Tea Shop,Hotel,Pizza Place,Plaza
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0.0,Coffee Shop,Café,Clothing Store,Restaurant,American Restaurant,Cocktail Bar,Cosmetics Shop,Department Store,Gym,Moroccan Restaurant
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,0.0,Coffee Shop,Café,Cocktail Bar,Seafood Restaurant,Farmers Market,Bakery,Restaurant,Cheese Shop,Beer Bar,Liquor Store
5,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,0.0,Coffee Shop,Café,Italian Restaurant,Sandwich Place,Japanese Restaurant,Department Store,Burger Joint,Bubble Tea Shop,Salad Place,Seafood Restaurant
6,M6G,Downtown Toronto,Christie,43.669542,-79.422564,3.0,Grocery Store,Café,Park,Candy Store,Diner,Italian Restaurant,Restaurant,Baby Store,Athletics & Sports,Nightclub
7,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568,0.0,Coffee Shop,Café,Hotel,Restaurant,Clothing Store,Gym,Steakhouse,Deli / Bodega,Thai Restaurant,Cosmetics Shop
8,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752,0.0,Coffee Shop,Aquarium,Café,Hotel,Fried Chicken Joint,Brewery,Scenic Lookout,Restaurant,Bakery,Bar
9,M5K,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.647177,-79.381576,0.0,Coffee Shop,Hotel,Café,Restaurant,American Restaurant,Seafood Restaurant,Salad Place,Japanese Restaurant,Italian Restaurant,Concert Hall


In [74]:
#Let's drop the postal code column
toronto_merged.drop('Postal Code',axis=1,inplace=True)

In [81]:
toronto_merged['Cluster labels']=toronto_merged['Cluster labels'].astype(int)
#view the dataframe
toronto_merged

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,4,Coffee Shop,Park,Bakery,Pub,Breakfast Spot,Café,Theater,Dessert Shop,Restaurant,Chocolate Shop
1,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,4,Coffee Shop,Diner,College Auditorium,Smoothie Shop,Beer Bar,Sandwich Place,Burrito Place,Café,Park,Creperie
2,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,0,Clothing Store,Coffee Shop,Italian Restaurant,Café,Japanese Restaurant,Cosmetics Shop,Bubble Tea Shop,Hotel,Pizza Place,Plaza
3,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Coffee Shop,Café,Clothing Store,Restaurant,American Restaurant,Cocktail Bar,Cosmetics Shop,Department Store,Gym,Moroccan Restaurant
4,Downtown Toronto,Berczy Park,43.644771,-79.373306,0,Coffee Shop,Café,Cocktail Bar,Seafood Restaurant,Farmers Market,Bakery,Restaurant,Cheese Shop,Beer Bar,Liquor Store
5,Downtown Toronto,Central Bay Street,43.657952,-79.387383,0,Coffee Shop,Café,Italian Restaurant,Sandwich Place,Japanese Restaurant,Department Store,Burger Joint,Bubble Tea Shop,Salad Place,Seafood Restaurant
6,Downtown Toronto,Christie,43.669542,-79.422564,3,Grocery Store,Café,Park,Candy Store,Diner,Italian Restaurant,Restaurant,Baby Store,Athletics & Sports,Nightclub
7,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568,0,Coffee Shop,Café,Hotel,Restaurant,Clothing Store,Gym,Steakhouse,Deli / Bodega,Thai Restaurant,Cosmetics Shop
8,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752,0,Coffee Shop,Aquarium,Café,Hotel,Fried Chicken Joint,Brewery,Scenic Lookout,Restaurant,Bakery,Bar
9,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.647177,-79.381576,0,Coffee Shop,Hotel,Café,Restaurant,American Restaurant,Seafood Restaurant,Salad Place,Japanese Restaurant,Italian Restaurant,Concert Hall


###**Let's visualize the clustering using folium**

In [83]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
x = np.arange(clusters)
ys = [i + x + (i*x)**2 for i in range(clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

###**Let's analyse each neighbourhood and assign each one a name**

In [89]:
food_court=toronto_merged.loc[toronto_merged['Cluster labels']==0,toronto_merged.columns[[1]+list(range(5,toronto_merged.shape[1]))]]

#the cluster 0 has many restaurants and cafes in it, so lets name this cluster as food_court
food_court

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,"Garden District, Ryerson",Clothing Store,Coffee Shop,Italian Restaurant,Café,Japanese Restaurant,Cosmetics Shop,Bubble Tea Shop,Hotel,Pizza Place,Plaza
3,St. James Town,Coffee Shop,Café,Clothing Store,Restaurant,American Restaurant,Cocktail Bar,Cosmetics Shop,Department Store,Gym,Moroccan Restaurant
4,Berczy Park,Coffee Shop,Café,Cocktail Bar,Seafood Restaurant,Farmers Market,Bakery,Restaurant,Cheese Shop,Beer Bar,Liquor Store
5,Central Bay Street,Coffee Shop,Café,Italian Restaurant,Sandwich Place,Japanese Restaurant,Department Store,Burger Joint,Bubble Tea Shop,Salad Place,Seafood Restaurant
7,"Richmond, Adelaide, King",Coffee Shop,Café,Hotel,Restaurant,Clothing Store,Gym,Steakhouse,Deli / Bodega,Thai Restaurant,Cosmetics Shop
8,"Harbourfront East, Union Station, Toronto Islands",Coffee Shop,Aquarium,Café,Hotel,Fried Chicken Joint,Brewery,Scenic Lookout,Restaurant,Bakery,Bar
9,"Toronto Dominion Centre, Design Exchange",Coffee Shop,Hotel,Café,Restaurant,American Restaurant,Seafood Restaurant,Salad Place,Japanese Restaurant,Italian Restaurant,Concert Hall
10,"Commerce Court, Victoria Hotel",Coffee Shop,Restaurant,Café,Hotel,Gym,American Restaurant,Japanese Restaurant,Deli / Bodega,Italian Restaurant,Seafood Restaurant
11,"University of Toronto, Harbord",Café,Restaurant,Bar,Japanese Restaurant,Bookstore,Sandwich Place,Bakery,Yoga Studio,Italian Restaurant,Beer Bar
12,"Kensington Market, Chinatown, Grange Park",Café,Coffee Shop,Vegetarian / Vegan Restaurant,Mexican Restaurant,Vietnamese Restaurant,Bar,Pizza Place,Park,Dessert Shop,Grocery Store


In [98]:
#the cluster 1 has airport and harbour, lets name this as airport_harbour
airport_harbour=toronto_merged.loc[toronto_merged['Cluster labels']==1,toronto_merged.columns[[1]+list(range(5,toronto_merged.shape[1]))]]

airport_harbour

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Airport Lounge,Harbor / Marina,Bar,Plane,Rental Car Location,Sculpture Garden,Boat or Ferry,Coffee Shop,Airport Terminal


In [96]:
#cluster 2 has parks,play ground, restaurants, and other stores, lets name it as entertainment zone
entertainment_zone=toronto_merged.loc[toronto_merged['Cluster labels']==2,toronto_merged.columns[[1]+list(range(5,toronto_merged.shape[1]))]]

entertainment_zone

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Rosedale,Park,Playground,Trail,Dance Studio,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store


In [101]:
#cluster 3 has many stores including grocery stores, park, restaurant, baby store etc., So lets name it as a residential area
residential_area=toronto_merged.loc[toronto_merged['Cluster labels']==3,toronto_merged.columns[[1]+list(range(5,toronto_merged.shape[1]))]]

residential_area

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Christie,Grocery Store,Café,Park,Candy Store,Diner,Italian Restaurant,Restaurant,Baby Store,Athletics & Sports,Nightclub


In [104]:
#cluster 4 seems have some educational spots and areas for students, lets name it as educational_area
educational_area=toronto_merged.loc[toronto_merged['Cluster labels']==4,toronto_merged.columns[[1]+list(range(5,toronto_merged.shape[1]))]]

educational_area

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Regent Park, Harbourfront",Coffee Shop,Park,Bakery,Pub,Breakfast Spot,Café,Theater,Dessert Shop,Restaurant,Chocolate Shop
1,"Queen's Park, Ontario Provincial Government",Coffee Shop,Diner,College Auditorium,Smoothie Shop,Beer Bar,Sandwich Place,Burrito Place,Café,Park,Creperie
