# Capstone Project: Toronto Neighbourhood Data Analysis

Goal: Analyse the Toronto neighbourhood data by applying segmentation and clustering, become familiar with the location data provider Foursquare, gain experience using RESTful APIs to retrieve data, and use the Folium library to generate maps of geospatial data.

In [1]:
import numpy as np
import pandas as pd
import requests
import folium
import json 
from sklearn.cluster import KMeans
import matplotlib.cm as cm

import matplotlib.colors as colors

In [2]:
print("Hello Capstone Project Course!")

Hello Capstone Project Course!


## Retrieve data from Wikipedia

Read data from Wikipedia:

In [59]:
import requests
neighbour_url = requests.get('https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M&oldid=942851379')


Scrape the Wikipedia page using BeautifulSoup:

In [60]:
from bs4 import BeautifulSoup
soup = BeautifulSoup(neighbour_url.text,'lxml')


In [61]:
neighbour_table = soup.find_all('table')[0]  # the first table on the page holds the postal codes

Create a dataframe from the HTML table:

In [62]:
# read_html returns a list of dataframes; keep the first (and only) one
df = pd.read_html(str(neighbour_table))[0]
df

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
...,...,...,...
282,M8Z,Etobicoke,Mimico NW
283,M8Z,Etobicoke,The Queensway West
284,M8Z,Etobicoke,Royal York South West
285,M8Z,Etobicoke,South of Bloor


## Data Wrangling

Drop the rows whose Borough is 'Not assigned':

In [63]:
df.replace('Not assigned', np.nan, inplace=True)
df.dropna(subset=["Borough"], axis=0, inplace=True)

# reset the index because we dropped rows
df.reset_index(drop=True, inplace=True)
df

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,Lawrence Heights
4,M6A,North York,Lawrence Manor
...,...,...,...
205,M8Z,Etobicoke,Kingsway Park South West
206,M8Z,Etobicoke,Mimico NW
207,M8Z,Etobicoke,The Queensway West
208,M8Z,Etobicoke,Royal York South West
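The same filtering can also be written as a boolean mask. The sketch below uses a small toy frame (its rows are illustrative, not the scraped table) and, unlike `replace` followed by `dropna`, it leaves any 'Not assigned' values in the other columns untouched:

```python
import pandas as pd

# Toy frame mimicking the scraped table (illustrative rows only).
toy = pd.DataFrame({
    "Postcode": ["M1A", "M3A", "M4A"],
    "Borough": ["Not assigned", "North York", "North York"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Victoria Village"],
})

# Keep only the rows whose Borough is assigned, then renumber the index.
cleaned = toy[toy["Borough"] != "Not assigned"].reset_index(drop=True)
print(cleaned)
```

The mask avoids the `inplace=True` calls, which makes the cell safe to re-run.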


Check for Missing Values:

In [64]:
missing_data = df.isnull()
for column in missing_data.columns.values.tolist():
    print(column)
    print (missing_data[column].value_counts())
    print("")    

Postcode
False    210
Name: Postcode, dtype: int64

Borough
False    210
Name: Borough, dtype: int64

Neighbourhood
False    210
Name: Neighbourhood, dtype: int64
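A more compact way to get the same information is to sum the boolean mask, which gives one missing-value count per column. A minimal sketch on a toy frame (the rows are made up for illustration):

```python
import pandas as pd

# Toy frame with one missing Borough (illustrative only).
toy = pd.DataFrame({
    "Postcode": ["M3A", "M4A"],
    "Borough": ["North York", None],
})

# True counts as 1, so the column sums are the numbers of missing values.
missing_per_column = toy.isnull().sum()
print(missing_per_column)
```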



Size of New Dataset:

In [65]:
df.shape

(210, 3)

### Adding geographical information

Insert columns for Latitude and Longitude

In [66]:
import pgeocode

# retrieve the latitude/longitude from a postal code in Canada 'ca'
nomi_ca = pgeocode.Nominatim('ca')

latitude = []
longitude = []

for index, row in df.iterrows():
    location = nomi_ca.query_postal_code(row['Postcode'])  # look up the postal code of this row
    latitude.append(location.latitude)
    longitude.append(location.longitude)
    
# we put the result of the loop in new columns 'latitude' and 'longitude'
df['Latitude'] = latitude
df['Longitude'] = longitude


# problem with the Canada Post Gateway Processing Centre lookup: set its coordinates manually
df.loc[df['Neighbourhood'] == "Canada Post Gateway Processing Centre", ['Latitude', 'Longitude']] = [43.636966,-79.615819]
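Several neighbourhoods share a postcode, so the loop above repeats the same query many times. A possible refinement is to cache the result per postcode. In the sketch below, `fake_lookup` is a hypothetical stand-in for `nomi_ca.query_postal_code`, and the coordinates are the ones shown in the table for M3A and M6A:

```python
calls = []  # records which postcodes were actually looked up

def fake_lookup(postcode):
    """Hypothetical stand-in for the real geocoder call."""
    calls.append(postcode)
    table = {"M3A": (43.7545, -79.3300), "M6A": (43.7223, -79.4504)}
    return table[postcode]

postcodes = ["M3A", "M6A", "M6A"]  # M6A appears twice, as in the dataset

cache = {}
coords = []
for code in postcodes:
    if code not in cache:          # query each distinct postcode only once
        cache[code] = fake_lookup(code)
    coords.append(cache[code])

print(coords)
print("lookups performed:", len(calls))
```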


In [66]:
df

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.7545,-79.3300
1,M4A,North York,Victoria Village,43.7276,-79.3148
2,M5A,Downtown Toronto,Harbourfront,43.6555,-79.3626
3,M6A,North York,Lawrence Heights,43.7223,-79.4504
4,M6A,North York,Lawrence Manor,43.7223,-79.4504
...,...,...,...,...,...
205,M8Z,Etobicoke,Kingsway Park South West,43.6256,-79.5231
206,M8Z,Etobicoke,Mimico NW,43.6256,-79.5231
207,M8Z,Etobicoke,The Queensway West,43.6256,-79.5231
208,M8Z,Etobicoke,Royal York South West,43.6256,-79.5231


In [67]:
missing_data = df.isnull()
for column in missing_data.columns.values.tolist():
    print(column)
    print (missing_data[column].value_counts())
    print("")    

Postcode
False    210
Name: Postcode, dtype: int64

Borough
False    210
Name: Borough, dtype: int64

Neighbourhood
False    210
Name: Neighbourhood, dtype: int64

Latitude
False    210
Name: Latitude, dtype: int64

Longitude
False    210
Name: Longitude, dtype: int64



## Explore Toronto Neighbourhoods

In [68]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

address = 'Toronto, ON'

geolocator = Nominatim(user_agent="ca_explorer")
location = geolocator.geocode(address)
t_latitude= location.latitude
t_longitude = location.longitude


In [13]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[t_latitude, t_longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighbourhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=True).add_to(map_toronto)  
    
map_toronto


## Explore a Location

Explore North York: segment and cluster only the neighbourhoods in this borough.

In [69]:
ny_data = df[df['Borough'] == 'North York'].reset_index(drop=True)
ny_data.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.7545,-79.33
1,M4A,North York,Victoria Village,43.7276,-79.3148
2,M6A,North York,Lawrence Heights,43.7223,-79.4504
3,M6A,North York,Lawrence Manor,43.7223,-79.4504
4,M3B,North York,Don Mills North,43.745,-79.359


In [70]:
address = 'North York, ON'

geolocator = Nominatim(user_agent="ca_explorer")
location = geolocator.geocode(address)
ny_latitude= location.latitude
ny_longitude = location.longitude


In [16]:
# create map of North York using latitude and longitude values
map_NorthYork = folium.Map(location=[ny_latitude, ny_longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighbourhood in zip(ny_data['Latitude'], ny_data['Longitude'], ny_data['Borough'], ny_data['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=True).add_to(map_NorthYork)  
    
map_NorthYork



### Define Foursquare credentials and version

In [71]:
CLIENT_ID = 'VZ3YOHRX44O4NYLOUDUUKVHKP1VSZRBJWQNEBWAUXWSLMZNR' # your Foursquare ID
CLIENT_SECRET = '5FEODWZZY32QK41CN0USSL3UBZNVOEBN43ILYUS4Y1Y3I4GX' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: VZ3YOHRX44O4NYLOUDUUKVHKP1VSZRBJWQNEBWAUXWSLMZNR
CLIENT_SECRET:5FEODWZZY32QK41CN0USSL3UBZNVOEBN43ILYUS4Y1Y3I4GX


In [72]:
ny_data.loc[0, 'Neighbourhood']

'Parkwoods'

## Exploring Venues in Parkwoods

Retrieve up to 100 venues from Foursquare within a 500-metre radius of Parkwoods, the first neighbourhood in North York.

In [73]:
neighbourhood_latitude = ny_data.loc[0, 'Latitude'] # neighborhood latitude value
neighbourhood_longitude = ny_data.loc[0, 'Longitude'] # neighborhood longitude value

neighbourhood_name = ny_data.loc[0, 'Neighbourhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighbourhood_name, 
                                                               neighbourhood_latitude, 
                                                               neighbourhood_longitude))

Latitude and longitude values of Parkwoods are 43.7545, -79.33.


In [55]:

LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighbourhood_latitude, 
    neighbourhood_longitude, 
    radius, 
    LIMIT)
url


'https://api.foursquare.com/v2/venues/explore?&client_id=VZ3YOHRX44O4NYLOUDUUKVHKP1VSZRBJWQNEBWAUXWSLMZNR&client_secret=5FEODWZZY32QK41CN0USSL3UBZNVOEBN43ILYUS4Y1Y3I4GX&v=20180605&ll=43.7545,-79.33&radius=500&limit=100'
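The query string can also be assembled from a dictionary, which avoids counting `{}` placeholders by hand. This is a sketch using only the standard library; `build_explore_url` is a hypothetical helper name, and `YOUR_ID` / `YOUR_SECRET` are placeholders rather than real credentials:

```python
from urllib.parse import urlencode

BASE_URL = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Hypothetical helper: build the explore URL from named parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in the ll value readable instead of %2C
    return BASE_URL + "?" + urlencode(params, safe=",")

url = build_explore_url("YOUR_ID", "YOUR_SECRET", "20180605", 43.7545, -79.33)
print(url)
```

With `requests`, passing the same dictionary as `requests.get(BASE_URL, params=params)` has the same effect, and the encoding is handled by the library.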

In [74]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '60b6b60b430b510f7ec2cfc8'},
  'headerLocation': 'Parkwoods - Donalda',
  'headerFullLocation': 'Parkwoods - Donalda, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 3,
  'suggestedBounds': {'ne': {'lat': 43.7590000045, 'lng': -79.32378161085641},
   'sw': {'lat': 43.7499999955, 'lng': -79.33621838914358}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4e8d9dcdd5fbbbb6b3003c7b',
       'name': 'Brookbanks Park',
       'location': {'address': 'Toronto',
        'lat': 43.751976046055574,
        'lng': -79.33214044722958,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.751976046055574,
          'lng': -79.33214044722958}],
        'distance': 329,
        'cc': 'CA',
        'city': 'To

Extract the venue category type:

In [75]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [77]:
venues = results['response']['groups'][0]['items']

nearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


Unnamed: 0,name,categories,lat,lng
0,Brookbanks Park,Park,43.751976,-79.33214
1,KFC,Fast Food Restaurant,43.754387,-79.333021
2,Variety Store,Food & Drink Shop,43.751974,-79.333114


In [78]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

3 venues were returned by Foursquare.


Define a function that repeats this process for every neighbourhood: query Foursquare, extract the venue details from the JSON response, and collect them, with their categories, in a single dataframe.

In [80]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
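The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, which would raise an `IndexError` for a venue that has no category. A possible defensive variant of the extraction step is sketched below on mock data. `venue_rows` is a hypothetical helper, the first mock item reuses the Brookbanks Park values from the response above, and the second item is invented to show the empty case:

```python
def venue_rows(neighbourhood, items):
    """Hypothetical helper: flatten Foursquare-style items into tuples."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # Fall back to None when the venue carries no category at all.
        category = categories[0]["name"] if categories else None
        rows.append((neighbourhood,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows

mock_items = [
    {"venue": {"name": "Brookbanks Park",
               "location": {"lat": 43.751976, "lng": -79.33214},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Uncategorised Venue",  # invented example
               "location": {"lat": 43.75, "lng": -79.33},
               "categories": []}},
]

rows = venue_rows("Parkwoods", mock_items)
print(rows)
```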

In [81]:
NorthYork_venues = getNearbyVenues(names=ny_data['Neighbourhood'],
                                   latitudes=ny_data['Latitude'],
                                   longitudes=ny_data['Longitude']
                                  )

Parkwoods
Victoria Village
Lawrence Heights
Lawrence Manor
Don Mills North
Glencairn
Flemingdon Park
Don Mills South
Hillcrest Village
Bathurst Manor
Downsview North
Wilson Heights
Fairview
Henry Farm
Oriole
Northwood Park
York University
Bayview Village
CFB Toronto
Downsview East
Silver Hills
York Mills
Downsview West
Downsview
North Park
Upwood Park
Humber Summit
Newtonbrook
Willowdale
Downsview Central
Bedford Park
Lawrence Manor East
Emery
Humberlea
Willowdale South
Downsview Northwest
York Mills West
Willowdale West


Count the unique venue categories in North York:

In [82]:
print('There are {} unique categories.'.format(len(NorthYork_venues['Venue Category'].unique())))

There are 117 unique categories.


## Neighbourhood Analysis 

In [85]:
# one hot encoding
NorthYork_onehot = pd.get_dummies(NorthYork_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
NorthYork_onehot['Neighbourhood'] = NorthYork_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [NorthYork_onehot.columns[-1]] + list(NorthYork_onehot.columns[:-1])
NorthYork_onehot = NorthYork_onehot[fixed_columns]

NorthYork_onehot.head()

Unnamed: 0,Neighbourhood,Accessories Store,African Restaurant,Airport,American Restaurant,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bakery,Bank,...,Supplement Shop,Sushi Restaurant,Tailor Shop,Thai Restaurant,Theater,Toy / Game Store,Trail,Video Game Store,Vietnamese Restaurant,Women's Store
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [86]:
NorthYork_grouped = NorthYork_onehot.groupby('Neighbourhood').mean().reset_index()
NorthYork_grouped

Unnamed: 0,Neighbourhood,Accessories Store,African Restaurant,Airport,American Restaurant,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bakery,Bank,...,Supplement Shop,Sushi Restaurant,Tailor Shop,Thai Restaurant,Theater,Toy / Game Store,Trail,Video Game Store,Vietnamese Restaurant,Women's Store
0,Bathurst Manor,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0
2,Bedford Park,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,...,0.0,0.043478,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0
3,CFB Toronto,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Don Mills North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Don Mills South,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0
6,Downsview,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Downsview Central,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Downsview East,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Downsview North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Display each neighbourhood with its most common venues:

In [87]:
num_top_venues = 5

for hood in NorthYork_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = NorthYork_grouped[NorthYork_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Bathurst Manor----
                       venue  freq
0              Deli / Bodega  0.14
1                Pizza Place  0.14
2                Coffee Shop  0.14
3              Grocery Store  0.14
4  Middle Eastern Restaurant  0.14


----Bayview Village----
               venue  freq
0              Trail  0.25
1               Park  0.25
2        Flower Shop  0.25
3        Gas Station  0.25
4  Accessories Store  0.00


----Bedford Park----
                venue  freq
0      Sandwich Place  0.09
1  Italian Restaurant  0.09
2         Coffee Shop  0.09
3          Restaurant  0.09
4           Juice Bar  0.04


----CFB Toronto----
         venue  freq
0      Airport   0.2
1  Coffee Shop   0.2
2         Park   0.2
3   Food Court   0.2
4   Shoe Store   0.2


----Don Mills North----
                       venue  freq
0                       Pool   0.5
1                       Park   0.5
2  Middle Eastern Restaurant   0.0
3                   Platform   0.0
4                Pizza Place   0.0


--

Sort the venues in descending order of frequency:

In [88]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Create a dataframe that displays the top 10 venues for each neighbourhood:

In [90]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = NorthYork_grouped['Neighbourhood']

for ind in np.arange(NorthYork_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(NorthYork_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bathurst Manor,Deli / Bodega,Pizza Place,Coffee Shop,Grocery Store,Middle Eastern Restaurant,Fried Chicken Joint,Mediterranean Restaurant,Accessories Store,Optical Shop,Nightclub
1,Bayview Village,Trail,Park,Flower Shop,Gas Station,Accessories Store,Miscellaneous Shop,Pizza Place,Piano Bar,Pharmacy,Pet Store
2,Bedford Park,Sandwich Place,Italian Restaurant,Coffee Shop,Restaurant,Juice Bar,Liquor Store,Indian Restaurant,Grocery Store,Greek Restaurant,Pharmacy
3,CFB Toronto,Airport,Coffee Shop,Park,Food Court,Shoe Store,Accessories Store,Miscellaneous Shop,Pizza Place,Piano Bar,Pharmacy
4,Don Mills North,Pool,Park,Middle Eastern Restaurant,Platform,Pizza Place,Piano Bar,Pharmacy,Pet Store,Optical Shop,Nightclub
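The column labels above rely on a `try`/`except` around a three-element list, which works for the top 10 but would produce "21th" if `num_top_venues` ever exceeded 20. A small helper can build the suffix explicitly. This is a sketch, and `ordinal` is a hypothetical name:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"              # 11th, 12th, 13th, ...
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

num_top_venues = 10
columns = ["Neighbourhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns)
```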


## Cluster Neighbourhoods

Cluster the neighbourhoods into 5 clusters using k-means:

In [91]:
# set number of clusters
kclusters = 5

NorthYork_grouped_clustering = NorthYork_grouped.drop('Neighbourhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(NorthYork_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 2, 2, 2, 1, 2, 0, 4, 2, 2], dtype=int32)
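The number of clusters is set by hand here. One common way to sanity-check such a choice is to fit k-means for several values of k and compare the inertia (within-cluster sum of squares). The sketch below runs on synthetic 2-D points, not on the Foursquare data, so its numbers say nothing about North York. It only illustrates the procedure:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: three well-separated blobs of 20 points each (assumption).
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centres])

# Fit k-means for k = 1..6 and record the inertia of each fit.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 2))
```

A sharp drop followed by a flat tail (the "elbow") suggests a reasonable k. On the real frequency table, the same loop would take `NorthYork_grouped_clustering` in place of `X`.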

Create a dataframe containing the cluster label as well as the top 10 venues for each neighbourhood:

In [109]:
# add clustering labels (run once: insert raises an error if the column already exists)
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

NorthYork_merged = ny_data

# join the sorted venues onto ny_data so each neighbourhood keeps its latitude/longitude
NorthYork_merged = NorthYork_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

NorthYork_merged.head() # check the last columns!

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.7545,-79.33,2,Park,Fast Food Restaurant,Food & Drink Shop,Accessories Store,Miscellaneous Shop,Pizza Place,Piano Bar,Pharmacy,Pet Store,Optical Shop
1,M4A,North York,Victoria Village,43.7276,-79.3148,2,Pizza Place,Portuguese Restaurant,Coffee Shop,Intersection,Park,Hockey Arena,Accessories Store,Miscellaneous Shop,Piano Bar,Pharmacy
2,M6A,North York,Lawrence Heights,43.7223,-79.4504,2,Clothing Store,Coffee Shop,Women's Store,Restaurant,Sushi Restaurant,Toy / Game Store,Bakery,Shoe Store,Sandwich Place,Men's Store
3,M6A,North York,Lawrence Manor,43.7223,-79.4504,2,Clothing Store,Coffee Shop,Women's Store,Restaurant,Sushi Restaurant,Toy / Game Store,Bakery,Shoe Store,Sandwich Place,Men's Store
4,M3B,North York,Don Mills North,43.745,-79.359,1,Pool,Park,Middle Eastern Restaurant,Platform,Pizza Place,Piano Bar,Pharmacy,Pet Store,Optical Shop,Nightclub


Visualize the resulting clusters

In [113]:
# create map
map_clusters = folium.Map(location=[t_latitude, t_longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(NorthYork_merged['Latitude'], NorthYork_merged['Longitude'], NorthYork_merged['Neighbourhood'], NorthYork_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
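The colour list above is indexed with `cluster-1`, so label 0 wraps around to the last colour. Each label still gets its own colour, but the mapping is easy to misread. A more direct mapping is sketched below using the standard library's `colorsys` in place of matplotlib's `cm.rainbow` (a substitution made only to keep the example dependency-free). `cluster_colours` is a hypothetical helper, and `labels` copies the first ten labels printed earlier:

```python
import colorsys

def cluster_colours(k):
    """Hypothetical helper: return k evenly spaced hex colours."""
    colours = []
    for i in range(k):
        # Spread the hues evenly around the colour wheel.
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        colours.append("#{:02x}{:02x}{:02x}".format(
            int(r * 255), int(g * 255), int(b * 255)))
    return colours

palette = cluster_colours(5)

# First ten cluster labels as printed by kmeans.labels_[0:10] above.
labels = [2, 2, 2, 2, 1, 2, 0, 4, 2, 2]
marker_colours = [palette[label] for label in labels]  # label i -> colour i
print(marker_colours)
```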

### Evaluate the Clusters

Examine each cluster and identify the venue types that discriminate it from the others.

Cluster 1

In [114]:
NorthYork_merged.loc[NorthYork_merged['Cluster Labels'] == 0, NorthYork_merged.columns[[1] + list(range(5, NorthYork_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,North York,0,Bakery,Basketball Court,Accessories Store,Middle Eastern Restaurant,Platform,Pizza Place,Piano Bar,Pharmacy,Pet Store,Park
24,North York,0,Bakery,Basketball Court,Accessories Store,Middle Eastern Restaurant,Platform,Pizza Place,Piano Bar,Pharmacy,Pet Store,Park
25,North York,0,Bakery,Basketball Court,Accessories Store,Middle Eastern Restaurant,Platform,Pizza Place,Piano Bar,Pharmacy,Pet Store,Park


Cluster 2

In [115]:
NorthYork_merged.loc[NorthYork_merged['Cluster Labels'] == 1, NorthYork_merged.columns[[1] + list(range(5, NorthYork_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,North York,1,Pool,Park,Middle Eastern Restaurant,Platform,Pizza Place,Piano Bar,Pharmacy,Pet Store,Optical Shop,Nightclub
27,North York,1,Playground,Piano Bar,Park,Middle Eastern Restaurant,Platform,Pizza Place,Pharmacy,Pet Store,Optical Shop,Nightclub
28,North York,1,Playground,Piano Bar,Park,Middle Eastern Restaurant,Platform,Pizza Place,Pharmacy,Pet Store,Optical Shop,Nightclub
36,North York,1,Convenience Store,Park,Middle Eastern Restaurant,Platform,Pizza Place,Piano Bar,Pharmacy,Pet Store,Optical Shop,Nightclub


Cluster 3

In [116]:
NorthYork_merged.loc[NorthYork_merged['Cluster Labels'] == 2, NorthYork_merged.columns[[1] + list(range(5, NorthYork_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,2,Park,Fast Food Restaurant,Food & Drink Shop,Accessories Store,Miscellaneous Shop,Pizza Place,Piano Bar,Pharmacy,Pet Store,Optical Shop
1,North York,2,Pizza Place,Portuguese Restaurant,Coffee Shop,Intersection,Park,Hockey Arena,Accessories Store,Miscellaneous Shop,Piano Bar,Pharmacy
2,North York,2,Clothing Store,Coffee Shop,Women's Store,Restaurant,Sushi Restaurant,Toy / Game Store,Bakery,Shoe Store,Sandwich Place,Men's Store
3,North York,2,Clothing Store,Coffee Shop,Women's Store,Restaurant,Sushi Restaurant,Toy / Game Store,Bakery,Shoe Store,Sandwich Place,Men's Store
5,North York,2,Latin American Restaurant,Bakery,Grocery Store,Mediterranean Restaurant,Ice Cream Shop,Fast Food Restaurant,Gas Station,Japanese Restaurant,Pizza Place,Pet Store
6,North York,2,Trail,Park,River,Gym,Accessories Store,Mexican Restaurant,Piano Bar,Pharmacy,Pet Store,Optical Shop
7,North York,2,Trail,Park,River,Gym,Accessories Store,Mexican Restaurant,Piano Bar,Pharmacy,Pet Store,Optical Shop
8,North York,2,Park,Residential Building (Apartment / Condo),Bus Stop,Accessories Store,Playground,Pizza Place,Piano Bar,Pharmacy,Pet Store,Optical Shop
9,North York,2,Deli / Bodega,Pizza Place,Coffee Shop,Grocery Store,Middle Eastern Restaurant,Fried Chicken Joint,Mediterranean Restaurant,Accessories Store,Optical Shop,Nightclub
10,North York,2,Deli / Bodega,Pizza Place,Coffee Shop,Grocery Store,Middle Eastern Restaurant,Fried Chicken Joint,Mediterranean Restaurant,Accessories Store,Optical Shop,Nightclub


Cluster 4

In [117]:
NorthYork_merged.loc[NorthYork_merged['Cluster Labels'] == 3, NorthYork_merged.columns[[1] + list(range(5, NorthYork_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
20,North York,3,Pool,Middle Eastern Restaurant,Platform,Pizza Place,Piano Bar,Pharmacy,Pet Store,Park,Optical Shop,Nightclub
21,North York,3,Pool,Middle Eastern Restaurant,Platform,Pizza Place,Piano Bar,Pharmacy,Pet Store,Park,Optical Shop,Nightclub
22,North York,3,Vietnamese Restaurant,Pool,Accessories Store,Middle Eastern Restaurant,Pizza Place,Piano Bar,Pharmacy,Pet Store,Park,Optical Shop


Cluster 5

In [118]:
NorthYork_merged.loc[NorthYork_merged['Cluster Labels'] == 4, NorthYork_merged.columns[[1] + list(range(5, NorthYork_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
29,North York,4,Baseball Field,Accessories Store,Middle Eastern Restaurant,Platform,Pizza Place,Piano Bar,Pharmacy,Pet Store,Park,Optical Shop
