# Battle of Neighbourhoods - Capstone Project

### A recommender system for opening a new pizza place in the City of Toronto

The main objective is to compare two boroughs, North York and Downtown Toronto, and find the best location for opening a pizza place.


In [78]:
import numpy as np
import pandas as pd
import requests
import folium
import json 
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0

from sklearn.cluster import KMeans
import matplotlib.cm as cm


import matplotlib.colors as colors

### Retrieve Data From Wikipedia

Fetch the Wikipedia page "List of postal codes of Canada: M", parse it with BeautifulSoup, and create a DataFrame with the relevant information.

In [79]:
import requests
from bs4 import BeautifulSoup

neighbour_url = requests.get('https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M&oldid=942851379')
soup = BeautifulSoup(neighbour_url.text,'lxml')

neighbour_table = soup.find_all('table')[0]

Create a DataFrame from the HTML table.

In [80]:
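from io import StringIO  # newer pandas expects literal HTML wrapped in StringIO
df = pd.read_html(StringIO(str(neighbour_table)))[0]  # read_html returns a list of DataFrames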


## Data Wrangling

Drop the rows whose Borough is not assigned.

In [81]:
df.replace('Not assigned', np.nan, inplace=True)
df.dropna(subset=["Borough"], axis=0, inplace=True)

# reset the index because we dropped rows
df.reset_index(drop=True, inplace=True)
df

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,Lawrence Heights
4,M6A,North York,Lawrence Manor
...,...,...,...
205,M8Z,Etobicoke,Kingsway Park South West
206,M8Z,Etobicoke,Mimico NW
207,M8Z,Etobicoke,The Queensway West
208,M8Z,Etobicoke,Royal York South West
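
As a small illustration of the cleaning step above, the sketch below applies the same replace/drop/re-index sequence to a hand-made frame. The rows in `toy` are illustrative stand-ins, not the scraped data, and the chained style (no `inplace=True`) is just one possible way to write it.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the scraped table (illustrative values only).
toy = pd.DataFrame({
    "Postcode": ["M1A", "M3A", "M4A"],
    "Borough": ["Not assigned", "North York", "North York"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Victoria Village"],
})

# Mark the placeholder as missing, drop rows without a borough, re-index.
cleaned = (
    toy.replace("Not assigned", np.nan)
       .dropna(subset=["Borough"])
       .reset_index(drop=True)
)
print(cleaned)
```

Only the rows with a real borough remain, and the index starts again at 0.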


Combine the neighbourhoods that share a postcode.

In [82]:
df = df.groupby(['Postcode','Borough'], sort=False).agg(', '.join)
df.reset_index(inplace=True)
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Downtown Toronto,Queen's Park
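
The grouping step can be checked on a tiny example. This is a sketch on made-up rows (the `toy` frame is an assumption for illustration); it selects the `Neighbourhood` column explicitly before joining, which keeps the aggregation from touching any other column.

```python
import pandas as pd

# Two rows share postcode M6A, as in the table above.
toy = pd.DataFrame({
    "Postcode": ["M3A", "M6A", "M6A"],
    "Borough": ["North York", "North York", "North York"],
    "Neighbourhood": ["Parkwoods", "Lawrence Heights", "Lawrence Manor"],
})

# Join the neighbourhood names of each (Postcode, Borough) group with ", ".
merged = (
    toy.groupby(["Postcode", "Borough"], sort=False)["Neighbourhood"]
       .agg(", ".join)
       .reset_index()
)
print(merged)
```

`sort=False` keeps the groups in order of first appearance, which is why the row order of the original table is preserved.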


### Adding Geographical Information

Add columns for latitude and longitude.

In [83]:
import pgeocode

# retrieve the latitude/longitude from a postal code in Canada 'ca'
nomi_ca = pgeocode.Nominatim('ca')

latitude = []
longitude = []


for index, row in df.iterrows():
    location = nomi_ca.query_postal_code(row['Postcode'])  # look up the postal code by column name
    latitude.append(location.latitude)
    longitude.append(location.longitude)
    
# we put the result of the loop in new columns 'latitude' and 'longitude'
df['Latitude'] = latitude
df['Longitude'] = longitude


# the lookup fails for Canada Post Gateway Processing Centre, so set its coordinates manually
df.loc[df['Neighbourhood'] == "Canada Post Gateway Processing Centre", ['Latitude', 'Longitude']] = [43.636966,-79.615819]



In [84]:
df.head()



Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.7545,-79.33
1,M4A,North York,Victoria Village,43.7276,-79.3148
2,M5A,Downtown Toronto,Harbourfront,43.6555,-79.3626
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.7223,-79.4504
4,M7A,Downtown Toronto,Queen's Park,43.6641,-79.3889


### Explore Toronto Neighbourhood

In [85]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

address = 'Toronto, ON'

geolocator = Nominatim(user_agent="ca_explorer")
location = geolocator.geocode(address)
t_latitude= location.latitude
t_longitude = location.longitude






In [86]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[t_latitude, t_longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighbourhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        fill_opacity=0.7).add_to(map_toronto)  # parse_html belongs to folium.Popup, not CircleMarker
    
map_toronto






### Explore the Location North York

In [87]:
ny_data = df[df['Borough'] == 'North York'].reset_index(drop=True)
ny_data.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.7545,-79.33
1,M4A,North York,Victoria Village,43.7276,-79.3148
2,M6A,North York,"Lawrence Heights, Lawrence Manor",43.7223,-79.4504
3,M3B,North York,Don Mills North,43.745,-79.359
4,M6B,North York,Glencairn,43.7081,-79.4479


### Let's visualise the North York area

In [88]:
address = 'North York, ON'

geolocator = Nominatim(user_agent="ca_explorer")
location = geolocator.geocode(address)
ny_latitude= location.latitude
ny_longitude = location.longitude

In [89]:
# create map of North York using latitude and longitude values
map_NorthYork = folium.Map(location=[ny_latitude, ny_longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighbourhood in zip(ny_data['Latitude'], ny_data['Longitude'], ny_data['Borough'], ny_data['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        fill_opacity=0.7).add_to(map_NorthYork)  # parse_html belongs to folium.Popup, not CircleMarker
    
map_NorthYork


### Define Foursquare credentials:

In [90]:
CLIENT_ID = 'VZ3YOHRX44O4NYLOUDUUKVHKP1VSZRBJWQNEBWAUXWSLMZNR' # your Foursquare ID
CLIENT_SECRET = '5FEODWZZY32QK41CN0USSL3UBZNVOEBN43ILYUS4Y1Y3I4GX' # your Foursquare Secret
ACCESS_TOKEN = 'ALJCTAFNXUMUZMJMTQ5GHBYLVCUM2RTWLJBQJQADBIXA2WBE'
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)
print('ACCESS_TOKEN: '+ ACCESS_TOKEN)

Your credentials:
CLIENT_ID: VZ3YOHRX44O4NYLOUDUUKVHKP1VSZRBJWQNEBWAUXWSLMZNR
CLIENT_SECRET:5FEODWZZY32QK41CN0USSL3UBZNVOEBN43ILYUS4Y1Y3I4GX
ACCESS_TOKEN: ALJCTAFNXUMUZMJMTQ5GHBYLVCUM2RTWLJBQJQADBIXA2WBE


### Search for a Specific Venue Type - Pizza place

In [91]:
address = 'North York, ON'

geolocator = Nominatim(user_agent="ca_explorer")
location = geolocator.geocode(address)
ny_latitude= location.latitude
ny_longitude = location.longitude

Search for pizza places within a 1000-metre radius of North York.

In [92]:
search_query = 'pizza'
radius = 1000
print(search_query + ' .... OK!')

pizza .... OK!


In [93]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, ny_latitude, ny_longitude,ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=VZ3YOHRX44O4NYLOUDUUKVHKP1VSZRBJWQNEBWAUXWSLMZNR&client_secret=5FEODWZZY32QK41CN0USSL3UBZNVOEBN43ILYUS4Y1Y3I4GX&ll=43.7543263,-79.44911696639593&oauth_token=ALJCTAFNXUMUZMJMTQ5GHBYLVCUM2RTWLJBQJQADBIXA2WBE&v=20180605&query=pizza&radius=1000&limit=100'
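
The query string can also be assembled from a dictionary rather than a long format string. The sketch below uses the standard library's `urlencode`; the endpoint and parameter names are the ones from the URL above, while the credential values are placeholders (an assumption, real keys would normally be loaded from the environment rather than written into the notebook).

```python
from urllib.parse import urlencode

# Placeholder credentials; everything else mirrors the request above.
params = {
    "client_id": "YOUR_CLIENT_ID",
    "client_secret": "YOUR_CLIENT_SECRET",
    "v": "20180605",
    "ll": "43.7543263,-79.44911696639593",
    "query": "pizza",
    "radius": 1000,
    "limit": 100,
}
base = "https://api.foursquare.com/v2/venues/search"

# urlencode percent-escapes values, so special characters cannot break the URL.
url = base + "?" + urlencode(params)
print(url)
```

With `requests`, the same effect is available by passing the dictionary directly, as in `requests.get(base, params=params)`.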

In [94]:
results = requests.get(url).json()
#results

Extract the pizza place information from the JSON response.

In [95]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe



Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,location.neighborhood
0,50f9bbcc5d24acebc25936af,Domino's Pizza,"[{'id': '4bf58dd8d48988d1ca941735', 'name': 'P...",v-1624793075,False,820 Sheppard Ave W,43.753127,-79.450926,"[{'label': 'display', 'lat': 43.75312660212406...",197,M3H 2T1,CA,Toronto,ON,Canada,"[820 Sheppard Ave W, Toronto ON M3H 2T1, Canada]",
1,4bfd9166b68d0f47f2f8e857,Pizza Pizza,"[{'id': '4bf58dd8d48988d1ca941735', 'name': 'P...",v-1624793075,False,618 SHEPPARD AVENUE WEST,43.755311,-79.441126,"[{'label': 'display', 'lat': 43.75531145041057...",651,M3H 2S1,CA,North York,ON,Canada,"[618 SHEPPARD AVENUE WEST, North York ON M3H 2...",Bathurst Manor
2,4f651fefe4b041039e5be9a9,Pizza e Pazzi,"[{'id': '4bf58dd8d48988d110941735', 'name': 'I...",v-1624793075,False,,43.749567,-79.456981,"[{'label': 'display', 'lat': 43.74956700597397...",824,,CA,,,Canada,[Canada],
3,4eea53e2754a186843dd5c77,Double Double Pizza and Chicken,"[{'id': '4bf58dd8d48988d1ca941735', 'name': 'P...",v-1624793075,False,871 Sheppard Ave W,43.752048,-79.454026,"[{'label': 'display', 'lat': 43.75204769507766...",469,M3H,CA,North York,ON,Canada,"[871 Sheppard Ave W, North York ON M3H, Canada]",
4,4d26693c467d6ea8e3d7b395,Pizza Nova,"[{'id': '4bf58dd8d48988d1ca941735', 'name': 'P...",v-1624793075,False,520 Wilson Heights Blvd.,43.750856,-79.456392,"[{'label': 'display', 'lat': 43.75085593685064...",701,M3H 2V6,CA,Toronto,ON,Canada,"[520 Wilson Heights Blvd., Toronto ON M3H 2V6,...",
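
To see how the nested response turns into the dotted column names shown above, here is a minimal sketch using `pd.json_normalize` on a hand-written dictionary. The `results` payload is a made-up stand-in with the same shape as the search response, not real API output.

```python
import pandas as pd

# Hand-written stand-in for the relevant part of a search response.
results = {
    "response": {
        "venues": [
            {"id": "a1", "name": "Example Pizza",
             "categories": [{"name": "Pizza Place"}],
             "location": {"lat": 43.75, "lng": -79.45, "distance": 200}},
            {"id": "b2", "name": "Example Trattoria",
             "categories": [],
             "location": {"lat": 43.76, "lng": -79.44, "distance": 650}},
        ]
    }
}

venues = results["response"]["venues"]
# Nested dicts are flattened into columns such as 'location.lat';
# lists (like 'categories') are kept as-is in a single column.
flat = pd.json_normalize(venues)
print(flat.columns.tolist())
```

This is why the `categories` column still holds a list of dictionaries and needs a separate extraction step.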


### Filter the dataframe with information required:

In [96]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()  # work on a copy before modifying columns

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,address,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,neighborhood,id
0,Domino's Pizza,Pizza Place,820 Sheppard Ave W,43.753127,-79.450926,"[{'label': 'display', 'lat': 43.75312660212406...",197,M3H 2T1,CA,Toronto,ON,Canada,"[820 Sheppard Ave W, Toronto ON M3H 2T1, Canada]",,50f9bbcc5d24acebc25936af
1,Pizza Pizza,Pizza Place,618 SHEPPARD AVENUE WEST,43.755311,-79.441126,"[{'label': 'display', 'lat': 43.75531145041057...",651,M3H 2S1,CA,North York,ON,Canada,"[618 SHEPPARD AVENUE WEST, North York ON M3H 2...",Bathurst Manor,4bfd9166b68d0f47f2f8e857
2,Pizza e Pazzi,Italian Restaurant,,43.749567,-79.456981,"[{'label': 'display', 'lat': 43.74956700597397...",824,,CA,,,Canada,[Canada],,4f651fefe4b041039e5be9a9
3,Double Double Pizza and Chicken,Pizza Place,871 Sheppard Ave W,43.752048,-79.454026,"[{'label': 'display', 'lat': 43.75204769507766...",469,M3H,CA,North York,ON,Canada,"[871 Sheppard Ave W, North York ON M3H, Canada]",,4eea53e2754a186843dd5c77
4,Pizza Nova,Pizza Place,520 Wilson Heights Blvd.,43.750856,-79.456392,"[{'label': 'display', 'lat': 43.75085593685064...",701,M3H 2V6,CA,Toronto,ON,Canada,"[520 Wilson Heights Blvd., Toronto ON M3H 2V6,...",,4d26693c467d6ea8e3d7b395
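
The `distance` column can be sanity-checked against the search radius with a great-circle calculation. The sketch below is a plain haversine formula assuming a spherical Earth of radius 6,371 km; `haversine_m` is a helper name introduced here for illustration, and the coordinates are taken from the request URL and the first table row.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a sphere of radius 6,371 km."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(a))

centre = (43.7543263, -79.44911696639593)  # search centre from the request URL
dominos = (43.753127, -79.450926)          # lat/lng of the first row of the table

d = haversine_m(*centre, *dominos)
print(round(d))
```

The result should land close to the 197 m that Foursquare reports for that venue, well inside the 1000 m radius.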


In [97]:
dataframe_filtered.name

0                     Domino's Pizza
1                        Pizza Pizza
2                      Pizza e Pazzi
3    Double Double Pizza and Chicken
4                         Pizza Nova
Name: name, dtype: object

### Let's visualise the pizza places nearby:

In [98]:
venues_map = folium.Map(location=[ny_latitude, ny_longitude], zoom_start=13) # generate map centred around North York
# add a red circle marker to represent the centre of North York
folium.CircleMarker(
    [ny_latitude, ny_longitude],
    radius=10,
    color='red',
    popup='North York',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the pizza search results as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

### Let's explore the venues in the North York neighbourhoods

In [99]:
neighborhood_latitude = ny_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = ny_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = ny_data.loc[0, 'Neighbourhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Parkwoods are 43.7545, -79.33.


In [100]:

LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=VZ3YOHRX44O4NYLOUDUUKVHKP1VSZRBJWQNEBWAUXWSLMZNR&client_secret=5FEODWZZY32QK41CN0USSL3UBZNVOEBN43ILYUS4Y1Y3I4GX&v=20180605&ll=43.7545,-79.33&radius=500&limit=100'

In [101]:
results = requests.get(url).json()
#results

In [102]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
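
The function above indexes straight into the response, so a location with no groups, or a venue with an empty `categories` list, would raise an exception. A more defensive parsing step might look like the following sketch. `parse_explore_items` is a hypothetical helper, and the `payload` is hand-written to mimic the shape of an explore response; neither comes from the Foursquare documentation.

```python
def parse_explore_items(payload, name, lat, lng):
    """Flatten one explore-style response into rows, tolerating missing parts."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        # Use None when a venue has no category instead of raising IndexError.
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows

# Hand-written payload: one labelled venue and one without a category.
payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "KFC",
                   "location": {"lat": 43.754387, "lng": -79.333021},
                   "categories": [{"name": "Fast Food Restaurant"}]}},
        {"venue": {"name": "Unlabelled venue",
                   "location": {"lat": 43.75, "lng": -79.33},
                   "categories": []}},
    ]}]}
}
rows = parse_explore_items(payload, "Parkwoods", 43.7545, -79.33)
print(rows)
```

An empty or error response simply yields an empty list, so the loop over neighbourhoods can continue.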

In [103]:
# Create dataframe of North York venues
NorthYork_venues = getNearbyVenues(names=ny_data['Neighbourhood'],
                                   latitudes=ny_data['Latitude'],
                                   longitudes=ny_data['Longitude']
                                  )

Parkwoods
Victoria Village
Lawrence Heights, Lawrence Manor
Don Mills North
Glencairn
Flemingdon Park, Don Mills South
Hillcrest Village
Bathurst Manor, Downsview North, Wilson Heights
Fairview, Henry Farm, Oriole
Northwood Park, York University
Bayview Village
CFB Toronto, Downsview East
Silver Hills, York Mills
Downsview West
Downsview, North Park, Upwood Park
Humber Summit
Newtonbrook, Willowdale
Downsview Central
Bedford Park, Lawrence Manor East
Emery, Humberlea
Willowdale South
Downsview Northwest
York Mills West
Willowdale West


In [104]:
print(NorthYork_venues.shape)
NorthYork_venues.head()

(293, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.7545,-79.33,KFC,43.754387,-79.333021,Fast Food Restaurant
1,Parkwoods,43.7545,-79.33,Brookbanks Park,43.751976,-79.33214,Park
2,Parkwoods,43.7545,-79.33,Variety Store,43.751974,-79.333114,Food & Drink Shop
3,Victoria Village,43.7276,-79.3148,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,Victoria Village,43.7276,-79.3148,Portugril,43.725819,-79.312785,Portuguese Restaurant


In [105]:
#NorthYork_venues.groupby('Neighborhood').count()
NorthYork_venues.groupby('Venue Category').count()

Venue Category,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
Airport,1,1,1,1,1,1
American Restaurant,3,3,3,3,3,3
Arts & Crafts Store,1,1,1,1,1,1
Asian Restaurant,2,2,2,2,2,2
Auto Garage,1,1,1,1,1,1
...,...,...,...,...,...,...
Toy / Game Store,4,4,4,4,4,4
Trail,3,3,3,3,3,3
Video Game Store,1,1,1,1,1,1
Vietnamese Restaurant,2,2,2,2,2,2


In [106]:
print('There are {} unique categories.'.format(len(NorthYork_venues['Venue Category'].unique())))

There are 106 unique categories.


### Analysis of Neighbourhood in North York

In [107]:
# one hot encoding
NorthYork_onehot = pd.get_dummies(NorthYork_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
NorthYork_onehot['Neighborhood'] = NorthYork_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [NorthYork_onehot.columns[-1]] + list(NorthYork_onehot.columns[:-1])
NorthYork_onehot = NorthYork_onehot[fixed_columns]

NorthYork_onehot.head()

Unnamed: 0,Neighborhood,Airport,American Restaurant,Arts & Crafts Store,Asian Restaurant,Auto Garage,Bakery,Bank,Bar,Baseball Field,...,Sports Bar,Supplement Shop,Sushi Restaurant,Thai Restaurant,Theater,Toy / Game Store,Trail,Video Game Store,Vietnamese Restaurant,Women's Store
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [108]:
NorthYork_grouped = NorthYork_onehot.groupby('Neighborhood').mean().reset_index()
NorthYork_grouped.head()

Unnamed: 0,Neighborhood,Airport,American Restaurant,Arts & Crafts Store,Asian Restaurant,Auto Garage,Bakery,Bank,Bar,Baseball Field,...,Sports Bar,Supplement Shop,Sushi Restaurant,Thai Restaurant,Theater,Toy / Game Store,Trail,Video Game Store,Vietnamese Restaurant,Women's Store
0,"Bathurst Manor, Downsview North, Wilson Heights",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0
2,"Bedford Park, Lawrence Manor East",0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.0,0.0
3,"CFB Toronto, Downsview East",0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Don Mills North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
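
It helps to be explicit about what the numbers in the grouped table mean: each value is the share of a neighbourhood's returned venues that fall into a category, not a count and not a visit frequency. A minimal sketch on made-up data (the `toy` frame is illustrative only):

```python
import pandas as pd

# Neighbourhood A has 2 venues, neighbourhood B has 4.
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "B", "B", "B", "B"],
    "Venue Category": ["Pizza Place", "Park", "Pizza Place", "Park", "Park", "Cafe"],
})

# One column per category, 1.0 where the venue belongs to it.
onehot = pd.get_dummies(toy["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", toy["Neighborhood"])

# The mean of the indicator columns is the share of venues per category.
share = onehot.groupby("Neighborhood").mean()
print(share)
```

Both neighbourhoods have exactly one pizza place, yet A scores 0.5 and B scores 0.25, because the share depends on how many venues were returned for each neighbourhood.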


Let's list the neighbourhoods by the share of their venues that are pizza places, in descending order.

In [109]:
ny_pizzaplace = NorthYork_grouped.filter(items=['Neighborhood', 'Pizza Place']).sort_values(by=['Pizza Place'],ascending=False)

ny_pizzaplace.head(10)

Unnamed: 0,Neighborhood,Pizza Place
0,"Bathurst Manor, Downsview North, Wilson Heights",0.2
12,Glencairn,0.2
20,Victoria Village,0.166667
6,Downsview Northwest,0.105263
2,"Bedford Park, Lawrence Manor East",0.08
21,Willowdale South,0.058824
14,Humber Summit,0.0
22,Willowdale West,0.0
19,"Silver Hills, York Mills",0.0
18,Parkwoods,0.0


The highest shares of pizza places are in Bathurst Manor, Downsview North, Wilson Heights and in Glencairn (0.2 each).

In [110]:
ny_compet = NorthYork_grouped.filter(items=['Neighborhood', 'Restaurant','Food & Drink Shop'])

ny_compet

Unnamed: 0,Neighborhood,Restaurant,Food & Drink Shop
0,"Bathurst Manor, Downsview North, Wilson Heights",0.0,0.0
1,Bayview Village,0.0,0.0
2,"Bedford Park, Lawrence Manor East",0.08,0.0
3,"CFB Toronto, Downsview East",0.0,0.0
4,Don Mills North,0.0,0.0
5,Downsview Central,0.0,0.0
6,Downsview Northwest,0.0,0.0
7,Downsview West,0.0,0.0
8,"Downsview, North Park, Upwood Park",0.0,0.0
9,"Emery, Humberlea",0.0,0.0


Display each neighbourhood with its most common venue categories.

In [111]:
num_top_venues = 5

for hood in NorthYork_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = NorthYork_grouped[NorthYork_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Bathurst Manor, Downsview North, Wilson Heights----
                      venue  freq
0               Pizza Place   0.2
1               Coffee Shop   0.2
2  Mediterranean Restaurant   0.2
3             Grocery Store   0.2
4       Fried Chicken Joint   0.2


----Bayview Village----
         venue  freq
0  Gas Station   0.2
1        Trail   0.2
2         Park   0.2
3      Dog Run   0.2
4  Flower Shop   0.2


----Bedford Park, Lawrence Manor East----
            venue  freq
0  Sandwich Place  0.08
1      Restaurant  0.08
2     Coffee Shop  0.08
3     Pizza Place  0.08
4   Grocery Store  0.04


----CFB Toronto, Downsview East----
         venue  freq
0      Airport   0.2
1   Food Court   0.2
2  Coffee Shop   0.2
3         Park   0.2
4   Shoe Store   0.2


----Don Mills North----
                venue  freq
0                Pool   0.5
1                Park   0.5
2             Airport   0.0
3  Mexican Restaurant   0.0
4         Pizza Place   0.0


----Downsview Central----
              

Sort the venue categories in descending order of frequency.

In [112]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Create a dataframe with the top 10 venue categories for each neighbourhood.

In [113]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighborhood'] = NorthYork_grouped['Neighborhood']

for ind in np.arange(NorthYork_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(NorthYork_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted#.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Bathurst Manor, Downsview North, Wilson Heights",Pizza Place,Coffee Shop,Mediterranean Restaurant,Grocery Store,Fried Chicken Joint,Airport,Middle Eastern Restaurant,Pharmacy,Pet Store,Park
1,Bayview Village,Gas Station,Trail,Park,Dog Run,Flower Shop,Mexican Restaurant,Pharmacy,Pet Store,Nightclub,Moving Target
2,"Bedford Park, Lawrence Manor East",Sandwich Place,Restaurant,Coffee Shop,Pizza Place,Grocery Store,Butcher,Fast Food Restaurant,Pet Store,Pub,Comfort Food Restaurant
3,"CFB Toronto, Downsview East",Airport,Food Court,Coffee Shop,Park,Shoe Store,Latin American Restaurant,Miscellaneous Shop,Pizza Place,Pharmacy,Pet Store
4,Don Mills North,Pool,Park,Airport,Mexican Restaurant,Pizza Place,Pharmacy,Pet Store,Nightclub,Moving Target,Movie Theater
5,Downsview Central,Baseball Field,Airport,Mexican Restaurant,Platform,Pizza Place,Pharmacy,Pet Store,Park,Nightclub,Moving Target
6,Downsview Northwest,Grocery Store,Shopping Mall,Pizza Place,Discount Store,Liquor Store,Gas Station,Vietnamese Restaurant,Pharmacy,Fast Food Restaurant,Beer Store
7,Downsview West,Pool,Airport,Mexican Restaurant,Pizza Place,Pharmacy,Pet Store,Park,Nightclub,Moving Target,Movie Theater
8,"Downsview, North Park, Upwood Park",Trail,Bakery,Basketball Court,Airport,Mexican Restaurant,Pizza Place,Pharmacy,Pet Store,Park,Nightclub
9,"Emery, Humberlea",Discount Store,Latin American Restaurant,Construction & Landscaping,Coffee Shop,Grocery Store,Nightclub,Café,Airport,Movie Theater,Platform
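
Neighbourhoods with only a few venues get their remaining "most common" slots filled with categories whose frequency is 0.0 (Don Mills North, for example, has only Pool and Park at 0.5, followed by zero-frequency entries). One possible variant, sketched below, ignores zero-frequency categories and pads with `None`. `top_nonzero_venues` is a hypothetical helper, not part of the original notebook, and the example row reuses the Don Mills North frequencies printed earlier.

```python
import pandas as pd

def top_nonzero_venues(row, num_top_venues):
    """Return the most frequent categories, ignoring those with frequency 0."""
    freqs = row.iloc[1:].astype(float)  # skip the 'Neighborhood' cell
    freqs = freqs[freqs > 0].sort_values(ascending=False)
    names = list(freqs.index[:num_top_venues])
    # Pad so every row still has num_top_venues entries.
    return names + [None] * (num_top_venues - len(names))

row = pd.Series({"Neighborhood": "Don Mills North", "Pool": 0.5, "Park": 0.5,
                 "Airport": 0.0, "Pizza Place": 0.0})
top = top_nonzero_venues(row, 4)
print(top)
```

With this variant, a category only appears in a neighbourhood's list if at least one venue of that type was actually returned there.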


### Cluster the Neighbourhoods

In [114]:
# set number of clusters
kclusters = 4

NorthYork_grouped_clustering = NorthYork_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(NorthYork_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 2, 1, 3, 0, 1, 0, 0], dtype=int32)
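
The label values returned by k-means are arbitrary identifiers; only the grouping matters. The sketch below runs `KMeans` on four hand-picked points (an illustrative assumption, not the venue data). `n_init` is set explicitly because its default has changed between scikit-learn releases.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two obvious groups: points near (0, 0) and points near (1, 1).
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = model.labels_

# Which group is called 0 and which is called 1 is arbitrary;
# inertia_ is the within-cluster sum of squared distances.
print(labels, model.inertia_)
```

Comparing `inertia_` across several values of `n_clusters` is one common way to judge whether a chosen number of clusters is reasonable.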

In [115]:
# add clustering labels

neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

NorthYork_merged = ny_data

# merge NorthYork_grouped with NorthYork_data to add latitude/longitude for each neighborhood
NorthYork_merged = NorthYork_merged.join(neighbourhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhood')

NorthYork_merged.head() # check the last columns!

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.7545,-79.33,2,Park,Fast Food Restaurant,Food & Drink Shop,Airport,Middle Eastern Restaurant,Pizza Place,Pharmacy,Pet Store,Nightclub,Moving Target
1,M4A,North York,Victoria Village,43.7276,-79.3148,2,Intersection,Pizza Place,Portuguese Restaurant,Hockey Arena,Coffee Shop,Park,Airport,Middle Eastern Restaurant,Pharmacy,Pet Store
2,M6A,North York,"Lawrence Heights, Lawrence Manor",43.7223,-79.4504,0,Clothing Store,Coffee Shop,Restaurant,Cosmetics Shop,Women's Store,Bakery,Toy / Game Store,Sushi Restaurant,Shoe Store,Food Court
3,M3B,North York,Don Mills North,43.745,-79.359,1,Pool,Park,Airport,Mexican Restaurant,Pizza Place,Pharmacy,Pet Store,Nightclub,Moving Target,Movie Theater
4,M6B,North York,Glencairn,43.7081,-79.4479,0,Pizza Place,Grocery Store,Bakery,Fish Market,Fast Food Restaurant,Ice Cream Shop,Asian Restaurant,Latin American Restaurant,Gas Station,Pharmacy


In [116]:
NorthYork_merged['Cluster Labels'].value_counts()


0    12
2     8
1     3
3     1
Name: Cluster Labels, dtype: int64

### Cluster Evaluation

As a result of clustering neighbourhoods, we can see that restaurants and pizza places come under the cluster labelled 0.
So we can evaluate that cluster to get better insight into the neighbourhood venues.
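
One way to back up a statement like this is to average the category shares within each cluster. The sketch below does that on a small made-up frame; `grouped` and `labels` are illustrative stand-ins for `NorthYork_grouped` and `kmeans.labels_`, and the numbers are not from the analysis.

```python
import pandas as pd

# Stand-in for the per-neighbourhood category shares.
grouped = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D"],
    "Pizza Place": [0.2, 0.1, 0.0, 0.0],
    "Park": [0.0, 0.1, 0.5, 0.4],
})
labels = [0, 0, 1, 1]  # stand-in for the k-means labels, in the same row order

# Mean share of each category per cluster.
summary = (
    grouped.assign(cluster=labels)
           .groupby("cluster")[["Pizza Place", "Park"]]
           .mean()
)
print(summary)
```

A cluster whose mean `Pizza Place` share is clearly higher than the others is the one worth inspecting in detail.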

### Cluster 1 (label 0)

In [117]:
NorthYork_merged.loc[NorthYork_merged['Cluster Labels'] == 0, NorthYork_merged.columns[[1] + list(range(5, NorthYork_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,North York,0,Clothing Store,Coffee Shop,Restaurant,Cosmetics Shop,Women's Store,Bakery,Toy / Game Store,Sushi Restaurant,Shoe Store,Food Court
4,North York,0,Pizza Place,Grocery Store,Bakery,Fish Market,Fast Food Restaurant,Ice Cream Shop,Asian Restaurant,Latin American Restaurant,Gas Station,Pharmacy
7,North York,0,Pizza Place,Coffee Shop,Mediterranean Restaurant,Grocery Store,Fried Chicken Joint,Airport,Middle Eastern Restaurant,Pharmacy,Pet Store,Park
8,North York,0,Clothing Store,Fast Food Restaurant,Coffee Shop,Cosmetics Shop,Restaurant,Food Court,Bank,Baseball Field,Japanese Restaurant,Toy / Game Store
9,North York,0,Sports Bar,Middle Eastern Restaurant,Mediterranean Restaurant,Sandwich Place,Airport,Mexican Restaurant,Pharmacy,Pet Store,Park,Nightclub
10,North York,0,Gas Station,Trail,Park,Dog Run,Flower Shop,Mexican Restaurant,Pharmacy,Pet Store,Nightclub,Moving Target
14,North York,0,Trail,Bakery,Basketball Court,Airport,Mexican Restaurant,Pizza Place,Pharmacy,Pet Store,Park,Nightclub
15,North York,0,Furniture / Home Store,Auto Garage,Home Service,Rental Car Location,Business Service,Airport,Pizza Place,Pharmacy,Pet Store,Park
18,North York,0,Sandwich Place,Restaurant,Coffee Shop,Pizza Place,Grocery Store,Butcher,Fast Food Restaurant,Pet Store,Pub,Comfort Food Restaurant
19,North York,0,Discount Store,Latin American Restaurant,Construction & Landscaping,Coffee Shop,Grocery Store,Nightclub,Café,Airport,Movie Theater,Platform


### Cluster 2 (label 1)

In [118]:
NorthYork_merged.loc[NorthYork_merged['Cluster Labels'] == 1, NorthYork_merged.columns[[1] + list(range(5, NorthYork_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,North York,1,Pool,Park,Airport,Mexican Restaurant,Pizza Place,Pharmacy,Pet Store,Nightclub,Moving Target,Movie Theater
12,North York,1,Pool,Cafeteria,Middle Eastern Restaurant,Platform,Pizza Place,Pharmacy,Pet Store,Park,Nightclub,Moving Target
13,North York,1,Pool,Airport,Mexican Restaurant,Pizza Place,Pharmacy,Pet Store,Park,Nightclub,Moving Target,Movie Theater


### Cluster 3 (label 2)

In [119]:
NorthYork_merged.loc[NorthYork_merged['Cluster Labels'] == 2, NorthYork_merged.columns[[1] + list(range(5, NorthYork_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,2,Park,Fast Food Restaurant,Food & Drink Shop,Airport,Middle Eastern Restaurant,Pizza Place,Pharmacy,Pet Store,Nightclub,Moving Target
1,North York,2,Intersection,Pizza Place,Portuguese Restaurant,Hockey Arena,Coffee Shop,Park,Airport,Middle Eastern Restaurant,Pharmacy,Pet Store
5,North York,2,Trail,Park,Gym,River,Airport,Mexican Restaurant,Pharmacy,Pet Store,Nightclub,Moving Target
6,North York,2,Japanese Restaurant,Park,Residential Building (Apartment / Condo),Mexican Restaurant,Pizza Place,Pharmacy,Pet Store,Nightclub,Moving Target,Movie Theater
11,North York,2,Airport,Food Court,Coffee Shop,Park,Shoe Store,Latin American Restaurant,Miscellaneous Shop,Pizza Place,Pharmacy,Pet Store
16,North York,2,Park,Seafood Restaurant,Business Service,Playground,Airport,Middle Eastern Restaurant,Pizza Place,Pharmacy,Pet Store,Nightclub
22,North York,2,Convenience Store,Park,Plaza,Platform,Pizza Place,Pharmacy,Pet Store,Nightclub,Moving Target,Movie Theater
23,North York,2,Coffee Shop,Park,Locksmith,Bookstore,Airport,Middle Eastern Restaurant,Platform,Pizza Place,Pharmacy,Pet Store


### Cluster 4 (label 3)

In [120]:
NorthYork_merged.loc[NorthYork_merged['Cluster Labels'] == 3, NorthYork_merged.columns[[1] + list(range(5, NorthYork_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,North York,3,Baseball Field,Airport,Mexican Restaurant,Platform,Pizza Place,Pharmacy,Pet Store,Park,Nightclub,Moving Target


Pizza Place appears among the most common venue categories in many North York neighbourhoods, which suggests that pizza places are popular in the borough.

Visualise resulting Clusters

In [121]:
# create map
map_clusters = folium.Map(location=[t_latitude, t_longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(NorthYork_merged['Latitude'], NorthYork_merged['Longitude'], NorthYork_merged['Neighbourhood'], NorthYork_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Clustering the venues in North York shows that Pizza Place ranks among the first 3 most common venues in 5 neighbourhoods. North York is also a borough with many popular venues and eateries, which makes for healthy competition when starting a pizza place business.
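
The count quoted above comes from checking, for each neighbourhood row, whether `Pizza Place` appears in the first three "most common venue" columns. The following is a minimal sketch of that check in plain Python, using three rows copied from the cluster table above (first six ranks only). The helper name `rows_with_category_in_top_n` is hypothetical; in the notebook the same check would run over every row of `NorthYork_merged`.

```python
# Each entry: (row index, venue categories ranked from most to least common).
# Rows 0, 1 and 5 are copied from the cluster table above, truncated to six ranks.
rows = [
    (0, ["Park", "Fast Food Restaurant", "Food & Drink Shop",
         "Airport", "Middle Eastern Restaurant", "Pizza Place"]),
    (1, ["Intersection", "Pizza Place", "Portuguese Restaurant",
         "Hockey Arena", "Coffee Shop", "Park"]),
    (5, ["Trail", "Park", "Gym", "River", "Airport", "Mexican Restaurant"]),
]


def rows_with_category_in_top_n(rows, category, n):
    """Return the row indices whose first n ranked venues include `category`."""
    return [idx for idx, ranked in rows if category in ranked[:n]]


top3 = rows_with_category_in_top_n(rows, "Pizza Place", 3)
top6 = rows_with_category_in_top_n(rows, "Pizza Place", 6)
print(top3)
print(top6)
```

Widening `n` shows how sensitive the count is to the cut-off: a neighbourhood where pizza ranks 6th is invisible to a top-3 check.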

## Analysis of Downtown Toronto

In [122]:
dt_data = df[df['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
dt_data.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,Harbourfront,43.6555,-79.3626
1,M7A,Downtown Toronto,Queen's Park,43.6641,-79.3889
2,M5B,Downtown Toronto,"Ryerson, Garden District",43.6572,-79.3783
3,M5C,Downtown Toronto,St. James Town,43.6513,-79.3756
4,M5E,Downtown Toronto,Berczy Park,43.6456,-79.3754


In [123]:
address = 'Downtown Toronto, ON'

geolocator = Nominatim(user_agent="ca_explorer")
location = geolocator.geocode(address)
dt_latitude= location.latitude
dt_longitude = location.longitude

In [124]:
search_query = 'pizza'
radius = 1000
print(search_query + ' .... OK!')

pizza .... OK!


In [125]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, dt_latitude, dt_longitude,ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=VZ3YOHRX44O4NYLOUDUUKVHKP1VSZRBJWQNEBWAUXWSLMZNR&client_secret=5FEODWZZY32QK41CN0USSL3UBZNVOEBN43ILYUS4Y1Y3I4GX&ll=43.6563221,-79.3809161&oauth_token=ALJCTAFNXUMUZMJMTQ5GHBYLVCUM2RTWLJBQJQADBIXA2WBE&v=20180605&query=pizza&radius=1000&limit=100'
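
The cell above formats the API credentials directly into the URL string, and echoing `url` prints them in the notebook output. One possible alternative is sketched here: read the credentials from environment variables and mask them before displaying the URL. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `build_search_url` and `redact` are assumptions made for illustration; the endpoint and parameter names are the ones used in the cell above.

```python
import os
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"


def build_search_url(lat, lng, query, radius, limit, version="20180605"):
    """Build the venue search URL, taking credentials from the environment."""
    params = {
        # Hypothetical environment variable names; adjust to your own setup.
        "client_id": os.environ.get("FOURSQUARE_CLIENT_ID", ""),
        "client_secret": os.environ.get("FOURSQUARE_CLIENT_SECRET", ""),
        "ll": "{},{}".format(lat, lng),
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in the ll parameter readable.
    return SEARCH_ENDPOINT + "?" + urlencode(params, safe=",")


def redact(url, secret_keys=("client_id", "client_secret", "oauth_token")):
    """Return a copy of the URL with credential values replaced by ***."""
    parts = urlsplit(url)
    pairs = [(key, "***" if key in secret_keys else value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)]
    return urlunsplit(parts._replace(query=urlencode(pairs, safe="*,")))


url = build_search_url(43.6563221, -79.3809161, "pizza", 1000, 100)
print(redact(url))  # safe to show in notebook output
```

The unredacted `url` can still be passed to `requests.get` as in the next cell; only the displayed copy is masked.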

In [126]:
results = requests.get(url).json()
#results

In [127]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = pd.json_normalize(venues)
dataframe


Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,location.crossStreet,venuePage.id,location.neighborhood
0,4b2438f6f964a520126424e3,Pizza Pizza,"[{'id': '4bf58dd8d48988d1ca941735', 'name': 'P...",v-1624793089,False,260 CHURCH STREET,43.656513,-79.377242,"[{'label': 'display', 'lat': 43.65651263631174...",296,M5B 1Z2,CA,Toronto,ON,Canada,"[260 CHURCH STREET, Toronto ON M5B 1Z2, Canada]",,,
1,4af5d885f964a520b2fd21e3,Amato Pizza,"[{'id': '4bf58dd8d48988d1ca941735', 'name': 'P...",v-1624793089,False,429A Yonge St,43.660215,-79.382571,"[{'label': 'display', 'lat': 43.66021482917061...",453,,CA,Toronto,ON,Canada,"[429A Yonge St (at College St), Toronto ON, Ca...",at College St,,
2,552ff1d1498e5f41b0ccb3bd,Mamma's Pizza,"[{'id': '4bf58dd8d48988d1ca941735', 'name': 'P...",v-1624793089,False,127 Yonge St,43.650891,-79.378632,"[{'label': 'display', 'lat': 43.650891, 'lng':...",631,M5C 1W4,CA,Toronto,ON,Canada,"[127 Yonge St, Toronto ON M5C 1W4, Canada]",,,
3,59f28358acb00b73e36783d6,Domino's Pizza,"[{'id': '4bf58dd8d48988d1ca941735', 'name': 'P...",v-1624793089,False,67 Richmond St E,43.652456,-79.374938,"[{'label': 'display', 'lat': 43.65245635983421...",645,M5C 1N9,CA,Toronto,ON,Canada,"[67 Richmond St E, Toronto ON M5C 1N9, Canada]",,,
4,4ca62112f47ea14380845d21,Pizza 2 Go,"[{'id': '4bf58dd8d48988d1ca941735', 'name': 'P...",v-1624793089,False,65 Front St. W,43.64609,-79.379776,"[{'label': 'display', 'lat': 43.64608978691595...",1142,M5J,CA,Toronto,ON,Canada,"[65 Front St. W (Union Station), Toronto ON M5...",Union Station,,
5,4b5fdb7ff964a520ebce29e3,Express Pizza,"[{'id': '4bf58dd8d48988d1ca941735', 'name': 'P...",v-1624793089,False,447 Church St.,43.663787,-79.380224,"[{'label': 'display', 'lat': 43.66378730691813...",832,M4Y 2C5,CA,Toronto,ON,Canada,"[447 Church St. (at Alexander St.), Toronto ON...",at Alexander St.,,
6,4cdd8afb78ddf04dfbf29498,St. Lawrence Pizza and Pasta,"[{'id': '4bf58dd8d48988d110941735', 'name': 'I...",v-1624793089,False,93 Front St. E.,43.648378,-79.371578,"[{'label': 'display', 'lat': 43.64837838784134...",1160,M5E 1C3,CA,Toronto,ON,Canada,[93 Front St. E. (St Lawrence Market (Upper Le...,St Lawrence Market (Upper Level 36),,
7,5968973e93bd637ca76c60c8,241 Pizza,"[{'id': '4bf58dd8d48988d1ca941735', 'name': 'P...",v-1624793089,False,142 Parliament St.,43.654622,-79.364211,"[{'label': 'display', 'lat': 43.6546222, 'lng'...",1358,M5A 2Z1,CA,Toronto,ON,Canada,"[142 Parliament St., Toronto ON M5A 2Z1, Canada]",,,
8,4b2d3727f964a52047d124e3,Pizzaiolo Gourmet Pizza,"[{'id': '4bf58dd8d48988d1ca941735', 'name': 'P...",v-1624793089,False,270 Adelaide St W,43.647839,-79.390293,"[{'label': 'display', 'lat': 43.64783852702828...",1209,M5H 1X6,CA,Toronto,ON,Canada,"[270 Adelaide St W (at John St), Toronto ON M5...",at John St,,
9,5615b6c4498e3c32c67ad78f,Blaze Pizza,"[{'id': '4bf58dd8d48988d1ca941735', 'name': 'P...",v-1624793089,False,"10 Dundas Street East, #124",43.656518,-79.380015,"[{'label': 'display', 'lat': 43.656518, 'lng':...",75,M5B 2G9,CA,Toronto,ON,Canada,"[10 Dundas Street East, #124, Toronto ON M5B 2...",,,


In [128]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,address,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,crossStreet,neighborhood,id
0,Pizza Pizza,Pizza Place,260 CHURCH STREET,43.656513,-79.377242,"[{'label': 'display', 'lat': 43.65651263631174...",296,M5B 1Z2,CA,Toronto,ON,Canada,"[260 CHURCH STREET, Toronto ON M5B 1Z2, Canada]",,,4b2438f6f964a520126424e3
1,Amato Pizza,Pizza Place,429A Yonge St,43.660215,-79.382571,"[{'label': 'display', 'lat': 43.66021482917061...",453,,CA,Toronto,ON,Canada,"[429A Yonge St (at College St), Toronto ON, Ca...",at College St,,4af5d885f964a520b2fd21e3
2,Mamma's Pizza,Pizza Place,127 Yonge St,43.650891,-79.378632,"[{'label': 'display', 'lat': 43.650891, 'lng':...",631,M5C 1W4,CA,Toronto,ON,Canada,"[127 Yonge St, Toronto ON M5C 1W4, Canada]",,,552ff1d1498e5f41b0ccb3bd
3,Domino's Pizza,Pizza Place,67 Richmond St E,43.652456,-79.374938,"[{'label': 'display', 'lat': 43.65245635983421...",645,M5C 1N9,CA,Toronto,ON,Canada,"[67 Richmond St E, Toronto ON M5C 1N9, Canada]",,,59f28358acb00b73e36783d6
4,Pizza 2 Go,Pizza Place,65 Front St. W,43.64609,-79.379776,"[{'label': 'display', 'lat': 43.64608978691595...",1142,M5J,CA,Toronto,ON,Canada,"[65 Front St. W (Union Station), Toronto ON M5...",Union Station,,4ca62112f47ea14380845d21
5,Express Pizza,Pizza Place,447 Church St.,43.663787,-79.380224,"[{'label': 'display', 'lat': 43.66378730691813...",832,M4Y 2C5,CA,Toronto,ON,Canada,"[447 Church St. (at Alexander St.), Toronto ON...",at Alexander St.,,4b5fdb7ff964a520ebce29e3
6,St. Lawrence Pizza and Pasta,Italian Restaurant,93 Front St. E.,43.648378,-79.371578,"[{'label': 'display', 'lat': 43.64837838784134...",1160,M5E 1C3,CA,Toronto,ON,Canada,[93 Front St. E. (St Lawrence Market (Upper Le...,St Lawrence Market (Upper Level 36),,4cdd8afb78ddf04dfbf29498
7,241 Pizza,Pizza Place,142 Parliament St.,43.654622,-79.364211,"[{'label': 'display', 'lat': 43.6546222, 'lng'...",1358,M5A 2Z1,CA,Toronto,ON,Canada,"[142 Parliament St., Toronto ON M5A 2Z1, Canada]",,,5968973e93bd637ca76c60c8
8,Pizzaiolo Gourmet Pizza,Pizza Place,270 Adelaide St W,43.647839,-79.390293,"[{'label': 'display', 'lat': 43.64783852702828...",1209,M5H 1X6,CA,Toronto,ON,Canada,"[270 Adelaide St W (at John St), Toronto ON M5...",at John St,,4b2d3727f964a52047d124e3
9,Blaze Pizza,Pizza Place,"10 Dundas Street East, #124",43.656518,-79.380015,"[{'label': 'display', 'lat': 43.656518, 'lng':...",75,M5B 2G9,CA,Toronto,ON,Canada,"[10 Dundas Street East, #124, Toronto ON M5B 2...",,,5615b6c4498e3c32c67ad78f


In [129]:
dataframe_filtered.name

0                                Pizza Pizza
1                                Amato Pizza
2                              Mamma's Pizza
3                             Domino's Pizza
4                                 Pizza 2 Go
5                              Express Pizza
6               St. Lawrence Pizza and Pasta
7                                  241 Pizza
8                    Pizzaiolo Gourmet Pizza
9                                Blaze Pizza
10                                 JZs Pizza
11                                Pizza Shab
12                               Pizza Pizza
13                              Boston Pizza
14                           Colombo's Pizza
15                    Vinnie's Pizza & Pasta
16                                Pizza Nova
17                             Mamma's Pizza
18                                 Pizza Hut
19                               Pizza Pizza
20                          Fantastico Pizza
21                              Pizza Studio
22        

### Let's Visualise the Pizza Shops in Downtown Toronto

In [130]:
venues_map = folium.Map(location=[dt_latitude, dt_longitude], zoom_start=13) # generate map centred on Downtown Toronto

# add a red circle marker to represent the centre of Downtown Toronto
folium.CircleMarker(
    [dt_latitude, dt_longitude],
    radius=10,
    color='red',
    popup='Downtown Toronto',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the pizza places as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

### Let's Explore the Nearby Venues

In [131]:
nh_latitude = dt_data.loc[0, 'Latitude'] # neighborhood latitude value
nh_longitude = dt_data.loc[0, 'Longitude'] # neighborhood longitude value

nh_name = dt_data.loc[0, 'Neighbourhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(nh_name, 
                                                               nh_latitude, 
                                                               nh_longitude))

Latitude and longitude values of Harbourfront are 43.6555, -79.3626.


In [132]:

LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    nh_latitude, 
    nh_longitude, 
    radius, 
    LIMIT)
url




'https://api.foursquare.com/v2/venues/explore?&client_id=VZ3YOHRX44O4NYLOUDUUKVHKP1VSZRBJWQNEBWAUXWSLMZNR&client_secret=5FEODWZZY32QK41CN0USSL3UBZNVOEBN43ILYUS4Y1Y3I4GX&v=20180605&ll=43.6555,-79.3626&radius=500&limit=100'

In [133]:
results = requests.get(url).json()
#results

In [134]:


# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

venues = results['response']['groups'][0]['items']
    
nearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns].copy()

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


Unnamed: 0,name,categories,lat,lng
0,Tandem Coffee,Coffee Shop,43.653559,-79.361809
1,Roselle Desserts,Bakery,43.653447,-79.362017
2,Souvlaki Express,Greek Restaurant,43.655584,-79.364438
3,Berkeley Church,Event Space,43.655123,-79.365873
4,Figs Breakfast & Lunch,Breakfast Spot,43.655675,-79.364503


In [135]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
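
`getNearbyVenues` indexes `v['venue']['categories'][0]` directly, which raises an `IndexError` for any venue returned with an empty category list. A hedged sketch of a guard follows, using a hypothetical helper `first_category_name` and two hand-made items shaped like the entries the function iterates over (the second item is invented to show the empty case).

```python
def first_category_name(item, default=None):
    """Return the name of the venue's first category, or `default` if it has none."""
    categories = item.get("venue", {}).get("categories") or []
    return categories[0]["name"] if categories else default


items = [
    {"venue": {"name": "Tandem Coffee", "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Unlabelled venue", "categories": []}},  # made-up example
]

names = [first_category_name(item) for item in items]
print(names)
```

Inside `getNearbyVenues`, the last tuple element could call such a helper instead of indexing `[0]`, at the cost of `None` values in the `Venue Category` column that would need handling before one-hot encoding.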

In [136]:
# Create dataframe of Downtown Toronto venues
dt_venues = getNearbyVenues(names=dt_data['Neighbourhood'],
                                   latitudes=dt_data['Latitude'],
                                   longitudes=dt_data['Longitude']
                                  )

Harbourfront
Queen's Park
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Christie
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Rosedale
Stn A PO Boxes 25 The Esplanade
Cabbagetown, St. James Town
First Canadian Place, Underground city
Church and Wellesley


In [137]:
print(dt_venues.shape)
dt_venues.head()

(1108, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Harbourfront,43.6555,-79.3626,Tandem Coffee,43.653559,-79.361809,Coffee Shop
1,Harbourfront,43.6555,-79.3626,Roselle Desserts,43.653447,-79.362017,Bakery
2,Harbourfront,43.6555,-79.3626,Souvlaki Express,43.655584,-79.364438,Greek Restaurant
3,Harbourfront,43.6555,-79.3626,Berkeley Church,43.655123,-79.365873,Event Space
4,Harbourfront,43.6555,-79.3626,Figs Breakfast & Lunch,43.655675,-79.364503,Breakfast Spot


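The query returned 1108 venue records for Downtown Toronto. A venue that lies within 500 m of more than one neighbourhood centre appears once per neighbourhood, so the number of distinct venues may be lower.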

In [138]:
dt_venues.groupby('Venue Category').size()

Venue Category
Adult Boutique            2
Afghan Restaurant         1
American Restaurant      11
Art Gallery               9
Arts & Crafts Store       2
                         ..
Vietnamese Restaurant     5
Wine Bar                  8
Wine Shop                 1
Wings Joint               2
Yoga Studio               3
Length: 162, dtype: int64

### Analysis of neighbourhood venues

In [139]:
# one hot encoding
dt_onehot = pd.get_dummies(dt_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
dt_onehot['Neighborhood'] = dt_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [dt_onehot.columns[-1]] + list(dt_onehot.columns[:-1])
dt_onehot = dt_onehot[fixed_columns]

dt_onehot.head()

Unnamed: 0,Yoga Studio,Adult Boutique,Afghan Restaurant,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,...,Theater,Theme Restaurant,Thrift / Vintage Store,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [140]:
dt_grouped = dt_onehot.groupby('Neighborhood').mean().reset_index()
dt_grouped.head()

Unnamed: 0,Neighborhood,Yoga Studio,Adult Boutique,Afghan Restaurant,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,...,Theater,Theme Restaurant,Thrift / Vintage Store,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.02,0.0,0.0,0.04,0.0,0.0,...,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.026667,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.026667,0.0,0.0,0.0,0.0,0.0
2,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.016949,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.033898
3,"Cabbagetown, St. James Town",0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.017544,0.017544,0.017544,0.0,0.0


In [141]:
dt_pizzaplace = dt_grouped.filter(items=['Neighborhood', 'Pizza Place']).sort_values(by=['Pizza Place'],ascending=False)

dt_pizzaplace.head(10)

Unnamed: 0,Neighborhood,Pizza Place
3,"Cabbagetown, St. James Town",0.058824
4,Central Bay Street,0.052632
2,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.033898
16,"Ryerson, Garden District",0.03
0,"Adelaide, King, Richmond",0.02
10,"First Canadian Place, Underground city",0.02
8,"Commerce Court, Victoria Hotel",0.02
5,"Chinatown, Grange Park, Kensington Market",0.019231
7,Church and Wellesley,0.014925
18,Stn A PO Boxes 25 The Esplanade,0.012658


The neighbourhoods in Downtown Toronto where pizza places make up the largest share of venues are Cabbagetown, St. James Town (about 5.9%) and Central Bay Street (about 5.3%).
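
The values in `dt_grouped` are means of one-hot columns, so each number is the share of a neighbourhood's returned venues that fall in a category. It is not a measure of how often the places are visited. A small plain-Python sketch of the same calculation on made-up data (the venue pairs and the helper `category_shares` are illustrative only):

```python
from collections import Counter, defaultdict

# (neighbourhood, venue category) pairs; invented for illustration.
venues = [
    ("A", "Coffee Shop"), ("A", "Pizza Place"), ("A", "Coffee Shop"), ("A", "Café"),
    ("B", "Pizza Place"), ("B", "Park"),
]


def category_shares(pairs):
    """Map each neighbourhood to {category: share of its venues}."""
    counts = defaultdict(Counter)
    for hood, category in pairs:
        counts[hood][category] += 1
    return {
        hood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for hood, counter in counts.items()
    }


shares = category_shares(venues)
print(shares["A"]["Pizza Place"])  # 1 of 4 venues in A
print(shares["B"]["Pizza Place"])  # 1 of 2 venues in B
```

Read this way, the 0.058824 reported for Cabbagetown, St. James Town is what 2 pizza places among 34 returned venues would give.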

In [142]:
num_top_venues = 5

for hood in dt_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = dt_grouped[dt_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
              venue  freq
0       Coffee Shop  0.13
1              Café  0.08
2    Sandwich Place  0.06
3  Asian Restaurant  0.04
4        Restaurant  0.04


----Berczy Park----
            venue  freq
0  Sandwich Place  0.07
1     Coffee Shop  0.05
2    Cocktail Bar  0.05
3           Hotel  0.05
4        Beer Bar  0.04


----CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara----
                venue  freq
0         Coffee Shop  0.08
1  Italian Restaurant  0.07
2         Wings Joint  0.03
3              Bakery  0.03
4   French Restaurant  0.03


----Cabbagetown, St. James Town----
                venue  freq
0         Coffee Shop  0.12
1                Café  0.09
2         Pizza Place  0.06
3  Italian Restaurant  0.06
4  Chinese Restaurant  0.06


----Central Bay Street----
                 venue  freq
0          Coffee Shop  0.23
1       Sandwich Place  0.07
2     Sushi Restaurant  0.05
3  Japane

Sort the Venues in descending order

In [143]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Creates a dataframe that displays top 10 venues for each neighbourhood

In [144]:
#Creates a dataframe that generates top 10 Venues

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
nh_venues_sorted = pd.DataFrame(columns=columns)
nh_venues_sorted['Neighborhood'] = dt_grouped['Neighborhood']

for ind in np.arange(dt_grouped.shape[0]):
    nh_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dt_grouped.iloc[ind, :], num_top_venues)

nh_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Sandwich Place,Asian Restaurant,Restaurant,Sushi Restaurant,Gym,Hotel,Japanese Restaurant,Bank
1,Berczy Park,Sandwich Place,Coffee Shop,Cocktail Bar,Hotel,Beer Bar,Bakery,Café,Japanese Restaurant,Vegetarian / Vegan Restaurant,Seafood Restaurant
2,"CN Tower, Bathurst Quay, Island airport, Harbo...",Coffee Shop,Italian Restaurant,Wings Joint,Bakery,French Restaurant,Gym / Fitness Center,Park,Pub,Bar,Bank
3,"Cabbagetown, St. James Town",Coffee Shop,Café,Pizza Place,Italian Restaurant,Chinese Restaurant,Restaurant,Bakery,Caribbean Restaurant,Sandwich Place,Convenience Store
4,Central Bay Street,Coffee Shop,Sandwich Place,Sushi Restaurant,Japanese Restaurant,Pizza Place,Italian Restaurant,Restaurant,Café,Middle Eastern Restaurant,Shoe Store
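
`return_most_common_venues` always returns `num_top_venues` labels, so a neighbourhood with fewer than ten distinct categories has its remaining slots filled with categories whose frequency is zero. A hedged sketch of a variant that keeps only categories that actually occur, written in plain Python with a hypothetical name `top_nonzero_categories` and made-up frequencies:

```python
def top_nonzero_categories(freqs, n):
    """Return up to n categories, highest frequency first, skipping zeros.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [category for category, freq in ranked[:n] if freq > 0]


# Made-up frequencies for one neighbourhood with only three real categories.
freqs = {
    "Park": 0.5,
    "Pizza Place": 0.25,
    "Gym": 0.25,
    "Nightclub": 0.0,
    "Pet Store": 0.0,
}

top = top_nonzero_categories(freqs, 4)
print(top)
```

With this variant a sparse neighbourhood yields a shorter list, which makes it clearer when a "most common venue" slot reflects real data and when it is only padding.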


### Cluster Neighbourhoods

Cluster the neighbourhoods into 5 groups by running k-means.

In [145]:
# set number of clusters
kclusters = 5

dt_grouped_clustering = dt_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dt_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 0, 0, 2, 2, 0, 4, 0, 2, 2], dtype=int32)
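
For readers unfamiliar with what `KMeans(...).fit(...)` does to the frequency table, here is a minimal plain-Python sketch of Lloyd's algorithm, the basic k-means iteration, on made-up two-dimensional points. It is not scikit-learn's implementation, which uses k-means++ initialisation and other refinements. The function name `lloyd_kmeans`, the data and the fixed starting centroids are all assumptions for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, initial_centroids, max_iter=100):
    """Alternate assignment and centroid update until labels stop changing."""
    centroids = [tuple(c) for c in initial_centroids]  # work on a copy
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point.
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(point, centroids[k]))
            for point in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Made-up (coffee shop share, park share) pairs for four neighbourhoods.
points = [(0.10, 0.00), (0.12, 0.02), (0.00, 0.20), (0.02, 0.22)]
labels, centroids = lloyd_kmeans(points, [(0.10, 0.00), (0.00, 0.20)])
print(labels)
print(centroids)
```

In the notebook each neighbourhood is a point with one coordinate per venue category rather than two, but the assignment and update steps are the same idea.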

In [146]:
# add clustering labels

nh_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

dt_merged = dt_data

# join dt_data with nh_venues_sorted to add the cluster label and top venues for each neighbourhood
dt_merged = dt_merged.join(nh_venues_sorted.set_index('Neighborhood'), on='Neighbourhood')

dt_merged#.head() # check the last columns!

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,Harbourfront,43.6555,-79.3626,2,Coffee Shop,Event Space,Beer Store,Electronics Store,Thai Restaurant,Bakery,Pub,Restaurant,Dance Studio,Greek Restaurant
1,M7A,Downtown Toronto,Queen's Park,43.6641,-79.3889,0,Sushi Restaurant,Gym,Burrito Place,Park,Coffee Shop,College Cafeteria,Ramen Restaurant,College Theater,Persian Restaurant,Dance Studio
2,M5B,Downtown Toronto,"Ryerson, Garden District",43.6572,-79.3783,2,Coffee Shop,Clothing Store,Sandwich Place,Café,Hotel,Middle Eastern Restaurant,Japanese Restaurant,Pizza Place,Bank,Cosmetics Shop
3,M5C,Downtown Toronto,St. James Town,43.6513,-79.3756,0,Coffee Shop,Café,Italian Restaurant,Cocktail Bar,Restaurant,Clothing Store,Beer Bar,Japanese Restaurant,Farmers Market,Gastropub
4,M5E,Downtown Toronto,Berczy Park,43.6456,-79.3754,0,Sandwich Place,Coffee Shop,Cocktail Bar,Hotel,Beer Bar,Bakery,Café,Japanese Restaurant,Vegetarian / Vegan Restaurant,Seafood Restaurant
5,M5G,Downtown Toronto,Central Bay Street,43.6564,-79.386,2,Coffee Shop,Sandwich Place,Sushi Restaurant,Japanese Restaurant,Pizza Place,Italian Restaurant,Restaurant,Café,Middle Eastern Restaurant,Shoe Store
6,M6G,Downtown Toronto,Christie,43.6683,-79.4205,4,Grocery Store,Café,Coffee Shop,Park,Athletics & Sports,Baby Store,Museum,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant
7,M5H,Downtown Toronto,"Adelaide, King, Richmond",43.6496,-79.3833,2,Coffee Shop,Café,Sandwich Place,Asian Restaurant,Restaurant,Sushi Restaurant,Gym,Hotel,Japanese Restaurant,Bank
8,M5J,Downtown Toronto,"Harbourfront East, Toronto Islands, Union Station",43.623,-79.3936,1,Park,Harbor / Marina,Café,Music Venue,Moroccan Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant
9,M5K,Downtown Toronto,"Design Exchange, Toronto Dominion Centre",43.6469,-79.3823,2,Coffee Shop,Café,Hotel,Sandwich Place,Gym,Japanese Restaurant,Pharmacy,Deli / Bodega,Salad Place,Restaurant


In [147]:
dt_merged['Cluster Labels'].value_counts()


2    10
0     6
1     1
3     1
4     1
Name: Cluster Labels, dtype: int64

### Cluster Evaluation

### Cluster 1

In [148]:
dt_merged.loc[dt_merged['Cluster Labels'] == 0, dt_merged.columns[[1] + list(range(5, dt_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Downtown Toronto,0,Sushi Restaurant,Gym,Burrito Place,Park,Coffee Shop,College Cafeteria,Ramen Restaurant,College Theater,Persian Restaurant,Dance Studio
3,Downtown Toronto,0,Coffee Shop,Café,Italian Restaurant,Cocktail Bar,Restaurant,Clothing Store,Beer Bar,Japanese Restaurant,Farmers Market,Gastropub
4,Downtown Toronto,0,Sandwich Place,Coffee Shop,Cocktail Bar,Hotel,Beer Bar,Bakery,Café,Japanese Restaurant,Vegetarian / Vegan Restaurant,Seafood Restaurant
12,Downtown Toronto,0,Café,Vegetarian / Vegan Restaurant,Bakery,Coffee Shop,Mexican Restaurant,Burger Joint,Gaming Cafe,Vietnamese Restaurant,Art Gallery,Noodle House
13,Downtown Toronto,0,Coffee Shop,Italian Restaurant,Wings Joint,Bakery,French Restaurant,Gym / Fitness Center,Park,Pub,Bar,Bank
18,Downtown Toronto,0,Sushi Restaurant,Japanese Restaurant,Restaurant,Gay Bar,Coffee Shop,Fast Food Restaurant,Mediterranean Restaurant,Men's Store,Indian Restaurant,Burrito Place


### Cluster 2

In [149]:
dt_merged.loc[dt_merged['Cluster Labels'] == 1, dt_merged.columns[[1] + list(range(5, dt_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Downtown Toronto,1,Park,Harbor / Marina,Café,Music Venue,Moroccan Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant


### Cluster 3

In [150]:
dt_merged.loc[dt_merged['Cluster Labels'] == 2, dt_merged.columns[[1] + list(range(5, dt_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,2,Coffee Shop,Event Space,Beer Store,Electronics Store,Thai Restaurant,Bakery,Pub,Restaurant,Dance Studio,Greek Restaurant
2,Downtown Toronto,2,Coffee Shop,Clothing Store,Sandwich Place,Café,Hotel,Middle Eastern Restaurant,Japanese Restaurant,Pizza Place,Bank,Cosmetics Shop
5,Downtown Toronto,2,Coffee Shop,Sandwich Place,Sushi Restaurant,Japanese Restaurant,Pizza Place,Italian Restaurant,Restaurant,Café,Middle Eastern Restaurant,Shoe Store
7,Downtown Toronto,2,Coffee Shop,Café,Sandwich Place,Asian Restaurant,Restaurant,Sushi Restaurant,Gym,Hotel,Japanese Restaurant,Bank
9,Downtown Toronto,2,Coffee Shop,Café,Hotel,Sandwich Place,Gym,Japanese Restaurant,Pharmacy,Deli / Bodega,Salad Place,Restaurant
10,Downtown Toronto,2,Coffee Shop,Café,Sandwich Place,Hotel,Bank,Japanese Restaurant,Restaurant,Asian Restaurant,Sushi Restaurant,Deli / Bodega
11,Downtown Toronto,2,Coffee Shop,Café,Sandwich Place,Japanese Restaurant,Pub,Bakery,Bar,Gym,Beer Store,French Restaurant
15,Downtown Toronto,2,Coffee Shop,Sandwich Place,Restaurant,Gym,Bank,Hotel,Deli / Bodega,Japanese Restaurant,Hotel Bar,Park
16,Downtown Toronto,2,Coffee Shop,Café,Pizza Place,Italian Restaurant,Chinese Restaurant,Restaurant,Bakery,Caribbean Restaurant,Sandwich Place,Convenience Store
17,Downtown Toronto,2,Coffee Shop,Café,Sandwich Place,Hotel,Bank,Japanese Restaurant,Restaurant,Asian Restaurant,Sushi Restaurant,Deli / Bodega


### Cluster 4

In [151]:
dt_merged.loc[dt_merged['Cluster Labels'] == 3, dt_merged.columns[[1] + list(range(5, dt_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Downtown Toronto,3,Grocery Store,Playground,Park,Candy Store,Yoga Studio,Moroccan Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant


### Cluster 5

In [152]:
dt_merged.loc[dt_merged['Cluster Labels'] == 4, dt_merged.columns[[1] + list(range(5, dt_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Downtown Toronto,4,Grocery Store,Café,Coffee Shop,Park,Athletics & Sports,Baby Store,Museum,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant


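Taking a closer look at each cluster, Pizza Place rarely ranks among the first 5 most common venues: it is 3rd in one neighbourhood and 5th in another (rows 16 and 5, both in Cluster 3) and is absent from the top 5 everywhere else.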

### Visualise the resulting Cluster

In [153]:
# create map
map_clusters = folium.Map(location=[t_latitude, t_longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dt_merged['Latitude'], dt_merged['Longitude'], dt_merged['Neighbourhood'], dt_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Clustering the most common venues in Downtown Toronto shows that Pizza Place ranks among the first 3 most common venues in only 1 neighbourhood. Competition is also very high, as there are around 41 pizza places and numerous other eateries.

## Conclusion

North York is a borough of Toronto with 289 venues and only 5 pizza places. Pizza Place ranks as the 1st, 3rd and 4th most common venue in some of its neighbourhoods, and there are enough restaurants around for healthy competition.
Downtown Toronto, by contrast, has 1108 venues and 50 pizza places, yet Pizza Place appears among the first five most common venues in barely 2 neighbourhoods.
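
The venue and pizza place counts quoted above can be turned into shares to compare the two boroughs on the same scale. This is a small arithmetic sketch using only those four figures; the dictionary layout and variable names are illustrative.

```python
# Figures quoted in the conclusion above.
boroughs = {
    "North York": {"venues": 289, "pizza_places": 5},
    "Downtown Toronto": {"venues": 1108, "pizza_places": 50},
}

# Share of all returned venues that are pizza places, per borough.
shares = {
    name: data["pizza_places"] / data["venues"]
    for name, data in boroughs.items()
}

for name, share in shares.items():
    print("{}: {:.2%} of venues are pizza places".format(name, share))
```

On these figures pizza places account for about 1.7% of venues in North York and about 4.5% in Downtown Toronto, which is consistent with the heavier competition noted for Downtown Toronto.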

So, with the help of k-means clustering, it is concluded that North York is the better choice for opening a new pizza place, as Pizza Place is identified as one of the most popular venue types in the borough.
