# Capstone Project : To find the best neighborhood to open a restaurant in Toronto

#### Week 2 : Final Code

Author : Avinandan Mukherjee

This notebook contains the code needed to load the datasets, build the dataframes, and then develop the models, based on analysis and exploration, that achieve the desired result. The aim of this project is to build a recommender system for ABC Company that identifies where they can open a restaurant which operates smoothly, earns a steady revenue in its neighbourhood, and sits in close proximity to the nearest suppliers.

## 1. Load all datasets & explore Toronto - Neighbourhoods, Population & Geospatial

### A. Toronto Neighbourhoods broken by Postal Code

In [1]:
# Load the required libraries
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Found the table using beautifulsoup and used Pandas to read it in. 
res = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
soup = BeautifulSoup(res.content,'lxml')
table = soup.find_all('table')[0] 
df = pd.read_html(str(table))

# Wrangle/transform the data by converting it into a dataframe
data = pd.DataFrame(df[0])

# Rename the columns as instructed
data = data.rename(columns={0:'Postal Code', 1:'Borough', 2:'Neighbourhood'})

# Get rid of the first row which contained the table headers from the webpage
data = data.iloc[1:]

# Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
data = data[~data['Borough'].str.contains('Not assigned')]

# Since more than one neighborhood can exist in one postal code area. 
#These multiple values of neighborhoods will be combined into one row with 
#the neighborhoods separated with a comma for the same postal code.
df_TorontoNPC=data.groupby(['Postal Code', 'Borough']).apply(lambda group: ', '.join(group['Neighbourhood']))


# Convert the Series back into a DataFrame and put the 'Neighbourhood' column label back in
df_TorontoNPC=df_TorontoNPC.to_frame().reset_index()
df_TorontoNPC = df_TorontoNPC.rename(columns={0:'Neighbourhood'})

# If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.
df_TorontoNPC.loc[df_TorontoNPC.Neighbourhood == 'Not assigned', 'Neighbourhood' ] = df_TorontoNPC.Borough

# Display the DataFrame
df_TorontoNPC.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


Data will not always be available to us in CSV format. In those cases we have to extract it directly from the URL. Using the Python library BeautifulSoup, we can locate a table on the page and load it into a dataframe, followed by whatever data cleaning is needed.
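
The cleaning steps used in the cell above can be tried without any network access. The sketch below is only an illustration: the rows in `raw` are hand-made sample values shaped like the scraped table (they are assumptions, not the real Wikipedia content), and the grouping is written with `agg` rather than `apply`, which should give the same result on data of this shape.

```python
import pandas as pd

# Hypothetical sample rows shaped like the scraped postal code table
raw = pd.DataFrame({
    'Postal Code': ['M1A', 'M1B', 'M1B', 'M5A', 'M7A'],
    'Borough': ['Not assigned', 'Scarborough', 'Scarborough',
                'Downtown Toronto', "Queen's Park"],
    'Neighbourhood': ['Not assigned', 'Malvern', 'Rouge',
                      'Regent Park', 'Not assigned'],
})

# Keep only the postal codes that have an assigned borough
clean = raw[raw['Borough'] != 'Not assigned']

# Combine neighbourhoods sharing a postal code into one comma-separated row
clean = (clean.groupby(['Postal Code', 'Borough'], as_index=False)['Neighbourhood']
              .agg(', '.join))

# A neighbourhood that is still 'Not assigned' takes the name of its borough
mask = clean['Neighbourhood'] == 'Not assigned'
clean.loc[mask, 'Neighbourhood'] = clean.loc[mask, 'Borough']

print(clean)
```

Running the steps on a few known rows first makes it easier to confirm that each rule behaves as intended before applying it to the full table.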

### B. Merging Toronto Geospatial coordinates with Postal Codes

In [2]:
#Load Toronto geospatial coordinates
!wget -O to_geo_space.csv http://cocl.us/Geospatial_data
print('Data Downloaded !')

#Read into dataframe
gf = pd.read_csv('to_geo_space.csv')

#Merge the Toronto data with geo coordinate data
df_TorontoGeo = pd.merge(df_TorontoNPC, gf, on='Postal Code', how='inner')

#Rename required column to match and for reporting convenience
df_TorontoGeo = df_TorontoGeo.rename(columns={'Postal Code':'PostalCode'})

#Display the new dataframe
df_TorontoGeo.head()

--2021-01-13 21:14:13--  http://cocl.us/Geospatial_data
Resolving cocl.us (cocl.us)... 169.63.96.176, 169.63.96.194
Connecting to cocl.us (cocl.us)|169.63.96.176|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://cocl.us/Geospatial_data [following]
--2021-01-13 21:14:13--  https://cocl.us/Geospatial_data
Connecting to cocl.us (cocl.us)|169.63.96.176|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2021-01-13 21:14:14--  https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv
Resolving ibm.box.com (ibm.box.com)... 107.152.29.197
Connecting to ibm.box.com (ibm.box.com)|107.152.29.197|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /public/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2021-01-13 21:14:14--  https://ibm.box.com/public/static/9afzr83p

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


### C. Toronto Neighbourhood Population by Postal Code

In [3]:
import ssl
#to avoid url certification error in python 3.6 and above
ssl._create_default_https_context = ssl._create_unverified_context

# Load this data from Statistics Canada portal using the URL below
df_TorontoPop = pd.read_csv('https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/hlt-fst/pd-pl/Tables/File.cfm?T=1201&SR=1&RPP=9999&PR=0&CMA=0&CSD=0&S=22&O=A&Lang=Eng&OFT=CSV',encoding= 'unicode_escape')
print('Data Downloaded !')

# Rename the columns appropriately
df_TorontoPop = df_TorontoPop.rename(columns={'Geographic code':'PostalCode', 'Geographic name':'PostalCod2', 'Province or territory':'Province', 'Incompletely enumerated Indian reserves and Indian settlements, 2016':'Incomplete', 'Population, 2016':'Population_2016', 'Total private dwellings, 2016':'TotalPrivDwellings', 'Private dwellings occupied by usual residents, 2016':'PrivDwellingsOccupied'})
df_TorontoPop= df_TorontoPop.drop(columns=['PostalCod2', 'Province', 'Incomplete', 'TotalPrivDwellings', 'PrivDwellingsOccupied'])

# Get rid of the first row & display the dataframe
df_TorontoPop = df_TorontoPop.iloc[1:]
df_TorontoPop.head()

Data Downloaded !


Unnamed: 0,PostalCode,Population_2016
1,A0A,46587.0
2,A0B,19792.0
3,A0C,12587.0
4,A0E,22294.0
5,A0G,35266.0


### D. Merging data of Toronto for Neighbourhood Population with Postal Code - Demographic Exploration 

In [57]:
#Merge the Toronto Population data with geospatial postalcode data
df_TorontoGeo1 = pd.merge(df_TorontoPop, df_TorontoGeo, on='PostalCode', how='right')

# Sorting on population
df_TorontoGeo1 = df_TorontoGeo1.sort_values(by=['Population_2016'], ascending=False)

# Display the new dataframe
df_TorontoGeo1 = df_TorontoGeo1.rename(columns={'Neighbourhood':'Neighborhood'})
df_TorontoGeo1.head()

Unnamed: 0,PostalCode,Population_2016,Borough,Neighborhood,Latitude,Longitude
22,M2N,75897.0,North York,"Willowdale, Willowdale East",43.77012,-79.408493
0,M1B,66108.0,Scarborough,"Malvern, Rouge",43.806686,-79.194353
18,M2J,58293.0,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556
100,M9V,55959.0,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest...",43.739416,-79.588437
14,M1V,54680.0,Scarborough,"Milliken, Agincourt North, Steeles East, L'Amo...",43.815252,-79.284577


__Analysis:__ Here we see Toronto neighbourhoods ranked by 2016 population, together with their geospatial coordinates. This answers one of our KPIs: ABC Company can use the ranking to shortlist target neighbourhoods in which to open a restaurant.
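
Because the merge above uses `how='right'`, any Toronto postal code without a population row is kept with a missing population. A minimal sketch of how that could be checked with the `indicator` option of `pd.merge` follows. The population and coordinate values for `M1B`, `M2N` and `A0A` are copied from the tables above; the code `M0X` and its coordinates are made up purely for illustration.

```python
import pandas as pd

# Population rows copied from the tables above
pop = pd.DataFrame({'PostalCode': ['M1B', 'M2N', 'A0A'],
                    'Population_2016': [66108.0, 75897.0, 46587.0]})

# 'M0X' is a hypothetical postal code with no population row
geo = pd.DataFrame({'PostalCode': ['M1B', 'M2N', 'M0X'],
                    'Latitude': [43.806686, 43.77012, 43.7],
                    'Longitude': [-79.194353, -79.408493, -79.4]})

# indicator=True adds a '_merge' column recording where each row came from
merged = pd.merge(pop, geo, on='PostalCode', how='right', indicator=True)

# Rows found only in the geospatial table have no population figure
missing = merged[merged['_merge'] == 'right_only']
print(missing['PostalCode'].tolist())
```

A check like this helps decide whether unmatched postal codes should be dropped or investigated before ranking by population.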

## Load all datasets & explore Toronto - Avg Neighbourhood Income with Canadian National Average

### A. Average Net Income of Canadian Population

From this URL: 
<https://www150.statcan.gc.ca/n1/daily-quotidien/180313/t001a-eng.htm>


In [6]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Family Type,2016 Median Net Income
0,Economic families and unattached individuals,57000
1,Economic families,78400
2,Senior families,57800
3,Non-senior families,84800
4,Two-parent families with children,94500



We can see that in 2016, Canadian economic families and unattached individuals had a median post-tax income of $57,000.
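
To make the benchmark concrete, the sketch below expresses each family type's median as a ratio of the overall $57,000 figure. It uses only the five rows displayed above; the column name `Ratio to Overall Median` is an illustrative choice, not something from the original notebook.

```python
import pandas as pd

# The five rows shown in the output above
income = pd.DataFrame({
    'Family Type': ['Economic families and unattached individuals',
                    'Economic families',
                    'Senior families',
                    'Non-senior families',
                    'Two-parent families with children'],
    '2016 Median Net Income': [57000, 78400, 57800, 84800, 94500],
})

# The overall median serves as the national benchmark
overall = income.loc[
    income['Family Type'] == 'Economic families and unattached individuals',
    '2016 Median Net Income'].iloc[0]

# Ratio of each family type's median to the benchmark
income['Ratio to Overall Median'] = (income['2016 Median Net Income'] / overall).round(2)
print(income)
```

The same ratio could later be computed for neighbourhood-level incomes, assuming such a column is available, to see which areas sit above the national figure.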

## Explore & analyze Toronto - Neighbourhood Crime Rates

### A. Toronto Neighbourhoods by Crime rate - Segmenting & Clustering

In [7]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Neighbourhood,Longitude,Latitude,PremiseType,Offence,EventUnique_Id,OccurrenceDate,ReportedDate,Division,Hood_ID,ObjectId
0,University (79),-79.405228,43.656982,Commercial,Assault,GO-20152165447,2015-12-18T03:58:00.000Z,2015-12-18T03:59:00.000Z,D14,79,7001
1,Tam O'Shanter-Sullivan (118),-79.307907,43.778732,Commercial,Assault,GO-20151417245,2015-08-15T21:45:00.000Z,2015-08-17T22:11:00.000Z,D42,118,7002
2,Woburn (137),-79.225029,43.765942,Apartment,Break and Enter,GO-20151421107,2015-08-16T16:00:00.000Z,2015-08-18T14:40:00.000Z,D43,137,7003
3,Centennial Scarborough (133),-79.140823,43.778648,Other,Break and Enter,GO-20152167714,2015-11-26T13:00:00.000Z,2015-12-18T13:38:00.000Z,D43,133,7004
4,Taylor-Massey (61),-79.288361,43.691235,Commercial,Assault,GO-20152169954,2015-12-18T19:50:00.000Z,2015-12-18T19:55:00.000Z,D55,61,7005


In [10]:
# Exploring Toronto crime data for the commercial PremiseType by slicing & dicing

df_TorCrime1 = df_TorCrime[df_TorCrime['PremiseType'] == 'Commercial']
df_TorCrime1.head()

Unnamed: 0,Neighbourhood,Longitude,Latitude,PremiseType,Offence,EventUnique_Id,OccurrenceDate,ReportedDate,Division,Hood_ID,ObjectId
0,University (79),-79.405228,43.656982,Commercial,Assault,GO-20152165447,2015-12-18T03:58:00.000Z,2015-12-18T03:59:00.000Z,D14,79,7001
1,Tam O'Shanter-Sullivan (118),-79.307907,43.778732,Commercial,Assault,GO-20151417245,2015-08-15T21:45:00.000Z,2015-08-17T22:11:00.000Z,D42,118,7002
4,Taylor-Massey (61),-79.288361,43.691235,Commercial,Assault,GO-20152169954,2015-12-18T19:50:00.000Z,2015-12-18T19:55:00.000Z,D55,61,7005
10,West Humber-Clairville (1),-79.61132,43.71069,Commercial,Assault - Resist/ Prevent Seiz,GO-20151976877,2015-11-18T05:35:00.000Z,2015-11-18T05:42:00.000Z,D23,1,7011
12,Downsview-Roding-CFB (26),-79.508636,43.720917,Commercial,Assault,GO-20152061875,2015-12-01T22:19:00.000Z,2015-12-01T22:19:00.000Z,D31,26,7013


In [12]:
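# Merging Toronto geospatial data with crime occurrences, followed by slicing, dicing & data cleaning
# (df_TorontoGeo1 names its column 'Neighborhood', so align the spelling before merging)
df_TorNbCrime = pd.merge(df_TorontoGeo1.rename(columns={'Neighborhood': 'Neighbourhood'}), df_TorCrime1, how='right', on=["Neighbourhood"])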

# Drop unwanted columns
df_TorNbCrime = df_TorNbCrime.drop(columns=['PostalCode', 'Population_2016', 'Borough', 'Latitude_x', 'Longitude_x', 'Division', 'Hood_ID', 'ObjectId'])
#Rename column name as needed
df_TorNbCrime = df_TorNbCrime.rename(columns={'Longitude_y':'Longitude', 'Latitude_y':'Latitude'})
# Slice & Dice data for commercial PremiseType
df_TorNbCrime = df_TorNbCrime[df_TorNbCrime['PremiseType'] == 'Commercial']
#Display dataframe
df_TorNbCrime.head()

Unnamed: 0,Neighbourhood,Longitude,Latitude,PremiseType,Offence,EventUnique_Id,OccurrenceDate,ReportedDate
0,University (79),-79.405228,43.656982,Commercial,Assault,GO-20152165447,2015-12-18T03:58:00.000Z,2015-12-18T03:59:00.000Z
1,University (79),-79.409767,43.661518,Commercial,Break and Enter,GO-20152066735,2015-12-02T00:00:00.000Z,2015-12-02T16:34:00.000Z
2,University (79),-79.407471,43.665916,Commercial,Assault Bodily Harm,GO-20151692140,2015-09-06T01:30:00.000Z,2015-10-01T01:32:00.000Z
3,University (79),-79.405228,43.656982,Commercial,Assault,GO-20152165447,2015-12-18T03:58:00.000Z,2015-12-18T03:59:00.000Z
4,University (79),-79.405228,43.656982,Commercial,Assault Peace Officer,GO-20152165447,2015-12-18T03:58:00.000Z,2015-12-18T03:59:00.000Z


In [13]:
# Print the result
print('The Toronto commercial crime dataframe has {} unique offence types across {} records.'.format(
        len(df_TorNbCrime['Offence'].unique()),
        df_TorNbCrime.shape[0]
    )
)

The Toronto commercial crime dataframe has 42 unique offence types across 41081 records.


__Analysis:__ After merging the crime data with the neighbourhood data and filtering on the commercial PremiseType, we can list out the locations that ABC Company should probably try to avoid.
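
One way to turn the filtered records into a list of neighbourhoods to avoid is to count distinct commercial incidents per neighbourhood. The sketch below is an assumption about how that could be done, not the notebook's own method. It uses a handful of rows taken from the outputs above and counts each `EventUnique_Id` once, since the same event can appear on several rows.

```python
import pandas as pd

# A few rows taken from the crime outputs shown above
crime = pd.DataFrame({
    'Neighbourhood': ['University (79)', 'University (79)', 'University (79)',
                      "Tam O'Shanter-Sullivan (118)", 'Woburn (137)'],
    'PremiseType': ['Commercial', 'Commercial', 'Commercial',
                    'Commercial', 'Apartment'],
    'EventUnique_Id': ['GO-20152165447', 'GO-20152165447', 'GO-20152066735',
                       'GO-20151417245', 'GO-20151421107'],
})

# Keep only incidents on commercial premises
commercial = crime[crime['PremiseType'] == 'Commercial']

# Count distinct events per neighbourhood, highest first
ranking = (commercial.groupby('Neighbourhood')['EventUnique_Id']
                     .nunique()
                     .sort_values(ascending=False))
print(ranking)
```

Counting distinct event IDs rather than rows avoids inflating a neighbourhood's total when one event is recorded under several offences.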

### B. Toronto Neighbourhoods by Crime and Postal Code - Data Mining

In [14]:
# Building dataset for Toronto Net Income with Canadian National Average

# Build a dataframe that places the crime and geospatial data side by side
# (note: the join below is on the row index, not on a shared key column)
cols_to_use = df_TorontoGeo.columns.difference(df_TorNbCrime.columns)
df_TorNetInc = pd.merge(df_TorNbCrime, df_TorontoGeo, how='outer', left_index=True, right_index=True)
df_TorNetInc.head()


Unnamed: 0,Neighbourhood_x,Longitude_x,Latitude_x,PremiseType,Offence,EventUnique_Id,OccurrenceDate,ReportedDate,PostalCode,Borough,Neighbourhood_y,Latitude_y,Longitude_y
0,University (79),-79.405228,43.656982,Commercial,Assault,GO-20152165447,2015-12-18T03:58:00.000Z,2015-12-18T03:59:00.000Z,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,University (79),-79.409767,43.661518,Commercial,Break and Enter,GO-20152066735,2015-12-02T00:00:00.000Z,2015-12-02T16:34:00.000Z,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,University (79),-79.407471,43.665916,Commercial,Assault Bodily Harm,GO-20151692140,2015-09-06T01:30:00.000Z,2015-10-01T01:32:00.000Z,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,University (79),-79.405228,43.656982,Commercial,Assault,GO-20152165447,2015-12-18T03:58:00.000Z,2015-12-18T03:59:00.000Z,M1G,Scarborough,Woburn,43.770992,-79.216917
4,University (79),-79.405228,43.656982,Commercial,Assault Peace Officer,GO-20152165447,2015-12-18T03:58:00.000Z,2015-12-18T03:59:00.000Z,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


## Toronto Neighbourhood Restaurants Analysis - Evaluating Competition

### A. Toronto Restaurant venues by Neighbourhood to check the competition - Segmenting & Clustering


In [15]:
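# Utilizing FourSquare API to explore the neighbourhood and segment them
import os

# Credentials are read from environment variables rather than hard-coded in the notebook
# (the variable names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are assumptions)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 150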

In [52]:
# Building the required dataset
df_TorontoGeoOnly = df_TorontoGeo.drop(columns='PostalCode')
df_TorontoGeoOnly = df_TorontoGeoOnly.rename(columns={'Neighbourhood':'Neighborhood'})
df_TorontoGeoOnly.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,Scarborough,Woburn,43.770992,-79.216917
4,Scarborough,Cedarbrae,43.773136,-79.239476


In [20]:
# Exploring the first neighbourhood in our Toronto demographic dataframe
df_TorontoGeoOnly.loc[0, 'Neighborhood']

'Malvern, Rouge'

In [21]:
# Get the first neighborhood's latitude and longitude values
neighbourhood_latitude = df_TorontoGeoOnly.loc[0, 'Latitude'] # neighbourhood latitude value
neighbourhood_longitude = df_TorontoGeoOnly.loc[0, 'Longitude'] # neighbourhood longitude value

neighbourhood_name = df_TorontoGeoOnly.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighbourhood_name, 
                                                               neighbourhood_latitude, 
                                                               neighbourhood_longitude))

Latitude and longitude values of Malvern, Rouge are 43.806686299999996, -79.19435340000001.


Using __Foursquare API__ to get a list of all the venues in all the boroughs and store them in the DataFrame with respective geospatial coordinates.

In [22]:
#Function to store the venues in the dataframe
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'], 
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude',
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
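
The function above indexes `v['venue']['categories'][0]`, which would raise an `IndexError` for any venue returned without a category. The sketch below shows one possible way to flatten the same `items` structure more defensively. The helper name `extract_venues` and the second sample item are hypothetical and only serve to illustrate the idea; no request is sent.

```python
def extract_venues(name, lat, lng, items):
    """Flatten Foursquare-style 'items' into rows, tolerating missing fields."""
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories') or []
        # Use None when a venue comes back without any category
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue.get('name'),
                     location.get('lat'),
                     location.get('lng'),
                     category))
    return rows


# First item mirrors a row from the output; the second is a made-up edge case
sample_items = [
    {'venue': {'name': "Wendy's",
               'location': {'lat': 43.807448, 'lng': -79.199056},
               'categories': [{'name': 'Fast Food Restaurant'}]}},
    {'venue': {'name': 'Uncategorised Spot',
               'location': {'lat': 43.8, 'lng': -79.2},
               'categories': []}},
]

rows = extract_venues('Malvern, Rouge', 43.806686, -79.194353, sample_items)
for row in rows:
    print(row)
```

Rows with a `None` category can then be dropped or inspected before any category-based filtering.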

In [23]:
# Retrieve list of Toronto venues
df_TorontoVenues =getNearbyVenues(names =df_TorontoGeoOnly['Neighborhood'],
                              latitudes =df_TorontoGeoOnly['Latitude'],
                              longitudes =df_TorontoGeoOnly['Longitude'],
                                 )

Malvern, Rouge
Rouge Hill, Port Union, Highland Creek
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
Kennedy Park, Ionview, East Birchmount Park
Golden Mile, Clairlea, Oakridge
Cliffside, Cliffcrest, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Wexford Heights, Scarborough Town Centre
Wexford, Maryvale
Agincourt
Clarks Corners, Tam O'Shanter, Sullivan
Milliken, Agincourt North, Steeles East, L'Amoreaux East
Steeles West, L'Amoreaux West
Upper Rouge
Hillcrest Village
Fairview, Henry Farm, Oriole
Bayview Village
York Mills, Silver Hills
Willowdale, Newtonbrook
Willowdale, Willowdale East
York Mills West
Willowdale, Willowdale West
Parkwoods
Don Mills
Don Mills
Bathurst Manor, Wilson Heights, Downsview North
Northwood Park, York University
Downsview
Downsview
Downsview
Downsview
Victoria Village
Parkview Hill, Woodbine Gardens
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto, Broadview North (Old East York)
The Danforth West, 

In [24]:
#Display the new Venues dataframe
df_TorontoVenues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Malvern, Rouge",43.806686,-79.194353,Wendy’s,43.807448,-79.199056,Fast Food Restaurant
1,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,Great Shine Window Cleaning,43.783145,-79.157431,Home Service
2,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
3,"Guildwood, Morningside, West Hill",43.763573,-79.188711,RBC Royal Bank,43.76679,-79.191151,Bank
4,"Guildwood, Morningside, West Hill",43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store


In [26]:
# List the count of venues by Neighbourhood
df_TorontoVenues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agincourt,4,4,4,4,4,4
"Alderwood, Long Branch",7,7,7,7,7,7
"Bathurst Manor, Wilson Heights, Downsview North",22,22,22,22,22,22
Bayview Village,4,4,4,4,4,4
"Bedford Park, Lawrence Manor East",21,21,21,21,21,21
...,...,...,...,...,...,...
"Willowdale, Willowdale East",34,34,34,34,34,34
"Willowdale, Willowdale West",4,4,4,4,4,4
Woburn,4,4,4,4,4,4
Woodbine Heights,7,7,7,7,7,7


In [27]:
# List the Venue Category value of every venue in the data frame
df_TorVCategory = df_TorontoVenues['Venue Category'].tolist()
df_TorVCategory

['Fast Food Restaurant',
 'Home Service',
 'Bar',
 'Bank',
 'Electronics Store',
 'Restaurant',
 'Mexican Restaurant',
 'Rental Car Location',
 'Medical Center',
 'Intersection',
 'Breakfast Spot',
 'Coffee Shop',
 'Coffee Shop',
 'Korean BBQ Restaurant',
 'Convenience Store',
 'Caribbean Restaurant',
 'Hakka Restaurant',
 'Thai Restaurant',
 'Bank',
 'Athletics & Sports',
 'Bakery',
 'Gas Station',
 'Fried Chicken Joint',
 'Playground',
 'Pizza Place',
 'Department Store',
 'Coffee Shop',
 'Discount Store',
 'Chinese Restaurant',
 'Hobby Shop',
 'Ice Cream Shop',
 'Intersection',
 'Bus Line',
 'Metro Station',
 'Bus Line',
 'Bakery',
 'Bakery',
 'Soccer Field',
 'Motel',
 'American Restaurant',
 'Café',
 'General Entertainment',
 'Skating Rink',
 'College Stadium',
 'Chinese Restaurant',
 'Indian Restaurant',
 'Indian Restaurant',
 'Vietnamese Restaurant',
 'Pet Store',
 'Light Rail Station',
 'Bakery',
 'Sandwich Place',
 'Middle Eastern Restaurant',
 'Shopping Mall',
 'Auto Garage',

In [28]:
# List the venues sorted by Venue Category in descending order
df_TorontoVenues.sort_values('Venue Category', ascending=False)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
1472,"University of Toronto, Harbord",43.662696,-79.400049,Sivananda Yoga Centre,43.662754,-79.402951,Yoga Studio
450,Studio District,43.659526,-79.340923,Spirit Loft Yoga,43.663548,-79.341333,Yoga Studio
647,Church and Wellesley,43.665860,-79.383160,The Yoga Sanctuary,43.661499,-79.383636,Yoga Studio
636,Church and Wellesley,43.665860,-79.383160,Bikram Yoga Yonge,43.668205,-79.385780,Yoga Studio
1842,"Little Portugal, Trinity",43.647927,-79.419750,YogaSpace,43.647607,-79.420133,Yoga Studio
...,...,...,...,...,...,...,...
1564,"CN Tower, King and Spadina, Railway Lands, Har...",43.628947,-79.394420,Gate 8,43.631536,-79.394570,Airport Gate
1562,"CN Tower, King and Spadina, Railway Lands, Har...",43.628947,-79.394420,Billy Bishop Café,43.631132,-79.396139,Airport Food Court
264,Downsview,43.737473,-79.464763,Toronto Downsview Airport (YZD),43.738883,-79.470111,Airport
1559,"CN Tower, King and Spadina, Railway Lands, Har...",43.628947,-79.394420,Billy Bishop Toronto City Airport (YTZ) (Billy...,43.631683,-79.396033,Airport


In [29]:
# Toronto venues filtered to the existing restaurants by Venue Category
df_TorResto= df_TorontoVenues[df_TorontoVenues['Venue Category'].str.contains('Restaurant')]
df_TorResto = df_TorResto.rename(columns={'Neighborhood':'Neighbourhood'})
df_TorResto

Unnamed: 0,Neighbourhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Malvern, Rouge",43.806686,-79.194353,Wendy’s,43.807448,-79.199056,Fast Food Restaurant
5,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Sail Sushi,43.765951,-79.191275,Restaurant
6,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Big Bite Burrito,43.766299,-79.190720,Mexican Restaurant
13,Woburn,43.770992,-79.216917,Korean Grill House,43.770812,-79.214502,Korean BBQ Restaurant
15,Cedarbrae,43.773136,-79.239476,Drupati's Roti & Doubles,43.775222,-79.241678,Caribbean Restaurant
...,...,...,...,...,...,...,...
2076,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,McDonald's,43.630002,-79.518198,Fast Food Restaurant
2098,Westmount,43.696319,-79.532242,Mayflower Chinese Food,43.692753,-79.531566,Chinese Restaurant
2102,Westmount,43.696319,-79.532242,2 Bros Cuisine,43.692499,-79.531698,Middle Eastern Restaurant
2116,"South Steeles, Silverstone, Humbergate, Jamest...",43.739416,-79.588437,McDonald's,43.741757,-79.584230,Fast Food Restaurant


__Analysis:__ Here we see the list of Toronto neighbourhoods with their restaurant categories. From it, ABC Company can get an idea of the existing market players and the business scope for catering to the existing population.
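
To gauge competition per neighbourhood, the filtered venues can be counted by neighbourhood. The following sketch repeats the `str.contains('Restaurant')` filter on a few rows copied from the outputs above and then counts the matches. The variable names are illustrative only.

```python
import pandas as pd

# A few venue rows copied from the outputs above
venues = pd.DataFrame({
    'Neighborhood': ['Malvern, Rouge',
                     'Rouge Hill, Port Union, Highland Creek',
                     'Guildwood, Morningside, West Hill',
                     'Guildwood, Morningside, West Hill',
                     'Guildwood, Morningside, West Hill'],
    'Venue': ['Wendy’s', 'Royal Canadian Legion', 'RBC Royal Bank',
              'Sail Sushi', 'Big Bite Burrito'],
    'Venue Category': ['Fast Food Restaurant', 'Bar', 'Bank',
                       'Restaurant', 'Mexican Restaurant'],
})

# Same filter as the cell above: categories containing the word 'Restaurant'
is_restaurant = venues['Venue Category'].str.contains('Restaurant')

# Number of restaurants per neighbourhood, busiest first
restaurants_per_hood = (venues[is_restaurant]
                        .groupby('Neighborhood')
                        .size()
                        .sort_values(ascending=False))
print(restaurants_per_hood)
```

Note that a substring filter of this kind leaves out food venues such as cafés or pizza places, whose categories do not contain the word 'Restaurant'.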

In [32]:
# Let's list the unique values in Venue Category so we can pick out the restaurants
print('Unique Venue Categories:')
df_TRestoList = list(df_TorontoVenues['Venue Category'].unique())
df_TRestoList

Unique Venue Categories:


['Fast Food Restaurant',
 'Home Service',
 'Bar',
 'Bank',
 'Electronics Store',
 'Restaurant',
 'Mexican Restaurant',
 'Rental Car Location',
 'Medical Center',
 'Intersection',
 'Breakfast Spot',
 'Coffee Shop',
 'Korean BBQ Restaurant',
 'Convenience Store',
 'Caribbean Restaurant',
 'Hakka Restaurant',
 'Thai Restaurant',
 'Athletics & Sports',
 'Bakery',
 'Gas Station',
 'Fried Chicken Joint',
 'Playground',
 'Pizza Place',
 'Department Store',
 'Discount Store',
 'Chinese Restaurant',
 'Hobby Shop',
 'Ice Cream Shop',
 'Bus Line',
 'Metro Station',
 'Soccer Field',
 'Motel',
 'American Restaurant',
 'Café',
 'General Entertainment',
 'Skating Rink',
 'College Stadium',
 'Indian Restaurant',
 'Vietnamese Restaurant',
 'Pet Store',
 'Light Rail Station',
 'Sandwich Place',
 'Middle Eastern Restaurant',
 'Shopping Mall',
 'Auto Garage',
 'Latin American Restaurant',
 'Lounge',
 'Italian Restaurant',
 'Noodle House',
 'Pharmacy',
 'Park',
 'Cosmetics Shop',
 'Camera Store',
 'Golf Co

In [26]:
# Unique Categories of existing restaurants in Toronto
print('There are {} unique categories of restaurants in Toronto neighbourhoods.'.format(len(df_TorResto['Venue Category'].unique())))

There are 46 unique categories of restaurants in Toronto neighbourhoods.


### B. Neighbourhood Restaurants by Category by Postal Code

In [33]:
#Merge the Toronto Restaurant data with geospatial postalcode data
df_RestoPC = pd.merge(df_TorontoNPC, df_TorResto, on='Neighbourhood', how='inner')

# Display the new dataframe
df_RestoPC

Unnamed: 0,Postal Code,Borough,Neighbourhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353,Wendy’s,43.807448,-79.199056,Fast Food Restaurant
1,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Sail Sushi,43.765951,-79.191275,Restaurant
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Big Bite Burrito,43.766299,-79.190720,Mexican Restaurant
3,M1G,Scarborough,Woburn,43.770992,-79.216917,Korean Grill House,43.770812,-79.214502,Korean BBQ Restaurant
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,Drupati's Roti & Doubles,43.775222,-79.241678,Caribbean Restaurant
...,...,...,...,...,...,...,...,...,...
487,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,McDonald's,43.630002,-79.518198,Fast Food Restaurant
488,M9P,Etobicoke,Westmount,43.696319,-79.532242,Mayflower Chinese Food,43.692753,-79.531566,Chinese Restaurant
489,M9P,Etobicoke,Westmount,43.696319,-79.532242,2 Bros Cuisine,43.692499,-79.531698,Middle Eastern Restaurant
490,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest...",43.739416,-79.588437,McDonald's,43.741757,-79.584230,Fast Food Restaurant


In [34]:
# Count of Restaurants in Toronto by Venue Category
df_TorRestoVC = df_RestoPC.groupby("Venue Category")['Venue Category'].count()
df_TorRestoVC

Venue Category
American Restaurant                26
Asian Restaurant                   11
Belgian Restaurant                  2
Brazilian Restaurant                2
Cajun / Creole Restaurant           1
Caribbean Restaurant                9
Chinese Restaurant                 16
Colombian Restaurant                2
Comfort Food Restaurant             7
Cuban Restaurant                    2
Dim Sum Restaurant                  2
Doner Restaurant                    1
Eastern European Restaurant         3
Ethiopian Restaurant                2
Falafel Restaurant                  2
Fast Food Restaurant               30
Filipino Restaurant                 1
French Restaurant                  11
German Restaurant                   1
Gluten-free Restaurant              4
Greek Restaurant                   14
Hakka Restaurant                    1
Indian Restaurant                  13
Italian Restaurant                 44
Japanese Restaurant                44
Korean BBQ Restaurant              

__Analysis:__ Here we see the most common restaurant types across Toronto neighbourhoods. This helps ABC Company understand the existing competition and the challenges of opening a given type of restaurant.

### C. Count of Venues per Neighbourhood Restricted to Restaurant Categories

In [35]:
# Here we manually pick out the restaurant categories ('features') from the unique venue list that we want to examine for similarity during clustering
rest_list = ['Steakhouse', 'Coffee Shop', 'Café', 'Ramen Restaurant', 'Indonesian Restaurant', 'Restaurant', 'Japanese Restaurant', 
             'Fast Food Restaurant', 'Sushi Restaurant', 'Vietnamese Restaurant', 'Pizza Place', 'Sandwich Place', 'Middle Eastern Restaurant', 
             'Burger Joint', 'American Restaurant', 'Food Court', 'Wings Joint', 'Burrito Place', 'Asian Restaurant', 'Deli / Bodega', 
             'Greek Restaurant', 'Fried Chicken Joint', 'Airport Food Court', 'Chinese Restaurant', 'Breakfast Spot', 'Mexican Restaurant',
             'Indian Restaurant', 'Latin American Restaurant', 'Bar', 'Pub', 'Italian Restaurant', 'French Restaurant', 'Ice Cream Shop', 
             'Caribbean Restaurant', 'Gastropub', 'Thai Restaurant', 'Cajun / Creole Restaurant', 'Diner', 'Dim Sum Restaurant', 'Seafood Restaurant', 
             'Food & Drink Shop', 'Noodle House', 'Food', 'Fish & Chips Shop', 'Falafel Restaurant', 'Gourmet Shop', 'Vegetarian / Vegan Restaurant', 
             'South American Restaurant', 'Korean Restaurant', 'Cuban Restaurant', 'New American Restaurant', 'Malay Restaurant', 'Mac & Cheese Joint',
             'Bistro', 'Southern / Soul Food Restaurant', 'Tapas Restaurant',  'Sports Bar', 'Polish Restaurant', 'Ethiopian Restaurant', 
             'Creperie', 'Sake Bar', 'Persian Restaurant', 'Afghan Restaurant','Mediterranean Restaurant', 'BBQ Joint', 'Jewish Restaurant', 
             'Comfort Food Restaurant',  'Hakka Restaurant', 'Food Truck', 'Taiwanese Restaurant',  'Snack Place', 'Eastern European Restaurant', 
             'Dumpling Restaurant', 'Belgian Restaurant', 'Arepa Restaurant', 'Taco Place', 'Doner Restaurant', 'Filipino Restaurant', 
             'Hotpot Restaurant', 'Poutine Place', 'Salad Place',  'Portuguese Restaurant', 'Modern European Restaurant', 'Empanada Restaurant', 
             'Irish Pub', 'Molecular Gastronomy Restaurant', 'German Restaurant', 'Brazilian Restaurant', 'Gluten-free Restaurant', 'Soup Place']

rest_pd = pd.DataFrame(rest_list)
#rest_pd
#rename the column so that it matches
rest_pd = rest_pd.rename(columns={0:'Venue Category'})

#Join the 2 dataframes, keeping only venues whose category is in the restaurant list
TO_new = pd.merge(df_TorontoVenues, rest_pd, on='Venue Category', how='right')

# display the new dataframe
#TO_new

TO_new.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agincourt,2,2,2,2,2,2
"Alderwood, Long Branch",5,5,5,5,5,5
"Bathurst Manor, Wilson Heights, Downsview North",12,12,12,12,12,12
Bayview Village,3,3,3,3,3,3
"Bedford Park, Lawrence Manor East",16,16,16,16,16,16
...,...,...,...,...,...,...
"Wexford, Maryvale",2,2,2,2,2,2
"Willowdale, Willowdale East",21,21,21,21,21,21
"Willowdale, Willowdale West",2,2,2,2,2,2
Woburn,2,2,2,2,2,2


### D. One-Hot Encoding to Count Restaurants

In [36]:
# one hot encoding
TO_new_onehot = pd.get_dummies(TO_new[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
TO_new_onehot['Neighborhood'] = TO_new['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [TO_new_onehot.columns[-1]] + list(TO_new_onehot.columns[:-1])
TO_new_onehot = TO_new_onehot[fixed_columns]

TO_new_onehot.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,Airport Food Court,American Restaurant,Arepa Restaurant,Asian Restaurant,BBQ Joint,Bar,Belgian Restaurant,Bistro,...,Sports Bar,Steakhouse,Sushi Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint
0,"Malvern, Rouge",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Clarks Corners, Tam O'Shanter, Sullivan",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Steeles West, L'Amoreaux West",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Steeles West, L'Amoreaux West",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Hillcrest Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [46]:
# Average the one-hot rows per neighbourhood to get each category's share of venues
TO_grouped = TO_new_onehot.groupby('Neighborhood').mean().reset_index()
TO_grouped.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,Airport Food Court,American Restaurant,Arepa Restaurant,Asian Restaurant,BBQ Joint,Bar,Belgian Restaurant,Bistro,...,Sports Bar,Steakhouse,Sushi Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0625,0.0,0.0,0.0,0.0625,0.0,0.0,0.0
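
Averaging one-hot indicators per neighbourhood gives the fraction of that neighbourhood's venues in each category, which is why the table holds values such as 0.0625 (1 venue out of 16). A small sketch on assumed toy data makes the arithmetic visible and shows that `sum()` would give raw counts instead:

```python
import pandas as pd

# Illustrative venues (assumed data): two neighbourhoods, three categories
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Pizza Place', 'Pizza Place', 'Diner', 'Pub', 'Diner', 'Diner'],
})

onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='').astype(float)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# Mean of 0/1 indicators = share of the neighbourhood's venues in that category
freq = onehot.groupby('Neighborhood').mean()
# Sum of the same indicators = raw venue counts per category
counts = onehot.groupby('Neighborhood').sum()

print(freq)
print(counts)
```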


## Clustering Neighbourhoods

### A. Using the Silhouette Score to find the optimal number of clusters for data segmentation

In [38]:
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import numpy as np

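# drop the label column by keyword; the positional axis argument is not accepted by recent pandas
TO_grouped_clustering = TO_grouped.drop(columns='Neighborhood')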

# Use silhouette score to find optimal number of clusters to segment the data
kclusters = np.arange(2,10)
results = {}
for size in kclusters:
    # fix the seed so the chosen cluster count is repeatable between runs
    model = KMeans(n_clusters=size, random_state=0).fit(TO_grouped_clustering)
    predictions = model.predict(TO_grouped_clustering)
    results[size] = silhouette_score(TO_grouped_clustering, predictions)

best_size = max(results, key=results.get)
best_size

2
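
To see what the selection loop above does, here is a minimal sketch on synthetic data (an assumption for illustration, not the Toronto data): three tight, well-separated blobs, for which the silhouette score should peak at three clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic points: four points around each of three distant centres
offsets = np.array([[0.1, 0.1], [-0.1, 0.1], [0.1, -0.1], [-0.1, -0.1]])
centres = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
X = np.vstack([c + offsets for c in centres])

scores = {}
for k in range(2, 6):
    # Fixed random_state and explicit n_init keep the run repeatable
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest mean silhouette is taken as the best cluster count
best_k = max(scores, key=scores.get)
print(scores)
print(best_k)
```

On the restaurant data the same loop returned 2, which may simply mean that most neighbourhoods look alike and a few stand apart.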

### B. Run K-means & segment data into clusters for labels

In [47]:
#import k-means from clustering stage
from sklearn.cluster import KMeans

# set number of clusters
kclusters = best_size


# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(TO_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 0, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)
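
With only two clusters it is worth checking how the neighbourhoods split between them. A quick sketch using the ten labels printed above (with the fitted model, `kmeans.labels_` would be passed instead):

```python
from collections import Counter

# The first ten labels printed above; the full array comes from kmeans.labels_
labels = [1, 0, 1, 1, 1, 1, 1, 1, 1, 1]

sizes = Counter(labels)
for cluster, n in sorted(sizes.items()):
    print(f'cluster {cluster}: {n} neighbourhoods ({n / len(labels):.0%})')
```

A heavily lopsided split suggests that one cluster holds the typical neighbourhoods and the other a handful of outliers.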

In [48]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = TO_grouped['Neighborhood']

for ind in np.arange(TO_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(TO_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Breakfast Spot,Latin American Restaurant,Food,Doner Restaurant,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant
1,"Alderwood, Long Branch",Pizza Place,Pub,Coffee Shop,Sandwich Place,Falafel Restaurant,Dim Sum Restaurant,Diner,Doner Restaurant,Dumpling Restaurant,Eastern European Restaurant
2,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Pizza Place,Sandwich Place,Diner,Middle Eastern Restaurant,Chinese Restaurant,Restaurant,Deli / Bodega,Fried Chicken Joint,Sushi Restaurant
3,Bayview Village,Chinese Restaurant,Japanese Restaurant,Café,Wings Joint,Fish & Chips Shop,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant
4,"Bedford Park, Lawrence Manor East",Sandwich Place,Italian Restaurant,Coffee Shop,Pizza Place,Comfort Food Restaurant,Café,Pub,Indian Restaurant,Restaurant,Sushi Restaurant
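
The column labels above rely on a `try`/`except` around a three-item suffix list, which works for ten columns but would produce labels like "21th" for longer rankings. A small sketch of an alternative, where `ordinal` is a hypothetical helper introduced here rather than part of the notebook:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th and 13th are irregular
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'

num_top_venues = 10
columns = ['Neighborhood'] + [f'{ordinal(i)} Most Common Venue'
                              for i in range(1, num_top_venues + 1)]
print(columns)
```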


### C. Merging Toronto data with geo spatial data 

In [75]:
# Merge the Toronto data with the geo-coordinate data and check that the result has the right shape
TO_labels = pd.merge(df_TorontoGeo1, TO_grouped, on='Neighborhood', how='right')

TO_labels = TO_labels.drop(columns=['Steakhouse', 'Coffee Shop', 'Café', 'Ramen Restaurant', 'Indonesian Restaurant', 'Restaurant', 'Japanese Restaurant', 
             'Fast Food Restaurant', 'Sushi Restaurant', 'Vietnamese Restaurant', 'Pizza Place', 'Sandwich Place', 'Middle Eastern Restaurant', 
             'Burger Joint', 'American Restaurant', 'Food Court', 'Wings Joint', 'Burrito Place', 'Asian Restaurant', 'Deli / Bodega', 
             'Greek Restaurant', 'Fried Chicken Joint', 'Airport Food Court', 'Chinese Restaurant', 'Breakfast Spot', 'Mexican Restaurant',
             'Indian Restaurant', 'Latin American Restaurant', 'Bar', 'Pub', 'Italian Restaurant', 'French Restaurant', 'Ice Cream Shop', 
             'Caribbean Restaurant', 'Gastropub', 'Thai Restaurant', 'Cajun / Creole Restaurant', 'Diner', 'Dim Sum Restaurant', 'Seafood Restaurant', 
             'Food & Drink Shop', 'Noodle House', 'Food', 'Fish & Chips Shop', 'Falafel Restaurant', 'Gourmet Shop', 'Vegetarian / Vegan Restaurant', 
             'South American Restaurant', 'Korean Restaurant', 'Cuban Restaurant', 'New American Restaurant', 'Malay Restaurant', 'Mac & Cheese Joint',
             'Bistro', 'Southern / Soul Food Restaurant', 'Tapas Restaurant',  'Sports Bar', 'Polish Restaurant', 'Ethiopian Restaurant', 
             'Creperie', 'Sake Bar', 'Persian Restaurant', 'Afghan Restaurant','Mediterranean Restaurant', 'BBQ Joint', 'Jewish Restaurant', 
             'Comfort Food Restaurant',  'Hakka Restaurant', 'Food Truck', 'Taiwanese Restaurant',  'Snack Place', 'Eastern European Restaurant', 
             'Dumpling Restaurant', 'Belgian Restaurant', 'Arepa Restaurant', 'Taco Place', 'Doner Restaurant', 'Filipino Restaurant', 
             'Hotpot Restaurant', 'Poutine Place', 'Salad Place',  'Portuguese Restaurant', 'Modern European Restaurant', 'Empanada Restaurant', 
             'Irish Pub', 'Molecular Gastronomy Restaurant', 'German Restaurant', 'Brazilian Restaurant', 'Gluten-free Restaurant', 'Soup Place'])
TO_labels.head()

Unnamed: 0,PostalCode,Population_2016,Borough,Neighborhood,Latitude,Longitude
0,M2N,75897.0,North York,"Willowdale, Willowdale East",43.77012,-79.408493
1,M1B,66108.0,Scarborough,"Malvern, Rouge",43.806686,-79.194353
2,M2J,58293.0,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556
3,M9V,55959.0,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest...",43.739416,-79.588437
4,M5V,49195.0,Downtown Toronto,"CN Tower, King and Spadina, Railway Lands, Har...",43.628947,-79.39442


### D. Add K-means Labels

In [71]:
# work on a copy so later changes to TO_merged cannot alter TO_labels
TO_merged = TO_labels.copy()

In [73]:
TO_labels.shape

(83, 6)

In [74]:
TO_merged.shape

(83, 6)

In [69]:

# join the ten most common venue categories onto each neighbourhood's geo/population row
TO_merged = TO_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
TO_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Population_2016,Borough,Neighborhood,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M2N,75897.0,North York,"Willowdale, Willowdale East",43.77012,-79.408493,Ramen Restaurant,Pizza Place,Coffee Shop,Restaurant,Café,Sandwich Place,Japanese Restaurant,Fast Food Restaurant,Korean Restaurant,Middle Eastern Restaurant
1,M1B,66108.0,Scarborough,"Malvern, Rouge",43.806686,-79.194353,Fast Food Restaurant,Wings Joint,Dim Sum Restaurant,Doner Restaurant,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant
2,M2J,58293.0,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,Coffee Shop,Fast Food Restaurant,Restaurant,Japanese Restaurant,Sandwich Place,Burrito Place,Burger Joint,Food Court,Chinese Restaurant,Bar
3,M9V,55959.0,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest...",43.739416,-79.588437,Fast Food Restaurant,Fried Chicken Joint,Japanese Restaurant,Sandwich Place,Pizza Place,Wings Joint,Diner,Doner Restaurant,Dumpling Restaurant,Eastern European Restaurant
4,M5V,49195.0,Downtown Toronto,"CN Tower, King and Spadina, Railway Lands, Har...",43.628947,-79.39442,Airport Food Court,Coffee Shop,Wings Joint,Food,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant
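
The join shown in this section adds the most-common-venue columns; the cells shown here do not attach the K-means labels themselves. One way this could be done is sketched below, assuming a column named `Cluster Labels` and using small stand-in frames rather than the real `TO_grouped`, `kmeans.labels_` and `TO_labels`:

```python
import numpy as np
import pandas as pd

# Stand-ins (assumed values) for TO_grouped['Neighborhood'], kmeans.labels_ and TO_labels
neighborhoods = pd.Series(['Agincourt', 'Bayview Village', 'Woburn'], name='Neighborhood')
labels = np.array([1, 0, 1])
geo = pd.DataFrame({
    'Neighborhood': ['Woburn', 'Agincourt', 'Bayview Village'],
    'Borough': ['Scarborough', 'Scarborough', 'North York'],
})

# kmeans.labels_ follows the row order of the clustering input, so pair it
# with the neighbourhood names from that same frame before joining elsewhere
cluster_table = pd.DataFrame({'Neighborhood': neighborhoods, 'Cluster Labels': labels})

# Join on the name, not on position, because the geo table is ordered differently
merged = geo.join(cluster_table.set_index('Neighborhood'), on='Neighborhood')
print(merged)
```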


### E. Find the centroid of the highest cluster density.

Now that the silhouette score and K-means clustering point to __Cluster 2__ as the best cluster, we find its geographic centre.

In [90]:
# Find the geographic centre of the neighbourhoods.
# Note: TO_merged is not filtered by cluster label here, so every row contributes to the average.
Cluster_0_coorid = TO_merged[['Latitude', 'Longitude']]
Cluster_0_coorid = list(Cluster_0_coorid.values)
lat = []
long = []

for l in Cluster_0_coorid:
    lat.append(l[0])
    long.append(l[1])

# Simple arithmetic mean of the latitudes and longitudes
Blatitude = sum(lat)/len(lat)
Blongitude = sum(long)/len(long)
print(Blatitude)
print(Blongitude)

43.70293337108434
-79.38917756144578
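
The cell above averages the coordinates of every row in `TO_merged`. If a cluster label column were available (the name `Cluster Labels` is an assumption; it is not created in the cells shown), a per-cluster centroid could be computed along these lines, here on toy coordinates chosen so the means are easy to check:

```python
import pandas as pd

# Assumed toy coordinates and labels, not the Toronto data
df = pd.DataFrame({
    'Cluster Labels': [0, 0, 1, 1],
    'Latitude': [43.5, 44.5, 43.25, 43.75],
    'Longitude': [-79.0, -80.0, -79.5, -79.5],
})

# Mean latitude/longitude per cluster; a plain average is a reasonable
# approximation over an area as small as a single city
centroids = df.groupby('Cluster Labels')[['Latitude', 'Longitude']].mean()
print(centroids)
```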


Install a reverse geocoder to look up the place that corresponds to these coordinates.

In [106]:
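# Install reverse-geocoder to reverse look up the coordinates
# (note: the reverse_geocode and opencage imports below come from separate
# packages and may need their own pip install)
!pip install reverse-geocoder
import os
from opencage.geocoder import OpenCageGeocode
from pprint import pprint
import reverse_geocode

results = reverse_geocode.search(Cluster_0_coorid)

# Read the OpenCage key from the environment instead of hard-coding it
# (the variable name OPENCAGE_API_KEY is an assumption; use whichever name you set)
key = os.environ['OPENCAGE_API_KEY']
geocoder = OpenCageGeocode(key)

results = geocoder.reverse_geocode(Blatitude, Blongitude)
pprint(results)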

Collecting reverse-geocoder
  Downloading reverse_geocoder-1.5.1.tar.gz (2.2 MB)
Building wheels for collected packages: reverse-geocoder
  Building wheel for reverse-geocoder (setup.py) ... done
  Created wheel for reverse-geocoder: filename=reverse_geocoder-1.5.1-py3-none-any.whl size=2268088 sha256=203f7f1f8081a504f10553689d41b072aa23f89f67b4d6269516ab9ad0d532d1
  Stored in directory: /tmp/wsuser/.cache/pip/wheels/34/6e/70/5423639428a2cac8ea7eb467214a4254b549b381f306a9c790
Successfully built reverse-geocoder
Installing collected packages: reverse-geocoder
Successfully installed reverse-geocoder-1.5.1


NotAuthorizedError: Your API key is not authorized. You may have entered it incorrectly.
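
Because the OpenCage request failed with an authorization error, one offline way to turn the centroid into a neighbourhood is to pick the table row closest to it. The sketch below uses the haversine distance and only the five rows visible in the `TO_labels` preview, keyed by postal code, so its answer is illustrative; a real search would iterate over every row. `haversine_km` is a hypothetical helper, not part of the notebook.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km = mean Earth radius

# Centroid printed earlier, rounded to six decimals
centroid = (43.702933, -79.389178)

# Only the five rows shown in the TO_labels preview, keyed by postal code
candidates = {
    'M2N': (43.77012, -79.408493),
    'M1B': (43.806686, -79.194353),
    'M2J': (43.778517, -79.346556),
    'M9V': (43.739416, -79.588437),
    'M5V': (43.628947, -79.39442),
}

nearest = min(candidates, key=lambda code: haversine_km(*centroid, *candidates[code]))
distance = haversine_km(*centroid, *candidates[nearest])
print(nearest, round(distance, 1), 'km')
```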

## Visualizing the Map of Toronto with our analysis

### A. Using geopy library to get latitude and longitude of Toronto

In [80]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (top-level import for pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

print('Libraries imported.')

Libraries imported.


In [81]:
!pip install folium
print('Folium ready to import')

Collecting folium
  Downloading folium-0.12.0-py2.py3-none-any.whl (94 kB)
Collecting branca>=0.3.0
  Downloading branca-0.4.2-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.2 folium-0.12.0
Folium ready to import


In [86]:
# Geo Coordinates of Toronto
address = 'Toronto, ON'
geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [83]:
toronto_lat, toronto_long = 43.6534817, -79.3839347  # Lat & Long values copied from above

In [84]:
import folium
# create map of Toronto using latitude and longitude values
map_Toronto = folium.Map(location=[toronto_lat,toronto_long], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_TorontoGeo['Latitude'], df_TorontoGeo['Longitude'], df_TorontoGeo['Borough'], df_TorontoGeo['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#87cefa',
        fill_opacity=0.5,
        parse_html=False).add_to(map_Toronto)

map_Toronto
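
The map above draws every neighbourhood in the same blue. Once each neighbourhood has a cluster label, the marker colour could be derived from that label. The sketch below builds one colour per cluster with the standard-library `colorsys` module, used here as a stand-in for the matplotlib colormaps imported earlier; `cluster_colours` is a hypothetical helper.

```python
import colorsys

def cluster_colours(k):
    """Return k evenly spaced hues as '#rrggbb' strings, one per cluster."""
    colours = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 1.0, 1.0)
        colours.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colours

palette = cluster_colours(2)
print(palette)
```

In the marker loop, `color=palette[label]` and `fill_color=palette[label]` could then replace the fixed colour values, where `label` is the cluster label of the row being drawn.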

## Result

Now obtain the best neighbourhood in Toronto to open the restaurant.

In [99]:
# Build the popup string for the best location (postal code M4S)
popstring = TO_labels[TO_labels['PostalCode'].str.contains('M4S')]

def str_join(*args):
    return ''.join(map(str, args))

popstring_new = str_join('The Best Neighbourhood to locate a Restaurant in Toronto is in: ', popstring['Neighborhood'].values,  ' in ' ,  popstring['Borough'].values)

print(popstring_new)

The Best Neighbourhood to locate a Restaurant in Toronto is in: ['Davisville'] in ['Central Toronto']
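
The printed message shows list brackets because `.values` returns arrays rather than single strings. A minimal sketch of one way to print plain text, using a one-row stand-in frame built from the values shown above (the name `TO_labels_demo` is an assumption for illustration):

```python
import pandas as pd

# One-row stand-in for TO_labels, using the values printed above
TO_labels_demo = pd.DataFrame({
    'PostalCode': ['M4S'],
    'Borough': ['Central Toronto'],
    'Neighborhood': ['Davisville'],
})

# .iloc[0] picks the first matching row, so each field is a plain string
row = TO_labels_demo.loc[TO_labels_demo['PostalCode'] == 'M4S'].iloc[0]
message = ('The best neighbourhood to locate a restaurant in Toronto is '
           f"{row['Neighborhood']} in {row['Borough']}")
print(message)
```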
