# Finding Best Place to open a Cafe in Toronto Canada

### Background

One of my friends who was having Subway restaurant had to close his Subway business during May 2020 due to Covid-19 Pandemic which led to reduced sales in his business. Now he is planning to open a Cafe business in Toronto, Canada region. In this project I am planning to help my friend and other business personnel in finding a best location to open Cafe in Toronto using data science concepts that I learned in IBM Data Science Professional Course.

### Introduction/Business Problem

Toronto is capital city of the Ontario, Canada with a population of more than 6 Million in 2016 and with area of 243.3 sq mi. Toronto is international center for business, finance, arts, and culture, and is recognized as one of the most multicultural cities in the world. Toronto economy is diversified with technology, design, financial services, life sciences, education, arts, fashion, aerospace, environmental innovation, food services, and tourism. Starting a Cafe business in such a multicultural and diversified place is not an easy task. We need to consider several factors such as accessibility, visibility, target customers before opening Cafe to be successful in business. Places near business centers, malls, areas with a lot of foot traffic, and tourist attractions will guarantee the steady flow of customers that we need to make a good profit. So, finding a location is one of the most important things in starting a cafe. The search could take months if we start searching and analyzing manually. We can reduce this time to few hours/ days by using machine learning techniques and Four-square location data to find best suitable location to open a Cafe. In this project, we will find the best and most suitable location to open a Cafe in Toronto, Canada for business personal or entrepreneurs.

### Target Audience

- This project is aimed at Business personnel who wants to open a Cafe. 
- The analysis will help entrepreneurs to obtain necessary information in finding the best location for opening a Cafe

### Data Section

 We will use the following data for this project:
1. Toronto data that contains Borough, Neighborhoods along with there latitudes and longitudes
   - Data Source: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
   - Description: This data set contains the required information such as postal code, borough and the name of the    neighbourhoods in city of Toronto
2. Geographical location of the neighbourhoods
   - Data Source: https://cocl.us/Geospatial_data
   - Description:This dataset provides the Geographical coordinates of the neighbourhoods for the respective Postal Codes
3. Venue Data using Foursquare API
   - Data Source: https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}
   - Description: We will use Client_ID, Client_Secret, version details to get all venues for each neighborhood and group data by name of the neighborhood.

In [1]:
# import necessary libraries 
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import folium
import requests
import json
from bs4 import BeautifulSoup
import matplotlib.cm as cm
import matplotlib.colors as colors

%matplotlib inline
print('Packages installed  :)')

Packages installed  :)


In [2]:
# Get the neighborhood data that contains Borough, Neighborhoods along with there latitudes and longitude using beautiful soup 
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
result = requests.get(url)
data_html = BeautifulSoup(result.content)

In [3]:
# Read the data into a Pandas Dataframe
soup = BeautifulSoup(str(data_html))
neigh = soup.find('table')
table_str = str(neigh.extract())

### Dataset 1

#### Example of Toronto dataset that contains Borough, Neighborhoods along with there latitudes and longitudes

In [4]:
df = pd.read_html(table_str)[0]
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [5]:
df_dropna = df[df.Borough != 'Not assigned'].reset_index(drop=True)
df_dropna.rename(columns={'Postal Code' : 'PostalCode','Neighbourhood' : 'Neighborhood'}, inplace=True)
df = df_dropna

In [6]:
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [7]:
df_grouped = df.groupby(['Borough', 'PostalCode'], as_index=False).agg(lambda x:','.join(x))
df_grouped.head()

Unnamed: 0,Borough,PostalCode,Neighborhood
0,Central Toronto,M4N,Lawrence Park
1,Central Toronto,M4P,Davisville North
2,Central Toronto,M4R,"North Toronto West, Lawrence Park"
3,Central Toronto,M4S,Davisville
4,Central Toronto,M4T,"Moore Park, Summerhill East"


In [8]:
# Check if we still have any Neighborhoods that are Not Assigned
df_grouped.loc[df_grouped['Borough'].isin(["Not assigned"])]

Unnamed: 0,Borough,PostalCode,Neighborhood


In [9]:
df = df_grouped

In [10]:
df.shape

(103, 3)

In [11]:
df = df[['PostalCode', 'Borough', 'Neighborhood']]

In [12]:
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M4N,Central Toronto,Lawrence Park
1,M4P,Central Toronto,Davisville North
2,M4R,Central Toronto,"North Toronto West, Lawrence Park"
3,M4S,Central Toronto,Davisville
4,M4T,Central Toronto,"Moore Park, Summerhill East"


### Dataset 2

#### Example dataset of Geographical location of the neighbourhoods

In [13]:
#Geographical location of the neighbourhoods
geo_url = "https://cocl.us/Geospatial_data"

geo_df = pd.read_csv(geo_url)
geo_df.rename(columns={'Postal Code': 'PostalCode'}, inplace=True)
geo_df.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [14]:
# Merge the above two datasets
df = pd.merge(df, geo_df, on='PostalCode')
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
1,M4P,Central Toronto,Davisville North,43.712751,-79.390197
2,M4R,Central Toronto,"North Toronto West, Lawrence Park",43.715383,-79.405678
3,M4S,Central Toronto,Davisville,43.704324,-79.38879
4,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316


In [15]:
# list the count of neighborhoods in each borough
df.groupby('Borough').count()['Neighborhood']

Borough
Central Toronto      9
Downtown Toronto    19
East Toronto         5
East York            5
Etobicoke           12
Mississauga          1
North York          24
Scarborough         17
West Toronto         6
York                 5
Name: Neighborhood, dtype: int64

In [16]:
df_toronto = df
df_toronto.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
1,M4P,Central Toronto,Davisville North,43.712751,-79.390197
2,M4R,Central Toronto,"North Toronto West, Lawrence Park",43.715383,-79.405678
3,M4S,Central Toronto,Davisville,43.704324,-79.38879
4,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316


In [17]:
# Create unique list of Boroughs
boroughs = df_toronto['Borough'].unique().tolist()

In [18]:
# Find the geographical coordinates such as latitude and longitude of Toronto 
lat_toronto = df_toronto['Latitude'].mean()
lon_toronto = df_toronto['Longitude'].mean()
print('The geographical coordinates of Toronto are {}, {}'.format(lat_toronto, lon_toronto))

The geographical coordinates of Toronto are 43.70460773398059, -79.39715291165047


In [19]:
# Assign some random color for each borough
borough_color = {}
for borough in boroughs:
    borough_color[borough]= '#%02X%02X%02X' % tuple(np.random.choice(range(256), size=3)) 

In [20]:
map_toronto = folium.Map(location=[lat_toronto, lon_toronto], zoom_start=10.5)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_toronto['Latitude'], 
                                           df_toronto['Longitude'],
                                           df_toronto['Borough'], 
                                           df_toronto['Neighborhood']):
    label_text = borough + ' - ' + neighborhood
    label = folium.Popup(label_text)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=borough_color[borough],
        fill_color=borough_color[borough],
        fill_opacity=0.8).add_to(map_toronto)  
    
map_toronto

In [21]:
# Provide your Foursquare API client Id and Secret
CLIENT_ID = '1AC0Z0EE5WGSAYTMBL22OXS3HBLXIRT21Y3HRWAEDRO3YLC0' # Foursquare ID
CLIENT_SECRET = 'TRG0TEUNS3GSATKAD0TEWMA2Q1QQPY2B3QOAJA2BW1IG2XFF' # Foursquare Secret
VERSION = 20200514 # Foursquare API version

print('Credentials Stored')

Credentials Stored


In [22]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    LIMIT = 100 # limit of number of venues returned by Foursquare API
    radius = 500 # define radius
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
        
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)


### Dataset 3

#### Get Venue Data using Foursquare API

In [23]:
#Get venue details for all neighborhoods
toronto_venues = getNearbyVenues(names=df_toronto['Neighborhood'],
                                latitudes=df_toronto['Latitude'],
                                longitudes=df_toronto['Longitude'])

Lawrence Park
Davisville North
North Toronto West, Lawrence Park
Davisville
Moore Park, Summerhill East
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
Roselawn
Forest Hill North & West, Forest Hill Road Park
The Annex, North Midtown, Yorkville
Rosedale
St. James Town, Cabbagetown
Church and Wellesley
Regent Park, Harbourfront
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Stn A PO Boxes
First Canadian Place, Underground city
Christie
Queen's Park, Ontario Provincial Government
The Beaches
The Danforth West, Riverdale
India Bazaar, The Beaches West
Studio District
Business reply mail Processing Centre, South Central Letter 

In [24]:
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Lawrence Park,43.72802,-79.38879,Lawrence Park Ravine,43.726963,-79.394382,Park
1,Lawrence Park,43.72802,-79.38879,HYC Design Inc.,43.726793,-79.391681,Business Service
2,Lawrence Park,43.72802,-79.38879,Zodiac Swim School,43.728532,-79.38286,Swim School
3,Lawrence Park,43.72802,-79.38879,TTC Bus #162 - Lawrence-Donway,43.728026,-79.382805,Bus Line
4,Davisville North,43.712751,-79.390197,Homeway Restaurant & Brunch,43.712641,-79.391557,Breakfast Spot
