# APPLIED DATA SCIENCE CAPSTONE PROJECT

This notebook is the capstone project for the IBM Data Science Professional Certificate.

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

# 1.0 INTRODUCTION / BUSINESS PROBLEM

### 1.1 Background

Toronto is a major Canadian commercial city with a large population, including a significant African immigrant community. Its population of African origin is projected to increase significantly as Canada overtakes the USA as the preferred destination for immigrants, and as the African population increases, demand for African dishes increases correspondingly. As more people relocate with their families to live and work in Canada, this study evaluates the best borough in which to site a new African restaurant. The choice of location is usually influenced by several factors, but this study uses the number of existing restaurants to rank suitable locations.

### 1.2 Problem

Data is needed to categorize the restaurants in the different boroughs of Toronto: their location, their type, and the number of restaurants in each neighborhood. This project will leverage Foursquare location data and machine learning to find appropriate neighborhoods in Toronto where a new African restaurant can be located.

### 1.3 Interest

This project is for those who plan to start a restaurant in Toronto. It will help them choose a neighborhood for their business by recommending neighborhoods with a lower concentration of restaurants. This will guide them in determining the neighborhoods that are likely to have high demand and less competition for African dishes.

# 2.0 DATA REQUIREMENT

### 2.1 Data Source

The data required for this problem and their sources are:

1. The boroughs of Toronto and their neighborhoods, scraped from Wikipedia, together with their latitude and longitude (joined from a geospatial CSV keyed by postal code):
https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

2. The location of restaurants in Toronto's neighborhoods, obtained through the Foursquare API

The API was used to retrieve the venues in each borough of Toronto, and a filter was then applied to keep the restaurants in each district.

### 2.2 Data cleaning and Feature Extraction

Foursquare location data is used to group neighborhoods based on the number of restaurants around them.
Restaurant latitudes and longitudes are returned by the Foursquare API, neighborhood coordinates come from a geospatial CSV keyed by postal code, and city-level coordinates are geocoded with geopy's Nominatim geocoder.

# Importing Relevant Packages

In [5]:
import sys
import requests
import urllib.request
import time
from bs4 import BeautifulSoup
import lxml.html as lh


# Scraping Wikipedia

In [6]:
url= 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

In [7]:
response = requests.get(url)
response.raise_for_status()

## Parsing the data between the `<tr>` and `</tr>` tags of the HTML

In [8]:
doc = lh.fromstring(response.content)
tr_elements = doc.xpath('//tr')

In [9]:
#Check the length of the first 12 rows
[len(T) for T in tr_elements[:12]]

[3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]

### Creating the header and empty lists

In [10]:
tr_elements = doc.xpath('//tr')
#Create empty list
col=[]
i=0
#For each cell of the first row, store the header text and an empty list
for t in tr_elements[0]:
    i+=1
    name=t.text_content()
    print ('%d:"%s"'%(i,name))
    col.append((name,[]))

1:"Postal Code
"
2:"Borough
"
3:"Neighborhood
"


### Storing data from the second row onward

In [11]:
for j in range(1,len(tr_elements)):
    #T is our j'th row
    T=tr_elements[j]
    
    #If row is not of size 3, the //tr data is not from our table 
    if len(T)!=3:
        break
    
    #i is the index of our column
    i=0
    
    #Iterate through each element of the row
    for t in T.iterchildren():
        data=t.text_content() 
        #Skip the first column (postal code); try to convert other values to integers
        if i>0:
            try:
                data=int(data)
            except ValueError:
                #Not a number: keep the text as-is
                pass
        #Append the data to the empty list of the i'th column
        col[i][1].append(data)
        #Increment i for the next column
        i+=1

In [12]:
[len(C) for (title,C) in col]

[181, 181, 181]
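The row-by-row logic above can be tried without network access. Below is a minimal sketch using the standard-library `xml.etree.ElementTree` on a small hand-written table; the sample rows and the `parse_rows` helper are illustrative assumptions. Real Wikipedia HTML is not guaranteed to be well-formed XML, which is why the notebook itself uses `lxml.html`.

```python
import xml.etree.ElementTree as ET

# A tiny, well-formed stand-in for the Wikipedia table (hypothetical sample data).
SAMPLE_HTML = """
<table>
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td colspan="2">footer row with fewer cells</td></tr>
</table>
"""

def parse_rows(markup, n_cols=3):
    """Return (header, rows), stopping at the first row that does not have n_cols cells."""
    root = ET.fromstring(markup.strip())
    trs = root.findall(".//tr")
    # The first row holds the column names; strip surrounding whitespace/newlines.
    header = ["".join(cell.itertext()).strip() for cell in trs[0]]
    rows = []
    for tr in trs[1:]:
        cells = list(tr)
        if len(cells) != n_cols:
            break  # same stopping rule as the notebook: a row of another width ends the table
        rows.append(["".join(cell.itertext()).strip() for cell in cells])
    return header, rows

header, rows = parse_rows(SAMPLE_HTML)
print(header)
print(rows)
```

Stripping whitespace at parse time is an alternative to the `df.replace('\n', '', regex=True)` cleanup applied later.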

## Create the data frame

In [13]:
import pandas as pd
Dict={title:column for (title,column) in col}
df=pd.DataFrame(Dict)

In [14]:
df.head()

|   | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1A\n | Not assigned\n | Not assigned\n |
| 1 | M2A\n | Not assigned\n | Not assigned\n |
| 2 | M3A\n | North York\n | Parkwoods\n |
| 3 | M4A\n | North York\n | Victoria Village\n |
| 4 | M5A\n | Downtown Toronto\n | Regent Park, Harbourfront\n |


In [15]:
df = df.replace('\n','', regex=True)
df.head(10)

|   | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 5 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| 7 | M8A | Not assigned | Not assigned |
| 8 | M9A | Etobicoke | Islington Avenue, Humber Valley Village |
| 9 | M1B | Scarborough | Malvern, Rouge |


In [16]:
df.columns.values
cols= ['Post Code', 'Borough','Neighborhood']
df.columns =cols
df.head()

|   | Post Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |


In [17]:
#Note: the filtered result is not assigned back, so df is unchanged here
df[df.Borough != 'Not assigned']
df.head()

|   | Post Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |


In [18]:
df = df[df.Borough != 'Not assigned'].reset_index(drop=True)
df.head()

|   | Post Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |


# Creating a DataFrame with Latitude and Longitude

In [19]:
geo_file = 'https://cocl.us/Geospatial_data'
geo_file

'https://cocl.us/Geospatial_data'

In [20]:
df_geo = pd.read_csv(geo_file) 
df_geo.head()

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [21]:
col= ['Post Code', 'Latitude', 'Longitude']
df_geo.columns = col
df_geo.head()

|   | Post Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [22]:
df_Toronto = pd.merge(df, df_geo, on='Post Code')
df_Toronto.head()

|   | Post Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
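`pd.merge` performs an inner join by default (`how='inner'`), so a postal code that appears in only one of the two tables is silently dropped from `df_Toronto`. A plain-Python sketch of that behavior on made-up sample rows; the names `neighborhoods`, `coordinates` and `inner_join` are illustrative only.

```python
# Hypothetical sample rows shaped like df (neighborhoods) and df_geo (coordinates).
neighborhoods = [
    ("M3A", "North York", "Parkwoods"),
    ("M4A", "North York", "Victoria Village"),
    ("M9Z", "Nowhere", "Missing Code"),  # no matching coordinates
]
coordinates = {
    "M3A": (43.753259, -79.329656),
    "M4A": (43.725882, -79.315572),
    "M1B": (43.806686, -79.194353),  # no matching neighborhood
}

def inner_join(rows, coords):
    """Attach (lat, lon) to each row whose postal code appears in coords; drop the rest."""
    joined = []
    for code, borough, name in rows:
        if code in coords:
            lat, lon = coords[code]
            joined.append((code, borough, name, lat, lon))
    return joined

result = inner_join(neighborhoods, coordinates)
for row in result:
    print(row)
```

Comparing `len(df)` with `len(df_Toronto)` after the merge is a quick check that no neighborhood lost its coordinates.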


# Finding Geolocation

In [23]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [24]:
address = 'Toronto'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [25]:
map_Toronto= folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, Borough, Neighborhood in zip(df_Toronto['Latitude'], df_Toronto['Longitude'], df_Toronto['Borough'], df_Toronto['Neighborhood']):
    label = '{}, {}'.format(Neighborhood, Borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Toronto)  
    
map_Toronto

# Exploring Downtown Toronto Borough

In [26]:
Tor_data = df_Toronto[df_Toronto['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
Tor_data.head()

|   | Post Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 1 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
| 2 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 |
| 3 | M5C | Downtown Toronto | St. James Town | 43.651494 | -79.375418 |
| 4 | M5E | Downtown Toronto | Berczy Park | 43.644771 | -79.373306 |


In [27]:
address = 'Downtown Toronto'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Downtown Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Downtown Toronto are 43.6541737, -79.38081164513409.


In [28]:
map_Toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(Tor_data['Latitude'], Tor_data['Longitude'], Tor_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Toronto)  
    
map_Toronto

In [29]:
Tor_data.to_pickle('./locations.pkl') 

In [30]:
df_tor = df_Toronto.copy()
df_tor.head()

|   | Post Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |


In [31]:
df_tor.to_pickle('./docation.pkl')

# Foursquare Credentials

In [48]:
# The code was removed by Watson Studio for sharing.
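The cell defining `client_id` and `client_secret` was removed before sharing. One way to supply them without hard-coding secrets in the notebook is to read them from environment variables. This is a minimal sketch; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are assumptions, not part of the original project.

```python
import os

def load_foursquare_credentials():
    """Read Foursquare credentials from environment variables (variable names are assumptions)."""
    client_id = os.environ.get("FOURSQUARE_CLIENT_ID")
    client_secret = os.environ.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET before running.")
    return client_id, client_secret

# Uncomment once the environment variables are set:
# client_id, client_secret = load_foursquare_credentials()
```

Failing early with a clear message is easier to debug than an empty venue list later on.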

# Locating Restaurants in Toronto

In [33]:
food_category = '4d4b7105d754a06374d81259' # 'Root' category for all food-related venues

african_restaurant_categories = ['4bf58dd8d48988d1c8941735', '4bf58dd8d48988d10a941735', '5bae9231bedf3950379f89e1']

def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'diner', 'dine-in','taverna', 'steakhouse', 'african village', 'nigerian restaurant']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if specific_filter is not None and category_id in specific_filter:
            specific = True
            restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', Toronto', '')
    address = address.replace(', Canada', '')
    return address

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    version = '20200710'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
    except:
        venues = []
    return venues
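`get_venues_near_location` builds its query string by string formatting and hides every error behind a bare `except`. The sketch below separates the two concerns: `urllib.parse.urlencode` builds the same explore URL with percent-encoded values, and a small parser catches only the lookup errors expected when a reply lacks venue groups (for example, an error response). The helper names are hypothetical, and the `response → groups[0] → items` layout is taken from the notebook code above rather than re-checked against Foursquare's documentation.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(lat, lon, category, client_id, client_secret,
                      radius=500, limit=100, version="20200710"):
    """Build the explore URL used in get_venues_near_location, with URL-encoded values."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lon),
        "categoryId": category,
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)

def extract_items(payload):
    """Return the venue items from a decoded explore response, or [] if the keys are missing."""
    try:
        return payload["response"]["groups"][0]["items"]
    except (KeyError, IndexError, TypeError):
        return []

url = build_explore_url(43.657162, -79.378937, "4d4b7105d754a06374d81259",
                        "CLIENT_ID", "CLIENT_SECRET", radius=350)
print(url)
```

In the notebook these would be combined as `extract_items(requests.get(url).json())`, so network failures still surface as exceptions instead of silently producing an empty list.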

In [47]:
#!pip install shapely pyproj
#!conda install -c conda-forge shapely pyproj --yes
import shapely.geometry
from pyproj import Transformer

import math

# Toronto (about 79.4°W) lies in UTM zone 17N, EPSG:32617.
# always_xy=True keeps the (lon, lat) and (x, y) argument order used below.
lonlat_to_utm = Transformer.from_crs('EPSG:4326', 'EPSG:32617', always_xy=True)
utm_to_lonlat = Transformer.from_crs('EPSG:32617', 'EPSG:4326', always_xy=True)

def lonlat_to_xy(lon, lat):
    x, y = lonlat_to_utm.transform(lon, lat)
    return x, y

def xy_to_lonlat(x, y):
    lon, lat = utm_to_lonlat.transform(x, y)
    return lon, lat

def calc_xy_distance(x1, y1, x2, y2):
    # Euclidean distance in metres between two UTM points
    return math.hypot(x2 - x1, y2 - y1)

print('Coordinate transformation check')
print('-------------------------------')
print('Toronto longitude={}, latitude={}'.format(longitude, latitude))
x, y = lonlat_to_xy(longitude, latitude)
print('Toronto UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Toronto longitude={}, latitude={}'.format(lo, la))
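UTM zones are 6° of longitude wide and numbered eastward from 180°W, so the zone for a point can be computed from its longitude instead of being hard-coded. A minimal sketch; the helper names `utm_zone` and `utm_epsg` are illustrative, and the sketch ignores the Norway/Svalbard zone exceptions, which do not affect Toronto.

```python
import math

def utm_zone(lon):
    """Standard UTM zone number (1-60) for a longitude in decimal degrees."""
    return int(math.floor((lon + 180.0) / 6.0)) % 60 + 1

def utm_epsg(lon, lat):
    """EPSG code of the WGS84 UTM zone: 326xx north of the equator, 327xx south of it."""
    return (32600 if lat >= 0 else 32700) + utm_zone(lon)

print(utm_zone(-79.3839347))                 # Toronto
print(utm_epsg(-79.3839347, 43.6534817))
```

For Toronto this gives zone 17, i.e. EPSG:32617 (WGS 84 / UTM zone 17N). Projecting into a zone far from the data, such as zone 33 (12°E to 18°E), produces very large coordinates with heavy distortion, so distances computed from them are unreliable.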


In [49]:
import pickle

def get_restaurants(lats, lons):
    restaurants = {}
    african_restaurants = {}
    location_restaurants = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        # Using radius=350 to make sure we have overlaps/full coverage so we don't miss any restaurant (we're using dictionaries to remove any duplicates resulting from area overlaps)
        venues = get_venues_near_location(lat, lon, food_category, client_id, client_secret, radius=350, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_african = is_restaurant(venue_categories, specific_filter=african_restaurant_categories)
            if is_res:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_african, x, y)
                if venue_distance<=300:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_african:
                    african_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, african_restaurants, location_restaurants

# Try to load from local file system in case we did this before
restaurants = {}
african_restaurants = {}
location_restaurants = []
loaded = False
try:
    with open('restaurants_350.pkl', 'rb') as f:
        restaurants = pickle.load(f)
    with open('african_restaurants_350.pkl', 'rb') as f:
        african_restaurants = pickle.load(f)
    with open('location_restaurants_350.pkl', 'rb') as f:
        location_restaurants = pickle.load(f)
    print('Restaurant data loaded.')
    loaded = True
except (FileNotFoundError, EOFError, pickle.UnpicklingError):
    pass

# If load failed use the Foursquare API to get the data
if not loaded:
    restaurants, african_restaurants, location_restaurants = get_restaurants(df_tor['Latitude'], df_tor['Longitude'])
    
    # Persist the results in the local file system
    with open('restaurants_350.pkl', 'wb') as f:
        pickle.dump(restaurants, f)
    with open('african_restaurants_350.pkl', 'wb') as f:
        pickle.dump(african_restaurants, f)
    with open('location_restaurants_350.pkl', 'wb') as f:
        pickle.dump(location_restaurants, f)

Restaurant data loaded.
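The cell above caches three objects in separate pickle files. A generic helper for the same load-or-compute pattern is sketched below; it catches only the errors that mean the cache is missing or unreadable. `load_or_compute` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the Foursquare calls. Pickle files should only be loaded from trusted sources.

```python
import os
import pickle
import tempfile

def load_or_compute(path, compute):
    """Return the object pickled at `path`; if there is no usable file, call compute() and pickle the result."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except (FileNotFoundError, EOFError, pickle.UnpicklingError):
        result = compute()
        with open(path, "wb") as f:
            pickle.dump(result, f)
        return result

# Usage with a stand-in for the expensive API calls:
calls = []
def fake_fetch():
    calls.append(1)
    return {"venue-1": ("venue-1", "Example Restaurant")}

cache_path = os.path.join(tempfile.mkdtemp(), "restaurants_350.pkl")
first = load_or_compute(cache_path, fake_fetch)   # computes and writes the cache
second = load_or_compute(cache_path, fake_fetch)  # reads the cache; fake_fetch is not called again
print(len(calls))
```

Because `restaurants`, `african_restaurants` and `location_restaurants` are produced together, pickling them as one tuple would also keep the three files from getting out of sync.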


In [36]:
import numpy as np

print('Total number of restaurants:', len(restaurants))
print('Total number of African restaurants:', len(african_restaurants))
print('Percentage of African restaurants: {:.2f}%'.format(len(african_restaurants) / len(restaurants) * 100))
print('Average number of restaurants in neighborhood:', np.array([len(r) for r in location_restaurants]).mean())

Total number of restaurants: 437
Total number of African restaurants: 2
Percentage of African restaurants: 0.46%
Average number of restaurants in neighborhood: 4.087378640776699


In [37]:
print('List of all restaurants')
print('-----------------------')
for r in list(restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(restaurants))

List of all restaurants
-----------------------
('4f3ecce6e4b0587016b6f30d', 'Portugril', 43.72581876267242, -79.31278541470671, '1733 Eglinton Avenue East (Bermondsey) ON', 224, False, -5299923.88428061, 10498066.317945193)
('4d689350b6f46dcb77ee15b2', 'The Frig', 43.72705130603407, -79.31741760908679, 'Canada', 197, False, -5299669.902404442, 10498578.961337013)
('4bc39c914cdfc9b6f29c9721', 'Souvlaki Express', 43.65558391537734, -79.36443816909016, '348 Queen street east (at Parliament St) ON M5A 1T1', 339, False, -5310441.393101435, 10505249.187492639)
('51c085d3498eadedb67ba6cd', 'Flame Shack', 43.656844075440546, -79.35891727496157, '506 Queen St E (Sumach St) ON M5A 1V2', 319, False, -5310311.086117742, 10504589.317697058)
('4a8355bff964a520d3fa1fe3', 'Mercatto', 43.660390911898546, -79.38766421192705, '101 College St ON M5G', 272, False, -5309380.328848878, 10507847.587933414)
('4ecd63d96da162f1bb0f11c4', 'Sushi Box', 43.66295954544153, -79.38657996066198, '891 Bay street ON M5S

In [38]:
map_Toronto = folium.Map(location=[latitude, longitude], zoom_start=13)
folium.Marker([latitude, longitude], popup='Toronto').add_to(map_Toronto)
for res in restaurants.values():
    lat, lon = res[2], res[3]
    is_african = res[6]
    # African restaurants in red, all other restaurants in blue
    color = 'red' if is_african else 'blue'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_Toronto)
map_Toronto

In [39]:
print('List of african restaurants')
print('---------------------------')
for r in list(african_restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(african_restaurants))

List of african restaurants
---------------------------
('4c4e2474f53d0f47f18b13a6', 'Ethiopiques', 43.65651274304155, -79.377077748846, '227 Church St. (at Dundas St. E) ON M5B 1Y7', 166, True, -5310132.554085254, 10506693.023237238)
('4a72360ef964a52098da1fe3', 'Ethiopian House', 43.6665987043968, -79.38566906742606, '4 Irwin Ave. (at Yonge St) ON', 218, True, -5308418.311056371, 10507507.873198124)
...
Total: 2
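Only two venues matched the category IDs in `african_restaurant_categories`. One hypothetical way to widen the search is to also look for cuisine keywords in venue names and category names. In the sketch below, the keyword list, the helper `looks_african` and the sample venues are all assumptions; substring matching can produce false positives, so any matches would need manual review.

```python
# Illustrative keyword list; it is an assumption, not a Foursquare taxonomy.
AFRICAN_KEYWORDS = ("african", "ethiopian", "eritrean", "nigerian", "ghanaian",
                    "somali", "moroccan", "senegalese", "jollof")

def looks_african(name, category_names):
    """Flag a venue if its name or any category name contains an African-cuisine keyword."""
    text = " ".join([name] + list(category_names)).lower()
    return any(keyword in text for keyword in AFRICAN_KEYWORDS)

# Hypothetical venues: (name, [category names])
sample = [
    ("Ethiopiques", ["Ethiopian Restaurant"]),
    ("Jollof Spot", ["Restaurant"]),
    ("Sushi Box", ["Sushi Restaurant"]),
]
for name, cats in sample:
    print(name, looks_african(name, cats))
```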


# METHODOLOGY

The focus of this project was to locate areas in Toronto where an African restaurant could be opened. The chosen location should have a low density of restaurants and be close to Downtown Toronto.
The required data was collected. All the restaurants in every borough of Toronto were identified using Foursquare's venue categories, and African restaurants were then flagged.
Candidate locations for an African restaurant were then identified from the Foursquare data. Data exploration involved counting the restaurants within 300 m of each neighborhood center (restaurant density) to identify areas with a low number of restaurants.
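The density measure is the number of restaurants within 300 m of each neighborhood's center point. The notebook relies on the `distance` field returned by Foursquare (`venue_distance<=300`); the standalone sketch below computes the same kind of count from coordinates with the haversine formula, which assumes a spherical Earth. The function names are illustrative; the coordinates are copied from the tables and restaurant listing in this notebook.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def count_within(center, points, radius_m=300):
    """Count the (lat, lon) points lying within radius_m metres of center."""
    lat0, lon0 = center
    return sum(1 for lat, lon in points if haversine_m(lat0, lon0, lat, lon) <= radius_m)

# Garden District, Ryerson (M5B) center and the two African restaurants found earlier
garden_district = (43.657162, -79.378937)
african_restaurants_latlon = [
    (43.65651274304155, -79.377077748846),   # Ethiopiques
    (43.6665987043968, -79.38566906742606),  # Ethiopian House
]
print(round(haversine_m(*garden_district, *african_restaurants_latlon[0])))
print(count_within(garden_district, african_restaurants_latlon))
```

The distance to Ethiopiques comes out at roughly 166 m, close to the distance shown in the African restaurant listing, while Ethiopian House lies outside the 300 m radius.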

# Result Analysis

In [40]:
# DataFrame including the number of restaurants in each area

In [41]:
location_restaurants_count = [len(res) for res in location_restaurants]

df_tor['Restaurants in area'] = location_restaurants_count

print('Average number of restaurants in every area with radius=300m:', np.array(location_restaurants_count).mean())

df_tor.head(10)

Average number of restaurants in every area with radius=300m: 4.087378640776699


|   | Post Code | Borough | Neighborhood | Latitude | Longitude | Restaurants in area |
|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 | 0 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | 2 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 | 0 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 | 0 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 | 2 |
| 5 | M9A | Etobicoke | Islington Avenue, Humber Valley Village | 43.667856 | -79.532242 | 0 |
| 6 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 | 0 |
| 7 | M3B | North York | Don Mills | 43.745906 | -79.352188 | 0 |
| 8 | M4B | East York | Parkview Hill, Woodbine Gardens | 43.706397 | -79.309937 | 0 |
| 9 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 | 26 |


# Downtown Toronto Restaurant Distribution

In [42]:
df_data_downtown = df_tor[df_tor['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
df_data_downtown

|   | Post Code | Borough | Neighborhood | Latitude | Longitude | Restaurants in area |
|---|---|---|---|---|---|---|
| 0 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 | 0 |
| 1 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 | 2 |
| 2 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 | 26 |
| 3 | M5C | Downtown Toronto | St. James Town | 43.651494 | -79.375418 | 12 |
| 4 | M5E | Downtown Toronto | Berczy Park | 43.644771 | -79.373306 | 3 |
| 5 | M5G | Downtown Toronto | Central Bay Street | 43.657952 | -79.387383 | 14 |
| 6 | M6G | Downtown Toronto | Christie | 43.669542 | -79.422564 | 2 |
| 7 | M5H | Downtown Toronto | Richmond, Adelaide, King | 43.650571 | -79.384568 | 31 |
| 8 | M5J | Downtown Toronto | Harbourfront East, Union Station, Toronto Islands | 43.640816 | -79.381752 | 15 |
| 9 | M5K | Downtown Toronto | Toronto Dominion Centre, Design Exchange | 43.647177 | -79.381576 | 30 |


# Downtown Toronto Neighborhoods with Low Restaurant Density

In [43]:
df_data_downtown_low = df_data_downtown[df_data_downtown['Restaurants in area'] <=5].reset_index(drop=True)
df_data_downtown_low

|   | Post Code | Borough | Neighborhood | Latitude | Longitude | Restaurants in area |
|---|---|---|---|---|---|---|
| 0 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 | 0 |
| 1 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 | 2 |
| 2 | M5E | Downtown Toronto | Berczy Park | 43.644771 | -79.373306 | 3 |
| 3 | M6G | Downtown Toronto | Christie | 43.669542 | -79.422564 | 2 |
| 4 | M5S | Downtown Toronto | University of Toronto, Harbord | 43.662696 | -79.400049 | 4 |
| 5 | M5V | Downtown Toronto | CN Tower, King and Spadina, Railway Lands, Har... | 43.628947 | -79.39442 | 0 |
| 6 | M4W | Downtown Toronto | Rosedale | 43.679563 | -79.377529 | 0 |
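The cutoff of five restaurants splits areas into the low-density group kept here and the `df_tor_medium` group selected with `> 5`. A plain-Python sketch of the same rule that also ranks the low-density candidates by restaurant count; `density_class` is an illustrative helper, and the counts are copied from the first ten rows of the Downtown Toronto table.

```python
def density_class(count, low_max=5):
    """Label an area 'low' if it has at most low_max restaurants, otherwise 'medium/high'."""
    return "low" if count <= low_max else "medium/high"

# Restaurants within 300 m, from the first ten rows of the Downtown Toronto table.
downtown_counts = {
    "Regent Park, Harbourfront": 0,
    "Queen's Park, Ontario Provincial Government": 2,
    "Garden District, Ryerson": 26,
    "St. James Town": 12,
    "Berczy Park": 3,
    "Central Bay Street": 14,
    "Christie": 2,
    "Richmond, Adelaide, King": 31,
    "Harbourfront East, Union Station, Toronto Islands": 15,
    "Toronto Dominion Centre, Design Exchange": 30,
}

# Rank the low-density candidates from fewest to most restaurants (ties keep table order).
candidates = sorted(
    (name for name, n in downtown_counts.items() if density_class(n) == "low"),
    key=downtown_counts.get,
)
print(candidates)
```

In pandas the equivalent ranking is `df_data_downtown_low.sort_values('Restaurants in area')`.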


# Map of Possible Locations for a Restaurant in Downtown Toronto

In [44]:
map_Toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df_data_downtown_low['Latitude'], df_data_downtown_low['Longitude'], df_data_downtown_low['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Toronto)  
    
map_Toronto

In [45]:
df_tor_medium = df_tor[df_tor['Restaurants in area'] >5]
df_tor_medium

|   | Post Code | Borough | Neighborhood | Latitude | Longitude | Restaurants in area |
|---|---|---|---|---|---|---|
| 9 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 | 26 |
| 13 | M3C | North York | Don Mills | 43.7259 | -79.340923 | 8 |
| 15 | M5C | Downtown Toronto | St. James Town | 43.651494 | -79.375418 | 12 |
| 23 | M4G | East York | Leaside | 43.70906 | -79.363452 | 6 |
| 24 | M5G | Downtown Toronto | Central Bay Street | 43.657952 | -79.387383 | 14 |
| 30 | M5H | Downtown Toronto | Richmond, Adelaide, King | 43.650571 | -79.384568 | 31 |
| 33 | M2J | North York | Fairview, Henry Farm, Oriole | 43.778517 | -79.346556 | 14 |
| 36 | M5J | Downtown Toronto | Harbourfront East, Union Station, Toronto Islands | 43.640816 | -79.381752 | 15 |
| 37 | M6J | West Toronto | Little Portugal, Trinity | 43.647927 | -79.41975 | 18 |
| 41 | M4K | East Toronto | The Danforth West, Riverdale | 43.679557 | -79.352188 | 17 |


In [46]:
map_Toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
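for lat, lng, name, count in zip(df_data_downtown_low['Latitude'], df_data_downtown_low['Longitude'], df_data_downtown_low['Neighborhood'], df_data_downtown_low['Restaurants in area']):
    # Build the popup text as a string: neighborhood name plus its restaurant count
    label = folium.Popup('{}: {} restaurants'.format(name, count), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green',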
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Toronto)  
    
map_Toronto

# Discussion

Based on the analysis of the restaurant distribution in Toronto, seven neighborhoods in Downtown Toronto were found to have a low restaurant density (five or fewer restaurants within 300 m of the neighborhood center).
The purpose of this project was to use location data to find areas of Toronto with few restaurants where an African restaurant could be sited. Downtown Toronto was found to have the highest number of restaurants, but some of its neighborhoods have few restaurants. These neighborhoods are therefore recommended for further analysis.
These locations should then be studied to check whether they meet other criteria besides the lack of competitors.

# Conclusion

The purpose of this project was to locate areas of Toronto with a low number of restaurants where an investor could open an African restaurant. The Foursquare API was used to identify the restaurants in Toronto. The boroughs were then divided into areas with low and high numbers of restaurants, and the low-density areas in Downtown Toronto were analyzed further to identify candidate locations.

The investor is expected to carry out feasibility studies on the recommended locations and select those that match their other business requirements.

The study also showed that the Foursquare API may not have identified African restaurants adequately: a Google search for African restaurants in Toronto returned more than the two restaurants the Foursquare API returned. This may be due to bias in the Foursquare data, or because some African restaurants in Toronto are not categorized properly.