# Capstone Project - Restaurant Theme in Downtown Toronto (Week 4)

## 1. Introduction

### 1.1 The Client

The client is a national restaurant chain that has grown considerably thanks to its ability to listen to customer preferences. Its current expansion strategy is based on the following feedback system:

1. Surveys are collected asking which style of restaurant people would like to see added to the area they live in.
2. A restaurant is then themed on the most frequently chosen style.

This strategy requires additional staff to conduct the surveys and often takes weeks or months before a decision can be made. As the company is in a position to grow even faster, this process is a bottleneck in its progress, so a faster way of determining the most successful restaurant theme to open is vital. Herein, we propose a strategy that collects publicly available location data to make this determination quickly, without waiting for survey results.

### 1.2 Strategy

The purpose of this project is to determine which theme of restaurant is likely to be most successful in the Downtown Toronto area, based on the selection of themes available across the whole of Toronto, Ontario. The most abundant styles of restaurant in Toronto will be compared with the styles available in Downtown Toronto. If a theme, or selection of themes, that is abundant in Toronto is less so in Downtown Toronto, this will help guide the client when deciding which style of restaurant to open.

The top 3 restaurant themes will be extracted from the whole of Toronto and compared with the Downtown area. By counting how many restaurants of each of these themes exist in Downtown Toronto, the three themes can be ranked from least to most abundant there, helping the client make a decision based on their knowledge of the start-up costs of each theme.

Once a decision has been made on which style of restaurant to open, location data will be used to decide where in the neighborhood to place it so as to minimise competition.

### 1.3 Data and Tools

The following data will be used for the investigation:

1. Number of each theme of restaurant in Toronto
2. Number of each theme of restaurant in the Downtown Toronto area
3. Locations of restaurants of each theme

The data will be scraped from Wikipedia and organised for analysis. This will be done with commonly used Python libraries: **BeautifulSoup** for scraping, **numpy** and **pandas** for data analysis, **geopy** and **folium** for geocoding and mapping, and the **Foursquare API** for location analysis.

## 2. Data Collection and Analysis

### 2.1 Data Collection

Firstly, the relevant libraries are imported.

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
```

A Wikipedia page listing Toronto's postal codes with their boroughs and neighborhoods is scraped. This table will be the basis of the main dataframe.

```python
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers)  # send the User-Agent defined above
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')

# The postcode data sits in the sortable wikitable on the page
PCtable = soup.find('table', {'class': 'wikitable sortable'}).tbody
rows = PCtable.find_all('tr')
# The first row holds the column headers
columns = [v.text.replace('\n', '') for v in rows[0].find_all('th')]
print(columns)
```

```
['Postal Code', 'Borough', 'Neighborhood']
```


Each data row is then parsed into its cell values, and the dataframe is built from them.

```python
# Collect one list of cell values per table row, then build the dataframe once
# (DataFrame.append was removed in pandas 2.0)
records = []
for row in rows[1:]:
    tds = row.find_all('td')
    # Skip any row that does not have exactly one cell per column
    if len(tds) != len(columns):
        continue
    records.append([td.text.replace('\n', '') for td in tds])

df = pd.DataFrame(records, columns=columns)
df.head()
```

|   | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
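To try the row-parsing logic without a network request, the sketch below applies the same idea (header names from the first row, one record per data row, unassigned boroughs dropped) to a small inline HTML snippet. It swaps BeautifulSoup for Python's built-in `html.parser.HTMLParser` so that it runs with the standard library alone; the `TableParser` class and the sample HTML, which reuses rows shown above, are illustrative assumptions rather than part of the original notebook.

```python
from html.parser import HTMLParser

# Hypothetical sample mimicking the layout of the Wikipedia postcode table
SAMPLE_HTML = """
<table class="wikitable sortable"><tbody>
<tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
<tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
<tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
<tr><td>M5A</td><td>Downtown Toronto</td><td>Regent Park, Harbourfront</td></tr>
</tbody></table>
"""


class TableParser(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text buffer for the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self.rows.append([])
        elif tag in ('th', 'td'):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ('th', 'td') and self._cell is not None:
            self.rows[-1].append(''.join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        # Text outside a cell (e.g. newlines between tags) is ignored
        if self._cell is not None:
            self._cell.append(data)


parser = TableParser()
parser.feed(SAMPLE_HTML)

sample_columns = parser.rows[0]  # first row holds the headers
# Keep rows with one cell per column and drop unassigned boroughs
sample_records = [r for r in parser.rows[1:]
                  if len(r) == len(sample_columns) and r[1] != 'Not assigned']
print(sample_columns)
print(sample_records)
```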


Rows whose borough is "Not assigned" are then removed, as they carry no location information.

```python
df = df[df['Borough'] != 'Not assigned']
df.head()
```

|   | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 5 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |


Coordinates obtained from geopy and saved as a CSV file are loaded so they can be added to the dataframe, providing location data for the boroughs and neighborhoods.

```python
coor_df = pd.read_csv('Downloads/Geospatial_Coordinates.csv')
coor_df.head()
```

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


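Both dataframes are sorted by postal code in preparation for merging.

```python
# Assign the sorted coordinates back to coor_df, the name used in the cells below
coor_df = coor_df.sort_values(['Postal Code'], ascending=True)
df = df.sort_values(['Postal Code'], ascending=True)
```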

The dataframe sizes are then checked to make sure they match before merging.

```python
coor_df.shape
```

```
(103, 3)
```

```python
df.shape
```

```
(103, 3)
```

Both dataframes have 103 rows, so they are merged on their shared `Postal Code` column.

```python
tor_df = pd.merge(df, coor_df, on='Postal Code')
tor_df.tail()
```

|   | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 98 | M9N | York | Weston | 43.706876 | -79.518188 |
| 99 | M9P | Etobicoke | Westmount | 43.696319 | -79.532242 |
| 100 | M9R | Etobicoke | Kingsview Village, St. Phillips, Martin Grove ... | 43.688905 | -79.554724 |
| 101 | M9V | Etobicoke | South Steeles, Silverstone, Humbergate, Jamest... | 43.739416 | -79.588437 |
| 102 | M9W | Etobicoke | Northwest, West Humber - Clairville | 43.706748 | -79.594054 |
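Matching row counts do not by themselves guarantee that every postal code has a coordinate: two tables with the same number of rows could still contain different codes. A minimal sketch of a stricter check, assuming pandas, uses `pd.merge` with `how='outer'`, `indicator=True` and `validate='one_to_one'`. The two small dataframes are hypothetical stand-ins for `df` and `coor_df`, built from values in the outputs above with the key sets deliberately mismatched.

```python
import pandas as pd

# Hypothetical stand-ins: same number of rows, but not the same postal codes
codes = pd.DataFrame({'Postal Code': ['M9N', 'M9P', 'M9W'],
                      'Borough': ['York', 'Etobicoke', 'Etobicoke']})
coords = pd.DataFrame({'Postal Code': ['M9N', 'M9P', 'M1B'],
                       'Latitude': [43.706876, 43.696319, 43.806686],
                       'Longitude': [-79.518188, -79.532242, -79.194353]})

# An outer merge keeps unmatched keys; the '_merge' column records where each row came from.
# validate='one_to_one' raises MergeError if a postal code is duplicated on either side.
check = pd.merge(codes, coords, on='Postal Code', how='outer',
                 indicator=True, validate='one_to_one')

# Rows that exist in only one of the two tables
unmatched = check.loc[check['_merge'] != 'both', ['Postal Code', '_merge']]
print(unmatched)
```

An empty `unmatched` table would confirm that every postcode found a coordinate, which is the property the shape check is meant to establish.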


We now have a dataframe containing the coordinates of every neighborhood. Next, the coordinates of Toronto itself are found so that the maps can be centred on the city.

The remaining libraries are imported.

```python
import numpy as np
import folium
from geopy.geocoders import Nominatim
from pandas import json_normalize  # replaces the deprecated pandas.io.json import

# Show all columns and rows when displaying dataframes
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
```

Using geopy, the coordinates are found for Toronto, Ontario.

```python
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="tor-explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

print('Coordinates for Toronto, Ontario are:', latitude, longitude)
```

```
Coordinates for Toronto, Ontario are: 43.6534817 -79.3839347
```


A map displaying all of the neighborhoods in Toronto, Ontario can now be plotted.

```python
tor_map = folium.Map(location=[latitude, longitude], zoom_start=9)

# Add a marker for each neighborhood, labelled "Neighborhood, Borough"
for lat, lng, borough, neighborhood in zip(tor_df['Latitude'], tor_df['Longitude'], tor_df['Borough'], tor_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(tor_map)

tor_map
```

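Next, the Foursquare API is used to retrieve the most popular venues around the city centre, along with their categories.

The client credentials are defined as follows (the placeholders stand in for a real Foursquare client ID and secret, which should not be published).

```python
CLIENT_ID = 'YOUR_CLIENT_ID'          # Foursquare client ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'  # Foursquare client secret
VERSION = '20180604'                  # API version date
```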
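Hard-coding API credentials in a notebook makes it easy to publish them by accident. A minimal sketch of an alternative reads them from environment variables; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_credentials` are assumptions chosen for illustration.

```python
import os


def load_credentials(environ=os.environ):
    """Return (client_id, client_secret) from the environment.

    The variable names are hypothetical. A clear error is raised instead of
    sending a request with missing credentials.
    """
    try:
        return environ['FOURSQUARE_CLIENT_ID'], environ['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError(f'Set the {missing.args[0]} environment variable') from None


# Usage in the notebook, after exporting both variables in the shell:
# CLIENT_ID, CLIENT_SECRET = load_credentials()
```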

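The coordinates and borough name of row 50 of `tor_df` are extracted. Note that the message printed below, like the Foursquare query that follows, uses the geocoded Toronto coordinates (`latitude`, `longitude`) rather than the row 50 values.

```python
tor_latitude = tor_df.loc[50, 'Latitude']
tor_longitude = tor_df.loc[50, 'Longitude']
tor_name = tor_df.loc[50, 'Borough']

# latitude/longitude are the geocoded 'Toronto, Ontario' values, not row 50's
print('Latitude and longitude values of {} are {}, {}.'.format(tor_name, latitude, longitude))
```

```
Latitude and longitude values of Downtown Toronto are 43.6534817, -79.3839347.
```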


Here, the search radius (in metres) and the maximum number of venues returned are defined, so that the query returns up to 100 of the most popular venues within 1.5 km of the coordinates.

```python
limit = 100     # maximum number of venues returned
radius = 1500   # search radius in metres

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, CLIENT_SECRET, VERSION, latitude, longitude, radius, limit)

url
```

```
'https://api.foursquare.com/v2/venues/explore?&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&v=20180604&ll=43.6534817,-79.3839347&radius=1500&limit=100'
```
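Formatting the query string by hand works, but the parameters can also be encoded with the standard library, which escapes special characters automatically. The sketch below uses `urllib.parse.urlencode` to rebuild the same request URL with placeholder credentials; `build_explore_url` is a hypothetical helper name. The comma in `ll` is percent-encoded as `%2C`, which the server decodes back to a comma.

```python
from urllib.parse import urlencode

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius=1500, limit=100):
    """Build a Foursquare venues/explore URL (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': f'{lat},{lng}',  # "latitude,longitude" of the search centre
        'radius': radius,      # search radius in metres
        'limit': limit,        # maximum number of venues returned
    }
    return EXPLORE_URL + '?' + urlencode(params)


print(build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20180604',
                        43.6534817, -79.3839347))
```

With `requests`, passing the same dictionary as `requests.get(EXPLORE_URL, params=params)` has the same effect.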

```python
results = requests.get(url).json()
```

A helper function is then defined to extract the name of each venue's primary (first-listed) category.

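```python
def get_category_type(row):
    # Raw venue items use 'categories'; rows flattened by json_normalize use 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # Return the name of the first (primary) category, or None if there is none
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
```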
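To see what the helper does, the sketch below applies the same "first listed category" rule to a single venue item. The item's shape is an assumption inferred from the column names used in the notebook (`venue.name`, `venue.categories`, `venue.location.lat`, `venue.location.lng`), with values taken from the venue table shown below; the stdlib `flatten` function is a hypothetical stand-in for what `json_normalize` does to one record.

```python
# Hypothetical venue item, shaped as implied by the 'venue.*' column names
item = {
    'venue': {
        'name': 'Nathan Phillips Square',
        'categories': [{'name': 'Plaza'}],
        'location': {'lat': 43.65227, 'lng': -79.383516},
    }
}


def flatten(d, prefix=''):
    """Flatten nested dicts into dotted keys, similar to json_normalize for one record."""
    flat = {}
    for key, value in d.items():
        name = f'{prefix}{key}'
        if isinstance(value, dict):
            flat.update(flatten(value, name + '.'))
        else:
            flat[name] = value  # lists (such as categories) are kept as-is
    return flat


def primary_category(categories):
    """Name of the first listed category, or None when the list is empty."""
    return categories[0]['name'] if categories else None


row = flatten(item)
print(row['venue.name'], primary_category(row['venue.categories']),
      row['venue.location.lat'], row['venue.location.lng'])
```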

A dataframe of the nearby venues is then created.

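```python
venues = results['response']['groups'][0]['items']

# Flatten the nested JSON into a dataframe
nearby_venues = json_normalize(venues)

# Keep only the name, category and coordinates of each venue
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# Replace the list of category dicts with the primary category name
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# Strip the 'venue.' and 'venue.location.' prefixes from the column names
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

print('Number of venues: {}'.format(nearby_venues.shape[0]))
```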

```
Number of venues: 100
```




```python
nearby_venues.head()
```

|   | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Downtown Toronto | Neighborhood | 43.653232 | -79.385296 |
| 1 | Nathan Phillips Square | Plaza | 43.65227 | -79.383516 |
| 2 | Indigo | Bookstore | 43.653515 | -79.380696 |
| 3 | UNIQLO ユニクロ | Clothing Store | 43.65591 | -79.380641 |
| 4 | CF Toronto Eaton Centre | Shopping Mall | 43.65454 | -79.380677 |


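```python
# Keep venues whose category contains "Restaurant"; na=False skips venues with no category
nearby_rests = nearby_venues[nearby_venues['categories'].str.contains("Restaurant", na=False)]
nearby_rests.count(axis=0)
```

```
name          21
categories    21
lat           21
lng           21
dtype: int64
```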

Of the 100 venues returned, 21 are restaurants.

```python
nearby_rests['categories'].value_counts()
```

```
Japanese Restaurant              5
Restaurant                       4
Thai Restaurant                  3
Italian Restaurant               2
Ramen Restaurant                 1
American Restaurant              1
Mediterranean Restaurant         1
Vegetarian / Vegan Restaurant    1
New American Restaurant          1
French Restaurant                1
Middle Eastern Restaurant        1
Name: categories, dtype: int64
```
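The filter-and-count step above (`str.contains("Restaurant")` followed by `value_counts()`) can also be expressed with `collections.Counter`, which may help clarify what the two pandas calls compute. The category list here is a short hypothetical sample, not the full Foursquare result.

```python
from collections import Counter

# Hypothetical sample of primary categories returned for a search
categories = ['Plaza', 'Japanese Restaurant', 'Bookstore', 'Thai Restaurant',
              'Japanese Restaurant', 'Restaurant', None, 'Coffee Shop']

# Keep restaurant categories only; None (a venue with no category) is skipped,
# mirroring str.contains(..., na=False)
restaurant_counts = Counter(c for c in categories if c and 'Restaurant' in c)

print(sum(restaurant_counts.values()))   # number of restaurants
print(restaurant_counts.most_common(3))  # the most frequent themes
```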

The three most common restaurant categories in this area are therefore:

1. Japanese
2. General (categorised simply as "Restaurant")
3. Thai

Next, these results are compared with the number and themes of restaurants in Downtown Toronto.

The same steps are applied to a search centred on Downtown Toronto.

First, a dataframe containing only the Downtown Toronto postcodes is created.

```python
dt_df = tor_df[tor_df['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
dt_df.head()
```

|   | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M4W | Downtown Toronto | Rosedale | 43.679563 | -79.377529 |
| 1 | M4X | Downtown Toronto | St. James Town, Cabbagetown | 43.667967 | -79.367675 |
| 2 | M4Y | Downtown Toronto | Church and Wellesley | 43.66586 | -79.38316 |
| 3 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 4 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 |


The coordinates of Downtown Toronto are found with geopy.

```python
address = 'Downtown Toronto, Toronto'

geolocator = Nominatim(user_agent="tor-explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

print(latitude, longitude)
```

```
43.6541737 -79.38081164513409
```


A map of the Downtown Toronto neighborhoods is then plotted.

```python
dt_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# Add a marker for each Downtown Toronto neighborhood
for lat, lng, label in zip(dt_df['Latitude'], dt_df['Longitude'], dt_df['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(dt_map)

dt_map
```

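The Downtown Toronto coordinates are then geocoded again with the query "Downtown Toronto, Ontario"; these values are used as the centre of the Downtown Foursquare search.

```python
address = 'Downtown Toronto, Ontario'

geolocator = Nominatim(user_agent="tor-explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

print('Coordinates for Downtown Toronto, Ontario are:', latitude, longitude)
```

```
Coordinates for Downtown Toronto, Ontario are: 43.6563221 -79.3809161
```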
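Both Foursquare searches use a 1500 m radius, so it may be worth checking how far apart their centres are: if the distance is much smaller than the radius, the two samples largely overlap. A minimal sketch using the haversine formula, assuming a spherical Earth with a mean radius of 6,371 km, on the two geocoded centres printed in this notebook:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius (spherical-Earth assumption)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Search centres printed in the notebook
toronto_centre = (43.6534817, -79.3839347)   # geocoded 'Toronto, Ontario'
downtown_centre = (43.6563221, -79.3809161)  # geocoded 'Downtown Toronto, Ontario'

distance = haversine_m(*toronto_centre, *downtown_centre)
print(f'{distance:.0f} m between centres; search radius 1500 m')
```

For these two centres the distance comes to roughly 400 m, well under the 1500 m radius, which suggests the two venue samples cover much of the same area; this is worth keeping in mind when comparing them.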


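```python
# Row 0 of dt_df (Rosedale) is extracted for reference; the Foursquare query
# below uses the geocoded latitude/longitude instead
dt_latitude = dt_df.loc[0, 'Latitude']
dt_longitude = dt_df.loc[0, 'Longitude']
dt_name = dt_df.loc[0, 'Borough']
```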

```python
limit = 100     # maximum number of venues returned
radius = 1500   # search radius in metres

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, CLIENT_SECRET, VERSION, latitude, longitude, radius, limit)

url
```

```
'https://api.foursquare.com/v2/venues/explore?&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&v=20180604&ll=43.6563221,-79.3809161&radius=1500&limit=100'
```

```python
results = requests.get(url).json()

# Same processing as for the Toronto search
venues = results['response']['groups'][0]['items']
nearby_venues_dt = json_normalize(venues)
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues_dt = nearby_venues_dt.loc[:, filtered_columns]
nearby_venues_dt['venue.categories'] = nearby_venues_dt.apply(get_category_type, axis=1)
nearby_venues_dt.columns = [col.split(".")[-1] for col in nearby_venues_dt.columns]

print('Number of venues: {}'.format(nearby_venues_dt.shape[0]))
```

```
Number of venues: 100
```




```python
nearby_venues_dt.head()
```

|   | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | UNIQLO ユニクロ | Clothing Store | 43.65591 | -79.380641 |
| 1 | Ed Mirvish Theatre | Theater | 43.655102 | -79.379768 |
| 2 | Blaze Pizza | Pizza Place | 43.656518 | -79.380015 |
| 3 | CF Toronto Eaton Centre | Shopping Mall | 43.65454 | -79.380677 |
| 4 | Burrito Boyz | Burrito Place | 43.656265 | -79.378343 |




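```python
# Keep venues whose category contains "Restaurant"; na=False skips venues with no category
nearby_rests_dt = nearby_venues_dt[nearby_venues_dt['categories'].str.contains("Restaurant", na=False)]
nearby_rests_dt.count(axis=0)
```

```
name          20
categories    20
lat           20
lng           20
dtype: int64
```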

```python
nearby_rests_dt['categories'].value_counts()
```

```
Japanese Restaurant              4
Italian Restaurant               4
Thai Restaurant                  3
Ramen Restaurant                 1
Restaurant                       1
American Restaurant              1
Mediterranean Restaurant         1
Theme Restaurant                 1
German Restaurant                1
New American Restaurant          1
Vegetarian / Vegan Restaurant    1
Middle Eastern Restaurant        1
Name: categories, dtype: int64
```

For comparison, the restaurant counts from the Toronto search are displayed again.

```python
nearby_rests['categories'].value_counts()
```

```
Japanese Restaurant              5
Restaurant                       4
Thai Restaurant                  3
Italian Restaurant               2
Ramen Restaurant                 1
American Restaurant              1
Mediterranean Restaurant         1
Vegetarian / Vegan Restaurant    1
New American Restaurant          1
French Restaurant                1
Middle Eastern Restaurant        1
Name: categories, dtype: int64
```

Comparing the two datasets, Japanese restaurants are the most abundant in the Toronto search (5) and joint most abundant, with Italian restaurants, in the Downtown Toronto search (4 each). The second most abundant category in Toronto is the general "Restaurant" category (4), of which there is only 1 in Downtown Toronto. Thai restaurants are the third most abundant in both areas (3 each). Based on these top three categories, there appears to be a gap in the market for a general restaurant in the Downtown Toronto area, so this is what will be suggested to the client.
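The comparison can be written as a short calculation on the two `value_counts()` outputs. The sketch below hard-codes the counts reported in this notebook, takes the three most abundant categories in the Toronto search, and ranks them by how many restaurants of that category appear in the Downtown search, least abundant first, following the strategy in Section 1.2.

```python
# Restaurant counts copied from the value_counts() outputs above
toronto_counts = {
    'Japanese Restaurant': 5, 'Restaurant': 4, 'Thai Restaurant': 3,
    'Italian Restaurant': 2, 'Ramen Restaurant': 1, 'American Restaurant': 1,
    'Mediterranean Restaurant': 1, 'Vegetarian / Vegan Restaurant': 1,
    'New American Restaurant': 1, 'French Restaurant': 1,
    'Middle Eastern Restaurant': 1,
}
downtown_counts = {
    'Japanese Restaurant': 4, 'Italian Restaurant': 4, 'Thai Restaurant': 3,
    'Ramen Restaurant': 1, 'Restaurant': 1, 'American Restaurant': 1,
    'Mediterranean Restaurant': 1, 'Theme Restaurant': 1, 'German Restaurant': 1,
    'New American Restaurant': 1, 'Vegetarian / Vegan Restaurant': 1,
    'Middle Eastern Restaurant': 1,
}

# Three most abundant categories in the Toronto search
top3 = sorted(toronto_counts, key=toronto_counts.get, reverse=True)[:3]

# Rank them by Downtown abundance, least abundant first (missing categories count as 0)
ranking = sorted(top3, key=lambda c: downtown_counts.get(c, 0))
for category in ranking:
    print(f'{category}: Toronto {toronto_counts[category]}, '
          f'Downtown {downtown_counts.get(category, 0)}')
```

The general "Restaurant" category comes first in this ranking, matching the conclusion drawn above.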

To further assist the client, the location of the existing general restaurant in the Downtown Toronto area is plotted on a map, so that the client can make an informed decision about where to place the new restaurant.

```python
general_rest = nearby_rests_dt[nearby_rests_dt['categories'] == 'Restaurant']
general_rest.head()
```

|    | name | categories | lat | lng |
|---|---|---|---|---|
| 26 | GEORGE Restaurant | Restaurant | 43.653346 | -79.374445 |


```python
dt_map = folium.Map(location=[latitude, longitude], zoom_start=14)

# Add a marker for each general restaurant, labelled with its name
for lat, lng, label in zip(general_rest['lat'], general_rest['lng'], general_rest['name']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(dt_map)

dt_map
```

The only general restaurant returned for Downtown Toronto is now marked on the map, so the client can make a better-informed decision about where to locate their new restaurant.
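Section 1.2 aims to place the new restaurant so as to minimise competition. One simple way to act on that, sketched here as an assumption rather than the method used in this notebook, is to rank candidate sites by their distance from the existing general restaurant. The candidates are the five Downtown Toronto postcode centroids shown in `dt_df.head()`, the competitor is GEORGE Restaurant from `general_rest`, and distances use the haversine formula on a spherical Earth.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres on a sphere of radius 6,371 km."""
    phi1, phi2 = radians(lat1), radians(lat2)
    a = (sin((phi2 - phi1) / 2) ** 2
         + cos(phi1) * cos(phi2) * sin(radians(lng2 - lng1) / 2) ** 2)
    return 2 * 6_371_000 * asin(sqrt(a))


competitor = (43.653346, -79.374445)  # GEORGE Restaurant

# Candidate sites: Downtown Toronto postcode centroids from dt_df.head()
candidates = {
    'Rosedale': (43.679563, -79.377529),
    'St. James Town, Cabbagetown': (43.667967, -79.367675),
    'Church and Wellesley': (43.66586, -79.38316),
    'Regent Park, Harbourfront': (43.65426, -79.360636),
    'Garden District, Ryerson': (43.657162, -79.378937),
}

# Sort candidates from farthest to nearest the competitor
ranked = sorted(candidates.items(),
                key=lambda kv: haversine_m(*kv[1], *competitor),
                reverse=True)
for name, (lat, lng) in ranked:
    print(f'{name}: {haversine_m(lat, lng, *competitor):.0f} m')
```

Distance from a single competitor ignores footfall, rent and demand, so a ranking like this would only be one input to the client's decision.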