# Research on area to open restaurant

## 1. Problem statement

A friend wants to open a restaurant in Toronto, Canada, but does not know which area would generate the most profit, so he has asked you for help. Starting from a list of tourist attractions taken from Wikipedia, you can process venue data to figure out which area is the best.

## 2. Approach

1. Find the areas that have a lot of entertainment infrastructure
2. Among those areas, choose the location with the fewest restaurants

## 3. General steps

1. Collect names of attraction spots in Toronto
2. Get data about entertainment spots in Toronto
3. Get data about restaurant spots in Toronto
4. Display both sets of information on the map
5. Choose the area with many entertainment spots but few restaurants

## 4. Implementation

In [None]:
!pip install beautifulsoup4
!pip install lxml
!conda install -c conda-forge folium=0.5.0 --yes
!pip install geocoder
!conda install -c conda-forge geopy --yes

In [1]:
import pandas as pd
import numpy as np
import folium
from bs4 import BeautifulSoup
import requests
import urllib
import json

import random # library for random number generation

from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

from IPython.display import Image
from IPython.core.display import HTML

# function that flattens nested JSON into a pandas dataframe
try:
    from pandas import json_normalize # pandas 1.0 and later
except ImportError:
    from pandas.io.json import json_normalize # older pandas releases

In [2]:
#attraction list taken from wikipedia: https://en.wikipedia.org/wiki/List_of_tourist_attractions_in_Toronto
attractions = ['CN Tower', "Ripley's Aquarium", "Rogers Centre", "Toronto's City Hall", "Yonge-Dundas Square", "The Toronto Islands","St. Michael's Cathedral", "Distillery District"]

In [3]:
CLIENT_ID = 'I0DGF3DDNDCTFAASVYGRS05UULSTCKNRZ33GI00FCV2POXD5' # your Foursquare ID
CLIENT_SECRET = 'ATOS3KS5Y021WB3JBZUYX5Q2HG3OTL3QSZRJIJECAKXGE0G1' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: I0DGF3DDNDCTFAASVYGRS05UULSTCKNRZ33GI00FCV2POXD5
CLIENT_SECRET:ATOS3KS5Y021WB3JBZUYX5Q2HG3OTL3QSZRJIJECAKXGE0G1
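
Hard-coding and printing API credentials makes them visible to anyone who reads the notebook. A minimal sketch of an alternative is shown here, assuming the keys are stored in environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_credentials` and `mask` are hypothetical, not part of the original notebook.

```python
import os


def load_credentials(env=os.environ):
    """Read the Foursquare keys from environment variables (hypothetical names)."""
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first')
    return client_id, client_secret


def mask(secret, visible=4):
    """Keep the first few characters and replace the rest with asterisks."""
    return secret[:visible] + '*' * (len(secret) - visible)


# Demonstration with a stand-in mapping instead of the real environment
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id-123', 'FOURSQUARE_CLIENT_SECRET': 'demo-secret-456'}
demo_id, demo_secret = load_credentials(demo_env)
print('CLIENT_ID: ' + mask(demo_id))
print('CLIENT_SECRET: ' + mask(demo_secret))
```

Calling `load_credentials()` with no argument reads from the real environment, so the notebook itself never contains the keys.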


In [4]:
toronto_latitude = 43.6532
toronto_longitude = -79.3832
radius = 5000
LIMIT = 1

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
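
To illustrate what `get_category_type` does, here is a small sketch that applies it to plain dictionaries standing in for dataframe rows. The sample rows are made up for illustration; only the two key names, `categories` and `venue.categories`, come from the function above.

```python
def get_category_type(row):
    """Return the name of the first category of a venue, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        # some responses nest the venue fields under a 'venue.' prefix
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']


# Illustrative rows (not real API responses)
search_row = {'name': 'Sample bakery', 'categories': [{'name': 'Bakery'}]}
prefixed_row = {'venue.categories': [{'name': 'Aquarium'}]}
uncategorised_row = {'name': 'Sample venue', 'categories': []}

for sample in (search_row, prefixed_row, uncategorised_row):
    print(get_category_type(sample))
```

Only the first category is kept, so a venue tagged with several categories is reduced to a single label.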

In [6]:
attractions_table = pd.DataFrame()
for attraction in attractions: 
    search_query = attraction
    radius = 10000
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, toronto_latitude, toronto_longitude, VERSION, search_query, radius, LIMIT)
    results = requests.get(url).json()
    venues = results['response']['venues']
    # transform venues into a dataframe
    dataframe = json_normalize(venues)
    
    filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
    # copy the selection so that later column assignments do not act on a view
    dataframe_filtered = dataframe.loc[:, filtered_columns].copy()
    
    # filter the category for each row
    dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

    # clean column names by keeping only last term
    dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

    dataframe_filtered = dataframe_filtered[['name', 'categories', 'lat', 'lng']]
    attractions_table = pd.concat([attractions_table, dataframe_filtered], ignore_index=True)
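
Attraction names such as "Ripley's Aquarium" contain spaces and apostrophes, which are safer to percent-encode than to paste into the URL with `str.format`. The sketch below builds the same kind of search URL with the standard library's `urlencode`. The helper name `build_search_url` and the placeholder credentials are assumptions for illustration; the endpoint and parameter names are the ones used in the loop above.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'


def build_search_url(query, lat, lng, radius, limit, client_id, client_secret, version):
    """Assemble a venue search URL with every parameter percent-encoded."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    return SEARCH_ENDPOINT + '?' + urlencode(params)


# Placeholder credentials, not real keys
example_url = build_search_url("Ripley's Aquarium", 43.6532, -79.3832, 10000, 1,
                               'YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20180604')
print(example_url)
```

An equivalent option is to pass the same dictionary to `requests.get(SEARCH_ENDPOINT, params=params)`, which performs the encoding itself.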

In [7]:
attractions_table.reset_index(drop=True)

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | CN Tower | Monument / Landmark | 43.642536 | -79.387182 |
| 1 | Ripley's Aquarium of Canada | Aquarium | 43.642104 | -79.386252 |
| 2 | Rogers Centre | Baseball Stadium | 43.641753 | -79.38715 |
| 3 | Toronto City Hall | City Hall | 43.65314 | -79.383967 |
| 4 | Yonge-Dundas Square | Plaza | 43.656054 | -79.380495 |
| 5 | The Ritz-Carlton Toronto | Hotel | 43.64533 | -79.387089 |
| 6 | St. Michael's Cathedral | Church | 43.655007 | -79.377061 |
| 7 | Ford Focus @ Distillery District | Event Space | 43.650966 | -79.35869 |


In [43]:
# for each attraction, we find the food venues within a 200 m radius
categoryId = "4d4b7105d754a06374d81259"  # food category
radius = 200  # metres
LIMIT = 1000  # requested maximum number of venues per query
all_frames = []

for index, row in attractions_table.iterrows():
    try:
        attraction_lat = float(row['lat'])
        attraction_lon = float(row['lng'])
        url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&categoryId={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, attraction_lat, attraction_lon, VERSION, categoryId, radius, LIMIT)
        results = requests.get(url).json()
        venues = results['response']['venues']
        # transform venues into a dataframe
        dataframe = json_normalize(venues)
        filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
        dataframe_filtered = dataframe.loc[:, filtered_columns].copy()
        # keep only the name of the first category of each venue
        dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)
        # clean column names by keeping only last term
        dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]
        all_frames.append(dataframe_filtered)
    except Exception as error:
        print("Error on " + str(index) + " position: " + str(error))
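
Several of the attractions sit close together, so their 200 m search circles can overlap and the same venue may be counted for more than one attraction. A rough check is sketched here with the standard haversine formula, using the CN Tower and Ripley's Aquarium coordinates from the attractions table. The helper name `haversine_m` is illustrative and not part of the notebook.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two latitude/longitude points."""
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


cn_tower = (43.642536, -79.387182)
ripleys_aquarium = (43.642104, -79.386252)
search_radius_m = 200

distance = haversine_m(*cn_tower, *ripleys_aquarium)
# two circles of equal radius overlap when their centres are closer than twice the radius
circles_overlap = distance < 2 * search_radius_m
print(round(distance), circles_overlap)
```

When the circles overlap, the per-attraction counts are not independent, which is worth keeping in mind when comparing them.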
   

In [44]:
all_frames[7].head() 

| | name | categories | address | cc | city | country | crossStreet | distance | formattedAddress | labeledLatLngs | lat | lng | neighborhood | postalCode | state | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The Sweet Escape Patisserie | Bakery | 55 Mill Street | CA | Toronto | Canada | | 37 | [55 Mill Street, Toronto ON M5A 3C4, Canada] | [{'label': 'display', 'lat': 43.65063217302609... | 43.650632 | -79.358709 | | M5A 3C4 | ON | 4ad4c05df964a5204ef620e3 |
| 1 | SOMA chocolatemaker | Chocolate Shop | 55 Mill Street, Unit #48 | CA | Toronto | Canada | The Distillery District | 59 | [55 Mill Street, Unit #48 (The Distillery Dist... | [{'label': 'display', 'lat': 43.65062222570758... | 43.650622 | -79.358127 | | M5A 3C4 | ON | 4b0978e1f964a520cd1723e3 |
| 2 | El Catrin | Mexican Restaurant | 18 Tank House Lane | CA | Toronto | Canada | Distillery District | 44 | [18 Tank House Lane (Distillery District), Tor... | [{'label': 'display', 'lat': 43.65060073711699... | 43.650601 | -79.35892 | | M5A 3C4 | ON | 51ddecee498e1ffd34185d2f |
| 3 | Flipside Donuts Cafe & Bar | Donut Shop | 12 Case Goods Lane | CA | Toronto | Canada | | 104 | [12 Case Goods Lane, Toronto ON M6H 3C4, Canada] | [{'label': 'display', 'lat': 43.650029, 'lng':... | 43.650029 | -79.358728 | | M6H 3C4 | ON | 5c9c388c1acf11002cc07ba5 |
| 4 | Archeo | Italian Restaurant | 31 Trinity St. | CA | Toronto | Canada | in The Distillery District | 68 | [31 Trinity St. (in The Distillery District), ... | [{'label': 'display', 'lat': 43.65066723014277... | 43.650667 | -79.359431 | | M5A 3C4 | ON | 4ac3e6cef964a520629d20e3 |


In [61]:
for index, row in attractions_table.iterrows():
    print("Attraction: " + row['name'] + "; Number of restaurants: " + str(all_frames[index]['name'].count()))

Attraction: CN Tower; Number of restaurants: 46
Attraction: Ripley's Aquarium of Canada; Number of restaurants: 42
Attraction: Rogers Centre; Number of restaurants: 29
Attraction: Toronto City Hall; Number of restaurants: 46
Attraction: Yonge-Dundas Square; Number of restaurants: 50
Attraction: The Ritz-Carlton Toronto; Number of restaurants: 50
Attraction: St. Michael's Cathedral; Number of restaurants: 47
Attraction: Ford Focus @ Distillery District; Number of restaurants: 25
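
The second step of the approach, choosing the location with the fewest restaurants, can be sketched directly from these counts. The dictionary below simply copies the printed numbers; the variable names are illustrative.

```python
# Counts copied from the output above
restaurant_counts = {
    "CN Tower": 46,
    "Ripley's Aquarium of Canada": 42,
    "Rogers Centre": 29,
    "Toronto City Hall": 46,
    "Yonge-Dundas Square": 50,
    "The Ritz-Carlton Toronto": 50,
    "St. Michael's Cathedral": 47,
    "Ford Focus @ Distillery District": 25,
}

# Rank the attractions from fewest to most nearby restaurants
ranked = sorted(restaurant_counts.items(), key=lambda item: item[1])
fewest_name, fewest_count = ranked[0]

for name, count in ranked:
    print("{}: {}".format(name, count))
```

Sorting the whole list rather than taking only the minimum also shows the runner-up, which is useful if the first choice turns out to be impractical.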


In [66]:
venues_map = folium.Map(location=[toronto_latitude, toronto_longitude], zoom_start=14) # generate map centred on Toronto


colors = ['blue', 'red', 'black', 'gray', 'purple', 'green', 'orange', 'pink']

for index, row in attractions_table.iterrows():
    attraction_lat = float(row['lat'])
    attraction_lon = float(row['lng'])
    folium.Marker(
        location=[attraction_lat, attraction_lon],
        popup=row['name'].replace("'",""),
        icon=folium.Icon(color=colors[index])
    ).add_to(venues_map)
    
    # add the restaurants as circle markers in the same colour as their attraction
    for lat, lng, label in zip(all_frames[index].lat, all_frames[index].lng, all_frames[index].categories):
        folium.features.CircleMarker(
            [lat, lng],
            radius=5,
            color=colors[index],
            popup=label,
            fill = True,
            fill_color=colors[index],
            fill_opacity=0.6
        ).add_to(venues_map)

# display map
venues_map

## 5. Conclusions

From the results above, the area around the Distillery District has the fewest restaurants within 200 m (25, compared with 29 to 50 around the other attractions), so we should open a restaurant in this area.