
# The city of choice for Coffee


## Introduction

A friend of mine wants to open a coffee roastery in Canada and would like to know whether Toronto, Montreal or Vancouver is the best city in which to start the business. To answer this we will compare the number of coffee shops in each city. The reasoning is that more coffee shops mean more potential buyers of roasted coffee beans, and therefore a larger target group and probably higher revenue. The target audience is the friend himself, who wants to be sure he will have enough prospective customers. He should receive a list of coffee shops he could supply with beans and an explanation of why a specific city is the best choice.

## Data

We will use Foursquare as the primary data source. After importing the necessary libraries we will look up the latitude and longitude of each city with geopy. These coordinates let us search the Foursquare database for coffee shops within a 5 km radius of each city centre. We will then combine the results for all three cities in one dataframe. To check that the coffee shops are in the correct area we will plot their positions on a folium map, which will be the basis for the evaluation. The data still has to be checked for consistency and relevance, so it may need cleaning in the evaluation stage.

### Import necessary Libraries

To fulfil the task we will need the following libraries.

In [1]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner

!pip install geopy
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# transforming json file into a pandas dataframe library
from pandas.io.json import json_normalize

! pip install folium==0.5.0
import folium # plotting library

print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


### Define Foursquare Credentials and Version


In [2]:
CLIENT_ID = 'BW1GHF0SUKO0Y2DZIQYSB4DU3IHXNEIXHN1FVYYCE2O5OYVO' # your Foursquare ID
CLIENT_SECRET = 'HGIW0BQTPFA3QZFTA1DQQRZI5WAWVYREJZSX1JC53EA4MYZN' # your Foursquare Secret
ACCESS_TOKEN = '4SEVI1EPK1YGKIQUVPHPDS3BNVIQ41E3XFATR4MLM2OAFELT' # your FourSquare Access Token
VERSION = '20180604'
LIMIT = 100
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: BW1GHF0SUKO0Y2DZIQYSB4DU3IHXNEIXHN1FVYYCE2O5OYVO
CLIENT_SECRET:HGIW0BQTPFA3QZFTA1DQQRZI5WAWVYREJZSX1JC53EA4MYZN
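
Hard-coding and printing credentials makes them visible to everyone who reads the notebook. As a minimal sketch of an alternative, the credentials could be read from environment variables and masked before printing. The variable names `FOURSQUARE_CLIENT_ID`, `FOURSQUARE_CLIENT_SECRET` and `FOURSQUARE_ACCESS_TOKEN` and both helper functions are hypothetical and not part of the original notebook.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the three credentials from environment variables.

    The variable names are an assumption made for this sketch.
    """
    names = (
        "FOURSQUARE_CLIENT_ID",
        "FOURSQUARE_CLIENT_SECRET",
        "FOURSQUARE_ACCESS_TOKEN",
    )
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}


def mask(secret, visible=4):
    """Keep only the first few characters of a secret readable."""
    return secret[:visible] + "*" * (len(secret) - visible)


# Example usage (requires the variables to be set in the environment):
# creds = load_foursquare_credentials()
# print('CLIENT_ID: ' + mask(creds["FOURSQUARE_CLIENT_ID"]))
```

With this approach the notebook can be shared without exposing the keys, and the printed output only confirms that a value was loaded.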


#### First we need the location of each city, so we geocode the three city names to latitude and longitude.


In [4]:
VAN_address = 'Vancouver, BC'
TOR_address = 'Toronto, ON'
MON_address = 'Montreal, QC'
# Location of Vancouver
geolocator = Nominatim(user_agent="VAN_agent")
VAN_location = geolocator.geocode(VAN_address)
VAN_latitude = VAN_location.latitude
VAN_longitude = VAN_location.longitude
#Location of Toronto
geolocator = Nominatim(user_agent="TOR_agent")
TOR_location = geolocator.geocode(TOR_address)
TOR_latitude = TOR_location.latitude
TOR_longitude = TOR_location.longitude
#Location of Montreal
geolocator = Nominatim(user_agent="MON_agent")
MON_location = geolocator.geocode(MON_address)
MON_latitude = MON_location.latitude
MON_longitude = MON_location.longitude



print('Location of Vancouver', VAN_latitude , VAN_longitude)
print('Location of Toronto', TOR_latitude, TOR_longitude)
print('Location of Montreal', MON_latitude, MON_longitude)

Location of Vancouver 49.2608724 -123.1139529
Location of Toronto 43.6534817 -79.3839347
Location of Montreal 45.4972159 -73.6103642
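
The three lookups follow the same pattern, so they could also be written as a loop. The sketch below assumes a hypothetical helper `geocode_cities` that takes any geopy-style geolocator. It also guards against `geocode` returning `None`, which geopy does when an address cannot be resolved.

```python
def geocode_cities(geolocator, addresses):
    """Return {city: (latitude, longitude)} for a {city: address} mapping."""
    coords = {}
    for city, address in addresses.items():
        location = geolocator.geocode(address)
        if location is None:
            # geopy returns None when the address cannot be resolved
            raise ValueError(f"No geocoding result for {address!r}")
        coords[city] = (location.latitude, location.longitude)
    return coords


# Example usage (needs network access, so it is left commented out):
# from geopy.geocoders import Nominatim
# geolocator = Nominatim(user_agent="coffee_city_agent")
# coords = geocode_cities(geolocator, {
#     "Vancouver": "Vancouver, BC",
#     "Toronto": "Toronto, ON",
#     "Montreal": "Montreal, QC",
# })
```

A single dictionary of coordinates also makes it easier to loop over the cities in the later steps.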


<a id="item1"></a>


## 1. Search for coffee shops within a 5 km radius of each city centre




#### To compare the cities on equal terms we use the same search query and the same search radius for each of them.

In [5]:
search_query = 'Coffee Shop'
radius = 5000

#### Define a request URL for each city


In [6]:
url_VAN = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VAN_latitude, VAN_longitude,ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)
url_TOR = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, TOR_latitude, TOR_longitude,ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)
url_MON = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, MON_latitude, MON_longitude,ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)
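
The three URLs differ only in their coordinates. As a sketch, the same URL could be assembled by one function that uses `urllib.parse.urlencode`, which also takes care of encoding the space in `'Coffee Shop'`. The function `build_search_url` and the layout of the `credentials` dictionary are assumptions for illustration; the endpoint and parameter names are the ones used above.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"


def build_search_url(lat, lng, credentials, query="Coffee Shop",
                     radius=5000, limit=100, version="20180604"):
    """Build the venue search URL for one city centre."""
    params = {
        "client_id": credentials["client_id"],
        "client_secret": credentials["client_secret"],
        "ll": f"{lat},{lng}",
        "oauth_token": credentials["access_token"],
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    # urlencode percent-encodes the values, e.g. the space in the query
    return SEARCH_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials, used only to show the call
demo_credentials = {"client_id": "ID", "client_secret": "SECRET", "access_token": "TOKEN"}
demo_url = build_search_url(43.6534817, -79.3839347, demo_credentials)
print(demo_url)
```

Calling the function once per city keeps the query, radius and limit identical across the three requests.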

#### Send the GET Request and examine the results


In [7]:
results_VAN = requests.get(url_VAN).json()
results_TOR = requests.get(url_TOR).json()
results_MON = requests.get(url_MON).json()
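
The requests above are not checked for errors, so an invalid key or an exceeded quota would only surface later as a `KeyError`. The sketch below shows one way to validate a parsed response before using it. It assumes that the response carries a `meta` object with a `code` field and, on failure, an `errorDetail` field; the helper `extract_venues` is hypothetical.

```python
def extract_venues(payload):
    """Return the list of venues from a parsed search response.

    Assumes the payload has the shape
    {'meta': {'code': ...}, 'response': {'venues': [...]}}.
    """
    meta = payload.get("meta", {})
    code = meta.get("code")
    if code != 200:
        detail = meta.get("errorDetail")
        raise RuntimeError(f"Foursquare request failed (code={code}): {detail}")
    return payload.get("response", {}).get("venues", [])


# Example with a hand-written payload instead of a live request
demo_payload = {"meta": {"code": 200}, "response": {"venues": [{"name": "7 Days Coffee Shop"}]}}
print(extract_venues(demo_payload))
```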

#### Get relevant part of JSON and transform it into a _pandas_ dataframe


In [23]:
# assign relevant part of JSON to venues
venues_VAN = results_VAN['response']['venues']
venues_TOR = results_TOR['response']['venues']
venues_MON = results_MON['response']['venues']
# transform venues into a dataframe
df_VAN = json_normalize(venues_VAN)
df_TOR = json_normalize(venues_TOR)
df_MON = json_normalize(venues_MON)
df_MON.tail()


  
  


| | id | name | categories | referralId | hasPerk | location.lat | location.lng | location.labeledLatLngs | location.distance | location.cc | location.city | location.state | location.country | location.formattedAddress | location.address | location.postalCode | location.crossStreet | venuePage.id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 25 | 5b84fefd947c05002cb5078f | Sushi Shop | [{'id': '4bf58dd8d48988d1c4941735', 'name': 'R... | v-1622150763 | False | 45.500341 | -73.562071 | [{'label': 'display', 'lat': 45.500341, 'lng':... | 3784 | CA | Montréal | QC | Canada | [800 Rue du Square-Victoria, Place Victoria, M... | 800 Rue du Square-Victoria, Place Victoria | H4Z 1A1 | | |
| 26 | 5fe33ee5b336646646e6ecb8 | Koodo Shop | [{'id': '4bf58dd8d48988d1ff941735', 'name': 'M... | v-1622150763 | False | 45.499505 | -73.582556 | [{'label': 'display', 'lat': 45.499505, 'lng':... | 2184 | CA | Montréal | QC | Canada | [705 Rue Sainte-Catherine O, Montréal QC H3G 4... | 705 Rue Sainte-Catherine O | H3G 4G5 | | 1360483053.0 |
| 27 | 5b84ffde5b97110025d63f76 | Sushi Shop | [{'id': '4bf58dd8d48988d1c4941735', 'name': 'R... | v-1622150763 | False | 45.507809 | -73.565201 | [{'label': 'display', 'lat': 45.507809, 'lng':... | 3715 | CA | Montréal | QC | Canada | [150 Saint-Catherine St O, Complexe Desjardins... | 150 Saint-Catherine St O, Complexe Desjardins | H2X 3Y2 | | |
| 28 | 4c40823baf052d7f3dde7b79 | Montreal Piano Repair Shop | [] | v-1622150763 | False | 45.517279 | -73.582425 | [{'label': 'display', 'lat': 45.5172788, 'lng'... | 3120 | CA | Montréal | QC | Canada | [61 rue Rachel O (Saint-Urbain), Montréal QC H... | 61 rue Rachel O | H2W 1G2 | Saint-Urbain | |
| 29 | 5c849fa5db2aeb002c019a1a | I Thing Shop | [{'id': '5744ccdfe4b0c0459246b4dc', 'name': 'S... | v-1622150763 | False | 45.483544 | -73.632796 | [{'label': 'display', 'lat': 45.48354439500839... | 2319 | CA | Laval | QC | Canada | [1560 Place Kirouac, Laval QC H7G 4X5, Canada] | 1560 Place Kirouac | H7G 4X5 | | |


To keep everything in one dataframe we concatenate the three results.


In [24]:
frames = [df_VAN, df_TOR, df_MON]
df_complete = pd.concat(frames)
df_complete.shape

(90, 19)
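
After a plain concatenation the rows no longer record which search they came from, and the index values 0 to 29 repeat three times. As a sketch, `pd.concat` can be given a dictionary so that the search city is kept as a column. The column name `search_city` and the small demo frames are assumptions for illustration.

```python
import pandas as pd

# Small stand-ins for df_VAN, df_TOR and df_MON
demo_van = pd.DataFrame({"name": ["Laura's Coffee Shop", "7 Days Coffee Shop"]})
demo_tor = pd.DataFrame({"name": ["Toronto venue"]})
demo_mon = pd.DataFrame({"name": ["Sushi Shop", "Koodo Shop"]})

frames_by_city = {"Vancouver": demo_van, "Toronto": demo_tor, "Montreal": demo_mon}

df_labelled = (
    pd.concat(frames_by_city, names=["search_city", "row"])  # dict keys become an index level
    .reset_index(level="search_city")                        # move the city into a column
    .reset_index(drop=True)                                  # replace the repeating index
)
print(df_labelled)
```

The extra column makes a later per-city count independent of the `location.city` field, which is empty for some venues and names neighbouring municipalities such as Laval for others.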

#### Define information of interest and filter dataframe

In [19]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in df_complete.columns if col.startswith('location.')] + ['id']
df = df_complete.loc[:, filtered_columns].copy() # work on a copy to avoid a SettingWithCopyWarning below

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
df['categories'] = df.apply(get_category_type, axis=1)

# clean column names by keeping only last term
df.columns = [column.split('.')[-1] for column in df.columns]

df

| | name | categories | address | crossStreet | lat | lng | labeledLatLngs | distance | postalCode | cc | city | state | country | formattedAddress | neighborhood | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Laura's Coffee Shop | Diner | 1945 Manitoba St. | at 4th Ave. | 49.267427 | -123.106913 | [{'label': 'display', 'lat': 49.267427, 'lng':... | 891 | V5Y 3A1 | CA | Vancouver | BC | Canada | [1945 Manitoba St. (at 4th Ave.), Vancouver BC... | | 4c48639e417b20a19bbfe0a9 |
| 1 | 7 Days Coffee Shop | Café | 920 Beatty St. | | 49.275102 | -123.117491 | [{'label': 'display', 'lat': 49.275102, 'lng':... | 1604 | | CA | | | Canada | [920 Beatty St., Canada] | | 57196f28498e2aeaefab44b2 |
| 2 | The Taste & See Coffee Shop | Coffee Shop | 1628 West 1st Avenue #128 | | 49.270256 | -123.141433 | [{'label': 'display', 'lat': 49.270256, 'lng':... | 2252 | | CA | Vancouver | BC | Canada | [1628 West 1st Avenue #128, Vancouver BC, Canada] | | 586453fa0037eb3be739c864 |
| 3 | Delicatessen Coffee Shop | Sandwich Place | | Davie at Burrard | 49.278508 | -123.129527 | [{'label': 'display', 'lat': 49.27850795147412... | 2265 | | CA | Vancouver | BC | Canada | [Davie at Burrard, Vancouver BC, Canada] | | 4eda85d546907c1b42d3711e |
| 4 | Eclettico Coffee Shop | Café | | | 49.247820 | -123.089950 | [{'label': 'display', 'lat': 49.24782, 'lng': ... | 2269 | V5V 4E8 | CA | Vancouver | BC | Canada | [Vancouver BC V5V 4E8, Canada] | | 5cca5b711fa763002ca67636 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 25 | Sushi Shop | Restaurant | 800 Rue du Square-Victoria, Place Victoria | | 45.500341 | -73.562071 | [{'label': 'display', 'lat': 45.500341, 'lng':... | 3784 | H4Z 1A1 | CA | Montréal | QC | Canada | [800 Rue du Square-Victoria, Place Victoria, M... | | 5b84fefd947c05002cb5078f |
| 26 | Koodo Shop | Miscellaneous Shop | 705 Rue Sainte-Catherine O | | 45.499505 | -73.582556 | [{'label': 'display', 'lat': 45.499505, 'lng':... | 2184 | H3G 4G5 | CA | Montréal | QC | Canada | [705 Rue Sainte-Catherine O, Montréal QC H3G 4... | | 5fe33ee5b336646646e6ecb8 |
| 27 | Sushi Shop | Restaurant | 150 Saint-Catherine St O, Complexe Desjardins | | 45.507809 | -73.565201 | [{'label': 'display', 'lat': 45.507809, 'lng':... | 3715 | H2X 3Y2 | CA | Montréal | QC | Canada | [150 Saint-Catherine St O, Complexe Desjardins... | | 5b84ffde5b97110025d63f76 |
| 28 | Montreal Piano Repair Shop | | 61 rue Rachel O | Saint-Urbain | 45.517279 | -73.582425 | [{'label': 'display', 'lat': 45.5172788, 'lng'... | 3120 | H2W 1G2 | CA | Montréal | QC | Canada | [61 rue Rachel O (Saint-Urbain), Montréal QC H... | | 4c40823baf052d7f3dde7b79 |


#### The data also contains many venues that are not coffee shops, such as restaurants, sushi places and other shops. Since we are only interested in coffee shops, we will have to clean the dataframe and remove everything whose category is not a coffee shop or café.
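
A minimal sketch of this cleaning step, assuming that the category names `'Coffee Shop'` and `'Café'` seen in the output above are the ones to keep. The helper `keep_coffee_shops` is hypothetical, and the sample rows are taken from the dataframe shown above.

```python
import pandas as pd

COFFEE_CATEGORIES = {"Coffee Shop", "Café"}


def keep_coffee_shops(df, categories=COFFEE_CATEGORIES):
    """Keep only the rows whose category is in the allowed set."""
    is_coffee = df["categories"].isin(categories)  # missing categories count as False
    return df[is_coffee].reset_index(drop=True)


sample = pd.DataFrame({
    "name": ["Laura's Coffee Shop", "7 Days Coffee Shop", "The Taste & See Coffee Shop",
             "Sushi Shop", "Montreal Piano Repair Shop"],
    "categories": ["Diner", "Café", "Coffee Shop", "Restaurant", None],
})
coffee_only = keep_coffee_shops(sample)
print(coffee_only)
```

Note that a strict category filter also drops venues such as Laura's Coffee Shop, which Foursquare lists as a diner, so the set of accepted categories may need to be widened after inspecting the data.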


#### Let's visualize the venues found in the three cities. This only shows the raw data and is not yet an evaluation.

In [20]:
df.name

0             Laura's Coffee Shop
1              7 Days Coffee Shop
2     The Taste & See Coffee Shop
3        Delicatessen Coffee Shop
4           Eclettico Coffee Shop
                 ...             
25                     Sushi Shop
26                     Koodo Shop
27                     Sushi Shop
28     Montreal Piano Repair Shop
29                   I Thing Shop
Name: name, Length: 90, dtype: object

In [26]:
venues_map = folium.Map(location=[TOR_latitude, TOR_longitude], zoom_start=3) # generate map centred around Toronto

# add a red circle marker to represent the different cities
folium.CircleMarker(
    [TOR_latitude, TOR_longitude],
    radius=10,
    color='red',
    popup='Toronto',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

folium.CircleMarker(
    [VAN_latitude, VAN_longitude],
    radius=10,
    color='red',
    popup='Vancouver',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

folium.CircleMarker(
    [MON_latitude, MON_longitude],
    radius=10,
    color='red',
    popup='Montreal',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the Coffee shops as blue circle markers
for lat, lng, label in zip(df.lat, df.lng, df.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

<a id="item2"></a>


From this data we will review the number of coffee shops in the vicinity of each city centre. We will show on a map where the density is highest in each city and then recommend the city that is best suited. We will also have to check whether the listed coffee shops are of acceptable quality. This is the end of week 1 of the assignment.
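
As a first sketch of the per-city count, each venue can be assigned to the nearest of the three city centres with the haversine formula and then tallied. The functions `haversine_km`, `nearest_city` and `count_per_city` are hypothetical helpers; the centre coordinates are the geocoding results from above and the sample points are three venues from the dataframe.

```python
from collections import Counter
from math import asin, cos, radians, sin, sqrt

CITY_CENTRES = {
    "Vancouver": (49.2608724, -123.1139529),
    "Toronto": (43.6534817, -79.3839347),
    "Montreal": (45.4972159, -73.6103642),
}


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two points."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlng = radians(lng2 - lng1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


def nearest_city(lat, lng, centres=CITY_CENTRES):
    """Name of the city centre closest to the given point."""
    return min(centres, key=lambda city: haversine_km(lat, lng, *centres[city]))


def count_per_city(points, centres=CITY_CENTRES):
    """Count how many (lat, lng) points fall nearest to each city centre."""
    return Counter(nearest_city(lat, lng, centres) for lat, lng in points)


# Laura's Coffee Shop, 7 Days Coffee Shop and one Montreal venue from the table
points = [(49.267427, -123.106913), (49.275102, -123.117491), (45.500341, -73.562071)]
counts = count_per_city(points)
print(counts)
```

For Laura's Coffee Shop the computed distance to the Vancouver centre is roughly 0.89 km, which is close to the 891 m reported in the `distance` column, so the same function could also be used to verify that every venue lies within the 5 km radius.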