
# The city of choice for Coffee


## Introduction

A friend of mine is looking to open a coffee roastery in Canada and wants to know which of Toronto, Montreal and Vancouver would be the best city to start the business in. To answer this, we will evaluate the number of coffee shops in each of these cities. The logic is that the more coffee shops there are, the more opportunities there are to sell roasted coffee beans. This means a larger target group and probably higher revenue. The target audience is the friend who wants to open the business and needs to be sure that there are enough prospective customers. He should get a list of the coffee shops he could supply with beans and an explanation of why a specific city would be best.

## Data

We will primarily use __foursquare__ as the basis for our data. After importing the necessary libraries, we will determine the location of each city in terms of latitude and longitude using __geopy__. These locations allow us to search the __foursquare__ database for coffee shops in the vicinity of each city centre, within a radius of __5 km__. We will then combine the results for all cities in one __dataframe__. To check that the coffee shops are in the correct area, we will show their positions on a __folium__ map. This will be the basis for the evaluation. The data still has to be checked for consistency and relevance and may therefore have to be cleaned in the evaluation stage.

### Import necessary Libraries

To fulfil the task we will need the following libraries.

In [2]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner

from sklearn.cluster import KMeans #module for clustering
from sklearn.preprocessing import StandardScaler # module for preprocessing for the cluster analysis

from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

import matplotlib.cm as cm # module for visualisation
import matplotlib.colors as colors #module for visualisation
# libraries for displaying images
from IPython.display import Image, HTML

# library for transforming JSON into a pandas dataframe
# (pandas.io.json.json_normalize is deprecated; pandas >= 1.0 exposes it at the top level)
from pandas import json_normalize

#! pip install folium==0.5.0
import folium # plotting library

print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


### Define Foursquare Credentials and Version


In [3]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
ACCESS_TOKEN = 'YOUR_ACCESS_TOKEN' # your Foursquare Access Token
VERSION = '20180604'
LIMIT = 100
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID


#### First, we need the location of each city, defined by its latitude and longitude. The locations are printed as output.


In [4]:
VAN_address = 'Vancouver, BC'
TOR_address = 'Toronto, ON'
MON_address = 'Montreal, QC'
# Location of Vancouver
geolocator = Nominatim(user_agent="VAN_agent")
VAN_location = geolocator.geocode(VAN_address)
VAN_latitude = VAN_location.latitude
VAN_longitude = VAN_location.longitude
#Location of Toronto
geolocator = Nominatim(user_agent="TOR_agent")
TOR_location = geolocator.geocode(TOR_address)
TOR_latitude = TOR_location.latitude
TOR_longitude = TOR_location.longitude
#Location of Montreal
geolocator = Nominatim(user_agent="MON_agent")
MON_location = geolocator.geocode(MON_address)
MON_latitude = MON_location.latitude
MON_longitude = MON_location.longitude



print('Location of Vancouver', VAN_latitude , VAN_longitude)
print('Location of Toronto', TOR_latitude, TOR_longitude)
print('Location of Montreal', MON_latitude, MON_longitude)

Location of Vancouver 49.2608724 -123.1139529
Location of Toronto 43.6534817 -79.3839347
Location of Montreal 45.4972159 -73.6103642
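The three geocoding blocks above repeat the same steps for each city. A minimal sketch of a helper that geocodes any list of addresses into a dictionary, assuming a geopy-style geolocator whose `geocode()` returns an object with `latitude` and `longitude` attributes, or `None` when nothing is found. The helper name `geocode_cities` is hypothetical, and the `FakeGeolocator` class is an offline stand-in that simply returns the coordinates printed above.

```python
from types import SimpleNamespace


def geocode_cities(geolocator, addresses):
    """Map each address to a (latitude, longitude) tuple.

    `geolocator` is any object with a geopy-style geocode(address) method,
    e.g. geopy.geocoders.Nominatim; one instance is reused for all addresses.
    """
    coords = {}
    for address in addresses:
        location = geolocator.geocode(address)
        if location is None:  # geopy returns None when no match is found
            raise ValueError(f"Could not geocode {address!r}")
        coords[address] = (location.latitude, location.longitude)
    return coords


# Offline stand-in for Nominatim so the sketch runs without network access
class FakeGeolocator:
    _known = {
        'Vancouver, BC': (49.2608724, -123.1139529),
        'Toronto, ON': (43.6534817, -79.3839347),
        'Montreal, QC': (45.4972159, -73.6103642),
    }

    def geocode(self, address):
        if address not in self._known:
            return None
        lat, lng = self._known[address]
        return SimpleNamespace(latitude=lat, longitude=lng)


city_coords = geocode_cities(FakeGeolocator(), ['Vancouver, BC', 'Toronto, ON', 'Montreal, QC'])
print(city_coords['Toronto, ON'])  # (43.6534817, -79.3839347)
```

With a live geocoder, the same call would take a single `Nominatim(user_agent=...)` instance instead of the fake one, so only one user agent is needed for all three cities.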


<a id="item1"></a>


## 1. Search for coffee shops within a 5 km radius of the city centres




#### To give each city the same starting conditions, we use the same search radius for all three cities: 5 km.

In [5]:
search_query = 'Coffee Shop'
radius = 5000
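Foursquare reports a `distance` in metres for each venue, but it can be useful to check independently that a venue lies inside the 5 km search radius. A minimal sketch using the haversine formula, assuming a spherical Earth with a mean radius of 6371 km; `haversine_m` and `within_radius` are hypothetical helper names. The example uses the geocoded Vancouver centre and Laura's Coffee Shop, for which Foursquare reports a distance of 891 m in the results.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; the spherical model is an approximation


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(centre, point, radius_m=5000):
    """True if `point` lies within `radius_m` metres of `centre`."""
    return haversine_m(*centre, *point) <= radius_m


vancouver_centre = (49.2608724, -123.1139529)
lauras_coffee_shop = (49.267427, -123.106913)
print(round(haversine_m(*vancouver_centre, *lauras_coffee_shop)))  # about 890 (Foursquare reports 891)
print(within_radius(vancouver_centre, lauras_coffee_shop))  # True
```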

#### We define a search URL for each city


In [6]:
url_VAN = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VAN_latitude, VAN_longitude,ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)
url_TOR = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, TOR_latitude, TOR_longitude,ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)
url_MON = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, MON_latitude, MON_longitude,ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)
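The three URLs differ only in the coordinates. A minimal sketch that builds the same search URL from a parameter dictionary with the standard library's `urllib.parse.urlencode`, which also encodes the space in `'Coffee Shop'`. The parameter names are the ones used in the URLs above; the helper name `build_search_url` is hypothetical and the credentials are placeholders.

```python
from urllib.parse import urlencode, urlparse, parse_qs

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'


def build_search_url(lat, lng, client_id, client_secret, access_token,
                     version='20180604', query='Coffee Shop', radius=5000, limit=100):
    """Return a venue-search URL with the same query parameters as the URLs above."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': f'{lat},{lng}',
        'oauth_token': access_token,
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    return SEARCH_ENDPOINT + '?' + urlencode(params)


url = build_search_url(49.2608724, -123.1139529,
                       'YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', 'YOUR_ACCESS_TOKEN')
print(parse_qs(urlparse(url).query)['query'])  # ['Coffee Shop']
```

With `requests`, passing the same dictionary as `requests.get(SEARCH_ENDPOINT, params=params)` lets the library do the encoding instead.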

#### Send the GET Request and examine the results


In [7]:
results_VAN = requests.get(url_VAN).json()
results_TOR = requests.get(url_TOR).json()
results_MON = requests.get(url_MON).json()
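`requests.get(...).json()` also returns a payload when the request fails, for example because of wrong credentials, and the `['response']['venues']` lookup in the next cell then fails with a less helpful `KeyError`. A minimal sketch of a check, assuming the Foursquare v2 response envelope with a `meta` object holding an HTTP-style `code` (200 on success) next to the `response` object. The helper name `extract_venues` is hypothetical, and the error values in the `bad` payload are illustrative only.

```python
def extract_venues(payload):
    """Return payload['response']['venues'], or raise if the API reported an error.

    Assumes the envelope {'meta': {'code': ...}, 'response': {...}}.
    """
    meta = payload.get('meta', {})
    code = meta.get('code')
    if code != 200:
        # errorType / errorDetail are read with .get() in case they are absent
        raise RuntimeError(f"Foursquare request failed with code {code}: "
                           f"{meta.get('errorType')} {meta.get('errorDetail')}")
    return payload['response'].get('venues', [])


# Example payloads shaped like the assumed envelope (illustrative values)
ok = {'meta': {'code': 200}, 'response': {'venues': [{'name': 'Blenz Coffee'}]}}
bad = {'meta': {'code': 401, 'errorType': 'invalid_auth', 'errorDetail': 'token rejected'},
       'response': {}}

print(len(extract_venues(ok)))  # 1
```

In the notebook, `venues_VAN = extract_venues(results_VAN)` would replace the direct dictionary lookup.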

#### Get relevant part of JSON and transform it into a _pandas_ dataframe


In [8]:
# assign relevant part of JSON to venues
venues_VAN = results_VAN['response']['venues']
venues_TOR = results_TOR['response']['venues']
venues_MON = results_MON['response']['venues']
# tranform venues into a dataframe
df_VAN = json_normalize(venues_VAN)
df_TOR = json_normalize(venues_TOR)
df_MON = json_normalize(venues_MON)
df_MON.tail()




| | id | name | categories | referralId | hasPerk | location.lat | location.lng | location.labeledLatLngs | location.distance | location.cc | location.city | location.state | location.country | location.formattedAddress | location.address | location.postalCode | location.crossStreet | venuePage.id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 45 | 5b84fca2c0af57002ca69274 | Sushi Shop | [{'id': '4bf58dd8d48988d1c4941735', 'name': 'R... | v-1625002763 | False | 45.47322 | -73.600918 | [{'label': 'display', 'lat': 45.47322, 'lng': ... | 2771 | CA | Montréal | QC | Canada | [1001 Boul Decarie, Montréal QC H4A 3J1, Canada] | 1001 Boul Decarie | H4A 3J1 | | |
| 46 | 5a590e4c0a464d6ac47a5982 | Dumpling Shop | [{'id': '4bf58dd8d48988d108941735', 'name': 'D... | v-1625002763 | False | 45.474707 | -73.623587 | [{'label': 'display', 'lat': 45.474707, 'lng':... | 2709 | CA | Montréal | QC | Canada | [5674 Av de Monkland, Montréal QC H4A 1E4, Can... | 5674 Av de Monkland | H4A 1E4 | | |
| 47 | 4cacabc6965c9c74802dccfa | Sushi Shop | [{'id': '4bf58dd8d48988d1c4941735', 'name': 'R... | v-1625002763 | False | 45.501355 | -73.570814 | [{'label': 'display', 'lat': 45.501355, 'lng':... | 3120 | CA | Montréal | QC | Canada | [1200 Ave McGill College (coin Cathcart), Mont... | 1200 Ave McGill College | H3B 4G7 | coin Cathcart | |
| 48 | 4b1d4ca9f964a520690e24e3 | The Body Shop | [{'id': '4bf58dd8d48988d10c951735', 'name': 'C... | v-1625002763 | False | 45.502824 | -73.571933 | [{'label': 'display', 'lat': 45.502824, 'lng':... | 3062 | CA | Montréal | QC | Canada | [4 Place Ville Marie, Montréal QC, Canada] | 4 Place Ville Marie | | | |
| 49 | 4c0fbcc3c6cf76b021c38251 | Sushi Shop | [{'id': '4bf58dd8d48988d1c4941735', 'name': 'R... | v-1625002763 | False | 45.475696 | -73.622347 | [{'label': 'display', 'lat': 45.475696, 'lng':... | 2571 | CA | Montréal | QC | Canada | [5580 Av Monkland (Marcil), Montréal QC H4A 1C... | 5580 Av Monkland | H4A 1C9 | Marcil | |


To have everything in one place, we concatenate the three dataframes into a single dataframe.


In [9]:
frames = [df_VAN, df_TOR, df_MON]
df_complete = pd.concat(frames)
df_complete.shape

(150, 19)
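`pd.concat` keeps each city's original index (0 to 49 for Montreal), so the combined dataframe contains repeated index labels, and the city of a venue can only be recovered from Foursquare's `location.city` field, which can be empty (7 Days Coffee Shop has no city in the results). A minimal sketch that tags each frame with the city it was searched for before concatenating; the column name `search_city` is hypothetical, and the toy frames stand in for `df_VAN`, `df_TOR` and `df_MON` (the Toronto row is a placeholder).

```python
import pandas as pd

# Toy stand-ins for df_VAN, df_TOR and df_MON (same column naming as json_normalize output)
demo_van = pd.DataFrame({'name': ['7 Days Coffee Shop'], 'location.city': [None]})
demo_tor = pd.DataFrame({'name': ['Example Toronto venue'], 'location.city': ['Toronto']})
demo_mon = pd.DataFrame({'name': ['Coffee Depot'], 'location.city': ['Montréal']})

searches = {'Vancouver': demo_van, 'Toronto': demo_tor, 'Montreal': demo_mon}

# tag every venue with the city whose centre was searched, then build a fresh 0..n-1 index
combined_demo = pd.concat(
    [frame.assign(search_city=city) for city, frame in searches.items()],
    ignore_index=True,
)
print(combined_demo[['name', 'search_city']])
```

Filtering later on `search_city` instead of `city` keeps venues whose Foursquare city field is missing.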

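#### Define the information of interest and filter the dataframe
We keep the venue name, the category, all location columns and the ID, and reduce each venue's list of categories to the name of its first listed category.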

In [10]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in df_complete.columns if col.startswith('location.')] + ['id']
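df = df_complete.loc[:, filtered_columns].copy() # explicit copy, so the column assignment below does not warn

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError: # explore-style results store the list under 'venue.categories'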
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
df['categories'] = df.apply(get_category_type, axis=1)

# clean column names by keeping only last term
df.columns = [column.split('.')[-1] for column in df.columns]

df

| | name | categories | address | crossStreet | lat | lng | labeledLatLngs | distance | postalCode | cc | city | state | country | formattedAddress | neighborhood | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Laura's Coffee Shop | Diner | 1945 Manitoba St. | at 4th Ave. | 49.267427 | -123.106913 | [{'label': 'display', 'lat': 49.267427, 'lng':... | 891 | V5Y 3A1 | CA | Vancouver | BC | Canada | [1945 Manitoba St. (at 4th Ave.), Vancouver BC... | | 4c48639e417b20a19bbfe0a9 |
| 1 | 7 Days Coffee Shop | Café | 920 Beatty St. | | 49.275102 | -123.117491 | [{'label': 'display', 'lat': 49.275102, 'lng':... | 1604 | | CA | | | Canada | [920 Beatty St., Canada] | | 57196f28498e2aeaefab44b2 |
| 2 | The Taste & See Coffee Shop | Coffee Shop | 1628 West 1st Avenue #128 | | 49.270256 | -123.141433 | [{'label': 'display', 'lat': 49.270256, 'lng':... | 2252 | | CA | Vancouver | BC | Canada | [1628 West 1st Avenue #128, Vancouver BC, Canada] | | 586453fa0037eb3be739c864 |
| 3 | Delicatessen Coffee Shop | Sandwich Place | | Davie at Burrard | 49.278508 | -123.129527 | [{'label': 'display', 'lat': 49.27850795147412... | 2265 | | CA | Vancouver | BC | Canada | [Davie at Burrard, Vancouver BC, Canada] | | 4eda85d546907c1b42d3711e |
| 4 | Eclettico Coffee Shop | Café | | | 49.247820 | -123.089950 | [{'label': 'display', 'lat': 49.24782, 'lng': ... | 2269 | V5V 4E8 | CA | Vancouver | BC | Canada | [Vancouver BC V5V 4E8, Canada] | | 5cca5b711fa763002ca67636 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 45 | Sushi Shop | Restaurant | 1001 Boul Decarie | | 45.473220 | -73.600918 | [{'label': 'display', 'lat': 45.47322, 'lng': ... | 2771 | H4A 3J1 | CA | Montréal | QC | Canada | [1001 Boul Decarie, Montréal QC H4A 3J1, Canada] | | 5b84fca2c0af57002ca69274 |
| 46 | Dumpling Shop | Dumpling Restaurant | 5674 Av de Monkland | | 45.474707 | -73.623587 | [{'label': 'display', 'lat': 45.474707, 'lng':... | 2709 | H4A 1E4 | CA | Montréal | QC | Canada | [5674 Av de Monkland, Montréal QC H4A 1E4, Can... | | 5a590e4c0a464d6ac47a5982 |
| 47 | Sushi Shop | Restaurant | 1200 Ave McGill College | coin Cathcart | 45.501355 | -73.570814 | [{'label': 'display', 'lat': 45.501355, 'lng':... | 3120 | H3B 4G7 | CA | Montréal | QC | Canada | [1200 Ave McGill College (coin Cathcart), Mont... | | 4cacabc6965c9c74802dccfa |
| 48 | The Body Shop | Cosmetics Shop | 4 Place Ville Marie | | 45.502824 | -73.571933 | [{'label': 'display', 'lat': 45.502824, 'lng':... | 3062 | | CA | Montréal | QC | Canada | [4 Place Ville Marie, Montréal QC, Canada] | | 4b1d4ca9f964a520690e24e3 |


#### Looking at the data, the search has also returned many restaurants, sushi places, etc. We are, however, only looking for actual coffee shops, so we will have to clean the dataframe and remove everything that is not a coffee shop or café.


#### Let's visualize all venues returned by the search. This only shows the raw data, not yet an evaluation.

In [11]:
df.name

0             Laura's Coffee Shop
1              7 Days Coffee Shop
2     The Taste & See Coffee Shop
3        Delicatessen Coffee Shop
4           Eclettico Coffee Shop
                 ...             
45                     Sushi Shop
46                  Dumpling Shop
47                     Sushi Shop
48                  The Body Shop
49                     Sushi Shop
Name: name, Length: 150, dtype: object

In [12]:
venues_map = folium.Map(location=[TOR_latitude, TOR_longitude], zoom_start=3) # generate map centred around Toronto

# add a red circle marker for each city centre
for city, lat, lng in [('Toronto', TOR_latitude, TOR_longitude),
                       ('Vancouver', VAN_latitude, VAN_longitude),
                       ('Montreal', MON_latitude, MON_longitude)]:
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        color='red',
        popup=city,
        fill=True,
        fill_color='red',
        fill_opacity=0.6
    ).add_to(venues_map)

# add all returned venues as blue circle markers (the popup shows the venue category)
for lat, lng, label in zip(df.lat, df.lng, df.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

<a id="item2"></a>


## Methodology

With this data we will determine which city has the highest density of coffee shops. First, we filter for actual coffee shops or similar venues, which means that restaurants, sushi places, etc. are removed from the dataframe.

After cleaning the data, we compare the number of coffee shops near each city centre using a folium map. A simple map showing the counts is sufficient to see which city has the highest density, and this leads to the city selection.

Since the coffee has to be distributed, we go one step further and cluster the coffee shops of the selected city into n areas. For the clustering we assume that each coffee shop has to receive fresh coffee once per week, that the coffee is delivered by bike and, as a first estimate, that 8 coffee shops can be supplied per day. Each cluster then shows the area the bike has to supply on one day.
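The delivery assumption turns into a simple calculation of how many clusters, i.e. delivery days, are needed: the number of coffee shops divided by the daily capacity, rounded up. A minimal sketch; the capacity of 8 shops per day comes from the text, while the 5-day working week and the helper names `delivery_days` and `fits_in_week` are assumptions for illustration.

```python
import math


def delivery_days(n_shops, shops_per_day=8):
    """Days needed to supply every shop once, at `shops_per_day` deliveries per day."""
    return math.ceil(n_shops / shops_per_day)


def fits_in_week(n_shops, shops_per_day=8, working_days=5):
    """True if one bike can supply all shops within one working week."""
    return delivery_days(n_shops, shops_per_day) <= working_days


print(delivery_days(37))  # 5
print(fits_in_week(41))   # False: 41 shops need 6 days
```

The number of clusters `n` is then `delivery_days(...)` for the coffee shops of the selected city; more than 40 shops would exceed one bike's weekly capacity under these assumptions.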


## Analysis

As seen in the earlier description of the dataframe, we need to concentrate on the coffee shops. Restaurants, sushi places and similar venues are not what the customer is looking for and are removed from the dataframe. We keep only venues whose name contains "Coffee" or "Café":

In [13]:
df_coffee= df[df["name"].str.contains("Coffee|Café")==True]
df_coffee

| | name | categories | address | crossStreet | lat | lng | labeledLatLngs | distance | postalCode | cc | city | state | country | formattedAddress | neighborhood | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Laura's Coffee Shop | Diner | 1945 Manitoba St. | at 4th Ave. | 49.267427 | -123.106913 | [{'label': 'display', 'lat': 49.267427, 'lng':... | 891 | V5Y 3A1 | CA | Vancouver | BC | Canada | [1945 Manitoba St. (at 4th Ave.), Vancouver BC... | | 4c48639e417b20a19bbfe0a9 |
| 1 | 7 Days Coffee Shop | Café | 920 Beatty St. | | 49.275102 | -123.117491 | [{'label': 'display', 'lat': 49.275102, 'lng':... | 1604 | | CA | | | Canada | [920 Beatty St., Canada] | | 57196f28498e2aeaefab44b2 |
| 2 | The Taste & See Coffee Shop | Coffee Shop | 1628 West 1st Avenue #128 | | 49.270256 | -123.141433 | [{'label': 'display', 'lat': 49.270256, 'lng':... | 2252 | | CA | Vancouver | BC | Canada | [1628 West 1st Avenue #128, Vancouver BC, Canada] | | 586453fa0037eb3be739c864 |
| 3 | Delicatessen Coffee Shop | Sandwich Place | | Davie at Burrard | 49.278508 | -123.129527 | [{'label': 'display', 'lat': 49.27850795147412... | 2265 | | CA | Vancouver | BC | Canada | [Davie at Burrard, Vancouver BC, Canada] | | 4eda85d546907c1b42d3711e |
| 4 | Eclettico Coffee Shop | Café | | | 49.247820 | -123.089950 | [{'label': 'display', 'lat': 49.24782, 'lng': ... | 2269 | V5V 4E8 | CA | Vancouver | BC | Canada | [Vancouver BC V5V 4E8, Canada] | | 5cca5b711fa763002ca67636 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1 | Solo Coffee Shop | Café | 65 Sherbrooke Est, bureau 110 | | 45.514187 | -73.568269 | [{'label': 'display', 'lat': 45.51418721002326... | 3788 | H2X 1C4 | CA | Montréal | QC | Canada | [65 Sherbrooke Est, bureau 110, Montréal QC H2... | | 4d99d98c647d8cfae217113e |
| 2 | Vortex CoffeeShop | Coffee Shop | 40, rue Jean-Talon Est | | 45.534908 | -73.617304 | [{'label': 'display', 'lat': 45.53490842878835... | 4230 | H2R 1S3 | CA | Montréal | QC | Canada | [40, rue Jean-Talon Est, Montréal QC H2R 1S3, ... | | 4d371c62e4b4a093ea2c2a36 |
| 3 | Coffee Depot | Café | | | 45.502210 | -73.566780 | [{'label': 'display', 'lat': 45.50221, 'lng': ... | 3445 | H3B 3C1 | CA | Montréal | QC | Canada | [Montréal QC H3B 3C1, Canada] | | 5783d19b498e6150d6db1420 |
| 4 | Coffee machine BBR | Corporate Coffee Shop | 606, rue Cathcart, 10e étage | | 45.503467 | -73.568119 | [{'label': 'display', 'lat': 45.503467, 'lng':... | 3368 | | CA | Montréal | QC | Canada | [606, rue Cathcart, 10e étage, Montréal QC, Ca... | | 587f8a91fad9dc66ac900157 |
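Filtering on the venue name is a quick heuristic: it keeps venues such as Laura's Coffee Shop (a Diner) or Y57 Coffee Shop Series (a Music Venue), and it misses coffee shops whose names contain neither word. A minimal sketch of an alternative that also looks at the Foursquare category; the set `COFFEE_CATEGORIES` is a hypothetical choice based on the categories visible in the tables, the toy frame reuses venues from those tables, and "Hypothetical Espresso Bar" is an invented example of a coffee shop without "Coffee" in its name.

```python
import pandas as pd

# Hypothetical choice of coffee-related categories, based on the tables above
COFFEE_CATEGORIES = {'Coffee Shop', 'Café', 'Corporate Coffee Shop'}

# Toy frame: names and categories from the results, plus one invented venue
venues_demo = pd.DataFrame({
    'name': ["Laura's Coffee Shop", 'Blenz Coffee', 'Y57 Coffee Shop Series',
             'Sushi Shop', 'Eclettico Coffee Shop', 'Hypothetical Espresso Bar'],
    'categories': ['Diner', 'Coffee Shop', 'Music Venue',
                   'Restaurant', 'Café', 'Coffee Shop'],
})

by_name = venues_demo['name'].str.contains('Coffee|Café', na=False)
by_category = venues_demo['categories'].isin(COFFEE_CATEGORIES)

# strict: the category must be coffee-related (drops the diner and the music venue)
coffee_strict = venues_demo[by_category]
# broad: either criterion is enough (also catches coffee shops with non-coffee names)
coffee_broad = venues_demo[by_name | by_category]
print(coffee_strict['name'].tolist())
```

Which variant fits better depends on whether diners or music venues that serve coffee count as potential customers for the roastery.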


The result is more focused on coffee. Next, we create a marker map to see at a glance which city has the most potential customers, focusing on the number of coffee shops in the central part of each city (5 km radius). On the folium map, the marker cluster bubbles show the number of coffee shops in each area.

In [14]:
from folium.plugins import MarkerCluster

map_canada = folium.Map(location=[TOR_latitude, TOR_longitude], zoom_start=3)

# instantiate a marker cluster object for the coffee shops in the dataframe
shops = MarkerCluster().add_to(map_canada)

# loop through the dataframe and add each coffee shop to the marker cluster
for lat, lng, label in zip(df_coffee.lat, df_coffee.lng, df_coffee.name):
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(shops)

# display map
map_canada
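The marker map has to be zoomed in to compare the cities. A minimal sketch that gives the same comparison as a table, counting coffee shops per city with pandas `value_counts`; the helper name `coffee_shops_per_city` is hypothetical and the toy frame reuses a few rows from the results. Foursquare's `city` field varies in spelling (e.g. `Montréal`) and can be empty, which `dropna=False` keeps visible instead of silently dropping.

```python
import pandas as pd


def coffee_shops_per_city(df, city_column='city'):
    """Number of rows per city, largest first; missing cities are counted as NaN."""
    return df[city_column].value_counts(dropna=False)


# Toy frame using a few names and city values from the results above
coffee_demo = pd.DataFrame({
    'name': ["Laura's Coffee Shop", '7 Days Coffee Shop', 'Blenz Coffee', 'Coffee Depot'],
    'city': ['Vancouver', None, 'Vancouver', 'Montréal'],
})
print(coffee_shops_per_city(coffee_demo))
```

In the notebook, `coffee_shops_per_city(df_coffee)` would give the counts behind the marker map.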

Zooming in shows that Vancouver has the most coffee shops within a 5 km radius, closely followed by Toronto, while downtown Montreal has only about a third as many. Based on this, we focus on Vancouver. The next step is to divide the city into delivery areas. Since the coffee is to be delivered by bike, each area should be compact, so we cluster the coffee shops by location: shops that are close together end up in the same cluster, and each cluster becomes the delivery area for one day of the week. Note that k-means groups points by proximity; it does not guarantee that the clusters cover equal areas or contain equal numbers of coffee shops. First, we create a dataset that only includes the Vancouver coffee shops.

In [15]:
df_coffeeVAN= df_coffee[df_coffee["city"].str.contains("Vancouver")==True]
df_coffeeVAN

| | name | categories | address | crossStreet | lat | lng | labeledLatLngs | distance | postalCode | cc | city | state | country | formattedAddress | neighborhood | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Laura's Coffee Shop | Diner | 1945 Manitoba St. | at 4th Ave. | 49.267427 | -123.106913 | [{'label': 'display', 'lat': 49.267427, 'lng':... | 891 | V5Y 3A1 | CA | Vancouver | BC | Canada | [1945 Manitoba St. (at 4th Ave.), Vancouver BC... | | 4c48639e417b20a19bbfe0a9 |
| 2 | The Taste & See Coffee Shop | Coffee Shop | 1628 West 1st Avenue #128 | | 49.270256 | -123.141433 | [{'label': 'display', 'lat': 49.270256, 'lng':... | 2252 | | CA | Vancouver | BC | Canada | [1628 West 1st Avenue #128, Vancouver BC, Canada] | | 586453fa0037eb3be739c864 |
| 3 | Delicatessen Coffee Shop | Sandwich Place | | Davie at Burrard | 49.278508 | -123.129527 | [{'label': 'display', 'lat': 49.27850795147412... | 2265 | | CA | Vancouver | BC | Canada | [Davie at Burrard, Vancouver BC, Canada] | | 4eda85d546907c1b42d3711e |
| 4 | Eclettico Coffee Shop | Café | | | 49.24782 | -123.08995 | [{'label': 'display', 'lat': 49.24782, 'lng': ... | 2269 | V5V 4E8 | CA | Vancouver | BC | Canada | [Vancouver BC V5V 4E8, Canada] | | 5cca5b711fa763002ca67636 |
| 5 | Mama and Papa Cuban Coffee Shop | Cuban Restaurant | | | 49.281197 | -123.095818 | [{'label': 'display', 'lat': 49.28119659423828... | 2617 | | CA | Vancouver | BC | Canada | [Vancouver BC, Canada] | | 53da9e06498e61b428ee9a31 |
| 6 | The Coffee Shop | Café | 2305 West 41st | | 49.234648 | -123.159866 | [{'label': 'display', 'lat': 49.234648, 'lng':... | 4433 | V6m | CA | Vancouver | BC | Canada | [2305 West 41st, Vancouver BC V6m, Canada] | | 4c213b4d99282d7fde4865b0 |
| 7 | Y57 Coffee Shop Series | Music Venue | 2610 West 4th Avenue | | 49.268323 | -123.16456 | [{'label': 'display', 'lat': 49.26832260906662... | 3768 | | CA | Vancouver | BC | Canada | [2610 West 4th Avenue, Vancouver BC, Canada] | | 4eea7ca08b81bff6aa21cde9 |
| 8 | Cultured Coffee & Tea | Coffee Shop | 555 W 12th Ave | at Cambie St in City Square Shopping Centre | 49.260936 | -123.116359 | [{'label': 'display', 'lat': 49.26093629933292... | 174 | V5Z 3X7 | CA | Vancouver | BC | Canada | [555 W 12th Ave (at Cambie St in City Square S... | | 4be48099910020a134ced114 |
| 9 | Blenz Coffee | Coffee Shop | 521 West Broadway | at Cambie St | 49.263438 | -123.115765 | [{'label': 'display', 'lat': 49.26343759196383... | 314 | V5Z 1E6 | CA | Vancouver | BC | Canada | [521 West Broadway (at Cambie St), Vancouver B... | | 4aa8320cf964a520f94f20e3 |
| 10 | Elysian Coffee | Coffee Shop | 2301 Ontario St | at 7th Ave | 49.264621 | -123.104963 | [{'label': 'display', 'lat': 49.26462056353454... | 775 | V5T 2X5 | CA | Vancouver | BC | Canada | [2301 Ontario St (at 7th Ave), Vancouver BC V5... | | 5478ca03498e3bbc45e6ce54 |


Now we will only focus on the positional data and remove all the columns that are not necessary. Then we will normalize this data to use it for the clustering method.

In [16]:
df_positiononly=df_coffeeVAN.drop(columns=['name', 'categories', 'address', 'crossStreet',
       'labeledLatLngs', 'distance', 'postalCode', 'cc', 'city', 'state',
       'country', 'formattedAddress', 'neighborhood', 'id'])
X = df_positiononly.values[:,1:]
X = np.nan_to_num(X)
cluster_dataset = StandardScaler().fit_transform(X)
cluster_dataset


array([[ 0.70343385],
       [-1.41824867],
       [-0.68648686],
       [ 1.74602078],
       [ 1.38538602],
       [-2.55118534],
       [-2.8396655 ],
       [ 0.12286536],
       [ 0.1593774 ],
       [ 0.82331051],
       [ 0.20113119],
       [ 1.05591473],
       [ 0.73529661],
       [-0.64330179],
       [-0.34165104],
       [ 0.06311843],
       [ 0.48675615],
       [ 0.58745428],
       [ 0.10190121],
       [-0.08345408],
       [-0.17012162],
       [-0.23363702],
       [ 0.3515349 ],
       [ 0.39443074],
       [-1.96560407],
       [ 1.06988504],
       [ 1.08499252],
       [ 0.38895877],
       [-1.64527623],
       [ 0.39348255],
       [ 0.23913101],
       [ 0.14655482],
       [-0.6190529 ],
       [ 0.88265825],
       [ 0.13485104],
       [ 0.20607476],
       [-0.26683579]])
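The scaled array above has a single column: after the drop, `df_positiononly` holds only `lat` and `lng`, and `values[:,1:]` skips the first of them, so the clustering runs on longitude alone and groups shops into strips of similar longitude regardless of how far apart they are north to south. A minimal sketch of a feature matrix that keeps both coordinates, assuming an equirectangular projection to kilometres around the mean position (adequate over a 5 km radius) instead of `StandardScaler`, so that one unit means the same ground distance in both directions; `coordinates_to_km` is a hypothetical helper name and the example uses three Vancouver coffee shops from the table.

```python
import numpy as np

KM_PER_DEGREE = 111.195  # km per degree of latitude on a sphere of radius 6371 km


def coordinates_to_km(lat, lng):
    """Project (lat, lng) in degrees to an (n, 2) array of (x_km, y_km) offsets.

    Equirectangular approximation around the mean position: a degree of
    longitude is shortened by cos(mean latitude).
    """
    lat = np.asarray(lat, dtype=float)
    lng = np.asarray(lng, dtype=float)
    lat0 = lat.mean()
    x = (lng - lng.mean()) * KM_PER_DEGREE * np.cos(np.radians(lat0))
    y = (lat - lat0) * KM_PER_DEGREE
    return np.column_stack([x, y])


# Laura's Coffee Shop, Blenz Coffee and Elysian Coffee from the table above
demo_lat = [49.267427, 49.263438, 49.264621]
demo_lng = [-123.106913, -123.115765, -123.104963]
X_km = coordinates_to_km(demo_lat, demo_lng)
print(X_km.shape)  # (3, 2)
```

In the notebook, `X_km = coordinates_to_km(df_coffeeVAN['lat'], df_coffeeVAN['lng'])` could be passed to `KMeans(n_clusters=kclusters, random_state=0).fit(X_km)` in place of `cluster_dataset`; the resulting labels would differ from the ones shown in the next cell.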

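Following the assumption above (each coffee shop is supplied once per week and 8 coffee shops can be supplied per day), the 37 Vancouver coffee shops in the dataset need 37 / 8 ≈ 4.6, rounded up to 5 delivery days, so we use 5 clusters, one for each working day of the week.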

In [17]:
kclusters = 5

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(cluster_dataset)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 1, 4, 2, 2, 3, 3, 0, 0, 2], dtype=int32)
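K-means does not constrain cluster sizes, so a cluster can contain more coffee shops than the 8 deliveries assumed per day. A minimal sketch that counts shops per cluster and flags overloaded days; the helper name `overloaded_clusters` is hypothetical and the labels in the example are toy values, not the notebook's results.

```python
from collections import Counter


def overloaded_clusters(labels, capacity=8):
    """Return {cluster: n_shops} for every cluster with more shops than `capacity`."""
    sizes = Counter(int(label) for label in labels)
    return {cluster: n for cluster, n in sorted(sizes.items()) if n > capacity}


# Toy labels: cluster 0 has 10 shops, cluster 1 has 3
toy_labels = [0] * 10 + [1] * 3
print(overloaded_clusters(toy_labels))  # {0: 10}
```

Applied as `overloaded_clusters(kmeans.labels_)`, an empty dictionary means every delivery day stays within the assumed capacity.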

Now we visualize the coffee shops coloured by cluster, to show which areas could be supplied in one day. For this we add the cluster label to the dataframe, and the name of each coffee shop is shown in its popup for identification. With this, the distribution could start.

In [18]:
df_visual = df_coffeeVAN.copy() # copy, so that df_coffeeVAN itself is not modified
df_visual.insert(0, 'Cluster Labels', kmeans.labels_)


# create map
map_clusters = folium.Map(location=[VAN_latitude, VAN_longitude], zoom_start=11)

# set color scheme for the clusters: one distinct rainbow colour per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, coloured by cluster label (labels run from 0 to kclusters-1)
for lat, lon, poi, cluster in zip(df_visual['lat'], df_visual['lng'], df_visual['name'], df_visual['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Results and Discussion

The final map shows how the coffee could be distributed in Vancouver. Vancouver was chosen as the base for the coffee roastery because it had the highest number of coffee shops within a 5 km radius of the city centre, ahead of Toronto and Montreal. The underlying assumption is that the more coffee shops there are, the greater the potential demand for beans from a roastery, so among the three cities Vancouver offers the best chance of success.

The analysis does, however, have some weaknesses that have to be evaluated in more detail. The coffee shops come from a single database, so not all coffee shops may have been captured, and other databases should be evaluated as well. The searches may also have been capped: the three searches returned 150 venues in total, 50 of them for Montreal, although LIMIT was set to 100, so the comparison may reflect only the first results of each search rather than every venue within 5 km. Finally, Montreal shows far fewer coffee shops than the other two cities. The reasons for this should be looked into as well; for example, the geocoded centre of Montreal lies about 3 km from downtown venues on Ave McGill College, so moving the search centre could reveal an area with a higher density of coffee shops.

The analysis focused on three cities in Canada. Other cities might have a higher density of coffee shops; this would have to be examined if a different city is an option.

Furthermore, the coffee shops probably already have their own roastery or distributor that supplies their coffee beans. This is difficult to evaluate, but a first step would be to check which coffee shops are part of a franchise and how many coffee roasteries are already in the area.





## Conclusion

The analysis gave a first insight into which city is most likely to offer a good customer base for a coffee roastery. Further analysis should examine where the roastery could physically be based, i.e. in the city centre or further outside the city, as this would have a large effect on the running costs.

As the names of the coffee shops are available, a survey should be done to find out whether there is a need for a different supplier of coffee beans. This is the proposed next step in gauging acceptance among potential customers.