# Analyzing Coffee Shops in Vancouver, Canada

### Introduction / Business Problem

Vancouver, Canada is considered one of the most livable cities in the world, thanks in part to its weather. It is also the largest city in Western Canada and the third-largest metropolitan area in Canada, after Toronto and Montreal. People in Vancouver love coffee, and the city has several local coffee chains, including Waves Coffee and Blenz Coffee, yet the demand for good coffee shops is never fully met. We are looking for a location for our new coffee shop, preferably in a neighborhood with fewer competitors. We will analyze and cluster venue data to find the best location for the new business.

### Data

This project uses the following data:

1. Neighborhood names from the City of Vancouver website
2. The geopy geocoder library, to find the latitude and longitude of each neighborhood
3. Foursquare location data, to find venues in each neighborhood

### Importing Libraries

In [65]:
import numpy as np
import pandas as pd
import json
import requests
from pandas import json_normalize # transform JSON file into a pandas dataframe
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium # map rendering library
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
from bs4 import BeautifulSoup

### Scrape the City of Vancouver Government Website for the List of Neighborhoods

In [56]:
url = 'https://vancouver.ca/news-calendar/areas-of-the-city.aspx'

html = requests.get(url).text
soup = BeautifulSoup(html, "html.parser")

neighborhood_list = soup.find("div", {"role": "navigation"}).find_all('ul')[1].find_all('a')
neighborhood_list

[<a href="#" onclick="window.location = $(this).next().attr('href');" tabindex="-1"><div></div></a>,
 <a class="menulink" href="/news-calendar/arbutus-ridge.aspx?">Arbutus Ridge</a>,
 <a href="#" onclick="window.location = $(this).next().attr('href');" tabindex="-1"><div></div></a>,
 <a class="menulink" href="/news-calendar/downtown.aspx?">Downtown</a>,
 <a href="#" onclick="window.location = $(this).next().attr('href');" tabindex="-1"><div></div></a>,
 <a class="menulink" href="/news-calendar/dunbar-southlands.aspx?">Dunbar-Southlands</a>,
 <a href="#" onclick="window.location = $(this).next().attr('href');" tabindex="-1"><div></div></a>,
 <a class="menulink" href="/news-calendar/fairview.aspx?">Fairview</a>,
 <a href="#" onclick="window.location = $(this).next().attr('href');" tabindex="-1"><div></div></a>,
 <a class="menulink" href="/news-calendar/grandview-woodland.aspx?">Grandview-Woodland   </a>,
 <a href="#" onclick="window.location = $(this).next().attr('href');" tabindex="-1">
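The output above alternates between empty placeholder links and the `menulink` anchors that carry the neighborhood names, and at least one name has trailing whitespace. As a minimal sketch of the filtering step, the block below pulls out only the `menulink` text and strips it. It uses the standard library's `html.parser` as a stand-in for BeautifulSoup so that it runs without extra packages; the class name `MenuLinkParser` and the abridged `sample_html` are assumptions made for illustration, not part of the original notebook.

```python
from html.parser import HTMLParser


class MenuLinkParser(HTMLParser):
    """Collect the stripped text of every <a class="menulink"> element."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_menulink = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and "menulink" in classes:
            self._in_menulink = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_menulink:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_menulink:
            name = "".join(self._buffer).strip()
            if name:
                self.names.append(name)
            self._in_menulink = False


# Abridged markup in the shape of the scraped output (onclick attributes omitted)
sample_html = """
<a href="#" tabindex="-1"><div></div></a>
<a class="menulink" href="/news-calendar/arbutus-ridge.aspx?">Arbutus Ridge</a>
<a href="#" tabindex="-1"><div></div></a>
<a class="menulink" href="/news-calendar/grandview-woodland.aspx?">Grandview-Woodland   </a>
"""

parser = MenuLinkParser()
parser.feed(sample_html)
neighborhood_names = parser.names
print(neighborhood_names)
```

Selecting on the `menulink` class rather than on non-empty text makes the intent explicit, and stripping the text avoids names such as `'Grandview-Woodland   '` ending up in the dataframe.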

In [60]:
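# Create a pandas dataframe of neighborhood names,
# skipping the empty placeholder links and stripping stray whitespace
# (DataFrame.append was removed in pandas 2.0, so the rows are collected in a list first)

names = [a.text.strip() for a in neighborhood_list if a.text.strip() != '']

df = pd.DataFrame({'Neighborhood': names})

df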

| | Neighborhood |
|---|---|
| 0 | Arbutus Ridge |
| 1 | Downtown |
| 2 | Dunbar-Southlands |
| 3 | Fairview |
| 4 | Grandview-Woodland |
| 5 | Hastings-Sunrise |
| 6 | Kensington-Cedar Cottage |
| 7 | Kerrisdale |
| 8 | Killarney |
| 9 | Kitsilano |


### Get Location Data for Each Neighborhood

In [70]:
geolocator = Nominatim(user_agent="foursquare_agent")

# Geocode each neighborhood and store its coordinates in the dataframe
for index, row in df.iterrows():
    neighborhood = row['Neighborhood']

    location = geolocator.geocode('{}, Vancouver, British Columbia'.format(neighborhood))
    if location is None:
        # geocode() returns None when nothing matches; leave the coordinates empty
        continue

    df.at[index, 'Latitude'] = location.latitude
    df.at[index, 'Longitude'] = location.longitude

df

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Arbutus Ridge | 49.240968 | -123.167001 |
| 1 | Downtown | 49.283393 | -123.117456 |
| 2 | Dunbar-Southlands | 49.25346 | -123.185044 |
| 3 | Fairview | 49.264113 | -123.126835 |
| 4 | Grandview-Woodland | 49.270559 | -123.067942 |
| 5 | Hastings-Sunrise | 49.277594 | -123.04392 |
| 6 | Kensington-Cedar Cottage | 49.247632 | -123.084207 |
| 7 | Kerrisdale | 49.234673 | -123.155389 |
| 8 | Killarney | 49.224274 | -123.04625 |
| 9 | Kitsilano | 49.26941 | -123.155267 |
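Two things can go wrong in a geocoding loop like this one: a lookup can return no match, and a public service such as Nominatim expects requests to be spaced out (its usage policy asks for at most one request per second). The sketch below shows one way to handle both. The helper `geocode_neighborhoods`, the stand-in `fake_geocode`, and the name "Nowhere" are hypothetical and exist only for illustration; in the notebook, `geolocator.geocode` would be passed in place of the fake.

```python
import time
from types import SimpleNamespace


def geocode_neighborhoods(names, geocode, city="Vancouver, British Columbia",
                          delay_seconds=1.0):
    """Return {name: (latitude, longitude) or None}, pausing between lookups."""
    coordinates = {}
    for i, name in enumerate(names):
        if i > 0:
            time.sleep(delay_seconds)  # space out requests to the geocoding service
        location = geocode("{}, {}".format(name, city))
        # A geocoder may return None when it finds no match
        if location is None:
            coordinates[name] = None
        else:
            coordinates[name] = (location.latitude, location.longitude)
    return coordinates


def fake_geocode(query):
    """Stand-in geocoder (assumption): knows a single address, returns None otherwise."""
    known = {
        "Downtown, Vancouver, British Columbia":
            SimpleNamespace(latitude=49.283393, longitude=-123.117456),
    }
    return known.get(query)


result = geocode_neighborhoods(["Downtown", "Nowhere"], fake_geocode, delay_seconds=0)
print(result)
```

Recording `None` for failed lookups keeps every neighborhood in the result, so missing coordinates can be reviewed or filled in by hand before mapping.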


In [71]:
print('The dataframe has {} neighborhoods.'.format(df['Neighborhood'].nunique()))

The dataframe has 22 neighborhoods.


### Use the geopy library to get the latitude and longitude values of Vancouver.

In [72]:
address = 'Vancouver, BC'

location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Vancouver are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Vancouver are 49.2608724, -123.1139529.


### Create a map of Vancouver with neighborhoods superimposed on top.

In [79]:
# create map of Vancouver using latitude and longitude values
map_van = folium.Map(location=[latitude, longitude], zoom_start=11)

# add a circle marker with a popup label for each neighborhood
for lat, lng, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = folium.Popup(neighborhood, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_van)

map_van

### Utilizing the Foursquare API to Explore and Segment the Neighborhoods

In [80]:
CLIENT_ID = 'SDJSOB2DM4ZOIV1EUBQRBIA5JDDXC2FCGYGDS3P1NDGTXZBN' # your Foursquare ID
CLIENT_SECRET = '3E4TROMNVJWCRCHMD1FGA3PZYZZMHMAMJAOR1FCXOOJZ3AML' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: SDJSOB2DM4ZOIV1EUBQRBIA5JDDXC2FCGYGDS3P1NDGTXZBN
CLIENT_SECRET:3E4TROMNVJWCRCHMD1FGA3PZYZZMHMAMJAOR1FCXOOJZ3AML
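Hardcoding and printing API credentials makes them easy to leak when a notebook is shared. As a hedged alternative, the sketch below reads them from environment variables and masks them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, the helpers `load_foursquare_credentials` and `mask`, and the demo values are all assumptions for illustration; they are not part of the original notebook or of any Foursquare tooling.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from environment variables (names are assumptions)."""
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("Set {} before running the notebook".format(missing)) from None


def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Demo with a plain dict standing in for os.environ (made-up values)
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id-123",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret-456",
}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID: " + mask(client_id))
print("CLIENT_SECRET: " + mask(client_secret))
```

Calling `load_foursquare_credentials()` with no argument reads from the real environment, so the secret never needs to appear in the notebook or its saved output.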


### Get the top 100 venues within 500 meters of every neighborhood.

In [None]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=100):
    """Return a dataframe of up to `limit` venues within `radius` meters of each neighborhood."""

    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            limit)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into a single dataframe
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Neighborhood',
                 'Neighborhood Latitude',
                 'Neighborhood Longitude',
                 'Venue',
                 'Venue Latitude',
                 'Venue Longitude',
                 'Venue Category'])

    return nearby_venues
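The function above does two separable jobs: it builds the request URL and it flattens the JSON response into rows. The sketch below isolates both steps so they can be checked without calling the API. It assumes the response shape that the function indexes into (`response` → `groups` → `items` → `venue`); the helpers `build_explore_url` and `extract_venues`, the demo credentials, and the sample payload with its venue names are made up for illustration. Unlike the original, it tolerates a venue with an empty `categories` list.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the explore URL, letting urlencode escape the query parameters."""
    query = urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    })
    return "https://api.foursquare.com/v2/venues/explore?" + query


def extract_venues(payload, name, lat, lng):
    """Flatten one response into (neighborhood, lat, lng, venue, vlat, vlng, category) rows."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None  # no category listed
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


url = build_explore_url("demo-id", "demo-secret", "20180605", 49.283393, -123.117456)
params = parse_qs(urlsplit(url).query)

# Made-up payload in the assumed response shape
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Cafe",
               "location": {"lat": 49.2835, "lng": -123.1170},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Example Plaza",
               "location": {"lat": 49.2830, "lng": -123.1180},
               "categories": []}},
]}]}}

rows = extract_venues(sample_payload, "Downtown", 49.283393, -123.117456)
print(rows)
```

Keeping the parsing in its own function makes it possible to test the row layout against a saved response before spending API quota on all neighborhoods.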