# Analyzing Coffee Shop Locations in Vancouver, Canada

### Introduction / Business Problem

Vancouver, Canada is considered one of the most livable cities in the world, thanks in part to its weather. It is also the largest city in Western Canada and the third-largest metropolitan area in Canada after Toronto and Montreal. People in Vancouver love coffee, and the city has many local coffee franchises, including Waves Coffee and Blenz Coffee, but the demand for good coffee shops is never fully met. We are looking for a location for our new coffee shop, preferably in a neighbourhood with fewer competitors. We will analyze and cluster the data to find the best location for our new business.

### Data

This project uses the following data:

1. Neighborhood names from the City of Vancouver
2. The Geopy library, to find the latitude and longitude of each neighborhood
3. Foursquare location data, to find venues in each neighborhood

### Importing Libraries

In [354]:
import numpy as np
import pandas as pd
import json
import requests
from pandas import json_normalize # transform JSON file into a pandas dataframe
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium # map rendering library
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
from bs4 import BeautifulSoup

### Scrape the City of Vancouver Government Website for the List of Neighborhoods

In [355]:
url = 'https://vancouver.ca/news-calendar/areas-of-the-city.aspx'

html = requests.get(url).text
soup = BeautifulSoup(html, "html.parser")

neighborhood_list = soup.find("div", {"role": "navigation"}).find_all('ul')[1].find_all('a')
neighborhood_list

[<a href="#" onclick="window.location = $(this).next().attr('href');" tabindex="-1"><div></div></a>,
 <a class="menulink" href="/news-calendar/arbutus-ridge.aspx?">Arbutus Ridge</a>,
 <a href="#" onclick="window.location = $(this).next().attr('href');" tabindex="-1"><div></div></a>,
 <a class="menulink" href="/news-calendar/downtown.aspx?">Downtown</a>,
 <a href="#" onclick="window.location = $(this).next().attr('href');" tabindex="-1"><div></div></a>,
 <a class="menulink" href="/news-calendar/dunbar-southlands.aspx?">Dunbar-Southlands</a>,
 <a href="#" onclick="window.location = $(this).next().attr('href');" tabindex="-1"><div></div></a>,
 <a class="menulink" href="/news-calendar/fairview.aspx?">Fairview</a>,
 <a href="#" onclick="window.location = $(this).next().attr('href');" tabindex="-1"><div></div></a>,
 <a class="menulink" href="/news-calendar/grandview-woodland.aspx?">Grandview-Woodland   </a>,
 <a href="#" onclick="window.location = $(this).next().attr('href');" tabindex="-1">

In [356]:
# Create a pandas dataframe from the non-empty link texts
# (DataFrame.append was removed in pandas 2.0, so build the column in one step)

df = pd.DataFrame(
    {'Neighborhood': [a.text for a in neighborhood_list if a.text != '']}
)

df

Unnamed: 0,Neighborhood
0,Arbutus Ridge
1,Downtown
2,Dunbar-Southlands
3,Fairview
4,Grandview-Woodland
5,Hastings-Sunrise
6,Kensington-Cedar Cottage
7,Kerrisdale
8,Killarney
9,Kitsilano
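
The scraped link text can carry stray whitespace (the `Grandview-Woodland   ` entry in the scraped list ends in spaces). A minimal sketch of normalizing the names before building the dataframe, assuming the link texts have already been pulled into a plain list. `clean_names` is a hypothetical helper, not part of the original notebook:

```python
def clean_names(raw_names):
    """Strip surrounding whitespace and drop empty entries (hypothetical helper)."""
    cleaned = []
    for name in raw_names:
        name = name.strip()
        if name:  # skip the empty placeholder links
            cleaned.append(name)
    return cleaned


# Sample input mimicking the scraped link texts
raw = ['', 'Arbutus Ridge', '', 'Grandview-Woodland   ', 'South Cambie  ']
print(clean_names(raw))
```

Cleaning the names this way would also change the printed output in later cells, where the trailing spaces are still visible.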


### Get Location Data for Each Neighborhood

In [357]:
geolocator = Nominatim(user_agent="foursquare_agent")

for index, row in df.iterrows():
    neighborhood = row['Neighborhood']

    location = geolocator.geocode('{}, Vancouver, British Columbia'.format(neighborhood))
    latitude = location.latitude
    longitude = location.longitude
    
    df.at[index, 'Latitude'] = latitude
    df.at[index, 'Longitude'] = longitude
    
df

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Arbutus Ridge,49.240968,-123.167001
1,Downtown,49.283393,-123.117456
2,Dunbar-Southlands,49.25346,-123.185044
3,Fairview,49.264113,-123.126835
4,Grandview-Woodland,49.270559,-123.067942
5,Hastings-Sunrise,49.277594,-123.04392
6,Kensington-Cedar Cottage,49.247632,-123.084207
7,Kerrisdale,49.234673,-123.155389
8,Killarney,49.224274,-123.04625
9,Kitsilano,49.26941,-123.155267
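
The geocoding loop assumes every lookup succeeds, but `geolocator.geocode` returns `None` when it finds no match, and `location.latitude` would then raise an error. A minimal sketch of a more defensive loop, assuming the geocoder is passed in as a callable. `geocode_all` and the stand-in `fake_geocode` are hypothetical names used only for illustration:

```python
from collections import namedtuple


def geocode_all(names, geocode, suffix='Vancouver, British Columbia'):
    """Return (found, missing); `geocode` is any callable like Nominatim.geocode."""
    found, missing = {}, []
    for name in names:
        location = geocode('{}, {}'.format(name.strip(), suffix))
        if location is None:  # the geocoder found no match
            missing.append(name)
        else:
            found[name] = (location.latitude, location.longitude)
    return found, missing


# Stand-in geocoder so the sketch runs without network access
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])


def fake_geocode(query):
    known = {
        'Downtown, Vancouver, British Columbia': FakeLocation(49.283393, -123.117456),
    }
    return known.get(query)


found, missing = geocode_all(['Downtown', 'Nowhere'], fake_geocode)
print(found)
print(missing)
```

In the notebook, `geolocator.geocode` could be passed in place of `fake_geocode`.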


In [358]:
print('The dataframe has {} neighborhoods.'.format(
        len(df['Neighborhood'].unique())
    )
)

The dataframe has 22 neighborhoods.


### Use geopy library to get the latitude and longitude values of Vancouver.

In [359]:
address = 'Vancouver, BC'

location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Vancouver are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Vancouver are 49.2608724, -123.1139529.


### Create a map of Vancouver with neighborhoods superimposed on top.

In [360]:
# create map of Vancouver using latitude and longitude values
map_van = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = neighborhood
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_van)
    
map_van

### Utilizing the Foursquare API to Explore and Segment the Neighborhoods

In [361]:
CLIENT_ID = 'SDJSOB2DM4ZOIV1EUBQRBIA5JDDXC2FCGYGDS3P1NDGTXZBN' # your Foursquare ID
CLIENT_SECRET = '3E4TROMNVJWCRCHMD1FGA3PZYZZMHMAMJAOR1FCXOOJZ3AML' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: SDJSOB2DM4ZOIV1EUBQRBIA5JDDXC2FCGYGDS3P1NDGTXZBN
CLIENT_SECRET:3E4TROMNVJWCRCHMD1FGA3PZYZZMHMAMJAOR1FCXOOJZ3AML
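
Hardcoding and printing API credentials in a notebook exposes them to anyone who reads it. A minimal sketch of an alternative, assuming the credentials are stored in environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helpers are hypothetical:

```python
import os


def load_credentials(environ=os.environ):
    """Read Foursquare credentials from a mapping (variable names are assumptions)."""
    client_id = environ.get('FOURSQUARE_CLIENT_ID')
    client_secret = environ.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first.')
    return client_id, client_secret


def mask(secret, visible=4):
    """Show only the first few characters when printing."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)


# Demonstration with a stand-in mapping instead of the real environment
demo_env = {
    'FOURSQUARE_CLIENT_ID': 'EXAMPLEID123',
    'FOURSQUARE_CLIENT_SECRET': 'EXAMPLESECRET456',
}
client_id, client_secret = load_credentials(demo_env)
print('CLIENT_ID: ' + mask(client_id))
print('CLIENT_SECRET: ' + mask(client_secret))
```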


### Get the top 100 venues within 500 meters of every neighborhood.

In [362]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
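
The function above indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, so an empty response or a venue without a category would raise an exception. A minimal sketch of a more tolerant parsing step, assuming the response has the same shape the function relies on. `parse_explore_items` is a hypothetical helper, and the second venue in the sample payload is made up:

```python
def parse_explore_items(name, lat, lng, payload):
    """Flatten one explore response into rows; venues without a category get None."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((
            name, lat, lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            category,
        ))
    return rows


# Sample payload: the first venue comes from the notebook output, the second is invented
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Butter Baked Goods',
               'location': {'lat': 49.242209, 'lng': -123.170381},
               'categories': [{'name': 'Bakery'}]}},
    {'venue': {'name': 'Uncategorized Place',
               'location': {'lat': 49.24, 'lng': -123.17},
               'categories': []}},
]}]}}

rows = parse_explore_items('Arbutus Ridge', 49.240968, -123.167001, sample)
for row in rows:
    print(row)
```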

In [363]:
vancouver_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

Arbutus Ridge
Downtown
Dunbar-Southlands
Fairview
Grandview-Woodland   
Hastings-Sunrise
Kensington-Cedar Cottage
Kerrisdale
Killarney
Kitsilano
Marpole
Mount Pleasant
Oakridge
Renfrew-Collingwood
Riley Park
Shaughnessy
South Cambie  
Strathcona
Sunset
Victoria-Fraserview
West End
West Point Grey


In [364]:
print(vancouver_venues.shape)
vancouver_venues.head()

(674, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Arbutus Ridge,49.240968,-123.167001,Butter Baked Goods,49.242209,-123.170381,Bakery
1,Arbutus Ridge,49.240968,-123.167001,The Haven,49.241377,-123.166331,Spa
2,Arbutus Ridge,49.240968,-123.167001,Barktholomews Pet Supplies,49.242746,-123.170193,Pet Store
3,Arbutus Ridge,49.240968,-123.167001,The Dragon's Layer,49.238518,-123.169029,Nightlife Spot
4,Arbutus Ridge,49.240968,-123.167001,The Heights Market,49.237902,-123.170949,Grocery Store


From above, we can see that 674 venues are in our dataframe.

We can check how many venues are returned for each neighborhood.

In [365]:
venue_count = vancouver_venues[['Neighborhood', 'Venue']].groupby('Neighborhood').count()
venue_count.rename(columns={"Venue": "Venue Count"}, inplace=True)

venue_count

Unnamed: 0_level_0,Venue Count
Neighborhood,Unnamed: 1_level_1
Arbutus Ridge,5
Downtown,100
Dunbar-Southlands,6
Fairview,27
Grandview-Woodland,70
Hastings-Sunrise,14
Kensington-Cedar Cottage,23
Kerrisdale,39
Killarney,4
Kitsilano,48


We can see that neighborhoods like Arbutus Ridge and Shaughnessy are mostly residential, with few venues. Commercial real estate is limited in these neighborhoods, so we drop neighborhoods with 15 or fewer venues before clustering.

In [366]:
# Collect the neighborhoods with 15 or fewer venues into a list
neigh_with_few_venues = venue_count[venue_count['Venue Count'] <= 15].index.tolist()

neigh_with_few_venues

['Arbutus Ridge',
 'Dunbar-Southlands',
 'Hastings-Sunrise',
 'Killarney',
 'Oakridge',
 'Shaughnessy',
 'Sunset',
 'Victoria-Fraserview']

In [367]:
# Drop the listed neighborhoods from our dataframe
vancouver_venues = vancouver_venues[~vancouver_venues['Neighborhood'].isin(neigh_with_few_venues)]

vancouver_venues.shape

(626, 7)

Now we have only 626 venues left for our analysis.

##### To find out how many unique categories can be curated from the returned values:

In [368]:
print('There are {} unique categories.'.format(len(vancouver_venues['Venue Category'].unique())))

There are 153 unique categories.


### Analyze Each Neighborhood

In [369]:
# one hot encoding
vancouver_onehot = pd.get_dummies(vancouver_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
vancouver_onehot['Neighborhood'] = vancouver_venues['Neighborhood'] 

# move neighborhood column to the first column
nei = vancouver_onehot['Neighborhood']
vancouver_onehot.drop(labels=['Neighborhood'], axis=1,inplace = True)
vancouver_onehot.insert(0, 'Neighborhood', nei)

vancouver_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,...,Theater,Thrift / Vintage Store,Tiki Bar,Toy / Game Store,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Waterfall,Wine Shop,Women's Store,Yoga Studio
5,Downtown,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,Downtown,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,Downtown,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,Downtown,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,Downtown,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


##### Group by Neighborhood and take the mean of the frequency of occurrence of each category

In [370]:
vancouver_grouped = vancouver_onehot.groupby('Neighborhood').mean().reset_index()
vancouver_grouped.head()

Unnamed: 0,Neighborhood,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,...,Theater,Thrift / Vintage Store,Tiki Bar,Toy / Game Store,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Waterfall,Wine Shop,Women's Store,Yoga Studio
0,Downtown,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,...,0.01,0.0,0.0,0.02,0.02,0.02,0.0,0.0,0.0,0.01
1,Fairview,0.0,0.0,0.0,0.074074,0.0,0.0,0.037037,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.037037,0.037037,0.0,0.0,0.0
2,Grandview-Woodland,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.028571,...,0.0,0.0,0.0,0.014286,0.014286,0.0,0.0,0.014286,0.0,0.0
3,Kensington-Cedar Cottage,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.086957,0.0,0.0,0.0,0.0
4,Kerrisdale,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.025641,...,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0
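
Because each venue contributes a single 1 to its category column, the mean of a one-hot column is simply that category's share of the neighborhood's venues. Fairview has 27 venues, so 0.037037 corresponds to one venue (1/27) and 0.074074 to two (2/27). A minimal sketch of the same calculation without pandas, using a made-up toy list. `category_shares` is a hypothetical helper:

```python
from collections import Counter


def category_shares(categories):
    """Share of each category among one neighborhood's venues."""
    counts = Counter(categories)
    total = len(categories)
    return {category: count / total for category, count in counts.items()}


# Toy input (made up): 4 venues, 2 of them coffee shops
toy = ['Coffee Shop', 'Park', 'Coffee Shop', 'Diner']
shares = category_shares(toy)
print(shares['Coffee Shop'])  # 0.5

# Check against the Fairview row: 1 of 27 venues
print(round(1 / 27, 6))
```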


##### Print each neighborhood along with the top 5 most common venues

In [371]:
num_top_venues = 5

for hood in vancouver_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = vancouver_grouped[vancouver_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Downtown----
            venue  freq
0           Hotel  0.09
1            Café  0.05
2     Coffee Shop  0.05
3      Restaurant  0.04
4  Sandwich Place  0.03


----Fairview----
               venue  freq
0        Coffee Shop  0.15
1   Asian Restaurant  0.07
2               Park  0.07
3  Indian Restaurant  0.04
4              Diner  0.04


----Grandview-Woodland   ----
                venue  freq
0         Coffee Shop  0.10
1  Italian Restaurant  0.06
2         Pizza Place  0.04
3                Café  0.04
4   Indian Restaurant  0.04


----Kensington-Cedar Cottage----
                   venue  freq
0               Bus Stop  0.13
1            Coffee Shop  0.13
2     Chinese Restaurant  0.13
3  Vietnamese Restaurant  0.09
4       Greek Restaurant  0.04


----Kerrisdale----
                venue  freq
0         Coffee Shop  0.10
1  Chinese Restaurant  0.08
2      Sandwich Place  0.05
3    Sushi Restaurant  0.05
4            Tea Room  0.05


----Kitsilano----
                 venue  freq

##### Put this into a pandas dataframe

In [372]:
# Sort the venues in descending order

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Create the new dataframe and display the top 10 venues of each neighborhood.

In [373]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions past the 3rd fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = vancouver_grouped['Neighborhood']

for ind in np.arange(vancouver_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(vancouver_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown,Hotel,Coffee Shop,Café,Restaurant,Steakhouse,Concert Hall,Seafood Restaurant,Sandwich Place,Dessert Shop,Food Truck
1,Fairview,Coffee Shop,Park,Asian Restaurant,Diner,Japanese Restaurant,Sandwich Place,Malay Restaurant,Restaurant,Falafel Restaurant,Camera Store
2,Grandview-Woodland,Coffee Shop,Italian Restaurant,Café,Pizza Place,Park,Indian Restaurant,Sushi Restaurant,Japanese Restaurant,Bakery,Burger Joint
3,Kensington-Cedar Cottage,Bus Stop,Coffee Shop,Chinese Restaurant,Vietnamese Restaurant,American Restaurant,Ice Cream Shop,Greek Restaurant,Liquor Store,Malay Restaurant,Filipino Restaurant
4,Kerrisdale,Coffee Shop,Chinese Restaurant,Sandwich Place,Sushi Restaurant,Pharmacy,Tea Room,Italian Restaurant,Bubble Tea Shop,Business Service,Café
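
The suffix lookup in the cell above is enough for 10 columns, but it would label a 21st column as "21th". A minimal sketch of a general ordinal helper, in case `num_top_venues` is ever raised. `ordinal` is a hypothetical function, not part of the original notebook:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)
]
print(columns[:4])
```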


### Cluster Neighborhoods

Run k-means to group the neighborhoods into 3 clusters.

In [374]:
# set number of clusters
kclusters = 3

vancouver_grouped_clustering = vancouver_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(vancouver_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_ 

array([0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 2, 0, 0, 0], dtype=int32)
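
The label array follows the row order of `vancouver_grouped`, so each position corresponds to one neighborhood. A minimal sketch of pairing labels with names, assuming the 14 remaining neighborhoods appear in alphabetical order (the default ordering of `groupby`). The names are taken from the cluster tables in this notebook:

```python
from collections import defaultdict

# The 14 neighborhoods kept after filtering, assumed to be in groupby (alphabetical) order
neighborhoods = [
    'Downtown', 'Fairview', 'Grandview-Woodland', 'Kensington-Cedar Cottage',
    'Kerrisdale', 'Kitsilano', 'Marpole', 'Mount Pleasant', 'Renfrew-Collingwood',
    'Riley Park', 'South Cambie', 'Strathcona', 'West End', 'West Point Grey',
]
labels = [0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 2, 0, 0, 0]

# Collect the member neighborhoods of each cluster label
members = defaultdict(list)
for name, label in zip(neighborhoods, labels):
    members[label].append(name)

for label in sorted(members):
    print(label, members[label])
```

The label numbers themselves are arbitrary identifiers assigned by k-means and carry no ranking.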

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [375]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

vancouver_merged = df[~df['Neighborhood'].isin(neigh_with_few_venues)]

# join the top-venue columns onto the neighborhood coordinates (latitude/longitude)
vancouver_merged = vancouver_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

vancouver_merged

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Downtown,49.283393,-123.117456,0,Hotel,Coffee Shop,Café,Restaurant,Steakhouse,Concert Hall,Seafood Restaurant,Sandwich Place,Dessert Shop,Food Truck
3,Fairview,49.264113,-123.126835,0,Coffee Shop,Park,Asian Restaurant,Diner,Japanese Restaurant,Sandwich Place,Malay Restaurant,Restaurant,Falafel Restaurant,Camera Store
4,Grandview-Woodland,49.270559,-123.067942,0,Coffee Shop,Italian Restaurant,Café,Pizza Place,Park,Indian Restaurant,Sushi Restaurant,Japanese Restaurant,Bakery,Burger Joint
6,Kensington-Cedar Cottage,49.247632,-123.084207,1,Bus Stop,Coffee Shop,Chinese Restaurant,Vietnamese Restaurant,American Restaurant,Ice Cream Shop,Greek Restaurant,Liquor Store,Malay Restaurant,Filipino Restaurant
7,Kerrisdale,49.234673,-123.155389,0,Coffee Shop,Chinese Restaurant,Sandwich Place,Sushi Restaurant,Pharmacy,Tea Room,Italian Restaurant,Bubble Tea Shop,Business Service,Café
9,Kitsilano,49.26941,-123.155267,0,Bakery,American Restaurant,Coffee Shop,Ice Cream Shop,French Restaurant,Food Truck,Thai Restaurant,Japanese Restaurant,Sushi Restaurant,Performing Arts Venue
10,Marpole,49.209223,-123.13615,1,Sushi Restaurant,Dessert Shop,Vietnamese Restaurant,Pizza Place,Chinese Restaurant,Bus Stop,Japanese Restaurant,Plaza,Sandwich Place,Bubble Tea Shop
11,Mount Pleasant,49.26333,-123.096588,0,Coffee Shop,Diner,Sandwich Place,Breakfast Spot,Sushi Restaurant,Lounge,Thrift / Vintage Store,Brewery,Indian Restaurant,Arts & Crafts Store
13,Renfrew-Collingwood,49.242024,-123.057679,1,Vietnamese Restaurant,Chinese Restaurant,Park,Bank,Shanghai Restaurant,Food,Café,Fried Chicken Joint,Cantonese Restaurant,Gas Station
14,Riley Park,49.247438,-123.102966,0,Japanese Restaurant,Restaurant,Café,Coffee Shop,Arts & Crafts Store,Vegetarian / Vegan Restaurant,Thai Restaurant,Lounge,Pub,Sushi Restaurant


##### Let's visualize the resulting clusters

In [376]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(vancouver_merged['Latitude'], vancouver_merged['Longitude'], vancouver_merged['Neighborhood'], vancouver_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
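
Because the k-means labels start at 0, `rainbow[cluster-1]` looks up index -1 for cluster 0, which Python resolves to the last element of the list. Each cluster still gets a distinct color, only rotated by one position. A minimal sketch of the effect, using placeholder strings in place of the actual hex colors:

```python
palette = ['color_a', 'color_b', 'color_c']  # stand-ins for the three hex colors

for cluster in range(len(palette)):
    print(cluster, '| cluster-1 ->', palette[cluster - 1], '| cluster ->', palette[cluster])
```

Indexing with `rainbow[cluster]` would map label 0 to the first color, which may be easier to reason about.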

### Examine Clusters

##### Cluster 0

Traditional Canadian neighborhoods where coffee shops, cafés, and bakeries are popular.

In [377]:
vancouver_merged.loc[vancouver_merged['Cluster Labels'] == 0, vancouver_merged.columns[[0] + list(range(4, vancouver_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Downtown,Hotel,Coffee Shop,Café,Restaurant,Steakhouse,Concert Hall,Seafood Restaurant,Sandwich Place,Dessert Shop,Food Truck
3,Fairview,Coffee Shop,Park,Asian Restaurant,Diner,Japanese Restaurant,Sandwich Place,Malay Restaurant,Restaurant,Falafel Restaurant,Camera Store
4,Grandview-Woodland,Coffee Shop,Italian Restaurant,Café,Pizza Place,Park,Indian Restaurant,Sushi Restaurant,Japanese Restaurant,Bakery,Burger Joint
7,Kerrisdale,Coffee Shop,Chinese Restaurant,Sandwich Place,Sushi Restaurant,Pharmacy,Tea Room,Italian Restaurant,Bubble Tea Shop,Business Service,Café
9,Kitsilano,Bakery,American Restaurant,Coffee Shop,Ice Cream Shop,French Restaurant,Food Truck,Thai Restaurant,Japanese Restaurant,Sushi Restaurant,Performing Arts Venue
11,Mount Pleasant,Coffee Shop,Diner,Sandwich Place,Breakfast Spot,Sushi Restaurant,Lounge,Thrift / Vintage Store,Brewery,Indian Restaurant,Arts & Crafts Store
14,Riley Park,Japanese Restaurant,Restaurant,Café,Coffee Shop,Arts & Crafts Store,Vegetarian / Vegan Restaurant,Thai Restaurant,Lounge,Pub,Sushi Restaurant
17,Strathcona,Coffee Shop,Sandwich Place,Park,Soup Place,Pub,Café,Vietnamese Restaurant,Food Truck,Brewery,Restaurant
20,West End,Bakery,Japanese Restaurant,Greek Restaurant,Coffee Shop,Dessert Shop,American Restaurant,Farmers Market,Restaurant,Park,Indian Restaurant
21,West Point Grey,Coffee Shop,Japanese Restaurant,Café,Bus Station,Bookstore,Sporting Goods Shop,Pub,Sushi Restaurant,Vegetarian / Vegan Restaurant,Pizza Place


##### Cluster 1 (k-means label 2)

Coffee shops are extremely popular in this area: 28% of its venues are coffee shops.

In [379]:
vancouver_merged.loc[vancouver_merged['Cluster Labels'] == 2, vancouver_merged.columns[[0] + list(range(4, vancouver_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,South Cambie,Coffee Shop,Bus Stop,Sushi Restaurant,Park,Shopping Mall,Vietnamese Restaurant,Cafeteria,Café,Grocery Store,Malay Restaurant


##### Cluster 2 (k-means label 1)

Neighborhoods where international food is popular. It is possible that a large number of residents in this cluster are immigrants.

In [378]:
vancouver_merged.loc[vancouver_merged['Cluster Labels'] == 1, vancouver_merged.columns[[0] + list(range(4, vancouver_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Kensington-Cedar Cottage,Bus Stop,Coffee Shop,Chinese Restaurant,Vietnamese Restaurant,American Restaurant,Ice Cream Shop,Greek Restaurant,Liquor Store,Malay Restaurant,Filipino Restaurant
10,Marpole,Sushi Restaurant,Dessert Shop,Vietnamese Restaurant,Pizza Place,Chinese Restaurant,Bus Stop,Japanese Restaurant,Plaza,Sandwich Place,Bubble Tea Shop
13,Renfrew-Collingwood,Vietnamese Restaurant,Chinese Restaurant,Park,Bank,Shanghai Restaurant,Food,Café,Fried Chicken Joint,Cantonese Restaurant,Gas Station


### Discussion

Competition among coffee shops is strong in cluster 1. Although South Cambie has only 18 venues, 5 of them are coffee shops. What's more, the "Shopping Mall" venue in this area is the Oakridge Shopping Center, one of the largest shopping malls in the city. It would not be easy to start a new business in this market.
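
The 28% figure for South Cambie follows from the counts quoted here (5 of 18 venues). A minimal sketch of the arithmetic, compared with the coffee-shop frequencies printed earlier for the neighborhoods whose top-5 output is shown. The dictionary simply restates those printed values:

```python
# Counts quoted in the discussion for South Cambie
south_cambie_share = 5 / 18
print(round(south_cambie_share * 100))  # 28

# Coffee Shop frequencies as printed in the top-5 output
other_shares = {
    'Downtown': 0.05,
    'Fairview': 0.15,
    'Grandview-Woodland': 0.10,
    'Kensington-Cedar Cottage': 0.13,
    'Kerrisdale': 0.10,
}

# How far South Cambie sits above each listed neighborhood
for name, share in other_shares.items():
    print(name, round(south_cambie_share - share, 2))
```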

Cluster 2 consists of international neighborhoods. The venue mix suggests that a large number of residents are probably from Asia, where coffee is not as popular as it is here in North America. Although there are not many competitors, demand is relatively low in these neighborhoods.

Cluster 0 is the cluster we should look at. It consists of many traditional Canadian neighborhoods, where coffee is popular among residents. Within it, Kitsilano, Riley Park, and West End stand out: coffee shops rank only third or fourth among their most common venues, so these neighborhoods combine demand for coffee with fewer competitors. Let's look at these three neighborhoods.

##### Kitsilano, Riley Park, and West End

In [382]:
nei_list = ['Kitsilano', 'Riley Park', 'West End']

choices = vancouver_merged[vancouver_merged['Neighborhood'].isin(nei_list)]

choices

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Kitsilano,49.26941,-123.155267,0,Bakery,American Restaurant,Coffee Shop,Ice Cream Shop,French Restaurant,Food Truck,Thai Restaurant,Japanese Restaurant,Sushi Restaurant,Performing Arts Venue
14,Riley Park,49.247438,-123.102966,0,Japanese Restaurant,Restaurant,Café,Coffee Shop,Arts & Crafts Store,Vegetarian / Vegan Restaurant,Thai Restaurant,Lounge,Pub,Sushi Restaurant
20,West End,49.284131,-123.131795,0,Bakery,Japanese Restaurant,Greek Restaurant,Coffee Shop,Dessert Shop,American Restaurant,Farmers Market,Restaurant,Park,Indian Restaurant


In [386]:
venue_count[venue_count.index.isin(nei_list)]

Unnamed: 0_level_0,Venue Count
Neighborhood,Unnamed: 1_level_1
Kitsilano,48
Riley Park,56
West End,62


Let's plot the three neighborhoods on a map.

In [387]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(choices['Latitude'], choices['Longitude'], choices['Neighborhood'], choices['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

The three neighborhoods have a similar number of venues, so they are comparable. From the map, we see that West End and Kitsilano sit on opposite sides of English Bay. These areas are more interesting and attract many tourists.

__Result:__ This analysis recommends opening the new business in West End or Kitsilano.