# Coursera Captstone Project: Battle of the Neighbourhoods
## Opening a coffee shop in Toronto


## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

##Introduction <a name="introduction"></a>

Opening a coffee shop in a large city like Toronto can be challenging as there are many factors to consider, such as location, target consumer etc. This can be made especially if you are not familiar with the area. 

We are a large chain of coffee shops looking to break into the Toronto market. We want to start by buying five coffee shops in the downtown area of Toronto. However, in order to decide where these first five shops should be placed we need information. Our strategy will be to find "hotspots" for coffee shops in  Toronto. Our hypothesis is basic supply and demand, i.e. that areas with a lot of coffee consumers have a lot of coffee shops. By finding these areas with a cluster of coffee shops, we can determine where to open our first five locations in order to reach the maximum number of customers as quickly as possible.

The target audience for this is myself and anyone else who wants to open a coffee shop in an area known for coffee. 

## Data <a name="data"></a>

Using Folium and a list of Toronto neighbourhoods available from wikipedia, I will create a map of the Toronto neighbourhoods considered to be in the downtown area.

Foursquare provides location information on venues, including coffeeshops. I will use the venues found from frousquare and produce a clustering analysis to determine "hotspots" for coffee shops. 

##Toronto downtown Neighbourhoods

The first step is to import the relevant libraries

In [1]:
#import relevant libraries

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library
!pip3 install lxml
print('Libraries imported.')

Libraries imported.


Next, get the data on Toronto neighbourhoods from wikipedia. And select only the data for downtown neighbourhoods.

In [2]:
#Scrape information for Toronto neighbourhoods
df=pd.read_html("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")[0]
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [3]:
#downtown boroughs only
df_cleanDT = df[df['Borough'] == 'Downtown Toronto']
df_cleanDT.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
13,M5B,Downtown Toronto,"Garden District, Ryerson"
22,M5C,Downtown Toronto,St. James Town
31,M5E,Downtown Toronto,Berczy Park


Add some longitude and latitude coordinates for the neighbourhoods to use with our map later on.

In [4]:
#import the coordinates
LongLat = pd.read_csv("https://cocl.us/Geospatial_data")

full_dfDT = pd.merge(df_cleanDT, LongLat, on = 'Postal Code')
full_dfDT.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306


Now let's map out the neighbourhoods

In [5]:
#Get coordinates for Toronto
address = 'Toronto, ON'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.6534817, -79.3839347.


In [6]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=13)

# add markers to map
for lat, lng, borough, neighborhood in zip(full_dfDT['Latitude'], full_dfDT['Longitude'], full_dfDT['Borough'], full_dfDT['Neighbourhood']):
    label = '{}, {}'.format(borough, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

##Foursquare location data

Now we need the Foursquare data

In [7]:
#import relevant libraries
import requests # library to handle requests

import random # library for random number generation

!pip install geopy
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# tranforming json file into a pandas dataframe library
from pandas.io.json import json_normalize

print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


In [8]:
# define Foursquare credentials
CLIENT_ID = '*************************' # your Foursquare ID
CLIENT_SECRET = '*************************' # your Foursquare Secret
ACCESS_TOKEN = '' # your FourSquare Access Token
VERSION = '20180604'
LIMIT = 50

In [86]:
#define the area to search in
address = '190 Yonge st, Toronto, Canada'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

43.652537 -79.379427


In [87]:
#search for results
search_query = 'coffee'
radius = 1000
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude,ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)
results = requests.get(url).json()

# assign relevant part of JSON to venues
venues = results['response']['venues']

# tranform venues into a dataframe
dataframe = json_normalize(venues)

# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

  # This is added back by InteractiveShellApp.init_path()


Here we can see the first few examples of coffee shops in the downtown area.

In [88]:
dataframe_filtered.head()

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,neighborhood,city,state,country,formattedAddress,id
0,HotBlack Coffee,Coffee Shop,245 Queen Street West,at St Patrick St,43.650364,-79.388669,"[{'label': 'display', 'lat': 43.65036434800487...",782,M5V 1Z4,CA,Entertainment District,Toronto,ON,Canada,"[245 Queen Street West (at St Patrick St), Tor...",59f784dd28122f14f9d5d63d
1,Timothy's World Coffee,Coffee Shop,401 Bay St.,at Richmond St. W,43.652135,-79.381172,"[{'label': 'display', 'lat': 43.65213455850074...",147,M5H 2Y4,CA,,Toronto,ON,Canada,"[401 Bay St. (at Richmond St. W), Toronto ON M...",4baa9f6cf964a520817a3ae3
2,Fahrenheit Coffee,Coffee Shop,120 Lombard St,at Jarvis St,43.652384,-79.372719,"[{'label': 'display', 'lat': 43.65238358726612...",540,M5C 3H5,CA,,Toronto,ON,Canada,"[120 Lombard St (at Jarvis St), Toronto ON M5C...",4fff1f96e4b042ae8acddca5
3,Timothy's World Coffee,Coffee Shop,"483 Bay St, Bell Trinity Square",Bell Trinity Square,43.653436,-79.382314,"[{'label': 'display', 'lat': 43.653436, 'lng':...",253,M5G 2C9,CA,,Toronto,ON,Canada,"[483 Bay St, Bell Trinity Square (Bell Trinity...",4b0aaa8ef964a520272623e3
4,Timothy's World Coffee,Coffee Shop,427 University Avenue,,43.654053,-79.38809,"[{'label': 'display', 'lat': 43.65405317976302...",717,,CA,,Toronto,ON,Canada,"[427 University Avenue, Toronto ON, Canada]",4b44fc77f964a520cc0026e3


# Methodology <a name="methodology"></a>

Now that we have obtained the data we are going to use. The first step is to find out which three neighourhoods (or whatever makes sense) have the most shops to narrow down the search.
Pick a spot and map out the venues. 

The second step is to investigate in those areas, where the shops are located and create a heat map. 

Finally, we will use Kmeans clustering to create clusters and determine 3 cluster centres to find possible locations and see how it overlaps with the heatmap

#Analysis <a name="analysis"></a>

###Step 1: Which neighbourhoods have the most coffee shops?

First let's find the venues in our downtown neighbourhoods

In [12]:
# get venues in all of the neighbourhoods
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [13]:
toronto_venues = getNearbyVenues(names=full_dfDT['Neighbourhood'],
                                   latitudes=full_dfDT['Latitude'],
                                   longitudes=full_dfDT['Longitude']
)

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
Stn A PO Boxes
St. James Town, Cabbagetown
First Canadian Place, Underground city
Church and Wellesley


In [14]:
print(toronto_venues.shape)
toronto_venues.head()

(784, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,"Regent Park, Harbourfront",43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant


Now let's find out which neighbourhoods have the most coffee shops and donut shops

In [96]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

#Take out just the venues we are interested in
Cafes_Coffee = toronto_onehot[['Neighborhood', 'Coffee Shop', 'Donut Shop']]
#sum up the donut and coffee shops
Cafes_Coffee['Total']= Cafes_Coffee.iloc[:, -2:-1].sum(axis=1)
#group by neighbourhood and sort values
toronto_grouped = Cafes_Coffee.groupby('Neighborhood').mean().reset_index()
toronto_grouped.sort_values(by=['Total'], ascending=False, inplace = True)
toronto_grouped


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  if __name__ == '__main__':


Unnamed: 0,Neighborhood,Coffee Shop,Donut Shop,Total
10,"Queen's Park, Ontario Provincial Government",0.2,0.0,0.2
2,Central Bay Street,0.18,0.02,0.18
11,"Regent Park, Harbourfront",0.155556,0.0,0.155556
17,"Toronto Dominion Centre, Design Exchange",0.1,0.0,0.1
6,"First Canadian Place, Underground city",0.1,0.0,0.1
0,Berczy Park,0.08,0.0,0.08
12,"Richmond, Adelaide, King",0.08,0.0,0.08
1,"CN Tower, King and Spadina, Railway Lands, Har...",0.071429,0.0,0.071429
15,"St. James Town, Cabbagetown",0.069767,0.0,0.069767
3,Christie,0.0625,0.0,0.0625


Let's see get the coordinates for the top 5 neighbourhoods

In [97]:
coffeeshops_top5 = toronto_grouped.head()
coffeeshops_top5.rename(columns = {'Neighborhood' : 'Neighbourhood'}, inplace = True)
top5_loc = pd.merge(coffeeshops_top5, full_dfDT, on = 'Neighbourhood')
top5_loc.insert(0, "Rank", [1, 2, 3, 4, 5],  True)
top5_loc


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


Unnamed: 0,Rank,Neighbourhood,Coffee Shop,Donut Shop,Total,Postal Code,Borough,Latitude,Longitude
0,1,"Queen's Park, Ontario Provincial Government",0.2,0.0,0.2,M7A,Downtown Toronto,43.662301,-79.389494
1,2,Central Bay Street,0.18,0.02,0.18,M5G,Downtown Toronto,43.657952,-79.387383
2,3,"Regent Park, Harbourfront",0.155556,0.0,0.155556,M5A,Downtown Toronto,43.65426,-79.360636
3,4,"Toronto Dominion Centre, Design Exchange",0.1,0.0,0.1,M5K,Downtown Toronto,43.647177,-79.381576
4,5,"First Canadian Place, Underground city",0.1,0.0,0.1,M5X,Downtown Toronto,43.648429,-79.38228


In [85]:
# create map of Toronto using latitude and longitude values
map_toronto2 = folium.Map(location=[latitude, longitude], zoom_start=15)

centreLon = top5_loc['Longitude'].mean()
centreLat = top5_loc['Latitude'].mean()
# add a red circle marker to represent the centre of the places
folium.CircleMarker(
    [centreLat, centreLon],
    radius=10,
    color='red',
    popup='Centre point',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(map_toronto2)

# add markers to map
for lat, lng, borough, neighborhood in zip(top5_loc['Latitude'], top5_loc['Longitude'], top5_loc['Rank'], top5_loc['Neighbourhood']):
    label = '{}, {}'.format(borough, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto2)  
    
map_toronto2

###Step 2: Now that we have determined where to look, let's take the venue data (pulled in the data section, from the center point found above) and plot the coffee shops on a map

In [91]:
#get a list of the longitude and latitude for the venues
venues_loc = dataframe_filtered[['lat', 'lng']]

#map out the venues
venues_map = folium.Map(location=[latitude, longitude], zoom_start=14) # generate map centred around the "Downtown address"

# add a red circle marker to represent the Center point
folium.CircleMarker(
    [centreLat, centreLon],
    radius=10,
    color='red',
    popup='Centre point',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the coffee shops as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

Now let's make that a heatmap

In [92]:
import folium
from folium import plugins
from folium.plugins import HeatMap
venues_heatmap = folium.Map(location=[latitude, longitude], zoom_start=14) # generate map centred around the "Downtown address"

locs = venues_loc.to_numpy()
HeatMap(locs).add_to(venues_heatmap)
venues_heatmap


###Step 3: Let's use K means clustering to determine the 3 clusters and plot these centres on the map

First we will find the cluster centers

In [98]:
#!pip install pyproj
import matplotlib.pyplot as plt

import pyproj
import math

X = venues_loc['lat']
y =  venues_loc['lng']

k_means = KMeans(init = "k-means++", n_clusters = 5, n_init = 12)
k_means.fit(venues_loc)
k_means_labels = k_means.labels_
k_means_labels
k_means_cluster_centers = k_means.cluster_centers_

cluster_centers2 = pd.DataFrame.from_records(k_means_cluster_centers)
cluster_centers3 = cluster_centers2.rename(columns={ 0 : "lat", 1 : "lng"})
cluster_centers3

Unnamed: 0,lat,lng
0,43.647835,-79.382474
1,43.659473,-79.385828
2,43.651841,-79.374032
3,43.652755,-79.388139
4,43.654517,-79.381211


Now let's add those cluster centers to our map

In [99]:
venues_map2 = folium.Map(location=[latitude, longitude], zoom_start=14) # generate map centred around the "Downtown address"

for lat, lng in zip(cluster_centers3.lat, cluster_centers3.lng):
    folium.CircleMarker(
        [lat, lng], 
        radius=10, 
        color='red', 
        fill=True, 
        fill_opacity=0.6
      ).add_to(venues_map2) 

for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.3
    ).add_to(venues_map2)

HeatMap(locs).add_to(venues_map2)
venues_map2
    

#Results and Discussion <a name="results"></a>

We first determined that the five neighbourhoods with the most coffee shops were Queen's Park, Central Bay Street, Regent Park, TD centre and First Canadian Place. 
We then found the centre point of those areas and used the foursquare API to pull venues from that area. By plotting the venues on a heatmap, we were able to see some potential locations.
To go about this a bit more analytically, we used K means clustering to find five cluster centres to narrow down where to open our coffee shops. We found that they should be located around 110 Lombard St, the Eaton Centre, King St and Bay St, 380 University Ave and 760 Bay St. 
Unsuprisingly, the cluster centres were close to the heat map hotspots.

#Conclusion <a name="conclusion"></a>

Our goal was to find five potential locations for our new coffee shops. In order to identify these locations, we found areas that already had a large number of coffee shops as these should also be the most popular areas for coffee consumers. By mapping out locations and using clustering analysis we have identified our five potential locations. 