# Introduction - The Coffee Shop Problem (Toronto)

Coffee shops are among the most common establishments in any Toronto neighborhood. While this shows there is a large market for them, it also means a new coffee shop would face fierce competition.

The problem is as follows: an entrepreneur interested in opening their first small coffee shop approaches you and asks you to use available data to advise them on the most reasonable location within the city of Toronto for their business. The key factors to take into account are population and the presence of competition. Assuming that location-dependent operating costs are not a concern, where should the entrepreneur open their coffee shop?

The intended audience for this report is potential entrepreneurs facing the same decision as the hypothetical one above, who can use the analysis provided to inform their choice. Secondarily, other data scientists and curious readers may find this report useful as an insight into the business demographics of neighborhoods within their city.

# Data - Wikipedia, Statistics Canada and Foursquare API

This analysis uses the list of Toronto postal codes provided by Wikipedia:
https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

The 2016 Canadian census data on forward sortation areas (FSAs, the first three characters of a postal code), which gives population by postal code:
https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/hlt-fst/pd-pl/comprehensive.cfm
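
Both sources share the same three-character key, so census population can be attached to each scraped neighborhood with a join. A minimal sketch, assuming the census table has already been read into a dataframe: the column names `Geographic code` and `Population, 2016` and all population values below are placeholders, not figures from the Statistics Canada file, so check them against the downloaded data.

```python
import pandas as pd

# Placeholder stand-in for the Statistics Canada FSA table.
# Column names and population values are hypothetical; check the real file.
census = pd.DataFrame({
    "Geographic code": ["M3A", "M4A", "M5A", "K1A"],
    "Population, 2016": [1000, 2000, 3000, 4000],
})

# Neighborhood table keyed by the same FSA as the scraped Wikipedia data
neighborhoods = pd.DataFrame({
    "Postal": ["M3A", "M4A", "M5A"],
    "Neighborhood": ["Parkwoods", "Victoria Village", "Regent Park, Harbourfront"],
})

# Keep only Toronto FSAs (they start with "M") and use simpler column names
toronto_pop = (
    census[census["Geographic code"].str.startswith("M")]
    .rename(columns={"Geographic code": "Postal", "Population, 2016": "Population"})
)

# Left join so every neighborhood is kept even if the census lacks its FSA
merged = neighborhoods.merge(toronto_pop, on="Postal", how="left")
print(merged)
```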

The Foursquare API will also be used to make sense of the postal codes provided by Wikipedia: querying the venues near each neighborhood identifies the competition in the area.
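
For illustration, a request for venues near one neighborhood could be built as below. This sketch assumes Foursquare's v2 `venues/explore` endpoint and its `client_id`, `client_secret`, `v`, `ll`, `radius`, `limit` and `query` parameters; the credentials and version date are placeholders, and Foursquare may since have replaced v2 with its newer Places API, so check the current documentation first. The sketch only builds the URL and makes no network call.

```python
from urllib.parse import urlencode

# Placeholder credentials: replace with real Foursquare developer keys
CLIENT_ID = "YOUR_CLIENT_ID"
CLIENT_SECRET = "YOUR_CLIENT_SECRET"
VERSION = "20180605"  # version date parameter expected by the v2 endpoints (placeholder)

def explore_url(lat, lng, radius=500, limit=100, query="coffee"):
    """Build a (hypothetical) v2 venues/explore request for venues near (lat, lng)."""
    params = {
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "v": VERSION,
        "ll": f"{lat},{lng}",  # latitude,longitude of the search center
        "radius": radius,      # search radius in metres
        "limit": limit,        # maximum number of venues returned
        "query": query,        # free-text filter, e.g. "coffee"
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Parkwoods (M3A), using coordinates from the merged dataframe
url = explore_url(43.753259, -79.329656)
print(url)
# The request itself would then be requests.get(url).json()
```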

Together, these data sources and the API will be used to create a map visualizing viable neighborhoods in Toronto and their potential value as a location for a new coffee shop.
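
One simple way to combine the two factors named in the introduction is to rank neighborhoods by residents per existing coffee shop, so that populous areas with few competitors rise to the top. The sketch below only illustrates that idea and is not necessarily the scoring used in this report; the neighborhood names, populations and coffee-shop counts are hypothetical.

```python
import pandas as pd

# Hypothetical inputs: census population and Foursquare coffee-shop count per neighborhood
data = pd.DataFrame({
    "Neighborhood": ["A", "B", "C"],
    "Population": [30000, 12000, 20000],
    "CoffeeShops": [10, 0, 4],
})

# Residents per coffee shop; adding 1 avoids dividing by zero where there are no shops
data["ResidentsPerShop"] = data["Population"] / (data["CoffeeShops"] + 1)

# Highest value first = most residents per competitor
ranking = data.sort_values("ResidentsPerShop", ascending=False)
print(ranking)
```

The `+ 1` is a design choice: it keeps zero-competition neighborhoods finite while still ranking them highly.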

## 1 - Package Imports and Webscraping

### 1.1 - Package imports

```python
from bs4 import BeautifulSoup  # HTML parser for the web scrape
import requests  # library to handle HTTP requests

import numpy as np  # library to handle data in a vectorized manner

import pandas as pd  # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json  # library to handle JSON files

# notebook shell command: install geopy
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values

# transform JSON into a pandas dataframe (pandas >= 1.0; older versions: from pandas.io.json import json_normalize)
from pandas import json_normalize

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans

# notebook shell command: install folium
!conda install -c conda-forge folium=0.5.0 --yes
import folium  # map rendering library

print("Libraries Imported!")
```

```
Libraries Imported!
```
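
The imports above include a JSON-normalizing helper for flattening the nested venue records that Foursquare returns. A minimal sketch of that step, assuming the response shape of the v2 `venues/explore` endpoint (`response` → `groups` → `items` → `venue`); the two venues are invented for illustration. It uses `pd.json_normalize`, the top-level location of the function since pandas 1.0.

```python
import pandas as pd

# Mock of the nested structure returned by venues/explore (assumed shape, invented venues)
mock_response = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Example Cafe",
                           "location": {"lat": 43.7530, "lng": -79.3300},
                           "categories": [{"name": "Coffee Shop"}]}},
                {"venue": {"name": "Example Diner",
                           "location": {"lat": 43.7540, "lng": -79.3290},
                           "categories": [{"name": "Diner"}]}},
            ]
        }]
    }
}

items = mock_response["response"]["groups"][0]["items"]

# Flatten nested dicts into dotted column names such as "venue.location.lat"
venues = pd.json_normalize(items)

# Lists are not flattened, so pull the primary category name out by hand
venues["category"] = venues["venue.categories"].apply(lambda cats: cats[0]["name"] if cats else None)

venues = venues[["venue.name", "category", "venue.location.lat", "venue.location.lng"]]
venues.columns = ["Venue", "Category", "Latitude", "Longitude"]

# Competitors: venues whose primary category is "Coffee Shop"
coffee_shops = venues[venues["Category"] == "Coffee Shop"]
print(coffee_shops)
```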


### 1.2 - Initialize a dataframe by webscraping the Wikipedia page for neighborhoods in Toronto

```python
# Download the Wikipedia page and parse its HTML
source = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M").text
soup = BeautifulSoup(source, 'lxml')

table_contents = []
table = soup.find('table')  # the first table on the page holds the postal codes

for cell in table.find_all('td'):
    # Skip postal codes that have no borough assigned
    if cell.span.text == "Not assigned":
        continue

    text = cell.span.text
    content = {
        'Postal': cell.p.text[:3],      # first three characters: the postal code (FSA)
        'Borough': text.split("(")[0],  # text before "(" is the borough
        # text inside the parentheses lists the neighborhoods; " /" separators become commas
        'Neighborhood': text.split('(')[1].strip(')').replace(' /', ',').replace(')', ' ').strip(' '),
    }
    table_contents.append(content)

df = pd.DataFrame(table_contents)

# Clean up borough names that were concatenated with extra text in the page markup
df['Borough'] = df['Borough'].replace({
    'Downtown TorontoStn A PO Boxes25 The Esplanade': 'Downtown Toronto Stn A',
    'East TorontoBusiness reply mail Processing Centre969 Eastern': 'East Toronto Business',
    'EtobicokeNorthwest': 'Etobicoke Northwest',
    'East YorkEast Toronto': 'East York/East Toronto',
    'MississaugaCanada Post Gateway Processing Centre': 'Mississauga',
})

df.head()
```

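|   | Postal | Borough | Neighborhood |
|---|--------|---------|--------------|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Queen's Park | Ontario Provincial Government |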
4,M7A,Queen's Park,Ontario Provincial Government


### 1.3 - Import Geospatial Coordinates and create master dataframe

```python
df_geo_coor = pd.read_csv("./Geospatial_Coordinates.csv")
df_geo_coor.head()
```

|   | Postal Code | Latitude | Longitude |
|---|-------------|----------|-----------|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


Merge the two dataframes into a master dataframe, matching each postal code to its coordinates.

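```python
# Left join keeps every scraped neighborhood and attaches its coordinates
df = pd.merge(df, df_geo_coor, how='left', left_on='Postal', right_on='Postal Code')
df.drop("Postal Code", axis=1, inplace=True)  # duplicate of the 'Postal' key
df.head()
```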

|   | Postal | Borough | Neighborhood | Latitude | Longitude |
|---|--------|---------|--------------|----------|-----------|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Queen's Park | Ontario Provincial Government | 43.662301 | -79.389494 |
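
Because the merge is a left join, any postal code missing from `Geospatial_Coordinates.csv` would silently end up with `NaN` coordinates. A sketch of checking for that with the `indicator` option of `pd.merge` follows; the small frames reuse rows shown above, plus one hypothetical unmatched code, `M0X`, to show the failure case.

```python
import pandas as pd

neighborhoods = pd.DataFrame({
    "Postal": ["M3A", "M4A", "M0X"],  # "M0X" is a hypothetical code with no coordinates
    "Neighborhood": ["Parkwoods", "Victoria Village", "Hypothetical"],
})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# indicator=True adds a "_merge" column with "both", "left_only" or "right_only"
merged = pd.merge(neighborhoods, coords, how="left",
                  left_on="Postal", right_on="Postal Code", indicator=True)

unmatched = merged.loc[merged["_merge"] == "left_only", "Postal"].tolist()
print("Postal codes without coordinates:", unmatched)

# Drop the helper columns once the check is done
merged = merged.drop(columns=["Postal Code", "_merge"])
```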


## 2 - Preparation With Geopy and Folium

### 2.1 - Initializing

```python
address = "Toronto, ON"

# Nominatim (OpenStreetMap) geocoder; a descriptive user_agent identifies the application
geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The coordinates of Toronto city are', latitude, longitude)
```

```
The coordinates of Toronto city are 43.6534817 -79.3839347
```


### 2.2 - Mapping Toronto Neighborhoods With Folium

```python
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add one circle marker per neighborhood from the master dataframe
for lat, lng, label in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto
```