# Capstone Project - The Battle of Neighbourhoods

#### Christy Chen

#### September 2019

## 1. Introduction

### 1.1 Background

Company ABC (the "Company") is a China-based express delivery company that started business about five years ago. With trade between Canada and China expanding rapidly, the Company sees potential profit in opening an international branch in Canada. The express delivery market is currently led by large corporations (e.g. FedEx, Canada Post). Because shipping costs are high for individual and small business orders, the Company's strategy is to focus solely on China-Canada delivery and to offer friendlier rates that attract customers with small orders. The Company plans to open its first branch in Ontario, with further steps depending on the P&L over the next five years. Ontario was chosen mainly because it is Canada's leading manufacturing province and because tourism contributes heavily to its economy.

### 1.2 Business Problem

The Company has decided to open a new international express delivery branch in the Greater Toronto Area (GTA), Ontario. The branch will focus on providing products and services to individuals and small business owners, and the Company is also considering a door-to-door service to attract customers who need more assistance.
The Company has two cities in mind: Markham and Mississauga. It needs to determine which city, and where within that city, to open the branch in order to attract the most business. The ideal site should meet as many of the following requirements as possible:


a.	Enough space for truck loading / offloading 

b.	Secure storage space suitable for temperature-sensitive products

c.	Easily accessible to transportation options (e.g. near highway)

d.	Ideally no, or very few, competing businesses nearby

e.	Parking space and public-transit friendly

f.	Noticeable store front for advertisement

g.	Cost of the rent


### 1.3 Target Audience

The Foursquare API, combined with data analysis, will help answer the key questions raised above. The final results will be presented to the Company's key stakeholders to help them decide whether to proceed with opening a new branch in the recommended area. Feedback from the board may lead to a further round of deeper analysis.

## 2. Data Preparation

To perform the analysis, the following data has been loaded:

### 2.1 CENSUS data for both Markham & Mississauga (2016)

Statistics Canada conducts a national census every five years. The census provides demographic and statistical data used to plan public services such as health care, education, and transportation, and to determine federal transfer payments. Two CSV files, 'CENSUS_Markham.csv' and 'CENSUS_Mississauga.csv', were created from the 2016 census results for Markham and Mississauga; each is read into a dataframe and has the data structure shown below. The files are read directly into the Jupyter Notebook for convenience and to save space.
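The outputs in this section show an `Unnamed: 0` column, which usually appears when a CSV was saved with its row index and then read back without declaring it. The following is a minimal sketch of one way to avoid that, assuming the file has an unnamed leading index column. The in-memory sample stands in for 'CENSUS_Markham.csv', and the name `df_census` is hypothetical.

```python
import io
import pandas as pd

# Stand-in for CENSUS_Markham.csv: a few rows copied from the output shown
# in this section, with an unnamed leading index column (an assumption
# about how the file was saved).
sample_csv = io.StringIO(
    ",Category,Sub_Category,Total,Male,Female\n"
    "0,Age characteristics,0 to 4 years,17085,8715.0,8370.0\n"
    "1,Age characteristics,5 to 9 years,19085,9905.0,9180.0\n"
    "2,Age characteristics,10 to 14 years,19220,9950.0,9270.0\n"
)

# index_col=0 tells pandas to use the first column as the row index,
# so it does not show up as a data column named 'Unnamed: 0'.
df_census = pd.read_csv(sample_csv, index_col=0)
print(df_census.columns.tolist())
```

With a real file, the same `index_col=0` argument can be passed alongside the file path or file-like object.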

#### 2.1.1 CENSUS_Markham.csv
The data is publicly accessible via: https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/prof/details/page.cfm?Lang=E&Geo1=CSD&Code1=3519036&Geo2=PR&Code2=35&SearchText=Markham&SearchType=Begins&SearchPR=01&B1=All&GeoLevel=PR&GeoCode=3519036&TABID=1&type=0

In [19]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Category,Sub_Category,Total,Male,Female
0,Age characteristics,0 to 4 years,17085,8715.0,8370.0
1,Age characteristics,5 to 9 years,19085,9905.0,9180.0
2,Age characteristics,10 to 14 years,19220,9950.0,9270.0
3,Age characteristics,15 to 19 years,21095,10910.0,10185.0
4,Age characteristics,20 to 24 years,21455,11225.0,10230.0


#### 2.1.2 CENSUS_Mississauga.csv
The data is publicly accessible via: https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/prof/details/page.cfm?Lang=E&Geo1=CSD&Code1=3521005&Geo2=PR&Code2=35&Data=Count&SearchText=Mississauga&SearchType=Begins&SearchPR=01&TABID=1&B1=All

In [2]:
body = client_a2a9f78cf9c84387af515fb9bfedbee0.get_object(Bucket='capstoneproject-donotdelete-pr-k5ambwispq6wtu',Key='CENSUS_Mississauga.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df_miss = pd.read_csv(body)
df_miss.head()


Unnamed: 0,Category,Sub_Category,Total,Male,Female
0,Age characteristics,0 to 4 years,35460,17880.0,17585.0
1,Age characteristics,5 to 9 years,41485,21220.0,20270.0
2,Age characteristics,10 to 14 years,43980,22805.0,21180.0
3,Age characteristics,15 to 19 years,49205,25670.0,23535.0
4,Age characteristics,20 to 24 years,53645,27795.0,25850.0


### 2.2 Crime data for both Markham & Mississauga (2017)

Because of limited data availability, I use 2017 data to compare Markham and Mississauga. Two CSV files, 'Crime_Markham.csv' and 'Crime_Mississauga.csv', were created; each is read into a dataframe and has the data structure shown below. The files are read directly into the Jupyter Notebook for convenience and to save space.

#### 2.2.1 Crime_Markham.csv
The data is publicly accessible via: https://www.yrp.ca/en/about/statistical-reports.asp. For this analysis, I use the "Markham" section of the 2018 York Regional Police Statistical Report.

In [3]:

body = client_a2a9f78cf9c84387af515fb9bfedbee0.get_object(Bucket='capstoneproject-donotdelete-pr-k5ambwispq6wtu',Key='Crime_Markham.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df_data_2 = pd.read_csv(body)
df_data_2.head()


Unnamed: 0,Category,2017Actual,2017Percent Cleared,"2017Rate Per 100,000 Population",2018Actual,2018Percent Cleared,"2018Rate Per 100,000 Population"
0,Crimes Against Persons,1598,76.4,447.97,1831,72.6,525.07
1,Violations Causing Death,1,200.0,0.28,2,50.0,0.57
2,Attempt Capital Crime,4,75.0,1.12,3,100.0,0.86
3,Sexual Violations,136,68.4,38.13,155,68.4,44.45
4,Commodification of Sexual Activity,33,90.9,9.25,15,86.7,4.3


#### 2.2.2 Crime_Mississauga.csv
The data is publicly accessible via: http://safecitymississauga.on.ca/wp-content/uploads/2019/02/2017-Safest-City-Report-1.pdf

In [4]:
body = client_a2a9f78cf9c84387af515fb9bfedbee0.get_object(Bucket='capstoneproject-donotdelete-pr-k5ambwispq6wtu',Key='Crime_Mississauga.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

# NOTE: despite its name, crime_markham holds the Mississauga crime data loaded above
crime_markham = pd.read_csv(body)
crime_markham.head()

Unnamed: 0,Category,2016Number,2016Percent Solved,"2016Rate per 100,000",2017Number,2017Percent Solved,"2017Rate per 100,000"
0,Crimes Against Persons,3524,0.759,466.1,3876,0.727,511.3
1,Homicide,6,0.5,0.8,9,1.0,1.2
2,Attempt Murder,15,0.467,2.0,9,0.667,1.2
3,Robbery - Total,423,0.414,56.0,480,0.381,63.3
4,Robbery - With Weapons,246,0.407,32.5,261,0.356,34.4
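The two crime tables use different conventions: the Markham file reports clearance as a percentage (76.4) while the Mississauga file reports it as a fraction (0.727). The sketch below shows one way to put the 2017 "Crimes Against Persons" rows on a common scale before comparing them. The values are copied from the outputs above; the column names and the variable `comparison` are hypothetical.

```python
import pandas as pd

# 2017 "Crimes Against Persons" figures copied from the two tables above.
rows = [
    {"City": "Markham", "Actual": 1598, "PercentCleared": 76.4, "RatePer100k": 447.97},
    {"City": "Mississauga", "Actual": 3876, "PercentCleared": 0.727, "RatePer100k": 511.3},
]
comparison = pd.DataFrame(rows).set_index("City")

# The Mississauga source stores clearance as a fraction, so convert it
# to a percentage to match the Markham source.
comparison.loc["Mississauga", "PercentCleared"] *= 100

# Rates per 100,000 population are already comparable across cities.
rate_ratio = (
    comparison.loc["Mississauga", "RatePer100k"]
    / comparison.loc["Markham", "RatePer100k"]
)
print(comparison)
print("Mississauga / Markham rate ratio:", round(rate_ratio, 2))
```

Comparing rates per 100,000 rather than raw counts avoids penalising the larger city simply for its population.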


### 2.3 Geographical information

In [5]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# transforming json file into a pandas dataframe library
from pandas.io.json import json_normalize

!pip install folium
import folium # plotting library

print('Folium installed')
print('Libraries imported.')


In [6]:
address = 'Markham, Ontario'

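# Nominatim expects a user_agent; the value used here is a placeholder
geolocator_markham = Nominatim(user_agent="capstone_explorer")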
location_markham = geolocator_markham.geocode(address)
latitude_markham = location_markham.latitude
longitude_markham = location_markham.longitude
print(latitude_markham,longitude_markham)

43.854336 -79.326782




In [7]:
address = 'Mississauga, Ontario'

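# Nominatim expects a user_agent; the value used here is a placeholder
geolocator_mississauga = Nominatim(user_agent="capstone_explorer")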
location_mississauga = geolocator_mississauga.geocode(address)
latitude_mississauga = location_mississauga.latitude
longitude_mississauga = location_mississauga.longitude
print(latitude_mississauga,longitude_mississauga)



43.590338 -79.645729


### 2.4 Neighbourhood information

In [8]:
!pip install geocoder

import numpy as np
import pandas as pd
import geocoder



In [9]:
body = client_a2a9f78cf9c84387af515fb9bfedbee0.get_object(Bucket='capstoneproject-donotdelete-pr-k5ambwispq6wtu',Key='PostalCode_Markham.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

pc_markham = pd.read_csv(body)
pc_markham.head()


Unnamed: 0,Postalcode,Borough,Neighbourhood
0,L6C,Markham,"Berczy Village, Cachet, Angus Glen"
1,L3S,Markham,Markham Southeast
2,L3R,Markham,Outer Southwest
3,L3P,Markham,Central
4,L6G,Markham,"Downtown Markham, Markham Centre"


In [10]:
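def get_latlng1(postal_code):
    # Query the ArcGIS geocoder until it returns coordinates for the postal code.
    # Note: the loop retries without limit if the service keeps returning None.
    lat_lng_coords = None
    while lat_lng_coords is None:
        lat_lng_coords = geocoder.arcgis('{}, Markham, Ontario'.format(postal_code)).latlng
    return lat_lng_coords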

In [11]:
postal_codes = pc_markham['Postalcode']    
coords = [get_latlng1(postal_code) for postal_code in postal_codes.tolist() ]

In [12]:
df1 = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
pc_markham['Latitude'] = df1['Latitude']
pc_markham['Longitude'] = df1['Longitude']

In [13]:
pc_markham

Unnamed: 0,Postalcode,Borough,Neighbourhood,Latitude,Longitude
0,L6C,Markham,"Berczy Village, Cachet, Angus Glen",43.904525,-79.339338
1,L3S,Markham,Markham Southeast,43.849285,-79.269179
2,L3R,Markham,Outer Southwest,43.83186,-79.328545
3,L3P,Markham,Central,43.929105,-79.273375
4,L6G,Markham,"Downtown Markham, Markham Centre",43.848375,-79.334748
5,L6E,Markham,Wismer Commons,43.900415,-79.266374


In [14]:
body = client_a2a9f78cf9c84387af515fb9bfedbee0.get_object(Bucket='capstoneproject-donotdelete-pr-k5ambwispq6wtu',Key='PostalCode_Mississauga.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

pc_mississauga = pd.read_csv(body)
pc_mississauga.head()


Unnamed: 0,Postalcode,Borough,Neighbourhood
0,L5A,Mississauga,"Mississauga Valley, East Cooksville"
1,L5B,Mississauga,"WestCooksville, Fairview, City Centre, EastCre..."
2,L5C,Mississauga,"WestCreditview, Mavis, Erindale"
3,L5E,Mississauga,Central Lakeview
4,L5G,Mississauga,"SWLakeview, Mineola, EastPort Credit"


In [15]:
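def get_latlng2(postal_code):
    # Query the ArcGIS geocoder until it returns coordinates for the postal code.
    # Note: the loop retries without limit if the service keeps returning None.
    lat_lng_coords = None
    while lat_lng_coords is None:
        lat_lng_coords = geocoder.arcgis('{}, Mississauga, Ontario'.format(postal_code)).latlng
    return lat_lng_coords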
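`get_latlng1` and `get_latlng2` differ only in the city name, and both keep retrying for as long as the geocoder returns nothing. As a hedged alternative, the sketch below merges them into one helper with a bounded number of attempts. The function name `get_latlng`, its parameters, and the `lookup` callable are hypothetical; in the notebook, `lookup` could be `lambda q: geocoder.arcgis(q).latlng`. A stand-in lookup is used here so the example runs offline.

```python
import time

def get_latlng(postal_code, city, lookup, max_attempts=5, pause_seconds=1.0):
    """Return [lat, lng] for a postal code, or None if every attempt fails.

    `lookup` is any callable that takes a query string and returns
    [lat, lng] or None (hypothetical interface for this sketch).
    """
    query = '{}, {}, Ontario'.format(postal_code, city)
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(pause_seconds)  # brief pause before retrying
    return None

# Offline demonstration: a stand-in lookup that fails twice, then succeeds.
calls = []

def fake_lookup(query):
    calls.append(query)
    return [43.904525, -79.339338] if len(calls) >= 3 else None

result = get_latlng('L6C', 'Markham', fake_lookup, pause_seconds=0)
print(result, len(calls))
```

Returning `None` after a fixed number of attempts lets the caller decide how to handle a postal code that cannot be geocoded, instead of the cell hanging.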

In [16]:
postal_codes = pc_mississauga['Postalcode']    
coords = [get_latlng2(postal_code) for postal_code in postal_codes.tolist() ]

In [17]:
df2 = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
pc_mississauga['Latitude'] = df2['Latitude']
pc_mississauga['Longitude'] = df2['Longitude']

In [18]:
pc_mississauga

Unnamed: 0,Postalcode,Borough,Neighbourhood,Latitude,Longitude
0,L5A,Mississauga,"Mississauga Valley, East Cooksville",43.588405,-79.609455
1,L5B,Mississauga,"WestCooksville, Fairview, City Centre, EastCre...",43.578945,-79.632699
2,L5C,Mississauga,"WestCreditview, Mavis, Erindale",43.562932,-79.651714
3,L5E,Mississauga,Central Lakeview,43.58374,-79.56244
4,L5G,Mississauga,"SWLakeview, Mineola, EastPort Credit",43.56556,-79.583091
5,L5H,Mississauga,"West Port Credit, Lorne Park, EastSheridan",43.53663,-79.624784
6,L5J,Mississauga,"Clarkson, Southdown",43.508105,-79.631731
7,L5K,Mississauga,West Sheridan,43.52616,-79.661823
8,L5L,Mississauga,"Erin Mills, Western Business Park",43.535515,-79.692865
9,L5M,Mississauga,"Churchill Meadows, Central Erin Mills, SouthSt...",43.56002,-79.720542


### 2.5 Foursquare API data exploration

The Foursquare location data will be explored in detail later, in the Methodology section.

### 2.6 How the data will be used to solve the problem

The data will be used as follows. First, compare the demographic data (2016 census data for Markham and Mississauga) and the crime data, and use that analysis to recommend the city in which to open the new branch. Then, for the selected city, use Foursquare and geopy data to explore the venues in each neighbourhood. Finally, based on those results, recommend the area in which the Company should rent its new site.
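As an illustration of the first step, the sketch below places two census rows for each city side by side with a pandas merge. It is only a sketch under stated assumptions: the values are copied from the `head()` outputs in section 2.1, `df_miss` follows the name used earlier, and `df_mark` and `side_by_side` are hypothetical names.

```python
import pandas as pd

cols = ["Category", "Sub_Category", "Total"]

# Two age-group rows per city, copied from the outputs in section 2.1.
df_mark = pd.DataFrame([
    ["Age characteristics", "0 to 4 years", 17085],
    ["Age characteristics", "20 to 24 years", 21455],
], columns=cols)
df_miss = pd.DataFrame([
    ["Age characteristics", "0 to 4 years", 35460],
    ["Age characteristics", "20 to 24 years", 53645],
], columns=cols)

# Align the two cities on the census category and sub-category.
side_by_side = df_mark.merge(
    df_miss,
    on=["Category", "Sub_Category"],
    suffixes=("_Markham", "_Mississauga"),
)

# Relative size of each group in Mississauga compared with Markham.
side_by_side["Mississauga_to_Markham"] = (
    side_by_side["Total_Mississauga"] / side_by_side["Total_Markham"]
)
print(side_by_side)
```

The same merge applies to the full census dataframes, as long as both files use identical category labels.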