# The Battle of the Neighbourhoods Capstone Project

## Introduction / Business Problem

### Introduction

Retail has always been shaped by shrewd merchants with a proclivity for taking risks and for choosing the right products at the right time, a skill long considered more of an art than a science. With the tools now available to leverage consumer data, that is changing. As [McKinsey](https://www.mckinsey.com/industries/retail/our-insights/how-analytics-and-digital-will-drive-next-generation-retail-merchandising) puts it: “Winning decisions are increasingly driven by analytics more than instinct, experience, or merchant ‘art’. By leveraging smarter tools, those beyond backward-looking, ‘hind sighting’ analysis, retailers can increasingly make forward-looking predictions that are quickly becoming the ‘table stakes’ necessary to keep up.”


In a rapidly evolving world where bustling cities are booming with opportunity, it is of paramount importance to do your homework before opening a new business and to determine whether the location under consideration is likely to be profitable. As a data scientist, it is my job to assist my clients in that decision-making process.

### Business Problem

Lucy Finnigan has approached me with an idea to open a coffee shop / book store and has provided me with two possible locations of interest. 

Although coffee has proven to be a favourite beverage of early-morning go-getters worldwide, competition from franchise brands is massive. Lucy is not buying into a franchise, so location is of the utmost importance to ensure that she stays at the forefront of her new business venture and that the business does not fold before she has even established her brand.

Lucy has her heart set on Alberta, Canada. It is now my task to leverage data from various sources in order to expose a gap in the market. 


## Data Section

[Geonames](https://www.geonames.org/postal-codes/CA/AB/alberta.html) will be used to obtain necessary data for the neighbourhoods, postal codes and geographic coordinates in Alberta. This website also offers the option of downloading the data into text files which enables one to format and import a CSV with the necessary dataset.
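As a rough illustration of the import step, the sketch below reads a tab-delimited Geonames postal-code file with pandas. It assumes the column order described in the Geonames postal-code readme; the helper name `load_geonames_postal_codes`, the snake_case column names, and the two in-memory sample rows (copied from the Alberta rows shown in this notebook, with the accuracy field left blank) are illustrative and not part of the original workflow.

```python
import io

import pandas as pd

# Assumed column order of a Geonames postal-code dump (per its readme).
GEONAMES_COLUMNS = [
    "country_code", "postal_code", "place_name",
    "admin_name1", "admin_code1",
    "admin_name2", "admin_code2",
    "admin_name3", "admin_code3",
    "latitude", "longitude", "accuracy",
]


def load_geonames_postal_codes(source):
    """Read a tab-delimited, header-less postal-code file into a DataFrame."""
    return pd.read_csv(
        source,
        sep="\t",
        header=None,
        names=GEONAMES_COLUMNS,
        dtype={"postal_code": str},  # keep codes such as 'T0A' as text
    )


# Two sample rows in the dump layout, held in memory instead of a file.
sample = (
    "CA\tT0A\tEastern Alberta (St. Paul)\tAlberta\tAB\t\t\t\t\t54.766\t-111.7174\t\n"
    "CA\tT0B\tWainwright Region (Tofield)\tAlberta\tAB\t\t\t\t\t53.0727\t-111.5816\t\n"
)
geo = load_geonames_postal_codes(io.StringIO(sample))

# Keep only rows whose province code is Alberta.
alberta = geo[geo["admin_code1"] == "AB"].reset_index(drop=True)
print(alberta[["postal_code", "place_name", "latitude", "longitude"]])
```

Passing a file path instead of the `io.StringIO` object should work the same way, and `DataFrame.to_csv` could then write the result out as the CSV used later on.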

[Foursquare](https://foursquare.com) will be used to leverage and explore venue data to target recommended locations for the prospective business venture. 
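To make the venue lookup more concrete, here is a minimal sketch of how a request URL for Foursquare's legacy v2 `venues/explore` endpoint could be assembled for one set of coordinates. The helper name `build_explore_url`, the default `radius` and `limit` values, and the placeholder credentials are assumptions for illustration; the endpoint belongs to the older API generation and may not be available to newer accounts.

```python
from urllib.parse import urlencode

# Legacy Foursquare v2 endpoint (assumed to be the one this project targets).
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Return an explore URL for venues within `radius` metres of (lat, lng)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date, e.g. '20180605'
        "ll": f"{lat},{lng}",  # latitude,longitude pair
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


# Placeholder credentials; the coordinates are those of postal code T0A.
url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET",
                        "20180605", 54.766, -111.7174)
print(url)
```

The resulting URL could then be passed to `requests.get(url).json()` to retrieve the venue list for that location.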

By merging the data from Geonames and Foursquare we will be able to conclude where starting a non-franchised coffee shop / book store would prove most profitable. 
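One possible shape for that merge is sketched below: count the coffee shops found per postal code, then left-join the counts onto the neighbourhood table so that areas with no coffee shops remain visible as potential gaps. The venue rows are hypothetical values made up for illustration, and the column names `Venue Category` and `Coffee Shops` are assumptions.

```python
import pandas as pd

# Neighbourhood rows in the layout of the Alberta table used in this notebook.
neighbourhoods = pd.DataFrame({
    "Postal Code": ["T0A", "T0B", "T0C"],
    "Neighbourhood": [
        "Eastern Alberta (St. Paul)",
        "Wainwright Region (Tofield)",
        "Central Alberta (Stettler)",
    ],
})

# Hypothetical venue rows (one per venue); not real Foursquare results.
venues = pd.DataFrame({
    "Postal Code": ["T0A", "T0A", "T0B"],
    "Venue Category": ["Coffee Shop", "Bookstore", "Coffee Shop"],
})

# Count coffee shops per postal code.
coffee_counts = (
    venues[venues["Venue Category"] == "Coffee Shop"]
    .groupby("Postal Code")
    .size()
    .rename("Coffee Shops")
    .reset_index()
)

# A left join keeps neighbourhoods that have no coffee shops at all.
merged = neighbourhoods.merge(coffee_counts, on="Postal Code", how="left")
merged["Coffee Shops"] = merged["Coffee Shops"].fillna(0).astype(int)
print(merged)
```

A left join is used here because the neighbourhoods with a count of zero are exactly the ones a gap analysis is looking for; an inner join would silently drop them.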

### Libraries required to handle the data

In [4]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # comment this line again once installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # comment this line again once installed
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


#### Import the CSV file stored as an asset in our Watson Studio repository. This contains all the location information we need for Alberta. 

In [27]:
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share your notebook.
client_0fa096a4b4294aba946d33b7cba66123 = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='sI72FWIhaoIsNnlZrsO_mzNwPUCcWuGlTZUaR2izH6sG',
    ibm_auth_endpoint="https://iam.ng.bluemix.net/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3-api.us-geo.objectstorage.service.networklayer.com')

body = client_0fa096a4b4294aba946d33b7cba66123.get_object(Bucket='capstoneprojectcoursera-donotdelete-pr-9efd3ob21m6npd',Key='CA Geo Data.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df = pd.read_csv(body, delimiter = ';')

In [28]:
alberta_data = df[df['County'].str.contains("Alberta")].reset_index(drop=True)
alberta_data.head()

| Unnamed: 0 | Country Code | Postal Code | Neighbourhood | County | County Code | Latitude | Longitude |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | CA | T0A | Eastern Alberta (St. Paul) | Alberta | AB | 54.766 | -111.7174 |
| 1 | CA | T0B | Wainwright Region (Tofield) | Alberta | AB | 53.0727 | -111.5816 |
| 2 | CA | T0C | Central Alberta (Stettler) | Alberta | AB | 52.1431 | -111.6941 |
| 3 | CA | T0E | Western Alberta (Jasper) | Alberta | AB | 53.6758 | -115.0948 |
| 4 | CA | T0G | North Central Alberta (Slave Lake) | Alberta | AB | 55.6993 | -114.4529 |


#### Define Foursquare Credentials and Version

In [22]:
CLIENT_ID = 'YSCTJOHAR2JB4ED34YQN01MYF5B4WOHBIJRYDHBMIGODMEQB' # your Foursquare ID
CLIENT_SECRET = 'XDOW1SGFHMRDEL2JEQLTRPGBDF2G3ETU2IUTM4XDHDRG5ECT' # your Foursquare Secret
VERSION = '20180605'

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YSCTJOHAR2JB4ED34YQN01MYF5B4WOHBIJRYDHBMIGODMEQB
CLIENT_SECRET:XDOW1SGFHMRDEL2JEQLTRPGBDF2G3ETU2IUTM4XDHDRG5ECT
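Because a shared notebook exposes anything stored or printed in it, one alternative is to read the credentials from environment variables and print only a masked form. The sketch below is an assumption-laden illustration: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, the helpers `load_foursquare_credentials` and `mask`, and the demo values are all hypothetical.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Fetch the client ID and secret from a mapping of environment variables."""
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first."
        )
    return client_id, client_secret


def mask(secret, visible=4):
    """Show only the first `visible` characters of a secret."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Demo mapping standing in for the real environment (hypothetical values).
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id-1234",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret-5678",
}
CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(CLIENT_ID))
print("CLIENT_SECRET:", mask(CLIENT_SECRET))
```

In a real session the function would be called without arguments so that it reads from `os.environ`.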


In [33]:
# Get the geographical coordinates of Alberta to begin the neighbourhood analysis
address = 'Alberta, CA'

geolocator = Nominatim(user_agent="ca_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Alberta are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Alberta are 55.001251, -115.002136.
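geopy's `geocode` returns `None` when it finds no match, in which case reading `.latitude` fails with an `AttributeError`. The sketch below wraps the lookup in a small guard. The helper `resolve_coordinates` is hypothetical, and `fake_geocode` is a stand-in that returns the coordinates printed above so the example runs without a network call.

```python
from types import SimpleNamespace


def resolve_coordinates(geocode, address):
    """Return (latitude, longitude) for `address`, or raise if nothing is found."""
    location = geocode(address)
    if location is None:
        raise ValueError(f"No geocoding result for {address!r}")
    return location.latitude, location.longitude


def fake_geocode(address):
    """Offline stand-in for geolocator.geocode, for illustration only."""
    known = {
        "Alberta, CA": SimpleNamespace(latitude=55.001251, longitude=-115.002136),
    }
    return known.get(address)


latitude, longitude = resolve_coordinates(fake_geocode, "Alberta, CA")
print(f"The geographical coordinates of Alberta are {latitude}, {longitude}.")
```

In the notebook itself the call would presumably be `resolve_coordinates(geolocator.geocode, address)`.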
