### Description of the problem

I have been contacted by one of the biggest gym chains in North America to find the best location for a new cutting-edge gym in the city of Toronto. The facility needs to open in an under-served area, where people are not satisfied with the current offering. To avoid fierce competition with other gyms, the location should be a neighborhood with the lowest number of gyms and the lowest ratings from Foursquare users. In this way, we will meet an unmet demand where competition is low. The client's goal is to absorb unhappy customers of existing gym facilities, so we do not need to create demand, only to capture it.

### Data used to solve the problem

To answer this question, I will use data on the neighborhoods of the city of Toronto from Wikipedia, as well as a .csv file containing the geospatial coordinates of each neighborhood. The coordinates are needed to query the Foursquare API and retrieve information on venue locations.

### Let's start by importing the required libraries

In [8]:
#Import required libraries

import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, available since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

### Now let's scrape from Wikipedia the data needed to perform the task

In [3]:
#Scraping data from Wikipedia page

r = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
html_data = r.text
soup = BeautifulSoup(html_data, 'html5lib')

In [4]:
#Creating a DataFrame with the data from Wikipedia

table_contents=[]
table=soup.find('table')
for row in table.find_all('td'):
    cell = {}
    if row.span.text=='Not assigned':
        pass
    else:
        cell['PostalCode'] = row.p.text[:3]
        cell['Borough'] = (row.span.text).split('(')[0]
        cell['Neighborhood'] = (((((row.span.text).split('(')[1]).strip(')')).replace(' /',',')).replace(')',' ')).strip(' ')
        table_contents.append(cell)

# print(table_contents)
df=pd.DataFrame(table_contents)
df['Borough']=df['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})

df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
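
The chained `split`/`strip`/`replace` call in the loop above is hard to read at a glance. As a minimal sketch, the same clean-up can be written as a small helper with one step per line. The name `parse_cell` and the two sample strings are assumptions for illustration: the strings mimic the `Borough(Neighborhood / Neighborhood)` layout that the scraping code appears to expect, and they are not copied from the page itself.

```python
def parse_cell(span_text):
    """Split 'Borough(Neighborhood / Neighborhood)' into borough and neighborhood."""
    parts = span_text.split('(')
    borough = parts[0]
    # Same clean-up steps as the scraping loop, applied one per line.
    neighborhood = parts[1]
    neighborhood = neighborhood.strip(')')          # drop the closing bracket
    neighborhood = neighborhood.replace(' /', ',')  # 'A / B' becomes 'A, B'
    neighborhood = neighborhood.replace(')', ' ')   # stray brackets inside the text
    neighborhood = neighborhood.strip(' ')
    return borough, neighborhood


# Hypothetical cell texts in the assumed layout.
examples = [
    'North York(Parkwoods)',
    'Downtown Toronto(Regent Park / Harbourfront)',
]
parsed = [parse_cell(text) for text in examples]
for borough, neighborhood in parsed:
    print(borough, '|', neighborhood)
```

Like the original loop, this helper raises an `IndexError` for a cell without an opening bracket, so it is a readability aid rather than a more robust parser.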


### Let's also load the data from the .csv file with the geospatial coordinates and merge the two dataframes into a single one

In [6]:
#Uploading data from the .csv file containing geospatial coordinates

gs_coord = pd.read_csv('Geospatial_Coordinates.csv')
gs_coord.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [7]:
#Merging the two DataFrames to obtain a single object to work with

df = df.join(gs_coord.set_index('Postal Code'), on='PostalCode')
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494
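
A join like the one above leaves `NaN` coordinates for any postal code that is missing from the .csv file, and those rows would later produce invalid API requests. The sketch below shows one possible way to check for them, using `DataFrame.merge` with `indicator=True` on two tiny frames. The first two rows are copied from the outputs above; the postal code `M0Z` and its neighborhood name are made up purely to illustrate an unmatched row.

```python
import pandas as pd

# Rows copied from the outputs above, plus one made-up postal code ('M0Z').
neighborhoods = pd.DataFrame({
    'PostalCode': ['M3A', 'M4A', 'M0Z'],
    'Neighborhood': ['Parkwoods', 'Victoria Village', 'Hypothetical'],
})
coords = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# A left merge keeps every neighborhood; the '_merge' column records
# whether a matching row was found in the coordinates table.
merged = neighborhoods.merge(
    coords,
    how='left',
    left_on='PostalCode',
    right_on='Postal Code',
    indicator=True,
)

unmatched = merged.loc[merged['_merge'] == 'left_only', 'PostalCode'].tolist()
print(unmatched)
```

An empty `unmatched` list would mean that every neighborhood received coordinates.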


### Now let's have an overview of the different neighborhoods with Folium

In [9]:
#Use Geopy to get the coordinates of Toronto

address = 'Toronto, TO'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.65238435, -79.38356765.


In [10]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### Now, with the help of the Foursquare API, I will make a request for the venues in the city of Toronto

In [84]:
CLIENT_ID = 'S0L5CWONFEW5WRLZKJ5R3Y54W2NKU3RFKLZXUFLWXSGA25JT' # your Foursquare ID
CLIENT_SECRET = 'UTKKHPY01GQ1S5YF1YGEPP4NO1C2FIZO1KS3OSFUEDXR4L11' # your Foursquare Secret
VERSION = '20180604' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value
ACCESS_TOKEN = 'ZKPOUMTEXBOJ23IULO5K1LVE23Q20OUNZVAQVRFR1PMWK1E5' # your Foursquare Access Token

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: S0L5CWONFEW5WRLZKJ5R3Y54W2NKU3RFKLZXUFLWXSGA25JT
CLIENT_SECRET: UTKKHPY01GQ1S5YF1YGEPP4NO1C2FIZO1KS3OSFUEDXR4L11
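
Hard-coding API keys in a notebook makes them visible to anyone who reads it. One common alternative, sketched here under assumptions, is to read them from environment variables. The variable names (`FOURSQUARE_CLIENT_ID` and so on) and the helper `load_foursquare_credentials` are hypothetical choices for this example, not something Foursquare prescribes.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret, access_token) from a mapping of variables."""
    # Hypothetical variable names; pick whatever convention suits your setup.
    names = (
        'FOURSQUARE_CLIENT_ID',
        'FOURSQUARE_CLIENT_SECRET',
        'FOURSQUARE_ACCESS_TOKEN',
    )
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return tuple(env[name] for name in names)


# Demonstration with a plain dict standing in for the real environment.
demo_env = {
    'FOURSQUARE_CLIENT_ID': 'demo-id',
    'FOURSQUARE_CLIENT_SECRET': 'demo-secret',
    'FOURSQUARE_ACCESS_TOKEN': 'demo-token',
}
demo_credentials = load_foursquare_credentials(demo_env)
print(len(demo_credentials))
```

In the notebook, calling `load_foursquare_credentials()` without arguments would read from the real environment and fail early if a variable is not set.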


In [97]:
#Create a function to explore neighborhoods in Toronto

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['id'],
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'id',
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
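
The function above indexes straight into the response (`['groups'][0]['items']` and `['categories'][0]`), so a neighborhood with no results or a venue without a category would raise an exception. The sketch below separates the flattening step into a helper that tolerates those cases. The name `extract_venues` is hypothetical, and the sample payload is hand-written to mirror only the keys that `getNearbyVenues` reads; the first venue reuses values from the output further down, while the second venue and its id are invented.

```python
def extract_venues(name, lat, lng, payload):
    """Flatten one explore response into row tuples, tolerating missing parts."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Use None when a venue has no category instead of raising IndexError.
        category = categories[0]['name'] if categories else None
        rows.append((
            name, lat, lng,
            venue['id'], venue['name'],
            venue['location']['lat'], venue['location']['lng'],
            category,
        ))
    return rows


# Hand-written payload with only the keys the helper reads.
sample = {'response': {'groups': [{'items': [
    {'venue': {'id': '4e8d9dcdd5fbbbb6b3003c7b', 'name': 'Brookbanks Park',
               'location': {'lat': 43.751976, 'lng': -79.33214},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'id': 'hypothetical-id', 'name': 'Venue without category',
               'location': {'lat': 43.75, 'lng': -79.33},
               'categories': []}},
]}]}}

rows = extract_venues('Parkwoods', 43.753259, -79.329656, sample)
empty = extract_venues('Parkwoods', 43.753259, -79.329656, {'response': {}})
print(len(rows), len(empty))
```

Keeping the parsing separate from the HTTP call also makes it possible to test the logic without spending API quota.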

In [98]:
#Apply the above function and visualize the venues in a DataFrame

toronto_venues = getNearbyVenues(names=df['Neighborhood'], latitudes=df['Latitude'], longitudes=df['Longitude'])

print(toronto_venues.shape)
toronto_venues.head()

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Ontario Provincial Government
Islington Avenue
Malvern, Rouge
Don Mills North
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills South
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
The Danforth  East
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmount Park
Bayview Village
Downsview East
The Danforth

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,id,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,4e8d9dcdd5fbbbb6b3003c7b,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,4cb11e2075ebb60cd1c4caad,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.725882,-79.315572,4c633acb86b6be9a61268e34,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Victoria Village,43.725882,-79.315572,4f3ecce6e4b0587016b6f30d,Portugril,43.725819,-79.312785,Portuguese Restaurant
4,Victoria Village,43.725882,-79.315572,4bbe904a85fbb713420d7167,Tim Hortons,43.725517,-79.313103,Coffee Shop


### Let's group the dataframe to understand how many gyms there are in Toronto and in which neighborhoods they are located

In [99]:
toronto_gyms = toronto_venues.loc[toronto_venues['Venue Category'] == 'Gym']
toronto_gyms.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,id,Venue,Venue Latitude,Venue Longitude,Venue Category
87,Ontario Provincial Government,43.662301,-79.389494,4bc64a7ad35d9c742083e23a,Hart House Gym,43.664172,-79.394888,Gym
95,Don Mills North,43.745906,-79.352188,4c18e819d4d9c9284e19f029,LA Fitness,43.747665,-79.347077,Gym
207,"Garden District, Ryerson",43.657162,-79.378937,50885719498ea7b5aab3a74c,GoodLife Fitness Toronto Bell Trinity Centre,43.653436,-79.382314,Gym
220,Don Mills South,43.7259,-79.340923,4b71ff80f964a52035692de3,Fitness Connection,43.727473,-79.341707,Gym
231,Don Mills South,43.7259,-79.340923,4ce5aed1f3bda1430f89a6e4,GoodLife Fitness North York Don Mills and Egli...,43.722704,-79.337508,Gym


In [100]:
toronto_gyms_sorted = toronto_gyms.groupby('Neighborhood').count().sort_values(by='Venue', ascending=False)
toronto_gyms_sorted

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,id,Venue,Venue Latitude,Venue Longitude,Venue Category
"Commerce Court, Victoria Hotel",4,4,4,4,4,4,4
"First Canadian Place, Underground city",4,4,4,4,4,4,4
"Richmond, Adelaide, King",3,3,3,3,3,3,3
Davisville,2,2,2,2,2,2,2
St. James Town,2,2,2,2,2,2,2
Don Mills South,2,2,2,2,2,2,2
Enclave of M5E,2,2,2,2,2,2,2
"Alderwood, Long Branch",1,1,1,1,1,1,1
"Mimico NW, The Queensway West, South of Bloor, Kingsway Park South West, Royal York South West",1,1,1,1,1,1,1
Thorncliffe Park,1,1,1,1,1,1,1
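
Because `count()` is applied to every column, the table repeats the same number seven times. As a sketch of a more compact alternative, `groupby(...).size()` returns a single count per neighborhood. The four rows below are copied from the `toronto_gyms.head()` output above (the truncated venue name is kept exactly as displayed); the column name `Gyms` is an arbitrary label chosen for this example.

```python
import pandas as pd

# Rows copied from the toronto_gyms.head() output above.
gyms = pd.DataFrame({
    'Neighborhood': [
        'Ontario Provincial Government',
        'Don Mills North',
        'Don Mills South',
        'Don Mills South',
    ],
    'Venue': [
        'Hart House Gym',
        'LA Fitness',
        'Fitness Connection',
        'GoodLife Fitness North York Don Mills and Egli...',
    ],
})

# One count per neighborhood, largest first.
gym_counts = (
    gyms.groupby('Neighborhood')
    .size()
    .sort_values(ascending=False)
    .rename('Gyms')
)
print(gym_counts)
```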


### The table above shows how many gyms each neighborhood has. Our focus will be on neighborhoods with exactly one gym, so that we avoid fierce competition with numerous players. We could have searched for neighborhoods with zero gyms, but a single existing gym makes us more confident that the area already has a demand (even if not completely satisfied) for this kind of service

In [101]:
#Select neighborhoods with only one gym present

toronto_gyms_1 = toronto_gyms_sorted.loc[toronto_gyms_sorted['Venue'] == 1]
toronto_gyms_1.reindex()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,id,Venue,Venue Latitude,Venue Longitude,Venue Category
"Alderwood, Long Branch",1,1,1,1,1,1,1
"Mimico NW, The Queensway West, South of Bloor, Kingsway Park South West, Royal York South West",1,1,1,1,1,1,1
Thorncliffe Park,1,1,1,1,1,1,1
"Runnymede, Swansea",1,1,1,1,1,1,1
Ontario Provincial Government,1,1,1,1,1,1,1
"New Toronto, Mimico South, Humber Bay Shores",1,1,1,1,1,1,1
"Garden District, Ryerson",1,1,1,1,1,1,1
"India Bazaar, The Beaches West",1,1,1,1,1,1,1
"Harbourfront East, Union Station, Toronto Islands",1,1,1,1,1,1,1
"Brockton, Parkdale Village, Exhibition Place",1,1,1,1,1,1,1


In [102]:
#Extract a list of the neighborhoods from the index for slicing our toronto_gyms dataframe

l = list(toronto_gyms_1.index.values)
selected_nh = toronto_gyms.isin(l)
selected_nh = selected_nh.loc[selected_nh['Neighborhood'] == True]
selected_nh

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,id,Venue,Venue Latitude,Venue Longitude,Venue Category
87,True,False,False,False,False,False,False,False
95,True,False,False,False,False,False,False,False
207,True,False,False,False,False,False,False,False
511,True,False,False,False,False,False,False,False
570,True,False,False,False,False,False,False,False
866,True,False,False,False,False,False,False,False
973,True,False,False,False,False,False,False,False
1071,True,False,False,False,False,False,False,False
1111,True,False,False,False,False,False,False,False
1383,True,False,False,False,False,False,False,False


In [103]:
#Let's filter our dataframe with the venues of our interest

selected_venues = toronto_gyms.loc[selected_nh.index]
selected_venues

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,id,Venue,Venue Latitude,Venue Longitude,Venue Category
87,Ontario Provincial Government,43.662301,-79.389494,4bc64a7ad35d9c742083e23a,Hart House Gym,43.664172,-79.394888,Gym
95,Don Mills North,43.745906,-79.352188,4c18e819d4d9c9284e19f029,LA Fitness,43.747665,-79.347077,Gym
207,"Garden District, Ryerson",43.657162,-79.378937,50885719498ea7b5aab3a74c,GoodLife Fitness Toronto Bell Trinity Centre,43.653436,-79.382314,Gym
511,Central Bay Street,43.657952,-79.387383,503aa065e4b0bbcb176a802b,Burano Gym,43.662053,-79.386038,Gym
570,Thorncliffe Park,43.705369,-79.349372,56c5d5e1cd106ec35067eeee,Fit4Less,43.705689,-79.346018,Gym
866,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752,4c0dca39c700c9b612ffa2dd,GoodLife Fitness Toronto Union Station,43.644336,-79.383625,Gym
973,"Toronto Dominion Centre, Design Exchange",43.647177,-79.381576,501ae947e4b0d11883b910a7,Equinox Bay Street,43.6481,-79.379989,Gym
1071,"Brockton, Parkdale Village, Exhibition Place",43.636847,-79.428191,4f54ef6ce4b0929810978bb6,Reebok Crossfit Liberty Village,43.637036,-79.424802,Gym
1111,"India Bazaar, The Beaches West",43.668999,-79.315572,4ade390ff964a5200e7421e3,System Fitness,43.667171,-79.312733,Gym
1383,Davisville North,43.712751,-79.390197,4da99b34a86e771ea70e84c1,Gym,43.713126,-79.393537,Gym
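
The two cells above first build a boolean frame with `DataFrame.isin` and then reuse its index to slice `toronto_gyms`. A shorter route, sketched here on a few rows taken from the outputs above, is to apply `isin` to the `Neighborhood` column only and use the result as a row mask. The list `single_gym_nh` is a hand-picked subset for illustration, standing in for the index of `toronto_gyms_1`.

```python
import pandas as pd

# A few gym rows taken from the outputs above.
gyms = pd.DataFrame({
    'Neighborhood': [
        'Ontario Provincial Government',
        'Don Mills South',
        'Thorncliffe Park',
    ],
    'Venue': ['Hart House Gym', 'Fitness Connection', 'Fit4Less'],
})

# Stand-in for list(toronto_gyms_1.index.values).
single_gym_nh = ['Ontario Provincial Government', 'Thorncliffe Park']

# A boolean mask on one column keeps the original values in the result.
mask = gyms['Neighborhood'].isin(single_gym_nh)
selected = gyms[mask]
print(selected)
```

This filters and keeps the venue data in a single step, so no intermediate True/False table is needed.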


### Now let's visualize where our target competitors are located

In [70]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=15) # generate map centred around Toronto


# add Toronto as a red circle marker
folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    popup='Toronto',
    fill=True,
    color='red',
    fill_color='red',
    fill_opacity=0.6
    ).add_to(venues_map)


# add selected gyms to the map as blue circle markers
for lat, lng, label in zip(selected_venues['Venue Latitude'], selected_venues['Venue Longitude'], selected_venues['Venue Category']):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        fill=True,
        color='blue',
        fill_color='blue',
        fill_opacity=0.6
        ).add_to(venues_map)

# display map
venues_map

In [104]:
venue_id = '4bc64a7ad35d9c742083e23a'
url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&oauth_token={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET,ACCESS_TOKEN, VERSION)

result = requests.get(url).json()
try:
    print(result['response']['venue']['rating'])
except KeyError:
    # the response carries no 'rating' key for venues without a rating
    print('This venue has not been rated yet.')

6.3


In [113]:
ids = selected_venues['id'].to_list()
ids

['4bc64a7ad35d9c742083e23a',
 '4c18e819d4d9c9284e19f029',
 '50885719498ea7b5aab3a74c',
 '503aa065e4b0bbcb176a802b',
 '56c5d5e1cd106ec35067eeee',
 '4c0dca39c700c9b612ffa2dd',
 '501ae947e4b0d11883b910a7',
 '4f54ef6ce4b0929810978bb6',
 '4ade390ff964a5200e7421e3',
 '4da99b34a86e771ea70e84c1',
 '586d32a30802cb696d293d1a',
 '4ba7867af964a520909a39e3',
 '4bf681e794b2a5936788adee',
 '4b9fbdb4f964a520583a37e3',
 '4c8821e9bbec6dcb7b93d158']

In [117]:
# Request the details of each venue and print its rating, if it has one
for venue_id in ids:
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&oauth_token={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, ACCESS_TOKEN, VERSION)
    result = requests.get(url).json()
    try:
        print(result['response']['venue']['rating'])
    except KeyError:
        # the response carries no 'rating' key for venues without a rating
        print('This venue has not been rated yet.')

6.3
7.8
7.2
This venue has not been rated yet.
7.4
6.8
8.4
8.6
8.2
This venue has not been rated yet.
This venue has not been rated yet.
This venue has not been rated yet.
This venue has not been rated yet.
7.2
This venue has not been rated yet.
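
Printing the ratings leaves the comparison to the reader. As a small sketch, the values can instead be kept next to their venue ids and the lowest-rated venue picked programmatically. The ids and ratings below are copied from the two outputs above, in the same order, with `None` standing for "not rated yet"; the variable names are my own.

```python
ids = [
    '4bc64a7ad35d9c742083e23a', '4c18e819d4d9c9284e19f029',
    '50885719498ea7b5aab3a74c', '503aa065e4b0bbcb176a802b',
    '56c5d5e1cd106ec35067eeee', '4c0dca39c700c9b612ffa2dd',
    '501ae947e4b0d11883b910a7', '4f54ef6ce4b0929810978bb6',
    '4ade390ff964a5200e7421e3', '4da99b34a86e771ea70e84c1',
    '586d32a30802cb696d293d1a', '4ba7867af964a520909a39e3',
    '4bf681e794b2a5936788adee', '4b9fbdb4f964a520583a37e3',
    '4c8821e9bbec6dcb7b93d158',
]
# Ratings as printed by the loop above; None marks an unrated venue.
printed = [6.3, 7.8, 7.2, None, 7.4, 6.8, 8.4, 8.6, 8.2,
           None, None, None, None, 7.2, None]

ratings = dict(zip(ids, printed))
rated = {venue_id: r for venue_id, r in ratings.items() if r is not None}

# The venue with the lowest rating among those that have one.
worst_id = min(rated, key=rated.get)
unrated_count = len(ratings) - len(rated)
print(worst_id, rated[worst_id], unrated_count)
```

Note that unrated venues are simply excluded here, so the result says nothing about how satisfied their customers are.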


### From the above loop we can see that, even though six out of fifteen of our target competitors are not rated yet, the first venue in our list (Hart House Gym) has the worst rating, 6.3. Therefore, we can consider opening our new gym in the "Ontario Provincial Government" neighborhood

In [119]:
target_nh = selected_venues.loc[selected_venues['id'] == ids[0]]
target_nh

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,id,Venue,Venue Latitude,Venue Longitude,Venue Category
87,Ontario Provincial Government,43.662301,-79.389494,4bc64a7ad35d9c742083e23a,Hart House Gym,43.664172,-79.394888,Gym


In [124]:
# target_nh has a single row, so take its first (only) value explicitly
lat = float(target_nh['Neighborhood Latitude'].iloc[0])
lng = float(target_nh['Neighborhood Longitude'].iloc[0])

target_nh_map = folium.Map(location=[lat, lng], zoom_start=10)

# add our neighborhood as a red circle marker
folium.CircleMarker(
    [lat, lng],
    radius=10,
    popup='Ontario Provincial Government',
    fill=True,
    color='red',
    fill_color='red',
    fill_opacity=0.6
    ).add_to(target_nh_map)

# display map
target_nh_map