# Chris Devine's Coursera Capstone Project
# Which Toronto Neighborhood Should I Build My Crossfit Gym In?

## Introduction

**1. Background** Crossfit is a sport with growing worldwide interest, driven in part by the recent introduction of "National Champions" who qualify for the Crossfit Games.  In past years, only the top 40 men and women worldwide were invited to the Crossfit Games to determine who is the best.  This past year, Sanctionals were introduced: smaller competitions around the world whose winners were also invited to the Games.  The Games also invited the top-ranked Crossfit athlete from each country that has a Crossfit gym.  Together, these changes gave people more opportunities to experience Crossfit and to enjoy watching local competition.

**2. Problem** Two of the most famous Crossfit personalities, Patrick Vellner and Brent Fikowski, are both Canadian.  This has raised the popularity of Crossfit, so I want to determine which Toronto neighborhood I should build a Crossfit gym in.

## Data Acquisition and Cleaning

Neighborhood data was scraped from https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M using Beautiful Soup and merged with latitude and longitude data from Geospatial_Coordinates.csv.

In [244]:
import urllib.request

In [2]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

In [3]:
page = urllib.request.urlopen(url)

In [4]:
from bs4 import BeautifulSoup

In [5]:
soup = BeautifulSoup(page, "lxml")

In [7]:
soup.title

<title>List of postal codes of Canada: M - Wikipedia</title>

In [245]:
right_table = soup.find("table", class_="wikitable sortable")

In [12]:
A = []
B = []
C = []

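# walk every table row; header rows contain no <td> cells, so only 3-cell data rows are kept
for row in right_table.find_all('tr'):
    cells = row.find_all('td')
    if len(cells) == 3:
        A.append(cells[0].find(text=True))  # postal code
        B.append(cells[1].find(text=True))  # borough
        C.append(cells[2].find(text=True))  # neighborhood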

In [13]:
import pandas as pd

In [14]:
df = pd.DataFrame(A,columns=['Postal Code'])
df['Borough']=B
df['Neighborhood']=C
df

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
7,M8A,Not assigned,
8,M9A,Etobicoke,Islington Avenue
9,M1B,Scarborough,"Malvern, Rouge"
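
The neighborhood names printed later in this notebook come out with a line break before the closing dashes (for example `----Berczy Park` followed by `----`), which suggests the scraped cell text still carries trailing whitespace. Here is a minimal sketch of a whitespace-safe extraction using `get_text(strip=True)`. The HTML string is a hand-written stand-in for the Wikipedia table (only the two rows are taken from the output above), and `table_rows` is a hypothetical helper name, not part of the original notebook.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the Wikipedia table (an assumption, not the real page).
SAMPLE_HTML = """
<table class="wikitable sortable">
<tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
<tr><td>M3A
</td><td><a href="#">North York</a>
</td><td>Parkwoods
</td></tr>
<tr><td>M4A
</td><td><a href="#">North York</a>
</td><td>Victoria Village
</td></tr>
</table>
"""


def table_rows(html, n_cols=3):
    """Return the text of every data row that has exactly n_cols <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable sortable")
    rows = []
    for tr in table.find_all("tr"):
        # get_text(strip=True) drops surrounding whitespace, including newlines
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) == n_cols:  # header rows have no <td> cells and are skipped
            rows.append(cells)
    return rows


rows = table_rows(SAMPLE_HTML)
print(rows)
```

The resulting list of rows can be passed straight to `pd.DataFrame(rows, columns=[...])` instead of building three separate lists.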


## Cleaning Dataframe

In [149]:
df_dropna = df[df.Borough != 'Not assigned'].reset_index(drop=True)
df_dropna.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [192]:
df_group = df_dropna.sort_values(['Postal Code','Borough'])
df_group.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
6,M1B,Scarborough,"Malvern, Rouge"
12,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
18,M1E,Scarborough,"Guildwood, Morningside, West Hill"
22,M1G,Scarborough,Woburn
26,M1H,Scarborough,Cedarbrae


In [194]:
df_group.reset_index(inplace=True)
df_group.head()

Unnamed: 0,level_0,index,Postal Code,Borough,Neighborhood
0,0,6,M1B,Scarborough,"Malvern, Rouge"
1,1,12,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,2,18,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,3,22,M1G,Scarborough,Woburn
4,4,26,M1H,Scarborough,Cedarbrae


In [169]:
df2 = pd.read_csv("Geospatial_Coordinates.csv")

In [170]:
df2.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [198]:
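# NOTE: values are assigned by row position, not matched on 'Postal Code'; this relies on
# df_group and df2 having the same number of rows in the same postal-code order
df_group['Latitude'] = df2['Latitude'].values
df_group['Longitude'] = df2['Longitude'].values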

In [199]:
df_group.head()

Unnamed: 0,level_0,index,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,0,6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,1,12,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,2,18,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,3,22,M1G,Scarborough,Woburn,43.770992,-79.216917
4,4,26,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
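
Copying the coordinate columns across by position works here because both frames are sorted by postal code. A key-based merge does not depend on row order. Below is a small sketch on toy frames; the three rows reuse values from the outputs above, and the left frame is deliberately shuffled to show that the match is made on `Postal Code`. The frame names are made up for this illustration.

```python
import pandas as pd

# Toy neighborhood frame, deliberately NOT sorted by postal code.
neighborhoods = pd.DataFrame({
    "Postal Code": ["M1C", "M1B", "M1E"],
    "Borough": ["Scarborough", "Scarborough", "Scarborough"],
    "Neighborhood": [
        "Rouge Hill, Port Union, Highland Creek",
        "Malvern, Rouge",
        "Guildwood, Morningside, West Hill",
    ],
})

# Toy coordinate frame in the order of Geospatial_Coordinates.csv.
coords = pd.DataFrame({
    "Postal Code": ["M1B", "M1C", "M1E"],
    "Latitude": [43.806686, 43.784535, 43.763573],
    "Longitude": [-79.194353, -79.160497, -79.188711],
})

# A left merge keeps every neighborhood; validate raises if a postal code is duplicated.
merged = neighborhoods.merge(coords, on="Postal Code", how="left", validate="one_to_one")

# Postal codes without coordinates show up as NaN instead of silently shifting rows.
missing = int(merged["Latitude"].isna().sum())
print(merged)
print("rows without coordinates:", missing)
```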


In [201]:
new_df = df_group.drop(['level_0','index'], axis=1)

In [202]:
new_df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [203]:
toronto_df = new_df[new_df['Borough'].str.contains("Toronto")]

In [204]:
toronto_df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
37,M4E,East Toronto,The Beaches,43.676357,-79.293031
41,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
42,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572
43,M4M,East Toronto,Studio District,43.659526,-79.340923
44,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


## Exploring Map of Neighborhoods of Toronto

In [209]:
import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0 also exposes this as pd.json_normalize)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

In [207]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="tor_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [210]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(toronto_df['Latitude'], toronto_df['Longitude'], toronto_df['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

## Using Foursquare API To Find Most Common Venue Types In Neighborhoods

In [211]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (redacted)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET
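
API keys typed into a notebook cell are easy to leak when the notebook is shared. One common alternative is to read them from environment variables and print only a masked form. The sketch below assumes two variable names, `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, which are my own choice rather than anything Foursquare prescribes. The helper names are hypothetical and the demo values are placeholders.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret), or raise if either variable is missing."""
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("Set {} before running the notebook".format(missing)) from None


def mask(secret, visible=4):
    """Keep the first few characters and star out the rest, for safe printing."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Demo with a stand-in mapping instead of the real environment (placeholder values).
demo_env = {
    "FOURSQUARE_CLIENT_ID": "EXAMPLEID123",
    "FOURSQUARE_CLIENT_SECRET": "EXAMPLESECRET456",
}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(client_id))
print("CLIENT_SECRET:", mask(client_secret))
```

In the notebook itself, calling `load_foursquare_credentials()` with no argument would read from the real environment.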


In [212]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
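
`getNearbyVenues` indexes `categories[0]` directly, which would raise an `IndexError` if a venue came back with an empty category list. Whether the explore endpoint ever returns such a venue is an assumption on my part, but the category helper later in this notebook already guards against that case. A sketch of the row-building step with the same guard, run on a hand-made payload (the first item reuses a venue from the results in this notebook, the second is made up):

```python
def venue_rows(name, lat, lng, items):
    """Flatten explore-style items into tuples, tolerating venues without categories."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None instead of failing on an empty category list.
        category = categories[0]["name"] if categories else None
        rows.append((
            name,
            lat,
            lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Hand-made payload shaped like response['groups'][0]['items'].
sample_items = [
    {"venue": {"name": "Glen Manor Ravine",
               "location": {"lat": 43.676821, "lng": -79.293942},
               "categories": [{"name": "Trail"}]}},
    {"venue": {"name": "Uncategorized place (made up)",
               "location": {"lat": 43.677, "lng": -79.293},
               "categories": []}},
]

rows = venue_rows("The Beaches", 43.676357, -79.293031, sample_items)
for row in rows:
    print(row)
```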

In [214]:
LIMIT = 100
toronto_venues = getNearbyVenues(names=toronto_df['Neighborhood'],
                                   latitudes=toronto_df['Latitude'],
                                   longitudes=toronto_df['Longitude']
                                  )

The Beaches

The Danforth West, Riverdale

India Bazaar, The Beaches West

Studio District

Lawrence Park

Davisville North

North Toronto West

Davisville

Moore Park, Summerhill East

Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park

Rosedale

St. James Town, Cabbagetown

Church and Wellesley

Regent Park, Harbourfront

Garden District, Ryerson

St. James Town

Berczy Park

Central Bay Street

Richmond, Adelaide, King

Harbourfront East, Union Station, Toronto Islands

Toronto Dominion Centre, Design Exchange

Commerce Court, Victoria Hotel

Roselawn

Forest Hill North & West

The Annex, North Midtown, Yorkville

University of Toronto, Harbord

Kensington Market, Chinatown, Grange Park

CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport

Stn A PO Boxes

First Canadian Place, Underground city

Christie

Dufferin, Dovercourt Village

Little Portugal, Trinity

Brockton, Parkdale Village, Exhibition Place

High Park

In [215]:
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
3,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
4,"The Danforth West, Riverdale",43.679557,-79.352188,MenEssentials,43.67782,-79.351265,Cosmetics Shop


In [216]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Berczy Park,55,55,55,55,55,55
"Brockton, Parkdale Village, Exhibition Place",22,22,22,22,22,22
Business reply mail Processing Centre,18,18,18,18,18,18
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",14,14,14,14,14,14
Central Bay Street,65,65,65,65,65,65
Christie,17,17,17,17,17,17
Church and Wellesley,77,77,77,77,77,77
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
Davisville,34,34,34,34,34,34
Davisville North,7,7,7,7,7,7


In [218]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [219]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0
1,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Business reply mail Processing Centre,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.0,0.071429,0.071429,0.071429,0.142857,0.071429,0.142857,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.015385,0.0,0.0,0.015385,0.0,0.0,0.0
5,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Church and Wellesley,0.025974,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0
7,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,...,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0
8,Davisville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Davisville North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
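
Each number in the grouped table is the share of a neighborhood's venues that fall into a category: the mean of a 0/1 column. One of the venue categories returned above is itself called `Neighborhood` (the `Upper Beaches` row), so adding a `Neighborhood` column to the one-hot frame appears to overwrite that category, which would explain why `Yoga Studio` rather than `Neighborhood` ends up as the first column. The sketch below uses the first five venue rows shown earlier as toy data and groups by a separately named key (`Area`, a name made up here) so the two never collide.

```python
import pandas as pd

# Toy data: the first five venue rows from the table above.
venues = pd.DataFrame({
    "Neighborhood": ["The Beaches"] * 4 + ["The Danforth West, Riverdale"],
    "Venue Category": ["Trail", "Health Food Store", "Pub", "Neighborhood", "Cosmetics Shop"],
})

# One 0/1 column per venue category.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)

# Group by an external key with its own name, so the category column called
# "Neighborhood" is not overwritten.
area = venues["Neighborhood"].rename("Area")
freq = onehot.groupby(area).mean()

print(freq)
```

With four venues in The Beaches, each of its categories gets a frequency of 0.25, including the category that happens to be named `Neighborhood`.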


In [220]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park
----
                venue  freq
0         Coffee Shop  0.07
1        Cocktail Bar  0.05
2                Café  0.04
3            Beer Bar  0.04
4  Seafood Restaurant  0.04


----Brockton, Parkdale Village, Exhibition Place
----
            venue  freq
0            Café  0.14
1     Coffee Shop  0.09
2  Breakfast Spot  0.09
3             Bar  0.05
4      Restaurant  0.05


----Business reply mail Processing Centre
----
                  venue  freq
0           Yoga Studio  0.06
1         Garden Center  0.06
2            Skate Park  0.06
3  Fast Food Restaurant  0.06
4            Smoke Shop  0.06


----CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
----
              venue  freq
0  Airport Terminal  0.14
1    Airport Lounge  0.14
2   Harbor / Marina  0.07
3          Boutique  0.07
4     Boat or Ferry  0.07


----Central Bay Street
----
                venue  freq
0         Coffee Shop  0.17
1  Italian Restaurant 

In [221]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [246]:
import numpy as np
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks beyond 3rd fall back to the generic 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Berczy Park,Coffee Shop,Cocktail Bar,Bakery,Beer Bar,Seafood Restaurant
1,"Brockton, Parkdale Village, Exhibition Place",Café,Coffee Shop,Breakfast Spot,Stadium,Italian Restaurant
2,Business reply mail Processing Centre,Yoga Studio,Butcher,Smoke Shop,Skate Park,Brewery
3,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Terminal,Airport Lounge,Coffee Shop,Sculpture Garden,Boat or Ferry
4,Central Bay Street,Coffee Shop,Italian Restaurant,Sandwich Place,Café,Burger Joint
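
The column labels above are built by indexing into `['st', 'nd', 'rd']` and falling back to `th`. That is fine for five columns, but it would label an 11th column `11th` only by accident and a 21st column `21th`. A small helper makes the rule explicit; `ordinal` is a hypothetical name introduced for this sketch.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


num_top_venues = 5
columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(rank))
    for rank in range(1, num_top_venues + 1)
]
print(columns)
```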


## Using Foursquare API To Find All Crossfit Gyms

In [250]:
search_query = 'Crossfit'
radius = 5000
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=YOUR_FOURSQUARE_CLIENT_ID&client_secret=YOUR_FOURSQUARE_CLIENT_SECRET&ll=43.6534817,-79.3839347&v=20180605&query=Crossfit&radius=5000&limit=100'
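
The search URL above is assembled with `str.format`, which works for a one-word query but would not escape spaces or special characters. A sketch of the same URL built with `urllib.parse.urlencode` from the standard library follows. `build_search_url` is a hypothetical helper and the credentials are placeholders. Note that `urlencode` percent-encodes the comma in `ll` as `%2C`, so the string differs from the one above even though it carries the same parameters.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"


def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Return the venue search URL with every parameter safely encoded."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(lat, lng),
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials; the other values are the ones used in this notebook.
url = build_search_url(
    "YOUR_FOURSQUARE_CLIENT_ID", "YOUR_FOURSQUARE_CLIENT_SECRET",
    43.6534817, -79.3839347, "20180605", "Crossfit", 5000, 100,
)
print(url)
```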

In [251]:
results = requests.get(url).json()

In [253]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
crossfit_df = json_normalize(venues)
crossfit_df

Unnamed: 0,categories,hasPerk,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.postalCode,location.state,name,referralId,venuePage.id
0,"[{'id': '4f4528bc4b90abdf24c9de85', 'name': 'A...",False,567598db498eb1a182a547ee,103 Richmond St E B01,CA,Toronto,Canada,,808,"[103 Richmond St E B01, Toronto ON, Canada]","[{'label': 'display', 'lat': 43.65263017147902...",43.65263,-79.373969,,ON,Crossfit 6S,v-1590520021,
1,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...",False,53fbb59c498ed00f02d01ef6,,CA,,Canada,,825,[Canada],"[{'label': 'display', 'lat': 43.65279956765199...",43.6528,-79.373733,,,Crossfit Argon,v-1590520021,
2,"[{'id': '4bf58dd8d48988d175941735', 'name': 'G...",False,4eee43b39adf257d67bf33f5,64 Ossington Ave,CA,Toronto,Canada,,2987,"[64 Ossington Ave, Toronto ON M6J 1X6, Canada]","[{'label': 'display', 'lat': 43.64571736232545...",43.645717,-79.419437,M6J 1X6,ON,Academy Of Lions CrossFit Toronto,v-1590520021,37325322.0
3,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...",False,4f54ef6ce4b0929810978bb6,Liberty Village,CA,Toronto,Canada,,3766,"[Liberty Village, Toronto ON, Canada]","[{'label': 'display', 'lat': 43.63703559193200...",43.637036,-79.424802,,ON,Reebok Crossfit Liberty Village,v-1590520021,
4,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...",False,4ee62e218b81bab285100b5a,"370 Queens Quay W, Unit 114",CA,Toronto,Canada,Spadina Ave,1786,"[370 Queens Quay W, Unit 114 (Spadina Ave), To...","[{'label': 'display', 'lat': 43.63819496273582...",43.638195,-79.390696,M5V 3A6,ON,Crossfit 416,v-1590520021,
5,"[{'id': '4bf58dd8d48988d175941735', 'name': 'G...",False,570ff104498eeab32a85da3e,175 Avenue Rd,CA,Toronto,Canada,,2571,"[175 Avenue Rd, Toronto ON, Canada]","[{'label': 'display', 'lat': 43.674738, 'lng':...",43.674738,-79.396433,,ON,crossfit ykz,v-1590520021,
6,"[{'id': '4bf58dd8d48988d164941735', 'name': 'P...",False,4ad8cd16f964a520c91421e3,1 Dundas St E,CA,Toronto,Canada,at Yonge St,398,"[1 Dundas St E (at Yonge St), Toronto ON M5B 2...","[{'label': 'display', 'lat': 43.65605389742188...",43.656054,-79.380495,M5B 2R8,ON,Yonge-Dundas Square,v-1590520021,68861986.0
7,"[{'id': '4bf58dd8d48988d175941735', 'name': 'G...",False,4b505d01f964a5205f2127e3,20 Gladstone Ave,CA,Gladstone Ave,Canada,Queen St W,3682,"[20 Gladstone Ave (Queen St W), Gladstone Ave ...","[{'label': 'display', 'lat': 43.643108, 'lng':...",43.643108,-79.427341,M5J 2Y5,ON,Crossfit Gyms,v-1590520021,
8,"[{'id': '4bf58dd8d48988d175941735', 'name': 'G...",False,5277c8ce498ef0b484e3ccb9,567 Dupont St,CA,Toronto,Canada,,3483,"[567 Dupont St, Toronto ON M6G 1Y8, Canada]","[{'label': 'display', 'lat': 43.67222118147730...",43.672221,-79.418576,M6G 1Y8,ON,Crossfit Leviathan,v-1590520021,
9,"[{'id': '4f4528bc4b90abdf24c9de85', 'name': 'A...",False,571139dc498e1c30bcb21654,40 Laird Dr,CA,Toronto,Canada,,5769,"[40 Laird Dr, Toronto ON, Canada]","[{'label': 'display', 'lat': 43.703357, 'lng':...",43.703357,-79.364447,,ON,crossfit yyz,v-1590520021,


In [255]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in crossfit_df.columns if col.startswith('location.')] + ['id']
# .copy() so the column assignments below modify an independent frame, not a view of crossfit_df
dataframe_filtered = crossfit_df.loc[:, filtered_columns].copy()

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,postalCode,state,id
0,Crossfit 6S,Athletics & Sports,103 Richmond St E B01,CA,Toronto,Canada,,808,"[103 Richmond St E B01, Toronto ON, Canada]","[{'label': 'display', 'lat': 43.65263017147902...",43.65263,-79.373969,,ON,567598db498eb1a182a547ee
1,Crossfit Argon,Gym,,CA,,Canada,,825,[Canada],"[{'label': 'display', 'lat': 43.65279956765199...",43.6528,-79.373733,,,53fbb59c498ed00f02d01ef6
2,Academy Of Lions CrossFit Toronto,Gym / Fitness Center,64 Ossington Ave,CA,Toronto,Canada,,2987,"[64 Ossington Ave, Toronto ON M6J 1X6, Canada]","[{'label': 'display', 'lat': 43.64571736232545...",43.645717,-79.419437,M6J 1X6,ON,4eee43b39adf257d67bf33f5
3,Reebok Crossfit Liberty Village,Gym,Liberty Village,CA,Toronto,Canada,,3766,"[Liberty Village, Toronto ON, Canada]","[{'label': 'display', 'lat': 43.63703559193200...",43.637036,-79.424802,,ON,4f54ef6ce4b0929810978bb6
4,Crossfit 416,Gym,"370 Queens Quay W, Unit 114",CA,Toronto,Canada,Spadina Ave,1786,"[370 Queens Quay W, Unit 114 (Spadina Ave), To...","[{'label': 'display', 'lat': 43.63819496273582...",43.638195,-79.390696,M5V 3A6,ON,4ee62e218b81bab285100b5a
5,crossfit ykz,Gym / Fitness Center,175 Avenue Rd,CA,Toronto,Canada,,2571,"[175 Avenue Rd, Toronto ON, Canada]","[{'label': 'display', 'lat': 43.674738, 'lng':...",43.674738,-79.396433,,ON,570ff104498eeab32a85da3e
6,Yonge-Dundas Square,Plaza,1 Dundas St E,CA,Toronto,Canada,at Yonge St,398,"[1 Dundas St E (at Yonge St), Toronto ON M5B 2...","[{'label': 'display', 'lat': 43.65605389742188...",43.656054,-79.380495,M5B 2R8,ON,4ad8cd16f964a520c91421e3
7,Crossfit Gyms,Gym / Fitness Center,20 Gladstone Ave,CA,Gladstone Ave,Canada,Queen St W,3682,"[20 Gladstone Ave (Queen St W), Gladstone Ave ...","[{'label': 'display', 'lat': 43.643108, 'lng':...",43.643108,-79.427341,M5J 2Y5,ON,4b505d01f964a5205f2127e3
8,Crossfit Leviathan,Gym / Fitness Center,567 Dupont St,CA,Toronto,Canada,,3483,"[567 Dupont St, Toronto ON M6G 1Y8, Canada]","[{'label': 'display', 'lat': 43.67222118147730...",43.672221,-79.418576,M6G 1Y8,ON,5277c8ce498ef0b484e3ccb9
9,crossfit yyz,Athletics & Sports,40 Laird Dr,CA,Toronto,Canada,,5769,"[40 Laird Dr, Toronto ON, Canada]","[{'label': 'display', 'lat': 43.703357, 'lng':...",43.703357,-79.364447,,ON,571139dc498e1c30bcb21654


In [256]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="tor_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


## Map of Toronto Neighborhoods And Crossfit Gyms

In [303]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13)


# add the Crossfit Gyms as red circle markers (neighborhoods are added in blue below)
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='red',
        popup=label,
        fill = True,
        fill_color='red',
        fill_opacity=0.6
    ).add_to(venues_map)
    
for lat, lng, label in zip(toronto_df['Latitude'], toronto_df['Longitude'], toronto_df['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(venues_map)

# display map
venues_map
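
The map shows by eye how close each gym is to each neighborhood marker. The same comparison can be made numerically with the haversine formula for great-circle distance. The sketch below is an addition of mine, not part of the original analysis. It checks the formula against the search results above, where Crossfit 6S is reported 808 m from the Toronto coordinates used as the search centre, so the computed value should land close to that. The Earth radius of 6,371 km is the usual mean-radius approximation.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


toronto = (43.6534817, -79.3839347)   # search centre from the geocoder output
crossfit_6s = (43.65263, -79.373969)  # lat/lng of Crossfit 6S from the results table

distance = haversine_m(*toronto, *crossfit_6s)
print(round(distance), "metres")
```

Applying `haversine_m` between every neighborhood in `toronto_df` and every gym in `dataframe_filtered` would give each neighborhood's distance to its nearest existing Crossfit gym.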

## Looking For Neighborhoods With Potential Crossfit Competitors

In [264]:
for col in toronto_grouped.columns:
    print(col)

Neighborhood
Yoga Studio
Afghan Restaurant
Airport
Airport Food Court
Airport Gate
Airport Lounge
Airport Service
Airport Terminal
American Restaurant
Antique Shop
Aquarium
Art Gallery
Art Museum
Arts & Crafts Store
Asian Restaurant
Athletics & Sports
Auto Workshop
BBQ Joint
Baby Store
Bagel Shop
Bakery
Bank
Bar
Baseball Stadium
Basketball Stadium
Beach
Bed & Breakfast
Beer Bar
Beer Store
Belgian Restaurant
Bike Rental / Bike Share
Bistro
Boat or Ferry
Bookstore
Boutique
Brazilian Restaurant
Breakfast Spot
Brewery
Bubble Tea Shop
Building
Burger Joint
Burrito Place
Bus Line
Butcher
Café
Cajun / Creole Restaurant
Camera Store
Candy Store
Caribbean Restaurant
Cheese Shop
Chinese Restaurant
Chocolate Shop
Church
Climbing Gym
Clothing Store
Cocktail Bar
Coffee Shop
College Arts Building
College Auditorium
College Cafeteria
College Gym
College Rec Center
Colombian Restaurant
Comfort Food Restaurant
Comic Shop
Concert Hall
Convenience Store
Convention Center
Cosmetics Shop
Coworking Space
Cr

In [265]:
toronto_grouped.head()

Unnamed: 0,Neighborhood,Yoga Studio,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0
1,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Business reply mail Processing Centre,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.0,0.071429,0.071429,0.071429,0.142857,0.071429,0.142857,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.015385,0.0,0.0,0.015385,0.0,0.0,0.0


In [284]:
toronto_competition = toronto_grouped[['Neighborhood','Yoga Studio','Climbing Gym','College Gym','College Rec Center','Gym','Gym / Fitness Center']]
toronto_competition.head()

Unnamed: 0,Neighborhood,Yoga Studio,Climbing Gym,College Gym,College Rec Center,Gym,Gym / Fitness Center
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0
1,"Brockton, Parkdale Village, Exhibition Place",0.0,0.045455,0.0,0.0,0.045455,0.0
2,Business reply mail Processing Centre,0.055556,0.0,0.0,0.0,0.0,0.0
3,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.015385,0.0,0.0,0.0,0.0,0.015385


In [296]:
yoga = toronto_competition[toronto_competition['Yoga Studio'] !=0]
yoga

Unnamed: 0,Neighborhood,Yoga Studio,Climbing Gym,College Gym,College Rec Center,Gym,Gym / Fitness Center
2,Business reply mail Processing Centre,0.055556,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.015385,0.0,0.0,0.0,0.0,0.015385
6,Church and Wellesley,0.025974,0.0,0.0,0.0,0.012987,0.0
19,"Little Portugal, Trinity",0.022222,0.0,0.0,0.0,0.0,0.0
21,North Toronto West,0.045455,0.0,0.0,0.0,0.0,0.0
23,"Queen's Park, Ontario Provincial Government",0.032258,0.0,0.0,0.0,0.032258,0.0
24,"Regent Park, Harbourfront",0.020833,0.0,0.0,0.0,0.0,0.020833
28,"Runnymede, Swansea",0.02439,0.0,0.0,0.0,0.02439,0.0
31,Stn A PO Boxes,0.01087,0.0,0.0,0.0,0.021739,0.0
32,Studio District,0.025,0.0,0.0,0.0,0.0,0.025


In [297]:
climb = toronto_competition[toronto_competition['Climbing Gym'] !=0]
climb

Unnamed: 0,Neighborhood,Yoga Studio,Climbing Gym,College Gym,College Rec Center,Gym,Gym / Fitness Center
1,"Brockton, Parkdale Village, Exhibition Place",0.0,0.045455,0.0,0.0,0.045455,0.0


In [298]:
college = toronto_competition[toronto_competition['College Gym'] !=0]
college

Unnamed: 0,Neighborhood,Yoga Studio,Climbing Gym,College Gym,College Rec Center,Gym,Gym / Fitness Center
38,"University of Toronto, Harbord",0.029412,0.0,0.029412,0.0,0.0,0.0


In [299]:
gym = toronto_competition[toronto_competition['Gym'] !=0]
gym

Unnamed: 0,Neighborhood,Yoga Studio,Climbing Gym,College Gym,College Rec Center,Gym,Gym / Fitness Center
1,"Brockton, Parkdale Village, Exhibition Place",0.0,0.045455,0.0,0.0,0.045455,0.0
6,Church and Wellesley,0.025974,0.0,0.0,0.0,0.012987,0.0
7,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.0,0.04,0.01
8,Davisville,0.0,0.0,0.0,0.0,0.058824,0.0
9,Davisville North,0.0,0.0,0.0,0.0,0.142857,0.0
11,"First Canadian Place, Underground city",0.0,0.0,0.0,0.0,0.04,0.01
13,"Garden District, Ryerson",0.0,0.0,0.0,0.01,0.01,0.01
14,"Harbourfront East, Union Station, Toronto Isla...",0.0,0.0,0.0,0.0,0.01,0.0
16,"India Bazaar, The Beaches West",0.0,0.0,0.0,0.0,0.052632,0.0
20,"Moore Park, Summerhill East",0.0,0.0,0.0,0.0,0.333333,0.0


In [300]:
fc = toronto_competition[toronto_competition['Gym / Fitness Center'] !=0]
fc

Unnamed: 0,Neighborhood,Yoga Studio,Climbing Gym,College Gym,College Rec Center,Gym,Gym / Fitness Center
4,Central Bay Street,0.015385,0.0,0.0,0.0,0.0,0.015385
7,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.0,0.04,0.01
11,"First Canadian Place, Underground city",0.0,0.0,0.0,0.0,0.04,0.01
13,"Garden District, Ryerson",0.0,0.0,0.0,0.01,0.01,0.01
24,"Regent Park, Harbourfront",0.020833,0.0,0.0,0.0,0.0,0.020833
25,"Richmond, Adelaide, King",0.0,0.0,0.0,0.0,0.032609,0.01087
32,Studio District,0.025,0.0,0.0,0.0,0.0,0.025
37,"Toronto Dominion Centre, Design Exchange",0.0,0.0,0.0,0.0,0.01,0.01


In [366]:
potential_neighborhoods = toronto_competition.iloc[[3,5,10,12,15,17,18,22,26,27,30,33,34,35]]
potential_neighborhoods

Unnamed: 0,Neighborhood,Yoga Studio,Climbing Gym,College Gym,College Rec Center,Gym,Gym / Fitness Center
3,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.0,0.0,0.0,0.0,0.0
5,Christie,0.0,0.0,0.0,0.0,0.0,0.0
10,"Dufferin, Dovercourt Village",0.0,0.0,0.0,0.0,0.0,0.0
12,Forest Hill North & West,0.0,0.0,0.0,0.0,0.0,0.0
15,"High Park, The Junction South",0.0,0.0,0.0,0.0,0.0,0.0
17,"Kensington Market, Chinatown, Grange Park",0.0,0.0,0.0,0.0,0.0,0.0
18,Lawrence Park,0.0,0.0,0.0,0.0,0.0,0.0
22,"Parkdale, Roncesvalles",0.0,0.0,0.0,0.0,0.0,0.0
26,Rosedale,0.0,0.0,0.0,0.0,0.0,0.0
27,Roselawn,0.0,0.0,0.0,0.0,0.0,0.0
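
The row positions passed to `iloc` above were picked by hand from the earlier filters. As a rough alternative, the same kind of selection can be sketched with a boolean mask. This is only an illustration: the small `competition` frame below is a hypothetical stand-in for `toronto_competition`, with three rows copied from the outputs above, and it assumes the six fitness columns are the only ones that count as competition.

```python
import pandas as pd

# Hypothetical stand-in for toronto_competition; the rows are copied
# from the outputs shown above.
competition = pd.DataFrame({
    "Neighborhood": ["Christie", "Davisville", "Rosedale"],
    "Yoga Studio": [0.0, 0.0, 0.0],
    "Climbing Gym": [0.0, 0.0, 0.0],
    "College Gym": [0.0, 0.0, 0.0],
    "College Rec Center": [0.0, 0.0, 0.0],
    "Gym": [0.0, 0.058824, 0.0],
    "Gym / Fitness Center": [0.0, 0.0, 0.0],
})

# Assumption: these six columns are the competing venue categories.
fitness_cols = [
    "Yoga Studio", "Climbing Gym", "College Gym",
    "College Rec Center", "Gym", "Gym / Fitness Center",
]

# Keep only rows where every fitness column is exactly zero.
no_competition = (competition[fitness_cols] == 0).all(axis=1)
potential = competition[no_competition]

print(potential["Neighborhood"].tolist())  # ['Christie', 'Rosedale']
```

A mask like this keeps working if the rows are reordered or the venue data is refreshed, whereas fixed positions would have to be rechecked by hand.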


## Income Data

Crossfit is considerably more expensive than a typical gym membership because of its coaching-led classes.  When choosing a neighborhood, it is therefore crucial to look at income as well, to make sure that residents have the disposable income to spend on the classes.

In [319]:
url2 = "https://en.wikipedia.org/wiki/Demographics_of_Toronto_neighbourhoods"

In [321]:
page2 = urllib.request.urlopen(url2)

In [322]:
soup2 = BeautifulSoup(page2, "lxml")

In [323]:
soup2.title

<title>Demographics of Toronto neighbourhoods - Wikipedia</title>

In [326]:
right_table2 = soup2.find('table', class_='wikitable sortable')
right_table2

<table border="1" cellpadding="5" cellspacing="0" class="wikitable sortable">
<tbody><tr>
<th width="20%">Name
</th>
<th width="5%">FM
</th>
<th width="20%">Census Tracts
</th>
<th width="5%">Population
</th>
<th width="5%">Land area (km2)
</th>
<th width="5%">Density (people/km2)
</th>
<th width="5%">% Change in Population since 2001
</th>
<th width="5%">Average Income
</th>
<th width="5%">Transit Commuting %
</th>
<th width="5%">% Renters
</th>
<th width="10%">Second most common language (after English) by name
</th>
<th width="10%">Second most common language (after English) by percentage
</th>
<th width="10%">Map
</th></tr>
<tr>
<td><b>Toronto <a class="mw-redirect" href="/wiki/Census_metropolitan_area" title="Census metropolitan area">CMA</a> Average</b>
</td>
<td>
</td>
<td>All
</td>
<td><b>5,113,149</b>
</td>
<td><b>5903.63</b>
</td>
<td><b>866</b>
</td>
<td><b>9.0</b>
</td>
<td><b>40,704</b>
</td>
<td><b>10.6</b>
</td>
<td><b>11.4</b>
</td>
<td>
</td>
<td>
</td>
<td>
</td></tr>

In [327]:
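A2 = []  # neighborhood names
B2 = []  # average income, still as text

for row in right_table2.find_all('tr'):
    cells = row.find_all('td')
    # data rows have 13 cells; header rows have none
    if len(cells)==13:
        A2.append(cells[0].find(string=True))  # column 0: Name
        B2.append(cells[7].find(string=True))  # column 7: Average Income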

In [351]:
income_df = pd.DataFrame(A2,columns=['Neighborhood'])
income_df['Average Income']=B2

income_df.head()

Unnamed: 0,Neighborhood,Average Income
0,Toronto,40704
1,Agincourt,25750
2,Alderwood,35239
3,Alexandra Park,19687
4,Allenby,245592


In [352]:
income_df.dtypes

Neighborhood      object
Average Income    object
dtype: object

In [354]:
income_df['Average Income'] = (income_df['Average Income'].str.split()).apply(lambda x: float(x[0].replace(',', '')))
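
The conversion above splits each string and strips the thousands separator by hand. A minimal alternative sketch, assuming the scraped values look like `"40,704"` with possible trailing whitespace, uses `pd.to_numeric`. The `"n/a"` entry is a made-up example of a cell that cannot be parsed, not a value from the actual table.

```python
import pandas as pd

# Sample values; the last one is a hypothetical unparseable cell.
raw = pd.Series(["40,704", "25,750\n", "n/a"])

# Drop the thousands separator and surrounding whitespace, then convert.
# errors="coerce" turns anything unparseable into NaN instead of raising.
cleaned = pd.to_numeric(
    raw.str.replace(",", "", regex=False).str.strip(),
    errors="coerce",
)

print(cleaned.tolist())  # [40704.0, 25750.0, nan]
```

With `errors="coerce"`, a blank or malformed cell shows up as `NaN` and can be inspected afterwards rather than stopping the whole conversion.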

In [368]:
income_df_sorted = income_df.sort_values(by = ['Average Income'], ascending = False)
income_df_sorted.head()

Unnamed: 0,Neighborhood,Average Income
18,Bridle Path,314107.0
4,Allenby,245592.0
71,Hoggs Hollow,222560.0
90,Lawrence Park,214110.0
130,Rosedale,213941.0
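
To read the income ranking against the list of neighborhoods with no competing venues, the two tables can be joined on `Neighborhood`. The sketch below is illustrative only: `candidates` and `income` are small hypothetical frames whose values are copied from the outputs above, and it assumes names match exactly between the two sources.

```python
import pandas as pd

# A few names from potential_neighborhoods (copied from the output above).
candidates = pd.DataFrame(
    {"Neighborhood": ["Christie", "Lawrence Park", "Rosedale", "Roselawn"]}
)

# Top five rows of income_df_sorted (copied from the output above).
income = pd.DataFrame({
    "Neighborhood": ["Bridle Path", "Allenby", "Hoggs Hollow",
                     "Lawrence Park", "Rosedale"],
    "Average Income": [314107.0, 245592.0, 222560.0, 214110.0, 213941.0],
})

# Left join keeps every candidate; candidates with no income row get NaN.
ranked = (
    candidates.merge(income, on="Neighborhood", how="left")
    .sort_values("Average Income", ascending=False, na_position="last")
)

print(ranked.head(2))
```

Names that differ between the two sources, such as postal-code areas that combine several neighborhoods, would not match and would be left with `NaN` income, so they need to be checked by hand.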


### Map of Neighborhoods and Crossfit Gyms with Rosedale Highlighted

In [378]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13)


# highlight Rosedale with a large green circle marker
folium.CircleMarker(
        [43.679, -79.378],
        radius=15,
        color='green',
        popup='Rosedale',
        fill = True,
        fill_color='green',
        fill_opacity=0.6
    ).add_to(venues_map)

# add the Crossfit gyms as red circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='red',
        popup=label,
        fill = True,
        fill_color='red',
        fill_opacity=0.6
    ).add_to(venues_map)

# add the neighborhoods as blue circle markers
for lat, lng, label in zip(toronto_df['Latitude'], toronto_df['Longitude'], toronto_df['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(venues_map)

# display map
venues_map

# Conclusion

## Where to Build the Crossfit Gym - Rosedale

Very few neighborhoods in the city have no alternative fitness options for their residents.  Rosedale is one of them, and it also has the 5th highest average income.  It is also close to other neighborhoods with successful Crossfit gyms, so a gym there could draw in members who currently leave Rosedale to work out elsewhere.