<h1> Where in Toronto should I open a new bike shop? </h1>
<p> Cycling is a common mode of transportation in Toronto and a great recreational activity. For an enthusiast and budding entrepreneur, where would be the ideal location in the city to open a new bike shop?

The ideal location is an area that is currently underserved by existing bike shops but is likely to have a large number of cyclists. To help identify this area, we'll use the following:

* Outdoor (high capacity) bicycle parking infrastructure
* Neighbourhood details based on Postal Codes (FSA)
* FourSquare Data relating to existing bike shops

<h2>  Data Sources </h2>

* Toronto Open Data (https://open.toronto.ca/dataset/bicycle-parking-high-capacity-outdoor/)

* FourSquare API 

* Wikipedia (https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M)

* GeoPy 

In [1]:
conda update -n base -c defaults conda

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: C:\Users\T450s\anaconda3

  added / updated specs:
    - conda


The following packages will be REMOVED:

  python_abi-3.7-1_cp37m

The following packages will be SUPERSEDED by a higher-priority channel:

  conda              conda-forge::conda-4.9.2-py37h03978a9~ --> pkgs/main::conda-4.9.2-py37haa95532_0


Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done

Note: you may need to restart the kernel to use updated packages.


In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: C:\Users\T450s\anaconda3

  added / updated specs:
    - geopy


The following NEW packages will be INSTALLED:

  python_abi         conda-forge/win-64::python_abi-3.7-1_cp37m

The following packages will be SUPERSEDED by a higher-priority channel:

  conda               pkgs/main::conda-4.9.2-py37haa95532_0 --> conda-forge::conda-4.9.2-py37h03978a9_0


Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Libraries imported.


In [3]:
!pip install beautifulsoup4
!pip install lxml
from bs4 import BeautifulSoup # library to parse HTML documents



<h1> Toronto Data </h1>

In [4]:
# get the response in the form of html
wikiurl="https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
table_class="wikitable sortable jquery-tablesorter"
response=requests.get(wikiurl)
soup = BeautifulSoup(response.text, 'html.parser')
torontowiki=soup.find('table',{'class':"wikitable"})
df=pd.read_html(str(torontowiki))
df=pd.DataFrame(df[0])
print(df.head())

  Postal Code           Borough              Neighbourhood
0         M1A      Not assigned               Not assigned
1         M2A      Not assigned               Not assigned
2         M3A        North York                  Parkwoods
3         M4A        North York           Victoria Village
4         M5A  Downtown Toronto  Regent Park, Harbourfront


In [5]:
df1 = df[df.Borough != 'Not assigned']
df1.columns = [c.replace(' ', '_') for c in df1.columns]
df2 = df1.groupby(['Postal_Code','Borough'], sort=False).agg(', '.join)
df2.reset_index(inplace=True)
df2['Neighbourhood'] = np.where(df2['Neighbourhood'] == 'Not assigned',df2['Borough'], df2['Neighbourhood'])

In [81]:
#geocode
lat_lon = pd.read_csv('https://cocl.us/Geospatial_data')
lat_lon.columns = [c.replace(' ', '_') for c in lat_lon.columns]
df3 = pd.merge(df2,lat_lon,on='Postal_Code')
df3.rename(columns={'Postal_Code':'POSTAL_CODE'},inplace=True)

In [82]:
tdot = df3[df3['Borough'].str.contains('Toronto',regex=False)]
tdot.head()

Unnamed: 0,POSTAL_CODE,Borough,Neighbourhood,Latitude,Longitude
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
15,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
19,M4E,East Toronto,The Beaches,43.676357,-79.293031


<h1> Bike Data </h1>

In [86]:
BikeRacks = pd.read_csv("Bicycle Parking Map Data.csv")
BikeRacks2 = BikeRacks[['POSTAL_CODE','LONGITUDE', 'LATITUDE', 'BICYCLE_CAPACITY']]
# .copy() avoids pandas' SettingWithCopyWarning when POSTAL_CODE is overwritten below
BikeRacks3 = BikeRacks2[BikeRacks2.POSTAL_CODE.notnull()].copy()
# keep only the first three characters (the FSA) of each postal code
BikeRacks3['POSTAL_CODE'] = BikeRacks3['POSTAL_CODE'].str[:3]
BikeRacks4 = BikeRacks3[['POSTAL_CODE', 'BICYCLE_CAPACITY']]
# total rack capacity per FSA
BikeRacks5 = BikeRacks4.groupby(['POSTAL_CODE'])['BICYCLE_CAPACITY'].sum().reset_index()
BikeRacks5


Unnamed: 0,POSTAL_CODE,BICYCLE_CAPACITY
0,M1P,8
1,M1T,8
2,M1V,8
3,M1W,8
4,M4J,18
5,M4K,32
6,M4L,8
7,M4M,40
8,M4R,16
9,M4S,24
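
The `.str[:3]` step above keeps the Forward Sortation Area (FSA), the first three characters of a Canadian postal code, so that rack capacity can be summed per FSA. As a minimal sketch in plain Python, the same idea looks like this. The helper name `to_fsa` is hypothetical, the sample rows are made up, and the upper-casing and whitespace stripping are assumptions about how messy the raw column might be:

```python
def to_fsa(postal_code):
    """Return the 3-character FSA of a postal code, or None if it is missing/too short."""
    if postal_code is None:
        return None
    cleaned = str(postal_code).replace(" ", "").upper()
    return cleaned[:3] if len(cleaned) >= 3 else None


# made-up sample rows: (postal code, bicycle capacity)
rows = [("M4K 1N2", 16), ("m4k1n2", 16), ("M5H 2N2", 8), (None, 8)]

capacity_by_fsa = {}
for code, capacity in rows:
    fsa = to_fsa(code)
    if fsa is None:
        continue  # mirrors dropping rows with a null POSTAL_CODE
    capacity_by_fsa[fsa] = capacity_by_fsa.get(fsa, 0) + capacity

print(capacity_by_fsa)
```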


<h1> Merge Bike Data and Toronto Data </h1>

In [87]:
TDotBike = pd.merge(tdot,BikeRacks5, on='POSTAL_CODE')
TDotBike

Unnamed: 0,POSTAL_CODE,Borough,Neighbourhood,Latitude,Longitude,BICYCLE_CAPACITY
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,78
1,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,48
2,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,8
3,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,80
4,M6G,Downtown Toronto,Christie,43.669542,-79.422564,112
5,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568,144
6,M6H,West Toronto,"Dufferin, Dovercourt Village",43.669005,-79.442259,80
7,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752,102
8,M6J,West Toronto,"Little Portugal, Trinity",43.647927,-79.41975,93
9,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,32
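
`pd.merge` defaults to an inner join, so any FSA that appears in only one of the two tables is silently left out of `TDotBike`. A small sketch with made-up rows shows how `how='outer'` together with `indicator=True` can reveal what an inner join would drop. The frames `areas` and `racks` are illustrative stand-ins, not the notebook's data:

```python
import pandas as pd

# illustrative stand-ins for tdot and BikeRacks5
areas = pd.DataFrame({"POSTAL_CODE": ["M5A", "M5B", "M5C"],
                      "Neighbourhood": ["A", "B", "C"]})
racks = pd.DataFrame({"POSTAL_CODE": ["M5A", "M5B", "M1P"],
                      "BICYCLE_CAPACITY": [78, 48, 8]})

inner = pd.merge(areas, racks, on="POSTAL_CODE")  # default how='inner'
audit = pd.merge(areas, racks, on="POSTAL_CODE", how="outer", indicator=True)

# rows present in only one table are the ones the inner join discards
dropped = audit[audit["_merge"] != "both"]
print(inner)
print(dropped[["POSTAL_CODE", "_merge"]])
```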


In [90]:
#create a map
map_toronto = folium.Map(location=[43.651070,-79.347015],zoom_start=10)

for lat,lng,BICYCLE_CAPACITY,neighbourhood in zip(TDotBike['Latitude'],TDotBike['Longitude'],TDotBike['BICYCLE_CAPACITY'],TDotBike['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, BICYCLE_CAPACITY)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
    [lat,lng],
    radius=5,
    popup=label,
    color='blue',
    fill=True,
    fill_color='#3186cc',
    fill_opacity=0.7,
    parse_html=False).add_to(map_toronto)
map_toronto

<h1> FourSquare Data </h1>

In [91]:
#Credentials
CLIENT_ID = 'SECRET' #Removed Afterwards
CLIENT_SECRET = 'SECRET' #Removed Afterwards
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value
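
The credentials above were blanked out by hand after the run. One way to avoid pasting secrets into a cell at all is to read them from environment variables. This is only a sketch: the function `load_credentials` and the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions, not anything Foursquare mandates.

```python
import os


def load_credentials(env=os.environ):
    """Read the Foursquare credentials from environment variables (names are hypothetical)."""
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("Set {} before running the notebook".format(missing)) from None


# demonstration with a plain dict standing in for os.environ
client_id, client_secret = load_credentials(
    {"FOURSQUARE_CLIENT_ID": "my-id", "FOURSQUARE_CLIENT_SECRET": "my-secret"}
)
print(client_id)
```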

In [92]:
#from the labs
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [165]:
#Bike Shop = 4bf58dd8d48988d115951735
#Bike Share / Rental = 4e4c9077bd41f78e849722f9
#Bike Trails = 56aa371be4b08b9a8d57355e  -- Might be interesting
#Radius is in Meters
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    categoryID='4bf58dd8d48988d115951735' 
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT,
            categoryID)
        #print(requests.get(url).json())
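        try:
            # make the GET request
            results = requests.get(url).json()["response"]['groups'][0]['items']
        except (requests.RequestException, ValueError, KeyError, IndexError):
            # network failure, non-JSON body, or an error payload without 'groups'
            print('Request failed for {}; your quota may have been exceeded'.format(name))
            return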
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
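
`getNearbyVenues` assembles the request URL with `str.format`. As a hedged alternative, the query string can be built from a dictionary so that each parameter is named and URL-encoded. The helper `build_explore_url` is hypothetical; the endpoint, parameter names, and category ID are simply those used in the function above:

```python
from urllib.parse import urlencode, urlparse, parse_qs

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100,
                      category_id="4bf58dd8d48988d115951735"):
    """Build the explore URL for one location; category_id defaults to Bike Shop."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,      # metres
        "limit": limit,
        "categoryId": category_id,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


url = build_explore_url("ID", "SECRET", "20180605", 43.657952, -79.387383)
print(parse_qs(urlparse(url).query)["ll"])
```

With `requests`, the same dictionary could instead be passed as `requests.get(EXPLORE_ENDPOINT, params=params)`, which performs the encoding itself.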

In [166]:
bikeshops = getNearbyVenues(names=TDotBike['Neighbourhood'],latitudes=TDotBike['Latitude'],longitudes=TDotBike['Longitude'],radius=500)
bikeshops.head()

Regent Park, Harbourfront
Garden District, Ryerson
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
The Danforth West, Riverdale
Brockton, Parkdale Village, Exhibition Place
India Bazaar, The Beaches West
Studio District
High Park, The Junction South
North Toronto West, Lawrence Park
The Annex, North Midtown, Yorkville
Davisville
University of Toronto, Harbord
Moore Park, Summerhill East
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
St. James Town, Cabbagetown


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Central Bay Street,43.657952,-79.387383,Franklin the Bike,43.66065,-79.38573,Bike Shop
1,Central Bay Street,43.657952,-79.387383,Bixi Bike Stand,43.661136,-79.39108,Bike Shop
2,Christie,43.669542,-79.422564,Dave Fix My Bike,43.670339,-79.420917,Bike Shop
3,"Dufferin, Dovercourt Village",43.669005,-79.442259,Issie Cycling Services,43.669443,-79.439229,Bike Shop
4,"Dufferin, Dovercourt Village",43.669005,-79.442259,Bill's Used bikes & repair,43.668798,-79.436936,Bike Shop
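
The `radius=500` argument is in metres, measured from the postal-code coordinates passed to the API. To sanity-check a result, the great-circle distance between those coordinates and a returned venue can be computed with the haversine formula. The sketch below uses the Central Bay Street and Franklin the Bike coordinates from the first row above; the function name `haversine_m` and the mean Earth radius of 6,371 km are assumptions of the sketch:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Central Bay Street coordinates vs. the venue "Franklin the Bike"
distance = haversine_m(43.657952, -79.387383, 43.66065, -79.38573)
print(round(distance), distance <= 500)
```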


In [167]:
#merge with TDOT1
# note: getNearbyVenues already names this column 'Neighbourhood', so the rename is a no-op here
bikeshops.rename(columns={'Neighborhood':'Neighbourhood'},inplace=True)
bikeshops

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Central Bay Street,43.657952,-79.387383,Franklin the Bike,43.66065,-79.38573,Bike Shop
1,Central Bay Street,43.657952,-79.387383,Bixi Bike Stand,43.661136,-79.39108,Bike Shop
2,Christie,43.669542,-79.422564,Dave Fix My Bike,43.670339,-79.420917,Bike Shop
3,"Dufferin, Dovercourt Village",43.669005,-79.442259,Issie Cycling Services,43.669443,-79.439229,Bike Shop
4,"Dufferin, Dovercourt Village",43.669005,-79.442259,Bill's Used bikes & repair,43.668798,-79.436936,Bike Shop
5,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752,Wheel Excitement,43.638784,-79.385697,Bike Shop
6,"Little Portugal, Trinity",43.647927,-79.41975,The Pedal Stop Bike Shop,43.646984,-79.419776,Bike Shop
7,"Little Portugal, Trinity",43.647927,-79.41975,Pedal Stop Bicycle Shop,43.647315,-79.419829,Bike Shop
8,"The Danforth West, Riverdale",43.679557,-79.352188,Cyclemania,43.677068,-79.354368,Bike Shop
9,"India Bazaar, The Beaches West",43.668999,-79.315572,Velotique,43.666237,-79.317904,Bike Shop


<h1> Map Bike shops </h1>

In [168]:
map_shops = folium.Map(location=[43.651070,-79.347015],zoom_start=10)

for lat,lng,venue in zip(bikeshops['Venue Latitude'],bikeshops['Venue Longitude'],bikeshops['Venue']):
    label = '{}'.format(venue)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
    [lat,lng],
    radius=5,
    popup=label,
    color='blue',
    fill=True,
    fill_color='#3186cc',
    fill_opacity=0.7,
    parse_html=False).add_to(map_shops)
map_shops

<h1> Merge Shop Data and Bike Parking data </h1>

In [169]:
#mergeSet
bikeshopcount = bikeshops.groupby(['Neighbourhood'])['Venue'].count().reset_index()
bikeshopcount


Unnamed: 0,Neighbourhood,Venue
0,Central Bay Street,2
1,Christie,1
2,"Dufferin, Dovercourt Village",2
3,"Harbourfront East, Union Station, Toronto Islands",1
4,"High Park, The Junction South",3
5,"India Bazaar, The Beaches West",1
6,"Kensington Market, Chinatown, Grange Park",3
7,"Little Portugal, Trinity",2
8,Studio District,2
9,"The Annex, North Midtown, Yorkville",1


In [193]:
TDotBike2 = TDotBike.set_index('Neighbourhood').reset_index()
TDotBike2 = TDotBike2.merge(right=bikeshopcount, how='outer', on='Neighbourhood')
TDotBike2.rename(columns={'POSTAL_CODE':'Postal_Code'},inplace=True)
TDotBike2.rename(columns={'Venue':'Store_Count'},inplace=True)
TDotBike2.rename(columns={'BICYCLE_CAPACITY':'Bike_Rack_Capacity'},inplace=True)
TDotBike2['Store_Count'] = TDotBike2['Store_Count'].fillna(0)
TDotBike2

Unnamed: 0,Neighbourhood,Postal_Code,Borough,Latitude,Longitude,Bike_Rack_Capacity,Store_Count
0,"Regent Park, Harbourfront",M5A,Downtown Toronto,43.65426,-79.360636,78,0.0
1,"Garden District, Ryerson",M5B,Downtown Toronto,43.657162,-79.378937,48,0.0
2,Berczy Park,M5E,Downtown Toronto,43.644771,-79.373306,8,0.0
3,Central Bay Street,M5G,Downtown Toronto,43.657952,-79.387383,80,2.0
4,Christie,M6G,Downtown Toronto,43.669542,-79.422564,112,1.0
5,"Richmond, Adelaide, King",M5H,Downtown Toronto,43.650571,-79.384568,144,0.0
6,"Dufferin, Dovercourt Village",M6H,West Toronto,43.669005,-79.442259,80,2.0
7,"Harbourfront East, Union Station, Toronto Islands",M5J,Downtown Toronto,43.640816,-79.381752,102,1.0
8,"Little Portugal, Trinity",M6J,West Toronto,43.647927,-79.41975,93,2.0
9,"The Danforth West, Riverdale",M4K,East Toronto,43.679557,-79.352188,32,1.0


<h1> Cluster Data </h1>

In [194]:
#break it down into larger areas
k=4
# copy so that re-running this cell does not fail on an existing 'Cluster Labels' column
BikeClustered1 = TDotBike2.copy()
# cluster on Latitude and Longitude only
TCluster1 = BikeClustered1.drop(['Postal_Code','Borough','Neighbourhood', 'Bike_Rack_Capacity', 'Store_Count'], axis=1)
kmeans = KMeans(n_clusters = k,random_state=0).fit(TCluster1)
BikeClustered1.insert(0, 'Cluster Labels', kmeans.labels_)


In [195]:
BikeClustered1

Unnamed: 0,Cluster Labels,Neighbourhood,Postal_Code,Borough,Latitude,Longitude,Bike_Rack_Capacity,Store_Count
0,0,"Regent Park, Harbourfront",M5A,Downtown Toronto,43.65426,-79.360636,78,0.0
1,1,"Garden District, Ryerson",M5B,Downtown Toronto,43.657162,-79.378937,48,0.0
2,1,Berczy Park,M5E,Downtown Toronto,43.644771,-79.373306,8,0.0
3,1,Central Bay Street,M5G,Downtown Toronto,43.657952,-79.387383,80,2.0
4,3,Christie,M6G,Downtown Toronto,43.669542,-79.422564,112,1.0
5,1,"Richmond, Adelaide, King",M5H,Downtown Toronto,43.650571,-79.384568,144,0.0
6,3,"Dufferin, Dovercourt Village",M6H,West Toronto,43.669005,-79.442259,80,2.0
7,1,"Harbourfront East, Union Station, Toronto Islands",M5J,Downtown Toronto,43.640816,-79.381752,102,1.0
8,3,"Little Portugal, Trinity",M6J,West Toronto,43.647927,-79.41975,93,2.0
9,0,"The Danforth West, Riverdale",M4K,East Toronto,43.679557,-79.352188,32,1.0
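
The clusters above come from scikit-learn's `KMeans`, fitted on latitude and longitude only. To illustrate what that step does, here is a minimal pure-Python sketch of Lloyd's algorithm. It is not scikit-learn's implementation: it seeds the centroids with the first `k` points (an assumption made for determinism) rather than k-means++, and it treats degrees as planar coordinates.

```python
def kmeans(points, k, iterations=20):
    """Tiny Lloyd's algorithm; returns one cluster label per point."""
    centroids = [points[i] for i in range(k)]  # naive seeding: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # assignment step: nearest centroid by squared Euclidean distance
        labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                        + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return labels


# (Latitude, Longitude) of four neighbourhoods from the table above
points = [
    (43.657162, -79.378937),  # Garden District, Ryerson
    (43.669542, -79.422564),  # Christie
    (43.657952, -79.387383),  # Central Bay Street
    (43.669005, -79.442259),  # Dufferin, Dovercourt Village
]
labels = kmeans(points, k=2)
print(labels)
```

In this toy run the two downtown points end up in one group and the two western points in the other, matching how they are paired in the table above.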


In [196]:
map_clusters = folium.Map(location=[43.651070,-79.347015],zoom_start=10)

# set color scheme for the clusters
x = np.arange(k)
ys = [i + x + (i*x)**2 for i in range(k)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, neighbourhood, cluster in zip(BikeClustered1['Latitude'], BikeClustered1['Longitude'], BikeClustered1['Neighbourhood'], BikeClustered1['Cluster Labels']):
    label = folium.Popup(' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [212]:
TDotClustered = BikeClustered1.groupby(['Cluster Labels']).agg({'Bike_Rack_Capacity':'sum','Store_Count':'sum'})
TDotClustered.columns = [c.replace(' ', '_') for c in TDotClustered.columns]
TDotClustered

Cluster Labels,Bike_Rack_Capacity,Store_Count
0,166,4.0
1,678,10.0
2,112,0.0
3,343,8.0


In [222]:
BikeClustered1.columns = [c.replace(' ', '_') for c in BikeClustered1.columns]
BikeClustered1.loc[BikeClustered1['Cluster_Labels']== 2]

Unnamed: 0,Cluster_Labels,Neighbourhood,Postal_Code,Borough,Latitude,Longitude,Bike_Rack_Capacity,Store_Count
14,2,"North Toronto West, Lawrence Park",M4R,Central Toronto,43.715383,-79.405678,16,0.0
16,2,Davisville,M4S,Central Toronto,43.704324,-79.38879,24,0.0
18,2,"Moore Park, Summerhill East",M4T,Central Toronto,43.689574,-79.38316,32,0.0
21,2,Rosedale,M4W,Downtown Toronto,43.679563,-79.377529,40,0.0


To assess Toronto in larger chunks and help narrow down the neighbourhoods, the data was separated into 4 clusters.
Based on the clustering, the northernmost cluster in Toronto may be seen as an ideal area to open a bike shop. There are no bike shops within the search radius of any neighbourhood in that cluster, so it appears to have the greatest ratio of riders to store count (assuming that the city installs bike racks where ridership is highest).

However, looking at the earlier data, the "Richmond, Adelaide, King" neighbourhood on its own has more bike rack capacity than the entire northernmost cluster (144 to 112) and also has zero bike stores within the defined radius. So while that neighbourhood appears to be the most underserved on its own, the larger zone around it has more competition, because its cluster contains other nearby stores.
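
The comparison above can be made explicit by dividing each cluster's bike rack capacity by its store count. The sketch below reuses the totals from the `TDotClustered` table. Treating rack capacity as a proxy for ridership is the notebook's own assumption, and mapping a zero store count to infinity is a choice made only for this illustration:

```python
# cluster label -> (Bike_Rack_Capacity, Store_Count), copied from TDotClustered
cluster_totals = {0: (166, 4), 1: (678, 10), 2: (112, 0), 3: (343, 8)}


def capacity_per_store(capacity, stores):
    """Rack capacity per existing shop; infinite when a cluster has no shop."""
    return float("inf") if stores == 0 else capacity / stores


ratios = {c: capacity_per_store(*totals) for c, totals in cluster_totals.items()}

# rank clusters from most to least rack capacity per shop
for cluster, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(cluster, ratio)
```

On these totals cluster 2 ranks first, and cluster 1, which contains "Richmond, Adelaide, King", ranks second.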
