# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project, we will attempt to identify the optimal locations for new **bike sharing stations** in **Dublin, Ireland**.

We will look at the profile of venues in neighbourhoods with existing bike sharing stations and use this to find other, similar neighbourhoods. We will also examine the popularity of existing bike sharing stations to identify which factors are associated with higher use.

We will use the analysis to generate recommended locations for new **bike sharing stations**, which will be of use and interest to the local government and also to private-sector bike share operators.

## Data <a name="data"></a>

The following are the key data we will need for the analysis:

* Bike sharing station location and use data
* Neighbourhood venue data


This data will be retrieved from the following sources:
* **Dublinbike Data** provided by Dublin City Council on Ireland's open data portal
* Neighbourhood venue data from **Foursquare API**. 

### Bike Sharing Station Data

First of all, let's get the basic information on existing bike sharing stations in Dublin City. This is available on Ireland's open data portal - https://data.gov.ie/. The usage data provided by Dublin City Council is published quarterly. We will examine the Q3 2019 data because it predates the COVID-19 pandemic: movement restrictions during the pandemic are likely to have affected bike usage, so data from that period may not give us an accurate picture.

In [1]:
import pandas as pd
import requests 


!pip install folium
import folium

print('Folium installed and imported!')

Folium installed and imported!


In [2]:
url = 'https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/305d39ac-b6a0-4216-a535-0ae2ddf59819/download/dublinbikes_20190701_20191001.csv'

bike_use = pd.read_csv(url)

In [3]:
bike_use.head()

|   | STATION ID | TIME | LAST UPDATED | NAME | BIKE STANDS | AVAILABLE BIKE STANDS | AVAILABLE BIKES | STATUS | ADDRESS | LATITUDE | LONGITUDE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 2019-07-01 00:00:03 | 2019-06-30 23:52:13 | BLESSINGTON STREET | 20 | 18 | 2 | Open | Blessington Street | 53.35677 | -6.26814 |
| 1 | 2 | 2019-07-01 00:05:02 | 2019-07-01 00:02:22 | BLESSINGTON STREET | 20 | 18 | 2 | Open | Blessington Street | 53.35677 | -6.26814 |
| 2 | 2 | 2019-07-01 00:10:02 | 2019-07-01 00:02:22 | BLESSINGTON STREET | 20 | 18 | 2 | Open | Blessington Street | 53.35677 | -6.26814 |
| 3 | 2 | 2019-07-01 00:15:03 | 2019-07-01 00:12:31 | BLESSINGTON STREET | 20 | 18 | 2 | Open | Blessington Street | 53.35677 | -6.26814 |
| 4 | 2 | 2019-07-01 00:20:03 | 2019-07-01 00:12:31 | BLESSINGTON STREET | 20 | 18 | 2 | Open | Blessington Street | 53.35677 | -6.26814 |


In [4]:
bike_use.shape

(2974314, 11)

We can see from an initial look at the data that the information on existing bike sharing stations is useful. Each station has a unique ID in the "STATION ID" column, and the address and location data are provided. The number of available bike stands and available bikes is recorded roughly every five minutes (see the TIME column). We can examine this to determine the usage levels, or popularity, of each station.

Next we will look at how many bike stations there are and view these on a map.

In [5]:
bike_use_loc = bike_use.drop_duplicates(subset = 'STATION ID')

In [6]:
bike_use_loc.shape

(112, 11)
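As a cross-check, the number of stations can also be counted directly with `nunique`, and `groupby(...).size()` shows how many readings each station contributed. A minimal sketch on a hypothetical miniature of `bike_use` (the rows and station IDs below are made up for illustration):

```python
import pandas as pd

# Hypothetical miniature of bike_use: one row per (station, timestamp) reading
readings = pd.DataFrame({
    "STATION ID": [2, 2, 3, 3, 3, 4],
    "TIME": pd.to_datetime([
        "2019-07-01 00:00", "2019-07-01 00:05",
        "2019-07-01 00:00", "2019-07-01 00:05", "2019-07-01 00:10",
        "2019-07-01 00:00",
    ]),
})

# Distinct stations, counted without building a deduplicated frame
n_stations = readings["STATION ID"].nunique()

# Same count via drop_duplicates (keeps the first row per station)
n_via_dedup = len(readings.drop_duplicates(subset="STATION ID"))

# Readings per station: uneven counts can flag stations with gaps in the log
per_station = readings.groupby("STATION ID").size()

print(n_stations, n_via_dedup)   # 3 3
print(per_station.tolist())      # [2, 3, 1]
```

On the full data, `bike_use['STATION ID'].nunique()` should match the 112 rows reported by `bike_use_loc.shape`.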

In [7]:
dub_map = folium.Map(location=[53.3342, -6.2675], zoom_start =12.5)

In [8]:
# One marker per station, with the station name as the popup text
for lat, lon, name in zip(bike_use_loc['LATITUDE'], bike_use_loc['LONGITUDE'], bike_use_loc['NAME']):
    folium.Marker(location=[lat, lon], popup=name).add_to(dub_map)


dub_map


Next we will analyse the data to determine the availability rate, and hence the popularity, of each bike station.

In [9]:
# Make an availability rate column for purposes of comparison

bike_use = bike_use.drop(columns='LAST UPDATED')

bike_use['TIME'] = pd.to_datetime(bike_use['TIME'])

bike_use['Available Rate'] = bike_use['AVAILABLE BIKES'] / bike_use['BIKE STANDS']

# An availability rate of 0.9 means 90% of the station's stands hold a bike at that time
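The rate is simply `AVAILABLE BIKES / BIKE STANDS` for each reading. If any reading reported zero stands (we have not checked whether this occurs in the Q3 2019 file), pandas would return `inf` for that row. A minimal guard, assuming such rows should simply be ignored, is to turn a zero denominator into `NaN`, which `groupby(...).mean()` skips by default. The rows below are hypothetical:

```python
import numpy as np
import pandas as pd

# Three hypothetical readings; the last reports zero stands (e.g. a station offline)
snap = pd.DataFrame({
    "STATION ID": [2, 5, 9],
    "BIKE STANDS": [20, 40, 0],
    "AVAILABLE BIKES": [18, 10, 0],
})

# Share of stands holding a bike: 1.0 = station full, 0.0 = station empty.
# A zero denominator becomes NaN, so the row yields NaN (skipped by mean()) rather than inf.
snap["Available Rate"] = snap["AVAILABLE BIKES"] / snap["BIKE STANDS"].replace(0, np.nan)

print(snap["Available Rate"].tolist())  # [0.9, 0.25, nan]
```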

In [10]:
bike_use.tail()

|   | STATION ID | TIME | NAME | BIKE STANDS | AVAILABLE BIKE STANDS | AVAILABLE BIKES | STATUS | ADDRESS | LATITUDE | LONGITUDE | Available Rate |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2974309 | 115 | 2019-10-01 23:35:02 | KILLARNEY STREET | 30 | 0 | 30 | Open | Killarney Street | 53.354843 | -6.247579 | 1.0 |
| 2974310 | 115 | 2019-10-01 23:40:02 | KILLARNEY STREET | 30 | 0 | 30 | Open | Killarney Street | 53.354843 | -6.247579 | 1.0 |
| 2974311 | 115 | 2019-10-01 23:45:02 | KILLARNEY STREET | 30 | 0 | 30 | Open | Killarney Street | 53.354843 | -6.247579 | 1.0 |
| 2974312 | 115 | 2019-10-01 23:50:04 | KILLARNEY STREET | 30 | 0 | 30 | Open | Killarney Street | 53.354843 | -6.247579 | 1.0 |
| 2974313 | 115 | 2019-10-01 23:55:02 | KILLARNEY STREET | 30 | 0 | 30 | Open | Killarney Street | 53.354843 | -6.247579 | 1.0 |


In [11]:
# Average the availability rate over all readings of each bike station,
# i.e. the mean availability rate for the quarter

mean_avail = bike_use.groupby('STATION ID', as_index=False)['Available Rate'].mean()

In [12]:
mean_avail.head()

|   | STATION ID | Available Rate |
|---|---|---|
| 0 | 2 | 0.260541 |
| 1 | 3 | 0.345654 |
| 2 | 4 | 0.422828 |
| 3 | 5 | 0.337016 |
| 4 | 6 | 0.296613 |


In [13]:
bike_core = bike_use.drop_duplicates(subset = 'STATION ID')

In [14]:
bike_core.shape

(112, 11)

In [15]:
bike_core = pd.merge(bike_core, mean_avail, on = 'STATION ID')

In [16]:
bike_core.head()

|   | STATION ID | TIME | NAME | BIKE STANDS | AVAILABLE BIKE STANDS | AVAILABLE BIKES | STATUS | ADDRESS | LATITUDE | LONGITUDE | Available Rate_x | Available Rate_y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 2019-07-01 00:00:03 | BLESSINGTON STREET | 20 | 18 | 2 | Open | Blessington Street | 53.35677 | -6.26814 | 0.1 | 0.260541 |
| 1 | 3 | 2019-07-01 00:00:03 | BOLTON STREET | 20 | 14 | 5 | Open | Bolton Street | 53.351181 | -6.269859 | 0.25 | 0.345654 |
| 2 | 4 | 2019-07-01 00:00:03 | GREEK STREET | 20 | 3 | 17 | Open | Greek Street | 53.346874 | -6.272976 | 0.85 | 0.422828 |
| 3 | 5 | 2019-07-01 00:00:03 | CHARLEMONT PLACE | 40 | 3 | 37 | Open | Charlemont Street | 53.330662 | -6.260177 | 0.925 | 0.337016 |
| 4 | 6 | 2019-07-01 00:00:03 | CHRISTCHURCH PLACE | 20 | 14 | 6 | Open | Christchurch Place | 53.343369 | -6.27012 | 0.3 | 0.296613 |
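Merging two frames that both contain `Available Rate` is what creates the `_x`/`_y` suffixes that have to be cleaned up next. One way to avoid them, sketched here on hypothetical rows with a trimmed set of columns (the `demo_*` names are ours, chosen so the cell does not overwrite the notebook's variables), is to drop the per-reading rate before the merge and name the mean column up front:

```python
import pandas as pd

# Hypothetical per-reading data for two stations (trimmed version of bike_use)
demo_use = pd.DataFrame({
    "STATION ID": [2, 2, 3, 3],
    "NAME": ["BLESSINGTON STREET", "BLESSINGTON STREET", "BOLTON STREET", "BOLTON STREET"],
    "BIKE STANDS": [20, 20, 20, 20],
    "Available Rate": [0.1, 0.3, 0.25, 0.45],
})

# Mean availability per station, renamed immediately
demo_mean = (
    demo_use.groupby("STATION ID", as_index=False)["Available Rate"].mean()
    .rename(columns={"Available Rate": "AVAILABILITY RATE"})
)

# Drop the per-reading rate *before* merging, so no _x/_y suffixes are created
demo_core = (
    demo_use.drop(columns="Available Rate")
    .drop_duplicates(subset="STATION ID")
    .merge(demo_mean, on="STATION ID")
)
print(demo_core.columns.tolist())
# ['STATION ID', 'NAME', 'BIKE STANDS', 'AVAILABILITY RATE']
```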


In [17]:
# Drop the per-reading columns, keep the quarterly mean rate and remove the closed station
bike_core = bike_core.drop(columns=['AVAILABLE BIKE STANDS', 'AVAILABLE BIKES', 'Available Rate_x', 'TIME'])

bike_core = bike_core.rename(columns={'Available Rate_y': 'AVAILABILITY RATE'})

bike_core = bike_core.drop(index=32)  # Dropping closed station (row 32)

bike_core = bike_core.reset_index(drop=True)



In [18]:
# bike_core now has one row per station: address, latitude/longitude and mean availability rate.
# A low availability rate means few bikes were available on average, which we read as high popularity.

bike_core.head()

|   | STATION ID | NAME | BIKE STANDS | STATUS | ADDRESS | LATITUDE | LONGITUDE | AVAILABILITY RATE |
|---|---|---|---|---|---|---|---|---|
| 0 | 2 | BLESSINGTON STREET | 20 | Open | Blessington Street | 53.35677 | -6.26814 | 0.260541 |
| 1 | 3 | BOLTON STREET | 20 | Open | Bolton Street | 53.351181 | -6.269859 | 0.345654 |
| 2 | 4 | GREEK STREET | 20 | Open | Greek Street | 53.346874 | -6.272976 | 0.422828 |
| 3 | 5 | CHARLEMONT PLACE | 40 | Open | Charlemont Street | 53.330662 | -6.260177 | 0.337016 |
| 4 | 6 | CHRISTCHURCH PLACE | 20 | Open | Christchurch Place | 53.343369 | -6.27012 | 0.296613 |
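Dropping the closed station by its row label (32) ties the cleaning step to the row order of this particular file. A more robust alternative, sketched below under the assumption that closed stations report something other than `Open` in the `STATUS` column (we have not checked the exact value used for the dropped station), is to keep only stations that were open in most of their readings. The data and the 50% threshold are hypothetical:

```python
import pandas as pd

# Hypothetical readings: station 2 always open, station 9 never open, station 7 mostly open
demo_status = pd.DataFrame({
    "STATION ID": [2, 2, 9, 9, 7, 7, 7],
    "STATUS": ["Open", "Open", "Closed", "Closed", "Open", "Open", "Closed"],
})

# Fraction of readings in which each station reported itself open (case-insensitive)
is_open = demo_status["STATUS"].str.lower().eq("open")
open_share = is_open.groupby(demo_status["STATION ID"]).mean()

# Keep stations open in at least half of their readings (threshold is our assumption)
active_ids = open_share[open_share >= 0.5].index.tolist()
print(active_ids)  # [2, 7]
```

Computed from `bike_use` instead of `demo_status`, the filter `bike_core[bike_core['STATION ID'].isin(active_ids)]` could then take the place of the hard-coded row drop.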


With the data cleaned up to focus on core information, we can now see that there are 111 bike sharing stations in Dublin. Let's take an initial look at these on a map, including a visualisation of station popularity:



In [21]:
bike_core.sort_values('AVAILABILITY RATE')

|   | STATION ID | NAME | BIKE STANDS | STATUS | ADDRESS | LATITUDE | LONGITUDE | AVAILABILITY RATE |
|---|---|---|---|---|---|---|---|---|
| 99 | 104 | GRANGEGORMAN LOWER (CENTRAL) | 40 | Open | Grangegorman Lower (Central) | 53.355171 | -6.278424 | 0.090865 |
| 98 | 103 | GRANGEGORMAN LOWER (SOUTH) | 40 | Open | Grangegorman Lower (South) | 53.354664 | -6.278681 | 0.127176 |
| 100 | 105 | GRANGEGORMAN LOWER (NORTH) | 36 | Open | Grangegorman Lower (North) | 53.355953 | -6.278378 | 0.129797 |
| 27 | 30 | PARNELL SQUARE NORTH | 20 | Open | Parnell Square North | 53.353462 | -6.265305 | 0.137483 |
| 74 | 79 | ECCLES STREET EAST | 27 | Open | Eccles Street East | 53.358116 | -6.265601 | 0.164507 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 110 | 115 | KILLARNEY STREET | 30 | Open | Killarney Street | 53.354843 | -6.247579 | 0.595954 |
| 95 | 100 | HEUSTON BRIDGE (SOUTH) | 25 | Open | Heuston Bridge (South) | 53.347107 | -6.292041 | 0.599417 |
| 87 | 92 | HEUSTON BRIDGE (NORTH) | 40 | Open | Heuston Bridge (North) | 53.347801 | -6.292432 | 0.618750 |
| 38 | 42 | SMITHFIELD NORTH | 30 | Open | Smithfield North | 53.349564 | -6.278198 | 0.621344 |


In [22]:
# Quartiles of availability rate: green = most popular 25% (lowest rate), red = least popular 25%
bike_core['MARKER COLOUR'] = pd.qcut(bike_core['AVAILABILITY RATE'], q=4,
                                     labels=['green', 'blue', 'orange', 'red'])
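The two pandas binning functions behave differently here: `pd.cut(..., bins=4)` splits the *range* of rates into four equal-width intervals, so the groups can have very different sizes, whereas `pd.qcut(..., q=4)` splits at the quartiles, so each colour gets (roughly) a quarter of the stations, which is what a "top 25%" reading requires. A small illustration on eight made-up rates:

```python
import pandas as pd

# Eight hypothetical availability rates, one of them much higher than the rest
rates = pd.Series([0.09, 0.13, 0.14, 0.16, 0.30, 0.31, 0.33, 0.62])
labels = ["green", "blue", "orange", "red"]

# Equal-width bins: [0.09, 0.62] is split into 4 intervals of width 0.1325
by_width = pd.cut(rates, bins=4, labels=labels)

# Equal-count bins (quartiles): each colour gets 2 of the 8 values
by_quartile = pd.qcut(rates, q=4, labels=labels)

# Counts per colour, in label order green, blue, orange, red
print(by_width.value_counts().sort_index().tolist())     # [4, 3, 0, 1]
print(by_quartile.value_counts().sort_index().tolist())  # [2, 2, 2, 2]
```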

In [23]:

dub_maps = folium.Map(location=[53.3498, -6.2603], zoom_start =13)

for index, row in bike_core.iterrows():
    folium.CircleMarker([row['LATITUDE'], row['LONGITUDE']],
                    radius=8, color=row['MARKER COLOUR']).add_to(dub_maps)
dub_maps

On the map above we can now see the locations of existing bike sharing stations and their popularity. The 25% of stations with the lowest availability rate (the most popular) are shown in green, the next 25% in blue, then orange, with the least popular 25% in red.

### Foursquare API

Now that we have our bike sharing station locations, let us take a look at what venues are near them using the Foursquare API.

In [29]:
bike_latitude = bike_core['LATITUDE']    # latitudes of all 111 stations (a pandas Series)
bike_longitude = bike_core['LONGITUDE']  # longitudes of all 111 stations (a pandas Series)


bike_latitude[11]

53.336075

In [24]:
CLIENT_ID = 'DAGWZX2FQ3QQSW3BGYYIYK1NQXQBATZFWSF1JNWLL4QDVV00' # your Foursquare ID
CLIENT_SECRET = 'G2NADRJEQVHHG3WVNDVEJVSVFGITI54VVRRPSXJRQ1Q3ITXI' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: DAGWZX2FQ3QQSW3BGYYIYK1NQXQBATZFWSF1JNWLL4QDVV00
CLIENT_SECRET:G2NADRJEQVHHG3WVNDVEJVSVFGITI54VVRRPSXJRQ1Q3ITXI


In [26]:
# NOTE: bike_latitude and bike_longitude are whole Series, so their full text ends up in ll (causing the 400 error below)
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    bike_latitude, 
    bike_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=DAGWZX2FQ3QQSW3BGYYIYK1NQXQBATZFWSF1JNWLL4QDVV00&client_secret=G2NADRJEQVHHG3WVNDVEJVSVFGITI54VVRRPSXJRQ1Q3ITXI&v=20180605&ll=0      53.356770\n1      53.351181\n2      53.346874\n3      53.330662\n4      53.343369\n         ...    \n106    53.356716\n107    53.357841\n108    53.338615\n109    53.333652\n110    53.354843\nName: LATITUDE, Length: 111, dtype: float64,0     -6.268140\n1     -6.269859\n2     -6.272976\n3     -6.260177\n4     -6.270120\n         ...   \n106   -6.256359\n107   -6.251557\n108   -6.248606\n109   -6.248345\n110   -6.247579\nName: LONGITUDE, Length: 111, dtype: float64&radius=500&limit=100'

In [27]:
results = requests.get(url).json()
results

{'meta': {'code': 400,
  'errorType': 'param_error',
  'errorDetail': 'll must be of the form XX.XX,YY.YY (received 0      53.356770\n1      53.351181\n2      53.346874\n3      53.330662\n4      53.343369\n         ...    \n106    53.356716\n107    53.357841\n108    53.338615\n109    53.333652\n110    53.354843\nName: LATITUDE, Length: 111, dtype: float64,0     -6.268140\n1     -6.269859\n2     -6.272976\n3     -6.260177\n4     -6.270120\n         ...   \n106   -6.256359\n107   -6.251557\n108   -6.248606\n109   -6.248345\n110   -6.247579\nName: LONGITUDE, Length: 111, dtype: float64)',
  'requestId': '608d040e0e6a1736547740b1'},
 'response': {}}
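The 400 error arises because `bike_latitude` and `bike_longitude` are whole pandas Series, so their full text representation (all 111 values, with line breaks) was pasted into the `ll` parameter. The API expects a single `lat,lon` pair per request, so one request has to be made per station. A minimal sketch of the URL construction using only the standard library; the helper `build_explore_url` and the environment variable names are our own (hypothetical), and the query parameters mirror those in the cell above:

```python
import os
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lon, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Hypothetical helper: explore URL for ONE station (ll is a single 'lat,lon' pair)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lon}",
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in ll literal, i.e. the XX.XX,YY.YY form the error asks for
    return f"{EXPLORE_URL}?{urlencode(params, safe=',')}"


# Credentials read from environment variables (names are our assumption), not hard-coded
client_id = os.environ.get("FOURSQUARE_CLIENT_ID", "YOUR_ID")
client_secret = os.environ.get("FOURSQUARE_CLIENT_SECRET", "YOUR_SECRET")

# Two stations from bike_core, as (name, latitude, longitude)
stations = [
    ("BLESSINGTON STREET", 53.35677, -6.26814),
    ("BOLTON STREET", 53.351181, -6.269859),
]

# One URL per station
urls = {name: build_explore_url(lat, lon, client_id, client_secret)
        for name, lat, lon in stations}
print(urls["BLESSINGTON STREET"])
```

In the notebook, the loop would instead run over `zip(bike_core['NAME'], bike_core['LATITUDE'], bike_core['LONGITUDE'])`, calling `requests.get(url).json()` once per URL. Reading the credentials from environment variables also keeps them out of the notebook's printed output.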