# CityBikes

In [1]:
# import relevant libraries and functions
import pandas as pd
import pprint
import requests
from pandas import json_normalize
from datetime import datetime
import pytz

Send a request to CityBikes for the city of your choice. <br>

Vancouver, Canada was selected as the city for this project.<br>
The bikeshare data available for Vancouver through the CityBikes API comes from Mobi Bikes.<br>

To find the relevant 'href', open http://api.citybik.es/v2/networks and search the page (Ctrl+F) for the city name.
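
The same lookup can be done in code instead of with a browser search. The sketch below is only an illustration: `find_networks` and `sample_payload` are hypothetical names, and the payload is a hand-written sample in the shape the networks endpoint is assumed to return (a `networks` list whose entries carry `id`, `href` and a `location` with a `city`).

```python
def find_networks(payload, city):
    """Return (id, href) pairs for networks whose city contains `city`."""
    wanted = city.strip().lower()
    matches = []
    for network in payload.get("networks", []):
        location = network.get("location", {})
        # Substring match, mirroring a Ctrl+F search on the city name
        if wanted in location.get("city", "").lower():
            matches.append((network["id"], network["href"]))
    return matches


# Hand-written sample in the ASSUMED shape of the /v2/networks response
sample_payload = {
    "networks": [
        {"id": "mobibikes", "href": "/v2/networks/mobibikes",
         "location": {"city": "Vancouver", "country": "CA"}},
        {"id": "example-network", "href": "/v2/networks/example-network",
         "location": {"city": "Exampleville", "country": "XX"}},
    ]
}

print(find_networks(sample_payload, "vancouver"))
```

With live data, the payload would presumably come from `requests.get('http://api.citybik.es/v2/networks').json()`; the field names should be checked against the actual response before relying on them.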

In [None]:
# Use Vancouver City Bikes href for 'mobibikes'
url = r'https://api.citybik.es/v2/networks/mobibikes'

**City Bikes Query Results:** <br>
Because the study is time-dependent, queries were sent hourly from 5pm to 3am Pacific Time, and each result was saved under a new variable name that included the time of the query. <br> Unfortunately, the 11pm result was overwritten and its data was lost. <br> 
In future projects that need data collected over time, it will be useful to define a function that performs the queries automatically.<br>
This would reduce the burden of manual querying and help avoid mistakes such as the overwrite that occurred at the 11pm timepoint.<br>
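
One possible shape for such a function is sketched below. It is not the code used in this project: `collect_snapshots` and its parameters are hypothetical, and the fetch, sleep and clock are passed in as arguments so the loop can be tried out without waiting an hour or calling the API. Each result is stored under a key built from the query time, and an existing key is never overwritten.

```python
import time
from datetime import datetime, timezone


def collect_snapshots(fetch, count, interval_seconds=3600, sleep=time.sleep,
                      now=lambda: datetime.now(timezone.utc)):
    """Call `fetch` `count` times and store each result under a unique time key."""
    snapshots = {}
    for i in range(count):
        key = now().strftime("%Y-%m-%dT%H:%M:%SZ")
        if key in snapshots:
            # Refuse to silently replace an earlier result
            raise KeyError(f"snapshot {key} already exists; refusing to overwrite")
        snapshots[key] = fetch()
        if i < count - 1:
            sleep(interval_seconds)  # wait before the next query
    return snapshots


# Demo with stand-ins so it runs instantly: fake clock, fake fetch, no sleeping
fake_times = iter(datetime(2023, 11, 20, hour, tzinfo=timezone.utc) for hour in (1, 2, 3))
demo = collect_snapshots(
    fetch=lambda: {"network": {"stations": []}},
    count=3,
    sleep=lambda seconds: None,
    now=lambda: next(fake_times),
)
print(sorted(demo))
```

In real use, `fetch` could be something like `lambda: requests.get(url).json()`. A loop like this only works while the notebook kernel stays running, so an external scheduler may be the more robust choice for overnight collection.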

The time points obtained include: <br>
5pm <br>
6pm <br>
7pm <br>
8pm <br>
9pm <br>
10pm <br>
12am <br>
1am <br>
2am <br>
3am <br>

Below is an example of the hourly query code for the 5pm PT timepoint.

In [None]:
result_5pm_vancouver_bikeshare = requests.get(url)
print(result_5pm_vancouver_bikeshare)
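
Printing the response object shows its status code (for example `<Response [200]>`), but a failed request would still be carried forward unless it is checked. A minimal sketch of a stricter fetch follows; `fetch_json` is a hypothetical helper, and the `get` argument exists only so the function can be exercised with a stand-in instead of a live request.

```python
def fetch_json(url, get=None, timeout=10):
    """Fetch `url` and return the parsed JSON, raising on an HTTP error status."""
    if get is None:
        import requests  # imported lazily so a stand-in `get` needs no network stack
        get = requests.get
    response = get(url, timeout=timeout)
    response.raise_for_status()  # raises for 4xx/5xx instead of continuing silently
    return response.json()
```

Under these assumptions, `fetch_json(url)` would return the same dictionary as `result_5pm_vancouver_bikeshare.json()`, and a timeout keeps a stalled connection from hanging the notebook.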

In [None]:
# pprint displays the query results with indentation, which makes the nesting of the JSON data easier to follow
pprint.pprint(result_5pm_vancouver_bikeshare.json())

Parse through the response to get the details you want for the bike stations in that city. <br>
Put your parsed results into a DataFrame.<br>

The code below creates the following columns:<br>
'city', 'latitude', 'longitude', 'name', 'id', 'empty_slots', 'ebikes', 'normal_bikes', 'slots', 'free_bikes', 'timestamp' <br>

In [None]:
# Extracting relevant data from the JSON response
json_data = result_5pm_vancouver_bikeshare.json()
stations_data = json_data['network']['stations']
network_data = json_data['network']['location']

# Creating a DataFrame
df = pd.DataFrame(stations_data)

# Adding network information to each row in the DataFrame
df['city'] = network_data['city']

# Selecting the columns of interest and setting their order
df = df[['city', 'latitude', 'longitude', 'name', 'id', 'empty_slots', 'extra', 'free_bikes', 'timestamp']]

# Extracting additional information from the 'extra' column using lambda
# df['ebikes'] = df['extra'].apply(lambda x: x['ebikes'])
# df['normal_bikes'] = df['extra'].apply(lambda x: x['normal_bikes'])
# df['slots'] = df['extra'].apply(lambda x: x['slots'])

# Normalize the 'extra' column to extract ebikes, normal_bikes and slots information
extra_info = json_normalize(df['extra'])
df['ebikes'] = extra_info['ebikes']
df['normal_bikes'] = extra_info['normal_bikes']
df['slots'] = extra_info['slots']

# Dropping the 'extra' column
df = df.drop(columns=['extra'])

# Convert timestamp to Pacific Time
# df['timestamp'] = pd.to_datetime(df['timestamp']).dt.tz_convert('America/Los_Angeles')
# use format='ISO8601' to solve the error caused by some timestamps containing fewer millisecond digits
df['timestamp'] = pd.to_datetime(df['timestamp'], format='ISO8601').dt.tz_convert('America/Los_Angeles')
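
To make the reshaping step easier to follow, the flattening of the nested 'extra' field can also be sketched in plain Python on a single record. This is an illustration only: `flatten_station` is a hypothetical helper, and `sample_station` is an invented record with made-up values in the shape the cell above assumes.

```python
def flatten_station(station, city):
    """Return one flat row per station, lifting selected keys out of 'extra'."""
    extra = station.get("extra", {})
    return {
        "city": city,
        "latitude": station["latitude"],
        "longitude": station["longitude"],
        "name": station["name"],
        "id": station["id"],
        "empty_slots": station["empty_slots"],
        "free_bikes": station["free_bikes"],
        "timestamp": station["timestamp"],
        # .get() yields None when a station does not report a field
        "ebikes": extra.get("ebikes"),
        "normal_bikes": extra.get("normal_bikes"),
        "slots": extra.get("slots"),
    }


# Invented record for illustration; not real station data
sample_station = {
    "id": "station-0001", "name": "Example St & Sample Ave",
    "latitude": 49.28, "longitude": -123.12,
    "empty_slots": 9, "free_bikes": 7,
    "timestamp": "2023-11-20T01:00:00.123000Z",
    "extra": {"ebikes": 2, "normal_bikes": 5, "slots": 16},
}

row = flatten_station(sample_station, "Vancouver")
print(list(row))
```

A list of such rows could be passed to `pd.DataFrame` directly, after which the timestamp conversion above would apply unchanged.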

In [None]:
# The head of the dataframe can be viewed as follows to ensure accurate creation and population of columns.
print(df.head(3))

Each DataFrame was saved as a .pkl file whose name includes the query time.

In [None]:
df.to_pickle("result_5pm_vancouver_bikeshare.pkl")
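
Since an overwrite already cost one timepoint, the file names can be guarded in the same spirit. The sketch below assumes the naming pattern shown above; `snapshot_path` is a hypothetical helper that builds the file name from a time label and refuses to return a path that already exists.

```python
import tempfile
from pathlib import Path


def snapshot_path(label, directory=".", city="vancouver"):
    """Build the .pkl path for a time label, refusing to reuse an existing file."""
    path = Path(directory) / f"result_{label}_{city}_bikeshare.pkl"
    if path.exists():
        raise FileExistsError(f"{path} already exists; choose a different label")
    return path


# Demo in a temporary directory so no real files are touched
with tempfile.TemporaryDirectory() as tmp:
    first = snapshot_path("11pm", directory=tmp)
    first.write_bytes(b"")  # stand-in for df.to_pickle(first)
    try:
        snapshot_path("11pm", directory=tmp)
    except FileExistsError as err:
        print("blocked:", type(err).__name__)
```

In the notebook this would be used as `df.to_pickle(snapshot_path("5pm"))`, and a saved snapshot can be loaded again with `pd.read_pickle`.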