Urban Data Science & Smart Cities <br>
URSP688Y <br>
Instructor: Chester Harvey <br>
Urban Studies & Planning <br>
National Center for Smart Growth <br>
University of Maryland

[<img src="https://colab.research.google.com/assets/colab-badge.svg">](https://colab.research.google.com/github/ncsg/ursp688y_sp2024/blob/main/exercises/exercise06/exercise06.ipynb)

# Exercise 6

## Problem

In class this week, we saw how to access real-time data about Capital Bikeshare from the internet using their API. We also dealt with the challenge of wrangling those data. We needed to parse a JSON file into a table, and we considered how we might retrieve, store, and combine many JSONs in order to understand how bike availability changed over time.

These real-time data can help us answer questions about how well Capital Bikeshare is being utilized.

See if you can use data from the API (I have already stored and combined it; see below) to answer these questions:
- How many bikes were available within the system during each hour over a 24-hour period?
    - Can you graph this over time?
    - Which hour of the day were bikes most available? Least available?

**Bonus:** Can you write a function to estimate how many bikes are <ins>currently being used</ins>, whenever you call the function? This will require loading real-time data from the API and comparing it to stored data.
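One possible starting point for the bonus, sketched under loud assumptions: treat the largest number of bikes seen in any stored snapshot as a rough proxy for the fleet listed in this feed, and subtract the number of bikes listed in a fresh API response. The function names `count_listed_bikes` and `estimate_bikes_in_use` and the peak-count proxy are hypothetical choices, not part of the exercise's module, and the estimate ignores rebalancing, disabled bikes, and bikes that never appeared during the stored window.

```python
def count_listed_bikes(gbfs_json):
    """Number of bikes listed in a parsed free_bike_status JSON (a dict)."""
    return len(gbfs_json['data']['bikes'])


def estimate_bikes_in_use(historical_counts, current_count):
    """Rough estimate of bikes in use: peak stored count minus the current count.

    historical_counts: bikes listed per stored snapshot (a list or pandas Series).
    ASSUMPTION: the peak count approximates the fleet size; rebalancing, disabled
    bikes, and bikes never seen in the stored window are all ignored.
    """
    peak = max(historical_counts)
    # Floor at zero in case more bikes are listed now than in any stored snapshot.
    return max(int(peak) - int(current_count), 0)
```

With the combined dataframe built later in this notebook, `historical_counts` could be `df.groupby('timestamp').size()`, and `current_count` could come from `count_listed_bikes(response.json())` after a fresh `requests.get(...)` call to the same URL used in the API request cell.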

## Data

I wrote a script, which you can see [here](https://github.com/ncsg/ursp688y_sp2024/blob/main/demos/demo06/cabi_data/get_cabi_free_bikes.py), to retrieve and store JSON data from the `free_bike_status` table in [Capital Bikeshare's](https://capitalbikeshare.com/system-data) GBFS feed every 5 minutes. I ran this script on my computer for a bit more than 24 hours. ([Here's a tutorial](https://realpython.com/run-python-scripts/) on running scripts on the command line, if you're curious.) All of those JSONs are available for you to use. They're stored at [`ursp688y_sp2024/demos/demo06/cabi_data`](https://github.com/ncsg/ursp688y_sp2024/tree/main/demos/demo06/cabi_data).
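The linked script is the authoritative record of how the snapshots were collected. As a rough illustration only, a polling loop of that kind might look like the sketch below. The function names, the 300-second default, and the filename pattern (inferred from a stored file name used later, `cabi_bike_status_2024-03-03_13-11-54.json`) are assumptions, not code taken from that script. It uses the standard library's `urllib.request` instead of `requests` purely so the sketch has no third-party dependencies.

```python
import json
import os
import time
import urllib.request
from datetime import datetime

FREE_BIKE_STATUS_URL = 'https://gbfs.lyft.com/gbfs/1.1/dca-cabi/en/free_bike_status.json'


def snapshot_filename(when):
    """Build a name like cabi_bike_status_2024-03-03_13-11-54.json (assumed pattern)."""
    return when.strftime('cabi_bike_status_%Y-%m-%d_%H-%M-%S.json')


def save_snapshot(folder, url=FREE_BIKE_STATUS_URL):
    """Download one free_bike_status JSON, write it into `folder`, and return the path."""
    with urllib.request.urlopen(url, timeout=30) as response:
        data = json.load(response)
    os.makedirs(folder, exist_ok=True)
    # datetime.now() uses the local clock of the machine running the script.
    path = os.path.join(folder, snapshot_filename(datetime.now()))
    with open(path, 'w') as f:
        json.dump(data, f)
    return path


def poll(folder, interval_seconds=300, n_snapshots=None):
    """Save a snapshot every `interval_seconds` (300 s = 5 minutes).

    Runs until interrupted (Ctrl+C) when n_snapshots is None.
    """
    taken = 0
    while n_snapshots is None or taken < n_snapshots:
        print('saved', save_snapshot(folder))
        taken += 1
        # Skip the final sleep once the requested number of snapshots is reached.
        if n_snapshots is None or taken < n_snapshots:
            time.sleep(interval_seconds)
```

Calling `poll('cabi_data')` from a script run on the command line would keep writing one file per interval into `cabi_data/` until stopped.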

## Building Off of the Demo

The in-class demo gave us a starting point for how to access real-time JSON data from the API, load saved JSON data, and parse JSON data into a DataFrame.

I have copied what we did in class below and added onto it to develop a single tidy dataframe with records from all the saved JSONs, plus timestamps. This should be all the data you need for the questions above (except the bonus).

See if you can follow my code, then build onto it.

As usual, please wrap the code for your solution in a function, and put that function into a module (you can add to my module, or make a new one if you prefer). Then load your main function from the module and call it in the notebook to demonstrate your solution.


# Setup

In [None]:
# Import packages
import os
import json
import requests
import pandas as pd

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

In [3]:
# Set the working directory
# You will need to change this to your own folder on Google Drive
os.chdir('/content/drive/MyDrive/Teaching/URSP688Y Spring 2024/Exercise Demos/exercise06')

In [39]:
# Import module
import exercise06

# Request current data from the API

In [4]:
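# Make a GET request (the timeout, in seconds, keeps a stalled connection from hanging the notebook)
response = requests.get('https://gbfs.lyft.com/gbfs/1.1/dca-cabi/en/free_bike_status.json', timeout=30)
response.raise_for_status()  # raise an error if the request failed (e.g., a 404 or 500 status)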

# Get JSON content
data = response.json()

# Inspect the contents
data.keys()

dict_keys(['data', 'last_updated', 'ttl', 'version'])

In [5]:
# Make a dataframe out of data for available bikes
df = pd.DataFrame(data['data']['bikes'])

df.head()

| | type | fusion_lat | lon | bike_id | is_disabled | is_reserved | rental_uris | fusion_lon | name | lat |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | electric_bike | 0.0 | -76.940127 | fc02aae4aca57dbd6414368e3cea29e1 | 0 | 0 | {'android': 'https://dc.lft.to/lastmile_qr_sca... | 0.0 | 570-760 | 38.955433 |
| 1 | electric_bike | 0.0 | -77.049662 | 00677c29d7827cdbc8d9ce020e430952 | 0 | 0 | {'android': 'https://dc.lft.to/lastmile_qr_sca... | 0.0 | 268-224 | 38.896053 |
| 2 | electric_bike | 0.0 | -77.045894 | 32fb0c045ba029b131e89846fbb47ddc | 0 | 0 | {'android': 'https://dc.lft.to/lastmile_qr_sca... | 0.0 | 329-768 | 38.902572 |
| 3 | electric_bike | 0.0 | -77.147678 | 2d698429b86315064e6211d75ade27b9 | 0 | 0 | {'android': 'https://dc.lft.to/lastmile_qr_sca... | 0.0 | 201-455 | 38.863933 |
| 4 | electric_bike | 0.0 | -77.002738 | c704f9f6475b1d3decaac6f2566a7822 | 0 | 0 | {'android': 'https://dc.lft.to/lastmile_qr_sca... | 0.0 | 361-062 | 38.960355 |


# Load JSON data saved in a file

In [6]:
# Open a single stored JSON. Notice the 'cabi_data/' prefix on the path: it points into the subdirectory where the JSONs are stored.
with open('cabi_data/cabi_bike_status_2024-03-03_13-11-54.json') as json_data:
    data = json.load(json_data)
# No explicit close() is needed: the with-block closes the file automatically.

In [8]:
# see how the data are stored
type(data)

dict

In [10]:
# see what keys are available
data.keys()

dict_keys(['data', 'last_updated', 'ttl', 'version'])
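Under the GBFS specification, `last_updated` holds a POSIX timestamp (seconds since 1970-01-01 UTC). A minimal sketch of turning it into a timezone-aware pandas timestamp in Eastern time, which is one way to produce a `timestamp` column like the one in the combined dataframe later in this notebook (the module's own approach is not shown here, and the helper name `gbfs_timestamp` is hypothetical):

```python
import pandas as pd


def gbfs_timestamp(last_updated, tz='America/New_York'):
    """Convert a GBFS `last_updated` POSIX timestamp (seconds) to a tz-aware Timestamp."""
    # Interpret the number as seconds in UTC, then express it in the local time zone.
    return pd.to_datetime(last_updated, unit='s', utc=True).tz_convert(tz)
```

Applied to a stored snapshot, `gbfs_timestamp(data['last_updated'])` gives the moment the feed says the data were generated, which can be more reliable than the time the file happened to be saved.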

In [11]:
# drill into the records for each bike
records = data['data']['bikes']

# convert to a dataframe
df = pd.DataFrame(records)

# drop a column that we won't use, just to keep things clean
df = df.drop(columns=['rental_uris'])

In [12]:
df.head()

| | is_reserved | fusion_lon | fusion_lat | lat | type | is_disabled | bike_id | name | lon |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.0 | 0.0 | 38.887458 | electric_bike | 0 | d94788433d337e4186fb431076b52e91 | 320-065 | -77.025747 |
| 1 | 0 | 0.0 | 0.0 | 38.905328 | electric_bike | 0 | cc49246f85fdc23a6a13b3402ab52b37 | 222-581 | -77.058526 |
| 2 | 0 | 0.0 | 0.0 | 38.908954 | electric_bike | 0 | 75c5df17a8236707a7948f509a5ab929 | 228-812 | -77.043055 |
| 3 | 0 | 0.0 | 0.0 | 38.955421 | electric_bike | 0 | 4e51a79c1e03962064762ff16013b1a8 | 570-760 | -76.940135 |
| 4 | 0 | 0.0 | 0.0 | 38.892292 | electric_bike | 0 | 5983a1b66f086f7905d8aa701fa7b5df | 268-224 | -77.042912 |


# Iteratively load all the JSON files and combine them into a single dataframe

Except for the import statements above, this is probably the only part of the code you'll need to keep. The `exercise06.load_and_combine_free_bike_status_jsons_as_df` function wraps all of the loading steps. Feel free to delete the cells above if you're not using them.
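The module function's source is not reproduced in this notebook. As a hedged sketch of what "load every stored JSON, parse it, and stack the results with a timestamp" can look like, here is a stand-alone version. The name `load_snapshots`, the `*.json` glob, and taking the timestamp from `last_updated` are assumptions; `exercise06.load_and_combine_free_bike_status_jsons_as_df` may work differently.

```python
import glob
import json
import os

import pandas as pd


def load_snapshots(folder, tz='America/New_York'):
    """Stack the `bikes` records from every JSON in `folder` into one DataFrame."""
    frames = []
    for path in sorted(glob.glob(os.path.join(folder, '*.json'))):
        with open(path) as f:
            data = json.load(f)
        snapshot = pd.DataFrame(data['data']['bikes'])
        # Drop the nested URI column if it exists; it isn't needed for counting bikes.
        snapshot = snapshot.drop(columns=['rental_uris'], errors='ignore')
        # Tag every bike in this snapshot with the feed's own update time.
        snapshot['timestamp'] = pd.to_datetime(
            data['last_updated'], unit='s', utc=True).tz_convert(tz)
        frames.append(snapshot)
    if not frames:
        raise FileNotFoundError(f'no .json files found in {folder!r}')
    # One row per bike per snapshot, with a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)
```

The result has the same tidy shape as the module's output shown next: one row per bike per snapshot, which makes counting bikes per snapshot a single `groupby`.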

In [40]:
df = exercise06.load_and_combine_free_bike_status_jsons_as_df('cabi_data')

df.head()

| | is_reserved | fusion_lon | fusion_lat | lat | type | is_disabled | bike_id | name | lon | timestamp |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.0 | 0.0 | 38.887454 | electric_bike | 0 | 004adeb2565b0b8af16a6f3d2b8ad722 | 320-065 | -77.0257 | 2024-03-04 00:27:20-05:00 |
| 1 | 0 | 0.0 | 0.0 | 38.955455 | electric_bike | 0 | 6cd9e2fb1a21dbfef272e9c124fe24cf | 570-760 | -76.940044 | 2024-03-04 00:27:20-05:00 |
| 2 | 0 | 0.0 | 0.0 | 38.881496 | electric_bike | 0 | bab1bca067cc7519c1a62a62b01ba295 | 268-224 | -77.0274 | 2024-03-04 00:27:20-05:00 |
| 3 | 0 | 0.0 | 0.0 | 38.907839 | electric_bike | 0 | 1f719ba71dea69a3374262a0774e1499 | 137-726 | -77.071575 | 2024-03-04 00:27:20-05:00 |
| 4 | 0 | 0.0 | 0.0 | 38.898333 | electric_bike | 0 | 01ad43dc4c3bf21915eb6eaf244e6fce | 329-768 | -77.046828 | 2024-03-04 00:27:20-05:00 |


This is where you take over. Can you use this dataframe to answer the question(s) above?
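One possible way to begin, sketched under assumptions that are yours to revisit: count the rows (bikes) in each snapshot, optionally skipping bikes flagged `is_reserved` or `is_disabled` (a judgment call about what "available" means), then average the snapshot counts within each hour of the day. The function name `hourly_available_bikes` is hypothetical.

```python
import pandas as pd


def hourly_available_bikes(df, exclude_unavailable=True):
    """Average number of listed bikes per snapshot for each hour of the day (0-23).

    Assumes `df` has one row per bike per snapshot and a tz-aware `timestamp` column.
    """
    # Every snapshot time, so snapshots with zero available bikes still count.
    all_snapshots = df.groupby('timestamp').size().index
    if exclude_unavailable:
        df = df[(df['is_reserved'] == 0) & (df['is_disabled'] == 0)]
    # Bikes listed in each snapshot (0 where every bike was filtered out).
    per_snapshot = df.groupby('timestamp').size().reindex(all_snapshots, fill_value=0)
    # Average the snapshot counts within each local clock hour.
    return per_snapshot.groupby(per_snapshot.index.hour).mean()
```

For the resulting Series, `.idxmax()` and `.idxmin()` return the hours with the highest and lowest averages, and `.plot()` draws a quick line chart (this requires matplotlib). Because the stored data span a bit more than 24 hours, averaging by hour of day folds the overlapping hours together; grouping by the full timestamp rounded to the hour would keep them separate.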