#### API Endpoints

* ~~system_information~~ (phone number, timezone, etc.)
* ~~system_hours~~
* ~~system_calendar~~
* ~~system_alerts~~

**[index](https://gbfs.capitalbikeshare.com/gbfs/gbfs.json)**
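The index feed lists every feed the system publishes, so the feed URLs could be read from it instead of being hard-coded. The sketch below is minimal and assumes the GBFS v1/v2 index layout (`data` → language code → `feeds`, where each feed has a `name` and a `url`). `feed_urls` and the trimmed `sample_index` payload are hypothetical, so check the live `gbfs.json` before relying on this shape.

```python
from typing import Dict


def feed_urls(gbfs_index: dict, language: str = "en") -> Dict[str, str]:
    """Map each feed name to its URL from a gbfs.json payload.

    Assumes the GBFS v1/v2 layout:
    {"data": {"<lang>": {"feeds": [{"name": ..., "url": ...}, ...]}}}
    """
    feeds = gbfs_index["data"][language]["feeds"]
    return {feed["name"]: feed["url"] for feed in feeds}


# Hypothetical index payload trimmed to two feeds
sample_index = {
    "data": {
        "en": {
            "feeds": [
                {"name": "station_information",
                 "url": "https://gbfs.capitalbikeshare.com/gbfs/en/station_information.json"},
                {"name": "station_status",
                 "url": "https://gbfs.capitalbikeshare.com/gbfs/en/station_status.json"},
            ]
        }
    }
}

urls = feed_urls(sample_index)
print(urls["station_status"])
```

With the live index, `urls` would then be indexed by feed name, e.g. `urls["free_bike_status"]`, if that feed is listed.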


**[station_information](https://gbfs.capitalbikeshare.com/gbfs/en/station_information.json)**

```json
{
  "last_updated": 1678656476,
  "data": {
    "stations": [
      {
        "legacy_id": "592",
        "external_id": "8bbafd08-edb1-426d-94f4-c7b7d9be34dd",
        "capacity": 19,
        "electric_bike_surcharge_waiver": false,
        "rental_methods": ["KEY", "CREDITCARD"],
        "short_name": "31528",
        "station_type": "classic",
        "has_kiosk": true,
        "lon": -76.997226,
        "station_id": "8bbafd08-edb1-426d-94f4-c7b7d9be34dd",
        "eightd_has_key_dispenser": false,
        "lat": 38.938889,
        "name": "John McCormack Rd NE",
        "eightd_station_services": [],
        "rental_uris": {
          "ios": "https://dc.lft.to/lastmile_qr_scan",
          "android": "https://dc.lft.to/lastmile_qr_scan"
        },
        "region_id": "42"
      }
    ]
  }
}
```

**[station_status](https://gbfs.capitalbikeshare.com/gbfs/en/station_status.json)**

```json
{
  "last_updated": 1678666062,
  "data": {
    "stations": [
      {
        "num_docks_available": 13,
        "is_renting": 1,
        "station_status": "active",
        "num_scooters_unavailable": 0,
        "num_bikes_disabled": 0,
        "num_docks_disabled": 0,
        "num_ebikes_available": 0,
        "station_id": "08263959-1f3f-11e7-bf6b-3863bb334450",
        "is_installed": 1,
        "last_reported": 1678665956,
        "legacy_id": "436",
        "num_bikes_available": 1,
        "eightd_has_available_keys": false,
        "is_returning": 1,
        "num_scooters_available": 0
      }
    ]
  }
}
```

**[free_bike_status](https://gbfs.capitalbikeshare.com/gbfs/en/free_bike_status.json)**

```json
{
  "last_updated": 1678667011,
  "data": {
    "bikes": [
      {
        "lat": 38.96772166666667,
        "fusion_lon": 0,
        "lon": -77.06762216666667,
        "is_disabled": 0,
        "fusion_lat": 0,
        "name": "0586677e1742be5dc0888aaa81ce885a",
        "type": "electric_bike",
        "is_reserved": 0,
        "rental_uris": {
          "android": "https://dc.lft.to/lastmile_qr_scan",
          "ios": "https://dc.lft.to/lastmile_qr_scan"
        },
        "bike_id": "0586677e1742be5dc0888aaa81ce885a"
      }
    ]
  }
}
```

**[system_regions](https://gbfs.capitalbikeshare.com/gbfs/en/system_regions.json)** (Regions dimension table)

```json
{
  "last_updated": 1678667476,
  "data": {
    "regions": [
      {
        "region_id": "40",
        "name": "Alexandria, VA"
      }
    ]
  }
}
```


```python
import datetime as dt
import json
from time import sleep

import pandas as pd
import requests
from bs4 import BeautifulSoup
```

#### Scraping Dimension Tables

These tables change slowly, so we will schedule them to load once daily.
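Each dimension feed is fetched and flattened the same way, so a daily job could share one helper. This is a minimal sketch, assuming every payload has a top-level `last_updated` (POSIX seconds) and its records under `data.<key>`. `feed_to_df` is a hypothetical name, and the sample payload is a two-row excerpt of the `system_regions` data shown in this notebook.

```python
import pandas as pd


def feed_to_df(payload: dict, key: str) -> pd.DataFrame:
    """Flatten one GBFS feed payload into a DataFrame (hypothetical helper).

    Each record under payload["data"][key] becomes a row, nested objects such as
    rental_uris become dotted columns, and the feed-level last_updated (POSIX
    seconds) is attached to every row as a UTC timestamp.
    """
    df = pd.json_normalize(payload["data"][key])
    df["last_updated"] = pd.to_datetime(payload["last_updated"], unit="s", utc=True)
    return df


# Two-row excerpt shaped like system_regions.json
sample = {
    "last_updated": 1678667476,
    "data": {"regions": [
        {"region_id": "40", "name": "Alexandria, VA"},
        {"region_id": "41", "name": "Arlington, VA"},
    ]},
}

regions = feed_to_df(sample, "regions")
print(regions)
```

Stamping each row with the snapshot time keeps a record of when each daily load was taken, which also makes later loads easy to compare.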

##### station_information


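```python
url = "https://gbfs.capitalbikeshare.com/gbfs/en/station_information.json"

response = requests.get(url, timeout=10)
response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
response_json = response.json()
```

```python
# One row per station; nested rental_uris become dotted columns
station_info_df = pd.json_normalize(response_json['data']['stations'])

station_info_df
```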

|  | capacity | eightd_has_key_dispenser | region_id | lon | station_type | station_id | eightd_station_services | rental_methods | lat | short_name | external_id | has_kiosk | legacy_id | name | electric_bike_surcharge_waiver | rental_uris.android | rental_uris.ios |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 19 | False | 42 | -77.007800 | classic | 6d5ad96d-a704-4fa6-8b65-3ac643c5aa93 | [] | [KEY, CREDITCARD] | 38.915000 | 31523 | 6d5ad96d-a704-4fa6-8b65-3ac643c5aa93 | True | 506 | Lincoln Rd & Seaton Pl NE/Harry Thomas Rec Center | False | https://dc.lft.to/lastmile_qr_scan | https://dc.lft.to/lastmile_qr_scan |
| 1 | 14 | False | 104 | -77.356324 | classic | 08263959-1f3f-11e7-bf6b-3863bb334450 | [] | [KEY, CREDITCARD] | 38.960574 | 32212 | 08263959-1f3f-11e7-bf6b-3863bb334450 | True | 436 | New Dominion Pkwy & Fountain Dr | False | https://dc.lft.to/lastmile_qr_scan | https://dc.lft.to/lastmile_qr_scan |
| 2 | 12 | False | 104 | -77.341823 | classic | c0ec45a3-ec59-4c82-9671-13d9c122be30 | [] | [KEY, CREDITCARD] | 38.983199 | 32256 | c0ec45a3-ec59-4c82-9671-13d9c122be30 | True | 702 | North Village and Park Garden | False | https://dc.lft.to/lastmile_qr_scan | https://dc.lft.to/lastmile_qr_scan |
| 3 | 15 | False | 42 | -77.011915 | classic | 84a0159b-5f00-417a-8f06-6c7c8437049f | [] | [KEY, CREDITCARD] | 38.870824 | 31676 | 84a0159b-5f00-417a-8f06-6c7c8437049f | True | 84a0159b-5f00-417a-8f06-6c7c8437049f | 1st & Q St SW | False | https://dc.lft.to/lastmile_qr_scan | https://dc.lft.to/lastmile_qr_scan |
| 4 | 19 | False | 42 | -77.074323 | classic | 7a19d0e7-2b3c-4805-b843-1aec84a05bfe | [] | [KEY, CREDITCARD] | 38.912614 | 31325 | 7a19d0e7-2b3c-4805-b843-1aec84a05bfe | True | 608 | Reservoir Rd & 38th St NW | False | https://dc.lft.to/lastmile_qr_scan | https://dc.lft.to/lastmile_qr_scan |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 716 | 19 | False | 42 | -77.038627 | classic | 770 | [] | [CREDITCARD, KEY] | 38.906767 | 31210 | affeb890-08dc-46d2-9f8a-3eedf4452640 | True | 770 | 17th St & Rhode Island Ave NW | False | https://dc.lft.to/lastmile_qr_scan | https://dc.lft.to/lastmile_qr_scan |
| 717 | 15 | False | 42 | -77.017585 | classic | 771 | [] | [CREDITCARD, KEY] | 38.872964 | 31678 | 9c6fb0a8-d371-433f-b52e-ca47881156b3 | True | 771 | 4th & O St SW | False | https://dc.lft.to/lastmile_qr_scan | https://dc.lft.to/lastmile_qr_scan |
| 718 | 15 | False | 42 | -76.990157 | classic | 772 | [] | [CREDITCARD, KEY] | 38.943596 | 31542 | e800b2e8-5d8c-4ca1-920b-9622fa2a87ea | True | 772 | 12th & Varnum St NE | False | https://dc.lft.to/lastmile_qr_scan | https://dc.lft.to/lastmile_qr_scan |
| 719 | 19 | False | 42 | -77.026747 | classic | 773 | [] | [CREDITCARD, KEY] | 38.976061 | 31425 | f8b331c6-d208-4b17-b788-1a17d2520c40 | True | 773 | Georgia Ave & Dahlia St NW | False | https://dc.lft.to/lastmile_qr_scan | https://dc.lft.to/lastmile_qr_scan |


```python
# Each station_id should appear exactly once
station_info_df.station_id.value_counts().sort_values()
```

```text
6d5ad96d-a704-4fa6-8b65-3ac643c5aa93    1
a2507ccc-aca1-48b2-a6a3-a3144ad92d21    1
0825164b-1f3f-11e7-bf6b-3863bb334450    1
c0ec45a3-ec59-4c82-9671-13d9c122be30    1
84a0159b-5f00-417a-8f06-6c7c8437049f    1
                                       ..
768                                     1
770                                     1
771                                     1
773                                     1
774                                     1
Name: station_id, Length: 721, dtype: int64
```
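Every count above is 1, so `station_id` is unique, but the values mix UUIDs with short numeric IDs such as `770`. A small check can make both facts explicit. This is a minimal sketch: `id_format` and `summarize_ids` are hypothetical helpers, the UUID pattern assumes the standard 8-4-4-4-12 hex layout, and the sample values are copied from the output above.

```python
import re

import pandas as pd

# Standard 8-4-4-4-12 hexadecimal UUID layout (assumed format)
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def id_format(value: str) -> str:
    """Label an ID as 'uuid', 'numeric', or 'other' (hypothetical helper)."""
    if UUID_RE.match(value):
        return "uuid"
    if value.isdigit():
        return "numeric"
    return "other"


def summarize_ids(ids: pd.Series) -> pd.Series:
    """Fail on duplicate IDs, then count how many IDs use each format."""
    if not ids.is_unique:
        raise ValueError("duplicate IDs found")
    return ids.astype(str).map(id_format).value_counts()


# station_id values copied from the value_counts output above
sample_ids = pd.Series([
    "6d5ad96d-a704-4fa6-8b65-3ac643c5aa93",
    "c0ec45a3-ec59-4c82-9671-13d9c122be30",
    "770",
    "771",
    "773",
])
print(summarize_ids(sample_ids))
```

Run against `station_info_df.station_id`, this would flag a duplicate key before it reaches the dimension table.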

##### system_regions

```python
url = "https://gbfs.capitalbikeshare.com/gbfs/en/system_regions.json"

response = requests.get(url, timeout=10)
response.raise_for_status()
response_json = response.json()
```

```python
regions_df = pd.json_normalize(response_json['data']['regions'])
regions_df
```

|  | region_id | name |
|---|---|---|
| 0 | 40 | Alexandria, VA |
| 1 | 41 | Arlington, VA |
| 2 | 42 | Washington, DC |
| 3 | 43 | Montgomery County, MD (North) |
| 4 | 44 | Montgomery County, MD (South) |
| 5 | 48 | Test & Operations |
| 6 | 104 | Fairfax, VA |
| 7 | 128 | 8D |
| 8 | 133 | Prince George's County |
| 9 | 152 | Falls Church, VA |
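As a dimension table, `regions_df` would typically be joined to the station data on `region_id`. The sketch below uses small frames copied from the outputs above in place of `station_info_df` and `regions_df`, and assumes `region_id` is a string in both feeds, as in the JSON samples. A left join keeps stations whose `region_id` has no match, and `validate="many_to_one"` raises if a `region_id` is ever duplicated in the regions table.

```python
import pandas as pd

# Rows copied from station_info_df and regions_df above
stations = pd.DataFrame({
    "station_id": ["6d5ad96d-a704-4fa6-8b65-3ac643c5aa93",
                   "08263959-1f3f-11e7-bf6b-3863bb334450"],
    "name": ["Lincoln Rd & Seaton Pl NE/Harry Thomas Rec Center",
             "New Dominion Pkwy & Fountain Dr"],
    "region_id": ["42", "104"],
})
regions = pd.DataFrame({
    "region_id": ["42", "104"],
    "name": ["Washington, DC", "Fairfax, VA"],
})

# Left join: every station is kept; the region's "name" column gets a suffix
stations_with_region = stations.merge(
    regions,
    on="region_id",
    how="left",
    suffixes=("", "_region"),
    validate="many_to_one",
)
print(stations_with_region[["name", "name_region"]])
```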


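#### Scraping Fact Tables

The fact tables are refreshed at roughly ten-second intervals, so we will load them with a streaming change data capture (CDC) process that stores only the rows that are new or changed since the previous snapshot.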
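The polling side of this process can be sketched as a loop that fetches the feed every few seconds and hands each new snapshot to the CDC step. This is a minimal sketch, assuming a snapshot is new whenever its feed-level `last_updated` differs from the previous poll's. `poll_feed`, its parameters, and the fake `fetch` below are hypothetical. In the notebook, `fetch` would wrap `requests.get(url, timeout=10).json()` and `handle` would run the comparison described in the CDC structure.

```python
import time
from typing import Callable, Optional


def poll_feed(
    fetch: Callable[[], dict],
    handle: Callable[[dict], None],
    interval_s: float = 10.0,
    max_polls: Optional[int] = None,
) -> int:
    """Poll a feed and pass each new payload to handle() (hypothetical helper).

    A payload whose last_updated matches the previous poll is skipped, so an
    unchanged snapshot is not re-processed. Returns the number of payloads handled.
    """
    last_seen = None
    handled = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        payload = fetch()
        polls += 1
        if payload["last_updated"] != last_seen:
            last_seen = payload["last_updated"]
            handle(payload)
            handled += 1
        time.sleep(interval_s)
    return handled


# Fake feed: the second poll returns the same snapshot as the first
snapshots = iter([{"last_updated": 100}, {"last_updated": 100}, {"last_updated": 110}])
seen = []
count = poll_feed(lambda: next(snapshots), seen.append, interval_s=0, max_polls=3)
print(count)
```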

##### free_bike_status

```python
url = "https://gbfs.capitalbikeshare.com/gbfs/en/free_bike_status.json"

response = requests.get(url, timeout=10)
response.raise_for_status()
response_json = response.json()
```

```python
free_bikes_df = pd.json_normalize(response_json['data']['bikes'])

free_bikes_df
```

|  | lon | bike_id | is_reserved | is_disabled | fusion_lat | name | type | lat | fusion_lon | rental_uris.ios | rental_uris.android |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -77.050627 | a9954c8d1a570e361fb2b1b2e1770d75 | 0 | 0 | 0 | a9954c8d1a570e361fb2b1b2e1770d75 | electric_bike | 38.896640 | 0 | https://dc.lft.to/lastmile_qr_scan | https://dc.lft.to/lastmile_qr_scan |
| 1 | -77.053469 | fb1cf56699cf9a199a1ab69a7f6c2341 | 0 | 0 | 0 | fb1cf56699cf9a199a1ab69a7f6c2341 | electric_bike | 38.861898 | 0 | https://dc.lft.to/lastmile_qr_scan | https://dc.lft.to/lastmile_qr_scan |
| 2 | -77.018761 | 4db64b2e16dd4483ceb06c8b07485487 | 0 | 0 | 0 | 4db64b2e16dd4483ceb06c8b07485487 | electric_bike | 38.923844 | 0 | https://dc.lft.to/lastmile_qr_scan | https://dc.lft.to/lastmile_qr_scan |
| 3 | -77.062486 | 20de9cef4daab093df237f10cc3c8363 | 0 | 0 | 0 | 20de9cef4daab093df237f10cc3c8363 | electric_bike | 38.845469 | 0 | https://dc.lft.to/lastmile_qr_scan | https://dc.lft.to/lastmile_qr_scan |
| 4 | -77.110358 | 139df920cb6f15ff3d23c4498166919d | 0 | 0 | 0 | 139df920cb6f15ff3d23c4498166919d | electric_bike | 38.893767 | 0 | https://dc.lft.to/lastmile_qr_scan | https://dc.lft.to/lastmile_qr_scan |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 203 | -77.029520 | af94838fadacd51decdadcb167e9f333 | 0 | 0 | 0 | af94838fadacd51decdadcb167e9f333 | electric_bike | 38.906505 | 0 | https://dc.lft.to/lastmile_qr_scan | https://dc.lft.to/lastmile_qr_scan |
| 204 | -77.048573 | 10f6a8d7615a5e94a126b7d607626426 | 0 | 0 | 0 | 10f6a8d7615a5e94a126b7d607626426 | electric_bike | 38.864160 | 0 | https://dc.lft.to/lastmile_qr_scan | https://dc.lft.to/lastmile_qr_scan |
| 205 | -77.056567 | 4606163101d64ed2e969915706b46733 | 0 | 0 | 0 | 4606163101d64ed2e969915706b46733 | electric_bike | 38.806326 | 0 | https://dc.lft.to/lastmile_qr_scan | https://dc.lft.to/lastmile_qr_scan |
| 206 | -77.016713 | 7dbd826097b5c12e810f2bc5402df8f1 | 0 | 0 | 0 | 7dbd826097b5c12e810f2bc5402df8f1 | electric_bike | 38.881961 | 0 | https://dc.lft.to/lastmile_qr_scan | https://dc.lft.to/lastmile_qr_scan |


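**CDC structure**

* Get the latest stored free_bike_status data (`latest_data`) from the data table as `{<bike_id>: <row>}`, taking the most recent row for each `bike_id` (the row with `max(last_updated)` per `bike_id`; a table-wide `max(last_updated)` would return only the bikes changed in the last load)
* Get the API data as JSON (`new_data`)
* Compare `latest_data` and `new_data`:
  * For each `row` in `new_data["data"]["bikes"]`:
    * propagate `last_updated`: `row["last_updated"] = new_data["last_updated"]`
    * if `row["bike_id"]` is not in `latest_data`:
      * append `row` to `output`
      * add it to `latest_data`: `latest_data[row["bike_id"]] = row`
      * log it as a new row
    * elif `row` differs from `latest_data[row["bike_id"]]` in any field other than `last_updated` (i.e. the row has updates):
      * append `row` to `output`
      * update `latest_data`: `latest_data[row["bike_id"]] = row`
* Load `output` to the data table (if `output` is non-empty)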
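The comparison step above can be sketched in plain Python. This is a minimal sketch, assuming `latest_data` already holds each key's most recent row in the same nested shape as the API records. `cdc_diff` and the tiny snapshots in the usage example are hypothetical. Rows are compared without `last_updated`, because that field changes on every poll and would otherwise make every row look updated.

```python
from typing import Dict, List, Tuple

Row = dict


def _without_ts(r: Row) -> Row:
    """Drop last_updated so only the substantive fields are compared."""
    return {k: v for k, v in r.items() if k != "last_updated"}


def cdc_diff(
    latest_data: Dict[str, Row],
    new_data: dict,
    key: str = "bike_id",
    list_key: str = "bikes",
) -> Tuple[List[Row], List[str]]:
    """Return (output, new_keys) for one snapshot, updating latest_data in place."""
    output: List[Row] = []
    new_keys: List[str] = []
    for row in new_data["data"][list_key]:
        # Propagate the feed-level timestamp onto the row
        stamped = {**row, "last_updated": new_data["last_updated"]}
        previous = latest_data.get(row[key])
        if previous is None:
            new_keys.append(row[key])  # the caller can log these as new rows
        elif _without_ts(previous) == _without_ts(row):
            continue  # unchanged since the last stored version
        output.append(stamped)
        latest_data[row[key]] = stamped
    return output, new_keys


# Hypothetical snapshots: bike "a" is new, then unchanged, then reserved
latest: Dict[str, Row] = {}
snap1 = {"last_updated": 100, "data": {"bikes": [{"bike_id": "a", "is_reserved": 0}]}}
snap2 = {"last_updated": 110, "data": {"bikes": [
    {"bike_id": "a", "is_reserved": 0},
    {"bike_id": "b", "is_reserved": 0},
]}}
snap3 = {"last_updated": 120, "data": {"bikes": [
    {"bike_id": "a", "is_reserved": 1},
    {"bike_id": "b", "is_reserved": 0},
]}}
out1, new1 = cdc_diff(latest, snap1)
out2, new2 = cdc_diff(latest, snap2)
out3, new3 = cdc_diff(latest, snap3)
print(out3)
```

The same function could serve `station_status` with `key="station_id", list_key="stations"`. Like the outline, it does not record bikes that drop out of the feed; that would need a separate pass over keys in `latest_data` that are missing from `new_data`.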


##### station_status


```python
url = "https://gbfs.capitalbikeshare.com/gbfs/en/station_status.json"

response = requests.get(url, timeout=10)
response.raise_for_status()
response_json = response.json()
```

```python
station_status_df = pd.json_normalize(response_json['data']['stations'])
station_status_df
```

|  | num_scooters_available | last_reported | station_id | num_docks_disabled | num_docks_available | num_scooters_unavailable | num_ebikes_available | num_bikes_available | station_status | legacy_id | is_returning | is_renting | eightd_has_available_keys | num_bikes_disabled | is_installed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 1678724640 | 94b1507c-7f76-483c-b06a-a4131e0b5970 | 0 | 5 | 0.0 | 0 | 14 | active | 94b1507c-7f76-483c-b06a-a4131e0b5970 | 1 | 1 | False | 0 | 1 |
| 1 | 0.0 | 1678724645 | 08263fbd-1f3f-11e7-bf6b-3863bb334450 | 0 | 10 | 0.0 | 0 | 1 | active | 447 | 1 | 1 | False | 0 | 1 |
| 2 | 0.0 | 1678724651 | 082522f0-1f3f-11e7-bf6b-3863bb334450 | 0 | 1 | 0.0 | 0 | 32 | active | 169 | 1 | 1 | False | 2 | 1 |
| 3 | 0.0 | 1678724652 | 0136af02-2670-4042-b5cd-18b46c15c0c1 | 0 | 7 | 0.0 | 0 | 5 | active | 680 | 1 | 1 | False | 0 | 1 |
| 4 | 0.0 | 1678724651 | 12b1c990-90e7-4fef-a5b9-c78d4c81af96 | 0 | 15 | 0.0 | 1 | 4 | active | 659 | 1 | 1 | False | 0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 716 | NaN | 1678724436 | 770 | 0 | 1 | NaN | 1 | 18 | active | 770 | 1 | 1 | False | 0 | 1 |
| 717 | NaN | 1678721429 | 771 | 0 | 3 | NaN | 0 | 12 | active | 771 | 1 | 1 | False | 0 | 1 |
| 718 | NaN | 1678697860 | 772 | 0 | 7 | NaN | 0 | 8 | active | 772 | 1 | 1 | False | 0 | 1 |
| 719 | NaN | 1678697859 | 773 | 0 | 13 | NaN | 0 | 6 | active | 773 | 1 | 1 | False | 0 | 1 |


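```python
# station_id and legacy_id are inconsistent: some stations use a UUID for both,
# some pair a UUID station_id with a numeric legacy_id, and some use the numeric ID for both
station_status_df.filter(regex="_id", axis=1)
```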

|  | station_id | legacy_id |
|---|---|---|
| 0 | 94b1507c-7f76-483c-b06a-a4131e0b5970 | 94b1507c-7f76-483c-b06a-a4131e0b5970 |
| 1 | 08263fbd-1f3f-11e7-bf6b-3863bb334450 | 447 |
| 2 | 082522f0-1f3f-11e7-bf6b-3863bb334450 | 169 |
| 3 | 0136af02-2670-4042-b5cd-18b46c15c0c1 | 680 |
| 4 | 12b1c990-90e7-4fef-a5b9-c78d4c81af96 | 659 |
| ... | ... | ... |
| 716 | 770 | 770 |
| 717 | 771 | 771 |
| 718 | 772 | 772 |
| 719 | 773 | 773 |
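Given the mixed ID formats, it is worth checking that every `station_id` in the status feed also appears in `station_information` before the two are joined. A minimal sketch with `merge(..., indicator=True)`: the small frames stand in for `station_status_df` and `station_info_df`, and the ID `"999"` is a hypothetical unmatched station added to show what the check reports.

```python
import pandas as pd

# Stand-ins for station_status_df and station_info_df; "999" is hypothetical
status = pd.DataFrame({
    "station_id": ["08263959-1f3f-11e7-bf6b-3863bb334450", "770", "999"],
    "num_bikes_available": [1, 18, 0],
})
info = pd.DataFrame({
    "station_id": ["08263959-1f3f-11e7-bf6b-3863bb334450", "770"],
})

# indicator=True adds a _merge column: "both", "left_only", or "right_only"
checked = status.merge(info, on="station_id", how="left", indicator=True)
orphans = checked.loc[checked["_merge"] == "left_only", "station_id"].tolist()
print(orphans)
```

A non-empty `orphans` list would point to status rows whose station cannot be found in the dimension table under the same ID.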


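**CDC structure**

* Get the latest stored station_status data (`latest_data`) from the data table as `{<station_id>: <row>}`, taking the most recent row for each `station_id` (the row with `max(last_updated)` per `station_id`; a table-wide `max(last_updated)` would return only the stations changed in the last load)
* Get the API data as JSON (`new_data`)
* Compare `latest_data` and `new_data`:
  * For each `row` in `new_data["data"]["stations"]`:
    * propagate `last_updated`: `row["last_updated"] = new_data["last_updated"]`
    * if `row["station_id"]` is not in `latest_data`:
      * append `row` to `output`
      * add it to `latest_data`: `latest_data[row["station_id"]] = row`
      * log it as a new row
    * elif `row` differs from `latest_data[row["station_id"]]` in any field other than `last_updated` (i.e. the row has updates):
      * append `row` to `output`
      * update `latest_data`: `latest_data[row["station_id"]] = row`
* Load `output` to the data table (if `output` is non-empty)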
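The first step, rebuilding `latest_data` from the stored history, can be sketched with pandas. Because CDC only appends changed rows, each station's latest row has to be picked separately rather than filtering on the table-wide `max(last_updated)`. This is a minimal sketch: `load_latest` is a hypothetical helper, and the three-row `history` table is made up to show the difference.

```python
from typing import Dict

import pandas as pd


def load_latest(table: pd.DataFrame, key: str = "station_id") -> Dict[str, dict]:
    """Return {key: most recent row} from a CDC history table (hypothetical helper)."""
    latest = (
        table.sort_values("last_updated", kind="stable")
        .drop_duplicates(subset=key, keep="last")  # keep each key's newest row
    )
    return {rec[key]: rec for rec in latest.to_dict(orient="records")}


# Hypothetical history: "770" changed at t=110, "771" has not changed since t=100
history = pd.DataFrame({
    "station_id": ["770", "771", "770"],
    "num_bikes_available": [17, 12, 18],
    "last_updated": [100, 100, 110],
})
latest_data = load_latest(history)

# A table-wide max(last_updated) filter would miss station "771"
global_max = history.loc[history["last_updated"] == history["last_updated"].max(), "station_id"]
print(sorted(latest_data), list(global_max))
```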