# Module 1. Application programming interfaces: APIs
# The BART API

## Lecture objectives

1. Demonstrate different ways to access data via an API
2. Introduce the `requests` library 
3. Show how to parse JSON data

One way to get data is to download it manually from a website. You might click through a series of links and then save a `.csv` or similar file to your hard drive.

Another way is through an Application Programming Interface (API). These APIs make it possible to request just the data that you are interested in. Sometimes, this data is static—it will be the same each time you request it. The U.S. Census is a good example. Other APIs are dynamic—for example, bus and train arrival times.

## Example: BART
Many APIs return a format known as JSON. While it seems complicated, it's relatively easy to work with because it's highly structured.
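To make "highly structured" concrete, here is a minimal sketch that parses a small JSON string. The sample is hand-trimmed and only modeled on the BART response shown later in this lecture; it is illustrative, not live data.

```python
import json

# A hand-trimmed sample, modeled on the BART response shown later in this lecture
sample = ('{"station": [{"name": "12th St. Oakland City Center", "abbr": "12TH", '
          '"etd": [{"destination": "Antioch", '
          '"estimate": [{"minutes": "10"}, {"minutes": "25"}]}]}]}')

data = json.loads(sample)       # JSON objects become dicts, JSON arrays become lists
station = data['station'][0]    # first (and only) station in the list
second_train = station['etd'][0]['estimate'][1]['minutes']

print(station['name'])
print(second_train)             # note that the value is a string, not a number
```

Every value is reached by a predictable chain of keys and list positions, which is what makes the format easy to work with once you know its shape.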

BART provides its [API documentation here](https://api.bart.gov/docs/overview/index.aspx).

How do we get data from the API into Python? The `requests` library is the key to unlocking many interactions with the web.

Let's get the real-time departures from 12th St Oakland City Center station. According to the [documentation](https://api.bart.gov/docs/etd/etd.aspx), we need to pass `orig=12TH`. We can also pass `json=y` to return the results in JSON format.

Most APIs, including that of BART, require a "key" that identifies you to the developer. For serious usage, you'd request your own key. But for experimentation, BART provides a key that anyone can use. 

So how do we construct the request? We simply build a string that contains all of these parameters, following the example in the documentation. Then we pass that string to `requests.get()`.

In [1]:
import requests

APIkey = 'MW9S-E7SL-26DU-VV8V'  # the key posted on BART's website
requestString = 'http://api.bart.gov/api/etd.aspx?cmd=etd&orig=12TH&json=y&key='+APIkey

r = requests.get(requestString)

Here, we've made a request to the BART API, and stored the response in the `r` object. 

Let's see what the `r` object includes. Use `r.` and the `tab` autocomplete in Jupyter Notebook to see the different attributes. Or you can call the help function.

In [2]:
help(r)

Help on Response in module requests.models object:

class Response(builtins.object)
 |  The :class:`Response <Response>` object, which contains a
 |  server's response to an HTTP request.
 |  
 |  Methods defined here:
 |  
 |  __bool__(self)
 |      Returns True if :attr:`status_code` is less than 400.
 |      
 |      This attribute checks if the status code of the response is between
 |      400 and 600 to see if there was a client error or a server error. If
 |      the status code, is between 200 and 400, this will return True. This
 |      is **not** a check to see if the response code is ``200 OK``.
 |  
 |  __enter__(self)
 |  
 |  __exit__(self, *args)
 |  
 |  __getstate__(self)
 |  
 |  __init__(self)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __iter__(self)
 |      Allows you to use a response as an iterator.
 |  
 |  __nonzero__(self)
 |      Returns True if :attr:`status_code` is less than 400.
 |      
 |      This attribute checks if

One attribute is `ok`: did the request succeed?

In [3]:
print(r.ok)

True


The `text` attribute shows the text that was returned.

In [4]:
print(r.text)

{"?xml":{"@version":"1.0","@encoding":"utf-8"},"root":{"@id":"1","uri":{"#cdata-section":"http://api.bart.gov/api/etd.aspx?cmd=etd&orig=12TH&json=y"},"date":"04/05/2023","time":"06:31:11 AM PDT","station":[{"name":"12th St. Oakland City Center","abbr":"12TH","etd":[{"destination":"Antioch","abbreviation":"ANTC","limited":"0","estimate":[{"minutes":"10","platform":"3","direction":"North","length":"8","color":"YELLOW","hexcolor":"#ffff33","bikeflag":"1","delay":"0","cancelflag":"0","dynamicflag":"0"},{"minutes":"25","platform":"3","direction":"North","length":"10","color":"YELLOW","hexcolor":"#ffff33","bikeflag":"1","delay":"0","cancelflag":"0","dynamicflag":"0"},{"minutes":"40","platform":"3","direction":"North","length":"10","color":"YELLOW","hexcolor":"#ffff33","bikeflag":"1","delay":"0","cancelflag":"0","dynamicflag":"0"}]},{"destination":"Berryessa","abbreviation":"BERY","limited":"0","estimate":[{"minutes":"Leaving","platform":"2","direction":"South","length":"8","color":"ORANGE","

This looks promising. But how can we get this into a more usable form? The `json` module is the key. It's built into Python, so you don't have to install anything.

In [5]:
import json

In [8]:
# json.loads() parses the JSON text into Python objects (here, a dictionary)
d = json.loads(r.text)
print(type(d))

<class 'dict'>
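In practice it can help to wrap the request and the parsing into one function. The sketch below is a hypothetical helper (`fetch_json` is our own name, not part of `requests` or the BART API). It adds a timeout, calls `raise_for_status()` so that HTTP error codes raise an exception, and uses the response's `.json()` method, which parses the body much as `json.loads(r.text)` does. The 10-second timeout is an arbitrary assumption.

```python
import requests

def fetch_json(url, params=None, timeout=10):
    """Hypothetical helper: GET a URL and return the parsed JSON body."""
    response = requests.get(url, params=params, timeout=timeout)
    response.raise_for_status()   # raises requests.HTTPError for 4xx/5xx status codes
    return response.json()        # parse the JSON body into Python objects
```

Assuming the API is reachable, `d = fetch_json(requestString)` should give the same kind of dictionary as the cell above.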


<div class="alert alert-block alert-info">
<strong>Exercise:</strong> How can you access the relevant contents of the dictionary? Hint: First, look at the keys.
</div>

In [9]:
# remember, a dictionary is a collection of keys and values
# here it looks like there are two keys, called ?xml and root
# let's look at the root item
print(d.keys())
print(d['root'])

dict_keys(['?xml', 'root'])
{'@id': '1', 'uri': {'#cdata-section': 'http://api.bart.gov/api/etd.aspx?cmd=etd&orig=12TH&json=y'}, 'date': '04/05/2023', 'time': '06:31:11 AM PDT', 'station': [{'name': '12th St. Oakland City Center', 'abbr': '12TH', 'etd': [{'destination': 'Antioch', 'abbreviation': 'ANTC', 'limited': '0', 'estimate': [{'minutes': '10', 'platform': '3', 'direction': 'North', 'length': '8', 'color': 'YELLOW', 'hexcolor': '#ffff33', 'bikeflag': '1', 'delay': '0', 'cancelflag': '0', 'dynamicflag': '0'}, {'minutes': '25', 'platform': '3', 'direction': 'North', 'length': '10', 'color': 'YELLOW', 'hexcolor': '#ffff33', 'bikeflag': '1', 'delay': '0', 'cancelflag': '0', 'dynamicflag': '0'}, {'minutes': '40', 'platform': '3', 'direction': 'North', 'length': '10', 'color': 'YELLOW', 'hexcolor': '#ffff33', 'bikeflag': '1', 'delay': '0', 'cancelflag': '0', 'dynamicflag': '0'}]}, {'destination': 'Berryessa', 'abbreviation': 'BERY', 'limited': '0', 'estimate': [{'minutes': 'Leaving', '

In [10]:
# and in turn, 'root' is another dictionary. You could also have seen this because of the curly brackets { }  
print((d['root'].keys()))

dict_keys(['@id', 'uri', 'date', 'time', 'station', 'message'])
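Rather than peeling back one layer at a time, you could summarize the nesting in one go. The function below is a hypothetical helper (`describe` is our own name) that lists the type of each nested item down to a chosen depth. The sample dictionary is a trimmed stand-in for `d`, not the full response.

```python
def describe(obj, name='d', depth=0, max_depth=3):
    """Return lines summarizing the type of each nested item, up to max_depth."""
    lines = ['{}{}: {}'.format('  ' * depth, name, type(obj).__name__)]
    if depth < max_depth:
        if isinstance(obj, dict):
            for key, value in obj.items():
                lines.extend(describe(value, repr(key), depth + 1, max_depth))
        elif isinstance(obj, list) and obj:
            # Only look at the first element; the others usually share its shape
            lines.extend(describe(obj[0], '[0]', depth + 1, max_depth))
    return lines

# Trimmed stand-in for the parsed BART response
sample = {'root': {'date': '04/05/2023',
                   'station': [{'abbr': '12TH', 'etd': [{'destination': 'Antioch'}]}]}}
summary = describe(sample)
print('\n'.join(summary))
```

Calling `describe(d, max_depth=5)` on the real response would show the full chain of dictionaries and lists explored step by step in the cells that follow.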


The "station" item seems to hold most of the useful information. 

It's a list.

In [11]:
print(type((d['root']['station'])))

<class 'list'>


Of length 1

In [12]:
print(len(d['root']['station']))

1


In [13]:
print(d['root']['station'][0])

{'name': '12th St. Oakland City Center', 'abbr': '12TH', 'etd': [{'destination': 'Antioch', 'abbreviation': 'ANTC', 'limited': '0', 'estimate': [{'minutes': '10', 'platform': '3', 'direction': 'North', 'length': '8', 'color': 'YELLOW', 'hexcolor': '#ffff33', 'bikeflag': '1', 'delay': '0', 'cancelflag': '0', 'dynamicflag': '0'}, {'minutes': '25', 'platform': '3', 'direction': 'North', 'length': '10', 'color': 'YELLOW', 'hexcolor': '#ffff33', 'bikeflag': '1', 'delay': '0', 'cancelflag': '0', 'dynamicflag': '0'}, {'minutes': '40', 'platform': '3', 'direction': 'North', 'length': '10', 'color': 'YELLOW', 'hexcolor': '#ffff33', 'bikeflag': '1', 'delay': '0', 'cancelflag': '0', 'dynamicflag': '0'}]}, {'destination': 'Berryessa', 'abbreviation': 'BERY', 'limited': '0', 'estimate': [{'minutes': 'Leaving', 'platform': '2', 'direction': 'South', 'length': '8', 'color': 'ORANGE', 'hexcolor': '#ff9933', 'bikeflag': '1', 'delay': '0', 'cancelflag': '0', 'dynamicflag': '0'}, {'minutes': '15', 'platf

And this is another dictionary! (You can tell by the curly brackets).

Most of the information appears to be in `etd`, which is another list. 

In [14]:
print(d['root']['station'][0]['etd'])

[{'destination': 'Antioch', 'abbreviation': 'ANTC', 'limited': '0', 'estimate': [{'minutes': '10', 'platform': '3', 'direction': 'North', 'length': '8', 'color': 'YELLOW', 'hexcolor': '#ffff33', 'bikeflag': '1', 'delay': '0', 'cancelflag': '0', 'dynamicflag': '0'}, {'minutes': '25', 'platform': '3', 'direction': 'North', 'length': '10', 'color': 'YELLOW', 'hexcolor': '#ffff33', 'bikeflag': '1', 'delay': '0', 'cancelflag': '0', 'dynamicflag': '0'}, {'minutes': '40', 'platform': '3', 'direction': 'North', 'length': '10', 'color': 'YELLOW', 'hexcolor': '#ffff33', 'bikeflag': '1', 'delay': '0', 'cancelflag': '0', 'dynamicflag': '0'}]}, {'destination': 'Berryessa', 'abbreviation': 'BERY', 'limited': '0', 'estimate': [{'minutes': 'Leaving', 'platform': '2', 'direction': 'South', 'length': '8', 'color': 'ORANGE', 'hexcolor': '#ff9933', 'bikeflag': '1', 'delay': '0', 'cancelflag': '0', 'dynamicflag': '0'}, {'minutes': '15', 'platform': '2', 'direction': 'South', 'length': '8', 'color': 'ORANGE

Each element of the list appears to be a dictionary, giving details of trains to a particular destination. Let's simplify things by pulling this list out to a separate variable.

The `destination` item is self-explanatory. The `estimate` item is yet another list!

In [15]:
etd = d['root']['station'][0]['etd']

print(etd[0]['destination'])
print(etd[0]['estimate'])

print(etd[1]['destination'])
print(etd[1]['estimate'])

Antioch
[{'minutes': '10', 'platform': '3', 'direction': 'North', 'length': '8', 'color': 'YELLOW', 'hexcolor': '#ffff33', 'bikeflag': '1', 'delay': '0', 'cancelflag': '0', 'dynamicflag': '0'}, {'minutes': '25', 'platform': '3', 'direction': 'North', 'length': '10', 'color': 'YELLOW', 'hexcolor': '#ffff33', 'bikeflag': '1', 'delay': '0', 'cancelflag': '0', 'dynamicflag': '0'}, {'minutes': '40', 'platform': '3', 'direction': 'North', 'length': '10', 'color': 'YELLOW', 'hexcolor': '#ffff33', 'bikeflag': '1', 'delay': '0', 'cancelflag': '0', 'dynamicflag': '0'}]
Berryessa
[{'minutes': 'Leaving', 'platform': '2', 'direction': 'South', 'length': '8', 'color': 'ORANGE', 'hexcolor': '#ff9933', 'bikeflag': '1', 'delay': '0', 'cancelflag': '0', 'dynamicflag': '0'}, {'minutes': '15', 'platform': '2', 'direction': 'South', 'length': '8', 'color': 'ORANGE', 'hexcolor': '#ff9933', 'bikeflag': '1', 'delay': '0', 'cancelflag': '0', 'dynamicflag': '0'}, {'minutes': '30', 'platform': '2', 'direction': 

We can print this more nicely using the `.format()` method for a string. The curly braces `{}` are placeholders for the items to be inserted.

In [16]:
print('Train to {} is arriving in {} minutes'.format(etd[0]['destination'], etd[0]['estimate'][0]['minutes']))

Train to Antioch is arriving in 10 minutes
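The response above shows that `minutes` is not always a number: the first Berryessa train has the value `'Leaving'`. A minimal sketch of one way to handle both cases follows. The function name `describe_departure` is hypothetical, and it assumes that any non-numeric value can simply be printed as a status word.

```python
def describe_departure(destination, minutes):
    """Build a message; `minutes` is a string such as '10' or 'Leaving'."""
    if minutes.isdigit():
        unit = 'minute' if minutes == '1' else 'minutes'
        return 'Train to {} is arriving in {} {}'.format(destination, minutes, unit)
    # Non-numeric values (such as 'Leaving') are used as a status word
    return 'Train to {} is {}'.format(destination, minutes.lower())

numeric_message = describe_departure('Antioch', '10')
status_message = describe_departure('Berryessa', 'Leaving')
print(numeric_message)
print(status_message)
```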


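To make this easier to work with, we can convert to a pandas `DataFrame`. This works most cleanly when the data is a list of flat dictionaries that share the same keys, as it is here. More deeply nested JSON usually needs some unpacking first, but it is worth a try.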

In [17]:
import pandas as pd
df = pd.DataFrame(etd[0]['estimate'])
df

Unnamed: 0,minutes,platform,direction,length,color,hexcolor,bikeflag,delay,cancelflag,dynamicflag
0,10,3,North,8,YELLOW,#ffff33,1,0,0,0
1,25,3,North,10,YELLOW,#ffff33,1,0,0,0
2,40,3,North,10,YELLOW,#ffff33,1,0,0,0


Note that this gave us only the trains to the first destination (contained in `etd[0]`). We'd have to loop over `etd` to get the other destinations.

Here, we use `display` rather than `print` to format the output more nicely. [Here's an explanation](https://stackoverflow.com/questions/26873127/show-dataframe-as-table-in-ipython-notebook).

In [18]:
from IPython.display import display
for e in etd:
    print('\nTrains to {}'.format(e['destination']))
    display(pd.DataFrame(e['estimate']))


Trains to Antioch


Unnamed: 0,minutes,platform,direction,length,color,hexcolor,bikeflag,delay,cancelflag,dynamicflag
0,10,3,North,8,YELLOW,#ffff33,1,0,0,0
1,25,3,North,10,YELLOW,#ffff33,1,0,0,0
2,40,3,North,10,YELLOW,#ffff33,1,0,0,0



Trains to Berryessa


Unnamed: 0,minutes,platform,direction,length,color,hexcolor,bikeflag,delay,cancelflag,dynamicflag
0,Leaving,2,South,8,ORANGE,#ff9933,1,0,0,0
1,15,2,South,8,ORANGE,#ff9933,1,0,0,0
2,30,2,South,8,ORANGE,#ff9933,1,0,0,0



Trains to Millbrae/SFO


Unnamed: 0,minutes,platform,direction,length,color,hexcolor,bikeflag,delay,cancelflag,dynamicflag
0,7,2,South,10,RED,#ff0000,1,0,0,0
1,22,2,South,10,RED,#ff0000,1,0,1,0
2,37,2,South,10,RED,#ff0000,1,0,0,0



Trains to Richmond


Unnamed: 0,minutes,platform,direction,length,color,hexcolor,bikeflag,delay,cancelflag,dynamicflag
0,11,1,North,8,ORANGE,#ff9933,1,0,0,0
1,16,1,North,10,RED,#ff0000,1,0,0,0
2,26,1,North,8,ORANGE,#ff9933,1,0,0,0



Trains to SF Airport


Unnamed: 0,minutes,platform,direction,length,color,hexcolor,bikeflag,delay,cancelflag,dynamicflag
0,12,2,South,10,YELLOW,#ffff33,1,65,0,0
1,26,2,South,10,YELLOW,#ffff33,1,0,0,0
2,41,2,South,8,YELLOW,#ffff33,1,0,0,0
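If one combined table is more convenient than one table per destination, the nested lists can be flattened first. The sketch below uses a small hand-made sample shaped like `etd` (trimmed to two fields) and copies the destination into every row.

```python
# Sample shaped like `etd` above, trimmed to a few fields
etd_sample = [
    {'destination': 'Antioch',
     'estimate': [{'minutes': '10', 'platform': '3'}, {'minutes': '25', 'platform': '3'}]},
    {'destination': 'Berryessa',
     'estimate': [{'minutes': 'Leaving', 'platform': '2'}]},
]

# One row per train, with the destination copied into each row
rows = [dict(estimate, destination=e['destination'])
        for e in etd_sample
        for estimate in e['estimate']]

for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame` would then give a single table with a `destination` column, which is a convenient starting point for grouping.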


<div class="alert alert-block alert-info">
<strong>Exercise:</strong> How would you calculate the mean headway (time between trains)?
</div>

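Hint: convert `minutes` to numbers, take the differences between successive departures, and average them (for example, grouped by destination with `groupby`).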
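One possible approach to the exercise, sketched in plain Python under two stated assumptions: `'Leaving'` is treated as 0 minutes, and the headway is the gap between successive departures to the same destination. The function name `mean_headway` is hypothetical. With only three estimates per destination, the result is a rough estimate at best.

```python
def mean_headway(minutes_list):
    """Mean gap in minutes between successive departures; None if fewer than two trains."""
    # Assumption: treat 'Leaving' as 0 minutes
    times = sorted(0 if m == 'Leaving' else int(m) for m in minutes_list)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return sum(gaps) / len(gaps) if gaps else None

# Minutes taken from the tables above
antioch = mean_headway(['10', '25', '40'])
berryessa = mean_headway(['Leaving', '15', '30'])
richmond = mean_headway(['11', '16', '26'])
print(antioch, berryessa, richmond)
```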

<div class="alert alert-block alert-info">
<strong>Let's recap.</strong> What did we just do?
<ul>  
<li>We constructed a text string following the API documentation, and passed that string to <code>requests</code></li>
<li>We did some step-by-step detective work to convert the output into a usable format</li>
</ul>
</div>

Let's focus on the first step, and try more commands from the BART API, [such as returning fare information](https://api.bart.gov/docs/sched/fare.aspx).

We see that the string begins like this:

`http://api.bart.gov/api/sched.aspx?cmd=fare`

Then we add the various inputs separated by `&`.
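As a minimal sketch of what "separated by `&`" means, the standard library's `urlencode` can join the inputs for us. The API key is left out here for brevity; a real request would also need a `key` entry.

```python
from urllib.parse import urlencode

base = 'http://api.bart.gov/api/sched.aspx'
inputs = {'cmd': 'fare', 'orig': '12TH', 'dest': 'CIVC', 'json': 'y'}

# urlencode joins name=value pairs with & (and escapes special characters)
query = urlencode(inputs)
url = base + '?' + query
print(url)
```

The `?` marks the start of the inputs, each input is a `name=value` pair, and the pairs are separated by `&`.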

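For example, the fare from `12TH` to `CIVC` should be requested with `cmd=fare&orig=12TH&dest=CIVC`. The hand-typed string below scrambles the first two parameters (`origcmd=fare=12TH`), so the API answers with an "Invalid cmd" error instead of the fare. Mistakes like this are easy to make when building URLs by hand.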

In [19]:
requestString = 'http://api.bart.gov/api/sched.aspx?&origcmd=fare=12TH&dest=CIVC&key={}&json=y'.format(APIkey)
r = requests.get(requestString)
print(r.text)

{"?xml":{"@version":"1.0","@encoding":"utf-8"},"root":{"message":{"error":{"text":"Invalid cmd","details":"The cmd parameter () is missing or invalid. Please correct the error and try again."}}}}


Note that the output is the same as if you paste the string into a web browser.

In [20]:
print(requestString)

http://api.bart.gov/api/sched.aspx?&origcmd=fare=12TH&dest=CIVC&key=MW9S-E7SL-26DU-VV8V&json=y
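To see why the API rejected this string, it helps to split the URL back into its parameters. The sketch below uses the standard library's `parse_qs` for that, and also shows one way to detect an error reported inside the JSON body. Both function names are hypothetical, the key is omitted from the URL, and the error check assumes errors always appear under `root` → `message` → `error`, as in the response above.

```python
import json
from urllib.parse import urlsplit, parse_qs

def query_parameters(url):
    """Return the name -> list-of-values mapping encoded in a URL's query string."""
    return parse_qs(urlsplit(url).query)

def api_error(payload):
    """Return the API's error text if the parsed response carries one, else None."""
    message = payload.get('root', {}).get('message')
    if isinstance(message, dict) and 'error' in message:
        return message['error'].get('text')
    return None

bad_url = 'http://api.bart.gov/api/sched.aspx?&origcmd=fare=12TH&dest=CIVC&json=y'
found = query_parameters(bad_url)
print(found)   # there is no 'cmd' or 'orig' entry, only a parameter named 'origcmd'

# Trimmed copy of the error response printed above
error_body = ('{"root":{"message":{"error":{"text":"Invalid cmd","details":"The cmd '
              'parameter () is missing or invalid. Please correct the error and try again."}}}}')
error_text = api_error(json.loads(error_body))
print(error_text)
```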


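An alternative and more elegant way of calling `requests` is to put all the inputs (parameters) into a dictionary and pass it with the `params` argument; `requests` then assembles the query string for us. This version sends the request that the hand-built string above was meant to send (`cmd=fare`, `orig=12TH`, `dest=CIVC`), and this time the API returns the fare.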

In [26]:
requestString = 'http://api.bart.gov/api/sched.aspx'
params = {'cmd':'fare',
          'orig':'12TH',
          'dest':'CIVC',
          'key':APIkey,
          'json':'y'}
r = requests.get(requestString, params=params)
print(r.text)

{"?xml":{"@version":"1.0","@encoding":"utf-8"},"root":{"uri":{"#cdata-section":"http://api.bart.gov/api/sched.aspx?cmd=fare&orig=12TH&dest=CIVC&json=y"},"origin":"12TH","destination":"CIVC","trip":{"fare":"3.85","discount":{"clipper":"1.40"}},"fares":{"@level":"normal","fare":[{"@amount":"3.85","@class":"clipper","@name":"Clipper"},{"@amount":"3.05","@class":"start","@name":"Clipper START"},{"@amount":"1.40","@class":"rtcclipper","@name":"Senior/Disabled Clipper"},{"@amount":"1.90","@class":"student","@name":"Youth Clipper"}]},"message":""}}
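The same detective work applies to the fare response. As a minimal sketch, the block below works on a trimmed copy of the output above (two of the four fare classes) and converts the amounts, which arrive as strings, into numbers.

```python
import json

# Trimmed copy of the fare response printed above
body = ('{"root":{"origin":"12TH","destination":"CIVC","fares":{"@level":"normal","fare":['
        '{"@amount":"3.85","@class":"clipper","@name":"Clipper"},'
        '{"@amount":"1.90","@class":"student","@name":"Youth Clipper"}]}}}')

root = json.loads(body)['root']
# Map each fare name to a numeric amount (the API returns amounts as strings)
fares = {f['@name']: float(f['@amount']) for f in root['fares']['fare']}
print(root['origin'], '->', root['destination'], fares)
```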


In [25]:
requestString = 'https://api.bart.gov/api/sched.aspx'
params = {'cmd':'routesched',
          'route':'7',
          'date':'wd',
          'key':APIkey,
          'json':'y'}
r = requests.get(requestString, params=params)
print(r.text)

{"?xml":{"@version":"1.0","@encoding":"utf-8"},"root":{"uri":{"#cdata-section":"http://api.bart.gov/api/sched.aspx?cmd=routesched&route=7&date=wd&json=y"},"date":"4/5/2023","route":{"train":[{"@index":"1","stop":[{"@station":"RICH","@load":"0","@bikeflag":"1","@origTime":"04:57 AM"},{"@station":"DELN","@load":"0","@bikeflag":"1","@origTime":"05:02 AM"},{"@station":"PLZA","@load":"0","@bikeflag":"1","@origTime":"05:05 AM"},{"@station":"NBRK","@load":"0","@bikeflag":"1","@origTime":"05:08 AM"},{"@station":"DBRK","@load":"0","@bikeflag":"1","@origTime":"05:10 AM"},{"@station":"ASHB","@load":"0","@bikeflag":"1","@origTime":"05:13 AM"},{"@station":"MCAR","@load":"0","@bikeflag":"1","@origTime":"05:16 AM"},{"@station":"19TH","@load":"0","@bikeflag":"1","@origTime":"05:20 AM"},{"@station":"12TH","@load":"0","@bikeflag":"1","@origTime":"05:22 AM"},{"@station":"WOAK","@load":"0","@bikeflag":"1","@origTime":"05:26 AM"},{"@station":"EMBR","@load":"0","@bikeflag":"1","@origTime":"05:33 AM"},{"@sta
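The route schedule nests a list of stops inside a list of trains. A minimal sketch of flattening it follows, using a trimmed copy of the output above (the first train and its first two stops). The time format string `'%I:%M %p'` is an assumption based on the values shown, such as `04:57 AM`.

```python
import json
from datetime import datetime

# Trimmed copy of the route schedule response printed above
body = ('{"root":{"route":{"train":[{"@index":"1","stop":['
        '{"@station":"RICH","@origTime":"04:57 AM"},'
        '{"@station":"DELN","@origTime":"05:02 AM"}]}]}}}')

trains = json.loads(body)['root']['route']['train']
# One (train index, station, departure time) tuple per stop
stops = [(train['@index'], stop['@station'],
          datetime.strptime(stop['@origTime'], '%I:%M %p').time())
         for train in trains for stop in train['stop']]

for index, station, departs in stops:
    print(index, station, departs)
```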

<div class="alert alert-block alert-info">
<strong>Exercise:</strong> Explore some of BART's other API commands in the API documentation linked above.
</div>

<div class="alert alert-block alert-info">
<h3>Key Takeaways</h3>
<ul>
  <li>Many APIs are just URLs. You can compose the URL as a string.</li>
  <li>JSON is the typical format of the returned data, but you will often need to experiment.</li>
  <li>Be nice! Some APIs will ask you to register. Most will kick you off if you make too many requests.</li>
</ul>
</div>
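One simple way to "be nice" is to pause between requests. The sketch below is a hypothetical helper (`fetch_politely` is our own name) that takes any function making a single request and spaces the calls out. The one-second default is an arbitrary assumption, not a documented BART limit, and the demonstration uses a stand-in function rather than a real API call.

```python
import time

def fetch_politely(stations, fetch, pause_seconds=1.0):
    """Call fetch(station) for each station, pausing between consecutive calls."""
    results = {}
    for i, station in enumerate(stations):
        if i > 0:
            time.sleep(pause_seconds)   # wait before every call except the first
        results[station] = fetch(station)
    return results

# Demonstration with a stand-in function instead of a real API call
departures = fetch_politely(['12TH', 'CIVC'],
                            fetch=lambda s: 'data for ' + s,
                            pause_seconds=0.1)
print(departures)
```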