# Module 1. Application programming interfaces: APIs
# The BART API

## Lecture objectives

1. Demonstrate different ways to access data via an API
2. Introduce the `requests` library 
3. Show how to parse JSON data

One way to get data is to download it manually from a website. You might click through a series of links and then save a `.csv` or similar file to your hard drive.

Another way is through an Application Programming Interface (API). An API makes it possible to request just the data that you are interested in. Sometimes this data is static: it will be the same each time you request it. U.S. Census data is a good example. Other APIs serve dynamic data, such as bus and train arrival times.

## Example: BART
Many APIs return data in a format known as JSON. While it can look complicated at first, it's relatively easy to work with because it's highly structured.

BART provides its [API documentation here](https://api.bart.gov/docs/overview/index.aspx).

How do we get data from the API into Python? The `requests` library is the key to unlocking many interactions with the web.

Let's get the real-time departures from 12th St Oakland City Center station. According to the [documentation](https://api.bart.gov/docs/etd/etd.aspx), we need to pass `orig=12TH`. We can also pass `json=y` to return the results in JSON format.

Most APIs, including that of BART, require a "key" that identifies you to the API provider. For serious usage, you'd request your own key. When the lecture was recorded, BART also provided a public key that anyone could use for experimentation (see the update below).

So how do we construct the request? We simply build a string that contains all of these parameters, following the example in the documentation. Then we pass that string to `requests.get()`.

### Update
BART used to post an API key on its website for anyone to use. That's what you see in the lecture videos. Now, you need to register for your own key.

You can do that [here](https://api.bart.gov/api/register.aspx). Your key should arrive by email within a few minutes, but it might take an hour before it is activated.

In [None]:
import requests

APIkey = 'XXXX'  # replace XXXX with the key you received by email
requestString = 'http://api.bart.gov/api/etd.aspx?cmd=etd&orig=12TH&json=y&key='+APIkey

r = requests.get(requestString)

Here, we've made a request to the BART API, and stored the response in the `r` object. 

Let's see what the `r` object includes. Use `r.` and the `tab` autocomplete in Jupyter Notebook to see the different attributes. Or you can call the help function.

In [None]:
help(r)

One attribute is `ok`: did the request succeed? If it didn't, your API key might not be activated yet.

In [None]:
print(r.ok)

The `text` attribute shows the text that was returned.

In [None]:
print(r.text)

This looks promising. But how can we get this into a more usable form? The `json` module is the key. It's built into Python, so you don't have to install anything.

In [None]:
import json

In [None]:
# json.loads() will turn the JSON text into a Python dictionary
d = json.loads(r.text)
print(type(d))
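To see what `json.loads()` does without calling the API, here is a minimal sketch using a small hand-written JSON string. The sample text is made up for illustration and only mimics the nesting of the BART response; the `abbr` field and its value are assumptions, and the real response has many more fields.

```python
import json

# Hypothetical sample: mimics the nesting of a BART response, not real data
sample_text = '{"root": {"station": [{"abbr": "12TH", "etd": []}]}}'

sample = json.loads(sample_text)          # JSON text -> Python objects
print(type(sample))                       # a dict, because the JSON starts with {
print(type(sample['root']['station']))    # a list, because of the [ ]
print(sample['root']['station'][0]['abbr'])
```

Curly brackets in the JSON text become dictionaries and square brackets become lists, which is why we can keep drilling down with a mix of keys and integer indexes.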

<div class="alert alert-block alert-info">
<strong>Exercise:</strong> How can you access the relevant contents of the dictionary? Hint: First, look at the keys.
</div>

In [None]:
# Remember, a dictionary is a collection of keys and values.
# Here it looks like there are two keys, called ?xml and root.
# Let's look at the root item.
print(d.keys())
print(d['root'])

In [None]:
# In turn, 'root' is another dictionary.
# You could also have seen this from the curly brackets { }.
print(d['root'].keys())

The "station" item seems to hold most of the useful information. 

It's a list.

In [None]:
print(type(d['root']['station']))

The list has length 1.

In [None]:
print(len(d['root']['station']))

In [None]:
print(d['root']['station'][0])

And this is another dictionary! (You can tell by the curly brackets.)

Most of the information appears to be in `etd`, which is another list. 

In [None]:
print(d['root']['station'][0]['etd'])

Each element of the list appears to be a dictionary, giving details of trains to a particular destination. Let's simplify things by pulling this list out to a separate variable.

The `destination` item is self-explanatory. The `estimate` item is yet another list!

In [None]:
etd = d['root']['station'][0]['etd']

print(etd[0]['destination'])
print(etd[0]['estimate'])

print(etd[1]['destination'])
print(etd[1]['estimate'])

We can print this more nicely using the `.format()` method for a string. The curly braces `{}` are placeholders for the items to be inserted.

In [None]:
print('Train to {} is arriving in {} minutes'.format(etd[0]['destination'], etd[0]['estimate'][0]['minutes']))
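The `minutes` value comes back as a string, and it may not always be a number. As far as we know, BART reports a train that is already at the platform as `Leaving`, but treat that as an assumption to check against your own output. The sketch below uses a made-up `sample_etd` list and a hypothetical helper, `describe()`, to handle both cases.

```python
# Hypothetical sample in the same shape as the etd list above, not real data
sample_etd = [
    {'destination': 'Richmond',
     'estimate': [{'minutes': 'Leaving'}, {'minutes': '12'}]},
]

def describe(destination, minutes):
    """Return a readable sentence for one estimate."""
    if minutes.isdigit():
        return 'Train to {} departs in {} minutes'.format(destination, minutes)
    # Non-numeric value, e.g. a train that is already at the platform
    return 'Train to {} is {}'.format(destination, minutes.lower())

for e in sample_etd:
    for est in e['estimate']:
        print(describe(e['destination'], est['minutes']))
```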

To make this easier to work with, we can convert to a pandas `DataFrame`. This doesn't work for every JSON structure, but it is worth a try.

In [None]:
import pandas as pd
df = pd.DataFrame(etd[0]['estimate'])
df

Note that this gave us only the trains to the first destination (contained in `etd[0]`). We'd have to loop over `etd` to get the other destinations.

Here, we use `display` rather than `print` to format the output more nicely. [Here's an explanation](https://stackoverflow.com/questions/26873127/show-dataframe-as-table-in-ipython-notebook).

In [None]:
from IPython.display import display
for e in etd:
    print('\nTrains to {}'.format(e['destination']))
    display(pd.DataFrame(e['estimate']))
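If you would rather have one table than one per destination, a possible approach is to flatten the nested lists into a single list of rows first. This is a sketch on made-up data in the same shape as `etd`; the destinations and minutes are hypothetical.

```python
# Hypothetical sample in the same shape as the etd list above, not real data
sample_etd = [
    {'destination': 'Richmond', 'estimate': [{'minutes': '4'}, {'minutes': '19'}]},
    {'destination': 'Millbrae', 'estimate': [{'minutes': '7'}]},
]

rows = []
for e in sample_etd:
    for est in e['estimate']:
        row = dict(est)                         # copy, so the original is not modified
        row['destination'] = e['destination']   # carry the destination down to each row
        rows.append(row)

print(rows)
# pd.DataFrame(rows) would then give one table with a destination column
```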

<div class="alert alert-block alert-info">
<strong>Exercise:</strong> How would you calculate the mean headway (time between trains)?
</div>
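Try the exercise yourself first. If you get stuck, here is one possible approach, sketched on made-up numbers rather than live data: sort the departure times for one destination, take the gaps between consecutive trains, and average them. The `minutes` list is hypothetical, and non-numeric values are simply skipped.

```python
from statistics import mean

# Hypothetical minutes for trains to one destination, as strings like the API returns
minutes = ['4', '19', '34']

# Keep only numeric values, convert to integers, and sort
times = sorted(int(m) for m in minutes if m.isdigit())

# Headway = gap between consecutive trains
headways = [later - earlier for earlier, later in zip(times, times[1:])]

# With fewer than two trains there is no gap to average
mean_headway = mean(headways) if headways else None
print(headways, mean_headway)
```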

<div class="alert alert-block alert-info">
<strong>Let's recap.</strong> What did we just do?
<ul>  
<li>We constructed a text string following the API documentation, and passed that string to <code>requests</code></li>
<li>We did some step-by-step detective work to convert the output into a usable format</li>
</ul>
</div>

Let's focus on the first step, and try more commands from the BART API, [such as returning fare information](https://api.bart.gov/docs/sched/fare.aspx).

We see that the string begins like this:

`http://api.bart.gov/api/sched.aspx?cmd=fare`

Then we add the various inputs separated by `&`.

For example, the fare from `12TH` to `CIVC` is as follows.

In [None]:
requestString = 'http://api.bart.gov/api/sched.aspx?cmd=fare&orig=12TH&dest=CIVC&key={}&json=y'.format(APIkey)
r = requests.get(requestString)
print(r.text)

Note that the output is the same as if you paste the string into a web browser.

In [None]:
print(requestString)
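It can help to see how such a URL breaks down into its parts. The sketch below uses Python's built-in `urllib.parse` module on a copy of the fare URL, with `XXXX` standing in as a placeholder for the key.

```python
from urllib.parse import urlsplit, parse_qs

# 'XXXX' is a placeholder, not a working key
url = 'http://api.bart.gov/api/sched.aspx?cmd=fare&orig=12TH&dest=CIVC&key=XXXX&json=y'

parts = urlsplit(url)
print(parts.netloc)            # the server
print(parts.path)              # the endpoint
print(parse_qs(parts.query))   # the parameters, as a dictionary of lists
```

Everything after the `?` is the query string: a series of `name=value` pairs joined by `&`.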

An alternative and more elegant way of calling `requests` is to put all the inputs (parameters) into a dictionary. This version is equivalent to the previous API call.

In [None]:
requestString = 'http://api.bart.gov/api/sched.aspx'
params = {'cmd':'fare',
          'orig':'12TH',
          'dest':'CIVC',
          'key':APIkey,
          'json':'y'}
r = requests.get(requestString, params=params)
print(r.text)
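Roughly speaking, `requests` turns the dictionary into the same kind of query string we wrote by hand. The sketch below illustrates the idea with `urlencode` from the standard library; it shows the concept rather than how `requests` is implemented internally, and `XXXX` is a placeholder key.

```python
from urllib.parse import urlencode

base_url = 'http://api.bart.gov/api/sched.aspx'
params = {'cmd': 'fare',
          'orig': '12TH',
          'dest': 'CIVC',
          'key': 'XXXX',   # placeholder, not a working key
          'json': 'y'}

# Join the base URL and the encoded parameters with a '?'
full_url = base_url + '?' + urlencode(params)
print(full_url)
```

With a real response, `print(r.url)` shows the URL that `requests` actually sent, which is a handy way to check your parameters.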

<div class="alert alert-block alert-info">
<strong>Exercise:</strong> Explore some of BART's other API commands, using the documentation linked above.
</div>

<div class="alert alert-block alert-info">
<h3>Key Takeaways</h3>
<ul>
  <li>Many APIs are just URLs. You can compose the URL as a string.</li>
  <li>JSON is the typical format of the returned data, but you will often need to experiment.</li>
  <li>Be nice! Some APIs will ask you to register. Most will kick you off if you make too many requests.</li>
</ul>
</div>
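One simple way to "be nice" is to pause between requests. The sketch below is a minimal illustration, not a rule from BART: the helper name `fetch_politely`, the pause length, and the stand-in `fake_fetch` function are all assumptions. Check each API's documentation for its actual rate limits.

```python
import time

def fetch_politely(fetch, n_requests, pause_seconds=1.0):
    """Call fetch() n_requests times, sleeping between calls."""
    results = []
    for i in range(n_requests):
        results.append(fetch())
        if i < n_requests - 1:
            time.sleep(pause_seconds)   # wait before the next request
    return results

# Stand-in for a real API call such as requests.get(...)
def fake_fetch():
    return 'ok'

print(fetch_politely(fake_fetch, n_requests=3, pause_seconds=0.1))
```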