# Using a Data [Application Programming Interface](https://en.wikipedia.org/wiki/Application_programming_interface): Remotely Accessing [BLS Data](https://www.bls.gov/developers/)


As mentioned in the lecture, we can use data APIs to capture data from remote servers operated by data providers. To capture these data, we need to send a [`GET`](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol#Request_methods) request to the BLS server. The `GET` request, formulated as a [URL](https://en.wikipedia.org/wiki/URL), must contain all of the information needed to fully specify the data we would like to retrieve. The URL below is an example of a fully formed request (*for version 1.0* of the API):

`https://api.bls.gov/publicAPI/v1/timeseries/data/CFU0000008000`

In [1]:
# Methods for display of arbitrary HTML
from IPython.display import HTML

# Methods for capture of secure info (like registration keys)
import getpass

# Methods for sending HTTP requests and capturing the data returned
import requests

# Methods for turning API result into pandas DataFrame
import json
import numpy as np
import pandas as pd

# Methods for plotting
import bokeh.plotting as bp

# Display result in an iframe
def show_iframe(url, height=400, width=1000):
    display_string = '<iframe src={url} width={w} height={h}></iframe>'.format(url=url, w=width, h=height)
    print(display_string)
    return HTML(display_string)

show_iframe('https://api.bls.gov/publicAPI/v1/timeseries/data/CFU0000008000', height=100)

<iframe src=https://api.bls.gov/publicAPI/v1/timeseries/data/CFU0000008000 width=1000 height=100></iframe>


If you were to paste that URL into a browser, you would see the output contained in the [`iframe`](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/iframe). There are a couple of things to note here:

1. The data returned are not in tabular format. [JavaScript Object Notation](https://www.json.org/) (JSON) is often used when passing information over the internet because it is lightweight.
2. We only specified the data category (`timeseries`) and the data series (`CFU0000008000`) in the `GET` request. The other elements (e.g. years and periodicity) are the defaults.
3. We built the entire `iframe` string via [string interpolation](https://dbader.org/blog/python-string-formatting).
4. We just embedded arbitrary [Hypertext Markup Language](https://developer.mozilla.org/en-US/docs/Learn/HTML/Introduction_to_HTML) (HTML) directly into our analytic environment (the [Jupyter Notebook](https://github.com/jupyter/notebook)).  This is pretty sweet.
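
The first three notes can be made concrete with a small sketch. The helper name `build_v1_url` and the sample JSON string are assumptions made for illustration (the string is hand-written, not real BLS output); only the base URL and the series ID come from the example above.

```python
import json

# Base URL for v1.0 time series requests, taken from the example above
BASE_V1 = 'https://api.bls.gov/publicAPI/v1/timeseries/data/'


def build_v1_url(series_id):
    """Return the v1.0 GET URL for a single series (hypothetical helper)."""
    return '{base}{sid}'.format(base=BASE_V1, sid=series_id)


# Illustrative, hand-written JSON text (NOT real BLS output)
sample_text = '{"status": "REQUEST_SUCCEEDED", "Results": {"series": []}}'

# json.loads turns JSON text into nested Python dictionaries and lists
parsed = json.loads(sample_text)

url = build_v1_url('CFU0000008000')
print(url)
print(parsed['status'])
```

The resulting `url` could be passed to `requests.get` or to `show_iframe`; the sketch stops short of contacting the server.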

## Using the BLS API
### Get the Registration Key

While we executed the last query with v1.0 of the API, v2.0 provides [a wider range of series](https://www.bls.gov/developers/). To use it, however, we need a registration key that must be embedded in our query. This is a reasonably common approach to API management, insofar as it allows the provider to better balance the load across user requests. To get your key, go to the [registration page](https://data.bls.gov/registrationEngine/) and fill out the form.

In [2]:
show_iframe('https://data.bls.gov/registrationEngine/')

<iframe src=https://data.bls.gov/registrationEngine/ width=1000 height=400></iframe>


Upon completing the form, you will get your key via email. We can capture it with a prompt and store it in a variable so that it is never hard-coded or displayed in the notebook.

In [3]:
# Capture key
reg_key = getpass.getpass('Enter Registration Key: ')

Enter Registration Key: ········
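
If typing the key at the start of every session becomes tedious, one possible alternative is to read it from an environment variable and fall back to the prompt only when the variable is unset. This is a sketch only: the helper `load_key` and the variable name `BLS_API_KEY` are hypothetical, not anything BLS defines.

```python
import getpass
import os


def load_key(env_var='BLS_API_KEY'):
    """Return the registration key from the environment, else prompt for it.

    Both this helper and the default variable name are hypothetical.
    """
    key = os.environ.get(env_var)
    if key:
        return key
    # Fall back to an interactive prompt that does not echo the input
    return getpass.getpass('Enter Registration Key: ')
```

With this in place, `reg_key = load_key()` would stand in for the direct `getpass` call above.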


### Define the Series Identifier

There is no general catalog of all available series provided by BLS, but there is a resource that provides [identifier formats by data program](https://www.bls.gov/help/hlpforma.htm). 

In [4]:
show_iframe('https://www.bls.gov/help/hlpforma.htm#CX', height=400)

<iframe src=https://www.bls.gov/help/hlpforma.htm#CX width=1000 height=400></iframe>


Suppose we wanted to capture unadjusted, average expenditures on healthcare by folks in the lowest income quintile over the 2012-2016 time period.  First, we need the series ID components:

`Prefix = CX`

`Seasonal Adjustment = U`

`Item = HEALTH`

`Demographic = LB01`

`Characteristic = 02`

`Process = M`

The series ID is then `CXUHEALTHLB0102M`.
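
The concatenation can be sketched in code, which makes it easier to swap a single component (say, a different item) without retyping the whole identifier. The helper name `build_cx_series_id` is an assumption for illustration; the component order simply follows the list above.

```python
def build_cx_series_id(prefix, seasonal, item, demographic, characteristic, process):
    """Concatenate the series ID components in the order listed above.

    Hypothetical helper: it performs no validation of the components.
    """
    return '{p}{s}{i}{d}{c}{m}'.format(
        p=prefix, s=seasonal, i=item, d=demographic, c=characteristic, m=process
    )


series_id = build_cx_series_id(
    prefix='CX',
    seasonal='U',
    item='HEALTH',
    demographic='LB01',
    characteristic='02',
    process='M',
)
print(series_id)  # CXUHEALTHLB0102M
```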

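### Make the Request

Unlike the v1.0 example above, the v2.0 request in this section is sent with `POST`: the series, years, and registration key travel in a JSON payload rather than in the URL.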

In [5]:
'{a}_{b}'.format(a='c', b=1)

'c_1'

In [7]:
# Build series
series = ['CXUHEALTHLB0102M']

def capture_request(series, start, end, key=reg_key):
    # Declare that the request body is JSON
    headers = {'Content-type': 'application/json'}

    # Serialize the request parameters as a JSON string
    data = json.dumps({
        "seriesid": series,
        "startyear": str(start),
        "endyear": str(end),
        "registrationkey": str(key)
    })

    # Send the POST request and parse the JSON body of the response
    response = requests.post('https://api.bls.gov/publicAPI/v2/timeseries/data/', data=data, headers=headers)
    return response.json()

# Capture and display JSON representation of the data
json_data = capture_request(series, 2012, 2016)

json_data

{'Results': {'series': [{'data': [{'footnotes': [{}],
      'period': 'A01',
      'periodName': 'Annual',
      'value': '2156',
      'year': '2016'},
     {'footnotes': [{}],
      'period': 'A01',
      'periodName': 'Annual',
      'value': '1930',
      'year': '2015'},
     {'footnotes': [{}],
      'period': 'A01',
      'periodName': 'Annual',
      'value': '1868',
      'year': '2014'},
     {'footnotes': [{}],
      'period': 'A01',
      'periodName': 'Annual',
      'value': '1790',
      'year': '2013'},
     {'footnotes': [{}],
      'period': 'A01',
      'periodName': 'Annual',
      'value': '1677',
      'year': '2012'}],
    'seriesID': 'CXUHEALTHLB0102M'}]},
 'message': [],
 'responseTime': 59,
 'status': 'REQUEST_SUCCEEDED'}
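
Before digging into the nested structure, it may be worth checking the `status` field so that a failed request does not surface later as a confusing `KeyError`. The sketch below assumes that any status other than `REQUEST_SUCCEEDED` indicates a problem; the helper name `extract_observations` is hypothetical, and the sample dictionary is a trimmed copy of the output above rather than a fresh response.

```python
def extract_observations(response, series_index=0):
    """Return the list of observations for one series in a parsed response.

    Hypothetical helper. Assumes any status other than 'REQUEST_SUCCEEDED'
    means the request failed.
    """
    status = response.get('status')
    if status != 'REQUEST_SUCCEEDED':
        raise ValueError('BLS request failed ({s}): {m}'.format(
            s=status, m=response.get('message')))
    return response['Results']['series'][series_index]['data']


# Trimmed copy of the response shown above (two of the five observations)
sample = {
    'status': 'REQUEST_SUCCEEDED',
    'message': [],
    'Results': {'series': [{
        'seriesID': 'CXUHEALTHLB0102M',
        'data': [{'year': '2016', 'value': '2156'},
                 {'year': '2015', 'value': '1930'}],
    }]},
}

observations = extract_observations(sample)

# Years and values arrive as strings, so convert them explicitly
values_by_year = {int(o['year']): int(o['value']) for o in observations}
print(values_by_year)
```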

### Convert to [DataFrame](https://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe) and Plot

It is much easier to work with and plot data in tabular format. We can get there by converting the data, currently represented as a list of [dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) a few layers down in the JSON object, into a [pandas](https://pandas.pydata.org/) DataFrame.

In [8]:
# Convert to DF
df_data = pd.DataFrame(json_data['Results']['series'][0]['data'])
print(df_data)

# Retain only the years and values (convert both columns from str to int)
df_sub = df_data[['year', 'value']].astype(int)

# Set year to index
df_sub.set_index('year', inplace=True)

# Sort index
df_sub.sort_index(inplace=True)

df_sub

  footnotes period periodName value  year
0      [{}]    A01     Annual  2156  2016
1      [{}]    A01     Annual  1930  2015
2      [{}]    A01     Annual  1868  2014
3      [{}]    A01     Annual  1790  2013
4      [{}]    A01     Annual  1677  2012


      value
year
2012   1677
2013   1790
2014   1868
2015   1930
2016   2156


In [9]:
df_sub.loc[2014:2015]

      value
year
2014   1868
2015   1930


Now that we have our data in hand, we can plot the result.

In [14]:
# Create file path to hold figure we are about to make
bp.output_file('figs/bls_api_fig.html')

# Create plotting figure (Bokeh 3.0+ uses width/height in place of plot_width/plot_height)
fig = bp.figure(plot_width=500, plot_height=300)

# Add a line to the figure
fig.line(df_sub.index, df_sub['value'], line_width=3, color='#890d13')

# Annotate plot
fig.title.text = "Health Expenditures for Q1 Consumers Have Increased"
fig.xaxis.axis_label = "Year"
fig.yaxis.axis_label = "Health Expenditures"


# Save the figure to the output file, then display it in an iframe
bp.save(fig)

show_iframe('figs/bls_api_fig.html', width=550, height=350)

<iframe src=figs/bls_api_fig.html width=550 height=350></iframe>
