# BLS public API

In this notebook we will:
1. Obtain an API key from the BLS
2. Add the key to APIkeys.py (or APIkeys.R)
3. Find which data sets and surveys are available on the BLS's API server
4. Learn how to construct an API GET request

**Note:** This is part 1.

## Obtaining an API key

To get an API key you need to register. Go to the [BLS developers page](https://www.bls.gov/developers/); the second paragraph contains a [link](https://data.bls.gov/registrationEngine/) to register for an API key. Follow it and fill in the details.

You will get an e-mail containing your API key as well as a link to follow to validate your key.

The key should be 32 characters long.

## Adding the key to environment variables

Open your APIkeys.py (or APIkeys.R) file in a text editor (not in Jupyter!). 

Add the following lines (the `import os` line is needed only once, at the top of the file):

```Python
import os
os.environ["BLS_API_key"] = "***************************"
```
or

```R
Sys.setenv(BLS_API_key = '**************************') 
```

Replace the asterisks with your API key. Save and close.

**Run the script:**

```Python
%run APIkeys.py
```
or
```R
source('APIkeys.R')
```

In [1]:
source('APIkeys.R')

You can check that the new key was loaded correctly.

```R
a = Sys.getenv('BLS_API_key')
print(a)
```

In [2]:
# If you want to run this cell, remove the # sign from the following lines:
#a = Sys.getenv('BLS_API_key')
#print(a)
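
Printing the key works, but it also leaves the secret visible in the notebook output. As a hedged alternative, the sketch below only checks that the variable is set and that it has the 32-character length mentioned earlier. `check_bls_key` is a hypothetical helper name introduced here for illustration; it is not part of any package.

```r
# Hypothetical helper: report on the key without printing it.
check_bls_key <- function(key = Sys.getenv("BLS_API_key")) {
  if (!nzchar(key)) {
    return("missing: BLS_API_key is not set")
  }
  if (nchar(key) != 32) {
    return(paste("unexpected length:", nchar(key), "characters"))
  }
  "ok"
}

print(check_bls_key())
```

If the result is not `"ok"`, re-run `source('APIkeys.R')` and check the file for typos.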

## BLS API documentation

We are ready to use the BLS API server! 

The documentation for the API is at this [link](https://www.bls.gov/developers/api_signature_v2.htm), and it includes code examples. 

First, let's load the ```httr``` and ```jsonlite``` packages.

In [3]:
library(httr)
library(jsonlite)

### List of all surveys

We start by looking at all the surveys that the BLS has and that you can access. 

You can follow this [link](https://api.bls.gov/publicAPI/v2/surveys) to see a list of surveys in a JSON format.

Let's collect it into a variable.

In [4]:
surveys_url = "https://api.bls.gov/publicAPI/v2/surveys"
s = GET(surveys_url)
s_json = fromJSON(content(s,"text",encoding = "UTF-8"))

In [5]:
s_json

| | survey_abbreviation | survey_name |
|---|---|---|
| | `<chr>` | `<chr>` |
| 1 | AP | Consumer Price Index - Average Price Data |
| 2 | BD | Business Employment Dynamics |
| 3 | BG | Collective Bargaining Agreements-State and Local Government |
| 4 | BP | Collective Bargaining Agreements-Private Sector |
| 5 | CC | Employer Costs for Employee Compensation |
| 6 | CD | Nonfatal cases involving days away from work: selected characteristics |
| 7 | CE | Employment, Hours, and Earnings from the Current Employment Statistics survey (National) |
| 8 | CF | Census of Fatal Occupational Injuries |
| 9 | CH | Nonfatal cases involving days away from work: selected characteristics (2003 - 2010) |
| 10 | CI | Employment Cost Index |


In [6]:
surveys = s_json$Results$survey

In [7]:
# A list of all the surveys
surveys

| | survey_abbreviation | survey_name |
|---|---|---|
| | `<chr>` | `<chr>` |
| 1 | AP | Consumer Price Index - Average Price Data |
| 2 | BD | Business Employment Dynamics |
| 3 | BG | Collective Bargaining Agreements-State and Local Government |
| 4 | BP | Collective Bargaining Agreements-Private Sector |
| 5 | CC | Employer Costs for Employee Compensation |
| 6 | CD | Nonfatal cases involving days away from work: selected characteristics |
| 7 | CE | Employment, Hours, and Earnings from the Current Employment Statistics survey (National) |
| 8 | CF | Census of Fatal Occupational Injuries |
| 9 | CH | Nonfatal cases involving days away from work: selected characteristics (2003 - 2010) |
| 10 | CI | Employment Cost Index |


In [8]:
typeof(surveys)

In [9]:
# Example 1:
surveys[15,] # the 15th row of the data frame

| | survey_abbreviation | survey_name |
|---|---|---|
| | `<chr>` | `<chr>` |
| 15 | CX | Consumer Expenditure Survey |


In [10]:
#Example 2:
surveys[13,] # CPI for urban consumers

| | survey_abbreviation | survey_name |
|---|---|---|
| | `<chr>` | `<chr>` |
| 13 | CU | Consumer Price Index - All Urban Consumers |


In [11]:
for (row in seq_len(nrow(surveys))) {
    r = surveys[row, ]
    print(paste0(r$survey_name, " (", r$survey_abbreviation, ")"))
}
    


[1] "Consumer Price Index - Average Price Data (AP)"
[1] "Business Employment Dynamics (BD)"
[1] "Collective Bargaining Agreements-State and Local Government (BG)"
[1] "Collective Bargaining Agreements-Private Sector (BP)"
[1] "Employer Costs for Employee Compensation (CC)"
[1] "Nonfatal cases involving days away from work: selected characteristics (CD)"
[1] "Employment, Hours, and Earnings from the Current Employment Statistics survey (National) (CE)"
[1] "Census of Fatal Occupational Injuries (CF)"
[1] "Nonfatal cases involving days away from work: selected characteristics (2003 - 2010) (CH)"
[1] "Employment Cost Index (CI)"
[1] "Employer Costs for Employee Compensation (CM)"
[1] "Nonfatal cases involving days away from work: selected characteristics (2011 forward) (CS)"
[1] "Consumer Price Index - All Urban Consumers (CU)"
[1] "Consumer Price Index - Urban Wage Earners and Clerical Workers (CW)"
[1] "Consumer Expenditure Survey (CX)"
[1] "Employee Benefits Survey (EB)"
[1] "Employme

In [12]:
l = nrow(surveys) #number of surveys
print(paste("Number of surveys:",l))
# The difference between paste() and paste0() is that paste() adds a space between elements by default, while paste0() does not.
# paste(element1, element2, sep = "") is the same as paste0(element1, element2)

[1] "Number of surveys: 66"
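
The `for` loop in the earlier cell builds one label per row. Because `paste0()` is vectorized, the same labels can also be produced in a single call. The sketch below uses a small stand-in data frame (`surveys_demo`, a name assumed here for illustration) with three rows copied from the survey list, so it runs without calling the API.

```r
# Stand-in for the real `surveys` data frame (three rows from the list above)
surveys_demo <- data.frame(
  survey_abbreviation = c("AP", "BD", "CU"),
  survey_name = c(
    "Consumer Price Index - Average Price Data",
    "Business Employment Dynamics",
    "Consumer Price Index - All Urban Consumers"
  ),
  stringsAsFactors = FALSE
)

# paste0() works element-wise on whole columns, so no loop is needed
survey_labels <- paste0(surveys_demo$survey_name,
                        " (", surveys_demo$survey_abbreviation, ")")
print(survey_labels)
```

Replacing `surveys_demo` with `surveys` should give the same labels that the loop printed.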


### Aside: parentheses, brackets, etc.

**Parentheses** ```( )``` are used for:
- changing the order of operations in a mathematical expression: ```z = 2 * (x + y)```
- invoking a function call: ```cos(1.25)```
- defining tuples in Python: ```t = (1,2,3)``` (R has no tuples; use ```c(1,2,3)``` to build a vector)

**Square brackets** ```[]``` are used for:
- accessing elements of a vector, list, or data frame: ```surveys[15,]``` is the 15th row of the data frame ```surveys```, and ```surveys[,1]``` is its 1st column.

**Dollar signs** ```$``` are used for:
- accessing a named element of a list or a column of a data frame: ```surveys$survey_name```

**Curly brackets** ```{}``` are used for 
- denoting a block of code in a function / loop
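
As a quick illustration of the symbols listed above, here is a minimal sketch that uses each of them once. `df_demo` is a made-up two-row data frame (its rows are taken from the survey list) so that the snippet runs on its own.

```r
# Made-up data frame for illustration only
df_demo <- data.frame(
  survey_abbreviation = c("CU", "CX"),
  survey_name = c("Consumer Price Index - All Urban Consumers",
                  "Consumer Expenditure Survey"),
  stringsAsFactors = FALSE
)

z <- 2 * (3 + 4)                   # parentheses: change the order of operations
first_row <- df_demo[1, ]          # square brackets: row 1, all columns
first_col <- df_demo[, 1]          # square brackets: all rows, column 1
names_col <- df_demo$survey_name   # dollar sign: access a column by name

for (i in seq_len(nrow(df_demo))) {   # curly brackets: delimit the loop body
  print(df_demo[i, "survey_abbreviation"])
}
```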

We can look at a single survey:

In [13]:
# In R we can nest function calls: GET() -> content() -> fromJSON()

cu = fromJSON(content(GET("https://api.bls.gov/publicAPI/v2/surveys/CU"),"text",encoding = "UTF-8"))

In [14]:
cu

| | survey_name | survey_abbreviation | allowsNetChange | allowsPercentChange | hasAnnualAverages |
|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` |
| 1 | Consumer Price Index - All Urban Consumers | CU | False | True | True |


## Series ID and format

The full explanation of how to construct the series (variable) name is [here](https://www.bls.gov/help/hlpforma.htm).

We will go over it in detail for the [CPI data](https://www.bls.gov/help/hlpforma.htm#CU) (survey id CU).

Each variable has a name of length up to 16 characters that has a very specific structure:

```
                      1         2
	             12345678901234567890
	Series ID    CUUR0000SA0L1E
	Positions       Value           Field Name
	1-2             CU              Prefix
	3               U               Not Seasonal Adjustment Code
	4               R               Periodicity Code
	5-8             0000            Area Code
	9               S               Base Code
	10-16           A0L1E           Item Code
```

Note:  ```CUUR0000SA0L1E``` is just an example.
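
Because every field sits at a fixed position, a series ID can be split back into its parts with `substr()`. The following is a minimal sketch that assumes the CU layout shown in the table above; `parse_cu_series_id` is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper: split a CU series ID using the positions in the table above
parse_cu_series_id <- function(id) {
  list(
    prefix      = substr(id, 1, 2),          # positions 1-2
    seasonal    = substr(id, 3, 3),          # position 3
    periodicity = substr(id, 4, 4),          # position 4
    area        = substr(id, 5, 8),          # positions 5-8
    base        = substr(id, 9, 9),          # position 9
    item        = substr(id, 10, nchar(id))  # positions 10 to the end (at most 16)
  )
}

parts <- parse_cu_series_id("CUUR0000SA0L1E")
str(parts)
```

For the example ID this gives area `0000`, base `S` and item `A0L1E`, matching the table.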

### First two letters

Since we are going to use the CU survey, every variable name will start with the letters ```CU```.

### Third letter

We will always use ```U```, which stands for not seasonally adjusted.

> Indicates the adjustment of time series data to eliminate the effect of intrayear variations which tend to occur during the same period on an annual basis (i.e. Where U=Unadjusted and S=Seasonally Adjusted). 

### Fourth letter

R=monthly and S=semi-annual data. We will use ```R```.

<p style="background-color:#ccccee;font-size: 18px;">
Therefore, so far, ALL the names of the variables that we will access start with <code>CUUR</code>.
    </p>

### Digits/characters 5-8

These characters hold the area code used by the BLS (it has nothing to do with telephone area codes). 

[List of area codes](https://download.bls.gov/pub/time.series/cu/cu.area)

For example:
- 0000	U.S. city average
- A104	Pittsburgh, PA
- A210	Cleveland-Akron, OH
- S12A	New York-Newark-Jersey City, NY-NJ-PA	
- S49A	Los Angeles-Long Beach-Anaheim, CA	

etc...

### Character 9

Indicates the 'base' of the series, i.e. which time period is normalized to 100.

>Indicates the designated reference date from which price change is measured, where the "current" base year is 1982-84=100 or more recent (S) and the "alternate" base year (A) is prior to the current base year. 

We will use ```S```.

### Characters 10-16

This part contains up to 7 characters indicating the item code.

[List of item codes](https://download.bls.gov/pub/time.series/cu/cu.item)

<p style="font-size: 24px;color:red;">
    Warning: The above webpage lists Character 9 AND characters 10-16!
</p>
    

Examples (characters 10-16):

- A0	All items (this is the CPI figure that gets published)
- AF111	Cereals and bakery products
- ARC	Recreation commodities	
- EFN02	Frozen noncarbonated juices and drinks

etc...

The length of the item code indicates whether it is a category, sub-category, sub-sub-category, etc.

### Examples:

**Example 1:**  CUURA104SEGD03

<p style="font-size: 32px;-webkit-text-stroke: 2px black;letter-spacing: 10px;"> 
    CU|U|R|A104|S|EGD03
</p>

Price index for urban consumers (CU), not seasonally adjusted (U), at a monthly frequency (R), for consumers living in Pittsburgh (A104), using 1982-1984 as the base (S), for laundry and dry cleaning services (EGD03).

**Example 2:**  CUUR0350SEFK
<p style="font-size: 32px;-webkit-text-stroke: 2px black;letter-spacing: 10px;"> 
    CU|U|R|0350|S|EFK
</p>

Price index for urban consumers (CU), not seasonally adjusted (U), at a monthly frequency (R), for consumers living in the South Atlantic division (0350), using 1982-1984 as the base (S), for fresh fruits (EFK).
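
The two examples above can be reproduced by pasting the fields together in order. Here is a minimal sketch, assuming the defaults used in this notebook (`U`, `R` and `S`); `build_cu_series_id` is a hypothetical helper name introduced for illustration.

```r
# Hypothetical helper: assemble a CU series ID from its fields
build_cu_series_id <- function(area, item,
                               seasonal = "U", periodicity = "R", base = "S") {
  # The area code is always 4 characters; the item code has 1 to 7 characters
  stopifnot(nchar(area) == 4, nchar(item) >= 1, nchar(item) <= 7)
  paste0("CU", seasonal, periodicity, area, base, item)
}

print(build_cu_series_id("A104", "EGD03"))  # Example 1: "CUURA104SEGD03"
print(build_cu_series_id("0350", "EFK"))    # Example 2: "CUUR0350SEFK"
```

The area and item codes still have to be looked up in the BLS code lists linked above.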


### Challenge

Can you build the variable name for the price index of lettuce for urban consumers living in Dallas-Fort Worth-Arlington, TX, which is not seasonally adjusted and uses 1982-84 as the base?

## Retrieving data from the API


### Single series with no parameters

If you want to retrieve a single series without any additional parameters (such as a start and end period), you can use a simple GET request, and you do not even need your API key!

As an example, we will request ```CUUR0000SA0L1E```, the example series ID from the format table above, from the BLS.

In [21]:
base_url = 'https://api.bls.gov/publicAPI/v2/timeseries/data/'

series = 'CUUR0000SA0L1E'

url = paste0(base_url, series) #use paste0() here because we don't want extra space in url 

r = fromJSON(content(GET(url),"text",encoding = "UTF-8"))

In [27]:
r

Unnamed: 0_level_0,seriesID,data
Unnamed: 0_level_1,<chr>,<list>
1,CUUR0000SA0L1E,"2021 , 2021 , 2021 , 2021 , 2021 , 2021 , 2021 , 2021 , 2021 , 2021 , 2021 , 2021 , 2020 , 2020 , 2020 , 2020 , 2020 , 2020 , 2020 , 2020 , 2020 , 2020 , 2020 , 2020 , 2019 , 2019 , 2019 , 2019 , 2019 , 2019 , 2019 , 2019 , 2019 , 2019 , 2019 , 2019 , M12 , M11 , M10 , M09 , M08 , M07 , M06 , M05 , M04 , M03 , M02 , M01 , M12 , M11 , M10 , M09 , M08 , M07 , M06 , M05 , M04 , M03 , M02 , M01 , M12 , M11 , M10 , M09 , M08 , M07 , M06 , M05 , M04 , M03 , M02 , M01 , December , November , October , September, August , July , June , May , April , March , February , January , December , November , October , September, August , July , June , May , April , March , February , January , December , November , October , September, August , July , June , May , April , March , February , January , 283.908 , 282.754 , 281.617 , 279.884 , 279.507 , 279.146 , 278.218 , 275.893 , 273.968 , 271.713 , 270.696 , 269.755 , 269.226 , 269.473 , 269.328 , 269.054 , 268.756 , 267.703 , 266.302 , 265.799 , 266.089 , 267.312 , 267.268 , 266.004 , 264.935 , 265.108 , 265.059 , 264.522 , 264.169 , 263.566 , 263.177 , 262.590 , 262.332 , 261.836 , 261.114 , 260.122"


In [26]:
r$Results$series$data

| | year | period | periodName | value | footnotes |
|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<list>` |
| 1 | 2021 | M12 | December | 283.908 | |
| 2 | 2021 | M11 | November | 282.754 | |
| 3 | 2021 | M10 | October | 281.617 | |
| 4 | 2021 | M09 | September | 279.884 | |
| 5 | 2021 | M08 | August | 279.507 | |
| 6 | 2021 | M07 | July | 279.146 | |
| 7 | 2021 | M06 | June | 278.218 | |
| 8 | 2021 | M05 | May | 275.893 | |
| 9 | 2021 | M04 | April | 273.968 | |
| 10 | 2021 | M03 | March | 271.713 | |
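
Note that every column in this output, including `value`, is of type `<chr>`. Before plotting or computing anything, the values need to be converted to numbers, and the `year` and `period` columns can be combined into a date. The sketch below shows one possible way to do this on a small stand-in data frame (`bls_demo`, three rows copied from the output above); `tidy_bls_monthly` is a hypothetical helper name. It keeps only periods `M01` to `M12` and drops any other period code (the BLS uses `M13` for annual averages).

```r
# Stand-in for r$Results$series$data (three rows copied from the output above)
bls_demo <- data.frame(
  year       = c("2021", "2021", "2021"),
  period     = c("M12", "M11", "M10"),
  periodName = c("December", "November", "October"),
  value      = c("283.908", "282.754", "281.617"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: numeric values plus a proper Date column, oldest first
tidy_bls_monthly <- function(df) {
  # Keep monthly observations only (M01 ... M12)
  df <- df[grepl("^M(0[1-9]|1[0-2])$", df$period), , drop = FALSE]
  df$value <- as.numeric(df$value)
  # Use the first day of each month as the observation date
  df$date <- as.Date(paste(df$year, substr(df$period, 2, 3), "01", sep = "-"))
  df[order(df$date), c("date", "value")]
}

tidy <- tidy_bls_monthly(bls_demo)
print(tidy)
```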


**Be careful!** The number of requests you can make this way is limited. Once you reach the daily limit, the API responds with a message like this:

```JSON
{'status': 'REQUEST_NOT_PROCESSED',
 'responseTime': 0,
 'message': ['Request could not be serviced, as the daily threshold for total number of requests allocated to the user has been reached.'],
 'Results': {}}
 ```
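
Since a rejected request still comes back as valid JSON, it is worth checking the `status` field before using `Results`. Below is a minimal sketch, assuming that successful responses carry the status `REQUEST_SUCCEEDED` (as in the examples in the BLS documentation); `bls_results` is a hypothetical helper name, and `limit_hit` is a hand-built stand-in for the parsed response shown above.

```r
# Hypothetical helper: return Results, or stop with the API's own message
bls_results <- function(parsed) {
  if (!identical(parsed$status, "REQUEST_SUCCEEDED")) {
    stop("BLS API returned status ", parsed$status, ": ",
         paste(unlist(parsed$message), collapse = " "))
  }
  parsed$Results
}

# Stand-in for fromJSON(content(GET(url), "text", encoding = "UTF-8"))
limit_hit <- list(
  status = "REQUEST_NOT_PROCESSED",
  responseTime = 0,
  message = "Request could not be serviced, as the daily threshold for total number of requests allocated to the user has been reached.",
  Results = list()
)

outcome <- tryCatch(bls_results(limit_hit),
                    error = function(e) conditionMessage(e))
print(outcome)
```

With a real response, `bls_results(r)$series$data` would either return the data or fail early with a readable error.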