## Accessing Bureau of Labor Statistics data

The Bureau of Labor Statistics (BLS) offers an API, with sample code in many languages, that gives direct access to its data, provided you know the series id for the data you want. The BLS sample code writes a separate CSV for each series you request. I find it easier to work with data in a form I already know how to manipulate, which for analytics means pandas. The code here grabs a few series and converts them from the JSON response into a pandas dataframe in a single function. We'll add in the series label and sort the dataframe too, just for fun.
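
To make the JSON-to-rows step concrete without calling the API, here is a minimal sketch that flattens a hand-written payload. The payload is a mock: it contains only the keys that `get_bls_data` further down reads, the two observations are copied from the output shown later in this notebook, and storing them as strings is my assumption. `flatten_series` is a hypothetical helper name.

```python
# Mock payload: only the keys read by get_bls_data, not a full API response.
mock_response = {
    'Results': {
        'series': [
            {
                'seriesID': 'ENU0100740010',
                'data': [
                    {'year': '2019', 'period': 'Q04', 'value': '903'},
                    {'year': '2019', 'period': 'Q03', 'value': '814'},
                ],
            }
        ]
    }
}


def flatten_series(payload):
    """Return one flat dict per observation, tagged with its series id."""
    return [
        {'series_id': s['seriesID'], **item}
        for s in payload['Results']['series']
        for item in s['data']
    ]


rows = flatten_series(mock_response)
print(rows[0])
```

A list like `rows` can be handed straight to `pd.DataFrame(rows)`.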

In this notebook I also scrape the BLS web page that lists every area code available in their data series. Below I demonstrate grabbing three counties' worth of average weekly wage data and placing it into a pandas dataframe for analysis. I'd love to grab all counties at once, but the BLS server has a rate limit of 500 requests/day, so we will work with just a few from the scraped list.
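
Since a single request can carry several series ids (the function below sends three at once), one way to stretch a daily request budget is to split a long list of ids into batches. This is only a sketch: `batch_series_ids` is a hypothetical helper, the default batch size of 25 is an assumption (check the BLS API documentation for the real per-request cap on your account), and the ids are generated from made-up area codes.

```python
def batch_series_ids(series_ids, batch_size=25):
    """Split a list of series ids into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        series_ids[i:i + batch_size]
        for i in range(0, len(series_ids), batch_size)
    ]


# Made-up 5-digit area codes, for illustration only.
example_ids = [f"ENU{code:05d}40010" for code in range(1001, 1061)]
batches = batch_series_ids(example_ids, batch_size=25)
print(len(batches), [len(b) for b in batches])
```

Each batch could then be passed to a function like `get_bls_data`, one request per batch.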

In [1]:
import pandas as pd
import requests
import json

In [2]:
## web-scrape function to grab the BLS area codes and their titles
from bs4 import BeautifulSoup
def bls_areas():
    main_site = requests.get('https://data.bls.gov/cew/doc/titles/area/area_titles.htm')
    main_soup = BeautifulSoup(main_site.content, 'html.parser')
    table = main_soup.find(id='area_title')
    area_codes = {}
    rows = table.find_all('tr')
    for row in rows[1:]:
        data = row.find_all('td')
        area_codes[data[0].text] = data[1].text
    return area_codes

## function to grab the data from BLS website and transform into a pandas dataframe
def get_bls_data(series, start_year, end_year):
    headers = {'Content-type': 'application/json'}
    ## send the series ids passed in by the caller (a list of strings)
    data = json.dumps({"seriesid": series, "startyear": start_year, "endyear": end_year})
    p = requests.post('https://api.bls.gov/publicAPI/v2/timeseries/data/', data=data, headers=headers)
    p.raise_for_status()
    json_data = p.json()
    ## collect one dict per observation, then build the dataframe once
    ## (DataFrame.append was removed in pandas 2.0)
    rows = []
    for s in json_data['Results']['series']:
        for item in s['data']:
            rows.append({'period': item['period'], 'series_id': s['seriesID'],
                         'value': item['value'], 'year': item['year']})
    return pd.DataFrame(rows)

In [3]:
area_codes = bls_areas()
area_codes = {k: area_codes[k] for k in sorted(area_codes.keys())[4:7]}
area_codes

{'01007': 'Bibb County, Alabama',
 '01009': 'Blount County, Alabama',
 '01011': 'Bullock County, Alabama'}

In [4]:
## Build the series codes via BLS formats
## https://www.bls.gov/help/hlpforma.htm#OCWC
series_code_prefix = 'ENU'
series_code_suffix = '40010'
series_codes = []

for k,v in area_codes.items():
    series_code = series_code_prefix + k + series_code_suffix
    series_codes.append(series_code)
    
print(series_codes)

['ENU0100740010', 'ENU0100940010', 'ENU0101140010']
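
The series ids above are just prefix + 5-character area code + suffix, and the county lookup later relies on slicing the area code back out. A small sketch of that round trip, treating the suffix as an opaque string; `build_series_id` and `area_from_series_id` are hypothetical helper names, not part of any BLS library.

```python
AREA_CODE_LENGTH = 5  # the area codes used in this notebook are 5 characters long


def build_series_id(area_code, prefix='ENU', suffix='40010'):
    """Join prefix, area code and suffix into a series id string."""
    if len(area_code) != AREA_CODE_LENGTH:
        raise ValueError(f"expected a {AREA_CODE_LENGTH}-character area code, got {area_code!r}")
    return prefix + area_code + suffix


def area_from_series_id(series_id, prefix='ENU'):
    """Slice the area code back out of a series id built as above."""
    start = len(prefix)
    return series_id[start:start + AREA_CODE_LENGTH]


sid = build_series_id('01007')
print(sid, area_from_series_id(sid))
```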


In [5]:
## Get our series
wage_data = get_bls_data(series_codes, '2010','2019')
wage_data.head()

Unnamed: 0,period,series_id,value,year
0,Q04,ENU0100740010,903,2019
1,Q03,ENU0100740010,814,2019
2,Q02,ENU0100740010,808,2019
3,Q01,ENU0100740010,777,2019
4,Q04,ENU0100740010,842,2018


In [6]:
## sort all series ascending order by year
wage_data.sort_values(by=['year','period'], inplace=True)
## map county name to the series (characters 3-7 of the series id are the area code)
wage_data['county_name'] = wage_data['series_id'].str[3:8].map(area_codes)
wage_data

Unnamed: 0,period,series_id,value,year,county_name
39,Q01,ENU0100740010,584,2010,"Bibb County, Alabama"
79,Q01,ENU0100940010,523,2010,"Blount County, Alabama"
119,Q01,ENU0101140010,509,2010,"Bullock County, Alabama"
38,Q02,ENU0100740010,657,2010,"Bibb County, Alabama"
78,Q02,ENU0100940010,554,2010,"Blount County, Alabama"
...,...,...,...,...,...
41,Q03,ENU0100940010,683,2019,"Blount County, Alabama"
81,Q03,ENU0101140010,721,2019,"Bullock County, Alabama"
0,Q04,ENU0100740010,903,2019,"Bibb County, Alabama"
40,Q04,ENU0100940010,729,2019,"Blount County, Alabama"


Now you have some data to work with that is labelled, sorted, and in a pandas dataframe.
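
For analysis it often helps to turn the long table into one column per county. A rough sketch on five rows copied from the output above, assuming `value` and `year` are stored as strings (hence the `pd.to_numeric` call); `sample`, `quarter` and `wide` are names I made up for this example.

```python
import pandas as pd

# Five rows copied from the sorted output above; the full wage_data has the same columns.
sample = pd.DataFrame({
    'period': ['Q01', 'Q01', 'Q01', 'Q02', 'Q02'],
    'series_id': ['ENU0100740010', 'ENU0100940010', 'ENU0101140010',
                  'ENU0100740010', 'ENU0100940010'],
    'value': ['584', '523', '509', '657', '554'],
    'year': ['2010'] * 5,
    'county_name': ['Bibb County, Alabama', 'Blount County, Alabama',
                    'Bullock County, Alabama', 'Bibb County, Alabama',
                    'Blount County, Alabama'],
})

# Convert the wage values to numbers before doing any arithmetic.
sample['value'] = pd.to_numeric(sample['value'])

# Build a label such as '2010Q1' from the year and the last digit of the period.
sample['quarter'] = sample['year'] + 'Q' + sample['period'].str[-1]

# One row per quarter, one column per county; missing combinations become NaN.
wide = sample.pivot(index='quarter', columns='county_name', values='value')
print(wide)
```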