EPOPs benchmark with BLS summary
=====

## Matching employment-population ratio data from the CPS with BLS Summary Data

-----

*September 7, 2017*<br>
*Brian Dew*<br>
*dew@cepr.net*<br>

This is a proof of concept showing that the employment-to-population ratio calculated from the CPS for one subgroup of the population matches the BLS summary statistic for the same group and period. Specifically, I look at women aged 25-54 in 2015 (BLS series ID: LNU02300062).

In [1]:
import pandas as pd
import os

os.chdir('/home/domestic-ra/Working/CPS_ORG/EPOPs/')

List the columns to keep from the large CPS file and set the year of the CPS file to use.

In [2]:
cols = ['year', 'female', 'age', 'educ', 'empl', 'orgwgt']
year = '2015'

Read the CEPR CPS ORG Stata file into the pandas DataFrame `df`. The file can be downloaded from [here](http://ceprdata.org/cps-uniform-data-extracts/cps-outgoing-rotation-group/).

In [3]:
df = pd.read_stata('Data/cepr_org_{}.dta'.format(year), columns=cols)

Filter the DataFrame to include only prime (25-54) age women.

In [4]:
df = df[(df['female'] == 1) & (df['age'].isin(range(25,55)))]
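
The filter above relies on `age` being stored as whole numbers, so that `range(25, 55)` covers ages 25 through 54 inclusive. A minimal sketch on a made-up DataFrame (the rows below are illustrative, not CPS data) suggests that `between(25, 54)` selects the same rows under that assumption:

```python
import pandas as pd

# Toy data for illustration only; not drawn from the CPS file
toy = pd.DataFrame({
    'female': [1, 1, 0, 1, 1],
    'age':    [24, 25, 40, 54, 55],
})

# Filter used in the notebook: range(25, 55) covers 25..54
mask_isin = (toy['female'] == 1) & (toy['age'].isin(range(25, 55)))

# Equivalent filter for integer ages; between() is inclusive on both ends
mask_between = (toy['female'] == 1) & (toy['age'].between(25, 54))

prime_age_women = toy[mask_isin]
print(prime_age_women)
```

Only the women aged 25 and 54 survive the filter; the 24-year-old, the 55-year-old, and the man are dropped.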

Calculate the employment-population ratio as the weighted average of the `empl` variable, using `orgwgt` as the weights.

In [6]:
epop = (df['orgwgt'] * df['empl']).sum() / df['orgwgt'].sum() * 100
print('CEPR ORG CPS: {}: Women, age 25-54: {:0.2f}'.format(year, epop))

CEPR ORG CPS: 2015: Women, age 25-54: 70.32
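
To make the weighted average concrete, here is a small worked sketch with made-up weights (the numbers are illustrative, not CPS data, and the helper name `weighted_epop` is hypothetical). The ratio is the sum of the weights of employed people divided by the sum of all weights, expressed in percent:

```python
import pandas as pd

# Toy data for illustration only: three employed people and one not employed
toy = pd.DataFrame({
    'empl':   [1, 0, 1, 1],
    'orgwgt': [100.0, 300.0, 200.0, 400.0],
})

def weighted_epop(data, value_col='empl', weight_col='orgwgt'):
    """Weighted share (in percent) of value_col, using weight_col as weights."""
    weights = data[weight_col]
    return (weights * data[value_col]).sum() / weights.sum() * 100

toy_epop = weighted_epop(toy)
print('Toy EPOP: {:0.2f}'.format(toy_epop))
```

Here the employed weights sum to 700 out of a total of 1000, so the sketch prints 70.00, whereas the unweighted share of employed rows would be 75 percent.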


### BLS summary data for comparison

Request the BLS series equivalent to the CPS-determined value above.

In [8]:
import requests
import json
import config  # local config.py holding my API key (not needed for the v1 endpoint used here)

series = 'LNU02300062'  # BLS Series ID of interest

# BLS API v1 url for series
url = 'https://api.bls.gov/publicAPI/v1/timeseries/data/{}'.format(series)
print(url)

https://api.bls.gov/publicAPI/v1/timeseries/data/LNU02300062


In [10]:
# Get the data returned by the url and series id
r = requests.get(url).json()
print('Status: ' + r['status'])

# Generate pandas dataframe from the data returned
df2 = pd.DataFrame(r['Results']['series'][0]['data'])

Status: REQUEST_SUCCEEDED
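
For readers without network access, the sketch below parses a hand-written payload that mimics the nested structure the cell above indexes into (`Results` → `series` → `data`). The two data rows reuse values from the 2017 table in this notebook; the rest of the payload, including the `seriesID` field, is an assumption about the response shape rather than a captured API response:

```python
import json
import pandas as pd

# Hand-written stand-in for the API response; not an actual BLS reply
sample_text = '''
{
  "status": "REQUEST_SUCCEEDED",
  "Results": {
    "series": [
      {
        "seriesID": "LNU02300062",
        "data": [
          {"year": "2017", "period": "M02", "periodName": "February",
           "value": "71.8", "footnotes": [{}]},
          {"year": "2017", "period": "M01", "periodName": "January",
           "value": "71.3", "footnotes": [{}]}
        ]
      }
    ]
  }
}
'''

sample = json.loads(sample_text)

# Same indexing as in the notebook: first series, then its list of observations
sample_df = pd.DataFrame(sample['Results']['series'][0]['data'])
print(sample_df[['year', 'period', 'value']])
```

In this sketch `year` and `value` arrive as strings, which is why the next cell compares `year` against a string and converts `value` with `astype(float)` before averaging.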


In [12]:
epop2 = df2[df2['year'] == year]['value'].astype(float).mean()
print('BLS Benchmark: {}: Women, age 25-54: {:0.2f}'.format(year, epop2))

BLS Benchmark: 2015: Women, age 25-54: 70.32
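
The annual benchmark is the simple mean of the monthly values for the chosen year. As a sketch, the same idea can be applied to every year at once with a `groupby`, followed by a tolerance check against the CPS-based figure. The monthly values below are made up, and both the helper `matches_benchmark` and its 0.05-point tolerance are assumptions chosen for illustration:

```python
import pandas as pd

# Made-up monthly values in the same string format the API returns
monthly = pd.DataFrame({
    'year':  ['2015', '2015', '2016', '2016'],
    'value': ['70.0', '71.0', '71.5', '72.5'],
})

# Annual average for each year: convert strings to floats, then average by year
annual = monthly['value'].astype(float).groupby(monthly['year']).mean()

def matches_benchmark(cps_value, bls_value, tolerance=0.05):
    """True when the two ratios differ by no more than `tolerance` points."""
    return abs(cps_value - bls_value) <= tolerance

print(annual)
# Compare the two figures printed in this notebook
print(matches_benchmark(70.32, 70.32))
```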


Show the 2017 values, sorted alphabetically by month name, to check against the epops_2017_from_monthly figures.

In [13]:
df2.set_index('year').loc['2017'].sort_values('periodName')

| year | footnotes | period | periodName | value |
|------|-----------|--------|------------|-------|
| 2017 | [{}] | M04 | April | 72.3 |
| 2017 | [{}] | M08 | August | 71.4 |
| 2017 | [{}] | M02 | February | 71.8 |
| 2017 | [{}] | M01 | January | 71.3 |
| 2017 | [{}] | M07 | July | 71.4 |
| 2017 | [{}] | M06 | June | 71.5 |
| 2017 | [{}] | M03 | March | 72.2 |
| 2017 | [{}] | M05 | May | 72.1 |
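
Sorting on `periodName` orders the months alphabetically. If chronological order is preferred, one option is to sort on `period` instead, since the zero-padded codes (`M01`, `M02`, ...) sort correctly as plain strings. The sketch below rebuilds the 2017 rows from the output above (with the `footnotes` column omitted) rather than calling the API:

```python
import pandas as pd

# Rows copied from the 2017 output above (footnotes column omitted)
rows = pd.DataFrame({
    'year': ['2017'] * 8,
    'period': ['M04', 'M08', 'M02', 'M01', 'M07', 'M06', 'M03', 'M05'],
    'periodName': ['April', 'August', 'February', 'January',
                   'July', 'June', 'March', 'May'],
    'value': ['72.3', '71.4', '71.8', '71.3', '71.4', '71.5', '72.2', '72.1'],
})

# 'M01'..'M12' are zero-padded, so a plain string sort is chronological
chronological = rows.set_index('year').loc['2017'].sort_values('period')
print(chronological[['periodName', 'value']])
```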
