## Matching employment-population ratio data from the CPS with BLS Summary Data

This is a proof of concept showing that an employment-population ratio computed from CPS microdata for one subgroup of the population matches the BLS summary statistic for the same group and period. Specifically, I look at women aged 25-54 in 2015 (BLS series ID: LNS12300062).
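The ratio computed below can be written as a weighted share. As a sketch, assuming each respondent $i$ in the subgroup carries a survey weight $w_i$ (the `orgwgt` column) and an employment indicator $e_i \in \{0, 1\}$ (the `empl` column), with the sums running over respondents whose employment status is recorded:

```latex
\[
\text{EPOP} = 100 \times \frac{\sum_i w_i \, e_i}{\sum_i w_i}
\]
```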

In [1]:
import pandas as pd

In [2]:
cols = ['year', 'female', 'age', 'educ', 'empl', 'fnlwgt', 'orgwgt']
year = '2015'

In [4]:
df = pd.read_stata('data/cepr_org_{}.dta'.format(year), columns=cols)

In [5]:
df = df[(df['age'] >= 25) &
        (df['age'] <= 54) &
        (df['female'] == 1)]

In [20]:
# Sum the ORG weights by employment status (empl == 1 is treated as employed)
wgt_by_empl = df.groupby('empl')['orgwgt'].sum()
emp = wgt_by_empl.loc[1]
pop = wgt_by_empl.sum()
epop = emp / pop

print('{}: Women, age 25-54: {}'.format(year, round(epop * 100, 2)))

2015: Women, age 25-54: 71.17
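The calculation above can be wrapped in a small reusable function. The following is a minimal sketch, not part of the original analysis: `weighted_epop` is a hypothetical helper, and the toy data are made up for illustration rather than taken from the CPS. Rows with a missing employment status are dropped, which mirrors how `groupby` excludes missing keys.

```python
import pandas as pd


def weighted_epop(frame, status_col='empl', weight_col='orgwgt'):
    """Return the weighted employment-population ratio in percent.

    Hypothetical helper: rows with a missing status are excluded.
    """
    known = frame.dropna(subset=[status_col])
    employed_wgt = known.loc[known[status_col] == 1, weight_col].sum()
    total_wgt = known[weight_col].sum()
    return 100.0 * employed_wgt / total_wgt


# Toy data (made up for illustration, not CPS records)
toy = pd.DataFrame({
    'empl':   [1.0, 0.0, 1.0, 1.0],
    'orgwgt': [2000.0, 1000.0, 3000.0, 2000.0],
})

toy_epop = weighted_epop(toy)
print(round(toy_epop, 2))  # 7000 / 8000 of the weight is employed -> 87.5
```

Selecting the weight column before summing avoids aggregating unrelated columns such as `educ`.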


In [19]:
df

Unnamed: 0,year,female,age,educ,empl,fnlwgt,orgwgt
2,2015,1,39,College,1.0,593.525513,2481.083984
18,2015,1,43,Advanced,0.0,799.666199,3025.380127
22,2015,1,41,Some college,0.0,1102.588135,4171.425781
25,2015,1,32,Some college,1.0,1148.501343,4607.502930
29,2015,1,37,College,1.0,593.525513,2481.083984
36,2015,1,29,Some college,1.0,1255.992554,5129.384766
38,2015,1,43,Some college,0.0,908.746399,3438.063721
40,2015,1,54,Advanced,1.0,582.943604,2336.285889
42,2015,1,51,College,1.0,1184.341553,4746.532715
47,2015,1,42,Advanced,1.0,573.069580,2344.332764


In [10]:
df.groupby('empl')['fnlwgt'].sum().iloc[1]

1.3439517e+08

#### Compare with BLS Summary Statistics

In [13]:
import requests
import json
import config # file called config.py with my API key

# BLS API v1 url
url = 'https://api.bls.gov/publicAPI/v1/timeseries/data/'

series = 'LNS12300062'

In [14]:
# get the data returned by the url and series id
r = requests.get('{}{}'.format(url, series))

# Generate pandas dataframe from the data returned
df2 = pd.DataFrame(r.json()['Results']['series'][0]['data'])
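Indexing straight into the JSON assumes the request succeeded. A more defensive version might check the response first. The sketch below is illustrative only: `series_to_frame` is a hypothetical helper, the `status` and `message` fields are assumed to follow the BLS API's usual response layout, and the sample payload is hand-written from two rows of the output shown in this notebook so that the example runs without a network call.

```python
import pandas as pd


def series_to_frame(payload):
    """Convert a decoded BLS API payload (a dict) to a DataFrame.

    Hypothetical helper; raises ValueError if the request did not succeed.
    """
    if payload.get('status') != 'REQUEST_SUCCEEDED':
        raise ValueError('BLS request failed: {}'.format(payload.get('message')))
    data = payload['Results']['series'][0]['data']
    frame = pd.DataFrame(data)
    # The API returns values as strings; convert them for arithmetic
    frame['value'] = frame['value'].astype(float)
    return frame


# Hand-written sample shaped like the response used in this notebook
sample = {
    'status': 'REQUEST_SUCCEEDED',
    'message': [],
    'Results': {'series': [{'seriesID': 'LNS12300062', 'data': [
        {'year': '2017', 'period': 'M08', 'periodName': 'August',
         'value': '72.1', 'footnotes': [{}]},
        {'year': '2017', 'period': 'M07', 'periodName': 'July',
         'value': '72.4', 'footnotes': [{}]},
    ]}]},
}

bls = series_to_frame(sample)
print(bls[['year', 'periodName', 'value']])
```

With a live request, the same function could be called as `series_to_frame(r.json())`.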

In [15]:
df2

Unnamed: 0,footnotes,period,periodName,value,year
0,[{}],M08,August,72.1,2017
1,[{}],M07,July,72.4,2017
2,[{}],M06,June,72.1,2017
3,[{}],M05,May,71.8,2017
4,[{}],M04,April,71.9,2017
5,[{}],M03,March,72.0,2017
6,[{}],M02,February,71.6,2017
7,[{}],M01,January,71.3,2017
8,[{}],M12,December,71.5,2016
9,[{}],M11,November,71.5,2016


In [16]:
round(df2[df2['year'] == year]['value'].astype(float).mean(), 2)

70.33
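To make the comparison explicit, the two estimates can be put side by side. This is a minimal sketch using the two figures printed above; the variable names are illustrative.

```python
cps_epop = 71.17  # from the CPS microdata calculation above
bls_epop = 70.33  # mean of the 2015 monthly BLS values above

# Difference between the two estimates, in percentage points
gap = round(cps_epop - bls_epop, 2)
print('CPS minus BLS: {} percentage points'.format(gap))  # 0.84
```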