### bd econ CPS extract

bd_CPS_benchmark.ipynb

January 22, 2018

Contact: Brian Dew, @bd_econ

Requires: `cpsYYYY.ft` files for each year. The bd CPS files are generated bd_CPS_reader.ipynb

-----

See [readme](https://github.com/bdecon/econ_data/tree/master/bd_CPS) for documentation.

In [1]:
import pandas as pd
import numpy as np
import wquantiles
import os

os.chdir('/home/brian/Documents/CPS/data/clean/')

#### Benchmark 1

In October 1999, how many people were unemployed because of losing a job?

BLS: LNU03023621: 2,162,000

In [2]:
(pd.read_feather('cps1999.ft')
   .query('MONTH==10 and UNEMPTYPE == "Job Loser"')
   ['PWCMPWGT']).sum()

2161502.5

#### Benchmark 2

In February 2007, what share of age 25-54 women were employed?

BLS: LNU02300062: 72.6

In [3]:
df = (pd.read_feather('cps2007.ft')
        .query('MONTH==2 and 25 <= AGE <= 54 and FEMALE==1')
        .groupby('EMP')
        .PWCMPWGT.sum())

df[1] / df.sum()

0.7260245741606125

#### Benchmark 3

In May 2014, how many people have more than one job?

BLS: LNU02026619: 7,305,000

In [4]:
(pd.read_feather('cps2014.ft')
   .query('MONTH==5 and MJH==1')
   .PWCMPWGT).sum()

7304317.5

#### Benchmark 4

In 2017 Q1, what were the nominal usual weekly earnings?

BLS: $857

In [5]:
df = (pd.read_feather('cps2017.ft')
        .query('MONTH < 4 and WKWAGE > 0 and PRFTLF==1'))

wquantiles.median(df['WKWAGE'], df['PWORWGT'])

865.0

In [6]:
# Sophisticated version
def binned_wage(group):
    """Return BLS-styled binned median wage"""
    weight = 'PWORWGT'
    wage_var = 'WKWAGE'
    decile = 0.5
    bin_size = 50
    bins = list(np.arange(25, 3000, bin_size))
    bin_cut = lambda x: pd.cut(x[wage_var], bins, include_lowest=True)
    cum_sum = lambda x: x[weight].cumsum()
    dft = (group.sort_values(wage_var)
                .assign(WAGE_RANGE = bin_cut, CS = cum_sum))
    dec_point = dft[weight].sum() * decile
    dec_bin = (dft.iloc[(dft['CS'] - dec_point).abs().argsort()[:1]]
                  .WAGE_RANGE.values[0])
    wage_bins = list(dft['WAGE_RANGE'].unique())
    dec_loc = wage_bins.index(dec_bin)
    bin_below = dft[dft['WAGE_RANGE'] == wage_bins[dec_loc-1]].iloc[-1].CS
    bin_above = dft[dft['WAGE_RANGE'] == wage_bins[dec_loc]].iloc[-1].CS
    dec_value = ((((dec_point - bin_below) / 
                   (bin_above - bin_below)) * bin_size) + dec_bin.left)
    return dec_value

binned_wage(df)

859.45054372651

#### Benchmark 5

In April 2007, what was the unemployment rate for native born Hispanic or latino people?

BLS: LNU04073425: 5.6 

In [7]:
df = (pd.read_feather('cps2007.ft')
        .query('MONTH == 4 and FORBORN == 0 and WBHAO == "Hispanic"')
        .groupby('PEMLR')
        .BASICWGT.sum())

df[3:5].sum() / df[1:5].sum()

0.05556581205344538

#### Benchmark 6

In 2017, what was the union membership rate for black men?

BLS: LUU0204905200: 13.7

In [8]:
df = (pd.read_feather('cps2017.ft')
        .query('PEERNLAB > 0 and WBHAOM == "Black" and FEMALE == 0')
        .groupby('PEERNLAB')
        .PWORWGT.sum())

df[1] / df.sum()

0.13706622

#### Benchmark 7

In November 2015, on average, how many hours did usually employed full-time married (spouse present) men work?

BLS: LNU02533629: 44.1

In [9]:
df = (pd.read_feather('cps2015.ft')
        .query('MONTH == 11 and PRFTLF == 1 and PRMARSTA in [1, 2]'
               'and FEMALE == 0 and PRAGNA == 2 and PEHRACTT > 0'))

np.average(df['PEHRACTT'], weights=df['BASICWGT'])

44.10250044951661

#### Benchmark 8

In 2017, what was the median hourly wage for 45 to 54 year old female wage and salary workers paid hourly rates?

BLS: LEU0207640900: $15.16

In [10]:
df = (pd.read_feather('cps2017.ft')
        .query('45 <= AGE <54 and FEMALE == 1 and PRERNHLY > 0'
               'and PEIO1COW not in [6, 8]')
        .assign(HRWAGE_HRLY = lambda x: x['PRERNHLY'] / 100.0))

wquantiles.median(df['HRWAGE_HRLY'], weights=df['PWORWGT'])

15.0

In [11]:
# Sophisticated version
def binned_wage(group):
    """Return BLS-styled binned median wage"""
    weight = 'PWORWGT'
    wage_var = 'HRWAGE_HRLY'
    decile = 0.5
    bin_size = 0.5
    bins = list(np.arange(.25, 300, bin_size))
    bin_cut = lambda x: pd.cut(x[wage_var], bins, include_lowest=True)
    cum_sum = lambda x: x[weight].cumsum()
    dft = (group.sort_values(wage_var)
                .assign(WAGE_RANGE = bin_cut, CS = cum_sum))
    dec_point = dft[weight].sum() * decile
    dec_bin = (dft.iloc[(dft['CS'] - dec_point).abs().argsort()[:1]]
                  .WAGE_RANGE.values[0])
    wage_bins = list(dft['WAGE_RANGE'].unique())
    dec_loc = wage_bins.index(dec_bin)
    bin_below = dft[dft['WAGE_RANGE'] == wage_bins[dec_loc-1]].iloc[-1].CS
    bin_above = dft[dft['WAGE_RANGE'] == wage_bins[dec_loc]].iloc[-1].CS
    dec_value = ((((dec_point - bin_below) / 
                   (bin_above - bin_below)) * bin_size) + dec_bin.left)
    return dec_value

binned_wage(df)

15.144216310106446

#### Benchmark 9

How many women age 20-24 were employed in June 1992?

BLS: LNU02000038: 6,190,000

In [12]:
(pd.read_feather('cps1992.ft')
   .query('MONTH == 6 and SEX == 2 and 20 <= AGE <= 24'
          'and LFSR in [1, 2]')).FNLWGT.sum()

6144924.98

#### Benchmark 10

What was the unemployment rate in Febuary 1989?

BLS: LNU04000000: 5.6%

In [13]:
df = (pd.read_feather('cps1989.ft')
        .query('MONTH == 2 and AGE > 15')
        .groupby('LFSR')
        .FNLWGT.sum())

df[2:4].sum() / df[0:4].sum()

0.05666313114440966