Working with the Current Population Survey (CPS) in Python
=====

### Annual Social and Economic Supplement (ASEC)

-----

*Update: April 19, 2018*<br>
*Brian Dew*<br>
*@bd_econ*

The CPS ASEC, also called the March CPS, includes additional questions that cover income, poverty, health insurance coverage, and more. The [raw public use file](http://thedataweb.rm.census.gov/ftp/cps_ftp.html#cpsmarch) is in fixed-width format, and its variables are described in the associated data dictionary. Unlike the basic monthly CPS, the March CPS records are hierarchical: person rows are nested in family rows, which are nested in household rows. Person rows are identified by a record type of 3, stored in the `PRECORD` variable.
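
To make the nesting concrete, here is a minimal sketch that groups the rows of a hierarchical file into households, families, and persons. The sample rows are synthetic and the function name `nest_records` is hypothetical. The position of the record-type character (`pos`) is a parameter whose default should be checked against the data dictionary, and the codes 1 and 2 for household and family rows are an assumption here.

```python
def nest_records(lines, pos=0):
    """Group hierarchical rows into household -> family -> person lists.

    `pos` is the zero-based index of the record-type character.
    Assumed codes: '1' household, '2' family, '3' person.
    """
    households = []
    for line in lines:
        rectype = line[pos]
        if rectype == '1':
            # Start a new household
            households.append({'row': line, 'families': []})
        elif rectype == '2':
            # Attach a new family to the most recent household
            households[-1]['families'].append({'row': line, 'persons': []})
        elif rectype == '3':
            # Attach the person to the most recent family
            households[-1]['families'][-1]['persons'].append(line)
    return households


# Synthetic rows: the first character stands in for the record type
sample = ['1HH-A', '2FAM-A1', '3PER-A1a', '3PER-A1b',
          '1HH-B', '2FAM-B1', '3PER-B1a']
nested = nest_records(sample)
n_persons = sum(len(f['persons']) for h in nested for f in h['families'])
print(len(nested), n_persons)
```

The analysis in this notebook does not need the nesting itself, since it only uses person rows, but the same grouping would be the starting point for attaching household or family variables to each person.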

**To do list**
* Emulate the published data technique for calculating income values in $2,500 bins.


Useful blog posts from Tom Augspurger: 

* [Part 1: Using Python to tackle the CPS](http://tomaugspurger.github.io/tackling%20the%20cps.html)
* [Part 2: Using Python to tackle the CPS](http://tomaugspurger.github.io/tackling%20the%20cps%20%28part%202%29.html)
* [Part 3: Using Python to tackle the CPS](http://tomaugspurger.github.io/tackling%20the%20cps%20%28part%203%29.html)
* [Part 4: Using Python to tackle the CPS](http://tomaugspurger.github.io/tackling%20the%20cps%20%28part%204%29.html)


### Import preliminaries

In [49]:
# Import packages
import pandas as pd
print(f'pandas {pd.__version__}')
import re, wquantiles

pandas 0.22.0


### Data file and data dictionary from Census FTP site

In [50]:
# Raw data from Census FTP site
datafile = 'data/asec2017_pubuse.dat'

# Data dictionary
dd_txt = 'data/08ASEC2017_Data_Dict_Full.txt'
dd = open(dd_txt, 'r', encoding='iso-8859-1').read()

### Obtain column and variable information from data dictionary

In [51]:
# Retrieve column info from dictionary
p = re.compile(r'D (\w+\-?\w+?)\s+(\d{1,2})\s+(\d+)\s+')
var_key = pd.DataFrame(p.findall(dd), columns=['Var', 'Len', 'Loc'])
var_key = var_key.apply(pd.to_numeric, errors='ignore')

# Filter out columns of interest
s = ['A_AGE', 'A_SEX', 'MARSUPWT', 'PRECORD', 'WEWKRS', 'MCAID']
s_key = var_key[var_key['Var'].isin(s)]
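
As a quick check of what the pattern captures, the sketch below runs the same regular expression on a short synthetic excerpt written in the style of the data dictionary. The lengths and locations in `dd_sample` are made up for illustration and are not the real 2017 values. It uses only the standard library so the conversion from location and length to column intervals is visible step by step.

```python
import re

# Synthetic excerpt in the style of the data dictionary; the lengths and
# locations below are made up for illustration
dd_sample = """
D A_AGE       2     19  (00:85)
D A_SEX       1     24  (1:2)
D MARSUPWT    8    155  (2 implied decimal places)
"""

pattern = re.compile(r'D (\w+\-?\w+?)\s+(\d{1,2})\s+(\d+)\s+')

# Each match is a (name, length, location) tuple of strings
fields = [(var, int(length), int(loc))
          for var, length, loc in pattern.findall(dd_sample)]

# Convert the 1-based start location and length to a 0-based,
# half-open (start, end) interval
colspecs = {var: (loc - 1, loc + length - 1) for var, length, loc in fields}
print(colspecs)
```

Note that the pattern only accepts one- or two-digit lengths, so a variable wider than 99 characters would not be matched.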

### Read file into memory

In [52]:
# Read raw fwf file
data = pd.read_fwf(datafile, header=None, names=list(s_key.Var),
                 colspecs=list(zip(s_key.Loc-1, s_key.Loc + s_key.Len-1)))
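
Each `colspecs` pair is a half-open `(start, end)` interval of zero-based character positions, which is why 1 is subtracted from the dictionary location. A minimal sketch with a synthetic row and made-up positions (the `layout` values are hypothetical) shows that the same intervals behave like plain string slices:

```python
# Synthetic fixed-width row: record type, age, sex (positions are made up)
row = '3' + '042' + '2'

# Hypothetical (length, 1-based location) pairs, as read from a dictionary
layout = {'PRECORD': (1, 1), 'A_AGE': (3, 2), 'A_SEX': (1, 5)}

# Same arithmetic as the colspecs above: (loc - 1, loc + length - 1)
specs = {var: (loc - 1, loc + length - 1)
         for var, (length, loc) in layout.items()}

# Slice each field out of the row and convert it to an integer
record = {var: int(row[start:end]) for var, (start, end) in specs.items()}
print(record)
```

Because every row is read with the person-record column positions, values on household and family rows are not meaningful for these variables, which is why the later step keeps only rows where `PRECORD` is 3.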

### Make calculation

Estimate from census site: 7,639,000

In [53]:
# Keep the person records (3) for ages 25 to 54
df = data.loc[(data['PRECORD'] == 3) & 
              #(data['A_SEX'] == 2) & 
              (data['A_AGE'].between(25,54))]

# Weighted total of the filtered group, rounded to the nearest thousand
total = int((df['MARSUPWT'].astype(float).sum() / 100.0).round(-3))

# Filter by those who have health insurance through medicaid
medicaid = int((df[(df['MCAID'] == 1)
                  ]['MARSUPWT'].astype(float).sum() / 100.0).round(-3))

# Filter those who worked full time full year and have medicaid
workft = int((df[(df['WEWKRS'] == 1) & 
                  (df['MCAID'] == 1)
                 ]['MARSUPWT'].astype(float).sum() / 100.0).round(-3))

# Filter those who worked at any point in the year (WEWKRS != 5) and have medicaid
workaid = int((df[(df['WEWKRS'] != 5) & 
                  (df['MCAID'] == 1)
                 ]['MARSUPWT'].astype(float).sum() / 100.0).round(-3))
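
Each calculation above follows the same pattern: sum the supplement weight over the matching rows, divide by 100 because the code treats the weight as having two implied decimal places, and round to the nearest thousand. A minimal sketch in plain Python with synthetic records makes the arithmetic explicit. The helper name `weighted_total`, the weights, and the category codes in `people` are all hypothetical.

```python
def weighted_total(records, condition=lambda r: True):
    """Sum raw weights for matching records, rescale, round to thousands."""
    raw = sum(r['MARSUPWT'] for r in records if condition(r))
    return int(round(raw / 100.0, -3))


# Synthetic person records; raw weights carry two implied decimal places
people = [
    {'MARSUPWT': 123456700, 'MCAID': 1, 'WEWKRS': 1},
    {'MARSUPWT': 234567800, 'MCAID': 2, 'WEWKRS': 5},
    {'MARSUPWT': 345678900, 'MCAID': 1, 'WEWKRS': 5},
    {'MARSUPWT': 100000000, 'MCAID': 1, 'WEWKRS': 3},
]

everyone = weighted_total(people)
covered = weighted_total(people, lambda r: r['MCAID'] == 1)
covered_worked = weighted_total(
    people, lambda r: r['MCAID'] == 1 and r['WEWKRS'] != 5)

# Share of the covered group that worked at some point in the year
share = covered_worked / covered
print(everyone, covered, covered_worked)
```

Rounding each total before dividing, as the notebook does, means the shares are ratios of rounded figures. Computing the ratio from unrounded sums would give a slightly different value.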

In [54]:
medicaid / total

0.1456346033580583

In [55]:
workaid / medicaid

0.5726477143012447

In [56]:
workaid

10535000