Basic Monthly CPS Files in Python 2.7
=====

## Calculate monthly employment to population ratio

-----

*September 5, 2017*<br>
*Brian Dew, dew@cepr.net*


The Census Bureau's basic monthly CPS files can be downloaded [here](http://thedataweb.rm.census.gov/ftp/cps_ftp.html).

The data dictionary (January 2017 record layout) is [here](http://thedataweb.rm.census.gov/pub/cps/basic/201701-/January_2017_Record_Layout.txt).

Cross-check the results against the BLS summary statistics (series LNU02300062) [here](https://data.bls.gov/timeseries/LNU02300062).

To map the raw education codes to education categories, use [this](http://ceprdata.org/wp-content/cps/programs/basic/cepr_basic_educ.do) CEPR file.

In [7]:
import pandas as pd
import os

In [8]:
colspecs = [(15,17), (17,21), (121,123), (128,130), (136,138), (392,394), (845,855)]
colnames = ['month', 'year', 'age', 'PESEX', 'PEEDUCA', 'PREMPNOT', 'fnlwgt']

educ_dict = {31: 'LTHS',
             32: 'LTHS',
             33: 'LTHS',
             34: 'LTHS',
             35: 'LTHS',
             36: 'LTHS',
             37: 'LTHS',
             38: 'HS',
             39: 'HS',
             40: 'Some college',
             41: 'Some college',
             42: 'Some college',
             43: 'College',
             44: 'Advanced',
             45: 'Advanced',
             46: 'Advanced',
            }

gender_dict = {1: 0, 2: 1}

empl_dict = {1: 1, 2: 0, 3: 0, 4: 0}
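
The `colspecs` pairs above are 0-indexed and half-open, while a record layout typically lists 1-indexed, inclusive positions (so a field at positions 122-123 becomes `(121, 123)`, as in the `age` entry). A minimal sketch of that conversion follows; the helper `to_colspec`, the three-field layout, and the ten-character records are hypothetical and only illustrate the arithmetic.

```python
import io
import pandas as pd

def to_colspec(first, last):
    """Convert a 1-indexed inclusive range to a 0-indexed half-open pair."""
    return (first - 1, last)

# Hypothetical layout: positions 1-2 = month, 3-6 = year, 7-8 = age
layout = [('month', 1, 2), ('year', 3, 6), ('age', 7, 8)]
names = [name for name, _, _ in layout]
specs = [to_colspec(first, last) for _, first, last in layout]

# Two made-up fixed-width records; the trailing "xx" is ignored
raw = u"01201734xx\n02201751xx\n"
toy = pd.read_fwf(io.StringIO(raw), colspecs=specs, header=None, names=names)
print(toy)
```

The same helper applied to `(122, 123)` returns `(121, 123)`, matching the `age` entry in `colspecs`.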

In [9]:
data = pd.DataFrame()

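# List and read from the same folder; the raw string keeps backslashes literal
data_dir = r"C:\Working\Python\CPS\data"
for file in os.listdir(data_dir):
    if file.endswith(".dat"):
        df = pd.read_fwf(os.path.join(data_dir, file), colspecs=colspecs, header=None)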
        
        df.columns = colnames
        df['educ'] = df['PEEDUCA'].map(educ_dict)
        df['female'] = df['PESEX'].map(gender_dict)
        df['empl'] = df['PREMPNOT'].map(empl_dict)
        
        data = data.append(df)
        
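        # Keep women age 25-54
        df = df[(df['age'] >= 25) & 
                (df['age'] <= 54) &
                (df['female'] == 1)]
    
        # Weighted employment over weighted population; groupby drops rows
        # where empl is NaN (PREMPNOT codes not in empl_dict)
        wgt = df.groupby('empl')['fnlwgt'].sum().astype('float')
        emp = wgt.loc[1]  # select the employed group by label, not position
        pop = wgt.sum()
        epop_raw = emp/pop
        epop = str(round(epop_raw * 100, 1))
        print '{} 2017: Women, age 25-54: {}'.format(file[:3], epop)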

apr 2017: Women, age 25-54: 72.3
feb 2017: Women, age 25-54: 71.8
jan 2017: Women, age 25-54: 71.3
jul 2017: Women, age 25-54: 71.4
jun 2017: Women, age 25-54: 71.5
mar 2017: Women, age 25-54: 72.2
may 2017: Women, age 25-54: 72.0
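
Each monthly figure above is a weighted share: the sum of `fnlwgt` for employed respondents divided by the sum of `fnlwgt` for everyone with a valid employment code. The sketch below reproduces that calculation on a made-up five-row frame; the function name `weighted_epop` and the toy weights are assumptions for illustration, not part of the notebook. Written with `print(...)` so it also runs under Python 3.

```python
import pandas as pd

def weighted_epop(df, weight='fnlwgt', flag='empl'):
    """Weighted percent of rows with flag == 1 among rows where flag is not missing."""
    valid = df.dropna(subset=[flag])
    employed = valid.loc[valid[flag] == 1, weight].sum()
    population = valid[weight].sum()
    return 100.0 * employed / population

# Made-up data: the -1 code is not in the mapping, so empl is NaN for that row
toy = pd.DataFrame({
    'PREMPNOT': [1, 2, 4, -1, 1],
    'fnlwgt':   [100, 50, 50, 999, 200],
})
toy['empl'] = toy['PREMPNOT'].map({1: 1, 2: 0, 3: 0, 4: 0})

print(round(weighted_epop(toy), 1))  # (100 + 200) / (100 + 50 + 50 + 200) -> 75.0
```

Because `Series.map` returns NaN for codes missing from the dictionary, the row with weight 999 drops out of both the numerator and the denominator.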


In [10]:
len(data)

1044059

In [11]:
# Raw string so the backslashes in the Windows path are never read as escapes
data.to_stata(r'C:\Working\EPOPs\data\cepr_org_2017.dta')

In [12]:
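# Pool every month read above (January-July 2017) and keep women age 25-54
data = data[(data['age'] >= 25) & 
            (data['age'] <= 54) &
            (data['female'] == 1)]
    
# Weighted employment over weighted population for the pooled months
wgt = data.groupby('empl')['fnlwgt'].sum().astype('float')
emp = wgt.loc[1]  # select the employed group by label, not position
pop = wgt.sum()
epop_raw = emp/pop
epop = str(round(epop_raw * 100, 1))
print '2017: Women, age 25-54: {}'.format(epop)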

2017: Women, age 25-54: 71.8
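
The pooled figure is computed from the summed weights of all months, so it is a weight-weighted ratio and need not equal the simple mean of the monthly ratios. A small sketch with made-up numbers shows the difference; the helper `epop` and the toy frame are hypothetical.

```python
import pandas as pd

def epop(g):
    """Weighted employment-to-population ratio, in percent."""
    return 100.0 * g.loc[g['empl'] == 1, 'fnlwgt'].sum() / g['fnlwgt'].sum()

# Made-up data: month 1 carries twice the total weight of month 2
toy = pd.DataFrame({
    'month':  [1, 1, 2, 2],
    'empl':   [1, 0, 1, 0],
    'fnlwgt': [300, 100, 100, 100],
})

monthly = {m: epop(g) for m, g in toy.groupby('month')}
monthly_mean = sum(monthly.values()) / len(monthly)
pooled = epop(toy)

print('{:.1f} {:.1f}'.format(monthly_mean, pooled))  # 62.5 66.7
```

With real CPS months the total weights are similar from month to month, so the two measures should be close, but they are not the same statistic.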
