This script matches 2014 MEPS Medicaid benefit amounts to CPS individual records flagged as Medicaid beneficiaries, matching on age, gender, income, and census region. See the description in the current directory for details on the methodology. The donor micro dataset is the Medical Expenditure Panel Survey (MEPS) 2014 full-year consolidated file, which contains individual-level Medicaid benefits. The output is a CPS-based file containing the person-level IDs from both CPS and MEPS and the matched benefit amount.

In [1]:
import pandas as pd
import numpy as np
import random

In [2]:
PATH = 'WORKING DIRECTORY PATH'

In [3]:
# h171.csv is the MEPS 2014 full year consolidated file
# available from MEPS website
raw_MEPS = pd.read_csv(str(PATH + 'h171.csv'))

In [4]:
# variables for matching process
id_for_analysis = ['DUPERSID','PANEL', 'WAGEP14X', 'REGION14', 'PERWT14F', 'SEX','AGE14X', 'TOTMCD14']
# take an explicit copy so later column assignments do not write to a slice of raw_MEPS
MEPS_medicaid = raw_MEPS[id_for_analysis].copy()

In [6]:
# Keep MEPS records with positive medicaid benefits
MEPS_medicaid['yes_to_md'] = np.where(MEPS_medicaid.TOTMCD14!=0, 1, 0)
MEPS_medicaid = MEPS_medicaid[MEPS_medicaid.yes_to_md==1]


In [8]:
# adjust variable value formats, preparing for the match
MEPS_medicaid.SEX = np.where(MEPS_medicaid.SEX=='2 FEMALE', 'Female', 'Male')
MEPS_medicaid.REGION14 = MEPS_medicaid.REGION14.str.split(' ', expand=True, n = 1).get(1).values
MEPS_medicaid.REGION14 = MEPS_medicaid.REGION14.str.capitalize()
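
The label cleanup above can be tried on a toy frame. This is a minimal sketch, assuming the MEPS labels follow a "<code> <NAME>" pattern as the '2 FEMALE' comparison suggests; the region labels and the `strip_code` helper are hypothetical, and unlike the `np.where` call above it does not map every non-female label to 'Male'.

```python
import pandas as pd

# Hypothetical labels in "<code> <NAME>" style; the real h171.csv values may differ.
toy = pd.DataFrame({
    'SEX': ['1 MALE', '2 FEMALE', '2 FEMALE'],
    'REGION14': ['1 NORTHEAST', '3 SOUTH', '4 WEST'],
})

def strip_code(labels):
    """Drop the leading numeric code and capitalize the remaining name."""
    return labels.str.split(' ', n=1).str[1].str.capitalize()

toy['SEX'] = strip_code(toy['SEX'])
toy['REGION14'] = strip_code(toy['REGION14'])
print(toy)
```

After this step the toy values use the same spelling as the `Region` and `Gender` lists used later in the match.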

In [15]:
# import CPS dataset and keep relevant variables
CPS = pd.read_csv('/Users/Amy/Dropbox/OSPC - Shared/CPS/cpsmar2014t.csv')
medicaid_columns = ['mcaid','peridnum','marsupwt', 'wsal_val', 'a_age', 'a_sex', 'gereg']
CPS = CPS[medicaid_columns]

In [17]:
# adjust variables to prepare for the match
CPS.wsal_val = np.where(CPS.wsal_val=="None or not in universe", 0, CPS.wsal_val)
CPS.wsal_val = pd.to_numeric(CPS.wsal_val)

# replace the top-coded age brackets with a specific number
# note: randrange is evaluated once per bracket, so every record in a bracket
# receives the same drawn age (80-83 and 85-94); the exact age is not relevant in the match
CPS.a_age = np.where(CPS.a_age == "80-84 years of age", random.randrange(80, 84), CPS.a_age)
CPS.a_age = np.where(CPS.a_age == "85+ years of age", random.randrange(85, 95), CPS.a_age)
CPS.a_age = pd.to_numeric(CPS.a_age)
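
If a separate draw per record were wanted instead of one value per bracket, it could be sketched as below. This is an illustration on a toy column, not the method used in this notebook: the `draw_age` helper, the local generator, its seed, and the inclusive 80-84 bound are all assumptions.

```python
import random
import pandas as pd

rng = random.Random(1)  # local generator; the seed value is an arbitrary choice

# Toy column mixing numeric strings with the two top-coded brackets.
ages = pd.Series(['34', '80-84 years of age', '85+ years of age',
                  '80-84 years of age'])

def draw_age(label):
    """Return a numeric age, drawing once per record for bracketed labels."""
    if label == '80-84 years of age':
        return rng.randint(80, 84)  # inclusive on both ends (assumption)
    if label == '85+ years of age':
        return rng.randint(85, 94)  # same 85-94 span as randrange(85, 95)
    return int(label)

numeric_ages = ages.apply(draw_age)
print(numeric_ages.tolist())
```

Using a dedicated `random.Random` instance keeps these draws reproducible without touching the global seed set later for donor selection.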

In [19]:
CPS['MEPS_ID'] = np.zeros(len(CPS))

In [20]:
# keep individuals who claim receiving medicaid
CPS = CPS[CPS.mcaid=='Yes']

In [21]:
len(CPS)

26117

In [22]:
Region = ['Northeast', 'South', 'Midwest', 'West']
Gender = ['Male', 'Female']

In [23]:
random.seed(1)

In [24]:
for this_area in Region:
    for this_gender in Gender:
        subset_CPS = CPS[(CPS.gereg==this_area) & (CPS.a_sex==this_gender)]
        MEPS_donor = MEPS_medicaid[(MEPS_medicaid.REGION14==this_area) & (MEPS_medicaid.SEX==this_gender)]
        
        for i, record in subset_CPS.iterrows():
            age_range = [record.a_age - 2, record.a_age + 2]
            income_range = [record.wsal_val - 100, record.wsal_val + 100]
            
            f1 = (MEPS_donor.AGE14X >= age_range[0])
            f2 = (MEPS_donor.AGE14X <= age_range[1])
            f3 = (MEPS_donor.WAGEP14X >= income_range[0])
            f4 = (MEPS_donor.WAGEP14X <= income_range[1])
            pool = MEPS_donor[f1 & f2 & f3 & f4]
            
            number_donors = len(pool)
            if number_donors < 1:
                # relax the income constraint and find the person w/ closest income
                pool = MEPS_donor[f1 & f2]
                number_donors = len(pool)
                
                if number_donors < 1:
                    if record.a_age < 85:
                        print('no MEPS donor in this age range')
                        print(age_range)
                        continue
                    else:
                        pool = MEPS_donor[MEPS_donor.AGE14X==85]

                closest_wage = min(pool.WAGEP14X, key=lambda x:abs(x-record.wsal_val))
                CPS.loc[CPS.peridnum==record.peridnum, 'MEPS_ID'] = pool.DUPERSID[pool.WAGEP14X==closest_wage].values[0]

            else:
                row_number = random.randint(1, number_donors) - 1
                index = pool.DUPERSID.index[row_number]
                CPS.loc[CPS.peridnum==record.peridnum, 'MEPS_ID'] = pool.DUPERSID.loc[index]
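
The donor selection for a single recipient can be isolated in a small function and checked on toy data. This is a minimal sketch under stated assumptions: `pick_donor`, its tolerance parameters, and the toy donor table are hypothetical, and it leaves out the special fallback for recipients aged 85 and over.

```python
import random
import pandas as pd

def pick_donor(record_age, record_wage, donors, rng, age_tol=2, wage_tol=100):
    """Return the DUPERSID of a donor for one recipient, or None if none fits."""
    in_age = donors[(donors.AGE14X >= record_age - age_tol) &
                    (donors.AGE14X <= record_age + age_tol)]
    in_both = in_age[(in_age.WAGEP14X >= record_wage - wage_tol) &
                     (in_age.WAGEP14X <= record_wage + wage_tol)]
    if len(in_both) > 0:
        # One or more donors qualify: pick one uniformly at random.
        return in_both.DUPERSID.iloc[rng.randrange(len(in_both))]
    if len(in_age) == 0:
        return None
    # No donor inside the wage band: take the closest wage within the age band.
    gap = (in_age.WAGEP14X - record_wage).abs()
    return in_age.DUPERSID.loc[gap.idxmin()]

# Toy donor table with made-up IDs, ages, and wages.
donors = pd.DataFrame({
    'DUPERSID': [101, 102, 103],
    'AGE14X': [30, 31, 60],
    'WAGEP14X': [0, 20000, 0],
})
rng = random.Random(1)
exact = pick_donor(30, 50, donors, rng)        # only donor 101 is inside both bands
fallback = pick_donor(32, 15000, donors, rng)  # no wage match; 102 is closest
missing = pick_donor(45, 0, donors, rng)       # nobody aged 43 to 47
print(exact, fallback, missing)
```

Separating the selection from the assignment makes the three outcomes (random pick, closest-wage fallback, no donor) easy to test without the full CPS and MEPS files.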


In [25]:
CPS.marsupwt[CPS.MEPS_ID!=0].sum()

54080496.60000048

In [26]:
CPS['DUPERSID'] = CPS.MEPS_ID
CPS = pd.merge(CPS, MEPS_medicaid, on='DUPERSID', how='left')
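
A left merge like the one above can be checked for unmatched recipients and accidental row duplication. The sketch below uses toy frames with made-up IDs and amounts; `validate` and `indicator` are standard `pd.merge` options, but using them here is a suggestion rather than part of the original workflow.

```python
import pandas as pd

# Toy recipients; DUPERSID 0 stands for "no donor found", as MEPS_ID does above.
recipients = pd.DataFrame({'peridnum': ['a', 'b', 'c'],
                           'DUPERSID': [101, 101, 0]})
donors = pd.DataFrame({'DUPERSID': [101, 102], 'TOTMCD14': [2500, 700]})

merged = pd.merge(recipients, donors, on='DUPERSID', how='left',
                  validate='many_to_one',  # raises if a donor ID is duplicated
                  indicator=True)          # adds a _merge column
unmatched = merged[merged['_merge'] == 'left_only']
print(unmatched[['peridnum', 'DUPERSID']])
```

Rows flagged `left_only` carry a missing `TOTMCD14`, which is why the totals that follow filter on `MEPS_ID != 0`.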

In [30]:
# weighted total of matched benefits, in billions of dollars
Matched_total = (CPS.marsupwt*CPS.TOTMCD14)[CPS.MEPS_ID!=0].sum()/1000000000

In [31]:
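# target noninstitutional Medicaid total, in billions to match Matched_total
Medicaid_total_noninstitutional = 468.00 - 18.10 - 116.20 * 45 / 77
# scaling factor that brings the weighted matched total up to the target
ratio = Medicaid_total_noninstitutional/Matched_total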
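
The rescaling step is a single proportional adjustment, which a small worked example makes concrete. All numbers below are invented for illustration and are unrelated to the actual CPS or Medicaid totals.

```python
import pandas as pd

matched = pd.DataFrame({
    'weight': [1000.0, 2000.0, 500.0],    # survey weights (people represented)
    'benefit': [4000.0, 1000.0, 8000.0],  # imputed benefit per person, dollars
})
target_total = 15.0e6  # made-up target aggregate, dollars

# Weighted total implied by the matched benefits: 4e6 + 2e6 + 4e6 = 10e6.
weighted_total = (matched.weight * matched.benefit).sum()
ratio = target_total / weighted_total  # 1.5 for these numbers
matched['benefit_scaled'] = matched.benefit * ratio

print(ratio, (matched.weight * matched.benefit_scaled).sum())
```

Because every benefit is multiplied by the same factor, the relative distribution across records is unchanged and only the weighted aggregate moves to the target.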

In [33]:
CPS["MedicaidX"] = CPS.TOTMCD14 * ratio

In [34]:
CPS[['peridnum', 'DUPERSID', 'MedicaidX']].to_csv('medicaid14.csv', index=False)