This script matches 2014 MEPS Medicare benefit amounts to CPS individual records flagged as Medicare beneficiaries, matching on age, gender, income, and census region. See the description in the current directory for details on the methodology. The donor microdata come from the Medical Expenditure Panel Survey (MEPS) 2014 Full Year Consolidated File, which contains individual-level Medicare benefits. The output is a CPS-based file with person-level IDs from both CPS and MEPS and the matched benefit amount.

In [61]:
import pandas as pd
import numpy as np
import random

In [62]:
PATH = 'WORKING DIRECTORY PATH'

In [63]:
# h171.csv is the MEPS 2014 Full Year Consolidated File,
# available from the MEPS website
raw_MEPS = pd.read_csv(PATH + 'h171.csv')

id_for_analysis = ['DUPERSID', 'PANEL', 'WAGEP14X', 'REGION14', 'PERWT14F', 'SEX', 'AGE14X', 'TOTMCR14']
# Copy so that later column assignments do not act on a slice of raw_MEPS
MEPS_medicare = raw_MEPS[id_for_analysis].copy()

In [64]:
# Keep records with nonzero Medicare expenses
MEPS_medicare = MEPS_medicare[MEPS_medicare.TOTMCR14 != 0].copy()
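
Assigning a column to a DataFrame that was itself sliced from another frame, as this cell does, is the usual trigger for pandas' SettingWithCopyWarning. The sketch below shows one way to avoid it: take an explicit `.copy()` and filter on the condition directly. The frame and its values are made up for illustration and are not MEPS data.

```python
import pandas as pd

# Toy stand-in for the MEPS extract; IDs and amounts are invented.
raw = pd.DataFrame({
    'DUPERSID': [101, 102, 103, 104],
    'TOTMCR14': [0, 2500, 0, 130],
    'OTHER': ['a', 'b', 'c', 'd'],
})

# .copy() makes the subset an independent DataFrame, so later
# assignments neither touch nor warn about the original frame.
subset = raw[['DUPERSID', 'TOTMCR14']].copy()

# Filter directly on the condition; no helper flag column is needed.
with_benefits = subset[subset['TOTMCR14'] != 0].copy()

print(with_benefits)
```

Only the rows with a nonzero amount remain, and `raw` is left unchanged.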


In [65]:
# Recode MEPS variables so that they use the same labels as CPS
MEPS_medicare['SEX'] = np.where(MEPS_medicare.SEX == '2 FEMALE', 'Female', 'Male')

# Drop the numeric prefix from the region label and capitalize the rest
MEPS_medicare['REGION14'] = MEPS_medicare.REGION14.str.split(' ', n=1).str[1].str.capitalize()
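
To make the region recode concrete, here is a small sketch on labels in the "code, space, label" style suggested by the `'2 FEMALE'` value above. The region strings are assumptions for illustration; the exact labels in h171.csv may differ.

```python
import pandas as pd

# Hypothetical labelled values in "<code> <LABEL>" form.
labels = pd.Series(['1 NORTHEAST', '2 MIDWEST', '3 SOUTH', '4 WEST'])

# Split once on the first space, keep the label part, then
# capitalize (first letter upper case, the rest lower case).
clean = labels.str.split(' ', n=1).str[1].str.capitalize()

print(clean.tolist())
```

The result uses the same spelling as the `Region` list used later for the match, which is what lets the two surveys be compared with `==`.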

In [66]:
# Import CPS
CPS = pd.read_csv('../../Dropbox/asec2014_pubuse.csv')
medicare_columns = ['mcare', 'peridnum', 'marsupwt', 'wsal_val', 'a_age', 'a_sex', 'gereg']
CPS = CPS[medicare_columns].copy()

In [67]:
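# Prepare variables for the match
CPS['wsal_val'] = np.where(CPS.wsal_val == "None or not in universe", 0, CPS.wsal_val)
CPS['wsal_val'] = pd.to_numeric(CPS.wsal_val)

# Replace top-coded age brackets with a random integer age.
# Note: randrange is evaluated once per line, so every record in a bracket
# receives the same age, and its upper bound is exclusive (80-83 and 85-94).
# These draws happen before random.seed(1) is set, so they vary between runs.
CPS['a_age'] = np.where(CPS.a_age == "80-84 years of age", random.randrange(80, 84), CPS.a_age)
CPS['a_age'] = np.where(CPS.a_age == "85+ years of age", random.randrange(85, 95), CPS.a_age)
CPS['a_age'] = pd.to_numeric(CPS.a_age)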
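
Because `random.randrange` is called once per line here, all records in a top-coded bracket end up with the same age. If a separate draw per record is wanted, one possible approach is sketched below with NumPy. This is an alternative, not what the notebook ran: the toy series, the seeded `RandomState`, and the 85 to 94 range for the open-ended bracket are all assumptions.

```python
import numpy as np
import pandas as pd

# Toy ages as they might appear in a labelled extract (illustrative).
ages = pd.Series(['67', '80-84 years of age', '85+ years of age',
                  '80-84 years of age', '72'])

rng = np.random.RandomState(1)  # seeded so the draws are repeatable

is_80_84 = ages == '80-84 years of age'
is_85_up = ages == '85+ years of age'

# Plain numbers parse directly; the bracket labels become NaN for now.
imputed = pd.to_numeric(ages, errors='coerce')

# randint's upper bound is exclusive, so 85 yields ages 80..84.
imputed.loc[is_80_84] = rng.randint(80, 85, size=int(is_80_84.sum()))
# The 85..94 range is an assumption for the open-ended bracket.
imputed.loc[is_85_up] = rng.randint(85, 95, size=int(is_85_up.sum()))

imputed = imputed.astype(int)
print(imputed.tolist())
```

Using this in place of the cell above would change which donors are matched, so the totals reported further down would no longer reproduce exactly.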

In [68]:
CPS['MEPS_ID'] = np.zeros(len(CPS))

In [69]:
# Keep Medicare recipients only
CPS = CPS[CPS.mcare == 'Yes'].copy()

In [70]:
len(CPS)

18216

In [71]:
Region = ['Northeast', 'South', 'Midwest', 'West']
Gender = ['Male', 'Female']

In [72]:
random.seed(1)

In [73]:
for this_area in Region:
    for this_gender in Gender:
        # Match only within the same census region and gender
        subset_CPS = CPS[(CPS.gereg == this_area) & (CPS.a_sex == this_gender)]
        MEPS_donor = MEPS_medicare[(MEPS_medicare.REGION14 == this_area) & (MEPS_medicare.SEX == this_gender)]

        for i, record in subset_CPS.iterrows():
            # Donors must be within +/-2 years of age and +/-100 in wages
            age_range = [record.a_age - 2, record.a_age + 2]
            income_range = [record.wsal_val - 100, record.wsal_val + 100]

            in_age = (MEPS_donor.AGE14X >= age_range[0]) & (MEPS_donor.AGE14X <= age_range[1])
            in_income = (MEPS_donor.WAGEP14X >= income_range[0]) & (MEPS_donor.WAGEP14X <= income_range[1])
            pool = MEPS_donor[in_age & in_income]

            number_donors = len(pool)
            if number_donors < 1:
                # Relax the income constraint and take the donor with the closest income
                pool = MEPS_donor[in_age]
                number_donors = len(pool)

                if number_donors < 1:
                    if record.a_age < 85:
                        print('dont have anyone in this age range')
                        print(age_range)
                        continue
                    else:
                        # Ages 85 and above fall back to donors recorded as age 85
                        pool = MEPS_donor[MEPS_donor.AGE14X == 85]

                closest_wage = min(pool.WAGEP14X, key=lambda x: abs(x - record.wsal_val))
                # Use .loc rather than chained indexing so the assignment hits CPS itself
                CPS.loc[CPS.peridnum == record.peridnum, 'MEPS_ID'] = pool.DUPERSID[pool.WAGEP14X == closest_wage].values[0]

            else:
                # Several donors qualify: pick one at random
                row_number = random.randint(1, number_donors) - 1
                CPS.loc[CPS.peridnum == record.peridnum, 'MEPS_ID'] = pool.DUPERSID.iloc[row_number]


dont have anyone in this age range
[-2, 2]
dont have anyone in this age range
[-1, 3]
dont have anyone in this age range
[9, 13]
dont have anyone in this age range
[10, 14]
dont have anyone in this age range
[24, 28]
dont have anyone in this age range
[24, 28]
dont have anyone in this age range
[24, 28]
dont have anyone in this age range
[24, 28]
dont have anyone in this age range
[24, 28]
dont have anyone in this age range
[24, 28]
dont have anyone in this age range
[10, 14]
dont have anyone in this age range
[10, 14]
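
The donor selection for a single recipient can also be read as a small function. The sketch below restates the rule on a toy donor table: a random pick among donors within both the age and wage bands, otherwise the donor with the closest wage within the age band. The function name `pick_donor`, its parameters, and the sample values are hypothetical, and the special fallback for ages 85 and above is left out for brevity.

```python
import random
import pandas as pd

def pick_donor(record_age, record_wage, donors, age_tol=2, wage_tol=100):
    """Return the DUPERSID of a donor for one recipient, or None."""
    # between() is inclusive on both ends, like the >= and <= filters.
    in_age = donors['AGE14X'].between(record_age - age_tol, record_age + age_tol)
    in_wage = donors['WAGEP14X'].between(record_wage - wage_tol, record_wage + wage_tol)

    pool = donors[in_age & in_wage]
    if len(pool) > 0:
        # Several equally acceptable donors: pick one at random.
        return pool['DUPERSID'].iloc[random.randrange(len(pool))]

    pool = donors[in_age]
    if len(pool) == 0:
        return None  # nobody within the age band
    # No donor within the wage band: take the closest wage instead.
    nearest = (pool['WAGEP14X'] - record_wage).abs().idxmin()
    return pool.loc[nearest, 'DUPERSID']

# Toy donor table; values are invented for illustration.
donors = pd.DataFrame({
    'DUPERSID': [1, 2, 3],
    'AGE14X': [66, 70, 71],
    'WAGEP14X': [0, 0, 30000],
})

random.seed(1)
print(pick_donor(65, 50, donors))     # donor 1 is the only one in both bands
print(pick_donor(70, 20000, donors))  # falls back to the nearest wage: donor 3
print(pick_donor(40, 0, donors))      # no donor within the age band: None
```

Returning `None` instead of printing keeps the unmatched cases countable, for example to check how many of the messages above come from recipients far below the usual Medicare ages.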


In [74]:
CPS.marsupwt[CPS.MEPS_ID!=0].sum()

48956575.720000215

In [75]:
CPS['DUPERSID'] = CPS.MEPS_ID
CPS = pd.merge(CPS, MEPS_medicare, on='DUPERSID', how='left')
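
A left merge keeps recipients that never received a donor (`MEPS_ID` still 0), and their benefit columns come back as missing. One way to make that visible is sketched below, using the `validate` and `indicator` options of `pd.merge` on invented data. Whether donor IDs are unique in the real extract is an assumption that `validate='m:1'` would check.

```python
import pandas as pd

# Toy recipients; DUPERSID 0 marks a record with no donor found.
recipients = pd.DataFrame({'peridnum': ['a', 'b', 'c'],
                           'DUPERSID': [1, 2, 0]})
# Toy donors with invented benefit amounts.
donors = pd.DataFrame({'DUPERSID': [1, 2],
                       'TOTMCR14': [2500, 130]})

merged = pd.merge(
    recipients, donors, on='DUPERSID', how='left',
    validate='m:1',   # raise if a donor ID is duplicated on the right
    indicator=True,   # add a _merge column describing each row's origin
)

# Rows found only on the left side had no matching donor.
unmatched = merged[merged['_merge'] == 'left_only']
print(unmatched[['peridnum', 'DUPERSID']])
```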

In [76]:
(CPS.marsupwt*CPS.TOTMCR14)[CPS.MEPS_ID!=0].sum()/1000000000

414.00183597926963

In [77]:
# Scaling factor applied to the matched benefits.
# NOTE: this was annotated as 576/417, but the value used is 516/417.
ratio = 516.0 / 417.0

In [78]:
CPS['MedicareX'] = CPS.TOTMCR14 * ratio
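
The notebook does not say where the two numbers in the ratio come from. If the intent is to rescale benefits so that the weighted total matches an external benchmark, the arithmetic would look like the sketch below. The weights, benefits, and `target_total` are invented, and the benchmark interpretation itself is an assumption.

```python
import pandas as pd

# Illustrative weights and benefit amounts, not survey values.
df = pd.DataFrame({'weight': [1000.0, 2000.0, 1500.0],
                   'benefit': [5000.0, 0.0, 8000.0]})

# Weighted aggregate: 1000*5000 + 2000*0 + 1500*8000 = 17,000,000
weighted_total = (df['weight'] * df['benefit']).sum()

target_total = 20400000.0  # hypothetical external benchmark

# A single multiplicative factor preserves relative differences
# between people while moving the aggregate to the target.
ratio = target_total / weighted_total
df['benefit_scaled'] = df['benefit'] * ratio

scaled_total = (df['weight'] * df['benefit_scaled']).sum()
print(ratio, scaled_total)
```

Records with a zero benefit stay at zero under this kind of scaling, so it changes amounts but not who counts as a recipient.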

In [79]:
CPS[['peridnum', 'DUPERSID', 'MedicareX']].to_csv('medicare14.csv', index=False)