In [1]:
library(foreign)
library(data.table)

To assess whether this experiment was able to generate a random assignment of individuals into treatment and control, we're going to focus on a few representative variables that David Reiley reports in the lecture:

- An indicator for sex
- Whether the individual is nonwhite
- Age
- Family income
- Medical Spending
- General Health Index 
- Cholesterol 
- Mental Health Index 

The data is packaged across three tables, which means that we have to organize it a bit before we can start assessing the randomization. Only the demographics table is needed for the checks below, so the other two reads are commented out.

First, we load the data:

In [2]:
demographics <- read.dta('./Data/rand_initial_sample_2.dta')
# person_years <- read.dta('./Data/person_years.dta')
# spending     <- read.dta('./Data/annual_spend.dta')

And convert the `data.frame` to a `data.table`.

In [3]:
demographics <- data.table(demographics)
# person_years <- data.table(person_years)
# spending     <- data.table(spending)

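From the demographics, we're going to keep the sex (`female`), race (`blackhisp`), `age`, education (`educper`), and income (`income1cpi`) variables, along with the health measures and the assigned plan (`rand_plan_group`). We also keep the `person` identifier, which could serve as the *key*, or index variable, for this data, although the cell below does not set one.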

In [4]:
d <- demographics[ , .(person, female, blackhisp, age, educper, 
                       income1cpi, hosp, ghindx, cholest, diastol, 
                       systol, mhi, rand_plan_group)]

That `rand_plan_group` variable has long level names with awkward characters in them:

In [5]:
d[ , table(rand_plan_group)]

rand_plan_group
     Free Care      25% Coins    Mixed Coins      50% Coins 95%/100% Coins 
          1295            432            333            257            759 
   Indv Deduct 
           881 

So, I'm going to make a shorter version:

In [6]:
d[rand_plan_group == "Free Care",      short_plan := "free"]
d[rand_plan_group == "25% Coins",      short_plan := "25"]
d[rand_plan_group == "Mixed Coins",    short_plan := "mixed"]
d[rand_plan_group == "50% Coins",      short_plan := "50"]
d[rand_plan_group == "95%/100% Coins", short_plan := "95"]
d[rand_plan_group == "Indv Deduct",    short_plan := "deduct"]
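
The same recoding can also be written with a named lookup vector, which keeps the mapping in one place. This is only a sketch on a small made-up table: `plan_lookup` and `toy` are hypothetical names, and the `as.character()` call assumes that `read.dta` returned `rand_plan_group` as a factor.

```r
library(data.table)

# Hypothetical lookup: long plan label -> short label (same mapping as above)
plan_lookup <- c("Free Care"      = "free",
                 "25% Coins"      = "25",
                 "Mixed Coins"    = "mixed",
                 "50% Coins"      = "50",
                 "95%/100% Coins" = "95",
                 "Indv Deduct"    = "deduct")

# Toy data standing in for `d`; the NA mimics a person with no recorded plan
toy <- data.table(rand_plan_group = factor(c("Free Care", "95%/100% Coins",
                                             "Indv Deduct", NA)))

# Index the lookup by the label; unmatched or missing labels become NA
toy[ , short_plan := unname(plan_lookup[as.character(rand_plan_group)])]
toy
```

Rows whose plan is missing stay `NA` in `short_plan`, which is the same behavior as the six separate assignments.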

In [7]:
summary_table <- d[ , .(prop_female = mean(female, na.rm = TRUE), 
                        prop_blackhist = mean(blackhisp, na.rm = TRUE), 
                        mean_age = mean(age, na.rm = TRUE), 
                        mean_income = mean(income1cpi, na.rm = TRUE),
                        mean_educper = mean(educper, na.rm = TRUE), 
                        mean_cholset = mean(cholest, na.rm = TRUE), 
                        mean_mhi = mean(mhi, na.rm = TRUE)), 
                   keyby = short_plan]
summary_table

short_plan,prop_female,prop_blackhist,mean_age,mean_income,mean_educper,mean_cholset,mean_mhi
,,,,,,206.9402,74.03323
25,0.5300926,0.1536313,33.86111,34911.66,12.27045,205.3929,75.60877
50,0.5214008,0.1090047,33.51751,36383.67,11.97598,209.1133,75.64553
95,0.5599473,0.1716667,32.361,31603.21,12.10483,207.3021,73.84584
deduct,0.5368899,0.1528752,32.92168,29498.82,11.94804,205.882,73.7257
free,0.5220077,0.1435794,32.79598,30627.02,11.84211,202.0558,74.7359
mixed,0.5525526,0.1615385,32.48649,26485.05,11.79791,202.2582,73.8319


This is a really informative table: 

- Across the columns are the variables that we're inspecting;
- Down the rows are the plans that were assigned to individuals. 

For example: 

- If we simply look at the proportion of women in each of the plans, it ranges between 52% and 56% across the plan types.
- In table 1.3, panel A, the mean income in the catastrophic plan is reported as 31,603. This is the same value that we report here for the 95/100 coins plan. 

But this isn't exactly what David or the authors of *Mastering Metrics* were showing. The table provides mean values for each treatment group, but it does not show the differences between groups or test whether those differences are distinguishable from zero. 

In the cells below, we produce the same output that David Reiley shows by: 

- Subsetting to the 95% (catastrophic) and the individual deductible plans
- Calculating these same means
- Calculating the difference in these means

In [8]:
catastrophic_deductable <- d[short_plan %in% c("95", "deduct"), 
    .(prop_female = mean(female, na.rm = TRUE),
      prop_blackhist = mean(blackhisp, na.rm = TRUE), 
      mean_age = mean(age, na.rm = TRUE), 
      mean_income = mean(income1cpi, na.rm = TRUE) 
     ), 
   keyby = .(short_plan)]

In [9]:
catastrophic_deductable[ , .(diff_female = diff(prop_female), 
                             diff_blackhist = diff(prop_blackhist), 
                             diff_age = diff(mean_age), 
                             diff_income = diff(mean_income))]

diff_female,diff_blackhist,diff_age,diff_income
-0.0230574,-0.01879149,0.5606786,-2104.385
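
Because `keyby` sorts `"95"` ahead of `"deduct"`, each `diff()` above is the deductible mean minus the catastrophic mean. To attach a test to a difference like this, one option is a two-sample t-test. The sketch below runs on simulated data, not the RAND data: `toy`, `plan`, and `income` are made-up names, and both groups are drawn from the same distribution on purpose.

```r
set.seed(42)

# Simulated stand-in for two plan groups; NOT the RAND data
toy <- data.frame(
  plan   = rep(c("95", "deduct"), each = 200),
  income = c(rnorm(200, mean = 31000, sd = 15000),
             rnorm(200, mean = 31000, sd = 15000))
)

# Welch two-sample t-test (R's default) for a difference in mean income
toy_test <- t.test(income ~ plan, data = toy)

toy_test$estimate  # the two group means
toy_test$p.value   # p-value for the null of equal means
```

On the notebook's data, the analogous call would be `d[short_plan %in% c("95", "deduct"), t.test(income1cpi ~ short_plan)]`.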


# Conduct this check with Regression 

The *Field Experiments* textbook authors suggest a more direct check for balance on these covariates. Because the treatment is assigned to individuals **at random**, it should **NOT** be the case that any feature that we measure gives us leverage to predict which treatment condition a person will be in. 

That is, if treatment is random, then I can't predict which one you've got! 

To conduct this check, let's use a concept from 203: the F-test. Here we're going to fit two models. 

- The first model has no model features, just an intercept. 
- The second model has many model features. 

We're going to use an F-test to check whether the many model features improve the ability of our model to predict the treatment condition. The null hypothesis is that the additional features explain no more variance in the outcome than the intercept alone (that is, all of their coefficients are zero); rejecting the null would mean that the long model fits better than the short one. 
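
To see what this test looks like when assignment really is random, here is a small simulation sketch. Everything in it is made up for illustration (`sim`, `treat`, and the covariate distributions are assumptions, not the RAND data): treatment is drawn independently of the covariates, so the null hypothesis is true by construction.

```r
set.seed(1)
n <- 1000

# Simulated covariates and a treatment assigned independently of them
sim <- data.frame(age    = rnorm(n, mean = 33, sd = 10),
                  female = rbinom(n, size = 1, prob = 0.5))
sim$treat <- rbinom(n, size = 1, prob = 0.5)

# Short model: intercept only. Long model: adds the covariates.
sim_short <- lm(treat ~ 1, data = sim)
sim_long  <- lm(treat ~ age + female, data = sim)

# F-test of the long model against the short model
sim_test <- anova(sim_short, sim_long, test = "F")
sim_test
```

Under random assignment the p-value in the last column is uniformly distributed, so it falls below 0.05 in only about 5% of simulated datasets.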

In [10]:
d <- na.omit(d, cols = c("female", "blackhisp", "age", "educper", "income1cpi"))

In [11]:
short_mod <- d[short_plan %in% c("95", "deduct"), 
              lm(I(short_plan == "95") ~ 1)]
long_mod <- d[short_plan %in% c("95", "deduct"), 
               lm(I(short_plan == "95") ~ female + blackhisp + age + educper + income1cpi)]

With these two models fit, we can look at the income variable that David Reiley calls out in the lecture. Income *does*, in fact, seem to be associated with being in one or the other of the treatment conditions (p = 0.0375), but as David points out, we're making a *lot* of checks.

In [12]:
summary(long_mod)


Call:
lm(formula = I(short_plan == "95") ~ female + blackhisp + age + 
    educper + income1cpi)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.6437 -0.4608 -0.4037  0.5333  0.6400 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.741e-01  9.058e-02   4.130 3.86e-05 ***
female       4.105e-02  2.863e-02   1.434   0.1519    
blackhisp    7.073e-02  4.305e-02   1.643   0.1006    
age         -9.266e-04  1.316e-03  -0.704   0.4815    
educper      2.423e-03  5.291e-03   0.458   0.6470    
income1cpi   1.918e-06  9.207e-07   2.083   0.0375 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4981 on 1231 degrees of freedom
Multiple R-squared:  0.006487,	Adjusted R-squared:  0.002451 
F-statistic: 1.607 on 5 and 1231 DF,  p-value: 0.1551


To actually conduct the test for these models, we'll use the `anova` method, and pass a test argument that calls for an *F-test*. 

In [13]:
anova(long_mod, short_mod, test = "F")

Res.Df,RSS,Df,Sum of Sq,F,Pr(>F)
1231,305.3548,,,,
1236,307.3484,-5.0,-1.99364,1.607423,0.1550962


The results of this tell us the following: 

- The long model uses 5 more parameters than the short model. The `Df` column reads -5 only because the long model is listed first in the `anova` call.
- The F statistic is about 1.6.
- **Crucially**, the p-value is about 0.155: if the null hypothesis were true, we would see an F statistic at least this large roughly 15.5% of the time. 

And so, we conclude from this test that there is no evidence that these features predict which treatment a study participant was assigned. Or, *there is no evidence to suggest that the randomization did not work in this case.* 
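
As a check on the `anova` table, the F statistic can be recomputed by hand from the two residual sums of squares: the drop in RSS per added parameter, divided by the long model's residual variance. The sketch below copies the (rounded) numbers from the output above; the variable names are just for illustration.

```r
# Values copied from the anova output above
rss_long  <- 305.3548; df_long  <- 1231  # residual SS and df, long model
rss_short <- 307.3484; df_short <- 1236  # residual SS and df, short model

# Number of extra parameters in the long model
extra_params <- df_short - df_long

# F = (drop in RSS per extra parameter) / (residual variance of long model)
f_stat <- ((rss_short - rss_long) / extra_params) / (rss_long / df_long)

# Upper-tail probability of the F distribution with (5, 1231) df
p_value <- pf(f_stat, df1 = extra_params, df2 = df_long, lower.tail = FALSE)

f_stat   # approximately 1.61
p_value  # approximately 0.155
```

Up to rounding in the copied values, these agree with both the `anova` table and the last line of `summary(long_mod)`.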

# Questions for Understanding 

Using the same data from above, conduct a test, using regression and an F-test, for whether the randomization produced balance between the 25% and 50% plan groups. 

In [14]:
short_mod <- ""
long_mod <- ""

In [15]:
# anova()