In [None]:
install.packages(c("lmtest","sandwich","xtable"))





# Analyzing RCT data with Precision Adjustment

## Data

In this lab, we analyze the Pennsylvania re-employment bonus experiment, which was previously studied in "Sequential testing of duration data: the case of the Pennsylvania ‘reemployment bonus’ experiment" (Bilias, 2000), among others. These experiments were conducted in the 1980s by the U.S. Department of Labor to test the incentive effects of alternative compensation schemes for unemployment insurance (UI). In these experiments, UI claimants were randomly assigned either to a control group or to one of six treatment groups. Here we focus on treatment group 4, but feel free to explore the other treatment groups. In the control group, the existing UI rules applied. Individuals in the treatment groups were offered a cash bonus if they found a job within a pre-specified period of time (the qualification period), provided that the job was retained for a specified duration. The treatments differed in the level of the bonus, the length of the qualification period, and whether the bonus declined over time during the qualification period; see http://qed.econ.queensu.ca/jae/2000-v15.6/bilias/readme.b.txt for further details on the data.


First, please load the data set.

In [None]:
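## loading the data
Penn <- as.data.frame(read.table("/content/penn_jaedat.sec", header = TRUE))
n <- dim(Penn)[1]    # number of observations in the full data set
p_1 <- dim(Penn)[2]  # number of variables
# keep only the control group (tg == 0) and treatment group 4
Penn <- subset(Penn, tg == 4 | tg == 0)
# attach() lets the code below refer to columns (tg, inuidur1, female, ...) by name
attach(Penn)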

In [None]:
head(Penn)

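| | abdt | tg | inuidur1 | inuidur2 | female | black | hispanic | othrace | dep | q1 | ⋯ | q5 | q6 | recall | agelt35 | agegt54 | durable | nondurable | lusd | husd | muld |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | ⋯ | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` |
| 1 | 10824 | 0 | 18 | 18 | 0 | 0 | 0 | 0 | 2 | 0 | ⋯ | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | 10824 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 5 | 10747 | 0 | 27 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 12 | 10607 | 4 | 9 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 13 | 10831 | 0 | 27 | 27 | 0 | 0 | 0 | 0 | 1 | 0 | ⋯ | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 |
| 14 | 10845 | 0 | 27 | 27 | 1 | 0 | 0 | 0 | 0 | 0 | ⋯ | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |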


Our treatment $D=T4$ is given by

In [None]:
T4 <- (tg==4)

**Exercise 1:** How many individuals got the "treatment"?

In [None]:
summary(T4)

   Mode   FALSE    TRUE 
logical    3354    1745 

### Model
To evaluate the impact of the treatments on unemployment duration, we consider the linear regression model:

$$
Y =  D \beta_1 + W'\beta_2 + \varepsilon, \quad E \varepsilon (D,W')' = 0,
$$

where $Y$ is the log of the duration of unemployment, $D$ is a treatment indicator, and $W$ is a set of controls including age group dummies, gender, race, number of dependents, quarter of the experiment, location within the state, existence of recall expectations, and type of occupation. Here, $\beta_1$ is the ATE if the RCT assumptions hold rigorously.


We also consider the interactive regression model:

$$
Y =  D \alpha_1 + D W' \alpha_2 + W'\beta_2 + \varepsilon, \quad E \varepsilon (D,W', DW')' = 0,
$$
where $W$'s are demeaned (apart from the intercept), so that $\alpha_1$ is the ATE, if the RCT assumptions hold rigorously.

Under RCT, the projection coefficient $\beta_1$ has
the interpretation of the causal effect of the treatment on
the average outcome. We thus refer to $\beta_1$ as the average
treatment effect (ATE). Note that the covariates here are
independent of the treatment $D$, so we can identify $\beta_1$ by
a simple linear regression of $Y$ on $D$, without adding covariates.
However, we do add covariates in an effort to improve the
precision of our estimates of the average treatment effect.
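A minimal simulation sketch of this point, not part of the original lab: the data-generating process, the coefficient values and the sample size are all made up. Because $D$ is randomized independently of a prognostic covariate $W$, both regressions estimate the same $\beta_1$, but adding $W$ removes much of the residual variance of $Y$ and so shrinks the standard error.

```r
# Hypothetical data-generating process: randomized D, prognostic covariate W
set.seed(123)
n <- 2000
W <- rnorm(n)                          # covariate that predicts Y
D <- rbinom(n, 1, 0.35)                # treatment assigned independently of W
Y <- 1 + 0.5 * D + 2 * W + rnorm(n)    # true effect beta_1 = 0.5

unadj <- summary(lm(Y ~ D))$coefficients      # regression of Y on D only
adj   <- summary(lm(Y ~ D + W))$coefficients  # regression that adds W

# Both columns estimate beta_1; the adjusted standard error is smaller
# because W absorbs much of the residual variance of Y
rbind(unadjusted = unadj["D", 1:2], adjusted = adj["D", 1:2])
```

In this toy setup, the gain comes entirely from how strongly $W$ predicts $Y$; a covariate unrelated to the outcome would leave the standard error essentially unchanged.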

### Analysis

We consider

*  classical 2-sample approach, no adjustment (CL)
*  classical linear regression adjustment (CRA)
*  interactive regression adjustment (IRA)

and carry out heteroskedasticity-robust inference (HC1 standard errors).
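For the CL approach, the OLS coefficient on $D$ in a regression of $Y$ on a constant and $D$ is exactly the difference in sample means. Robust inference means heteroskedasticity-robust standard errors; HC1 is the HC0 sandwich scaled by $n/(n-k)$, which is what `vcovHC(..., type = "HC1")` from *sandwich* computes later in the lab. The sketch below checks both facts by hand in base R on made-up data, and also shows that in this two-group regression HC0 reduces to the unpooled two-sample variance formula.

```r
set.seed(42)
n <- 500
D <- rbinom(n, 1, 0.35)
# heteroskedastic errors: the outcome is noisier in the treated group
Y <- 2 - 0.08 * D + rnorm(n, sd = ifelse(D == 1, 1.5, 1))

fit <- lm(Y ~ D)

# CL estimate: the coefficient on D equals the difference in group means
diff_means <- mean(Y[D == 1]) - mean(Y[D == 0])

# Sandwich covariance by hand: (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}
X <- model.matrix(fit)
e <- resid(fit)
k <- ncol(X)
bread <- solve(crossprod(X))
meat  <- crossprod(X * e)          # sum_i e_i^2 x_i x_i'
V_hc0 <- bread %*% meat %*% bread
V_hc1 <- V_hc0 * n / (n - k)       # HC1: small-sample scaling of HC0
se_hc1 <- sqrt(V_hc1[2, 2])

# Unpooled two-sample variance of the difference in means
n1 <- sum(D == 1)
n0 <- sum(D == 0)
ss1 <- sum((Y[D == 1] - mean(Y[D == 1]))^2)
ss0 <- sum((Y[D == 0] - mean(Y[D == 0]))^2)
v_two_sample <- ss1 / n1^2 + ss0 / n0^2

c(estimate = unname(coef(fit)["D"]), diff_means = diff_means, se_hc1 = se_hc1)
```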

# Carry out covariate balance check

**Exercise 2:** To check if the distribution of covariates is the same under both treatment and control, run a regression of $T4$ on the covariates (plus interactions). Then, perform statistical tests of the estimated coefficients using the function *coeftest* from the package *lmtest* to analyze the dependency (correlation) of the treatment variable and the covariates. Interpret your findings.

In [None]:
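library(lmtest)
library(sandwich)
# regress the treatment indicator on the covariates; under successful
# randomization all slope coefficients should be close to zero
m <- lm(T4~(female+black+othrace+factor(dep)+q2+q3+q4+q5+q6+agelt35+agegt54+durable+lusd+husd))
# classical standard errors; the HC1-robust version would be:
#coeftest(m,vcov = vcovHC(m, type="HC1"))
coeftest(m)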



t test of coefficients:

                Estimate  Std. Error t value  Pr(>|t|)    
(Intercept)   0.35682845  0.06269314  5.6917 1.329e-08 ***
female       -0.00562718  0.01377350 -0.4086   0.68289    
black         0.00775924  0.02095227  0.3703   0.71115    
othrace       0.14841671  0.07844127  1.8921   0.05854 .  
factor(dep)1  0.00092668  0.02141319  0.0433   0.96548    
factor(dep)2  0.01092135  0.01853403  0.5893   0.55571    
q2           -0.04474773  0.06202359 -0.7215   0.47066    
q3           -0.04556136  0.06188587 -0.7362   0.46163    
q4           -0.02424388  0.06192073 -0.3915   0.69542    
q5           -0.03375491  0.06165398 -0.5475   0.58407    
q6           -0.11270865  0.06581651 -1.7125   0.08687 .  
agelt35       0.03295492  0.01461287  2.2552   0.02416 *  
agegt54       0.02283078  0.02340103  0.9756   0.32929    
durable      -0.01653621  0.01904092 -0.8685   0.38519    
lusd          0.02977021  0.01620375  1.8372   0.06623 .  
husd         -0.00394010  0.01

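Some coefficients are significant (agelt35 at the 5% level; othrace, q6 and lusd at the 10% level), which suggests that, even though this is a randomized experiment, the covariates may not be perfectly balanced between treatment and control. With this many individual tests, however, a few rejections are expected by chance alone, so a joint test of all slope coefficients is a useful complement.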
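A joint test that all slope coefficients are zero complements the coefficient-by-coefficient t-tests above. The sketch below uses made-up data with hypothetical covariates `age`, `female` and `dep`. It computes the classical F-test with `anova()` and a heteroskedasticity-robust Wald test built from a hand-computed HC1 covariance (mirroring the commented-out `vcovHC` line in the lab). With the classical covariance, the Wald statistic divided by the number of restrictions reproduces the F statistic, which serves as a check on the construction.

```r
# Hypothetical balance-check data: D is randomized independently of the covariates
set.seed(2)
n <- 1000
dat <- data.frame(
  age    = rnorm(n),
  female = rbinom(n, 1, 0.5),
  dep    = factor(sample(0:2, n, replace = TRUE))
)
dat$D <- rbinom(n, 1, 0.35)

full       <- lm(D ~ age + female + dep, data = dat)  # treatment on covariates
restricted <- lm(D ~ 1, data = dat)                   # intercept only

# Classical joint F-test that all slope coefficients are zero
f_tab  <- anova(restricted, full)
F_stat <- f_tab$F[2]

# Wald statistic b' V^{-1} b for a coefficient vector b with covariance V
wald_stat <- function(b, V) drop(crossprod(b, solve(V, b)))

X <- model.matrix(full)
e <- resid(full)
k <- ncol(X)
q <- k - 1                       # number of slope restrictions
b <- coef(full)[-1]
bread <- solve(crossprod(X))

# Classical covariance (Wald / q equals the F statistic) and HC1 covariance
V_classic <- vcov(full)[-1, -1]
V_hc1 <- (bread %*% crossprod(X * e) %*% bread * n / (n - k))[-1, -1]

robust_W <- wald_stat(b, V_hc1)
robust_p <- pchisq(robust_W, df = q, lower.tail = FALSE)
c(F_stat = F_stat, F_p = f_tab[["Pr(>F)"]][2], robust_W = robust_W, robust_p = robust_p)
```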

# Model Specification

Consider the following model specifications:

In [None]:
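# model specifications


# no adjustment (2-sample approach)
formula_cl <- log(inuidur1)~T4

# adding controls and all their pairwise interactions (the ^2 operator);
# the controls are not interacted with T4
formula_cra <- log(inuidur1)~T4+ (female+black+othrace+factor(dep)+q2+q3+q4+q5+q6+agelt35+agegt54+durable+lusd+husd)^2
# Omitted dummies (they serve as reference categories): q1, nondurable, muld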
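In R model formulas, `(a + b + c)^2` expands to all main effects plus all pairwise interactions; it does not add squared terms. The sketch below, with placeholder names `y`, `d`, `x1`, `x2`, `x3`, shows how to inspect such an expansion with `terms()`. It also illustrates a side effect relevant to `formula_cra`: interactions of mutually exclusive dummies (for instance two quarter indicators) are identically zero columns, and `lm()` reports `NA` for such aliased coefficients. The toy data are made up.

```r
# Placeholder variable names; terms() expands the formula without any data
f <- y ~ d + (x1 + x2 + x3)^2
term_labels <- attr(terms(f), "term.labels")
print(term_labels)  # main effects d, x1, x2, x3 plus x1:x2, x1:x3, x2:x3 (no d:x terms)

# Mutually exclusive dummies: their interaction column is identically zero
toy <- data.frame(q2 = c(1, 0, 0, 0, 0),
                  q3 = c(0, 1, 1, 0, 0),
                  y  = c(1.2, 0.7, 0.9, 2.1, 1.5))
mm <- model.matrix(y ~ (q2 + q3)^2, data = toy)
print(colnames(mm))

# lm() detects the aliased column and returns NA for its coefficient
fit_toy <- lm(y ~ (q2 + q3)^2, data = toy)
coef(fit_toy)
```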

**Exercise 3:** Fit OLS regressions for both models and perform statistical tests of the estimated treatment effect using the function *coeftest* from the package *lmtest*. Compare the standard errors.

In [None]:
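ols.cl <- lm(formula_cl)
ols.cra <- lm(formula_cra)

# HC1 (heteroskedasticity-robust) coefficient tests; note that each fitted
# model object is overwritten by its coefficient table
ols.cl = coeftest(ols.cl, vcov = vcovHC(ols.cl, type="HC1"))
ols.cra = coeftest(ols.cra, vcov = vcovHC(ols.cra, type="HC1"))

# row 2 of each table is the coefficient on T4
print(ols.cl[2,])
print(ols.cra[2,])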

   Estimate  Std. Error     t value    Pr(>|t|) 
-0.08545541  0.03585569 -2.38331509  0.01719391 
   Estimate  Std. Error     t value    Pr(>|t|) 
-0.07968012  0.03559092 -2.23877661  0.02521432 


Next, consider the interactive specification, which corresponds to the approach introduced in Lin (2013).
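One way to see what the interactive specification does: with demeaned covariates, the coefficient on $D$ in the fully interacted regression equals the sample average of the difference between predictions from a regression fitted on treated units only and one fitted on control units only. The sketch below checks this identity on simulated data; the covariate names `w1`, `w2` and all coefficient values are made up.

```r
set.seed(7)
n <- 800
w1 <- rnorm(n)
w2 <- rbinom(n, 1, 0.4)
D <- rbinom(n, 1, 0.35)
# hypothetical outcome whose treatment effect varies with w1
Y <- 1 + 0.3 * D + 0.5 * w1 - 0.2 * w2 + 0.4 * D * w1 + rnorm(n)

# demean the covariates over the full sample
dat <- data.frame(Y, D, w1 = w1 - mean(w1), w2 = w2 - mean(w2))

# interactive regression adjustment: fully interacted OLS
ira <- lm(Y ~ D * (w1 + w2), data = dat)
ate_ira <- unname(coef(ira)["D"])

# separate regressions in each arm, then average the predicted differences
fit1 <- lm(Y ~ w1 + w2, data = dat, subset = D == 1)
fit0 <- lm(Y ~ w1 + w2, data = dat, subset = D == 0)
ate_sep <- mean(predict(fit1, newdata = dat) - predict(fit0, newdata = dat))

c(ira = ate_ira, separate = ate_sep)
```

The identity holds because the fully interacted design spans the same column space as the two arm-specific regressions, and demeaning makes the average of the covariate terms vanish.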

In [None]:
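# interactive regression model

# covariates and all their pairwise interactions, dropping the intercept column
X = model.matrix(~ (female+black+othrace+factor(dep)+q2+q3+q4+q5+q6+agelt35+agegt54+durable+lusd+husd)^2)[,-1]
dim(X)
demean <- function(x){ x - mean(x) }
# demean every column of X so that the coefficient on T4 estimates the ATE
X = apply(X, 2, demean)
# demeaning T4 too only shifts the intercept and the main effects of X;
# the coefficients on T4 and on the T4:X interactions are unchanged
T4 = demean(T4)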
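The cell above demeans `T4` as well as `X`, whereas the model text only demeans $W$. A quick sketch on made-up data shows that this is harmless for the quantities of interest. Replacing $D$ by $D - \bar D$ is a reparametrization that leaves the coefficient on the treatment and the interaction coefficients unchanged; it only moves $\bar D$ times the interaction coefficients into the main effects of the covariates (and shifts the intercept).

```r
set.seed(11)
n <- 600
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.5)
D <- rbinom(n, 1, 0.35)
Y <- 2 - 0.1 * D + 0.3 * x1 + 0.2 * D * x2 + rnorm(n)

Xc <- cbind(x1 = x1 - mean(x1), x2 = x2 - mean(x2))  # demeaned covariates
Dc <- D - mean(D)                                     # demeaned treatment

# coefficients in both fits: intercept, treatment, two Xc main effects,
# two treatment-by-Xc interactions
b_raw <- unname(coef(lm(Y ~ D * Xc)))
b_dm  <- unname(coef(lm(Y ~ Dc * Xc)))

# treatment and interaction coefficients coincide
rbind(raw = b_raw[c(2, 5, 6)], demeaned = b_dm[c(2, 5, 6)])
```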


**Exercise 4:** Regress $Y$ on $T4$, $X$ and their interactions $T4\cdot X$ using OLS to fit the interactive specification, and have a look at the estimated coefficients.

In [None]:
ols.ira = lm(log(inuidur1) ~ T4*X)
ols.ira= coeftest(ols.ira, vcov = vcovHC(ols.ira, type="HC1"))
print(ols.ira)


t test of coefficients:

                            Estimate  Std. Error  t value  Pr(>|t|)    
(Intercept)               2.03177499  0.01687682 120.3885 < 2.2e-16 ***
T4                       -0.07550055  0.03560489  -2.1205 0.0340132 *  
Xfemale                  -0.12942734  0.31860760  -0.4062 0.6845928    
Xblack                   -0.92100431  0.24598526  -3.7441 0.0001831 ***
Xothrace                 -3.29328494  0.63054001  -5.2230 1.834e-07 ***
Xfactor(dep)1            -0.76246913  0.50426996  -1.5120 0.1305919    
Xfactor(dep)2             0.01931125  0.35130594   0.0550 0.9561647    
Xq2                      -0.13711567  0.39125789  -0.3504 0.7260173    
Xq3                      -0.52117189  0.39083482  -1.3335 0.1824351    
Xq4                      -0.41958694  0.39156262  -1.0716 0.2839658    
Xq5                      -0.33617506  0.38967611  -0.8627 0.3883426    
Xq6                      -0.47296422  0.39510320  -1.1971 0.2313392    
Xagelt35                 -0.61482757  

## Results

**Exercise 5:**

Summarize your findings.

In [None]:
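library(xtable)
# 2 x 3 table: rows = T4 estimate and its HC1 standard error (row 2,
# columns 1 and 2 of each coeftest table); columns = CL, CRA, IRA
table<- matrix(0, 2, 3)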
table[1,1]<-  ols.cl[2,1]
table[1,2]<-  ols.cra[2,1]
table[1,3]<-  ols.ira[2,1]

table[2,1]<-  ols.cl[2,2]
table[2,2]<-  ols.cra[2,2]
table[2,3]<-  ols.ira[2,2]


colnames(table)<- c("CL","CRA","IRA")
rownames(table)<- c("estimate", "standard error")
tab<- xtable(table, digits=5)
tab

#print(tab, type="latex", digits=5)

| | CL | CRA | IRA |
|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` |
| estimate | -0.08545541 | -0.07968012 | -0.07550055 |
| standard error | 0.03585569 | 0.03559092 | 0.03560489 |


Treatment group 4 experiences an average decrease of about $7.8\%$ in the length of the unemployment spell.
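Because the outcome is log duration, a coefficient $b$ corresponds to a change of roughly $100\,b$ percent, or exactly $100\,(e^{b}-1)$ percent. The sketch below converts the three estimates from the results table, together with normal-approximation 95% confidence intervals (estimate $\pm$ 1.96 HC1 standard errors, a construction added here), into percentage changes. The numbers are copied from the table; nothing is re-estimated.

```r
# Estimates and HC1 standard errors copied from the results table above
est <- c(CL = -0.08545541, CRA = -0.07968012, IRA = -0.07550055)
se  <- c(CL =  0.03585569, CRA =  0.03559092, IRA =  0.03560489)

z <- qnorm(0.975)                  # about 1.96
bounds <- cbind(estimate = est, lower = est - z * se, upper = est + z * se)

# exact percentage change implied by a log-point coefficient b: 100 * (exp(b) - 1)
pct <- 100 * (exp(bounds) - 1)
round(pct, 1)
```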


Observe that the regression estimators deliver estimates that are slightly more efficient (lower standard errors) than the simple two-sample difference-in-means estimator, but all three methods have very similar standard errors. The IRA results also suggest that there might be heterogeneity in the effect. Finally, the regression estimators yield slightly smaller estimates in absolute value; these differences are perhaps due to minor imbalance in the treatment allocation, which the regression estimators try to correct.
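Whether the effect is heterogeneous can be checked formally by testing that all interaction coefficients $\alpha_2$ are zero, that is, by comparing an additive (CRA-type) fit with an interactive (IRA-type) fit. A minimal sketch on simulated data using the classical F-test from `anova()`; the covariates `w1`, `w2` are placeholders, and with the lab's heteroskedastic data a robust Wald test of the same restrictions would be the more natural choice.

```r
set.seed(5)
n <- 1500
w1 <- rnorm(n)
w2 <- rbinom(n, 1, 0.4)
D <- rbinom(n, 1, 0.35)
# hypothetical outcome whose treatment effect varies with w1
Y <- 2 - 0.08 * D + 0.3 * w1 + 0.1 * w2 + 0.25 * D * w1 + rnorm(n)

additive    <- lm(Y ~ D + w1 + w2)    # CRA-type: common treatment effect
interactive <- lm(Y ~ D * (w1 + w2))  # IRA-type: effect varies with w1, w2

# H0: both D:w interaction coefficients are zero (no heterogeneity in w1, w2)
het_test <- anova(additive, interactive)
het_test
```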


