In [1]:
library(data.table)
library(AER)
library(stargazer)

Loading required package: car
Loading required package: lmtest
Loading required package: zoo

Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

    as.Date, as.Date.numeric

Loading required package: sandwich
Loading required package: survival

Please cite as: 

 Hlavac, Marek (2015). stargazer: Well-Formatted Regression and Summary Statistics Tables.
 R package version 5.2. http://CRAN.R-project.org/package=stargazer 



# Estimate Section 5.8 
On page 157 of _Field Experiments_, Gerber and Green walk through estimating and reporting the complier average causal effect (CACE) and the intent-to-treat effect (ITT) for an experiment that they conducted in New Haven. We would like you to produce these estimates as well. 

The next line of code loads their replication data. 

In [2]:
d <- fread("http://hdl.handle.net/10079/70rxwqn")

From these data, we keep only one-person households that were either pure controls or canvass-only households. 

> *Yes, you're right:* we are essentially limiting our inference to a very small subset of the US population, namely people who live by themselves. 

In [5]:
d <- d[onetreat == 1 & mailings == 0 & phongotv == 0 & persons == 1, ]

This data set has 26 variables, most of which we will not use in this notebook. Most of the variables have human-readable names, but we'll rename a few. 

In [9]:
setnames(d, old = c('v98', 'persngrp', 'cntany'), new = c('VOTED', 'ASSIGNED', 'TREATED'))

In [10]:
str(d)

Classes ‘data.table’ and 'data.frame':	7090 obs. of  26 variables:
 $ V1        : chr  "3" "4" "10" "15" ...
 $ id1       : int  14210 6818 5039 20246 399 4152 20242 13269 21206 13117 ...
 $ persons   : int  1 1 1 1 1 1 1 1 1 1 ...
 $ v98_1     : int  1 0 0 0 0 0 0 0 0 1 ...
 $ v98_2     : int  1 0 0 0 0 0 0 0 0 1 ...
 $ ASSIGNED  : int  1 1 1 1 1 1 1 1 1 1 ...
 $ mailings  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ phongotv  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ ward      : int  15 25 4 17 17 26 17 15 25 4 ...
 $ majpty1   : int  0 0 1 1 0 1 0 1 0 1 ...
 $ majpty2   : int  0 0 1 1 0 1 0 1 0 1 ...
 $ age1      : int  54 45 72 28 30 30 59 27 25 82 ...
 $ age2      : int  54 45 72 28 30 30 59 27 25 82 ...
 $ placebo   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ vote98    : int  1 0 0 0 0 0 0 0 0 1 ...
 $ VOTED     : int  1 0 0 0 0 0 0 0 0 1 ...
 $ TREATED   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ pcntany   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ v96_1     : int  1 0 1 1 0 0 0 0 1 1 ...
 $ v96_0     : int  0 0 0 0 0 0 1 1 

Here is the minimum metadata that we think you need to conduct this work: 
- The outcome variable, `VOTED`, in the New Haven voter mobilization experiments is scored 1 if the subject voted and 0 otherwise. 
- The assigned treatment is called `ASSIGNED`, and is scored 1 if the subject was assigned to receive a visit from a canvasser and 0 otherwise. 
- The measurement for compliance is called `TREATED`, and is scored 1 if the subject received any contact from the canvassers and 0 otherwise. 

A quick cross-tab of `ASSIGNED` and `TREATED` shows that there was one-sided non-compliance: nobody assigned to control was contacted, but many subjects assigned to treatment were never reached. 

In [11]:
d[ , table(ASSIGNED, TREATED)]

        TREATED
ASSIGNED    0    1
       0 5645    0
       1 1050  395

# 1. Reproduce a Table 

Can you reproduce Table 5.2, which is reported on page 150? 
- Don't worry about formatting the table to be pretty/handsome (or even just tidy). 
- Can you, using conditional statements and slices of the data, produce the values that are reported in this table? 

We will do the first one for you. 

In [12]:
d[TREATED==1, .(turnout_rate = mean(VOTED), 
                number_contacted = .N)]

turnout_rate,number_contacted
0.5443038,395
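
The same slicing pattern extends to the other cells of a table like this one. The sketch below uses a tiny made-up data set (`toy` and its values are invented for illustration and are not the New Haven data) to show a conditional slice in `i`, a summary in `j`, and a grouped version with `by`.

```r
library(data.table)

# Toy data, invented for illustration only (not the New Haven data).
toy <- data.table(
  ASSIGNED = c(0, 0, 0, 0, 1, 1, 1, 1),
  TREATED  = c(0, 0, 0, 0, 0, 0, 1, 1),
  VOTED    = c(0, 1, 0, 1, 0, 1, 1, 1)
)

# Slice with a condition in `i`, summarise in `j`.
toy[ASSIGNED == 1 & TREATED == 0,
    .(turnout_rate = mean(VOTED), n = .N)]

# Or compute every assignment-by-contact cell at once by grouping.
cells <- toy[, .(turnout_rate = mean(VOTED), n = .N),
             by = .(ASSIGNED, TREATED)]
cells
```

Grouping with `by` is convenient when you want all cells together; separate conditional slices are closer to how the table in the book is laid out.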


# 2. Estimate the ITT, $\alpha$, and CACE using means 
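
As a reminder of what each quantity is, here is one way to write the means-based estimators. The symbols $Z$ (for `ASSIGNED`), $D$ (for `TREATED`), and $Y$ (for `VOTED`) are shorthand introduced here, not notation taken from the notebook; $\bar{Y}_{Z=1}$ denotes the sample mean of $Y$ among subjects with $Z = 1$.

$$
\widehat{ITT} = \bar{Y}_{Z=1} - \bar{Y}_{Z=0}, \qquad
\hat{\alpha} = \bar{D}_{Z=1} - \bar{D}_{Z=0}, \qquad
\widehat{CACE} = \frac{\widehat{ITT}}{\hat{\alpha}}
$$

With one-sided non-compliance, $\bar{D}_{Z=0} = 0$, so $\hat{\alpha}$ reduces to the share of the assigned-to-treatment group that was actually contacted.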

In [13]:
ITT   <- NA_real_  # replace with your estimate of the intent-to-treat effect
alpha <- NA_real_  # replace with your estimate of the contact (compliance) rate
CACE  <- ITT / alpha


# 3. Produce the CACE using two least-squares models
1. Estimate one model `first` that is the first-stage regression in a 2SLS. 
2. Estimate one model `second` that is the second-stage regression in a 2SLS. 

(*Hint*: You shouldn't need it, but Gerber and Green walk through this pretty clearly.) 
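
To make the mechanics concrete, here is a minimal sketch of the two-regression approach on simulated data. Everything in it (`toy`, the 30% compliance rate, the effect size) is an assumption made up for illustration, not the New Haven data. It also checks that, with a single binary instrument and no covariates, the second-stage slope equals the ratio of means.

```r
# Simulated data, invented for illustration only (not the New Haven data).
set.seed(1)
n        <- 2000
assigned <- rbinom(n, 1, 0.5)
complier <- rbinom(n, 1, 0.3)            # hypothetical 30% compliance
treated  <- assigned * complier          # one-sided non-compliance
voted    <- rbinom(n, 1, 0.4 + 0.1 * treated)
toy      <- data.frame(assigned, treated, voted)

# First stage: regress treatment received on treatment assigned.
first <- lm(treated ~ assigned, data = toy)

# Second stage: regress the outcome on the first-stage fitted values.
toy$treated_hat <- fitted(first)
second <- lm(voted ~ treated_hat, data = toy)

# Ratio-of-means estimate for comparison.
itt   <- with(toy, mean(voted[assigned == 1]) - mean(voted[assigned == 0]))
alpha <- with(toy, mean(treated[assigned == 1]) - mean(treated[assigned == 0]))

c(two_stage = unname(coef(second)["treated_hat"]), ratio = itt / alpha)
```

The point estimates agree, but the standard errors that `summary(second)` reports are not valid 2SLS standard errors, because the second regression treats the fitted values as if they were observed data.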

# 4. Produce the CACE using 2SLS 

Use the `ivreg` function from `AER` to estimate this CACE. 

1. Does this estimate match what you estimated when you did this yourself? 
2. Use the `confint` function on the model that you estimate. Does this confidence interval for the CACE include zero? Given what you see, would you conclude that being treated does, or does not, have an effect among compliers? 
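
For orientation, the call pattern looks roughly like the sketch below, again on simulated data (`toy` and all of its parameters are invented for illustration and are not the New Haven data). In the `ivreg` formula, the part before `|` is the structural equation and the part after `|` lists the instruments.

```r
library(AER)

# Simulated data, invented for illustration only (not the New Haven data).
set.seed(1)
n        <- 2000
assigned <- rbinom(n, 1, 0.5)
complier <- rbinom(n, 1, 0.3)            # hypothetical 30% compliance
treated  <- assigned * complier          # one-sided non-compliance
voted    <- rbinom(n, 1, 0.4 + 0.1 * treated)
toy      <- data.frame(assigned, treated, voted)

# 2SLS: outcome ~ endogenous regressor | instrument.
iv_fit <- ivreg(voted ~ treated | assigned, data = toy)

# Point estimate of the CACE and its 95% confidence interval.
coef(iv_fit)["treated"]
confint(iv_fit)

# Ratio-of-means estimate, which the 2SLS coefficient should reproduce.
wald <- with(toy,
  (mean(voted[assigned == 1])   - mean(voted[assigned == 0])) /
  (mean(treated[assigned == 1]) - mean(treated[assigned == 0])))
```

Unlike the two separate `lm` fits, `ivreg` computes standard errors that account for the first stage, so its confidence interval is the one to interpret.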