Eric Boxer  
UNI ecb2198  
#### HW 1

(1) What is a causal state? What does it mean for a causal state to be finely articulated?

A causal state in an experiment is the treatment, or lack of treatment (indicating a control), assigned to participants in the experiment. In an observational study, a causal state is the treatment, or lack of treatment, selected by members of the population under observation. More generally, a causal state defines the conditions for a population of interest.  These conditions have the potential to affect one or more measurable outcomes.  
Finely articulated causal states are those conditions which can be captured in a single measurement. For example, in a study of the effect of college enrollment on starting salary, causal states could be finely articulated as "enrolled in college" and "did not enroll in college". A similar study with poor articulation could aim to assess the effect of "college" on starting salary. With this more general objective the causal states are unclear: would it be appropriate to differentiate between college dropouts and graduates, public and private college attendees, or students who go on to further higher education?

(2) Why can’t we measure individual level treatment effects, $Y^1 - Y^0$, in practice?

At the individual level, only one of $Y_i^0$ **or** $Y_i^1$ is measurable for individual $i$. For well-defined states, observing an outcome in one causal state excludes the possibility of measuring that individual's outcome in another causal state.  
The impossibility of measuring individual outcomes in multiple causal states gives rise to the problem of how to move beyond $\delta_{Naive} = \mathbb{E}[Y^1 | D = 1] - \mathbb{E}[Y^0 | D = 0]$ to the Average Treatment Effect, or otherwise to understand the causal relationships within observational data.
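
A minimal sketch of this constraint, using simulated data in which both potential outcomes are known by construction (something that is never available in practice). The names `y0`, `y1`, `d`, and `y_obs`, the seed, and the recovery probabilities are illustrative assumptions rather than part of the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
N = 100_000

# Hypothetical potential outcomes for every unit (never jointly observable in practice).
y0 = rng.binomial(1, 0.25, size=N)  # outcome if untreated
y1 = rng.binomial(1, 0.75, size=N)  # outcome if treated

# Treatment assigned independently of the potential outcomes.
d = rng.binomial(1, 0.5, size=N)

# Switching equation: each unit reveals only one of its two potential outcomes.
y_obs = d * y1 + (1 - d) * y0

# The individual effects y1 - y0 exist in the simulation,
# but they cannot be recovered from the observed pair (d, y_obs).
true_ate = (y1 - y0).mean()
naive = y_obs[d == 1].mean() - y_obs[d == 0].mean()
print('true ATE = {:.3f}, naive estimate = {:.3f}'.format(true_ate, naive))
```

Because assignment here is independent of the potential outcomes, the naive estimate should land close to the true average effect even though no individual effect is observed.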

(3) Do you think SUTVA might be violated (why or why not?) for the causal effect of X on Y
when:  
(a) X = “Having a college degree” and Y = “Future earnings”  
(b) X = “Vaccinating individual i in a local population” and Y = “person i’s chances of
getting sick abroad”

(a) Yes, SUTVA might be violated. Dilution of the positive salary effects of having a college degree could result from a greater number of job-seekers with degrees. We can also imagine a study being performed on a very large scale, in which the population of one city is incentivized to get a degree whereas a control city is left alone. There could then be a concentration effect in which the treatment city with its educated workforce develops into an attractive location for high-paying businesses and wages rise for all residents, with or without a college degree.  
(b) No, SUTVA would not be violated. Because the outcome is sickness abroad, herd immunity is not a concern; it would violate SUTVA if the outcome were sickness within the local population. When traveling, whether individual i was vaccinated would not, in general, affect the likelihood that other subjects contract a sickness.

(4) Use the notebook posted for this assignment from class to measure the ATE, ATC and ATT
in the population when the naive estimator is unbiased.

When $\delta_{Naive}$ is unbiased, $\mathbb{E}[Y^1 | D = 1] - \mathbb{E}[Y^0 | D = 0] = \mathbb{E}[\delta] = ATE$  
The naive estimator equals the ATE when both sources of bias (baseline bias and differential treatment effect bias) are absent. The naive estimator is unbiased if two assumptions hold:  
A1 $\mathbb{E}[Y^1 | D = 1] = \mathbb{E}[Y^1 | D = 0]$, and  
A2 $\mathbb{E}[Y^0 | D = 1] = \mathbb{E}[Y^0 | D = 0]$  
These equalities hold if the expected potential outcomes $Y^1$ and $Y^0$ are the same in the treatment group as in the control group. This is reasonable if the treatment is independent of $Y^1, Y^0$, as in a randomized trial.  
If the naive estimator is unbiased, not only is it equal to the ATE, but also (by substitution using A2 and A1 respectively) to the ATT $\mathbb{E}[Y^1 | D = 1] - \mathbb{E}[Y^0 | D = 1]$ and the ATC $\mathbb{E}[Y^1 | D = 0] - \mathbb{E}[Y^0 | D = 0]$, since $Y^1, Y^0$ are independent of $D$.
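
The substitution can be written out step by step. Here $\pi = P(D = 1)$ denotes the share of treated units, and both A1 and A2 are assumed to hold:

$$
\begin{aligned}
\delta_{Naive} &= \mathbb{E}[Y^1 | D = 1] - \mathbb{E}[Y^0 | D = 0] \\
&= \mathbb{E}[Y^1 | D = 1] - \mathbb{E}[Y^0 | D = 1] = ATT \quad \text{(by A2)} \\
\delta_{Naive} &= \mathbb{E}[Y^1 | D = 0] - \mathbb{E}[Y^0 | D = 0] = ATC \quad \text{(by A1)} \\
ATE &= \pi \, ATT + (1 - \pi) \, ATC = \pi \, \delta_{Naive} + (1 - \pi) \, \delta_{Naive} = \delta_{Naive}
\end{aligned}
$$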

In [1]:
import numpy as np
import pandas as pd

In [2]:
N = 1000000
n = 1

p_treated = 0.5
treatment = np.random.binomial(n, p_treated, size=N)

p_recovery = (0.5 + treatment) / 2. # = 0.25 + 0.5 * treatment 
# If treatment_i = 1, p_recovery_i = .75 and if treatment_i = 0, p_recovery_i = .25
recovery = np.random.binomial(n, p_recovery)

X = pd.DataFrame({'treatment': treatment, 
                  'recovery': recovery})[['treatment', 'recovery']]
X.head()

|   | treatment | recovery |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 0 | 0 |
| 2 | 1 | 1 |
| 3 | 0 | 0 |
| 4 | 0 | 0 |


In [3]:
X.groupby('treatment').mean()

| treatment | recovery |
|---|---|
| 0 | 0.249309 |
| 1 | 0.749671 |


In [4]:
# Naive estimate: difference in mean recovery between the treated and control groups.
group_means = X.groupby('treatment')['recovery'].mean()
ATE = group_means.loc[1] - group_means.loc[0]
print('ATE = {}'.format(ATE))

ATE = 0.5003626896857405


In [5]:
print('ATT = ATC = {}'.format(ATE))

ATT = ATC = 0.5003626896857405


In this example from ATE_demo.ipynb the naive estimator is taken to be unbiased because treatment was randomly assigned with np.random.binomial(), independently of the potential outcomes.

(5) Let’s explore bias with simulated data:  
(a)
Copy and modify the data generating process to introduce bias in the ATT and ATE, but
leave the ATC unbiased (hint: define the [usually unmeasured] $Y^0$ and $Y^1$ for each unit,
then examine the assumptions A1 and A2).  
(b)
Use naive estimators to measure the (potentially biased) ATE, ATC and ATT.  
(c)
Which estimates are biased? Is the bias baseline bias, differential treatment effect bias,
or both?

To bias the ATT and ATE while leaving the ATC unbiased, we need $Y^1$ to be independent of $D$ (so A1 holds) and $Y^0$ to be dependent on $D$ (so A2 fails).
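
As a sketch of what this requirement looks like with explicit potential outcomes, following the hint in the question: the names `y0`, `y1`, `d`, the seed, and the probabilities 0.75 and 0.25 are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, for reproducibility only
N = 200_000

d = rng.binomial(1, 0.5, size=N)

# Y^1 is drawn with the same probability for everyone, so A1 holds.
y1 = rng.binomial(1, 0.75, size=N)

# Y^0 depends on D (0.75 for treated units, 0.25 for controls), so A2 fails.
y0 = rng.binomial(1, np.where(d == 1, 0.75, 0.25))

# Gaps in mean potential outcomes between the treated and control groups.
a1_gap = y1[d == 1].mean() - y1[d == 0].mean()
a2_gap = y0[d == 1].mean() - y0[d == 0].mean()
print('A1 gap = {:.3f}, A2 gap = {:.3f}'.format(a1_gap, a2_gap))
```

An A1 gap near zero together with a clearly non-zero A2 gap is the pattern that leaves the naive estimator unbiased for the ATC only.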

In [7]:
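N = 1000000
n = 1  # one Bernoulli trial per unit

p_treated = 0.5
treatment = np.random.binomial(n, p_treated, size=N)

# Observed outcome: P(recovery) = 0.75 if treated, 0.25 if not.
# This is Y^1 for treated units and Y^0 for control units.
p_recovery = 0.75 - (1 - treatment) / 2.
recovery = np.random.binomial(n, p_recovery, size=N)

# Counterfactual outcome: P(recovery) = 0.75 for everyone.
# This is Y^0 for treated units and Y^1 for control units.
p_recovery_cf = 0.75
recovery_cf = np.random.binomial(n, p_recovery_cf, size=N)

X = pd.DataFrame({'treatment': treatment, 
                  'recovery': recovery,
                  'recovery_cf': recovery_cf})[['treatment', 'recovery', 'recovery_cf']]
X.head()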

|   | treatment | recovery | recovery_cf |
|---|---|---|---|
| 0 | 1 | 1 | 1 |
| 1 | 1 | 0 | 1 |
| 2 | 1 | 1 | 0 |
| 3 | 0 | 1 | 1 |
| 4 | 1 | 1 | 0 |


In [8]:
X.groupby('treatment').mean()

| treatment | recovery | recovery_cf |
|---|---|---|
| 0 | 0.250069 | 0.748357 |
| 1 | 0.749597 | 0.750245 |


$\mathbb{E}[Y^1|D = 1]$

In [9]:
X[X['treatment'] == 1]['recovery'].mean()

0.7495969854914777

$\mathbb{E}[Y^1|D = 0]$

In [10]:
X[X['treatment'] == 0]['recovery_cf'].mean()

0.7483570591458707

$\mathbb{E}[Y^0|D = 1]$

In [11]:
X[X['treatment'] == 1]['recovery_cf'].mean()

0.7502450088203175

$\mathbb{E}[Y^0|D = 0]$

In [12]:
X[X['treatment'] == 0]['recovery'].mean()

0.2500689975160894

Assumptions:  
A1 $\mathbb{E}[Y^1 | D = 1] = \mathbb{E}[Y^1 | D = 0]$, and  
A2 $\mathbb{E}[Y^0 | D = 1] = \mathbb{E}[Y^0 | D = 0]$  

In [13]:
A1 = abs(X[X['treatment'] == 1]['recovery'].mean() - X[X['treatment'] == 0]['recovery_cf'].mean()) < 0.01
print('A1 is {}'.format(A1))

A1 is True


In [14]:
A2 = abs(X[X['treatment'] == 1]['recovery_cf'].mean() - X[X['treatment'] == 0]['recovery'].mean()) < 0.01
print('A2 is {}'.format(A2))

A2 is False


In [15]:
delta = X[X['treatment'] == 1]['recovery'].mean() - X[X['treatment'] == 0]['recovery'].mean()
print('delta = {}'.format(delta))

delta = 0.4995279879753883


In [16]:
ATE = p_treated * (X[X['treatment'] == 1]['recovery'].mean() - X[X['treatment'] == 1]['recovery_cf'].mean()) + (1 - p_treated) * (X[X['treatment'] == 0]['recovery_cf'].mean() - X[X['treatment'] == 0]['recovery'].mean())
print('ATE = {}'.format(ATE))

ATE = 0.24882001915047072


In [17]:
ATT = X[X['treatment'] == 1]['recovery'].mean() - X[X['treatment'] == 1]['recovery_cf'].mean()
print('ATT = {}'.format(ATT))

ATT = -0.0006480233288398418


In [18]:
ATC = X[X['treatment'] == 0]['recovery_cf'].mean() - X[X['treatment'] == 0]['recovery'].mean()
print('ATC = {}'.format(ATC))

ATC = 0.4982880616297813


(a)(b) $\delta_{Naive} \approx ATC \approx 0.50$, but $\delta_{Naive} \ne ATE \approx 0.25$ and $\delta_{Naive} \ne ATT \approx 0$.  
Assumption A1 is true but A2 is false, which was to be expected because $Y^1$ is independent of treatment but $Y^0$ is dependent.

(c) $\delta_{Naive}$ is biased for the $ATE$ and $ATT$ but unbiased for the $ATC$.

Baseline Bias $= \mathbb{E}[Y^0 | D = 1] - \mathbb{E}[Y^0 | D = 0]$  
Since Assumption A2 is false, there must be some amount of baseline bias.

In [19]:
BB = X[X['treatment'] == 1]['recovery_cf'].mean() - X[X['treatment'] == 0]['recovery'].mean()
print('Baseline Bias = {}'.format(BB))

Baseline Bias = 0.5001760113042282


Differential Treatment Bias $= (1 - \pi)\{\mathbb{E}[\delta | D = 1] - \mathbb{E}[\delta | D = 0]\}$

In [20]:
DTB = (1 - p_treated) * (ATT - ATC)
print('Differential Treatment Bias = {}'.format(DTB))

Differential Treatment Bias = -0.24946804247931056


$\delta_{Naive} = $ ATE + Baseline Bias + Differential Treatment Bias
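
The same identity can be checked by hand using the population values implied by the data generating process, namely $\pi = 0.5$, $\mathbb{E}[Y^1 | D = 1] = \mathbb{E}[Y^1 | D = 0] = \mathbb{E}[Y^0 | D = 1] = 0.75$ and $\mathbb{E}[Y^0 | D = 0] = 0.25$:

$$
\begin{aligned}
ATT &= 0.75 - 0.75 = 0, \qquad ATC = 0.75 - 0.25 = 0.5 \\
ATE &= \pi \, ATT + (1 - \pi) \, ATC = 0.5 \cdot 0 + 0.5 \cdot 0.5 = 0.25 \\
\text{Baseline Bias} &= \mathbb{E}[Y^0 | D = 1] - \mathbb{E}[Y^0 | D = 0] = 0.75 - 0.25 = 0.5 \\
\text{Differential Treatment Bias} &= (1 - \pi)(ATT - ATC) = 0.5 \cdot (0 - 0.5) = -0.25 \\
\delta_{Naive} &= 0.25 + 0.5 - 0.25 = 0.5 = \mathbb{E}[Y^1 | D = 1] - \mathbb{E}[Y^0 | D = 0]
\end{aligned}
$$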

In [21]:
abs(delta - ATE - BB - DTB) < 0.01

True

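There is both Baseline Bias (about 0.50) and Differential Treatment Bias (about -0.25). Together they account for the gap between $\delta_{Naive}$ (about 0.50) and the ATE (about 0.25).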
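
The decomposition can also be wrapped in a small helper that works on explicit potential outcomes. This is a sketch for simulated data only: the function name `decompose_naive` and the four-unit example are hypothetical, and the treated share is taken from the sample rather than from `p_treated`.

```python
import numpy as np

def decompose_naive(y0, y1, d):
    """Split the naive estimator into ATE, baseline bias, and differential treatment bias.

    Requires both potential outcomes for every unit, so it applies only to simulated data.
    """
    y0, y1, d = np.asarray(y0, float), np.asarray(y1, float), np.asarray(d)
    treated, control = d == 1, d == 0
    pi = treated.mean()  # sample share of treated units

    att = y1[treated].mean() - y0[treated].mean()
    atc = y1[control].mean() - y0[control].mean()

    return {
        'naive': y1[treated].mean() - y0[control].mean(),
        'ATE': pi * att + (1 - pi) * atc,
        'ATT': att,
        'ATC': atc,
        'baseline_bias': y0[treated].mean() - y0[control].mean(),
        'differential_treatment_bias': (1 - pi) * (att - atc),
    }

# Tiny hand-checkable example (values made up for illustration).
d = np.array([1, 1, 0, 0])
y1 = np.array([1, 0, 1, 0])
y0 = np.array([1, 0, 0, 0])
parts = decompose_naive(y0, y1, d)
print(parts)
```

With the treated share computed from the sample, the identity naive = ATE + baseline bias + differential treatment bias holds up to floating-point error, so no tolerance like 0.01 is needed for the check.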