# Potential Outcomes Chapter

## Glossary
$Y^1$ - Potential outcome where treatment occurred  
$Y^0$ - Potential outcome where treatment did not occur  
$Y_i$ - Actual outcome, which is distinct from the potential outcomes  
$D_i$ - Equals 1 if treatment occurred, 0 if it did not  
$\delta_i$ - Causal effect, equal to $Y_i^1 - Y_i^0$  
$ATE$ - Average treatment effect, $E[\delta_i]$  
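
The symbols above are tied together by the standard switching equation, written here with the glossary's notation. The conditional averages $ATT$ and $ATU$ are not listed in the glossary but are used later in these notes, so their usual definitions are included as well:

$$
\begin{aligned}
Y_i &= D_i Y_i^1 + (1 - D_i) Y_i^0 \\
ATT &= E[\delta_i \mid D_i = 1] \\
ATU &= E[\delta_i \mid D_i = 0]
\end{aligned}
$$

When $D_i = 1$ the first line reduces to $Y_i = Y_i^1$, and when $D_i = 0$ it reduces to $Y_i = Y_i^0$, so only one potential outcome is ever observed for each unit.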
 



## Quotes
* One could argue the entire enterprise of causal inference is about developing a reasonable strategy for negating the role that the selection bias is playing in estimating causal effects

# Recreating 4.1.3 

In [14]:
import pandas as pd
from io import StringIO

In [15]:
tables = """
Patients	y1	y0	delta
1 	7 	1 	6
2 	5 	6 	-1
3 	5 	1 	4
4 	7 	8 	-1
5 	4 	2 	2
6 	10 	1 	9
7 	1 	10 	 -9
8 	5 	6 	-1
9 	3 	7 	-4
10 	9 	8 	1
"""

In [16]:
table = pd.read_csv(StringIO(tables), delimiter="\t")
table

Unnamed: 0,Patients,y1,y0,delta
0,1,7,1,6
1,2,5,6,-1
2,3,5,1,4
3,4,7,8,-1
4,5,4,2,2
5,6,10,1,9
6,7,1,10,-9
7,8,5,6,-1
8,9,3,7,-4
9,10,9,8,1


In [17]:
table.columns

Index(['Patients', 'y1', 'y0', 'delta'], dtype='object')

## Average treatment effect

In [18]:
expectation_y1 = table["y1"].mean()
expectation_y1

5.6

In [19]:
expectation_y0 = table["y0"].mean()
expectation_y0

5.0

In [20]:
ate = expectation_y1 - expectation_y0
ate

0.5999999999999996

## Perfect selection of ideal outcomes
The assumption is that the doctor is able to pick the better treatment for each patient every time.

Surgery is $D=1$ and chemo is $D=0$.

In [21]:
post_treatment_table_str = """
Patients	Y	D
1 	7 	1
2 	6 	0
3 	5 	1
4 	8 	0
5 	4 	1
6 	10 	1
7 	10 	0
8 	6 	0
9 	7 	0
10 	9 	1
"""

In [22]:
post_treatment_table = pd.read_csv(StringIO(post_treatment_table_str), delimiter="\t")
post_treatment_table

Unnamed: 0,Patients,Y,D
0,1,7,1
1,2,6,0
2,3,5,1
3,4,8,0
4,5,4,1
5,6,10,1
6,7,10,0
7,8,6,0
8,9,7,0
9,10,9,1
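
The observed table above does not have to be typed in by hand. As a minimal sketch, assuming the perfect doctor assigns surgery exactly when $Y^1 > Y^0$ (there are no ties in this data), $D$ and $Y$ can be derived from the potential outcomes. The names `potential`, `observed`, `d`, and `y` are illustrative and not part of the original notebook:

```python
import pandas as pd

# Potential outcomes for the ten patients, copied from the table in this notebook.
potential = pd.DataFrame({
    "Patients": range(1, 11),
    "y1": [7, 5, 5, 7, 4, 10, 1, 5, 3, 9],   # outcome under surgery
    "y0": [1, 6, 1, 8, 2, 1, 10, 6, 7, 8],   # outcome under chemo
})

# Assumed assignment rule: surgery whenever it gives the strictly better outcome.
d = (potential["y1"] > potential["y0"]).astype(int)

# The observed outcome is the potential outcome of whichever treatment was chosen.
y = d * potential["y1"] + (1 - d) * potential["y0"]

observed = pd.DataFrame({"Patients": potential["Patients"], "Y": y, "D": d})
print(observed)
```

Under this rule the derived `Y` and `D` columns match the hand-entered table row for row.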


Note that in this case all patients are treated in the medical sense, but in the causal inference sense only the surgery patients receive the treatment. The chemo patients are the untreated (control) group.

### Average treatment effect for treated group
For the treated patients, we compare the outcome they get with treatment against the outcome they would have gotten without it.

In [23]:
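# Both tables share the same row index, so the D mask selects the treated patients' potential outcomes
treated_outcomes = table[post_treatment_table["D"] == 1].mean()
att = treated_outcomes["y1"] - treated_outcomes["y0"]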
att

4.4

### Average treatment effect for untreated group

In [24]:
untreated_outcomes = table[post_treatment_table["D"] == 0].mean()
atu = untreated_outcomes["y1"] - untreated_outcomes["y0"]
atu

-3.2

### Simple difference in means

Also called the simple difference in outcomes (SDO). I'm not sure why the author switches terms halfway through.

In [25]:
# treated_outcomes and untreated_outcomes already hold column means, so no further .mean() is needed
sdo = treated_outcomes["y1"] - untreated_outcomes["y0"]
sdo

-0.40000000000000036

### Selection Bias

In [35]:
selection_bias = table[post_treatment_table["D"] == 1]["y0"].mean() - \
                  table[post_treatment_table["D"] == 0]["y0"].mean()

selection_bias

-4.800000000000001

### Heterogeneous treatment effect bias 

In [30]:
hteb = (1-.5)*(att - atu)  # .5 is the share of patients who were treated (5 of 10)
hteb

3.8000000000000003

#### Finalizing the algebra

In [36]:
sdo, ate + selection_bias + hteb

(-0.40000000000000036, -0.4000000000000008)
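
The check above is the decomposition of the simple difference in outcomes evaluated numerically. As a sketch of the identity it verifies, with $\pi$ denoting the share of patients who received surgery (a symbol assumed here; it corresponds to the hard-coded `.5` in the heterogeneous treatment effect bias cell):

$$
\begin{aligned}
\underbrace{E[Y^1 \mid D=1] - E[Y^0 \mid D=0]}_{SDO}
&= \underbrace{E[Y^1] - E[Y^0]}_{ATE}
+ \underbrace{E[Y^0 \mid D=1] - E[Y^0 \mid D=0]}_{\text{selection bias}}
+ \underbrace{(1 - \pi)(ATT - ATU)}_{\text{heterogeneous treatment effect bias}} \\
-0.4 &= 0.6 + (-4.8) + (1 - 0.5)\,(4.4 - (-3.2)) \\
&= 0.6 - 4.8 + 3.8
\end{aligned}
$$

The two values in the tuple differ only in their last digits because of floating-point rounding.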

## Rough Notes
So in this perfect universe we know that
* The people given surgery live 4.4 years longer than they would have with chemo (ATT = 4.4)
* The people given chemo live 3.2 years longer than they would have with surgery (ATU = -3.2)
  * Negative is good here
  * The ATU is negative because it measures the effect of surgery on lifespan, but in our case that is good: these patients got chemo instead, meaning they lived 3.2 years longer

However, if we directly compare the two groups, the chemo group appears to live longer than the surgery group (the SDO is -0.4), which, while true, is misleading
* "It’s biased because the individuals units were optimally sorting into their best treatment option, creating fundamental differences between treatment and control group that are a direct function of the potential outcomes themselves"

### Difference in outcomes is not always the same as ATE

From the difference in outcomes we need to decouple three things
* Average treatment effect (which is what we want)
* Selection bias
* Heterogeneous treatment effect bias
  * Accounts for the difference in the size of the groups
  * If we end up with a group where everybody is treated then the ATT equals the ATE and this term is zero
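
The three pieces listed above can be computed in one place. The following is a sketch rather than the book's code: `decompose_sdo` is a hypothetical helper, and it assumes both potential outcomes are known for every unit, which is only true in a toy example like this one:

```python
import pandas as pd

def decompose_sdo(y1, y0, d):
    """Split the simple difference in outcomes into ATE, selection bias, and HTE bias."""
    y1, y0, d = pd.Series(y1), pd.Series(y0), pd.Series(d)
    treated, untreated = d == 1, d == 0
    pi = treated.mean()      # share of units that received treatment
    delta = y1 - y0          # unit-level causal effects

    att = delta[treated].mean()
    atu = delta[untreated].mean()
    return {
        "sdo": y1[treated].mean() - y0[untreated].mean(),
        "ate": delta.mean(),
        "selection_bias": y0[treated].mean() - y0[untreated].mean(),
        "hte_bias": (1 - pi) * (att - atu),
    }

# Perfect-doctor example from this notebook (surgery is D=1, chemo is D=0).
parts = decompose_sdo(
    y1=[7, 5, 5, 7, 4, 10, 1, 5, 3, 9],
    y0=[1, 6, 1, 8, 2, 1, 10, 6, 7, 8],
    d=[1, 0, 1, 0, 1, 1, 0, 0, 0, 1],
)
print(parts)
```

Computing the treated share from `d` instead of hard-coding it keeps the weighting correct when the two groups are not the same size.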

### Morgan and Winship Causal Inference Reference
They use different terms that make more sense to me

Page 59 of *Counterfactuals and Causal Inference* gives the two bias terms different names
* **Baseline bias**, instead of selection bias
* **Differential treatment effect bias**, instead of heterogeneous treatment effect bias

The example used is the effect of education on an individual's mental ability, where college is the treatment
* ATE is the actual effect
* Individuals who attend college may already be smarter (baseline bias)
* Those who attend college may be apt to gain more mental ability from it than those who didn't attend would have gained had they attended (differential treatment effect bias)