# Week 2: Potential Outcomes

The concept of potential outcomes is foundational to the reasoning system that we're going to use in this course. It allows us to think, with considerable clarity, about the comparisons that we want to make, and **exactly** why these comparisons are *causal*.

But despite the somewhat opaque language that we might use when we're talking about potential outcomes (e.g. "the potential outcome to treatment, when a unit is assigned to treatment"), the math surrounding potential outcomes is actually pretty straightforward.
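Written out in symbols, the whole framework fits in three lines. The notation is a common convention rather than something this worksheet defines: $Y_i(1)$ and $Y_i(0)$ are unit $i$'s potential outcomes to treatment and control (the `y1` and `y0` columns built below), $D_i \in \{0, 1\}$ is the treatment indicator (`treat`), and $Y_i$ is the realized outcome (`Y`).

$$
\begin{aligned}
\tau_i &= Y_i(1) - Y_i(0) \\
Y_i &= D_i \, Y_i(1) + (1 - D_i) \, Y_i(0) \\
\text{ATE} &= \mathbb{E}[\tau_i] = \mathbb{E}[Y_i(1)] - \mathbb{E}[Y_i(0)]
\end{aligned}
$$

The second line says that each unit reveals exactly one of its two potential outcomes; the last equality follows from the linearity of the expectation.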

# Create Data 
Let's make some toy data that we can use throughout this demo.

Suppose that we have 1,000 individuals that we can observe in our study. Furthermore, suppose that each of these people has some *latent*, or unmeasured, response that they *would* have if we assigned them to take the treatment.

You might think of this as an unknowable population-level parameter that we're trying to estimate from empirical data. In this worksheet, we're going to make the data so that the causal parameter $\tau$ has a mean value of 2. But you could imagine that some other treatment has a mean value of 100. Or, for some third treatment, it could have a mean value of zero, in which case we would say that there is *no* average treatment effect.

For concreteness, suppose that the treatment is assigning people to drink [coffee](https://www.aldeacoffee.com).

In [1]:
import pandas as pd 
import numpy as np 

In [2]:
NROWS = 1000

d = pd.DataFrame({
    'id'  : np.arange(0,NROWS), 
    'tau' : np.random.normal(loc=2, scale=2, size=NROWS) } 
)
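None of the cells in this notebook set a random seed, so every re-run produces different numbers (which is likely why some of the printed tables below disagree with one another, e.g. the `treat` values before and after `obs` is built). A minimal sketch of a reproducible version of the cell above, assuming NumPy's `Generator` API; the seed value 241 is an arbitrary choice, not part of the original worksheet.

```python
import numpy as np
import pandas as pd

NROWS = 1000
# Arbitrary seed (assumption): any fixed integer makes the draws reproducible.
rng = np.random.default_rng(seed=241)

d = pd.DataFrame({
    'id': np.arange(NROWS),
    # Individual treatment effects: normal with mean 2 and standard deviation 2
    'tau': rng.normal(loc=2, scale=2, size=NROWS),
})
print(d['tau'].mean())  # close to, but not exactly, 2
```

Using a local `rng` object, rather than the global `np.random` state, keeps each notebook's randomness isolated and repeatable.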

Let's build more into our data: some of the people are tall while others are short, and some are young while others are old.


In [7]:
d['height'] = np.random.choice(a=['tall', 'short'], replace=True, size=NROWS)
d['age']    = np.random.choice(a=['young', 'old'] , replace=True, size=NROWS)

In [23]:
d.head()

| | id | tau | height | age |
|---|---|---|---|---|
| 0 | 0 | 2.789977 | short | young |
| 1 | 1 | 3.40478 | short | young |
| 2 | 2 | 0.325702 | short | young |
| 3 | 3 | 2.452018 | tall | old |
| 4 | 4 | -1.759545 | tall | young |



In exactly the same way, we can also think of our units as having levels of *potential outcomes to control*. That is, we can suppose that people have some level of the outcome in the case that they do not drink any coffee. For concreteness, suppose that the outcome is the number of minutes people can spend coding for 241 before they fall asleep.

There might (or might not) be a relationship between the characteristics that we did not assign experimentally, like height and age, and the outcome. In the example we're working with here, suppose that there is no relationship between height and minutes of coding, but that there is a positive relationship between age and minutes of coding.

This might be represented in our data in the following way: the mean number of minutes that a young person can work is 10, and if they're old (ahem... seasoned, disciplined) they are able to work for an additional 2 minutes on average. But also assume that there is some noise in this relationship.
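As an equation, the data-generating process that the next cell implements is the following (the indicator notation $\mathbb{1}[\cdot]$ is ours: it equals 1 when the condition holds and 0 otherwise). Height does not appear anywhere, which is exactly what "no relationship" means here.

$$
Y_i(0) = 10 + 2 \cdot \mathbb{1}[\text{age}_i = \text{old}] + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, 1)
$$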

## Potential Outcomes to Control 

In [26]:
d['y0'] = 10 + (d['age'] == 'old') * 2 \
  + np.random.normal(size=NROWS, loc=0, scale=1)

In [28]:
d.head()

| | id | tau | height | age | y0 |
|---|---|---|---|---|---|
| 0 | 0 | 2.789977 | short | young | 9.324339 |
| 1 | 1 | 3.40478 | short | young | 9.783787 |
| 2 | 2 | 0.325702 | short | young | 11.425327 |
| 3 | 3 | 2.452018 | tall | old | 13.536563 |
| 4 | 4 | -1.759545 | tall | young | 8.098727 |


Notice that there is no relationship between height and potential outcomes to control; and also notice that the "noise" in the relationship is represented in the draw from the normal distribution with mean 0. 

## Potential Outcomes to Treatment 
If we know people's potential outcomes to control, and we already know each person's *causal effect*, then I suppose we also know their potential outcomes to **treatment**, right?

In [29]:
d['y1'] = d['y0'] + d['tau']

And so, we can represent the **science table**: the set of all potentially realizable outcomes.

In [30]:
d.head()

| | id | tau | height | age | y0 | y1 |
|---|---|---|---|---|---|---|
| 0 | 0 | 2.789977 | short | young | 9.324339 | 12.114316 |
| 1 | 1 | 3.40478 | short | young | 9.783787 | 13.188567 |
| 2 | 2 | 0.325702 | short | young | 11.425327 | 11.751029 |
| 3 | 3 | 2.452018 | tall | old | 13.536563 | 15.988582 |
| 4 | 4 | -1.759545 | tall | young | 8.098727 | 6.339182 |


## Questions for Understanding 
1. On average, will people who are older tend to be taller, shorter, or about the same height as people who are younger?
2. On average, will people who are older tend to have higher or lower potential outcomes to control?
3. On average, will people who are older tend to have higher or lower potential outcomes to treatment?
4. **Most importantly**: if these are *potential outcomes*, can we empirically observe any of them?


In [37]:
(d.loc[d['age'] == 'old', 'height'] == 'tall').head()

3     True
5    False
6     True
8     True
9    False
Name: height, dtype: bool

In [36]:
(np.mean(d.loc[d['age'] == 'old', 'height'] == 'tall')
  - np.mean(d.loc[d['age'] == 'young', 'height'] == 'tall'))

0.012000000000000011

In [43]:
d[['age', 'height']].groupby('age').count()

| age | height |
|---|---|
| old | 500 |
| young | 500 |


In [50]:
d[['age', 'y0']].groupby('age').mean()

| age | y0 |
|---|---|
| old | 12.0026 |
| young | 10.043181 |
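Question 3 (whether older people tend to have higher potential outcomes to treatment) is not checked by the cells above. A minimal sketch that answers it, regenerating a simplified version of the toy data so it runs on its own (height omitted; the seed 241 is an arbitrary assumption, so the numbers will not match the tables in this notebook):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=241)  # arbitrary seed, for reproducibility
NROWS = 1000

d = pd.DataFrame({'tau': rng.normal(loc=2, scale=2, size=NROWS)})
d['age'] = rng.choice(['young', 'old'], size=NROWS)
d['y0'] = 10 + (d['age'] == 'old') * 2 + rng.normal(loc=0, scale=1, size=NROWS)
d['y1'] = d['y0'] + d['tau']

# Mean potential outcomes by age group. Because tau does not depend on age,
# the old-minus-young gap should be roughly 2 for both y0 and y1.
by_age = d.groupby('age')[['y0', 'y1']].mean()
print(by_age)
print(by_age.loc['old'] - by_age.loc['young'])
```

The gap in `y1` is noisier than the gap in `y0`, because `y1` also carries the spread of the individual effects `tau`.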


# Run Your Experiment 

To this point, we've been working in the *potential outcomes space*. One way to think about this is that these are the measurements that our population of people walks into the experiment already having; we just haven't measured them yet, so we don't know what they are! That is, suppose we have the whole cohort of students who are enrolled in 241, and they're going to start the coffee-drinking experiment in Week 3. In Week 2, each of them is already young or old and short or tall, and each has some innate ability to focus; *we as the experimenters just don't know these values yet!*

When we run the experiment, we accomplish several things: 

1. We measure outcomes from our subjects;
2. We intervene in their lives to force a particular experience; 
3. As a result of our intervention, we *reveal* either the potential outcome to treatment or the potential outcome to control for each subject, and we measure it.

The first of these is easy: for every subject, no matter whether they are in treatment or control, we measure the trait that we care about. But the second two require considerably more care, and they are the focus of the class.

In this experiment, the way that we are intervening in people's lives is to give them either *coffee* or *decaf coffee*.


In [81]:
d['treat'] = np.random.choice(a=[0,1], replace=True, size=NROWS)
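`np.random.choice` flips an independent fair coin for each person, so the group sizes are themselves random (the height-by-treatment counts later in this notebook sum to 482 in control and 518 in treatment). If you would rather have exactly half of the subjects treated, one common alternative is complete random assignment: shuffle a vector that contains exactly `NROWS // 2` ones. A minimal sketch, assuming NumPy's `Generator` API and an arbitrary seed:

```python
import numpy as np

NROWS = 1000
rng = np.random.default_rng(seed=241)  # arbitrary seed

# Complete random assignment: exactly NROWS // 2 treated units,
# placed in a uniformly random order.
treat = np.zeros(NROWS, dtype=int)
treat[: NROWS // 2] = 1
treat = rng.permutation(treat)

print(treat.sum())  # → 500
```

Both schemes are valid randomizations; fixing the group sizes just removes one source of run-to-run variation.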

Then, and only then, we're also able to make some of their potential outcomes measurable. 

In [82]:
d['Y'] = d['y0'] * (d['treat'] == 0) \
  + d['y1'] * (d['treat'] == 1)
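The multiply-and-add trick above is a direct translation of $Y_i = D_i \, Y_i(1) + (1 - D_i) \, Y_i(0)$. An equivalent and arguably more readable pandas/NumPy idiom is `np.where`, which picks `y1` where the condition holds and `y0` otherwise. A sketch on a tiny hypothetical science table (the four rows are made up for illustration):

```python
import numpy as np
import pandas as pd

# A tiny hypothetical science table, just to show the mechanics.
d = pd.DataFrame({
    'y0':    [9.0, 10.0, 11.0, 12.0],
    'y1':    [12.0, 13.0, 11.5, 10.0],
    'treat': [0, 1, 1, 0],
})

# Reveal y1 where treat == 1, and y0 otherwise.
d['Y'] = np.where(d['treat'] == 1, d['y1'], d['y0'])
print(d['Y'].tolist())  # → [9.0, 13.0, 11.5, 12.0]
```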

What do we have then? 

In [53]:
d.head()

| | id | tau | height | age | y0 | y1 | treat | Y |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2.789977 | short | young | 9.324339 | 12.114316 | 0 | 9.324339 |
| 1 | 1 | 3.40478 | short | young | 9.783787 | 13.188567 | 0 | 9.783787 |
| 2 | 2 | 0.325702 | short | young | 11.425327 | 11.751029 | 0 | 11.425327 |
| 3 | 3 | 2.452018 | tall | old | 13.536563 | 15.988582 | 0 | 13.536563 |
| 4 | 4 | -1.759545 | tall | young | 8.098727 | 6.339182 | 1 | 6.339182 |


## Why Only Some of Them?

Why are we only able to measure some of people's potential outcomes? As David has identified in the async videos, we're only able to measure the potential outcomes that are consistent with the treatment that we actually give people. 

And so, while the *science table* might contain information about each person's potential outcomes to treatment and control, the table of data that we're ever going to be able to generate contains a more restricted set. Call this observable dataset `obs`; it is a subset of all the data that might be out there.

In [83]:
obs = d[['id', 'height', 'age', 'treat', 'Y']]
obs.head()

| | id | height | age | treat | Y |
|---|---|---|---|---|---|
| 0 | 0 | short | young | 0 | 9.324339 |
| 1 | 1 | short | young | 1 | 13.188567 |
| 2 | 2 | short | young | 1 | 11.751029 |
| 3 | 3 | tall | old | 0 | 13.536563 |
| 4 | 4 | tall | young | 0 | 8.098727 |


Where did all the data go?? We no longer have access to people's causal effect ($\tau$), nor do we have access to their potential outcomes to control and treatment. Instead, we're left only with their *realized potential outcomes* that match the condition they were assigned to. 
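Another way to see what was lost is to keep the science table's shape but blank out every potential outcome we could not measure. A minimal sketch on simplified draws (age and height omitted, arbitrary seed); the name `seen` is hypothetical and not part of the worksheet:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=241)  # arbitrary seed
NROWS = 1000

d = pd.DataFrame({'y0': rng.normal(loc=10, scale=1, size=NROWS)})
d['y1'] = d['y0'] + rng.normal(loc=2, scale=2, size=NROWS)
d['treat'] = rng.choice([0, 1], size=NROWS)

# What the experimenter can see: the unrevealed potential outcome becomes NaN.
seen = pd.DataFrame({
    'y0': d['y0'].where(d['treat'] == 0),
    'y1': d['y1'].where(d['treat'] == 1),
})
# Every row has exactly one observed potential outcome, never both.
print(seen.notna().sum(axis=1).value_counts())
```

Because no row ever has both columns filled in, no individual $\tau_i = Y_i(1) - Y_i(0)$ can be computed from observed data.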

## Questions for Understanding
1. What happens to these values that are no longer in our dataset? Because we can't measure them, does that mean that they don't exist? Does it mean that they *never* existed? 
2. If we can't see any person's $\tau$, then how should we generate an estimate of this population-level parameter? 



# Estimating Causal Quantities from All of the Data 

Suppose for a moment that we *could* observe parts of the science table. In particular, suppose that we had access to everybody's potential outcomes to treatment and control, but not their treatment effect. Could we make an estimate about the average treatment effect then? Sure! 


In [84]:
calculated_effects = d['y1'] - d['y0']
calculated_effects.head()

0    2.789977
1    3.404780
2    0.325702
3    2.452018
4   -1.759545
dtype: float64

And so, what is the average of these?

In [64]:
calculated_effects.mean()

2.0864820681426584

That's kind of trivial, since we just rebuilt `calculated_effects` from the difference between the `y1` and `y0` columns, which were themselves constructed as `y1 = y0 + tau`; but it works.
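One step that makes the rest of the worksheet possible: the average of the individual effects equals the difference between the average potential outcome to treatment and the average potential outcome to control, because the mean is linear. So we never need any single $\tau_i$, only the two averages. A quick numerical check on simplified draws (age omitted, arbitrary seed):

```python
import numpy as np

rng = np.random.default_rng(seed=241)  # arbitrary seed
y0 = rng.normal(loc=10, scale=1, size=1000)
y1 = y0 + rng.normal(loc=2, scale=2, size=1000)

mean_of_differences = np.mean(y1 - y0)
difference_of_means = np.mean(y1) - np.mean(y0)

# Equal up to floating-point rounding, by linearity of the mean.
print(np.isclose(mean_of_differences, difference_of_means))  # → True
```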

## Questions for Understanding
1. Are people who are in treatment more or less likely to be tall than people who are in control?
2. Is the average potential outcome to control, `y0`, of people who are in treatment more or less than the average potential outcome to control of people who are in control?
3. Is the average potential outcome to treatment, `y1`, of people who are in treatment more or less than the average potential outcome to treatment of people who are in control?
4. Is the average potential outcome to treatment, `y1`, of *people who are in treatment* more or less than the average potential outcome to control, `y0`, of *people who are in treatment*?



In [65]:
d[['height', 'treat', 'id']].groupby(['height', 'treat']).count().reset_index()

| | height | treat | id |
|---|---|---|---|
| 0 | short | 0 | 239 |
| 1 | short | 1 | 261 |
| 2 | tall | 0 | 243 |
| 3 | tall | 1 | 257 |


In [66]:
d.groupby('treat').mean(numeric_only=True)

| treat | id | tau | y0 | y1 | Y |
|---|---|---|---|---|---|
| 0 | 495.543568 | 2.12151 | 10.967492 | 13.089002 | 10.967492 |
| 1 | 503.181467 | 2.053889 | 11.074439 | 13.128328 | 13.128328 |


In [None]:
d[d['treat'] == 1].mean(numeric_only=True)

# Estimating Causal Quantities from the Observable Data 

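Of course, we don't get to see both of these potential outcomes; instead, we only get to see one or the other. Which one do we get to see? The potential outcome for the treatment condition that we assign the person to!

For someone in control, for example, we can see only their potential outcome to control; for someone in treatment, we can see only their potential outcome to treatment. Here is the average observed outcome in the treatment group:

In [68]:
obs.loc[obs['treat']==1, 'Y'].mean()

13.128327815138526

Is this the same as the average potential outcome to treatment for the treatment group on the *science table*? It sure should be! Let's first check the analogous claim for the control group: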

In [69]:
obs.loc[obs['treat']==0, 'Y'].mean() == d.loc[d['treat'] == 0, 'y0'].mean()

True

The exact same logic applies to the potential outcomes to treatment as well:

In [75]:
obs.loc[obs['treat']==1, 'Y'].mean() == d.loc[d['treat']==1, 'y1'].mean()

True

# The Big Punchline! 

Because treatment is randomly assigned, the average of the *observable* realized outcomes in each group is an unbiased estimate of the corresponding average *unobservable* potential outcome. So we're able to generate an unbiased estimate of the causal effect using only observable data!

In general the framework looks like this: 

1. Use observable data to estimate the average unobservable potential outcomes.
2. Use those estimated potential outcomes to estimate causal effects.
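Using $D_i$ for the treatment indicator and $N_1$, $N_0$ for the number of treated and control units (our labels, not defined elsewhere in the worksheet), the two steps combine into the difference-in-means estimator, which is what `obs.groupby('treat').Y.mean().diff()` computes:

$$
\widehat{\text{ATE}}
= \underbrace{\frac{1}{N_1} \sum_{i \,:\, D_i = 1} Y_i}_{\text{estimates } \mathbb{E}[Y_i(1)]}
\;-\;
\underbrace{\frac{1}{N_0} \sum_{i \,:\, D_i = 0} Y_i}_{\text{estimates } \mathbb{E}[Y_i(0)]}
$$

Each group average is step 1; the subtraction is step 2.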


In [85]:
obs.groupby('treat').Y.mean()

treat
0    11.048545
1    13.094732
Name: Y, dtype: float64

In [86]:
obs.groupby('treat').Y.mean().diff()

treat
0         NaN
1    2.046187
Name: Y, dtype: float64

In [87]:
d.tau.mean()

2.0864820681426584
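"Unbiased" is a statement about the average over many possible random assignments, not about any single experiment: the 2.046 above differs from 2.086, and that is expected. A minimal simulation sketch re-runs the experiment 2,000 times on the same people and averages the estimates. It uses simplified data (age and height omitted) and an arbitrary seed; `diff_in_means` is a hypothetical helper, not part of the worksheet.

```python
import numpy as np

rng = np.random.default_rng(seed=241)  # arbitrary seed
NROWS = 1000

# One fixed science table (simplified: age and height omitted).
y0 = 10 + rng.normal(loc=0, scale=1, size=NROWS)
tau = rng.normal(loc=2, scale=2, size=NROWS)
y1 = y0 + tau

def diff_in_means(treat):
    """Difference in observed group means for one random assignment."""
    Y = np.where(treat == 1, y1, y0)  # reveal one potential outcome per person
    return Y[treat == 1].mean() - Y[treat == 0].mean()

# Re-run the experiment many times on the same people.
estimates = np.array([
    diff_in_means(rng.choice([0, 1], size=NROWS))
    for _ in range(2000)
])

print(tau.mean())        # the sample average treatment effect we want
print(estimates.mean())  # average estimate across re-randomizations
print(estimates.std())   # spread of any single experiment's estimate
```

The first two printed numbers should be very close, while the third shows how far a single experiment's estimate typically lands from the truth.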