In [1]:
import numpy as np
import pandas as pd

# Generating data for Potential Outcomes vs. Pearlian Frameworks

In the potential outcomes (PO) framework, we work directly with the potential outcomes, $Y^0$ and $Y^1$. In the Pearlian framework, we can talk about these same variables, but we usually describe how the distribution of $Y$ changes under an intervention. So when we generate data in the PO style, we'll tend to define the potential outcomes explicitly. In the Pearlian style, we'll tend to write the data-generating process so that the graph structure is easy to read off.
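As a rough summary in symbols (standard textbook notation, written out here as an aid rather than taken from the notebook), the PO framework links the observed outcome to the potential outcomes and defines the average treatment effect (ATE) as

$$
Y = D\,Y^1 + (1 - D)\,Y^0, \qquad \text{ATE} = E[Y^1 - Y^0],
$$

while the Pearlian framework would typically express the same quantity as a contrast between two interventional distributions:

$$
\text{ATE} = E[Y \mid do(D = 1)] - E[Y \mid do(D = 0)].
$$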

Let's start by generating data for a randomized experiment, where $D$ is determined at random, and is a cause of $Y$.

In [2]:
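N = 10000

# Treatment is assigned by a fair coin flip, independently of everything else
d = np.random.binomial(1, p=0.5, size=N)
# Potential outcomes: Y^0 ~ Normal(0, 1) and Y^1 ~ Normal(1, 1)
y0 = np.random.normal(0., size=N)
y1 = np.random.normal(1., size=N)

# Only the potential outcome matching the assigned treatment is observed
y = (d==0)*y0 + (d==1)*y1

df = pd.DataFrame({'D': d, 'Y': y})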

In [3]:
df.corr()

| | D | Y |
| --- | --- | --- |
| D | 1.0 | 0.443839 |
| Y | 0.443839 | 1.0 |


In [4]:
df.groupby('D').mean()

| D | Y |
| --- | --- |
| 0 | -0.006768 |
| 1 | 0.986682 |


Questions:


1. Do the potential outcomes depend on $D$ here?

2. Does $Y$ depend on $D$?

3. Why don't we include the potential outcomes in `df`, which represents our measured data?

4. What is the treatment effect?

In [5]:
(y1 - y0).mean()

0.9911454595812232
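The average of `y1 - y0` can only be computed in a simulation, because real data never contain both potential outcomes for the same unit. The sketch below is an illustration, not part of the original notebook. It regenerates the same randomized experiment and compares the simulation-only quantity with the difference in group means, which uses observed data alone. The seeded `rng` and the names `ate`, `diff_in_means`, and `se` are assumptions introduced here.

```python
import numpy as np

# Seed chosen only for reproducibility (an assumption, not in the notebook)
rng = np.random.default_rng(0)
N = 10000

# Same randomized experiment as above
d = rng.binomial(1, p=0.5, size=N)
y0 = rng.normal(0.0, size=N)
y1 = rng.normal(1.0, size=N)
y = np.where(d == 1, y1, y0)

# Sample average treatment effect: needs BOTH potential outcomes per unit,
# so it is only available inside a simulation
ate = (y1 - y0).mean()

# Difference in means: uses only the observed D and Y
treated = y[d == 1]
control = y[d == 0]
diff_in_means = treated.mean() - control.mean()

# Standard error of the difference in means (unequal-variance formula)
se = np.sqrt(treated.var(ddof=1) / treated.size
             + control.var(ddof=1) / control.size)

print(f"ATE from potential outcomes: {ate:.3f}")
print(f"Difference in means:         {diff_in_means:.3f} (SE {se:.3f})")
```

Because $D$ is randomized, the two numbers should agree up to sampling noise. Without randomization there would be no such guarantee.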

### What's a slightly more Pearlian way to write the same process?

In [6]:
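N = 10000

# D has no parents: it is assigned by a fair coin flip
d = np.random.binomial(1, p=0.5, size=N)
# Y ~ Normal(D, 1): D enters directly as the mean of Y
y = np.random.normal(d)

df = pd.DataFrame({'D': d, 'Y': y})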

In [7]:
df.corr()

| | D | Y |
| --- | --- | --- |
| D | 1.0 | 0.45727 |
| Y | 0.45727 | 1.0 |


In [9]:
df.groupby('D').mean()

| D | Y |
| --- | --- |
| 0 | -0.007907 |
| 1 | 1.020193 |


Questions:

1. Do the potential outcomes still exist?

2. What are the structural functions?

3. What does the causal graph for this process look like?
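One possible way to explore these questions is to write the process with explicit structural functions and noise terms, then simulate interventions by overriding $D$ while holding the noise fixed. This is a minimal sketch under stated assumptions: the function names `f_d` and `f_y`, the noise variables `u_d` and `u_y`, and the additive form `d + u_y` are illustrative choices introduced here. They reproduce the distribution of $(D, Y)$ generated above, but they are not the only structural model that does so.

```python
import numpy as np

# Seed chosen only for reproducibility (an assumption, not in the notebook)
rng = np.random.default_rng(0)
N = 10000

def f_d(u_d):
    """Hypothetical structural function for D: no parents, only its own noise."""
    return (u_d < 0.5).astype(int)

def f_y(d, u_y):
    """Hypothetical structural function for Y: parent D plus additive noise."""
    return d + u_y

# Exogenous noise terms, one per unit
u_d = rng.uniform(size=N)
u_y = rng.normal(size=N)

# Observational regime: D follows its own structural function
d = f_d(u_d)
y = f_y(d, u_y)

# Interventional regimes do(D=0) and do(D=1): replace f_d by a constant
# while keeping each unit's noise u_y unchanged
y_do0 = f_y(np.zeros(N, dtype=int), u_y)
y_do1 = f_y(np.ones(N, dtype=int), u_y)

effect = (y_do1 - y_do0).mean()
print(f"E[Y | do(D=1)] - E[Y | do(D=0)] = {effect:.3f}")
```

In this sketch `y_do0` and `y_do1` play the role of the potential outcomes, and they share the same noise, so every unit's effect is exactly 1. The earlier PO-style cell drew `y0` and `y1` independently, which gives the same observed distribution and the same average effect but different unit-level effects.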

In [11]:
# Observed outcomes within each treatment group (not unit-level potential outcomes)
y0 = df.loc[df['D'] == 0, 'Y']
y1 = df.loc[df['D'] == 1, 'Y']

In [12]:
y0.mean()

-0.007906528866828934

In [13]:
y1.mean()

1.0201930213573114

In [14]:
df['Y'].mean()

0.5067601059753755

In [15]:
df.groupby('D').count()

| D | Y |
| --- | --- |
| 0 | 4994 |
| 1 | 5006 |
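The group means, the overall mean, and the group counts above are tied together by the law of total expectation, $E[Y] = P(D=0)\,E[Y \mid D=0] + P(D=1)\,E[Y \mid D=1]$. As a quick check, the sketch below plugs in the numbers printed in the preceding cells. The variable names are introduced here for illustration only.

```python
# Values copied from the cell outputs above
mean_y_d0 = -0.007906528866828934   # y0.mean()
mean_y_d1 = 1.0201930213573114      # y1.mean()
n_d0, n_d1 = 4994, 5006             # df.groupby('D').count()
overall_mean = 0.5067601059753755   # df['Y'].mean()

# Weight each group mean by that group's share of the sample
n = n_d0 + n_d1
weighted_mean = (n_d0 / n) * mean_y_d0 + (n_d1 / n) * mean_y_d1

print(weighted_mean, overall_mean)
```

The two printed values should agree up to floating-point rounding.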
