## 19. Causal Inference

In this chapter we discuss causation.  Roughly speaking "$X$ causes $Y$" means that changing the value of $X$ will change the distribution of $Y$.  When $X$ causes $Y$, $X$ and $Y$ will be associated but the reverse is not, in general, true.

We will consider two frameworks for discussing causation.  The first uses the notation of **counterfactual** random variables.  The second, used in the next chapter, uses **directed acyclic graphs**.

### 19.1 The Counterfactual Model

Suppose that $X$ is a binary treatment variable where $X = 1$ means "treated" and $X = 0$ means "not treated".

Let $Y$ be some outcome variable such as the presence or absence of disease.  To distinguish the statement "$X$ is associated with $Y$" from the statement "$X$ causes $Y$" we need to enrich our probabilistic vocabulary.  We will decompose the response $Y$ into a more fine-grained object.

We introduce two new random variables $(C_0, C_1)$ called **potential outcomes** with the following interpretation: $C_0$ is the outcome if the subject is not treated ($X = 0$) and $C_1$ is the outcome if the subject is treated ($X = 1$).  Then,

$$ Y = \begin{cases}
C_0 & \text{if } X = 0 \\
C_1 & \text{if } X = 1
\end{cases}$$

We can express the relationship between $Y$ and $(C_0, C_1)$ more succinctly by

$$ Y = C_X $$

This equation is called the **consistency relationship**.

Here is a toy dataset to make the relationship clear:

$$
\begin{array}{cccc}
X & Y & C_0 & C_1 \\
\hline
0 & 4 & 4 & * \\
0 & 7 & 7 & * \\
0 & 2 & 2 & * \\
0 & 8 & 8 & * \\
\hline
1 & 3 & * & 3 \\
1 & 5 & * & 5 \\
1 & 8 & * & 8 \\
1 & 9 & * & 9
\end{array}
$$

The asterisks denote unobserved values.  When $X = 0$, we don't observe $C_1$, in which case we say that $C_1$ is a **counterfactual**, since it is the outcome the subject would have had if, counter to the fact, they had been treated ($X = 1$).  Similarly, when $X = 1$, we don't observe $C_0$ and we say that $C_0$ is counterfactual.
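To make the consistency relationship concrete, here is a minimal Python sketch that stores the toy dataset above with `None` standing in for each asterisk. The helper name `observed_outcome` is hypothetical; it simply selects $C_X$ from the pair of potential outcomes, which is all that $Y = C_X$ says.

```python
from typing import Optional

# Toy dataset from the table above as (X, C0, C1); None marks an unobserved counterfactual.
rows: list[tuple[int, Optional[int], Optional[int]]] = [
    (0, 4, None), (0, 7, None), (0, 2, None), (0, 8, None),
    (1, None, 3), (1, None, 5), (1, None, 8), (1, None, 9),
]


def observed_outcome(x: int, c0: Optional[int], c1: Optional[int]) -> int:
    """Consistency relationship Y = C_X: return the potential outcome selected by X."""
    y = c1 if x == 1 else c0
    if y is None:
        raise ValueError("the factual potential outcome must be observed")
    return y


ys = [observed_outcome(x, c0, c1) for x, c0, c1 in rows]
print(ys)  # [4, 7, 2, 8, 3, 5, 8, 9]

# In every row exactly one potential outcome is observed; the other is counterfactual.
assert all((c0 is None) != (c1 is None) for _, c0, c1 in rows)
```

The recovered list is exactly the $Y$ column of the table, while the missing entries can never be filled in from the observed data alone.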

When the potential outcomes are binary (say, $1$ for survival and $0$ for death), there are four types of subjects:

$$
\begin{array}{lcc}
\text{Type} & C_0 & C_1 \\
\hline
\text{Survivors}       & 1 & 1 \\
\text{Responders}      & 0 & 1 \\
\text{Anti-responders} & 1 & 0 \\
\text{Doomed}          & 0 & 0
\end{array}
$$

Think of all of the potential outcomes $(C_0, C_1)$ as hidden variables that contain all the relevant information about the subject.
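For binary outcomes, a subject's type is a deterministic function of $(C_0, C_1)$, and the individual effect $C_1 - C_0$ is $+1$ for responders, $-1$ for anti-responders, and $0$ for survivors and the doomed. The sketch below just encodes the table; the function name `subject_type` is a hypothetical helper, and the coding $1 =$ survives is an assumption read off the type names.

```python
def subject_type(c0: int, c1: int) -> str:
    """Classify a subject by its binary potential outcomes (C0, C1), with 1 = survives."""
    types = {
        (1, 1): "Survivors",
        (0, 1): "Responders",
        (1, 0): "Anti-responders",
        (0, 0): "Doomed",
    }
    return types[(c0, c1)]


for c0, c1 in [(1, 1), (0, 1), (1, 0), (0, 0)]:
    # Individual causal effect C1 - C0: only responders gain, only anti-responders lose.
    print(f"{subject_type(c0, c1):16s} effect = {c1 - c0:2d}")
```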

Define the **average causal effect** or **average treatment effect** to be

$$ \theta = \mathbb{E}(C_1) - \mathbb{E}(C_0) $$

The parameter $\theta$ has the following interpretation:  $\theta$ is the mean outcome if everyone were treated ($X = 1$) minus the mean outcome if no one were treated ($X = 0$).  There are other ways of measuring the causal effect.  For example, if $C_0$ and $C_1$ are binary, we define the **causal odds ratio**

$$ \frac{\mathbb{P}(C_1 = 1)}{\mathbb{P}(C_1 = 0)} \div \frac{\mathbb{P}(C_0 = 1)}{\mathbb{P}(C_0 = 0)}$$

and the **causal relative risk**

$$ \frac{\mathbb{P}(C_1 = 1)}{\mathbb{P}(C_0 = 1)} $$

The main ideas will be the same whatever causal effect we use.  For simplicity, we shall work with the average causal effect $\theta$.
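As a worked example of the three measures, the sketch below uses a made-up population of 100 subjects described by how many fall into each of the four types (the counts are hypothetical, chosen only for illustration). Exact fractions avoid rounding. It also checks a fact that follows from the type table: for binary outcomes, $\theta = \mathbb{E}(C_1 - C_0) = \mathbb{P}(\text{responder}) - \mathbb{P}(\text{anti-responder})$.

```python
from fractions import Fraction

# Hypothetical population: (C0, C1) -> number of subjects of that type.
population = {
    (1, 1): 30,  # survivors
    (0, 1): 40,  # responders
    (1, 0): 10,  # anti-responders
    (0, 0): 20,  # doomed
}
n = sum(population.values())


def prob(event) -> Fraction:
    """Exact probability of an event defined on the pair (C0, C1)."""
    return Fraction(sum(k for pair, k in population.items() if event(*pair)), n)


p1 = prob(lambda c0, c1: c1 == 1)  # P(C1 = 1)
p0 = prob(lambda c0, c1: c0 == 1)  # P(C0 = 1)

theta = p1 - p0                                 # average causal effect
odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))  # causal odds ratio
relative_risk = p1 / p0                         # causal relative risk

print(theta, odds_ratio, relative_risk)  # 3/10 7/2 7/4

# For binary outcomes, theta = P(responder) - P(anti-responder).
assert theta == prob(lambda c0, c1: (c0, c1) == (0, 1)) - prob(lambda c0, c1: (c0, c1) == (1, 0))
```

All three summaries require the joint distribution of $(C_0, C_1)$ or at least its margins, which is exactly what observational data cannot reveal directly.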

Define the **association** to be

$$ \alpha = \mathbb{E}(Y | X = 1) - \mathbb{E}(Y | X = 0)$$

Again we could use the odds ratio or other summaries if we wish.

**Theorem 19.1 (Association is not equal to Causation)**.  In general, $\theta \neq \alpha$.
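A small hypothetical population shows how the gap can arise. Suppose we could see both potential outcomes for everyone, the treatment has no effect ($C_0 = C_1$ for every subject), but the subjects with the better outcome happen to be exactly the ones who were treated. The sketch computes $\theta$ from the potential outcomes and $\alpha$ from the observed $Y = C_X$ only; the data are invented for illustration.

```python
# Hypothetical full population (both potential outcomes known, unlike real data).
# Each tuple is (X, C0, C1). Treatment has no effect: C0 == C1 for every subject,
# but the subjects with outcome 1 are exactly the ones who were treated.
population = [
    (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0),
    (1, 1, 1), (1, 1, 1), (1, 1, 1), (1, 1, 1),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Causal effect: compares the potential outcomes of the whole population.
theta = mean(c1 for _, _, c1 in population) - mean(c0 for _, c0, _ in population)

# Association: compares observed outcomes Y = C_X across treatment groups.
observed = [(x, c1 if x == 1 else c0) for x, c0, c1 in population]
alpha = mean(y for x, y in observed if x == 1) - mean(y for x, y in observed if x == 0)

print(theta, alpha)  # 0.0 1.0
```

Here $\theta = 0$ but $\alpha = 1$: the association reflects who chose treatment, not what treatment does.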

**Theorem 19.3**.  Suppose we randomly assign subjects to treatment and that $\mathbb{P}(X = 0) > 0$ and $\mathbb{P}(X = 1) > 0$.  Then $\alpha = \theta$.  Hence, any consistent estimator of $\alpha$ is a consistent estimator of $\theta$.  In particular,

$$ \hat{\theta} = \hat{\mathbb{E}}(Y | X = 1) - \hat{\mathbb{E}}(Y | X = 0) = \overline{Y}_1 - \overline{Y}_0 $$

is a consistent estimator of $\theta$, where

$$
\begin{array}{ll}
\overline{Y}_1 = \frac{1}{n_1} \sum_{i: X_i = 1} Y_i
&
\overline{Y}_0 = \frac{1}{n_0} \sum_{i: X_i = 0} Y_i \\
n_1 = \sum_{i=1}^n X_i
&
n_0 = \sum_{i=1}^n (1 - X_i)
\end{array}
$$
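The following simulation sketches the theorem under an assumed model: $C_0 \sim N(0, 1)$ and $C_1 = C_0 + 2$, so the true $\theta$ is $2$. The function `simulate` and its parameters are hypothetical. With `randomized=True`, $X$ is a fair coin flip independent of $(C_0, C_1)$ and $\overline{Y}_1 - \overline{Y}_0$ should land near $2$; with `randomized=False`, subjects are treated exactly when $C_0 > 0$, a confounded assignment swapped in for contrast, and the same estimator instead targets $\alpha$, which is far from $\theta$.

```python
import random


def simulate(n: int, randomized: bool, seed: int = 0) -> float:
    """Return Ybar_1 - Ybar_0 from one simulated study of size n.

    Hypothetical model: C0 ~ N(0, 1) and C1 = C0 + 2, so the true theta is 2.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        c0 = rng.gauss(0.0, 1.0)
        c1 = c0 + 2.0
        if randomized:
            x = 1 if rng.random() < 0.5 else 0  # coin flip, independent of (C0, C1)
        else:
            x = 1 if c0 > 0 else 0              # assignment depends on C0: confounded
        y = c1 if x == 1 else c0                # consistency: Y = C_X
        (treated if x == 1 else control).append(y)
    return sum(treated) / len(treated) - sum(control) / len(control)


print(round(simulate(100_000, randomized=True), 2))   # close to the true theta = 2
print(round(simulate(100_000, randomized=False), 2))  # an estimate of alpha, not theta
```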

**Proof**.  Since $X$ is randomly assigned, $X$ is independent of $(C_0, C_1)$.  Hence,

$$
\begin{align}
\theta &= \mathbb{E}(C_1) - \mathbb{E}(C_0) \\
&= \mathbb{E}(C_1 | X = 1) - \mathbb{E}(C_0 | X = 0) \\
&= \mathbb{E}(Y | X = 1) - \mathbb{E}(Y | X = 0) \\
&= \alpha
\end{align}
$$

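The second equality uses the independence of $X$ and $(C_0, C_1)$ (the conditional expectations are well defined because $\mathbb{P}(X = 0) > 0$ and $\mathbb{P}(X = 1) > 0$), and the third uses the consistency relationship $Y = C_X$.  That $\hat{\theta}$ is a consistent estimator follows from the law of large numbers.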

If $Z$ is a covariate, we define the **conditional causal effect** by

$$ \theta_z = \mathbb{E}(C_1 | Z = z) - \mathbb{E}(C_0 | Z = z) $$

In a randomized experiment, $\theta_z = \mathbb{E}(Y | X = 1, Z = z) - \mathbb{E}(Y | X = 0, Z = z)$, and we can estimate the conditional causal effect by the mean outcome of treated subjects with $Z = z$ minus the mean outcome of untreated subjects with $Z = z$.
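A minimal sketch of this stratified estimate, assuming a hypothetical randomized experiment with a binary covariate $Z \sim \text{Bernoulli}(0.5)$, a fair coin flip for $X$, $C_0 \sim N(0, 1)$ and $C_1 = C_0 + 1 + 2Z$, so that $\theta_0 = 1$ and $\theta_1 = 3$. The helper `conditional_effects` is hypothetical; it computes the difference of treated and untreated sample means within each level of $Z$, using only the observed triples $(X, Z, Y)$.

```python
import random
from collections import defaultdict


def conditional_effects(data):
    """Estimate theta_z as mean(Y | X=1, Z=z) - mean(Y | X=0, Z=z) for each observed z."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, z, y in data:
        sums[(x, z)] += y
        counts[(x, z)] += 1
    zs = sorted({z for _, z, _ in data})
    return {z: sums[(1, z)] / counts[(1, z)] - sums[(0, z)] / counts[(0, z)] for z in zs}


# Hypothetical randomized experiment with true theta_0 = 1 and theta_1 = 3.
rng = random.Random(1)
data = []
for _ in range(50_000):
    z = 1 if rng.random() < 0.5 else 0
    x = 1 if rng.random() < 0.5 else 0  # randomized, independent of (C0, C1, Z)
    c0 = rng.gauss(0.0, 1.0)
    c1 = c0 + 1.0 + 2.0 * z
    data.append((x, z, c1 if x == 1 else c0))  # only Y = C_X is recorded

print({z: round(t, 2) for z, t in conditional_effects(data).items()})
```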

**Summary**

- Random variables: $(C_0, C_1, X, Y)$
- Consistency relationship: $Y = C_X$
- Causal Effect: $\theta = \mathbb{E}(C_1) - \mathbb{E}(C_0)$
- Association: $\alpha = \mathbb{E}(Y | X = 1) - \mathbb{E}(Y | X = 0)$
- Random Assignment:  $(C_0, C_1) \perp\!\!\!\perp X \Longrightarrow \theta = \alpha$