# Learning Goals
- What is a DAG?
- How do DAGs relate to SEMs?
- What does the do operator do?
- How can we use that to define causal effects?
- What causal structure is added by this approach?
- How can we use DAGs to reason about which variables we should adjust for in regression or matching to estimate the total causal effect?

### Structural Equation Models

- Define Structural Equation Models
- What additional structure does causality impose?

#### Definition SEM


- A structural equation model is a set of equations and a joint distribution of noise terms
\begin{equation*}
S := (S, \mathbb{P}^N)
\end{equation*}
- An equation defines a variable in terms of its parents and a noise term $N_j$:
$S_j: X_j := f_j(Par_j, N_j) \qquad j = 1,...,p$
- The parents of $X_j$ can be any subset of the other variables:
$Par_j \subseteq \{X_1, ..., X_p\} \setminus \{X_j\}$
- The noise terms are jointly independent
- To get a causal graph we go through the model equation by equation and draw a directed edge from each right-hand-side variable into the corresponding left-hand-side variable
- We write $:=$ instead of $=$ to stress that a SEM is more than a system of equations. What is to the left and the right of the equal sign matters when we talk about interventions.
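The graph construction described above can be sketched in a few lines of Python. This is only an illustration: the parent sets in `parents` are made up for the example and do not come from these notes, and `causal_edges` and `is_acyclic` are hypothetical helper names. The acyclicity check relies on `graphlib.TopologicalSorter` from the standard library (Python 3.9 or later), which accepts a mapping from each node to its predecessors.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical parent sets Par_j, one entry per structural equation
# (illustrative only, not taken from the notes).
parents = {"A": set(), "B": {"A"}, "C": {"A", "B"}}


def causal_edges(parents):
    """Draw one directed edge from every right-hand-side variable to the left-hand-side variable."""
    return sorted((parent, child) for child, pars in parents.items() for parent in pars)


def is_acyclic(parents):
    """Return True if the graph induced by the parent sets has no directed cycle."""
    try:
        # static_order() raises CycleError while iterating if a cycle exists.
        list(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False


print(causal_edges(parents))
print(is_acyclic(parents))
```

A mapping such as `{"A": {"B"}, "B": {"A"}}` would describe two variables that are each other's parents, and `is_acyclic` returns `False` for it.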

#### Example 1 
Consider the following SEM 
\begin{aligned}
&S_1: X := N_x\\
&S_2: Y := -6X + N_y\\
&N_x, N_y \sim^{i.i.d} N(0,1)  
\end{aligned}

- What is the covariance between $X$ and $Y$?
Plug in the noise terms and use $E[X] = E[Y] = 0$:
\begin{equation*}
Cov(X, Y) = E[XY] - E[X]E[Y] = E[N_x(-6N_x+N_y)] - 0 \cdot 0 = -6E[N_x^2] + E[N_x]E[N_y] = -6
\end{equation*}
- What is the joint distribution of $X$ and $Y$?
\begin{aligned}
(X, Y) \sim N \left(\left(
\begin{matrix}
  0 \\
  0
  \end{matrix}
\right), \left(\begin{matrix}
   1& -6 \\
  -6& 37
  \end{matrix} \right) \right)
\end{aligned}
- What is the corresponding DAG?
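The covariance and the joint distribution of Example 1 can be checked by simulation. The following is a minimal sketch, assuming NumPy is available; the seed and the sample size `n` are arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
n = 200_000                     # sample size, an assumption of this sketch

# Draw the independent noise terms, then evaluate the equations in order.
n_x = rng.standard_normal(n)
n_y = rng.standard_normal(n)
x = n_x            # S_1: X := N_x
y = -6 * x + n_y   # S_2: Y := -6X + N_y

sample_mean = np.array([x.mean(), y.mean()])
sample_cov = np.cov(x, y)  # 2x2 empirical covariance matrix of (X, Y)

print(sample_mean)
print(sample_cov)
```

Up to sampling noise, `sample_cov` should be close to the covariance matrix of the joint distribution of $(X, Y)$.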

#### Example 2

\begin{aligned}
S_1: X_1 &:= f_1(X_3, N_1)\\
S_2: X_2 &:= f_2(X_1, N_2)\\
S_3: X_3 &:= f_3(N_3) \\
S_4: X_4 &:= f_4(X_2, X_3, N_4) \\
\end{aligned}
The $N_i$ are jointly independent

- Draw the corresponding graph and verify that it doesn't have cycles
- How can I simulate the model?
    - Draw the noise terms
    - Start at the source nodes and plug in along the edges until every variable has been computed
- Why can I simulate the model?
    - Because the graph does not have cycles
- Why does the graph imply a unique joint distribution?
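    - Using the same strategy as for simulation, I can express every variable as a function of the noise terms.
    Example: $X_4 = f_4(f_2(f_1(f_3(N_3), N_1), N_2), f_3(N_3), N_4)$
    - Why does this uniquely pin down the joint distribution? Because $(X_1, ..., X_4)$ is then a deterministic function of $(N_1, ..., N_4)$, whose joint distribution $\mathbb{P}^N$ is part of the SEM.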
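The simulation strategy for Example 2 can be sketched as follows. The notes leave $f_1, ..., f_4$ unspecified, so the concrete functions below are hypothetical placeholders chosen only to make the code runnable, and `simulate_from_noise` is an illustrative name. The variables are computed in an order in which every parent is available before its child.

```python
import numpy as np


# Hypothetical structural functions (placeholders, not from the notes).
def f3(n3):
    return n3


def f1(x3, n1):
    return 2.0 * x3 + n1


def f2(x1, n2):
    return np.tanh(x1) + n2


def f4(x2, x3, n4):
    return x2 - 0.5 * x3 + n4


def simulate_from_noise(noise):
    """Map a (4, n) array of noise draws to an (n, 4) array of (X_1, ..., X_4)."""
    n1, n2, n3, n4 = noise
    x3 = f3(n3)          # source node: depends on noise only
    x1 = f1(x3, n1)      # parent X_3 is already computed
    x2 = f2(x1, n2)      # parent X_1 is already computed
    x4 = f4(x2, x3, n4)  # parents X_2 and X_3 are already computed
    return np.column_stack([x1, x2, x3, x4])


rng = np.random.default_rng(0)
noise = rng.standard_normal((4, 1000))  # jointly independent N(0, 1) noise
samples = simulate_from_noise(noise)
print(samples[:3])
```

Because `simulate_from_noise` is a deterministic function of the noise, the same noise draws always produce the same values of the variables.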

### Do operator and Intervention distributions
- The SEM $S$ implies a joint distribution of the variables, $\mathbb{P}^X_{S}$
- The intervention distribution $\mathbb{P}^{X|do(X_j = f^{'}_j(Par_j, N_j))}_{S}$ is the joint distribution implied by the SEM in which we replace $f_j$ by $f^{'}_j$
- The probability of an event $A$ under the intervention distribution is often denoted by $P(A|do(X_j = f^{'}_j(Par_j, N_j)))$
- The expected value of $X_i$ under the intervention distribution is often denoted by $E(X_i|do(X_j = f^{'}_j(Par_j, N_j)))$
- A common special case is $do(X_j = a)$, where $a$ is a constant

#### Example 1 Revisited
- We apply $do(X = 3)$ to the structural model from Example 1.
- The modified model is
\begin{aligned}
&S_1: X := 3\\
&S_2: Y := -6X + N_y\\
&N_x, N_y \sim^{i.i.d} N(0,1)  
\end{aligned}
- What is the marginal distribution of $Y$?
\begin{aligned}
    \mathbb{P}^Y_S = N(0, 37) \neq \mathbb{P}^{Y|do(X = 3)}_S = N(-18, 1) = \mathbb{P}^{Y|X = 3}_S
\end{aligned}
- What is the marginal distribution of $X$ if we intervene on $Y$, i.e. $\mathbb{P}^{X|do(Y = 3)}_S$?
- The modified model is
\begin{aligned}
&S_1: X := N_x\\
&S_2: Y := 3 \\
&N_x, N_y \sim^{i.i.d} N(0,1)  
\end{aligned}
- Thus the marginal distribution of $X$ is the distribution of $N_x$
\begin{aligned}
   \mathbb{P}^{X|do(Y = 3)}_S = N(0,1)
\end{aligned}
- What is the distribution of $X$ conditional on $Y=3$?
Recall the SEM from Example 1, with $X = N_x$ substituted into $S_2$:
\begin{aligned}
 &S_1: X := N_x \\
 &S_2: Y := -6N_x + N_y \\
 &N_x, N_y \sim^{i.i.d} N(0,1)  
\end{aligned}
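Because $(X, Y)$ is jointly normal, we can use the conditioning formula for the bivariate normal distribution with $Cov(X, Y) = -6$ and $Var(Y) = 37$:
\begin{aligned}
E[X|Y = 3] &= \frac{Cov(X, Y)}{Var(Y)} \cdot 3 = -\frac{18}{37} \\
Var(X|Y = 3) &= Var(X) - \frac{Cov(X, Y)^2}{Var(Y)} = 1 - \frac{36}{37} = \frac{1}{37} \\
\Rightarrow \mathbb{P}^{X|Y = 3}_S &= N\left(-\frac{18}{37}, \frac{1}{37} \right) \neq \mathbb{P}^{X|do(Y = 3)}_S = N(0,1)
\end{aligned}
- Solving $S_2$ for $N_x$ gives $X = \frac{N_y - Y}{6}$, but we cannot simply plug in $Y = 3$ and treat $N_y$ as $N(0,1)$: $Y$ depends on $N_y$, so conditional on $Y = 3$ the noise term $N_y$ no longer follows its marginal distribution.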
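The difference between intervening and conditioning in Example 1 can also be seen by simulation. The sketch below is only illustrative: `simulate`, `do_x` and `do_y` are hypothetical names, and conditioning on $Y = 3$ is approximated by keeping observational samples with $Y$ inside a narrow window around 3 (half-width 0.05, an arbitrary choice).

```python
import numpy as np


def simulate(n, rng, do_x=None, do_y=None):
    """Sample (X, Y) from Example 1; do_x / do_y replace the matching equation by a constant."""
    n_x = rng.standard_normal(n)
    n_y = rng.standard_normal(n)
    x = n_x if do_x is None else np.full(n, float(do_x))             # S_1 or X := do_x
    y = (-6 * x + n_y) if do_y is None else np.full(n, float(do_y))  # S_2 or Y := do_y
    return x, y


rng = np.random.default_rng(0)
n = 1_000_000

x_obs, y_obs = simulate(n, rng)       # observational distribution
_, y_do_x = simulate(n, rng, do_x=3)  # do(X = 3)
x_do_y, _ = simulate(n, rng, do_y=3)  # do(Y = 3)

# Approximate conditioning on Y = 3 by selecting samples with Y close to 3.
window = np.abs(y_obs - 3) < 0.05
x_given_y = x_obs[window]

print("Y observational:  mean, var =", y_obs.mean(), y_obs.var())
print("Y | do(X = 3):    mean, var =", y_do_x.mean(), y_do_x.var())
print("X | do(Y = 3):    mean, var =", x_do_y.mean(), x_do_y.var())
print("X | Y close to 3: mean, var =", x_given_y.mean(), x_given_y.var())
```

The empirical mean and variance of `x_given_y` can be compared with the Gaussian conditioning formula, $E[X|Y=y] = \frac{Cov(X,Y)}{Var(Y)} y$ and $Var(X|Y=y) = Var(X) - \frac{Cov(X,Y)^2}{Var(Y)}$, while `x_do_y` keeps the distribution of $N_x$.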


#### Causal Effects and the Difference Between Conditioning and Intervening
- Draw Graph again
- Interventions on $Y$ do not affect $X$; however, interventions on $X$ do affect $Y$. This is because the directed edge goes from $X$ to $Y$.
- We use this observation to define a causal effect. Given a SEM $S$, there is a (total) causal effect from $X$ to $Y$ iff $X \not\!\perp\!\!\!\perp Y$ in $\mathbb{P}^{X|do(X = N^{'}_x)}_S$ for some random variable $N^{'}_x$
- What if $N^{'}_x$ is a number? Then $X$ is constant under the intervention and therefore trivially independent of $Y$, so a single constant cannot reveal an effect. Equivalently, there is a total causal effect iff $\mathbb{P}^{Y|do(X = x^{'})}_S \neq \mathbb{P}^{Y|do(X = x^{''})}_S$ for some values $x^{'}$ and $x^{''}$.
- If we want to quantify the causal effect, we can compare how the probability of a specific event changes, or look at the average causal effect on $Y$ if we change $X$ from $x^{''}$ to $x^{'}$: $E[Y|do(X=x^{'})]-E[Y|do(X=x^{''})]$
- The do operator only influences the distribution of descendants of the variable we intervene on. Conditioning also changes the distribution of its parents and other ancestors.
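- As a worked illustration, take the SEM from Example 1. Under $do(X = x)$ the equation for $Y$ becomes $Y := -6x + N_y$, so
\begin{aligned}
E[Y|do(X = x)] &= E[-6x + N_y] = -6x \\
E[Y|do(X=x^{'})] - E[Y|do(X=x^{''})] &= -6(x^{'} - x^{''})
\end{aligned}
For instance, changing $X$ from $0$ to $1$ lowers the expected value of $Y$ by $6$.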



### The Correct Model
A SEM is correct if the implied distribution is correct and the intervention distributions correspond to distributions obtained from actual interventions.

#### Adjustment Sets (Theorem)
Simplified notation: $X$ is a treatment and $Y$ is an outcome. We want to identify $p^{S, do(X=x)}(y)$ by adjusting for the variables in $Z$. Each of the following choices of $Z$ is a valid adjustment set:
- Backdoor adjustment:
    - $Z$ blocks all paths from $X$ to $Y$ entering $X$ through the backdoor
    - $Z$ contains no descendant of $X$
- Parent adjustment: $Z$ contains all parents of $X$ (completely models the selection process)
- Towards necessity: any $Z$ such that
    - $Z$ contains no descendant of any node on a directed path from $X$ to $Y$ (except for descendants of $X$ that are not on a directed path from $X$ to $Y$) AND
    - $Z$ blocks all non-directed paths from $X$ to $Y$
    
### Example:
- Backdoor: $\{A\}, \{B\}$
- Parent: $\{C, A\}$
- All: $\{A\}, \{B\}, \{A, B\}, \{A, F\}, \{B, F\}, \{A, C\},
\{B, C\}, \{A, B, C\}, \{A, B, F\}$

### Estimation
If $Z$ is a valid adjustment set, then $(Y_a, Y_b) \perp\!\!\!\perp X \mid Z$. We can therefore estimate the average causal effect by controlling for $Z$ in matching or, under linearity, in regression.
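To illustrate adjustment by regression, the sketch below simulates a hypothetical linear SEM with a single confounder $Z$ ($Z \rightarrow X$, $Z \rightarrow Y$, $X \rightarrow Y$). The model, its coefficients and the helper `ols_slope_on_x` are assumptions made for this example and do not come from the notes. In this graph $\{Z\}$ satisfies the backdoor criterion, and the true causal effect of a unit change in $X$ on $Y$ is 1.5.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical linear SEM with one confounder (not from the notes):
#   Z := N_z,   X := 2Z + N_x,   Y := 1.5X + 3Z + N_y
z = rng.standard_normal(n)
x = 2.0 * z + rng.standard_normal(n)
y = 1.5 * x + 3.0 * z + rng.standard_normal(n)


def ols_slope_on_x(outcome, *regressors):
    """Least-squares coefficient on the first regressor, with an intercept included."""
    design = np.column_stack([np.ones(len(outcome)), *regressors])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef[1]


naive = ols_slope_on_x(y, x)        # regress Y on X only: confounded by Z
adjusted = ols_slope_on_x(y, x, z)  # regress Y on X and Z: adjusts for {Z}

print("naive slope:   ", naive)
print("adjusted slope:", adjusted)
```

In the population, the naive slope equals $Cov(X, Y)/Var(X) = 13.5/5 = 2.7$, whereas the adjusted regression recovers the structural coefficient 1.5; the estimates should be close to these values up to sampling noise.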

