In [1]:
%load_ext autoreload
%autoreload 2

%matplotlib inline

import numpy as np
import pandas as pd
import scipy.stats as stats
import seaborn as sns

# Causal graphs


Graph notation is less general than the potential outcome framework, but it is useful for

* thinking about causal systems
* uncovering identification strategies

> It is useful to separate the inferential problem into statistical and identification components. Studies of identification seek to characterize the conclusions that could be drawn if one could use the sampling process to obtain an unlimited number of observations. (Manski, 1995)

The two most crucial ingredients for an identification analysis are:

* The set of assumptions about causal relationships that the analysis is willing to assert based on theory and past research, including assumptions about relationships between variables that have not been observed but that are related both to the cause and outcome of interest.

* The pattern of information one can assume would be contained in the joint distribution of the variables in the observed dataset if all members of the population had been included in the sample that generated the dataset.

$\rightarrow$ causal graphs offer an effective and efficient representation for both

## Basic elements of causal graphs

* nodes
* edges
* paths
    * parent and child
    * descendant
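These elements map directly onto a simple data structure. The sketch below is a minimal illustration, assuming a plain dictionary from each node to its children (a hypothetical representation chosen for brevity); the edges are those of the charter-school example with parental background `P`, charter schools `D`, neighborhoods `N`, and test scores `Y`.

```python
# Directed graph stored as an adjacency dict: node -> list of children.
# Edges follow the charter-school example: P -> D, P -> Y, D -> Y, N -> Y.
graph = {
    "P": ["D", "Y"],
    "N": ["Y"],
    "D": ["Y"],
    "Y": [],
}


def children(g, node):
    """Nodes reached by a single edge leaving `node`."""
    return set(g[node])


def parents(g, node):
    """Nodes with an edge pointing into `node`."""
    return {v for v, kids in g.items() if node in kids}


def descendants(g, node):
    """All nodes reachable from `node` along directed paths (depth-first)."""
    seen, stack = set(), list(g[node])
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(g[v])
    return seen


print(sorted(parents(graph, "Y")))      # ['D', 'N', 'P']
print(sorted(descendants(graph, "P")))  # ['D', 'Y']
```

`Y` has three parents, while `P` has `D` as a child and `Y` as both a child and a descendant through the path `P -> D -> Y`.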

<img src="material/graph_with_cycle.png" height="200" width=200 />

Two representations of the joint dependence of $A$ and $B$ on an unobserved common cause:

<img src="material/graph_shorthand_unobserved_common_cause.png" height="500" width=500 />

Let's look at some basic patterns that will turn out to appear frequently.

* chain of mediation
* fork of mutual dependence

$\rightarrow$ unconditional association

* inverted fork of mutual causation, with a **collider variable**

$\rightarrow$ no unconditional association, but an association conditional on the **collider variable**


<img src="material/basic_causal_relationships.png" height="200" width=200 />
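The three patterns can be checked with a small simulation. This is a minimal sketch under hypothetical assumptions not taken from the figure: linear relationships with standard normal noise, and conditioning on the collider approximated by keeping only units with `C > 1`.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n = 100_000


def corr(x, y):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(x, y)[0, 1]


# Chain of mediation: A -> C -> B
a = rng.normal(size=n)
c = a + rng.normal(size=n)
b = c + rng.normal(size=n)
corr_chain = corr(a, b)  # population value 1/sqrt(3), about 0.58

# Fork of mutual dependence: A <- C -> B
c = rng.normal(size=n)
a = c + rng.normal(size=n)
b = c + rng.normal(size=n)
corr_fork = corr(a, b)  # population value 0.5

# Inverted fork of mutual causation: A -> C <- B (C is a collider)
a = rng.normal(size=n)
b = rng.normal(size=n)
c = a + b + rng.normal(size=n)
corr_collider = corr(a, b)  # population value 0: A and B are independent

# Conditioning on the collider (here: selecting units with C > 1)
# induces a negative association between A and B.
selected = c > 1
corr_collider_given_c = corr(a[selected], b[selected])

print(f"chain:               {corr_chain:.2f}")
print(f"fork:                {corr_fork:.2f}")
print(f"collider:            {corr_collider:.2f}")
print(f"collider, given C>1: {corr_collider_given_c:.2f}")
```

Selecting on a range of `C` is a crude form of conditioning; a regression of `B` on `A` and `C` would likewise give a negative coefficient on `A`.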

### Conditioning and confounding

<img src="material/confounding_variable.png" height="500" width=500 />

* $C$ is a **confounding variable** that affects both the independent variable (the cause) and the dependent variable (the outcome).

* Conditioning is a modeling strategy that allows us to identify causal effects in the presence of observed confounders.

$\rightarrow$ What happens if $C$ is unobserved?
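A small simulation contrasts the two cases. This is a sketch under hypothetical assumptions: a cause `D` and an outcome `Y` (illustrative labels), linear effects with normal noise, a true effect of `D` on `Y` equal to 1, and NumPy least squares for the regressions. Leaving `C` out of the regression mimics the case where it is unobserved.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 100_000
true_effect = 1.0  # hypothetical causal effect of D on Y

c = rng.normal(size=n)                              # confounder C
d = 0.8 * c + rng.normal(size=n)                    # C -> D
y = true_effect * d + 1.5 * c + rng.normal(size=n)  # D -> Y and C -> Y


def ols_slopes(y, *regressors):
    """OLS slope coefficients (intercept dropped) via least squares."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]


adjusted = ols_slopes(y, d, c)[0]  # conditioning on the observed confounder
naive = ols_slopes(y, d)[0]        # C omitted, as if it were unobserved

# Omitted-variable bias: naive -> true_effect + 1.5 * Cov(C, D) / Var(D)
#                               = 1 + 1.5 * 0.8 / 1.64, about 1.73
print(f"conditioning on C: {adjusted:.3f}")
print(f"C unobserved:      {naive:.3f}")
```

With `C` observed, conditioning recovers the causal effect; with `C` unobserved, the regression of `Y` on `D` mixes the causal effect with the association transmitted through `C`.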

How about an example from educational choice where we have observed and unobserved confounders?

<img src="material/fig-confounders-education.png" height=500 width=500 />

## Graphs and structural equations

Let's look at another example and assume we are interested in the effect of parental background (P), charter schools (D), and neighborhoods (N) on test scores (Y).

We could set up the following **linear** regression equations:

\begin{align*}
D & = \alpha_D + b_P P + \epsilon_D \\
Y & = \alpha_Y + b_D D + b_P P + b_N N + \epsilon_Y
\end{align*}
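Substituting the $D$ equation into the $Y$ equation makes explicit that $P$ affects $Y$ both directly and through $D$. Because the equations as written use the same coefficient $b_P$ in both places, the reduced form is:

\begin{align*}
Y & = \alpha_Y + b_D(\alpha_D + b_P P + \epsilon_D) + b_P P + b_N N + \epsilon_Y \\
  & = (\alpha_Y + b_D \alpha_D) + (1 + b_D)\, b_P P + b_N N + (\epsilon_Y + b_D \epsilon_D)
\end{align*}

For instance, with $b_P = 0.8$ and $b_D = -0.3$ (the values used in the simulation below), the total effect of $P$ on $Y$ is $(1 - 0.3) \times 0.8 = 0.56$, smaller than the direct effect $b_P = 0.8$. Assuming $\epsilon_D$ and $\epsilon_Y$ are independent of $P$ and $N$, a regression of $Y$ on $P$ and $N$ alone would estimate this total effect rather than the direct one.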

<img src="material/fig-equivalent-representations-standard.png" height=500 width=500 />

<img src="material/fig-equivalent-representations-magnified.png" height=500 width=500 />

We can set up the same *nonparametric* structural equations for both representations:

\begin{align*}
P & = f_P(\epsilon_1)    \\
N & = f_N(\epsilon_3) \\
D & = f_D(P, \epsilon_2) \\
Y & = f_Y(P, D, N, \epsilon_4)
\end{align*}

How to simulate a sample from a set of structural equations?

In [19]:
# parametrization of linear equations
alpha_D = 1
alpha_Y = 1

beta_P = 0.8
beta_N = 0.7
beta_D = -0.3

# distributional assumptions
get_unobservable = np.random.normal
get_observable = np.random.uniform

num_agents = 10000
data = np.tile(np.nan, (num_agents, 4))
for i in range(num_agents):
    P = get_observable()
    N = get_observable()
    D = alpha_D + beta_P * P + get_unobservable()
    Y = alpha_Y + beta_D * D + beta_P * P + beta_N * N + get_unobservable()
    data[i, :] = [Y, D, P, N]

df = pd.DataFrame(data, columns=['Y', 'D', 'P', 'N'])
df.head()

|   | Y | D | P | N |
|---|---|---|---|---|
| 0 | 0.990626 | 3.186022 | 0.762263 | 0.354792 |
| 1 | 0.552768 | 0.012214 | 0.040949 | 0.494129 |
| 2 | 0.670591 | 1.09638 | 0.413915 | 0.419244 |
| 3 | 3.022789 | 1.122093 | 0.908745 | 0.865778 |
| 4 | 0.708393 | 3.449747 | 0.897842 | 0.200958 |


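Now let's see if we can recover the structural parameters of the $Y$ equation with a simple ordinary-least-squares regression and thus go full circle from a parametric structural equation model to a causal graph. This works here because $\epsilon_Y$ is drawn independently of $D$, $P$, and $N$, so a regression of $Y$ on all of its parents is correctly specified.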

In [24]:
import statsmodels.api as sm
from patsy import dmatrices

y, x = dmatrices('Y ~ D + P + N', data = df)
model_spec = sm.OLS(y, x)
model_spec.fit().summary()

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | Y | R-squared: | 0.134 |
| Model: | OLS | Adj. R-squared: | 0.134 |
| Method: | Least Squares | F-statistic: | 516.9 |
| Date: | Sun, 14 Apr 2019 | Prob (F-statistic): | 2.66e-312 |
| Time: | 17:24:30 | Log-Likelihood: | -14148.0 |
| No. Observations: | 10000 | AIC: | 28300.0 |
| Df Residuals: | 9996 | BIC: | 28330.0 |
| Df Model: | 3 | | |
| Covariance Type: | nonrobust | | |

|   | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 1.0257 | 0.028 | 36.506 | 0.000 | 0.971 | 1.081 |
| D | -0.3066 | 0.010 | -30.571 | 0.000 | -0.326 | -0.287 |
| P | 0.7658 | 0.036 | 21.462 | 0.000 | 0.696 | 0.836 |
| N | 0.7098 | 0.035 | 20.540 | 0.000 | 0.642 | 0.778 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Omnibus: | 1.752 | Durbin-Watson: | 2.024 |
| Prob(Omnibus): | 0.417 | Jarque-Bera (JB): | 1.773 |
| Skew: | 0.031 | Prob(JB): | 0.412 |
| Kurtosis: | 2.982 | Cond. No. | 8.75 |
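Since $P$ causes both $D$ and $Y$, it is a confounder of the effect of $D$, and dropping it from the regression should bias the coefficient on $D$. The sketch below checks this, assuming the same parameter values as the simulation above but a seeded, vectorized simulation with a larger sample, and plain NumPy least squares in place of statsmodels.

```python
import numpy as np

# Same parameter values as the simulation above; seeded for reproducibility.
alpha_D, alpha_Y = 1.0, 1.0
beta_P, beta_N, beta_D = 0.8, 0.7, -0.3

rng = np.random.default_rng(seed=1)
n = 100_000
P = rng.uniform(size=n)
N = rng.uniform(size=n)
D = alpha_D + beta_P * P + rng.normal(size=n)
Y = alpha_Y + beta_D * D + beta_P * P + beta_N * N + rng.normal(size=n)


def ols(y, *regressors):
    """OLS coefficients [intercept, slopes...] via least squares."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


b_full = ols(Y, D, P, N)  # conditions on the confounder P
b_omit = ols(Y, D, N)     # omits P, which causes both D and Y

# Population value of the D coefficient when P is omitted:
# beta_D + beta_P * Cov(P, D) / Var(D), with Var(P) = 1/12 for U(0, 1).
# N is independent of D and P, so including it does not change this value.
var_P = 1 / 12
predicted_omit = beta_D + beta_P * (beta_P * var_P) / (beta_P**2 * var_P + 1)

print("D coefficient, full model:", round(b_full[1], 3))
print("D coefficient, P omitted: ", round(b_omit[1], 3))
print("predicted with P omitted: ", round(predicted_omit, 3))
```

Because $P$ raises both $D$ and $Y$, omitting it pushes the coefficient on $D$ upward, toward zero in this parametrization.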
