In [1]:
%matplotlib inline

import statsmodels.formula.api as smf
import pandas as pd
import numpy as np

from auxiliary import *

np.random.seed(123)

# Regression estimators of causal effects

**Overview**

* Regression as a descriptive tool

* Regression adjustment as a strategy to estimate causal effects
    * Regression models and omitted-variable bias
    * Potential outcomes and omitted-variable bias
    * Regression as adjustment for otherwise omitted variables

* Regression as conditional-variance-weighted matching

* Regression as an implementation of a perfect stratification

* Regression as supplemental adjustment when matching

* Extensions and other perspectives
    * Regression estimators for many-valued causes
    * The challenge of regression specification

* Conclusion
    

We start with different ways of using regression:

* descriptive tools
    * Anscombe quartet
    
* estimating causal effects

* Freedman's paradox

## Regression as a descriptive tool

Goldberger (1991) motivates least squares regression as a technique to estimate a best-fitting linear approximation to a conditional expectation function that may be nonlinear in the population.
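
The idea can be illustrated with a minimal sketch on simulated data. The quadratic conditional expectation $E[Y \mid X] = X^2$, the uniform distribution of $X$, and all variable names are assumptions made for illustration, not part of the lecture material. The least squares line fitted to the noisy outcome is close to the line fitted directly to the true conditional expectation, even though no straight line can reproduce that function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: the true conditional expectation E[Y | X] = X ** 2
# is nonlinear in X.
x = rng.uniform(0, 2, size=100_000)
y = x ** 2 + rng.normal(size=x.size)

# Least squares line fitted to the noisy outcome.
slope, intercept = np.polyfit(x, y, deg=1)

# Least squares line fitted to the true conditional expectation itself.
slope_cef, intercept_cef = np.polyfit(x, x ** 2, deg=1)

print(f"fit to Y:   intercept={intercept:.3f}, slope={slope:.3f}")
print(f"fit to CEF: intercept={intercept_cef:.3f}, slope={slope_cef:.3f}")
```

Under these assumptions the best linear approximation has slope $Cov(X, X^2) / Var(X) = 2$ and intercept $-2/3$, so both fits should land near those values.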

<img src="material/regression_demonstration_one.png" height=300 width=300 />

What does the functional form of the conditional expectation look like?

What does the difference between the two lines tell us about treatment effect heterogeneity?

We will fit four different prediction models using ordinary least squares.

\begin{align*}
&\hat{Y} = \beta_0 + \beta_1 D + \beta_2 S \\
&\hat{Y} = \beta_0 + \beta_1 D + \beta_2 S \\
&\hat{Y} = \beta_0 + \beta_1 D + \beta_2 S_1 + \beta_3 S_2 \\
&\hat{Y} = \beta_0 + \beta_1 D + \beta_2 S_1 + \beta_3 S_2 + \beta_4 S_1 * D + \beta_5 S_2 * D  
\end{align*}
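
Specifications of this kind can be written compactly with the `statsmodels` formula interface. The sketch below is only an illustration: the simulated data, the three values of $S$, and the coefficient values are hypothetical and do not come from the lecture's `auxiliary` module. `C(S)` expands $S$ into dummy variables and `D * C(S)` adds their interactions with $D$.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 10_000

# Hypothetical data: S takes the values 1, 2, 3 and the effect of D varies with S.
df = pd.DataFrame(
    {"S": rng.integers(1, 4, size=n), "D": rng.integers(0, 2, size=n)}
)
df["Y"] = 2.0 * df["S"] + df["D"] * df["S"] ** 2 + rng.normal(size=n)

formulas = {
    "linear in S": "Y ~ D + S",
    "dummies for S": "Y ~ D + C(S)",
    "saturated": "Y ~ D * C(S)",
}
fits = {name: smf.ols(formula, data=df).fit() for name, formula in formulas.items()}

# Sample mean of Y within each (D, S) cell, aligned with the rows of df.
cell_means = df.groupby(["D", "S"])["Y"].transform("mean")

gaps = {}
for name, fit in fits.items():
    gaps[name] = np.abs(fit.fittedvalues - cell_means).max()
    print(f"{name:>14}: {len(fit.params)} parameters, "
          f"max gap to cell means = {gaps[name]:.3f}")
```

Only the saturated model reproduces the sample cell means exactly, because it has one parameter per combination of $D$ and $S$.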


### Anscombe quartet


So what does the data behind these regressions look like?
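
A minimal sketch that fits a least squares line to each of the four datasets, using the published values of the quartet (Anscombe, 1973) and `np.polyfit` rather than the helpers in `auxiliary`. The fitted lines are nearly identical although the underlying scatter plots differ markedly.

```python
import numpy as np

# Published values of Anscombe's quartet; datasets I to III share the same x.
x_common = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_common,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_common,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_common,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

fits = {}
for name, (x, y) in quartet.items():
    # np.polyfit returns the coefficients from the highest degree down.
    slope, intercept = np.polyfit(x, y, deg=1)
    corr = np.corrcoef(x, y)[0, 1]
    fits[name] = (intercept, slope, corr)
    print(f"{name:>3}: intercept={intercept:.2f}, slope={slope:.2f}, corr={corr:.2f}")
```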

## Regression adjustment as a strategy to estimate causal effects

### Regression models and omitted-variable bias

\begin{align*}
Y = \alpha + \delta D + \epsilon
\end{align*}

* $\delta$ is interpreted as an invariant, structural causal effect that applies to all members of the population.

* $\epsilon$ is a summary random variable that represents all other causes of $Y$.

\begin{align*}
\hat{\delta}_{OLS, \text{bivariate}} = \frac{Cov_N(y_i, d_i)}{Var_N(d_i)}
\end{align*}

Whether $\hat{\delta}$ provides an unbiased and consistent estimate of the true causal effect depends on the correlation between $\epsilon$ and $D$.
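
The role of the correlation between $\epsilon$ and $D$ can be sketched in a small simulation. Everything here is an assumption made for illustration: the true effect $\delta = 2$, the unobserved cause `u`, and the selection rule. The estimator is computed directly as $Cov_N(y_i, d_i) / Var_N(d_i)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
delta = 2.0  # assumed true causal effect


def bivariate_ols(y, d):
    """Return the bivariate OLS slope Cov_N(y, d) / Var_N(d)."""
    return np.cov(y, d, ddof=0)[0, 1] / np.var(d)


# Case 1: epsilon is independent of D.
d = rng.binomial(1, 0.5, size=n)
eps = rng.normal(size=n)
est_independent = bivariate_ols(1.0 + delta * d + eps, d)

# Case 2: a hypothetical unobserved cause u raises both treatment uptake
# and the outcome, so epsilon is correlated with D.
u = rng.normal(size=n)
d = (u + rng.normal(size=n) > 0).astype(int)
eps = u + rng.normal(size=n)
est_correlated = bivariate_ols(1.0 + delta * d + eps, d)

print(f"epsilon independent of D: {est_independent:.3f}")
print(f"epsilon correlated with D: {est_correlated:.3f}")
```

In the first case the estimate should be close to the assumed $\delta$. In the second case it overstates the effect because the omitted cause `u` is absorbed into $\epsilon$.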

<img src="material/omitted-variable-bias.png" height=300 width=300 />

We now move to the potential outcomes model to clarify the connection between **omitted-variable bias** and **self-selection bias**. 

### Potential outcomes and omitted-variable bias

\begin{align*}
Y = \underbrace{\mu^0}_{\alpha} + \underbrace{(\mu^1 - \mu^0)}_{\delta} D + \underbrace{\{\nu^0 + D(\nu^1 - \nu^0 )\}}_{\epsilon}, 
\end{align*}

where $\mu^0\equiv E[Y^0]$, $\mu^1\equiv E[Y^1]$, $\nu^0\equiv Y^0 - E[Y^0]$, and $\nu^1\equiv Y^1 - E[Y^1]$.

What induces a correlation between $D$ and $\{\nu^0 + D(\nu^1 - \nu^0 )\}$?

* **baseline bias**: there is a net baseline difference in the hypothetical no-treatment state that is correlated with treatment uptake
$\rightarrow$ $D$ is correlated with $\nu^0$

* **differential treatment bias**: there is a net treatment effect difference that is correlated with treatment uptake
$\rightarrow$ $D$ is correlated with $D(\nu^1 - \nu^0 )$
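
Both sources of bias can be made concrete in a simulation where the potential outcomes are known. The sketch below assumes hypothetical values $\mu^0 = 1$ and $\mu^1 = 3$ and a selection rule under which individuals with a high $\nu^0$ and a high gain $\nu^1 - \nu^0$ are more likely to take up treatment. It relies on the accounting identity that the naive difference in means equals the ATE plus the baseline difference $E[Y^0 \mid D = 1] - E[Y^0 \mid D = 0]$ plus $(1 - \pi)$ times the difference in average gains between treated and untreated, where $\pi$ is the share of treated individuals.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

# Hypothetical potential outcomes with mu^0 = 1 and mu^1 = 3.
nu0 = rng.normal(size=n)
gain_dev = rng.normal(size=n)  # individual deviation nu^1 - nu^0
y0 = 1.0 + nu0
y1 = 3.0 + nu0 + gain_dev

# Assumed self-selection on both the baseline and the individual gain.
d = (nu0 + gain_dev + rng.normal(size=n) > 0).astype(int)
y = np.where(d == 1, y1, y0)  # observed outcome

gain = y1 - y0
naive = y[d == 1].mean() - y[d == 0].mean()
ate = gain.mean()
baseline_bias = y0[d == 1].mean() - y0[d == 0].mean()
differential_bias = (1 - d.mean()) * (gain[d == 1].mean() - gain[d == 0].mean())

print(f"naive difference in means: {naive:.3f}")
print(f"ATE:                       {ate:.3f}")
print(f"baseline bias:             {baseline_bias:.3f}")
print(f"differential bias:         {differential_bias:.3f}")
```

The three components add up to the naive estimate exactly in the sample, which shows how far the bivariate regression coefficient can be from the ATE when both forms of selection are present.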

<img src="material/regression_demonstration_two.png" height=300 width=300 />

### Regression as adjustment for otherwise omitted variables

<img src="material/observable-regression-adjustment.png" height=300 width=300 />

<img src="material/regression_demonstration_three.png" height=300 width=300/>

We now look at two datasets that are observationally equivalent, but regression adjustment for the observable $X$ works in only one of them.

<img src="material/regression_demonstration_four.png" height=300 width=300 />

Now we condition on $X$ to see where conditioning might help in obtaining an unbiased estimate of the true effect.

<img src="material/regression_demonstration_five.png" height=300 width=300 />

To summarize: Regression adjustment by $X$ will yield a consistent and unbiased estimate of the ATE when:

* $D$ is mean independent of (and therefore uncorrelated with) $\nu^0 + D(\nu^1 - \nu^0)$ for each subset of respondents identified by distinct values on the variables in $X$

* the causal effect of $D$ does not vary with $X$ 

* a fully flexible parameterization of $X$ is used
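
The importance of a fully flexible parameterization can be sketched with simulated data. The setup is hypothetical: $X$ takes three values, treatment is as good as random within each value of $X$, and the causal effect is a constant $\delta = 1$. Both the treatment probability and the baseline outcome are nonlinear in $X$, so entering $X$ linearly is not flexible enough.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300_000
delta = 1.0  # assumed constant causal effect

# Hypothetical strata: uptake and baseline outcome are both nonlinear in X.
x = rng.integers(1, 4, size=n)  # values 1, 2, 3
d = rng.binomial(1, np.array([0.2, 0.3, 0.9])[x - 1])
y = np.array([0.0, 1.0, 5.0])[x - 1] + delta * d + rng.normal(size=n)


def coef_on_d(controls):
    """Return the OLS coefficient on D given a list of control columns."""
    design = np.column_stack([np.ones(n), d] + controls)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


unadjusted = coef_on_d([])
linear_x = coef_on_d([x])
saturated_x = coef_on_d([(x == 2).astype(float), (x == 3).astype(float)])

print(f"no adjustment:     {unadjusted:.3f}")
print(f"X entered linearly: {linear_x:.3f}")
print(f"dummies for X:      {saturated_x:.3f}")
```

Under these assumptions only the specification with one dummy per value of $X$ should recover the assumed effect. The linear adjustment removes part of the bias but not all of it.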

## Freedman's paradox

> In statistical analysis, Freedman's paradox, named after David Freedman, is a problem in model selection whereby predictor variables with no relationship to the dependent variable can pass tests of significance – both individually via a t-test, and jointly via an F-test for the significance of the regression. (Wikipedia)

We fill a dataframe with random numbers. Thus there is no causal relationship between the dependent and independent variables.

Now we run a simple regression of the dependent variable on the random independent variables.

We use this to inform a second regression where we only keep the variables that were significant at the 25% level.

What to make of this exercise?