## Session 4: Selection on observables 
____

We can mimic the experimental context with observational data if the
following assumptions hold.

### 1. Assumptions

1. **Conditional independence assumption** (a.k.a. unconfoundedness,
a.k.a. exogeneity)
- Formally: $\{Y(0), Y(1)\} \perp D \mid X$
- Meaning: D is as good as randomly assigned among subjects with the
same values of X


2. **Common support assumption**
- Formally: $0 < p(X) < 1$, where $p(X) := P(D=1 | X)$ is the
so-called propensity score
- Meaning: For any combination of covariate values occurring in the
population, there are both treated and nontreated subjects.


3. Covariates are not affected by the treatment; they are measured at or
prior to treatment assignment
- Formally: $X(1) = X(0) = X$

Comments:

- Conditional independence:
  - Unlikely to be satisfied in many empirical contexts, because confounders such as ability, intelligence, motivation, self-confidence, and extroversion are rarely measured.
  - Plausibility of this assumption needs to be scrutinized based on theoretical arguments, domain knowledge, or previous empirical findings.

- Common support:
  - This assumption rules out that the covariates deterministically predict the treatment.
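
As a minimal sketch of what common support means in practice, the Python snippet below estimates the propensity score as the share of treated units within each covariate cell and flags cells where it equals 0 or 1. The toy data, the single categorical covariate, and the helper names `propensity_by_cell` and `cells_without_support` are illustrative assumptions, not part of the course material. With continuous covariates one would typically estimate $p(X)$ with a model (e.g., a logit) instead.

```python
from collections import defaultdict

def propensity_by_cell(d, x):
    """Share of treated units, an estimate of P(D=1 | X=x), per covariate cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [n_units, n_treated]
    for d_i, x_i in zip(d, x):
        counts[x_i][0] += 1
        counts[x_i][1] += d_i
    return {cell: treated / n for cell, (n, treated) in counts.items()}

def cells_without_support(pscores):
    """Cells whose estimated propensity score is exactly 0 or 1."""
    return sorted(cell for cell, p in pscores.items() if p in (0.0, 1.0))

# Hypothetical toy data: X is an education category, D a treatment dummy.
x = ["low", "low", "low", "mid", "mid", "mid", "mid", "high", "high"]
d = [0, 0, 1, 0, 1, 1, 0, 1, 1]

pscores = propensity_by_cell(d, x)
violations = cells_without_support(pscores)
print(pscores)
print(violations)
```

In this toy sample the `high` cell contains only treated subjects, so there is no nontreated comparison group for it and common support fails there.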

### 2. Identification of treatment effects

Note that, under conditional independence, $E[Y(d)|X] = E[Y|D=d,X]$ for $d \in \{0,1\}$; hence, we have

- The causal effects among subjects with the same values $x$ of observed covariates $$\begin{align*}\mathrm{CATE} &= E[Y(1)|X=x] - E[Y(0)|X=x] \\ &= E[Y|D=1, X=x] - E[Y|D=0, X=x],\end{align*}$$
- The average (homogeneous) treatment effect, obtained by averaging CATE across all values that $X$ takes **in the population** $$\mathrm{ATE} = E[\mathrm{CATE}]$$
- The average treatment effect on the treated, obtained by averaging CATE across all values that $X$ takes **among the treated population** $$\begin{align*}\mathrm{ATET} &= E[E[Y|D=1,X]|D=1] - E[E[Y|D=0,X]|D=1] \\ &= E[Y|D=1] - E[E[Y|D=0,X]|D=1]\end{align*}$$
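
The identification results above can be illustrated with a small stratification sketch in Python. It assumes a single discrete covariate, so that conditional means can be computed cell by cell; the data and the function name `stratified_effects` are made up for illustration. CATE is the treated-minus-nontreated difference in mean outcomes within each cell, ATE weights the cells by their share in the whole sample, and ATET weights them by their share among the treated.

```python
from statistics import mean

def stratified_effects(y, d, x):
    """Cell-wise CATE plus ATE and ATET for a discrete covariate x.

    statistics.mean raises an error on an empty group, which is what
    happens when a cell lacks treated or nontreated units (no common support).
    """
    cells = sorted(set(x))
    cate, n_all, n_treated = {}, {}, {}
    for c in cells:
        y1 = [yi for yi, di, xi in zip(y, d, x) if xi == c and di == 1]
        y0 = [yi for yi, di, xi in zip(y, d, x) if xi == c and di == 0]
        cate[c] = mean(y1) - mean(y0)
        n_all[c] = len(y1) + len(y0)
        n_treated[c] = len(y1)
    # ATE: average CATE over the distribution of X in the full sample.
    ate = sum(cate[c] * n_all[c] for c in cells) / sum(n_all.values())
    # ATET: average CATE over the distribution of X among the treated.
    atet = sum(cate[c] * n_treated[c] for c in cells) / sum(n_treated.values())
    return cate, ate, atet

# Hypothetical toy data: cell "b" has higher outcomes and more treated units.
x = ["a"] * 6 + ["b"] * 6
d = [1, 1, 0, 0, 0, 0] + [1, 1, 1, 1, 0, 0]
y = [5.0, 7.0, 2.0, 4.0, 3.0, 3.0] + [10.0, 12.0, 11.0, 11.0, 9.0, 11.0]

cate, ate, atet = stratified_effects(y, d, x)
naive = mean(yi for yi, di in zip(y, d) if di == 1) - mean(
    yi for yi, di in zip(y, d) if di == 0
)
print(cate, ate, atet, naive)
```

In this constructed example the naive difference in means is larger than both ATE and ATET, because treated subjects are concentrated in the cell with higher outcomes.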


### 3. Estimation of treatment effects

Since $E[Y(d)|X] = E[Y|D=d,X]$ under conditional independence, we can use a regression model to estimate $E[Y|D,X]$:

$$E[Y|D,X] = \alpha + \beta_D D + \beta_{X_1} X_1 + \ldots + \beta_{X_K} X_K, \qquad \text{where} \qquad \hat{\beta}_D =\frac{\widehat{\mathrm{Cov}}(Y,D|X)}{\widehat{\mathrm{Var}}(D|X)}. \qquad (*)$$
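
One way to read the expression for $\hat{\beta}_D$ is through partialling out (the Frisch-Waugh-Lovell result): the coefficient on $D$ in the joint regression equals the covariance of $Y$ and $D$ divided by the variance of $D$, after both have been linearly residualized on $X$. The Python sketch below checks this numerically for a single covariate. The data-generating process, the sample size, and all function names are assumptions made for illustration, and interpreting $\widehat{\mathrm{Cov}}(Y,D|X)$ as a covariance of linear residuals is likewise an assumption of this sketch.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance (dividing by n; the scaling cancels in the ratios below)."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

def residualize(v, x):
    """Residuals from a simple OLS regression of v on a constant and x."""
    slope = cov(v, x) / cov(x, x)
    intercept = mean(v) - slope * mean(x)
    return [vi - intercept - slope * xi for vi, xi in zip(v, x)]

def beta_d_joint(y, d, x):
    """Coefficient on D from OLS of Y on (1, D, X), closed form for two regressors."""
    s_dd, s_xx, s_dx = cov(d, d), cov(x, x), cov(d, x)
    s_dy, s_xy = cov(d, y), cov(x, y)
    return (s_dy * s_xx - s_xy * s_dx) / (s_dd * s_xx - s_dx ** 2)

def beta_d_partialled_out(y, d, x):
    """Cov/Var ratio after residualizing both Y and D on X."""
    d_res = residualize(d, x)
    y_res = residualize(y, x)
    return cov(y_res, d_res) / cov(d_res, d_res)

# Simulated data (assumed DGP): X raises both the treatment probability
# and the outcome; the true effect of D is 2.
random.seed(0)
n = 2000
x = [random.random() for _ in range(n)]
d = [1.0 if random.random() < 0.2 + 0.6 * xi else 0.0 for xi in x]
y = [1.0 + 2.0 * di + 3.0 * xi + random.gauss(0.0, 1.0) for di, xi in zip(d, x)]

b_joint = beta_d_joint(y, d, x)
b_fwl = beta_d_partialled_out(y, d, x)
print(round(b_joint, 3), round(b_fwl, 3))
```

The two numbers agree up to floating-point error, and both should be close to the assumed true effect because the simulated model is linear and satisfies conditional independence by construction.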

Comments:
- In general, $\mathrm{Cov}(D,X) \neq 0$ holds, so we expect a larger
variance of $\hat{\beta}_D$ relative to the experimental context where
$\mathrm{Cov}(D,X) = 0$.
- Model misspecification (e.g., the relationship between Y and X is not
linear as in $(*)$) implies that $\hat{\beta}_D$ is biased and
inconsistent, in contrast with the experimental context.
- Omission of interactions between D and some or all variables in $X$
causes $\hat{\beta}_D \neq \mathrm{CATE}$ and implicitly assumes that
average effects are homogeneous, i.e., $\mathrm{ATE} = \mathrm{CATE} =
\mathrm{ATET}$.
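
The last comment can be made concrete with a small Python sketch on made-up data with one binary covariate and heterogeneous effects. Without the interaction, the coefficient on $D$ is a weighted average of the cell-specific effects that coincides with neither ATE nor ATET in this example. With the interaction $D \cdot X$ the model is saturated, so its coefficients can be read off the four cell means, and ATE and ATET follow by evaluating the interaction at the mean of $X$ in the full sample and among the treated. All names and numbers below are illustrative assumptions.

```python
from statistics import mean

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

def beta_d_no_interaction(y, d, x):
    """Coefficient on D from OLS of Y on (1, D, X), closed form for two regressors."""
    s_dd, s_xx, s_dx = cov(d, d), cov(x, x), cov(d, x)
    s_dy, s_xy = cov(d, y), cov(x, y)
    return (s_dy * s_xx - s_xy * s_dx) / (s_dd * s_xx - s_dx ** 2)

def cell_mean(y, d, x, d_val, x_val):
    """Mean outcome among units with D = d_val and X = x_val."""
    return mean([yi for yi, di, xi in zip(y, d, x) if di == d_val and xi == x_val])

# Hypothetical toy data: the effect is 3 when X = 0 and 1 when X = 1.
x = [0.0] * 8 + [1.0] * 8
d = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0] + [1.0] + [0.0] * 7
y = [5.0, 7.0, 6.0, 6.0, 3.0, 3.0, 2.0, 4.0] + [
    11.0, 10.0, 9.0, 11.0, 10.0, 10.0, 9.0, 11.0
]

# Model without interaction: a single coefficient on D.
beta_plain = beta_d_no_interaction(y, d, x)

# Saturated model Y = a + b_d*D + b_x*X + b_dx*D*X, recovered from cell means.
b_d = cell_mean(y, d, x, 1, 0) - cell_mean(y, d, x, 0, 0)          # CATE at X = 0
b_dx = (cell_mean(y, d, x, 1, 1) - cell_mean(y, d, x, 0, 1)) - b_d  # change in CATE

ate = b_d + b_dx * mean(x)
atet = b_d + b_dx * mean([xi for xi, di in zip(x, d) if di == 1])
print(beta_plain, ate, atet)
```

The design choice here is to keep the covariate binary so that the interacted regression is exactly equivalent to comparing cell means; with continuous or many covariates one would fit the interacted model by OLS instead.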

### 4. Practice

- [Imbens - Xu](https://arxiv.org/abs/2406.00827)
- [Replica Imbens - Xu](https://yiqingxu.org/tutorials/lalonde/)

### References

- Huber, M. (2023). Causal Analysis: Impact Evaluation and Causal Machine Learning with Applications in R. MIT Press.

___