## What does "doubly robust" mean? 

Doubly robust methods estimate two models:
- an *outcome model*
$$\mu_d(X_i) = E(Y_i \mid D_i = d, X_i)$$
- and an *exposure model* (also called a treatment model or propensity score):
$$\pi(X_i) = E(D_i \mid X_i)$$

where $\mu_d(\cdot)$ is the outcome model under treatment status $D_i = d \in \{0, 1\}$ (control or treatment), $X_i$ is a vector of covariates for unit $i = 1, \ldots, N$, $Y_i$ is the outcome, and $\pi(\cdot)$ is the exposure model. Note that the covariates included in $X_i$ can differ between the two models.

An estimator is called "doubly robust" if it achieves consistent estimation of the ATE (or whatever estimand we're interested in) as long as *at least one* of these two models is consistently estimated. This means that the outcome model can be completely misspecified, but as long as the exposure model is correct, our estimate of the ATE will be consistent. It also means that the exposure model can be completely wrong, as long as the outcome model is correct.
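The definition above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data-generating process, the sample size, and the choice of the augmented inverse probability weighting (AIPW) form as the doubly robust estimator are all illustrative. The "correct" exposure model is simply the true propensity score, the "wrong" one is a constant 0.5, and the "wrong" outcome model omits the confounder.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Assumed data-generating process (illustrative only): one confounder x.
x = rng.normal(size=n)
true_ps = 1.0 / (1.0 + np.exp(-x))           # P(D = 1 | X)
d = rng.binomial(1, true_ps)
y = 2.0 * d + 2.0 * x + rng.normal(size=n)   # true ATE is 2


def fit_outcome(design):
    """Fit OLS separately in each arm and predict mu_0(X), mu_1(X) for every unit."""
    preds = {}
    for arm in (0, 1):
        in_arm = d == arm
        coef, *_ = np.linalg.lstsq(design[in_arm], y[in_arm], rcond=None)
        preds[arm] = design @ coef
    return preds[0], preds[1]


def aipw(mu0, mu1, ps):
    """AIPW estimate of the ATE from outcome predictions and propensity scores."""
    augmentation = d * (y - mu1) / ps - (1 - d) * (y - mu0) / (1 - ps)
    return np.mean(mu1 - mu0 + augmentation)


intercept = np.ones((n, 1))
mu_correct = fit_outcome(np.column_stack([intercept, x]))  # includes the confounder
mu_wrong = fit_outcome(intercept)                          # ignores the confounder
ps_correct = true_ps                                       # true propensity score
ps_wrong = np.full(n, 0.5)                                 # ignores the confounder

estimates = {
    "both correct": aipw(*mu_correct, ps_correct),
    "outcome wrong, exposure correct": aipw(*mu_wrong, ps_correct),
    "outcome correct, exposure wrong": aipw(*mu_correct, ps_wrong),
    "both wrong": aipw(*mu_wrong, ps_wrong),
}
for label, value in estimates.items():
    print(f"{label}: {value:.3f}")
```

In this setup, the first three estimates should land close to the true ATE of 2, while the "both wrong" estimate should be clearly biased upward, because it reduces to the naive difference in means.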


## Origins of Doubly Robust Methods

According to Bang and Robins (2005), doubly robust methods have their origins in missing data models. Robins, Rotnitzky, and Zhao (1994) and Rotnitzky, Robins, and Scharfstein (1998) developed augmented orthogonal inverse probability-weighted (AIPW) estimators for missing data models, and Scharfstein, Rotnitzky, and Robins (1999) showed that AIPW estimators are doubly robust and extended them to causal inference.

But Kang and Schafer (2007) argue that doubly robust methods are older. They cite work by Cassel, Särndal, and Wretman (1976, 1977), who proposed “generalized regression estimators” for population means from surveys where sampling weights must be estimated.  

Arguably, doubly robust methods go back even further than this. The form of doubly robust methods is similar to residual-on-residual regression, which dates back to the famous Frisch-Waugh-Lovell (FWL) theorem (Frisch and Waugh 1933; Lovell 1963):
$$\beta_D = \frac{\text{Cov}(\tilde Y_i, \tilde D_i)}{\text{Var}(\tilde D_i)}$$
where $\tilde D_i$ is the residual part of $D_i$ after regressing it on $X_i$, and $\tilde Y_i$ is the residual part of $Y_i$ after regressing it on $X_i$.  
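The residual-on-residual formula above can be checked numerically. The following is a minimal sketch on simulated data (the coefficients, sample size, and variable names are assumptions for illustration): it compares the coefficient on $D_i$ from a full regression of $Y_i$ on $D_i$ and $X_i$ with the ratio $\text{Cov}(\tilde Y_i, \tilde D_i) / \text{Var}(\tilde D_i)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Assumed linear data-generating process with two covariates.
x = rng.normal(size=(n, 2))
d = 0.5 * x[:, 0] - 0.3 * x[:, 1] + rng.normal(size=n)
y = 1.5 * d + x[:, 0] + 2.0 * x[:, 1] + rng.normal(size=n)


def residualize(target, design):
    """Return the residuals from an OLS regression of target on design."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef


controls = np.column_stack([np.ones(n), x])

# Coefficient on D from the full regression of Y on (1, X, D).
full_design = np.column_stack([controls, d])
beta_full = np.linalg.lstsq(full_design, y, rcond=None)[0][-1]

# FWL: residualize Y and D on X, then take Cov / Var of the residuals.
y_tilde = residualize(y, controls)
d_tilde = residualize(d, controls)
beta_fwl = np.cov(y_tilde, d_tilde)[0, 1] / np.var(d_tilde, ddof=1)

print(beta_full, beta_fwl)
```

The two numbers should agree up to floating-point error, which is the content of the FWL theorem. Doubly robust estimators keep this two-residual structure but allow the adjustments for $X_i$ to come from more flexible models.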

There are also links between doubly robust methods and matching with regression adjustment. This work goes back to at least Rubin (1973), who suggested that regression adjustment in matched data produces less biased estimates than either matching (exposure adjustment) or regression (outcome adjustment) does by itself.

## Assumptions

Most doubly robust methods require almost all of the standard assumptions necessary for all methods that depend on selection on observables. Although some doubly robust methods relax one or two of these, the six standard assumptions are:
1. Consistency
2. Positivity/overlap
3. One version of treatment
4. No interference
5. IID observations
6. Conditional ignorability: $\{Y_{i0}, Y_{i1}\} \perp \!\!\! \perp D_i \mid X_i$

Special attention should be paid to Assumption 6: doubly robust methods will not work if we fail to measure an important confounder that affects both treatment and outcome. But notably, the doubly robust methods covered in this tutorial make no functional form assumptions. Most use flexible machine learning algorithms to estimate both the outcome and exposure models, with regularization and sample splitting (often cross-fitting) to limit overfitting.
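To make the cross-fitting idea concrete, here is a minimal sketch. It assumes a plain OLS learner standing in for a flexible machine learning algorithm, and the helper name `cross_fit_predictions` is hypothetical. The point is only the splitting logic: each unit's prediction comes from a model fit on the other folds.

```python
import numpy as np


def cross_fit_predictions(features, target, n_folds=5, seed=0):
    """Out-of-fold OLS predictions: no unit is predicted by a model that saw it."""
    n_units = len(target)
    fold_rng = np.random.default_rng(seed)
    fold_id = fold_rng.permutation(n_units) % n_folds  # random, balanced folds
    design = np.column_stack([np.ones(n_units), features])
    preds = np.empty(n_units)
    for k in range(n_folds):
        held_out = fold_id == k
        # Fit on every fold except k, then predict only for fold k.
        coef, *_ = np.linalg.lstsq(design[~held_out], target[~held_out], rcond=None)
        preds[held_out] = design[held_out] @ coef
    return preds, fold_id


# Usage on simulated data (illustrative values only).
rng = np.random.default_rng(2)
n = 1_000
covariates = rng.normal(size=(n, 3))
outcome = covariates @ np.array([1.0, -0.5, 0.25]) + rng.normal(size=n)

preds, fold_id = cross_fit_predictions(covariates, outcome)
print(np.mean((outcome - preds) ** 2))  # out-of-fold mean squared error
```

The same pattern applies to the exposure model: swap in a classifier and predict propensity scores for the held-out fold.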

If these six assumptions are met, and we use the right estimator, we get double robustness: consistent estimation if either the exposure model or the outcome model is correct.


## A simple demonstration: augmented inverse probability weights

