# The Overlap Learner
This brief note presents the Overlap Learner framework more formally and links it to the work of [Li et al. (2014)](https://arxiv.org/pdf/1404.1785.pdf). The authors propose a unified framework for balancing weights, together with a new weighting scheme called the "overlap weights", for carrying out causal inference from observational data.

## The Setup
Consider a binary treatment $Z_i \in \{0, 1\}$, an outcome $Y_i \in \mathbb{R}$ (continuous, or binary), and a set of features $\mathbf{X}_i \in \mathcal{X}$. The propensity score, defined as $\pi(x_i) = \mathbb{P} (Z_i = 1 | X_i = x_i)$, is widely used in observational studies to restore balance between the two treatment groups and to mimic a randomized experiment as closely as possible. A common approach is to weight the outcome $Y_i$ by the Inverse Propensity Score Weights (IPW), defined as:

\begin{equation}
  w_z(x_i)=
  \begin{cases}
    \frac{1}{\pi(x_i)} & \text{if} ~~ Z_i=1 \\
    \frac{1}{1 - \pi(x_i)} & \text{if} ~~ Z_i=0
  \end{cases}
\end{equation}
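
As a minimal sketch of the definition above, the IPW weights can be computed with NumPy. The function name `ipw_weights` and the toy values are illustrative assumptions, not part of any library; the propensity scores are assumed to lie strictly between 0 and 1.

```python
import numpy as np

def ipw_weights(z, pi):
    """Inverse propensity weights: 1/pi for treated units, 1/(1 - pi) for controls.

    Hypothetical helper for illustration; assumes 0 < pi < 1 elementwise.
    """
    z = np.asarray(z)
    pi = np.asarray(pi, dtype=float)
    return np.where(z == 1, 1.0 / pi, 1.0 / (1.0 - pi))

# Toy example: two units with moderate scores, two with extreme scores.
z = np.array([1, 1, 0, 0])
pi = np.array([0.5, 0.01, 0.5, 0.99])
w = ipw_weights(z, pi)
print(w)  # the units with extreme scores receive weights about 50x larger
```

In this toy case the treated unit with $\pi(x_i) = 0.01$ and the control unit with $\pi(x_i) = 0.99$ each receive a weight of roughly 100, against 2 for the units with $\pi(x_i) = 0.5$.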

The IPW scheme has several advantages, but it suffers from "exploding" bias and variance when $\pi(x_i)$ takes extreme values (close to 0 or 1) and the covariate distribution is unbalanced between the treated and control groups. In these cases, the overlap assumption $0 < \pi(x_i) < 1$ is threatened. [Li et al. (2014)](https://arxiv.org/pdf/1404.1785.pdf) address non-overlap scenarios by proposing a new set of "overlap weights", defined as:

\begin{equation}
  w_z(x_i)=
  \begin{cases}
    1 - \pi(x_i) & \text{if} ~~ Z_i=1 \\
    \pi(x_i) & \text{if} ~~ Z_i=0
  \end{cases}
\end{equation}
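
The overlap weights can be sketched in the same way. The helper name `overlap_weights` and the toy values are assumptions made for illustration. Because each weight is itself a probability, it stays in $[0, 1]$ even when the propensity score is extreme.

```python
import numpy as np

def overlap_weights(z, pi):
    """Overlap weights: 1 - pi for treated units, pi for controls.

    Hypothetical helper for illustration; pi is the propensity score.
    """
    z = np.asarray(z)
    pi = np.asarray(pi, dtype=float)
    return np.where(z == 1, 1.0 - pi, pi)

# Toy example with both moderate and extreme propensity scores.
z = np.array([1, 1, 0, 0])
pi = np.array([0.5, 0.01, 0.5, 0.99])
w = overlap_weights(z, pi)
print(w)  # every weight lies between 0 and 1
```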

The rationale behind the "overlap weights" is that they attempt to restore balance between the two treatment groups by weighting each unit by its probability of being assigned to the opposite treatment group. In doing so, they place more emphasis on the part of the population closest to a randomized experiment, i.e. on units that could have been observed in either group with similar probability (the overlapping units).
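
To connect the two schemes, both can be read as members of the general class of balancing weights in [Li et al. (2014)](https://arxiv.org/pdf/1404.1785.pdf). The class is written here with a tilting function $h$, a symbol not used elsewhere in this document:

\begin{equation}
  w_1(x_i) = \frac{h(x_i)}{\pi(x_i)}, \qquad w_0(x_i) = \frac{h(x_i)}{1 - \pi(x_i)}
\end{equation}

Setting $h(x_i) = 1$ gives the IPW weights, while setting $h(x_i) = \pi(x_i) (1 - \pi(x_i))$ gives the overlap weights $w_1(x_i) = 1 - \pi(x_i)$ and $w_0(x_i) = \pi(x_i)$. The function $\pi(x_i) (1 - \pi(x_i))$ is largest at $\pi(x_i) = 0.5$ and vanishes as $\pi(x_i)$ approaches 0 or 1, which is the sense in which units with the most overlap receive the most emphasis.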

## The O-Learner
The O-Learner exploits the idea of overlap weights to develop a "Meta-Learner" algorithm for estimating Individual/Heterogeneous Treatment Effects (ITE) in observational studies, where the overlap assumption is often violated. The O-Learner can use almost any base machine learning regression/classification algorithm (linear regression, tree ensembles, neural nets, etc.) found in the `sklearn` library.

The O-Learner estimates the Conditional Average Treatment Effect (CATE), defined as $\tau (x_i) = \mathbb{E} [Y^{(1)} - Y^{(0)} | X_i = x_i]$, where $Y^{(Z_i)}$ is the potential outcome under treatment assignment $Z_i$, in three steps.

1. The first step consists of fitting a probabilistic classifier to obtain estimates of the propensity score (PS) $\pi(x_i)$, i.e. regressing $Z_i$ on $X_i$ (or on a different subset of covariates $W_i$).

2. Then it constructs the overlap-weighted outcome, defined as:

\begin{equation}
  Y_{i, O}=
  \begin{cases}
    Y_i (1 - \pi(x_i)) & \text{if} ~~ Z_i=1 \\
    Y_i \pi(x_i) & \text{if} ~~ Z_i=0
  \end{cases}
\end{equation}
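
Steps 1 and 2 can be sketched as follows. This is an illustration under stated assumptions, not the O-Learner's actual implementation: the synthetic data, the variable names, and the choice of `LogisticRegression` as the probabilistic classifier are all made up for the example, and any `sklearn` classifier exposing `predict_proba` could be substituted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic observational data (assumed for illustration only):
# treatment assignment depends on the first feature, so groups are unbalanced.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
true_pi = 1.0 / (1.0 + np.exp(-X[:, 0]))
Z = rng.binomial(1, true_pi)
Y = X[:, 1] + 2.0 * Z + rng.normal(size=n)

# Step 1: fit a probabilistic classifier of Z on X and take the
# predicted probability of Z = 1 as the estimated propensity score.
ps_model = LogisticRegression().fit(X, Z)
pi_hat = ps_model.predict_proba(X)[:, 1]

# Step 2: build the overlap-weighted outcome,
# Y * (1 - pi_hat) for treated units and Y * pi_hat for controls.
Y_o = np.where(Z == 1, Y * (1.0 - pi_hat), Y * pi_hat)
```

Here the propensity score is estimated and evaluated on the same sample for brevity; whether to use held-out or cross-fitted predictions instead is a design choice this sketch does not address.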
