# 06 Matching

Recall from the chapter on propensity scores that units with extreme propensity scores should be discarded. Matching uses a similar idea: we pair treated units with control units and then discard the units that are not matched.

## Exact Matching

Assume the confounding variable $c\in C$ is a discrete (categorical) variable. For each treated unit $w_i$, $i=1,\dotsc,n$, we find a control unit $w_i^*$ such that $c(w_i)=c(w_i^*)$. If several control units qualify, we choose one of them at random. The matched control units are denoted by $w_1^*,\dotsc,w_n^*$.

**The unmatched control units are discarded.** The matched data set is $\{(w_1,w_1^*), \dotsc, (w_n,w_n^*)\}$.

For a control unit $w$, denote $I(w)= 1$ if $w$ is matched and $0$ otherwise. 

After exact matching, the confounder has the same distribution among the treated units and the matched control units, so we can assume that

$$\mathbb P(c|A=1)=\mathbb P(c|A=0,\ I=1)$$
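
The matching procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `exact_match`, the `(unit_id, confounder_value)` tuple layout, and the toy data are all assumptions made for the example. The notes do not say what happens when a treated unit has no control with the same confounder value; here such units are simply reported as unmatched.

```python
import random
from collections import defaultdict

def exact_match(treated, controls, seed=0):
    """Pair each treated unit with one randomly chosen control sharing its confounder value.

    `treated` and `controls` are lists of (unit_id, confounder_value) tuples.
    Returns (pairs, unmatched_treated_ids).
    """
    rng = random.Random(seed)
    # Group control ids by their confounder value.
    by_value = defaultdict(list)
    for unit_id, c in controls:
        by_value[c].append(unit_id)

    pairs, unmatched = [], []
    for unit_id, c in treated:
        candidates = by_value.get(c)
        if candidates:
            # Random choice among qualifying controls; the same control may be
            # picked for several treated units (matching with replacement).
            pairs.append((unit_id, rng.choice(candidates)))
        else:
            unmatched.append(unit_id)
    return pairs, unmatched

# Hypothetical toy data.
treated = [("t1", "young"), ("t2", "old"), ("t3", "old")]
controls = [("c1", "young"), ("c2", "old"), ("c3", "old"), ("c4", "middle")]
pairs, unmatched = exact_match(treated, controls)
matched_controls = {c for _, c in pairs}  # I(w) = 1 exactly for these controls
```

Control `c4` has no treated counterpart, so it is never matched and is discarded.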



### Coarsened Exact Matching

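For continuous variables, we can stratify (coarsen) them into discrete variables and then match exactly on the strata. Stratification faces a bias-variance tradeoff: more strata lead to less bias, because units within a stratum are more alike, but to larger variance, because fewer units can be matched.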

This is known as coarsened exact matching (CEM).
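
As a small sketch of the coarsening step, the snippet below maps a continuous confounder to a stratum index, after which exact matching can be applied to the indices. The cut points and the helper name `coarsen` are hypothetical choices for illustration only.

```python
import bisect

def coarsen(value, cut_points):
    """Return the index of the stratum containing `value`.

    `cut_points` must be sorted; stratum k covers [cut_points[k-1], cut_points[k]).
    """
    return bisect.bisect_right(cut_points, value)

age_cuts = [30, 50]            # hypothetical strata: <30, 30-49, >=50
ages = [22, 30, 47, 50, 68]
strata = [coarsen(a, age_cuts) for a in ages]  # [0, 1, 1, 2, 2]
```

Adding more cut points makes the strata finer, which is the bias-variance knob described above.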

### Average Treatment Effect of Treated

The average treatment effect on the treated (ATT) is defined as
$$\text{ATT}=\mathbb E[Y^1 - Y^0|A=1].$$


Similarly, we can define the average treatment effect on the controls (ATC):
$$\text{ATC}=\mathbb E[Y^1 - Y^0|A=0].$$


### Average Treatment Effect

Having computed $\text{ATT}$ and $\text{ATC}$, we can compute the average treatment effect (ATE):

$$\begin{aligned}\text{ATE}&=\mathbb E[Y^1 - Y^0] = \mathbb E[Y^1 - Y^0|A=1]\cdot \mathbb P(A=1) + \mathbb E[Y^1 - Y^0|A=0]\cdot\mathbb P(A=0)\\ &= \text{ATT}\cdot \mathbb P(A=1) + \text{ATC}\cdot \mathbb P(A=0)\end{aligned}$$
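
A quick numeric check of this decomposition, with made-up numbers (the values of ATT, ATC and $\mathbb P(A=1)$ below are hypothetical, as is the function name):

```python
def ate_from_att_atc(att, atc, p_treated):
    """Combine ATT and ATC with weights P(A=1) and P(A=0) = 1 - P(A=1)."""
    if not 0.0 <= p_treated <= 1.0:
        raise ValueError("p_treated must be a probability")
    return att * p_treated + atc * (1.0 - p_treated)

# Hypothetical numbers: ATT = 2.0, ATC = 1.0, 25% of units treated.
ate = ate_from_att_atc(att=2.0, atc=1.0, p_treated=0.25)  # 2.0*0.25 + 1.0*0.75 = 1.25
```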




### Estimator

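With exact matching, the ATT is estimated by the mean difference over the matched pairs, where $Y_i$ and $Y_i^*$ denote the observed outcomes of $w_i$ and $w_i^*$: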
$$\widehat{\text{ATT}} = \widehat{\mathbb E}[Y^1-Y^0|A=1] = \frac{1}{n}\sum_{i=1}^n(Y_i - Y_i^*).$$

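Since $\widehat{\text{ATT}}$ is a mean of $n$ paired differences, its variance is estimated by the sample variance of the differences divided by $n$:
$$\widehat{\text{Var}}(\widehat{\text{ATT}}) = \frac{1}{n^2}\sum_{i=1}^n(Y_i - Y_i^* - \widehat{\text{ATT}})^2.$$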
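
The estimator can be sketched as follows. The function name and toy outcomes are hypothetical. One assumption should be flagged: the variance returned here is the variance of the mean difference, computed as the sample variance of the paired differences (with the $1/n$ convention) divided by $n$.

```python
def att_estimate(y_treated, y_matched_control):
    """Return (ATT estimate, estimated variance of that estimate) from matched pairs."""
    if len(y_treated) != len(y_matched_control) or not y_treated:
        raise ValueError("need the same positive number of outcomes on both sides")
    n = len(y_treated)
    diffs = [yt - yc for yt, yc in zip(y_treated, y_matched_control)]
    att = sum(diffs) / n
    # Sample variance of the paired differences (1/n convention) ...
    var_diff = sum((d - att) ** 2 for d in diffs) / n
    # ... divided by n to obtain the variance of their mean (assumption, see lead-in).
    return att, var_diff / n

# Hypothetical outcomes for four matched pairs.
y_t = [5.0, 7.0, 6.0, 8.0]
y_c = [4.0, 4.0, 5.0, 5.0]
att_hat, var_hat = att_estimate(y_t, y_c)  # differences are 1, 3, 1, 3
```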

<br>

### Other Matching Methods

If there are enough control units, we can match each treated unit to multiple controls.

Another design choice is whether to match control units with or without replacement, that is, whether a control unit may be matched to more than one treated unit.
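
The two choices can be made concrete with a small sketch. Everything here is illustrative: `match_k`, its parameters `k` and `replace`, and the data are hypothetical, and controls are taken in list order rather than at random to keep the output deterministic.

```python
from collections import defaultdict

def match_k(treated, controls, k=1, replace=True):
    """Exact-match each treated unit to up to k controls with the same confounder value.

    Inputs are lists of (unit_id, confounder_value). Controls are taken in list
    order (deterministic here; a random choice would work equally well).
    """
    pool = defaultdict(list)
    for unit_id, c in controls:
        pool[c].append(unit_id)

    matches = {}
    for unit_id, c in treated:
        chosen = pool[c][:k]
        if not replace:
            # Without replacement: used controls leave the pool.
            pool[c] = pool[c][k:]
        matches[unit_id] = chosen
    return matches

treated = [("t1", 0), ("t2", 0)]
controls = [("c1", 0), ("c2", 0), ("c3", 0)]
with_repl = match_k(treated, controls, k=2, replace=True)
without_repl = match_k(treated, controls, k=2, replace=False)
```

With replacement both treated units receive `c1` and `c2`; without replacement the second treated unit is left with only `c3`, which shows how the pool of controls can run out.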

## Inexact Matching

Instead of requiring identical confounder values, we can match each treated unit to a control unit whose confounder values are close.

### Distance

We first need to define a distance between confounder values.

#### Propensity Score Distance

Recall that we can fit a logistic regression for the propensity score: $e(c) = \mathbb P(A=1|C=c) = \frac{1}{1+\exp(-\beta_0-\beta^Tc)}$. Then we use

$$\text{dist}(c_1,c_2) = |\beta^T(c_1 - c_2)|.$$

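This is the distance between the logits of the two propensity scores, since $\log\frac{e(c)}{1-e(c)}=\beta_0+\beta^Tc$ and the intercept cancels in the difference.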
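
The following sketch checks numerically that $|\beta^T(c_1-c_2)|$ equals the difference of the logits. The coefficients are hypothetical stand-ins for a fitted logistic regression; no fitting is done here.

```python
import math

def logit_propensity(c, beta0, beta):
    """Logit of e(c) under the logistic model: beta0 + beta^T c."""
    return beta0 + sum(b * x for b, x in zip(beta, c))

def propensity(c, beta0, beta):
    """Propensity score e(c) = 1 / (1 + exp(-(beta0 + beta^T c)))."""
    return 1.0 / (1.0 + math.exp(-logit_propensity(c, beta0, beta)))

def ps_distance(c1, c2, beta):
    """|beta^T (c1 - c2)|; the intercept cancels in the difference."""
    return abs(sum(b * (x1 - x2) for b, x1, x2 in zip(beta, c1, c2)))

beta0, beta = -1.0, [0.5, 2.0]   # hypothetical fitted coefficients
c1, c2 = [4.0, 1.0], [2.0, 0.5]
d = ps_distance(c1, c2, beta)    # |0.5*2.0 + 2.0*0.5| = 2.0
```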

#### Mahalanobis Distance

We can also use the Mahalanobis distance. Let $\Sigma$ be the covariance matrix of the confounding variables, with entries $\Sigma_{jk} = {\rm Cov}(C_j,C_k)$ for the $j$-th and $k$-th confounders. Then

$$\text{dist}(c_1,c_2) = \sqrt{(c_1-c_2)^T\Sigma^{-1}(c_1-c_2)}.$$
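
A minimal sketch for the special case of two confounders, where $\Sigma^{-1}$ has a closed form. The helper names, the $1/(m-1)$ covariance convention, and the data are assumptions made for this example; a general implementation would use a linear algebra library instead.

```python
import math

def covariance_2d(points):
    """Sample covariance entries (s_xx, s_xy, s_yy) of 2-D points, using 1/(m-1)."""
    m = len(points)
    mx = sum(p[0] for p in points) / m
    my = sum(p[1] for p in points) / m
    s_xx = sum((p[0] - mx) ** 2 for p in points) / (m - 1)
    s_yy = sum((p[1] - my) ** 2 for p in points) / (m - 1)
    s_xy = sum((p[0] - mx) * (p[1] - my) for p in points) / (m - 1)
    return s_xx, s_xy, s_yy

def mahalanobis_2d(c1, c2, cov):
    """sqrt((c1-c2)^T Sigma^{-1} (c1-c2)) using the closed-form 2x2 inverse."""
    s_xx, s_xy, s_yy = cov
    det = s_xx * s_yy - s_xy ** 2
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = c1[0] - c2[0], c1[1] - c2[1]
    # Sigma^{-1} = (1/det) * [[s_yy, -s_xy], [-s_xy, s_xx]]
    q = (s_yy * dx * dx - 2 * s_xy * dx * dy + s_xx * dy * dy) / det
    return math.sqrt(q)

# Hypothetical confounders (say, age in years and income in thousands).
data = [(20.0, 30.0), (30.0, 50.0), (40.0, 40.0), (50.0, 80.0)]
cov = covariance_2d(data)
d = mahalanobis_2d(data[0], data[1], cov)
```

Unlike the Euclidean distance, this distance does not change when a confounder is rescaled (for example, income in units instead of thousands), because $\Sigma$ rescales with it.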



### Bias Correction

Fit a model $\mu_0(c) = \mathbb E(Y^0|C=c)$, representing the effect of $c$ on $Y^0$. We assume here that $\mu_0$ is linear, i.e. $\mu_0(c) = \beta_0 + \beta^Tc$ (these coefficients are not the propensity score coefficients used earlier). For each pair of matched units $(w_i,w_i^*)$, let $\widehat{\text{ATT}}_i = Y_i - Y_i^*$ be the observed difference. Then
$${\text{ATT}}_i = \mathbb E(Y_i^1 - Y_i^0)=\mathbb E(Y_i^1 - Y_i^{*0}) +\mathbb E( Y_i^{*0}-Y_i^0)=\mathbb E(\widehat{\text{ATT}}_i)+(\mu_0(c_i^*) - \mu_0(c_i)).$$

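The final term $(\mu_0(c_i^*) - \mu_0(c_i))=\beta^T(c_i^*-c_i)$ accounts for the bias that arises when $c_i^*\neq c_i$; it vanishes under exact matching. We correct for the bias by adding this term to $\widehat{\text{ATT}}_i$, with $\beta$ replaced in practice by its fitted value:
$$\widehat{\text{ATT}}_{i\ \text{bc}} = \widehat{\text{ATT}}_i + \beta^T(c_i^*-c_i).$$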

As a result, the overall estimators are as follows, where the variance of the mean is again the sample variance of the per-pair values divided by $n$:
$$\left\{\begin{aligned} &
\widehat{\text{ATT}}_{\text{bc}}= \frac{1}{n}\sum_{i=1}^n\widehat{\text{ATT}}_{i\ \text{bc}}\\  &
\widehat{\text{Var}}(\widehat{\text{ATT}}_{\text{bc}})= \frac{1}{n^2}\sum_{i=1}^n(\widehat{\text{ATT}}_{i\ \text{bc}} - \widehat{\text{ATT}}_{\text{bc}})^2.
\end{aligned}\right.$$
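
The bias correction can be illustrated with a single confounder. This is a sketch under stated assumptions: the slope is fitted by ordinary least squares on control units only (where $Y^0$ is observed), the data are synthetic with $Y^0 = 1 + 2c$ and a constant treatment effect of 3, and the variance is that of the mean, i.e. the sample variance of the corrected per-pair values divided by $n$. All names are hypothetical.

```python
def ols_slope(c, y):
    """Least-squares slope of y on a single confounder c."""
    n = len(c)
    c_bar, y_bar = sum(c) / n, sum(y) / n
    s_cc = sum((ci - c_bar) ** 2 for ci in c)
    if s_cc == 0:
        raise ValueError("confounder has no variation")
    return sum((ci - c_bar) * (yi - y_bar) for ci, yi in zip(c, y)) / s_cc

def att_bias_corrected(pairs, beta):
    """pairs: list of (y_treated, c_treated, y_control, c_control)."""
    n = len(pairs)
    # Per-pair corrected difference: (Y_i - Y_i*) + beta * (c_i* - c_i).
    per_pair = [(yt - yc) + beta * (cc - ct) for yt, ct, yc, cc in pairs]
    att_bc = sum(per_pair) / n
    # Variance of the mean: sample variance (1/n) divided by n (assumption).
    var_bc = sum((d - att_bc) ** 2 for d in per_pair) / n ** 2
    return att_bc, var_bc

# Synthetic controls with Y^0 = 1 + 2c exactly, so the fitted slope is 2.
control_c = [0.0, 1.0, 2.0, 3.0]
control_y = [1.0, 3.0, 5.0, 7.0]
beta_hat = ols_slope(control_c, control_y)

# Treated outcomes are 1 + 2c + 3; each treated unit has an inexact match.
pairs = [(6.0, 1.0, 5.0, 2.0),    # treated c=1, matched control c=2
         (10.0, 3.0, 3.0, 1.0)]   # treated c=3, matched control c=1
att_bc, var_bc = att_bias_corrected(pairs, beta_hat)
naive = sum(yt - yc for yt, _, yc, _ in pairs) / len(pairs)
```

In this synthetic setup the uncorrected mean difference `naive` is 4, while the bias-corrected estimate recovers the effect of 3 that was built into the data.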