# Estimating treatment effects with matching


The goal of this notebook is to estimate causal treatment effects from observational (i.e., nonrandomized) data.

One way to achieve this goal is matching. The aim of any matching method is to reduce the bias of
an observational data set. This means reducing the dissimilarity between the covariate distributions in the
treated $p(x \mid t = 1)$ and control $p(x \mid t = 0)$ groups. In theory, under the strong ignorability assumption,
the matched data set will mimic an RCT. Hence, within the matched sample, we can treat $X$ as independent of $T$, resulting
in $p^{t=1}(x) \approx p^{t=0}(x)$ [last sentence to be checked and citation needed].

The most popular matching-based method is propensity score matching [1]. It uses an estimate of the true
propensity score $e(x) = P(T = 1 \mid X = x)$ to match the data: each observation is paired with the observation(s) from the other group whose propensity score is closest.
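
As a rough illustration of the matching step described above, the sketch below pairs each treated unit with the control unit whose score is closest (matching with replacement) and averages the outcome differences, which targets the effect on the treated. The toy data, the helper name match_on_score, and the use of the known true propensity are assumptions made for this example; in practice the score has to be estimated.

```python
import numpy as np

def match_on_score(score, treatment):
    """For each treated unit, return the index of the control unit with the closest score."""
    treated_idx = np.flatnonzero(treatment == 1)
    control_idx = np.flatnonzero(treatment == 0)
    # Pairwise absolute score distances, shape (n_treated, n_control).
    dist = np.abs(score[treated_idx, None] - score[None, control_idx])
    nearest = control_idx[dist.argmin(axis=1)]
    return treated_idx, nearest

# Toy data (illustrative only): one confounder drives both treatment and outcome.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
propensity = 1.0 / (1.0 + np.exp(-x))        # known here; estimated in practice
t = rng.binomial(1, propensity)
y = 2.0 * t + 3.0 * x + rng.normal(size=n)   # true effect is 2

naive = y[t == 1].mean() - y[t == 0].mean()
treated_idx, matched_idx = match_on_score(propensity, t)
att = (y[treated_idx] - y[matched_idx]).mean()
print("naive difference:", round(naive, 2))
print("matched estimate:", round(att, 2))
```

The naive difference in means absorbs the confounding through x, while the matched estimate should land much closer to the simulated effect.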

There is a typical workflow that a causal study should follow. We will present it in this notebook based on the
package:
https://github.com/laurencium/causalinference
with a summary of the workflow at:
https://laurencewong.com/software/conclusion

### Loading the data

We load synthetic data using Uber's causalml package for causal inference. We can choose from five different scenarios
to generate the data:
1. difficult nuisance components and an easy treatment effect;
2. a randomized trial;
3. an easy propensity and a difficult baseline;
4. unrelated treatment and control groups;
5. a hidden confounder biasing treatment.

In [None]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from causalml.dataset import synthetic_data

y, X, treatment, true_ite, expected_outcome, true_propensity = synthetic_data(mode=1, n=1000, p=10, sigma=1.0)

# The mean propensity alone says little, so it would be useful to plot the true propensity and inspect the
# overlap between the control and treated groups.
# It could be done as in: http://ethen8181.github.io/machine-learning/ab_tests/causal_inference/matching.html


def calculate_ate(ite):
    return ite.mean().round(2), ite.std().round(2)

def calculate_propensity(propensity):
    return propensity.mean().round(2), propensity.std().round(2)

print("The true ATE of the generated data is",
      calculate_ate(true_ite)[0],
      "with standard deviation equal to",
      calculate_ate(true_ite)[1],
      ".")

print("The average propensity score value is equal to",
      calculate_propensity(true_propensity)[0],
      "with standard deviation equal to",
      calculate_propensity(true_propensity)[1],
      ".")

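The overlap check mentioned in the comment above can be approximated without a plot by counting treated and control units per propensity bin. This is a minimal sketch: overlap_table and the stand-in arrays are hypothetical, and in the notebook true_propensity and treatment would be passed instead.

```python
import numpy as np

def overlap_table(propensity, treatment, n_bins=10):
    """Count control and treated units in equal-width propensity bins on [0, 1]."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    control, _ = np.histogram(propensity[treatment == 0], bins=edges)
    treated, _ = np.histogram(propensity[treatment == 1], bins=edges)
    return edges, control, treated

# Stand-in arrays (hypothetical); replace with true_propensity and treatment.
rng = np.random.default_rng(0)
demo_propensity = rng.beta(2, 2, size=1000)
demo_treatment = rng.binomial(1, demo_propensity)

edges, control, treated = overlap_table(demo_propensity, demo_treatment)
# Bins are half-open except the last one, which also includes its right edge.
for lo, hi, n_c, n_t in zip(edges[:-1], edges[1:], control, treated):
    print(f"{lo:.1f}-{hi:.1f}: control={n_c:4d} treated={n_t:4d}")
```

Bins in which one of the two counts is zero or very small indicate regions with poor overlap.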

## 1. Design Phase

We begin with the design phase. Let's instantiate the causal model.

In [None]:
from causalinference import CausalModel

causal = CausalModel(y, treatment, X)

### a. Assessing the covariate balance

We begin by assessing the initial covariate balance. This allows us to choose the right method and also
to assess how successful we were in de-biasing the data.

There is no agreement on how to assess balance. We can investigate the difference in moments of each covariate.
A popular metric proposed by [citation needed] is the normalized difference in covariate averages:

$$\frac{\overline{X}_i^{t=1} - \overline{X}_i^{t=0}}{\sqrt{\frac{1}{2}\left((s_i^{t=1})^2 + (s_i^{t=0})^2\right)}}$$

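This quantity can be easily accessed using the package and is shown under the Nor-diff column of the summary statistics. Here $s_i^{t=1}$ and $s_i^{t=0}$ denote the sample standard deviations of covariate $i$ in the treated and control groups.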
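
For reference, the normalized difference can also be computed by hand: the difference in group means divided by the square root of the average of the two group variances. The sketch below assumes sample variances with ddof=1, which may differ slightly from what the package uses internally; normalized_difference is a hypothetical helper.

```python
import numpy as np

def normalized_difference(X, treatment):
    """Normalized difference in means for each covariate (column of X)."""
    X_t, X_c = X[treatment == 1], X[treatment == 0]
    mean_diff = X_t.mean(axis=0) - X_c.mean(axis=0)
    # Sample variances (ddof=1 is an assumption), averaged over the two groups.
    pooled_sd = np.sqrt((X_t.var(axis=0, ddof=1) + X_c.var(axis=0, ddof=1)) / 2.0)
    return mean_diff / pooled_sd

# Tiny worked example with one covariate.
X_demo = np.array([[1.0], [2.0], [3.0], [2.0], [3.0], [4.0]])
t_demo = np.array([0, 0, 0, 1, 1, 1])
nd_demo = normalized_difference(X_demo, t_demo)
print(nd_demo)  # group means differ by 1 and both sample variances equal 1
```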

To do: it would be interesting to see if it is related to the IPM term.

In [None]:
print(causal.summary_stats)

print("The maximum distance is", np.abs(causal.summary_stats['ndiff']).max().round(3))


To match on propensity we need to estimate the propensity score.

### 2. Estimating the propensity score

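The package offers two ways of estimating the propensity score: est_propensity, which fits a logistic regression on the covariates, and est_propensity_s, which additionally selects the covariate terms to include in a data-driven way.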

In [None]:
# Estimate propensity to improve the balance

causal.est_propensity()
print(causal.propensity)

In [None]:
# Use an algorithm for variable selection from Imbens to improve the balance
# Details in https://laurencewong.com/software/propensity-score
causal.est_propensity_s()
print(causal.propensity)

In [None]:
# To do in the analysis: supply your own propensity model.


In [None]:
# We may want to exclude units that almost surely receive (or almost surely do not receive) the treatment, in order
# to analyze only more similar observations. The logic behind this step is that regions with propensity scores
# close to 0 or 1 correspond to regions with a lack of overlap.

causal.trim_s()
print("Selected cutoff:", causal.cutoff)
print(causal.summary_stats)

# As you can see, the number of observations has changed.
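
The idea behind trimming can be sketched with a fixed cutoff: keep only units whose propensity lies between the cutoff and one minus the cutoff. The value 0.1 and the helper trim_by_propensity are assumptions for illustration; trim_s chooses its cutoff from the data instead of fixing it.

```python
import numpy as np

def trim_by_propensity(propensity, cutoff=0.1):
    """Boolean mask of units whose propensity lies in [cutoff, 1 - cutoff]."""
    return (propensity >= cutoff) & (propensity <= 1.0 - cutoff)

demo_scores = np.array([0.02, 0.15, 0.50, 0.85, 0.97])
keep = trim_by_propensity(demo_scores)
print(keep)               # the two extreme scores are dropped
print(demo_scores[keep])
```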

### 4. Stratification

An easy way to assess whether the propensity score is helpful is to stratify the data set on it.
If the bins are better balanced (less biased) than the whole data set, then we are doing a good job. Again, we can choose from two methods.

In [None]:

# To assess whether the propensity score is helpful, we check how well it balances the strata
causal.blocks = 5
causal.stratify()
print(causal.strata)


In [None]:
# Using the data-driven algorithm outlined in https://laurencewong.com/software/stratification
causal.reset()
causal.est_propensity_s()
causal.trim_s()
causal.stratify_s()
print(causal.strata)

In [None]:
# We can print the maximum imbalance for each bin

for stratum in causal.strata:
    print(np.absolute(stratum.summary_stats['ndiff']).max())
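
To make the logic of this check concrete, here is a small self-contained sketch that bins toy data into five equal-frequency blocks of a known propensity score and compares the imbalance inside each block with the overall imbalance. The data-generating process and the helpers stratify_by_score and abs_ndiff are assumptions for this example and are not part of the package.

```python
import numpy as np

def stratify_by_score(score, n_blocks=5):
    """Assign each unit to one of n_blocks equal-frequency bins of the score."""
    inner_edges = np.quantile(score, np.linspace(0, 1, n_blocks + 1)[1:-1])
    return np.searchsorted(inner_edges, score, side="right")

def abs_ndiff(x, t):
    """Absolute normalized difference in means of a single covariate."""
    x_t, x_c = x[t == 1], x[t == 0]
    pooled_sd = np.sqrt((x_t.var(ddof=1) + x_c.var(ddof=1)) / 2.0)
    return abs(x_t.mean() - x_c.mean()) / pooled_sd

# Toy data: a single covariate that drives treatment assignment.
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
score = 1.0 / (1.0 + np.exp(-x))
t = rng.binomial(1, score)

blocks = stratify_by_score(score, n_blocks=5)
overall = abs_ndiff(x, t)
within = [abs_ndiff(x[blocks == b], t[blocks == b]) for b in range(5)]
print("overall imbalance:", round(overall, 3))
print("per-block imbalance:", np.round(within, 3))
```

If the score is informative, every block should be noticeably better balanced than the full sample.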

## 2. Analysis phase


What we can do now is fit an OLS model to each of the stratified sub-samples and weight the resulting estimates by stratum size to
obtain a first estimate of the ATE.

In [None]:
causal.est_via_blocking()
print(causal.estimates)
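
A blocking estimator of this kind can be sketched as follows: within each block, regress the outcome on an intercept, the treatment indicator and the covariates, then average the treatment coefficients with weights proportional to block size. This is a simplified illustration on toy data with a known propensity score, not the package's implementation; blocking_ate is a hypothetical helper.

```python
import numpy as np

def blocking_ate(y, t, X, blocks):
    """Size-weighted average of within-block OLS treatment coefficients."""
    n = len(y)
    estimate = 0.0
    for b in np.unique(blocks):
        in_block = blocks == b
        size = int(in_block.sum())
        # Design matrix: intercept, treatment indicator, covariates.
        design = np.column_stack([np.ones(size), t[in_block], X[in_block]])
        coef, *_ = np.linalg.lstsq(design, y[in_block], rcond=None)
        estimate += coef[1] * size / n
    return estimate

# Toy data: one confounder, true ATE equal to 1.5.
rng = np.random.default_rng(2)
n = 2000
X = rng.normal(size=(n, 1))
score = 1.0 / (1.0 + np.exp(-X[:, 0]))
t = rng.binomial(1, score)
y = 1.5 * t + 2.0 * X[:, 0] + rng.normal(size=n)

# Five equal-frequency blocks of the (here known) propensity score.
edges = np.quantile(score, [0.2, 0.4, 0.6, 0.8])
blocks = np.searchsorted(edges, score, side="right")

naive = y[t == 1].mean() - y[t == 0].mean()
blocked = blocking_ate(y, t, X, blocks)
print("naive:", round(naive, 2), "blocking:", round(blocked, 2))
```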

However, we can do better by using a matching estimator. Note that this package does not match on the propensity
score; instead, it looks for the best match in covariate space using nearest-neighbor matching. After matching, the estimator imputes
each unit's missing potential outcome from its matches and averages the resulting differences in outcomes.

In [None]:
# Invoke the matching estimator

causal.est_via_matching()
print(causal.estimates)

We can improve the above estimate by adjusting for bias.

Let $m$ be the matching function. In general, $X_i$ and $X_{m(i)}$ will not be identical, so the matching estimator will
be additionally biased.

The package does not really explain what adjusting for bias means. Under the hood, it modifies each ITE
by a correction term, approximated by the dot product of the matching discrepancy (i.e., $X_i - X_{m(i)}$) and the
coefficients from the bias-correction regression.

In [None]:
causal.est_via_matching(bias_adj=True)
print(causal.estimates)
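
The bias adjustment can be illustrated with a simplified one-nearest-neighbor estimator of the effect on the treated. Each treated unit is matched to its closest control in covariate space, a linear regression is fitted on the matched controls, and each individual effect is corrected by the dot product of the matching discrepancy and the regression coefficients. This is a sketch under stated assumptions (toy data, Euclidean distance, treated units only, hypothetical helper matching_att) and not a reproduction of the package's code.

```python
import numpy as np

def matching_att(y, t, X, bias_adj=False):
    """1-NN covariate matching estimate of the ATT, optionally bias-adjusted."""
    X_t, y_t = X[t == 1], y[t == 1]
    X_c, y_c = X[t == 0], y[t == 0]
    # Squared Euclidean distances between treated and control units.
    d2 = ((X_t[:, None, :] - X_c[None, :, :]) ** 2).sum(axis=2)
    m = d2.argmin(axis=1)            # index of the matched control for each treated unit
    effects = y_t - y_c[m]
    if bias_adj:
        # Regress matched control outcomes on their covariates (with intercept).
        design = np.column_stack([np.ones(len(m)), X_c[m]])
        coef, *_ = np.linalg.lstsq(design, y_c[m], rcond=None)
        # Correct each effect by discrepancy (X_i - X_m(i)) times the slopes.
        effects = effects - (X_t - X_c[m]) @ coef[1:]
    return effects.mean()

# Toy data: five confounders, true effect equal to 1.
rng = np.random.default_rng(3)
n, p = 1200, 5
X = rng.normal(size=(n, p))
score = 1.0 / (1.0 + np.exp(-0.5 * X.sum(axis=1)))
t = rng.binomial(1, score)
y = 1.0 * t + 2.0 * X.sum(axis=1) + rng.normal(size=n)

raw = matching_att(y, t, X)
adjusted = matching_att(y, t, X, bias_adj=True)
print("plain matching:", round(raw, 2), "bias-adjusted:", round(adjusted, 2))
```

With several covariates, exact matches are rare, so the plain estimate inherits the leftover covariate gap, while the adjusted one removes its linear part.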

## 3. Conclusions

This approach, i.e., vanilla k-NN matching on covariates, is certainly not satisfactory. The next step is to look
at how we can perform propensity score matching. However, there is a problem with propensity score matching reported
in the literature, namely that it can easily increase the imbalance:
https://gking.harvard.edu/files/gking/files/psnot.pdf

Further steps would include matching on the logit of the propensity score and looking at what is done in case
studies that perform matching:

https://www.tandfonline.com/doi/pdf/10.1080/00273171.2011.540480

https://sci-hub.do/http://jhr.uwpress.org/content/50/2/373.full.pdf+html
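
Matching on the logit of the propensity score could look roughly like the sketch below: greedy one-to-one matching without replacement, accepting a pair only if the logit scores differ by at most a caliper. The caliper of 0.2 standard deviations of the logit is a commonly quoted rule of thumb and is an assumption here, as are the toy scores, the processing order, and the helper names logit and caliper_match.

```python
import numpy as np

def logit(p, eps=1e-6):
    """Log-odds of p, clipped away from 0 and 1 for numerical safety."""
    p = np.clip(p, eps, 1 - eps)
    return np.log(p / (1 - p))

def caliper_match(score, treatment, caliper_sd=0.2):
    """Greedy 1:1 matching without replacement on the logit of the score."""
    z = logit(score)
    caliper = caliper_sd * z.std()
    available = list(np.flatnonzero(treatment == 0))
    pairs = []
    # Treated units are processed in index order (an arbitrary choice).
    for i in np.flatnonzero(treatment == 1):
        if not available:
            break
        dist = np.abs(z[available] - z[i])
        j = int(dist.argmin())
        if dist[j] <= caliper:
            pairs.append((int(i), int(available.pop(j))))
    return pairs

# Toy scores (illustrative only).
rng = np.random.default_rng(4)
score = rng.uniform(0.05, 0.95, size=200)
treatment = rng.binomial(1, score)
pairs = caliper_match(score, treatment)
print(len(pairs), "matched pairs")
```

Treated units without a control inside the caliper stay unmatched, which trades sample size for closer matches.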


There are also notebooks worth checking for ideas:
http://www.degeneratestate.org/posts/2018/Mar/24/causal-inference-with-python-part-1-potential-outcomes/

Talks to watch:
https://www.youtube.com/watch?v=rBv39pK1iEs
https://www.youtube.com/watch?v=gaUgW7NWai8

## 4. Further considerations
When implementing an estimator in the future, consider following the scikit-learn conventions:
https://scikit-learn.org/stable/developers/develop.html
https://sklearn-template.readthedocs.io/en/latest/user_guide.html

There are clever algorithms, like Genetic Matching, that can optimize a balance loss function of our choice,
so we could choose the IPM, the KL divergence, or another criterion; this is worth analyzing. The available implementation is, however, written in R.

## 5. Uber's causal inference package

https://github.com/uber/causalml

In [None]:
from causalml.dataset import synthetic_data
from causalml.propensity import ElasticNetPropensityModel
import pandas as pd

In [None]:
y, X, treatment, _, _, e = synthetic_data(mode=1, n=1000, p=5, sigma=1.0)

df = pd.DataFrame(X, columns=['x'+ str(i) for i in range(X.shape[1])])
df['treatment'] = treatment
df['outcome'] = y
df.head()


In [None]:


pm = ElasticNetPropensityModel()
ps = pm.fit_predict(X, treatment)

df['propensity'] = ps

df.head()

In [None]:
from causalml.match import NearestNeighborMatch, create_table_one

"""
    Propensity score matching based on the nearest neighbor algorithm.
    Attributes:
        caliper (float): threshold to be considered as a match.
        replace (bool): whether to match with replacement or not
        ratio (int): ratio of control / treatment to be matched. used only if
            replace=True.
        shuffle (bool): whether to shuffle the treatment group data before
            matching
        random_state (numpy.random.RandomState or int): RandomState or an int
            seed
"""

psm = NearestNeighborMatch(replace=False,
                           ratio=1,
                           random_state=42)

In [None]:
matched = psm.match(data=df,
                    treatment_col= 'treatment',
                    score_cols= ['propensity'])

In [None]:
matched.treatment.value_counts()

In [None]:
create_table_one(data=matched,
                 treatment_col= 'treatment',
                 features=matched.columns.tolist()[0:-3])

In [None]:
# Now we can estimate the treatment effect simply as the difference in mean outcomes between the matched groups

def ate_matched(df, treatment_col = 'treatment', outcome_col = 'outcome'):
    df_control = df[df[treatment_col] == 0]
    df_treated = df[df[treatment_col] == 1]
    ate = df_treated[outcome_col].mean() - df_control[outcome_col].mean()
    return ate.round(2)

print("The average treatment effect estimated by propensity score matching is equal to",
      ate_matched(matched))

To do:
1. Look at how it performs over multiple iterations with a randomized data-generating function. For this, first make the data generation modular.
2. Maybe we could use cross-validation metrics to tune the parameters.
3. A sensible approach would be to stratify the sample into small bins based on some specified loss and then use
a blocking estimator, or even estimate the pscore to improve the balance.

