# Estimating treatment effects with matching


The goal of this notebook is to estimate causal treatment effects from observational, i.e. non-randomized, data.

One way to achieve this goal is matching. The aim of any matching method is to reduce the bias of an
observational data set, i.e. to reduce the dissimilarity between the covariate distributions of the
treated, $p(x \mid t = 1)$, and control, $p(x \mid t = 0)$, groups. In theory, under the strong ignorability assumption,
the matched data set will mimic an RCT: $X$ can be treated as independent of $T$, so that
$p(x \mid t = 1) \approx p(x \mid t = 0)$ [last sentence to be checked and citation needed].

The most popular matching-based method is propensity score matching [1]. It uses an estimate of the true
propensity score, $e(x) = P(T = 1 \mid X = x)$, to match the data, so that observations with the closest propensity scores are matched.
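To make the idea concrete, here is a minimal sketch of greedy 1:1 nearest-neighbour matching on the propensity score, without replacement. It is an illustration under our own assumptions, not the matching routine of any package used in this notebook: `greedy_psm` and its optional `caliper` (the largest propensity-score difference accepted as a match) are hypothetical names, and treated units are processed in index order.

```python
import numpy as np


def greedy_psm(pscore, treatment, caliper=None):
    """Greedy 1:1 nearest-neighbour matching on the propensity score, without
    replacement. Returns a list of (treated_index, control_index) pairs."""
    pscore = np.asarray(pscore, dtype=float)
    treatment = np.asarray(treatment)
    treated = np.flatnonzero(treatment == 1)
    available = list(np.flatnonzero(treatment == 0))  # controls not yet used
    pairs = []
    for i in treated:
        if not available:
            break  # every control has already been used
        dists = np.abs(pscore[available] - pscore[i])
        j = int(np.argmin(dists))
        if caliper is not None and dists[j] > caliper:
            continue  # no control close enough: leave this treated unit unmatched
        pairs.append((int(i), int(available.pop(j))))
    return pairs
```

For example, `greedy_psm(np.array([0.2, 0.8, 0.25, 0.75, 0.5]), np.array([1, 1, 0, 0, 0]))` pairs treated unit 0 with control 2 and treated unit 1 with control 3. Because a greedy pass depends on the processing order, results can change if the treated units are shuffled, which is one reason matching implementations often expose shuffle or random-state options.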

A causal study typically follows a standard workflow. We present it in this notebook using the `causalinference`
package:
https://github.com/laurencium/causalinference
whose documentation summarizes the workflow here:
https://laurencewong.com/software/conclusion

### Loading the data

We load synthetic data using the data generator of `causalml`, Uber's causal inference package. The `mode` argument
selects one of five scenarios for generating the data:
1. difficult nuisance components and an easy treatment effect;
2. a randomized trial;
3. an easy propensity and a difficult baseline;
4. unrelated treatment and control groups;
5. a hidden confounder biasing treatment.

In [48]:
import numpy as np
import pandas as pd

from causalml.dataset import synthetic_data

# Outcome, covariates, treatment indicator, true individual treatment effects,
# expected outcome and true propensity score for scenario (mode) 1.
y, X, treatment, true_ite, expected_outcome, true_propensity = synthetic_data(mode=1, n=1000, p=10, sigma=1.0)

# As the mean propensity alone doesn't say much, it would be nice to plot the true
# propensity scores to see the overlap between the control and treated groups,
# as done in: http://ethen8181.github.io/machine-learning/ab_tests/causal_inference/matching.html


def calculate_ate(ite):
    return ite.mean().round(2), ite.std().round(2)

def calculate_propensity(propensity):
    return propensity.mean().round(2), propensity.std().round(2)

print("The true ATE of the generated data is",
      calculate_ate(true_ite)[0],
      "with standard deviation equal to",
      calculate_ate(true_ite)[1],
      ".")

print("The average propensity score value is equal to",
      calculate_propensity(true_propensity)[0],
      "with standard deviation equal to",
      calculate_propensity(true_propensity)[1],
      ".")


The true ATE of the generated data is 0.5 with standard deviation equal to 0.21 .
The average propensity score value is equal to 0.53 with standard deviation equal to 0.3 .
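Following the comment in the cell above, overlap is better judged from the whole distribution of the propensity score in each group than from its mean. A plotting-free sketch, assuming only numpy: `propensity_overlap` (a hypothetical helper) counts treated and control units in the same propensity bins over $[0, 1]$; bins where one group has many units and the other almost none indicate poor overlap.

```python
import numpy as np


def propensity_overlap(pscore, treatment, bins=10):
    """Histogram the propensity score separately for treated and control units
    on a shared grid over [0, 1]; a text-only stand-in for an overlap plot."""
    pscore = np.asarray(pscore, dtype=float)
    treatment = np.asarray(treatment)
    edges = np.linspace(0.0, 1.0, bins + 1)
    treated_counts, _ = np.histogram(pscore[treatment == 1], bins=edges)
    control_counts, _ = np.histogram(pscore[treatment == 0], bins=edges)
    return edges, treated_counts, control_counts
```

For the data generated above one could call `propensity_overlap(true_propensity, treatment)` and print the two count vectors side by side, or pass the same bin edges to a plotting library to draw the overlaid histograms.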


## 1. Design Phase

We begin with the design phase. Let's instantiate the causal model:

In [49]:
from causalinference import CausalModel

causal = CausalModel(y, treatment, X)

### a. Assessing the covariate balance

We begin by assessing the initial covariate balance. This helps us choose the right method and, later,
assess how successful we were in de-biasing the data.

There is no agreement on how to assess balance. One option is to compare the moments of each covariate across groups.
A popular metric, proposed by [citation needed], is the normalized difference in covariate averages:

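$$\frac{\overline{X}_i^{t=1} - \overline{X}_i^{t=0}}{\sqrt{\frac{1}{2}\left[(s_i^{t=1})^2 + (s_i^{t=0})^2\right]}}$$

where $\overline{X}_i^{t}$ and $s_i^{t}$ are the sample mean and standard deviation of covariate $i$ in group $t$.
This quantity is readily available in the package and is shown in the Nor-diff column.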

To do: it would be interesting to see if it is related to the IPM term.
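The normalized difference can be computed directly from its definition. A minimal numpy sketch: `normalized_difference` is a hypothetical helper, and we use sample variances (`ddof=1`); the package's exact variance convention may differ slightly.

```python
import numpy as np


def normalized_difference(X, treatment):
    """Normalized difference in covariate means, one value per column:
    (mean_treated - mean_control) / sqrt((s_treated^2 + s_control^2) / 2)."""
    X = np.asarray(X, dtype=float)
    treatment = np.asarray(treatment)
    X_t, X_c = X[treatment == 1], X[treatment == 0]
    pooled_sd = np.sqrt((X_t.var(axis=0, ddof=1) + X_c.var(axis=0, ddof=1)) / 2)
    return (X_t.mean(axis=0) - X_c.mean(axis=0)) / pooled_sd
```

As a sanity check with the rounded X0 row of the table below: $(0.597 - 0.391)/\sqrt{(0.250^2 + 0.287^2)/2} \approx 0.77$, in line with the reported 0.766 (the table's means and standard deviations are rounded).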

In [50]:
print(causal.summary_stats)

print("The maximum distance is", np.abs(causal.summary_stats['ndiff']).max().round(3))



Summary Statistics

                       Controls (N_c=469)         Treated (N_t=531)             
       Variable         Mean         S.d.         Mean         S.d.     Raw-diff
--------------------------------------------------------------------------------
              Y        1.127        1.060        1.994        1.135        0.866

                       Controls (N_c=469)         Treated (N_t=531)             
       Variable         Mean         S.d.         Mean         S.d.     Nor-diff
--------------------------------------------------------------------------------
             X0        0.391        0.287        0.597        0.250        0.766
             X1        0.387        0.294        0.606        0.247        0.806
             X2        0.517        0.290        0.494        0.287       -0.077
             X3        0.512        0.283        0.492        0.277       -0.071
             X4        0.513        0.293        0.501        0.287       -0.040
      

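To match on the propensity score, we first need to estimate it.

### b. Estimating the propensity score

The package offers two ways of estimating the propensity score, both based on logistic regression: `est_propensity()`,
which by default includes all covariates as linear terms, and `est_propensity_s()`, which selects linear and quadratic
terms with a data-driven algorithm.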
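Both options fit a logistic regression (logit) of the treatment indicator on the covariates by maximum likelihood. As an illustration of the first, linear-terms-only option, here is a minimal numpy sketch fitted by Newton-Raphson. `fit_logit` is a hypothetical helper and the package's own optimizer may differ, although a converged maximum-likelihood fit on the same data should give essentially the same coefficients.

```python
import numpy as np


def fit_logit(X, treatment, n_iter=25, tol=1e-8):
    """Maximum-likelihood logistic regression of treatment on covariates
    (plus an intercept), fitted by Newton-Raphson. Returns (coefs, pscore)."""
    X = np.asarray(X, dtype=float)
    t = np.asarray(treatment, dtype=float)
    Z = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))  # current fitted probabilities
        grad = Z.T @ (t - p)  # gradient of the log-likelihood
        hess = Z.T @ (Z * (p * (1 - p))[:, None])  # negative Hessian
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    pscore = 1.0 / (1.0 + np.exp(-Z @ beta))  # estimated propensity scores
    return beta, pscore
```

With the data above, `fit_logit(X, treatment)` returns the intercept first, followed by one coefficient per covariate, in the same order as the first table below.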

In [51]:
# Estimate the propensity score with a logit on all covariates (linear terms only)

causal.est_propensity()
print(causal.propensity)


Estimated Parameters of Propensity Score

                    Coef.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
     Intercept     -2.549      0.440     -5.792      0.000     -3.411     -1.686
            X0      3.038      0.280     10.867      0.000      2.490      3.586
            X1      3.162      0.279     11.326      0.000      2.615      3.710
            X2     -0.347      0.257     -1.351      0.177     -0.851      0.157
            X3     -0.278      0.268     -1.039      0.299     -0.803      0.247
            X4     -0.101      0.256     -0.393      0.694     -0.603      0.401
            X5      0.055      0.254      0.215      0.829     -0.443      0.552
            X6      0.058      0.264      0.219      0.827     -0.460      0.576
            X7     -0.038      0.256     -0.148      0.883     -0.541      0.465
            X8      0.071      0.261      0.271      0.787     -0.

In [52]:
# Use an algorithm for variable selection from Imbens to improve the balance
# Details in https://laurencewong.com/software/propensity-score
causal.est_propensity_s()
print(causal.propensity)


Estimated Parameters of Propensity Score

                    Coef.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
     Intercept     -7.252      0.694    -10.447      0.000     -8.613     -5.891
            X1     14.377      1.562      9.205      0.000     11.316     17.439
            X0     12.625      1.534      8.230      0.000      9.618     15.631
            X2     -0.253      0.269     -0.941      0.347     -0.780      0.274
         X1*X1     -8.472      1.150     -7.367      0.000    -10.726     -6.218
         X0*X0     -6.883      1.155     -5.959      0.000     -9.147     -4.620
         X1*X0     -4.209      1.110     -3.791      0.000     -6.385     -2.033



In [53]:
# To do in the analysis: supply your own propensity model.


In [54]:
# We may want to exclude units that almost surely do, or do not, receive the treatment,
# so that we analyze only observations that are more similar to each other. The logic
# is that regions with propensity scores close to 0 or 1 correspond to regions with a
# lack of overlap between the treated and control groups.

causal.trim_s()
causal.cutoff  # cutoff chosen by trim_s(); not displayed because it is not the last line of the cell
print(causal.summary_stats)

# As you can see, the number of observations has changed.


Summary Statistics

                       Controls (N_c=354)         Treated (N_t=520)             
       Variable         Mean         S.d.         Mean         S.d.     Raw-diff
--------------------------------------------------------------------------------
              Y        1.201        1.063        2.012        1.126        0.811

                       Controls (N_c=354)         Treated (N_t=520)             
       Variable         Mean         S.d.         Mean         S.d.     Nor-diff
--------------------------------------------------------------------------------
             X0        0.461        0.289        0.608        0.242        0.548
             X1        0.466        0.291        0.615        0.242        0.557
             X2        0.528        0.290        0.493        0.287       -0.120
             X3        0.524        0.282        0.492        0.277       -0.115
             X4        0.510        0.290        0.501        0.287       -0.032
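`trim_s()` chooses the cutoff with a data-driven rule. A simpler alternative, sketched below under our own assumptions, is to keep only units whose propensity score lies in $[\alpha, 1 - \alpha]$ for a fixed $\alpha$. The helper name `trim_by_propensity` and the default $\alpha = 0.1$ (a commonly used rule of thumb) are illustrative choices, not the package's behaviour.

```python
import numpy as np


def trim_by_propensity(pscore, treatment, alpha=0.1):
    """Keep units with propensity score in [alpha, 1 - alpha].
    Returns the boolean mask and the remaining numbers of controls and treated."""
    pscore = np.asarray(pscore, dtype=float)
    treatment = np.asarray(treatment)
    keep = (pscore >= alpha) & (pscore <= 1 - alpha)
    n_controls = int(np.sum(keep & (treatment == 0)))
    n_treated = int(np.sum(keep & (treatment == 1)))
    return keep, n_controls, n_treated
```

The returned mask can be used to subset the outcome, treatment and covariate arrays before re-running the balance checks.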
      

### c. Stratification

A simple way to assess whether the propensity score is helpful is to stratify the data set on it.
If the covariates are better balanced within the strata than in the whole data set, the propensity score is doing a good job.
Again, we can choose between two methods: a fixed number of blocks (`stratify()`) and a data-driven split (`stratify_s()`).

In [55]:

# To assess whether the propensity score is helpful, we check how well it balances the strata
causal.blocks = 5
causal.stratify()
print(causal.strata)



Stratification Summary

              Propensity Score         Sample Size     Ave. Propensity   Outcome
   Stratum      Min.      Max.  Controls   Treated  Controls   Treated  Raw-diff
--------------------------------------------------------------------------------
         1     0.095     0.345       145        30     0.219     0.241     0.240
         2     0.346     0.601        99        76     0.465     0.483     0.449
         3     0.604     0.747        67       107     0.685     0.680     0.457
         4     0.748     0.823        25       150     0.778     0.785     0.539
         5     0.823     0.885        18       157     0.855     0.852     0.909
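Stratification with a fixed number of blocks can be sketched as cutting the propensity score at its quantiles, so that the strata have roughly equal sizes, like the five strata of 174 or 175 units each above. This is a minimal numpy illustration: `stratify_by_quantile` is a hypothetical helper, and whether `stratify()` uses exactly these quantile cuts is an assumption.

```python
import numpy as np


def stratify_by_quantile(pscore, treatment, n_blocks=5):
    """Assign units to n_blocks strata cut at quantiles of the propensity score.
    Returns (block label per unit, controls per block, treated per block)."""
    pscore = np.asarray(pscore, dtype=float)
    treatment = np.asarray(treatment)
    # interior cut points only, so labels run from 0 to n_blocks - 1
    inner_edges = np.quantile(pscore, np.linspace(0, 1, n_blocks + 1)[1:-1])
    blocks = np.searchsorted(inner_edges, pscore, side="right")
    n_controls = np.bincount(blocks[treatment == 0], minlength=n_blocks)
    n_treated = np.bincount(blocks[treatment == 1], minlength=n_blocks)
    return blocks, n_controls, n_treated
```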



In [56]:
# Using the data-driven algorithm outlined in https://laurencewong.com/software/stratification
causal.reset()
causal.est_propensity_s()
causal.trim_s()
causal.stratify_s()
print(causal.strata)


Stratification Summary

              Propensity Score         Sample Size     Ave. Propensity   Outcome
   Stratum      Min.      Max.  Controls   Treated  Controls   Treated  Raw-diff
--------------------------------------------------------------------------------
         1     0.095     0.258        93        18     0.171     0.195     0.156
         2     0.259     0.420        80        29     0.332     0.357     0.392
         3     0.421     0.486        35        20     0.451     0.453     0.324
         4     0.486     0.562        24        30     0.521     0.527     0.308
         5     0.562     0.685        44        65     0.632     0.635     0.534
         6     0.686     0.756        39        70     0.720     0.729     0.481
         7     0.756     0.803        17        92     0.775     0.780     0.455
         8     0.803     0.885        22       196     0.847     0.844     0.840



In [57]:
# Print the maximum absolute normalized difference (the worst covariate imbalance) within each stratum

for stratum in causal.strata:
    print(np.absolute(stratum.summary_stats['ndiff']).max())

0.2731891092980303
0.2893986947620062
0.5381418203734094
0.35627254386135665
0.3155245070547748
0.2236168152314016
0.42489371336658127
0.5727853595981692


## 2. Analysis Phase

We can now fit an OLS model within each stratum and combine the stratum-level estimates in a weighted average
to obtain a first estimate of the ATE.
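The blocking estimator can be sketched as follows. To keep it short, the sketch replaces the package's within-stratum OLS with a plain difference in means, and weights the stratum effects by stratum size (ATE), number of controls (ATC) or number of treated units (ATT). `blocking_estimate` is a hypothetical helper, and this weighting is our assumption of the standard scheme rather than a description of the package internals.

```python
import numpy as np


def blocking_estimate(y, treatment, blocks):
    """Treated-minus-control difference in mean outcomes within each stratum,
    averaged with weights given by stratum size (ATE), number of controls (ATC)
    and number of treated units (ATT)."""
    y = np.asarray(y, dtype=float)
    treatment = np.asarray(treatment)
    blocks = np.asarray(blocks)
    effects, n_all, n_c, n_t = [], [], [], []
    for b in np.unique(blocks):
        in_b = blocks == b
        y_t = y[in_b & (treatment == 1)]
        y_c = y[in_b & (treatment == 0)]
        if len(y_t) == 0 or len(y_c) == 0:
            raise ValueError(f"stratum {b} lacks treated or control units")
        effects.append(y_t.mean() - y_c.mean())
        n_all.append(in_b.sum())
        n_c.append(len(y_c))
        n_t.append(len(y_t))
    effects = np.array(effects)
    return {
        "ATE": np.average(effects, weights=n_all),
        "ATC": np.average(effects, weights=n_c),
        "ATT": np.average(effects, weights=n_t),
    }
```

The `blocks` argument can be any per-unit stratum label, for example a quantile-based assignment of the estimated propensity score.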

In [58]:
causal.est_via_blocking()
print(causal.estimates)


Treatment Effect Estimates: Blocking

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE      0.516      0.079      6.502      0.000      0.361      0.672
           ATC      0.409      0.092      4.445      0.000      0.229      0.589
           ATT      0.589      0.091      6.465      0.000      0.411      0.768



An alternative to blocking is a matching estimator. Note that this package does not match on the propensity
score: it looks for the closest match in covariate space using nearest-neighbour matching. After matching, the missing
potential outcome of each unit is imputed from its match(es), and the estimates average the resulting unit-level differences.
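A minimal sketch of this kind of estimator, assuming one match per unit, matching with replacement, and a Euclidean distance in which each covariate is scaled by its inverse variance. We believe this is similar in spirit to the package's default, but the exact distance weighting and tie handling are assumptions here. `nn_matching_estimate` is a hypothetical helper; ties are broken by taking the first nearest neighbour.

```python
import numpy as np


def nn_matching_estimate(y, treatment, X):
    """One-to-one nearest-neighbour matching with replacement in covariate space.
    Each unit's missing potential outcome is imputed from its nearest neighbour
    in the opposite group; returns ATE, ATC and ATT."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(treatment)
    X = np.asarray(X, dtype=float)
    Xs = X / X.std(axis=0, ddof=1)  # squared distance becomes inverse-variance weighted
    idx_t, idx_c = np.flatnonzero(t == 1), np.flatnonzero(t == 0)

    def nearest(src, dst):
        # squared distances between every src unit and every dst unit
        d = ((Xs[src][:, None, :] - Xs[dst][None, :, :]) ** 2).sum(axis=2)
        return dst[np.argmin(d, axis=1)]

    ite = np.empty(len(y))
    ite[idx_t] = y[idx_t] - y[nearest(idx_t, idx_c)]  # Y(1) observed, Y(0) imputed
    ite[idx_c] = y[nearest(idx_c, idx_t)] - y[idx_c]  # Y(1) imputed, Y(0) observed
    return {"ATE": ite.mean(), "ATC": ite[idx_c].mean(), "ATT": ite[idx_t].mean()}
```

The pairwise-distance array has one entry per treated-control pair and covariate, which is fine for a thousand units but would call for a k-d tree (e.g. `scipy.spatial.cKDTree`) on larger data.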

In [59]:
# Invoke the nearest-neighbour matching estimator

causal.est_via_matching()
print(causal.estimates)


Treatment Effect Estimates: Blocking

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE      0.516      0.079      6.502      0.000      0.361      0.672
           ATC      0.409      0.092      4.445      0.000      0.229      0.589
           ATT      0.589      0.091      6.465      0.000      0.411      0.768

Treatment Effect Estimates: Matching

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE      0.675      0.123      5.505      0.000      0.435      0.916
           ATC      0.680      0.141      4.829      0.000      0.404      0.956
           ATT      0.672      0.140      4.814      0.000      0.399      0.946



We can improve the above estimate by adjusting for bias.

Let $m$ be the matching function, so that $X_{m(i)}$ are the covariates of the match of unit $i$. In general $X_i$ and
$X_{m(i)}$ will not be identical, so the matching estimator carries an additional bias due to this matching discrepancy.

The package documentation does not explain in detail what bias adjustment means. Under the hood, it corrects each ITE
by an approximation of this bias: the dot product of the matching discrepancy (i.e., $X_i - X_{m(i)}$) and the
coefficients of a bias-correction regression of the outcome on the covariates.
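Written out in the form of the bias-corrected matching estimator of Abadie and Imbens (our reading of the procedure; the package may differ in details, e.g. fitting the regressions only on units that appear in the matched sample), with $\hat\beta_0$ and $\hat\beta_1$ the slope coefficients of linear regressions of $Y$ on $X$ among control and treated units respectively, the bias-adjusted unit-level effects are:

$$
\hat\tau_i =
\begin{cases}
Y_i - \left[Y_{m(i)} + (X_i - X_{m(i)})^\top \hat\beta_0\right], & T_i = 1,\\[4pt]
\left[Y_{m(i)} + (X_i - X_{m(i)})^\top \hat\beta_1\right] - Y_i, & T_i = 0.
\end{cases}
$$

The correction terms vanish when a match is exact, and the ATE, ATC and ATT are the averages of $\hat\tau_i$ over all units, the control units and the treated units respectively.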

In [60]:
causal.est_via_matching(bias_adj=True)
print(causal.estimates)


Treatment Effect Estimates: Blocking

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE      0.516      0.079      6.502      0.000      0.361      0.672
           ATC      0.409      0.092      4.445      0.000      0.229      0.589
           ATT      0.589      0.091      6.465      0.000      0.411      0.768

Treatment Effect Estimates: Matching

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE      0.612      0.123      4.975      0.000      0.371      0.853
           ATC      0.591      0.141      4.194      0.000      0.315      0.868
           ATT      0.626      0.140      4.468      0.000      0.351      0.900



## 3. Conclusions

This approach, i.e. vanilla k-NN matching on covariates, is not satisfactory here: both matching estimates of the ATE
(0.675, and 0.612 with bias adjustment) are further from the true ATE of about 0.5 than the blocking estimate (0.516).
The next step is to look at how we can perform propensity score matching. However, the literature reports a problem
with propensity score matching, namely that it can increase imbalance rather than reduce it:
https://gking.harvard.edu/files/gking/files/psnot.pdf

Further steps would include matching on the logit of the propensity score and looking at how matching is done in
applied case studies:

https://www.tandfonline.com/doi/pdf/10.1080/00273171.2011.540480

https://sci-hub.do/http://jhr.uwpress.org/content/50/2/373.full.pdf+html
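Matching on the logit rather than on the raw propensity score stretches out scores near 0 and 1, where small probability differences correspond to large differences in odds. A minimal sketch of the two ingredients, assuming numpy: the logit transform (clipped to avoid infinite values) and a caliper of 0.2 standard deviations of the logit, a rule of thumb often used in the applied literature that we adopt here as an assumption. `logit` and `logit_caliper` are hypothetical helpers.

```python
import numpy as np


def logit(p, eps=1e-6):
    """Log-odds of the propensity score, clipped away from 0 and 1."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return np.log(p / (1 - p))


def logit_caliper(pscore, k=0.2):
    """Caliper equal to k standard deviations of the logit propensity score."""
    return k * np.std(logit(pscore), ddof=1)
```

The logit column could then be used as the matching score, with the caliper as the maximum accepted distance between matched units.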


Other notebooks are worth checking for ideas:
http://www.degeneratestate.org/posts/2018/Mar/24/causal-inference-with-python-part-1-potential-outcomes/

Talks to watch:
https://www.youtube.com/watch?v=rBv39pK1iEs
https://www.youtube.com/watch?v=gaUgW7NWai8

## 4. Further considerations
When implementing our own estimator in the future, consider following these conventions:
https://scikit-learn.org/stable/developers/develop.html
https://sklearn-template.readthedocs.io/en/latest/user_guide.html

There are clever algorithms, such as Genetic Matching, that can optimize a balance loss function of our choice; we
could choose the IPM, the KL divergence or another discrepancy, which is worth analyzing. However, it is implemented in
R (`GenMatch` in the `Matching` package).

## 5. Uber's causal inference package

https://github.com/uber/causalml

In [61]:
from causalml.dataset import synthetic_data
from causalml.propensity import ElasticNetPropensityModel
import pandas as pd

In [62]:
y, X, treatment, _, _, e = synthetic_data(mode=1, n=1000, p=5, sigma=1.0)

df = pd.DataFrame(X, columns=['x'+ str(i) for i in range(X.shape[1])])
df['treatment'] = treatment
df['outcome'] = y
df.head()


         x0        x1        x2        x3        x4  treatment   outcome
0  0.806558  0.999063  0.685922  0.489712  0.064321          1  3.029459
1  0.351880  0.087048  0.364900  0.124443  0.151241          0  0.364078
2  0.286686  0.473618  0.008759  0.922285  0.702581          0  0.698016
3  0.732209  0.518169  0.112590  0.837357  0.683949          1  0.948612
4  0.897502  0.236969  0.592283  0.808822  0.076285          1  2.087665


In [63]:


pm = ElasticNetPropensityModel()
ps = pm.fit_predict(X, treatment)

df['propensity'] = ps

df.head()

         x0        x1        x2        x3        x4  treatment   outcome  propensity
0  0.806558  0.999063  0.685922  0.489712  0.064321          1  3.029459    0.917274
1  0.351880  0.087048  0.364900  0.124443  0.151241          0  0.364078    0.153071
2  0.286686  0.473618  0.008759  0.922285  0.702581          0  0.698016    0.388563
3  0.732209  0.518169  0.112590  0.837357  0.683949          1  0.948612    0.723073
4  0.897502  0.236969  0.592283  0.808822  0.076285          1  2.087665    0.631282


In [64]:
from causalml.match import NearestNeighborMatch, create_table_one

"""
    Propensity score matching based on the nearest neighbor algorithm.
    Attributes:
        caliper (float): threshold to be considered as a match.
        replace (bool): whether to match with replacement or not
        ratio (int): ratio of control / treatment to be matched. used only if
            replace=True.
        shuffle (bool): whether to shuffle the treatment group data before
            matching
        random_state (numpy.random.RandomState or int): RandomState or an int
            seed
"""

psm = NearestNeighborMatch(replace=False,
                           ratio=1,
                           random_state=42)

In [65]:
matched = psm.match(data=df,
                    treatment_col='treatment',
                    score_cols=['propensity'])

In [66]:
matched.treatment.value_counts()

1    246
0    246
Name: treatment, dtype: int64

In [67]:
create_table_one(data=matched,
                 treatment_col='treatment',
                 features=matched.columns.tolist()[0:-3])  # x0..x4 only

              Control    Treatment      SMD
Variable
n                 246          246
x0        0.50 (0.32)  0.50 (0.26)  -0.0129
x1        0.51 (0.31)  0.51 (0.27)   0.0066
x2        0.48 (0.28)  0.51 (0.28)   0.1127
x3        0.49 (0.28)  0.48 (0.29)  -0.0357
x4        0.49 (0.29)  0.48 (0.30)  -0.0069


In [68]:
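# Now we can estimate the treatment effect as the difference in mean outcomes between the
# matched treated and control groups. Since every matched control is paired with a treated
# unit, strictly speaking this targets the effect on the matched treated units.

def ate_matched(df, treatment_col='treatment', outcome_col='outcome'):
    df_control = df[df[treatment_col] == 0]
    df_treated = df[df[treatment_col] == 1]
    ate = df_treated[outcome_col].mean() - df_control[outcome_col].mean()
    return ate.round(2)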

print("The average treatment effect estimated by propensity score matching is equal to",
      ate_matched(matched))

The average treatment effect estimated by propensity score matching is equal to 0.7


To do:
1. Look at how it performs over multiple iterations with a randomized data-generating function. For this, first make the data generation modular.
2. Maybe we could use cross-validation metrics to tune the parameters.
3. A sensible approach would be to stratify the sample into small bins based on some specified loss and then use
a blocking estimator, or even estimate the propensity score to improve the balance.

