# Python: IRM and APO Model Comparison

In this simple example, we illustrate how the (binary) [DoubleMLIRM](https://docs.doubleml.org/stable/guide/models.html#binary-interactive-regression-model-irm) model and the [DoubleMLAPOS](https://docs.doubleml.org/stable/guide/models.html#average-potential-outcomes-apos-for-multiple-treatment-levels) model differ.

More specifically, we focus on the `causal_contrast()` method of [DoubleMLAPOS](https://docs.doubleml.org/stable/guide/models.html#average-potential-outcomes-apos-for-multiple-treatment-levels) in a binary setting to highlight when both methods coincide.

In [1]:
import numpy as np
import pandas as pd
import doubleml as dml

from sklearn.linear_model import LinearRegression, LogisticRegression

from doubleml.datasets import make_irm_data

## Data

We rely on the [make_irm_data](https://docs.doubleml.org/stable/api/generated/doubleml.datasets.make_irm_data.html) function to generate data with a binary treatment.

In [2]:
n_obs = 2000

np.random.seed(42)
df = make_irm_data(
    n_obs=n_obs,
    dim_x=10,
    theta=5.0,
    return_type='DataFrame'
)

df.head()

|   | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 | y | d |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.54106 | 0.195963 | 0.83575 | 1.180271 | 0.655959 | 1.044239 | -1.214769 | -2.160836 | -2.196655 | -2.037529 | 4.704558 | 1.0 |
| 1 | 1.340235 | 2.31942 | 2.193375 | 1.139508 | 1.458814 | 0.493195 | 1.31387 | 0.127337 | 0.4184 | 1.070433 | 6.113952 | 1.0 |
| 2 | -0.563563 | -1.480199 | 0.943548 | -0.400113 | 0.757559 | 0.131483 | -0.57416 | 0.067212 | -0.427654 | -1.342117 | -0.226479 | 0.0 |
| 3 | -0.044176 | -2.122421 | -1.526582 | -1.892828 | -1.777867 | -1.425325 | 0.020272 | 0.487524 | -0.579197 | -0.752909 | 0.367366 | 0.0 |
| 4 | -1.896263 | -1.198493 | -1.285483 | -1.361623 | -2.921778 | -2.026966 | -1.107156 | -0.498122 | 0.568287 | 1.004542 | 0.913585 | 0.0 |


First, define the `DoubleMLData` object.

In [3]:
dml_data = dml.DoubleMLData(
    df,
    y_col='y',
    d_cols='d'
)

## Learners and Hyperparameters

To simplify the comparison and keep the variation in learners as small as possible, we will use linear models.

In [4]:
n_folds = 5
n_rep = 1

dml_kwargs = {
    "obj_dml_data": dml_data,
    "ml_g": LinearRegression(),
    "ml_m": LogisticRegression(random_state=42),
    "n_folds": n_folds,
    "n_rep": n_rep,
    "normalize_ipw": True,
    "trimming_threshold": 1e-2,
    "draw_sample_splitting": False,
}
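The settings `normalize_ipw` and `trimming_threshold` both act on the estimated propensity scores. As a rough illustration of the general idea only, the following sketch clips propensity scores away from 0 and 1 and rescales the inverse probability weights so that they average to one in each treatment arm (Hájek-style normalization). The toy data and all variable names are hypothetical, and the exact implementation inside DoubleML may differ.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Hypothetical toy propensity scores and treatment (not make_irm_data)
x = rng.normal(scale=2.0, size=n)
m_hat = 1.0 / (1.0 + np.exp(-x))
d = rng.binomial(1, m_hat)

# Trimming: keep propensity scores inside [eps, 1 - eps]
eps = 1e-2
m_trim = np.clip(m_hat, eps, 1.0 - eps)

# Raw inverse probability weights for treated and control units
w1 = d / m_trim
w0 = (1 - d) / (1 - m_trim)

# Normalization: rescale so that the weights average to one in each arm
w1_norm = w1 / w1.mean()
w0_norm = w0 / w0.mean()

print("raw means:       ", w1.mean(), w0.mean())
print("normalized means:", w1_norm.mean(), w0_norm.mean())
```

Trimming bounds the largest possible weight (here by `1 / eps`), while normalization removes the finite-sample deviation of the average weight from one.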

**Remark:**
All results rely on the exact same predictions of the machine learning algorithms. If more than two treatment levels exist, the `DoubleMLAPOS` model fits multiple binary models, such that the combined model might differ.

Further, to remove all uncertainty from sample splitting, we will rely on externally provided sample splits.

In [5]:
from doubleml.utils import DoubleMLResampling

rskf = DoubleMLResampling(
    n_folds=n_folds,
    n_rep=n_rep,
    n_obs=n_obs,
    stratify=df['d'],
)
all_smpls = rskf.split_samples()
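To make the structure of such externally provided splits more concrete, here is a minimal NumPy-only sketch of stratified folds. It assumes the nested layout described in the DoubleML documentation for `set_sample_splitting`: one list per repetition, each containing one `(train_indices, test_indices)` tuple per fold. The helper `stratified_folds` and the toy treatment vector are hypothetical and are not part of DoubleML.

```python
import numpy as np


def stratified_folds(d, n_folds, seed=0):
    """Return a list of (train_idx, test_idx) tuples, stratified by d."""
    rng = np.random.default_rng(seed)
    fold_id = np.empty(len(d), dtype=int)
    for level in np.unique(d):
        # Shuffle the units of this treatment level and deal them out to folds
        idx = rng.permutation(np.flatnonzero(d == level))
        fold_id[idx] = np.arange(len(idx)) % n_folds
    all_idx = np.arange(len(d))
    return [(all_idx[fold_id != k], all_idx[fold_id == k]) for k in range(n_folds)]


# Hypothetical binary treatment vector
d_toy = np.random.default_rng(42).binomial(1, 0.4, size=2000)

# One repetition (n_rep = 1) with five folds
all_smpls_sketch = [stratified_folds(d_toy, n_folds=5)]

for train_idx, test_idx in all_smpls_sketch[0]:
    print(len(train_idx), len(test_idx), round(d_toy[test_idx].mean(), 3))
```

Because every model receives the same splits, any remaining difference between the estimates cannot be attributed to the random partition of the data.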

## Average Treatment Effect

Comparing the effect estimates of the `DoubleMLIRM` model and the `causal_contrast()` of the `DoubleMLAPOS` model, we obtain numerically equivalent results for the ATE.

In [6]:
dml_irm = dml.DoubleMLIRM(**dml_kwargs)
dml_irm.set_sample_splitting(all_smpls)
print("Training IRM Model")
dml_irm.fit()

print(dml_irm.summary)

Training IRM Model
       coef   std err          t  P>|t|     2.5 %    97.5 %
d  5.002585  0.066201  75.566262    0.0  4.872833  5.132337


In [7]:
dml_apos = dml.DoubleMLAPOS(treatment_levels=[0,1], **dml_kwargs)
dml_apos.set_sample_splitting(all_smpls)
print("Training APOS Model")
dml_apos.fit()
print(dml_apos.summary)

print("Evaluate Causal Contrast")
causal_contrast = dml_apos.causal_contrast(reference_levels=[0])
print(causal_contrast.summary)

Training APOS Model
       coef   std err           t     P>|t|     2.5 %    97.5 %
0  0.037800  0.045201    0.836262  0.403008 -0.050793  0.126392
1  5.040385  0.048482  103.965037  0.000000  4.945363  5.135407
Evaluate Causal Contrast
            coef   std err          t  P>|t|     2.5 %    97.5 %
1 vs 0  5.002585  0.066201  75.566262    0.0  4.872833  5.132337


For a direct comparison, print both summaries one after the other:

In [8]:
print("IRM Model")
print(dml_irm.summary)
print("Causal Contrast")
print(causal_contrast.summary)

IRM Model
       coef   std err          t  P>|t|     2.5 %    97.5 %
d  5.002585  0.066201  75.566262    0.0  4.872833  5.132337
Causal Contrast
            coef   std err          t  P>|t|     2.5 %    97.5 %
1 vs 0  5.002585  0.066201  75.566262    0.0  4.872833  5.132337
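The agreement is not a coincidence. Assuming the textbook doubly robust (AIPW) scores, the ATE score is exactly the difference of the two potential outcome scores, so both the point estimate and the standard error of the contrast must match whenever the same nuisance predictions are used. The sketch below checks this algebra on a hypothetical toy data set with known nuisance functions; it does not call DoubleML, and the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical toy DGP (not make_irm_data): one covariate, binary treatment
x = rng.normal(size=n)
m = 1.0 / (1.0 + np.exp(-x))      # propensity score P(D=1 | X)
d = rng.binomial(1, m)
y = 5.0 * d + x + rng.normal(size=n)

# Stand-ins for the nuisance predictions (here: the true regression functions)
g0 = x
g1 = 5.0 + x

# AIPW scores for the average potential outcomes E[Y(0)] and E[Y(1)]
psi_apo_0 = g0 + (1 - d) * (y - g0) / (1 - m)
psi_apo_1 = g1 + d * (y - g1) / m

# AIPW score for the ATE
psi_ate = g1 - g0 + d * (y - g1) / m - (1 - d) * (y - g0) / (1 - m)

# Point estimates
ate = psi_ate.mean()
contrast = psi_apo_1.mean() - psi_apo_0.mean()

# Standard errors based on the (difference of the) scores
se_ate = psi_ate.std() / np.sqrt(n)
se_contrast = (psi_apo_1 - psi_apo_0).std() / np.sqrt(n)

print(ate, contrast)
print(se_ate, se_contrast)
```

Note that the standard error of the contrast is computed from the difference of the scores, not from the two separate standard errors, which is why it accounts for the correlation between the two potential outcome estimates.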


## Average Treatment Effect on the Treated

For the average treatment effect on the treated (ATTE), we can set the score of the `DoubleMLIRM` model to `score="ATTE"`.

In [9]:
dml_irm_atte = dml.DoubleMLIRM(score="ATTE", **dml_kwargs)
dml_irm_atte.set_sample_splitting(all_smpls)
print("Training IRM Model")
dml_irm_atte.fit()

print(dml_irm_atte.summary)

Training IRM Model
       coef   std err          t  P>|t|     2.5 %    97.5 %
d  5.541136  0.082383  67.260366    0.0  5.379668  5.702605


To consider weighted effects in the `DoubleMLAPOS` model, we have to specify the correct weights; see the [User Guide](https://docs.doubleml.org/stable/guide/heterogeneity.html#weighted-average-treatment-effects).

As these weights include the propensity score, we will use the predicted propensity score from the previous `DoubleMLIRM` model.
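To see why weights of the form $\omega(D) = D / P(D=1)$ and $\bar{\omega}(X) = m(X) / P(D=1)$ target the effect on the treated, the following sketch uses a hypothetical toy setup in which the conditional effect `tau` is known. Weighting the conditional effect by `weights` reproduces the average over treated units exactly, and weighting by `weights_bar` does so approximately. The data generating process and all names are illustrative assumptions and do not use DoubleML.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

# Hypothetical toy setup (not make_irm_data)
x = rng.normal(size=n)
m = 1.0 / (1.0 + np.exp(-x))   # true propensity score m(X)
d = rng.binomial(1, m)
tau = 5.0 + x                  # heterogeneous conditional treatment effect

p_hat = d.mean()
weights = d / p_hat            # omega(D)
weights_bar = m / p_hat        # omega_bar(X) = E[omega(D) | X]

ate = tau.mean()
att_direct = tau[d == 1].mean()
att_weighted = np.mean(weights * tau)
att_weighted_bar = np.mean(weights_bar * tau)

print("ATE:                   ", ate)
print("ATT (treated average): ", att_direct)
print("ATT (weights):         ", att_weighted)
print("ATT (weights_bar):     ", att_weighted_bar)
```

In practice the true propensity score is unknown, which is why the cell below plugs in the predicted propensity scores instead.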


In [10]:
p_hat = df["d"].mean()
m_hat = dml_irm_atte.predictions["ml_m"][:, :, 0]

weights_dict = {
    "weights": df["d"] / p_hat,
    "weights_bar": m_hat / p_hat,
}

dml_apos_atte = dml.DoubleMLAPOS(treatment_levels=[0,1], weights=weights_dict, **dml_kwargs)
dml_apos_atte.set_sample_splitting(all_smpls)
print("Training APOS Model")
dml_apos_atte.fit()
print(dml_apos_atte.summary)

print("Evaluate Causal Contrast")
causal_contrast_atte = dml_apos_atte.causal_contrast(reference_levels=[0])
print(causal_contrast_atte.summary)

Training APOS Model
       coef   std err           t     P>|t|     2.5 %    97.5 %
0  0.044364  0.072137    0.615000  0.538555 -0.097022  0.185751
1  5.585498  0.040281  138.663329  0.000000  5.506548  5.664447
Evaluate Causal Contrast
            coef   std err          t  P>|t|     2.5 %    97.5 %
1 vs 0  5.541133  0.082736  66.974012    0.0  5.378975  5.703292


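The point estimates agree up to the fifth decimal place (5.541136 vs. 5.541133), but on closer comparison the standard errors and confidence intervals are slightly larger for the causal contrast (standard error 0.082736 vs. 0.082383). A `DoubleMLIRM` model with `score="ATE"` and the same weights reproduces the causal contrast results exactly: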

In [11]:
dml_irm_weighted_atte = dml.DoubleMLIRM(score="ATE", weights=weights_dict, **dml_kwargs)
dml_irm_weighted_atte.set_sample_splitting(all_smpls)
print("Training IRM Model")
dml_irm_weighted_atte.fit()

print(dml_irm_weighted_atte.summary)

Training IRM Model
       coef   std err          t  P>|t|     2.5 %    97.5 %
d  5.541133  0.082736  66.974012    0.0  5.378975  5.703292


In summary, the three estimates compare as follows:

In [12]:
print("IRM Model ATTE Score")
print(dml_irm_atte.summary.round(4))
print("IRM Model (Weighted)")
print(dml_irm_weighted_atte.summary.round(4))
print("Causal Contrast (Weighted)")
print(causal_contrast_atte.summary.round(4))

IRM Model ATTE Score
     coef  std err        t  P>|t|   2.5 %  97.5 %
d  5.5411   0.0824  67.2604    0.0  5.3797  5.7026
IRM Model (Weighted)
     coef  std err       t  P>|t|  2.5 %  97.5 %
d  5.5411   0.0827  66.974    0.0  5.379  5.7033
Causal Contrast (Weighted)
          coef  std err       t  P>|t|  2.5 %  97.5 %
1 vs 0  5.5411   0.0827  66.974    0.0  5.379  5.7033


## Sensitivity Analysis

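Finally, we compare the sensitivity analysis of both approaches. For the ATE models fitted above, the bounds and robustness values of the `DoubleMLIRM` model and the causal contrast coincide.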

In [13]:
dml_irm.sensitivity_analysis()
print(dml_irm.sensitivity_summary)


------------------ Scenario          ------------------
Significance Level: level=0.95
Sensitivity parameters: cf_y=0.03; cf_d=0.03, rho=1.0

------------------ Bounds with CI    ------------------
   CI lower  theta lower     theta  theta upper  CI upper
d  4.770463     4.917327  5.002585     5.087842  5.221186

------------------ Robustness Values ------------------
   H_0    RV (%)    RVa (%)
d  0.0  79.97688  57.230886


In [14]:
causal_contrast.sensitivity_analysis()
print(causal_contrast.sensitivity_summary)


------------------ Scenario          ------------------
Significance Level: level=0.95
Sensitivity parameters: cf_y=0.03; cf_d=0.03, rho=1.0

------------------ Bounds with CI    ------------------
        CI lower  theta lower     theta  theta upper  CI upper
1 vs 0  4.770463     4.917327  5.002585     5.087842  5.221186

------------------ Robustness Values ------------------
        H_0    RV (%)    RVa (%)
1 vs 0  0.0  79.97688  57.230886
