# Python: GATE Sensitivity Analysis

In this simple example, we illustrate how the [DoubleML](https://docs.doubleml.org/stable/index.html) package can be used to perform a sensitivity analysis for group average treatment effects in the [DoubleMLIRM](https://docs.doubleml.org/stable/guide/models.html#interactive-regression-model-irm) model.


## Data

In [38]:
import numpy as np
import pandas as pd
import doubleml as dml

from doubleml.datasets import make_heterogeneous_data
from lightgbm import LGBMRegressor, LGBMClassifier
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.linear_model import Lasso, LogisticRegression

In [39]:
n_obs = 10000
p = 5

data_dict = make_heterogeneous_data(n_obs, p, binary_treatment=True)
data = data_dict['data']
# add a purely random covariate (used later as the benchmarking set)
data['Z'] = np.random.normal(size=(n_obs, 1))
ite = data_dict['effects']

group = data['X_0'] >= 0.6

In [40]:
ite.mean()

4.440737992009818

In [41]:
true_group_effect = ite[group].mean()
print(true_group_effect)

4.930144582545341


In [42]:
weights = group.to_numpy() / group.mean()
print(weights)

[0.         2.50752257 0.         ... 2.50752257 2.50752257 2.50752257]
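The weights above rescale the group indicator by the group share, so they average to one, and a weighted mean over the full sample coincides with the plain mean inside the group. The following is a minimal, self-contained sketch of that identity on synthetic values; the arrays `x` and `effects` and the `_demo` names are hypothetical stand-ins, not the notebook's data.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000

# Hypothetical covariate and individual effects (illustration only)
x = rng.uniform(size=n)
effects = 4.0 + x + rng.normal(scale=0.1, size=n)

# Same construction as in the notebook: indicator divided by the group share
group_demo = x >= 0.6
weights_demo = group_demo / group_demo.mean()

# Weighted mean over everyone vs. plain mean within the group
weighted_mean = np.mean(weights_demo * effects)
group_mean = effects[group_demo].mean()

print(weights_demo.mean(), weighted_mean, group_mean)
```

Because the weights are zero outside the group and constant inside it, the two means agree up to floating-point error.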


In [43]:
dml_data = dml.DoubleMLData(data, 'y', 'd')
print(dml_data)


------------------ Data summary      ------------------
Outcome variable: y
Treatment variable(s): ['d']
Covariates: ['X_0', 'X_1', 'X_2', 'X_3', 'X_4', 'Z']
Instrument variable(s): None
No. Observations: 10000

------------------ DataFrame info    ------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Columns: 8 entries, y to Z
dtypes: float64(8)
memory usage: 625.1 KB



In [44]:
ml_g = LGBMRegressor()
ml_m = LGBMClassifier()

#ml_g = RandomForestRegressor()
#ml_m = RandomForestClassifier()

#ml_g = Lasso()
#ml_m = LogisticRegression()

In [45]:
dml_irm_obj = dml.DoubleMLIRM(
    dml_data,
    ml_g,
    ml_m,
    n_folds=5,
    n_rep=5,
    trimming_threshold=0.01, 
    weights=weights)

In [46]:
dml_irm_obj.fit()
print(dml_irm_obj)


------------------ Data summary      ------------------
Outcome variable: y
Treatment variable(s): ['d']
Covariates: ['X_0', 'X_1', 'X_2', 'X_3', 'X_4', 'Z']
Instrument variable(s): None
No. Observations: 10000

------------------ Score & algorithm ------------------
Score function: ATE
DML algorithm: dml2

------------------ Machine learner   ------------------
Learner ml_g: LGBMRegressor()
Learner ml_m: LGBMClassifier()
Out-of-sample Performance:
Learner ml_g0 RMSE: [[0.60936811]
 [0.60738428]
 [0.61016368]
 [0.61030198]
 [0.60422889]]
Learner ml_g1 RMSE: [[0.59984306]
 [0.59739267]
 [0.59931823]
 [0.59814635]
 [0.60062622]]
Learner ml_m RMSE: [[0.47054079]
 [0.46905376]
 [0.47059454]
 [0.46987946]
 [0.47042586]]

------------------ Resampling        ------------------
No. folds: 5
No. repeated sample splits: 5
Apply cross-fitting: True

------------------ Fit summary       ------------------
       coef   std err          t  P>|t|     2.5 %   97.5 %
d  4.887055  0.064836  75.376067

In [47]:
dml_irm_obj.sensitivity_analysis()

<doubleml.double_ml_irm.DoubleMLIRM at 0x19f36d5db50>

In [48]:
dml_irm_obj.sensitivity_plot()

In [49]:
dml_irm_obj.sensitivity_benchmark(benchmarking_set=['Z'])

       cf_y  cf_d       rho  delta_theta
d  0.001473   0.0 -0.399529     -0.00475


In [50]:
dml_irm_obj

<doubleml.double_ml_irm.DoubleMLIRM at 0x19f36d5db50>

In [51]:
group_atte = (data['d'] == 1) * group
print(ite[group_atte].mean())

4.93234567628151


In [52]:
weights_atte = group_atte.to_numpy() / group_atte.mean()
m_0 = dml_irm_obj.predictions['ml_m'].squeeze()

weights_bar_atte = m_0 / group_atte.mean()
weight_dict = {'weights': weights_atte, 'weights_bar': weights_bar_atte}
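When the weights depend on the treatment, the dictionary carries a second array, `weights_bar`, which plays the role of the conditional expectation of `weights` given the covariates. For weights of the form group indicator times treatment, divided by their joint share, that expectation is the group indicator times the propensity score, divided by the same share. The sketch below checks this on synthetic data with a known propensity; all names (`m_true`, `g`, `w`, `w_bar`) are hypothetical and nothing here calls DoubleML. Note that it keeps the group indicator inside `w_bar`, unlike the cell above; whether that difference affects the notebook's results is not verified here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

x = rng.uniform(size=n)
m_true = 0.3 + 0.4 * x          # hypothetical known propensity P(D=1 | X)
d = rng.binomial(1, m_true)     # binary treatment drawn from that propensity
g = x >= 0.6                    # group indicator, a function of X only

p_gd = np.mean(g * d)           # sample share of treated units in the group
w = g * d / p_gd                # weights: depend on the treatment
w_bar = g * m_true / p_gd       # E[w | X], since g is fixed once X is known

# Both arrays should average to (approximately) one
print(w.mean(), w_bar.mean())
```

In this sketch `w` averages to exactly one by construction, while `w_bar` does so only up to sampling error, and `w_bar` is zero for every unit outside the group.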

In [53]:
dml_irm_obj = dml.DoubleMLIRM(
    dml_data,
    ml_g,
    ml_m,
    n_folds=5,
    n_rep=5,
    trimming_threshold=0.01, 
    weights=weight_dict)

In [54]:
dml_irm_obj.fit()
print(dml_irm_obj)


------------------ Data summary      ------------------
Outcome variable: y
Treatment variable(s): ['d']
Covariates: ['X_0', 'X_1', 'X_2', 'X_3', 'X_4', 'Z']
Instrument variable(s): None
No. Observations: 10000

------------------ Score & algorithm ------------------
Score function: ATE
DML algorithm: dml2

------------------ Machine learner   ------------------
Learner ml_g: LGBMRegressor()
Learner ml_m: LGBMClassifier()
Out-of-sample Performance:
Learner ml_g0 RMSE: [[0.60631069]
 [0.60874138]
 [0.60910365]
 [0.61089856]
 [0.612007  ]]
Learner ml_g1 RMSE: [[0.59874609]
 [0.59925509]
 [0.59658092]
 [0.59912497]
 [0.5985661 ]]
Learner ml_m RMSE: [[0.46954985]
 [0.46927345]
 [0.46987798]
 [0.46963963]
 [0.46985117]]

------------------ Resampling        ------------------
No. folds: 5
No. repeated sample splits: 5
Apply cross-fitting: True

------------------ Fit summary       ------------------
       coef   std err          t  P>|t|     2.5 %    97.5 %
d  4.892076  0.090342  54.15072

In [55]:
dml_irm_obj.sensitivity_benchmark(benchmarking_set=['Z'])

      cf_y  cf_d       rho  delta_theta
d  0.00254   0.0  0.027307     0.000493


In [56]:
dml_irm_obj.sensitivity_analysis()

ValueError: sensitivity_elements sigma2 and nu2 have to be positive. Got sigma2 [[[0.36132641]
  [0.36266366]
  [0.36059536]
  [0.36337279]
  [0.36333196]]] and nu2 [[[-21.48602295]
  [-19.00970374]
  [-20.52117574]
  [-20.55675177]
  [-20.36472436]]]. Most likely this is due to low quality learners (especially propensity scores).
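The error message points to learner quality, and to the propensity scores in particular. One way to inspect them is to summarize the predicted propensities before rerunning the analysis. The helper below is a hypothetical numpy-only sketch (`propensity_diagnostics` is not part of DoubleML); it is demonstrated on synthetic data and is not claimed to resolve the error above.

```python
import numpy as np

def propensity_diagnostics(m_hat, d, eps=0.01):
    """Summarize predicted propensities m_hat against a binary treatment d."""
    m_hat = np.asarray(m_hat, dtype=float)
    d = np.asarray(d, dtype=float)
    # Clip only for the log loss so that log(0) cannot occur
    clipped = np.clip(m_hat, 1e-12, 1 - 1e-12)
    log_loss = -np.mean(d * np.log(clipped) + (1 - d) * np.log(1 - clipped))
    return {
        "min": m_hat.min(),
        "max": m_hat.max(),
        # Share of predictions closer than eps to 0 or 1 (poor overlap)
        "share_extreme": np.mean((m_hat < eps) | (m_hat > 1 - eps)),
        "log_loss": log_loss,
        "rmse": np.sqrt(np.mean((d - m_hat) ** 2)),
    }

# Synthetic demonstration: predictions equal to the true propensities
rng = np.random.default_rng(1)
m_demo = rng.uniform(0.05, 0.95, size=5_000)
d_demo = rng.binomial(1, m_demo)

report = propensity_diagnostics(m_demo, d_demo)
print(report)
```

In the notebook, one could pass a single repetition of `dml_irm_obj.predictions['ml_m']` together with `data['d']`. As a reference point, a constant prediction of 0.5 yields an RMSE of exactly 0.5 and a log loss of log(2), so values close to these suggest the classifier carries little information about the treatment.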