# Quickstart Example for **AutoDoubleML**
This notebook shows how to use AutoDoubleML, a wrapper around `DoubleML` that tunes the nuisance learners automatically before running double machine learning.

In [1]:
from doubleml.datasets import make_plr_CCDDHNR2018, make_irm_data
from autodml.AutoDoubleMLPLR import AutoDoubleMLPLR
from autodml.AutoDoubleMLIRM import AutoDoubleMLIRM

We generate example data with a function provided by `DoubleML`. An `AutoDoubleMLPLR` object inherits all methods and attributes from `DoubleMLPLR`, but it does not require nuisance estimators.
Instead, we pass a `time` argument: either an `int` giving the maximum total nuisance tuning time in seconds, which is distributed equally across all learners, or a `dict` giving the tuning time in seconds for each learner.
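
To make the two forms of `time` concrete, here is a minimal sketch of how such an argument could be resolved into a per-learner budget. The helper `split_time_budget` is hypothetical and not part of the library; the floor division used for the equal split is an assumption, chosen only because it matches the budgets printed in this notebook's logs (20 s over two learners gives 10 s each).

```python
from typing import Dict, Sequence, Union


def split_time_budget(
    time: Union[int, Dict[str, int]],
    learner_names: Sequence[str],
) -> Dict[str, int]:
    """Hypothetical helper: resolve a `time` argument into seconds per learner."""
    if isinstance(time, dict):
        # A dict must name every learner explicitly.
        missing = [name for name in learner_names if name not in time]
        if missing:
            raise ValueError(f"No tuning time given for: {missing}")
        return {name: int(time[name]) for name in learner_names}
    # An int is a total budget shared equally (floor division is an assumption).
    per_learner = int(time) // len(learner_names)
    return {name: per_learner for name in learner_names}


# Total budget of 20 seconds for the two PLR nuisance learners.
print(split_time_budget(20, ["ml_l", "ml_m"]))
# Explicit budget per learner.
print(split_time_budget({"ml_l": 42, "ml_m": 24}, ["ml_l", "ml_m"]))
```

With an `int`, every learner gets the same share; with a `dict`, the values are used as given and a missing learner name raises an error in this sketch.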

In [2]:
obj_dml_data = make_plr_CCDDHNR2018()
autodml_obj = AutoDoubleMLPLR(obj_dml_data, time=20)

In [3]:
print(autodml_obj.fit())

Optimizing learners: ml_l for 10s, ml_m for 10s. Please wait.

------------------ Data summary      ------------------
Outcome variable: y
Treatment variable(s): ['d']
Covariates: ['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7', 'X8', 'X9', 'X10', 'X11', 'X12', 'X13', 'X14', 'X15', 'X16', 'X17', 'X18', 'X19', 'X20']
Instrument variable(s): None
No. Observations: 500

------------------ Score & algorithm ------------------
Score function: partialling out

------------------ Machine learner   ------------------
Learner ml_l: ExtraTreesRegressor(max_features=np.float64(0.9586055955836026),
                    max_leaf_nodes=8, n_estimators=4, n_jobs=-1,
                    random_state=12032022)
Learner ml_m: LGBMRegressor(colsample_bytree=np.float64(0.9520950269114992),
              learning_rate=np.float64(0.34574139203168747), max_bin=127,
              min_child_samples=3, n_estimators=1, n_jobs=-1, num_leaves=4,
              reg_alpha=np.float64(0.004577823970660193),
              reg_l

The `evaluate_learners()` method can be used to track the nuisance learner performance.

In [4]:
autodml_obj.evaluate_learners()

{'ml_l': array([[1.18614578]]), 'ml_m': array([[1.2459035]])}
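
The numbers above are easier to read with the metric in mind. In `DoubleML`, `evaluate_learners()` defaults to the root mean squared error between each nuisance target and its predictions. Assuming the wrapper keeps that default (the notebook does not say so), each value could be reproduced roughly as follows. The `rmse` function is an illustrative stand-in written for this sketch, not the library's code, and skipping NaN entries is a choice made here.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error over pairs where both values are not NaN."""
    pairs = [
        (t, p)
        for t, p in zip(y_true, y_pred)
        if not (math.isnan(t) or math.isnan(p))
    ]
    if not pairs:
        raise ValueError("No valid (target, prediction) pairs to evaluate.")
    squared_errors = [(t - p) ** 2 for t, p in pairs]
    return math.sqrt(sum(squared_errors) / len(pairs))


# Toy values, not taken from the notebook: every prediction is off by one.
print(rmse([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))  # 1.0
```

Under this reading, lower values indicate a better nuisance fit, and values are only comparable for the same target variable.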

To customize the tuning time per component, pass a `dict` whose keys are the names of the nuisance components and whose values are the tuning times in seconds, given as `int`.

In [5]:
time_dict = {
    'ml_l': 42,
    'ml_m': 24,
}
autodml_obj = AutoDoubleMLPLR(obj_dml_data, time=time_dict)

autodml_obj.fit()

print(autodml_obj)

Optimizing learners: ml_l for 42s, ml_m for 24s. Please wait.

------------------ Data summary      ------------------
Outcome variable: y
Treatment variable(s): ['d']
Covariates: ['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7', 'X8', 'X9', 'X10', 'X11', 'X12', 'X13', 'X14', 'X15', 'X16', 'X17', 'X18', 'X19', 'X20']
Instrument variable(s): None
No. Observations: 500

------------------ Score & algorithm ------------------
Score function: partialling out

------------------ Machine learner   ------------------
Learner ml_l: ExtraTreesRegressor(max_features=np.float64(0.9874127485181794),
                    max_leaf_nodes=10, n_estimators=4, n_jobs=-1,
                    random_state=12032022)
Learner ml_m: XGBRegressor(base_score=None, booster=None, callbacks=[], colsample_bylevel=1.0,
             colsample_bynode=None, colsample_bytree=1.0, device=None,
             early_stopping_rounds=None, enable_categorical=False,
             eval_metric=None, feature_types=None, gamma=None,
       

In [6]:
# evaluate learner fit
autodml_obj.evaluate_learners()

{'ml_l': array([[1.19836097]]), 'ml_m': array([[1.07781908]])}

The implementation is also available for `DoubleMLIRM` with similar syntax.

In [7]:
obj_dml_data = make_irm_data()
autodml_obj = AutoDoubleMLIRM(obj_dml_data, time=30, score="ATE")
autodml_obj.fit()
print(autodml_obj)

Optimizing learners: ml_g0 for 10s, ml_g1 for 10s, ml_m for 10s. Please wait.

------------------ Data summary      ------------------
Outcome variable: y
Treatment variable(s): ['d']
Covariates: ['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7', 'X8', 'X9', 'X10', 'X11', 'X12', 'X13', 'X14', 'X15', 'X16', 'X17', 'X18', 'X19', 'X20']
Instrument variable(s): None
No. Observations: 500

------------------ Score & algorithm ------------------
Score function: ATE

------------------ Machine learner   ------------------
Learner ml_g: XGBRegressor(base_score=None, booster=None, callbacks=[], colsample_bylevel=1.0,
             colsample_bynode=None, colsample_bytree=1.0, device=None,
             early_stopping_rounds=None, enable_categorical=False,
             eval_metric=None, feature_types=None, gamma=None, grow_policy=None,
             importance_type=None, interaction_constraints=None,
             learning_rate=np.float64(0.29999999999999993), max_bin=None,
             max_cat_threshold=Non

In [8]:
# evaluate learner fit
autodml_obj.evaluate_learners()

{'ml_g0': array([[1.10701786]]),
 'ml_g1': array([[1.11205585]]),
 'ml_m': array([[0.43664026]])}

In [9]:
# Custom tuning time per nuisance component
time_dict = {
    'ml_g0': 42,
    'ml_g1': 42,
    'ml_m': 24,
}
autodml_obj = AutoDoubleMLIRM(obj_dml_data, time=time_dict)

In [10]:
print(autodml_obj.fit())

Optimizing learners: ml_g0 for 42s, ml_g1 for 42s, ml_m for 24s. Please wait.

------------------ Data summary      ------------------
Outcome variable: y
Treatment variable(s): ['d']
Covariates: ['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7', 'X8', 'X9', 'X10', 'X11', 'X12', 'X13', 'X14', 'X15', 'X16', 'X17', 'X18', 'X19', 'X20']
Instrument variable(s): None
No. Observations: 500

------------------ Score & algorithm ------------------
Score function: ATE

------------------ Machine learner   ------------------
Learner ml_g: LGBMRegressor(colsample_bytree=np.float64(0.7334731365290879),
              learning_rate=np.float64(0.2855822078594019), max_bin=511,
              min_child_samples=19, n_estimators=1, n_jobs=-1, num_leaves=4,
              reg_alpha=np.float64(0.0034571866620827637),
              reg_lambda=np.float64(7.587522733199777), verbose=-1)
Learner ml_m: ExtraTreesClassifier(criterion=np.str_('entropy'),
                     max_features=np.float64(0.49694211775732616),
 

In [11]:
# evaluate learner fit
autodml_obj.evaluate_learners()

{'ml_g0': array([[1.01989452]]),
 'ml_g1': array([[1.21723149]]),
 'ml_m': array([[0.42165199]])}