## Tuning Parameters for Python A/B Testing with DoubleML

Here, we illustrate how to tune the parameters of the lasso, random forest, and gradient boosting learners. The results are the same as in the A/B Test Main notebook [TODO: Insert link]. The baseline learner, linear regression, requires no tuning and is therefore not included in this notebook.

Tuning in [DoubleML](https://docs.doubleml.org/stable/index.html) builds on the functionality of the Python package scikit-learn. Hyperparameter tuning is performed either by an exhaustive search over specified parameter values, implemented in [sklearn.model_selection.GridSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html), or by a randomized search, implemented in [sklearn.model_selection.RandomizedSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html).
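The difference between the two search strategies can be sketched with scikit-learn alone. The example below is only an illustration on synthetic data (not the A/B test data); the decision tree learner, the grid values, and the variable names are assumptions chosen to keep the run time short.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, for illustration only
X, y = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)

param_grid = {"max_depth": [2, 3, 4, 5], "min_samples_leaf": [1, 5, 10]}

# Exhaustive search: evaluates all 4 * 3 = 12 combinations
grid = GridSearchCV(DecisionTreeRegressor(random_state=0), param_grid, cv=3)
grid.fit(X, y)

# Randomized search: evaluates only n_iter sampled combinations
rand = RandomizedSearchCV(DecisionTreeRegressor(random_state=0), param_grid,
                          n_iter=5, cv=3, random_state=0)
rand.fit(X, y)

n_grid = len(grid.cv_results_["params"])
n_rand = len(rand.cv_results_["params"])
print(n_grid, grid.best_params_)
print(n_rand, rand.best_params_)
```

The randomized search trades completeness for speed: it may miss the best combination, but its cost is fixed by `n_iter` rather than by the size of the grid.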

We will perform parameter tuning in the A/B Case Study example based on the models:

* Partially linear regression (PLR)
* Interactive regression model (IRM)

for
* logistic regression ([LogisticRegressionCV](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegressionCV.html)),
* lasso ([LassoCV](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html)),
* random forests ([RandomForestRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html)),
* extreme gradient boosting ([XGBRegressor/XGBClassifier](https://xgboost.readthedocs.io/en/latest/python/python_api.html)). 
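
For reference, the two model classes can be written as follows, using the notation of the DoubleML documentation. The symbols $\theta_0$, $g_0$, $m_0$, $\zeta$, $U$ and $V$ are not used elsewhere in this notebook and are introduced here only for illustration; $Y$ is the outcome, $D$ the treatment (column `A`) and $X$ the features. The PLR assumes an additive treatment effect, whereas the IRM lets the outcome regression depend on the binary treatment without restriction, which is why the IRM has separate nuisance parts `ml_g0` and `ml_g1`.

$$
\begin{aligned}
\text{PLR:}\quad & Y = D\,\theta_0 + g_0(X) + \zeta, & \mathbb{E}[\zeta \mid D, X] = 0,\\
& D = m_0(X) + V, & \mathbb{E}[V \mid X] = 0,\\[4pt]
\text{IRM:}\quad & Y = g_0(D, X) + U, & \mathbb{E}[U \mid X, D] = 0,\\
& D = m_0(X) + V, & \mathbb{E}[V \mid X] = 0,\\
& \theta_0 = \mathbb{E}\left[g_0(1, X) - g_0(0, X)\right] & \text{(ATE)}.
\end{aligned}
$$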

In [40]:
import pandas as pd
df = pd.read_csv('data/high42.csv', sep=',', na_values=".")

# the data can also be loaded directly from a URL
#url = 'https://raw.githubusercontent.com/DoubleML/doubleml-hiwi-sandbox/master/case_studies/AB_testing/data/high42.CSV?token=AQWGKQUK2PJ4ANBW5GFDG2LAY5ZN4'
#df = pd.read_csv(url)

In [41]:
#pip install -U DoubleML

In [42]:
import numpy as np
import pandas as pd
import doubleml as dml

from sklearn.preprocessing import PolynomialFeatures

from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression

from sklearn.linear_model import LassoCV, LogisticRegressionCV
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

from xgboost import XGBClassifier, XGBRegressor

import matplotlib.pyplot as plt

### The Data Backend: `DoubleMLData`

In [43]:
# Set up basic model: Specify variables for data-backend
features_base = list(df.columns.values)[2:]

# Initialize DoubleMLData (data-backend of DoubleML)
data_dml_base = dml.DoubleMLData(df,
                                 y_col='Y',
                                 d_cols='A',
                                 x_cols=features_base)

#print(data_dml_base)

## 1. Tuning: Interactive Regression Model (IRM)
### Lasso

We start our tuning exercise with the interactive regression model, with the nuisance parts estimated by lasso. In a first step, we set up our learners based on the [sklearn.linear_model](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.linear_model) package. Here, we don't have to specify parameters. In a second step, we initialize our model object using the class [DoubleMLIRM](https://docs.doubleml.org/dev/api/generated/doubleml.DoubleMLIRM.html). We specify the linear model [LassoCV](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html#sklearn.linear_model.LassoCV) in a pipeline with a standard scaler applied.

For the lasso regression we use 5-fold cross-validation and set the maximum number of iterations to 10000. [LogisticRegressionCV](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegressionCV.html) with an L1 penalty is a lasso classifier. We choose the `liblinear` solver, which is a good choice for small datasets. The grid of `Cs`, the inverse regularization strengths (smaller values mean stronger regularization), ranges from 0.0001 to 1. The best `C` is selected by cross-validation, as configured by the `cv` parameter. Note that default values are used for any learner parameters we don't specify.
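
As a quick check of the range of the `Cs` grid used in the next cell, the following sketch evaluates the same expression with NumPy. The variable name `same` is introduced only for this illustration.

```python
import numpy as np

# Grid as defined in the notebook: 10 log-spaced values scaled by 1e-4
Cs = 0.0001 * np.logspace(0, 4, 10)

# Equivalent formulation: exponents running from -4 to 0
same = np.logspace(-4, 0, 10)

print(Cs.min(), Cs.max())
print(np.allclose(Cs, same))
```

Because the values are log-spaced, the grid is much denser near the strongly regularized end (small `C`) than near 1.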

Using the assigned learners, we initialize the `DoubleMLIRM` model.

In [44]:
# Lasso
Cs = 0.0001*np.logspace(0, 4, 10) 
lasso = make_pipeline(StandardScaler(), LassoCV(cv=5, max_iter=10000))     

lasso_class = make_pipeline(StandardScaler(), LogisticRegressionCV(cv=5, penalty='l1', solver='liblinear',      
                                                 Cs = Cs, max_iter=1000))          


# Initialize DoubleMLIRM model
np.random.seed(123)

dml_irm_lasso = dml.DoubleMLIRM(data_dml_base,
                          ml_g = lasso,
                          ml_m = lasso_class,
                          trimming_threshold = 0.025,                          
                          n_folds = 3)

When it is unclear which hyperparameter values to set when assigning the learner, one has to try different values or tune the learner.

### 1. Parameters to Tune

All parameters of a learner can be accessed via the method [.get_params()](https://docs.doubleml.org/dev/api/generated/doubleml.DoubleMLPLR.html#doubleml.DoubleMLPLR.get_params). For example, in the case of [LassoCV](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html#sklearn.linear_model.LassoCV), the parameters that can be tuned are:

In [45]:
print(lasso.get_params().keys())

dict_keys(['memory', 'steps', 'verbose', 'standardscaler', 'lassocv', 'standardscaler__copy', 'standardscaler__with_mean', 'standardscaler__with_std', 'lassocv__alphas', 'lassocv__copy_X', 'lassocv__cv', 'lassocv__eps', 'lassocv__fit_intercept', 'lassocv__max_iter', 'lassocv__n_alphas', 'lassocv__n_jobs', 'lassocv__normalize', 'lassocv__positive', 'lassocv__precompute', 'lassocv__random_state', 'lassocv__selection', 'lassocv__tol', 'lassocv__verbose'])
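
The double-underscore names in the output above follow scikit-learn's pipeline convention `<step name>__<parameter>`, where `make_pipeline` names each step after its lower-cased class. A minimal sketch of reading and setting such a parameter (the names `pipe`, `lasso_keys` and `new_max_iter` are chosen here for illustration):

```python
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipe = make_pipeline(StandardScaler(), LassoCV(cv=5, max_iter=10000))

# Parameters of the LassoCV step carry the prefix "lassocv__"
lasso_keys = [k for k in pipe.get_params() if k.startswith("lassocv__")]

# The same prefixed name is used to change a parameter of that step
pipe.set_params(lassocv__max_iter=5000)
new_max_iter = pipe.named_steps["lassocv"].max_iter

print(lasso_keys[:3], new_max_iter)
```

This is why the tuning grids for pipeline learners later in this notebook use keys such as `logisticregressioncv__Cs` rather than plain `Cs`.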


### 2. Tuning Space

Now, we have to specify at least

1. which parameters we would like to tune
2. the tuning space, i.e., a grid of values for these parameters.

To specify the exact tuning setup in our example, we have to create objects that address points 1. and 2. These are then passed as arguments to the method [tune()](https://docs.doubleml.org/dev/api/generated/doubleml.DoubleMLPLR.html#doubleml.DoubleMLPLR.tune). To pass the tuning space to the model object, we collect the tuning spaces for all nuisance parts in a dictionary. The keys of the dictionary must match the names of the nuisance learners (here `ml_g` and `ml_m`). Since tuning takes time, the call to `tune()` has been commented out. If you uncomment it, it will yield a `C` of $4$ as the best regularization parameter. A random state of $0$ is assigned for replication.

In [46]:
tuning_space_ml_g = {}                        #no tuning here (empty grid)
tuning_space_ml_m = {
    'logisticregressioncv__Cs' : np.logspace(0.001, 1, 20, dtype =int),
    'logisticregressioncv__random_state' : [0]
    }

tuning_spaces = ({"ml_g" : tuning_space_ml_g, 
                     "ml_m" : tuning_space_ml_m})

#to tune lasso, uncomment next two lines
#dml_irm_lasso.tune(tuning_spaces, search_mode = 'randomized_search',  tune_on_folds = True,
#                    n_folds_tune=3, n_iter_randomized_search = 50, set_as_params = True, n_jobs_cv = -1)

In [47]:
print(dml_irm_lasso.get_params('ml_g0'))
print(dml_irm_lasso.get_params('ml_g1'))
print(dml_irm_lasso.get_params('ml_m'))

{'A': [None]}
{'A': [None]}
{'A': [None]}
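
The candidate values in the `Cs` tuning grid defined above can be inspected directly. Because `dtype=int` truncates the log-spaced values to integers, the 20 grid points collapse to a smaller set of distinct integers between 1 and 10. The sketch below only evaluates the NumPy expression from the grid; the variable names are chosen for illustration.

```python
import numpy as np

# Same expression as in the tuning space for ml_m
candidates = np.logspace(0.001, 1, 20, dtype=int)

# Distinct values that the randomized search can actually draw
unique_candidates = np.unique(candidates)

print(candidates)
print(unique_candidates)
```

Since several grid points map to the same integer, small values such as 1 and 2 are drawn more often than larger ones in a randomized search over this grid.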


Next, the best value of `C` is assigned using the [.set_ml_nuisance_params](https://docs.doubleml.org/dev/api/generated/doubleml.DoubleMLPLR.html#doubleml.DoubleMLPLR.set_ml_nuisance_params) method and the model is fit. Note that only the parameter `C=4` is set for the nuisance function `ml_m`, while the parameters of the other nuisance functions (`ml_g0` and `ml_g1`) are left at their defaults. The resulting estimate is the same as in the main notebook.

In [48]:
# Set nuisance-part specific parameters
dml_irm_lasso.set_ml_nuisance_params('ml_g0', 'A', {})
dml_irm_lasso.set_ml_nuisance_params('ml_g1', 'A', {})
dml_irm_lasso.set_ml_nuisance_params('ml_m', 'A', {
    'logisticregressioncv__Cs': 4, 'logisticregressioncv__random_state': 0})

#fit model with above set parameters
dml_irm_lasso.fit(store_predictions=True) 
lasso_summary = dml_irm_lasso.summary

lasso_summary

| | coef | std err | t | P>\|t\| | 2.5 % | 97.5 % |
|---|---|---|---|---|---|---|
| A | 0.834276 | 0.070511 | 11.831913 | 2.669875e-32 | 0.696078 | 0.972475 |


### Random Forest

When the treated and untreated groups are unbalanced, it can be hard to find enough overlap to calculate the ATE. To reduce the disproportionate impact of extreme propensity score weights in the interactive model, we trim propensity scores that are close to the bounds. The trimming threshold for the propensity scores is set to $0.025$, meaning that propensity scores below $2.5\%$ or above $97.5\%$ are truncated to these bounds. This reduces the variance of the estimator at the cost of some additional bias.
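
The effect of the threshold on the weights can be sketched with NumPy. This is an illustration under the assumption that trimming clips the predicted propensity scores to the interval $[0.025, 0.975]$; the propensity scores below are simulated and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical propensity score predictions, some very close to 0 or 1
m_hat = rng.uniform(0.001, 0.999, size=1000)

threshold = 0.025
m_trimmed = np.clip(m_hat, threshold, 1 - threshold)

# Largest inverse-propensity weight before and after trimming
max_w_raw = max((1 / m_hat).max(), (1 / (1 - m_hat)).max())
max_w_trimmed = max((1 / m_trimmed).max(), (1 / (1 - m_trimmed)).max())

print(max_w_raw, max_w_trimmed)
```

With a threshold of $0.025$, no observation can receive an inverse-propensity weight above $1/0.025 = 40$, which is what limits the variance.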

A random forest regressor and a random forest classifier are used as learners.

In [49]:
# Random Forest
randomForest = RandomForestRegressor()
randomForest_class = RandomForestClassifier()

np.random.seed(123)
dml_irm_forest = dml.DoubleMLIRM(data_dml_base,
                                 ml_g = randomForest,
                                 ml_m = randomForest_class,
                                 trimming_threshold = 0.025,
                                 n_folds = 3)

A tuning grid of parameters is created for the two learners. Uncommenting the [`.tune`](https://docs.doubleml.org/dev/api/generated/doubleml.DoubleMLIRM.html?highlight=tune#doubleml.DoubleMLIRM.tune) method will begin the tuning, which can take more than an hour.

In [50]:
#RF Tuning 

tuning_space_ml_g = {
    'max_depth': [8, 10, 12, 15],
    'max_features': ['auto', 'sqrt'],
    'min_samples_leaf': [1, 2, 4, 5],
    'min_samples_split': [4, 5, 6, 10],
    'n_estimators': [100, 200, 300]}

tuning_space_ml_m = {
    'max_depth': [3, 5, 6, 7, 10],
    'max_features': ['auto', 'sqrt'],
    'min_samples_leaf': [5, 6, 7, 8],
    'min_samples_split': [2, 3, 5],
    'n_estimators': [100, 200]}

tuning_spaces = ({"ml_g" : tuning_space_ml_g, 
                     "ml_m" : tuning_space_ml_m})

import warnings
warnings.filterwarnings("ignore")

type(tuning_spaces)
#dml_irm_forest.tune(tuning_spaces, search_mode = 'randomized_search', tune_on_folds = True,
#                    n_folds_tune=3, n_iter_randomized_search = 50, set_as_params = True, return_tune_res = True, n_jobs_cv = -1)

dict
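
To see why a randomized search is used here, it helps to count the combinations in the two grids above. The sketch below uses scikit-learn's `ParameterGrid` only for counting; the names `space_g`, `space_m`, `n_g` and `n_m` are chosen for this illustration.

```python
from sklearn.model_selection import ParameterGrid

# Same grids as in the random forest tuning cell
space_g = {
    "max_depth": [8, 10, 12, 15],
    "max_features": ["auto", "sqrt"],
    "min_samples_leaf": [1, 2, 4, 5],
    "min_samples_split": [4, 5, 6, 10],
    "n_estimators": [100, 200, 300],
}
space_m = {
    "max_depth": [3, 5, 6, 7, 10],
    "max_features": ["auto", "sqrt"],
    "min_samples_leaf": [5, 6, 7, 8],
    "min_samples_split": [2, 3, 5],
    "n_estimators": [100, 200],
}

# Number of combinations an exhaustive grid search would have to evaluate
n_g = len(ParameterGrid(space_g))
n_m = len(ParameterGrid(space_m))
print(n_g, n_m)  # 384 240
```

With `n_iter_randomized_search = 50`, the randomized search evaluates roughly 13% of the `ml_g` grid and 21% of the `ml_m` grid per tuning run.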

Using the [.get_params](https://docs.doubleml.org/dev/api/generated/doubleml.DoubleMLIRM.html?highlight=get_params#doubleml.DoubleMLIRM.get_params) method, the tuned parameters are printed and then hardcoded via the [.set_ml_nuisance_params](https://docs.doubleml.org/dev/api/generated/doubleml.DoubleMLPLR.html#doubleml.DoubleMLPLR.set_ml_nuisance_params) method. This way, one does not have to wait for the tuning every time.

In [51]:
print(dml_irm_forest.get_params('ml_g0'))
print(dml_irm_forest.get_params('ml_g1'))
print(dml_irm_forest.get_params('ml_m'))

{'A': [None]}
{'A': [None]}
{'A': [None]}
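
The "tune once, then hardcode" workflow can be sketched with scikit-learn alone. This is an illustration on synthetic data and not the DoubleML tuning routine itself; the grid, the learner settings, and the variable names are assumptions.

```python
import json

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data, for illustration only
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

space = {"max_depth": [3, 5, 7], "n_estimators": [10, 20]}
search = RandomizedSearchCV(RandomForestClassifier(random_state=0), space,
                            n_iter=4, cv=3, random_state=0)
search.fit(X, y)

# Store the tuned values as text, e.g. to paste them into a notebook cell
stored = json.dumps(search.best_params_)

# Later: rebuild the learner from the stored values without re-running the search
params = json.loads(stored)
learner = RandomForestClassifier(random_state=0, **params)
print(params)
```

Hardcoding the tuned values keeps the notebook fast to re-run, at the price that the values must be updated by hand whenever the data or the grid changes.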


The results are the same as those in the main notebook.

In [52]:
#Set nuisance-part specific parameters
dml_irm_forest.set_ml_nuisance_params('ml_g0', 'A', {
    'max_features': 200, 'n_estimators': 250})
dml_irm_forest.set_ml_nuisance_params('ml_g1', 'A', {
    'max_features': 200, 'n_estimators': 250})
dml_irm_forest.set_ml_nuisance_params('ml_m', 'A', {
    'max_features': 200, 'n_estimators': 250})

dml_irm_forest.fit(store_predictions=True) 
forest_summary = dml_irm_forest.summary

forest_summary

| | coef | std err | t | P>\|t\| | 2.5 % | 97.5 % |
|---|---|---|---|---|---|---|
| A | 0.86203 | 0.08229 | 10.475455 | 1.119983e-25 | 0.700743 | 1.023316 |


### Boosted Trees

First, we specify the learning task and the corresponding learning objective. The first nuisance function is estimated using the `XGBRegressor` with the objective `reg:squarederror`, i.e., regression with squared loss. The default evaluation metric is the RMSE. The second nuisance function is estimated by classification, hence the objective is `binary:logistic` with the evaluation metric `logloss`. `eta`, which ranges in $[0,1]$, is the step size shrinkage applied in each update to prevent overfitting. Because our output value is already numeric, we set the label encoder to false. For further information, see the documentation of [XGBoost](https://xgboost.readthedocs.io/en/latest/parameter.html).
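
The role of the shrinkage parameter can be illustrated with a hand-written boosting loop for squared loss. This is a simplified sketch built on scikit-learn decision trees and simulated data, not XGBoost's actual algorithm (which adds regularization and second-order information); the function `boost_mse` and all variable names are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def boost_mse(X, y, eta, n_rounds=20):
    """Plain gradient boosting for squared loss; returns the training MSE per round."""
    pred = np.full(len(y), y.mean())
    history = []
    for _ in range(n_rounds):
        residual = y - pred  # negative gradient of the squared loss
        tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residual)
        pred = pred + eta * tree.predict(X)  # shrink each tree's contribution by eta
        history.append(np.mean((y - pred) ** 2))
    return history


rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.3, size=300)

slow = boost_mse(X, y, eta=0.1)
fast = boost_mse(X, y, eta=0.5)
print(round(slow[-1], 3), round(fast[-1], 3))
```

A small `eta` fits the training data more slowly, so it usually has to be paired with a larger number of boosting rounds (`n_estimators`), which is why the two parameters are tuned together.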

In [53]:
boost = XGBRegressor(n_jobs=1, objective = "reg:squarederror")
boost_class = XGBClassifier(use_label_encoder=False, n_jobs=1,
                            objective = "binary:logistic", eval_metric = "logloss")

np.random.seed(123)
dml_irm_boost = dml.DoubleMLIRM(data_dml_base,
                                ml_g = boost,
                                ml_m = boost_class,
                                trimming_threshold = 0.025,
                                n_folds = 3)

In [54]:
print(dml_irm_boost.params_names)


['ml_g0', 'ml_g1', 'ml_m']


### Grid Search

As for the lasso and random forest, we tune the gradient boosting learners. Due to the smaller parameter grid, tuning only takes about 2 minutes. The hardcoded parameters can be verified by uncommenting the `.tune` method and letting it run.

In [55]:
#parameter space wide

# eta must lie in [0, 1]; the last grid value is 0.5 (previously mistyped as "0,5")
tuning_space_ml_g = {'n_estimators': [5, 8, 15, 25, 30, 40, 50], 'eta': [0.01, 0.1, 0.2, 0.5]}

tuning_space_ml_m = {'n_estimators': [5, 8, 15, 25, 30, 40, 50], 'eta': [0.01, 0.1, 0.2, 0.5]}

tuning_spaces = ({"ml_g" : tuning_space_ml_g,              
                     "ml_m" : tuning_space_ml_m})

import warnings
warnings.filterwarnings("ignore")

type(tuning_spaces)
#dml_irm_boost.tune(tuning_spaces, search_mode = 'randomized_search', tune_on_folds = True,
#                    n_folds_tune=5, set_as_params = True, return_tune_res = False)

dict

In [56]:
print(dml_irm_boost.get_params('ml_g0'))
print(dml_irm_boost.get_params('ml_g1'))
print(dml_irm_boost.get_params('ml_m'))

{'A': [None]}
{'A': [None]}
{'A': [None]}


In [57]:
#Set nuisance-part specific parameters
dml_irm_boost.set_ml_nuisance_params('ml_g0', 'A', {
    'n_estimators': 30, 'eta': 0.2})
dml_irm_boost.set_ml_nuisance_params('ml_g1', 'A', {
    'n_estimators': 30, 'eta': 0.2})
dml_irm_boost.set_ml_nuisance_params('ml_m', 'A', {
    'n_estimators': 15, 'eta': 0.2})

<doubleml.double_ml_irm.DoubleMLIRM at 0x207f162bee0>

In [58]:
dml_irm_boost.fit(store_predictions=True)
boost_summary = dml_irm_boost.summary

boost_summary

| | coef | std err | t | P>\|t\| | 2.5 % | 97.5 % |
|---|---|---|---|---|---|---|
| A | 0.739225 | 0.092511 | 7.990648 | 1.34231e-15 | 0.557906 | 0.920544 |


### Summary Results
We define a small function that calculates the absolute deviation of an estimate from the true parameter value $0.8$, which we report as the bias.

In [59]:
def bias(x):
    x=x - 0.8
    x=round(abs(x),3)
    return x

In [60]:
irm_summary = pd.concat((lasso_summary, forest_summary, boost_summary))
irm_summary.index = ['lasso', 'forest', 'xgboost']

irm_summary.insert(1, "Bias", [bias(irm_summary.loc['lasso','coef']), 
                               bias(irm_summary.loc['forest','coef']), 
                               bias(irm_summary.loc['xgboost','coef'])])

irm_summary[['coef', 'Bias', '2.5 %', '97.5 %']].round(3)

| | coef | Bias | 2.5 % | 97.5 % |
|---|---|---|---|---|
| lasso | 0.834 | 0.034 | 0.696 | 0.972 |
| forest | 0.862 | 0.062 | 0.701 | 1.023 |
| xgboost | 0.739 | 0.061 | 0.558 | 0.921 |
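
The bias column can also be computed in one vectorized step, and the same table can be used to check whether each 95% confidence interval contains the true value. The sketch below re-enters the IRM estimates reported above by hand; the column name `covers_true` and the constant `TRUE_EFFECT` are introduced only for this illustration.

```python
import pandas as pd

TRUE_EFFECT = 0.8  # true parameter value stated in the text

# IRM estimates and confidence bounds copied from the summaries above
summary = pd.DataFrame(
    {"coef": [0.834276, 0.86203, 0.739225],
     "2.5 %": [0.696078, 0.700743, 0.557906],
     "97.5 %": [0.972475, 1.023316, 0.920544]},
    index=["lasso", "forest", "xgboost"],
)

# Absolute deviation from the true value, computed for all rows at once
summary["Bias"] = (summary["coef"] - TRUE_EFFECT).abs().round(3)

# Does the 95% confidence interval contain the true value?
summary["covers_true"] = (summary["2.5 %"] <= TRUE_EFFECT) & (TRUE_EFFECT <= summary["97.5 %"])

print(summary[["coef", "Bias", "covers_true"]])
```

All three intervals contain $0.8$, so the differences in bias between the learners are within the estimation uncertainty.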


## 2. Tuning: Partially Linear Regression Model (PLR)

### Lasso

The lasso setup from the IRM section above works quite well, so it is left unchanged.

In [61]:
#PLR
from sklearn.linear_model import Lasso

# a pipeline is used because the features are scaled before the learner is fit
lasso = make_pipeline(StandardScaler(), LassoCV(cv=5, max_iter=10000))          # lasso is only for regression (not for classification)

lasso_class = make_pipeline(StandardScaler(),
                            LogisticRegressionCV(cv=5, penalty='l1', solver='liblinear',      # logistic regression with L1 penalty is a lasso classifier
                                                 max_iter=1000))                              # liblinear solver for small datasets

np.random.seed(123)
# Initialize DoubleMLPLR model
dml_plr_lasso = dml.DoubleMLPLR(data_dml_base,
                                ml_g = lasso,
                                ml_m = lasso_class,
                                n_folds = 3)

dml_plr_lasso

<doubleml.double_ml_plr.DoubleMLPLR at 0x207f15ea670>

In [62]:
# Set nuisance-part specific parameters
dml_plr_lasso.set_ml_nuisance_params('ml_m', 'A', {
    'logisticregressioncv__Cs': 4, 'logisticregressioncv__random_state': 0})

#fit model with above set parameters
dml_plr_lasso.fit(store_predictions=True) 
lasso_summary = dml_plr_lasso.summary

lasso_summary

| | coef | std err | t | P>\|t\| | 2.5 % | 97.5 % |
|---|---|---|---|---|---|---|
| A | 0.784574 | 0.070414 | 11.14235 | 7.803312e-29 | 0.646566 | 0.922582 |


In [63]:
dml_plr_lasso.fit(store_predictions=True)
lasso_summary = dml_plr_lasso.summary

lasso_summary

| | coef | std err | t | P>\|t\| | 2.5 % | 97.5 % |
|---|---|---|---|---|---|---|
| A | 0.784574 | 0.070414 | 11.14235 | 7.803312e-29 | 0.646566 | 0.922582 |


The coefficient estimated with lasso is the same as in the main notebook [TODO:insert link of main notebook].

### Random Forest

In this case, we use the random forest defaults, which perform quite well. Therefore, no tuning grid is provided.

In [64]:
randomForest = RandomForestRegressor()
randomForest_class = RandomForestClassifier()

np.random.seed(123)
dml_plr_forest = dml.DoubleMLPLR(data_dml_base,
                                 ml_g = randomForest,
                                 ml_m = randomForest_class,
                                 n_folds = 3,
                                 n_rep = 3)

In [65]:
dml_plr_forest.fit(store_predictions=True)
forest_summary = dml_plr_forest.summary

forest_summary

| | coef | std err | t | P>\|t\| | 2.5 % | 97.5 % |
|---|---|---|---|---|---|---|
| A | 0.823899 | 0.073474 | 11.213529 | 3.499555e-29 | 0.679893 | 0.967904 |


### Boosting

Hyperparameters are pre-set for the boosting learners. The squared error objective is appropriate for a regression learner and the binary:logistic with a log loss evaluation metric for a classifier.

In [66]:
boost = XGBRegressor(n_jobs=1, objective = "reg:squarederror",
                     eta=0.1)

boost_class = XGBClassifier(use_label_encoder=False, n_jobs=1,
                            objective = "binary:logistic", eval_metric = "logloss",
                            eta=0.1)

# Boosted Trees

np.random.seed(123)
dml_plr_boost = dml.DoubleMLPLR(data_dml_base,
                                ml_g = boost,
                                ml_m = boost_class,
                                n_folds = 3)

Boosting does not perform well without tuning, so a grid of parameters is provided below. A randomized search showed that values of `n_estimators` between $55$ and $65$ give good results for `ml_g`. Therefore, the search below uses this range for `n_estimators`. Similarly, for `ml_m`, values of `n_estimators` between $1$ and $10$ were found to perform well. This narrows our search to a smaller range of hyperparameters. To tune, uncomment `dml_plr_boost.tune`.
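
The integer grids built with `np.linspace` in the next cell can be inspected directly. The sketch below only evaluates the two NumPy expressions; the names `grid_g` and `grid_m` are chosen for illustration.

```python
import numpy as np

# Candidate values of n_estimators for ml_g and ml_m
grid_g = np.linspace(55, 65, 5, dtype=int)
grid_m = np.linspace(1, 10, 5, dtype=int)

print(grid_g)  # [55 57 60 62 65]
print(grid_m)  # [ 1  3  5  7 10]
```

Because the evenly spaced values are cut down to integers, the grids are not perfectly even; the tuned values $57$ and $10$ reported further below are both points of these grids.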

In [67]:
#tuning further parameters

tuning_space_ml_g = {'n_estimators': np.linspace(55, 65, 5, dtype = int), 'max_depth': [2,3,4,6]}

tuning_space_ml_m = {'n_estimators': np.linspace(1, 10, 5, dtype = int), 'max_depth': [2,3,4,6]}

tuning_spaces = ({"ml_g" : tuning_space_ml_g, 
                     "ml_m" : tuning_space_ml_m})

import warnings
warnings.filterwarnings("ignore")

type(tuning_spaces)
#dml_plr_boost.tune(tuning_spaces, search_mode = 'randomized_search')

print(dml_plr_boost.params_names)
print(dml_plr_boost.get_params('ml_g'))
print(dml_plr_boost.get_params('ml_m'))

['ml_g', 'ml_m']
{'A': [None]}
{'A': [None]}


The tuning shows that the best hyperparameters for `ml_g` are `n_estimators: 57, max_depth: 2` and the best hyperparameters for `ml_m` are `n_estimators: 10, max_depth: 3`. These parameters are hardcoded using the [`.set_ml_nuisance_params`](https://docs.doubleml.org/stable/api/generated/doubleml.DoubleMLPLR.html#doubleml.DoubleMLPLR.set_ml_nuisance_params) method.

In [68]:
dml_plr_boost.set_ml_nuisance_params('ml_g', 'A', {                                  
    'n_estimators': 57, 'max_depth': 2})
dml_plr_boost.set_ml_nuisance_params('ml_m', 'A', {                          
    'n_estimators': 10, 'max_depth': 3})                                

<doubleml.double_ml_plr.DoubleMLPLR at 0x207f162b1f0>

In [69]:
dml_plr_boost.fit(store_predictions=True)
boost_summary = dml_plr_boost.summary

boost_summary

| | coef | std err | t | P>\|t\| | 2.5 % | 97.5 % |
|---|---|---|---|---|---|---|
| A | 0.814555 | 0.071203 | 11.439929 | 2.641013e-30 | 0.675 | 0.95411 |


In [70]:
plr_summary = pd.concat((lasso_summary, forest_summary, boost_summary))
plr_summary.index = ['lasso', 'forest', 'xgboost']

plr_summary.insert(1, "Bias",  [bias(plr_summary.loc['lasso','coef']), 
                               bias(plr_summary.loc['forest','coef']), 
                               bias(plr_summary.loc['xgboost','coef'])])

plr_summary[['coef', 'Bias', '2.5 %', '97.5 %']].round(3)

| | coef | Bias | 2.5 % | 97.5 % |
|---|---|---|---|---|
| lasso | 0.785 | 0.015 | 0.647 | 0.923 |
| forest | 0.824 | 0.024 | 0.68 | 0.968 |
| xgboost | 0.815 | 0.015 | 0.675 | 0.954 |


## Summary Results All Models
Results for the linear regression baseline are omitted because it requires no tuning.

In [71]:
df_summary = pd.concat((irm_summary, plr_summary)).reset_index().rename(columns={'index': 'ML'})
df_summary['Model'] = np.concatenate((np.repeat('IRM Tune', 3), np.repeat('PLR Tune', 3)))
df_summary

df_summary=df_summary[['ML', 'Model','coef', 'Bias', '2.5 %', '97.5 %']]
df_summary
df_summary.set_index(['Model', 'ML']).round(3)

| Model | ML | coef | Bias | 2.5 % | 97.5 % |
|---|---|---|---|---|---|
| IRM Tune | lasso | 0.834 | 0.034 | 0.696 | 0.972 |
| IRM Tune | forest | 0.862 | 0.062 | 0.701 | 1.023 |
| IRM Tune | xgboost | 0.739 | 0.061 | 0.558 | 0.921 |
| PLR Tune | lasso | 0.785 | 0.015 | 0.647 | 0.923 |
| PLR Tune | forest | 0.824 | 0.024 | 0.68 | 0.968 |
| PLR Tune | xgboost | 0.815 | 0.015 | 0.675 | 0.954 |
