# Python: Hyperparameter Tuning with Optuna

This notebook explains how to use the implemented `tune_ml_models()` method to tune hyperparameters with the [Optuna](https://optuna.org/) package.

In this example, we will focus on the [DoubleMLAPO](https://docs.doubleml.org/stable/api/generated/doubleml.irm.DoubleMLAPO.html#doubleml.irm.DoubleMLAPO) model to estimate average potential outcomes (APOs) in an interactive regression model (see [DoubleMLIRM](https://docs.doubleml.org/stable/guide/models.html#binary-interactive-regression-model-irm)).

The goal is to estimate the average potential outcome

 $$\theta_0 =\mathbb{E}[Y(d)]$$

for a given treatment level $d$ and a discrete-valued treatment $D$.
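The linked user guide gives the identification result and score in full. As a hedged summary of the standard result, under unconfoundedness and overlap the APO can be expressed through the outcome regression $g_0(d, X) = \mathbb{E}[Y \mid D = d, X]$ and the propensity $m_0(X) = \mathbb{P}(D = d \mid X)$, which are combined in a doubly robust (AIPW-type) score:

$$\theta_0 = \mathbb{E}[Y(d)] = \mathbb{E}\big[g_0(d, X)\big]$$

$$\psi(W; \theta, \eta) = g(d, X) + \frac{\mathbb{1}\{D = d\}\,\big(Y - g(d, X)\big)}{m(X)} - \theta, \qquad \eta = (g, m)$$

The nuisance functions $g$ and $m$ are estimated by the learners `ml_g` and `ml_m`, whose hyperparameters are tuned in this notebook.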

For a more detailed description of the DoubleMLAPO model, see [Average Potential Outcome Model](https://docs.doubleml.org/stable/guide/models.html#average-potential-outcomes-apos) or [Example Gallery](https://docs.doubleml.org/stable/examples/index.html).

In [1]:
import optuna
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from plotly.io import show

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.linear_model import LinearRegression, Ridge, LogisticRegression
from sklearn.ensemble import StackingRegressor, StackingClassifier
from lightgbm import LGBMRegressor, LGBMClassifier

from doubleml.data import DoubleMLData
from doubleml.irm import DoubleMLAPO
from doubleml.irm.datasets import make_irm_data_discrete_treatments

palette = sns.color_palette("colorblind")

import warnings
warnings.filterwarnings("ignore")

## Data Generating Process (DGP)

First, let us generate data according to the [make_irm_data_discrete_treatments](https://docs.doubleml.org/dev/api/datasets.html#dataset-generators) data generating process. The process generates data with a continuous treatment variable and contains the true individual treatment effects (ITEs) with respect to the option of not being treated.

The continuous treatment variable is discretized into multiple levels based on quantiles. The *oracle* ITEs enable a comparison with the true APOs and average treatment effects (ATEs) for the different levels of the treatment variable.
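To illustrate the quantile step, the NumPy sketch below assigns a continuous treatment to levels using interior quantiles as cut points. `discretize_by_quantiles` is a hypothetical helper written for this illustration; the generator's actual rule may differ (for instance in how the untreated level 0 is formed), so the sketch reserves level 0 and only assigns levels 1 to `n_levels`.

```python
import numpy as np


def discretize_by_quantiles(cont_d, n_levels):
    """Hypothetical helper: map a continuous treatment to levels 1..n_levels via quantile cut points."""
    # Interior quantile cut points, e.g. the 1/3 and 2/3 quantiles for n_levels=3
    cut_points = np.quantile(cont_d, np.linspace(0, 1, n_levels + 1)[1:-1])
    # np.digitize returns 0..n_levels-1; shift by one so level 0 stays free for "untreated"
    return np.digitize(cont_d, cut_points) + 1


rng = np.random.default_rng(0)
cont_d = rng.normal(size=300)
levels = discretize_by_quantiles(cont_d, n_levels=3)
print(np.unique(levels, return_counts=True))
```

Because the cut points are quantiles, each level receives roughly the same share of observations.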

**Remark:** The average potential outcome model does not require an underlying continuous treatment variable. The model will work identically if the treatment variable is discrete by design.

In [2]:
# Parameters
n_obs = 500
n_levels = 3
treatment_lvl = 1.0

np.random.seed(42)
data_apo = make_irm_data_discrete_treatments(n_obs=n_obs,n_levels=n_levels, linear=False)

y0 = data_apo['oracle_values']['y0']
cont_d = data_apo['oracle_values']['cont_d']
ite = data_apo['oracle_values']['ite']
d = data_apo['d']
potential_level = data_apo['oracle_values']['potential_level']
level_bounds = data_apo['oracle_values']['level_bounds']

average_ites = np.full(n_levels + 1, np.nan)
apos = np.full(n_levels + 1, np.nan)
mid_points = np.full(n_levels, np.nan)

for i in range(n_levels + 1):
    average_ites[i] = np.mean(ite[d == i]) * (i > 0)
    apos[i] = np.mean(y0) + average_ites[i]

print(f"Average Individual effects in each group:\n{np.round(average_ites,2)}\n")
print(f"Average Potential Outcomes in each group:\n{np.round(apos,2)}\n")
print(f"Levels and their counts:\n{np.unique(d, return_counts=True)}")

Average Individual effects in each group:
[ 0.    3.44  9.32 10.49]

Average Potential Outcomes in each group:
[210.05 213.49 219.38 220.54]

Levels and their counts:
(array([0., 1., 2., 3.]), array([171, 110, 109, 110]))


As for all [DoubleML](https://docs.doubleml.org/stable/index.html) models, we specify a [DoubleMLData](https://docs.doubleml.org/stable/api/generated/doubleml.data.DoubleMLData.html) object to handle the data.

In [3]:
y = data_apo['y']
x = data_apo['x']
d = data_apo['d']
df_apo = pd.DataFrame(
    np.column_stack((y, d, x)),
    columns=['y', 'd'] + ['x' + str(i) for i in range(data_apo['x'].shape[1])]
)

dml_data = DoubleMLData(df_apo, 'y', 'd')
print(dml_data)


------------------ Data summary      ------------------
Outcome variable: y
Treatment variable(s): ['d']
Covariates: ['x0', 'x1', 'x2', 'x3', 'x4']
Instrument variable(s): None
No. Observations: 500

------------------ DataFrame info    ------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Columns: 7 entries, y to x4
dtypes: float64(7)
memory usage: 27.5 KB



## Basic Tuning Example

First, we take a look at a very basic tuning example without much customization.

### Define Nuisance Learners

For our example, we choose [LightGBM](https://lightgbm.readthedocs.io/en/stable/) learners, which are a typical non-parametric choice.

In [4]:
ml_g = LGBMRegressor(random_state=314, verbose=-1)
ml_m = LGBMClassifier(random_state=314, verbose=-1)

### Untuned Model

Now let us take a look at the standard workflow, focusing on a single treatment level and using default hyperparameters.

In [5]:
dml_obj_untuned = DoubleMLAPO(
    dml_data,
    ml_g,
    ml_m,
    treatment_level=treatment_lvl,
)

dml_obj_untuned.fit()
dml_obj_untuned.summary

|   | coef | std err | t | P>\|t\| | 2.5 % | 97.5 % |
|---|---|---|---|---|---|---|
| d | 211.232659 | 15.657431 | 13.490888 | 1.769555e-41 | 180.544657 | 241.920661 |
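The columns of this summary are linked through a normal approximation. Assuming the summary uses standard normal-based inference, the t-statistic, two-sided p-value and 95% confidence interval can be reproduced from `coef` and `std err` alone (values copied from the table):

```python
import math
from statistics import NormalDist

# Values copied from the summary table of the untuned LightGBM model
coef, std_err = 211.232659, 15.657431

z = NormalDist().inv_cdf(0.975)                   # two-sided 95% critical value, about 1.96
t_stat = coef / std_err                           # "t" column
p_value = math.erfc(abs(t_stat) / math.sqrt(2))   # 2 * (1 - Phi(|t|)), the "P>|t|" column
ci_lower = coef - z * std_err                     # "2.5 %" column
ci_upper = coef + z * std_err                     # "97.5 %" column

print(f"t = {t_stat:.6f}, p = {p_value:.6e}, 95% CI = [{ci_lower:.6f}, {ci_upper:.6f}]")
```

Since the interval half-width is $z_{0.975}$ times the standard error, any reduction of the standard error through tuning narrows the confidence interval proportionally.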


### Hyperparameter Tuning

Now, let us take a look at basic hyperparameter tuning. We initialize a separate model to compare the results.

In [6]:
dml_obj_tuned = DoubleMLAPO(
    dml_data,
    ml_g,
    ml_m,
    treatment_level=treatment_lvl,
)

The required input for tuning is a dictionary of parameter spaces, one for each learner that should be tuned.
This dictionary should include a callable for each learner you want to tune (a subset is also possible, e.g. tuning only `ml_g`).

Each parameter space should be a callable that takes an Optuna `trial` object and uses it to suggest the search space.
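Such a callable simply maps a trial to a dictionary of learner keyword arguments; fixed values such as `n_estimators` are returned as constants and are never searched. To check a parameter space without running a study, one can call it with a stand-in trial. `MidpointTrial` below is a hypothetical stub written for this illustration (Optuna's own `optuna.trial.FixedTrial` serves a similar purpose):

```python
class MidpointTrial:
    """Hypothetical stand-in for an Optuna trial: each suggestion returns the midpoint of its range."""

    def __init__(self):
        self.params = {}  # records suggested values, keyed by the Optuna parameter name

    def suggest_float(self, name, low, high, log=False):
        # Geometric midpoint on a log scale, arithmetic midpoint otherwise
        value = (low * high) ** 0.5 if log else (low + high) / 2
        self.params[name] = value
        return value

    def suggest_int(self, name, low, high):
        value = (low + high) // 2
        self.params[name] = value
        return value


def ml_g_params(trial):
    # Same search space as the outcome regression space used in this notebook
    return {
        'n_estimators': 100,
        'learning_rate': trial.suggest_float('learning_rate', 0.005, 0.1),
        'max_depth': trial.suggest_int('max_depth', 2, 5),
        'min_child_samples': trial.suggest_int('min_child_samples', 5, 100),
    }


trial = MidpointTrial()
print(ml_g_params(trial))  # full keyword arguments for the learner
print(trial.params)        # only the searched hyperparameters, n_estimators is absent
```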

The search space follows Optuna's define-by-run style (see [Optuna](https://optuna.org/#key_features)), but instead of writing the objective function yourself, you only specify the hyperparameters as a callable. The DoubleML object then builds an objective for each learner from the supplied parameter space.

To keep this example fast and simple, we keep `n_estimators` fixed and only tune a small number of other hyperparameters.

In [7]:
# parameter space for the outcome regression tuning
def ml_g_params(trial):
    return {
        'n_estimators': 100,
        'learning_rate': trial.suggest_float('learning_rate', 0.005, 0.1),
        'max_depth': trial.suggest_int('max_depth', 2, 5),
        'min_child_samples': trial.suggest_int('min_child_samples', 5, 100),
    }

# parameter space for the propensity score tuning
def ml_m_params(trial):
    return {
        'n_estimators': 100,
        'learning_rate': trial.suggest_float('learning_rate', 0.005, 0.1),
        'max_depth': trial.suggest_int('max_depth', 2, 5),
        'min_child_samples': trial.suggest_int('min_child_samples', 5, 100),
    }

param_space = {
    'ml_g': ml_g_params,
    'ml_m': ml_m_params
}

To tune the hyperparameters, call `tune_ml_models()` with the `ml_param_space` argument.
To set the number of trials and other Optuna options, use the `optuna_settings` argument.

In [8]:
optuna_settings = {
    'n_trials': 100,
    'show_progress_bar': True,
}

dml_obj_tuned.tune_ml_models(
    ml_param_space=param_space,
    optuna_settings=optuna_settings,
)


By default, the model sets the best hyperparameters automatically (identical hyperparameters for each fold), so you can call the `fit()` method directly afterwards.

In [None]:
dml_obj_tuned.fit()
dml_obj_tuned.summary

**Remark**: Although initialization and tuning only require the learners `ml_g` and `ml_m`, the models in the `irm` submodule generally copy the learner for `ml_g` and fit separate response surfaces for the treatment and control (not-treated) groups. These learners are tuned separately but with the same parameter space. To see which parameter spaces are tuned, take a look at the `params_names` property.

In this example, we specified parameter spaces for `ml_m` and `ml_g`, but three sets of hyperparameters were actually tuned, namely `ml_m`, `ml_g_d_lvl0` and `ml_g_d_lvl1` (two response surfaces for the outcome).

In [None]:
dml_obj_tuned.params_names

The selected hyperparameter combination is set for each fold and can be inspected via the `params` property.

In [None]:
dml_obj_tuned.params

### Comparison
 
Let us compare the results of both models. Looking at the predictive performance of the learners, the main difference is in the log loss of the propensity score model `ml_m`.
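For reference, the binary log loss measures how well the predicted propensities fit the treatment indicator $\mathbb{1}\{D = d\}$ and penalizes confident wrong probabilities heavily. A minimal NumPy version follows; it is a sketch of the metric itself, and `evaluate_learners()` may compute it differently (e.g. with other clipping or aggregation over folds):

```python
import numpy as np


def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip to avoid log(0) for probabilities that are exactly 0 or 1
    p_pred = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    return -np.mean(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))


# Treatment indicator 1{D = d} and two sets of propensity predictions
d_ind = np.array([1, 0, 0, 1])
print(binary_log_loss(d_ind, [0.8, 0.3, 0.2, 0.6]))   # reasonable predictions
print(binary_log_loss(d_ind, [0.99, 0.01, 0.9, 0.6]))  # one confident mistake (third unit)
```

Because the propensity enters the score in the denominator, poorly calibrated propensities can inflate the variance of the estimate, which is why improvements in this metric matter.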

In [None]:
dml_obj_untuned.evaluate_learners()

In [None]:
dml_obj_tuned.evaluate_learners()

As a result, the standard error is reduced and the confidence interval is much tighter.

In [None]:
theta_untuned = dml_obj_untuned.coef[0]
theta_tuned = dml_obj_tuned.coef[0]
ci_untuned = dml_obj_untuned.confint()
ci_tuned = dml_obj_tuned.confint()

# Create comparison dataframe
comparison_data = {
    'Model': ['Untuned', 'Tuned'],
    'theta': [theta_untuned, theta_tuned],
    'ci_lower': [ci_untuned.iloc[0, 0], ci_tuned.iloc[0, 0]],
    'ci_upper': [ci_untuned.iloc[0, 1], ci_tuned.iloc[0, 1]]
}
df_comparison = pd.DataFrame(comparison_data)

print(f"\nTrue APO at treatment level {treatment_lvl}: {apos[int(treatment_lvl)]:.4f}\n")
print(df_comparison.to_string(index=False))

plt.figure(figsize=(10, 6))
plt.errorbar(0, df_comparison.loc[0, 'theta'], 
             yerr=[[df_comparison.loc[0, 'theta'] - df_comparison.loc[0, 'ci_lower']], 
                   [df_comparison.loc[0, 'ci_upper'] - df_comparison.loc[0, 'theta']]], 
             fmt='o', capsize=5, capthick=2, ecolor=palette[0], color=palette[0], 
             label='Untuned', markersize=10, zorder=2)
plt.errorbar(1, df_comparison.loc[1, 'theta'], 
             yerr=[[df_comparison.loc[1, 'theta'] - df_comparison.loc[1, 'ci_lower']], 
                   [df_comparison.loc[1, 'ci_upper'] - df_comparison.loc[1, 'theta']]], 
             fmt='o', capsize=5, capthick=2, ecolor=palette[1], color=palette[1], 
             label='Tuned', markersize=10, zorder=2)
plt.axhline(y=apos[int(treatment_lvl)], color=palette[4], linestyle='--', 
            linewidth=2, label='True APO', zorder=1)

plt.title(f'Estimated APO Coefficients with and without Tuning for Treatment Level {treatment_lvl}')
plt.ylabel('Coefficient Value')
plt.xticks([0, 1], ['Untuned', 'Tuned'])
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## Detailed Hyperparameter Tuning Guide

In this section, we explore tuning options in more detail and employ a more complicated learning pipeline.

### Define Nuisance Learners

For this example, we use scikit-learn [Pipelines](https://scikit-learn.org/stable/modules/compose.html#pipeline) that scale the covariates with a `RobustScaler` and then pass them to a [StackingRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.StackingRegressor.html) (outcome regression) or a [StackingClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.StackingClassifier.html) (propensity score), each combining a linear model and a LightGBM learner.

In [None]:
base_regressors = [
    ('linear_regression', LinearRegression()),
    ('lgbm', LGBMRegressor(random_state=42, verbose=-1))
]

stacking_regressor = StackingRegressor(
    estimators=base_regressors,
    final_estimator=Ridge()
)

ml_g_pipeline = Pipeline([
    ('scaler', RobustScaler()),
    ('stacking', stacking_regressor)
])

In [None]:
base_classifiers = [
    ('logistic_regression', LogisticRegression(max_iter=1000, random_state=42)),
    ('lgbm', LGBMClassifier(random_state=42, verbose=-1))
]

stacking_classifier = StackingClassifier(
    estimators=base_classifiers,
    final_estimator=LogisticRegression(),
)

ml_m_pipeline = Pipeline([
    ('scaler', RobustScaler()),
    ('stacking', stacking_classifier)
])

### Untuned Model with Pipeline

As before, we first fit the model for a single treatment level using default hyperparameters.

In [None]:
dml_obj_untuned_pipeline = DoubleMLAPO(
    dml_data,
    ml_g_pipeline,
    ml_m_pipeline,
    treatment_level=treatment_lvl,
)

dml_obj_untuned_pipeline.fit()
dml_obj_untuned_pipeline.summary

### Hyperparameter Tuning with Pipelines

Now, let us tune this more complex learner. Again, we initialize a separate model to compare the results.

In [None]:
dml_obj_tuned_pipeline = DoubleMLAPO(
    dml_data,
    ml_g_pipeline,
    ml_m_pipeline,
    treatment_level=treatment_lvl,
)

As before, the tuning input is a dictionary of parameter-space callables, one for each learner that should be tuned (a subset is also possible, e.g. tuning only `ml_g`).

Since the learners are now pipelines, the hyperparameter names must be routed to the correct step using scikit-learn's `<step>__<parameter>` naming convention, e.g. `stacking__lgbm__learning_rate` for the learning rate of the LightGBM base learner inside the stacking step.
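Before tuning, it is worth checking that every key returned by a parameter space is valid, because `get_params()` lists all nested names a pipeline accepts. The sketch below builds a structurally similar pipeline in which a `DecisionTreeRegressor` (named `tree`) stands in for the LightGBM base learner, so that it runs with scikit-learn alone:

```python
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.tree import DecisionTreeRegressor

pipe = Pipeline([
    ('scaler', RobustScaler()),
    ('stacking', StackingRegressor(
        estimators=[('linear_regression', LinearRegression()),
                    ('tree', DecisionTreeRegressor(random_state=0))],  # stand-in for 'lgbm'
        final_estimator=Ridge(),
    )),
])

# All tunable keys that are routed into the stacking step
keys = sorted(k for k in pipe.get_params() if k.startswith('stacking__'))
print('stacking__tree__max_depth' in keys, 'stacking__final_estimator__alpha' in keys)

# set_params accepts exactly these names, so a parameter space must return them
pipe.set_params(stacking__tree__max_depth=3, stacking__final_estimator__alpha=0.5)
print(pipe.named_steps['stacking'].final_estimator.alpha)
```

A misspelled key would make `set_params` raise a `ValueError`, so this check catches routing mistakes before any trials are spent.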

In [None]:
# parameter space for the outcome regression tuning
def ml_g_params_pipeline(trial):
    return {
        'stacking__lgbm__n_estimators': 100,
        'stacking__lgbm__learning_rate': trial.suggest_float('learning_rate', 0.005, 0.1),
        'stacking__lgbm__max_depth': trial.suggest_int('max_depth', 2, 5),
        'stacking__lgbm__min_child_samples': trial.suggest_int('min_child_samples', 5, 100),
        'stacking__final_estimator__alpha': trial.suggest_float('alpha', 0.001, 10.0, log=True),
    }

# parameter space for the propensity score tuning
def ml_m_params_pipeline(trial):
    return {
        'stacking__lgbm__n_estimators': 100,
        'stacking__lgbm__learning_rate': trial.suggest_float('learning_rate', 0.005, 0.1),
        'stacking__lgbm__max_depth': trial.suggest_int('max_depth', 2, 5),
        'stacking__lgbm__min_child_samples': trial.suggest_int('min_child_samples', 5, 100),
        'stacking__final_estimator__C': trial.suggest_float('C', 0.01, 100.0, log=True),
        'stacking__final_estimator__max_iter': 1000,
    }

param_space_pipeline = {
    'ml_g': ml_g_params_pipeline,
    'ml_m': ml_m_params_pipeline
}

As before, we can pass arguments for Optuna via `optuna_settings`. For possible options, please take a look at the [Optuna Documentation](https://optuna.readthedocs.io/en/stable/index.html). For each learner, you can also pass local settings, which override the global settings.

Here, we reduce the number of trials for `ml_g`, as it already performed quite well before.
In principle, we could also use different [samplers](https://optuna.readthedocs.io/en/stable/reference/samplers/index.html), but we generally recommend the [TPESampler](https://optuna.readthedocs.io/en/stable/reference/samplers/generated/optuna.samplers.TPESampler.html), which is used by default.

In [None]:
optuna_settings_pipeline = {
    'n_trials': 100,
    'show_progress_bar': True,
    'ml_g': {
        'n_trials': 50
    }
}

As before, we tune the hyperparameters via the `tune_ml_models()` method. To inspect the underlying Optuna studies, we can return all tuning results by setting `return_tune_res=True`.

We will have a detailed look at the returned results later in the notebook.

In [None]:
tuning_results = dml_obj_tuned_pipeline.tune_ml_models(
    ml_param_space=param_space_pipeline,
    optuna_settings=optuna_settings_pipeline,
    return_tune_res=True,
)

In [None]:
dml_obj_tuned_pipeline.fit()
dml_obj_tuned_pipeline.summary

**Remark**: All settings (`optuna_settings` and `ml_param_space`) can also be specified on the `params_names` level instead of the `learner_names` level, e.g. `ml_g_d_lvl1` instead of `ml_g`. Generally, more specific settings override more general settings.
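The override rule can be pictured as a dictionary merge from general to specific: global settings first, then learner-level entries (`ml_g`), then `params_names`-level entries (`ml_g_d_lvl1`). `resolve_settings` is a hypothetical illustration of the described behavior, not DoubleML's implementation, and the `ml_g_d_lvl1` entry is added only for this example:

```python
def resolve_settings(settings, learner_name, params_name):
    """Hypothetical merge of global, learner-level and params_names-level settings."""
    # Assumption: dict-valued entries are scoped overrides, all other entries are global
    merged = {k: v for k, v in settings.items() if not isinstance(v, dict)}
    # Apply overrides from general to specific, so the most specific value wins
    for scope in (learner_name, params_name):
        merged.update(settings.get(scope, {}))
    return merged


settings = {
    'n_trials': 100,
    'show_progress_bar': True,
    'ml_g': {'n_trials': 50},         # learner level, as in the settings used above
    'ml_g_d_lvl1': {'n_trials': 20},  # params_names level, added for illustration
}

for params_name in ['ml_g_d_lvl0', 'ml_g_d_lvl1']:
    print(params_name, resolve_settings(settings, 'ml_g', params_name))
print('ml_m', resolve_settings(settings, 'ml_m', 'ml_m'))
```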

### Comparison

In [None]:
# Extract coefficients and confidence intervals for all models
theta_untuned = dml_obj_untuned.coef[0]
theta_tuned = dml_obj_tuned.coef[0]
theta_untuned_pipeline = dml_obj_untuned_pipeline.coef[0]
theta_tuned_pipeline = dml_obj_tuned_pipeline.coef[0]

ci_untuned = dml_obj_untuned.confint()
ci_tuned = dml_obj_tuned.confint()
ci_untuned_pipeline = dml_obj_untuned_pipeline.confint()
ci_tuned_pipeline = dml_obj_tuned_pipeline.confint()

# Create comparison dataframe
comparison_data = {
    'Model': ['Untuned', 'Tuned', 'Untuned Pipeline', 'Tuned Pipeline'],
    'theta': [theta_untuned, theta_tuned, theta_untuned_pipeline, theta_tuned_pipeline],
    'ci_lower': [ci_untuned.iloc[0, 0], ci_tuned.iloc[0, 0], 
                 ci_untuned_pipeline.iloc[0, 0], ci_tuned_pipeline.iloc[0, 0]],
    'ci_upper': [ci_untuned.iloc[0, 1], ci_tuned.iloc[0, 1],
                 ci_untuned_pipeline.iloc[0, 1], ci_tuned_pipeline.iloc[0, 1]]
}
df_comparison = pd.DataFrame(comparison_data)

print(f"\nTrue APO at treatment level {treatment_lvl}: {apos[int(treatment_lvl)]:.4f}\n")
print(df_comparison.to_string(index=False))

# Create plot with all 4 models
plt.figure(figsize=(12, 6))

for i in range(len(df_comparison)):
    plt.errorbar(i, df_comparison.loc[i, 'theta'], 
                 yerr=[[df_comparison.loc[i, 'theta'] - df_comparison.loc[i, 'ci_lower']], 
                       [df_comparison.loc[i, 'ci_upper'] - df_comparison.loc[i, 'theta']]], 
                 fmt='o', capsize=5, capthick=2, ecolor=palette[i], color=palette[i], 
                 label=df_comparison.loc[i, 'Model'], markersize=10, zorder=2)

plt.axhline(y=apos[int(treatment_lvl)], color=palette[4], linestyle='--', 
            linewidth=2, label='True APO', zorder=1)

plt.title('Estimated APO Coefficients: Comparison Across All Models')
plt.ylabel('Coefficient Value')
plt.xticks(range(4), df_comparison['Model'], rotation=15, ha='right')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

### Detailed Tuning Result Analysis

The `tune_ml_models()` method creates several [Optuna Studies](https://optuna.readthedocs.io/en/stable/reference/study.html), which can be inspected in detail via the returned results.

The results are a list with one dictionary per treatment variable. Each dictionary contains a `DMLOptunaResult` for every entry on the `params_names` level, i.e. a separate `DMLOptunaResult` is constructed for each [Optuna Study](https://optuna.readthedocs.io/en/stable/reference/study.html) object.

In [None]:
# Optuna results for the single treatment
print(tuning_results[0])

In this example, we take a more detailed look at the tuning of `ml_m`.

In [None]:
print(tuning_results[0]['ml_m'])

Since we have access to the saved [Optuna Study](https://optuna.readthedocs.io/en/stable/reference/study.html) object, we can access all trials and hyperparameter combinations.

In [None]:
ml_m_study = tuning_results[0]['ml_m'].study_
ml_m_study.trials_dataframe()

Additionally, we can use all [Optuna visualization options](https://optuna.readthedocs.io/en/stable/reference/visualization/index.html).

In [None]:
fig = optuna.visualization.plot_optimization_history(ml_m_study)
show(fig)

In [None]:
fig = optuna.visualization.plot_parallel_coordinate(ml_m_study)
show(fig)

In [None]:
fig = optuna.visualization.plot_param_importances(ml_m_study)
show(fig)