# 4 Hyperparameter tuning using Optuna
Optuna is a hyperparameter tuning package that is integrated in PyDFLT. In this notebook we describe how to use it.

In [1]:
import os
import sys

import yaml

path_to_project = os.path.dirname(os.path.abspath("")) + "/"
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath("decision-focused-learning-codebase"))))

## 4.1 Prepare basic config

We first define a base config with the basic parameter configuration. In particular, include here the parameters that should remain fixed during tuning.

In [2]:
yaml_dir = "configs/knapsack.yml"
with open(yaml_dir) as f:  # close the file handle after loading
    base_config = yaml.safe_load(f)  # base_config is a plain dictionary, so it can also be defined directly in Python

for key, value in base_config.items():
    print(f"{key}: {value}")

model: {'name': 'knapsack_continuous', 'seed': 5}
data: {'name': 'knapsack', 'num_data': 500, 'seed': 5}
runner: {'num_epochs': 5, 'use_wandb': False, 'experiment_name': 'tuning', 'experiments_folder': 'results/', 'seed': 5}
problem: {'train_ratio': 0.75, 'val_ratio': 0.15, 'seed': 5}
decision_maker: {'name': 'differentiable', 'learning_rate': 0.05, 'batch_size': 32, 'seed': 5}
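
To make the role of the base config concrete, the sketch below shows one way sampled hyperparameters could be overlaid on a nested base dictionary. The helper `merge_configs` is hypothetical and written only for illustration; it is not PyDFLT's `update_config`, whose exact behavior may differ.

```python
from copy import deepcopy


def merge_configs(base: dict, updates: dict) -> dict:
    """Return a copy of `base` with `updates` merged in recursively (hypothetical helper)."""
    merged = deepcopy(base)
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dictionaries: merge them key by key
            merged[key] = merge_configs(merged[key], value)
        else:
            # Otherwise the update simply replaces the base value
            merged[key] = deepcopy(value)
    return merged


base = {
    "runner": {"num_epochs": 5, "seed": 5},
    "decision_maker": {"name": "differentiable", "learning_rate": 0.05, "batch_size": 32, "seed": 5},
}
sampled = {"decision_maker": {"learning_rate": 1e-4, "batch_size": 64}}  # made-up sampled values

merged = merge_configs(base, sampled)
print(merged["decision_maker"])
```

Keys that appear only in the base config (here `name`, `seed`, and the whole `runner` section) are left untouched, which is why fixed parameters belong there.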


## 4.2 Define search spaces
Now we define the parameters we want to search over. In this example we search for the best `learning_rate` and `batch_size`.

In [3]:
from src.utils.optuna import SearchSpaceConfig

# To conduct a hyperparameter tuning experiment, we first need to define the search spaces
search_space = SearchSpaceConfig(path_to_project + "examples/hparams_search_spaces/test_search_config.yaml")

for key, value in search_space.config.items():  # search_space.config is a nested dictionary
    print(f"{key}: {value}")

Auto-Sklearn cannot be imported.
decision_maker: {'learning_rate': {'type': 'float', 'low': 1e-05, 'high': 0.001, 'log': True}, 'batch_size': {'type': 'int', 'low': 1, 'high': 512, 'log': True}}
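
The `log: True` entries above ask for sampling in the log domain, so that every order of magnitude in the range is explored about equally. The sketch below illustrates that idea with plain random sampling. The function `sample_param` is a hypothetical stand-in, not Optuna's sampler and not part of `SearchSpaceConfig`.

```python
import math
import random


def sample_param(spec: dict, rng: random.Random):
    """Draw one value from a search-space entry (illustrative only)."""
    low, high = spec["low"], spec["high"]
    if spec.get("log", False):
        # Uniform in log-space: equal probability mass per order of magnitude
        value = math.exp(rng.uniform(math.log(low), math.log(high)))
    else:
        value = rng.uniform(low, high)
    if spec["type"] == "int":
        value = int(round(value))
    # Clamp to guard against floating-point overshoot at the bounds
    return min(max(value, low), high)


space = {
    "learning_rate": {"type": "float", "low": 1e-5, "high": 1e-3, "log": True},
    "batch_size": {"type": "int", "low": 1, "high": 512, "log": True},
}
rng = random.Random(0)
samples = [{name: sample_param(spec, rng) for name, spec in space.items()} for _ in range(5)]
for sample in samples:
    print(sample)
```

With a linear scale, most `learning_rate` draws would fall near the upper end of the range; the log scale avoids that bias.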


## 4.3 Create pruner
A pruner stops trials early when their intermediate results look unpromising. This can greatly reduce the time needed to find good parameters.

In [4]:
import optuna

pruner = optuna.pruners.MedianPruner(
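    n_startup_trials=10,  # Number of trials that must finish before pruning is enabled
    n_warmup_steps=15,  # Number of reported steps to wait before pruning (a step is the index passed to trial.report)
    interval_steps=1,  # Interval (in steps) between pruning checks
    n_min_trials=1,  # Minimum number of trials that must have reported at a step before pruning there
)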
)
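
As a rough illustration of what the median rule does, the sketch below prunes a running trial when its best intermediate value so far is worse than the median of the values that finished trials reported at the same step. It assumes a study that minimizes its objective, ignores `interval_steps` and `n_min_trials`, and is not Optuna's implementation. The function name and the example numbers are made up.

```python
from statistics import median


def should_prune_median(current, finished, step, n_startup_trials=10, n_warmup_steps=15):
    """Simplified median rule for a study that minimizes its objective.

    current:  {step: intermediate value} for the running trial
    finished: list of such dicts, one per finished trial
    """
    if len(finished) < n_startup_trials:
        return False  # not enough finished trials to compare against
    if step < n_warmup_steps:
        return False  # still inside the warm-up period
    values_at_step = [t[step] for t in finished if step in t]
    if not values_at_step:
        return False  # no finished trial reported at this step
    best_so_far = min(v for s, v in current.items() if s <= step)
    return best_so_far > median(values_at_step)


finished = [
    {1: 8.0, 2: 7.5},
    {1: 9.0, 2: 8.5},
    {1: 10.0, 2: 9.5},
]
weak_trial = {1: 9.5, 2: 9.2}
strong_trial = {1: 8.2, 2: 7.9}

pruned_weak = should_prune_median(weak_trial, finished, step=2, n_startup_trials=3, n_warmup_steps=1)
pruned_strong = should_prune_median(strong_trial, finished, step=2, n_startup_trials=3, n_warmup_steps=1)
pruned_default = should_prune_median(weak_trial, finished, step=2)  # defaults mirror the cell above
print(pruned_weak, pruned_strong, pruned_default)
```

Under this simplified rule, the settings used in this notebook (`n_startup_trials=10`, `n_warmup_steps=15`) never prune a trial in a short study with only a few reported steps, so smaller values are needed if pruning should take effect early.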

## 4.4 Specify study
Specify the study name and the folder where the results are stored. A `database.db` file will be created in `OUTPUT_DIR/STUDY_NAME`. Although different problems and methods can all share one database, it can be convenient to keep a separate database per problem or method.

In [5]:
from src.utils.optuna import create_study

STUDY_NAME = "test_study"  # Note that optuna will continue with an existing study if the study already exists
OUTPUT_DIR = "hparam_optimization_results/"  # Folder where the study database is stored (at OUTPUT_DIR/STUDY_NAME)
os.makedirs(f"{OUTPUT_DIR}/{STUDY_NAME}", exist_ok=True)  # Ensure that the folder exists
study = create_study(
    STUDY_NAME,
    storage_url=f"sqlite:///{OUTPUT_DIR}/{STUDY_NAME}/database.db",
    prunner=pruner,
)

Storage url: sqlite:///hparam_optimization_results//test_study/database.db


[I 2025-10-13 18:52:28,078] A new study created in RDB with name: test_study
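
The storage URL printed above contains a double slash because `OUTPUT_DIR` already ends with `/`. SQLite accepts this, but a small helper can build the URL from path components instead. The sketch below is one possible way to do so; `sqlite_storage_url` is a hypothetical name and not part of PyDFLT or Optuna.

```python
from pathlib import Path


def sqlite_storage_url(output_dir: str, study_name: str, filename: str = "database.db") -> str:
    """Build a relative SQLite storage URL without duplicate slashes (hypothetical helper)."""
    db_path = Path(output_dir) / study_name / filename
    # Three slashes followed by a relative path is the relative-file form of a SQLite URL
    return f"sqlite:///{db_path.as_posix()}"


url = sqlite_storage_url("hparam_optimization_results/", "test_study")
print(url)
```

The same string can then be reused wherever the database location is needed, such as the dashboard command and the result-loading code later in this notebook.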


## 4.5 Set up dashboard
We can set up a dashboard that visualizes results while the study is running. The cell below starts it in the background using the `optuna-dashboard` package. Alternatively, open the dashboard from the terminal using: `optuna-dashboard sqlite:///examples/hparam_optimization_results//test_study//database.db`

In [6]:
import os
import socket
import subprocess

# Suppress dashboard warning noise about experimental PedAnova importance evaluator
warning_filters = [
    "ignore::optuna.exceptions.ExperimentalWarning",
    "ignore:PedAnovaImportanceEvaluator computes the importances of params to achieve low `target` values.:UserWarning",
]
env = {
    **os.environ,
    "PYTHONWARNINGS": ",".join(filter(None, [os.environ.get("PYTHONWARNINGS", ""), *warning_filters])),
}


# Find a free port to host the dashboard
def find_free_port():
    with socket.socket() as sock:
        sock.bind(("", 0))
        return sock.getsockname()[1]


port = str(find_free_port())
print(f"Found free port: {port}")

# Start the dashboard as a background process
subprocess.Popen(
    [
        "optuna-dashboard",
        "sqlite:///hparam_optimization_results//test_study//database.db",
        "--port",
        port,
    ],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
    env=env,
)

print(f"See dashboard here: http://localhost:{port}")

Found free port: 49789
See dashboard here: http://localhost:49789


Below we show what the dashboard looks like. It updates while the study is running.

In [7]:
from IPython.display import IFrame

IFrame(src=f"http://localhost:{port}", width="100%", height=600)
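
Because the dashboard is started as a background process, the embedded frame may be requested before the server is accepting connections. A minimal sketch of a readiness check is shown below. `wait_for_port` is a hypothetical helper, and the demo uses a throwaway listening socket in place of the real dashboard.

```python
import socket
import time


def wait_for_port(port: int, host: str = "localhost", timeout: float = 10.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=0.5):
                return True
        except OSError:
            time.sleep(0.2)  # server not up yet, retry shortly
    return False


# Demo: a temporary listening socket stands in for the dashboard server
with socket.socket() as server:
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    demo_port = server.getsockname()[1]
    reachable = wait_for_port(demo_port, host="127.0.0.1", timeout=2.0)

print(f"Port reachable: {reachable}")
```

In the notebook, one could call `wait_for_port(int(port))` before displaying the frame and fall back to printing the URL if it returns `False`.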

## 4.6 Define trial

To run hyperparameter tuning with Optuna, we need to define what a 'trial' looks like. We run each configuration for multiple seeds using `run_trial`, which reports the running mean to Optuna after each seed so that the pruner can act on it. Finally, we define `objective_fn`, which specifies the evaluation metric. In this case it is the validation objective returned by running the experiment through `run_trial`, averaged over the seeds.

In [8]:
import numpy as np

from src.utils.experiments import run, update_config


# from src.utils.optuna import run_trial
def run_trial(trial, search_space: SearchSpaceConfig, base_config, seeds: list):
    assert len(seeds) > 0, "Provide at least one seed!"

    trial_config = search_space.get_trial_config(trial)
    per_seed_results: list[float] = []

    for seed_idx, seed in enumerate(seeds, start=1):
        config = update_config(base_config=base_config, updates_config=trial_config)

        config["seed"] = seed
        for key in config:
            if isinstance(config[key], dict) and "seed" in config[key]:
                config[key]["seed"] = seed

        result = run(config, optuna_trial=None)
        per_seed_results.append(float(result))

        if trial is not None:
            intermediate = float(np.mean(per_seed_results))
            trial.report(intermediate, seed_idx)
            if trial.should_prune():
                raise optuna.TrialPruned()

    return float(np.mean(per_seed_results))


def objective_fn(trial):
    return run_trial(trial, search_space, base_config, seeds=list(range(3)))
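
To see what the pruner receives from `run_trial`, the sketch below reproduces only the reporting logic: after each seed, the mean of all results so far is reported with the seed index as the step. The per-seed values are made up for illustration.

```python
from statistics import fmean


def running_means(per_seed_results):
    """Mean of the first n results, for n = 1..len(per_seed_results)."""
    return [fmean(per_seed_results[:n]) for n in range(1, len(per_seed_results) + 1)]


made_up_results = [8.0, 7.0, 9.0]  # hypothetical validation values for three seeds
reports = running_means(made_up_results)
for step, value in enumerate(reports, start=1):
    print(f"step {step}: reported value {value:.2f}")
```

The last reported value equals the value returned by `run_trial`, so the final step and the trial's objective always agree. Note that the steps here count seeds rather than epochs, which matters when choosing `n_warmup_steps` for the pruner.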

## 4.7 Run tuning
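Now we run the hyperparameter tuning. With `catch=(Exception,)`, a trial that raises an exception is marked as failed and the study continues with the next trial.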

In [9]:
study.optimize(objective_fn, n_trials=3, timeout=None, catch=(Exception,), show_progress_bar=True)

  0%|          | 0/3 [00:00<?, ?it/s]

Generating data using knapsack
Computing optimal decisions for the entire dataset...
Optimal decisions computed and added to dataset.
Computing optimal objectives for the entire dataset...
Optimal objectives computed and added to dataset.
Shuffling indices before splitting...
Dataset split completed: Train=375, Validation=75, Test=50
Problem mode set to: train
Problem mode set to: train
Epoch 0/5: Starting initial validation...
Problem mode set to: validation
Epoch Results:
validation/abs_regret_mean: 8.9304
validation/rel_regret_mean: 0.4107
validation/objective_mean: 15.7564
validation/item_value_mean: 4.3376
validation/sym_rel_regret_mean: 0.2772
validation/mse_mean: 41.6570
validation/select_item_mean: 0.3410
Initial best validation metric (abs_regret): 8.930413246154785
Starting training...
Epoch: 1/5
Problem mode set to: train
Epoch Results:
train/sym_rel_regret_mean: 0.2488
validation/abs_regret_mean: 8.9304
validation/rel_regret_mean: 0.4107
validation/objective_mean: 15.7564
[output truncated]

## 4.8 Retrieve and save results
The dashboard summarizes the results. Alternatively, the results can be retrieved and saved as follows.

In [10]:
import optuna

from src.utils.optuna import save_progress

path = f"sqlite:///{OUTPUT_DIR}/{STUDY_NAME}/database.db"
studies = optuna.study.get_all_study_summaries(storage=path)

for study_summary in studies:
    study_name = study_summary.study_name
    study = optuna.load_study(study_name=study_name, storage=path)

    print(
        f"Results study: {study_name}\n"
        f"Completed trials: {len(study.trials)}\n"
        f"Best value: {study.best_value:.4f}\n"
        f"Best parameters: {study.best_params}"
    )

save_progress(study, search_space, OUTPUT_DIR)

Results study: test_study
Completed trials: 3
Best value: 7.4052
Best parameters: {'learning_rate': 0.00018822825020467675, 'batch_size': 361}
Best configuration saved to hparam_optimization_results/best_config.yaml
