# A Quick Introduction to Optuna

This Jupyter notebook goes through the basic usage of Optuna.

- Install Optuna
- Write a training algorithm that involves hyperparameters
  - Read train/valid data
  - Define and train model
  - Evaluate model
- Use Optuna to tune the hyperparameters (hyperparameter optimization, HPO)
- Visualize HPO

## Install `optuna`

Optuna can be installed via `pip` or `conda`.

In [1]:
!pip3 install -U scikit-learn optuna tensorflow keras scikeras




In [4]:
import optuna, sklearn, tensorflow, keras, scikeras

print("scikit-learn:", sklearn.__version__)
print("optuna:", optuna.__version__)
print("tensorflow:", tensorflow.__version__)
print("keras:", keras.__version__)
print("scikeras:", scikeras.__version__)

scikit-learn: 1.7.2
optuna: 4.5.0
tensorflow: 2.20.0
keras: 3.11.3
scikeras: 0.13.0


## Optimize Hyperparameters

### Define a simple Keras model

We start with a small feed-forward neural network, built with Keras, to classify flowers in the Iris dataset. We define a function called `objective` that encapsulates the whole training process, including 3-fold cross-validation with scikit-learn, and returns the mean validation accuracy of the model.

In [23]:

import optuna
import numpy as np
import tensorflow as tf
from sklearn import datasets, model_selection, preprocessing
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# --- 1. Model Creation Function ---
def create_model(n_hidden, n_neurons, learning_rate):
    """Creates and compiles a Keras model based on hyperparameters."""
    model = Sequential()
    model.add(Input(shape=(4,)))
    for _ in range(n_hidden):
        model.add(Dense(n_neurons, activation="relu"))
    model.add(Dense(3, activation="softmax"))

    optimizer = Adam(learning_rate=learning_rate)
    # Use categorical_crossentropy since we will one-hot encode the labels
    model.compile(loss="categorical_crossentropy", optimizer=optimizer, metrics=["accuracy"])
    return model

# --- 2. Optuna Objective Function with Manual Cross-Validation ---
def objective(trial):
    # --- Load Data ---
    iris = datasets.load_iris()
    X, y = iris.data, iris.target
    X = X.astype(np.float32)

    # --- Define Hyperparameter Search Space ---
    n_hidden = trial.suggest_int("n_hidden", 1, 3)
    n_neurons = trial.suggest_int("n_neurons", 8, 64)
    learning_rate = trial.suggest_float("learning_rate", 1e-4, 1e-1, log=True)
    epochs = 100
    batch_size = 32

    # --- Manual 3-Fold Cross-Validation ---
    kfold = model_selection.StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
    fold_scores = []

    for train_index, val_index in kfold.split(X, y):
        # Split data for this fold
        X_train, X_val = X[train_index], X[val_index]
        y_train, y_val = y[train_index], y[val_index]

        # Preprocess data (fit on train, transform on val)
        scaler = preprocessing.StandardScaler()
        X_train = scaler.fit_transform(X_train)
        X_val = scaler.transform(X_val)

        # One-hot encode labels for Keras
        y_train_cat = to_categorical(y_train, num_classes=3)
        y_val_cat = to_categorical(y_val, num_classes=3)

        # Create and compile a new model for each fold to ensure fair evaluation
        model = create_model(n_hidden, n_neurons, learning_rate)

        # Train the model
        model.fit(X_train, y_train_cat, epochs=epochs, batch_size=batch_size, verbose=0)

        # Evaluate and store the score for the fold
        loss, accuracy = model.evaluate(X_val, y_val_cat, verbose=0)
        fold_scores.append(accuracy)

    # Return the mean accuracy across all folds
    return np.mean(fold_scores)

if __name__ == "__main__":
    print("--- Starting Hyperparameter Optimization with Manual CV ---")
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=50) # n_trials can be increased for a more thorough search

    # --- Results ---
    print("\n--- Optimization Finished ---")
    best_trial = study.best_trial
    print(f"Optimized Accuracy: {best_trial.value:.4f}")
    print(f"Best hyperparameters: {best_trial.params}")


[I 2025-10-07 21:23:44,348] A new study created in memory with name: no-name-39cec3c3-0f9b-4483-94bc-63298c292358


--- Starting Hyperparameter Optimization with Manual CV ---


[I 2025-10-07 21:24:07,612] Trial 0 finished with value: 0.953333338101705 and parameters: {'n_hidden': 2, 'n_neurons': 46, 'learning_rate': 0.0017262650642339218}. Best is trial 0 with value: 0.953333338101705.
[I 2025-10-07 21:24:28,991] Trial 1 finished with value: 0.9466666579246521 and parameters: {'n_hidden': 2, 'n_neurons': 44, 'learning_rate': 0.0005383638021894106}. Best is trial 0 with value: 0.953333338101705.
[I 2025-10-07 21:24:50,186] Trial 2 finished with value: 0.8666666547457377 and parameters: {'n_hidden': 2, 'n_neurons': 43, 'learning_rate': 0.0003381898145832297}. Best is trial 0 with value: 0.953333338101705.
[I 2025-10-07 21:25:10,718] Trial 3 finished with value: 0.9599999984105428 and parameters: {'n_hidden': 1, 'n_neurons': 17, 'learning_rate': 0.0025014050022983546}. Best is trial 3 with value: 0.9599999984105428.
[I 2025-10-07 21:25:31,956] Trial 4 finished with value: 0.9666666587193807 and parameters: {'n_hidden': 1, 'n_neurons': 53, 'learning_rate': 0.0113

KeyboardInterrupt: 

### Optimize hyperparameters of the model

The hyperparameters of the above model are `n_hidden` (number of hidden layers), `n_neurons` (units per hidden layer), and `learning_rate`, for which we can try different values to see if the model accuracy can be improved. The `objective` function accepts a `trial` object, whose `suggest_*` methods (such as `suggest_int` and `suggest_float`) sample hyperparameters from the ranges we specify. We create a study to run the hyperparameter optimization and finally read the best hyperparameters from `study.best_trial`.

It is possible to condition hyperparameters using Python `if` statements. For instance, we could let Optuna choose between two classifiers, such as a random forest and a support vector machine, and define hyperparameters specific to whichever classifier is selected in a given trial.

### Plotting the study

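Plot the optimization history of the study, that is, the objective value of each trial together with the best value found so far. The functions in `optuna.visualization` render with Plotly, so the `plotly` package must be installed.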

In [24]:
optuna.visualization.plot_optimization_history(study)

Plot the objective value (here, the mean accuracy) against each hyperparameter, with one point per trial.

In [21]:
optuna.visualization.plot_slice(study)

A slice plot in Optuna is a visualization that helps you understand how individual hyperparameters affect the objective value (e.g., accuracy, loss, AUC) across all the trials.

**Meaning of Slice Plot**

* Each subplot corresponds to one hyperparameter.

* X-axis → values of that hyperparameter tried in different trials.

* Y-axis → the corresponding objective values (the metric you are optimizing).

* Each dot → one trial’s result.

It helps spot:

* Trends → e.g., a higher learning rate tends to give a worse loss.

* Ranges that work well → clusters of good performance.

* Interactions (when viewed with other plots like contour plots).

Plotting the accuracy surface for a pair of hyperparameters of the neural network.

A contour plot in Optuna is a visualization that shows how two hyperparameters interact and how their combinations affect the objective value.

**Meaning of Contour Plot**

* X-axis & Y-axis → two selected hyperparameters.

* Colors (contours) → represent the objective value (e.g., accuracy, loss).

* The color bar shows which shades correspond to which objective values; whether the high or the low end is "better" depends on whether you maximize or minimize.

* Each trial is also shown as a point on the plot.

**What It Tells You**

* Best regions of hyperparameter space → where the optimal values cluster.
* Interactions → whether two parameters influence each other.
  Example: A learning rate of 0.01 might only work well if batch size is small.
* Search efficiency → shows which areas of the parameter space have been explored and which remain sparse.

**Example**

If tuning learning rate (x-axis) and batch size (y-axis), the contour plot might show a “sweet spot” region (e.g., lr=0.001–0.005 and batch size=64–128) where performance is highest.

In short: a contour plot reveals the relationship between two hyperparameters and the regions where they jointly give good results.

In [22]:
# Use parameter names that exist in this study (n_hidden, n_neurons, learning_rate)
optuna.visualization.plot_contour(study, params=["n_neurons", "learning_rate"])