# Hyperparameter Optimization with Optuna

Welcome to the Optuna-based hyperparameter optimization tutorial! In this interactive notebook, you will explore world of hyperparameter tuning for a Convolutional Neural Network (CNN) specifically aimed at image classification using the CIFAR-10 dataset. Hyperparameter optimization is pivotal in enhancing model performance, making your models more accurate and efficient.

Optuna, a robust and versatile library, plays a central role in automating and streamlining this process. It empowers you to navigate through complex hyperparameter spaces with ease. In this tutorial, you will engage with Optuna's core functionalities, and you'll also have the opportunity to construct a flexible CNN architecture. This adaptable design is essential for understanding how models can be fine-tuned effortlessly to suit various hyperparameter configurations.

Throughout this session, you will:
- Learn how to set up and execute an Optuna study, incorporating all essential elements required for effective hyperparameter optimization.
- Perform a thorough analysis of the results to evaluate how different hyperparameters influence model performance, gaining insights into their practical impact.

Additionally, this tutorial includes an optional section where you will compare two prevalent methods of hyperparameter optimization: Optuna's default sampling method (Tree-structured Parzen Estimator, or TPE) and the traditional Grid Search method. This comparison will not only highlight the strengths of Optuna but also provide a clearer perspective on how it can outperform conventional optimization techniques.


In [3]:
import torch
import torch.nn as nn 
import torch.optim as optim
import optuna
import matplotlib.pyplot as plt
import torch.nn.functional as F
from pprint import pprint
import helper

helper.set_seed(15)


  from .autonotebook import tqdm as notebook_tqdm


In [4]:
device = torch.device("mps" if torch.mps.is_available() else "cpu")
print(device)

mps


## Hyperparameter Optimization for CNNs on CIFAR-10

In this section, you explore the vital task of finding the optimal hyperparameters for a Convolutional Neural Network (CNN) tailored to the CIFAR-10 dataset. 
Utilizing Optuna, a sophisticated framework for hyperparameter optimization, your goal is to streamline and automate the process, ensuring efficiency and effectiveness. 
The selection of hyperparameters is notably intensive computationally and depends on various factors including the architecture of the model, the dataset characteristics, and the specific training processes involved. These elements, collectively and individually, have significant impacts on the performance outcomes of the model.

### Defining a Flexible CNN Architecture

The model architecture here is deliberately designed to be flexible, accommodating variability in its layers which is pivotal for adapting to different hyperparameter configurations suggested by Optuna during optimization trials.  
The architecture is defined in a modular manner, allowing for easy adjustments and experimentation with different layer configurations, activation functions, and other hyperparameters. 

`FlexibleCNN` is a class that encapsulates the architecture of the CNN model:

* **`__init__`**: The constructor initializes the model's feature extraction layers.
>    * It constructs a series of convolutional blocks based on the `n_layers` parameter. Each block is a sequence of `nn.Conv2d`, `nn.ReLU`, and `nn.MaxPool2d`.
>    * The `in_channels` for each block is set to the `out_channels` of the preceding block to ensure a seamless data flow.
>    * All blocks are combined into a single `nn.Sequential` module assigned to the `.features` attribute, which handles feature extraction.
>    * The classifier, `.classifier`, is initially set to `None` and will be constructed dynamically later.
 * **`_create_classifier`**: This helper method dynamically builds the classifier part of the network.
>    * It's called during the first forward pass once the input size for the linear layers is known.
 * **`forward`**: This method defines the forward pass of the model.
>    * The input `x` first passes through the `.features` layers.
>    * The output from the feature extractor is flattened to determine the input size for the classifier.
>    * If the `.classifier` has not been created yet, it calls `_create_classifier` to build it on the fly.
>    * Finally, the flattened data is passed through the `.classifier` to produce the final output.

In [5]:
class FlexibleCNN(nn.Module):
    """
    A flexible Convolutional Neural Network with a dynamically created classifier.

    This CNN's architecture is defined by the provided hyperparameters,
    allowing for a variable number of convolutional layers. The classifier
    (fully connected layers) is constructed during the first forward pass
    to adapt to the output size of the convolutional feature extractor.
    """
    def __init__(self, n_layers, n_filters, kernel_sizes, dropout_rate, fc_size):
        """
        Initializes the feature extraction part of the CNN.

        Args:
            n_layers: The number of convolutional blocks to create.
            n_filters: A list of integers specifying the number of output
                       filters for each convolutional block.
            kernel_sizes: A list of integers specifying the kernel size for
                          each convolutional layer.
            dropout_rate: The dropout probability to be used in the classifier.
            fc_size: The number of neurons in the hidden fully connected layer.
        """
        super(FlexibleCNN, self).__init__()

        #Initialize an empty list to hold the convolution blocks
        blocks = []
        #Set the initial number of input channels for RGB images
        in_channels = 3

        #Loop to construct each convolutional block
        for i in range(n_layers):
            #Get the parameters for current convolution layer
            out_channels = n_filters[i]
            kernel_size = kernel_sizes[i]

            #Calculate padding to maintain the input spatial dimension('same' padding)
            padding = (kernel_size-1) // 2

            #Define a block as a sequence of conv, ReLU and MaxPool. layers
            block = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size, padding=padding),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2, stride=2)
            )

            #Add the newly created block to the list
            blocks.append(block)

            #Update the number of input channels for the next block
            in_channels = out_channels

        #Combine all blocks inot a single feature extractor module
        self.features = nn.Sequential(*blocks)

        #Store hyperparameters needed for building the classifier later
        self.dropout_rate = dropout_rate
        self.fc_size = fc_size

        #The classifier will be initialized dynamically in the forward pass
        self.classifier = None

    def _create_classifier(self, flattened_size, device):
        """
        Dynamically creates and initializes the classifier part of the network.

        This helper method is called during the first forward pass to build the
        fully connected layers based on the feature map size from the
        convolutional base.

        Args:
            flattened_size: The number of input features for the first linear
                            layer, determined from the flattened feature map.
            device: The device to which the new classifier layers should be moved.
        """

        #Define the classifier's architecture
        self.classifier = nn.Sequential(
            nn.Dropout(self.dropout_rate),
            nn.Linear(flattened_size, self.fc_size),
            nn.ReLU(inplace=True),
            nn.Dropout(self.dropout_rate),
            nn.Linear(self.fc_size, 100)
        ).to(device)

    def forward(self, x):
        """
        Defines the forward pass of the model.

        Args:
            x: The input tensor of shape (batch_size, channels, height, width).

        Returns:
            The output logits from the classifier.
        """
        #Get the device of the input tensor to ensure consistency
        device = x.device

        #Pass the input through the feature extraction layer
        x=self.features(x)

        #Flatten the feature map to prepare it for the fully connected layers
        flattened = torch.flatten(x, 1)
        flattened_size = flattened.size(1)

        #If the classifier has not been created yet, initialize it
        if self.classifier is None:
            self._create_classifier(flattened_size, device)

        #pass the flattened feature through the classifier to get the final output
        return self.classifier(flattened)

## Defining the Optuna Objective Function

The objective function is the core of the hyperparameter optimization process, being the function that Optuna will repeatedly call to evaluate different hyperparameter configurations.
This function encapsulates the entire training and evaluation process, including the definition of the CNN model architecture, the optimizer, the data loaders, the training loop, and the evaluation metrics.
Within this function, you define the search space for hyperparameters using `trial.suggest_*` methods, which allow Optuna to sample hyperparameters from a defined range or set of values. 
For a full list of available `suggest_*` methods, you can refer to the [Optuna documentation](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.trial.Trial.html).

*The objective function is designed to return a single scalar value, which represents the performance of the model for the given hyperparameters*. 
In your case, the aim to maximize the accuracy of the model on the validation set, which is computed using the `evaluate_accuracy` function.

**Dynamic Layer Initialization**: A noteworthy addition to this objective function is the initialization step involving a dummy input. Because the `FlexibleCNN` creates its classifier layers dynamically during the first forward pass, these parameters do not exist immediately after the model is instantiated.
- **Why use a dummy input?** Passing data through the model forces it to calculate the flattened feature size and build the classifier layers. You must do this *before* defining the optimizer so that `model.parameters()` includes the classifier weights. Otherwise, the optimizer would only track the feature extractor, leaving the classifier untrained.
>
- **Why these dimensions?** The tensor `torch.randn(1, 3, 32, 32)` is used to mimic the structure of the CIFAR-10 dataset. It represents a single image (batch size of 1) with 3 color channels (RGB) and a resolution of `32x32` pixels.

Observe that some hyperparameters are defined as fixed values, such as the number of epochs, the batch size, and the learning rate.

In [8]:
def objective(trial, device):
    """
    Defines the objective function for hyperparameter optimization using Optuna.

    For each trial, this function samples a set of hyperparameters,
    constructs a model, trains it for a fixed number of epochs, evaluates
    its performance on a validation set, and returns the accuracy. Optuna
    uses the returned accuracy to guide its search for the best
    hyperparameter combination.

    Args:
        trial: An Optuna `Trial` object, used to sample hyperparameters.
        device: The device ('cpu' or 'cuda') for model training and evaluation.

    Returns:
        The validation accuracy of the trained model as a float.
    """

    #Sample hyperparameters for the feature extractor using the Optuna trial
    n_layers = trial.suggest_int("n_layer", 1, 3)
    n_filters=[trial.suggest_int(f"n_filter_{i}", 16, 128) for i in range(n_layers)]
    kernal_sizes = [trial.suggest_categorical(f"kernel_size_{i}", [3,5]) for i in range(n_layers)]

    #Sample hyperparameters for the classifier
    dropout_rate = trial.suggest_float("dropout_rate", 0.1, 0.5)
    fc_size = trial.suggest_int("fc_size", 64, 256)

    #Instantiate the model with the sampled hyperparameters
    model = FlexibleCNN(n_layers, n_filters, kernal_sizes, dropout_rate, fc_size).to(device)

    #Initialize the dynamic classifier layer bt passing a dummy input through the model
    #This ensure all parameters are instantiated before the optimizer is defined
    dummy_input = torch.rand(1,3,32, 32).to(device)
    model(dummy_input)

    #Define fixed training parameters: lr, loss function and optimizer
    learning_rate = 0.001
    loss_fcn = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(),lr=learning_rate)

    #Define fixed daata loading parameters and crete data loader
    batch_size=128
    train_loader, val_loader = helper.get_dataset_dataloaders(batch_size=batch_size)

    #Define the fixed number of epochs for training
    n_epochs = 10

    #Train model using a helper function
    helper.train_model(
        model=model,
        optimizer=optimizer,
        loss_fcn=loss_fcn,
        train_dataloader=train_loader,
        n_epochs=n_epochs,
        device=device
    )

    #Evaluate the trained model's accuracy on the validaation set
    accuracy = helper.evaluate_accuracy(model, val_loader, device)

    return accuracy

## Running the Optuna Study

Once that the objective function is defined, an Optuna study is created to manage the hyperparameter optimization process.
The study is responsible for running the objective function multiple times with different hyperparameter configurations, allowing Optuna to explore the search space and find the best hyperparameters.
In this case, your goal is to **maximize the accuracy** of the CNN model on the CIFAR-10 dataset, this is why we use `direction='maximize'` when creating the study.
The `optimize` method of the study is called to start the optimization process, which will run the objective function for a defined number of trials.

A lambda function is used to pass the device to the objective function, allowing the model to be trained on the specified device
*Note*: you can also pass other parameters to the objective function using the lambda function, if needed.

**NOTE:** the code below will take about 8 minutes to run.

In [9]:
#Create a study object and optimie the objective function
study = optuna.create_study(direction='maximize') #The goal in this case is to maximize accuracy

#Start the optimization process 
n_trials = 20
study.optimize(lambda trial: objective(trial, device), n_trials=n_trials)

[32m[I 2026-01-30 11:06:46,717][0m A new study created in memory with name: no-name-be59388d-7752-4469-aab3-f2b03f6c4c6f[0m
  entry = pickle.load(f, encoding="latin1")
Training - Current Epoch: 6:  60%|██████    | 6/10 [00:07<00:05,  1.37s/it] 

Epoch 5 - Train Loss: 1.0618


Training - Current Epoch: 10: 100%|██████████| 10/10 [00:14<00:00,  1.49s/it]


Epoch 10 - Train Loss: 0.6576
Training complete!



[32m[I 2026-01-30 11:07:03,158][0m Trial 0 finished with value: 0.5745 and parameters: {'n_layer': 1, 'n_filter_0': 120, 'kernel_size_0': 3, 'dropout_rate': 0.25596530447594396, 'fc_size': 187}. Best is trial 0 with value: 0.5745.[0m


Evaluation complete.


Training - Current Epoch: 6:  60%|██████    | 6/10 [00:06<00:04,  1.05s/it]  

Epoch 5 - Train Loss: 1.6073


Training - Current Epoch: 10: 100%|██████████| 10/10 [00:11<00:00,  1.12s/it]


Epoch 10 - Train Loss: 1.3762
Training complete!



[32m[I 2026-01-30 11:07:15,534][0m Trial 1 finished with value: 0.4955 and parameters: {'n_layer': 3, 'n_filter_0': 20, 'n_filter_1': 36, 'n_filter_2': 37, 'kernel_size_0': 3, 'kernel_size_1': 3, 'kernel_size_2': 5, 'dropout_rate': 0.2963825816538338, 'fc_size': 74}. Best is trial 0 with value: 0.5745.[0m


Evaluation complete.


Training - Current Epoch: 6:  60%|██████    | 6/10 [00:07<00:05,  1.40s/it]  

Epoch 5 - Train Loss: 1.4369


Training - Current Epoch: 10: 100%|██████████| 10/10 [00:14<00:00,  1.46s/it]


Epoch 10 - Train Loss: 1.2246
Training complete!



[32m[I 2026-01-30 11:07:31,201][0m Trial 2 finished with value: 0.57 and parameters: {'n_layer': 2, 'n_filter_0': 54, 'n_filter_1': 16, 'kernel_size_0': 5, 'kernel_size_1': 5, 'dropout_rate': 0.446500524419426, 'fc_size': 255}. Best is trial 0 with value: 0.5745.[0m


Evaluation complete.


Training - Current Epoch: 6:  60%|██████    | 6/10 [00:08<00:05,  1.43s/it]  

Epoch 5 - Train Loss: 1.3812


Training - Current Epoch: 10: 100%|██████████| 10/10 [00:15<00:00,  1.54s/it]


Epoch 10 - Train Loss: 1.0123
Training complete!



[32m[I 2026-01-30 11:07:47,829][0m Trial 3 finished with value: 0.58 and parameters: {'n_layer': 3, 'n_filter_0': 56, 'n_filter_1': 55, 'n_filter_2': 82, 'kernel_size_0': 3, 'kernel_size_1': 3, 'kernel_size_2': 5, 'dropout_rate': 0.2685762334254175, 'fc_size': 188}. Best is trial 3 with value: 0.58.[0m


Evaluation complete.


Training - Current Epoch: 6:  60%|██████    | 6/10 [00:07<00:05,  1.40s/it]  

Epoch 5 - Train Loss: 1.0785


Training - Current Epoch: 10: 100%|██████████| 10/10 [00:14<00:00,  1.47s/it]


Epoch 10 - Train Loss: 0.6771
Training complete!



[32m[I 2026-01-30 11:08:03,606][0m Trial 4 finished with value: 0.589 and parameters: {'n_layer': 1, 'n_filter_0': 89, 'kernel_size_0': 5, 'dropout_rate': 0.2635541398295715, 'fc_size': 165}. Best is trial 4 with value: 0.589.[0m


Evaluation complete.


Training - Current Epoch: 6:  60%|██████    | 6/10 [00:16<00:12,  3.00s/it]  

Epoch 5 - Train Loss: 1.3549


Training - Current Epoch: 10: 100%|██████████| 10/10 [00:32<00:00,  3.28s/it]


Epoch 10 - Train Loss: 1.0020
Training complete!



[32m[I 2026-01-30 11:08:37,363][0m Trial 5 finished with value: 0.595 and parameters: {'n_layer': 3, 'n_filter_0': 118, 'n_filter_1': 84, 'n_filter_2': 86, 'kernel_size_0': 5, 'kernel_size_1': 5, 'kernel_size_2': 3, 'dropout_rate': 0.2479710439634352, 'fc_size': 154}. Best is trial 5 with value: 0.595.[0m


Evaluation complete.


Training - Current Epoch: 6:  60%|██████    | 6/10 [00:07<00:05,  1.40s/it]  

Epoch 5 - Train Loss: 1.4916


Training - Current Epoch: 10: 100%|██████████| 10/10 [00:15<00:00,  1.51s/it]


Epoch 10 - Train Loss: 1.1903
Training complete!



[32m[I 2026-01-30 11:08:53,437][0m Trial 6 finished with value: 0.556 and parameters: {'n_layer': 3, 'n_filter_0': 125, 'n_filter_1': 42, 'n_filter_2': 95, 'kernel_size_0': 3, 'kernel_size_1': 3, 'kernel_size_2': 3, 'dropout_rate': 0.4468695081428332, 'fc_size': 170}. Best is trial 5 with value: 0.595.[0m


Evaluation complete.


Training - Current Epoch: 6:  60%|██████    | 6/10 [00:13<00:09,  2.46s/it]  

Epoch 5 - Train Loss: 1.2153


Training - Current Epoch: 10: 100%|██████████| 10/10 [00:26<00:00,  2.67s/it]


Epoch 10 - Train Loss: 0.7674
Training complete!



[32m[I 2026-01-30 11:09:21,238][0m Trial 7 finished with value: 0.616 and parameters: {'n_layer': 2, 'n_filter_0': 97, 'n_filter_1': 94, 'kernel_size_0': 3, 'kernel_size_1': 5, 'dropout_rate': 0.2307979761624401, 'fc_size': 219}. Best is trial 7 with value: 0.616.[0m


Evaluation complete.


Training - Current Epoch: 6:  60%|██████    | 6/10 [00:09<00:06,  1.70s/it]  

Epoch 5 - Train Loss: 1.2993


Training - Current Epoch: 10: 100%|██████████| 10/10 [00:18<00:00,  1.85s/it]


Epoch 10 - Train Loss: 0.9408
Training complete!



[32m[I 2026-01-30 11:09:40,799][0m Trial 8 finished with value: 0.621 and parameters: {'n_layer': 2, 'n_filter_0': 84, 'n_filter_1': 35, 'kernel_size_0': 5, 'kernel_size_1': 5, 'dropout_rate': 0.24824192835781644, 'fc_size': 232}. Best is trial 8 with value: 0.621.[0m


Evaluation complete.


Training - Current Epoch: 6:  60%|██████    | 6/10 [00:07<00:05,  1.36s/it]  

Epoch 5 - Train Loss: 1.3074


Training - Current Epoch: 10: 100%|██████████| 10/10 [00:14<00:00,  1.49s/it]


Epoch 10 - Train Loss: 0.9993
Training complete!



[32m[I 2026-01-30 11:09:56,628][0m Trial 9 finished with value: 0.5655 and parameters: {'n_layer': 2, 'n_filter_0': 116, 'n_filter_1': 42, 'kernel_size_0': 3, 'kernel_size_1': 3, 'dropout_rate': 0.21165355874796218, 'fc_size': 171}. Best is trial 8 with value: 0.621.[0m


Evaluation complete.


Training - Current Epoch: 6:  60%|██████    | 6/10 [00:07<00:05,  1.36s/it]  

Epoch 5 - Train Loss: 1.0524


Training - Current Epoch: 10: 100%|██████████| 10/10 [00:14<00:00,  1.49s/it]


Epoch 10 - Train Loss: 0.6011
Training complete!



[32m[I 2026-01-30 11:10:12,340][0m Trial 10 finished with value: 0.6015 and parameters: {'n_layer': 1, 'n_filter_0': 71, 'kernel_size_0': 5, 'dropout_rate': 0.11510774472990823, 'fc_size': 118}. Best is trial 8 with value: 0.621.[0m


Evaluation complete.


Training - Current Epoch: 6:  60%|██████    | 6/10 [00:16<00:11,  2.99s/it]  

Epoch 5 - Train Loss: 1.0675


Training - Current Epoch: 10: 100%|██████████| 10/10 [00:32<00:00,  3.26s/it]


Epoch 10 - Train Loss: 0.5488
Training complete!



[32m[I 2026-01-30 11:10:45,833][0m Trial 11 finished with value: 0.6195 and parameters: {'n_layer': 2, 'n_filter_0': 95, 'n_filter_1': 116, 'kernel_size_0': 5, 'kernel_size_1': 5, 'dropout_rate': 0.15041115624979712, 'fc_size': 243}. Best is trial 8 with value: 0.621.[0m


Evaluation complete.


Training - Current Epoch: 6:  60%|██████    | 6/10 [00:16<00:11,  2.95s/it]  

Epoch 5 - Train Loss: 1.0821


Training - Current Epoch: 10: 100%|██████████| 10/10 [00:32<00:00,  3.21s/it]


Epoch 10 - Train Loss: 0.5347
Training complete!



[32m[I 2026-01-30 11:11:19,046][0m Trial 12 finished with value: 0.6255 and parameters: {'n_layer': 2, 'n_filter_0': 90, 'n_filter_1': 123, 'kernel_size_0': 5, 'kernel_size_1': 5, 'dropout_rate': 0.12530665372694055, 'fc_size': 251}. Best is trial 12 with value: 0.6255.[0m


Evaluation complete.


Training - Current Epoch: 6:  60%|██████    | 6/10 [00:14<00:10,  2.69s/it]  

Epoch 5 - Train Loss: 1.1573


Training - Current Epoch: 10: 100%|██████████| 10/10 [00:29<00:00,  2.91s/it]


Epoch 10 - Train Loss: 0.7387
Training complete!



[32m[I 2026-01-30 11:11:49,091][0m Trial 13 finished with value: 0.619 and parameters: {'n_layer': 2, 'n_filter_0': 77, 'n_filter_1': 124, 'kernel_size_0': 5, 'kernel_size_1': 5, 'dropout_rate': 0.35688679734193873, 'fc_size': 219}. Best is trial 12 with value: 0.6255.[0m


Evaluation complete.


Training - Current Epoch: 6:  60%|██████    | 6/10 [00:08<00:06,  1.54s/it]  

Epoch 5 - Train Loss: 1.2605


Training - Current Epoch: 10: 100%|██████████| 10/10 [00:16<00:00,  1.69s/it]


Epoch 10 - Train Loss: 0.8807
Training complete!



[32m[I 2026-01-30 11:12:07,064][0m Trial 14 finished with value: 0.5985 and parameters: {'n_layer': 2, 'n_filter_0': 59, 'n_filter_1': 73, 'kernel_size_0': 5, 'kernel_size_1': 5, 'dropout_rate': 0.3560107539753389, 'fc_size': 225}. Best is trial 12 with value: 0.6255.[0m


Evaluation complete.


Training - Current Epoch: 6:  60%|██████    | 6/10 [00:06<00:04,  1.19s/it]  

Epoch 5 - Train Loss: 1.1487


Training - Current Epoch: 10: 100%|██████████| 10/10 [00:13<00:00,  1.30s/it]


Epoch 10 - Train Loss: 0.7444
Training complete!



[32m[I 2026-01-30 11:12:20,997][0m Trial 15 finished with value: 0.544 and parameters: {'n_layer': 1, 'n_filter_0': 30, 'kernel_size_0': 5, 'dropout_rate': 0.1736941549425369, 'fc_size': 239}. Best is trial 12 with value: 0.6255.[0m


Evaluation complete.


Training - Current Epoch: 6:  60%|██████    | 6/10 [00:17<00:13,  3.27s/it]  

Epoch 5 - Train Loss: 1.1342


Training - Current Epoch: 10: 100%|██████████| 10/10 [00:35<00:00,  3.56s/it]


Epoch 10 - Train Loss: 0.6555
Training complete!



[32m[I 2026-01-30 11:12:57,587][0m Trial 16 finished with value: 0.5995 and parameters: {'n_layer': 2, 'n_filter_0': 103, 'n_filter_1': 102, 'kernel_size_0': 5, 'kernel_size_1': 5, 'dropout_rate': 0.10958275363323897, 'fc_size': 201}. Best is trial 12 with value: 0.6255.[0m


Evaluation complete.


Training - Current Epoch: 6:  60%|██████    | 6/10 [00:07<00:05,  1.36s/it]  

Epoch 5 - Train Loss: 1.1865


Training - Current Epoch: 10: 100%|██████████| 10/10 [00:14<00:00,  1.48s/it]


Epoch 10 - Train Loss: 0.8291
Training complete!



[32m[I 2026-01-30 11:13:13,222][0m Trial 17 finished with value: 0.594 and parameters: {'n_layer': 1, 'n_filter_0': 80, 'kernel_size_0': 5, 'dropout_rate': 0.3363314813708626, 'fc_size': 126}. Best is trial 12 with value: 0.6255.[0m


Evaluation complete.


Training - Current Epoch: 6:  60%|██████    | 6/10 [00:14<00:10,  2.62s/it]  

Epoch 5 - Train Loss: 1.3448


Training - Current Epoch: 10: 100%|██████████| 10/10 [00:28<00:00,  2.85s/it]


Epoch 10 - Train Loss: 0.9055
Training complete!



[32m[I 2026-01-30 11:13:42,647][0m Trial 18 finished with value: 0.593 and parameters: {'n_layer': 3, 'n_filter_0': 106, 'n_filter_1': 62, 'n_filter_2': 128, 'kernel_size_0': 5, 'kernel_size_1': 5, 'kernel_size_2': 5, 'dropout_rate': 0.17688786100126325, 'fc_size': 206}. Best is trial 12 with value: 0.6255.[0m


Evaluation complete.


Training - Current Epoch: 6:  60%|██████    | 6/10 [00:07<00:05,  1.39s/it]  

Epoch 5 - Train Loss: 1.4231


Training - Current Epoch: 10: 100%|██████████| 10/10 [00:14<00:00,  1.49s/it]


Epoch 10 - Train Loss: 1.1666
Training complete!



[32m[I 2026-01-30 11:13:58,399][0m Trial 19 finished with value: 0.563 and parameters: {'n_layer': 2, 'n_filter_0': 40, 'n_filter_1': 23, 'kernel_size_0': 5, 'kernel_size_1': 5, 'dropout_rate': 0.411496977086487, 'fc_size': 252}. Best is trial 12 with value: 0.6255.[0m


Evaluation complete.


### Analyzing the Results
After the optimization process is complete, you can analyze the results to understand which hyperparameters yielded the best performance.
The `study` object contains a wealth of information about the trials, including the hyperparameters sampled, the corresponding performance metrics, and the best trial.

You can access the full DataFrame of trials using the `trials_dataframe()` method, which provides a comprehensive overview of all the trials conducted during the optimization process.
This DataFrame includes columns for the trial number, hyperparameters, and the objective value (in our case, the accuracy).

To access the best hyperparameters and the best trial, you can use the `best_trial` attributes of the study object.

**Note:** these results may change every time you re-run the training study.

In [10]:
#Extract the dataframe with the result 
df = study.trials_dataframe()
df

Unnamed: 0,number,value,datetime_start,datetime_complete,duration,params_dropout_rate,params_fc_size,params_kernel_size_0,params_kernel_size_1,params_kernel_size_2,params_n_filter_0,params_n_filter_1,params_n_filter_2,params_n_layer,state
0,0,0.5745,2026-01-30 11:06:46.718465,2026-01-30 11:07:03.158754,0 days 00:00:16.440289,0.255965,187,3,,,120,,,1,COMPLETE
1,1,0.4955,2026-01-30 11:07:03.159221,2026-01-30 11:07:15.534493,0 days 00:00:12.375272,0.296383,74,3,3.0,5.0,20,36.0,37.0,3,COMPLETE
2,2,0.57,2026-01-30 11:07:15.534888,2026-01-30 11:07:31.201139,0 days 00:00:15.666251,0.446501,255,5,5.0,,54,16.0,,2,COMPLETE
3,3,0.58,2026-01-30 11:07:31.201581,2026-01-30 11:07:47.829249,0 days 00:00:16.627668,0.268576,188,3,3.0,5.0,56,55.0,82.0,3,COMPLETE
4,4,0.589,2026-01-30 11:07:47.829597,2026-01-30 11:08:03.606687,0 days 00:00:15.777090,0.263554,165,5,,,89,,,1,COMPLETE
5,5,0.595,2026-01-30 11:08:03.607125,2026-01-30 11:08:37.363217,0 days 00:00:33.756092,0.247971,154,5,5.0,3.0,118,84.0,86.0,3,COMPLETE
6,6,0.556,2026-01-30 11:08:37.363562,2026-01-30 11:08:53.436992,0 days 00:00:16.073430,0.44687,170,3,3.0,3.0,125,42.0,95.0,3,COMPLETE
7,7,0.616,2026-01-30 11:08:53.437292,2026-01-30 11:09:21.238821,0 days 00:00:27.801529,0.230798,219,3,5.0,,97,94.0,,2,COMPLETE
8,8,0.621,2026-01-30 11:09:21.239173,2026-01-30 11:09:40.799672,0 days 00:00:19.560499,0.248242,232,5,5.0,,84,35.0,,2,COMPLETE
9,9,0.5655,2026-01-30 11:09:40.799951,2026-01-30 11:09:56.628444,0 days 00:00:15.828493,0.211654,171,3,3.0,,116,42.0,,2,COMPLETE


In [12]:
#Extract and print the best trial
best_trial = study.best_trial
print("Best Trial:")
print(f"Value Accuracy: {best_trial.value:.4f}")

print("Hyperpaameters:")
print(best_trial.params)

Best Trial:
Value Accuracy: 0.6255
Hyperpaameters:
{'n_layer': 2, 'n_filter_0': 90, 'n_filter_1': 123, 'kernel_size_0': 5, 'kernel_size_1': 5, 'dropout_rate': 0.12530665372694055, 'fc_size': 251}


## Visualizing the Results

Optuna provides several built-in visualization functions to help analyze the results of the hyperparameter optimization process.
These visualizations can provide valuable insights into the optimization process and the impact of different hyperparameters on the model's performance:
- `plot_optimization_history`: This plot shows the optimization history of the objective function, allowing you to see how the performance of the model improved over time. It provides a visual representation of the objective values (in this case, accuracy) across different trials.
 
- `plot_param_importances`: This plot shows the importance of each hyperparameter in the optimization process. It helps identify which hyperparameters had the most significant impact on the model's performance, allowing you to focus on the most influential hyperparameters in future experiments.

- `plot_parallel_coordinate`: This plot visualizes the relationship between different hyperparameters and the objective function. It allows you to see how different hyperparameter configurations affected the model's performance, providing insights into the interactions between hyperparameters and their impact on the objective value.

In [None]:
#Plotting the optimization history
optuna.visualization.matplotlib.plot_optimization_history(study)
plt.title("Optimization History")
plt.show()

#Importance of hyperparameters
optuna.visualization.matplotlib.plot_param_importances(study)
plt.show()

ax = optuna.visualization.matplotlib.plot_parallel_coordinate(
    study, params=['n_layers', 'n_filters_0', 'kernel_size_0', 'dropout_rate', 'fc_size']
)

fig = ax.figure
fig.set_size_inches(12, 6, forward=True)
fig.tight_layout()