### FlexCNN – A Configurable Convolutional Neural Network

The FlexCNN class defines a customizable CNN architecture that can be adjusted through hyperparameters such as the number of layers, filter sizes, kernel sizes, dropout rate, and fully connected layer size.

The model is composed of two main parts:

1. Feature extractor – built dynamically in the constructor (`__init__`) from `n_layers` convolutional blocks. Each block applies a convolutional layer, a ReLU activation, and 2×2 max pooling, so every block halves the spatial resolution while the network learns increasingly abstract spatial features.

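2. Classifier – created automatically during the first forward pass, once the flattened feature-map size is known. It consists of dropout for regularization, one hidden fully connected layer, a second dropout, and an output layer for classification (120 classes, hard-coded in this implementation). Because these layers do not exist until the first forward pass, an optimizer built from `model.parameters()` before that pass will not include them.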

This design lets the CNN adapt its structure to different configurations without manually redefining the architecture each time, which makes it convenient for hyperparameter tuning and model comparison experiments.
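
To see why the classifier's input size depends on the configuration, the arithmetic can be sketched in plain Python. This is an illustrative helper, not part of the notebook: `conv_output_size` and `flattened_size` are hypothetical names, and the sketch assumes square inputs, odd kernel sizes with padding `(k - 1) // 2`, stride-1 convolutions, and 2×2 max pooling with stride 2, as in the blocks described above.

```python
def conv_output_size(size, kernel_size, padding, stride=1):
    """Spatial size after a convolution (standard formula, dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def flattened_size(input_size, n_filters, kernel_sizes):
    """Number of features entering the classifier for a square input."""
    size = input_size
    for kernel_size in kernel_sizes:
        padding = (kernel_size - 1) // 2
        size = conv_output_size(size, kernel_size, padding)
        size = size // 2  # MaxPool2d(kernel_size=2, stride=2) floors odd sizes
    return n_filters[-1] * size * size


# A 64x64 input (an assumed size) through three blocks: 64 -> 32 -> 16 -> 8
print(flattened_size(64, [32, 64, 128], [3, 5, 3]))  # 128 * 8 * 8 = 8192
```

Changing the number of layers or the last filter count changes this value, which is why the first linear layer cannot be sized until the input resolution and configuration are both known.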

In [None]:
class FlexCNN(nn.Module):
    """
    A flexible Convolutional Neural Network with a dynamically created classifier.

    This CNN's architecture is defined by the provided hyperparameters,
    allowing for a variable number of convolutional layers. The classifier
    (fully connected layers) is constructed during the first forward pass
    to adapt to the output size of the convolutional feature extractor.
    """

    def __init__(self, n_layers, n_filters, kernel_sizes, dropout_rate, fc_size):
        """
        Initialize the feature extraction portion of the network.

        Args:
            n_layers (int): Number of convolutional blocks to build.
            n_filters (list[int]): Output channels for each block.
            kernel_sizes (list[int]): Kernel sizes for the convolutions.
            dropout_rate (float): Dropout probability used in the classifier.
            fc_size (int): Hidden size of the fully connected layer.

        Returns:
            None
        """
        super(FlexCNN, self).__init__()

        # Initialize an empty list to hold the convolutional blocks
        blocks = []
        # Set the initial number of input channels for RGB images
        in_channels = 3

        # Loop to construct each convolutional block
        for i in range(n_layers):

            # Get the parameters for the current convolutional layer
            out_channels = n_filters[i]
            kernel_size = kernel_sizes[i]
            # Calculate padding that preserves the spatial dimensions
            # ('same' padding; exact for odd kernel sizes with stride 1)
            padding = (kernel_size - 1) // 2

            # Define a block as a sequence of Conv, ReLU, and MaxPool layers
            block = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size, padding=padding),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2, stride=2)
            )
            
            # Add the newly created block to the list
            blocks.append(block)

            # Update the number of input channels for the next block
            in_channels = out_channels

        # Combine all blocks into a single feature extractor module
        self.features = nn.Sequential(*blocks)

        # Store hyperparameters needed for building the classifier later
        self.dropout_rate = dropout_rate
        self.fc_size = fc_size

        # The classifier will be initialized dynamically in the forward pass
        self.classifier = None

    def _create_classifier(self, flattened_size, device):
        """
        Dynamically create and initialize the classifier head.

        Args:
            flattened_size (int): Number of input features for the first linear layer.
            device (torch.device): Device to move the classifier to.

        Returns:
            None
        """
        # Define the classifier's architecture
        self.classifier = nn.Sequential(
            nn.Dropout(self.dropout_rate),
            nn.Linear(flattened_size, self.fc_size),
            nn.ReLU(inplace=True),
            nn.Dropout(self.dropout_rate),
            nn.Linear(self.fc_size, 120)  # Assumes 120 output classes
        ).to(device)

    def forward(self, x):
        """
        Define the forward pass of the model.

        Args:
            x (torch.Tensor): Input tensor of shape ``(batch_size, channels, height, width)``.

        Returns:
            torch.Tensor: Logits for each dog breed.
        """
        # Get the device of the input tensor to ensure consistency
        device = x.device

        # Pass the input through the feature extraction layers
        x = self.features(x)

        # Flatten the feature map to prepare it for the fully connected layers
        flattened = torch.flatten(x, 1)
        flattened_size = flattened.size(1)

        # If the classifier has not been created yet, initialize it
        if self.classifier is None:
            self._create_classifier(flattened_size, device)

        # Pass the flattened features through the classifier to get the final output
        return self.classifier(flattened)
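
One consequence of building the classifier lazily is that its parameters are not registered until the first forward pass. The sketch below illustrates this with a tiny stand-in module (`LazyHeadNet` is a hypothetical name, and the 32×32 input size is an assumption chosen only for illustration): the parameter count grows after a dummy forward pass, so an optimizer should be created afterwards.

```python
import torch
import torch.nn as nn


class LazyHeadNet(nn.Module):
    """Tiny stand-in (hypothetical) that builds its head on first forward."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.classifier = None  # created lazily, as in FlexCNN

    def forward(self, x):
        flattened = torch.flatten(self.features(x), 1)
        if self.classifier is None:
            self.classifier = nn.Linear(flattened.size(1), 120).to(x.device)
        return self.classifier(flattened)


model = LazyHeadNet()
params_before = sum(p.numel() for p in model.parameters())

# One dummy forward pass materializes the classifier.
# The 32x32 input size is an assumption for illustration.
with torch.no_grad():
    model(torch.zeros(1, 3, 32, 32))

params_after = sum(p.numel() for p in model.parameters())
print(params_before, params_after)

# Build the optimizer only now, so it tracks the classifier weights too.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```

PyTorch also provides `nn.LazyLinear`, which infers its input size on first use; it has the same requirement that a forward pass runs before the parameters are handed to an optimizer.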



### Optuna Objective Function

The objective() function defines the core of the Optuna optimization process.
For each trial, Optuna samples a new combination of hyperparameters, builds and trains a CNN model using those values, and then evaluates its performance on the validation set.

Specifically, this function:

- Samples key hyperparameters (number of layers, filters per layer, kernel sizes, dropout rate, and fully connected layer size) through the `trial` object.
- Constructs a `FlexCNN` model with the sampled configuration and trains it for a fixed number of epochs using the Adam optimizer and cross-entropy loss.
- Evaluates the model's validation accuracy after training and returns it to Optuna.

Optuna then uses this returned accuracy to guide its search, iteratively refining hyperparameter combinations to find the configuration that yields the highest validation performance.

In [1]:
def objective(trial, device):
    """
    Run a single Optuna trial that trains ``FlexCNN`` and reports validation accuracy.

    Args:
        trial (optuna.trial.Trial): Current trial used to suggest hyperparameters.
        device (torch.device): Device on which the model and tensors should live.

    Returns:
        float: Final validation accuracy achieved by the sampled configuration.
    """
    n_layers = trial.suggest_int('n_layers', 1, 3)
    n_filters = [
        trial.suggest_int(f'n_filters_{i}', 16, 128)
        for i in range(n_layers)
    ]
    kernel_sizes = [
        trial.suggest_categorical(f'kernel_size_{i}', [3, 5])
        for i in range(n_layers)
    ]

    dropout_rate = trial.suggest_float('dropout_rate', 0.1, 0.5)
    fc_size = trial.suggest_int('fc_size', 64, 256)

    model = FlexCNN(
        n_layers=n_layers,
        n_filters=n_filters,
        kernel_sizes=kernel_sizes,
        dropout_rate=dropout_rate,
        fc_size=fc_size,
    ).to(device)

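    train_loader = trainloader_with_aug
    val_loader = validationloader

    # FlexCNN builds its classifier lazily, so run one forward pass first;
    # otherwise the optimizer below would not see the classifier's parameters.
    # Assumes the loader yields (images, labels) batches.
    with torch.no_grad():
        sample_images, _ = next(iter(train_loader))
        model(sample_images.to(device))

    learning_rate = 0.001
    loss_fcn = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    writer = SummaryWriter(log_dir=f"runs/optuna/trial_{trial.number}")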

    n_epochs = 10
    helper_utils.train_model(
        model=model,
        optimizer=optimizer,
        train_dataloader=train_loader,
        n_epochs=n_epochs,
        loss_fcn=loss_fcn,
        device=device,
        writer=writer,
    )

    _, accuracy = helper_utils.validate_epoch(
        model=model,
        dataloader=val_loader,
        loss_fcn=loss_fcn,
        device=device,
    )
    writer.add_hparams(
        {
            "n_layers": n_layers,
            "dropout_rate": dropout_rate,
            "fc_size": fc_size,
            "learning_rate": learning_rate,
            **{f"n_filters_{i}": n_filters[i] if i < len(n_filters) else 0 for i in range(3)},
            **{f"kernel_size_{i}": kernel_sizes[i] if i < len(kernel_sizes) else 0 for i in range(3)},
        },
        {
            "val_accuracy": accuracy,
        },
    )
    writer.close()

    return accuracy
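
The `add_hparams` call above pads the per-layer values out to three slots, presumably so that every trial logs the same set of keys regardless of how many layers it sampled. That padding can be sketched on its own as follows; `pad_per_layer` and `MAX_LAYERS` are hypothetical names introduced here, not part of the notebook.

```python
MAX_LAYERS = 3  # upper bound of the n_layers search range


def pad_per_layer(prefix, values, max_layers=MAX_LAYERS, fill=0):
    """Return {prefix_i: value} for every slot, filling unused layers with `fill`."""
    return {
        f"{prefix}_{i}": values[i] if i < len(values) else fill
        for i in range(max_layers)
    }


# Example for a trial that sampled two layers.
hparams = {
    "n_layers": 2,
    **pad_per_layer("n_filters", [32, 64]),
    **pad_per_layer("kernel_size", [3, 5]),
}
print(hparams)
```

A fill value of 0 marks a layer that does not exist in that trial, matching the `else 0` branches in the dictionary comprehensions above.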



In [None]:
# # Create a study object and optimize the objective function
# study = optuna.create_study(direction='maximize')

# # Start the optimization process
# n_trials = 20
# study.optimize(lambda trial: objective(trial, device), n_trials=n_trials)