## Optuna

- How do we know the right architecture, e.g. the number of hidden layers or the number of nodes per layer?
- How do we select the learning rate, batch size, number of epochs and dropout rate?

Solution:
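- Experimentation: try many combinations. Optuna helps here. To search for good hyperparameters it uses Bayesian optimization (its default sampler is the Tree-structured Parzen Estimator, TPE), which uses the results of earlier trials to choose the next values to try.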

Things we are going to tune:
1. Number of hidden layers.
2. Neurons per layer.
3. Number of epochs.
4. Optimizer.
5. Learning rate.
6. Batch size.
7. Dropout rate.
8. Weight decay (lambda).

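Structure of an Optuna workflow:
- Objective function:
  - Define the search space.
  - Model initialization.
  - Training parameter initialization.
  - Training loop.
  - Evaluation loop.

- Study object: runs the objective once per trial and keeps track of the best result.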

In [1]:
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt

In [None]:
torch.manual_seed(42)

<torch._C.Generator at 0x7b10c805eb30>

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

Using device: cuda


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
file_location_train = "/content/drive/MyDrive/PyTorch/Dataset/fashion-mnist_train.csv"
file_location_test = "/content/drive/MyDrive/PyTorch/Dataset/fashion-mnist_test.csv"

In [None]:
train_data = pd.read_csv(file_location_train)
test_data = pd.read_csv(file_location_test)

In [None]:
X_train = train_data.iloc[: , 1 : ].values
y_train = train_data.iloc[:, 0].values
X_test = test_data.iloc[ : , 1 : ].values
y_test = test_data.iloc[ : , 0].values

In [None]:
print(f"Shape of X_train, y_train, X_test, y_test: {X_train.shape, y_train.shape, X_test.shape, y_test.shape}")

Shape of X_train, y_train, X_test, y_test: ((60000, 784), (60000,), (10000, 784), (10000,))


In [None]:
X_train = X_train/255.0
X_test = X_test/255.0

**First, we optimize only two hyperparameters:**
- Number of hidden layers.
- Number of nodes per layer.

In [None]:
class CustomDataset(Dataset):

  def __init__(self, feature, label):
    self.features = feature
    self.labels = label

  def __len__(self):
    return self.features.shape[0]

  def __getitem__(self, index):
    return self.features[index], self.labels[index]
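`CustomDataset` hands NumPy rows to the `DataLoader`, whose default collate function turns each batch into a `float64` tensor that `forward` then casts with `x.float()`. A possible alternative, sketched below, is to convert the arrays to `float32`/`int64` tensors once and use PyTorch's built-in `TensorDataset`; the random arrays are placeholders for `X_train` and `y_train`.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder arrays with the same layout as the Fashion-MNIST CSV columns
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(100, 784)) / 255.0   # float64, scaled to [0, 1]
y = rng.integers(0, 10, size=100)                   # integer class labels

# Convert once: float32 matches nn.Linear's default weight dtype, and
# int64 class indices are what nn.CrossEntropyLoss expects
features = torch.tensor(X, dtype=torch.float32)
labels = torch.tensor(y, dtype=torch.long)

dataset = TensorDataset(features, labels)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

batch_features, batch_labels = next(iter(loader))
print(batch_features.shape, batch_features.dtype)
print(batch_labels.shape, batch_labels.dtype)
```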

In [None]:
train_dataset = CustomDataset(X_train, y_train)
test_dataset = CustomDataset(X_test, y_test)

In [None]:
train_loader = DataLoader(train_dataset, batch_size = 32, shuffle = True, pin_memory = True)
test_loader = DataLoader(test_dataset, batch_size = 32, shuffle = False, pin_memory = True)   # no need to shuffle for evaluation

In [None]:
print(train_dataset.features.shape[1])   # number of input features (28 x 28 pixels)

784


In [None]:
class MyNN(nn.Module):

  def __init__(self, input_dim, output_dim, num_hidden_layers, neurons_per_layer):
    super().__init__()

    layers = []       # Collects the layers; how many depends on num_hidden_layers.

    for i in range(num_hidden_layers):

      layers.append(nn.Linear(input_dim, neurons_per_layer))
      layers.append(nn.BatchNorm1d(neurons_per_layer))
      layers.append(nn.ReLU())
      layers.append(nn.Dropout(0.3))
      input_dim = neurons_per_layer                            # Drawback: every hidden layer gets the same number of neurons.

    layers.append(nn.Linear(neurons_per_layer, output_dim))

    self.model = nn.Sequential(*layers)                       # '*' unpacks the list, passing each layer to nn.Sequential as a separate argument.

  def forward(self, x):
    return self.model(x.float())
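Before handing the class to Optuna, a quick forward pass can confirm that every sampled architecture produces one logit per class. The sketch below copies `MyNN` so it runs on its own and checks a few `(num_hidden_layers, neurons_per_layer)` pairs from the search space; the batch of 4 random inputs is an assumption for illustration.

```python
import torch
import torch.nn as nn


class MyNN(nn.Module):
    # Same architecture as above: identical hidden blocks, then an output layer
    def __init__(self, input_dim, output_dim, num_hidden_layers, neurons_per_layer):
        super().__init__()
        layers = []
        for _ in range(num_hidden_layers):
            layers.append(nn.Linear(input_dim, neurons_per_layer))
            layers.append(nn.BatchNorm1d(neurons_per_layer))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(0.3))
            input_dim = neurons_per_layer
        layers.append(nn.Linear(neurons_per_layer, output_dim))
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x.float())


# BatchNorm1d needs more than one sample per batch in training mode
x = torch.rand(4, 784)
results = {}
for num_hidden_layers, neurons_per_layer in [(1, 8), (3, 64), (5, 128)]:
    model = MyNN(784, 10, num_hidden_layers, neurons_per_layer)
    out = model(x)
    # Expect one Linear per hidden layer plus the output layer
    n_linear = sum(isinstance(m, nn.Linear) for m in model.modules())
    results[(num_hidden_layers, neurons_per_layer)] = (tuple(out.shape), n_linear)
    print(num_hidden_layers, neurons_per_layer, tuple(out.shape), n_linear)
```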

**Objective function**

In [None]:
def objective(trial):

  # 1. Sample the next hyperparameter values from the search space.
  num_hidden_layers = trial.suggest_int("num_hidden_layers", 1, 5)
  neurons_per_layer = trial.suggest_int("neurons_per_layer", 8, 128, step = 8)

  # 2. Model init
  input_dim = train_dataset.features.shape[1]
  output_dim = 10

  model = MyNN(input_dim, output_dim, num_hidden_layers, neurons_per_layer)
  model.to(device)

  # 3. Training parameter init
  learning_rate = 0.1
  epochs = 50

  # 4. Loss and optimizer selection
  criterion = nn.CrossEntropyLoss()
  optimizer = optim.SGD(model.parameters(), lr=learning_rate, weight_decay=1e-4)

  # 5. Training loop
  for epoch in range(epochs):

    for batch_features, batch_labels in train_loader:

      # move data to the selected device
      batch_features, batch_labels = batch_features.to(device), batch_labels.to(device)

      # forward pass
      outputs = model(batch_features)

      # calculate loss
      loss = criterion(outputs, batch_labels)

      # backward pass: clear old gradients, then compute new ones
      optimizer.zero_grad()
      loss.backward()

      # update the weights
      optimizer.step()

  # 6. Evaluation code

  model.eval()

  total = 0
  correct = 0

  with torch.no_grad():

    for batch_features, batch_labels in test_loader:

      # move data to the selected device
      batch_features, batch_labels = batch_features.to(device), batch_labels.to(device)

      outputs = model(batch_features)

      _, predicted = torch.max(outputs, 1)

      total = total + batch_labels.shape[0]

      correct = correct + (predicted == batch_labels).sum().item()

    accuracy = correct/total

  return accuracy
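The objective scores every trial on the test set, so the test accuracy of the chosen configuration tends to be optimistic. One common alternative, sketched here under the assumption that 20% of the training data is held out, is to split off a validation set with `torch.utils.data.random_split` and score trials on it; the hypothetical `evaluate` helper wraps the evaluation loop above. The random dataset and linear model are placeholders for Fashion-MNIST and `MyNN`.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, random_split


def make_train_val_loaders(dataset, val_fraction=0.2, batch_size=32, seed=42):
    # Hold out a fraction of the training data for scoring trials
    n_val = int(len(dataset) * val_fraction)
    n_train = len(dataset) - n_val
    generator = torch.Generator().manual_seed(seed)  # same split on every call
    train_subset, val_subset = random_split(dataset, [n_train, n_val], generator=generator)
    train_loader = DataLoader(train_subset, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_subset, batch_size=batch_size, shuffle=False)
    return train_loader, val_loader


def evaluate(model, loader, device):
    # Same logic as the evaluation loop in the objective, wrapped in a function
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for features, labels in loader:
            features, labels = features.to(device), labels.to(device)
            predicted = model(features).argmax(dim=1)
            correct += (predicted == labels).sum().item()
            total += labels.shape[0]
    return correct / total


# Placeholder data and model
dataset = TensorDataset(torch.rand(100, 784), torch.randint(0, 10, (100,)))
train_loader, val_loader = make_train_val_loaders(dataset)
model = nn.Linear(784, 10)
val_accuracy = evaluate(model, val_loader, torch.device("cpu"))
print(len(train_loader.dataset), len(val_loader.dataset), val_accuracy)
```

Inside `objective`, the trial's score would then be `evaluate(model, val_loader, device)`, leaving `test_loader` for a single final check of the chosen model.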

In [None]:
!pip install optuna

Collecting optuna
  Downloading optuna-4.3.0-py3-none-any.whl.metadata (17 kB)
Collecting alembic>=1.5.0 (from optuna)
  Downloading alembic-1.16.1-py3-none-any.whl.metadata (7.3 kB)
Collecting colorlog (from optuna)
  Downloading colorlog-6.9.0-py3-none-any.whl.metadata (10 kB)
Downloading optuna-4.3.0-py3-none-any.whl (386 kB)
Downloading alembic-1.16.1-py3-none-any.whl (242 kB)
Downloading colorlog-6.9.0-py3-none-any.whl (11 kB)
Installing collected packages: colorlog, alembic, optuna
Successfully installed alembic-1.16.1 colorlog-6.9.0 optuna-4.3.0


In [None]:
import optuna

study = optuna.create_study(direction = 'maximize')

[I 2025-06-06 04:59:25,263] A new study created in memory with name: no-name-2369f94d-26b5-4f5d-84dd-3563f9de5b86


In [None]:
study.optimize(objective, n_trials = 10)

[I 2025-06-06 05:10:51,989] Trial 2 finished with value: 0.8908 and parameters: {'num_hidden_layers': 1, 'neurons_per_layer': 72}. Best is trial 2 with value: 0.8908.
[I 2025-06-06 05:16:37,737] Trial 3 finished with value: 0.8865 and parameters: {'num_hidden_layers': 5, 'neurons_per_layer': 88}. Best is trial 2 with value: 0.8908.
[I 2025-06-06 05:22:22,925] Trial 4 finished with value: 0.8808 and parameters: {'num_hidden_layers': 5, 'neurons_per_layer': 80}. Best is trial 2 with value: 0.8908.
[I 2025-06-06 05:25:33,562] Trial 5 finished with value: 0.8861 and parameters: {'num_hidden_layers': 1, 'neurons_per_layer': 72}. Best is trial 2 with value: 0.8908.
[I 2025-06-06 05:29:24,107] Trial 6 finished with value: 0.8911 and parameters: {'num_hidden_layers': 2, 'neurons_per_layer': 88}. Best is trial 6 with value: 0.8911.
[I 2025-06-06 05:32:33,516] Trial 7 finished with value: 0.8848 and parameters: {'num_hidden_layers': 1, 'neurons_per_layer': 72}. Best is trial 6 with value: 0.8911

In [None]:
study.best_value

0.8973

In [None]:
study.best_params

{'num_hidden_layers': 3, 'neurons_per_layer': 120}
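Because the keys of `study.best_params` match `MyNN`'s argument names, the winning architecture can be rebuilt with `**` unpacking and then retrained before a final evaluation. The sketch below repeats the class so it runs on its own, uses the best parameters printed above, and counts the trainable parameters (weights and biases of the `Linear` and `BatchNorm1d` layers).

```python
import torch
import torch.nn as nn


class MyNN(nn.Module):
    # Copy of the first MyNN so this cell runs on its own
    def __init__(self, input_dim, output_dim, num_hidden_layers, neurons_per_layer):
        super().__init__()
        layers = []
        for _ in range(num_hidden_layers):
            layers.append(nn.Linear(input_dim, neurons_per_layer))
            layers.append(nn.BatchNorm1d(neurons_per_layer))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(0.3))
            input_dim = neurons_per_layer
        layers.append(nn.Linear(neurons_per_layer, output_dim))
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x.float())


# Values copied from the study.best_params output above
best_params = {"num_hidden_layers": 3, "neurons_per_layer": 120}

# Keys match the constructor's argument names, so ** unpacking works
best_model = MyNN(input_dim=784, output_dim=10, **best_params)

# BatchNorm running statistics are buffers, so they are not counted here
n_params = sum(p.numel() for p in best_model.parameters() if p.requires_grad)
print(n_params)  # 125170

best_model.eval()
with torch.no_grad():
    out = best_model(torch.rand(1, 784))
print(tuple(out.shape))  # (1, 10)
```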

## Now we are going to fine-tune all the other hyperparameters.

In [None]:
class MyNN(nn.Module):

  def __init__(self, input_dim, output_dim, num_hidden_layers, neurons_per_layer, dropout_rate):

    super().__init__()

    layers = []

    for i in range(num_hidden_layers):

      layers.append(nn.Linear(input_dim, neurons_per_layer))
      layers.append(nn.BatchNorm1d(neurons_per_layer))
      layers.append(nn.ReLU())
      layers.append(nn.Dropout(dropout_rate))
      input_dim = neurons_per_layer

    layers.append(nn.Linear(neurons_per_layer, output_dim))

    self.model = nn.Sequential(*layers)

  def forward(self, x):

    return self.model(x.float())

In [None]:
def objective(trial):

  # 1. Sample hyperparameters from the search space.
  num_hidden_layers = trial.suggest_int("num_hidden_layers", 1, 5)
  neurons_per_layer = trial.suggest_int("neurons_per_layer", 8, 128, step = 8)
  epochs = trial.suggest_int("epochs", 10, 50, step = 10)
  learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-1, log = True)
  dropout_rate = trial.suggest_float("dropout_rate", 0.1, 0.5, step = 0.1)
  batch_size = trial.suggest_categorical("batch_size", [16, 32, 64, 128])   # a list of choices needs suggest_categorical
  optimizer_name = trial.suggest_categorical("optimizer", ['Adam', 'SGD', 'RMSprop'])
  weight_decay = trial.suggest_float("weight_decay", 1e-5, 1e-3, log=True)

  train_loader = DataLoader(train_dataset, batch_size = batch_size, shuffle = True, pin_memory = True)
  test_loader = DataLoader(test_dataset, batch_size = batch_size, shuffle = False, pin_memory = True)

  # model init
  input_dim = 784
  output_dim = 10

  model = MyNN(input_dim, output_dim, num_hidden_layers, neurons_per_layer, dropout_rate)
  model.to(device)

  # optimizer selection
  criterion = nn.CrossEntropyLoss()

  if optimizer_name == 'Adam':
    optimizer = optim.Adam(model.parameters(), lr = learning_rate, weight_decay = weight_decay)
  elif optimizer_name == 'SGD':
    optimizer = optim.SGD(model.parameters(), lr = learning_rate, weight_decay = weight_decay)
  else:
    optimizer = optim.RMSprop(model.parameters(), lr = learning_rate, weight_decay = weight_decay)

  # training loop
  for epoch in range(epochs):

    for batch_features, batch_labels in train_loader:

      # move data to the selected device
      batch_features, batch_labels = batch_features.to(device), batch_labels.to(device)

      # forward pass
      outputs = model(batch_features)

      # calculate loss
      loss = criterion(outputs, batch_labels)

      # backward pass: clear old gradients, then compute new ones
      optimizer.zero_grad()
      loss.backward()

      # update the weights
      optimizer.step()

  # evaluation on test data
  model.eval()
  total = 0
  correct = 0

  with torch.no_grad():

    for batch_features, batch_labels in test_loader:

      # move data to the selected device
      batch_features, batch_labels = batch_features.to(device), batch_labels.to(device)

      outputs = model(batch_features)

      _, predicted = torch.max(outputs, 1)

      total = total + batch_labels.shape[0]

      correct = correct + (predicted == batch_labels).sum().item()

    accuracy = correct/total

  return accuracy
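The `if`/`elif` chain works; a more compact option, since the three category names are exactly the class names in `torch.optim`, is to look the class up with `getattr`. A minimal sketch with a hypothetical helper `build_optimizer` and a placeholder linear model:

```python
import torch.nn as nn
import torch.optim as optim


def build_optimizer(name, params, learning_rate, weight_decay):
    # "Adam", "SGD" and "RMSprop" are class names in torch.optim,
    # and all three accept lr and weight_decay keyword arguments
    optimizer_class = getattr(optim, name)
    return optimizer_class(params, lr=learning_rate, weight_decay=weight_decay)


model = nn.Linear(784, 10)
optimizers = {
    name: build_optimizer(name, model.parameters(), 1e-3, 1e-4)
    for name in ["Adam", "SGD", "RMSprop"]
}
for name, opt in optimizers.items():
    print(name, type(opt).__name__, opt.param_groups[0]["lr"])
```

`getattr` would accept any attribute of `torch.optim`, so this relies on the categorical choices being valid optimizer class names.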

In [None]:
study = optuna.create_study(direction = 'maximize')

In [None]:
study.optimize(objective, n_trials = 10)

In [None]:
study.best_value

In [None]:
study.best_params

**How can we improve the network further?**
1. Increase the number of trials.
2. Widen the hyperparameter ranges, e.g. neurons_per_layer from 8 to 200.

**How can we improve the architecture further?**
- Let each hidden layer have its own number of neurons instead of one shared neurons_per_layer.

**What we want to do is:**
- Progressive shrinking/growing of layer sizes while maintaining batch normalization and dropout consistency across layers.

In [None]:
class MyNN(nn.Module):

  def __init__(self, input_dim, output_dim, layer_neurons, dropout_rate):
    super().__init__()
    layers = []

    current_dim = input_dim

    for neurons in layer_neurons:

      layers.append(nn.Linear(current_dim, neurons))
      layers.append(nn.BatchNorm1d(neurons))
      layers.append(nn.ReLU())
      layers.append(nn.Dropout(dropout_rate))
      current_dim = neurons

    layers.append(nn.Linear(current_dim, output_dim))
    self.model = nn.Sequential(*layers)

  def forward(self, x):
    return self.model(x.float())

In [None]:
def objective(trial):

  num_hidden_layers = trial.suggest_int("num_hidden_layers", 1, 5)
  layer_neurons = [trial.suggest_int(f"neurons_layer_{i}", 8, 128, step = 8) for i in range(num_hidden_layers)]

  epochs = trial.suggest_int("epochs", 10, 50, step = 10)
  learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-1, log = True)
  dropout_rate = trial.suggest_float("dropout_rate", 0.1, 0.5, step = 0.1)
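  batch_size = trial.suggest_categorical("batch_size", [16, 32, 64, 128])
  optimizer_name = trial.suggest_categorical("optimizer", ['Adam', 'SGD', 'RMSprop'])
  weight_decay = trial.suggest_float("weight_decay", 1e-5, 1e-3, log=True)

  model = MyNN(784, 10, layer_neurons, dropout_rate).to(device)
  # ... (data loaders, optimizer selection, training and evaluation as in the previous objective)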

**What if I want Optuna to decide whether to use dropout and/or batch normalization, as well as the number of neurons?**

In [None]:
class MyNN(nn.Module):
    def __init__(self, input_dim, output_dim, layer_neurons, use_dropout, use_batchnorm, dropout_rate):
        super().__init__()
        layers = []
        current_dim = input_dim

        for i, neurons in enumerate(layer_neurons):
            layers.append(nn.Linear(current_dim, neurons))
            if use_batchnorm[i]:
                layers.append(nn.BatchNorm1d(neurons))
            layers.append(nn.ReLU())
            if use_dropout[i]:
                layers.append(nn.Dropout(dropout_rate))
            current_dim = neurons

        layers.append(nn.Linear(current_dim, output_dim))
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x.float())

In [None]:
def objective(trial):
    num_hidden_layers = trial.suggest_int("num_hidden_layers", 1, 5)
    layer_neurons = [
        trial.suggest_int(f"neurons_layer_{i}", 8, 128, step=8)
        for i in range(num_hidden_layers)
    ]
    use_dropout = [
        trial.suggest_categorical(f"use_dropout_{i}", [True, False])
        for i in range(num_hidden_layers)
    ]
    use_batchnorm = [
        trial.suggest_categorical(f"use_batchnorm_{i}", [True, False])
        for i in range(num_hidden_layers)
    ]
    dropout_rate = trial.suggest_float("dropout_rate", 0.1, 0.5, step=0.1)
    # ... (other hyperparameters as before)

    model = MyNN(
        input_dim=784,
        output_dim=10,
        layer_neurons=layer_neurons,
        use_dropout=use_dropout,
        use_batchnorm=use_batchnorm,
        dropout_rate=dropout_rate
    ).to(device)
    # ... (rest of training/evaluation code)