## Hyperparameter Tuning

The hyperparameters in deep learning to tune are 
- the number of neurons
- activation function
- optimiser
- learning rate
- batch size
- epochs 
- number of layers.


Reference: 
- https://www.analyticsvidhya.com/blog/2021/05/tuning-the-hyperparameters-and-layers-of-neural-network-deep-learning/

In [1]:
from typing import Tuple, List, Optional, Callable
import random
import os
from tqdm import tqdm
import gdown
import zipfile

import torch
import torch.optim as optim
import torch.nn as nn
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, Subset, random_split
from torchvision.models import efficientnet_b5, EfficientNet_B5_Weights

from ray import tune
from ray.tune.search.optuna import OptunaSearch
from ray.tune.schedulers import ASHAScheduler

print("Libraries imported. Using device:", "cuda" if torch.cuda.is_available() else "cpu")

Libraries imported. Using device: cpu


## Data Download
This is adapted from SeparatingData.ipynb
Download the processed dataset from Google Drive is yet to.

In [2]:
def download_dataset(data_dir: str, zip_url: str, zip_filename: str, root_dir: str) -> None:
    """
    Download and extract the dataset from Google Drive if it doesn't exist

    Args:
        data_dir (str): Directory where the dataset zip file will be stored
        zip_url (str): URL of the dataset zip file
        zip_filename (str): Name for the downloaded zip file
        root_dir (str): Directory where the dataset will be extracted
    """
    # Create the data directory if it doesn't exist.
    if not os.path.exists(root_dir):
        os.makedirs(root_dir)
        print(f"Created directory: {root_dir}")

    # Check if the dataset is already extracted.
    if not os.path.exists(data_dir):
        print(f"Downloading dataset from {zip_url} to {zip_filename}")
        gdown.download(zip_url, zip_filename, quiet=False)
        print("Extracting dataset...")
        with zipfile.ZipFile(zip_filename, 'r') as zip_ref:
            zip_ref.extractall(root_dir)
        print(f"Extraction complete. Dataset available at {data_dir}")
    else:
        print(f"Dataset already exists at {data_dir}")

In [3]:
DATA_DIR = "../input/train_images_4_class"
ZIP_URL = "https://drive.google.com/uc?id=1DiCQ52XyU40nwl5JC6B2nUBVYQrrpFl4"
ZIP_FILENAME = "../input/train_images_4_class.zip"
ROOT_DIR = "../input"

download_dataset(DATA_DIR, ZIP_URL, ZIP_FILENAME, ROOT_DIR)

Dataset already exists at ../input/train_images_4_class


## Subset Data Load
For fasting hyperparameter tuning, use a subset of the dataset to find the most optimised set of hyperparameters.
Loads dataset from processed dataset which should have been split to train, test, eval.

In [4]:
def create_tuning_data_loaders(
    dataset_root: str,
    transform: transforms.Compose,
    batch_size: int = 32,
    subset_fraction: float = 0.1,
    random_seed: int = 42
) -> Tuple[DataLoader, DataLoader, DataLoader]:
    """Creates DataLoaders for hyperparameter tuning using a subset of the dataset.

    If the dataset_root directory contains subfolders 'train', 'eval', and 'test',
    these are loaded directly. Otherwise, the full dataset is loaded and randomly split.
    In either case, a subset of each split is sampled for faster tuning.

    Note:
      The transform provided is applied to each image as it is loaded by ImageFolder.
      Even if the images in the pre-split folders have been preprocessed externally,
      it is common to store raw images and apply transforms on the fly.

    Args:
        dataset_root (str): Root directory of the dataset. This should either be the parent
            folder of the split directories or the folder containing all images.
        transform (transforms.Compose): Transformations to apply to the dataset.
        batch_size (int, optional): Batch size for DataLoaders. Defaults to 32.
        subset_fraction (float, optional): Fraction of each split to use for tuning. Defaults to 0.1.
        random_seed (int, optional): Random seed for reproducibility. Defaults to 42.

    Returns:
        Tuple[DataLoader, DataLoader, DataLoader]: DataLoaders for training, validation, and test subsets.
    """
    # Check if dataset_root contains pre-split 'train', 'eval', and 'test' subdir
    split_dirs = ['train', 'eval', 'test']
    if all(os.path.exists(os.path.join(dataset_root, sub)) for sub in split_dirs):
        train_dataset = datasets.ImageFolder(root=os.path.join(dataset_root, 'train'), transform=transform)
        print(f"Full train dataset: {len(train_dataset)}")
        val_dataset = datasets.ImageFolder(root=os.path.join(dataset_root, 'eval'), transform=transform)
        print(f"Full val dataset: {len(val_dataset)}")
        test_dataset = datasets.ImageFolder(root=os.path.join(dataset_root, 'test'), transform=transform)
        print(f"Full test dataset: {len(test_dataset)}")

    else:
        # If not pre-split, load the full dataset and split it randomly
        full_dataset = datasets.ImageFolder(root=dataset_root, transform=transform)
        total_len = len(full_dataset)
        train_len = int(0.7 * total_len)
        val_len = int(0.15 * total_len)
        test_len = total_len - train_len - val_len
        train_dataset, val_dataset, test_dataset = random_split(
            full_dataset, [train_len, val_len, test_len],
            generator=torch.Generator().manual_seed(random_seed)
        )

    def sample_subset(dataset, fraction):
        """Returns a subset of the dataset with the specified fraction of examples."""
        dataset_len = len(dataset)
        subset_size = max(1, int(fraction * dataset_len))
        indices = random.sample(range(dataset_len), subset_size)
        return Subset(dataset, indices)

    # Sample a subset from each split
    if subset_fraction < 1.0:
        subset_train_dataset = sample_subset(train_dataset, subset_fraction)
        print(f"Subset-train dataset: {len(train_dataset)}")
        subset_val_dataset = sample_subset(val_dataset, subset_fraction)
        print(f"Subset-val dataset: {len(val_dataset)}")
        subset_test_dataset = sample_subset(test_dataset, subset_fraction)
        print(f"Subset-test dataset: {len(test_dataset)}")

    subset_train_loader = DataLoader(subset_train_dataset, batch_size=batch_size, shuffle=True)
    subset_val_loader = DataLoader(subset_val_dataset, batch_size=batch_size, shuffle=False)
    subset_test_loader = DataLoader(subset_test_dataset, batch_size=batch_size, shuffle=False)

    return subset_train_loader, subset_val_loader, subset_test_loader

In [5]:
SPLIT_DATASET = os.path.abspath("../input/actual")
BATCH_SIZE = 64

# Define a transform. 
# In this notebook, we assume we doing it for EfficientNetB5, so we resize to ~456x456
weights = EfficientNet_B5_Weights.DEFAULT
transform = transforms.Compose([
        transforms.Resize((456, 456)),
        transforms.ToTensor(),
        weights.transforms()  # applies normalization as required
    ])

TRAIN_LOADER, VAL_LOADER, TEST_LOADER = create_tuning_data_loaders(
    dataset_root=SPLIT_DATASET,
    transform=transform,
    batch_size=BATCH_SIZE,
    subset_fraction=0.1,
    random_seed=42
)

print("DataLoaders for hyperparameter tuning are ready.")

Full train dataset: 8033
Full val dataset: 1719
Full test dataset: 1726
Subset-train dataset: 8033
Subset-val dataset: 1719
Subset-test dataset: 1726
DataLoaders for hyperparameter tuning are ready.


## Model Specifications
This is where you should replace with your model.

EfficientNetB5 Partial Transfer Learning:
- https://discuss.pytorch.org/t/partial-transfer-learning-efficientnet/109689 

In [6]:
class BaseEfficientNetB5(nn.Module):
    """EfficientNetB5 model for transfer learning on the dog emotion dataset
    with a configurable classification head for hyperparameter tuning, 
    i.e parameters you wish to tune need to be specified

    This model uses a pretrained EfficientNetB5 backbone and replaces its
    classifier with a multi-layer fully connected network whose architecture
    can be tuned (number of layers, neurons, and activation function)
    """
    
    def __init__(self,
                 num_classes: int = 4,
                 dropout: float = 0.2,
                 freeze_backbone: bool = False,
                 hidden_sizes: Optional[List[int]] = None,
                 activation: str = 'relu') -> None:
        """
        Args:
            num_classes (int): Number of output classes.
            dropout (float): Dropout rate to apply in the classifier.
            freeze_backbone (bool): If True, freeze the backbone layers.
            hidden_sizes (Optional[List[int]]): List of sizes for hidden layers in the classifier.
                If None, a single linear layer is used.
            activation (str): Activation function to use in the classifier ('relu', 'tanh', etc.).
        """
        super(BaseEfficientNetB5, self).__init__()
        weights = EfficientNet_B5_Weights.DEFAULT
        self.backbone = efficientnet_b5(weights=weights)
        in_features = self.backbone.classifier[1].in_features
        
        if freeze_backbone:
            for param in self.backbone.features.parameters():
                param.requires_grad = False
        
        # Build the classifier based on the provided hidden_sizes
        layers = []
        input_dim = in_features
        if hidden_sizes:
            for hidden_dim in hidden_sizes:
                layers.append(nn.Dropout(p=dropout))
                layers.append(nn.Linear(input_dim, hidden_dim))
                layers.append(self._get_activation(activation))
                input_dim = hidden_dim
            # final classification layer.
            layers.append(nn.Dropout(p=dropout))
            layers.append(nn.Linear(input_dim, num_classes))
        else:
            # single linear layer if no hidden layers specified
            layers.append(nn.Dropout(p=dropout))
            layers.append(nn.Linear(input_dim, num_classes))
        
        self.backbone.classifier[1] = nn.Sequential(*layers)
    
    def _get_activation(self, activation: str) -> Callable:
        """Returns an activation function based on the given string.

        Args:
            activation (str): Name of the activation function.

        Returns:
            Callable: Activation function module.
        """
        if activation.lower() == 'relu':
            return nn.ReLU()
        elif activation.lower() == 'tanh':
            return nn.Tanh()
        elif activation.lower() == 'sigmoid':
            return nn.Sigmoid()
        else:
            raise ValueError(f"Unsupported activation function: {activation}")
    
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.backbone(x)

In [7]:
model = BaseEfficientNetB5(num_classes=4, dropout=0.3, freeze_backbone=True, hidden_sizes=[256, 128], activation='relu')
print("Model instantiated:", model.__class__.__name__)

Model instantiated: BaseEfficientNetB5


## Hyperparameter Tuning
This is the part where you write the training function and load it to the ray tune scheduler.
For this execution, ASHAscheduler is used with Optuna for bayesian optimisation techniques - which should be using the default Tree-Structured Parzen Estimator.
If many parameters, this is would be more efficient than grid search and random search.

References:
- https://docs.ray.io/en/latest/tune/examples/includes/async_hyperband_example.html
- https://docs.ray.io/en/latest/tune/examples/tune-pytorch-cifar.html 
- https://docs.ray.io/en/latest/tune/examples/includes/mnist_pytorch.html
- https://docs.ray.io/en/latest/tune/api/suggestion.html

In [8]:
import os
import torch
import torch.nn as nn
import torch.optim as optim
from ray import tune

CHECKPOINT_DIR = os.path.abspath("../models/hyptune")
def train_model(config, checkpoint_dir=CHECKPOINT_DIR, data_dir=None):
    """Training function for Ray Tune hyperparameter tuning.

    This function instantiates the model with hyperparameters
    specified in the config dictionary, trains the model on the global TRAIN_LOADER,
    evaluates on VAL_LOADER, and reports the validation loss to Ray Tune.

    Args:
        config (dict): Hyperparameter configuration. Expected keys include:
            - lr (float): Learning rate.
            - weight_decay (float): Weight decay for the optimizer.
            - dropout (float): Dropout rate for the classifier.
            - hidden_sizes (list or None): List of hidden layer sizes in the classifier.
            - activation (str): Activation function to use ('relu', 'tanh', etc.).
            - freeze_backbone (bool): Whether to freeze the model backbone.
            - num_epochs (int): Number of training epochs.
            - optimiser (callable, optional): Optimiser class. Default is optim.Adam.
            - criterion (callable, optional): Loss function instance. Default is nn.CrossEntropyLoss().
        checkpoint_dir (str, optional): Directory for checkpointing (if applicable).
        data_dir (str, optional): Not used here; included for compatibility.
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"

    if checkpoint_dir:
        os.makedirs(checkpoint_dir, exist_ok=True)
        print(f"Checkpoint Folder exists")
    
    # instantiate model with hyperparameters from config
    model = BaseEfficientNetB5(
        num_classes=4,
        dropout=config.get("dropout", 0.2),
        freeze_backbone=config.get("freeze_backbone", True),
        hidden_sizes=config.get("hidden_sizes", None),
        activation=config.get("activation", "relu")
    ).to(device)
    
    optimiser_class = config.get("optimiser", optim.Adam)
    optimiser = optimiser_class(model.parameters(), lr=config["lr"], weight_decay=config["weight_decay"])
    criterion = config.get("criterion", nn.CrossEntropyLoss())

    num_epochs = config.get("num_epochs", 2)  # a low number for quick tuning, but update accordingly
    
    # training loop
    for epoch in range(num_epochs):
        model.train()
        running_loss = 0.0
        for inputs, targets in tqdm(TRAIN_LOADER, desc=f"Epoch {epoch+1}/{num_epochs}"):
            inputs, targets = inputs.to(device), targets.to(device)
            optimiser.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            loss.backward()
            optimiser.step()
            running_loss += loss.item() * inputs.size(0)
        
        epoch_loss = running_loss / len(TRAIN_LOADER.dataset)
        print(f"Epoch {epoch + 1}/{num_epochs}, Training Loss: {epoch_loss:.4f}")
        
        # Optionally, checkpoint the model.
        if checkpoint_dir:
            path = os.path.join(checkpoint_dir, f"checkpoint_{epoch}.pt")
            torch.save(model.state_dict(), path)
    
    # Evaluation on the validation set
    model.eval()
    total_loss = 0.0
    with torch.no_grad():
        for inputs, targets in VAL_LOADER:
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            total_loss += loss.item() * inputs.size(0)
    
    avg_val_loss = total_loss / len(VAL_LOADER.dataset)
    print(f"Validation Loss: {avg_val_loss:.4f}")
    
    # Report the metric to Ray Tune.
    tune.report({"loss": avg_val_loss})

In [None]:
asha_scheduler = ASHAScheduler(
    time_attr='training_iteration',
    metric='loss',
    mode='min',
    max_t=100,           # max training iterations per trial
    grace_period=10,     # min iterations before stopping
    reduction_factor=3,
    brackets=1,
)

optuna_search = OptunaSearch(metric="loss", mode="min", seed=42)

# define search space in a config dictionary, i.e what are the values you want to try, this is just example of format
'''
config = {
    "lr": tune.loguniform(1e-5, 1e-2),
    "weight_decay": tune.loguniform(1e-6, 1e-2),
    "dropout": tune.uniform(0.1, 0.5),
    "hidden_sizes": tune.choice([[256, 128], [512, 256], None]),
    "activation": tune.choice(["relu", "tanh"]),
    "freeze_backbone": tune.choice([True, False]),
    "num_epochs": 2, 
    "optimiser": tune.choice([optim.Adam, optim.SGD]),
    "criterion": tune.choice([nn.CrossEntropyLoss, nn.NLLLoss]),
}
'''

# this is what i specified for the example because i am running on cpu
config = {
    "lr": tune.loguniform(1e-5, 1e-2),
    "weight_decay": tune.loguniform(1e-6, 1e-2),
    "dropout": tune.uniform(0.1, 0.5),
    "freeze_backbone": tune.choice([True]),
    "num_epochs": 2,
}

# tuner object
tuner = tune.Tuner(
    tune.with_resources(train_model, {"cpu": 2, "gpu": 0}), # specify based on the device u using because by default it uses all, i.e if u have 4 cpus; it does 4 concurrent trials
    tune_config=tune.TuneConfig(
        scheduler=asha_scheduler,
        search_alg=optuna_search,
        num_samples=2,  # number of trials to run
    ),
    run_config=tune.RunConfig(verbose=1),
    param_space=config,
)

results = tuner.fit()
print("Best config:", results.get_best_result(metric="loss", mode="min").config)

0,1
Current time:,2025-04-06 04:27:56
Running for:,00:30:39.09
Memory:,9.4/18.0 GiB

Trial name,status,loc,dropout,freeze_backbone,lr,weight_decay,iter,total time (s),loss
train_model_8c6f8dad,TERMINATED,127.0.0.1:29051,0.392798,True,0.000132929,0.00635122,1,1835.63,1.36224
train_model_18671d32,TERMINATED,127.0.0.1:29056,0.162398,True,0.000625137,4.20799e-06,1,1835.87,1.30187


[36m(train_model pid=29051)[0m Checkpoint Folder exists


Epoch 1/2:   0%|          | 0/13 [00:00<?, ?it/s]
Epoch 1/2:   8%|▊         | 1/13 [01:08<13:39, 68.32s/it]
Epoch 1/2:   0%|          | 0/13 [00:00<?, ?it/s]
Epoch 1/2:  15%|█▌        | 2/13 [02:15<12:26, 67.89s/it][32m [repeated 2x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)[0m
Epoch 1/2:  23%|██▎       | 3/13 [03:22<11:14, 67.42s/it][32m [repeated 2x across cluster][0m
Epoch 1/2:  31%|███       | 4/13 [04:29<10:05, 67.25s/it][32m [repeated 2x across cluster][0m
Epoch 1/2:  38%|███▊      | 5/13 [05:36<08:56, 67.03s/it][32m [repeated 2x across cluster][0m
Epoch 1/2:  46%|████▌     | 6/13 [06:42<07:46, 66.60s/it][32m [repeated 2x across cluster][0m
Epoch 1/2:  54%|█████▍    | 7/13 [07:48<06:38, 66.44s/it][32m [repeated 2x across cluster][0m
Epoch 1/2:  62%|██████▏   | 8/13 [08:54<05:31, 66.33s

[36m(train_model pid=29051)[0m Epoch 1/2, Training Loss: 1.3845
[36m(train_model pid=29056)[0m Checkpoint Folder exists


Epoch 2/2:   0%|          | 0/13 [00:00<?, ?it/s]
Epoch 2/2:   8%|▊         | 1/13 [01:07<13:24, 67.02s/it]
Epoch 1/2: 100%|██████████| 13/13 [13:56<00:00, 64.32s/it]
Epoch 2/2:   0%|          | 0/13 [00:00<?, ?it/s]
Epoch 2/2:  15%|█▌        | 2/13 [02:12<12:08, 66.21s/it][32m [repeated 2x across cluster][0m
Epoch 2/2:  23%|██▎       | 3/13 [03:18<10:59, 65.97s/it][32m [repeated 2x across cluster][0m
Epoch 2/2:  31%|███       | 4/13 [04:24<09:53, 65.98s/it][32m [repeated 2x across cluster][0m
Epoch 2/2:  38%|███▊      | 5/13 [05:31<08:50, 66.29s/it][32m [repeated 2x across cluster][0m
Epoch 2/2:  46%|████▌     | 6/13 [06:37<07:43, 66.22s/it][32m [repeated 2x across cluster][0m
Epoch 2/2:  54%|█████▍    | 7/13 [07:44<06:38, 66.45s/it][32m [repeated 2x across cluster][0m
Epoch 2/2:  62%|██████▏   | 8/13 [08:50<05:31, 66.38s/it][32m [repeated 2x across cluster][0m
Epoch 2/2:  69%|██████▉   | 9/13 [09:56<04:24, 66.25s/it][32m [repeated 2x across cluster][0m
Epoch 2/2:  77%

[36m(train_model pid=29051)[0m Epoch 2/2, Training Loss: 1.3708[32m [repeated 2x across cluster][0m


Epoch 2/2: 100%|██████████| 13/13 [13:53<00:00, 64.12s/it]


[36m(train_model pid=29051)[0m Validation Loss: 1.3622
[36m(train_model pid=29056)[0m Epoch 2/2, Training Loss: 1.2704


2025-04-06 04:27:56,224	INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to '/Users/huiningonn/ray_results/train_model_2025-04-06_03-57-16' in 0.0034s.
2025-04-06 04:27:56,227	INFO tune.py:1041 -- Total run time: 1839.11 seconds (1839.08 seconds for the tuning loop).


Best config: {'lr': 0.0006251373574521745, 'weight_decay': 4.2079886696066345e-06, 'dropout': 0.16239780813448107, 'freeze_backbone': True, 'num_epochs': 2}




[36m(train_model pid=29056)[0m Validation Loss: 1.3019
