<h1>Distributed Hyperparameter Optimization (HPO) Techniques for CNN on MNIST</h1>

In [2]:
%pip install torchvision
%pip install optuna


<h2>1. Introduction</h2>

Hyperparameter optimization (HPO) is a critical step in training deep learning models, because choices such as learning rate and dropout rate directly affect accuracy and training cost.
Traditional tuning approaches such as grid search and random search are computationally expensive: every configuration is trained to completion, including those that are clearly underperforming.

In this assignment, we compare and analyze different HPO strategies that use distributed computing to select good hyperparameters efficiently.

<h2>2. Objectives</h2>

The goal of this project is to:

1. Compare multiple HPO techniques for training a Convolutional Neural Network (CNN) on the MNIST dataset.

2. Evaluate these techniques based on training speed, search efficiency, accuracy, and GPU resource utilization.

3. Implement real-time GPU monitoring to track memory usage and optimize resource allocation.

4. Identify the most effective HPO method that balances speed, accuracy, and efficiency.

<h2>3. HPO Strategies Implemented</h2>

We implemented and compared four different approaches for HPO:

1. Baseline (No HPO): Train the model with default hyperparameters.

2. ASHA (Asynchronous Successive Halving Algorithm): Eliminates underperforming trials early to speed up training.

3. BOHB (Bayesian Optimization + HyperBand): Uses Bayesian learning to intelligently select hyperparameters while efficiently allocating compute resources.

4. BOHB + ASHA Hybrid: Combines BOHB’s smart selection with ASHA’s aggressive pruning for improved efficiency.
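
The early-stopping idea behind ASHA can be illustrated with a minimal sketch of plain (synchronous) successive halving, the building block that ASHA runs asynchronously. This is not the Optuna implementation used later in the notebook. The function `successive_halving`, the toy scoring function `toy_score`, and the values `eta=3` and `max_budget=9` are assumptions chosen only for illustration.

```python
import math


def successive_halving(configs, evaluate, min_budget=1, max_budget=9, eta=3):
    """Keep the top 1/eta of the configurations at each rung and
    multiply the per-configuration budget by eta."""
    survivors = list(configs)
    budget = min_budget
    history = []  # (budget, number of configurations evaluated) per rung
    while True:
        scored = sorted(
            ((evaluate(c, budget), c) for c in survivors),
            key=lambda pair: pair[0],
            reverse=True,  # higher score is better
        )
        history.append((budget, len(survivors)))
        if budget >= max_budget or len(scored) == 1:
            return scored[0][1], history
        keep = max(1, len(scored) // eta)
        survivors = [c for _, c in scored[:keep]]
        budget = min(budget * eta, max_budget)


def toy_score(config, budget):
    # Hypothetical stand-in for validation accuracy: best near lr = 1e-3,
    # improving slightly with more budget (epochs).
    return -abs(math.log10(config["lr"]) + 3) + 0.01 * budget


candidates = [{"lr": lr} for lr in
              (1e-4, 2e-4, 3e-4, 5e-4, 1e-3, 2e-3, 3e-3, 5e-3, 1e-2)]
best, rungs = successive_halving(candidates, toy_score)
print(best, rungs)
```

With nine candidates and `eta=3`, only three configurations reach the second rung and one reaches the full budget, which is where the compute savings over training every configuration to completion come from.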

<h2>4. Implementation Details</h2>

<h2>4.1 Dataset: MNIST</h2>

The MNIST dataset consists of handwritten digits (0-9).

Training set: 10,000 images (a random subset of the 60,000 MNIST training images).

Test set: 10,000 images (the full MNIST test set).

Image size: 28x28 pixels, grayscale.

Output classes: 10 (digits 0-9).

In [None]:
import random
import ssl
import time

import numpy as np
import optuna
import psutil  # For system memory tracking
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from optuna.pruners import SuccessiveHalvingPruner  # ASHA implementation
from torch.utils.data import DataLoader, Subset
from torch.utils.tensorboard import SummaryWriter

# Workaround for SSL certificate errors when downloading MNIST.
# Note: this disables HTTPS certificate verification for the whole process.
ssl._create_default_https_context = ssl._create_unverified_context

In [10]:

# Transformations
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# Load full MNIST dataset
full_trainset = torchvision.datasets.MNIST(root="./data", train=True, download=True, transform=transform)
full_testset = torchvision.datasets.MNIST(root="./data", train=False, download=True, transform=transform)

# Select 10,000 random indices for the train and test sets
# (the MNIST test set has exactly 10,000 images, so it is used in full)
train_indices = np.random.choice(len(full_trainset), 10000, replace=False)
test_indices = np.random.choice(len(full_testset), 10000, replace=False)

# Create subsets of MNIST
trainset = Subset(full_trainset, train_indices)
testset = Subset(full_testset, test_indices)

# Create DataLoaders
trainloader = DataLoader(trainset, batch_size=64, shuffle=True)
testloader = DataLoader(testset, batch_size=64, shuffle=False)

dataset = (trainloader, testloader)



<h2>4.2 Model: CNN Architecture</h2>

The CNN model used for training consists of:

1. Two convolutional layers with ReLU activation.

2. Max-pooling layers for feature down-sampling.

3. Fully connected layers with a dropout layer.

4. A 10-way output layer that returns raw logits; the softmax is applied implicitly by the cross-entropy loss during training.

<b>Hyperparameters Considered</b>

Learning Rate - 1e-4 to 1e-2 (log scale)

Dropout Rate - 0.2 to 0.5

Number of Filters - 16, 32, 64
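
As a rough illustration of what sampling from this search space means, the sketch below draws configurations with the standard library only. The helper name `sample_config` is hypothetical and is not part of the notebook. The learning rate is drawn uniformly in log space, so each decade between 1e-4 and 1e-2 is equally likely.

```python
import math
import random


def sample_config(rng):
    """Draw one configuration from the search space listed above."""
    # Log-uniform: sample the exponent uniformly, then exponentiate.
    log_lr = rng.uniform(math.log(1e-4), math.log(1e-2))
    return {
        "lr": math.exp(log_lr),
        "dropout": rng.uniform(0.2, 0.5),
        "num_filters": rng.choice([16, 32, 64]),
    }


rng = random.Random(0)  # fixed seed so the draw is repeatable
samples = [sample_config(rng) for _ in range(2000)]

# Under log-uniform sampling roughly half of the learning rates fall below
# 1e-3, the geometric midpoint of the range.
fraction_below_midpoint = sum(s["lr"] < 1e-3 for s in samples) / len(samples)
print(samples[0], fraction_below_midpoint)
```

A plain uniform draw over [1e-4, 1e-2] would instead place about 90% of the samples above 1e-3, which is why a log scale is the usual choice for learning rates.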


In [11]:


# CNN Model for MNIST
class CNN(nn.Module):
    def __init__(self, dropout_rate=0.5, num_filters=32):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, num_filters, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(num_filters, num_filters * 2, kernel_size=3, stride=1, padding=1)
        self.fc1 = nn.Linear(num_filters * 2 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)
        self.dropout = nn.Dropout(dropout_rate)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(2, 2)

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))
        x = self.pool(self.relu(self.conv2(x)))
        x = x.view(x.size(0), -1)
        x = self.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x
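
The input size of `fc1` (`num_filters * 2 * 7 * 7`) follows from the layer shapes. A small sketch of that arithmetic, using the standard output-size formula for convolution and pooling layers; the helper names `conv_out` and `flattened_features` are illustrative only.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(num_filters, image_size=28):
    """Number of inputs to fc1 for the CNN defined above."""
    s = conv_out(image_size, kernel=3, stride=1, padding=1)  # conv1: 28 -> 28
    s = conv_out(s, kernel=2, stride=2)                      # pool:  28 -> 14
    s = conv_out(s, kernel=3, stride=1, padding=1)           # conv2: 14 -> 14
    s = conv_out(s, kernel=2, stride=2)                      # pool:  14 -> 7
    return num_filters * 2 * s * s                           # conv2 has 2x filters


for n in (16, 32, 64):
    print(n, flattened_features(n))
```

Because `num_filters` is a tuned hyperparameter, the size of `fc1` changes from trial to trial, which also changes the memory footprint of each trial.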

<h2>4.3 GPU Monitoring & Resource Utilization Tracking</h2>

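GPU memory tracking through PyTorch's CUDA allocation counters is kept below as a commented-out helper for CUDA machines.

The runs in this notebook use Apple's MPS backend, so the active helper records system RAM with psutil at each training epoch and reports GPU memory as unavailable.

This lets us compare memory use across the different HPO techniques.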
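
If accelerator memory is wanted in addition to system RAM, a helper along the following lines could be used. This is a sketch under assumptions, not the function used for the results in this notebook: the name `accelerator_memory_mb` is hypothetical, and the MPS branch relies on `torch.mps.current_allocated_memory()`, which is only present in recent PyTorch releases, so the code checks for it before calling.

```python
def accelerator_memory_mb():
    """Return the memory PyTorch currently has allocated on the accelerator,
    in MB, or None if no supported accelerator (or no PyTorch) is available."""
    try:
        import torch
    except ImportError:
        return None

    if torch.cuda.is_available():
        return torch.cuda.memory_allocated() / 1e6

    mps_backend = getattr(torch.backends, "mps", None)
    mps_module = getattr(torch, "mps", None)
    if (
        mps_backend is not None
        and mps_backend.is_available()
        and mps_module is not None
        and hasattr(mps_module, "current_allocated_memory")
    ):
        return mps_module.current_allocated_memory() / 1e6

    return None


usage = accelerator_memory_mb()
print("Accelerator memory (MB):", usage)
```

Returning `None` rather than a message string keeps the value easy to log as a scalar when it is available.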

In [12]:
# CUDA-only helper (kept for reference; not used on Apple MPS)
# def log_gpu_usage(tag=""):
#     if torch.cuda.is_available():
#         memory_allocated = torch.cuda.memory_allocated() / 1e6  # Convert to MB
#         memory_reserved = torch.cuda.memory_reserved() / 1e6  # Convert to MB
#         print(f"[{tag}] GPU Memory Used: {memory_allocated:.2f} MB, Reserved: {memory_reserved:.2f} MB")
#         return memory_allocated
#     return 0  # Return 0 if no GPU available


# Function to log memory usage (system RAM, plus a GPU status message)
def log_memory_usage(stage=""):
    # System-wide RAM in use (all processes, not only this notebook), in MB
    ram_usage = psutil.virtual_memory().used / (1024 ** 2)

    # GPU memory is not measured here. `device` is the global defined in section 5.1.
    # Recent PyTorch releases provide torch.mps.current_allocated_memory(),
    # but this notebook does not use it.
    if device.type == "mps":
        torch.mps.empty_cache()  # Release cached memory that is no longer in use
        gpu_usage = "MPS does not expose memory tracking"
    else:
        gpu_usage = "GPU not in use"

    return ram_usage, gpu_usage

<h2>5. Comparison of HPO Approaches</h2>

The approaches are compared on three criteria:

1. Training Speed
2. Model Accuracy
3. GPU Memory Utilization

<h2>5.1 Baseline Model (No HPO)</h2>

In [13]:
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
device

device(type='mps')

In [14]:


# Train Baseline Model (Without HPO) with memory logging
def train_baseline():
    writer = SummaryWriter(log_dir="./logs/baseline")
    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
    print(f"Using device: {device}")
    model = CNN().to(device)
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    loss_fn = nn.CrossEntropyLoss()
    
    start_time = time.time()
    memory_logs = []

    for epoch in range(5):
        model.train()
        epoch_loss = 0
        for images, labels in trainloader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(images)
            loss = loss_fn(outputs, labels)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        
        # Log memory usage (system RAM; GPU memory is not recorded on MPS)
        ram_usage, gpu_usage = log_memory_usage("Baseline")
        memory_logs.append(ram_usage)
        
        writer.add_scalar("Loss/train", epoch_loss / len(trainloader), epoch)
        writer.add_scalar("Memory/CPU_RAM_MB", ram_usage, epoch)
    
    end_time = time.time()
    
    # Compute average memory usage across epochs
    avg_ram_usage = sum(memory_logs) / len(memory_logs)

    # Test Model
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in testloader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total
    print(f"Baseline Accuracy: {accuracy:.2f}%, Training Time: {end_time - start_time:.2f}s, Avg CPU RAM Usage: {avg_ram_usage:.2f} MB, GPU Usage: {gpu_usage}")
    writer.close()

    return accuracy, end_time - start_time, avg_ram_usage


# Run Baseline Training
baseline_accuracy, baseline_time, baseline_memory = train_baseline()


Using device: mps
Baseline Accuracy: 97.86%, Training Time: 46.62s, Avg CPU RAM Usage: 8572.09 MB, GPU Usage: MPS does not expose memory tracking


<h2>5.2 ASHA HPO</h2>


In [None]:


# Train Model with ASHA HPO, Memory Logging, and TensorBoard Logging
def train_cnn_asha(trial):
    writer = SummaryWriter(log_dir=f"./logs/asha_trial_{trial.number}")  # TensorBoard log directory
    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

    # Sample hyperparameters using Optuna
    dropout_rate = trial.suggest_float("dropout", 0.2, 0.5)
    num_filters = trial.suggest_categorical("num_filters", [16, 32, 64])
    learning_rate = trial.suggest_float("lr", 1e-4, 1e-2, log=True)

    model = CNN(dropout_rate=dropout_rate, num_filters=num_filters).to(device)
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    loss_fn = nn.CrossEntropyLoss()

    memory_logs = []
    start_time = time.time()

    for epoch in range(5):
        model.train()
        epoch_loss = 0
        for images, labels in trainloader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(images)
            loss = loss_fn(outputs, labels)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()

        # Log Memory Usage
        ram_usage, gpu_usage = log_memory_usage("ASHA")
        memory_logs.append(ram_usage)

        # Log Loss and Memory to TensorBoard
        writer.add_scalar("Loss/train", epoch_loss / len(trainloader), epoch)
        writer.add_scalar("Memory/CPU_RAM_MB", ram_usage, epoch)

        # Evaluate model (ASHA needs an intermediate accuracy for pruning).
        # Note: the test set doubles as the validation set here.
        model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for images, labels in testloader:
                images, labels = images.to(device), labels.to(device)
                outputs = model(images)
                _, predicted = torch.max(outputs, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

        accuracy = 100 * correct / total

        # Report accuracy for ASHA pruning
        trial.report(accuracy, epoch)

        # ASHA: Stop bad trials early
        if trial.should_prune():
            writer.close()  # Ensure writer closes even when pruned
            raise optuna.exceptions.TrialPruned()

    end_time = time.time()

    # Compute Average CPU Memory Usage
    avg_ram_usage = sum(memory_logs) / len(memory_logs)

    # Log final accuracy and memory stats to TensorBoard
    writer.add_scalar("Accuracy", accuracy)
    writer.add_scalar("Training Time (s)", end_time - start_time)
    writer.close()

    # Print Summary (Same Format as Baseline)
    print(f"Accuracy: {accuracy:.2f}%, Training Time: {end_time - start_time:.2f}s, "
          f"Avg CPU RAM Usage: {avg_ram_usage:.2f} MB, GPU Usage: {gpu_usage}")

    return accuracy, end_time - start_time, avg_ram_usage, gpu_usage



In [None]:
import optuna
from optuna.pruners import SuccessiveHalvingPruner
import multiprocessing

# Store best training time & resource usage
best_training_time = float("inf")
best_ram_usage = float("inf")
best_gpu_usage = None

# Optimize parallel processing
# n_jobs = max(1, multiprocessing.cpu_count() // 2)  # Use half the available cores

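# Allow faster, lower-precision float32 matmuls where the backend supports it
# (this setting mainly targets CUDA; it may have no effect on Apple MPS)
torch.set_float32_matmul_precision('high')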

# Define Objective Function for Optuna
def objective(trial):
    global best_training_time, best_ram_usage, best_gpu_usage

    accuracy, training_time, avg_ram_usage, gpu_usage = train_cnn_asha(trial)  # Now returns more metrics

    # Track the fastest completed trial and its resource usage
    # (this is not necessarily the most accurate trial)
    if training_time < best_training_time:
        best_training_time = training_time
        best_ram_usage = avg_ram_usage
        best_gpu_usage = gpu_usage

    return accuracy  # Optuna optimizes based on accuracy

# Create Optuna Study with ASHA (Successive Halving)
study = optuna.create_study(
    study_name="asha_hpo",
    direction="maximize",  # We want to maximize accuracy
    pruner=SuccessiveHalvingPruner(),  # ASHA Pruning
    sampler=optuna.samplers.TPESampler(
        multivariate=True,  # Models the hyperparameters jointly rather than independently
        constant_liar=True  # Steers new suggestions away from trials still running (matters for parallel runs)
    )
)

# Run Optimization (20 Trials)
study.optimize(objective, n_trials=20)

# Print Best Results
print(f"\nBest Model Config: {study.best_params}")
print(f"Best Accuracy: {study.best_value:.2f}%")
print(f"Best Training Time: {best_training_time:.2f}s")
print(f"Best Avg CPU RAM Usage: {best_ram_usage:.2f} MB")
print(f"Best GPU Usage: {best_gpu_usage}")


[I 2025-03-18 11:03:43,113] A new study created in memory with name: asha_hpo
[I 2025-03-18 11:04:38,575] Trial 0 finished with value: 93.83 and parameters: {'dropout': 0.34445838191119016, 'num_filters': 16, 'lr': 0.0001481271872072106}. Best is trial 0 with value: 93.83.


Accuracy: 93.83%, Training Time: 54.50s, Avg CPU RAM Usage: 8456.52 MB, GPU Usage: MPS does not expose memory tracking


[I 2025-03-18 11:05:28,606] Trial 1 finished with value: 96.25 and parameters: {'dropout': 0.3892764670419021, 'num_filters': 16, 'lr': 0.0003344461352061419}. Best is trial 1 with value: 96.25.


Accuracy: 96.25%, Training Time: 50.02s, Avg CPU RAM Usage: 8411.15 MB, GPU Usage: MPS does not expose memory tracking


[I 2025-03-18 11:06:22,065] Trial 2 finished with value: 97.32 and parameters: {'dropout': 0.3380726923209525, 'num_filters': 32, 'lr': 0.0043144527413316106}. Best is trial 2 with value: 97.32.


Accuracy: 97.32%, Training Time: 53.45s, Avg CPU RAM Usage: 8333.52 MB, GPU Usage: MPS does not expose memory tracking


[I 2025-03-18 11:07:15,216] Trial 3 finished with value: 97.66 and parameters: {'dropout': 0.4112362766199672, 'num_filters': 16, 'lr': 0.008400365048314784}. Best is trial 3 with value: 97.66.


Accuracy: 97.66%, Training Time: 53.14s, Avg CPU RAM Usage: 8354.39 MB, GPU Usage: MPS does not expose memory tracking


[I 2025-03-18 11:07:36,303] Trial 4 pruned. 


<h2>5.3 BOHB HPO</h2>

In [54]:
import torch
import torch.nn as nn
import torch.optim as optim
import psutil
import time
import hpbandster.core.nameserver as hpns
import hpbandster.core.result as hpres
from hpbandster.optimizers.bohb import BOHB
from hpbandster.core.worker import Worker
import ConfigSpace as CS
from torch.utils.tensorboard import SummaryWriter

# Define Worker for BOHB
class CNNWorker(Worker):
    def __init__(self, run_id, dataset, **kwargs):
        # print('__init__')
        super().__init__(run_id, **kwargs)
        self.trainloader, self.testloader = dataset
        self.device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

    def compute(self, config, budget, **kwargs):
        # Build a filesystem-safe TensorBoard run name from the sampled values
        run_name = (f"nf{int(config['num_filters'])}_do{float(config['dropout']):.3f}"
                    f"_lr{float(config['lr']):.5f}_b{int(budget)}")
        writer = SummaryWriter(log_dir=f"./logs/bohb_trial_{run_name}")

        # Convert `config` values to native Python types so the result is JSON serializable
        config_native = {key: int(value) if isinstance(value, np.integer) else float(value) if isinstance(value, np.floating) else value for key, value in config.items()}

        model = CNN(dropout_rate=float(config["dropout"]), num_filters=int(config["num_filters"])).to(self.device)
        optimizer = optim.Adam(model.parameters(), lr=float(config["lr"]))
        loss_fn = nn.CrossEntropyLoss()

        memory_logs = []
        start_time = time.time()

        for epoch in range(int(budget)):  # `budget` is set by BOHB (early stopping)
            model.train()
            epoch_loss = 0
            for images, labels in self.trainloader:
                images, labels = images.to(self.device), labels.to(self.device)
                optimizer.zero_grad()
                outputs = model(images)
                loss = loss_fn(outputs, labels)
                loss.backward()
                optimizer.step()
                epoch_loss += loss.item()

            # Log Memory Usage
            ram_usage, gpu_usage = log_memory_usage()
            memory_logs.append(float(ram_usage))

            # Log Loss and Memory to TensorBoard
            writer.add_scalar("Loss/train", epoch_loss / len(self.trainloader), epoch)
            writer.add_scalar("Memory/CPU_RAM_MB", ram_usage, epoch)

        end_time = time.time()
        avg_ram_usage = sum(memory_logs) / len(memory_logs)

        # Evaluate Model
        model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for images, labels in self.testloader:
                images, labels = images.to(self.device), labels.to(self.device)
                outputs = model(images)
                _, predicted = torch.max(outputs, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

        accuracy = 100 * correct / total

        # Convert results to native Python floats before returning.
        # Accuracy keeps two decimals; rounding it to an integer would hide
        # small differences between configurations.
        accuracy = round(float(accuracy), 2)
        avg_ram_usage = round(float(avg_ram_usage), 2)
        training_time = round(float(end_time - start_time), 2)

        writer.add_scalar("Accuracy", accuracy)
        writer.add_scalar("Training Time (s)", training_time)
        writer.close()

        print(f"Accuracy: {accuracy:.2f}%, Training Time: {training_time:.2f}s, "
              f"Avg CPU RAM Usage: {avg_ram_usage:.2f} MB, GPU Usage: {gpu_usage}")

        return {
            "loss": -accuracy,  # BOHB minimizes the loss, so return negative accuracy
            "info": {
                "training_time": training_time,
                "ram_usage": avg_ram_usage,
                "config": config_native  # All values are JSON serializable
            }
        }

    @staticmethod
    def get_configspace():
        cs = CS.ConfigurationSpace()
        cs.add(CS.UniformFloatHyperparameter("dropout", lower=0.2, upper=0.5))
        cs.add(CS.CategoricalHyperparameter("num_filters", choices=[16, 32, 64]))
        # Note: lr is sampled on a linear scale here; pass log=True to match
        # the log-scale range listed in section 4.2.
        cs.add(CS.UniformFloatHyperparameter("lr", lower=0.0001, upper=0.01))
        return cs

In [58]:
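# Plain-dict description of the same search space, used only for the manual
# sampling demo below. BOHB itself needs the ConfigSpace object returned by
# CNNWorker.get_configspace().
def get_configspace():
    config_space = {
        "dropout": {"type": "float", "lower": 0.2, "upper": 0.5},
        "num_filters": {"type": "categorical", "choices": [16, 32, 64]},
        "lr": {"type": "float", "lower": 0.0001, "upper": 0.01}
    }
    return config_space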


In [60]:


def sample_hyperparameters(config_space):
    sampled_config = {}
    for param, details in config_space.items():
        if details["type"] == "float":
            sampled_config[param] = random.uniform(details["lower"], details["upper"])
        elif details["type"] == "categorical":
            sampled_config[param] = random.choice(details["choices"])
    return sampled_config


In [61]:
config_space = get_configspace()
sampled_config = sample_hyperparameters(config_space)
print("Sampled Hyperparameters:", sampled_config)


Sampled Hyperparameters: {'dropout': 0.37038666386353447, 'num_filters': 32, 'lr': 0.0017675938006139328}


In [62]:
# Set up BOHB optimization
run_id = "bohb_hpo"
# result_logger = hpres.json_result_logger(directory="./bohb_results", overwrite=True)

# Start Nameserver for BOHB
NS = hpns.NameServer(run_id=run_id, host="localhost", port=0)
NS.start()

# Start BOHB Worker
worker = CNNWorker(run_id=run_id, dataset=dataset, nameserver="localhost", nameserver_port=NS.port)
worker.run(background=True)

# BOHB needs a ConfigSpace object. Passing the plain dict `sampled_config` fails with
# AttributeError: 'dict' object has no attribute 'get_hyperparameters'
configspace = CNNWorker.get_configspace()
print(configspace)

# Run BOHB Optimization
bohb = BOHB(
    configspace=configspace,
    run_id=run_id,
    nameserver="localhost",
    nameserver_port=NS.port,
    min_budget=1,  # Minimum epochs per trial
    max_budget=5   # Maximum epochs per trial
    # result_logger=result_logger
)

res = bohb.run(n_iterations=20)  # Number of Hyperband iterations (each evaluates several configurations)

# Shutdown Nameserver and Worker
bohb.shutdown(shutdown_workers=True)
NS.shutdown()

# Get Best Hyperparameters
best_config = res.get_incumbent_id()
best_accuracy = -res.get_incumbent_trajectory()["losses"][-1]

print(f"\nBest Model Config: {res.get_id2config_mapping()[best_config]['config']}")
print(f"Best Accuracy: {best_accuracy:.2f}%")

04:19:02 WORKER: Connected to nameserver <Pyro4.core.Proxy at 0x302e18050; connected IPv4; for PYRO:Pyro.NameServer@localhost:53931>


Configuration space object:
  Hyperparameters:
    dropout, Type: UniformFloat, Range: [0.2, 0.5], Default: 0.35
    lr, Type: UniformFloat, Range: [0.0001, 0.01], Default: 0.00505
    num_filters, Type: Categorical, Choices: {16, 32, 64}, Default: 16


04:19:02 WORKER: No dispatcher found. Waiting for one to initiate contact.
04:19:02 WORKER: start listening for jobs
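
To see how `min_budget=1` and `max_budget=5` translate into epochs per trial, the sketch below reproduces the usual Hyperband budget schedule in plain Python. It assumes a halving factor of `eta=3`, which is believed to be the BOHB default but is not set explicitly in this notebook; the function name `hyperband_budgets` is hypothetical.

```python
import math


def hyperband_budgets(min_budget, max_budget, eta=3):
    """Geometric budget ladder: max_budget, max_budget/eta, ... down to
    roughly min_budget, returned in increasing order."""
    n_rungs = int(math.log(max_budget / min_budget) / math.log(eta)) + 1
    return [max_budget * eta ** (-i) for i in range(n_rungs - 1, -1, -1)]


budgets = hyperband_budgets(1, 5)
epochs = [int(b) for b in budgets]  # the worker uses int(budget) epochs
print(budgets, epochs)
```

Under these assumptions there are only two rungs (about 1.67 and 5), so with `int(budget)` a configuration is trained for either 1 epoch or the full 5 epochs.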
