# Tutorial 4-2: The Hyperparameter Hunt

**Course:** CSEN 342: Deep Learning  
**Topic:** Hyperparameter Optimization, Grid vs. Random Search, and HPC Batch Jobs

## Objective
Finding the right hyperparameters (learning rate, weight decay, batch size) can matter as much as the model architecture itself. As discussed in class, there are two main search strategies:

1.  **Grid Search:** Systematically checking every combination of fixed values.
2.  **Random Search:** Randomly sampling hyperparameters from a distribution.

In this tutorial, we will use the **HPC Cluster** to run a "Hyperparameter Hunt" in the background. You will:
1.  Define a flexible training script that accepts hyperparameters as arguments.
2.  Submit a batch job that runs both Grid Search and Random Search.
3.  Analyze the results to see why Random Search tends to outperform Grid Search under the same budget, especially for high-dimensional problems where only a few hyperparameters strongly affect performance.
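
One reason dimensionality matters: a grid with $k$ values per hyperparameter needs $k^d$ runs for $d$ hyperparameters. The short sketch below works through that arithmetic; the helper name `grid_size` is an illustrative assumption, not part of the course code.

```python
def grid_size(values_per_axis, num_hparams):
    """Number of training runs a full grid needs: k ** d."""
    return values_per_axis ** num_hparams

# With 3 values per hyperparameter, the budget grows exponentially
# with the number of hyperparameters being tuned.
for d in (2, 3, 5, 8):
    print(f"{d} hyperparameters -> {grid_size(3, d)} runs")
```

With two hyperparameters this gives the 9-run budget used in this tutorial; with eight it already exceeds 6,000 runs, while a random search can be stopped at any budget you choose.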

---

## Part 1: The Configurable Trainer

First, we need a training function that isn't hard-coded. It must accept `lr` (learning rate) and `weight_decay` as input arguments.
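
If the trainer is later launched from the command line (for example, one batch job per configuration), the same hyperparameters can be exposed as flags. This is a minimal sketch using the standard `argparse` module; the flag names `--lr`, `--wd`, and `--epochs`, their defaults, and the helper `parse_hparams` are assumptions made for illustration and do not appear in the course code.

```python
import argparse

def parse_hparams(argv=None):
    """Parse hyperparameters from a list of strings (or sys.argv when argv is None)."""
    parser = argparse.ArgumentParser(description="Train with configurable hyperparameters")
    parser.add_argument("--lr", type=float, default=1e-3, help="learning rate")
    parser.add_argument("--wd", type=float, default=0.0, help="weight decay")
    parser.add_argument("--epochs", type=int, default=3, help="number of training epochs")
    return parser.parse_args(argv)

# Pass an explicit list so the example does not read the notebook kernel's own arguments.
args = parse_hparams(["--lr", "0.01", "--wd", "1e-4"])
print(args.lr, args.wd, args.epochs)
```

Passing an explicit list keeps the sketch runnable inside a notebook; in a standalone script, calling `parse_hparams()` with no argument reads the real command line.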

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

# Import utility functions
import os
import sys
sys.path.append(os.path.abspath(os.path.join('..')))
from utils import download_fashion_mnist

download_fashion_mnist()

# Define a simple CNN for Fashion-MNIST
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.classifier = nn.Linear(32 * 7 * 7, 10)

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        return self.classifier(x)

# A single function to run one complete training session
# Returns: Final Validation Accuracy
def train_evaluate_model(lr, weight_decay, epochs=3, device='cpu'):
    # 1. Load Data (Use shared directory)
    transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
    # We use a subset for speed in this tutorial
    full_trainset = torchvision.datasets.FashionMNIST(root='../data', train=True, download=False, transform=transform)
    train_subset, val_subset, _ = torch.utils.data.random_split(full_trainset, [2000, 1000, 57000])
    
    trainloader = torch.utils.data.DataLoader(train_subset, batch_size=64, shuffle=True)
    valloader = torch.utils.data.DataLoader(val_subset, batch_size=100, shuffle=False)
    
    # 2. Setup Model
    model = SimpleCNN().to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    
    # 3. Train
    for epoch in range(epochs):
        model.train()
        for inputs, labels in trainloader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            output = model(inputs)
            loss = criterion(output, labels)
            loss.backward()
            optimizer.step()
            
    # 4. Evaluate
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, labels in valloader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            predicted = outputs.argmax(dim=1)  # index of the highest logit per sample
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
            
    return 100 * correct / total

# Smoke Test: Run one epoch locally to make sure code doesn't crash
print("Running smoke test...")
acc = train_evaluate_model(lr=0.01, weight_decay=0.0, epochs=1)
print(f"Test Run Accuracy: {acc:.2f}%")

---

## Part 2: The Search Script

We will now generate a Python script `hyperparam_search.py` that runs the full experiment.

**The Experiment Design:**
We have a "budget" of **9 training runs** for each method.

* **Grid Search:** We test 3 learning rates $\times$ 3 weight decays.
    * LR: $[10^{-2}, 10^{-3}, 10^{-4}]$
    * WD: $[10^{-3}, 10^{-4}, 10^{-5}]$
* **Random Search:** We sample 9 random points from the same ranges.
    * LR: LogUniform distribution between $10^{-4}$ and $10^{-2}$.
    * WD: LogUniform distribution between $10^{-5}$ and $10^{-3}$.
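
Both ranges span two orders of magnitude, which is why the random search draws the exponent uniformly rather than the value itself. The sketch below compares the two choices for the learning-rate range; the helper name `sample_log_uniform`, the seed, and the sample count are assumptions chosen for illustration.

```python
import math
import random

def sample_log_uniform(low, high, rng):
    """Draw one value whose base-10 logarithm is uniform on [log10(low), log10(high)]."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))

rng = random.Random(0)  # fixed seed so the comparison is repeatable
n = 10_000
log_draws = [sample_log_uniform(1e-4, 1e-2, rng) for _ in range(n)]
lin_draws = [rng.uniform(1e-4, 1e-2) for _ in range(n)]

# Fraction of draws that land in the lower decade, [1e-4, 1e-3)
frac_log = sum(x < 1e-3 for x in log_draws) / n
frac_lin = sum(x < 1e-3 for x in lin_draws) / n
print(f"log-uniform: {frac_log:.2f} of draws below 1e-3")
print(f"uniform:     {frac_lin:.2f} of draws below 1e-3")
```

Log-uniform sampling places about half of the draws in each decade, whereas plain uniform sampling places only about 9% of them below $10^{-3}$, so the small learning rates would be barely explored.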

In [None]:
%%writefile hyperparam_search.py
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import random
import math
import csv

# Import utility functions
import os
import sys
sys.path.append(os.path.abspath(os.path.join('..')))
from utils import download_fashion_mnist

download_fashion_mnist()

# --- Re-defining Model and Train Function for the standalone script ---
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2)
        )
        self.classifier = nn.Linear(32 * 7 * 7, 10)
    def forward(self, x):
        return self.classifier(self.features(x).view(x.size(0), -1))

def train_evaluate(lr, wd, device, trainloader, valloader):
    model = SimpleCNN().to(device)
    optimizer = optim.Adam(model.parameters(), lr=lr, weight_decay=wd)
    criterion = nn.CrossEntropyLoss()
    
    # Train for 5 epochs for the search
    for _ in range(5):
        model.train()
        for inputs, labels in trainloader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(inputs), labels)
            loss.backward()
            optimizer.step()
            
    # Evaluate
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, labels in valloader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            predicted = outputs.argmax(dim=1)  # index of the highest logit per sample
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    return 100 * correct / total

# --- Main Execution ---
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Running on {device}")

# Load Data Once
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
full_set = torchvision.datasets.FashionMNIST(root='../data', train=True, download=False, transform=transform)
train_set, val_set, _ = torch.utils.data.random_split(full_set, [5000, 2000, 53000]) # 5k train, 2k validation, rest unused
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=100, shuffle=False)

# Open CSV to save results
with open('search_results.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['type', 'lr', 'weight_decay', 'accuracy'])

    # 1. Grid Search (3x3 = 9 runs)
    # We cover specific points in the space
    grid_lrs = [1e-2, 1e-3, 1e-4]
    grid_wds = [1e-3, 1e-4, 1e-5]
    
    print("Starting Grid Search...")
    for lr in grid_lrs:
        for wd in grid_wds:
            acc = train_evaluate(lr, wd, device, train_loader, val_loader)
            print(f"[Grid] LR={lr}, WD={wd} -> Acc={acc:.2f}%")
            writer.writerow(['grid', lr, wd, acc])

    # 2. Random Search (9 runs)
    # We sample from the log-uniform distribution to cover the same range
    print("Starting Random Search...")
    for i in range(9):
        # LogUniform sampling: 10^uniform(low_exp, high_exp)
        lr = 10 ** random.uniform(-4, -2) # 1e-4 to 1e-2
        wd = 10 ** random.uniform(-5, -3) # 1e-5 to 1e-3
        
        acc = train_evaluate(lr, wd, device, train_loader, val_loader)
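        # Scientific notation keeps small values readable (a weight decay near 1e-5 would lose its digits with .5f)
        print(f"[Random] LR={lr:.2e}, WD={wd:.2e} -> Acc={acc:.2f}%")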
        writer.writerow(['random', lr, wd, acc])

print("Search Complete. Results saved to search_results.csv")

---

## Part 3: Submitting the Job

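Now we create the Slurm script that runs the search as a batch job. Note that the GPU request is commented out in the script (`## #SBATCH --gres=gpu:1`), so as written the job runs on CPU and `hyperparam_search.py` falls back to `cpu` automatically. To request a GPU, change that line to `#SBATCH --gres=gpu:1`, provided your partition offers GPUs. Make sure to change the `--mail-user` address to your own email.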

In [None]:
%%writefile submit.sh
#!/bin/bash
#SBATCH --partition=hub
#SBATCH --job-name=hyper_hunt
#SBATCH --output=search_%j.log
#SBATCH --error=search_%j.err
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
## #SBATCH --gres=gpu:1
#SBATCH --time=00:20:00
#SBATCH --mem=8G
#SBATCH --mail-user=user_account@scu.edu
#SBATCH --mail-type=END

# Load the required modules and activate the course environment (adjust the names for your cluster)
module load Anaconda3
conda activate 342wi26


python hyperparam_search.py

In [None]:
# Submit the job
!sbatch submit.sh

### Monitoring
Use `!squeue -u $USER` to verify your job is running. Wait for it to complete before moving to Part 4. You should receive an email when it's done.

In [None]:
!squeue -u $USER

---

## Part 4: Analyzing Results

Once the job is done, we will read `search_results.csv` and visualize the difference. Random Search is likely to find a better configuration because it tries more unique values of the "important" parameter (usually the learning rate). Keep in mind that with only 9 runs per method and no fixed random seed, the outcome can vary from one submission to the next.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import os

if os.path.exists('search_results.csv'):
    df = pd.read_csv('search_results.csv')
    
    # Separate Data
    grid_df = df[df['type'] == 'grid']
    rand_df = df[df['type'] == 'random']
    
    print("Top 3 Grid Configurations:")
    print(grid_df.sort_values('accuracy', ascending=False).head(3)[['lr', 'weight_decay', 'accuracy']])
    
    print("\nTop 3 Random Configurations:")
    print(rand_df.sort_values('accuracy', ascending=False).head(3)[['lr', 'weight_decay', 'accuracy']])
    
    # Visualization
    plt.figure(figsize=(10, 6))
    
    # Plot Grid Points
    plt.scatter(grid_df['lr'], grid_df['weight_decay'], 
                s=grid_df['accuracy']*5, c='blue', alpha=0.6, label='Grid Search')
    
    # Plot Random Points
    plt.scatter(rand_df['lr'], rand_df['weight_decay'], 
                s=rand_df['accuracy']*5, c='red', alpha=0.6, label='Random Search')
    
    plt.xscale('log')
    plt.yscale('log')
    plt.xlabel('Learning Rate (Log Scale)')
    plt.ylabel('Weight Decay (Log Scale)')
    plt.title('Hyperparameter Search Space (Size = Accuracy)')
    plt.legend()
    plt.grid(True, which="both", ls="--")
    plt.show()
    
else:
    print("Results file not found yet. Please wait for the job to complete.")

### Conclusion
Observe the scatter plot.
* **Grid Search** points line up in rows and columns: the 9 runs cover only 3 unique learning rates and 3 unique weight decays.
* **Random Search** tests 9 unique learning rates and 9 unique weight decays. Since the learning rate is usually the most important parameter, Random Search gives us a better chance of hitting the "sweet spot" between our grid lines.
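
The argument can be checked without any training by using a toy objective. The sketch below is deliberately rigged to illustrate the point: `toy_score` is a hypothetical function that depends only on the learning rate, and its peak (`best_lr=3e-3`) is an assumed value placed between two grid lines. It does not model Fashion-MNIST accuracy.

```python
import math
import random

def toy_score(lr, best_lr=3e-3):
    """Hypothetical objective: peaks at best_lr and ignores weight decay entirely."""
    return -(math.log10(lr) - math.log10(best_lr)) ** 2

# Same grid as in Part 2: 9 runs, but only 3 distinct learning rates.
grid_lrs = [1e-2, 1e-3, 1e-4]
grid_wds = [1e-3, 1e-4, 1e-5]
grid = [(lr, wd) for lr in grid_lrs for wd in grid_wds]
grid_best = max(toy_score(lr) for lr, _ in grid)

# Repeat the 9-run random search many times and count how often it beats the grid.
rng = random.Random(0)
trials = 1000
wins = 0
for _ in range(trials):
    rand = [(10 ** rng.uniform(-4, -2), 10 ** rng.uniform(-5, -3)) for _ in range(9)]
    rand_best = max(toy_score(lr) for lr, _ in rand)
    wins += rand_best > grid_best

print(f"distinct LRs in grid: {len({lr for lr, _ in grid})}")
print(f"random search beat the grid in {wins}/{trials} toy trials")
```

In this setting random search wins in the large majority of trials, because six of the nine grid runs repeat a learning rate that has already been tried. On a real problem the gap depends on how much each hyperparameter actually matters.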