<img src="https://drive.google.com/uc?id=1DvKhAzLtk-Hilu7Le73WAOz2EBR5d41G" width="500"/>

---


### ***Name***: [Tehmoor Gull]
### ***username***: [acse_tg1523]
### ***CID***: [02538824]


You can save this notebook in Colab by clicking `File` from the top menu, and then selecting `Download --> Download .ipynb`

Make sure that when you save your notebook you have all the cells executed and you can see the outputs (livelossplot graphs, etc)

## Hyperparameter tunning notebook

Explain the steps and tests you do.

Organise it well to show how the data you present here has helped you design your final network hyperparameters (that you will use for the final training in the `yourusername_DLcw1_clean.ipynb` notebook).

While not as efficient. I am going to change the code to be able to accept hyperparameters in this file. Once I get the best hyper parameters I will apply them in the other notebook. I am copying most of the code except my implementation of the VAE here is more flexible for hyperparameters.

In [1]:
# Load the Drive helper and mount
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
# Standard libraries
import os
import time
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

# PyTorch related imports
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.data as data
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

# Additional utilities
from tqdm import tqdm
from sklearn.model_selection import train_test_split
import itertools


def set_device(device="cpu", idx=0):
    if device != "cpu":
        if torch.cuda.device_count() > idx and torch.cuda.is_available():
            print("Cuda installed! Running on GPU {} {}!".format(idx, torch.cuda.get_device_name(idx)))
            device="cuda:{}".format(idx)
        elif torch.cuda.device_count() > 0 and torch.cuda.is_available():
            print("Cuda installed but only {} GPU(s) available! Running on GPU 0 {}!".format(torch.cuda.device_count(), torch.cuda.get_device_name()))
            device="cuda:0"
        else:
            device="cpu"
            print("No GPU available! Running on CPU")
    return device

device = set_device("cuda")
print(device)

Cuda installed! Running on GPU 0 Tesla V100-SXM2-16GB!
cuda:0


In [3]:
class CustomHandDataset(Dataset):
    def __init__(self, image_paths, transform=None):
        self.image_paths = image_paths
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img_path = self.image_paths[idx]
        image = Image.open(img_path).convert('L')  # Convert to grayscale
        if self.transform:
            image = self.transform(image)
        return image

In [4]:
# Define transformations including resize to 64x64
transform = transforms.Compose([
    # transforms.Resize((64, 64)),  # Resize images to 64x64
    transforms.ToTensor(),  # Convert to tensor
    transforms.Normalize([0.5], [0.5])  # Normalize to (-1, 1) range
])

# Get all image paths

image_dir = 'drive/MyDrive/real_hands'


image_paths = [os.path.join(image_dir, img) for img in os.listdir(image_dir) if img.endswith('.jpeg')]  # List of all image paths

# Split the paths into train and test sets
train_paths, test_paths = train_test_split(image_paths, test_size=0.2, random_state=42)  # 80-20 split

# Create instances of the dataset with the new transform
train_dataset = CustomHandDataset(train_paths, transform=transform)
test_dataset = CustomHandDataset(test_paths, transform=transform)



# Now you can use train_loader and test_loader in your training and evaluation loops.

Upto now I have copied the code but here is where i changed my class to be more suitable for hyperparamter tuning. I did this because it will make the main notebook less complicated instead of having all the hyperparameters become arguments etc.

In [5]:
class VAE(nn.Module):
    def __init__(self, conv_channels=[32, 64, 128, 256], fc_size=128, latent_dim=128):
        super(VAE, self).__init__()
        self.conv_channels = conv_channels
        # Assume conv_channels = [32, 64, 128, 256]
        # Encoder layers
        self.conv1 = nn.Conv2d(1, conv_channels[0], kernel_size=4, stride=2, padding=1)  # Output: 16x16
        self.conv2 = nn.Conv2d(conv_channels[0], conv_channels[1], kernel_size=4, stride=2, padding=1)  # Output: 8x8
        self.conv3 = nn.Conv2d(conv_channels[1], conv_channels[2], kernel_size=4, stride=2, padding=1)  # Output: 4x4
        self.conv4 = nn.Conv2d(conv_channels[2], conv_channels[3], kernel_size=4, stride=2, padding=1)  # Output: 2x2

        self.fc1 = nn.Linear(conv_channels[3] * 2 * 2, latent_dim)  # Output size: mu
        self.fc2 = nn.Linear(conv_channels[3] * 2 * 2, latent_dim)  # Output size: logvar

        # Decoder layers
        self.fc3 = nn.Linear(latent_dim, conv_channels[3] * 2 * 2)  # Match to the output of the encoder
        self.deconv1 = nn.ConvTranspose2d(conv_channels[3], conv_channels[2], kernel_size=4, stride=2, padding=1)  # Output: 4x4
        self.deconv2 = nn.ConvTranspose2d(conv_channels[2], conv_channels[1], kernel_size=4, stride=2, padding=1)  # Output: 8x8
        self.deconv3 = nn.ConvTranspose2d(conv_channels[1], conv_channels[0], kernel_size=4, stride=2, padding=1)  # Output: 16x16
        self.deconv4 = nn.ConvTranspose2d(conv_channels[0], 1, kernel_size=4, stride=2, padding=1)  # Output: 32x32


    def encode(self, x):
        h1 = F.relu(self.conv1(x))
        h2 = F.relu(self.conv2(h1))
        h3 = F.relu(self.conv3(h2))
        h4 = F.relu(self.conv4(h3))
        h4_flattened = h4.view(-1, self.conv_channels[3] * 2 * 2)  # Use dynamic sizing
        return self.fc1(h4_flattened), self.fc2(h4_flattened)



    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar) # Compute std from logvar
        eps = torch.randn_like(std) # Noise
        return mu + eps * std



    def decode(self, z):
        h3 = F.relu(self.fc3(z))
        h3 = h3.view(-1, self.conv_channels[3], 2, 2)  # Use dynamic sizing
        h4 = F.relu(self.deconv1(h3))
        h5 = F.relu(self.deconv2(h4))
        h6 = F.relu(self.deconv3(h5))
        x_recon = torch.tanh(self.deconv4(h6))
        return x_recon


    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

def vae_loss(recon_x, x, mu, logvar):
    mse_loss = F.mse_loss(recon_x, x, reduction='sum') # Summing instead of averaging
    kld_loss = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) # Kullback-Leibler divergence loss term
    return mse_loss + kld_loss

In [6]:
def train(model, optimizer, train_loader, device, epoch):
    model.train()  # Set the model to training mode
    train_loss = 0
    for batch_idx, data in enumerate(train_loader):
        data = data.to(device)
        optimizer.zero_grad()
        recon_batch, mu, logvar = model(data)
        loss = vae_loss(recon_batch, data, mu, logvar)
        loss.backward()
        train_loss += loss.item()
        optimizer.step()

    print(f"Epoch {epoch}, Average Train Loss: {train_loss / len(train_loader.dataset):.4f}")

def validate(model, test_loader, device, epoch):
    model.eval()  # Set the model to evaluation mode
    test_loss = 0
    with torch.no_grad():
        for data in test_loader:
            data = data.to(device)
            recon, mu, logvar = model(data)
            test_loss += vae_loss(recon, data, mu, logvar).item()

    test_loss /= len(test_loader.dataset)
    print(f"Epoch {epoch}, Average Test Loss: {test_loss:.4f}")
    return test_loss  # Return the average test loss



Im going to run the grid search without using batch normalisation and then with and see which one gives me better results after if there is signficant time.

In [7]:
# Will use 25 epochs for grid search and 100 for final training
num_epochs = 25


In [8]:
batch_size_options = [32, 64, 128]
conv_channels_options = [[32, 64, 128, 256], [16, 32, 64, 128]]
latent_dim_options = [64, 128, 256]
learning_rate_options = [0,1 ,0.01, 0.001, 0.0001]



In [10]:
class EarlyStopping:
    def __init__(self, patience=5, min_delta=0):
        self.patience = patience
        self.min_delta = min_delta
        self.counter = 0
        self.best_loss = None
        self.early_stop = False

    def __call__(self, val_loss):
        if self.best_loss is None:
            self.best_loss = val_loss
        elif self.best_loss - val_loss > self.min_delta:
            self.best_loss = val_loss
            self.counter = 0
        else:
            self.counter += 1
            if self.counter >= self.patience:
                self.early_stop = True

# Initialize early stopping
early_stopping = EarlyStopping(patience=5, min_delta=0.01)

best_val_loss = float('inf')
best_params = {}
best_time = float('inf')


# Grid Search Loop
for conv_channels, latent_dim, lr, batch_size in itertools.product(conv_channels_options, latent_dim_options, learning_rate_options, batch_size_options):
    # Initialize the model
    model = VAE(conv_channels=conv_channels, latent_dim=latent_dim).to(device)
    optimizer = optim.Adam(model.parameters(), lr=lr)
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

    start_time = time.time()  # Start time measurement

    val_loss = float('inf')  # Initialize to inf to ensure it's always defined

    # Training loop with early stopping
    for epoch in range(1, num_epochs + 1):
        train(model, optimizer, train_loader, device, epoch)
        current_val_loss = validate(model, test_loader, device, epoch)

        if current_val_loss is not None:  # Make sure val_loss is not None
            val_loss = current_val_loss
            early_stopping(val_loss)

            if early_stopping.early_stop:
                print(f"Early stopping triggered at epoch {epoch}!")
                break

    end_time = time.time()  # End time measurement
    training_time = end_time - start_time  # Calculate training time

    # Update the best model if the current model is better
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        best_params = {
            'conv_channels': conv_channels,
            'latent_dim': latent_dim,
            'lr': lr,
            'batch_size': batch_size
        }
        best_time = training_time

    # Reset early stopping for the next configuration
    early_stopping = EarlyStopping(patience=5, min_delta=0.01)

# After grid search
print(f"Best Validation Loss: {best_val_loss}")
print(f"Best Hyperparameters: {best_params}")
print(f"Training Time for Best Model: {best_time:.2f} seconds")


Epoch 1, Average Train Loss: 603.2490
Epoch 1, Average Test Loss: 604.2438
Epoch 2, Average Train Loss: 603.2474
Epoch 2, Average Test Loss: 604.2517
Epoch 3, Average Train Loss: 603.2440
Epoch 3, Average Test Loss: 604.2659
Epoch 4, Average Train Loss: 603.2456
Epoch 4, Average Test Loss: 604.2472
Epoch 5, Average Train Loss: 603.2402
Epoch 5, Average Test Loss: 604.2676
Epoch 6, Average Train Loss: 603.2408
Epoch 6, Average Test Loss: 604.2587
Early stopping triggered at epoch 6!
Epoch 1, Average Train Loss: 580.4464
Epoch 1, Average Test Loss: 581.1524
Epoch 2, Average Train Loss: 580.4502
Epoch 2, Average Test Loss: 581.1579
Epoch 3, Average Train Loss: 580.4492
Epoch 3, Average Test Loss: 581.1539
Epoch 4, Average Train Loss: 580.4499
Epoch 4, Average Test Loss: 581.1541
Epoch 5, Average Train Loss: 580.4478
Epoch 5, Average Test Loss: 581.1556
Epoch 6, Average Train Loss: 580.4478
Epoch 6, Average Test Loss: 581.1629
Early stopping triggered at epoch 6!
Epoch 1, Average Train Los

KeyboardInterrupt: ignored

Ive noticed that sometimes the loss value is NaN. This could be because the learning rate is too high in this specific config.


## Note:
My grid search stopped running because for some reason google collab started multiple sessions for one other notebook. This means that the run was interupted and there is not enough time for another one. While looking at the losses for each epoch i found that the smallest loss at 10 epochs was around 65. This is very similar to a short test I was running in the main file for 10 epochs so I will keep the hyperparameters the same for now.