# Lab Assignment 1

Student name: Harish Nandhan Shanmugam

## Notebook version

This notebook includes all the codes in the codebase of lab assignment 1. Completing and submitting this script is equivalent to submitting the codebase. Please note that your submitted script should include errorless cell outputs that contain necessary information that proves you have successfully run the notebook in your own directory.

You can choose to (1) run this notebook locally on your end or (2) run this notebook on colab. For the former, you will need to download the dataset to your device that resembles the instructions for the codebase. For the latter, **you will need to upload the dataset to your Google Drive** account, and connect your colab notebook to your Google Drive. Then, go to "File->Save a copy in Drive" to create a copy you can edit.


#### Colab (if applicable)

If you are running this script on colab, uncomment and run the cell below:

In [11]:
# from google.colab import drive
# drive.mount('/content/drive')

In [1]:
import torch

print("Number of GPU: ", torch.cuda.device_count())
print("GPU Name: ", torch.cuda.get_device_name())


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)

Number of GPU:  1
GPU Name:  NVIDIA GeForce RTX 3060 Laptop GPU
Using device: cuda


Note that the Google Drive directory has the root `/content/drive/`. For instance, my directory to the dataset is `'/content/drive/My Drive/Courses/CSCI 5922/CSCI 5922 SP25/Demo/MNIST/'`.

### mnist.py

In [2]:
#Original source: https://www.kaggle.com/code/hojjatk/read-mnist-dataset
#It has been modified for ease of use w/ pytorch

#You do NOT need to modify ANY code in this file!

import numpy as np
import struct
from array import array
import torch

class MnistDataloader(object):
    def __init__(self, training_images_filepath,training_labels_filepath,
                 test_images_filepath, test_labels_filepath):
        self.training_images_filepath = training_images_filepath
        self.training_labels_filepath = training_labels_filepath
        self.test_images_filepath = test_images_filepath
        self.test_labels_filepath = test_labels_filepath

    def read_images_labels(self, images_filepath, labels_filepath):
        n = 60000 if "train" in images_filepath else 10000
        labels = torch.zeros((n, 10))
        with open(labels_filepath, 'rb') as file:
            magic, size = struct.unpack(">II", file.read(8))
            if magic != 2049:
                raise ValueError('Magic number mismatch, expected 2049, got {}'.format(magic))
            l = torch.tensor(array("B", file.read())).unsqueeze(-1)
            l = torch.concatenate((torch.arange(0, n).unsqueeze(-1), l), dim = 1).type(torch.int32)
            labels[l[:,0], l[:,1]] = 1

        with open(images_filepath, 'rb') as file:
            magic, size, rows, cols = struct.unpack(">IIII", file.read(16))
            if magic != 2051:
                raise ValueError('Magic number mismatch, expected 2051, got {}'.format(magic))
            image_data = array("B", file.read())
        images = torch.zeros((n, 28**2))
        for i in range(size):
            img = np.array(image_data[i * rows * cols:(i + 1) * rows * cols])
            #img = img.reshape(28, 28)
            images[i, :] = torch.tensor(img)

        return images, labels

    def load_data(self):
        x_train, y_train = self.read_images_labels(self.training_images_filepath, self.training_labels_filepath)
        x_test, y_test = self.read_images_labels(self.test_images_filepath, self.test_labels_filepath)
        return (x_train, y_train),(x_test, y_test)

### activations.py

In [3]:
import torch

class ReLU:
    def forward(self, x: torch.tensor) -> torch.tensor:
        """
        Applies the ReLU activation function.
        ReLU(x) = max(0, x)
        """
        return torch.maximum(torch.zeros_like(x), x)

    def backward(self, delta: torch.tensor, x: torch.tensor) -> torch.tensor:
        """
        Computes the gradient of ReLU.
        ReLU'(x) = 1 if x > 0 else 0
        """
        return delta * (x > 0).float()


class LeakyReLU:
    def __init__(self, alpha=0.1):
        """
        Initializes the LeakyReLU activation function with a specified alpha value.
        """
        self.alpha = alpha

    def forward(self, x: torch.tensor) -> torch.tensor:
        """
        Applies the Leaky ReLU activation function.
        LeakyReLU(x) = x if x > 0 else alpha * x
        """
        return torch.where(x >= 0, x, self.alpha * x)

    def backward(self, delta: torch.tensor, x: torch.tensor) -> torch.tensor:
        """
        Computes the gradient of Leaky ReLU.
        LeakyReLU'(x) = 1 if x > 0 else alpha
        """
        return delta * torch.where(x >= 0, torch.ones_like(x), self.alpha * torch.ones_like(x))


### framework.py

In [9]:
import torch
import numpy as np
import tqdm

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

class MLP:
    '''
    Multi-Layer Perceptron (MLP) for MNIST classification.
    Implements forward propagation, backpropagation, and training.
    '''
    
    def __init__(self, layer_sizes: list[int]):
        self.layer_sizes: list[int] = layer_sizes
        self.num_layers = len(layer_sizes) - 1
        self.weights: list[torch.tensor] = []
        self.biases: list[torch.tensor] = []
        self.features: list[torch.tensor] = []  

        self.learning_rate: float = 1
        self.batch_size: int = 1
        self.activation_function: callable[[torch.tensor], torch.tensor] = ReLU

    def set_hp(self, lr: float, bs: int, activation: object) -> None:
        """
        Set hyperparameters for training.
        """
        self.learning_rate = lr
        self.batch_size = bs
        self.activation_function = activation()

    def initialize(self) -> None:
        """
        Initialize all biases to zero and weights using Xavier initialization.
        """
        for i in range(self.num_layers):
            d_in = self.layer_sizes[i]
            d_out = self.layer_sizes[i + 1]
            w_range = np.sqrt(6 / (d_in + d_out))
            W = torch.empty(d_in, d_out, device=device).uniform_(-w_range, w_range)
            self.weights.append(W)
            b = torch.zeros(1, d_out, device=device) 
            self.biases.append(b)
            

    def forward(self, x: torch.tensor) -> torch.tensor:
        """
        Forward propagation through all layers.
        Applies activation function to all layers except the last one.
        """
        self.features = [x.to(device)]  

        for i in range(self.num_layers):  
            x = torch.matmul(x, self.weights[i]) + self.biases[i]
            x = self.activation_function.forward(x)  
            self.features.append(x) 
        return x
    
    def backward(self, delta: torch.Tensor) -> None:
        '''
        This function should backpropagate the provided delta through the entire MLP, and update the weights according to the hyper-parameters
        stored in the class variables.
        '''
        # back propogation starts from the result
        for i in reversed(range(self.num_layers)):
            x = self.features[i]

            delta = self.activation_function.backward(delta,self.features[i+1])
            # Computing gradients
            dW = torch.matmul(x.T,delta) / self.batch_size
            db = torch.sum(delta, dim=0, keepdim=True) / self.batch_size

            # Updating weights and biases with learning rate
            self.weights[i] -= self.learning_rate * dW
            self.biases[i] -= self.learning_rate * db
            delta = torch.matmul(delta,self.weights[i].T)

    # def backward(self, delta: torch.tensor) -> None:
    #     """
    #     Backpropagation through all layers to compute gradients.
    #     Updates weights using gradient descent.
    #     """
    #     # grad_weights = [torch.zeros_like(w) for w in self.weights]
    #     # grad_biases = [torch.zeros_like(b) for b in self.biases]

    #     for i in reversed(range(self.num_layers)):  
    #         X = self.features[i]
    #         dW = torch.matmul(X.T, delta) / self.batch_size  
    #         db = torch.sum(delta,dim=0,keepdim=True) / self.batch_size
            
    #         self.weights[i] -= self.learning_rate * dW
    #         self.biases[i] -= self.learning_rate * db

    #         # if i > 0:
    #         #     delta = (delta @ self.weights[i].T)
    #         #     if i > 1:
    #         #         delta *= self.activation_function.backward(delta,self.features[i-1])

    #         delta = torch.matmul(delta, self.weights[i].T) * self.activation_function.backward(torch.ones_like(X), X)

    # def backward(self, delta: torch.tensor) -> None:
    #     for i in reversed(range(self.num_layers)):  
    #         X = self.features[i]
    #         dW = torch.matmul(X.T, delta) / self.batch_size  
    #         db = torch.sum(delta, dim=0, keepdim=True) / self.batch_size

    #         self.weights[i] -= self.learning_rate * dW
    #         self.biases[i] -= self.learning_rate * db

    #         if i > 0:  # Skip activation function for the input layer
    #             delta = torch.matmul(delta, self.weights[i].T)

    #         # ✅ Ensure shapes match for element-wise multiplication
    #             delta *= self.activation_function.backward(delta, X)  


def TrainMLP(model: MLP, x_train: torch.tensor, y_train: torch.tensor) -> MLP:
    """
    Train the MLP for one epoch using mini-batch gradient descent with GPU support.
    """
    bs = model.batch_size
    N = x_train.shape[0]
    rng = np.random.default_rng()
    idx = rng.permutation(N)

    L = 0  

    for i in tqdm.tqdm(range(N // bs)):
        x = x_train[idx[i * bs:(i + 1) * bs], ...].to(device)
        y = y_train[idx[i * bs:(i + 1) * bs], ...].to(device)

        
        y_hat = model.forward(x)

        
        p = torch.exp(y_hat)
        p /= torch.sum(p, dim=1, keepdim=True)

        
        l = -1 * torch.sum(y * torch.log(p)) ### batch size not required here
        L += l

       
        delta = p - y
        model.backward(delta)

    print("Train Loss:", L / ((N // bs) * bs))



def TestMLP(model: MLP, x_test: torch.tensor, y_test: torch.tensor) -> tuple[float, float]:
    """
    Evaluate the MLP on test data using GPU support.
    """
    bs = model.batch_size
    N = x_test.shape[0]

    rng = np.random.default_rng()
    idx = rng.permutation(N)

    L = 0
    A = 0

    for i in tqdm.tqdm(range(N // bs)):
        x = x_test[idx[i * bs:(i + 1) * bs], ...].to(device)
        y = y_test[idx[i * bs:(i + 1) * bs], ...].to(device)

        y_hat = model.forward(x)

        
        p = torch.exp(y_hat)
        p /= torch.sum(p, dim=1, keepdim=True)

        
        l = -1 * torch.sum(y * torch.log(p))
        L += l.item()

        
        A += torch.sum(torch.argmax(p, dim=1) == torch.argmax(y, dim=1)).item()

    test_loss = L / ((N // bs) * bs)
    test_accuracy = 100 * A / ((N // bs) * bs)

    print(f"Test Loss: {test_loss}, Test Accuracy: {test_accuracy:.2f}%")

    return test_loss, test_accuracy  


def normalize_mnist() -> tuple[torch.tensor, torch.tensor, torch.tensor, torch.tensor]:
    '''
    This function loads the MNIST dataset, then normalizes the "X" values to have zero mean, unit variance.
    '''

    #IMPORTANT!!!#
    #UPDATE THE PATH BELOW!#
    base_path = "C:\\Users\\yoges\\Data_Science_Preparation\\CSCI 5922 Neural Networks and Deep Learning\\Lab Assignments\\Lab1Code\\MNIST\\"
    #^^^^^^^^#


    mnist = MnistDataloader(base_path + "train-images.idx3-ubyte", base_path + "train-labels.idx1-ubyte",
                            base_path + "t10k-images.idx3-ubyte", base_path + "t10k-labels.idx1-ubyte")
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    x_mean = torch.mean(x_train, dim = 0, keepdim = True)
    x_std = torch.std(x_train, dim = 0, keepdim = True)

    x_train -= x_mean
    x_train /= x_std
    x_train[x_train != x_train] = 0

    x_test -= x_mean
    x_test /= x_std
    x_test[x_test != x_test] = 0


    return x_train, y_train, x_test, y_test

def main():
    """
    Main function to train and evaluate the MLP model on MNIST using GPU.
    """
    x_train, y_train, x_test, y_test = normalize_mnist()

   
    model = MLP([784, 256, 10])  
    model.initialize()
    model.set_hp(lr=1e-3, bs=512, activation=ReLU)  
    
    E = 25
    for _ in range(E):
        TrainMLP(model, x_train, y_train)
        TestMLP(model, x_test, y_test)


if __name__ == "__main__":
    main()


Using device: cuda


100%|██████████| 117/117 [00:00<00:00, 286.05it/s]


Train Loss: tensor(2.4927, device='cuda:0')


100%|██████████| 19/19 [00:00<00:00, 439.44it/s]


Test Loss: nan, Test Accuracy: 16.19%


100%|██████████| 117/117 [00:00<00:00, 423.14it/s]


Train Loss: tensor(2.2616, device='cuda:0')


100%|██████████| 19/19 [00:00<00:00, 487.55it/s]


Test Loss: nan, Test Accuracy: 22.76%


100%|██████████| 117/117 [00:00<00:00, 443.96it/s]


Train Loss: tensor(2.1270, device='cuda:0')


100%|██████████| 19/19 [00:00<00:00, 683.49it/s]


Test Loss: nan, Test Accuracy: 29.14%


100%|██████████| 117/117 [00:00<00:00, 487.38it/s]


Train Loss: tensor(2.0020, device='cuda:0')


100%|██████████| 19/19 [00:00<00:00, 797.17it/s]


Test Loss: nan, Test Accuracy: 37.53%


100%|██████████| 117/117 [00:00<00:00, 457.31it/s]


Train Loss: tensor(1.8385, device='cuda:0')


100%|██████████| 19/19 [00:00<00:00, 728.30it/s]


Test Loss: nan, Test Accuracy: 49.77%


100%|██████████| 117/117 [00:00<00:00, 490.60it/s]


Train Loss: tensor(1.6241, device='cuda:0')


100%|██████████| 19/19 [00:00<00:00, 730.19it/s]


Test Loss: nan, Test Accuracy: 60.53%


100%|██████████| 117/117 [00:00<00:00, 443.96it/s]


Train Loss: tensor(1.4065, device='cuda:0')


100%|██████████| 19/19 [00:00<00:00, 622.33it/s]


Test Loss: nan, Test Accuracy: 67.50%


100%|██████████| 117/117 [00:00<00:00, 459.58it/s]


Train Loss: tensor(1.2290, device='cuda:0')


100%|██████████| 19/19 [00:00<00:00, 700.53it/s]


Test Loss: nan, Test Accuracy: 71.60%


100%|██████████| 117/117 [00:00<00:00, 460.58it/s]


Train Loss: tensor(1.0949, device='cuda:0')


100%|██████████| 19/19 [00:00<00:00, 722.64it/s]


Test Loss: nan, Test Accuracy: 75.04%


100%|██████████| 117/117 [00:00<00:00, 407.92it/s]


Train Loss: tensor(0.9938, device='cuda:0')


100%|██████████| 19/19 [00:00<00:00, 463.11it/s]


Test Loss: nan, Test Accuracy: 77.22%


100%|██████████| 117/117 [00:00<00:00, 399.29it/s]


Train Loss: tensor(0.9153, device='cuda:0')


100%|██████████| 19/19 [00:00<00:00, 789.61it/s]


Test Loss: nan, Test Accuracy: 79.29%


100%|██████████| 117/117 [00:00<00:00, 469.48it/s]


Train Loss: tensor(0.8529, device='cuda:0')


100%|██████████| 19/19 [00:00<00:00, 612.58it/s]


Test Loss: nan, Test Accuracy: 80.61%


100%|██████████| 117/117 [00:00<00:00, 469.19it/s]


Train Loss: tensor(0.8012, device='cuda:0')


100%|██████████| 19/19 [00:00<00:00, 697.22it/s]


Test Loss: nan, Test Accuracy: 81.55%


100%|██████████| 117/117 [00:00<00:00, 482.72it/s]


Train Loss: tensor(0.7583, device='cuda:0')


100%|██████████| 19/19 [00:00<00:00, 530.08it/s]


Test Loss: nan, Test Accuracy: 82.46%


100%|██████████| 117/117 [00:00<00:00, 497.93it/s]


Train Loss: tensor(0.7218, device='cuda:0')


100%|██████████| 19/19 [00:00<00:00, 354.36it/s]


Test Loss: nan, Test Accuracy: 83.22%


100%|██████████| 117/117 [00:00<00:00, 412.30it/s]


Train Loss: tensor(0.6902, device='cuda:0')


100%|██████████| 19/19 [00:00<00:00, 524.06it/s]


Test Loss: nan, Test Accuracy: 83.61%


100%|██████████| 117/117 [00:00<00:00, 469.12it/s]


Train Loss: tensor(0.6626, device='cuda:0')


100%|██████████| 19/19 [00:00<00:00, 706.47it/s]


Test Loss: nan, Test Accuracy: 84.32%


100%|██████████| 117/117 [00:00<00:00, 481.27it/s]


Train Loss: tensor(0.6380, device='cuda:0')


100%|██████████| 19/19 [00:00<00:00, 673.32it/s]


Test Loss: nan, Test Accuracy: 84.93%


100%|██████████| 117/117 [00:00<00:00, 492.33it/s]


Train Loss: tensor(0.6166, device='cuda:0')


100%|██████████| 19/19 [00:00<00:00, 746.58it/s]


Test Loss: nan, Test Accuracy: 85.30%


100%|██████████| 117/117 [00:00<00:00, 436.13it/s]


Train Loss: tensor(0.5971, device='cuda:0')


100%|██████████| 19/19 [00:00<00:00, 623.80it/s]


Test Loss: nan, Test Accuracy: 85.67%


100%|██████████| 117/117 [00:00<00:00, 476.11it/s]


Train Loss: tensor(0.5796, device='cuda:0')


100%|██████████| 19/19 [00:00<00:00, 712.71it/s]


Test Loss: nan, Test Accuracy: 86.00%


100%|██████████| 117/117 [00:00<00:00, 471.01it/s]


Train Loss: tensor(0.5640, device='cuda:0')


100%|██████████| 19/19 [00:00<00:00, 668.50it/s]


Test Loss: nan, Test Accuracy: 86.48%


100%|██████████| 117/117 [00:00<00:00, 478.46it/s]


Train Loss: tensor(0.5495, device='cuda:0')


100%|██████████| 19/19 [00:00<00:00, 571.54it/s]


Test Loss: nan, Test Accuracy: 86.74%


100%|██████████| 117/117 [00:00<00:00, 441.60it/s]


Train Loss: tensor(0.5364, device='cuda:0')


100%|██████████| 19/19 [00:00<00:00, 479.80it/s]


Test Loss: nan, Test Accuracy: 86.96%


100%|██████████| 117/117 [00:00<00:00, 417.94it/s]


Train Loss: tensor(0.5243, device='cuda:0')


100%|██████████| 19/19 [00:00<00:00, 712.00it/s]

Test Loss: nan, Test Accuracy: 87.21%



