# EE954 : Deep Learning Fundamentals
### Assignment 1

**Group Number:** 12  
**Team Members:**  
- Lokesh  (Roll No: 241562482)  
- Akshay (Roll No: _____)

---



General Instructions 

• Late submissions will not be accepted. 

• Any form of plagiarism will result inpenalties. If you refer to any online material or books,cite them properly.

• You may use Google Colab or Kaggle to train your models. 

• The use of the Numpy library is permitted. 

• The use of Tensor Flow library is strictly prohibited. 

• The use of PyTorch library is allowed with restrictions.

### 1. DatasetPreparation (5Marks) 

1.1 Download and Split (2Marks) 

•Download the Fashion-MNIST dataset. You can either download it from here,or import it directly into your code using PyTorch’s torchvision.datasets module.

• Both sources provide separate training and test splits; however, you will need to create a separate validation set from the training data. 

From Perplexity.AI - 
'

Fashion-MNIST is a widely used machine learning dataset consisting of 70,000 grayscale images (28x28 pixels) of fashion items from 10 categories, such as T-shirts, trousers, dresses, and shoes. There are 60,000 images for training and 10,000 for testing. Each image is labeled with one of the 10 clothing classes. Fashion-MNIST was designed as a more challenging, modern replacement for the original MNIST handwritten digits dataset, while maintaining the same format and structure for easy benchmarking and comparison of machine learning algorithms.

Each example is a 28x28 grayscale image of a fashion item, labeled with one of 10 classes:
0: T-shirt/top
1: Trouser
2: Pullover
3: Dress
4: Coat
5: Sandal
6: Shirt
7: Sneaker
8: Bag
9: Ankle boot

'

Documentation on dataset - https://github.com/zalandoresearch/fashion-mnist

In [5]:
import torch
from torchvision import datasets, transforms
from torch.utils.data import random_split, Dataset, DataLoader
import tqdm

def transform_dataset(apply_transform=True):
    # Define the training transform
    if apply_transform:
        transform_pipeline_train = transforms.Compose([
            transforms.RandomHorizontalFlip(p=0.5),
            transforms.RandomVerticalFlip(p=0.5),
            transforms.RandomRotation(degrees=360),
            transforms.ToTensor()
        ])
    else:
        transform_pipeline_train = transforms.ToTensor()

    # Define the validation transform (no augmentation!)
    transform_pipeline_val = transforms.ToTensor()

    # Load dataset with respective transforms
    dataset_train = datasets.FashionMNIST(
        root='./data',
        train=True,
        download=True, 
        transform=transform_pipeline_train
    )

    dataset_val = datasets.FashionMNIST(
        root='./data',
        train=True,
        download=True, 
        transform=transform_pipeline_val
    )

    # Split into train and val — use same size!
    train_size = int(0.8 * len(dataset_train))
    val_size = len(dataset_train) - train_size

    train_dataset, _ = random_split(dataset_train, [train_size, val_size])
    _, val_dataset = random_split(dataset_val, [train_size, val_size])

    return train_dataset, val_dataset

# 🚀 Call the function
train_dataset, val_dataset = transform_dataset(apply_transform=True)


In [6]:
# For datasets created via random_split
print(f"Training set size: {len(train_dataset)}")
print(f"Validation set size: {len(val_dataset)}")


# For a single sample (image, label)
image, label = train_dataset[0]
print(f"Image shape: {image.shape}, Label: {label}")


# Helper function to extract all labels from a Subset
def get_all_labels(subset):
    return [subset[i][1] for i in range(len(subset))]

# Get all labels
train_labels = get_all_labels(train_dataset)
val_labels = get_all_labels(val_dataset)

# Get unique labels
unique_train_labels = set(train_labels)
unique_val_labels = set(val_labels)

print("Unique labels in train_dataset:", unique_train_labels)
print("Unique labels in val_dataset:", unique_val_labels)

all_pixels = torch.cat([train_dataset[i][0].view(-1) for i in range(len(train_dataset))])
min_pixel = all_pixels.min().item()
max_pixel = all_pixels.max().item()
print("Min pixel value:", min_pixel)
print("Max pixel value:", max_pixel)



Training set size: 48000
Validation set size: 12000
Image shape: torch.Size([1, 28, 28]), Label: 7
Unique labels in train_dataset: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
Unique labels in val_dataset: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
Min pixel value: 0.0
Max pixel value: 1.0


1.2 Data Preprocessing (3 Marks) 

• Define a preprocess function to preprocess the images. (The function should have the same name).
• The function should flatten the images. 
• The function should also normalize the pixel values to the range [0,1]. Depending on how you have implemented the code so far, the pixel values might already be normalized to this range. However, for clarity and completeness, include an explicit normalization step regardless.

In [8]:
def preprocess(image):
    """
    Flattens a 28x28 image to a 784-dim vector and normalizes pixel values to [0, 1].

    Args:
        image (torch.Tensor): Image tensor of shape (1, 28, 28) or (28, 28)
    
    Returns:
        torch.Tensor: Flattened and normalized image of shape (784,)
    """
    image = image.to(torch.float32)  # Ensure float32

    if image.max() > 1: # Explicit normalization (even if already [0,1])
        image = image / 255.0
        
    if image.ndim == 3 and image.shape[0] == 1: # Remove channel dim if present
        image = image.squeeze(0)

    return image.view(-1)  



class PreprocessedDataset(Dataset):
    def __init__(self, dataset):
        self.dataset = dataset  # Original dataset

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        image, label = self.dataset[idx]
        return preprocess(image), label


# Wrap your existing datasets
train_dataset_preprocessed = PreprocessedDataset(train_dataset)
val_dataset_preprocessed = PreprocessedDataset(val_dataset)

# Verify
sample_img, sample_label = train_dataset_preprocessed[1]
print(sample_img.shape)          # torch.Size([784])
print(sample_img.min(), sample_img.max())  # Should be tensor(0.) tensor(1.)
print(sample_label)            # Corresponding label
print("Unique labels in train_dataset_preprocessed:", set(get_all_labels(train_dataset_preprocessed)))

torch.Size([784])
tensor(0.) tensor(1.)
2
Unique labels in train_dataset_preprocessed: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}


2. MLP Implementation (22 Marks) 

Create a custom class: Define a class named MLP to implement the Multi-Layer Perceptron functionalities. 

You must not use any of PyTorch’s built-in implementations for linear layers, activation functions, or loss computations. 

All linear layers should be implemented manually by explicitly defining weights and biases using functions such as torch.randn and torch.zeros. 

Activation functions and loss functions must also be implemented using their mathematical definitions. 

The class must have the following functions. You can add more depending on your implementation. 

2.1 Forward Pass from Scratch (8 Marks) 

A function named forward to implement the forward pass logic manually. 

2.2 Backward Pass from Scratch (12 Marks) 

Implement a function named backward that manually computes the gradients and performs weight and bias updates. This function must explicitly implement the backward pass logic using the chain rule. Do not use PyTorch’s autograd system (i.e., no .backward() or automatic differentiation). 

2.3 Cross-Entropy Loss Function (2 Mark) 

A function named compute loss to manually calculate cross entropy loss.

In [10]:
def one_hot(y, num_classes):
    """
    Manually one-hot encode a tensor of class indices.
    Args:
        y (torch.Tensor): Class indices, shape (B,)
        num_classes (int): Total number of classes
    Returns:
        torch.Tensor: One-hot encoded tensor, shape (B, num_classes)
    """
    y = y.long()  # Ensure integer type
    B = y.shape[0]
    one_hot_tensor = torch.zeros((B, num_classes), dtype=torch.float32)
    one_hot_tensor[torch.arange(B), y] = 1.0
    return one_hot_tensor


In [44]:
class MLP:

    
    def __init__(self, input_size = 784, output_size = 10, hidden_size = [512, 256, 128, 64], 
                  activation_hidden = 'relu', learning_rate = 0.001, batch_size = 64, 
                  epochs = 10, momentum = 0.9):
    
     
         self.activation_hidden = activation_hidden
         self.lr = learning_rate
         self.num_layers = len(hidden_size) + 1 # Including output layer
         self.weights = []
         self.biases = []
         layer_sizes = [input_size] + hidden_size + [output_size]
         for i in range(self.num_layers):
            # Initialize weights and biases for each layer
            W = torch.randn(layer_sizes[i], layer_sizes[i+1]) * 0.01
            b = torch.zeros(layer_sizes[i+1])
            self.weights.append(W)
            self.biases.append(b)
         self.z=[None]*(self.num_layers)
         self.a=[None]*(self.num_layers+1) # a[0] is the input layer ---> a[1] is the first hidden layer, and so on... check why we need to activate the input layer as well

    def activation_function(self, x,kind='relu'):
        """
        Applies the activation function using ReLU as default.
        Args:
            x (torch.Tensor): Input tensor
        """
        if kind == 'sigmoid':
            return (1/(1+torch.exp(-x))) # Sigmoid activation
        elif kind == 'tanh':
            return (torch.exp(x) - torch.exp(-x)) / (torch.exp(x) + torch.exp(-x)) # Tanh activation
        elif kind == 'softmax':
            z = x - torch.max(x, dim=1, keepdim=True).values
            exp_z = torch.exp(z)
            return exp_z / torch.sum(exp_z, dim=1, keepdim=True)
        elif kind == 'relu':
            return torch.clamp(x, min=0)
        elif kind == 'leaky_relu':
            return torch.where(x > 0, x, 0.01 * x)
        else:
            print("Invalid activation function type. Using ReLU as default.")
            return torch.clamp(x, min=0)
       
    def activation_derivative(self, z, kind='relu'):
        """
        Computes the derivative of the activation function manually (no inbuilt PyTorch activations).
        Args:
            z (torch.Tensor): Pre-activation input
            kind (str): Activation type
        Returns:
            torch.Tensor: Derivative w.r.t. z, dtype float32
        """
        if kind == 'relu':
            return (z > 0)  # d/dz (ReLU) = 1 if z > 0 else 0
    
        elif kind == 'sigmoid':
            sig = 1 / (1 + torch.exp(-z))  # sigmoid(z)
            return (sig * (1 - sig))  # d/dz (sigmoid) = σ(z)(1 - σ(z))
    
        elif kind == 'tanh':
            return (1 - torch.tanh(z) ** 2)  # d/dz (tanh) = 1 - tanh²(z)
    
        elif kind == 'leaky_relu':
            return torch.where(z > 0, torch.ones_like(z), 0.01 * torch.ones_like(z))  # 1 for z > 0, else 0.01
    
        else:
            print(f"[Warning] Activation derivative for '{kind}' not found. Defaulting to ReLU.")
            return (z > 0)


    def forward(self, X):
        
        self.a[0] = X
        for i in range(self.num_layers-1): #Hidden layers
              self.z[i] = self.a[i]@self.weights[i] + self.biases[i]
              self.a[i+1]= self.activation_function(self.z[i], self.activation_hidden)
      # Output layer
        i=self.num_layers-1
        self.z[i] = self.a[i]@self.weights[i] + self.biases[i]
        self.a[i+1]= self.activation_function(self.z[i], 'softmax') # Output layer activation (softmax)
        
        return self.a[-1]  # Return the output of the last layer (softmax activation)
        
    def backward(self, X, y_true):
        m = y_true.shape[0]
        
        y_true_oh = one_hot(y_true, num_classes=self.a[-1].shape[1])  # Manual one-hot
        dz = (self.a[-1] - y_true_oh) / m
    
        grads_W = []
        grads_b = []
        grads_W.insert(0, self.a[-2].T @ dz)
        grads_b.insert(0, torch.sum(dz, axis=0))
    
        for i in reversed(range(self.num_layers - 1)):
            da = dz @ self.weights[i + 1].T
            dz = da * self.activation_derivative(self.z[i], self.activation_hidden)
            grads_W.insert(0, self.a[i].T @ dz)
            grads_b.insert(0, torch.sum(dz, axis=0))
    
        for i in range(self.num_layers):
            self.weights[i] -= self.lr * grads_W[i]
            self.biases[i] -= self.lr * grads_b[i]



    def compute_loss(self, y_true):
        y_pred = self.a[-1]
        log_probs = torch.log(torch.clamp(y_pred, 1e-15, 1 - 1e-15))
        
        # Ensure y_true is integer index type for indexing
        if y_true.dtype != torch.long:
            y_true = y_true.long()
    
        loss = -log_probs[torch.arange(y_true.shape[0]), y_true]
        return torch.mean(loss)


### HyperParametes

In [47]:
# Hyperparameters in relation to model architecture
sample_img, sample_label = train_dataset_preprocessed[1]

# Input Size
input_size =  sample_img.shape # Flattened image size (28x28)
# Output Size
output_size = len(unique_train_labels)  # Number of classes (0-9 for Fashion-MNIST)
# List of number of hidden layers and number of neurons in each hidden layer
hidden_size = [256, 128, 64]  # Example: 3 hidden layers with 256, 128 and 64 neurons respectively

# Hyperparameters in relation to model training
# Optimizer (SGD)
# Learning rate
learning_rate = 0.01  # Learning rate for SGD
# Batch size
batch_size = 32  # Number of samples per gradient update
# Number of epochs
epochs = 50  # Number of epochs to train the model 
# Momentum
momentum = 0.9  # Momentum for SGD

In [49]:
model = MLP(activation_hidden = 'relu')
train_loader = DataLoader(train_dataset_preprocessed, batch_size = batch_size, shuffle=True)
val_loader = DataLoader(val_dataset_preprocessed, batch_size=batch_size, shuffle=False)

In [51]:
from tqdm import tqdm

for epoch in range(epochs):
    print(f"epoch:{epoch+1}")
    for X_batch, y_batch in train_loader:
        # Optional: convert to float32 if using custom NumPy → torch conversion
        X_batch = X_batch.to(dtype=torch.float32)
        y_batch = y_batch.to(dtype=torch.float32)  # one-hot or class labels

        # Forward pass
        y_pred = model.forward(X_batch)

        # Compute loss
        loss = model.compute_loss(y_batch)


        # Backward pass
        model.backward(X_batch, y_batch)
    print(f"loss{loss}")


epoch:1
loss2.3026680946350098
epoch:2
loss2.3026256561279297
epoch:3
loss2.30302095413208
epoch:4
loss2.3028109073638916
epoch:5
loss2.3022427558898926
epoch:6
loss2.3023643493652344
epoch:7
loss2.302272081375122
epoch:8
loss2.3030166625976562
epoch:9
loss2.3024561405181885
epoch:10
loss2.302394151687622
epoch:11
loss2.303654670715332
epoch:12
loss2.3016748428344727
epoch:13
loss2.302753448486328
epoch:14
loss2.3021678924560547
epoch:15
loss2.3015527725219727
epoch:16
loss2.303570032119751
epoch:17
loss2.303300380706787
epoch:18
loss2.3003153800964355
epoch:19
loss2.3015034198760986
epoch:20
loss2.304832935333252
epoch:21
loss2.303502321243286
epoch:22
loss2.302882432937622
epoch:23
loss2.3034214973449707
epoch:24
loss2.3017373085021973
epoch:25
loss2.3007283210754395
epoch:26
loss2.3011744022369385
epoch:27
loss2.304950475692749
epoch:28
loss2.302155017852783
epoch:29
loss2.302532196044922
epoch:30
loss2.304537773132324
epoch:31
loss2.3020496368408203
epoch:32
loss2.301856517791748
e

In [16]:
print(loss)

tensor(0.8327)
