# EE954 : Deep Learning Fundamentals
### Assignment 1

**Group Number:** 12  
**Team Members:**  
- Lokesh  (Roll No: 241562482)  
- Akshay (Roll No: _____)

---



General Instructions 

• Late submissions will not be accepted. 

• Any form of plagiarism will result in penalties. If you refer to any online material or books, cite them properly.

• You may use Google Colab or Kaggle to train your models. 

• The use of the NumPy library is permitted. 

• The use of the TensorFlow library is strictly prohibited. 

• The use of the PyTorch library is allowed with restrictions.

### 1. Dataset Preparation (5 Marks) 

1.1 Download and Split (2 Marks) 

• Download the Fashion-MNIST dataset. You can either download it from here, or import it directly into your code using PyTorch’s torchvision.datasets module.

• Both sources provide separate training and test splits; however, you will need to create a separate validation set from the training data. 
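A minimal sketch of a reproducible train/validation split, assuming an 80/20 ratio and a fixed seed of 42 (both are illustrative choices, not requirements of the assignment). A small `TensorDataset` of indices stands in for Fashion-MNIST so the snippet runs without a download.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in for the 60,000-sample Fashion-MNIST training set (assumption: indices only)
full_train = TensorDataset(torch.arange(60000))

train_size = int(0.8 * len(full_train))   # 48,000 samples for training
val_size = len(full_train) - train_size   # 12,000 samples for validation

# A seeded generator makes the split identical across runs
generator = torch.Generator().manual_seed(42)
train_subset, val_subset = random_split(
    full_train, [train_size, val_size], generator=generator
)

print(len(train_subset), len(val_subset))
```

Passing a seeded `generator` is optional, but without it the validation set changes on every run, which makes results harder to compare.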

From Perplexity.AI:

> Fashion-MNIST is a widely used machine learning dataset consisting of 70,000 grayscale images (28x28 pixels) of fashion items from 10 categories, such as T-shirts, trousers, dresses, and shoes. There are 60,000 images for training and 10,000 for testing. Each image is labeled with one of the 10 clothing classes. Fashion-MNIST was designed as a more challenging, modern replacement for the original MNIST handwritten digits dataset, while maintaining the same format and structure for easy benchmarking and comparison of machine learning algorithms.
>
> Each example is a 28x28 grayscale image of a fashion item, labeled with one of 10 classes:
>
> - 0: T-shirt/top
> - 1: Trouser
> - 2: Pullover
> - 3: Dress
> - 4: Coat
> - 5: Sandal
> - 6: Shirt
> - 7: Sneaker
> - 8: Bag
> - 9: Ankle boot

Documentation on dataset - https://github.com/zalandoresearch/fashion-mnist

In [2]:
!pip install torch torchvision
import torch
from torchvision import datasets, transforms

# Download Fashion-MNIST training set.
# Note: transforms.ToTensor() converts each image to a PyTorch tensor and scales
# pixel values from [0, 255] to [0, 1]. Section 1.2 still asks for an explicit
# normalization step, so preprocess() includes one for completeness.
#dataset = datasets.FashionMNIST(root='./data', train=True, download=True, transform=transforms.ToTensor())


# Define the augmentation pipeline
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),  # 50% chance to flip horizontally
    transforms.RandomVerticalFlip(p=0.5),    # 50% chance to flip vertically
    transforms.RandomRotation(degrees=360),  # Random rotation by an angle in [-360, 360] degrees
    transforms.ToTensor()                    # Convert image to tensor and scale to [0, 1]
])

# Apply the transform when loading the dataset.
# Note: the validation split created below shares this transform, so it is augmented as well.
dataset = datasets.FashionMNIST(
    root='./data',
    train=True,
    download=True, 
    transform=transform
)

# Split into training and validation sets (e.g., 80% train, 20% val)
train_size = int(0.8 * len(dataset))
val_size = len(dataset) - train_size
train_dataset, val_dataset = torch.utils.data.random_split(dataset, [train_size, val_size])




In [3]:
# For datasets created via random_split
print(f"Training set size: {len(train_dataset)}")
print(f"Validation set size: {len(val_dataset)}")

# For a single sample (image, label)
image, label = train_dataset[0]
print(f"Image shape: {image.shape}, Label: {label}")


# Helper function to extract all labels from a Subset
def get_all_labels(subset):
    return [subset[i][1] for i in range(len(subset))]

# Get all labels
train_labels = get_all_labels(train_dataset)
val_labels = get_all_labels(val_dataset)

# Get unique labels
unique_train_labels = set(train_labels)
unique_val_labels = set(val_labels)

print("Unique labels in train_dataset:", unique_train_labels)
print("Unique labels in val_dataset:", unique_val_labels)

all_pixels = torch.cat([train_dataset[i][0].view(-1) for i in range(len(train_dataset))])
min_pixel = all_pixels.min().item()
max_pixel = all_pixels.max().item()
print("Min pixel value:", min_pixel)
print("Max pixel value:", max_pixel)

Training set size: 48000
Validation set size: 12000
Image shape: torch.Size([1, 28, 28]), Label: 7
Unique labels in train_dataset: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
Unique labels in val_dataset: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
Min pixel value: 0.0
Max pixel value: 1.0


1.2 Data Preprocessing (3 Marks) 

• Define a preprocess function to preprocess the images. (The function should have the same name).
• The function should flatten the images. 
• The function should also normalize the pixel values to the range [0,1]. Depending on how you have implemented the code so far, the pixel values might already be normalized to this range. However, for clarity and completeness, include an explicit normalization step regardless.

In [4]:
from torch.utils.data import Dataset

def preprocess(image):
    """
    Flattens a 28x28 image to a 784-dim vector and normalizes pixel values to [0, 1].
    Args:
        image (torch.Tensor): Image tensor of shape (1, 28, 28) or (28, 28)
    Returns:
        torch.Tensor: Flattened and normalized image of shape (784,)
    Note: if the image was already scaled to [0, 1] by transforms.ToTensor() when the dataset was loaded, this function leaves the pixel values unchanged.
    """
    # Ensure image is float and normalize to [0,1]
    image = image.float() / 255.0 if image.max() > 1 else image.float()
    # Flatten the image
    return image.view(-1)

# Create a wrapper dataset to apply preprocessing

class PreprocessedDataset(Dataset):
    def __init__(self, dataset):
        self.dataset = dataset

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        image, label = self.dataset[idx]
        return preprocess(image), label  # Apply preprocessing

# Wrap your existing datasets
train_dataset_preprocessed = PreprocessedDataset(train_dataset)
val_dataset_preprocessed = PreprocessedDataset(val_dataset)

# Verify
sample_img, sample_label = train_dataset_preprocessed[1]
print(sample_img.shape)          # torch.Size([784])
print(sample_img.min(), sample_img.max())  # Should be tensor(0.) tensor(1.)
print(sample_label)            # Corresponding label
print("Unique labels in train_dataset_preprocessed:", set(get_all_labels(train_dataset_preprocessed)))

torch.Size([784])
tensor(0.) tensor(1.)
1
Unique labels in train_dataset_preprocessed: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
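One way to feed flattened samples to a manually written MLP is in mini-batches. The sketch below uses `torch.utils.data.DataLoader` with a batch size of 32 on random stand-in data (an assumption, so the snippet runs without the real dataset). Whether `DataLoader` falls within the permitted PyTorch usage should be checked against the assignment's restrictions.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data (assumption): 96 flattened images in [0, 1] with labels 0..9
images = torch.rand(96, 784)
labels = torch.randint(0, 10, (96,))

# shuffle=True reorders the samples at the start of every epoch
loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)

for batch_images, batch_labels in loader:
    # Each batch has shape (32, 784) for images and (32,) for labels
    print(batch_images.shape, batch_labels.shape)
```

With the 48,000-sample training split and a batch size of 32, one epoch would consist of 1,500 such batches.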


### 2. MLP Implementation (22 Marks) 

Create a custom class: Define a class named MLP to implement the Multi-Layer Perceptron functionalities. 

You must not use any of PyTorch’s built-in implementations for linear layers, activation functions, or loss computations. 

All linear layers should be implemented manually by explicitly defining weights and biases using functions such as torch.randn and torch.zeros. 

Activation functions and loss functions must also be implemented using their mathematical definitions. 
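For reference, the standard mathematical definitions that such implementations are usually based on are listed here. The notation is an assumption of this note: $z$ is a pre-activation, $\hat{y}_{ik}$ is the predicted probability of class $k$ for sample $i$, $y_{ik}$ is the one-hot label, and the loss is averaged over a batch of $m$ samples. The leaky ReLU slope of 0.01 is a common default rather than a requirement.

$$
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}, \qquad
\mathrm{ReLU}(z) = \max(0, z)
$$

$$
\mathrm{LeakyReLU}(z) =
\begin{cases}
z & \text{if } z > 0 \\
0.01\, z & \text{otherwise}
\end{cases}
\qquad
\mathrm{softmax}(z)_k = \frac{e^{z_k}}{\sum_{j} e^{z_j}}
$$

$$
\mathcal{L} = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k} y_{ik} \log \hat{y}_{ik}
$$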

The class must have the following functions. You can add more depending on your implementation. 

2.1 Forward Pass from Scratch (8 Marks) 

A function named forward to implement the forward pass logic manually. 

2.2 Backward Pass from Scratch (12 Marks) 

Implement a function named backward that manually computes the gradients and performs weight and bias updates. This function must explicitly implement the backward pass logic using the chain rule. Do not use PyTorch’s autograd system (i.e., no .backward() or automatic differentiation). 
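The chain-rule steps can be written compactly for a network with a softmax output and mean cross-entropy loss. The notation is an assumption of this note: samples are stored as rows, $A^{(0)} = X$, $Z^{(l)} = A^{(l-1)} W^{(l)} + b^{(l)}$, $A^{(l)} = g(Z^{(l)})$ for hidden layers, $L$ is the output layer, $m$ is the batch size, and $\eta$ is the learning rate.

$$
\delta^{(L)} = \frac{1}{m}\left(A^{(L)} - Y\right)
$$

$$
\delta^{(l)} = \left(\delta^{(l+1)} \left(W^{(l+1)}\right)^{\top}\right) \odot g'\!\left(Z^{(l)}\right), \qquad l = L-1, \dots, 1
$$

$$
\frac{\partial \mathcal{L}}{\partial W^{(l)}} = \left(A^{(l-1)}\right)^{\top} \delta^{(l)}, \qquad
\frac{\partial \mathcal{L}}{\partial b^{(l)}} = \sum_{i=1}^{m} \delta^{(l)}_{i}
$$

$$
W^{(l)} \leftarrow W^{(l)} - \eta\, \frac{\partial \mathcal{L}}{\partial W^{(l)}}, \qquad
b^{(l)} \leftarrow b^{(l)} - \eta\, \frac{\partial \mathcal{L}}{\partial b^{(l)}}
$$

For ReLU, $g'(z)$ is 1 where $z > 0$ and 0 elsewhere.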

2.3 Cross-Entropy Loss Function (2 Marks) 

A function named compute_loss to manually calculate the cross-entropy loss.

In [3]:
def activation_function(x, type='sigmoid'):
    """
    Applies the activation function.
    Args:
        x (torch.Tensor): Input tensor; for softmax, classes are on the last dimension
        type (str): 'sigmoid', 'tanh', 'softmax', 'relu' or 'leaky_relu'
    """
    if type == 'sigmoid':
        return 1 / (1 + torch.exp(-x))  # Sigmoid activation
    elif type == 'tanh':
        return 2 / (1 + torch.exp(-2 * x)) - 1  # Tanh activation: tanh(x) = 2*sigmoid(2x) - 1
    elif type == 'softmax':
        # Subtract the per-sample max for numerical stability, then normalize per sample
        e = torch.exp(x - x.max(dim=-1, keepdim=True).values)
        return e / e.sum(dim=-1, keepdim=True)
    elif type == 'relu':
        return torch.clamp(x, min=0)
    elif type == 'leaky_relu':
        return torch.where(x > 0, x, 0.01 * x)
    else:
        print("Invalid activation function type. Using ReLU as default.")
        return torch.clamp(x, min=0)  # ReLU activation
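A short sketch of why softmax is usually computed with the per-sample maximum subtracted first. The helper names `softmax_naive` and `softmax_stable` are hypothetical and used only for this illustration. Both follow the same mathematical definition, but the naive form overflows for large logits.

```python
import torch

def softmax_naive(z):
    # Direct definition: exp(z) overflows to inf for large entries
    e = torch.exp(z)
    return e / e.sum(dim=-1, keepdim=True)

def softmax_stable(z):
    # Shifting by the row-wise max leaves the result unchanged mathematically
    e = torch.exp(z - z.max(dim=-1, keepdim=True).values)
    return e / e.sum(dim=-1, keepdim=True)

logits = torch.tensor([[1000.0, 1001.0, 1002.0],
                       [1.0, 2.0, 3.0]])

print(softmax_naive(logits))   # first row is nan because exp(1000) overflows to inf
print(softmax_stable(logits))  # both rows are valid probability distributions
```

Both rows of the stable result are identical, because softmax depends only on differences between logits.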

In [4]:
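def compute_loss(y_pred, y_true):
    """
    Computes the mean cross-entropy loss over a batch.
    Args:
        y_pred (torch.Tensor): Predicted class probabilities, shape (N, C)
        y_true (torch.Tensor): One-hot encoded true labels, shape (N, C)
    """
    # Adding 1e-15 avoids log(0); dividing by N averages the loss over the batch,
    # which matches the 1/N factor used in the gradient computation
    return -torch.sum(y_true * torch.log(y_pred + 1e-15)) / y_pred.shape[0]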
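The loss above multiplies `y_true` element-wise with the log-probabilities, so it expects one-hot labels rather than the integer labels that the dataset returns. A minimal sketch of the conversion, with a hypothetical helper named `one_hot` and a worked loss value averaged over the batch (the averaging is an assumption of this sketch):

```python
import torch

def one_hot(labels, num_classes=10):
    """Builds a one-hot matrix of shape (N, num_classes) from integer labels."""
    encoded = torch.zeros(labels.shape[0], num_classes)
    encoded[torch.arange(labels.shape[0]), labels] = 1.0
    return encoded

labels = torch.tensor([7, 0])   # e.g. Sneaker and T-shirt/top
y_true = one_hot(labels)

# Uniform predictions: every class gets probability 0.1
y_pred = torch.full((2, 10), 0.1)
loss = -torch.sum(y_true * torch.log(y_pred + 1e-15)) / y_pred.shape[0]
print(loss.item())  # about 2.3026, i.e. ln(10)
```

A loss near ln(10) at the start of training is a useful sanity check: it is what an untrained 10-class classifier with near-uniform outputs should produce.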

### Hyperparameters

In [None]:
# Hyperparameters in relation to model architecture
# Input Size
input_size = 784  # Flattened image size (28x28)
# Output Size
output_size = 10  # Number of classes (0-9 for Fashion-MNIST)
# List of number of hidden layers and number of neurons in each hidden layer
hidden_size = [256, 128, 64]  # Example: 3 hidden layers with 256, 128 and 64 neurons respectively
# Activation function for hidden layers
activation_hidden = 'relu'  # Activation function for hidden layers (e.g., 'sigmoid', 'tanh', 'relu', 'leaky_relu')

# Hyperparameters in relation to model training
# Optimizer (SGD)
# Learning rate
learning_rate = 0.01  # Learning rate for SGD
# Batch size
batch_size = 32  # Number of samples per gradient update
# Number of epochs
epochs = 10  # Number of epochs to train the model 
# Momentum
momentum = 0.9  # Momentum for SGD
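The `momentum` hyperparameter implies an update rule beyond plain gradient descent. Below is a minimal sketch of classical SGD with momentum, using a hypothetical helper `sgd_momentum_step` and a toy quadratic objective. Both are assumptions made for illustration and are not part of the assignment code.

```python
import torch

def sgd_momentum_step(param, grad, velocity, learning_rate=0.01, momentum=0.9):
    """Classical momentum update, applied in place to `param` and `velocity`."""
    velocity.mul_(momentum).sub_(learning_rate * grad)  # v = momentum * v - lr * grad
    param.add_(velocity)                                # w = w + v
    return param, velocity

# Toy problem (assumption): minimize f(w) = 0.5 * ||w||^2, whose gradient is w itself
w = torch.tensor([1.0, -2.0])
v = torch.zeros_like(w)
for _ in range(200):
    sgd_momentum_step(w, w.clone(), v)

print(w)  # close to the minimizer [0, 0]
```

In an MLP, one velocity tensor per weight matrix and bias vector would be kept alongside the parameters and updated with the gradients from the backward pass.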

In [None]:
def forward_pass(X, W, b, activation='sigmoid'):
    """
    Performs a forward pass through the network.
    Args:
        X (torch.Tensor): Input data
        W (torch.Tensor): Weights
        b (torch.Tensor): Biases
        activation (str): Activation function type
    Returns:
        torch.Tensor: Output of the network after applying activation function
    """
    z = torch.matmul(X, W) + b  # Linear transformation
    return activation_function(z, activation)  # Apply activation function

In [None]:
def backward_pass(X, y_true, y_pred, W, b, learning_rate=0.01):
    """
    Performs a backward pass and gradient-descent update for a single softmax
    output layer trained with cross-entropy loss.
    Args:
        X (torch.Tensor): Input data, shape (N, in_features)
        y_true (torch.Tensor): One-hot true labels, shape (N, C)
        y_pred (torch.Tensor): Predicted probabilities (softmax output), shape (N, C)
        W (torch.Tensor): Weights, shape (in_features, C); updated in place
        b (torch.Tensor): Biases, shape (C,); updated in place
        learning_rate (float): Learning rate for weight updates
    Returns:
        tuple: Updated weights and biases (W, b)
    """
    # For softmax + cross-entropy, the gradient w.r.t. the pre-activation is (y_pred - y_true)
    dW = torch.matmul(X.T, (y_pred - y_true)) / X.size(0)  # Gradient w.r.t. weights
    db = torch.sum(y_pred - y_true, dim=0) / X.size(0)  # Gradient w.r.t. biases

    # Update weights and biases using gradient descent
    W -= learning_rate * dW  # Update weights
    b -= learning_rate * db  # Update biases

    return W, b
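Because the gradients are derived by hand, a finite-difference check is a common way to catch sign or scaling mistakes. The sketch below compares the analytic weight gradient of a single softmax layer with a numerical estimate. The tiny random problem, the local `softmax` and `mean_cross_entropy` helpers, and the use of float64 are assumptions chosen for the check.

```python
import torch

torch.manual_seed(0)
X = torch.randn(5, 4, dtype=torch.float64)
W = torch.randn(4, 3, dtype=torch.float64) * 0.1
b = torch.zeros(3, dtype=torch.float64)
y_true = torch.zeros(5, 3, dtype=torch.float64)
y_true[torch.arange(5), torch.tensor([0, 2, 1, 1, 0])] = 1.0  # one-hot labels

def softmax(z):
    e = torch.exp(z - z.max(dim=1, keepdim=True).values)
    return e / e.sum(dim=1, keepdim=True)

def mean_cross_entropy(W, b):
    y_pred = softmax(X @ W + b)
    return -torch.sum(y_true * torch.log(y_pred)) / X.shape[0]

# Analytic gradient: X^T (y_pred - y_true) / N
y_pred = softmax(X @ W + b)
dW_analytic = X.T @ (y_pred - y_true) / X.shape[0]

# Numerical gradient by central differences, one weight at a time
eps = 1e-6
dW_numeric = torch.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        W_plus, W_minus = W.clone(), W.clone()
        W_plus[i, j] += eps
        W_minus[i, j] -= eps
        diff = mean_cross_entropy(W_plus, b) - mean_cross_entropy(W_minus, b)
        dW_numeric[i, j] = diff / (2 * eps)

max_error = torch.max(torch.abs(dW_analytic - dW_numeric)).item()
print(max_error)  # expected to be very small if the analytic gradient is correct
```

The same idea extends to hidden layers: perturb one entry of a weight matrix, recompute the loss with a full forward pass, and compare against the gradient produced by the backward pass.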

In [3]:
class MLP:
    def __init__(self, input_size=784, output_size=10, hidden_size=(256, 128, 64),
                 activation_hidden='relu', learning_rate=0.01, batch_size=32,
                 epochs=10, momentum=0.9):
        self.num_layers = len(hidden_size) + 1  # Including output layer
        self.activation_hidden = activation_hidden
        self.lr = learning_rate
        # Stored for the training loop; momentum is not yet applied in backward()
        self.batch_size = batch_size
        self.epochs = epochs
        self.momentum = momentum
        self.weights = []
        self.biases = []
        layer_sizes = [input_size] + list(hidden_size) + [output_size]
        for i in range(self.num_layers):
            # Initialize weights and biases for each layer
            W = torch.randn(layer_sizes[i], layer_sizes[i+1]) * 0.01
            b = torch.zeros(layer_sizes[i+1])
            self.weights.append(W)
            self.biases.append(b)
        self.z = [None] * self.num_layers
        # a[0] holds the input itself (no activation is applied to it);
        # a[i+1] is the output of layer i, so a[1] is the first hidden layer
        self.a = [None] * (self.num_layers + 1)

    def forward(self, X):
        self.a[0] = X
        for i in range(self.num_layers - 1):  # Hidden layers
            self.z[i] = self.a[i] @ self.weights[i] + self.biases[i]
            self.a[i+1] = activation_function(self.z[i], self.activation_hidden)
        # Output layer: softmax computed over dim=1 so each row (sample) sums to 1
        i = self.num_layers - 1
        self.z[i] = self.a[i] @ self.weights[i] + self.biases[i]
        exp_z = torch.exp(self.z[i] - self.z[i].max(dim=1, keepdim=True).values)
        self.a[i+1] = exp_z / exp_z.sum(dim=1, keepdim=True)
        return self.a[-1]  # Class probabilities, shape (N, output_size)

    def backward(self, X, y_true):
        """Performs backpropagation through the network and updates the parameters.
        Assumes forward(X) was just called and that hidden layers use ReLU.
        Args:
            X : Input data
            y_true : True labels (one-hot encoded)"""
        m = y_true.shape[0]
        grads_W = []
        grads_b = []
        # Gradient of the mean loss w.r.t. the output pre-activation (softmax + cross-entropy)
        dz = (self.a[-1] - y_true) / m
        grads_W.insert(0, self.a[-2].T @ dz)
        grads_b.insert(0, torch.sum(dz, dim=0))
        for i in reversed(range(self.num_layers - 1)):
            da = dz @ self.weights[i+1].T
            dz = da * (self.z[i] > 0).float()  # ReLU derivative
            grads_W.insert(0, self.a[i].T @ dz)
            grads_b.insert(0, torch.sum(dz, dim=0))

        # Update parameters
        for i in range(self.num_layers):
            self.weights[i] -= self.lr * grads_W[i]
            self.biases[i] -= self.lr * grads_b[i]

    def compute_loss(self, y_true):
        """Computes the mean cross-entropy loss for the most recent forward pass.
        Args:
            y_true : True labels (one-hot encoded)"""
        # Clamping avoids log(0); dividing by the batch size matches the 1/m in backward()
        probs = torch.clamp(self.a[-1], 1e-15, 1 - 1e-15)
        return -torch.sum(y_true * torch.log(probs)) / y_true.shape[0]
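A common sanity check for a hand-written forward and backward pass is to confirm that the loss falls when training repeatedly on one small batch. The sketch below does this with a standalone two-layer network written inline. The sizes, learning rate, random data, and number of steps are all hypothetical, and the code is independent of the `MLP` class above.

```python
import torch

torch.manual_seed(0)
N, D, H, C = 64, 20, 16, 3   # samples, features, hidden units, classes (hypothetical)
X = torch.randn(N, D)
labels = torch.randint(0, C, (N,))
Y = torch.zeros(N, C)
Y[torch.arange(N), labels] = 1.0   # one-hot targets

W1, b1 = torch.randn(D, H) * 0.1, torch.zeros(H)
W2, b2 = torch.randn(H, C) * 0.1, torch.zeros(C)
lr = 0.1

losses = []
for step in range(300):
    # Forward pass: linear -> ReLU -> linear -> softmax
    z1 = X @ W1 + b1
    a1 = torch.clamp(z1, min=0)
    z2 = a1 @ W2 + b2
    e = torch.exp(z2 - z2.max(dim=1, keepdim=True).values)
    probs = e / e.sum(dim=1, keepdim=True)
    losses.append((-torch.sum(Y * torch.log(probs + 1e-15)) / N).item())

    # Backward pass via the chain rule
    dz2 = (probs - Y) / N
    dW2, db2 = a1.T @ dz2, dz2.sum(dim=0)
    dz1 = (dz2 @ W2.T) * (z1 > 0).float()
    dW1, db1 = X.T @ dz1, dz1.sum(dim=0)

    # Plain gradient-descent update
    W2 -= lr * dW2
    b2 -= lr * db2
    W1 -= lr * dW1
    b1 -= lr * db1

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

If the loss does not decrease on such a small problem, the backward pass is the first place to look, since the forward pass alone determines the starting loss.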