# About: Deep Learning '23 Assignment 1


We will perform Image Classification task on the MNIST dataset which has 70,000 28*28 images labelled into 10 classes. 

**Total Marks: 60**


**Fill these**

Name: Abhishek Garg

Roll Number: 22BM6JP03

**Instructions:**

1. We have left code cells blank for you to fill up with appropriate code. Do not add any extra code cells. Strictly follow the format and fill up the cells with the correct code. Refer to cell comments for what to fill in that cell.

2. *Do not* use any training frameworks like PyTorch Lightning. This assignment will test your ability to write custom training loops.

3. Save the notebook with cell outputs of all cells. The cell outputs  will be used for evaluating your submission.




In [4]:
import torch
import torch.nn as nn
import random
import numpy as np

from torchvision import datasets, transforms
from torch.utils.data import random_split, DataLoader


## Add any other imports here

In [2]:
SEED=42
torch.manual_seed(SEED)
random.seed(SEED)
np.random.seed(SEED)

## Getting the data

In [3]:
train_data = datasets.MNIST('data', train=True, download=True, transform=transforms.ToTensor())
test_data = datasets.MNIST('data', train=False, download=True, transform=transforms.ToTensor())
train, val = random_split(train_data, [50000, 10000], generator=torch.Generator().manual_seed(SEED))

train_loader = DataLoader(train, batch_size=64, shuffle=True)
val_loader = DataLoader(val, batch_size=64, shuffle=False)
test_loader = DataLoader(test_data, batch_size=64, shuffle=False)

print(len(train), len(val), len(test_data))

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data/MNIST/raw/train-images-idx3-ubyte.gz


  0%|          | 0/9912422 [00:00<?, ?it/s]

Extracting data/MNIST/raw/train-images-idx3-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to data/MNIST/raw/train-labels-idx1-ubyte.gz


  0%|          | 0/28881 [00:00<?, ?it/s]

Extracting data/MNIST/raw/train-labels-idx1-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to data/MNIST/raw/t10k-images-idx3-ubyte.gz


  0%|          | 0/1648877 [00:00<?, ?it/s]

Extracting data/MNIST/raw/t10k-images-idx3-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to data/MNIST/raw/t10k-labels-idx1-ubyte.gz


  0%|          | 0/4542 [00:00<?, ?it/s]

Extracting data/MNIST/raw/t10k-labels-idx1-ubyte.gz to data/MNIST/raw

50000 10000 10000


## Defining the Model [18 marks]

You will define 3 models, with 2, 3, 4 hidden layers respectively. Lets call these models A, B, C. We will be studying the comparitive performance of these 3 models on this task.

Use ReLU as the activation function for all three models. Later we will experiment with other activation functions as well.

### Model A

Architecture:

1. Input Layer 
2. Hidden Layer (Dimension Size - 64)
3. Activation Function
4. Hidden Layer (Dimension Size - 128)
5. Activation Function
6. Output Layer (Dimension Size = Number of Classes = 10)

In [14]:
# Model A Definition 
class NetRelu(nn.Module):
    
    # Constructor
    def __init__(self, data_input, H1_output, H2_output, Data_out):
        super(NetRelu, self).__init__()
        self.linear1 = nn.Linear(data_input, H1_output)
        self.linear2 = nn.Linear(H1_output, H2_output)
        self.linear3 = nn.Linear(H2_output, Data_out)
    
    # Prediction
    def forward(self, x):
        x = x.view(-1, 28 * 28)
        x = torch.relu(self.linear1(x))  
        x = torch.relu(self.linear2(x))
        x = self.linear3(x)
        return x


# Fill in appropriately while maintaining the name of the variable
model = NetRelu(28*28,64,128,10)

### Model B


Architecture:

1. Input Layer 
2. Hidden Layer (Dimension Size - 64)
3. Activation Function
4. Hidden Layer (Dimension Size - 128)
5. Activation Function
6. Hidden Layer (Dimension Size - 256)
7. Activation Function
8. Output Layer (Dimension Size = Number of Classes = 10)

In [15]:
# Model B Definiton

class NetRelu(nn.Module):
    
    # Constructor
    def __init__(self, data_input, H1_output, H2_output, H3_output, Data_out):
        super(NetRelu, self).__init__()
        self.linear1 = nn.Linear(data_input, H1_output)
        self.linear2 = nn.Linear(H1_output, H2_output)
        self.linear3 = nn.Linear(H2_output, H3_output)
        self.linear4 = nn.Linear(H3_output, Data_out)
    
    # Prediction
    def forward(self, x):
        x = x.view(-1, 28 * 28)
        x = torch.relu(self.linear1(x))  
        x = torch.relu(self.linear2(x))
        x = torch.relu(self.linear3(x))
        x = self.linear4(x)
        return x

# Use the same variable name
model = NetRelu(28*28,64,128,256,10)

### Model C


Architecture

1. Input Layer 
2. Hidden Layer (Dimension Size - 64)
3. Activation Function
4. Hidden Layer (Dimension Size - 128)
5. Activation Function
6. Hidden Layer (Dimension Size - 256)
7. Activation Function
8. Hidden Layer (Dimension Size - 512)
9. Activation Function
10. Output Layer (Dimension Size = Number of Classes = 10)

In [16]:
# Model C Definition
class NetRelu(nn.Module):
    
    # Constructor
    def __init__(self, data_input, H1_output, H2_output, H3_output,H4_output, Data_out):
        super(NetRelu, self).__init__()
        self.linear1 = nn.Linear(data_input, H1_output)
        self.linear2 = nn.Linear(H1_output, H2_output)
        self.linear3 = nn.Linear(H2_output, H3_output)
        self.linear4 = nn.Linear(H3_output, H4_output)
        self.linear5 = nn.Linear(H4_output, Data_out)
    
    # Prediction
    def forward(self, x):
        x = x.view(-1, 28 * 28)
        x = torch.relu(self.linear1(x))  
        x = torch.relu(self.linear2(x))
        x = torch.relu(self.linear3(x))
        x = torch.relu(self.linear4(x))
        x = self.linear5(x)
        return x

# Use the same variable name
model = NetRelu(28*28,64,128,256,512,10)

## Loss Function & Optimizer [2 marks]

* Loss Function: Cross Entropy Loss
* Optimizer : Adam

Use PyTorch Library versions for these two.

In [10]:
# Use the same variable names
criterion = nn.CrossEntropyLoss()
learning_rate = 0.01
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

## Training Loop [30 marks]

We give you the freedom to choose Hyperparameters like learing rate, number of epochs etc, but take care to use the **same** hyperparameters for all the 3 models. Also clearly state the hyperparameters you have chosen

For each model, You need to report these metrics: Train Loss, Val Loss, Train Accuracy, Val Accuracy at the end of each epoch.

Also plot the graphs of the following (in separate cells)
1. Train Loss & Val Loss V/s Epoch
2. Train Accuracy & Val Accuracy V/s Epoch

In [11]:
# Define the hyperparameters (same for all 3 models) here
learning_rate = 0.01
num_epochs= 10


### Model A 



In [17]:
# Training Loop for model A

def train(model, criterion, train_loader, validation_loader, optimizer, epochs=100):
    i = 0
    metrics = {'training_loss': [], 'validation_accuracy': []}  
    epoch_losses_train = []
    epoch_losses_valid = []
    iteration_valid_losses=[]
    
    for epoch in range(epochs):
        runningLoss=0
        for i, (x, y) in enumerate(train_loader):
            optimizer.zero_grad()
            z = model(x.view(-1, 28 * 28))
            loss = criterion(z, y)
            loss.backward()
            optimizer.step()
            runningLoss += loss.item()
            metrics['training_loss'].append(loss.data.item())
        epoch_losses_train.append(runningLoss / len(train_loader))
        
        runningLossValid= 0
        with torch.no_grad():
           for i, (inputs, labels) in enumerate(val_loader):
               outputs = model(inputs)
               loss = criterion(outputs, labels)
               runningLossValid += loss.item()
               iteration_valid_losses.append(loss.item())
           epoch_losses_valid.append(runningLossValid / len(val_loader))
        
        correct = 0
        for x, y in test_loader:
            z = model(x.view(-1, 28 * 28))
            _, label = torch.max(z, 1)
            correct += (label == y).sum().item()
    
        accuracy = 100 * (correct / len(test_data))
        metrics['validation_accuracy'].append(accuracy)
        print(f"Epoch {epoch:3}. Training Loss: {epoch_losses_train[-1]:.6f},  Validation Loss: {epoch_losses_valid[-1]:.6f} , Accuracy: {correct} / {len(test_data)} [{100 * correct / len(test_data):.2f}%]")
            
    
    return metrics

In [18]:
training_results_relu = train(model, criterion, train_loader, test_loader, optimizer, epochs=num_epochs)

Epoch   0. Training Loss: 2.302280,  Validation Loss: 2.302259 , Accuracy: 1010 / 10000 [10.10%]
Epoch   1. Training Loss: 2.302280,  Validation Loss: 2.302259 , Accuracy: 1010 / 10000 [10.10%]
Epoch   2. Training Loss: 2.302283,  Validation Loss: 2.302259 , Accuracy: 1010 / 10000 [10.10%]
Epoch   3. Training Loss: 2.302272,  Validation Loss: 2.302259 , Accuracy: 1010 / 10000 [10.10%]
Epoch   4. Training Loss: 2.302288,  Validation Loss: 2.302259 , Accuracy: 1010 / 10000 [10.10%]
Epoch   5. Training Loss: 2.302285,  Validation Loss: 2.302259 , Accuracy: 1010 / 10000 [10.10%]
Epoch   6. Training Loss: 2.302272,  Validation Loss: 2.302259 , Accuracy: 1010 / 10000 [10.10%]
Epoch   7. Training Loss: 2.302284,  Validation Loss: 2.302259 , Accuracy: 1010 / 10000 [10.10%]
Epoch   8. Training Loss: 2.302276,  Validation Loss: 2.302259 , Accuracy: 1010 / 10000 [10.10%]
Epoch   9. Training Loss: 2.302273,  Validation Loss: 2.302259 , Accuracy: 1010 / 10000 [10.10%]


In [None]:
# Plot Graph of Train & Val Loss vs Epoch (together in same plot) for model A
plt.plot(training_results_relu['training_loss'], label='relu')
plt.plot(training_results_relu['validation_accuracy'])
plt.ylabel('loss')
plt.title('training loss iterations')
plt.legend()

In [None]:
# Plot Graph of Train & Val Accuracy vs Epoch (together in same plot) for model A

### Model B


In [None]:
# Training Loop for model B

In [None]:
# Plot Graph of Train & Val Loss vs Epoch (together in same plot) for model B

In [None]:
# Plot Graph of Train & Val Accuracy vs Epoch (together in same plot) for model B

### Model C


In [None]:
# Training Loop for model C

In [None]:
# Plot Graph of Train & Val Loss vs Epoch (together in same plot) for model C

In [None]:
# Plot Graph of Train & Val Accuracy vs Epoch (together in same plot) for model C

## Choosing a Activation Function [10 marks]

Based on the best performing model you found above, define 2 more models with these 2 activation functions (1 activation function is used throughout the model definiation). Use these Activation funcitons 


*   Tanh
*   LeakyRELU

In [None]:
# Leaky ReLU model definiton

# Tanh model definition


# Maintain these variable names
model_lrelu = ...
model_tanh = ...

### Training 

Train these two models with the same hyperparameters. Train in separate cells given below, and report the same metrics descrived previously (train_loss, val_loss, train_acc, val_acc)

In [None]:
# Training Loop for LRELU

In [None]:
# Training Loop for TanH

### Results on Test Set

Report the Test Set classfication accuracy for the three activation functions (ReLU, LeakyReLU & TanH) and state which activation function gave the best performance on test set

In [None]:
# Define how to calculate Accuracy on Test Set

In [None]:
# Accuracy of RELU model

In [None]:
# Accuracy of TanH model

In [None]:
# Accuracy of LeakyReLU model

Fill in these with the values you obtained from training.

* ReLU model Test Set Accuracy: `....` %
* TanH model Test Set Accuracy: `....` %
* LeakReLU model Test Set Accuracy: `....` %