#### **Welcome to Assignment 4 on Deep Learning for Computer Vision.**
This assignment is based on the content you learned in Week-4.

#### **Instructions**
1. Use Python 3.x to run this notebook
2. Write your code only between the lines 'YOUR CODE STARTS HERE' and 'YOUR CODE ENDS HERE'. Do not change anything else in the code cells; if you do, the answers you get at the end of this assignment might be wrong.
3. Read the documentation of each function carefully.
4. All the Best!

In [1]:
# Please DO NOT modify this cell.

import os
import os.path as osp
import random

import numpy as np
import torch

def set_seed(seed: int):

    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    os.environ["PYTHONHASHSEED"] = str(seed)
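
The effect of seeding can be checked in isolation. The sketch below uses only Python's built-in `random` module (the `draw` helper is hypothetical, not part of the assignment) to show that re-seeding with the same value reproduces the same sequence; `set_seed` above applies the same idea to NumPy and PyTorch as well.

```python
import random

def draw(seed: int, n: int = 3) -> list:
    # Hypothetical helper: seed the generator, then draw n numbers.
    random.seed(seed)
    return [random.random() for _ in range(n)]

first = draw(2022)
second = draw(2022)
different = draw(2023)

print(first == second)     # same seed, same sequence
print(first == different)  # a different seed gives a different sequence
```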

### We will train a multi-layer perceptron with one hidden layer on a randomly generated dataset.

In [2]:
# Please DO NOT modify this cell.

num_features = 10
classes = [0, 1, 2, 3, 4]
num_classes = len(classes)

num_samples = 100
num_train = 70
num_test = num_samples - num_train

In [3]:
# Please DO NOT modify this cell.
# We are creating a random feature set and a random label set.
# The features and labels have no semantic meaning and might as well be garbage.

set_seed(2022)

features = np.random.random_sample((num_samples, num_features))
labels = np.random.choice(classes, size = num_samples)

# Train-test split
x_train = features[:num_train]
x_test = features[num_train:num_samples]

x_train = torch.Tensor(x_train)
x_test = torch.Tensor(x_test)

y_train = labels[:num_train]
y_test = labels[num_train:num_samples]

y_train = torch.LongTensor(y_train)
y_test = torch.LongTensor(y_test)

In [4]:
# Please DO NOT modify this cell.

print(f"Train features: {x_train.shape}")
print(f"Test features: {x_test.shape}")

print(f"Train labels: {y_train.shape}")
print(f"Test labels: {y_test.shape}")


Train features: torch.Size([70, 10])
Test features: torch.Size([30, 10])
Train labels: torch.Size([70])
Test labels: torch.Size([30])


In [5]:
# Create train and test TensorDatasets from the respective feature and label tensors

#### YOUR CODE STARTS HERE ####

train_dataset = torch.utils.data.TensorDataset(x_train, y_train)
test_dataset = torch.utils.data.TensorDataset(x_test, y_test)

#### YOUR CODE ENDS HERE ####

In [6]:
# Create dataloaders using the datasets created in the previous cell.
# Use a batch size of 64

#### YOUR CODE STARTS HERE ####

batch_size = 64

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size = batch_size)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size = batch_size)

#### YOUR CODE ENDS HERE ####
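
As a quick sanity check on how `TensorDataset` and `DataLoader` behave, here is a minimal sketch on a toy dataset (the `toy_*` names and sizes are illustrative assumptions, not part of the assignment). It shows that each dataset item is a `(features, label)` pair and that, with the default `drop_last=False`, the final batch can be smaller than `batch_size`. By the same logic, the 70 training samples above arrive as batches of 64 and 6.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy data (illustrative only): 10 samples with 2 features each, one label per sample
toy_x = torch.arange(20, dtype=torch.float32).reshape(10, 2)
toy_y = torch.arange(10)

toy_dataset = TensorDataset(toy_x, toy_y)
features, label = toy_dataset[3]  # indexing returns a (features, label) tuple

# No shuffling; drop_last is False by default, so the last batch may be short
toy_loader = DataLoader(toy_dataset, batch_size=4)
batch_sizes = [x.shape[0] for x, _ in toy_loader]

print(features, label)  # tensor([6., 7.]) tensor(3)
print(batch_sizes)      # [4, 4, 2]
```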

### Network Definition

In [7]:
# Please DO NOT modify this cell.

# Number of neurons in the hidden layer of our MLP
num_hidden = 512

In [8]:
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, num_features, num_classes, num_hidden):
        super(MLP, self).__init__()
        
        #### YOUR CODE STARTS HERE ####
        
        # define a linear layer with num_hidden output features
        self.hidden = nn.Linear(num_features, num_hidden)
        # Define a ReLU activation
        self.relu = nn.ReLU()
        # define a linear layer with output features corresponding to the number of classes
        self.classifier = nn.Linear(num_hidden, num_classes)
        
        #### YOUR CODE ENDS HERE ####

    def forward(self, x):
        # Apply the layers defined above in sequence (in the same order as their definitions) to
        # write the forward pass; use a ReLU activation after the hidden layer
        
        #### YOUR CODE STARTS HERE ####
        
        out = self.hidden(x)
        out = self.relu(out)
        out = self.classifier(out)
        
        #### YOUR CODE ENDS HERE ####
        
        return out

### Training and Inference

In [9]:
def train(model, device, train_loader, optimizer, criterion, epoch):
    model.train()
    epoch_loss = 0
    for batch_idx, (data, target) in enumerate(train_loader):
        
        #### YOUR CODE STARTS HERE ####
        
        # send the data, target to the device
        data = data.to(device)
        target = target.to(device)
    
        # flush out the gradients stored in optimizer
        optimizer.zero_grad()
        
        # pass the batch to the model and assign the output to a variable named output
        output = model(data)
        
        # calculate the loss (use CrossEntropyLoss in pytorch)
        loss = criterion(output, target)
        
        # do a backward pass
        loss.backward()
        
        # update the weights
        optimizer.step()
          
        #### YOUR CODE ENDS HERE ####
    
        # Store loss
        epoch_loss += loss.item() * data.shape[0]
        
    print(f"Train Average Loss: {epoch_loss/len(train_loader.dataset):.2f}")
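
The multiplication by `data.shape[0]` in `train` matters because `CrossEntropyLoss` returns the mean over a batch by default, and the last batch is smaller (70 samples at batch size 64 give batches of 64 and 6). A minimal sketch with made-up per-batch losses (the values 1.60 and 1.90 are illustrative assumptions, not outputs of this notebook) shows how the sample-weighted average differs from a plain mean of batch means:

```python
batch_sizes = [64, 6]        # how 70 samples split at batch_size = 64
batch_losses = [1.60, 1.90]  # hypothetical mean loss of each batch

# Plain mean of batch means: the 6-sample batch counts as much as the 64-sample batch.
naive_mean = sum(batch_losses) / len(batch_losses)

# Sample-weighted mean, as accumulated in train(): each sample counts once.
weighted_mean = sum(l * n for l, n in zip(batch_losses, batch_sizes)) / sum(batch_sizes)

print(f"{naive_mean:.3f}")     # 1.750
print(f"{weighted_mean:.3f}")  # 1.626
```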

In [10]:
def test(model, device, test_loader, criterion, mode):
    model.eval()
    test_loss = 0
    correct = 0
    
    with torch.no_grad():
        for data, target in test_loader:
            #### YOUR CODE STARTS HERE ####
            
            # send data, target to the device
            data = data.to(device)
            target = target.to(device)
            
            # pass the batch to the model and assign the output to a variable named output
            output = model(data)
            #### YOUR CODE ENDS HERE ####
            
            test_loss += criterion(output, target).item() * data.shape[0]  # sum up batch loss
            pred = output.argmax(dim = 1, keepdim = True)  # get the index of the max logit
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    test_acc = 100. * correct / len(test_loader.dataset)
    
    print(f"{mode} Average loss: {test_loss:.2f}")
    print(f"{mode} Accuracy: {correct}/{len(test_loader.dataset)} ({test_acc:.2f}%)")
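
The `target.view_as(pred)` call in `test` is there because `argmax(..., keepdim = True)` returns shape `(N, 1)` while `target` has shape `(N,)`; comparing them directly would broadcast to an `(N, N)` matrix and miscount. A small sketch with hand-picked logits (the tensors below are illustrative, not taken from the assignment data):

```python
import torch

# Illustrative logits for 3 samples and 3 classes
logits = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 1.5, 0.1],
                       [0.3, 0.2, 0.9]])
target = torch.tensor([0, 1, 0])

pred = logits.argmax(dim=1, keepdim=True)  # shape (3, 1), values [[0], [1], [2]]

# Correct: reshape target to (3, 1) so the comparison is element-wise
correct = pred.eq(target.view_as(pred)).sum().item()

# Incorrect: (3, 1) against (3,) broadcasts to a (3, 3) comparison
broadcast_count = pred.eq(target).sum().item()

print(pred.shape)       # torch.Size([3, 1])
print(correct)          # 2
print(broadcast_count)  # 3
```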

In [11]:
set_seed(2022)

num_epochs = 300

#### YOUR CODE STARTS HERE ####

# check availability of GPU and set the device accordingly
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Initialize MLP model
model = MLP(num_features, num_classes, num_hidden).to(device)

# Define Adam Optimizer with a learning rate of 0.001
optimizer = torch.optim.Adam(model.parameters(), lr = 0.001)

# Define CrossEntropyLoss as the criterion
criterion = nn.CrossEntropyLoss()

#### YOUR CODE ENDS HERE ####

for epoch in range(1, num_epochs+1):
    print(f"\nEpoch: {epoch}/{num_epochs}")
    
    train(model, device, train_loader, optimizer, criterion, epoch)
    test(model, device, test_loader, criterion, mode = "Test")


Epoch: 1/300
Train Average Loss: 1.64
Test Average loss: 1.58
Test Accuracy: 12/30 (40.00%)

Epoch: 2/300
Train Average Loss: 1.61
Test Average loss: 1.59
Test Accuracy: 10/30 (33.33%)

Epoch: 3/300
Train Average Loss: 1.60
Test Average loss: 1.60
Test Accuracy: 10/30 (33.33%)

Epoch: 4/300
Train Average Loss: 1.59
Test Average loss: 1.61
Test Accuracy: 10/30 (33.33%)

Epoch: 5/300
Train Average Loss: 1.59
Test Average loss: 1.61
Test Accuracy: 10/30 (33.33%)

Epoch: 6/300
Train Average Loss: 1.58
Test Average loss: 1.61
Test Accuracy: 10/30 (33.33%)

Epoch: 7/300
Train Average Loss: 1.58
Test Average loss: 1.61
Test Accuracy: 10/30 (33.33%)

Epoch: 8/300
Train Average Loss: 1.57
Test Average loss: 1.61
Test Accuracy: 9/30 (30.00%)

Epoch: 9/300
Train Average Loss: 1.56
Test Average loss: 1.62
Test Accuracy: 9/30 (30.00%)

Epoch: 10/300
Train Average Loss: 1.56
Test Average loss: 1.62
Test Accuracy: 9/30 (30.00%)

Epoch: 11/300
Train Average Loss: 1.55
Test Average loss: 1.62
Test Acc

### Question 1

What is the total number of parameters in the model?

1. 8197
2. 18521
3. 8356
4. 9105

In [12]:
print(sum(p.numel() for p in model.parameters()))

8197
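
The count can be verified by hand: an `nn.Linear(in_features, out_features)` layer holds `in_features * out_features` weights plus `out_features` biases, and `nn.ReLU` has no parameters. A short check in plain Python (the `linear_params` helper is hypothetical):

```python
def linear_params(in_features: int, out_features: int) -> int:
    # Weight matrix entries plus one bias per output feature
    return in_features * out_features + out_features

num_features, num_hidden, num_classes = 10, 512, 5

hidden = linear_params(num_features, num_hidden)      # 10 * 512 + 512
classifier = linear_params(num_hidden, num_classes)   # 512 * 5 + 5
total = hidden + classifier

print(hidden, classifier, total)  # 5632 2565 8197
```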


### Question 2

Report the final train accuracy (If you are not getting the exact number shown in 
options, please report the closest number).

1. 58%
2. 93%
3. 100%
4. 89%

In [13]:
test(model, device, train_loader, criterion, mode = "Train")

Train Average loss: 0.27
Train Accuracy: 70/70 (100.00%)


### Question 3

Report the final test accuracy (If you are not getting the exact number shown in 
options, please report the closest number).

1. 30%
2. 40%
3. 10%
4. 70%

In [14]:
test(model, device, test_loader, criterion, mode = "Test")

Test Average loss: 2.91
Test Accuracy: 9/30 (30.00%)


### [Optional] 

- Think about the experiment that we just conducted.
- We trained a 1-hidden-layer MLP on a completely random dataset (garbage, effectively) with no connection between the features and the labels. In spite of that, the MLP was able to memorize the training set (100% train accuracy).
- Because the labels are unrelated to the features, there is nothing for the model to generalize: test accuracy ends at 30%, close to what random guessing would give, while the test loss rises from 1.58 after the first epoch to 2.91 at the end.
- While this may not be practically useful, it shows how much memorization capacity even a simple 1-hidden-layer MLP has.
- Read through the following paper for a better treatment of this phenomenon: __Understanding deep learning requires rethinking generalization__ https://arxiv.org/abs/1611.03530 (ICLR 2017)
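
One way to read the numbers above: a classifier that assigns equal probability to all 5 classes has a cross-entropy of ln 5 ≈ 1.61, and guessing among 5 equally likely classes gives an expected accuracy of 20%. The losses reported in the first epochs (about 1.6) sit at this level. A minimal check in plain Python:

```python
import math

num_classes = 5

# Cross-entropy of a uniform prediction: -log(1 / num_classes) = log(num_classes)
uniform_loss = math.log(num_classes)

# Expected accuracy when guessing among equally likely classes
chance_accuracy = 100.0 / num_classes

print(f"{uniform_loss:.2f}")      # 1.61
print(f"{chance_accuracy:.1f}%")  # 20.0%
```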