#
<span style="font-family: Times New Roman; font-size: 20px;">
<h1 align='center'> 
MLP Model
</h1>

<span style="font-family: Times New Roman; font-size: 10px;">
<h1> 
This notebook defines an MLP model with a number of parameters comparable to VGG16 and compares its performance with the other models used in Q1_part1. The distribution of neurons across the layers can be chosen as required.
</h1>

##
<span style="font-family: Times New Roman; font-size: 20px;">
<h2> 
MLP Model Class Definition
</h2>

In [10]:
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
import torchvision
from torchvision.transforms import transforms
import tensorflow as tf
import torch.nn.functional as F
import os
import shutil

In [11]:
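class MLP(nn.Module):
    """Fully connected baseline with a parameter count comparable to VGG16."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            # A flattened 3x256x256 image has 196,608 input values
            nn.Linear(3*256*256, 756),
            nn.ReLU(inplace=True),
            nn.Linear(756, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, 10),
            nn.ReLU(inplace=True),
        )
        # Two output logits, one per class
        self.classifier = nn.Linear(10, 2)

    def forward(self, x):
        # Flatten (batch, 3, 256, 256) to (batch, 3*256*256)
        x = x.view(x.size(0), -1)
        x = self.features(x)
        x = self.classifier(x)
        return x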

In [12]:
model = MLP()
# Count the parameters (weights and biases) of the model
total_params = sum(p.numel() for p in model.parameters())
print(f'Total Parameters: {total_params:,}')

Total Parameters: 148,734,612
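
As a cross-check on the count above, the total can be reproduced by hand: each `nn.Linear(in, out)` layer holds `in * out` weights plus `out` biases. The sketch below is plain Python and does not depend on PyTorch; the helper name `linear_params` is introduced here for illustration only.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features

# Layer sizes copied from the MLP class definition
layer_sizes = [3 * 256 * 256, 756, 128, 10, 2]

# Pair each layer's input size with its output size
layer_shapes = list(zip(layer_sizes, layer_sizes[1:]))
per_layer = [linear_params(i, o) for i, o in layer_shapes]
total = sum(per_layer)

for (i, o), n in zip(layer_shapes, per_layer):
    print(f"Linear({i}, {o}): {n:,} parameters")
print(f"Total: {total:,}")
```

More than 99% of the parameters sit in the first layer, which maps the 196,608 flattened input values to 756 neurons.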


##
<span style="font-family: Times New Roman; font-size: 20px;">
<h2> 
Data Loading
</h2>

In [13]:
# Load the data: resize every image to 256x256 and convert it to a tensor
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor()
])
test_data = torchvision.datasets.ImageFolder(root='data/test', transform=transform)
train_data = torchvision.datasets.ImageFolder(root='data/train', transform=transform)
train_it = DataLoader(train_data, batch_size=5, shuffle=True)
# Accuracy does not depend on sample order, so the test loader is not shuffled
test_it = DataLoader(test_data, batch_size=5, shuffle=False)

##
<span style="font-family: Times New Roman; font-size: 20px;">
<h2> 
Training Function
</h2>

In [16]:
def train(model, train_it, writer, epochs=10):
    """Train the model with Adam and log the per-batch loss to TensorBoard."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-5)
    model.train()
    step = 0  # global batch counter, used as the TensorBoard x-axis
    for epoch in range(epochs):
        for inputs, targets in train_it:
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            loss.backward()
            optimizer.step()
            with writer.as_default():
                tf.summary.scalar('Training Loss (MLP Model)', loss.item(), step)
            step += 1
        # This is the loss of the last batch in the epoch, not an epoch average
        print(f'Epoch: {epoch+1}, Loss: {loss.item()}')

In [18]:
# Remove logs from any previous run so that TensorBoard shows only this one
if os.path.exists('log'):
    shutil.rmtree('log')
writer = tf.summary.create_file_writer('log')
train(model, train_it, writer, epochs=10)
writer.close()

Epoch: 1, Loss: 0.7133498191833496
Epoch: 2, Loss: 0.6803696155548096
Epoch: 3, Loss: 0.7407587170600891
Epoch: 4, Loss: 0.6820197701454163
Epoch: 5, Loss: 0.7337031364440918
Epoch: 6, Loss: 0.7066976428031921
Epoch: 7, Loss: 0.7055037021636963
Epoch: 8, Loss: 0.7051143646240234
Epoch: 9, Loss: 0.6849305033683777
Epoch: 10, Loss: 0.7235835194587708


##
<span style="font-family: Times New Roman; font-size: 20px;">
<h2> 
Testing the MLP Model on Test Dataset
</h2>

In [19]:
def predict(model, test_it):
    """Print the classification accuracy (in percent) of the model on a data loader."""
    correct = 0
    total = 0
    model.eval()
    with torch.no_grad():
        for inputs, targets in test_it:
            outputs = model(inputs)
            # The predicted class is the index of the largest logit
            _, predicted = torch.max(outputs, 1)
            total += targets.size(0)
            correct += (predicted == targets).sum().item()
    print(f'Accuracy: {100 * correct / total}')

In [20]:
predict(model, test_it)

Accuracy: 50.0


In [21]:
predict(model, train_it)

Accuracy: 50.0


In [None]:
%load_ext tensorboard
%tensorboard --logdir log

$$
\begin{aligned}
& \text{Model Comparison Table}\\
&\begin{array}{cccccc}
\hline \hline \text{Model} & \text{Training time} & \text{Training loss} & \text{Training accuracy} & \text{Testing accuracy} & \text{No. of model parameters} \\
\hline \text{VGG1block} & 1:18 & 0.146 & 100 & 95 & 134,223,009 \\
\text{VGG3Block} & 0:31 & 0.0009 & 100 & 90 & 33,652,065 \\
\text{VGG3Block (data aug)} & 0:31 & 0.0001 & 100 & 92.5 & 33,652,065 \\
\text{VGG16 final MLP} & 1:02 & 0.998 & 99.375 & 97.5 & 138,617,929\ (260,385) \\
\text{VGG16 all} & 3:30 & 2.62\text{e-}05 & 100 & 100 & 138,487,753 \\
\text{MLP Model} & 14:22 & 0.7235 & 50 & 50 & 148,734,612 \\
\hline
\end{array}
\end{aligned}
$$

##
<span style="font-family: Times New Roman; font-size: 20px;">
<h2> 
Subjective Questions
</h2>

###
<span style="font-family: Times New Roman; font-size: 20px;">
<h3> 
Question 1: Are the results as expected? Why or why not?
</h3>

####
<span style="font-family: Times New Roman; font-size: 23px;">
<p>
The results are largely as expected. The VGG models with 1 and 3 blocks, which are trained from scratch, have lower test accuracy (95% and 90%) than the VGG16 models. Data augmentation should improve accuracy on the test set, and it does: the 3-block model goes from 90% to 92.5%. The VGG16 model with only the final MLP layers trainable reaches a good test accuracy of 97.5%, and it trains faster than the 1-block VGG model (1:02 versus 1:18) because only the final layers of the pre-trained network are updated, whereas the 1-block model is trained from scratch. The VGG16 model with all layers trainable reaches 100% test accuracy because every layer can adapt to the dataset, so the model learns the image features more accurately, although it takes the longest of the VGG models to train (3:30). The MLP model has the lowest accuracy: despite a parameter count comparable to the CNN models, it works on a flattened image and is not able to learn the dependencies between nearby pixels. In addition, the 196,608 input values are reduced to just 756 neurons in the first hidden layer, which discards a large amount of information. Its 50% accuracy on both the training and test sets, together with a training loss that stays near 0.7, is consistent with chance-level prediction on a two-class problem.
</p>
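
One quick sanity check on the MLP row: for a two-class problem, a model that assigns probability 0.5 to each class has a cross-entropy loss of ln 2, roughly 0.693. The sketch below compares that value with the ten losses printed by the training cell; the variable names are introduced here for illustration.

```python
import math

# Last-batch losses printed at the end of each epoch by the training cell
logged_losses = [
    0.7133498191833496, 0.6803696155548096, 0.7407587170600891,
    0.6820197701454163, 0.7337031364440918, 0.7066976428031921,
    0.7055037021636963, 0.7051143646240234, 0.6849305033683777,
    0.7235835194587708,
]

num_classes = 2
# Cross-entropy of a uniform prediction: -log(1 / num_classes)
chance_loss = -math.log(1.0 / num_classes)
mean_logged = sum(logged_losses) / len(logged_losses)

print(f"Chance-level loss: {chance_loss:.4f}")
print(f"Mean logged loss:  {mean_logged:.4f}")
```

Every logged loss lies within 0.05 of the chance-level value, which fits the 50% accuracy reported for the MLP and suggests that the network did not learn anything useful during the 10 epochs.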

###
<span style="font-family: Times New Roman; font-size: 20px;">
<h3> 
Question 2: Does data augmentation help? Why or why not?
</h3>

####
<span style="font-family: Times New Roman; font-size: 23px;">
<p>
Yes. Data augmentation helps the model learn the classes better and generalize. Because the augmented images vary in ways that are irrelevant to the label, the model is less likely to rely on the background or other features that do not matter for the classification. This improves accuracy on the test set: in the comparison table, the 3-block VGG model goes from 90% to 92.5% with data augmentation.
</p>
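
As an illustration of the idea, the sketch below applies one simple augmentation, a horizontal flip, to a toy image stored as nested lists. This is an assumption for illustration only: the augmentations actually used in Q1_part1 are not shown in this notebook, and the names `hflip` and `augment` are hypothetical.

```python
def hflip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]

def augment(dataset):
    """Return the original (image, label) pairs plus a flipped copy of each."""
    augmented = list(dataset)
    for image, label in dataset:
        # The label is unchanged: mirroring an image does not change its class
        augmented.append((hflip(image), label))
    return augmented

# Toy dataset: one 2x3 single-channel "image" with class label 0
toy_dataset = [([[1, 2, 3], [4, 5, 6]], 0)]
augmented_dataset = augment(toy_dataset)

print(len(augmented_dataset))
print(augmented_dataset[1][0])
```

In a torchvision pipeline like the one in the data loading cell, a similar effect is usually obtained by adding random transforms such as `transforms.RandomHorizontalFlip()` to the `transforms.Compose` list for the training set only.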

###
<span style="font-family: Times New Roman; font-size: 20px;">
<h3> 
Question 3: Does it matter how many epochs you fine tune the model? Why or why not?
</h3>

####
<span style="font-family: Times New Roman; font-size: 23px;">
<p>
Yes. The number of epochs matters because the model refines the features it has learned with every pass over the training data. With too few epochs, the model underfits: it has not yet learned enough features to classify the images accurately. With too many epochs, the model overfits the training data and generalizes poorly. The number of epochs should therefore be chosen so that the model learns the relevant image features while still generalizing to unseen images.
</p>
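
One common way to choose the number of epochs is early stopping: track the loss on a held-out validation set and stop once it has not improved for a few epochs. The sketch below is a minimal illustration; the validation losses are made-up numbers, not results from this notebook, and `early_stopping_epoch` and `patience` are names introduced here.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch with the best validation loss before stopping.

    Training stops once `patience` epochs in a row fail to improve on the best loss.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop training here
            break
    return best_epoch

# Hypothetical validation losses: improving at first, then overfitting
hypothetical_val_losses = [0.60, 0.45, 0.38, 0.41, 0.47, 0.55]
stop_at = early_stopping_epoch(hypothetical_val_losses, patience=2)
print(f"Keep the weights from epoch {stop_at}")
```

With these numbers the validation loss is lowest at epoch 3 and rises afterwards, so the weights from epoch 3 would be kept even though the training loss may still be falling.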

###
<span style="font-family: Times New Roman; font-size: 20px;">
<h3> 
Question 4: Are there any particular images that the model is confused about? Why or why not?
</h3>

####
<span style="font-family: Times New Roman; font-size: 23px;">
<p>
Yes. Some images are hard even for us to assign to a class. The model is confused by these images as well, because their distinguishing features are not clear.
Some examples:
</p>

<img src="False1.png" alt="False1">
<img src="False2.png" alt="False2">

<p>
The first image is of a baby cheetah and the second image is also of a cheetah, but it is very difficult even for us to tell whether each one shows a cheetah or a leopard. The two animals look very similar, so the model is also confused by these images.