<a href="https://colab.research.google.com/github/ak1909552/Artificial-Neural-Networks/blob/main/assignments/assignment1/Assignment1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Imports

In [None]:
import torch
import torch.nn as nn
from torch.autograd import Variable
from torchvision import datasets
from torchvision import transforms
import matplotlib.pyplot as plt
from collections import OrderedDict

<a name='mnist_mlp'></a>
## The MNIST_MLP class

This class has been modified to create models with hidden layers. The main change is the use of `nn.Sequential()`, which stacks layers and activation functions together. As a result, `forward()` stays almost unchanged and is easy to implement. Because `nn.Sequential` registers its sub-modules, the `model.parameters()` call in the main code also works without any extra implementation in the `MNIST_MLP` class.
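As a minimal sketch of the stacking idea described above, the snippet below builds a small `nn.Sequential` from an `OrderedDict`. The layer sizes and the choice of `nn.ReLU()` are assumptions made for illustration, not the settings used in the experiments.

```python
from collections import OrderedDict

import torch
import torch.nn as nn

# Hypothetical layer sizes, chosen only for illustration.
layer_sizes = [784, 20, 10]

layers = OrderedDict()
for i in range(len(layer_sizes) - 1):
    layers[f'layer{i}'] = nn.Linear(layer_sizes[i], layer_sizes[i + 1])
    # Add an activation after every layer except the last one.
    if i < len(layer_sizes) - 2:
        layers[f'activation{i}'] = nn.ReLU()

network = nn.Sequential(layers)

# Sub-modules are registered under their dictionary keys.
names = [name for name, _ in network.named_children()]

# nn.Sequential chains the modules, so a forward pass is a single call.
batch = torch.zeros(4, 784)
output_shape = tuple(network(batch).shape)

# The parameters of the stacked layers are visible without extra code:
# one weight and one bias tensor per nn.Linear.
num_param_tensors = len(list(network.parameters()))
```

Since the registered names follow the dictionary keys, individual layers can also be inspected by attribute, for example `network.layer0`.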

In [None]:
class MNIST_MLP(nn.Module):
    
    def __init__(self, layer_sizes=[784, 10], activation=None):
        super().__init__()
        self.layer_sizes = layer_sizes 
        
        # Different activations that you can use in forward() method.
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()
        self.tanh = nn.Tanh()
        self.identity = nn.Identity()

        '''
        self.activation is the activation function determined by the constructor argument
        Defaults to self.relu
        '''
        
        if activation == 'sigmoid':
            self.activation = self.sigmoid
        elif activation == 'tanh':
            self.activation = self.tanh
        elif activation == 'identity':
            self.activation = self.identity
        else:
            self.activation = self.relu

        '''
        self.network represents the neural network to be constructed. 

        nn.Sequential accepts ordered dictionaries (dictionaries with the order of keys preserved)
        to create a stacked network. 

        The structure of self.layer_dict is as follows:
        {
            'layer<i>' : nn.Linear(dim<i>, dim<i+1>),
            'activation<i>' : self.activation,
            ...
        }

        Based on the requirements of the questions, the last layer does not have an activation function.
        '''
        
        self.layer_dict = OrderedDict()

        for i in range(len(layer_sizes) - 1):
            self.layer_dict[f'layer{i}'] = nn.Linear(layer_sizes[i], layer_sizes[i+1])
            self.layer_dict[f'activation{i}'] = self.activation
        
        if len(layer_sizes) > 2:
            del self.layer_dict[f'activation{len(layer_sizes) - 2}']
            # self.layer_dict[f'activation{len(layer_sizes) - 2}'] = nn.Softmax(1)

        self.network = nn.Sequential(self.layer_dict) 

    
    def forward(self, input):

        '''
        The forward pass remains the same, and looks very clean :)
        '''

        input = input.view(-1, self.layer_sizes[0])
        # Switch from activation maps to vectors
        x = self.network(input)
        return x
    
    # This function moves the network to the device that is passed as argument.
    # If your machine doesn't have a GPU, pass device='cpu'.
    def set_device(self, device):
        self.device = device
        self.to(self.device)
    
    # This function trains the model on the data passed as arguments.

    def fit(self, mnist_train_loader, num_epochs=1, mnist_valid_loader=None):
        train_loss_history = []
        train_acc_history = []
        valid_loss_history = []
        valid_acc_history = []
        
        for epoch in range(num_epochs):
            
            self.train() # Set to the training mode.
            iter_loss = 0
            iter_acc = 0
            for i, (items, classes) in enumerate(mnist_train_loader):
                items = Variable(items).to(self.device)
                classes = Variable(classes).to(self.device)

                self.optimizer.zero_grad()     # Clear off the gradients from any past operation
                outputs = self.forward(items)      # Do the forward pass
                loss = self.criterion(outputs, classes) # Calculate the loss
                loss.backward()           # Calculate the gradients with help of back propagation
                self.optimizer.step()          # Ask the optimizer to adjust the parameters based on the gradients
                iter_loss += loss.data # Accumulate the loss
                iter_acc += (torch.max(outputs.data, 1)[1] == classes.data).sum()
                print("\r", i + 1, "/", len(mnist_train_loader), ", Loss: ", loss.data/len(items), end="")
            train_loss_history += [iter_loss.cpu().detach().numpy()]
            train_acc_history += [(iter_acc/len(mnist_train_loader.dataset)).cpu().detach().numpy()]
            print("\tTrain: ", train_loss_history[-1], train_acc_history[-1], end="")
            
            if mnist_valid_loader is None:
                print()  # Finish the progress line; there is nothing to validate.
                continue

            self.eval() # Set to the evaluation mode.
            iter_loss = 0
            iter_acc = 0
            with torch.no_grad():  # Gradients are not needed during evaluation.
                for i, (items, classes) in enumerate(mnist_valid_loader):
                    items = Variable(items).to(self.device)
                    classes = Variable(classes).to(self.device)

                    outputs = self(items)      # Do the forward pass
                    iter_loss += self.criterion(outputs, classes).data
                    iter_acc += (torch.max(outputs.data, 1)[1] == classes.data).sum()
            valid_loss_history += [iter_loss.cpu().detach().numpy()]
            valid_acc_history += [(iter_acc/len(mnist_valid_loader.dataset)).cpu().detach().numpy()]
            print("\tValidation: ", valid_loss_history[-1], valid_acc_history[-1])
        
        return train_loss_history, train_acc_history, valid_loss_history, valid_acc_history



<a name="perceptron"></a>
## The perceptron class

This class implements the perceptron learning algorithm (PLA). It creates a single fully connected layer and initializes it with random weights. In the `forward()` method, the output is calculated as $w \cdot x + b$ and `torch.heaviside()` is used as the activation function. In the `fit()` method, the weights and bias are updated using the following rule:
$$
w \leftarrow w + \alpha \hspace{1mm} \times \hspace{1mm} (y(v) \hspace{1mm} - \hspace{1mm} \hat{y}(v)) \hspace{1mm} \times \hspace{1mm} v
$$

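Here $v$ is the input vector, $y(v)$ its true label, $\hat{y}(v)$ the predicted label and $\alpha$ the learning rate. The **bias** is treated as an additional weight, so each input vector is extended with a constant 1.

A simple description of the algorithm: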

![PLA](https://miro.medium.com/max/1032/1*PbJBdf-WxR0Dd0xHvEoh4A.png)
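The update rule above can be traced on a single sample. The following is a minimal sketch with made-up toy values (a two-dimensional input, zero initial weights and $\alpha = 0.5$ are all assumptions for illustration); it is not the `fit()` implementation used below.

```python
import torch

# Hypothetical toy values, chosen only for illustration.
alpha = 0.5                           # learning rate
w = torch.tensor([0.0, 0.0, 0.0])     # [w1, w2, bias]
x = torch.tensor([2.0, 1.0])          # one input sample
y = torch.tensor(1.0)                 # true label

# Append a constant 1 so the bias is updated like any other weight.
v = torch.cat([x, torch.ones(1)])

def predict(weights, inputs):
    # Heaviside step on the pre-activation; the second argument is the
    # value returned when the pre-activation is exactly 0.
    return torch.heaviside(torch.dot(weights, inputs), torch.tensor(0.0))

y_hat = predict(w, v)                 # pre-activation is 0, so the prediction is 0
w_new = w + alpha * (y - y_hat) * v   # the update rule from the formula
y_hat_new = predict(w_new, v)         # the sample is now classified as 1
```

Because the sample was misclassified, the weights move in the direction of `v`; a correctly classified sample gives `y - y_hat = 0` and leaves the weights unchanged.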

In [None]:
class perceptron(nn.Module):
    '''
    Implements the perceptron learning algorithm (PLA) as defined by Rosenblatt.
    Inherits various properties from nn.Module.

    methods:
    
    1) __init__():

    Parameters:
      layer_sizes: Integer array, length = 2
      Defines the number of inputs and the number of perceptrons in a fully connected layer. Default = [784, 2]

      learning_rate: float
      Defines the learning rate of the algorithm. Default = 0.0001

    Function:
      simple constructor. Initializes the network with random weights and sets the learning_rate

    2) forward():

    Parameters:
      inputs: torch.Tensor
      Inputs from the MNIST data loader

    Function:
      Simple forward pass of the perceptron. Computes the pre-activation as w.x + b. The activation
      function is simply torch.heaviside().

    Returns:
      torch.Tensor containing the activation value of each perceptron.

    3) fit():
    
    Parameters:
      train_loader: torch.utils.data.DataLoader
      Contains the training data

      val_loader: torch.utils.data.DataLoader
      Contains the evaluation data. The signature defaults to None, but fit() reads it
      unconditionally, so it must be provided.

      epochs: int
      Number of epochs to train the perceptron for. Defaults to 1.
    '''

    def __init__(self, layer_sizes=[784, 2], learning_rate=0.0001):
        super().__init__()
        self.layer_sizes = layer_sizes     
        self.fc1 = nn.Linear(layer_sizes[0], layer_sizes[1])
        self.fc1.weight.data = torch.randn(layer_sizes[1], layer_sizes[0])
        self.learning_rate = learning_rate
    
    def set_device(self, device):
        self.device = device
        # self.fc1.to(self.device)
        self.to(self.device)

    def forward(self, inputs):
        inputs_flattened = inputs.view(-1, self.layer_sizes[0])
        pre_activation = self.fc1(inputs_flattened)
        outputs = torch.heaviside(pre_activation, torch.tensor([0.0]).to(self.device))
        return outputs
    
    def fit(self, train_loader, val_loader = None, epochs = 1):
        
        train_accuracy = []
        test_accuracy = []

        trens = len(train_loader.dataset)
        vals = len(val_loader.dataset)
        
        self.train()
        
        iw = torch.randn(1,785).to(self.device)   ## store the inputs and 1
        wb = torch.randn(2, 785).to(self.device)  ## store the weights and bias

        for epoch in range(epochs):
            tren = 0
            for i, (items, classes) in enumerate(train_loader):
                items = Variable(items).to(self.device)
                classes = Variable(classes).to(self.device)
                outputs = self(items)

                ## optimization

                weights = self.fc1.weight.data
                bias = self.fc1.bias.data

                for j in range(len(items)):
                    actual = classes[j]
                    predicted = outputs[j][0]
                    

                    if predicted != actual:
                        tren = tren + 1
                        inputs = items[j].view(-1, self.layer_sizes[0])

                        iw[0][:-1] = inputs
                        iw[0][-1] = torch.tensor([1.0])

                        wb[0][:-1] = weights[0]
                        wb[0][-1] = bias[0]
                        wb[1][:-1] = weights[1]
                        wb[1][-1] = bias[1]

                        wb[0] = wb[0] + self.learning_rate*(actual - predicted)*iw
                        wb[1] = wb[1] + self.learning_rate*(predicted - actual)*iw
                        self.fc1.weight.data[0] = wb[0][:-1]
                        self.fc1.weight.data[1] = wb[1][:-1]
                        self.fc1.bias.data[0] = wb[0][-1]
                        self.fc1.bias.data[1] = wb[1][-1] 

            tren_correct = trens - tren
            train_accuracy += [(tren_correct / trens)]
            print(f'Epoch {epoch + 1}. Trained with accuracy of {train_accuracy[epoch]:.4f}.')

            self.eval()
            val = 0
            for i, (items, classes) in enumerate(val_loader):
                items = Variable(items).to(self.device)
                classes = Variable(classes).to(self.device)
                outputs = self(items)

                for j in range(len(outputs)):
                    actual = classes[j]
                    predicted = outputs[j][0]

                    if predicted != actual:
                        val = val + 1

            val_correct = vals - val
            test_accuracy += [(val_correct / vals)]
            print(f'Epoch {epoch + 1}. Tested with accuracy of {test_accuracy[epoch]:.4f}.', end = '\n\n')

        return train_accuracy, test_accuracy

## The `mnist_loader()` function

In [None]:
def mnist_loader(batch_size=512, classes=None):
    transform=transforms.Compose([transforms.ToTensor()])
    mnist_train = datasets.MNIST('./data', train=True, download=True, transform=transform)
    mnist_valid = datasets.MNIST('./data', train=False, download=True, transform=transform)
    
    # Select the classes which you want to train the classifier on.
    if classes is not None:
        mnist_train_idx = (mnist_train.targets == -1)
        mnist_valid_idx = (mnist_valid.targets == -1)
        for class_num in classes:
            mnist_train_idx |= (mnist_train.targets == class_num)
            mnist_valid_idx |= (mnist_valid.targets == class_num) 
        
        mnist_train.targets = mnist_train.targets[mnist_train_idx]
        mnist_valid.targets = mnist_valid.targets[mnist_valid_idx]
        mnist_train.data = mnist_train.data[mnist_train_idx]
        mnist_valid.data = mnist_valid.data[mnist_valid_idx]
    
    mnist_train_loader = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=1)
    mnist_valid_loader = torch.utils.data.DataLoader(mnist_valid, batch_size=batch_size, shuffle=True, num_workers=1)
    return mnist_train_loader, mnist_valid_loader

## Setting the device

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

cuda:0


## Question 1

The solution uses the perceptron class, trained on classes 0 and 1 only. The training and validation accuracies are plotted per epoch. For details, refer to [the perceptron class](#perceptron) above.
A sample run gives the following result:

<img src="https://github.com/ak1909552/hostgifs/blob/main/Screen%20Shot%202022-10-06%20at%206.52.24%20PM.png?raw=true" alt="drawing" width="500"/>


In [None]:
batch_size = 512 # Reduce this if you get out-of-memory error
    
mnist_train_loader, mnist_valid_loader = mnist_loader(batch_size=batch_size, classes=[0,1])
perc = perceptron()
perc.set_device(device)
tah, vah = perc.fit(mnist_train_loader, mnist_valid_loader, 20)
plt.figure()
plt.plot(tah, label='Train Accuracy')
plt.plot(vah, label='Validation Accuracy')
plt.legend()
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.tight_layout()
plt.savefig('Question_1.pdf')

## Question 2

The modified `MNIST_MLP` class is able to create a multi-layered model. The linear activation function available in torch is `nn.Identity()`. For details, refer to the [`MNIST_MLP`](#mnist_mlp) class. The model is trained for 20 epochs with the following sample result.

<img src="https://github.com/ak1909552/hostgifs/blob/main/Screen%20Shot%202022-10-06%20at%206.53.51%20PM.png?raw=true" alt="drawing" width="500"/>
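One consequence of using `nn.Identity()` between layers is that the stacked layers remain a single affine map of the input. The sketch below checks this numerically for the same layer sizes; the seed, the random input and the variable names are assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the sketch is reproducible

# Same layer sizes as the model in this question, with identity activations.
stacked = nn.Sequential(
    nn.Linear(784, 20), nn.Identity(),
    nn.Linear(20, 20), nn.Identity(),
    nn.Linear(20, 10),
)

# Fold the three affine maps into a single weight matrix and bias vector.
with torch.no_grad():
    W = torch.eye(784)
    b = torch.zeros(784)
    for layer in stacked:
        if isinstance(layer, nn.Linear):
            W = layer.weight @ W
            b = layer.weight @ b + layer.bias

    x = torch.rand(5, 784)
    collapsed_output = x @ W.T + b
    # True when the single affine map reproduces the stacked network.
    same = torch.allclose(stacked(x), collapsed_output, atol=1e-4)
```

In other words, with identity activations the extra layers change how the model is parameterized, not the family of functions it can represent.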

In [None]:
batch_size = 512
mnist_train_loader, mnist_valid_loader = mnist_loader(batch_size=batch_size)
model = MNIST_MLP(layer_sizes=[784, 20, 20, 10], activation='identity')
model.set_device(device)
# Our loss function and Optimizer

model.criterion = nn.CrossEntropyLoss()
model.optimizer = torch.optim.Adam(model.parameters(), lr=0.0001) #lr is the learning_rate
# Train model for 20 epochs
tlh, tah, vlh, vah = model.fit(mnist_train_loader, num_epochs=20, mnist_valid_loader=mnist_valid_loader)
plt.figure()
plt.plot(tah, label='Train Accuracy')
plt.plot(vah, label='Validation Accuracy')
plt.legend()
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.tight_layout()
plt.savefig('Question_2.pdf')

## Question 3

The code compares the performance of different activation functions, namely *relu*, *sigmoid* and *tanh*. The model performs best with *relu*, followed closely by *tanh*. The graph also gives an idea of how quickly each model converges (the optimizer's learning rate is fixed at 0.0001 in all runs). Here too, *relu* converges significantly faster than the other activation functions. The following table summarizes the results:

| Activation | Accuracy | Convergence speed |
|:-----------|:--------:|:-----------------:|
| relu | 90-92% | High |
| tanh | 85-88% | Moderate |
| sigmoid | 60-62% | Low |



Following is a sample output:

<img src="https://github.com/ak1909552/hostgifs/blob/main/Screen%20Shot%202022-10-06%20at%206.53.29%20PM.png?raw=true" alt="drawing" width="500"/>
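One way to put a number on how quickly a model learns is to count the epochs needed to reach a given validation accuracy. The helper below is a hypothetical sketch, not part of the assignment code; the accuracy histories are made up for illustration, and the same function could be applied to the `vah` lists returned by `fit()`.

```python
def epochs_to_reach(accuracy_history, threshold):
    """Return the 1-based epoch at which accuracy first reaches threshold, or None."""
    for epoch, accuracy in enumerate(accuracy_history, start=1):
        if accuracy >= threshold:
            return epoch
    return None

# Hypothetical validation accuracies, made up for illustration only.
fast_history = [0.70, 0.85, 0.90, 0.91]
slow_history = [0.20, 0.35, 0.50, 0.60]

fast_epochs = epochs_to_reach(fast_history, 0.85)  # reached in epoch 2
slow_epochs = epochs_to_reach(slow_history, 0.85)  # never reached, so None
```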

In [None]:
plt.figure()
for activation in (['relu', 'sigmoid', 'tanh']):
    # The model with activation
    print(activation, end = '\n\n')
    model = MNIST_MLP(layer_sizes=[784, 20, 20, 10], activation=activation)
    model.set_device(device)
    # Our loss function and Optimizer
    model.criterion = nn.CrossEntropyLoss()
    model.optimizer = torch.optim.Adam(model.parameters(), lr=0.0001) #lr is the learning_rate
    # Train model for 20 epochs
    tlh, tah, vlh, vah = model.fit(mnist_train_loader, num_epochs=20, mnist_valid_loader=mnist_valid_loader)
    # plt.plot(tah)
    plt.plot(vah, label=activation + ' activation')
plt.legend()
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.tight_layout()
plt.savefig('Question_3.pdf')

## Question 4

The code compares three models of increasing depth (3, 4 and 5, counting the input and output layers). All the models perform well; a depth of 5 gives the best results, followed by 4 and 3. The models also converge at similar speeds, which suggests that there aren't many sub-features to be learnt. A sample output is provided below:

<img src="https://github.com/ak1909552/hostgifs/blob/main/Screen%20Shot%202022-10-06%20at%206.54.13%20PM.png?raw=true" alt="drawing" width="500"/>
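When reading the depth comparison, it helps to know how many trainable parameters each configuration has. The helper below is a small sketch (the function name is an assumption) that counts the weights and biases of a fully connected network from its layer sizes.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected network with these layer sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# The three configurations compared in this question.
layer_sizes_list = [
    [784, 20, 10],
    [784, 20, 20, 10],
    [784, 50, 30, 20, 10],
]
param_counts = [count_parameters(sizes) for sizes in layer_sizes_list]
# param_counts == [15910, 16330, 41610]
```

The deepest model is also wider in its first hidden layers and has roughly 2.5 times as many parameters as the other two, so its better accuracy may not be due to depth alone.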

In [None]:
layer_sizes_list = [
    [784, 20, 10],
    [784, 20, 20, 10],
    [784, 50, 30, 20, 10]
]
plt.figure()
for d, layer_sizes in enumerate(layer_sizes_list):
    print(layer_sizes, end = '\n\n')
    # The model with activation
    model = MNIST_MLP(layer_sizes=layer_sizes, activation='relu')
    model.set_device(device)
    # Our loss function and Optimizer
    model.criterion = nn.CrossEntropyLoss()
    model.optimizer = torch.optim.Adam(model.parameters(), lr=0.0001) #lr is the learning_rate
    # Train model for 20 epochs
    tlh, tah, vlh, vah = model.fit(mnist_train_loader, num_epochs=20, mnist_valid_loader=mnist_valid_loader)
    # plt.plot(tah)
    plt.plot(vah, label='Depth = ' + str(d+3))
plt.legend()
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.tight_layout()
plt.savefig('Question_4.pdf')

## Question 5

The code compares hidden-layer widths of 5, 20 and 50. Performance is best with a width of 50, followed by 20 and 5. A width of 50 also converges noticeably faster (the optimizer's learning rate is fixed at 0.0001 in all runs). The following table summarizes the results:

| Width | Accuracy | Convergence speed |
|:-----------|:--------:|:-----------------:|
| 5 | 55-60% | Low |
| 20 | 88-90% | Moderate |
| 50 | 89-93% | High |

A sample output is provided below:

<img src="https://github.com/ak1909552/hostgifs/blob/main/Screen%20Shot%202022-10-06%20at%206.54.42%20PM.png?raw=true" alt="drawing" width="500"/>

In [None]:
layer_sizes_list = [
        [784, 5, 5, 10],
        [784, 20, 20, 10],
        [784, 50, 50, 10]
    ]
plt.figure()
for layer_sizes in layer_sizes_list:
    # The model with activation
    print(layer_sizes, end = '\n\n')
    model = MNIST_MLP(layer_sizes=layer_sizes, activation='relu')
    model.set_device(device)
    # Our loss function and Optimizer
    model.criterion = nn.CrossEntropyLoss()
    model.optimizer = torch.optim.Adam(model.parameters(), lr=0.0001) #lr is the learning_rate
    # Train model for 20 epochs
    tlh, tah, vlh, vah = model.fit(mnist_train_loader, num_epochs=20, mnist_valid_loader=mnist_valid_loader)
    # plt.plot(tah)
    plt.plot(vah, label='Width = ' + str(layer_sizes[1]))
plt.legend()
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.tight_layout()
plt.savefig('Question_5.pdf')

Link to the [Colab notebook](https://colab.research.google.com/drive/1TzFBMGDndZNgzCzyrfK_qmKAeid61ypU?usp=sharing).