# Foundations of Artificial Intelligence and Machine Learning
## A Program by IIIT-H and TalentSprint
#### To be done in the Lab


The objective of this experiment is to implement a multilayer perceptron (MLP) and backpropagation (BP), as covered in the previous lectures, in PyTorch.

In this experiment we will use the MNIST database, a dataset of handwritten digits. It has 60,000 training samples and 10,000 test samples. Each image is 28 x 28 pixels, and each pixel holds a grayscale value from 0 to 255.
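
As a rough illustration of the input format, the sketch below flattens a 28 x 28 grid into a 784-element vector and rescales the pixel values from 0-255 to 0.0-1.0. This mirrors what `transforms.ToTensor()` and `view(-1, 28*28)` do later in this notebook. The image is synthetic (a hypothetical diagonal stroke), not real MNIST data.

```python
# A synthetic 28 x 28 grayscale "image": a white diagonal on a black background.
# (Hypothetical data for illustration only; the real MNIST digits are loaded later.)
SIDE = 28
image = [[255 if r == c else 0 for c in range(SIDE)] for r in range(SIDE)]

# Scale every pixel from the integer range 0-255 to a float in 0.0-1.0.
scaled = [[pixel / 255.0 for pixel in row] for row in image]

# Flatten row by row into a single vector of 28 * 28 = 784 values.
flat = [pixel for row in scaled for pixel in row]

print(len(flat))             # number of input features
print(min(flat), max(flat))  # value range after scaling
```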

It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.

It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting.

In [None]:
#importing torch packages
import torch
import torch.nn as nn
import torchvision.datasets as dsets
import torchvision.transforms as transforms
from torch.autograd import Variable
import matplotlib.pyplot as plt

#### Initialize Hyper-parameters

Hyper-parameters are settings chosen before training begins; unlike the network's weights, they are not updated during training.

In [None]:
#hyperparameters
input_size = 784       # The image size = 28 x 28 = 784
hidden_size = 500      # The number of nodes at the hidden layer
num_classes = 10       # The number of output classes. In this case, from 0 to 9
num_epochs = 5         # The number of passes over the entire training set
batch_size = 32        # The number of samples processed in one iteration
learning_rate = 0.001  # The step size the optimizer uses for weight updates
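
To make the effect of `batch_size` concrete, here is a small sketch that counts how many iterations (mini-batches) one epoch takes for a few batch sizes. It assumes the 60,000-sample training set described above and that the last, possibly smaller, batch is kept. The helper name `iterations_per_epoch` is hypothetical.

```python
import math

TRAIN_SAMPLES = 60_000  # size of the MNIST training set


def iterations_per_epoch(num_samples, batch_size):
    """Number of mini-batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


# Compare the batch size used here (32) with those from the exercises (200 and 5).
for bs in (5, 32, 200):
    print(bs, iterations_per_epoch(TRAIN_SAMPLES, bs))
```

A smaller batch size means more weight updates per epoch, each computed from fewer samples.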

Now, we'll load the MNIST data

In [None]:
#Loading the train set file
train_dataset = dsets.MNIST(root='./data',
                           train=True,
                           transform=transforms.ToTensor(),
                           download=True)

#Loading the test set file
test_dataset = dsets.MNIST(root='./data',
                           train=False,
                           transform=transforms.ToTensor())

Next, we wrap the datasets in data loaders, which serve the samples in mini-batches.

In [None]:
#loading the train dataset
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                          batch_size=batch_size,
                                          shuffle=True)

#loading the test dataset
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

Let us visualize the data

In [None]:
dataiter = iter(test_loader)
images, labels = next(dataiter)  # use the built-in next(); iterators have no .next() method in Python 3

# print images
import torchvision
import numpy as np


def imshow(img):
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.axis('off')

imshow(torchvision.utils.make_grid(images))
print(labels.size())

In [None]:
images.shape

### Feedforward Neural Network

The FNN consists of two fully connected layers (fc1 and fc2) with a non-linear ReLU layer in between. This structure is normally called a 1-hidden-layer FNN, since the output layer (fc2) is not counted.
During the forward pass, the input images (x) go through the network and produce an output.

Let's define the network as a Python class. We have to write the `__init__()` and `forward()` methods; PyTorch's autograd then computes the gradients for the backward pass automatically.

In [None]:
class Net(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(Net, self).__init__()                     # Initialize the parent class nn.Module
        self.fc1 = nn.Linear(input_size, hidden_size)   # 1st fully connected layer: 784 (input) -> 500 (hidden nodes)
        self.relu = nn.ReLU()                           # Non-linear ReLU layer: max(0, x)
        self.fc2 = nn.Linear(hidden_size, num_classes)  # 2nd fully connected layer: 500 (hidden nodes) -> 10 (output classes)

    def forward(self, x):  # Forward pass: apply the layers in order
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        # Return raw scores (logits): nn.CrossEntropyLoss applies log-softmax
        # itself, so no softmax layer is added here.
        return out
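
To see what the forward pass computes, the following sketch runs the same linear -> ReLU -> linear pipeline by hand in plain Python. The layer sizes (3 inputs, 2 hidden units, 2 outputs) and all weights are made-up toy values chosen so the arithmetic is easy to follow; they are not taken from the trained network.

```python
def linear(x, weights, bias):
    """Fully connected layer: one output value per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise max(0, x)."""
    return [max(0.0, v) for v in x]


# Hypothetical toy parameters: 3 inputs -> 2 hidden units -> 2 outputs.
w1 = [[1.0, -1.0, 0.5], [-2.0, 0.0, 0.5]]
b1 = [0.0, 0.0]
w2 = [[1.0, 1.0], [-1.0, 2.0]]
b2 = [0.1, -0.1]

x = [1.0, 2.0, 3.0]
pre_activation = linear(x, w1, b1)  # values before the non-linearity
hidden = relu(pre_activation)       # negative values are clipped to 0
output = linear(hidden, w2, b2)     # raw scores, one per class

print(pre_activation, hidden, output)
```

The second hidden unit receives a negative value and is switched off by ReLU, so it contributes nothing to the output.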

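#### Creating a neural network object

Here we build the network with `nn.Sequential` and store it in `net`, the object trained below.

In [None]:
net = nn.Sequential(
          nn.Linear(input_size, hidden_size),
          nn.ReLU(),
          nn.Linear(hidden_size, num_classes)  # outputs raw scores (logits); nn.CrossEntropyLoss applies log-softmax internally
        )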

#### Loss and Optimizer

The loss function (criterion) measures how far the network's output is from the true class, which tells us how well or badly the network is performing. The optimizer decides how the weights are updated in order to reduce that loss.

In [None]:
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)
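
As a minimal sketch of what a cross-entropy criterion computes for one sample, the code below turns raw scores into probabilities with softmax and takes the negative log of the probability assigned to the true class. `nn.CrossEntropyLoss` combines these two steps and expects raw scores as input. The scores and the helper names are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability of the true class for a single sample."""
    return -math.log(softmax(logits)[target])


logits = [2.0, 1.0, 0.1]                 # hypothetical scores for 3 classes
loss_correct = cross_entropy(logits, 0)  # true class has the highest score
loss_wrong = cross_entropy(logits, 2)    # true class has the lowest score

print(round(loss_correct, 3), round(loss_wrong, 3))
```

The loss is small when the network gives the true class a high score and large when it does not.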

### Train the neural network

**Forward Pass** The forward pass is the computation of the output layer's values from the input data, traversing all neurons from the first layer to the last.

**Backward Pass** The backward pass computes the gradient of the loss with respect to each weight, which gradient descent or a similar algorithm then uses to update the weights. The computation proceeds from the last layer back to the first.
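
The two passes can be sketched on the smallest possible "network", a single weight `w` with output `y = w * x` and a squared-error loss. This uses plain gradient descent with a hand-derived gradient, not the Adam optimizer or autograd used in this notebook, and all values are made up for illustration.

```python
x, target = 2.0, 6.0  # hypothetical training sample: the ideal weight is 3.0
w = 0.0               # initial weight
lr = 0.1              # learning rate (step size)
losses = []

for step in range(20):
    # Forward pass: compute the output and the loss from the input.
    y = w * x
    loss = (y - target) ** 2
    losses.append(loss)

    # Backward pass: derivative of the loss with respect to w (chain rule).
    grad = 2 * (y - target) * x

    # Update: move the weight against the gradient.
    w -= lr * grad

print(w, losses[0], losses[-1])
```

In the training loop below, `loss.backward()` plays the role of the hand-written gradient and `optimizer.step()` plays the role of the update line.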

In [None]:
loss_c = []

In [None]:
for epoch in range(2):  # loop over the dataset multiple times
    net.train(True)
    running_loss = 0.0
    for i, data in enumerate(train_loader):
        # get the images
        images, labels = data
        images = images.view(-1, 28*28)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        loss_c.append(loss.item())
        if (i+1) % 200 == 0:    # print every 200 mini-batches
            print('[Epoch %d, Batch %5d/%5d] loss: %.3f' %
                  (epoch + 1, i + 1, len(train_loader), float(running_loss) / 200))
            running_loss = 0.0

print('Finished Training')

In [None]:
plt.plot(np.arange(len(loss_c)), loss_c)

Performing testing on the network

In [None]:
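correct = 0
total = 0
net.eval()             # switch to evaluation mode once, before the loop
with torch.no_grad():  # gradients are not needed for testing
    for data in test_loader:
        images, labels = data
        images = images.view(-1, 28*28)
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)  # index of the highest score is the predicted class
        total += labels.size(0)
        correct += (predicted == labels).sum().item()  # .item() converts the tensor to a Python int

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))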
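
The accuracy computation above reduces to "pick the class with the highest score, then count matches". Here is a plain-Python sketch of that logic on four made-up samples with three classes; the scores, labels, and the `argmax` helper are hypothetical.

```python
def argmax(values):
    """Index of the largest value, i.e. the predicted class."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical network outputs for 4 samples and 3 classes.
outputs = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
]
labels = [1, 0, 1, 1]  # true classes

predicted = [argmax(row) for row in outputs]
correct = sum(p == t for p, t in zip(predicted, labels))
accuracy = 100.0 * correct / len(labels)

print(predicted, correct, accuracy)
```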

### Exercise 1

Change the batch size to 200 and try calculating the loss and accuracy for both the training and testing data. What do you observe?

In [None]:
# Your Code Here

### Exercise 2

Change the batch size to 5. Try calculating the loss and accuracy for both the training and testing data. What do you observe?

In [None]:
# Your Code Here