## **In this simple neural network, we train a model on the familiar MNIST database to recognize handwritten digits.**

## Importing the MNIST Database
We import labeled handwritten digits from the MNIST database. This is a convenient dataset because the digits have been size-normalized and centered in a fixed 28x28 pixel image. 'ToTensor' converts each image to a tensor with values in [0, 1], and we further normalize it with the 'Normalize' transform, using the mean (0.1307) and standard deviation (0.3081) of the dataset. The output tensor is computed as output = (x - mean) / std.
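
As a quick check of the formula, the sketch below applies it to the two extreme pixel values in plain Python. The helper `normalize_pixel` is a hypothetical name used only for illustration; it mirrors the arithmetic described above rather than calling torchvision.

```python
# Constants quoted in the text above.
MNIST_MEAN = 0.1307
MNIST_STD = 0.3081


def normalize_pixel(x, mean=MNIST_MEAN, std=MNIST_STD):
    """Apply output = (x - mean) / std to one pixel value in [0, 1]."""
    return (x - mean) / std


black = normalize_pixel(0.0)  # background pixel
white = normalize_pixel(1.0)  # brightest stroke pixel
print(f"black -> {black:.4f}, white -> {white:.4f}")
```

Because most MNIST pixels are background, the normalized background value is only slightly negative, while stroke pixels map to values well above zero.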



In [25]:
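import torch
from torchvision import datasets, transforms

# Convert images to tensors in [0, 1], then standardize with the MNIST mean and std.
transform = transforms.Compose([
                transforms.ToTensor(),
                transforms.Normalize((0.1307,), (0.3081,))
            ])

# No batch_size is given, so each loader yields one image at a time (the default is 1).
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('data', train=True, download=True, transform=transform))

# The test split reuses the files downloaded for the training split.
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('data', train=False, transform=transform))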

## Neural Network with 4 Layers
Here, I have employed the hyperbolic tangent (tanh) function for the two hidden layers (128 and 64 units) and the sigmoid function for the 10-unit output layer. I chose tanh because its output is centered on zero, which keeps the mean activation close to zero and makes training easier for the subsequent layers.
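
To illustrate the zero-centering argument, here is a minimal sketch in plain Python that compares the mean output of tanh and sigmoid. The list of pre-activations is a made-up example, chosen to be symmetric around zero.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical pre-activations, symmetric around zero.
pre_activations = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]

tanh_mean = sum(math.tanh(z) for z in pre_activations) / len(pre_activations)
sigmoid_mean = sum(sigmoid(z) for z in pre_activations) / len(pre_activations)

print(f"mean tanh output:    {tanh_mean:.3f}")
print(f"mean sigmoid output: {sigmoid_mean:.3f}")
```

For symmetric inputs, tanh averages to zero while sigmoid averages to 0.5, so a layer fed by sigmoid units always receives inputs with a positive offset.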

In [26]:
import time
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

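class Net(nn.Module):
    """Fully connected network: 784 inputs -> 128 -> 64 -> 10 outputs."""

    def __init__(self, epochs=10):
        super().__init__()
        self.linear1 = nn.Linear(784, 128)
        self.linear2 = nn.Linear(128, 64)
        self.linear3 = nn.Linear(64, 10)

        # Number of full passes over the training data.
        self.epochs = epochs

    # Called explicitly below; this is not nn.Module.forward, so model(x) is not wired up.
    def forward_pass(self, x):
        # Two tanh hidden layers followed by a sigmoid output layer.
        x = self.linear1(x)
        x = torch.tanh(x)
        x = self.linear2(x)
        x = torch.tanh(x)
        x = self.linear3(x)
        x = torch.sigmoid(x)
        return x

    def one_hot_encode(self, y):
        # y holds a single label (batch size 1); set that class index to 1.
        encoded = torch.zeros([10], dtype=torch.float64)
        encoded[y[0]] = 1.
        return encoded

    # Note: this method shadows nn.Module.train(mode), which nn.Module.eval() relies on.
    def train(self, train_loader, optimizer, criterion):
        start_time = time.time()
        loss = None

        for iteration in range(self.epochs):
            for x,y in train_loader:
                y = self.one_hot_encode(y)
                optimizer.zero_grad()
                # Flatten the 1x1x28x28 image into a 784-element vector.
                output = self.forward_pass(torch.flatten(x))
                loss = criterion(output, y)
                loss.backward()
                optimizer.step()

            # The reported loss is that of the last sample seen in the epoch.
            print('Epoch: {0}, Time Spent: {1:.2f}s, Loss: {2}'.format(
                iteration+1, time.time() - start_time, loss
            ))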

## Results
I have used Binary Cross Entropy with Logits Loss ('BCEWithLogitsLoss') as my loss function, as it gave me better results than plain Binary Cross Entropy. Note that this loss applies a sigmoid internally and expects raw logits, so together with the sigmoid in 'forward_pass' the sigmoid is applied twice. For the optimizer, I went with Adam (learning rate 0.001), as it is relatively fast without compromising on accuracy.
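
For intuition about what the optimizer does with each gradient, the following is a rough sketch of a single-parameter Adam update in plain Python. The function `adam_step` and the toy objective f(x) = x² are illustrative assumptions, not part of this notebook; the default values for `beta1`, `beta2` and `eps` match those of `torch.optim.Adam`.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter; returns (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy objective (hypothetical): f(x) = x**2, with gradient 2 * x.
x, m, v = 1.0, 0.0, 0.0
history = [x]
for t in range(1, 101):
    x, m, v = adam_step(x, 2 * x, m, v, t)
    history.append(x)

print(f"after 1 step: {history[1]:.6f}, after 100 steps: {history[-1]:.6f}")
```

On the first step the update has a magnitude of roughly `lr` regardless of how large the gradient is, which is one reason Adam tends to be forgiving about gradient scale.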

In [None]:
model = Net()

optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.BCEWithLogitsLoss()

model.train(train_loader, optimizer, criterion)

Epoch: 1, Time Spent: 133.30s, Loss: 0.6931471825587292
Epoch: 2, Time Spent: 327.62s, Loss: 0.693147182500419
Epoch: 3, Time Spent: 549.89s, Loss: 0.6931471824867155
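
The logged loss stays at about 0.6931, which is ln 2: the value that binary cross-entropy with logits takes when every input it receives is 0. One plausible reading, not verified against the original run, is that feeding the output of `torch.sigmoid` into `BCEWithLogitsLoss` confines the inputs of the loss to (0, 1), which limits how far the loss can fall. The sketch below works through that arithmetic in plain Python; `bce_with_logits` is a hypothetical helper written for illustration, not the PyTorch implementation, and the label is made up.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy on raw logits (numerically stable form)."""
    total = 0.0
    for z, t in zip(logits, targets):
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# Hypothetical one-hot target for the digit 3.
target = [0.0] * 10
target[3] = 1.0

# Case 1: every input to the loss is 0.
all_zero = bce_with_logits([0.0] * 10, target)

# Case 2: inputs limited to [0, 1], as when sigmoid outputs are passed
# as logits. The best such input is 1 for the true class and 0 elsewhere.
best_squashed = bce_with_logits(target, target)

# Case 3: unrestricted logits that are confidently correct.
confident = bce_with_logits([10.0 if t == 1.0 else -10.0 for t in target], target)

print(f"all inputs zero:        {all_zero:.4f}")
print(f"best case within [0,1]: {best_squashed:.4f}")
print(f"confident raw logits:   {confident:.6f}")
```

With inputs restricted to [0, 1], the mean loss over ten classes cannot drop much below 0.655, whereas raw logits can drive it toward zero. If raw logits are wanted, one option would be to return the result of `self.linear3(x)` without the final sigmoid; that is a suggestion, not something this notebook does.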
