# Deep Learning Tutorial 2: MLP network & Convolutional network
In this demonstration we will train a neural network to classify handwritten digits from the MNIST dataset. We will use PyTorch, a high-level library for building and training neural networks.

We also import numpy, matplotlib, hiddenlayer and tqdm for visualization, so that we can take a better look at the inner workings of our models.

In [None]:
import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable
from torchvision import datasets, transforms, models
from tqdm import tqdm
import matplotlib.pyplot as plt
import hiddenlayer as hl
import numpy as np

## Hyperparameters

Here we define some hyperparameters that we will use for training. Later on we will change some parameters to see how they influence the training, but for now we will leave them as is.

In [None]:
#Hyperparameters
torch.manual_seed(1234) #Setting a seed for the random number generator in PyTorch
learning_rate = 0.01
batch_size = 48
criterion = nn.CrossEntropyLoss()
epochs = 10 #How many times we loop through our training data
MNIST_mean = 0.1307
MNIST_std = 0.3081

## Loading MNIST

Next we will load the MNIST training and testing datasets. Using the mean and standard deviation of the MNIST dataset we normalize the data. The MNIST training set contains 60000 images and the test set contains 10000 images where each image is represented as 28 x 28 pixels.

In [None]:
#Load Data
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((MNIST_mean,), (MNIST_std,))])
trainset = torchvision.datasets.MNIST(root='./data', train=True,
                                        download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True)

testset = torchvision.datasets.MNIST(root='./data', train=False,
                                       download=True, transform=transform)
test_loader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False)
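
To make the normalization step concrete, here is a minimal sketch of the arithmetic that `transforms.Normalize` applies to each pixel, together with the number of batches each loader yields per epoch. It is plain Python and assumes the mean, standard deviation, batch size and dataset sizes stated above; the helper names `normalize` and `batches_per_epoch` are made up for illustration.

```python
import math

MNIST_mean = 0.1307
MNIST_std = 0.3081
batch_size = 48

def normalize(pixel):
    """Same arithmetic as transforms.Normalize: (x - mean) / std."""
    return (pixel - MNIST_mean) / MNIST_std

def batches_per_epoch(num_images, batch_size):
    """The DataLoader keeps the last, smaller batch by default, so round up."""
    return math.ceil(num_images / batch_size)

# ToTensor scales pixels to [0, 1], so these are the extremes after normalization.
darkest = normalize(0.0)     # about -0.42
brightest = normalize(1.0)   # about 2.82

train_batches = batches_per_epoch(60000, batch_size)  # 1250
test_batches = batches_per_epoch(10000, batch_size)   # 209

print(round(darkest, 4), round(brightest, 4), train_batches, test_batches)
```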

Let's take a look at the data we loaded:

In [None]:
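def imshow(img):
    """Display a normalized image tensor of shape (C, H, W)."""
    img = img * MNIST_std + MNIST_mean          # Undo the normalization so pixel values are back in [0, 1]
    npimg = img.clamp(0, 1).numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))  # Matplotlib expects (H, W, C)
    plt.show()

dataiter = iter(torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False))
images, labels = next(dataiter)                 # The built-in next() works on any iterator

imshow(torchvision.utils.make_grid(images))
print('Labels: ', ', '.join(str(label.item()) for label in labels))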

In [None]:
trainset

## Building the neural network

Now it's time to build our neural network. In the constructor we define the PyTorch components we wish to use, while in the forward function we define the data flow of the forward pass. For the backward pass, PyTorch uses automatic differentiation to compute the gradients of the loss with respect to the network parameters, based on the graph structure recorded in our forward function.

In [None]:
class MLPNet(nn.Module):
    """MLP network"""
    def __init__(self):
        super(MLPNet, self).__init__()
        self.fc1 = nn.Linear(28*28, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = x.view(-1, 28*28)           #Reshape input tensor.
        x = F.relu(self.fc1(x))         #Fully connected hidden layer with 500 neurons.
        x = self.fc2(x)                 #10 fully connected output neurons for our digits. 
        return x 
net = MLPNet()
print(net)
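
As a minimal illustration of automatic differentiation, separate from the MNIST model, the sketch below lets PyTorch differentiate a small hand-picked function and compares the result with the derivative worked out by hand. The tensor values are arbitrary.

```python
import torch

# A tensor that autograd should track.
w = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)

# Forward pass: PyTorch records the operations that produce the scalar loss.
loss = (w ** 2).sum()

# Backward pass: walk the recorded graph and fill in w.grad with d(loss)/dw.
loss.backward()

# By hand, d/dw sum(w^2) = 2 * w.
expected = 2 * w.detach()
print(w.grad)                             # tensor([ 2., -4.,  6.])
print(torch.allclose(w.grad, expected))   # True
```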

## Visualizing the neural network

Here is what the neural network graph we created looks like:

In [None]:
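# Note: this list reuses the name `transforms`, which shadows the torchvision module imported above.
# Re-run the import cell before re-running the data-loading cell.
transforms = [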
    hl.transforms.Prune("Constant"),
    hl.transforms.Prune("Const"),
    hl.transforms.Fold("Linear > Relu", "Linear-Relu"),
    hl.transforms.Fold("Conv > Relu", "Conv-Relu"),
]
hl_graph = hl.build_graph(net, torch.zeros([1, 1, 28, 28]), transforms=transforms)
hl_graph.theme = hl.graph.THEMES["blue"].copy()
hl_graph
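
As a text-only complement to the graph, the following sketch counts the trainable parameters of the two fully connected layers. It rebuilds the layers with the same sizes as `MLPNet` instead of reusing `net`, so it can run on its own; `count_parameters` is a hypothetical helper name.

```python
import torch.nn as nn

def count_parameters(module):
    """Total number of trainable values (weights and biases) in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

fc1 = nn.Linear(28 * 28, 500)   # 784 * 500 weights + 500 biases
fc2 = nn.Linear(500, 10)        # 500 * 10 weights + 10 biases

fc1_params = count_parameters(fc1)
fc2_params = count_parameters(fc2)
print(fc1_params, fc2_params, fc1_params + fc2_params)   # 392500 5010 397510
```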

## Training and testing loops

Now we will define our training and testing loops. 

In [None]:
def train(model, train_loader, optimizer, epoch, history, canvas, oldHistory=None):
    """Training"""
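    model.train()
    log_interval = max(1, len(train_loader) // 10)  # Log roughly 10 times per epoch, whatever the batch size
    for batch, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()               # Clear gradients from the previous step
        output = model(data)                # Forward pass
        loss = criterion(output, target)
        loss.backward()                     # Backward pass: compute gradients
        optimizer.step()                    # Update the parameters
        if batch % log_interval == 0 and batch != 0:
            history.log((epoch*len(train_loader.dataset) + batch * len(data)), loss=loss.item())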
            if oldHistory is None:
                with canvas:
                    canvas.draw_plot([history["loss"]])
            else:
                with canvas:
                    canvas.draw_plot([oldHistory["loss"], history["loss"]], labels=["MLP-loss", "CNN-loss"])
            print('Train Epoch: {} [{}/{}]\tLoss: {:.6f}'.format(epoch, batch * len(data), len(train_loader.dataset), loss.item()))

def test(model, test_loader):
    """Testing"""
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            output = model(data)
            test_loss += F.cross_entropy(output, target, reduction='sum').item()  # The model outputs raw logits, so use cross-entropy rather than nll_loss
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
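
The criterion defined in the hyperparameters, `nn.CrossEntropyLoss`, expects raw, unnormalized scores (logits) and applies log-softmax internally, whereas `F.nll_loss` expects log-probabilities. The sketch below checks that relationship on a few made-up logits and shows how `argmax` turns logits into predictions for the accuracy count.

```python
import torch
import torch.nn.functional as F

# Made-up logits for 3 samples and 4 classes, plus their true labels.
logits = torch.tensor([[2.0, 0.5, -1.0, 0.0],
                       [0.1, 0.2, 3.0, -0.5],
                       [1.0, 1.5, 0.0, 0.3]])
target = torch.tensor([0, 2, 3])

# Cross-entropy on logits equals NLL on log-softmax outputs.
ce = F.cross_entropy(logits, target, reduction='sum')
nll = F.nll_loss(F.log_softmax(logits, dim=1), target, reduction='sum')
print(torch.allclose(ce, nll))   # True

# Predicted class = index of the largest logit; count the matches.
pred = logits.argmax(dim=1)
correct = pred.eq(target).sum().item()
print(pred.tolist(), correct)    # [0, 2, 1] 2
```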

## Training the network

Finally, we will run the training. We will use stochastic gradient descent (SGD) for optimization.

In [None]:
#Train the network
history1 = hl.History()
canvas1 = hl.Canvas()
optimizer = optim.SGD(net.parameters(), lr=learning_rate)
epochs = 1 # Note: this overrides the epochs = 10 set in the hyperparameters cell
for epoch in range(1, epochs + 1):
    train(net, train_loader, optimizer, epoch, history1, canvas1)
    test(net, test_loader)
print('Finished Training')
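
To see what `optimizer.step()` does, here is a sketch that works out one plain SGD update by hand, `p <- p - lr * grad`, and compares it with `optim.SGD` on a tiny linear layer. The layer size and input data are arbitrary; momentum and weight decay are left at their defaults of zero.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
lr = 0.01
layer = nn.Linear(3, 2)
optimizer = optim.SGD(layer.parameters(), lr=lr)

x = torch.randn(4, 3)
target = torch.tensor([0, 1, 1, 0])

optimizer.zero_grad()
loss = nn.CrossEntropyLoss()(layer(x), target)
loss.backward()

# What plain SGD should produce: old value minus lr times the gradient.
expected = [p.detach() - lr * p.grad for p in layer.parameters()]

optimizer.step()

matches = all(torch.allclose(p.detach(), e) for p, e in zip(layer.parameters(), expected))
print(matches)   # True
```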

## Deep Convolutional Neural Network

Next we will make some modifications to the network to see if we can improve our accuracy and/or speed up the training. We add two convolutional layers with max pooling, change the optimizer to Adam and adjust the learning rate accordingly.

In [None]:
class DeepConvolutionalNet(nn.Module):
    """Convolutional neural network"""
    def __init__(self):
        super(DeepConvolutionalNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.Linear(4*4*50, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4*4*50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

net2 = DeepConvolutionalNet()
print(net2)
hl_graph = hl.build_graph(net2, torch.zeros([1, 1, 28, 28]), transforms=transforms)
hl_graph.theme = hl.graph.THEMES["blue"].copy()
hl_graph
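
The `4*4*50` input size of `fc1` follows from how each layer shrinks the 28 x 28 image. Here is a small sketch of that arithmetic, assuming no padding and no dilation as in the layers above (`conv_out` is a hypothetical helper):

```python
def conv_out(size, kernel, stride=1):
    """Output width/height of a convolution or pooling layer without padding."""
    return (size - kernel) // stride + 1

size = 28
size = conv_out(size, kernel=5, stride=1)   # conv1: 28 -> 24
size = conv_out(size, kernel=2, stride=2)   # max pool: 24 -> 12
size = conv_out(size, kernel=5, stride=1)   # conv2: 12 -> 8
size = conv_out(size, kernel=2, stride=2)   # max pool: 8 -> 4

flat_features = size * size * 50            # 50 channels come out of conv2
print(size, flat_features)                  # 4 800
```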

In [None]:
history2 = hl.History()
canvas2 = hl.Canvas()
optimizer = optim.Adam(net2.parameters(), lr=0.001)
for epoch in range(1, epochs + 1):
    train(net2, train_loader, optimizer, epoch, history2, canvas2, history1)
    test(net2, test_loader)
print('Finished Training')
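
Adam rescales each gradient component by a running estimate of its magnitude, which is one reason its learning rate is usually chosen differently than for plain SGD. One consequence that is easy to check: on the very first step, with default settings, every parameter moves by roughly `lr` in the direction opposite to the sign of its gradient. The sketch below verifies this on an arbitrary three-element parameter.

```python
import torch
import torch.optim as optim

lr = 0.001
p = torch.tensor([1.0, -1.0, 0.5], requires_grad=True)
optimizer = optim.Adam([p], lr=lr)

# An arbitrary loss whose gradients differ a lot in magnitude: [0.5, -20.0, 300.0].
coeffs = torch.tensor([0.5, -20.0, 300.0])
loss = (coeffs * p).sum()
loss.backward()

before = p.detach().clone()
grad_sign = p.grad.sign()
optimizer.step()
delta = p.detach() - before

# Despite the very different gradient sizes, each step has size of about lr.
print(delta)
print(torch.allclose(delta, -lr * grad_sign, atol=1e-6))   # True
```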

In summary, we loaded the MNIST dataset, built a neural network using PyTorch, wrote training and testing loops for our data and trained the network using backpropagation and stochastic gradient descent. Finally, we increased the capacity of the network by adding layers and changed our optimizer to Adam. Now it's your turn! Can you find even better hyperparameters? Try changing batch_size and learning_rate, and if you want you can try adding convolutional and dropout layers to the network.
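
As a starting point for the dropout suggestion, here is one possible variant of the convolutional network with a dropout layer before the final classifier. The class name `DropoutConvNet` and the dropout probability of 0.5 are illustrative assumptions, not tuned values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DropoutConvNet(nn.Module):
    """Same layout as the convolutional network above, plus dropout (hypothetical variant)."""
    def __init__(self, p_drop=0.5):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.Linear(4*4*50, 500)
        self.dropout = nn.Dropout(p_drop)   # Randomly zeroes activations during training only
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2, 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2, 2)
        x = x.view(-1, 4*4*50)
        x = self.dropout(F.relu(self.fc1(x)))
        return self.fc2(x)

net3 = DropoutConvNet()
net3.eval()   # eval mode disables dropout, as in the test loop
with torch.no_grad():
    out = net3(torch.zeros(2, 1, 28, 28))
print(out.shape)   # torch.Size([2, 10])
```

Because `train()` calls `model.train()` and `test()` calls `model.eval()`, a network like this can be passed to the existing loops unchanged.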

Congratulations, you reached the end of this tutorial!