# MLP for Digit Recognition

We will use PyTorch to train the same MLP as in the demo for digit recognition. PyTorch is a convenient framework for implementing deep learning models (other frameworks exist, such as TensorFlow, Theano, Caffe, and MatConvNet).

We will use torchvision to load the dataset and normalize the images. More specifically, torchvision datasets return PIL images; the ToTensor transform converts them to Tensors with values in the range [0, 1], and Normalize then maps them to the range [-1, 1]. Tensors are the standard data representation in PyTorch.
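
As a quick check of the arithmetic, the sketch below applies the per-channel formula used by Normalize, (x - mean) / std, to a small hand-made tensor standing in for a ToTensor output. The tensor values are made up for illustration; with mean and std both set to 0.5, the range [0, 1] maps to [-1, 1].

```python
import torch

# Hypothetical 1x2x2 "image" with values in [0, 1], as ToTensor would produce.
img = torch.tensor([[[0.0, 0.25], [0.5, 1.0]]])
mean, std = 0.5, 0.5

normalized = (img - mean) / std      # the per-channel formula applied by Normalize
restored = normalized * std + mean   # the inverse mapping, back to [0, 1]

print(normalized.min().item(), normalized.max().item())
```

The inverse mapping is the same operation as dividing by 2 and adding 0.5, which is what is needed to display a normalized image.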

In [5]:
import torch
import torchvision
import torchvision.transforms as transforms

In [6]:
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5,), (0.5,))])  # MNIST images have a single channel

trainset = torchvision.datasets.MNIST(root='./mnist_data', train=True,
                                        download=False, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.MNIST(root='./mnist_data', train=False,
                                       download=False, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)
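
To make the batching behaviour concrete, here is a minimal sketch with a made-up dataset of 10 blank images (the names toy_images, toy_labels and toy_loader are hypothetical and not part of the notebook). It shows that a DataLoader with batch_size=4 yields batches of 4 samples, and that the last batch is smaller when the dataset size is not a multiple of the batch size.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy dataset: 10 fake 1x28x28 "images" with integer labels.
toy_images = torch.zeros(10, 1, 28, 28)
toy_labels = torch.arange(10)
toy_loader = DataLoader(TensorDataset(toy_images, toy_labels),
                        batch_size=4, shuffle=False)

batch_sizes = []
for batch_images, batch_labels in toy_loader:
    batch_sizes.append(batch_images.size(0))
    print(tuple(batch_images.shape), batch_labels.tolist())
```

With the 60000 MNIST training images and batch_size=4, one pass over trainloader corresponds to 15000 mini-batches.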

Let us look at some of the training images.

In [7]:
import matplotlib.pyplot as plt
import numpy as np

# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    
# show images
imshow(torchvision.utils.make_grid(images))
plt.show()

<Figure size 640x480 with 1 Axes>

Let us now define a Multilayer Perceptron. For any network, the __init__ function defines the different layers that will be used in the network, and the forward function defines the forward pass through the network. The backward pass is computed automatically using autograd (an automatic differentiation tool).

In [8]:
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(28*28, 300)
        self.fc2 = nn.Linear(300, 100)
        self.fc3 = nn.Linear(100, 10)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
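
A quick sanity check of such a network might look like the sketch below. It repeats the class definition so that it runs on its own, feeds a made-up random batch through a separate instance (check_net and dummy are hypothetical names), and calls backward on the summed scores to confirm that autograd populates a gradient for every parameter.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    """Same three-layer MLP as above, repeated so the snippet runs on its own."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28*28, 300)
        self.fc2 = nn.Linear(300, 100)
        self.fc3 = nn.Linear(100, 10)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)

check_net = Net()
dummy = torch.randn(4, 1, 28, 28)           # made-up batch of 4 images
scores = check_net(dummy.view(-1, 28*28))   # flatten each image to a 784-vector
n_params = sum(p.numel() for p in check_net.parameters())

scores.sum().backward()                     # autograd fills in .grad for every parameter
print(scores.shape, n_params)
```

The output has one row of 10 class scores per image, and the three layers together hold 266610 trainable parameters (weights plus biases).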

Let us use the cross-entropy loss and SGD with momentum to optimize the network parameters.

In [9]:
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
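
The following sketch, on made-up numbers, illustrates what these two objects compute. The first part checks that CrossEntropyLoss applied to raw scores equals the negative log-softmax at the target index, which is why the network's last layer has no softmax of its own. The second part runs two SGD steps with momentum on a single hypothetical parameter p and compares the result with the update rule v = momentum * v + grad, p = p - lr * v. The learning rate 0.1 is chosen only to keep the arithmetic readable.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# Cross-entropy on raw scores equals the negative log-softmax at the target index.
logits = torch.tensor([[2.0, 0.5, -1.0]])   # made-up scores for 3 classes
target = torch.tensor([0])
ce = nn.CrossEntropyLoss()(logits, target)
manual_ce = -F.log_softmax(logits, dim=1)[0, 0]

# SGD with momentum on a single parameter p, minimizing p**2 (gradient 2*p).
p = torch.tensor([1.0], requires_grad=True)
opt = optim.SGD([p], lr=0.1, momentum=0.9)
for _ in range(2):
    opt.zero_grad()
    (p ** 2).sum().backward()
    opt.step()

# By hand: v1 = 2.0,               p1 = 1.0 - 0.1 * 2.0 = 0.8
#          v2 = 0.9 * 2.0 + 1.6 = 3.4, p2 = 0.8 - 0.1 * 3.4 = 0.46
print(ce.item(), manual_ce.item(), p.item())
```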

Now we can train the network. To this end, we can simply loop over the data iterator (trainloader), pass the samples forward through the network, compute the loss, backpropagate and update the parameters.

In [10]:
from torch.autograd import Variable  # deprecated since PyTorch 0.4: tensors record gradients themselves
for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data
        # reshape the images as vectors
        inputs = inputs.view(-1, 28*28)

        # clear the gradients of the variables
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')



[1,  2000] loss: 0.773
[1,  4000] loss: 0.382
[1,  6000] loss: 0.308
[1,  8000] loss: 0.258
[1, 10000] loss: 0.241
[1, 12000] loss: 0.213
[1, 14000] loss: 0.199
[2,  2000] loss: 0.160
[2,  4000] loss: 0.157
[2,  6000] loss: 0.153
[2,  8000] loss: 0.141
[2, 10000] loss: 0.140
[2, 12000] loss: 0.124
[2, 14000] loss: 0.127
Finished Training
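
The same four steps (clear gradients, forward pass, backward pass, parameter update) can be tried in isolation on a small synthetic problem. The sketch below is a toy illustration and not the MNIST experiment: the data, the tiny network and the names toy_net, toy_criterion and toy_optimizer are all made up, and the learning rate is chosen for this toy problem only.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # make the toy run repeatable

# Made-up 2-class problem: the label is 1 when the first feature is positive.
x = torch.randn(64, 20)
y = (x[:, 0] > 0).long()

toy_net = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 2))
toy_criterion = nn.CrossEntropyLoss()
toy_optimizer = optim.SGD(toy_net.parameters(), lr=0.1, momentum=0.9)

losses = []
for step in range(100):
    toy_optimizer.zero_grad()            # clear gradients from the previous step
    loss = toy_criterion(toy_net(x), y)  # forward pass and loss
    loss.backward()                      # backpropagate
    toy_optimizer.step()                 # update the parameters
    losses.append(loss.item())           # .item() extracts a Python float

print('first loss: %.3f, last loss: %.3f' % (losses[0], losses[-1]))
```

Storing loss.item() rather than the loss tensor keeps only a plain number per step, so the computation graph of each step can be freed.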


We can then evaluate the network on the test data.

In [11]:
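correct = 0
total = 0
with torch.no_grad():  # gradients are not needed for evaluation
    for data in testloader:
        images, labels = data
        images = images.view(-1, 28*28)
        outputs = net(images)
        # index of the highest score = predicted class
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        # .item() converts the count to a Python int, avoiding small-integer tensor dtypes
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))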

Accuracy of the network on the 10000 test images: 95 %
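
To see how the prediction and the accuracy are obtained, here is a minimal sketch on made-up scores for three samples and three classes (all toy_ names are hypothetical). torch.max along dimension 1 returns both the largest score in each row and its index, and the index is taken as the predicted class.

```python
import torch

# Made-up scores for 3 samples and 3 classes, with their true labels.
toy_outputs = torch.tensor([[0.1, 2.0, -1.0],
                            [1.5, 0.2,  0.3],
                            [0.0, 0.1,  0.9]])
toy_targets = torch.tensor([1, 0, 1])

toy_values, toy_predicted = torch.max(toy_outputs, 1)           # max score and its index, per row
toy_correct = (toy_predicted == toy_targets).sum().item()       # Python int
toy_accuracy = 100 * toy_correct / toy_targets.size(0)
print(toy_predicted.tolist(), toy_correct)
```

Here the third sample is misclassified (class 2 predicted, class 1 expected), so two of the three predictions are correct.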


Let us now look at the accuracy per class.

In [12]:
class_correct = [0] * 10
class_total = [0] * 10
with torch.no_grad():  # gradients are not needed for evaluation
    for data in testloader:
        images, labels = data
        images = images.view(-1, 28*28)
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels)
        for i in range(labels.size(0)):  # also handles a final batch smaller than 4
            # use Python ints so the counters never become small-integer tensors
            label = labels[i].item()
            class_correct[label] += int(c[i].item())
            class_total[label] += 1

for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        i, 100 * class_correct[i] / class_total[i]))
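
Per-class counts can also be accumulated without a Python loop over samples. The sketch below uses torch.bincount on made-up labels and predictions for three classes (all toy_ names are hypothetical); the counts are integer tensors converted to float only for the final division, so the result does not depend on the dtype of the comparison.

```python
import torch

num_classes = 3
toy_labels = torch.tensor([0, 0, 1, 2, 2, 2])      # made-up true classes
toy_predicted = torch.tensor([0, 1, 1, 2, 0, 2])   # made-up predictions

toy_hits = toy_labels[toy_predicted == toy_labels]  # labels of correctly classified samples
toy_class_total = torch.bincount(toy_labels, minlength=num_classes)
toy_class_correct = torch.bincount(toy_hits, minlength=num_classes)
toy_per_class_acc = 100.0 * toy_class_correct.float() / toy_class_total.float()
print(toy_per_class_acc.tolist())
```

In a full evaluation the two count tensors would be summed over all test batches before dividing.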