In [2]:
import time
import torch
import torch.nn as nn
import torchvision.datasets as dsets
import torchvision.transforms as transforms
from torch.autograd import Variable

For training our first feedforward NN, we will use one of the built-in datasets, in this case the popular MNIST dataset. $torchvision.datasets$ contains several popular datasets such as MNIST, Fashion-MNIST, EMNIST and CIFAR.

$torchvision.transforms$ contains common image transforms that we will need for the MNIST dataset.

Before we get started, let us import the training and test data from MNIST. 

In [3]:
train_dataset = dsets.MNIST(root='./data', 
                            train=True, 
                            transform=transforms.ToTensor(),  
                            download=True)

In [4]:
test_dataset = dsets.MNIST(root='./data', 
                           train=False, 
                           transform=transforms.ToTensor())

These files are saved under the $data$ folder; depending on the torchvision version, processed copies may also be stored in a subfolder such as $data/processed$.

The next step is setting up the hyperparameters.

In [5]:
# Input is 28x28 pixels = 784 (since single channel)
input_size = 784
# The hidden layer size (number of hidden units) is usually chosen empirically; 500 is our initial estimate
hidden_size = 500
# 10 classes since 10 digits
num_classes = 10
# Let us train it for 10 epochs i.e. 10 training cycles
num_epochs = 10
# Depending on how powerful your machine is, you can increase the batch size
batch_size = 100
# A small learning rate keeps the weight updates stable; 0.001 is also Adam's default
learning_rate = 0.001

Now that we have procured the data and set the hyperparameters, we will set up a pipeline for the input.

In [6]:
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

In [7]:
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=batch_size, 
                                          shuffle=False)
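To see what such a pipeline yields without downloading anything, the sketch below feeds a synthetic $TensorDataset$ (random tensors we made up, standing in for MNIST) through a $DataLoader$ with the same batch size of 100.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for MNIST: 1000 random "images" with integer labels 0-9
fake_images = torch.rand(1000, 1, 28, 28)
fake_labels = torch.randint(0, 10, (1000,))
fake_dataset = TensorDataset(fake_images, fake_labels)

loader = DataLoader(dataset=fake_dataset, batch_size=100, shuffle=True)

num_batches = 0
for images, labels in loader:
    # Each iteration yields one mini-batch of images and the matching labels
    assert images.shape == (100, 1, 28, 28)
    assert labels.shape == (100,)
    num_batches += 1

print(len(loader), num_batches)  # 10 10
```

Note that each batch still has the shape (batch, channel, height, width), which is why the training loop flattens it to 784 values per image before calling the model.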

The data preparation phase is over: we have the data and a configured input pipeline, so now we will create the model. For simplicity, we will create a model with a single hidden layer.

In [8]:
class Net(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size) 
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)  
    
    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out
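A quick way to check the architecture is to push a random batch through it and count its parameters. The sketch repeats the class so that it runs on its own; the random input is a placeholder, not MNIST data.

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    # Same architecture as above: 784 -> 500 (ReLU) -> 10
    def __init__(self, input_size, hidden_size, num_classes):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))

net = Net(784, 500, 10)
batch = torch.rand(100, 784)  # random batch of 100 flattened 28x28 "images"
outputs = net(batch)
print(outputs.shape)  # torch.Size([100, 10]): one score per digit for each image

# Weights plus biases of both layers: (784*500 + 500) + (500*10 + 10)
num_params = sum(p.numel() for p in net.parameters())
print(num_params)  # 397510
```

The same network could also be written as $nn.Sequential(nn.Linear(784, 500), nn.ReLU(), nn.Linear(500, 10))$; subclassing $nn.Module$, as done here, makes it easier to customize $forward$ later.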

Congratulations on creating your first model! Once you have worked through this notebook, go ahead and experiment with the __learning rate__, __hidden layer size__, __batch size__ and __number of epochs__ to understand how they affect your model's performance.

In [9]:
net = Net(input_size, hidden_size, num_classes)
net.cuda()


Net(
  (fc1): Linear(in_features=784, out_features=500)
  (relu): ReLU()
  (fc2): Linear(in_features=500, out_features=10)
)

We transferred the network to the GPU using $net.cuda()$. Also note that both the input-to-hidden and the hidden-to-output layers are fully connected, and that a ReLU activation is applied in the hidden layer.
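$net.cuda()$ fails on a machine without a usable GPU. A common device-agnostic alternative, sketched below, picks the device at runtime and moves both the model and every batch there with $.to(device)$. The $nn.Sequential$ model is a stand-in with the same layer sizes as $Net$, used only to keep the example short.

```python
import torch
import torch.nn as nn

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Stand-in with the same layer sizes as Net (784 -> 500 -> 10)
model = nn.Sequential(nn.Linear(784, 500), nn.ReLU(), nn.Linear(500, 10)).to(device)

# Inputs must live on the same device as the model's parameters
batch = torch.rand(100, 784, device=device)
outputs = model(batch)
print(device, outputs.device)
```

With this pattern, the $.cuda()$ calls in the training and test cells would become $.to(device)$, and the notebook would also run on a CPU-only machine (more slowly).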

Now that we have our model, we need to choose a loss function and an optimizer that will minimize it. We choose the cross-entropy (or log) loss and the [Adam optimizer](https://arxiv.org/abs/1412.6980v8). Because $nn.CrossEntropyLoss$ applies a log-softmax internally, the model's $forward$ returns raw class scores (logits) without a softmax.
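To make the log-softmax point concrete, the sketch below computes the same loss three ways on made-up logits: with $nn.CrossEntropyLoss$, as log-softmax followed by negative log-likelihood, and by hand as the mean of $-\log p_{\text{true class}}$.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # made-up raw scores for 4 images and 10 classes
targets = torch.tensor([3, 0, 9, 1])  # made-up true digit for each image

ce = nn.CrossEntropyLoss()(logits, targets)

# Same value in two explicit steps: log-softmax, then negative log-likelihood
two_step = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# And by hand: pick the log-probability of the true class in each row, negate, average
by_hand = -F.log_softmax(logits, dim=1)[torch.arange(4), targets].mean()

print(ce.item(), two_step.item(), by_hand.item())
```

Adding an explicit softmax layer to $Net$ would therefore apply softmax twice, which usually slows training down.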

In [10]:
criterion = nn.CrossEntropyLoss()  
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)

We now train the model, i.e. we iterate over the training data in mini-batches and pass each batch to the model. For every batch we compute the loss, backpropagate to obtain the gradients, and let the optimizer update the weights to reduce the loss.
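Before a full training run, a common sanity check is to confirm that this forward/backward/update cycle can drive the loss down on one fixed batch. The sketch below uses random tensors instead of MNIST and an $nn.Sequential$ stand-in for $Net$; with random labels it only shows that the loop optimizes, not that the model generalizes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for Net with the same 784 -> 500 -> 10 layout
model = nn.Sequential(nn.Linear(784, 500), nn.ReLU(), nn.Linear(500, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# One fixed batch of random "images" and labels (not real MNIST data)
images = torch.rand(100, 784)
labels = torch.randint(0, 10, (100,))

losses = []
for step in range(100):
    optimizer.zero_grad()                    # gradients accumulate unless cleared
    loss = criterion(model(images), labels)  # forward pass
    loss.backward()                          # compute d(loss)/d(parameter)
    optimizer.step()                         # apply one Adam update
    losses.append(loss.item())               # .item() turns the 0-dim loss into a float

print('first loss %.4f, last loss %.4f' % (losses[0], losses[-1]))
```

If the loss does not fall on a single batch, the bug is usually in the loop itself (for example a missing $zero\_grad()$ or $step()$) rather than in the hyperparameters.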

A common practice while training is to print the status and the value of the loss after every $n$ steps. Here we choose $100$. 

Let us get an estimate of how long this process takes using the $time$ package.
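Measuring wall-clock time with $time.time()$ is fine for a run of about a minute. For shorter GPU measurements, note that CUDA launches kernels asynchronously, so the clock should only be read after $torch.cuda.synchronize()$; $time.perf\_counter()$ is also a higher-resolution clock. The helper name $timed$ below is our own, not part of PyTorch.

```python
import time
import torch

def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds).

    When a GPU is present we synchronize before starting and before stopping
    the clock, so queued CUDA work is included in the measurement.
    """
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return result, time.perf_counter() - start

result, seconds = timed(torch.matmul, torch.rand(256, 256), torch.rand(256, 256))
print(result.shape, '%.4f s' % seconds)
```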

In [11]:
start = time.time()
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Flatten each 1x28x28 image into a 784-vector and move the batch to the GPU
        images = images.view(-1, 28*28).cuda()
        labels = labels.cuda()

        # Forward pass
        optimizer.zero_grad()  # clear gradients left over from the previous step
        outputs = net(images)
        loss = criterion(outputs, labels)
        # Backward pass
        loss.backward()
        # Update the weights
        optimizer.step()

        if (i+1) % 100 == 0:
            # loss.item() returns the scalar loss as a Python float
            print('Epoch [%d/%d], Step [%d/%d], Loss: %.4f'
                  % (epoch+1, num_epochs, i+1, len(train_loader), loss.item()))

end = time.time()
print(end - start)

Epoch [1/10], Step [100/600], Loss: 0.3295
Epoch [1/10], Step [200/600], Loss: 0.1577
Epoch [1/10], Step [300/600], Loss: 0.2016
Epoch [1/10], Step [400/600], Loss: 0.2591
Epoch [1/10], Step [500/600], Loss: 0.2471
Epoch [1/10], Step [600/600], Loss: 0.1076
Epoch [2/10], Step [100/600], Loss: 0.1606
Epoch [2/10], Step [200/600], Loss: 0.1400
Epoch [2/10], Step [300/600], Loss: 0.0607
Epoch [2/10], Step [400/600], Loss: 0.1162
Epoch [2/10], Step [500/600], Loss: 0.0700
Epoch [2/10], Step [600/600], Loss: 0.1095
Epoch [3/10], Step [100/600], Loss: 0.0613
Epoch [3/10], Step [200/600], Loss: 0.0401
Epoch [3/10], Step [300/600], Loss: 0.0401
Epoch [3/10], Step [400/600], Loss: 0.0181
Epoch [3/10], Step [500/600], Loss: 0.0543
Epoch [3/10], Step [600/600], Loss: 0.0765
Epoch [4/10], Step [100/600], Loss: 0.0500
Epoch [4/10], Step [200/600], Loss: 0.0591
Epoch [4/10], Step [300/600], Loss: 0.0197
Epoch [4/10], Step [400/600], Loss: 0.0195
Epoch [4/10], Step [500/600], Loss: 0.1489
...

After 10 epochs we get a loss of $0.015$, and the overall training time on our machine was $58$ seconds.

Note: Number of steps per epoch = size_of_dataset / batch_size (here 60000 / 100 = 600)
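When the dataset size is not a multiple of the batch size, $DataLoader$ keeps a smaller last batch by default, so the step count rounds up (or down with $drop\_last=True$). The sketch below works this out for MNIST and cross-checks it against $len()$ of a $DataLoader$; $steps\_per\_epoch$ is a hypothetical helper written for this illustration.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

def steps_per_epoch(dataset_size, batch_size, drop_last=False):
    # Hypothetical helper mirroring DataLoader's batch count
    if drop_last:
        return dataset_size // batch_size  # incomplete last batch is discarded
    return math.ceil(dataset_size / batch_size)  # incomplete last batch is kept

print(steps_per_epoch(60000, 100))       # 600 steps per epoch for MNIST
print(10 * steps_per_epoch(60000, 100))  # 6000 steps over 10 epochs

# Cross-check with a dataset size that does not divide evenly
data = TensorDataset(torch.zeros(1050, 1))
print(len(DataLoader(data, batch_size=100)))                  # 11
print(len(DataLoader(data, batch_size=100, drop_last=True)))  # 10
```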

Congratulations on training your first model with an image dataset! Now go ahead and test the model's performance, then repeat this process after changing the hyperparameters.

In [12]:
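net.eval()  # inference mode (no dropout or batch norm here, but good practice)
correct = 0
total = 0
with torch.no_grad():  # gradients are not needed for evaluation
    for images, labels in test_loader:
        images = images.view(-1, 28*28).cuda()
        outputs = net(images)
        # The index of the largest score in each row is the predicted digit
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted.cpu() == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))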

Accuracy of the network on the 10000 test images: 98 %
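The accuracy cell relies on $torch.max(outputs, 1)$ returning a pair of (maximum values, their indices) along dimension 1; the indices are the predicted classes. The sketch below walks through that on made-up scores for 4 images and 3 classes.

```python
import torch

# Made-up scores for 4 images over 3 classes
outputs = torch.tensor([[0.1, 2.0, -1.0],
                        [3.0, 0.0, 0.5],
                        [0.2, 0.1, 0.9],
                        [1.0, 4.0, 2.0]])
labels = torch.tensor([1, 2, 2, 1])

# torch.max along dim 1 returns (values, indices); the index is the predicted class
_, predicted = torch.max(outputs, 1)
print(predicted)  # tensor([1, 0, 2, 1])

correct = (predicted == labels).sum().item()
total = labels.size(0)
print('Accuracy: %d %%' % (100 * correct / total))  # Accuracy: 75 %
```

Note that the $\%d$ format truncates rather than rounds, so an accuracy of 98.9% would still be printed as 98%.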


When you are finally satisfied with your results, you can save the model's parameters for future use as shown below.

In [13]:
torch.save(net.state_dict(), 'mnist_fnn.pkl')
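To reuse the saved parameters later, rebuild the same architecture and load the state dict into it. The sketch below does the round trip inside a temporary directory (our choice, so no file is left behind), repeats the $Net$ class so it runs on its own, and checks that the restored network produces the same outputs.

```python
import os
import tempfile
import torch
import torch.nn as nn

class Net(nn.Module):
    # Same architecture as the notebook's model: 784 -> 500 (ReLU) -> 10
    def __init__(self, input_size, hidden_size, num_classes):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))

net = Net(784, 500, 10)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'mnist_fnn.pkl')
    torch.save(net.state_dict(), path)

    # Rebuild the same architecture, then copy the saved weights into it;
    # map_location='cpu' lets a GPU-trained checkpoint load on a CPU-only machine
    restored = Net(784, 500, 10)
    restored.load_state_dict(torch.load(path, map_location='cpu'))
    restored.eval()

x = torch.rand(5, 784)  # random input, just for the comparison
with torch.no_grad():
    same = torch.allclose(net(x), restored(x))
print(same)  # True
```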