In [1]:
import time
import torch
import torch.nn as nn
import torchvision.datasets as dsets
import torchvision.transforms as transforms
from torch.autograd import Variable

In this lecture we will build a convolutional neural network (CNN) and train it on the same dataset as before, i.e. MNIST.

Let us import the training and test data from MNIST. 

In [2]:
train_dataset = dsets.MNIST(root='./data', 
                            train=True, 
                            transform=transforms.ToTensor(),  
                            download=True)

In [3]:
test_dataset = dsets.MNIST(root='./data', 
                           train=False, 
                           transform=transforms.ToTensor())

These will get saved in the $data$ folder (which should already be present from lecture 2), and their processed forms will be stored in the $data/processed$ folder.

The next step is setting up the hyperparameters.

In [4]:
# 10 classes since 10 digits
num_classes = 10
# Let us train it for 10 epochs i.e. 10 training cycles
num_epochs = 10
# Depending on how powerful your machine is you can increase the batch size
batch_size = 100
# A small learning rate keeps the updates stable; 0.001 is also Adam's default
learning_rate = 0.001

Now that we have procured the data and set the hyperparameters, we will set up the pipeline for the input as we did before.

In [5]:
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

In [6]:
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=batch_size, 
                                          shuffle=False)

The data preparation phase is over: we have the data and a configured pipeline. Now we will create the CNN model.
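
To see what a loader hands to the model, here is a minimal sketch that uses random tensors as a stand-in for MNIST, so it runs without downloading anything. The names `fake_images`, `fake_labels` and `fake_loader` are made up for this illustration; with the real `train_loader` the batch shapes should be the same.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST: 300 random 1x28x28 "images" with labels 0-9
fake_images = torch.rand(300, 1, 28, 28)
fake_labels = torch.randint(0, 10, (300,))
fake_loader = DataLoader(TensorDataset(fake_images, fake_labels),
                         batch_size=100, shuffle=True)

# Pull a single batch, exactly as the training loop will do
images, labels = next(iter(fake_loader))
print(images.shape)      # torch.Size([100, 1, 28, 28])
print(labels.shape)      # torch.Size([100])
print(len(fake_loader))  # 3 batches per pass over the data
```

Each batch is a 4D tensor laid out as (batch, channels, height, width), which is the layout `nn.Conv2d` expects.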

We will create a model with 2 convolutional layers, each followed by batch normalization, ReLU and max pooling.

In [7]:
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # For the 1st CNN layer we use 1 input image channel,
        # 16 output channels, Kernel of 5x5 and Padding of 2
        # Followed by BatchNorm, ReLU activation and Max Pooling
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(2))
        # For the 2nd CNN layer we use 16 input channel (output of layer 1),
        # 32 output channels, Kernel of 5x5 and Padding of 2
        # Followed by BatchNorm, ReLU activation and Max Pooling        
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2))
        # The last layer is a FC layer with 10 outputs (1 for each class label)
        self.fc = nn.Linear(7*7*32, num_classes)
        
    def forward(self, x):
        # Output of each layer is fed as input to the next layer
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.view(out.size(0), -1)
        out = self.fc(out)
        return out

Congratulations on creating your first model. Go ahead and experiment with the __learning rate__, __kernel size__, __batch size__ and __number of epochs__ once you are done to understand how they affect your model's performance.
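
The `7*7*32` input size of the fully connected layer follows from the layer shapes. Below is a rough sketch of that arithmetic, assuming the usual output-size formula for convolution and pooling, floor((n + 2p - k) / s) + 1. The helpers `conv_out` and `pool_out` are hypothetical names for this illustration and are not part of PyTorch.

```python
def conv_out(size, kernel, padding=0, stride=1):
    # Spatial size after a convolution: floor((n + 2p - k) / s) + 1
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel):
    # Max pooling with stride equal to the kernel size (the nn.MaxPool2d default)
    return (size - kernel) // kernel + 1

size = 28  # MNIST images are 28x28
for _ in range(2):  # layer1 and layer2 use the same kernel, padding and pooling
    size = conv_out(size, kernel=5, padding=2)  # 5x5 kernel with padding 2 keeps the size
    size = pool_out(size, kernel=2)             # 2x2 pooling halves it

fc_inputs = size * size * 32  # 32 channels leave layer2
print(size, fc_inputs)  # 7 1568
```

If you change the kernel size without adjusting the padding, `size` changes and the first argument of `nn.Linear` has to change with it.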

In [8]:
cnn = CNN()
cnn.cuda()

    Found GPU0 GeForce GTX 1060 with Max-Q Design which requires CUDA_VERSION >= 8000 for
     optimal performance and fast startup time, but your PyTorch was compiled
     with CUDA_VERSION 7050. Please install the correct PyTorch binary
     using instructions from http://pytorch.org
    


CNN(
  (layer1): Sequential(
    (0): Conv2d (1, 16, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True)
    (2): ReLU()
    (3): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1))
  )
  (layer2): Sequential(
    (0): Conv2d (16, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True)
    (2): ReLU()
    (3): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1))
  )
  (fc): Linear(in_features=1568, out_features=10)
)

We transferred the NN to the GPU using $cnn.cuda()$. Also note that we used a fully connected layer as the outermost (final) layer and a ReLU activation in each hidden layer.
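
Calling `.cuda()` fails on a machine without a CUDA GPU. One possible way to keep the notebook runnable everywhere is to pick the device at runtime, sketched below. The small `nn.Linear` model is a hypothetical stand-in used only to keep the example short; the same `.to(device)` call should work for the `CNN` above.

```python
import torch
import torch.nn as nn

# Use the GPU when one is visible to PyTorch, otherwise stay on the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hypothetical stand-in model, moved to the chosen device
model = nn.Linear(4, 2).to(device)
batch = torch.rand(3, 4).to(device)  # inputs must live on the same device as the model

print(model(batch).shape)  # torch.Size([3, 2])
```

With this pattern, the `images.cuda()` and `labels.cuda()` calls in the training loop would become `images.to(device)` and `labels.to(device)`.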

Now that we have our model we need to decide what loss function we want to use and the optimizer that will minimize our loss function. We choose the Cross Entropy (or log) loss function and the [Adam Optimizer](https://arxiv.org/abs/1412.6980v8).
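
To make the loss concrete, here is a small pure-Python sketch of cross entropy for a single example: softmax over the raw scores, then the negative log of the probability given to the correct class. The function `cross_entropy` is written for this illustration only. `nn.CrossEntropyLoss` takes raw scores (logits) in the same way and applies the log-softmax internally, which is why the model's `forward` does not end with a softmax.

```python
import math

def cross_entropy(logits, target):
    # Softmax turns raw scores into probabilities; subtracting the max is for numerical stability
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    prob_target = exps[target] / sum(exps)
    # The loss is the negative log-probability assigned to the correct class
    return -math.log(prob_target)

confident = cross_entropy([5.0, 0.0, 0.0], target=0)  # correct class scored highest
wrong = cross_entropy([5.0, 0.0, 0.0], target=1)      # correct class scored low
uniform = cross_entropy([0.0] * 10, target=3)         # no preference among 10 classes

print(confident < uniform < wrong)  # True
```

The `uniform` case equals log(10), roughly 2.30, which is about the loss to expect from an untrained 10-class model.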

In [9]:
criterion = nn.CrossEntropyLoss()  
optimizer = torch.optim.Adam(cnn.parameters(), lr=learning_rate)

We now train the model, i.e. we iterate over the training data and pass it to the model. For each batch we calculate the loss, backpropagate and then take an optimizer step to reduce it.

A common practice while training is to print the status and the value of the loss after every $n$ steps. Here we choose $100$. 

Let us get an estimate of how long this process takes using the $time$ package.

In [10]:
start = time.time()
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):  
        # Convert torch tensor to Variable for calculations
        images = Variable(images.cuda())
        labels = Variable(labels.cuda())
        
        # Forward
        optimizer.zero_grad()  # zero the gradient buffer
        outputs = cnn(images)
        loss = criterion(outputs, labels)
        # Backward
        loss.backward()
        # Optimize
        optimizer.step()
        
        if (i+1) % 100 == 0:
            # .item() reads the scalar loss; indexing loss.data[0] fails on recent PyTorch
            print('Epoch [%d/%d], Iter [%d/%d], Loss: %.4f'
                  % (epoch+1, num_epochs, i+1, len(train_dataset)//batch_size, loss.item()))

end = time.time()
print(end - start)

Epoch [1/10], Iter [100/600], Loss: 0.2073
Epoch [1/10], Iter [200/600], Loss: 0.1442
Epoch [1/10], Iter [300/600], Loss: 0.0766
Epoch [1/10], Iter [400/600], Loss: 0.1518
Epoch [1/10], Iter [500/600], Loss: 0.0589
Epoch [1/10], Iter [600/600], Loss: 0.0371
Epoch [2/10], Iter [100/600], Loss: 0.0241
Epoch [2/10], Iter [200/600], Loss: 0.0289
Epoch [2/10], Iter [300/600], Loss: 0.0152
Epoch [2/10], Iter [400/600], Loss: 0.0381
Epoch [2/10], Iter [500/600], Loss: 0.0418
Epoch [2/10], Iter [600/600], Loss: 0.0224
Epoch [3/10], Iter [100/600], Loss: 0.0374
Epoch [3/10], Iter [200/600], Loss: 0.0297
Epoch [3/10], Iter [300/600], Loss: 0.0622
Epoch [3/10], Iter [400/600], Loss: 0.0257
Epoch [3/10], Iter [500/600], Loss: 0.0633
Epoch [3/10], Iter [600/600], Loss: 0.0187
Epoch [4/10], Iter [100/600], Loss: 0.0128
Epoch [4/10], Iter [200/600], Loss: 0.0022
Epoch [4/10], Iter [300/600], Loss: 0.0024
Epoch [4/10], Iter [400/600], Loss: 0.0410
Epoch [4/10], Iter [500/600], Loss: 0.0042
Epoch [4/10

After 10 epochs we get a loss of $0.002$ and the overall training time was $173$ seconds. 

Note: Number of iterations per epoch = size_of_dataset / batch_size, here 60000 / 100 = 600.
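
The iteration counts in the log can be checked with a few lines of arithmetic. This is only a sketch; it assumes the standard MNIST training set size of 60000 images and the hyperparameters set earlier.

```python
import math

dataset_size = 60000  # MNIST training images
batch_size = 100
num_epochs = 10

# Round up: when the sizes do not divide evenly, the last batch is smaller
iters_per_epoch = math.ceil(dataset_size / batch_size)
total_iters = iters_per_epoch * num_epochs
print(iters_per_epoch, total_iters)  # 600 6000
```

`len(train_loader)` reports the same per-epoch count, so it can be used in the progress message instead of computing it by hand.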

Congratulations on training your first model with an image dataset. Now go ahead and test the model's performance, then repeat this process after changing the hyperparameters.

In [11]:
cnn.eval()  # evaluation mode: BatchNorm uses its running statistics
correct = 0
total = 0
for images, labels in test_loader:
    images = Variable(images.cuda())
    outputs = cnn(images)
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (predicted.cpu() == labels).sum().item()  # .item() gives a plain Python int
    
print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))

Accuracy of the network on the 10000 test images: 99 %
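
The prediction step relies on `torch.max(outputs, 1)`, which returns both the largest score in each row and its index; the index is the predicted class. A tiny sketch with made-up scores for 3 images and 4 classes (the numbers are purely illustrative):

```python
import torch

# Hypothetical scores for 3 images over 4 classes
outputs = torch.tensor([[0.1, 2.0, 0.3, 0.0],
                        [1.5, 0.2, 0.1, 0.4],
                        [0.0, 0.1, 0.2, 3.0]])
labels = torch.tensor([1, 0, 2])

values, predicted = torch.max(outputs, 1)  # max over dimension 1 (the class scores)
correct = (predicted == labels).sum().item()  # number of matching predictions
print(predicted.tolist(), correct)  # [1, 0, 3] 2
```

Here two of the three predictions match the labels, so the accuracy on this toy batch would be 2/3.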


When you are finally satisfied with your results you can even save your model for future use as below.

In [12]:
torch.save(cnn.state_dict(), 'mnist_cnn.pkl')
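
Since only the `state_dict` is saved, loading it back requires building the same architecture first and then copying the weights in. A minimal sketch of that round trip follows. The `nn.Linear` model and the temporary file path are stand-ins chosen for this example; for the notebook you would construct `CNN()` and load `'mnist_cnn.pkl'`.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in model; for the notebook this would be CNN()
trained = nn.Linear(4, 2)
path = os.path.join(tempfile.mkdtemp(), 'model.pkl')
torch.save(trained.state_dict(), path)

# Rebuild the same architecture, then copy the saved weights into it
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(path))
restored.eval()  # inference mode, which matters for layers such as BatchNorm

x = torch.rand(3, 4)
print(torch.equal(trained(x), restored(x)))  # True
```

If the weights were saved from a GPU model and are loaded on a CPU-only machine, passing `map_location='cpu'` to `torch.load` should avoid a device error.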