The CIFAR-10 dataset is one of the most commonly used datasets for deep learning and image classification. It is popular because it is small and easy to work with: it contains 60,000 32×32 color images in 10 classes, split into 50,000 training images and 10,000 test images. This notebook walks you through training and evaluating an image classifier on CIFAR-10 with PyTorch. Along the way it explains the ideas behind deep-learning image classification, so it also works as a practical introduction to image classification with PyTorch.

To begin, we import all the necessary modules and libraries.

In [None]:
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

PyTorch provides a convenient way to feed image data into a neural network: the `DataLoader`, which splits a dataset into (optionally shuffled) mini-batches. Before creating a `DataLoader`, we need to set up the dataset it will draw from. CIFAR-10 is common enough that `torchvision` includes a ready-made dataset class for it, which can also download the data for us. We only need to specify the transforms the dataset should apply. Transforms are functions, passed to the dataset through its `transform` argument, that are applied to each image as it is loaded. The most important one is `ToTensor`, which converts each image into a tensor the model can process and scales its pixel values to the range [0, 1]. It is also important to normalize the data so that every channel is centered around zero on a similar scale. Unnormalized inputs tend to make gradient-based training slower and less stable.
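Here is a NumPy sketch of the arithmetic behind these two transforms. `ToTensor` turns an 8-bit height×width×channel image into a channel-first float array scaled to [0, 1]. `Normalize(mean, std)` then computes `(x - mean) / std` separately for each channel. The helpers `to_tensor_like` and `normalize` are hypothetical stand-ins written for illustration. They are not torchvision functions.

```python
import numpy as np

def to_tensor_like(img_uint8):
    """Hypothetical stand-in for ToTensor: HWC uint8 -> CHW float32 in [0, 1]."""
    return np.transpose(img_uint8, (2, 0, 1)).astype(np.float32) / 255.0

def normalize(x, mean, std):
    """Per-channel (x - mean) / std on a CHW array, as Normalize applies it."""
    mean = np.asarray(mean, dtype=np.float32).reshape(-1, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(-1, 1, 1)
    return (x - mean) / std

# A fake 32x32 RGB image with random pixel values in 0..255
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
img[0, 0] = 0    # force one pure-black pixel
img[0, 1] = 255  # force one pure-white pixel

x = to_tensor_like(img)
y = normalize(x, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5))

print(x.shape, x.min(), x.max())  # (3, 32, 32) 0.0 1.0
print(y.min(), y.max())           # -1.0 1.0
```

When the mean and standard deviation are both 0.5, every channel ends up in [-1, 1]. This is the mapping that the `imshow` helper later in the notebook reverses with `img / 2 + 0.5`.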

You may also want to apply some data augmentation by adding extra transforms to the training set only. These transforms randomly perturb each training image. Here, each image is rotated by up to 30 degrees in either direction and flipped horizontally half of the time. This can make the classifier more robust and better able to recognize objects under a wider range of conditions. The test set gets no augmentation, so evaluation runs on the unaltered images.
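The data loader reapplies random transforms every time an image is loaded, so the model sees a slightly different version of each training image in every epoch. The sketch below shows the idea behind a random horizontal flip on a channel-first NumPy array. `random_hflip` is a hypothetical helper, not torchvision's implementation. Rotation is left out because it requires interpolation.

```python
import numpy as np

def random_hflip(img_chw, p=0.5, rng=None):
    """Hypothetical sketch of a random horizontal flip on a CHW array.

    With probability p, mirror the image left-to-right by reversing the
    width axis; otherwise return it unchanged.
    """
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < p:
        return img_chw[:, :, ::-1].copy()
    return img_chw

img = np.arange(2 * 2 * 3, dtype=np.float32).reshape(2, 2, 3)  # C=2, H=2, W=3
flipped = random_hflip(img, p=1.0)  # p=1.0 always flips
kept = random_hflip(img, p=0.0)     # p=0.0 never flips

print(img[0, 0], flipped[0, 0])  # [0. 1. 2.] [2. 1. 0.]
```

A flipped cat is still a cat, which makes this label-preserving perturbation a reasonable choice for CIFAR-10. The same would not be true for a dataset of digits or text.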

In [None]:
# define the transforms to use
# ToTensor scales pixels to [0, 1]; Normalize then computes (x - 0.5) / 0.5
# per channel, which maps every channel to [-1, 1]

train_transforms = transforms.Compose(
    [transforms.RandomRotation(30),
     transforms.RandomHorizontalFlip(),
     transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

test_transforms = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# Create training and testing datasets
train_data = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=train_transforms)
# loader creates an iterable object for later use
train_loader = torch.utils.data.DataLoader(train_data, batch_size=4,
                                          shuffle=True, num_workers=0)

test_data = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=test_transforms)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=4,
                                         shuffle=False, num_workers=0)

We need to specify the number of classes here, because it determines the number of outputs of the model's final layer.

In [None]:
num_classes = 10

We can now set up the model. In PyTorch, we create a custom network by subclassing `nn.Module`. The layers that have learnable parameters are declared in the constructor (`__init__`). The `forward` method defines how an input batch flows through those layers, and PyTorch calls it whenever we call the model on a batch, during both training and evaluation. We'll need convolutional layers, a max-pooling layer (reused after each convolution), and fully connected (linear) layers.

The activation functions and the flattening of the tensor don't need their own layers. Activations can be applied with the stateless functions in `torch.nn.functional` (imported as `F`), and flattening is handled by the tensor's `view` method.
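The short check below uses a random tensor shaped like the output of the network's second pooling stage. It shows two things: `F.relu` computes the same thing as the `nn.ReLU` module, and `view(-1, 128 * 5 * 5)` infers the batch dimension from the total number of elements.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A fake batch shaped like the output of the second pooling layer:
# (batch, channels, height, width) = (4, 128, 5, 5)
x = torch.randn(4, 128, 5, 5)

# F.relu is a stateless function; nn.ReLU is a module wrapping the same operation
assert torch.equal(F.relu(x), nn.ReLU()(x))

# view keeps the element count fixed; -1 is inferred, here as the batch size
flat = x.view(-1, 128 * 5 * 5)
print(flat.shape)  # torch.Size([4, 3200])
```

`torch.flatten(x, 1)` is an equivalent alternative that flattens everything after the batch dimension without hard-coding `128 * 5 * 5`.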

In [None]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        # layers with learnable weights are declared here; stateless operations
        # such as ReLU are applied with F inside forward()
        self.conv1 = nn.Conv2d(3, 64, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(64, 128, 5)

        self.fc1 = nn.Linear(128 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, num_classes)

    # Define the forward pass; x is reassigned as it passes through each layer
    def forward(self, x):

        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        # flatten each sample's feature maps into one vector; as with -1 in
        # numpy.reshape(), -1 tells view() to infer that dimension (the batch size)
        x = x.view(-1, 128 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
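The `128 * 5 * 5` input size of `fc1` follows from how each layer shrinks the 32×32 CIFAR-10 image. Consider a convolution or pooling layer with kernel size k, stride s, and padding p, and no dilation. Its output size is `floor((in + 2p - k) / s) + 1`. `nn.Conv2d` defaults to stride 1 and padding 0. `nn.MaxPool2d(2, 2)` uses a 2×2 kernel with stride 2. The helper `out_size` is written here for illustration and is not part of PyTorch.

```python
def out_size(in_size, kernel, stride=1, padding=0):
    """Spatial output size of a conv/pool layer (no dilation)."""
    return (in_size + 2 * padding - kernel) // stride + 1

size = 32                                  # CIFAR-10 images are 32x32
size = out_size(size, kernel=5)            # conv1 (5x5 kernel): 32 -> 28
size = out_size(size, kernel=2, stride=2)  # pool: 28 -> 14
size = out_size(size, kernel=5)            # conv2 (5x5 kernel): 14 -> 10
size = out_size(size, kernel=2, stride=2)  # pool: 10 -> 5

flat_features = 128 * size * size          # conv2 has 128 output channels
print(size, flat_features)  # 5 3200
```

If you change the kernel sizes, add layers, or feed in images of a different size, redo this calculation. Otherwise the `view` call and `fc1` will no longer match the tensor shape.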

Now we just need to instantiate the classifier by creating an instance of the `Net` class.

In [None]:
# instantiate the class model

model = Net()
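As a sanity check on the new model, it helps to know how many learnable parameters `Net` has. The sketch below counts them by hand using the standard formulas, assuming the default `bias=True`. A `Conv2d` layer has `out_ch * in_ch * k * k` weights plus `out_ch` biases. A `Linear` layer has `out * in` weights plus `out` biases. With the model instantiated, `sum(p.numel() for p in model.parameters())` should give the same total. The helper functions are illustrative only.

```python
def conv2d_params(in_ch, out_ch, k):
    # weights (out_ch x in_ch x k x k) plus one bias per output channel
    return out_ch * in_ch * k * k + out_ch

def linear_params(in_f, out_f):
    # weight matrix (out_f x in_f) plus one bias per output feature
    return out_f * in_f + out_f

num_classes = 10
layers = {
    "conv1": conv2d_params(3, 64, 5),
    "conv2": conv2d_params(64, 128, 5),
    "fc1": linear_params(128 * 5 * 5, 120),
    "fc2": linear_params(120, 84),
    "fc3": linear_params(84, num_classes),
}
for name, count in layers.items():
    print(f"{name}: {count}")

total = sum(layers.values())
print("total:", total)  # total: 604926
```

Most of the parameters sit in `fc1`, because that layer connects all 3,200 flattened features to 120 units.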

We also need to define the loss function and the optimizer that we want to use.

In [None]:
# define the loss function and optimizer

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
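The sketch below shows roughly what these two objects compute, using NumPy. `nn.CrossEntropyLoss` applies a log-softmax to the raw outputs (logits) and takes the negative log-probability of the correct class. By default it averages this over the batch. `optim.SGD` with momentum keeps a running "velocity" buffer for each parameter. In the form given in PyTorch's documentation (no dampening or weight decay), the update is `buf = momentum * buf + grad` followed by `param = param - lr * buf`. The helpers `cross_entropy` and `sgd_momentum_step` are hypothetical re-implementations for illustration, not PyTorch's code.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean of -log softmax(logits)[label] over the batch."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def sgd_momentum_step(param, grad, buf, lr=0.001, momentum=0.9):
    """One SGD-with-momentum update: buf <- momentum*buf + grad; param <- param - lr*buf."""
    buf = momentum * buf + grad
    return param - lr * buf, buf

# Equal scores for all 10 classes give a loss of ln(10)
logits = np.zeros((4, 10))
labels = np.array([3, 1, 0, 9])
print(round(cross_entropy(logits, labels), 4))  # 2.3026

# Minimize f(w) = w**2 (gradient 2w) for three steps, starting from w = 1
w, buf = 1.0, 0.0
for _ in range(3):
    w, buf = sgd_momentum_step(w, 2 * w, buf, lr=0.1)
print(w)  # close to 0.062
```

Because an untrained 10-class classifier makes close to uniform predictions, a first printed loss of roughly 2.3 is a reasonable sign that the setup is wired correctly.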

We can now define the training loop. We train for a fixed number of epochs, and in each epoch we iterate over the mini-batches from `train_loader`. Each batch of inputs goes through the model, and we compare the outputs with the ground-truth labels using our loss criterion. We then backpropagate the loss to compute the gradients and call the optimizer to update the weights. The loop also prints the average loss every 2,000 mini-batches.

In [None]:
# train the network

for epoch in range(65):

    running_loss = 0.0

    # iterate over the mini-batches in train_loader; i is the batch index
    for i, data in enumerate(train_loader, 0):
        # get the inputs and their labels
        inputs, labels = data

        # zero the gradients before every mini-batch, because backward()
        # accumulates gradients rather than overwriting them
        optimizer.zero_grad()

        # forward pass: compute the model's outputs (logits) for the batch
        outputs = model(inputs)
        # compute the cross-entropy loss between the outputs and the labels
        loss = criterion(outputs, labels)
        # backward pass: compute the gradient of the loss for every parameter
        loss.backward()
        # update the parameters using the optimizer
        optimizer.step()

        # print the average loss over the last 2000 mini-batches
        running_loss += loss.item()
        if i % 2000 == 1999:
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0


print('Finished Training')
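Some quick arithmetic explains what the loop prints and how long it runs. With 50,000 training images and a batch size of 4, each epoch has 12,500 mini-batches. The condition `i % 2000 == 1999` therefore fires six times per epoch. The last 500 mini-batches of each epoch never appear in a printed average, because `running_loss` is reset at the start of the next epoch. The variable names below are chosen for illustration.

```python
import math

train_size = 50_000  # CIFAR-10 training images
batch_size = 4       # as set in train_loader
print_every = 2000   # the loop prints when i % 2000 == 1999

batches_per_epoch = math.ceil(train_size / batch_size)
# 0-based mini-batch indices at which a loss line is printed
print_points = [i for i in range(batches_per_epoch) if i % print_every == print_every - 1]

print(batches_per_epoch)       # 12500
print(print_points)            # [1999, 3999, 5999, 7999, 9999, 11999]
print(batches_per_epoch * 65)  # 812500 optimizer steps over 65 epochs
```

That many steps can take a long time on a CPU. A larger batch size reduces the number of steps per epoch, though it usually calls for retuning the learning rate.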

We now want to evaluate the trained classifier. We'll use the test DataLoader to get images and their correct labels. Displaying the images isn't required to measure performance, but it is a useful sanity check. To display them, we first have to undo the normalization.
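Displaying an image means reversing two steps. First, the normalization `y = (x - mean) / std` is inverted as `x = y * std + mean`. With mean and standard deviation both 0.5, this becomes `y / 2 + 0.5`. Second, the channel-first layout that PyTorch uses has to become the height×width×channel layout that matplotlib's `imshow` expects. The sketch below applies both steps to a random NumPy array standing in for one normalized image.

```python
import numpy as np

# A fake normalized CHW image with values in [-1, 1)
rng = np.random.default_rng(0)
normalized = rng.uniform(-1.0, 1.0, size=(3, 32, 32)).astype(np.float32)

# Undo Normalize(mean=0.5, std=0.5): x = y * 0.5 + 0.5 = y / 2 + 0.5
restored = normalized / 2 + 0.5

# matplotlib expects (height, width, channels), so move the channel axis last
hwc = np.transpose(restored, (1, 2, 0))
print(hwc.shape)  # (32, 32, 3)
```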

In [None]:
# use the test_loader to get one batch of images and labels from the test set

data_iter = iter(test_loader)
# use the built-in next(); DataLoader iterators no longer provide a .next() method
images, labels = next(data_iter)

def imshow(img):
    # because we normalized the data, we need to put it back to its original state
    img = img / 2 + 0.5
    # create a numpy array out of the image
    img = img.numpy()
    # transpose image to format - (height, width, color)
    plt.imshow(np.transpose(img, (1, 2, 0)))
    plt.show()

# show images
imshow(torchvision.utils.make_grid(images))

# define all the classes we have in the dataset

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

# print output
print('Actual: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

# run the batch through the network; no gradients are needed for inference
with torch.no_grad():
    outputs = model(images)

# torch.max over dim 1 returns (max scores, indices); the indices are the predicted classes
_, predicted = torch.max(outputs, 1)

print('Predicted: ', ' '.join('%5s' % classes[predicted[j]] for j in range(4)))

Now we want to see how the network performs on the whole test set. First, we wrap the evaluation in `torch.no_grad()`. This turns off gradient tracking, which we don't need at test time, and so saves memory and computation. Then, for every batch from the DataLoader, we get the images and labels, run the images through the model, and take the most likely class as the prediction, just like before. This time, though, we compare each prediction with its label and count it as correct if they match.
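The accuracy bookkeeping can be sketched in NumPy with made-up scores. For each batch, the index of the highest score in each row is the prediction, which is what `torch.max(outputs, 1)` returns as its second value. Matches are added to `correct` and the batch size to `total`. The `accuracy` function and the example batches are hypothetical, and three classes are used to keep the arrays small.

```python
import numpy as np

def accuracy(batches):
    """Accumulate correct/total over (logits, labels) batches, as the test loop does."""
    correct, total = 0, 0
    for logits, labels in batches:
        predicted = logits.argmax(axis=1)  # index of the highest score per row
        total += labels.shape[0]
        correct += int((predicted == labels).sum())
    return 100 * correct / total

# Two made-up batches of 4 examples with 3 classes each
batches = [
    (np.array([[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.1, 0.2, 0.9], [1.0, 0.5, 0.2]]),
     np.array([0, 1, 2, 2])),  # 3 of 4 correct
    (np.array([[0.3, 0.2, 0.1], [0.1, 0.1, 2.0], [0.5, 0.9, 0.2], [0.0, 0.0, 1.0]]),
     np.array([0, 2, 0, 2])),  # 3 of 4 correct
]
print(accuracy(batches))  # 75.0
```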

In [None]:
# check to see how the network as a whole performs

correct = 0
total = 0

# torch.no_grad() disables gradient tracking inside the block, so no
# computation graph is built for backpropagation
with torch.no_grad():
    for data in test_loader:
        images, labels = data
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        # count the examples in this batch
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Model accuracy on test data: %d %%' % (100 * correct / total))