In [0]:
%matplotlib inline


# Training an Image Classifier with PyTorch

This is it. You have seen how to define neural networks, compute the loss and update the weights of the network.

We will train and test on the widely used CIFAR10 dataset. Fortunately, PyTorch provides the ``torchvision`` package, which has loaders for common datasets such as this one.

## CIFAR10
The CIFAR-10 dataset contains 60k 32x32 color images (3 channels) in 10 classes: airplanes, automobiles, birds, cats, deer, dogs, frogs, horses, ships, and trucks. There are 6k images per class. The dataset is split into 50k images for training and 10k for testing.

## Training an image classifier

We will do the following steps in order:

1. Load and normalize the CIFAR10 training and test datasets using ``torchvision``
2. Define a Convolutional Neural Network
3. Define a loss function and optimizer
4. Train the network on the training data
5. Test the network on the test data




### 1. Loading and normalizing CIFAR10
Using ``torchvision``, it’s extremely easy to load CIFAR10.

In [0]:
import torch
import torchvision
import torchvision.transforms as transforms

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

The torchvision datasets return PIL images; ``ToTensor`` converts them to Tensors with values in the range [0, 1].

Torchvision also provides transform operations that allow us to normalize, modify and augment our datasets. Here we convert the images to Tensors and normalize them to the range [-1, 1].



In [0]:
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False, num_workers=2)

classes = trainset.classes
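
As a rough illustration of what the `Normalize` transform does per channel, the arithmetic can be sketched in plain Python. This is not torchvision code; the helper names `normalize` and `unnormalize` are made up for this example, and the only assumption is the mean and standard deviation of 0.5 used above.

```python
# Plain-Python sketch of the per-value arithmetic behind
# transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)).

def normalize(value, mean=0.5, std=0.5):
    """Map a value from [0, 1] to [-1, 1]: (value - mean) / std."""
    return (value - mean) / std


def unnormalize(value, mean=0.5, std=0.5):
    """Invert normalize(); with mean = std = 0.5 this equals value / 2 + 0.5."""
    return value * std + mean


pixels = [0.0, 0.25, 0.5, 1.0]
normalized = [normalize(p) for p in pixels]
restored = [unnormalize(n) for n in normalized]

print(normalized)  # [-1.0, -0.5, 0.0, 1.0]
print(restored)    # [0.0, 0.25, 0.5, 1.0]
```

The inverse mapping matters whenever the images need to be displayed, since plotting expects values back in [0, 1].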

Let's look at some of the training images, just for fun.



In [0]:
import matplotlib.pyplot as plt
import numpy as np

# functions to show an image
def imshow(img):
    img = img / 2 + 0.5 # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))

# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join(f'{classes[labels[j]]}' for j in range(4)))

### 2. Define a Convolutional Neural Network
Let's define a convolutional neural network.

#### What is a convolution?


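You can think of it as a sliding window: a small kernel moves across the input, and at each position the overlapping values are multiplied element-wise and summed.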
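
The sliding-window idea can be sketched in a few lines of plain Python. This is a minimal illustration on nested lists, assuming stride 1, no padding and a single channel; the function name `conv2d_valid` and the example values are invented here. Like `nn.Conv2d`, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide a square kernel over a 2D image with stride 1 and no padding."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            # Multiply the current window element-wise with the kernel and sum.
            total = sum(
                image[row + i][col + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            out_row.append(total)
        output.append(out_row)
    return output


image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [0, 1, 2, 3],
]
# A simple vertical-edge kernel: left column minus right column.
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

result = conv2d_valid(image, kernel)
print(result)  # [[-6, 12], [-6, 8]]
```

The same nine kernel weights are reused at every position, which is where the parameter savings of a convolution come from.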

In the context of neural networks, convolutions reduce the number of parameters in a layer while preserving spatial locality (think of how many weights a fully connected layer needs compared with a convolution that keeps the same resolution between input and output).
Depending on the parametrization, we can achieve several results:

3x3 kernel, no padding and no strides:

![conv_nono](https://github.com/vdumoulin/conv_arithmetic/raw/master/gif/no_padding_no_strides.gif)

3x3 kernel, same padding (half the size of the kernel, rounded down) and no strides:

![conv_halfno](https://github.com/vdumoulin/conv_arithmetic/raw/master/gif/same_padding_no_strides.gif)

3x3 kernel, full padding and no strides:

![conv_fullno](https://github.com/vdumoulin/conv_arithmetic/raw/master/gif/full_padding_no_strides.gif)

3x3 kernel, padding and a stride greater than 1:

![conv_halfyes](https://github.com/vdumoulin/conv_arithmetic/raw/master/gif/padding_strides.gif)
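
The effect of kernel size, padding and stride on the output size follows the usual formula `floor((n + 2p - k) / s) + 1` along each spatial dimension. The sketch below applies it to four cases; the input sizes (4 and 5) are illustrative assumptions, not values read from the animations, and `conv_output_size` is a helper name made up for this example.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Output length along one spatial dimension of a convolution."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# 3x3 kernel, no padding, stride 1: the output shrinks.
print(conv_output_size(4, 3))                       # 2
# 3x3 kernel, padding 1 ("same"), stride 1: the size is preserved.
print(conv_output_size(5, 3, padding=1))            # 5
# 3x3 kernel, padding 2 ("full"), stride 1: the output grows.
print(conv_output_size(5, 3, padding=2))            # 7
# 3x3 kernel, padding 1, stride 2: the output is downsampled.
print(conv_output_size(5, 3, padding=1, stride=2))  # 3
```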

#### MaxPooling

We want the network to learn the general concepts behind our images.

We don't have to maintain the resolution of the image throughout the network; we can downsample or upsample it as our requirements dictate.

The convolution operation can already do that (depending on its parameters, we control the output size).

Another way is through pooling operations, which reduce the size of the input by applying a local operation such as an average or a maximum.

A maxpool will do something like this:
![maxpool](http://cs231n.github.io/assets/cnn/maxpool.jpeg)
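
A 2x2 max pooling with stride 2 can be sketched in plain Python as follows. This is only an illustration on a nested list, assuming an even height and width; `max_pool_2x2` and the example values are not part of the tutorial's code.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 (even height and width assumed)."""
    pooled = []
    for row in range(0, len(feature_map), 2):
        pooled_row = []
        for col in range(0, len(feature_map[0]), 2):
            # Collect the four values of the current non-overlapping window.
            window = [
                feature_map[row][col], feature_map[row][col + 1],
                feature_map[row + 1][col], feature_map[row + 1][col + 1],
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


feature_map = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]

pooled = max_pool_2x2(feature_map)
print(pooled)  # [[6, 8], [3, 4]]
```

Each output value keeps only the strongest response in its window, so the height and width are halved while the number of channels stays the same.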


#### Our network

In [0]:
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

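    def forward(self, x):
        # (N, 3, 32, 32) -> conv1 -> (N, 6, 28, 28) -> pool -> (N, 6, 14, 14)
        x = self.pool(F.relu(self.conv1(x)))
        # (N, 6, 14, 14) -> conv2 -> (N, 16, 10, 10) -> pool -> (N, 16, 5, 5)
        x = self.pool(F.relu(self.conv2(x)))
        # flatten each sample to a vector of 16 * 5 * 5 = 400 features
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        # raw class scores (logits); no softmax is applied here
        x = self.fc3(x)
        return x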


net = Net().to(device)
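
To see where the `16 * 5 * 5` in `fc1` comes from, and how few weights the convolutional layers need, the shapes and parameter counts can be traced by hand. The sketch below is plain Python arithmetic, not PyTorch; the helper names are invented for this example, and it assumes the 32x32 CIFAR10 input and the layer sizes defined in `Net` above.

```python
def conv_out(size, kernel):
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel + 1


def pool_out(size, window=2):
    """Spatial size after non-overlapping pooling."""
    return size // window


def conv_params(in_channels, out_channels, kernel):
    """Weights plus one bias per output channel."""
    return out_channels * in_channels * kernel * kernel + out_channels


def linear_params(in_features, out_features):
    """Weights plus one bias per output feature."""
    return in_features * out_features + out_features


size = 32                           # CIFAR10 images are 32x32
size = pool_out(conv_out(size, 5))  # conv1 + pool: 32 -> 28 -> 14
size = pool_out(conv_out(size, 5))  # conv2 + pool: 14 -> 10 -> 5
flattened = 16 * size * size        # 16 channels of 5x5 values

total = (
    conv_params(3, 6, 5)
    + conv_params(6, 16, 5)
    + linear_params(flattened, 120)
    + linear_params(120, 84)
    + linear_params(84, 10)
)
# A fully connected layer mapping the same input to conv1's output size.
dense_equivalent = linear_params(3 * 32 * 32, 6 * 28 * 28)

print(size, flattened, total)                  # 5 400 62006
print(conv_params(3, 6, 5), dense_equivalent)  # 456 14455392
```

The last line shows the gap the convolution section hints at: `conv1` needs 456 parameters, while a dense layer producing an output of the same size would need over 14 million.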

### 3. Define a Loss function and optimizer
Let's use a cross-entropy classification loss and SGD with momentum.

In [0]:
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
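
For intuition about what the loss measures, here is a plain-Python sketch of cross-entropy for a single sample, computed from raw scores as `-log(softmax(logits)[target])`. It is an illustration only: the function name `cross_entropy` and the example scores are made up, and `nn.CrossEntropyLoss` additionally averages over the batch by default.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum_exp - logits[target]


# Ten equal scores: the network has no preference, so the loss is ln(10).
uniform = cross_entropy([0.0] * 10, target=3)
# A high score on the correct class gives a loss close to zero.
confident = cross_entropy([0.0, 5.0, 0.0], target=1)

print(round(uniform, 4))    # 2.3026
print(confident < uniform)  # True
```

A loss near 2.3 is therefore what a 10-class network that guesses uniformly would produce, which is a handy reference point when reading the training log.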

### 4. Train the network
This is when things start to get interesting.
To train the network, we simply loop over our data iterator, feed the inputs to the network and optimize.

We can loop over the whole dataset multiple times; each full pass is called an epoch. Each set of images that we pass to the network at the same time is a batch.
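
To make the epoch and batch terminology concrete, the arithmetic for this setup can be sketched in plain Python. The numbers assume the 50k-image training split, the `batch_size=4` loaders defined earlier, two epochs and a loss printout every 2000 mini-batches; the variable names are illustrative.

```python
import math

train_images = 50_000  # size of the CIFAR10 training split
batch_size = 4         # value passed to the DataLoader
num_epochs = 2         # full passes over the training data
log_every = 2000       # mini-batches between loss printouts

batches_per_epoch = math.ceil(train_images / batch_size)
total_updates = batches_per_epoch * num_epochs
logs_per_epoch = batches_per_epoch // log_every

print(batches_per_epoch, total_updates, logs_per_epoch)  # 12500 25000 6
```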

In [0]:
NUM_EPOCHS = 2
for epoch in range(NUM_EPOCHS):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0

print('Finished Training')
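
Each `optimizer.step()` call in the loop above applies one SGD-with-momentum update to every parameter. A minimal sketch of that update rule for a single scalar parameter is shown below, assuming no weight decay, dampening or Nesterov momentum (the defaults of `optim.SGD`). The function name `sgd_momentum_step` and the toy objective are invented for illustration.

```python
def sgd_momentum_step(param, grad, velocity, lr=0.001, momentum=0.9):
    """One update: velocity = momentum * velocity + grad; param -= lr * velocity."""
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity


# Toy problem (not from the tutorial): minimize f(w) = w**2, gradient 2 * w.
w, v = 1.0, 0.0
history = []
for _ in range(3):
    w, v = sgd_momentum_step(w, 2 * w, v, lr=0.1)
    history.append(round(w, 4))

print(history)  # [0.8, 0.46, 0.062]
```

The velocity accumulates past gradients, so steps in a consistent direction grow larger than plain SGD would take with the same learning rate.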

### 5. Test the network on the test data

We have trained the network for 2 passes over the training dataset.
But we need to check if the network has learnt anything at all.

We will check this by comparing the class label that the neural network
predicts against the ground truth. If the prediction is
correct, we count the sample as a correct prediction.

Let's first look at some images from the test set to get familiar.

In [0]:
dataiter = iter(testloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join(f'{classes[labels[j]]}' for j in range(4)))

Okay, now let us see what the neural network thinks these examples above are:



In [0]:
outputs = net(images.to(device))

The outputs are energies (unnormalized scores, not probabilities) for the 10 classes.
The higher the energy for a class, the more the network _thinks_ that the image is of that class.
So, let's get the index of the highest energy:



In [0]:
_, predicted = torch.max(outputs, 1)

print('Predicted: ', ' '.join(f'{classes[predicted[j]]}' for j in range(4)))
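
If actual probabilities are wanted, the raw scores can be passed through a softmax (in PyTorch, `torch.softmax(outputs, dim=1)`). The plain-Python sketch below, with made-up scores for four hypothetical classes, shows that softmax does not change which class has the highest value, so taking the index of the maximum of the raw outputs is enough for prediction.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


scores = [1.2, -0.3, 3.1, 0.4]  # made-up scores, one per class
probs = softmax(scores)

print(argmax(scores), argmax(probs))  # 2 2
print(round(sum(probs), 6))           # 1.0
```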

The results seem pretty good.

How does the network perform on the whole test set?



In [0]:
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        images, labels = images.to(device), labels.to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the network on the 10000 test images: {100 * correct / total} %')

That looks waaay better than chance, which is 10% accuracy (randomly picking a class out of 10 classes).
Seems like the network learnt something.

Finally, we can also check which classes performed well and which did not:



In [38]:
class_correct = [0] * 10
class_total = [0] * 10
with torch.no_grad():
    for data in testloader:
        images, labels = data
        images, labels = images.to(device), labels.to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = predicted == labels
        # iterate over the actual batch, which may be smaller than batch_size
        for label, is_correct in zip(labels.tolist(), c.tolist()):
            class_correct[label] += is_correct
            class_total[label] += 1


for i in range(10):
    print(f'Accuracy of {classes[i] : >10} : {100 * class_correct[i] / class_total[i] : 2}%')

Accuracy of   airplane :  55.4%
Accuracy of automobile :  80.8%
Accuracy of       bird :  48.6%
Accuracy of        cat :  46.5%
Accuracy of       deer :  54.4%
Accuracy of        dog :  46.8%
Accuracy of       frog :  72.2%
Accuracy of      horse :  70.6%
Accuracy of       ship :  84.2%
Accuracy of      truck :  72.2%
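
The per-class bookkeeping is independent of PyTorch and can be sketched on plain lists. The example below uses a handful of invented labels and predictions, not results from the network; `per_class_accuracy` is a hypothetical helper name.

```python
from collections import Counter


def per_class_accuracy(labels, predictions):
    """Percentage of correctly predicted samples for each true class."""
    total = Counter(labels)
    correct = Counter(
        label for label, pred in zip(labels, predictions) if label == pred
    )
    return {cls: 100 * correct[cls] / total[cls] for cls in sorted(total)}


labels = [0, 0, 1, 1, 1, 2]       # made-up ground-truth class indices
predictions = [0, 1, 1, 1, 0, 2]  # made-up predicted class indices

accuracy = per_class_accuracy(labels, predictions)
for cls, value in accuracy.items():
    print(f'class {cls}: {value:.1f}%')
```

Dividing by the number of samples of each true class, rather than by the total, is what makes weak classes visible even when the overall accuracy looks acceptable.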


## Where to now?

Try to modify the network and see how your changes affect the results.