<a href="https://colab.research.google.com/github/bptripp/ai-course/blob/main/digit_recognition.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Categorizing Images with Convolutional Networks

It is usually much faster to train a deep network on a graphics processing unit (GPU) than on a computer's standard central processing unit (CPU). Before proceeding, select "Change runtime type" from the "Runtime" menu, and make sure a GPU option is selected among the "Hardware accelerator" choices. The default values are fine for the other settings.

Once you have done that, the code below should print "cuda:0". CUDA stands for Compute Unified Device Architecture. It is a low-level interface that PyTorch uses to run code on a GPU. If the code prints "cpu" instead, check your hardware accelerator setting and try again.

*Run the code below to create a code reference to the GPU "device".*

In [1]:
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

cuda:0


The code below imports some packages and defines a small convolutional network class similar to the one you saw earlier. It is meant to classify images of handwritten digits from 0-9.

*Run this code to get started.*

In [2]:
import numpy as np
import torch.nn as nn
import torch.nn.functional as F

class ConvolutionalNetwork(nn.Module):
  def __init__(self):
    super(ConvolutionalNetwork, self).__init__()

    self.conv1 = nn.Conv2d(1, 32, 3, 1)
    self.conv2 = nn.Conv2d(32, 64, 3, 1)
    self.dropout = nn.Dropout2d(.5)
    self.fc1 = nn.Linear(9216, 128)
    self.fc2 = nn.Linear(128, 10)

  def forward(self, input):
    x = self.conv1(input)
    x = F.relu(x)
    x = self.conv2(x)
    x = F.relu(x)
    x = F.max_pool2d(x, 2)
    x = self.dropout(x)
    x = torch.flatten(x, 1)
    x = self.fc1(x)
    x = F.relu(x)
    return self.fc2(x)
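
The value 9216 passed to `nn.Linear` for `fc1` follows from the shapes of the earlier layers. Here is a rough sketch of that arithmetic, assuming 28x28 single-channel MNIST inputs and no padding in the convolution and pooling layers. The helper name `output_size` is made up for this illustration and is not part of PyTorch.

```python
def output_size(size, kernel, stride):
    """Spatial size after an unpadded convolution or pooling layer."""
    return (size - kernel) // stride + 1

size = 28                                     # MNIST images are 28x28 pixels
size = output_size(size, kernel=3, stride=1)  # after conv1: 26
size = output_size(size, kernel=3, stride=1)  # after conv2: 24
size = output_size(size, kernel=2, stride=2)  # after max_pool2d(x, 2): 12

channels = 64                                 # conv2 produces 64 feature maps
flattened = channels * size * size            # length of each flattened example
print(flattened)  # 9216
```

If you change a kernel size or add a layer, the input size of `fc1` has to be recomputed in the same way.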

Digit recognition was one of the earliest applications of convolutional networks, in the 1980s. The motivation was to sort US mail automatically by ZIP code. To support work on this problem, a set of 70,000 digit images was curated as the Modified National Institute of Standards and Technology (MNIST) dataset (https://en.wikipedia.org/wiki/MNIST_database).

You should now obtain the MNIST data so that you can use it to train the network. MNIST is one of the most popular machine learning datasets, so the torchvision package that accompanies PyTorch knows where it is and will download it for you. The code below will also prepare training and testing subsets of the data and corresponding "DataLoader" objects that organize the examples in these datasets into "batches". The network will process one batch of examples at a time.

*Run the code below to download and organize the data.*

In [3]:
from torchvision import datasets, transforms

directory = 'data' # the folder where the dataset will be saved on this server

# this object will put images in the right form for input to the network
transform = transforms.Compose([
  transforms.ToTensor(),
  transforms.Normalize((0.1307,), (0.3081,))
])

# these objects contain training and testing subsets of the data
train_set = datasets.MNIST(directory, train=True, download=True, transform=transform)
test_set = datasets.MNIST(directory, train=False, download=True, transform=transform)

batch_size = 1000 # the size of groups of examples that are used for gradient descent steps

# these "loader" objects will provide batches of examples from the datasets
train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True)
validation_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size, shuffle=True)
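
To make the data preparation more concrete, the sketch below reproduces two pieces of arithmetic from the cell above in plain Python. It assumes the standard MNIST split of 60,000 training and 10,000 test images, and that `ToTensor` has already scaled pixel values to the range 0 to 1. The function name `normalize` is only for illustration.

```python
import math

def normalize(pixel, mean=0.1307, std=0.3081):
    """What transforms.Normalize does to each pixel value in [0, 1]."""
    return (pixel - mean) / std

print(round(normalize(0.0), 4))  # black background pixel -> -0.4242
print(round(normalize(1.0), 4))  # brightest pixel -> 2.8215

batch_size = 1000
train_batches = math.ceil(60000 / batch_size)  # batches per training epoch
test_batches = math.ceil(10000 / batch_size)   # batches per validation pass
print(train_batches, test_batches)  # 60 10
```

Each batch corresponds to one gradient descent step during training, so one epoch with this batch size performs 60 parameter updates.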

You have defined the ConvolutionalNetwork class, but you still have to create an object of that class. A technical detail in using the network object is that the GPU has its own on-board memory, and the network must be moved to that memory in order to run efficiently.

You will also need an "optimizer" object to update the network's parameters based on the gradient of the loss for each batch. The code below creates an optimizer object of the SGD class (for "stochastic gradient descent") that comes with PyTorch's "optim" package. It also sets the optimizer's "learning rate" parameter, which determines the size of gradient descent steps. If the learning rate is too large, the network may skip right over the best parameters and become worse with training.

*Run the code below to create network and optimizer objects and move the network to the GPU.*

In [4]:
model = ConvolutionalNetwork()
model.to(device) # move model to GPU

optimizer = torch.optim.SGD(model.parameters(), lr=0.1) # lr is the learning rate
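
The effect of the learning rate can be seen on a toy problem. The sketch below is not the network's loss. It is plain gradient descent on the one-parameter function loss(w) = w², chosen only because its gradient, 2w, is easy to follow by hand. The function name `minimize` is hypothetical.

```python
def minimize(lr, steps=20, w=1.0):
    """Gradient descent on loss(w) = w**2, whose gradient is 2*w."""
    for _ in range(steps):
        grad = 2 * w
        w = w - lr * grad  # the same update rule that SGD applies to each parameter
    return w

small = minimize(lr=0.1)  # each step multiplies w by 0.8, so w shrinks toward 0
large = minimize(lr=1.1)  # each step multiplies w by -1.2, so w overshoots and grows
print(abs(small) < 1.0, abs(large) > 1.0)  # True True
```

With the smaller rate the parameter approaches the minimum at w = 0. With the larger rate every step jumps past the minimum and lands farther away than before.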

You will now create a function that loops through all the batches of image/label examples in a given dataset, runs them through the network, and keeps track of the average loss and accuracy. This function can optionally receive your optimizer object as a parameter, in which case it will also perform backpropagation and gradient descent to update the parameters.   

*Run the code below to create a function that passes a dataset through the network.*  

In [5]:
def process_dataset(model, loader, optimizer=None):
  """
  model: the deep network
  loader: the dataloader of the dataset that is to be processed
  optimizer: an object that updates the network's parameters
  """

  # create lists of losses and accuracies for each batch in the dataset
  losses = []
  accuracies = []

  for inputs, labels in loader: # this is a loop through all the batches
    print('.', end='') # prints a dot for each batch (this serves as a progress bar)

    inputs = inputs.to(device) # move inputs to the GPU
    labels = labels.to(device) # move labels to the GPU
    outputs = model(inputs) # run the inputs through the network
    loss = nn.CrossEntropyLoss()(outputs, labels) # calculate the loss

    if optimizer is not None:
      optimizer.zero_grad() # delete any gradients from previous steps
      loss.backward() # perform backpropagation to calculate new gradients
      optimizer.step() # perform gradient descent to improve parameters

    # calculate the fraction of predictions in this batch that were correct
    _, predictions = torch.max(outputs, 1) # predicted digits are output neurons with highest values
    fraction_correct = torch.sum(predictions == labels).item() / len(predictions)

    # add this batch's loss and fraction-correct to the lists
    losses.append(loss.item())
    accuracies.append(fraction_correct)

  # print the average loss and accuracy over the whole dataset
  print('{} loss: {:.4f} accuracy: {:.4f}'.format(
      'Validation' if optimizer is None else 'Training',
      np.mean(losses),
      np.mean(accuracies)))
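
To see what the loss and prediction steps compute, here is a minimal plain-Python sketch for a single example. It assumes the usual definition of cross-entropy on raw network outputs (the negative log of the softmax probability of the correct class). The function names `cross_entropy` and `predict` are illustrative and are not PyTorch functions.

```python
import math

def cross_entropy(logits, label):
    """Negative log of the softmax probability assigned to the true label."""
    largest = max(logits)
    shifted = [z - largest for z in logits]  # subtract the max for numerical stability
    log_sum = math.log(sum(math.exp(z) for z in shifted))
    return log_sum - shifted[label]

def predict(logits):
    """Index of the largest output, like the second result of torch.max(outputs, 1)."""
    return max(range(len(logits)), key=lambda i: logits[i])

uniform = [0.0] * 10                        # a network with no preference among digits
print(round(cross_entropy(uniform, 3), 4))  # 2.3026, which is ln(10)

confident = [0.0] * 10
confident[7] = 5.0                          # strongly favours the digit 7
print(predict(confident))                   # 7
print(cross_entropy(confident, 7) < cross_entropy(confident, 2))  # True
```

A loss near ln(10) ≈ 2.30 therefore means the network is doing no better than guessing among ten digits, which is a useful reference point when reading the training output.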

Now everything is ready to train the network. You are using a GPU, you have network and optimizer objects, and you have downloaded the data and created data loaders to access it one batch at a time. You also have a function that runs all the batches in a dataset through the network, calculates the average loss and accuracy, and updates the parameters as needed.

The code below runs several "epochs" (passes through the dataset) of training. After each epoch, the code runs the validation data through the model to find the validation loss. Recall that although optimization can only directly reduce the training loss, the validation loss is a better indication of how the network would perform with new examples.
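
The training loop in this section calls `model.train()` before the training pass and `model.eval()` before the validation pass. These calls matter here because the network contains a dropout layer, which behaves differently in the two modes. The sketch below shows the difference on a standalone `nn.Dropout2d` layer. The small all-ones input is an arbitrary choice for illustration.

```python
import torch
import torch.nn as nn

dropout = nn.Dropout2d(0.5)
x = torch.ones(1, 8, 2, 2)  # one example with 8 channels of 2x2 values

dropout.train()
train_out = dropout(x)  # random channels are zeroed; the rest are scaled by 1 / (1 - 0.5) = 2

dropout.eval()
eval_out = dropout(x)   # dropout is disabled, so the input passes through unchanged

print(torch.equal(eval_out, x))  # True
print(train_out.unique())        # values are drawn from {0., 2.}; which channels drop is random
```

Because dropout is active only during training, the training accuracy reported for an epoch can come out lower than the validation accuracy measured right after it. Another likely contributor is that the training figures are averaged over batches seen while the parameters were still improving.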

*Finally, run the code below to train the network for ten epochs. Note how loss and accuracy on the validation dataset change with each epoch.*

In [6]:
num_epochs = 10
for epoch in range(num_epochs):
    print('Epoch {}/{}'.format(epoch+1, num_epochs))

    model.train() # put model in training mode
    process_dataset(model, train_loader, optimizer=optimizer)

    model.eval() # put model in evaluation mode
    process_dataset(model, validation_loader)


Epoch 1/10
............................................................Training loss: 1.0770 accuracy: 0.7015
..........Validation loss: 0.2880 accuracy: 0.9150
Epoch 2/10
............................................................Training loss: 0.2719 accuracy: 0.9198
..........Validation loss: 0.1799 accuracy: 0.9475
Epoch 3/10
............................................................Training loss: 0.1802 accuracy: 0.9469
..........Validation loss: 0.1303 accuracy: 0.9599
Epoch 4/10
............................................................Training loss: 0.1381 accuracy: 0.9597
..........Validation loss: 0.0973 accuracy: 0.9709
Epoch 5/10
............................................................Training loss: 0.1161 accuracy: 0.9662
..........Validation loss: 0.0902 accuracy: 0.9720
Epoch 6/10
............................................................Training loss: 0.0985 accuracy: 0.9716
..........Validation loss: 0.0671 accuracy: 0.9792
Epoch 7/10
.......................