<a href="https://colab.research.google.com/github/bptripp/ai-course/blob/main/digit_recognition.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Categorizing Images with Convolutional Networks

The code below imports some packages and defines a small convolutional network class similar to the one you saw earlier. It is meant to classify images of handwritten digits from 0-9.

*Run this code to get started.*

In [3]:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvolutionalNetwork(nn.Module):
  def __init__(self):
    super(ConvolutionalNetwork, self).__init__()

    self.conv1 = nn.Conv2d(1, 32, 3, 1)
    self.conv2 = nn.Conv2d(32, 64, 3, 1)
    self.dropout = nn.Dropout2d(.5)
    self.fc1 = nn.Linear(9216, 128)
    self.fc2 = nn.Linear(128, 10)

  def forward(self, input):
    x = self.conv1(input)
    x = F.relu(x)
    x = self.conv2(x)
    x = F.relu(x)
    x = F.max_pool2d(x, 2)
    x = self.dropout(x)
    x = torch.flatten(x, 1)
    x = self.fc1(x)
    x = F.relu(x)
    return self.fc2(x)
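
The `9216` passed to `fc1` is not arbitrary: it is the number of values left after the two convolutions and the pooling step, assuming 28x28 MNIST inputs. The sketch below traces that arithmetic in plain Python. The helper `conv_output_size` is a hypothetical name introduced here for illustration, not part of PyTorch.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Side length after a square convolution or pooling window (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

side = 28                                          # MNIST images are 28x28 pixels
side = conv_output_size(side, kernel=3)            # conv1: 28 -> 26
side = conv_output_size(side, kernel=3)            # conv2: 26 -> 24
side = conv_output_size(side, kernel=2, stride=2)  # max_pool2d(x, 2): 24 -> 12

channels = 64                                      # output channels of conv2
flattened = channels * side * side
print(flattened)  # 9216
```

If the input size or any kernel size changes, the first argument of `fc1` has to change with it.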

Digit recognition was one of the first applications of convolutional networks, in the late 1980s. The motivation was to sort US mail automatically by ZIP code. To support work on this problem, a set of 70,000 digit images was curated in the Modified National Institute of Standards and Technology (MNIST) dataset (https://en.wikipedia.org/wiki/MNIST_database).

You will now load the MNIST data and train a ConvolutionalNetwork with it. MNIST is one of the most popular machine learning datasets, so the torchvision package that accompanies PyTorch knows where to find it and will download it for you. The code below also prepares training and testing datasets, along with corresponding "DataLoader" objects that provide batches of examples from these datasets.

*Run the code below to download and organize the data.*

In [4]:
from torchvision import datasets, transforms

directory = 'data' # the folder where the dataset will be saved on this server

# this object will convert images into the right form for input to the network
transform = transforms.Compose([
  transforms.ToTensor(),
  transforms.Normalize((0.1307,), (0.3081,))
])

# these objects contain training and testing subsets of the data
train_set = datasets.MNIST(directory, train=True, download=True, transform=transform)
test_set = datasets.MNIST(directory, train=False, download=True, transform=transform)

batch_size = 1000 # the size of groups of examples that are used for gradient descent steps

# these objects will provide batches of examples from the datasets
train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True)
validation_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size, shuffle=True)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data/MNIST/raw/train-images-idx3-ubyte.gz
100%|██████████| 9912422/9912422 [00:00<00:00, 78382328.12it/s]
Extracting data/MNIST/raw/train-images-idx3-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to data/MNIST/raw/train-labels-idx1-ubyte.gz
100%|██████████| 28881/28881 [00:00<00:00, 112894402.45it/s]
Extracting data/MNIST/raw/train-labels-idx1-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to data/MNIST/raw/t10k-images-idx3-ubyte.gz
100%|██████████| 1648877/1648877 [00:00<00:00, 79007155.96it/s]
Extracting data/MNIST/raw/t10k-images-idx3-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to data/MNIST/raw/t10k-labels-idx1-ubyte.gz
100%|██████████| 4542/4542 [00:00<00:00, 20729628.69it/s]
Extracting data/MNIST/raw/t10k-labels-idx1-ubyte.gz to data/MNIST/raw
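
Two details of the data cell can be checked by hand. `ToTensor` scales pixel values to the range 0 to 1, and `Normalize((0.1307,), (0.3081,))` then subtracts 0.1307 and divides by 0.3081 (values commonly cited as the mean and standard deviation of the MNIST training pixels). With 60,000 training and 10,000 test images, a batch size of 1000 gives 60 and 10 batches per pass. The following is a minimal sketch in plain Python; `normalize` is an illustrative helper, not a torchvision function.

```python
import math

MEAN, STD = 0.1307, 0.3081  # the constants passed to transforms.Normalize

def normalize(pixel):
    """Apply the same per-pixel formula as Normalize: (x - mean) / std."""
    return (pixel - MEAN) / STD

print(round(normalize(0.0), 4))  # a black background pixel becomes negative
print(round(normalize(1.0), 4))  # the brightest ink pixel becomes positive

batch_size = 1000
train_batches = math.ceil(60000 / batch_size)  # 60,000 training images
test_batches = math.ceil(10000 / batch_size)   # 10,000 test images
print(train_batches, test_batches)  # 60 10
```

These batch counts match the rows of dots printed during training, one dot per batch.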



You will now create a function that loops through all the batches of image/label examples in a dataset, runs them through the network, and keeps track of the average loss and accuracy in the process. If the function is given an optimizer object, it also performs backpropagation and a variation of gradient descent on each batch to update the parameters. The function relies on a global `device` variable, which is defined in a separate cell further down; that cell must be run before the function is called.
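
The parameter update behind plain gradient descent can be illustrated on a one-parameter problem. The sketch below is a toy under stated assumptions, not the network's actual training: it minimizes the made-up loss (w - 3)^2 with a hand-written gradient and a learning rate of 0.1.

```python
def loss_gradient(w):
    """Derivative of the toy loss (w - 3)**2 with respect to w."""
    return 2.0 * (w - 3.0)

w = 0.0    # initial parameter value
lr = 0.1   # learning rate (step size)

for step in range(50):
    grad = loss_gradient(w)  # in PyTorch, loss.backward() computes gradients
    w = w - lr * grad        # in PyTorch, optimizer.step() applies the update

print(round(w, 3))  # 3.0
```

In the real training loop the gradient comes from one batch at a time rather than from the whole dataset, which is what makes the descent "stochastic".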

*Run the code below to create a function that passes a dataset through the network.*  

In [9]:
def process_dataset(model, loader, optimizer=None):
  """
  model: the deep network
  loader: the dataloader for the dataset to be processed
  optimizer: object that updates parameters
  """

  # create lists of losses and accuracies for each batch in the dataset
  losses = []
  accuracies = []

  for inputs, labels in loader:
    print('.', end='') # prints a dot for every batch as a simple progress bar

    inputs = inputs.to(device)
    labels = labels.to(device)
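    # only track gradients when an optimizer is given (i.e. during training)
    with torch.set_grad_enabled(optimizer is not None):
      outputs = model(inputs) # run the inputs through the network
      loss = F.cross_entropy(outputs, labels) # calculate loss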

    if optimizer is not None:
      optimizer.zero_grad()
      loss.backward() # backpropagation
      optimizer.step() # gradient descent

    # calculate the fraction of predictions in this batch that were correct
    predictions = torch.argmax(outputs, dim=1) # index of the largest output is the predicted digit
    fraction_correct = (predictions == labels).float().mean().item()

    # add this batch's loss and fraction-correct to the lists
    losses.append(loss.item())
    accuracies.append(fraction_correct)

  # print the average loss and accuracy over the whole dataset
  print('{} loss: {:.4f} accuracy: {:.4f}'.format(
      'Validation' if optimizer is None else 'Training',
      np.mean(losses),
      np.mean(accuracies)))
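
To make the loss and accuracy calculations concrete, here is a small NumPy sketch of what cross-entropy (averaged over the batch) and the fraction-correct computation do to a batch of raw network outputs (logits). The logits and labels are made-up values for three examples and three classes, and the function names are illustrative.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean negative log-probability of the correct class (softmax over each row)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def fraction_correct(logits, labels):
    """Fraction of rows whose largest logit is at the labelled class."""
    return (logits.argmax(axis=1) == labels).mean()

# made-up logits for 3 examples and 3 classes
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.1, 3.0],
                   [1.5, 1.0, 0.3]])
labels = np.array([0, 2, 1])

print(cross_entropy(logits, labels))
print(fraction_correct(logits, labels))  # 2 of the 3 predictions are correct
```

A network that outputs equal logits for every class gets a loss of log(number of classes), which is about 2.30 for ten digits; losses well below that indicate the network has learned something.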

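Next, choose the device that the network will run on: a GPU if one is available, and otherwise the CPU.

*Run the code below to select a device.*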

In [7]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

cuda:0


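You will now create a network and an optimizer, then train for three epochs. In each epoch, the network is trained on the whole training set and then evaluated on the held-out test set.

*Run the code below to train the network.*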

In [11]:
from torch import optim

model = ConvolutionalNetwork()
model.to(device);
optimizer = optim.SGD(model.parameters(), lr=0.1)

# train for a few epochs, checking validation performance after each one
num_epochs = 3
for epoch in range(num_epochs):
    print('Epoch {}/{}'.format(epoch+1, num_epochs))

    model.train() # put model in training mode
    process_dataset(model, train_loader, optimizer=optimizer)

    model.eval() # put model in evaluation mode
    process_dataset(model, validation_loader)


Epoch 1/3
............................................................Training loss: 1.0323 accuracy: 0.6948
..........Validation loss: 0.3377 accuracy: 0.9077
Epoch 2/3
............................................................Training loss: 0.2763 accuracy: 0.9181
..........Validation loss: 0.2005 accuracy: 0.9405
Epoch 3/3
............................................................Training loss: 0.1870 accuracy: 0.9455
..........Validation loss: 0.1346 accuracy: 0.9604
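
In the output above, training accuracy is lower than validation accuracy in every epoch. Two things likely contribute: the training numbers are averaged over an epoch during which the network is still improving, and dropout is active only in training mode. The NumPy sketch below shows the second effect with an element-wise "inverted" dropout. It is a simplification: `nn.Dropout2d` drops whole channels rather than single values, and the function name and `training` flag here are illustrative.

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Element-wise inverted dropout: zero values with probability p while training."""
    if not training:
        return x  # evaluation mode: activations pass through unchanged
    if rng is None:
        rng = np.random.default_rng()
    keep = rng.random(x.shape) >= p   # True where the value is kept
    return x * keep / (1.0 - p)       # rescale so the expected value is unchanged

x = np.ones((4, 4))
rng = np.random.default_rng(0)

train_out = dropout(x, p=0.5, training=True, rng=rng)
eval_out = dropout(x, p=0.5, training=False)

print(train_out)  # a mix of 0.0 and 2.0
print(eval_out)   # all 1.0, identical to the input
```

This is why the training loop calls `model.train()` before the training pass and `model.eval()` before the validation pass.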
