
# Categorizing Images with Convolutional Networks

Below are the import statements and the class definition for a small convolutional network, which you have seen earlier, that classifies images of handwritten digits. Run this code to get started.

In [2]:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

class DigitNetwork(nn.Module):
  def __init__(self):
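    super().__init__()

    self.conv1 = nn.Conv2d(1, 32, 3, 1)   # 1 input channel -> 32 feature maps, 3x3 kernel, stride 1
    self.conv2 = nn.Conv2d(32, 64, 3, 1)  # 32 -> 64 feature maps, 3x3 kernel, stride 1
    self.dropout = nn.Dropout2d(.5)       # zeroes whole feature maps with probability 0.5 (training only)
    self.fc1 = nn.Linear(9216, 128)       # 9216 = 64 channels * 12 * 12 pixels after pooling
    self.fc2 = nn.Linear(128, 10)         # one output score (logit) per digit class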

  def forward(self, input):
    x = self.conv1(input)
    x = F.relu(x)
    x = self.conv2(x)
    x = F.relu(x)
    x = F.max_pool2d(x, 2)
    x = self.dropout(x)
    x = torch.flatten(x, 1)
    x = self.fc1(x)
    x = F.relu(x)
    return self.fc2(x)
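
The 9216 input features of `fc1` follow from the sizes of the preceding layers. As a minimal sketch, assuming 28×28 grayscale input images (the MNIST size) and no padding (the `nn.Conv2d` default), the arithmetic can be traced in plain Python. `conv_output_size` is a helper introduced here for illustration, not part of PyTorch.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Side length of a square feature map after a convolution or pooling layer."""
    return (size + 2 * padding - kernel_size) // stride + 1

side = 28                                   # assumed input: 28 x 28 pixels
side = conv_output_size(side, 3)            # conv1: 3x3 kernel, stride 1 -> 26
side = conv_output_size(side, 3)            # conv2: 3x3 kernel, stride 1 -> 24
side = conv_output_size(side, 2, stride=2)  # max_pool2d(x, 2): 2x2 window, stride 2 -> 12

channels = 64                               # output channels of conv2
flattened = channels * side * side          # length of each row after torch.flatten(x, 1)
print(side, flattened)  # 12 9216
```

If the kernel sizes or the input resolution change, the first argument of `fc1` has to change with them.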

Digit recognition was one of the first tasks to which convolutional networks were applied, in the 1980s. The motivation was to sort US mail automatically by ZIP code. To support work on this problem, 70,000 images of handwritten digits were curated into the Modified National Institute of Standards and Technology (MNIST) dataset (https://en.wikipedia.org/wiki/MNIST_database), split into 60,000 training images and 10,000 test images.

We will now load the MNIST data and train a `DigitNetwork` with it. MNIST is one of the most popular machine learning datasets, so PyTorch's companion package torchvision knows where to find it on the internet and will download it for us. The code below also prepares the training and testing datasets, along with corresponding `DataLoader` objects that provide batches of examples from these datasets.

In [3]:
from torchvision import datasets, transforms

directory = 'data' # where we will save the dataset on this server

transform = transforms.Compose([
  transforms.ToTensor(),
  transforms.Normalize((0.1307,), (0.3081,))
])

train_set = datasets.MNIST(directory, train=True, download=True, transform=transform)
test_set = datasets.MNIST(directory, train=False, download=True, transform=transform)

batch_size = 1000 # the size of a group of examples we will process at once
train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True)
validation_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size, shuffle=True)

# image_datasets = {'train': train_set, 'validation': test_set}
# dataloaders = {'train': train_loader, 'validation': validation_loader}

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data/MNIST/raw/train-images-idx3-ubyte.gz
100%|██████████| 9912422/9912422 [00:00<00:00, 343919257.86it/s]
Extracting data/MNIST/raw/train-images-idx3-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to data/MNIST/raw/train-labels-idx1-ubyte.gz
100%|██████████| 28881/28881 [00:00<00:00, 28131837.86it/s]
Extracting data/MNIST/raw/train-labels-idx1-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to data/MNIST/raw/t10k-images-idx3-ubyte.gz
100%|██████████| 1648877/1648877 [00:00<00:00, 171725260.02it/s]
Extracting data/MNIST/raw/t10k-images-idx3-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to data/MNIST/raw/t10k-labels-idx1-ubyte.gz
100%|██████████| 4542/4542 [00:00<00:00, 5223616.33it/s]
Extracting data/MNIST/raw/t10k-labels-idx1-ubyte.gz to data/MNIST/raw
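
In the transform above, `transforms.ToTensor()` scales pixel values to the range [0, 1], and `transforms.Normalize((0.1307,), (0.3081,))` then subtracts 0.1307 and divides by 0.3081. These two constants are commonly quoted as the mean and standard deviation of the MNIST training pixels; the notebook does not say so, so treat that reading as an assumption. A sketch of the per-pixel arithmetic in plain Python, with `normalize` as an illustrative helper rather than a torchvision function:

```python
MEAN, STD = 0.1307, 0.3081  # the constants passed to transforms.Normalize

def normalize(pixel):
    """Apply the same per-pixel formula as Normalize: (value - mean) / std."""
    return (pixel - MEAN) / STD

background = normalize(0.0)  # black background pixel
ink = normalize(1.0)         # fully white stroke pixel
print(round(background, 4), round(ink, 4))  # -0.4242 2.8215
```

After normalization the inputs are roughly centered on zero, which tends to make gradient-based training better behaved.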



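The next cell defines the training code. `process_batch` runs one batch through the model, computes the cross-entropy loss and, if an optimizer is supplied, updates the weights; it returns the loss and the fraction of examples classified correctly. `loop_through_batches` applies it to every batch from a loader and prints the mean loss and accuracy. `train_model` runs one training pass followed by one validation pass for each epoch.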

In [22]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

criterion = nn.CrossEntropyLoss()


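def process_batch(model, inputs, labels, optimizer=None):
  """Run one batch; update the weights only if an optimizer is given."""
  inputs = inputs.to(device)
  labels = labels.to(device)
  training = optimizer is not None

  # Track gradients only when training; validation does not need them.
  with torch.set_grad_enabled(training):
    outputs = model(inputs)
    loss = criterion(outputs, labels)

  if training:
    optimizer.zero_grad()  # clear gradients left over from the previous batch
    loss.backward()        # backpropagate the loss
    optimizer.step()       # update the weights

  # The predicted class is the index of the largest output in each row.
  predictions = outputs.argmax(dim=1)
  fraction_correct = (predictions == labels).sum().item() / len(predictions)

  return loss.item(), fraction_correct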


def loop_through_batches(model, loader, optimizer=None):
  """Process every batch from loader once and print the mean loss and accuracy."""
  losses = []
  accuracies = []
  for inputs, labels in loader:
    print('.', end='')  # one dot per batch as a progress indicator
    batch_loss, batch_acc = process_batch(model, inputs, labels, optimizer=optimizer)
    losses.append(batch_loss)
    accuracies.append(batch_acc)

  print('{} loss: {:.4f} accuracy: {:.4f}'.format(
      'Validation' if optimizer is None else 'Training',
      np.mean(losses),
      np.mean(accuracies)))


def train_model(model, optimizer, num_epochs):
  for epoch in range(num_epochs):
    print('Epoch {}/{}'.format(epoch+1, num_epochs))

    model.train() # put model in training mode
    loop_through_batches(model, train_loader, optimizer=optimizer)

    model.eval() # put model in evaluation mode
    loop_through_batches(model, validation_loader)

  return model
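
To make the loss and accuracy computations in `process_batch` concrete, here is a small sketch on hand-made numbers. The logits and labels are invented for illustration (three examples and three classes instead of ten); the calls themselves are standard PyTorch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Made-up network outputs (logits) for 3 examples and 3 classes.
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 1.5, 0.3],
                       [1.0, 0.9, 0.8]])
labels = torch.tensor([0, 1, 2])  # made-up true classes

# The predicted class is the index of the largest logit in each row.
predictions = logits.argmax(dim=1)
fraction_correct = (predictions == labels).sum().item() / len(labels)

# CrossEntropyLoss applies log-softmax to the logits and averages the
# negative log-probability assigned to the correct class.
loss = nn.CrossEntropyLoss()(logits, labels)
log_probs = F.log_softmax(logits, dim=1)
manual_loss = -log_probs[torch.arange(len(labels)), labels].mean()

print(predictions.tolist())                     # [0, 1, 0]
print(fraction_correct)                         # 2 of the 3 examples are correct
print(torch.isclose(loss, manual_loss).item())  # True
```

Because `nn.CrossEntropyLoss` applies the softmax internally, the network's `forward` method can return raw scores without a final softmax layer.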

In [5]:
print(device)

cuda:0


In [23]:
from torch import optim

model = DigitNetwork()
model.to(device);
optimizer = optim.SGD(model.parameters(), lr=0.1)

train_model(model, optimizer, 3);

Epoch 1/3
............................................................Training loss: 0.9431 accuracy: 0.7251
..........Validation loss: 0.2552 accuracy: 0.9251
Epoch 2/3
............................................................Training loss: 0.2399 accuracy: 0.9292
..........Validation loss: 0.1619 accuracy: 0.9490
Epoch 3/3
............................................................Training loss: 0.1614 accuracy: 0.9526
..........Validation loss: 0.1051 accuracy: 0.9683
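
Each dot in the log above is one batch. As a quick check, assuming the standard MNIST split of 60,000 training and 10,000 test images and the `batch_size` of 1000 set earlier, the number of batches per pass can be computed as follows. `batches_per_pass` is a helper named here for illustration.

```python
import math

def batches_per_pass(num_examples, batch_size):
    """Number of batches a DataLoader yields when the last partial batch is kept."""
    return math.ceil(num_examples / batch_size)

train_batches = batches_per_pass(60_000, 1000)
validation_batches = batches_per_pass(10_000, 1000)
print(train_batches, validation_batches)  # 60 10
```

These counts match the 60 dots before each training summary and the 10 dots before each validation summary.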
