1. CIFAR-10 Dataset
The CIFAR-10 dataset contains 60,000 color images in 10 categories, with 6,000 images per category, split into 50,000 training images and 10,000 test images. Each image is 32×32 pixels with 3 color channels. The image below shows the 10 categories, with 10 randomly selected images from each category.

In [36]:
from torchvision.datasets import CIFAR10
from torchvision.transforms import Compose
from torchvision.transforms import ToTensor
from torch.utils.data import DataLoader
import torch.nn as nn
import torchvision
import torch
import torch.nn.functional as F
import time
import numpy as np
import torch.optim as optim

In [32]:
train = CIFAR10(root='data', train=True, download=True,transform=Compose([ToTensor()]))
valid = CIFAR10(root='data', train=False, download=True,transform=Compose([ToTensor()]))

Files already downloaded and verified
Files already downloaded and verified


In [33]:
# Number of samples
print('Number of training samples:', len(train.targets))
print('Number of testing samples:', len(valid.targets))

# Dataset shape
print("Dataset shape:", train[0][0].shape)

# Dataset classes
print("Dataset classes:", train.class_to_idx)

Number of training samples: 50000
Number of testing samples: 10000
Dataset shape: torch.Size([3, 32, 32])
Dataset classes: {'airplane': 0, 'automobile': 1, 'bird': 2, 'cat': 3, 'deer': 4, 'dog': 5, 'frog': 6, 'horse': 7, 'ship': 8, 'truck': 9}
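
Note that the printed shape is [3, 32, 32] rather than 32×32×3: ToTensor moves the channel axis to the front and rescales 8-bit pixel values to floats in [0, 1]. The sketch below imitates that conversion with NumPy on a random stand-in image. The names `hwc` and `chw` are illustrative, and this is not the torchvision implementation itself.

```python
import numpy as np

# A random stand-in for one CIFAR-10 image: height x width x channels, uint8 in [0, 255]
rng = np.random.default_rng(0)
hwc = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

# Move channels first (C, H, W) and rescale to floats in [0, 1]
chw = hwc.transpose(2, 0, 1).astype(np.float32) / 255.0

print(hwc.shape, hwc.dtype)  # (32, 32, 3) uint8
print(chw.shape, chw.dtype)  # (3, 32, 32) float32
```

The channel-first layout is what nn.Conv2d expects, which is why the batches from the DataLoader have shape [batch, 3, 32, 32].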


In [34]:
train = CIFAR10(root='data', train=True, transform=Compose([ToTensor()]))
dataloader = DataLoader(train, batch_size=8, shuffle=True)
for x, y in dataloader:
    print(x.shape)
    print(y)
    break

torch.Size([8, 3, 32, 32])
tensor([2, 8, 0, 7, 4, 9, 6, 8])


2. Building the Image Classification Network
  The network structure we are going to build is as follows:

  Input shape: 3x32x32 (3 channels, 32x32 pixels)

  First Convolutional Layer: Input 3 channels, output 6 channels, Kernel Size: 3x3

  First Pooling Layer: Input 30x30, output 15x15, Kernel Size: 2x2, Stride: 2

  Second Convolutional Layer: Input 6 channels, output 16 channels, Kernel Size: 3x3

  Second Pooling Layer: Input 13x13, output 6x6, Kernel Size: 2x2, Stride: 2

  First Fully Connected Layer: Input 576 dimensions, output 120 dimensions

  Second Fully Connected Layer: Input 120 dimensions, output 84 dimensions

  Output Layer: Input 84 dimensions, output 10 dimensions

  We apply the ReLU activation function after each convolution and after each of the first two fully connected layers to introduce non-linearity into the network.
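
  The spatial sizes listed above follow from the usual output-size formula for convolution and pooling without padding: floor((size + 2·padding − kernel) / stride) + 1. The short sketch below walks a 32x32 input through the four layers; the helper name `conv_out` is made up for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32
size = conv_out(size, kernel=3)            # conv1: 32 -> 30
size = conv_out(size, kernel=2, stride=2)  # pool1: 30 -> 15
size = conv_out(size, kernel=3)            # conv2: 15 -> 13
size = conv_out(size, kernel=2, stride=2)  # pool2: 13 -> 6

# 16 output channels of 6x6 feature maps are flattened for the first linear layer
flat_features = 16 * size * size
print(size, flat_features)  # 6 576
```

  This is where the 576 input dimensions of the first fully connected layer come from.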

In [37]:
class ImageClassification(nn.Module):


    def __init__(self):

        super(ImageClassification, self).__init__()

        self.conv1 = nn.Conv2d(3, 6, stride=1, kernel_size=3)
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(6, 16, stride=1, kernel_size=3)
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.linear1 = nn.Linear(576, 120)
        self.linear2 = nn.Linear(120, 84)
        self.out = nn.Linear(84, 10)


    def forward(self, x):

        x = F.relu(self.conv1(x))
        x = self.pool1(x)

        x = F.relu(self.conv2(x))
        x = self.pool2(x)

        x = x.reshape(x.size(0), -1)
        x = F.relu(self.linear1(x))
        x = F.relu(self.linear2(x))

        return self.out(x)

3. Writing the Training Function

For training, we use the multi-class cross-entropy loss function and the Adam optimizer. The implementation code is as follows:

In [40]:
def train():

    # Load the CIFAR10 training set and convert it to a tensor
    transform = Compose([ToTensor()])
    cifar10 = torchvision.datasets.CIFAR10(root='data', train=True, download=True, transform=transform)

    # Build the image classification model
    model = ImageClassification()
    # Define the loss function
    criterion = nn.CrossEntropyLoss()
    # Define the optimizer
    optimizer = optim.Adam(model.parameters(), lr=1e-3)
    # Number of epochs
    epochs = 100

    for epoch_idx in range(epochs):

        # Create the data loader
        dataloader = DataLoader(cifar10, batch_size=8, shuffle=True)
        # Total sample count
        sam_num = 0
        # Total loss
        total_loss = 0.0
        # Start time
        start = time.time()
        correct = 0

        for x, y in dataloader:
            # Pass input to the model
            output = model(x)
            # Compute the loss
            loss = criterion(output, y)
            # Zero gradients
            optimizer.zero_grad()
            # Backpropagation
            loss.backward()
            # Update parameters
            optimizer.step()

            correct += (torch.argmax(output, dim=-1) == y).sum()
            total_loss += (loss.item() * len(y))
            sam_num += len(y)

        print('epoch:%2s loss:%.5f acc:%.2f time:%.2fs' %
              (epoch_idx + 1,
               total_loss / sam_num,
               correct / sam_num,
               time.time() - start))

    # Save the model (torch.save does not create the 'model' directory; it must already exist)
    torch.save(model.state_dict(), 'model/image_classification.bin')
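
As a rough sanity check on the loss values printed during training, the sketch below computes the cross-entropy of a single sample by hand, using plain Python rather than nn.CrossEntropyLoss. The function name `cross_entropy` and the example logits are made up for illustration. A model that assigns equal scores to all 10 classes should have a loss of ln(10) ≈ 2.3026, so any epoch loss well below that means the network has learned something.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]

# Equal scores for all 10 classes: the prediction is a uniform guess
uniform_loss = cross_entropy([0.0] * 10, target=3)

# A much larger score on the correct class gives a much smaller loss
confident_loss = cross_entropy([0.0, 0.0, 0.0, 5.0] + [0.0] * 6, target=3)

print(f"{uniform_loss:.4f}")  # 2.3026
print(confident_loss < uniform_loss)  # True
```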


In [41]:
train()

Files already downloaded and verified
epoch: 1 loss:1.60087 acc:0.41 time:49.98s
epoch: 2 loss:1.30874 acc:0.53 time:52.45s
epoch: 3 loss:1.19858 acc:0.57 time:52.14s
epoch: 4 loss:1.12035 acc:0.60 time:50.97s
epoch: 5 loss:1.06235 acc:0.62 time:51.71s
epoch: 6 loss:1.01503 acc:0.64 time:51.50s
epoch: 7 loss:0.97992 acc:0.65 time:52.01s
epoch: 8 loss:0.94423 acc:0.67 time:52.98s
epoch: 9 loss:0.92218 acc:0.68 time:54.11s
epoch:10 loss:0.89140 acc:0.68 time:55.02s
epoch:11 loss:0.87472 acc:0.69 time:55.94s
epoch:12 loss:0.85397 acc:0.70 time:55.45s
epoch:13 loss:0.83217 acc:0.70 time:53.93s
epoch:14 loss:0.81138 acc:0.71 time:53.90s
epoch:15 loss:0.79646 acc:0.71 time:53.97s
epoch:16 loss:0.77884 acc:0.72 time:53.84s
epoch:17 loss:0.76344 acc:0.73 time:53.82s
epoch:18 loss:0.75280 acc:0.73 time:53.44s
epoch:19 loss:0.73790 acc:0.74 time:53.62s
epoch:20 loss:0.72384 acc:0.74 time:53.63s
epoch:21 loss:0.70968 acc:0.75 time:53.41s
epoch:22 loss:0.70075 acc:0.75 time:53.23s
epoch:23 loss:0.

RuntimeError: Parent directory model does not exist.
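
The RuntimeError above is raised by torch.save, which does not create missing directories: the 'model' folder has to exist before the save call is reached. One possible fix, sketched below with the standard library only, is to create the parent directory first. The helper name `ensure_parent_dir` is hypothetical, and the temporary directory is used only so the example leaves no files behind.

```python
import os
import tempfile

def ensure_parent_dir(path):
    """Create the parent directory of `path` if it is missing, then return `path`."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    return path

with tempfile.TemporaryDirectory() as tmp:
    save_path = ensure_parent_dir(os.path.join(tmp, 'model', 'image_classification.bin'))
    parent_exists = os.path.isdir(os.path.dirname(save_path))
    # At this point torch.save(model.state_dict(), save_path) would find its parent directory

print(parent_exists)  # True
```

In the notebook, the same effect can be had by calling os.makedirs('model', exist_ok=True) before train(), so that a long training run is not lost at the final save.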

4. Writing the Prediction Function
We load the trained model and make predictions on the 10,000 samples in the test set to evaluate the model's accuracy on the test set.

In [42]:
def test():

    # Load the CIFAR10 test set and convert it to a tensor
    transform = Compose([ToTensor()])
    cifar10 = torchvision.datasets.CIFAR10(root='data', train=False, download=True, transform=transform)
    # Create the data loader
    dataloader = DataLoader(cifar10, batch_size=18, shuffle=True)
    # Load the model
    model = ImageClassification()
    model.load_state_dict(torch.load('model/image_classification.bin'))
    model.eval()

    total_correct = 0
    total_samples = 0
    # Gradients are not needed for evaluation
    with torch.no_grad():
        for x, y in dataloader:
            output = model(x)
            total_correct += (torch.argmax(output, dim=-1) == y).sum().item()
            total_samples += len(y)

    print('Acc: %.2f' % (total_correct / total_samples))

Running the program shows that the model's accuracy on the test set is not very high. We can adjust the network in the following ways:

Increase the number of output channels in the convolutional layers

Increase the number of parameters in the fully connected layers

Adjust the learning rate

Change the optimization method

Modify the activation function

And so on...

I modified the learning rate from 1e-3 to 1e-4, and increased the network's parameter count as shown in the code below:

In [43]:
class ImageClassification(nn.Module):

    def __init__(self):
        super(ImageClassification, self).__init__()

        self.conv1 = nn.Conv2d(3, 32, stride=1, kernel_size=3)
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(32, 128, stride=1, kernel_size=3)
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.linear1 = nn.Linear(128 * 6 * 6, 2048)
        self.linear2 = nn.Linear(2048, 2048)
        self.out = nn.Linear(2048, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool1(x)

        x = F.relu(self.conv2(x))
        x = self.pool2(x)

        # Flatten per sample using the actual batch size, since the last batch may be smaller than the others
        x = x.reshape(x.size(0), -1)

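        x = F.relu(self.linear1(x))
        # Pass self.training so that dropout is switched off after model.eval()
        x = F.dropout(x, p=0.5, training=self.training)

        x = F.relu(self.linear2(x))
        x = F.dropout(x, p=0.5, training=self.training)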

        return self.out(x)
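
To get a feel for how much larger this network is, the sketch below counts the weights and biases of both versions by hand. The helper names `conv_params` and `linear_params` are made up for this illustration; the layer sizes are taken from the two class definitions in this notebook.

```python
def conv_params(c_in, c_out, k):
    """Weights (c_in * c_out * k * k) plus one bias per output channel."""
    return c_in * c_out * k * k + c_out

def linear_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

# Original network: conv 3->6, conv 6->16, linear 576->120->84->10
small = (conv_params(3, 6, 3) + conv_params(6, 16, 3)
         + linear_params(576, 120) + linear_params(120, 84) + linear_params(84, 10))

# Enlarged network: conv 3->32, conv 32->128, linear 4608->2048->2048->10
large = (conv_params(3, 32, 3) + conv_params(32, 128, 3)
         + linear_params(128 * 6 * 6, 2048) + linear_params(2048, 2048)
         + linear_params(2048, 10))

print(small)  # 81302
print(large)  # 13693962
```

Most of the growth comes from the two 2048-unit fully connected layers, which is also why dropout is applied after them.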
