# **Image Classification: Convolutional Neural Networks**

In this Jupyter notebook, we will implement a fully functioning ConvNet model using PyTorch. We will use the model to conduct image classification on the FashionMNST dataset.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm

import torch
import torch.nn as nn
import torch.nn.functional as F
torch.manual_seed(0)
torch.use_deterministic_algorithms(True)

from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

%matplotlib inline
np.random.seed(1)

***

## **1. Load Data**

We load the FashionMNIST dataset provided by PyTorch. We can also change the `download` param to `False`, and copy the `data` folder to the current folder.

See <https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader> for more information.

In [2]:
training_data = datasets.FashionMNIST(
    root="../data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="../data",
    train=False,
    download=True,
    transform=ToTensor()
)

batch_size = 64

train_loader = DataLoader(training_data, batch_size=batch_size)
test_loader = DataLoader(test_data, batch_size=batch_size)

***

## **2. Examine Data Size**

Now, we can examine the size of the training/test data, which is important for determining some of the parameters of our model

In [3]:
for i, (X, y) in enumerate(train_loader):
    if i > 0:
        break

print('X.shape: ', X.shape)
print('Y.shape: ', y.shape)

X.shape:  torch.Size([64, 1, 28, 28])
Y.shape:  torch.Size([64])


***

## **3. Build the Model**

We will need to define our ConvNet model as a subclass of `torch.nn.Module`. Because we have already imported `torch.nn` as `nn`, we can specify the baseclass simply as `nn.Module`.

We need to override two functions in defining the class, `__init__()` and `forward()`.
- All the parameters, including the convolutional, pooling, and fully-connected layers are defined in `__init__()`. They are declared and initialized as members of the class, using the `self.` notation in Python. 
- The forward pass of the computational graph is defined in `forward()`. This function takes as input the training data, and call all operations (conv, pool, etc.) sequentially on the data. The output of a preceding operation is used as the input for the following operation. 

- We define the model so that the architecture is as follows: <br>
    Conv1 -> ReLU -> BatchNorm-> MaxPool1 -> \
    Conv2 -> ReLU -> BatchNorm-> MaxPool2 -> \
    FullyConnected -> Softmax.
  <br> In which,
    - `conv1` has filter size $f=3$, stride $s=1$, padding $p=0$, the number of filters $n_f=6$
    - `conv2` has filter  $f=3$, stride $s=2$, padding $p=0$, the number of filters $n_f=12$;
    - all max-pool layers use filter  $f=2$ (stride $s=2$ by default).
  <br>
- *Note* that the *RELU* activation function is implemented in `forward()` rather than `__init__()`, using `F.relu()`, in which `F` is short for `torch.nn.functional` (imported at the beginning).

- The `in_features` of `self.fc` is the total number of output units after the `self.pool2` layer.
- The `out_features` of `self.fc` should match the number of classes in FashionMNIST dataset, which is 10.
- We use the following formula to compute the height and width of ouputs from conv layers.
\begin{equation}\text{Output} = (\lfloor\frac{n+2p-f}{s}\rfloor + 1)\times(\lfloor\frac{n+2p-f}{s}\rfloor + 1)\end{equation}
- For the output of model, we need to use `nn.logSoftmax()`.

In [4]:
class ConvNetModel(nn.Module):
    def __init__(self, debug=False):
        super(ConvNetModel, self).__init__()
        self.debug = debug

        # The first convolutional layer has in_channels=1, out_channels=6, kernel_size=3, with default stride=1 and padding=0
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=3, stride=1, padding=0)
        self.bn1 = nn.BatchNorm2d(num_features=6)
        # The first pooling layer is a maxpool with a square window of kernel_size=2 (default stride is same as kernel_size)
        self.pool1 = nn.MaxPool2d(kernel_size=2)

        # The second convolutional layer
        # NOTE: Its in_channels should match the out_channels of conv1
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=3, stride=2, padding=0)
        self.bn2 = nn.BatchNorm2d(num_features=12)
        # The second pooling layer is maxpool with a square window of kernel_size=2
        self.pool2 = nn.MaxPool2d(kernel_size=2)
        
        # The fully-connected layer
        self.flatten = nn.Flatten()
        self.fc = nn.Linear(12*3*3, 10) # Use nn.Linear, and you need to specify the correct in_features and out_features
        
        # Softmax layer
        self.output = nn.LogSoftmax(dim=1) # Use nn.LogSoftmax(), specify the dim correctly        
    
    def forward(self, x):
        # Conv1 -> ReLU -> Batchnorm1-> Pool1
        x = self.pool1(F.relu(self.bn1(self.conv1(x))))
        if self.debug:
            print('output shape of pool1:', x.shape)
        
        # Conv2 -> ReLU -> Batchnorm2 -> Pool2
        x = self.pool2(F.relu(self.bn2(self.conv2(x))))
        if self.debug:
            print('output shape of pool2:', x.shape)
        
        # Flatten the output from the last pooling layer
        x = self.flatten(x)
        
        # Call the fully-connected layer, followed by a F.relu()
        x = F.relu(self.fc(x))
        
        # Call softmax layer
        x = self.output(x)        
        return x

In [5]:
model = ConvNetModel(debug=False)

# Test code
torch.manual_seed(0)
input_data = torch.randn(64, 1, 28, 28)
output = model(input_data)
print('output.size():', output.size())

output.size(): torch.Size([64, 10])


***

## **4. Train and Evaluate**

Now we will use the functions we have implemented above to build a full model. Then we train the model on the sign language dataset.

We can refer to the official documents: See <https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html> and <https://pytorch.org/docs/stable/optim.html> for more information.

In [6]:
def train_loop(dataloader, model, loss_fn, optimizer, verbose=True):
    for i, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if verbose and i % 100 == 0:
            loss = loss.item()
            current_step = i * len(X)
            print(f"loss: {loss:>7f}  [{current_step:>5d}/{len(dataloader.dataset):>5d}]")

In [7]:
@torch.no_grad()
def test_loop(dataloader, model, loss_fn):
    test_loss, correct = 0, 0

    for X, y in dataloader:
        pred = model(X)
        loss = loss_fn(pred, y)
        test_loss += loss.item()
        # Add the number of correct prediction in the current batch to `correct`
        correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= len(dataloader)
    test_acc = correct / len(dataloader.dataset)
    print(f"Test Error: \n Accuracy: {(100*test_acc):>0.1f}%, Avg loss: {test_loss:>8f} \n")

Next, we execute the following cell to start the training and testing loop.

**Note** that a different loss function, `nn.NLLLoss()` should be used, instead of `nn.CrossEntropyLoss()`, because we already used softmax as the output layer in the model.

In [8]:
# Reset the model
model = ConvNetModel()
learning_rate = 1e-3

# Use the correct loss function
loss_fn = nn.NLLLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    # Call train_loop(); use verbose=False to see less information
    train_loop(train_loader, model, loss_fn, optimizer, verbose=False)
    # Call test_loop()
    test_loop(test_loader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
Test Error: 
 Accuracy: 77.5%, Avg loss: 0.696391 

Epoch 2
-------------------------------
Test Error: 
 Accuracy: 78.6%, Avg loss: 0.645757 

Epoch 3
-------------------------------
Test Error: 
 Accuracy: 79.5%, Avg loss: 0.619855 

Epoch 4
-------------------------------
Test Error: 
 Accuracy: 79.7%, Avg loss: 0.606201 

Epoch 5
-------------------------------
Test Error: 
 Accuracy: 79.6%, Avg loss: 0.598903 

Epoch 6
-------------------------------
Test Error: 
 Accuracy: 79.8%, Avg loss: 0.591905 

Epoch 7
-------------------------------
Test Error: 
 Accuracy: 80.0%, Avg loss: 0.586152 

Epoch 8
-------------------------------
Test Error: 
 Accuracy: 80.1%, Avg loss: 0.582046 

Epoch 9
-------------------------------
Test Error: 
 Accuracy: 80.1%, Avg loss: 0.579308 

Epoch 10
-------------------------------
Test Error: 
 Accuracy: 80.2%, Avg loss: 0.577434 

Done!


We were able to reach above 80% test accuracy.

We have successfully built a convolutional neural network model for image classification.