# Building a Convolutional Neural Network (CNN) with PyTorch

In this notebook, we will use PyTorch to build a Convolutional Neural Network (CNN).

**Attention:** The code in this notebook creates Google Cloud resources that can incur costs.

Refer to the Google Cloud pricing documentation for details.

For example:

* [Vertex AI Pricing](https://cloud.google.com/vertex-ai/pricing)


## Import libraries

We start by importing the necessary libraries.

In [None]:
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.optim as optim

## Load and process the dataset

In the next cell, we will load the [CIFAR-10 dataset](https://keras.io/api/datasets/cifar10/), which consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. The classes are: 'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', and 'truck'. The following sections describe what our code will do.

### Define the transformations to be performed on our input data

```
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
```
This code calls `torchvision.transforms.Compose()` to create a transformation object named `transform` that performs the following steps:
* `transforms.ToTensor()`: Convert each image (PIL Image) to a PyTorch tensor, scaling the pixel values to the range [0, 1].
* `transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))`: Normalize the pixel values of each image by subtracting a mean of 0.5 from each color channel (red, green, blue) and dividing each channel by a standard deviation of 0.5. With these values, pixel values in [0, 1] are mapped to the range [-1, 1].
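
As a quick check of the arithmetic, the sketch below applies the same per-channel formula, `(value - mean) / std`, to a few pixel values in plain Python. The helper name `normalize_pixel` is hypothetical and is not part of torchvision; it only illustrates the calculation.

```python
def normalize_pixel(value, mean=0.5, std=0.5):
    """Per-channel formula applied by Normalize: (value - mean) / std."""
    return (value - mean) / std

# Example pixel values after ToTensor(), which scales them to [0, 1]
pixel_values = [0.0, 0.25, 0.5, 1.0]
normalized = [normalize_pixel(v) for v in pixel_values]
print(normalized)  # [-1.0, -0.5, 0.0, 1.0]
```

The smallest input (0.0) maps to -1.0 and the largest (1.0) maps to 1.0, so the normalized values are centered around zero.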

### Download the datasets
```
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
```
This code creates two datasets, `trainset` and `testset`, by loading the CIFAR-10 dataset from the torchvision library. We use the following dataset parameters:
* root='./data': This specifies the directory where the dataset will be downloaded or stored.
* train=True and train=False: This indicates whether to load the training or test split of the dataset.
* download=True: This downloads the dataset if it's not already downloaded in our directory.
* transform=transform: Applies the previously defined image transformations to the loaded images.

### Create data loaders
Data loaders are iterators that provide data in batches for training and evaluation.
The following code creates two data loaders, `trainloader` and `testloader`, from the loaded datasets:
```
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)
```
The DataLoader parameters are used as follows:
* batch_size=4: This specifies the number of images to be included in each batch.
* shuffle=True (for trainloader): This randomly shuffles the training data before each epoch to improve model generalization.
* shuffle=False (for testloader): This maintains the original order of the test data for consistent evaluation.
* num_workers=2: This specifies the number of subprocesses to use for data loading (using multiple workers can improve the performance by parallelizing data loading).
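
To see how batching behaves without downloading anything, the following sketch wraps ten random tensors in a `TensorDataset` and iterates over them with a `DataLoader`. The random data and the names `fake_images`, `fake_labels`, and `fake_dataset` are assumptions made for illustration; they stand in for CIFAR-10 and are not part of the notebook's pipeline.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data (an assumption): 10 random "images" shaped like CIFAR-10 inputs
fake_images = torch.randn(10, 3, 32, 32)
fake_labels = torch.randint(0, 10, (10,))
fake_dataset = TensorDataset(fake_images, fake_labels)

# num_workers=0 loads data in the main process, which keeps this sketch simple
loader = DataLoader(fake_dataset, batch_size=4, shuffle=False, num_workers=0)

batch_shapes = [tuple(images.shape) for images, labels in loader]
print(batch_shapes)  # [(4, 3, 32, 32), (4, 3, 32, 32), (2, 3, 32, 32)]
```

Because 10 is not a multiple of 4, the last batch holds only the 2 remaining samples. CIFAR-10's 50000 training and 10000 test images divide evenly by 4, so every batch in this notebook has 4 images.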

In [None]:
# Define transformations to be applied on each image of CIFAR-10
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# Loading the training set and test set of CIFAR-10
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)

trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

## Define our CNN model

The code in the next cell will define a CNN with the following layers (see Chapter 14 of our book for descriptions of each layer type):

* **Convolutional layers:** These layers extract features from the input images using filters. 
* **Max pooling layers:** These layers downsample the feature maps by taking the maximum value in each 2x2 patch. This reduces the dimensionality of the features, making the model more efficient and less prone to overfitting.
* **Linear layers:** These are fully connected layers that perform the final classification. The first fully connected layer takes the flattened output (16 * 5 * 5 values) of the last convolutional block and has 120 neurons, and we then add another fully connected layer with 84 neurons. We use ReLU activation for the first two fully connected layers, and then we define an output layer with 10 neurons (one for each class). The output layer produces raw scores (logits) rather than probabilities: `nn.CrossEntropyLoss`, which we use for training, applies the softmax internally, and the class with the highest score is the predicted class.

The `forward` function defines how input data (x) flows through the network during the forward pass, in both training and inference. We can break it down as follows:

### First convolutional block
```
x = self.pool(nn.functional.relu(self.conv1(x)))
```
This applies the first convolutional block, as follows:
* `self.conv1(x)`: Applies the first convolutional layer to the input x, extracting features from the input image.
* `nn.functional.relu()`: Applies the ReLU (Rectified Linear Unit) activation function, introducing non-linearity to the output of the convolutional layer.
* `self.pool()`: Applies max pooling, downsampling the spatial dimensions of the feature maps to reduce computational complexity and enhance spatial invariance.

The output of this line, which is a downsampled feature map with non-linear activations, is stored back in `x` for further processing.
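
The effect of each step on the tensor shape can be checked with a small sketch. It uses a random batch as a stand-in for real images (an assumption) and standalone `conv1` and `pool` layers configured like the ones in the model cell.

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(3, 6, 5)     # 3 input channels, 6 output channels, 5x5 kernel
pool = nn.MaxPool2d(2, 2)      # 2x2 window, stride 2

x = torch.randn(4, 3, 32, 32)  # stand-in batch of 4 CIFAR-10-sized images
after_conv = conv1(x)
after_relu = nn.functional.relu(after_conv)
after_pool = pool(after_relu)

print(after_conv.shape)  # torch.Size([4, 6, 28, 28])
print(after_pool.shape)  # torch.Size([4, 6, 14, 14])
print(after_relu.min().item() >= 0)  # True: ReLU leaves no negative values
```

The 5x5 convolution without padding trims the 32x32 input to 28x28, and the 2x2 pooling halves that to 14x14.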

### Second convolutional block
```
x = self.pool(nn.functional.relu(self.conv2(x)))
```
This applies the same operations as the previous line, but with the second convolutional layer (self.conv2), further refining the feature representations.

### Flattening

```
x = x.view(-1, 16 * 5 * 5)
```
This flattens the feature maps by reshaping the output of the convolutional blocks (a 16 x 5 x 5 tensor for each image) into a 1D vector per image, preparing the data to be fed into the fully connected layers. The following are the values used:
* `-1`: Tells PyTorch to infer this dimension (the batch size) from the number of elements in the tensor.
* `16 * 5 * 5`: The number of elements in the flattened feature maps for each image (16 channels, each 5x5), so that all information is preserved during the reshaping process.
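
The value `16 * 5 * 5` follows from the layer sizes. The sketch below traces the spatial size through the network using the standard output-size formula for convolution and pooling layers without dilation; the helper name `conv_output_size` is hypothetical and only serves this illustration.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 32                                   # CIFAR-10 images are 32x32
size = conv_output_size(size, 5)            # conv1, 5x5 kernel -> 28
size = conv_output_size(size, 2, stride=2)  # 2x2 max pool -> 14
size = conv_output_size(size, 5)            # conv2, 5x5 kernel -> 10
size = conv_output_size(size, 2, stride=2)  # 2x2 max pool -> 5

flattened_features = 16 * size * size       # 16 channels from conv2
print(size, flattened_features)  # 5 400
```

If the input size or any kernel size changes, this number changes too, and the input size of the first fully connected layer has to be updated to match.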

### First fully connected layer
```
x = nn.functional.relu(self.fc1(x))
```
This applies the first fully connected layer, as follows:
* self.fc1(x): Applies the first fully connected layer, performing a linear transformation on the flattened features.
* nn.functional.relu(): Applies the ReLU activation function again to introduce non-linearity.

### Second fully connected layer
```
x = nn.functional.relu(self.fc2(x))
```
This performs the same operations as the previous line, but with the second fully connected layer (self.fc2), further processing the features.

### Final fully connected layer

```
x = self.fc3(x)
```
This applies the final fully connected layer (self.fc3), producing the output of the network. No activation function is applied after this layer, so the outputs are raw scores (logits), one per class. The cross-entropy loss used for training expects raw scores and applies the softmax itself; if probabilities are needed, a softmax can be applied to the outputs.
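
The relationship between raw scores, softmax, and the cross-entropy loss can be illustrated with a short sketch. The logits and labels below are made-up values, not outputs of the trained model.

```python
import torch
import torch.nn as nn

# Made-up logits for a batch of 2 images and 10 classes (illustrative values)
logits = torch.tensor([[2.0, 0.5, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
                       [0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]])
labels = torch.tensor([0, 3])

# CrossEntropyLoss takes raw logits and applies log-softmax internally
loss_from_logits = nn.CrossEntropyLoss()(logits, labels)
loss_manual = nn.NLLLoss()(torch.log_softmax(logits, dim=1), labels)
print(torch.allclose(loss_from_logits, loss_manual))  # True

# Softmax turns logits into probabilities without changing the predicted class
probabilities = torch.softmax(logits, dim=1)
predicted_classes = probabilities.argmax(dim=1).tolist()
print(predicted_classes == logits.argmax(dim=1).tolist())  # True
```

Because softmax preserves the ordering of the scores, taking the largest logit and taking the largest probability select the same class.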

In [None]:
# Define a Convolutional Neural Network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(nn.functional.relu(self.conv1(x)))
        x = self.pool(nn.functional.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = nn.functional.relu(self.fc1(x))
        x = nn.functional.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net() # Instantiate the model

## Train our CNN model

In the next cell, we perform the following steps to train our CNN model:
* Specify our loss function (cross entropy for classification) and optimizer (stochastic gradient descent, with a learning rate of 0.001, and momentum of 0.9).
* Loop over the dataset multiple times (20 epochs), and for each batch:
1. Reset the gradients: `optimizer.zero_grad()`: This clears the gradients left over from the previous batch, because PyTorch accumulates gradients by default.
1. Perform the forward pass: `outputs = net(inputs)`: This feeds the input images through the neural network (net) to get its predictions (outputs).
1. Calculate the loss: `loss = criterion(outputs, labels)`: This compares the predictions with the ground truth labels.
1. Perform the backward pass: `loss.backward()`: This starts the backpropagation, which computes the gradients of the loss with respect to the model's parameters. These gradients indicate how much each parameter contributed to the loss and how they should be adjusted to improve performance.
1. Update the parameters: `optimizer.step()`: This uses the computed gradients to update the model's parameters in the direction that minimizes the loss.
1. Track the loss: `running_loss += loss.item()`: This accumulates the loss over multiple batches to get a more stable estimate of the overall training loss.
1. Print the progress every 2000 batches.
1. Reset the running loss for the next 2000 batches: `running_loss = 0.0`

When training is finished, we save our trained model.
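
The training loop in the next cell calls `optimizer.zero_grad()` before each forward pass. The sketch below shows why, and then runs one complete training step. It uses a tiny linear model and a random batch as stand-ins (both are assumptions made for illustration), so it finishes instantly.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
tiny_model = nn.Linear(8, 3)            # hypothetical stand-in for the CNN
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(tiny_model.parameters(), lr=0.001, momentum=0.9)

inputs = torch.randn(4, 8)              # made-up batch of 4 samples
labels = torch.tensor([0, 1, 2, 1])

# Without zero_grad(), gradients from successive backward() calls add up
criterion(tiny_model(inputs), labels).backward()
first_grad = tiny_model.weight.grad.clone()
criterion(tiny_model(inputs), labels).backward()
accumulated_grad = tiny_model.weight.grad.clone()
print(torch.allclose(accumulated_grad, 2 * first_grad))  # True

# One complete training step, in the same order as the training loop
weights_before = tiny_model.weight.detach().clone()
optimizer.zero_grad()                   # clear the accumulated gradients
loss = criterion(tiny_model(inputs), labels)
loss.backward()                         # compute fresh gradients
optimizer.step()                        # update the parameters
weights_changed = not torch.equal(weights_before, tiny_model.weight)
print(weights_changed)  # True
```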

In [None]:
# Define a loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

# Train the network
for epoch in range(20):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data

        optimizer.zero_grad()

        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0

print('Finished Training')

# Save the trained model
PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)

## Load and evaluate our CNN model

Next, we will load our saved model and evaluate it. The code in the next cell performs the following steps:

### Load the saved model

* `net = Net()`: This creates a new instance of the neural network class Net (i.e., an empty model with the defined architecture).
* `net.load_state_dict(torch.load(PATH))`: This loads the trained parameters (weights and biases) of the trained model we saved in the previous step.
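* `net.eval()`: This puts the model into evaluation mode, which changes the behavior of layers such as dropout (which is disabled) and batch normalization (which uses its running statistics instead of batch statistics). Our network uses neither, but calling `net.eval()` before evaluation is good practice.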
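
The save-and-load round trip can be tried in isolation with the following sketch. It uses a small `nn.Linear` layer in place of `Net` and an in-memory buffer in place of the file at `PATH`; both substitutions are assumptions made to keep the example self-contained.

```python
import io
import torch
import torch.nn as nn

trained = nn.Linear(4, 2)               # hypothetical stand-in for the trained Net
buffer = io.BytesIO()                   # in-memory file, used instead of PATH
torch.save(trained.state_dict(), buffer)

buffer.seek(0)                          # rewind before reading the saved bytes
restored = nn.Linear(4, 2)              # fresh instance with random weights
restored.load_state_dict(torch.load(buffer))
restored.eval()

same_weights = torch.equal(trained.weight, restored.weight)
print(same_weights)       # True
print(restored.training)  # False: eval() switched off training mode
```

Note that the state dictionary stores only the parameters, so the model class must be available and instantiated before `load_state_dict()` is called.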

### Evaluate the model on the test dataset

#### Disable gradient calculation
* `with torch.no_grad()`: This context manager temporarily disables gradient calculation, which is not needed for evaluation; skipping it reduces memory use and speeds up computation.

#### Loop over the test batches
* `for data in testloader`: This loop iterates over the batches of data provided by the testloader data loader.
* `images, labels = data`: Unpacks each batch into input images and corresponding ground truth labels.
* `outputs = net(images)`: Feeds the input images through the model to get its predictions.
* `_, predicted = torch.max(outputs.data, 1)`: Finds the class with the highest score for each image (the model's predicted class). `torch.max` returns both the maximum values and their indices; we keep only the indices.
* `total += labels.size(0)`: Increments the total number of samples processed.
* `correct += (predicted == labels).sum().item()`: Compares the predicted classes with the true labels and counts the number of correct predictions.

Finally, we print the calculated accuracy.
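
The accuracy calculation can be followed by hand on a small example. The scores and labels below are made-up values for 3 images and 4 classes, not outputs of the trained model.

```python
import torch

# Made-up scores for 3 images and 4 classes (illustrative values only)
outputs = torch.tensor([[0.1, 2.0, 0.3, 0.0],
                        [1.5, 0.2, 0.1, 0.4],
                        [0.0, 0.1, 0.2, 3.0]])
labels = torch.tensor([1, 2, 3])

_, predicted = torch.max(outputs, 1)    # index of the largest score per row
total = labels.size(0)
correct = (predicted == labels).sum().item()

print(predicted.tolist())  # [1, 0, 3]
print(f'{100 * correct / total:.1f} %')  # 66.7 %
```

The second image is predicted as class 0 while its label is 2, so 2 of the 3 predictions are correct.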

In [None]:
# Load the saved model
net = Net()
net.load_state_dict(torch.load(PATH))

# Evaluate the model on the test dataset
net.eval()  
correct = 0 # Initialize counter to zero
total = 0 # Initialize counter to zero
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the network on the 10000 test images: {100 * correct / total} %')

## Get predictions from our model

The code in the next cell will perform the following steps:

1. Import libraries (matplotlib.pyplot and numpy)
1. Define the class names that correspond to the numerical labels in the source dataset ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
1. Define a function to display images from the dataset. This function takes an image tensor as input, un-normalizes the pixel values (our preprocessing normalized them to the range [-1, 1], so `img / 2 + 0.5` maps them back to [0, 1]), converts the tensor to a NumPy array, reorders the dimensions from (channels, height, width) to (height, width, channels), and displays the result using Matplotlib's imshow function.
1. Get the first batch of test images and labels (the test data loader does not shuffle, so this is always the same batch)
1. Print the images and their ground truth
1. Predict labels for the images and print them

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Class labels in CIFAR-10
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

# Get the first batch of test images (testloader uses shuffle=False, so the order is fixed)
dataiter = iter(testloader) # Create an iterator over the test dataset using the testloader data loader.
images, labels = next(dataiter) # Get the next batch of images and their corresponding ground truth labels from the iterator.

# Print images
imshow(torchvision.utils.make_grid(images)) # Create a grid of the images 
print('GroundTruth: ', ' '.join(f'{classes[labels[j]]}' for j in range(4))) # Print the ground truth labels of the displayed images.

# Predict labels for the images
outputs = net(images)
_, predicted = torch.max(outputs, 1)

print('Predicted: ', ' '.join(f'{classes[predicted[j]]}'
                              for j in range(4)))