Guide for the code: https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html

**CNN Model for the CIFAR-10 Dataset**

Importing libraries

In [1]:
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
!pip install torchinfo
import torchinfo

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting torchinfo
  Downloading torchinfo-1.7.1-py3-none-any.whl (22 kB)
Installing collected packages: torchinfo
Successfully installed torchinfo-1.7.1


First, the dataset was loaded with the torchvision library and wrapped in PyTorch DataLoader objects. The transforms.Compose function was used to chain the transformations applied to each image: conversion to a tensor, followed by normalization of each channel with a mean and standard deviation of 0.5. The 10 classes of the dataset were then defined. A batch_size of 4 was initially selected, but it was changed to 16 since that produced better results.

In [2]:
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

batch_size = 16

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified
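
As a quick check of what the normalization above does: `transforms.ToTensor()` scales 8-bit pixel values to [0, 1], and `transforms.Normalize` with a mean and standard deviation of 0.5 then computes `(x - 0.5) / 0.5` per channel, which maps them to [-1, 1]. The sketch below reproduces that arithmetic in plain Python. The helper name `normalize` is hypothetical and only mirrors the formula; it is not part of torchvision.

```python
def normalize(value, mean=0.5, std=0.5):
    """Same per-channel formula as transforms.Normalize: (x - mean) / std."""
    return (value - mean) / std

# ToTensor() maps 8-bit pixel values 0..255 to floats in [0, 1].
pixels_8bit = [0, 64, 128, 255]
scaled = [p / 255 for p in pixels_8bit]
normalized = [normalize(s) for s in scaled]

for p, n in zip(pixels_8bit, normalized):
    print(f"pixel {p:3d} -> {n:+.3f}")
```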


**CNN Model**

For the CNN model, the following architecture was used, in this order:

---The first part of the architecture consists of 3 blocks with the following structure:

-2 Convolutional layers: Starting from the 3 input channels, the six convolutional layers (two per block) raise the number of channels to 16, 32, 64, 128, 256 and finally 512. All convolutional layers use a 3x3 filter with no padding, so each one reduces the height and width of the feature map by 2.

-Batch Normalization: Normalizes the output of each convolutional layer per channel. This keeps the activations on a similar scale and makes training more stable.

-Activation Function: A ReLU, which is a non-linear activation function, is applied to the output of each batch normalization.

-Max Pooling Layer: Applied once per block, after the second convolution. The first block uses a 2x2 filter with stride 2, which halves the height and width. The next 2 blocks use a 1x1 filter with stride 1, which leaves the feature map unchanged, so only the first block actually downsamples.

-Dropout Layer: At the end of each block, a dropout layer with a probability of 50% is applied.

---Then, after the 3 CNN blocks, the features are passed to the prediction model (second part of the architecture):

-Flatten Layer: The output of the last block is flattened (all dimensions except the batch) to 18432 features (6x6x512).

-Dense Layer: This first dense layer processes the features extracted by the CNN blocks and outputs 1024 features.

-Activation Function: A ReLU activation function is applied to the output of this dense layer, followed by the same 50% dropout used in the CNN blocks.

-Dense Layer: This second dense layer further processes the features and outputs 256 features.

-Activation Function: A ReLU activation function is applied to the output of this dense layer.

-Output Layer: This layer produces the final output of the model: 10 values, one for each of the 10 classes. No activation is applied to this layer (a linear output), so the values are raw class scores.
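
The 18432-feature size of the flatten layer can be traced by hand. The following is a minimal sketch, assuming the layer settings from the model code (3x3 convolutions with stride 1 and no padding, one 2x2 pool with stride 2 and two 1x1 pools with stride 1). The helper `out_size` is a hypothetical name for the standard output-size formula.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output height/width of a conv or pooling layer (floor mode, dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

# (layer name, kernel, stride) in forward-pass order
layers = [
    ("conv1", 3, 1), ("conv2", 3, 1), ("pool1", 2, 2),
    ("conv3", 3, 1), ("conv4", 3, 1), ("pool2", 1, 1),
    ("conv5", 3, 1), ("conv6", 3, 1), ("pool2", 1, 1),
]

size = 32  # CIFAR-10 images are 32x32
sizes = []
for name, kernel, stride in layers:
    size = out_size(size, kernel, stride)
    sizes.append(size)
    print(f"{name}: {size}x{size}")

flat_features = 512 * size * size  # 512 channels after conv6
print(flat_features)
```

The trace also shows that the 1x1 pooling steps leave the size unchanged (10 stays 10, and 6 stays 6).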

In [3]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.norm1 = nn.BatchNorm2d(16)
        self.norm2 = nn.BatchNorm2d(32)
        self.norm3 = nn.BatchNorm2d(64)
        self.norm4 = nn.BatchNorm2d(128)
        self.norm5 = nn.BatchNorm2d(256)
        self.norm6 = nn.BatchNorm2d(512)
        self.drop = nn.Dropout(0.5)
        self.conv2 = nn.Conv2d(16, 32, 3)
        self.pool2 = nn.MaxPool2d(1, 1)
        self.conv3 = nn.Conv2d(32, 64, 3)
        self.conv4 = nn.Conv2d(64, 128, 3)
        self.conv5 = nn.Conv2d(128, 256, 3)
        self.conv6 = nn.Conv2d(256, 512, 3)
        self.fc1 = nn.Linear(512 * 6 * 6, 1024)
        self.fc2 = nn.Linear(1024, 256)
        self.fc3 = nn.Linear(256, 10)

    def forward(self, x):
        x = F.relu(self.norm1(self.conv1(x)))
        x = F.relu(self.norm2(self.conv2(x)))
        x = self.pool1(x)
        x = self.drop(x)
        x = F.relu(self.norm3(self.conv3(x)))
        x = F.relu(self.norm4(self.conv4(x)))
        x = self.pool2(x)
        x = self.drop(x)
        x = F.relu(self.norm5(self.conv5(x)))
        x = F.relu(self.norm6(self.conv6(x)))
        x = self.pool2(x)
        x = self.drop(x)
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = self.drop(x)
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()

Summary of the model outputs and parameters for each layer, and the total number of trainable parameters in the model.

In [4]:
torchinfo.summary(Net(), [(3, 32, 32)], batch_dim = 0, verbose = 0)

Layer (type:depth-idx)                   Output Shape              Param #
Net                                      [1, 10]                   --
├─Conv2d: 1-1                            [1, 16, 30, 30]           448
├─BatchNorm2d: 1-2                       [1, 16, 30, 30]           32
├─Conv2d: 1-3                            [1, 32, 28, 28]           4,640
├─BatchNorm2d: 1-4                       [1, 32, 28, 28]           64
├─MaxPool2d: 1-5                         [1, 32, 14, 14]           --
├─Dropout: 1-6                           [1, 32, 14, 14]           --
├─Conv2d: 1-7                            [1, 64, 12, 12]           18,496
├─BatchNorm2d: 1-8                       [1, 64, 12, 12]           128
├─Conv2d: 1-9                            [1, 128, 10, 10]          73,856
├─BatchNorm2d: 1-10                      [1, 128, 10, 10]          256
├─MaxPool2d: 1-11                        [1, 128, 10, 10]          --
├─Dropout: 1-12                          [1, 128, 10, 10]          --
├
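
The parameter counts in the rows shown above can be reproduced by hand. This is a sketch using the usual formulas: a `Conv2d` with bias has `in * out * k * k + out` parameters, and a `BatchNorm2d` with affine parameters has `2 * channels`. The helper names are hypothetical.

```python
def conv2d_params(in_channels, out_channels, kernel=3):
    """Weights (in * out * k * k) plus one bias per output channel."""
    return in_channels * out_channels * kernel * kernel + out_channels

def batchnorm2d_params(channels):
    """One learnable scale and one learnable shift per channel."""
    return 2 * channels

# Channel progression of the first four convolutional layers.
channels = [3, 16, 32, 64, 128]
conv_counts = [conv2d_params(i, o) for i, o in zip(channels, channels[1:])]
bn_counts = [batchnorm2d_params(c) for c in channels[1:]]

print(conv_counts)  # [448, 4640, 18496, 73856]
print(bn_counts)    # [32, 64, 128, 256]
```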

Loss function and optimizer:

-For the loss function, Cross-Entropy was selected since this is a multi-class classification problem.

-For the optimizer, 2 different optimizers were tested: Adam and Stochastic Gradient Descent (SGD). SGD, with a learning rate of 1e-3 and momentum of 0.9, performed better than Adam and was kept.

In [5]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
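
`nn.CrossEntropyLoss` expects raw, unnormalized scores (logits), which is why the last layer of the network has no softmax. The sketch below uses made-up logits and labels to check that the loss equals the negative log-likelihood of the log-softmax of the logits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Made-up scores for a batch of 2 samples and 3 classes (illustrative only).
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
labels = torch.tensor([0, 2])

loss_ce = nn.CrossEntropyLoss()(logits, labels)

# Same value computed in two explicit steps: log-softmax, then NLL.
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(loss_ce.item(), loss_manual.item())
```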

If a CUDA GPU is available, it is used; otherwise, the model runs on the CPU.

In [9]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

print(device)

net.to(device)

cuda:0


Net(
  (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1))
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (norm1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (norm2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (norm3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (norm4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (norm5): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (norm6): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (drop): Dropout(p=0.5, inplace=False)
  (conv2): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1))
  (pool2): MaxPool2d(kernel_size=1, stride=1, padding=0, dilation=1, ceil_mode=False)
  (conv3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1))
  (conv4): Conv2d(64, 128, kernel_size=(3, 3), strid

**Training of the model**

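The model was trained for 50 epochs. For each mini-batch, the gradients are reset, the inputs are passed through the model, the loss is computed, the backward pass calculates the gradients, and the optimizer updates the weights. The average training loss is printed every 2000 mini-batches (32000 images at a batch size of 16), which here amounts to once per epoch.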

In [11]:
for epoch in range(50):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data[0].to(device), data[1].to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0

print('Finished Training')

[1,  2000] loss: 1.737
[2,  2000] loss: 1.378
[3,  2000] loss: 1.197
[4,  2000] loss: 1.088
[5,  2000] loss: 1.000
[6,  2000] loss: 0.932
[7,  2000] loss: 0.879
[8,  2000] loss: 0.835
[9,  2000] loss: 0.793
[10,  2000] loss: 0.763
[11,  2000] loss: 0.736
[12,  2000] loss: 0.713
[13,  2000] loss: 0.681
[14,  2000] loss: 0.659
[15,  2000] loss: 0.643
[16,  2000] loss: 0.625
[17,  2000] loss: 0.616
[18,  2000] loss: 0.592
[19,  2000] loss: 0.576
[20,  2000] loss: 0.554
[21,  2000] loss: 0.555
[22,  2000] loss: 0.534
[23,  2000] loss: 0.523
[24,  2000] loss: 0.509
[25,  2000] loss: 0.500
[26,  2000] loss: 0.486
[27,  2000] loss: 0.472
[28,  2000] loss: 0.466
[29,  2000] loss: 0.459
[30,  2000] loss: 0.446
[31,  2000] loss: 0.435
[32,  2000] loss: 0.427
[33,  2000] loss: 0.415
[34,  2000] loss: 0.410
[35,  2000] loss: 0.403
[36,  2000] loss: 0.394
[37,  2000] loss: 0.388
[38,  2000] loss: 0.377
[39,  2000] loss: 0.364
[40,  2000] loss: 0.365
[41,  2000] loss: 0.355
[42,  2000] loss: 0.346
[
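
With a batch size of 16, the `i % 2000 == 1999` check fires only once per epoch, which is why the log shows a single line per epoch. A small sketch of that arithmetic, assuming the standard 50000-image CIFAR-10 training set:

```python
import math

train_images = 50_000   # size of the CIFAR-10 training set
batch_size = 16
batches_per_epoch = math.ceil(train_images / batch_size)

# Batch numbers (1-based) at which the training loop prints the running loss.
report_points = [i + 1 for i in range(batches_per_epoch) if i % 2000 == 1999]

# Batches after the last report: their loss is accumulated but never printed,
# because running_loss is reset to 0.0 at the start of the next epoch.
unreported = batches_per_epoch - report_points[-1]

print(batches_per_epoch, report_points, unreported)  # 3125 [2000] 1125
```

Each printed value is therefore the mean loss over the first 2000 mini-batches of the epoch, not over the whole epoch.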

**Testing of the model**

The model is tested on the 10000 images of the CIFAR-10 test set. The class with the highest output value is taken as the prediction. The number of correct predictions is then divided by the total number of images and multiplied by 100 to obtain the test accuracy (the code uses integer division, so the percentage is rounded down). The final result achieved with this model is 78% accuracy.

In [12]:
correct = 0
total = 0
# since we're not training, we don't need to calculate the gradients for our outputs
with torch.no_grad():
    for data in testloader:
        images, labels = data[0].to(device), data[1].to(device)
        # calculate outputs by running images through the network
        outputs = net(images)
        # the class with the highest energy is what we choose as prediction
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the network on the 10000 test images: {100 * correct // total} %')

Accuracy of the network on the 10000 test images: 78 %
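
Note that the cells shown here run the test loop under `torch.no_grad()` but never call `net.eval()`, so dropout and batch normalization would still behave as in training while testing. The sketch below illustrates the difference on a toy module (the `toy` model is hypothetical, not the network above): in training mode dropout randomly zeroes values on every call, while in eval mode it does nothing.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy module: one linear layer followed by 50% dropout.
toy = nn.Sequential(nn.Linear(8, 8), nn.Dropout(0.5))
x = torch.ones(4, 8)

with torch.no_grad():
    toy.train()                      # training mode: dropout is active
    train_a, train_b = toy(x), toy(x)

    toy.eval()                       # eval mode: dropout is disabled
    eval_a, eval_b = toy(x), toy(x)
    linear_only = toy[0](x)          # output of the linear layer alone

print(torch.equal(train_a, train_b))  # two training-mode passes differ
print(torch.equal(eval_a, eval_b))    # two eval-mode passes agree
```

Calling `net.eval()` before the test loop (and `net.train()` before resuming training) is the usual way to get deterministic test results. Whether it would change the 78% figure here was not measured.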


The accuracy of the trained model is then calculated separately for each class.

In [13]:
# prepare to count predictions for each class
correct_pred = {classname: 0 for classname in classes}
total_pred = {classname: 0 for classname in classes}

# again no gradients needed
with torch.no_grad():
    for data in testloader:
        images, labels = data[0].to(device), data[1].to(device)
        outputs = net(images)
        _, predictions = torch.max(outputs, 1)
        # collect the correct predictions for each class
        for label, prediction in zip(labels, predictions):
            if label == prediction:
                correct_pred[classes[label]] += 1
            total_pred[classes[label]] += 1


# print accuracy for each class
for classname, correct_count in correct_pred.items():
    accuracy = 100 * float(correct_count) / total_pred[classname]
    print(f'Accuracy for class: {classname:5s} is {accuracy:.1f} %')

Accuracy for class: plane is 79.2 %
Accuracy for class: car   is 89.3 %
Accuracy for class: bird  is 67.2 %
Accuracy for class: cat   is 63.0 %
Accuracy for class: deer  is 78.3 %
Accuracy for class: dog   is 64.7 %
Accuracy for class: frog  is 81.0 %
Accuracy for class: horse is 81.5 %
Accuracy for class: ship  is 82.9 %
Accuracy for class: truck is 88.5 %


In conclusion, the model predicts vehicles such as cars (89.3%), trucks (88.5%) and ships (82.9%) most accurately, and is least accurate on animals such as cats (63.0%), dogs (64.7%) and birds (67.2%).
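
The gap between vehicles and animals can be quantified from the per-class figures reported above. A small sketch follows. The grouping into vehicles and animals is an assumption made here for illustration, and the overall mean assumes the standard CIFAR-10 test split of 1000 images per class.

```python
# Per-class test accuracies (%) as reported above.
accuracy = {
    'plane': 79.2, 'car': 89.3, 'bird': 67.2, 'cat': 63.0, 'deer': 78.3,
    'dog': 64.7, 'frog': 81.0, 'horse': 81.5, 'ship': 82.9, 'truck': 88.5,
}
vehicles = ('plane', 'car', 'ship', 'truck')  # grouping chosen for illustration
animals = tuple(name for name in accuracy if name not in vehicles)

def mean(names):
    """Unweighted mean accuracy over the given class names."""
    return sum(accuracy[n] for n in names) / len(names)

print(f'vehicles: {mean(vehicles):.1f} %')
print(f'animals:  {mean(animals):.1f} %')
print(f'overall:  {mean(accuracy):.2f} %')
```

The overall mean of the per-class figures (77.56%) is slightly below the 78% printed by the earlier test cell. One possible explanation is that dropout is still active in the evaluation cells shown, so two passes over the test set need not give identical results.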