<h1>Convolutional Neural Networks (CNNs) in PyTorch</h1>

Let's build a simple CNN for image classification using the CIFAR-10 dataset.

<h5><b>Step 1:</b> Import Libraries and Prepare Dataset</h5>
First, import the necessary libraries and load the CIFAR-10 dataset using <b>torchvision</b>.

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

# Define transformations for the training set
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),     # Randomly flip the image horizontally
    transforms.RandomCrop(32, padding=4),  # Pad each side by 4 pixels, then take a random 32x32 crop
    transforms.ToTensor(),                 # Convert the image to a PyTorch tensor
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))  # transforms.Normalize(mean, std)
])
# Normalize takes two tuples: the first holds the mean of each color channel and the
# second holds the standard deviation of each color channel, both in (Red, Green, Blue) order.
# Each channel is transformed as (pixel - mean) / std, which brings the pixel values of the
# dataset close to a mean of 0 and a standard deviation of 1 per channel.
# This typically leads to faster convergence during training and can improve the
# overall performance of the neural network.

# Load CIFAR-10 dataset
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=100, shuffle=True)

testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
]))
testloader = torch.utils.data.DataLoader(testset, batch_size=100, shuffle=False)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck') # 10 Classes


Files already downloaded and verified
Files already downloaded and verified
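
To make the Normalize step concrete, here is a minimal plain-Python sketch of the per-channel arithmetic, (value - mean) / std, using the same mean and std tuples as the cell above. The helper name normalize_pixel is hypothetical and exists only for this illustration; torchvision applies the same formula to whole tensors rather than to single pixels.

```python
# Per-channel statistics copied from the transforms.Normalize call above.
MEAN = (0.4914, 0.4822, 0.4465)
STD = (0.2023, 0.1994, 0.2010)


def normalize_pixel(rgb):
    """Apply (value - mean) / std to one RGB pixel with values in [0, 1].

    Hypothetical helper for illustration; it is not part of torchvision.
    """
    return tuple((v - m) / s for v, m, s in zip(rgb, MEAN, STD))


# After ToTensor, pixel values lie in [0, 1]; these are the two extremes.
black = normalize_pixel((0.0, 0.0, 0.0))
white = normalize_pixel((1.0, 1.0, 1.0))
# A pixel that equals the channel means lands exactly on zero.
mean_pixel = normalize_pixel(MEAN)

print("black ->", [round(v, 3) for v in black])
print("white ->", [round(v, 3) for v in white])
print("mean  ->", [round(v, 3) for v in mean_pixel])
```

A pixel equal to the channel mean maps to 0, darker pixels become negative, and brighter pixels become positive, so the original [0, 1] range is stretched to a few units on either side of zero.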


<h5><b>Step 2:</b> Define the CNN Architecture</h5>
Next, define the CNN architecture by creating a class that inherits from <b>nn.Module</b>.

In [3]:
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        # Syntax: nn.Conv2d(in_channels, out_channels, kernel_size)
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)   # 3 input channels (RGB), 16 output feature maps, 3x3 kernel
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)  # 16 input channels (the output of conv1), 32 output feature maps
        # With a 3x3 kernel and padding=1, neither convolution changes the spatial dimensions.
        self.pool = nn.MaxPool2d(2, 2)                # 2x2 max pooling with stride 2: halves the height and width (downsampling)

        # Syntax: nn.Linear(in_features, out_features)
        # Fully connected layers operate on 1D vectors: each neuron is connected to every element
        # of the input vector, so the feature maps must be flattened first (see forward()).
        # Two pooling steps reduce 32x32 to 8x8, which gives 32 * 8 * 8 = 2048 input features.
        self.fc1 = nn.Linear(32 * 8 * 8, 120)  # Fully connected layer | 2048 input features, 120 output features
        self.fc2 = nn.Linear(120, 84)          # Fully connected layer | 120 input features, 84 output features
        self.fc3 = nn.Linear(84, 10)           # Fully connected layer | 84 input features, 10 output features (one per class)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))  # conv1 -> ReLU -> max pooling: (N, 16, 16, 16)
        x = self.pool(torch.relu(self.conv2(x)))  # conv2 -> ReLU -> max pooling: (N, 32, 8, 8)
        # Flatten the tensor to (N, 2048). view reshapes a tensor without changing its data.
        # The -1 tells PyTorch to infer that dimension (here the batch size) from the total
        # number of elements and the sizes of the other specified dimensions.
        x = x.view(-1, 32 * 8 * 8)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Instantiate the network
net = SimpleCNN()
print(net)


SimpleCNN(
  (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv2): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc1): Linear(in_features=2048, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
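
The 32 * 8 * 8 input size of fc1 follows from the standard output-size formula for convolution and pooling layers, floor((size + 2 * padding - kernel) / stride) + 1. The sketch below traces a 32x32 image through the layers defined above in plain Python; conv2d_out is a hypothetical helper name, and the formula assumes a dilation of 1.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Output height/width of a conv or pooling layer, assuming dilation 1.

    Hypothetical helper for illustration; not a PyTorch function.
    """
    return (size + 2 * padding - kernel) // stride + 1


size = 32                                      # CIFAR-10 images are 32x32
shapes = [size]
size = conv2d_out(size, kernel=3, padding=1)   # conv1 keeps the size
shapes.append(size)
size = conv2d_out(size, kernel=2, stride=2)    # first 2x2 max pool halves it
shapes.append(size)
size = conv2d_out(size, kernel=3, padding=1)   # conv2 keeps the size
shapes.append(size)
size = conv2d_out(size, kernel=2, stride=2)    # second 2x2 max pool halves it
shapes.append(size)

out_channels = 32                              # number of feature maps after conv2
flattened = out_channels * size * size         # in_features expected by fc1

print("spatial sizes:", shapes)
print("flattened features:", flattened)
```

If the input resolution or the number of pooling steps changes, the in_features of fc1 has to change with it, which is why tracing the shapes by hand is a useful check.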


<b>(Optional) Print a detailed summary of the architecture above (requires the third-party torchsummary package)</b>

In [9]:
from torchsummary import summary

summary(net, (3, 32, 32))  # Input size without the batch dimension: 3 channels, 32x32 pixels

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 16, 32, 32]             448
         MaxPool2d-2           [-1, 16, 16, 16]               0
            Conv2d-3           [-1, 32, 16, 16]           4,640
         MaxPool2d-4             [-1, 32, 8, 8]               0
            Linear-5                  [-1, 120]         245,880
            Linear-6                   [-1, 84]          10,164
            Linear-7                   [-1, 10]             850
Total params: 261,982
Trainable params: 261,982
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.01
Forward/backward pass size (MB): 0.24
Params size (MB): 1.00
Estimated Total Size (MB): 1.25
----------------------------------------------------------------
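
The parameter counts in the summary can be checked by hand. A convolution has one weight per (input channel, kernel position) pair plus one bias for each output channel, and a linear layer has one weight per input feature plus one bias for each output feature. The sketch below is plain Python with hypothetical helper names, and the size estimate assumes 4-byte float32 parameters and 1 MB = 1024 * 1024 bytes.

```python
def conv2d_params(in_channels, out_channels, kernel):
    """Weights plus biases of a square-kernel Conv2d layer (hypothetical helper)."""
    return out_channels * (in_channels * kernel * kernel + 1)


def linear_params(in_features, out_features):
    """Weights plus biases of a Linear layer (hypothetical helper)."""
    return out_features * (in_features + 1)


counts = {
    "conv1": conv2d_params(3, 16, 3),
    "conv2": conv2d_params(16, 32, 3),
    "fc1": linear_params(32 * 8 * 8, 120),
    "fc2": linear_params(120, 84),
    "fc3": linear_params(84, 10),
}
total = sum(counts.values())

# Assumption: every parameter is stored as a 4-byte float32.
size_mb = total * 4 / (1024 ** 2)

for name, n in counts.items():
    print(f"{name}: {n:,}")
print(f"total: {total:,}")
print(f"params size (MB): {size_mb:.2f}")
```

Almost all of the parameters sit in fc1, because it connects all 2048 flattened features to 120 outputs.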


<b>(Optional) Or render the model architecture as a diagram using torchviz</b>

torchviz requires Graphviz to be installed (https://graphviz.org/download/).

In [38]:
# Check for a Graphviz PATH issue and work around it, if one exists

import os
import subprocess

# Adjust the path to where Graphviz is installed (a typical Windows location is shown here)
os.environ["PATH"] += os.pathsep + r'C:\Program Files\Graphviz\bin'


# Check that the dot executable is available on the PATH
# (`where` is a Windows command; on Linux/macOS use `which` instead)
dot_path = subprocess.run(["where", "dot"], capture_output=True, text=True).stdout
print(dot_path)

C:\Program Files\Graphviz\bin\dot.exe



In [44]:
import graphviz
from torchviz import make_dot


# Dummy input drawn from the standard normal distribution:
# batch size of 1, 3 color channels, 32x32 pixels
x = torch.randn(1, 3, 32, 32)
y = net(x) 

make_dot(y, params=dict(net.named_parameters())).render("model_architecture", format="png")

'model_architecture.png'

<b>(Optional) Generate an architecture diagram using TensorBoard</b>

In [65]:
from torch.utils.tensorboard import SummaryWriter

In [69]:
# Create a SummaryWriter to write to TensorBoard
writer = SummaryWriter('runs/simple_model')

x = torch.randn(1, 3, 32, 32) 

# Add the model graph to TensorBoard
writer.add_graph(net, x)
writer.close()

In [73]:
# While this is running, open http://localhost:6006 to view the output
!tensorboard --logdir=runs

^C


<h5><b>Step 3:</b> Define Loss Function and Optimizer</h5>

<b>CrossEntropyLoss:</b> Combines nn.LogSoftmax() and nn.NLLLoss() in one single class.<br>
<b>SGD Optimizer:</b> Updates the network parameters using stochastic gradient descent with momentum.
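
To see what "combines LogSoftmax and NLLLoss" means, here is a minimal plain-Python sketch of the loss for a single sample: take the log-softmax of the raw scores, then negate the entry of the true class. The function names are hypothetical stand-ins, not the PyTorch API, and nn.CrossEntropyLoss additionally averages this value over the batch by default.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of a list of raw scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def cross_entropy(logits, target):
    """Negative log-likelihood of the target class after log-softmax."""
    return -log_softmax(logits)[target]


# An untrained network that scores all 10 classes equally.
uniform_loss = cross_entropy([0.0] * 10, target=3)

# A network that strongly prefers the correct class (index 3).
confident_logits = [0.0] * 10
confident_logits[3] = 8.0
confident_loss = cross_entropy(confident_logits, target=3)

print(f"uniform guess:   {uniform_loss:.4f}")
print(f"ln(10):          {math.log(10):.4f}")
print(f"confident guess: {confident_loss:.4f}")
```

With 10 equally likely classes the loss equals ln(10), about 2.3026, which is close to the first loss value logged in Step 4 (2.3029). That is what one would expect from a freshly initialized network.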

In [31]:
criterion = nn.CrossEntropyLoss()  # Cross-entropy loss for classification
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)  # Stochastic Gradient Descent with momentum
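
The momentum update itself is short. The sketch below applies it to a single scalar parameter in plain Python, assuming the optimizer's other options (dampening, weight decay, Nesterov) are left switched off as in the call above. sgd_momentum_step is a hypothetical helper name, and the quadratic objective is only a toy example.

```python
def sgd_momentum_step(param, grad, buf, lr=0.001, momentum=0.9):
    """One SGD-with-momentum update for a single scalar parameter.

    Hypothetical helper for illustration. The momentum buffer accumulates
    past gradients, and the parameter moves against the buffer.
    """
    buf = momentum * buf + grad
    param = param - lr * buf
    return param, buf


# Toy objective: f(w) = w**2, whose gradient is 2 * w.
w, buf = 1.0, 0.0
history = [w]
for _ in range(3):
    w, buf = sgd_momentum_step(w, grad=2 * w, buf=buf)
    history.append(w)

print(history)
```

Because the buffer keeps a decaying sum of earlier gradients, consecutive steps in the same direction grow larger, which is how momentum speeds up progress along consistent gradient directions.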

<h5><b>Step 4:</b> Train the CNN</h5>
Implement the training loop to train the CNN on the CIFAR-10 dataset.

In [33]:
num_epochs = 10

for epoch in range(num_epochs):
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data

        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = net(inputs)

        # Compute loss
        loss = criterion(outputs, labels)

        # Backward pass
        loss.backward()

        # Update weights
        optimizer.step()

        running_loss += loss.item()
        if i % 100 == 99:  # Print every 100 mini-batches
            print(f'Epoch [{epoch + 1}/{num_epochs}], Step [{i + 1}/{len(trainloader)}], Loss: {running_loss / 100:.4f}')
            running_loss = 0.0

print('Finished Training')


Epoch [1/10], Step [100/500], Loss: 2.3029
Epoch [1/10], Step [200/500], Loss: 2.2929
Epoch [1/10], Step [300/500], Loss: 2.2777
Epoch [1/10], Step [400/500], Loss: 2.2400
Epoch [1/10], Step [500/500], Loss: 2.1573
Epoch [2/10], Step [100/500], Loss: 2.0751
Epoch [2/10], Step [200/500], Loss: 2.0145
Epoch [2/10], Step [300/500], Loss: 1.9694
Epoch [2/10], Step [400/500], Loss: 1.9443
Epoch [2/10], Step [500/500], Loss: 1.9008
Epoch [3/10], Step [100/500], Loss: 1.8522
Epoch [3/10], Step [200/500], Loss: 1.8084
Epoch [3/10], Step [300/500], Loss: 1.7791
Epoch [3/10], Step [400/500], Loss: 1.7508
Epoch [3/10], Step [500/500], Loss: 1.6969
Epoch [4/10], Step [100/500], Loss: 1.6780
Epoch [4/10], Step [200/500], Loss: 1.6539
Epoch [4/10], Step [300/500], Loss: 1.6397
Epoch [4/10], Step [400/500], Loss: 1.6091
Epoch [4/10], Step [500/500], Loss: 1.6146
Epoch [5/10], Step [100/500], Loss: 1.5933
Epoch [5/10], Step [200/500], Loss: 1.5711
Epoch [5/10], Step [300/500], Loss: 1.5560
Epoch [5/10
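
The "Step [100/500]" pattern in the log follows from the dataset and batch sizes. A small sketch of that arithmetic in plain Python, assuming the 50,000 images of the CIFAR-10 training split and the DataLoader default of keeping the last batch:

```python
import math

train_size = 50_000   # images in the CIFAR-10 training split
batch_size = 100      # batch_size passed to the DataLoader in Step 1
log_every = 100       # the loop prints when i % 100 == 99

# Number of mini-batches per epoch, i.e. len(trainloader).
steps_per_epoch = math.ceil(train_size / batch_size)

# 1-based step numbers at which the training loop prints.
log_steps = [i + 1 for i in range(steps_per_epoch) if i % log_every == log_every - 1]

print("steps per epoch:", steps_per_epoch)
print("logged steps:", log_steps)
```

Each printed value is the mean loss over the previous 100 mini-batches, since running_loss is divided by 100 and then reset.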

<h5><b>Step 5:</b> Evaluate the Model</h5>
Evaluate the trained model on the test set to measure its performance.

In [37]:
correct = 0
total = 0
# torch.no_grad() temporarily disables gradient calculations.
# This is helpful during inference (evaluation), because gradients are only needed for
# backpropagation and not for evaluating the model.
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)  # Index of the highest score for each image, i.e. the predicted class
        total += labels.size(0)               # labels.size(0) is the number of labels (and images) in the current batch
        # (predicted == labels) creates a boolean tensor that is True where the prediction matches the true label.
        # .sum() counts the True values (the correct predictions) in the batch.
        # .item() converts the resulting tensor to a Python number, which is then added to correct.
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the network on the 10000 test images: {100 * correct / total:.2f}%')


Accuracy of the network on the 10000 test images: 55.40%
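
The accuracy calculation can also be traced on toy data without PyTorch. In this sketch the scores and labels are made up purely for illustration, and argmax and accuracy are hypothetical helpers that mirror torch.max(outputs, 1) followed by the comparison with the labels.

```python
def argmax(scores):
    """Index of the largest score, i.e. the predicted class."""
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(batch_scores, labels):
    """Percentage of samples whose predicted class matches the label."""
    predicted = [argmax(scores) for scores in batch_scores]
    correct = sum(p == y for p, y in zip(predicted, labels))
    return 100 * correct / len(labels)


# Made-up scores for 4 samples and 3 classes (illustration only).
toy_scores = [
    [2.0, 0.1, -1.0],   # predicts class 0
    [0.3, 1.5, 0.2],    # predicts class 1
    [0.9, 0.8, 0.1],    # predicts class 0
    [-0.5, 0.0, 0.4],   # predicts class 2
]
toy_labels = [0, 1, 1, 2]

toy_accuracy = accuracy(toy_scores, toy_labels)
chance_level = 100 / 10   # random guessing over 10 equally frequent classes

print(f"toy accuracy: {toy_accuracy:.2f}%")
print(f"chance level for 10 classes: {chance_level:.2f}%")
```

For context, random guessing on the 10 balanced CIFAR-10 classes would score about 10%, so 55.40% shows that this small network has learned useful features, with plenty of room left for improvement.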
