<a href="https://colab.research.google.com/github/drpetros11111/Deep-Learning-and-Neural-Networks-Theory-and-Applications-with-PyTorch/blob/main/Residual_Network1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

# Import Libraries

-----------
-------------
#1. import torch

Purpose:

This line imports the core PyTorch library, which provides the fundamental data structures (like tensors) and functions for building and training neural networks.

Reasoning:

It's the foundation of any PyTorch project, giving you access to essential tools for working with tensors, automatic differentiation, and GPU acceleration.

-------------------
#2. import torch.nn as nn

Purpose:
This imports the torch.nn module, which contains a variety of pre-built neural network layers (like convolutional layers, linear layers, activation functions) and tools for creating neural network architectures.

The as nn part creates a shorter alias for the module, making it more convenient to use in your code (e.g., nn.Linear instead of torch.nn.Linear).

Reasoning:
It simplifies the process of defining and building neural networks by providing ready-made components.

------------------
#3. import torchvision

Purpose:
This imports the torchvision package, which is part of the PyTorch ecosystem and focuses on computer vision tasks.

It provides datasets (like CIFAR-10, ImageNet), model architectures (like ResNet, AlexNet), and image transformations (like resizing, cropping).

Reasoning:
It's extremely helpful for loading and preprocessing image data, using pre-trained models, and applying common image transformations for computer vision applications.

--------------------
#4. import torchvision.transforms as transforms

Purpose:
This imports the torchvision.transforms module, specifically designed for applying transformations to image data.

It offers a wide range of transformations, including resizing, cropping, color adjustments, data augmentation techniques, and converting images to tensors.

The as transforms part creates a shorter alias for convenience.

Reasoning:
It's crucial for preparing image data for neural network training by allowing you to easily apply various transformations and data augmentation strategies.

In [None]:
# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hyper-parameters
num_epochs = 25
batch_size = 100
learning_rate = 0.001

# Image preprocessing modules
transform = transforms.Compose([
    transforms.Pad(4),
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32),
    transforms.ToTensor()])

# CIFAR-10 dataset
train_dataset = torchvision.datasets.CIFAR10(root='../../data/',
                                             train=True,
                                             transform=transform,
                                             download=True)

test_dataset = torchvision.datasets.CIFAR10(root='../../data/',
                                            train=False,
                                            transform=transforms.ToTensor())

# Data loader
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

# Data Loaders

------------------
#1. Device Configuration

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

##Purpose:
This line determines whether a CUDA-enabled GPU is available.

If so, it sets the device to 'cuda' for GPU acceleration; otherwise, it defaults to 'cpu'.

##Reasoning:
Utilizing a GPU can significantly speed up the training process, especially for computationally intensive tasks like deep learning.
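As a minimal sketch of how the `device` object is typically used, the snippet below places a small module and its input on the same device. The names `layer` and `x` are illustrative only and do not appear elsewhere in this notebook.

```python
import torch
import torch.nn as nn

# Pick the GPU when PyTorch can see one, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# A module and its input must live on the same device.
layer = nn.Linear(4, 2).to(device)   # illustrative layer, not part of the ResNet
x = torch.randn(3, 4).to(device)     # illustrative batch of 3 samples
y = layer(x)

print(device.type, y.shape)
```

The same pattern appears later in the notebook: the model is moved once, and every batch is moved inside the loop.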

----------------------------
#2. Hyperparameters

    num_epochs = 25
    batch_size = 100
    learning_rate = 0.001

##Purpose:
These lines define hyperparameters for the training process:

##num_epochs:
The number of times the entire training dataset is iterated over.

##batch_size:
The number of training samples processed in each iteration.

##learning_rate:
Controls the step size during optimization, influencing how quickly the model learns.

##Reasoning:
Hyperparameters are crucial for controlling the training process and achieving optimal model performance.
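To make these numbers concrete, the following sketch works out how many parameter updates the settings imply. It assumes the standard CIFAR-10 training split of 50,000 images; `train_size`, `steps_per_epoch`, and `total_updates` are illustrative names.

```python
import math

num_epochs = 25
batch_size = 100
train_size = 50_000  # size of the CIFAR-10 training split

# One optimizer step per batch; the last batch may be smaller, hence ceil.
steps_per_epoch = math.ceil(train_size / batch_size)
total_updates = num_epochs * steps_per_epoch

print(steps_per_epoch, total_updates)  # 500 12500
```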

--------------------
#3. Image Preprocessing

    transform = transforms.Compose([
       transforms.Pad(4),
       transforms.RandomHorizontalFlip(),
       transforms.RandomCrop(32),
       transforms.ToTensor()])

##Purpose:
This code defines a sequence of image transformations using transforms.Compose.

 These transformations will be applied to the training images.

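##transforms.Pad(4):
Adds 4 pixels of zero padding on each side, turning each 32x32 image into a 40x40 image.

##transforms.RandomHorizontalFlip():
Flips the image horizontally with probability 0.5.

##transforms.RandomCrop(32):
Randomly crops a 32x32 section from the padded 40x40 image, which amounts to shifting the original image by up to 4 pixels in each direction.

##transforms.ToTensor():
Converts the PIL image to a float tensor of shape (channels, height, width) with values scaled to the range [0, 1].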

##Reasoning:
These transformations are commonly used for data augmentation, which helps improve the model's generalization ability and prevent overfitting.
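The effect of the pad, flip, and crop steps can be sketched with plain tensor operations. This is only an approximation of what the torchvision transforms do, written to show the shapes involved; the function `augment` is a hypothetical helper, not a torchvision API.

```python
import torch
import torch.nn.functional as F

def augment(img, pad=4, size=32):
    """Mimic Pad -> RandomHorizontalFlip -> RandomCrop on a (C, H, W) tensor."""
    # Zero-pad height and width by `pad` pixels on every side: 32x32 -> 40x40.
    padded = F.pad(img, (pad, pad, pad, pad))
    # Flip left-right with probability 0.5.
    if torch.rand(1).item() < 0.5:
        padded = padded.flip(-1)
    # Choose a random top-left corner so the crop stays inside the padded image.
    max_offset = padded.shape[-1] - size
    top = torch.randint(0, max_offset + 1, (1,)).item()
    left = torch.randint(0, max_offset + 1, (1,)).item()
    return padded[:, top:top + size, left:left + size]

img = torch.rand(3, 32, 32)   # stand-in for one CIFAR-10 image
out = augment(img)
print(out.shape)
```

The output has the same shape as the input, so the network never sees a different image size, only shifted and mirrored versions of the training images.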

----------------------
#4. CIFAR-10 Dataset and Data Loaders

    train_dataset = torchvision.datasets.CIFAR10(root='../../data/',
                                                 train=True,
                                                 transform=transform,
                                                 download=True)

    test_dataset = torchvision.datasets.CIFAR10(root='../../data/',
                                                train=False,
                                                transform=transforms.ToTensor())

    train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                               batch_size=batch_size,
                                               shuffle=True)

    test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                              batch_size=batch_size,
                                              shuffle=False)

##Purpose:
This code loads the CIFAR-10 dataset, which consists of 60,000 32x32 color images in 10 classes, split into 50,000 training images and 10,000 test images.

##train_dataset:
Loads the training set and applies the defined transformations.

##test_dataset:
Loads the test set without augmentations.

##train_loader and test_loader:
Create data loaders to efficiently iterate over the datasets in batches during training and testing.

##Reasoning:
Data loaders provide a convenient way to access and manage the dataset during training and evaluation.

They handle batching and shuffling. They do not move data to the device; in this notebook that is done inside the training and test loops with .to(device).
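The batching behaviour can be tried without downloading CIFAR-10. The sketch below uses random tensors as a stand-in dataset (an assumption made purely for illustration) and shows that the transfer to the device happens in the loop, not in the loader.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Stand-in for CIFAR-10: 250 random "images" with random labels in [0, 10).
images = torch.randn(250, 3, 32, 32)
labels = torch.randint(0, 10, (250,))
loader = DataLoader(TensorDataset(images, labels), batch_size=100, shuffle=True)

batch_sizes = []
for x, y in loader:
    # The loader yields CPU tensors; moving them is the caller's job.
    x, y = x.to(device), y.to(device)
    batch_sizes.append(x.size(0))

print(len(loader), batch_sizes)  # 3 [100, 100, 50]
```

With 250 samples and a batch size of 100, the last batch is smaller because `drop_last` defaults to `False`.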

![resnetimage](https://user-images.githubusercontent.com/30661597/78585170-f4ac7c80-786b-11ea-8b00-8b751c65f5ca.PNG)

# What is a Residual Network (ResNet)?

----------------
---------------------

A Residual Network (ResNet) is a type of deep convolutional neural network (CNN) architecture that introduces the concept of skip connections or residual connections.

A block with a skip connection computes F(x) + x, so its layers only need to learn the residual F(x) = H(x) - x relative to the identity, rather than the full mapping H(x) from inputs to outputs.

-------------------------------
#Key Idea behind ResNets:

Traditional deep neural networks can suffer from the vanishing gradient problem, where gradients become very small during backpropagation, hindering the training of deeper layers.

ResNets address this issue by using skip connections to bypass some layers and directly connect earlier layers to later layers.

This allows gradients to flow more easily through the network, enabling the training of much deeper networks.
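A toy example, under the simplifying assumption that the learned branch is an elementwise multiplication by a weight `w`, illustrates why the skip connection keeps gradients alive: even when the branch contributes nothing, the identity path still passes a gradient of 1 back to the input.

```python
import torch

x = torch.ones(3, requires_grad=True)
w = torch.zeros(3)  # a branch whose weights pass no gradient to x

# Plain path: y = F(x). No gradient reaches x.
grad_plain = torch.autograd.grad((w * x).sum(), x)[0]

# Residual path: y = F(x) + x. The identity term contributes a gradient of 1.
grad_residual = torch.autograd.grad((w * x + x).sum(), x)[0]

print(grad_plain)     # tensor([0., 0., 0.])
print(grad_residual)  # tensor([1., 1., 1.])
```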

---------------------------
#Structure of a Residual Block:

The fundamental building block of a ResNet is the residual block.

A typical residual block consists of:

##Two convolutional layers:
These layers perform feature extraction.

##Batch normalization:
This helps stabilize training and improve performance.

##ReLU activation:
This introduces non-linearity.

##Skip connection:
This adds the input of the block to the output, creating a shortcut path for information to flow.

--------------------------
#Benefits of ResNets:

##Enable training of very deep networks:
ResNets have been successfully used to train networks with hundreds or even thousands of layers.

##Improved performance:
ResNets have achieved state-of-the-art results on various image recognition tasks.

##Easier optimization:
The skip connections help alleviate the vanishing gradient problem, making ResNets easier to train.

--------------------------
#Applications of ResNets:

ResNets have been widely used in various computer vision tasks, including:

Image classification

Object detection

Semantic segmentation

Image generation

In [None]:
def conv3x3(in_channels, out_channels, stride=1):
    return nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)

class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(ResidualBlock, self).__init__()
        self.conv1 = conv3x3(in_channels, out_channels, stride)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(out_channels, out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.downsample = downsample

    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        if self.downsample:
            residual = self.downsample(x)
        out += residual
        out = self.relu(out)
        return out

Define the conv3x3 function and the ResidualBlock class:

--------------------
#1. conv3x3 Function

    def conv3x3(in_channels, out_channels, stride=1):
        return nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)

##Purpose:
This function creates a 3x3 convolutional layer with the specified input and output channels, stride, and padding.

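It sets bias=False to disable the bias term of the convolutional layer. Every convolution in this model is followed by batch normalization, which has its own learnable shift, so a separate convolution bias would be redundant.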

##Reasoning:
It provides a convenient way to create a standard 3x3 convolutional layer, commonly used in convolutional neural networks (CNNs).
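The spatial size produced by such a layer follows the standard convolution output formula. The helper below is a small sketch (the name `conv_out_size` is illustrative) that applies it to the sizes used in this notebook.

```python
def conv_out_size(size, kernel_size=3, stride=1, padding=1):
    """Spatial output size of a convolution with dilation 1."""
    return (size + 2 * padding - kernel_size) // stride + 1

print(conv_out_size(32, stride=1))  # 32: stride 1 keeps the resolution
print(conv_out_size(32, stride=2))  # 16: stride 2 halves it
print(conv_out_size(16, stride=2))  # 8
```

With kernel_size=3 and padding=1, a stride of 1 preserves the feature map size, which is what lets the block add its input to its output.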

-------------------------
#2. ResidualBlock Class

    class ResidualBlock(nn.Module):
        def __init__(self, in_channels, out_channels, stride=1, downsample=None):
            super(ResidualBlock, self).__init__()
            self.conv1 = conv3x3(in_channels, out_channels, stride)
            self.bn1 = nn.BatchNorm2d(out_channels)
            self.relu = nn.ReLU(inplace=True)
            self.conv2 = conv3x3(out_channels, out_channels)
            self.bn2 = nn.BatchNorm2d(out_channels)
            self.downsample = downsample

        def forward(self, x):
            residual = x
            out = self.conv1(x)
            out = self.bn1(out)
            out = self.relu(out)
            out = self.conv2(out)
            out = self.bn2(out)
            if self.downsample:
                residual = self.downsample(x)
            out += residual
            out = self.relu(out)
            return out

##Purpose:
This class defines a residual block, a key building block in ResNet architectures.

It consists of two convolutional layers, batch normalization, and ReLU activation.

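The downsample argument is an optional module applied to the input on the shortcut path. It is needed whenever the block changes the number of channels or the spatial size (a stride greater than 1), because the addition out += residual requires both tensors to have the same shape.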

##Reasoning:
Residual blocks allow for training very deep neural networks by introducing skip connections that help alleviate the vanishing gradient problem.
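The need for the shortcut projection can be seen from the tensor shapes alone. The sketch below builds only the first convolution of a block and a projection made of a strided 3x3 convolution plus batch normalization, the same combination this notebook's `make_layer` uses for `downsample`. The variable names are illustrative.

```python
import torch
import torch.nn as nn

x = torch.randn(2, 16, 32, 32)  # batch of 2, 16 channels, 32x32 feature maps

# Main path of a block that doubles the channels and halves the resolution.
main_path = nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1, bias=False)

# Shortcut projection so the input matches the main path's output shape.
downsample = nn.Sequential(
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(32),
)

out = main_path(x)
shortcut = downsample(x)
result = out + shortcut  # adding x directly would fail: the shapes differ

print(x.shape, out.shape, result.shape)
```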

----------------------
#In summary,
the conv3x3 function creates a standard 3x3 convolutional layer, while the ResidualBlock class defines a residual block using two convolutional layers, batch normalization, and ReLU activation.

These components are fundamental building blocks for constructing ResNet architectures.



In [None]:
class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=10):
        super(ResNet, self).__init__()
        self.in_channels = 16
        self.conv = conv3x3(3, 16)
        self.bn = nn.BatchNorm2d(16)
        self.relu = nn.ReLU(inplace=True)
        self.layer1 = self.make_layer(block, 16, layers[0])
        self.layer2 = self.make_layer(block, 32, layers[1], 2)
        self.layer3 = self.make_layer(block, 64, layers[2], 2)
        self.avg_pool = nn.AvgPool2d(8)
        self.fc = nn.Linear(64, num_classes)

    def make_layer(self, block, out_channels, blocks, stride=1):
        downsample = None
        if (stride != 1) or (self.in_channels != out_channels):
            downsample = nn.Sequential(conv3x3(self.in_channels, out_channels, stride=stride),
                                       nn.BatchNorm2d(out_channels))
        layers = []
        layers.append(block(self.in_channels, out_channels, stride, downsample))
        self.in_channels = out_channels
        for i in range(1, blocks):
            layers.append(block(out_channels, out_channels))
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv(x)
        out = self.bn(out)
        out = self.relu(out)
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.avg_pool(out)
        out = out.view(out.size(0), -1)
        out = self.fc(out)
        return out

# Define the ResNet Class
This class defines the overall architecture of the ResNet model.

It utilizes the ResidualBlock we discussed earlier as its building block.

--------------------
##__init__ method:

Initializes the ResNet model with the specified block (which is ResidualBlock in this case), layers (a list specifying the number of residual blocks in each layer), and num_classes (the number of output classes).

Sets up the initial convolutional layer (self.conv), batch normalization (self.bn), and ReLU activation (self.relu).

Creates three layers (self.layer1, self.layer2, self.layer3) using the make_layer method.

Adds an average pooling layer (self.avg_pool) and a fully connected layer (self.fc) for the final classification.

--------------------
##make_layer method:

This method is responsible for creating a layer of residual blocks.

It takes the block type, out_channels, blocks (number of blocks in the layer), and stride as input.

If the stride is not 1 or the number of channels changes, it creates a downsampling module (downsample) so that the shortcut matches the shape of the block's output.

Only the first block in the layer receives the stride and the downsample module; the remaining blocks keep the shape unchanged.

It then collects the residual blocks in a list and returns them as a sequential module.

--------------------
##forward method:

This method defines the forward pass of the ResNet model.

It takes the input tensor x and passes it through the various layers of the network:

Initial convolutional layer, batch normalization, and ReLU activation.

Three layers of residual blocks.

Average pooling layer.

Reshaping the output for the fully connected layer.

Fully connected layer for classification.

Finally, it returns the output of the fully connected layer.

---------------------
#Detailed Analysis
    super(ResNet, self).__init__()
##Purpose:
This line calls the constructor of the parent class (nn.Module) to properly initialize the ResNet object as a PyTorch module.

##Reasoning:
It's essential for setting up the underlying infrastructure of the module and ensuring that all the necessary attributes and methods are inherited.

    self.in_channels = 16
##Purpose:
This line initializes the in_channels attribute to 16.

This attribute keeps track of the number of input channels for the current layer or block.

##Reasoning:
It's crucial for defining the input dimensions of subsequent layers and ensuring compatibility between different parts of the network.

    self.conv = conv3x3(3, 16)
##Purpose:
This line creates the initial convolutional layer of the ResNet using the conv3x3 function.

It takes 3 input channels (for RGB images) and produces 16 output channels.

##Reasoning:
The initial convolutional layer is responsible for extracting basic features from the input image.


    self.bn = nn.BatchNorm2d(16)
##Purpose:
This line creates a batch normalization layer (nn.BatchNorm2d) with 16 channels, corresponding to the output channels of the previous convolutional layer.

##Reasoning:
Batch normalization helps stabilize training and improve performance by normalizing the activations within each batch.

    self.relu = nn.ReLU(inplace=True)
##Purpose:
This line creates a ReLU activation function (nn.ReLU).

The inplace=True argument means that the activation is applied directly to the input tensor, saving memory.

##Reasoning:
ReLU introduces non-linearity into the network, which is essential for learning complex patterns.

    self.layer1 = self.make_layer(block, 16, layers[0])
##Purpose:
This line creates the first layer of residual blocks using the make_layer method. It passes the block type (ResidualBlock), the desired output channels (16), and the number of blocks in the layer (specified by layers[0]).

##Reasoning:
This layer further extracts features and introduces skip connections for deeper learning.

    self.layer2 = self.make_layer(block, 32, layers[1], 2)
##Purpose:
Similar to layer1, this line creates the second layer of residual blocks with 32 output channels and a stride of 2.

##Reasoning:
Doubling the channels from 16 to 32 while the stride of 2 halves the feature maps from 32x32 to 16x16 lets the network learn more complex features at a coarser scale.

    self.layer3 = self.make_layer(block, 64, layers[2], 2)
##Purpose:
This line creates the third layer of residual blocks with 64 output channels and a stride of 2.

##Reasoning:
With 64 channels and another stride of 2, the feature maps shrink from 16x16 to 8x8. This layer produces the most abstract features, which feed the final classification.

    self.avg_pool = nn.AvgPool2d(8)
##Purpose:
This line creates an average pooling layer (nn.AvgPool2d) with a kernel size of 8.

##Reasoning:
For 32x32 inputs the feature maps entering this layer are 8x8, so an 8x8 average pool reduces each of the 64 channels to a single value, giving a 64-dimensional vector for the fully connected layer.

    self.fc = nn.Linear(64, num_classes)
##Purpose:
This line creates the fully connected layer (nn.Linear) that maps the final feature representation to the desired number of output classes.

##Reasoning:
The fully connected layer performs the final classification based on the learned features.

----------------
#In essence,
the ResNet class assembles the different components (convolutional layers, residual blocks, pooling, and fully connected layers) to form the complete ResNet architecture.

The forward method defines how data flows through the network during inference or training.
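The last three steps of the forward pass can be traced on a dummy tensor. The sketch assumes the output of layer3 has shape (N, 64, 8, 8), which is what 32x32 inputs produce after two stride-2 layers; `features` is an illustrative stand-in for that output.

```python
import torch
import torch.nn as nn

features = torch.randn(4, 64, 8, 8)  # assumed output of layer3 for 32x32 inputs

avg_pool = nn.AvgPool2d(8)
fc = nn.Linear(64, 10)

pooled = avg_pool(features)             # (4, 64, 1, 1)
flat = pooled.view(pooled.size(0), -1)  # (4, 64)
logits = fc(flat)                       # (4, 10)

print(pooled.shape, flat.shape, logits.shape)
```

Because the pooling kernel is fixed at 8, this head only works as written for inputs whose final feature maps are 8x8.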

# Best Practice Class Definition I

In [None]:
class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=10, initial_channels=16):
        super(ResNet, self).__init__()
        self.in_channels = initial_channels  # Using the parameter
        self.conv = conv3x3(3, self.in_channels)  # Referencing the attribute
        # ... rest of the code ...

# Creating a ResNet with 32 initial channels:
model = ResNet(ResidualBlock, [2, 2, 2], initial_channels=32)

# Best Practice Class Definition II

In [None]:
class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=10, input_shape=(3, 32, 32)):
        super(ResNet, self).__init__()
        self.in_channels = input_shape[0] * 2  # Example calculation
        # ... rest of the code ...

In [None]:
model = ResNet(ResidualBlock, [2, 2, 2]).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# 1. model = ResNet(ResidualBlock, [2, 2, 2]).to(device)

##Purpose:
This line creates an instance of your ResNet model, transfers it to the specified device (GPU if available, otherwise CPU), and assigns it to the variable model.


##ResNet(ResidualBlock, [2, 2, 2]):
This part instantiates your ResNet model.

 You're using the ResidualBlock as the building block for your ResNet, and the list [2, 2, 2] specifies the number of residual blocks in each of the three layers of your ResNet architecture.

##Meaning of "2 Residual Blocks"

In your case, "2 residual blocks" in a layer simply means that the layer contains two consecutive instances of the ResidualBlock class.

Think of it like this:

    Layer 1:
       ResidualBlock 1
       ResidualBlock 2

So, when the input data flows through Layer 1, it will pass through these two ResidualBlocks in sequence.

Each ResidualBlock will perform its internal operations (convolution, batch normalization, ReLU, and skip connection) before passing the output to the next ResidualBlock within the layer.

Visualizing it

You can imagine the structure like this:

    Layer 1:
       Input --> [ResidualBlock 1] --> [ResidualBlock 2] --> Output of Layer 1

The same principle applies to Layer 2 and Layer 3, which also have 2 residual blocks each.

Importance of the Number of Blocks

The number of residual blocks in each layer influences the depth and complexity of the model.

 More blocks mean a deeper network capable of learning more intricate patterns. However, increasing the number of blocks also increases the computational cost.

Therefore, the choice of the number of residual blocks is a design decision based on the trade-off between model capacity and computational resources.
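The depth implied by a given block configuration can be counted directly. The helper below is a sketch with an illustrative name; it counts the initial convolution, the two convolutions in each residual block, and the final fully connected layer, and it deliberately leaves out the convolutions on the downsample shortcuts.

```python
def count_weighted_layers(layers, convs_per_block=2):
    """Initial conv + convs inside residual blocks + final fully connected layer."""
    return 1 + convs_per_block * sum(layers) + 1

print(count_weighted_layers([2, 2, 2]))  # 14
print(count_weighted_layers([3, 3, 3]))  # 20
```

With n blocks in each of the three layers this gives 6n + 2, so [2, 2, 2] corresponds to a 14-layer network and [3, 3, 3] to a 20-layer one.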

##Balancing Depth and Complexity

The choice of the number of residual blocks in each layer of a ResNet is often a design decision based on balancing model depth, complexity, and computational cost.

Here's why 2 residual blocks might have been chosen in this case:

##Sufficient Depth:
Using 2 residual blocks per layer provides a reasonable level of depth to allow the network to learn hierarchical features effectively.

Deeper networks can capture more complex patterns in the data, but they can also be more difficult to train and require more resources.

##Manageable Complexity:
With 2 residual blocks per layer, the overall model complexity is not overly high.

This makes the model easier to train and reduces the risk of overfitting, especially when dealing with limited data or computational resources.

##Computational Efficiency:
Keeping the number of blocks relatively low (like 2) helps maintain a balance between model performance and computational efficiency.

Using more blocks would increase the computational cost and training time.

##Experimentation and Architectural Variations

In practice, the number of residual blocks in each layer can be adjusted based on the specific task, dataset size, and available resources.

Researchers often experiment with different architectures to find the optimal configuration.

For example, some ResNet variants use more blocks in later layers to progressively increase the network's depth and feature extraction capabilities.

Others might employ a bottleneck architecture with a narrower middle layer within each residual block to further improve efficiency.

In the case of your architecture, using 2 residual blocks per layer appears to be a reasonable starting point that provides a good balance between depth, complexity, and computational considerations.

It allows the model to learn effective features without becoming too unwieldy.

Ultimately, the best choice for the number of residual blocks depends on the specific context and desired performance characteristics.

Experimentation and empirical evaluation are often key to finding the most effective architecture.

In summary, "2 residual blocks" in a ResNet layer indicates that the layer contains two consecutive ResidualBlock units, contributing to the overall depth and complexity of the network.

##.to(device):
This is a crucial step.

It moves your model to the device you selected earlier (using torch.device).

By doing this, you enable GPU acceleration if a CUDA-enabled GPU is available, which can significantly speed up the training process.

##model = ...:
Finally, the created and device-placed model is assigned to the variable model, which you'll use to interact with your network during training and inference.

-----------------------
#2. criterion = nn.CrossEntropyLoss()

##Purpose:
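This line defines the loss function you'll use to train your model.

nn.CrossEntropyLoss is a common choice for multi-class classification problems. It expects the raw, unnormalized scores (logits) from the model together with integer class labels, and applies log-softmax internally, which is why the network ends with a plain linear layer and no softmax.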

##Reasoning:
The loss function measures the difference between your model's predictions and the actual target labels.

It guides the optimization process, helping your model learn to make better predictions by minimizing this difference.
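The relationship between the logits and the loss can be checked on a tiny example. The values below are made up for illustration; the point is that nn.CrossEntropyLoss gives the same result as a log-softmax followed by the negative log-likelihood loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])  # raw scores, shape (N, C)
labels = torch.tensor([0, 2])             # class indices, shape (N,)

loss = nn.CrossEntropyLoss()(logits, labels)

# Equivalent computation spelled out in two steps.
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(loss.item(), manual.item())
```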

--------------------------
#3. optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

##Purpose:
This line creates an optimizer, which is responsible for updating the model's parameters during training to minimize the loss. You're using the Adam optimizer here.

##Reasoning:

##torch.optim.Adam:
Adam is a popular optimization algorithm known for its efficiency and effectiveness in many deep learning tasks.

##model.parameters():
This tells the optimizer which parameters of your model it should update during training.

##lr=learning_rate:
This sets the learning rate, a hyperparameter that controls the step size the optimizer takes when updating the model's parameters.

You defined learning_rate earlier in your code.

In [None]:
decay = 0
model.train()
for epoch in range(num_epochs):

    # Decay the learning rate every 20 epochs
    if (epoch+1) % 20 == 0:
        decay+=1
        optimizer.param_groups[0]['lr'] = learning_rate * (0.5**decay)
        print("The new learning rate is {}".format(optimizer.param_groups[0]['lr']))

    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i+1) % 100 == 0:
            print ("Epoch [{}/{}], Step [{}/{}] Loss: {:.4f}"
                   .format(epoch+1, num_epochs, i+1, len(train_loader), loss.item()))

# 1. decay = 0
##Purpose:
Initializes a variable decay to 0. This variable will be used to control the learning rate decay.

##Reasoning:
Learning rate decay is a common technique to improve training by gradually reducing the learning rate as training progresses.

This helps the model converge to a better solution.

------------------
#2. model.train()

##Purpose:
Sets the model to training mode.

##Reasoning:
This is important because some layers, like dropout and batch normalization, behave differently during training and evaluation.

In training mode, the batch normalization layers in this model normalize each batch with its own mean and variance and update their running statistics. The model contains no dropout layers.

------------------
#3. for epoch in range(num_epochs):

##Purpose:
This outer loop iterates over the specified number of epochs (num_epochs).

##Reasoning:
An epoch represents one complete pass through the entire training dataset.

Training for multiple epochs allows the model to learn from the data iteratively.

----------
#4. if (epoch+1) % 20 == 0:

##Purpose:
This conditional statement checks if the current epoch number (epoch+1) is divisible by 20.

##Reasoning:
It's used to implement learning rate decay every 20 epochs.

-----------------------
#5. decay+=1

##Purpose:
Increments the decay variable by 1.

##Reasoning:
This is used to progressively reduce the learning rate as training continues.

-----------------------
#6. `optimizer.param_groups[0]['lr'] = learning_rate * (0.5**decay)`

##Purpose:
Updates the learning rate of the optimizer.

##Reasoning:
It multiplies the initial learning rate (learning_rate) by 0.5 raised to the power of decay.

This effectively halves the learning rate every 20 epochs. With num_epochs = 25, that happens once: from epoch 20 onward the learning rate is 0.0005.
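The schedule can be replayed without a model or optimizer. This sketch repeats the decay logic from the training loop and records the learning rate used in each epoch; `lr_per_epoch` is an illustrative name.

```python
learning_rate = 0.001
num_epochs = 25

decay = 0
lr_per_epoch = []
for epoch in range(num_epochs):
    # Same condition as in the training loop: fires when epoch + 1 is 20, 40, ...
    if (epoch + 1) % 20 == 0:
        decay += 1
    lr_per_epoch.append(learning_rate * (0.5 ** decay))

# Epoch 19 still uses the initial rate; epochs 20 to 25 use half of it.
print(lr_per_epoch[18], lr_per_epoch[19], lr_per_epoch[24])
```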

---------------
#7. print("The new learning rate is {}".format(optimizer.param_groups[0]['lr']))

##Purpose:
Prints the new learning rate to the console.

##Reasoning:
This helps you monitor the learning rate decay during training.

-----------------------
#8. for i, (images, labels) in enumerate(train_loader):

##Purpose:
This inner loop iterates over the training data in batches.

##Reasoning:
train_loader is a data loader that provides batches of images and their corresponding labels.

--------------------
#9.  images = images.to(device)
#10. labels = labels.to(device)

##Purpose:
Moves the images and labels to the specified device (GPU if available, otherwise CPU).

##Reasoning:
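The input tensors must be on the same device as the model's parameters; otherwise the forward pass raises an error. The data loader yields CPU tensors, so this transfer is done for every batch.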

-------------------
#11. outputs = model(images)

##Purpose:
Passes the images through the model to obtain predictions.

##Reasoning:
This is the forward pass of the model, where it computes the output based on the input images.

--------------------
#12. loss = criterion(outputs, labels)

##Purpose:
Calculates the loss between the model's predictions (outputs) and the actual labels (labels) using the specified loss function (criterion).

##Reasoning:
The loss function quantifies the error made by the model, and the goal of training is to minimize this loss.

---------------------
#13. optimizer.zero_grad()

##Purpose:
Resets the gradients of the model's parameters to zero.

##Reasoning:
This is necessary before computing the gradients for the current batch to avoid accumulating gradients from previous batches.
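The accumulation behaviour is easy to observe on a single parameter. In this sketch `w` is an illustrative stand-in for a model parameter, and SGD is used only because the choice of optimizer does not matter for the demonstration.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

(w * 3).sum().backward()
first = w.grad.clone()         # tensor([3., 3.])

(w * 3).sum().backward()       # no zero_grad(): the new gradient is added
accumulated = w.grad.clone()   # tensor([6., 6.])

optimizer.zero_grad()          # clear the stored gradients
(w * 3).sum().backward()
fresh = w.grad.clone()         # tensor([3., 3.])

print(first, accumulated, fresh)
```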

-----------------------
#14. loss.backward()

##Purpose:
Computes the gradients of the loss with respect to the model's parameters.

##Reasoning:
This is the backpropagation step, where the gradients are calculated to determine how to update the parameters to reduce the loss.

-----------------------
#15. optimizer.step()

##Purpose:
Updates the model's parameters based on the computed gradients.

##Reasoning:
The optimizer uses the gradients to adjust the parameters in a way that minimizes the loss.

-------------------------
#16. if (i+1) % 100 == 0:

##Purpose:
This conditional statement checks if the current step number (i+1) is divisible by 100.

##Reasoning:
It's used to print the training progress every 100 steps.

--------------------
#17. print ("Epoch [{}/{}], Step [{}/{}] Loss: {:.4f}" .format(epoch+1, num_epochs, i+1, len(train_loader), loss.item()))

##Purpose:
Prints the current epoch, step, and loss to the console.

##Reasoning:
This provides feedback on the training progress and helps you monitor the loss over time.

In [None]:
#Test the model
model.eval()
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Accuracy of the model on the test images: {} %'.format(100 * correct / total))

#1. model.eval()

##Purpose:
Sets the model to evaluation mode.

##Reasoning:
This is important because some layers, like dropout and batch normalization, behave differently during training and evaluation.

In evaluation mode, dropout is disabled and batch normalization uses the running mean and variance collected during training instead of per-batch statistics.

----------------------
#2. with torch.no_grad():

##Purpose:
Disables gradient computation.

##Reasoning:
During evaluation, we don't need to compute gradients since we're not updating the model's parameters.

This saves memory and computation time.
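Both effects can be seen on a single batch normalization layer. The sketch below uses nn.BatchNorm1d on a two-sample batch purely for brevity; the same train/eval distinction applies to the nn.BatchNorm2d layers in the model.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(2)
x = torch.tensor([[1.0, 10.0],
                  [3.0, 30.0]])

bn.train()
bn(x)                                    # training mode updates the running statistics
mean_after_train = bn.running_mean.clone()

bn.eval()
with torch.no_grad():
    out = bn(x)                          # eval mode reads the statistics, no update
mean_after_eval = bn.running_mean.clone()

print(mean_after_train)                  # tensor([0.2000, 2.0000])
print(out.requires_grad)                 # False
```

The running mean moves a fraction (the default momentum of 0.1) towards the batch mean during training and stays fixed in evaluation mode, while torch.no_grad() keeps the output out of the autograd graph.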

------------------------
#3. correct = 0
#4. total = 0

##Purpose:
Initializes variables to store the number of correct predictions and the total number of samples, respectively.

--------------------
#5. for images, labels in test_loader:

##Purpose:
Iterates over the test dataset in batches.

##Reasoning:
test_loader is a data loader that provides batches of images and their corresponding labels from the test dataset.

-------------------
#6. images = images.to(device)
#7. labels = labels.to(device)

##Purpose:
Moves the images and labels to the specified device (GPU if available, otherwise CPU).

##Reasoning:
As in training, the inputs must be on the same device as the model's parameters for the forward pass to run.

--------------------------
#8. outputs = model(images)

##Purpose:
Passes the images through the model to obtain predictions.

##Reasoning:
This is the forward pass of the model, where it computes the output based on the input images.

--------------------
#9. _, predicted = torch.max(outputs.data, 1)

##Purpose:
Gets the predicted class labels.

##Reasoning:
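torch.max with dim=1 returns the maximum value and its index for each row of the outputs tensor. The outputs are raw class scores (logits) rather than probabilities, but the index of the largest logit is also the most probable class. The .data attribute is a legacy way of detaching a tensor from the graph and is not needed inside torch.no_grad().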

The _ is used to discard the actual maximum values, and predicted stores the indices, which correspond to the predicted class labels.

------------------
#10. total += labels.size(0)

##Purpose:
Updates the total number of samples processed.

##Reasoning:
labels.size(0) gives the number of samples in the current batch, which is added to the total count.

--------------------------
#11. correct += (predicted == labels).sum().item()

##Purpose:
Updates the number of correct predictions.

##Reasoning:
(predicted == labels) creates a tensor of Boolean values indicating whether each prediction is correct.

sum().item() calculates the total number of correct predictions in the batch and adds it to the correct count.

------------------------------
#12. print('Accuracy of the model on the test images: {} %'.format(100 * correct / total))

##Purpose:
Prints the accuracy of the model on the test dataset.

##Reasoning:

It calculates the accuracy as the percentage of correct predictions out of the total number of samples and displays it to the user.
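The accuracy computation can be followed on a hand-made batch. The logits and labels below are invented for illustration; the steps are the same as in the test loop.

```python
import torch

outputs = torch.tensor([[0.1, 2.0, -1.0],
                        [1.5, 0.3, 0.2],
                        [0.0, 0.1, 0.9]])  # logits for 3 samples and 3 classes
labels = torch.tensor([1, 0, 1])

_, predicted = torch.max(outputs, 1)       # index of the largest logit per row
correct = (predicted == labels).sum().item()
total = labels.size(0)
accuracy = 100 * correct / total

print(predicted, correct, total)           # tensor([1, 0, 2]) 2 3
```

Applying softmax first would not change `predicted`, since softmax preserves the ordering of the scores within each row.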