* torch: Core PyTorch library, which provides tensor operations.
* torch.nn: Contains neural network layers like Conv2d, Linear, etc.
* torch.nn.functional: Provides function versions of operations such as activation functions (F.relu) and pooling.
* torch.optim: Contains optimization algorithms such as SGD (Stochastic Gradient Descent).
* torchvision: Useful for image datasets and transformations. Here, we use it to load the MNIST dataset and apply transformations like normalization.


1. SimpleCNN(nn.Module): We define a class SimpleCNN, which inherits from nn.Module (the base class for all neural networks in PyTorch). This will allow us to build a custom architecture.
2. self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1):

    This is a convolutional layer that takes a grayscale image (1 channel) and applies 32 filters, each of size 3x3.
    stride=1: The filter moves 1 pixel at a time.
    padding=1: Adds a 1-pixel border of zeros around the image so that, with a 3x3 kernel and stride 1, the output keeps the same spatial dimensions as the input (28x28 for MNIST).

3. self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1):

    A second convolutional layer that takes the 32 feature maps from the first layer and applies 64 filters of size 3x3, with the same stride and padding as conv1.

4. self.pool = nn.MaxPool2d(2, 2):

    A max-pooling layer with a 2x2 window and stride 2. It halves the height and width of each feature map (28x28 to 14x14 after the first pooling step, 14x14 to 7x7 after the second), which downsamples the data and reduces computation.

5. self.fc1 = nn.Linear(64 * 7 * 7, 128):

    A fully connected layer that takes the flattened feature maps and produces 128 features. After the second pooling step the feature maps have shape 64x7x7, so the input to this layer has 64 * 7 * 7 = 3136 values.

6. self.fc2 = nn.Linear(128, 10):

    Another fully connected layer that maps the 128 features from the previous layer to 10 output units (corresponding to 10 classes, e.g., digits 0–9 for MNIST).
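
The sizes quoted in the list above can be checked with the standard output-size formula floor((n + 2p - k) / s) + 1. The sketch below is plain Python arithmetic rather than PyTorch code; the helper name `out_size` is made up for illustration, and a 28x28 MNIST input is assumed.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output size of a conv or pooling layer along one spatial dimension."""
    return (n + 2 * padding - kernel) // stride + 1

size = 28                                              # MNIST images are 28x28
size = out_size(size, kernel=3, stride=1, padding=1)   # conv1 keeps 28
size = out_size(size, kernel=2, stride=2)              # pool halves to 14
size = out_size(size, kernel=3, stride=1, padding=1)   # conv2 keeps 14
size = out_size(size, kernel=2, stride=2)              # pool halves to 7

flat_features = 64 * size * size                       # input width of fc1
print(size, flat_features)  # 7 3136
```

If the input resolution or the padding changes, the first argument of fc1 has to change with it.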

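x = self.pool(F.relu(self.conv1(x))):

    First, we apply the first convolutional layer conv1, followed by the ReLU activation function (F.relu), and then we downsample using max-pooling (self.pool). For a batch of N images, the shape goes from (N, 1, 28, 28) to (N, 32, 14, 14).

x = self.pool(F.relu(self.conv2(x))):

    The same conv -> ReLU -> pool pattern is repeated with conv2, giving a tensor of shape (N, 64, 7, 7).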

x = x.view(-1, 64 * 7 * 7):

    After the convolutional and pooling layers, the data is still a 4D tensor of shape (N, 64, 7, 7). This line flattens each sample's feature maps into a 1D vector of 3136 values so they can be passed to the fully connected layers; the -1 tells PyTorch to infer the batch dimension N.

x = F.relu(self.fc1(x)):

    The flattened vector is passed through the first fully connected layer fc1, followed by the ReLU activation.

x = self.fc2(x):

    The result from fc1 is passed to the final fully connected layer fc2, which outputs 10 class scores (logits), one for each digit (0-9).
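
One way to sanity-check the forward pass described above is to push a dummy batch through standalone copies of the layers and record the shape after each step. This is only a sketch: it assumes PyTorch is installed, and the batch of 4 all-zero images and the `shapes` dictionary are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Standalone copies of the layers described above (randomly initialised).
conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1)
conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
pool = nn.MaxPool2d(2, 2)
fc1 = nn.Linear(64 * 7 * 7, 128)
fc2 = nn.Linear(128, 10)

x = torch.zeros(4, 1, 28, 28)    # hypothetical batch: 4 blank 28x28 images
shapes = {"input": tuple(x.shape)}

with torch.no_grad():            # shape check only, no gradients needed
    x = pool(F.relu(conv1(x)))
    shapes["conv1 + pool"] = tuple(x.shape)
    x = pool(F.relu(conv2(x)))
    shapes["conv2 + pool"] = tuple(x.shape)
    x = x.view(-1, 64 * 7 * 7)   # -1 lets PyTorch infer the batch dimension
    shapes["flatten"] = tuple(x.shape)
    x = fc2(F.relu(fc1(x)))
    shapes["logits"] = tuple(x.shape)

for name, shape in shapes.items():
    print(f"{name:>12}: {shape}")
```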

batch_size: The number of training samples to work through before updating the model's weights.
learning_rate: Controls how much to adjust the model's weights during each optimization step.
epochs: The number of complete passes through the training dataset.
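
To get a feel for how these three settings interact, the arithmetic below counts the weight updates they imply. It is a rough sketch that assumes the 60,000-image MNIST training split and a loader that keeps the final, smaller batch (the DataLoader default).

```python
import math

num_samples = 60_000   # size of the MNIST training split
batch_size = 64
epochs = 10

# One optimizer step per batch; the last batch of an epoch may be smaller.
steps_per_epoch = math.ceil(num_samples / batch_size)
last_batch_size = num_samples - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_updates)  # 938 32 9380
```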

transforms.Compose: Applies transformations to the images. Here:

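    ToTensor(): Converts each image to a float tensor of shape (C, H, W) and scales pixel values from [0, 255] to [0, 1].
    Normalize((0.5,), (0.5,)): Computes (x - 0.5) / 0.5 for every pixel, which maps the [0, 1] range to [-1, 1]. The 0.5 values are convenient constants passed as the mean and standard deviation, not statistics measured on MNIST.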
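
The effect of the two transforms on a single pixel can be worked through by hand. The functions below are illustrative stand-ins written in plain Python (their names are made up); they mimic the arithmetic only and are not the torchvision implementations.

```python
def to_unit_range(pixel):
    """Mimic the scaling ToTensor applies to 8-bit pixels: 0..255 -> 0.0..1.0."""
    return pixel / 255.0

def normalize(value, mean=0.5, std=0.5):
    """Mimic Normalize for one channel: (value - mean) / std."""
    return (value - mean) / std

# Black, mid-grey and white pixels end up near -1, 0 and 1 respectively.
for pixel in (0, 128, 255):
    print(pixel, round(normalize(to_unit_range(pixel)), 4))
```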

datasets.MNIST: Loads the MNIST dataset. train=True selects the training split, and download=True downloads the data into root if it is not already present.

torch.utils.data.DataLoader: Loads the data in batches of batch_size samples; shuffle=True reshuffles the data at the start of every epoch.


model = SimpleCNN(): Creates an instance of our SimpleCNN class.
criterion = nn.CrossEntropyLoss(): We use the cross-entropy loss function, suitable for multi-class classification problems.
optimizer = optim.SGD(model.parameters(), lr=learning_rate): Stochastic Gradient Descent updates the model's parameters using the learning rate set above (0.01).

for epoch in range(epochs): Loops over the dataset multiple times.
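optimizer.zero_grad(): Clears the gradients left over from the previous iteration. PyTorch accumulates gradients across backward() calls, so they must be reset before each new batch.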
outputs = model(images): Performs a forward pass through the network to get predictions for the current batch of images.
loss = criterion(outputs, labels): Calculates the loss between the predicted outputs and the true labels.
loss.backward(): Performs backpropagation to calculate the gradients for each parameter in the network.
optimizer.step(): Updates the network weights using the computed gradients.

After each epoch, we print the average loss to monitor the training process.
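
Why the gradients need clearing on every iteration is easiest to see on a single scalar. The sketch below assumes PyTorch is installed and uses a made-up one-parameter "model"; resetting `w.grad` by hand stands in for what optimizer.zero_grad() takes care of for every parameter.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

(3 * w).backward()            # d(3w)/dw = 3
first = w.grad.item()         # 3.0

(3 * w).backward()            # no reset: the new gradient is added to the old one
accumulated = w.grad.item()   # 6.0

w.grad.zero_()                # reset the stored gradient by hand
(3 * w).backward()
after_reset = w.grad.item()   # 3.0 again

print(first, accumulated, after_reset)
```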

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

# Define the CNN architecture
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Convolutional layer 1: in_channels=1 (grayscale), out_channels=32, kernel_size=3x3
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1)
        # Convolutional layer 2: in_channels=32, out_channels=64, kernel_size=3x3
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        # Max-pooling layer 2x2
        self.pool = nn.MaxPool2d(2, 2)
        # Fully connected layer 1: 64*7*7 input to 128 output units
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        # Fully connected layer 2: 128 input to 10 output units (for 10 classes)
        self.fc2 = nn.Linear(128, 10)
    
    def forward(self, x):
        # Apply convolutional layers followed by ReLU and max-pooling
        x = self.pool(F.relu(self.conv1(x)))  # Conv1 -> ReLU -> MaxPool
        x = self.pool(F.relu(self.conv2(x)))  # Conv2 -> ReLU -> MaxPool
        # Flatten the feature maps from (N, 64, 7, 7) to (N, 64 * 7 * 7)
        x = x.view(-1, 64 * 7 * 7)
        # Apply fully connected layers with ReLU activation
        x = F.relu(self.fc1(x))  # Fully connected layer 1
        x = self.fc2(x)          # Fully connected layer 2 (output layer)
        return x

# Training settings
batch_size = 64
learning_rate = 0.01
epochs = 10

# Load dataset (MNIST)
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)

# Instantiate the network, loss function, and optimizer
model = SimpleCNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

# Training loop
for epoch in range(epochs):
    running_loss = 0.0
    for images, labels in train_loader:
        # Zero the parameter gradients
        optimizer.zero_grad()
        # Forward pass
        outputs = model(images)
        # Loss calculation
        loss = criterion(outputs, labels)
        # Backward pass and optimization
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"Epoch [{epoch+1}/{epochs}], Loss: {running_loss/len(train_loader):.4f}")

print("Finished Training")


Key Concepts and Notes
1. What is a CNN?

    A Convolutional Neural Network (CNN) is a specialized kind of neural network for processing data with a grid-like topology, such as images.
    CNNs extract features with convolutions: small learned filters slide across the image and respond to edges, textures, and other spatial patterns.
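
To make the idea of a filter responding to an edge concrete, here is a minimal pure-Python sketch of the sliding-window operation (no padding, stride 1). The function name, the 5x5 toy image, and the hand-written vertical-edge kernel are all assumptions for illustration; in a real CNN the kernel weights are learned. Like nn.Conv2d, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def cross_correlate2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# 5x5 toy image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 0, 1, 1] for _ in range(5)]

# Hand-written filter that responds to dark-to-bright vertical edges.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

response = cross_correlate2d(image, kernel)
for row in response:
    print(row)   # [0, 3, 3] for every row
```

The response is zero over the flat dark region and positive wherever the window straddles the edge.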

2. Layers in a CNN

    Convolutional Layer (nn.Conv2d): This layer performs the convolution operation, sliding a kernel/filter over the input image to detect features. In this case:
        conv1: 32 filters of size 3x3 applied to a grayscale image (1 channel), producing 32 feature maps.
        conv2: 64 filters of size 3x3 applied to the 32 feature maps from conv1.
    ReLU Activation Function: F.relu is used to introduce non-linearity into the network. It replaces all negative values in the feature maps with 0 and keeps positive values unchanged.
    Max-Pooling (nn.MaxPool2d): A pooling layer reduces the dimensionality of the feature maps. It downsamples the input by taking the maximum value from a 2x2 region, which helps reduce computational complexity and overfitting.
    Fully Connected Layers (nn.Linear): These layers are used at the end of the CNN to make the final predictions. They take the flattened feature maps from the convolutional layers and output a prediction for each class.
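
ReLU and 2x2 max-pooling are simple enough to write out by hand. The sketch below uses plain Python lists and made-up numbers; it illustrates the arithmetic only and is not how PyTorch implements these operations.

```python
def relu(values):
    """Replace negative entries with 0 and keep the rest unchanged."""
    return [max(0.0, v) for v in values]

def max_pool_2x2(feature_map):
    """2x2 max-pooling with stride 2; expects even height and width."""
    return [
        [
            max(feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1])
            for j in range(0, len(feature_map[0]), 2)
        ]
        for i in range(0, len(feature_map), 2)
    ]

activated = relu([-2.0, 0.0, 3.5])

feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 6],
               [2, 2, 7, 8]]
pooled = max_pool_2x2(feature_map)

print(activated)  # [0.0, 0.0, 3.5]
print(pooled)     # [[4, 2], [2, 8]]
```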

3. Forward Pass

    The input passes through two convolutional layers (conv1 and conv2), with ReLU activations and max-pooling applied after each.
    After the convolutional and pooling layers, the output is flattened (reshaped) into a 1D vector before passing through two fully connected layers to produce the final class scores.

4. Loss Function

    CrossEntropyLoss is commonly used for classification problems. It combines LogSoftmax and NLLLoss in one function, so the model should output raw logits (no softmax layer); the labels here are integer class indices.
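
For a single sample, the quantity being computed can be sketched in a few lines of plain Python. The function below is an illustration with a made-up name, not PyTorch's implementation; it follows the same two steps (log-softmax, then negative log-likelihood) and uses made-up logits.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    peak = max(logits)   # subtract the max before exponentiating, for stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    log_prob = logits[target] - log_sum_exp   # the LogSoftmax step
    return -log_prob                          # the NLLLoss step

uniform = cross_entropy([0.0] * 10, target=3)             # no preference among 10 classes
confident = cross_entropy([0.0] * 9 + [10.0], target=9)   # strongly favours the true class

print(round(uniform, 4))   # 2.3026, i.e. ln(10)
print(confident < 0.001)   # True
```

A freshly initialised 10-class model that has no preference among classes would therefore be expected to start with a loss near 2.30.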

5. Optimizer

    Stochastic Gradient Descent (SGD): This optimizer updates the model's weights to minimize the loss function. The learning rate and momentum control the speed of convergence; the code above sets only lr, so momentum stays at its default of 0.
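
The effect of momentum can be seen on a toy problem. The sketch below minimises f(w) = w**2 in plain Python using the update form documented for optim.SGD (velocity = momentum * velocity + grad, then w = w - lr * velocity). The function name and the numbers are made up for illustration, and the gradient is exact rather than estimated from a mini-batch.

```python
def sgd_steps(w, lr, momentum=0.0, steps=2):
    """Run a few SGD updates on f(w) = w**2 and return the weight after each step."""
    velocity = 0.0
    history = []
    for _ in range(steps):
        grad = 2.0 * w                          # derivative of w**2
        velocity = momentum * velocity + grad   # plain SGD when momentum is 0
        w = w - lr * velocity
        history.append(w)
    return history

plain = sgd_steps(w=1.0, lr=0.1)                     # about [0.8, 0.64]
with_momentum = sgd_steps(w=1.0, lr=0.1, momentum=0.9)  # about [0.8, 0.46]

print([round(v, 4) for v in plain])
print([round(v, 4) for v in with_momentum])
```

With momentum the second step is larger because it reuses part of the previous update.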

6. Training Process

    The training loop iterates through the dataset for a set number of epochs, computes the loss, and updates the network weights using backpropagation (loss.backward()) and the optimizer (optimizer.step()).
    After each epoch, the average loss is printed to monitor the training progress.



Important Interview Concepts

    Convolutional Filters:
        Filters or kernels slide over the input to detect patterns.
        Kernel size, stride, and padding influence the output size.
        Learnable parameters: CNN filters are updated during training.
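
As a worked example of what "learnable parameters" means for the network above, the arithmetic below counts weights and biases layer by layer. The helper names are made up, and the counts assume each layer has a bias term (the default for nn.Conv2d and nn.Linear).

```python
def conv_params(in_channels, out_channels, kernel_size):
    """One kernel_size x kernel_size x in_channels filter per output channel, plus biases."""
    return out_channels * in_channels * kernel_size * kernel_size + out_channels

def linear_params(in_features, out_features):
    """A full weight matrix plus one bias per output unit."""
    return in_features * out_features + out_features

counts = {
    "conv1": conv_params(1, 32, 3),          # 320
    "conv2": conv_params(32, 64, 3),         # 18,496
    "fc1": linear_params(64 * 7 * 7, 128),   # 401,536
    "fc2": linear_params(128, 10),           # 1,290
}
total = sum(counts.values())                 # 421,642

for name, n in counts.items():
    print(f"{name}: {n:,}")
print(f"total: {total:,}")
```

Most of the parameters sit in fc1, not in the convolutional filters.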

    Pooling Layers:
        Reduce the spatial size of the representation, downsampling by taking the maximum or average values.
        Helps to make the model invariant to small translations in the input.

    Activation Functions (ReLU):
        Adds non-linearity to the network, which is essential for learning complex patterns.

    Overfitting and Regularization:
        CNNs are prone to overfitting due to their large number of parameters.
        Techniques like dropout, data augmentation, or early stopping can mitigate overfitting.
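
Of the techniques listed, early stopping is the simplest to sketch without extra machinery. The function below is a hypothetical helper, not part of PyTorch or of the training loop above: it assumes a validation loss is recorded once per epoch and stops after `patience` epochs without improvement. The loss values are made up.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop, or None if it never does."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

made_up_losses = [0.90, 0.60, 0.45, 0.47, 0.50, 0.55]
stop_epoch = early_stopping_epoch(made_up_losses, patience=2)
print(stop_epoch)  # 5: the best loss was at epoch 3, then two epochs without improvement
```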

    Backpropagation and Gradient Descent:
        During training, CNN weights are updated using gradient descent. The gradients of the loss with respect to each parameter are computed using backpropagation.
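
Backpropagation is the chain rule applied layer by layer. The sketch below works it through by hand for one made-up neuron with a ReLU and a squared-error loss, then checks the result against a finite-difference estimate. All names and numbers are assumptions for illustration.

```python
def loss_fn(w, b, x, target):
    """Squared error of a single ReLU neuron: (relu(w * x + b) - target) ** 2."""
    activation = max(0.0, w * x + b)
    return (activation - target) ** 2

def backprop(w, b, x, target):
    """Chain rule: dL/dw = dL/da * da/dz * dz/dw, and likewise for b."""
    z = w * x + b
    a = max(0.0, z)
    dL_da = 2.0 * (a - target)
    da_dz = 1.0 if z > 0 else 0.0   # derivative of ReLU
    return dL_da * da_dz * x, dL_da * da_dz   # (dL/dw, dL/db)

w, b, x, target = 0.5, 0.1, 2.0, 1.0
grad_w, grad_b = backprop(w, b, x, target)

# Finite-difference check of dL/dw.
eps = 1e-6
numeric_w = (loss_fn(w + eps, b, x, target) - loss_fn(w - eps, b, x, target)) / (2 * eps)

print(round(grad_w, 6), round(grad_b, 6), round(numeric_w, 6))
```

loss.backward() automates the same bookkeeping for every parameter in the network.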

    Applications of CNNs:
        Image classification, object detection, face recognition, medical image analysis, etc.