**Plan**

**1. Understanding convolutional layers**
    
**2. Pooling layers**
    
**3. Building a simple CNN**
    
**4. Transfer learning with pre-trained models**



# **Understanding convolutional layers**

Understanding convolutional layers in PyTorch involves both the theory behind convolutional neural networks (CNNs) and how to implement these layers with the `nn.Conv2d` class from PyTorch's **torch.nn** module. Below is a breakdown of the key layer parameters, the formula for the output shape, and a series of examples that apply it.

**Convolutional Layer Parameters**

1. **Stride**:
   - Controls the step size of the filter when it moves across the input image.
   - Can be a single integer or a tuple of two integers (height and width stride).
   - Example: `stride=2` or `stride=(2, 2)`.

2. **Padding**:
   - Adds a border of zeros around the input image to control the spatial dimensions of the output.
   - Can be an integer, a tuple, or a string (`'valid'` or `'same'`).
   - `'valid'` means no padding. `'same'` pads the input so that the output has the same spatial size as the input. PyTorch supports `'same'` only when `stride=1`.
   - Example: `padding=1`, `padding=(1, 1)`, `padding='same'`.

3. **Dilation**:
   - Controls the spacing between kernel points.
   - Can be a single integer or a tuple of two integers.
   - Increases the receptive field of the convolutional kernel.
   - Example: `dilation=2` or `dilation=(2, 2)`.

4. **Groups**:
   - Controls the connections between input and output channels. Both `in_channels` and `out_channels` must be divisible by `groups`.
   - `groups=1` means every output channel is connected to all input channels.
   - `groups=2` means the input and output channels are split into two groups, and each half of the outputs sees only its own half of the inputs.
   - `groups=in_channels` gives a depthwise convolution, where each input channel is convolved with its own set of `out_channels // in_channels` filters.
   - Example: `groups=1`, `groups=2`, `groups=in_channels`.
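One way to see how these parameters interact is to build a few layers and inspect their weight tensors and output sizes. This is a minimal sketch assuming a recent PyTorch release (string padding such as `'same'` is not available in very old versions); the channel counts and the 28x28 input are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 4, 28, 28)  # batch of 1, 4 channels, 28x28

# The weight shape is (out_channels, in_channels // groups, kernel_h, kernel_w)
full = nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3, groups=1)
grouped = nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3, groups=2)
depthwise = nn.Conv2d(in_channels=4, out_channels=4, kernel_size=3, groups=4)
print(full.weight.shape)       # torch.Size([8, 4, 3, 3])
print(grouped.weight.shape)    # torch.Size([8, 2, 3, 3])
print(depthwise.weight.shape)  # torch.Size([4, 1, 3, 3])

# padding='same' preserves the spatial size, even when dilation is used (stride must be 1)
same = nn.Conv2d(4, 8, kernel_size=3, padding='same', dilation=2)
print(same(x).shape)           # torch.Size([1, 8, 28, 28])

# A strided 'same' convolution is rejected when the layer is constructed
try:
    nn.Conv2d(4, 8, kernel_size=3, padding='same', stride=2)
except ValueError as err:
    print("ValueError:", err)
```

Grouping divides the per-filter input depth by `groups`, which is why grouped and depthwise layers have fewer weights than a full convolution with the same channel counts.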

**Output Shape Formula**

The output shape of a convolutional layer can be calculated using the following formula:

For each dimension $ i $ (height or width):
$$ \text{Output size}_i = \left\lfloor \frac{\text{Input size}_i + 2 \times \text{Padding}_i - \text{Dilation}_i \times (\text{Kernel size}_i - 1) - 1}{\text{Stride}_i} + 1 \right\rfloor $$
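The formula translates directly into a small pure-Python helper. The name `conv2d_output_size` is a hypothetical choice for this sketch, not a PyTorch function; it applies the formula to one spatial dimension and reproduces the sizes computed in the examples that follow.

```python
def conv2d_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output size of nn.Conv2d along one spatial dimension (hypothetical helper)."""
    # floor(a / s + 1) equals a // s + 1 for integers, so floor division is enough
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# 28x28 inputs, matching the examples below
print(conv2d_output_size(28, 3))                        # 26 (Example 1)
print(conv2d_output_size(28, 3, stride=2))              # 13 (Example 2)
print(conv2d_output_size(28, 3, padding=1))             # 28 (Example 3)
print(conv2d_output_size(28, 3, dilation=2))            # 24 (Example 4)
print(conv2d_output_size(28, 5))                        # 24 (Example 5)
print(conv2d_output_size(28, 3, stride=2, padding=1))   # 14 (Example 8)
```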


**Example 1: Basic Convolution**

In [1]:
import torch
import torch.nn as nn

# Basic convolution with 1 input channel, 6 output channels, and 3x3 kernel
conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=3)
input_tensor1 = torch.randn(1, 1, 28, 28)  # Batch size of 1, 1 channel, 28x28 image
output_tensor1 = conv1(input_tensor1)
print(f"Example 1 Output Shape: {output_tensor1.shape}")

Example 1 Output Shape: torch.Size([1, 6, 26, 26])


Output shape formula for Example 1:
$$ \text{Output size} = \left\lfloor \frac{28 + 2 \times 0 - 1 \times (3 - 1) - 1}{1} + 1 \right\rfloor = 26 $$
So the output shape is `(1, 6, 26, 26)`.
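At each output position, the layer computes a dot product between the kernel and the patch of input under it (strictly a cross-correlation, since the kernel is not flipped). The sketch below does this for one channel with no padding and stride 1, using plain nested lists instead of tensors; `cross_correlate2d` is a hypothetical helper, and `nn.Conv2d` additionally adds a learned bias per output channel.

```python
def cross_correlate2d(image, kernel):
    """'valid' 2D cross-correlation of a single channel, stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the kernel and the patch whose top-left corner is (i, j)
            total = 0.0
            for u in range(kh):
                for v in range(kw):
                    total += image[i + u][j + v] * kernel[u][v]
            row.append(total)
        output.append(row)
    return output

image = [[float(r * 28 + c) for c in range(28)] for r in range(28)]  # 28x28 ramp image
kernel = [[0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0]]  # selects the centre pixel of each 3x3 patch

result = cross_correlate2d(image, kernel)
print(len(result), len(result[0]))  # 26 26, matching Example 1
print(result[0][0])                 # 29.0, the pixel at row 1, column 1
```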


**Example 2: Convolution with Stride**

In [2]:
# Convolution with 1 input channel, 6 output channels, 3x3 kernel, and stride of 2
conv2 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=3, stride=2)
input_tensor2 = torch.randn(1, 1, 28, 28)  # Batch size of 1, 1 channel, 28x28 image
output_tensor2 = conv2(input_tensor2)
print(f"Example 2 Output Shape: {output_tensor2.shape}")

Example 2 Output Shape: torch.Size([1, 6, 13, 13])


Output shape formula for Example 2:
$$ \text{Output size} = \left\lfloor \frac{28 + 2 \times 0 - 1 \times (3 - 1) - 1}{2} + 1 \right\rfloor = 13 $$
So the output shape is `(1, 6, 13, 13)`.
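Stride only changes where the windows start. Listing the window start positions along one axis shows where the 13 in Example 2 comes from: a 3-wide window must start at column 25 or earlier to fit in a 28-wide input, and stepping by 2 from column 0 reaches 24 as its last start.

```python
input_size, kernel_size, stride = 28, 3, 2

# Start positions of the windows along one axis; every window must fit inside the input
starts = list(range(0, input_size - kernel_size + 1, stride))
print(starts[0], starts[-1], len(starts))  # 0 24 13
```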


**Example 3: Convolution with Padding**

In [3]:
# Convolution with 1 input channel, 6 output channels, 3x3 kernel, and padding of 1
conv3 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=3, padding=1)
input_tensor3 = torch.randn(1, 1, 28, 28)  # Batch size of 1, 1 channel, 28x28 image
output_tensor3 = conv3(input_tensor3)
print(f"Example 3 Output Shape: {output_tensor3.shape}")

Example 3 Output Shape: torch.Size([1, 6, 28, 28])


Output shape formula for Example 3:
$$ \text{Output size} = \left\lfloor \frac{28 + 2 \times 1 - 1 \times (3 - 1) - 1}{1} + 1 \right\rfloor = 28 $$
So the output shape is `(1, 6, 28, 28)`.



**Example 4: Convolution with Dilation**

In [6]:
# Convolution with 1 input channel, 6 output channels, 3x3 kernel, and dilation of 2
conv4 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=3, dilation=2)
input_tensor4 = torch.randn(1, 1, 28, 28)  # Batch size of 1, 1 channel, 28x28 image
output_tensor4 = conv4(input_tensor4)
print(f"Example 4 Output Shape: {output_tensor4.shape}")

Example 4 Output Shape: torch.Size([1, 6, 24, 24])


Output shape formula for Example 4:
$$ \text{Output size} = \left\lfloor \frac{28 + 2 \times 0 - 2 \times (3 - 1) - 1}{1} + 1 \right\rfloor = 24 $$
So the output shape is `(1, 6, 24, 24)`.



**Example 5: Convolution with Different Kernel Sizes**

In [7]:
# Convolution with 1 input channel, 6 output channels, 5x5 kernel
conv5 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
input_tensor5 = torch.randn(1, 1, 28, 28)  # Batch size of 1, 1 channel, 28x28 image
output_tensor5 = conv5(input_tensor5)
print(f"Example 5 Output Shape: {output_tensor5.shape}")

Example 5 Output Shape: torch.Size([1, 6, 24, 24])


Output shape formula for Example 5:
$$ \text{Output size} = \left\lfloor \frac{28 + 2 \times 0 - 1 \times (5 - 1) - 1}{1} + 1 \right\rfloor = 24 $$
So the output shape is `(1, 6, 24, 24)`.
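Examples 4 and 5 produce the same 24x24 output because a 3x3 kernel with `dilation=2` spans a 5x5 region: its effective size is $d(k-1)+1 = 2(3-1)+1 = 5$. The sketch below uses `torch.nn.functional.conv2d` with a random input and kernel (arbitrary values, seeded for repeatability) to check that the dilated 3x3 kernel gives the same result as a 5x5 kernel whose in-between taps are zero.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 1, 28, 28)
w3 = torch.randn(1, 1, 3, 3)  # a 3x3 kernel

# Spread the 3x3 taps over a 5x5 grid, leaving zeros in between (dilation 2)
w5 = torch.zeros(1, 1, 5, 5)
w5[:, :, ::2, ::2] = w3

dilated = F.conv2d(x, w3, dilation=2)
dense = F.conv2d(x, w5)
print(dilated.shape, dense.shape)                 # both torch.Size([1, 1, 24, 24])
print(torch.allclose(dilated, dense, atol=1e-4))  # True
```

The dilated version reaches the same receptive field with 9 weights instead of 25.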

**Example 6: Convolution with Multiple Input Channels**

In [8]:
# Convolution with 3 input channels (e.g., RGB image), 6 output channels, and 3x3 kernel
conv6 = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3)
input_tensor6 = torch.randn(1, 3, 28, 28)  # Batch size of 1, 3 channels (RGB), 28x28 image
output_tensor6 = conv6(input_tensor6)
print(f"Example 6 Output Shape: {output_tensor6.shape}")

Example 6 Output Shape: torch.Size([1, 6, 26, 26])


Output shape formula for Example 6:
$$ \text{Output size} = \left\lfloor \frac{28 + 2 \times 0 - 1 \times (3 - 1) - 1}{1} + 1 \right\rfloor = 26 $$
So the output shape is `(1, 6, 26, 26)`.
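With several input channels, each output channel has one 3x3 filter per input channel, and the per-channel results are summed before the bias is added. The sketch below checks this for output channel 0 of a layer shaped like Example 6; the seed and random input are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3)
x = torch.randn(1, 3, 28, 28)

# One 3x3 filter per (output channel, input channel) pair, plus one bias per output channel
num_params = sum(p.numel() for p in conv.parameters())
print(conv.weight.shape)  # torch.Size([6, 3, 3, 3])
print(num_params)         # 168 = 6*3*3*3 weights + 6 biases

with torch.no_grad():
    # Output channel 0 = sum over input channels of single-channel convolutions, plus bias 0
    manual = sum(
        F.conv2d(x[:, c:c + 1], conv.weight[0:1, c:c + 1]) for c in range(3)
    ) + conv.bias[0]
    matches = torch.allclose(conv(x)[:, 0:1], manual, atol=1e-5)
print(matches)  # True
```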


**Example 7: Convolution with Grouping**

In [9]:
# Convolution with 4 input channels, 8 output channels, 3x3 kernel, and groups of 2
conv7 = nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3, groups=2)
input_tensor7 = torch.randn(1, 4, 28, 28)  # Batch size of 1, 4 channels, 28x28 image
output_tensor7 = conv7(input_tensor7)
print(f"Example 7 Output Shape: {output_tensor7.shape}")

Example 7 Output Shape: torch.Size([1, 8, 26, 26])


Output shape formula for Example 7:
$$ \text{Output size} = \left\lfloor \frac{28 + 2 \times 0 - 1 \times (3 - 1) - 1}{1} + 1 \right\rfloor = 26 $$
So the output shape is `(1, 8, 26, 26)`.
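With `groups=2`, the layer in Example 7 behaves like two independent convolutions placed side by side: input channels 0-1 feed output channels 0-3, and input channels 2-3 feed output channels 4-7. The sketch below verifies this by slicing the layer's own weights; the seed and input values are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
conv = nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3, groups=2)
x = torch.randn(1, 4, 28, 28)

with torch.no_grad():
    w, b = conv.weight, conv.bias  # w has shape (8, 2, 3, 3)
    # Group 0: input channels 0-1 produce output channels 0-3
    out_a = F.conv2d(x[:, :2], w[:4], b[:4])
    # Group 1: input channels 2-3 produce output channels 4-7
    out_b = F.conv2d(x[:, 2:], w[4:], b[4:])
    matches = torch.allclose(conv(x), torch.cat([out_a, out_b], dim=1), atol=1e-5)

num_params = sum(p.numel() for p in conv.parameters())
print(matches)     # True
print(num_params)  # 152, versus 296 for the same layer with groups=1
```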



**Example 8: Convolution with Different Stride and Padding**

In [11]:
# Convolution with 1 input channel, 6 output channels, 3x3 kernel, stride of 2, and padding of 1
conv8 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=3, stride=2, padding=1)
input_tensor8 = torch.randn(1, 1, 28, 28)  # Batch size of 1, 1 channel, 28x28 image
output_tensor8 = conv8(input_tensor8)
print(f"Example 8 Output Shape: {output_tensor8.shape}")


Example 8 Output Shape: torch.Size([1, 6, 14, 14])


Output shape formula for Example 8:
$$ \text{Output size} = \left\lfloor \frac{28 + 2 \times 1 - 1 \times (3 - 1) - 1}{2} + 1 \right\rfloor = 14 $$
So the output shape is `(1, 6, 14, 14)`.
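With `kernel_size=3`, `stride=2`, and `padding=1`, the formula reduces to $\lfloor (n-1)/2 \rfloor + 1 = \lceil n/2 \rceil$. This configuration therefore halves the feature map, rounding up for odd sizes. A quick pure-Python check over a range of input sizes (`conv_out` is a hypothetical helper implementing the formula above):

```python
import math

def conv_out(n, k=3, s=2, p=1, d=1):
    # Output-size formula from above, for one spatial dimension
    return (n + 2 * p - d * (k - 1) - 1) // s + 1

# A 3x3 kernel with stride 2 and padding 1 maps n to ceil(n / 2)
assert all(conv_out(n) == math.ceil(n / 2) for n in range(1, 257))
print(conv_out(28), conv_out(27))  # 14 14
```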


# **Pooling layers**

Pooling layers in PyTorch reduce the spatial dimensions of the input tensor, effectively downsampling it. This reduces the computational load, can help control overfitting, and lets later layers build a spatial hierarchy of features over increasingly large regions of the input. The most common types are Max Pooling and Average Pooling. Below, we cover both and provide examples with varying parameters.

**Max Pooling** and **Average Pooling**

Max Pooling takes the maximum value from each patch of the feature map.

Average Pooling calculates the average value from each patch of the feature map.
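A small, fully predictable input makes the difference concrete. The sketch below pools a 4x4 tensor holding the values 0 to 15 with a 2x2 window (stride defaults to the window size).

```python
import torch
import torch.nn as nn

x = torch.arange(16.0).reshape(1, 1, 4, 4)
# [[ 0,  1,  2,  3],
#  [ 4,  5,  6,  7],
#  [ 8,  9, 10, 11],
#  [12, 13, 14, 15]]

max_out = nn.MaxPool2d(kernel_size=2)(x)
avg_out = nn.AvgPool2d(kernel_size=2)(x)
print(max_out.squeeze().tolist())  # [[5.0, 7.0], [13.0, 15.0]]
print(avg_out.squeeze().tolist())  # [[2.5, 4.5], [10.5, 12.5]]
```

For example, the top-left window holds 0, 1, 4 and 5, so max pooling keeps 5 and average pooling returns 2.5.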

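**Parameters:**
- **kernel_size**: The size of the window over which the max (or average) is taken.
- **stride**: The stride of the window. The default value is `kernel_size`.
- **padding**: Implicit padding added on both sides. `MaxPool2d` pads with negative infinity, so padded positions never win the max; `AvgPool2d` pads with zeros, which count toward the average by default (`count_include_pad=True`).
- **dilation**: The spacing between elements in the window. Only `MaxPool2d` accepts it; `AvgPool2d` behaves as if `dilation=1`.

For each dimension $ i $ (height or width), with the stride defaulting to the kernel size and the result rounded down unless `ceil_mode=True`: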
$$ \text{Output size}_i = \left\lfloor \frac{\text{Input size}_i + 2 \times \text{Padding}_i - \text{Dilation}_i \times (\text{Kernel size}_i - 1) - 1}{\text{Stride}_i} + 1 \right\rfloor $$
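The pooling version of the formula differs from the convolution one mainly in its default stride. The helper below is a hypothetical sketch, not a PyTorch function. Its `ceil_mode` branch follows the rule in the PyTorch pooling documentation that a sliding window starting in the right padded region is ignored.

```python
def pool2d_output_size(size, kernel_size, stride=None, padding=0, dilation=1, ceil_mode=False):
    """Output size of MaxPool2d along one dimension (use dilation=1 for AvgPool2d)."""
    if stride is None:
        stride = kernel_size  # pooling layers default to non-overlapping windows
    span = size + 2 * padding - dilation * (kernel_size - 1) - 1
    if not ceil_mode:
        return span // stride + 1
    out = (span + stride - 1) // stride + 1  # round up instead of down
    # Drop a final window that would start inside the right padding
    if (out - 1) * stride >= size + padding:
        out -= 1
    return out

print(pool2d_output_size(28, 2))                       # 14 (Examples 1, 2, 4 and 5)
print(pool2d_output_size(28, 3, stride=2, padding=1))  # 14 (Examples 3 and 6)
print(pool2d_output_size(5, 2), pool2d_output_size(5, 2, ceil_mode=True))  # 2 3
```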

## **Max Pooling**

**Example 1: Basic Max Pooling**

In [12]:
import torch
import torch.nn as nn

# Basic max pooling with 2x2 kernel
max_pool1 = nn.MaxPool2d(kernel_size=2)
input_tensor1 = torch.randn(1, 1, 28, 28)  # Batch size of 1, 1 channel, 28x28 image
output_tensor1 = max_pool1(input_tensor1)
print(f"Example 1 Output Shape: {output_tensor1.shape}")

Example 1 Output Shape: torch.Size([1, 1, 14, 14])


**Example 2: Max Pooling with Stride**

In [13]:
# Max pooling with a 2x2 kernel and an explicit stride of 2 (same as Example 1, since stride defaults to kernel_size)
max_pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
input_tensor2 = torch.randn(1, 1, 28, 28)  # Batch size of 1, 1 channel, 28x28 image
output_tensor2 = max_pool2(input_tensor2)
print(f"Example 2 Output Shape: {output_tensor2.shape}")

Example 2 Output Shape: torch.Size([1, 1, 14, 14])


**Example 3: Max Pooling with Padding**

In [14]:
# Max pooling with 3x3 kernel, stride of 2, and padding of 1
max_pool3 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
input_tensor3 = torch.randn(1, 1, 28, 28)  # Batch size of 1, 1 channel, 28x28 image
output_tensor3 = max_pool3(input_tensor3)
print(f"Example 3 Output Shape: {output_tensor3.shape}")


Example 3 Output Shape: torch.Size([1, 1, 14, 14])
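Because `MaxPool2d` pads with negative infinity rather than zeros, padding never changes which value wins. The sketch below feeds an all-negative input through the padded layer from Example 3 (on a smaller 4x4 input for readability). With zero padding, the border windows would have returned 0 instead of -1.

```python
import torch
import torch.nn as nn

x = -torch.ones(1, 1, 4, 4)  # every input value is -1

pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
out = pool(x)
print(out.shape)              # torch.Size([1, 1, 2, 2])
print(out.unique().tolist())  # [-1.0]: padded positions never win the max
```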


## **Average Pooling**

**Example 4: Basic Average Pooling**

In [15]:
# Basic average pooling with 2x2 kernel
avg_pool1 = nn.AvgPool2d(kernel_size=2)
input_tensor4 = torch.randn(1, 1, 28, 28)  # Batch size of 1, 1 channel, 28x28 image
output_tensor4 = avg_pool1(input_tensor4)
print(f"Example 4 Output Shape: {output_tensor4.shape}")

Example 4 Output Shape: torch.Size([1, 1, 14, 14])


**Example 5: Average Pooling with Stride**

In [16]:
# Average pooling with a 2x2 kernel and an explicit stride of 2 (same as Example 4, since stride defaults to kernel_size)
avg_pool2 = nn.AvgPool2d(kernel_size=2, stride=2)
input_tensor5 = torch.randn(1, 1, 28, 28)  # Batch size of 1, 1 channel, 28x28 image
output_tensor5 = avg_pool2(input_tensor5)
print(f"Example 5 Output Shape: {output_tensor5.shape}")

Example 5 Output Shape: torch.Size([1, 1, 14, 14])


**Example 6: Average Pooling with Padding**

In [17]:
# Average pooling with 3x3 kernel, stride of 2, and padding of 1
avg_pool3 = nn.AvgPool2d(kernel_size=3, stride=2, padding=1)
input_tensor6 = torch.randn(1, 1, 28, 28)  # Batch size of 1, 1 channel, 28x28 image
output_tensor6 = avg_pool3(input_tensor6)
print(f"Example 6 Output Shape: {output_tensor6.shape}")

Example 6 Output Shape: torch.Size([1, 1, 14, 14])
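Unlike max pooling, average pooling with padding does change the values near the border. By default (`count_include_pad=True`) the zero padding counts toward the divisor; setting `count_include_pad=False` divides only by the number of real input values in each window. A sketch on a 4x4 all-ones input with the settings from Example 6:

```python
import torch
import torch.nn as nn

x = torch.ones(1, 1, 4, 4)

# Default: padded zeros are included in the average
with_pad = nn.AvgPool2d(kernel_size=3, stride=2, padding=1)(x)
# count_include_pad=False: average only over positions that fall inside the input
without_pad = nn.AvgPool2d(kernel_size=3, stride=2, padding=1, count_include_pad=False)(x)

print(with_pad[0, 0, 0, 0].item())  # about 0.444: 4 real ones over 9 window positions
print(without_pad.squeeze().tolist())  # [[1.0, 1.0], [1.0, 1.0]]
```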


# **Building a simple CNN**

In [33]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

from tqdm import tqdm

In [49]:
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()

        # Convolutional feature extractor: padding=1 keeps the 3x3 convolutions size-preserving,
        # and the 2x2 max pooling layer (applied twice in forward) halves the height and width
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1)

        # Fully connected classifier: two poolings turn a 28x28 input into 32 feature maps of 7x7
        self.fc1 = nn.Linear(in_features=32 * 7 * 7, out_features=128)
        self.fc2 = nn.Linear(in_features=128, out_features=10)

    def forward(self, x):
        # Apply the first convolutional layer, followed by ReLU activation and max pooling
        x = self.pool(torch.relu(self.conv1(x)))

        # Apply the second convolutional layer, followed by ReLU activation and max pooling
        x = self.pool(torch.relu(self.conv2(x)))

        # Flatten the tensor into a vector
        x = x.view(-1, 32 * 7 * 7)

        # Apply the first fully connected layer with ReLU activation
        x = torch.relu(self.fc1(x))

        # Apply the second fully connected layer (output layer)
        x = self.fc2(x)

        return x
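The `32 * 7 * 7` input size of `fc1` follows from tracing a 28x28 MNIST image through the layers with the output-size formula. This pure-Python sketch uses a hypothetical helper, `conv_out`, which also covers the pooling layer because `MaxPool2d` follows the same formula with dilation 1.

```python
def conv_out(n, k, s=1, p=0):
    # Output-size formula for one spatial dimension (dilation 1, rounded down)
    return (n + 2 * p - (k - 1) - 1) // s + 1

size = 28                        # MNIST images are 28x28
size = conv_out(size, k=3, p=1)  # conv1 with padding=1 -> 28
size = conv_out(size, k=2, s=2)  # max pooling -> 14
size = conv_out(size, k=3, p=1)  # conv2 with padding=1 -> 14
size = conv_out(size, k=2, s=2)  # max pooling -> 7
flat_features = 32 * size * size  # conv2 produces 32 channels
print(size, flat_features)        # 7 1568
```

If the input resolution or the layer configuration changes, `fc1`'s `in_features` and the `view` call in `forward` must change with it.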

In [None]:
# Define transformations for the training and test sets
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))  # Scale pixel values from [0, 1] to [-1, 1]
])

# Download and load the training and test datasets
train_set = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)

test_set = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)
test_loader = DataLoader(test_set, batch_size=64, shuffle=False)


In [50]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = SimpleCNN().to(device)

In [53]:
# Inspect the shapes of the first fully connected layer's weight and bias
[p.shape for p in model.fc1.parameters()]

In [23]:
# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
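`nn.CrossEntropyLoss` expects raw, unnormalized scores (logits) and integer class labels, and applies log-softmax internally. This is why `SimpleCNN.forward` returns the output of `fc2` without a softmax. The sketch below checks the equivalence on random logits (arbitrary values for a batch of 4 and 10 classes).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)          # raw scores: batch of 4, 10 classes
labels = torch.tensor([3, 0, 9, 1])  # integer class indices, not one-hot vectors

loss = nn.CrossEntropyLoss()(logits, labels)
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)
print(torch.allclose(loss, manual))  # True
```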

In [None]:
# Function to train the model
def train(model, train_loader, criterion, optimizer, epochs=5):
    model.train()  # Set the model to training mode
    for epoch in range(epochs):
        running_loss = 0.0
        for images, labels in tqdm(train_loader, total=len(train_loader)):
            # Move the batch to the same device as the model (inputs and labels alike)
            images, labels = images.to(device), labels.to(device)

            # Zero the parameter gradients
            optimizer.zero_grad()

            # Forward pass
            outputs = model(images)
            loss = criterion(outputs, labels)

            # Backward pass and optimization
            loss.backward()
            optimizer.step()

            running_loss += loss.item()

        print(f'Epoch {epoch+1}, Loss: {running_loss/len(train_loader)}')

# Train the model for 5 epochs
train(model, train_loader, criterion, optimizer, epochs=5)


In [40]:
# Function to evaluate the model
def evaluate(model, test_loader):
    model.eval()  # Set the model to evaluation mode
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in tqdm(test_loader, total=len(test_loader)):
            # Move the batch to the model's device so predictions and labels can be compared
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    print(f'Accuracy: {100 * correct / total}%')

# Evaluate the model on the test set
evaluate(model, test_loader)


100%|██████████| 157/157 [00:05<00:00, 29.91it/s]

Accuracy: 97.97%





# **Transfer learning with pre-trained models**

Transfer learning is a powerful technique in deep learning where a model developed for one task is reused as the starting point for a model on a second task. Using pre-trained models can significantly speed up the training process and improve performance, especially when dealing with limited data. PyTorch provides several pre-trained models through the torchvision.models module.

Here is a step-by-step guide to implementing transfer learning using a pre-trained model, such as ResNet, to classify images from a different dataset, such as CIFAR-10.

In [41]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torchvision import models
from torch.utils.data import DataLoader


In [42]:
# Define transformations for the training and test sets
transform = transforms.Compose([
    transforms.Resize(224),  # Upscale the 32x32 CIFAR-10 images to 224x224, the resolution ResNet was trained at
    transforms.ToTensor(),
    # Normalize with the ImageNet mean and standard deviation used to train the pre-trained weights
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
])

# Download and load the training and test datasets
train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)

test_set = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
test_loader = DataLoader(test_set, batch_size=64, shuffle=False)


In [None]:
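# Load ResNet18 with ImageNet weights (the `weights` argument replaces the deprecated `pretrained=True`)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the parameters of the pre-trained layers
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer to match the number of output classes.
# The new layer is created after freezing, so its parameters remain trainable.
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, 10)  # CIFAR-10 has 10 output classes

# Move the model to the GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)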

In [48]:
# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.fc.parameters(), lr=0.001)  # Only train the final layer

In [None]:
# Function to train the model
def train(model, train_loader, criterion, optimizer, epochs=5):
    model.train()  # Set the model to training mode
    device = next(model.parameters()).device  # Use whichever device the model is on
    for epoch in range(epochs):
        running_loss = 0.0
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)  # Move the batch to the model's device
            optimizer.zero_grad()  # Zero the parameter gradients

            outputs = model(images)  # Forward pass
            loss = criterion(outputs, labels)  # Compute the loss

            loss.backward()  # Backward pass
            optimizer.step()  # Optimize the model

            running_loss += loss.item()

        print(f'Epoch {epoch+1}, Loss: {running_loss/len(train_loader)}')

# Train the model for 5 epochs
train(model, train_loader, criterion, optimizer, epochs=5)

In [None]:
# Function to evaluate the model
def evaluate(model, test_loader):
    model.eval()  # Set the model to evaluation mode
    device = next(model.parameters()).device  # Use whichever device the model is on
    correct = 0
    total = 0
    with torch.no_grad():  # Disable gradient calculation
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)  # Move the batch to the model's device
            outputs = model(images)  # Forward pass
            _, predicted = torch.max(outputs, 1)  # Index of the highest score (logit) for each image
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    print(f'Accuracy: {100 * correct / total}%')

# Evaluate the model on the test set
evaluate(model, test_loader)
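The evaluation picks the predicted class with `torch.max` over the raw outputs (logits), without applying a softmax first. This gives the same answer because softmax preserves the ordering within each row. The sketch below checks this on random stand-in logits (arbitrary values for 5 images and 10 classes).

```python
import torch

torch.manual_seed(0)
logits = torch.randn(5, 10)  # stand-in for model outputs: 5 images, 10 classes

probs = torch.softmax(logits, dim=1)
# Softmax is monotonic within each row, so the largest logit is also the most probable class
same_prediction = torch.equal(logits.argmax(dim=1), probs.argmax(dim=1))
print(same_prediction)  # True
```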