<a href="https://colab.research.google.com/github/Arshiya-Begum30/FMML_Poject_and_labs/blob/main/Effect_of_padding%2C_kernel_size_and_stride_Pooling_Transfer_learning_and_fine_tuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **1. Effect of padding, kernel size and stride**

# Questions:

1) Does increasing stride increase output image size?

**Answer:**

No. With the input size, kernel size and padding fixed, a larger stride gives a smaller (or, at the limit, equal) output.

In convolutional neural networks (CNNs), the stride is the number of pixels the filter moves between successive applications to the input. A larger stride means the filter visits fewer positions, so the output feature map is smaller. For an input of width $W$, kernel size $K$, padding $P$ and stride $S$, the output width is $\lfloor (W + 2P - K)/S \rfloor + 1$.

Conversely, decreasing the stride makes the filter take smaller steps and yields a larger output feature map, with the exact size also depending on the input size, kernel size and padding.
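The effect of stride can be checked numerically with the standard convolution arithmetic, $\lfloor (W + 2P - K)/S \rfloor + 1$. The snippet below is only an illustrative sketch: the helper name `conv_output_size` and the 32-pixel input are assumptions chosen for the example, not part of the lab code.

```python
def conv_output_size(in_size, kernel_size, stride=1, padding=0):
    """Output length along one spatial dimension of a convolution (no dilation)."""
    return (in_size + 2 * padding - kernel_size) // stride + 1

# Hypothetical 32-pixel-wide input, 3x3 kernel, no padding
for stride in (1, 2, 3, 4):
    width = conv_output_size(32, 3, stride=stride)
    print(f"stride={stride}: output width = {width}")
```

With these assumed values the output width shrinks from 30 (stride 1) to 15, 10 and 8 as the stride grows.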

2) Does increasing padding increase output image size?

**Answer:**

Yes. With the kernel size and stride held fixed, increasing the padding increases the output size.

Padding adds extra pixels (usually zeros) around the border of the input before the convolution is applied. This lets the filter be centred on pixels at the edges, so edge information is retained and the spatial dimensions shrink less, or not at all. Each additional pixel of padding per side widens the padded input by 2, so at stride 1 the output grows by 2 pixels per unit of padding.
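As a quick empirical check, the sketch below passes a dummy input through `nn.Conv2d` layers that differ only in their padding. The input size (32x32), channel counts and kernel size are assumptions picked for illustration.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)  # hypothetical batch: one 32x32 RGB image

output_widths = {}
for padding in (0, 1, 2):
    # Same 3x3 kernel and stride 1 each time; only the padding changes
    conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3,
                     stride=1, padding=padding)
    out = conv(x)
    output_widths[padding] = out.shape[-1]
    print(f"padding={padding}: output shape = {tuple(out.shape)}")
```

With a 3x3 kernel and stride 1, `padding=1` keeps the output at the input size (often called "same" padding), while larger padding makes the output larger than the input.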

# **2. Pooling**

# Questions:

1) Can you think of any other pooling other than max and avg?

**Answer:**

Yes, besides max pooling and average pooling, there are other types of pooling operations used in convolutional neural networks (CNNs). Some examples include:

**1. Global Average Pooling (GAP):** This pooling operation computes the average value of each feature map over its entire spatial extent. It reduces each feature map to a single value, which is then used as a feature for classification.

**2. Global Max Pooling (GMP):** Similar to global average pooling, global max pooling computes the maximum value of each feature map across all spatial dimensions.

**3. Fractional Pooling:** This type of pooling downsamples by a non-integer factor (for example 1.5 instead of 2). It does so by using pooling regions of varying size, chosen randomly or pseudo-randomly, rather than a fixed window and stride.

**4. Min Pooling:** Similar to Max Pooling, but instead of taking the maximum value, it takes the minimum value from each local region.

**5. Stochastic Pooling:** Instead of taking the maximum or average value of a region, this method randomly samples one activation from it, with each activation's probability proportional to its value (activations are assumed non-negative, e.g. after a ReLU).
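Three of these variants can be sketched with standard PyTorch functional calls. The 4x4 toy feature map is an assumption for illustration, and min pooling is implemented here with a negate, max-pool, negate trick rather than a dedicated layer.

```python
import torch
import torch.nn.functional as F

# Toy feature map of shape (batch=1, channels=1, height=4, width=4), values 0..15
x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

# Global pooling: collapse each feature map to a single value
gap = F.adaptive_avg_pool2d(x, output_size=1)  # global average pooling
gmp = F.adaptive_max_pool2d(x, output_size=1)  # global max pooling

# Min pooling over 2x2 windows: the minimum is minus the maximum of the negated input
min_pooled = -F.max_pool2d(-x, kernel_size=2, stride=2)

print("GAP:", gap.item(), "GMP:", gmp.item())
print("Min pooling:\n", min_pooled.squeeze())
```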

# **3. Fine-tuning and transfer learning**

# Exercises:

Q1: Why do you think the network did not achieve good test accuracy in the feature extraction approach?

**Answer:**

There are several possible reasons why the network did not achieve good test accuracy with the feature extraction approach:

**1. Learning Rate:** In the feature extraction approach, we used a relatively high learning rate (0.01) for the optimizer, which updates only the parameters of the newly added fully connected layer. This may be too high for the task and make training of the new layer unstable. Trying smaller learning rates might help.

**2. Limited Capacity:** With the backbone frozen, only the newly added fully connected layer is trained, so the model can only learn a linear classifier on top of fixed pre-trained features. If those features do not separate the target classes well, the classifier has too little capacity to compensate. A deeper classifier head, or unfreezing part of the backbone, could help.

**3. Number of Epochs:** The feature extraction training ran for only 5 epochs, which may not be enough for the new layer to converge. We could train for more epochs while monitoring the validation performance.

**4. Fine-tuning More Layers:** In some cases, especially when the target task is significantly different from the source task, fine-tuning more layers of the pre-trained model may be beneficial. We could try unfreezing and fine-tuning more layers to capture task-specific features.

Q2: Can you think of a scenario where the feature extraction approach would be preferred over the fine-tuning approach?

**Answer:**

Feature extraction can be preferable in scenarios such as:

**1. Limited Data:** If we have a relatively small dataset for the specific task (here, classification of German traffic signs), fine-tuning the entire pre-trained ResNet18 model might overfit, because a very large number of parameters would be fitted to few examples. Feature extraction reuses the pre-trained features and trains only a small task-specific classifier.

**2. Task Similarity:** The pre-trained ResNet18 model is likely to have learned useful hierarchical features that are transferable to tasks whose images resemble the data it was pre-trained on. If the pre-trained features are already relevant to the task, freezing the pre-trained model and training only a new classifier might be sufficient.

**3. Resource Constraints:** Fine-tuning a deep neural network with a large number of parameters requires more computational resources compared to feature extraction. If we have limited resources, feature extraction provides a computationally less expensive option.

**4. Preventing Overfitting:** Freezing the majority of the pre-trained model helps prevent overfitting, especially when the task-specific dataset is small. The pre-trained model acts as a feature extractor, and the new classifier is trained to make predictions based on these extracted features.

Q3: Replace the ResNet18 architecture with some other pretrained model in PyTorch and try to find the optimal parameters. Report the architecture and the final model performance.

**Answer:**

We can replace the ResNet18 architecture with a different pre-trained model. Let's replace it with the VGG16 architecture.

```python
import torch
import torchvision
import torchvision.transforms as transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader, SubsetRandomSampler
import torch.nn as nn
from torch import optim
import torch.nn.functional as F

import matplotlib.pyplot as plt
import numpy as np
import time

# Device configuration (whether to run on GPU or CPU)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

# Set seeds for reproducibility
seed = 0
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.backends.cudnn.deterministic = True

# Download and unzip the dataset
!gdown --id 1V7dt70fz_AKRJlttyjnrtFpuJDLXr15x
!unzip -q german_traffic_signs_dataset.zip

# Transformation for data augmentation (training and validation)
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.GaussianBlur(3),
    transforms.RandomAffine(0, translate=(0.3, 0.3), shear=5),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# Deterministic preprocessing for the test set (no random augmentation)
test_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# Load the dataset
trainset = ImageFolder('german_traffic_signs_dataset/Train', transform=transform)
testset = ImageFolder('german_traffic_signs_dataset/Test', transform=test_transform)

# Shuffle and split train set into 80% training and 20% validation set
val_split = 0.2
indices = np.arange(len(trainset))
np.random.shuffle(indices)
partition = int((1 - val_split) * len(trainset))

# SubsetRandomSampler will only sample examples from the given subset of data
train_loader = DataLoader(trainset, shuffle=False, sampler=SubsetRandomSampler(indices[:partition]), batch_size=64, num_workers=2)
val_loader = DataLoader(trainset, shuffle=False, sampler=SubsetRandomSampler(indices[partition:]), batch_size=64, num_workers=2)

dataloaders = {'train': train_loader, 'val': val_loader}
dataset_sizes = {'train': partition, 'val': len(train_loader.dataset) - partition}

test_loader = DataLoader(testset, shuffle=False, batch_size=64, num_workers=2)

# Print dataset information
print('Number of training images: ', dataset_sizes['train'])
print('Number of validation images: ', dataset_sizes['val'])
print('Number of test images: ', len(test_loader.dataset))
print('Number of classes: ', len(trainset.classes))

# Helper function to show an image
def plot_image(img):
    img = img / 2 + 0.5  # unnormalize the image
    npimg = img.numpy()  # torch to numpy
    plt.imshow(np.transpose(npimg, (1, 2, 0)))  # as torch image is (C, H, W)
    plt.show()

# Get some random training images from dataloader
dataiter = iter(train_loader)
images, labels = next(dataiter)

# Plot images
plot_image(torchvision.utils.make_grid(images[:20], nrow=5))

# Define a custom classifier on top of the pre-trained VGG16 feature extractor
class CustomVGG16Classifier(nn.Module):
    def __init__(self, num_classes=43):
        super().__init__()
        # Pre-trained convolutional layers; `weights=` replaces the deprecated
        # `pretrained=True` (older torchvision releases still need the latter)
        self.features = torchvision.models.vgg16(
            weights=torchvision.models.VGG16_Weights.IMAGENET1K_V1).features
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        # New, randomly initialised classifier head
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

# Initialize the model
model = CustomVGG16Classifier(num_classes=43).to(device)

# Define loss function and optimizer (all layers are fine-tuned)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Training loop
num_epochs = 10

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # Accumulate the loss weighted by batch size
        running_loss += loss.item() * inputs.size(0)

    # Print the mean training loss over the epoch (not just the last batch)
    epoch_loss = running_loss / dataset_sizes['train']
    print(f'Epoch {epoch+1}/{num_epochs}, Loss: {epoch_loss:.4f}')

# Evaluation on the test set
model.eval()
correct = 0
total = 0

with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = correct / total
print(f'Test Accuracy: {accuracy}')
```


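Hyperparameters worth tuning here include the learning rate, the number of epochs, and the model architecture (for example, how many of the pre-trained layers are left trainable). The validation split (`val_loader`) can be used to compare settings before the final evaluation on the test set.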
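One simple way to search for a good learning rate is to train once per candidate value and compare validation accuracy. The sketch below illustrates the idea on a small synthetic dataset with a linear model so that it runs in seconds; the data, the `evaluate` helper and the candidate learning rates are all assumptions for illustration, and in the notebook the same loop would wrap the VGG16 training code and the real `train_loader` / `val_loader`.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, device):
    """Return the classification accuracy of `model` over `loader`."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            predicted = model(inputs).argmax(dim=1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return correct / total

# Synthetic stand-in data: 200 samples, 8 features, 2 classes
torch.manual_seed(0)
X = torch.randn(200, 8)
y = (X[:, 0] > 0).long()
train_loader = DataLoader(TensorDataset(X[:160], y[:160]), batch_size=32, shuffle=True)
val_loader = DataLoader(TensorDataset(X[160:], y[160:]), batch_size=32)
device = torch.device('cpu')

results = {}
for lr in (0.1, 0.01, 0.001):      # candidate learning rates (assumed)
    torch.manual_seed(0)            # same initialisation for a fair comparison
    model = nn.Linear(8, 2).to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for epoch in range(5):
        model.train()
        for inputs, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(inputs), labels)
            loss.backward()
            optimizer.step()
    results[lr] = evaluate(model, val_loader, device)

best_lr = max(results, key=results.get)
print(results, "best lr:", best_lr)
```

Selecting on the validation set rather than the test set keeps the final test accuracy an unbiased estimate.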

Q4: Which other data augmentations can we use to augment the data?

**Answer:**

Other common data augmentations that can be used to further augment the data include:

**1. Random Rotation:** Randomly rotating the image by a certain angle.

**2. Color Jitter:** Randomly changing the brightness, contrast, saturation, and hue of the image.

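**3. Random Horizontal Flip:** Flipping the image horizontally with a certain probability.

**4. Random Vertical Flip:** Flipping the image vertically with a certain probability. For traffic signs, flips should be used with care, since mirroring can change a sign's meaning (for example, a left-turn sign becomes a right-turn sign).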

**5. Random Crop:** Randomly cropping a portion of the image.