In [None]:
# Sumani
# 5-8-2024

# 1. Key Concepts in Convolutional Neural Networks (CNNs)

## Introduction
Convolutional Neural Networks (CNNs) are a class of deep neural networks that have proven to be highly effective for various computer vision tasks, such as image classification, object detection, and segmentation. CNNs are designed to automatically and adaptively learn spatial hierarchies of features from input images.

This notebook covers the fundamental concepts that underpin CNNs, including convolution, padding, stride, feature maps, and filters. With these basics in place, you will be well equipped to tackle more complex CNN topics and applications.

## Key Concepts

### Convolution
Convolution is the cornerstone of CNNs. It involves sliding a filter (also known as a kernel) across the input image and, at each position, computing the dot product between the filter and the patch of the input it covers. This process extracts features such as edges, textures, and patterns from the input image. Without padding, convolution also shrinks the spatial dimensions: at stride 1, an n x n input and a k x k filter produce an (n - k + 1) x (n - k + 1) output.
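To make the sliding-window description concrete, here is a minimal sketch that computes the output with explicit Python loops and checks it against PyTorch's `F.conv2d`. The helper name `conv2d_naive`, the 4x4 image, and the 3x3 vertical-edge kernel are illustrative choices, not part of the original notebook; the loop handles only a single channel, no padding, and stride 1.

```python
import torch
import torch.nn.functional as F

def conv2d_naive(image: torch.Tensor, kernel: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: slide `kernel` over a single-channel 2D `image`
    (no padding, stride 1) and take a dot product at every position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1  # n - k + 1
    out_w = image.shape[1] - kw + 1
    out = torch.empty(out_h, out_w)
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]       # region currently under the filter
            out[i, j] = (patch * kernel).sum()      # element-wise product + sum = dot product
    return out

image = torch.arange(1.0, 17.0).reshape(4, 4)
kernel = torch.tensor([[1.0, 0.0, -1.0],
                       [1.0, 0.0, -1.0],
                       [1.0, 0.0, -1.0]])  # responds to left-to-right intensity changes

manual = conv2d_naive(image, kernel)
# F.conv2d wants (batch, channels, H, W) input and (out_ch, in_ch, kH, kW) weights
builtin = F.conv2d(image[None, None], kernel[None, None])[0, 0]

print(manual)                            # 2x2 tensor, every entry -6
print(torch.allclose(manual, builtin))   # True
```

Because each row of this image increases by 1 from left to right, the edge kernel returns the same value at every position; on a real image the large responses would appear only where intensity changes sharply.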

### Padding
Padding refers to adding extra pixels (usually zeros) around the border of the input image. This is done to control the spatial dimensions of the output feature map. There are two main types of padding:
- **Valid Padding**: No padding is added, and the filter is applied only at positions where it fits entirely inside the input. For any filter larger than 1x1, the output is smaller than the input.
- **Same Padding**: Enough padding is added that, with a stride of 1, the output feature map has the same spatial dimensions as the input. This is useful for preserving spatial resolution.
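The two padding schemes can be compared directly with `F.conv2d`, which accepts the strings `"valid"` and `"same"` as well as integer padding amounts. This sketch assumes a PyTorch release that supports string padding (1.9 or later); the random 5x5 input and 3x3 filter are placeholders.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 5, 5)   # (batch, channels, height, width)
w = torch.randn(1, 1, 3, 3)   # a single 3x3 filter

valid = F.conv2d(x, w, padding="valid")  # no padding: 5 - 3 + 1 = 3
same = F.conv2d(x, w, padding="same")    # one ring of zeros on each side keeps 5x5
explicit = F.conv2d(x, w, padding=1)     # same result with an explicit padding amount

print(valid.shape)                    # torch.Size([1, 1, 3, 3])
print(same.shape)                     # torch.Size([1, 1, 5, 5])
print(torch.allclose(same, explicit)) # True
```

PyTorch rejects `padding="same"` for strided convolutions, since with a stride above 1 the output cannot match the input size.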

### Stride
Stride is the step size by which the filter moves across the input image. A stride of 1 means the filter moves one pixel at a time, while a stride of 2 means it moves two pixels at a time. Stride affects the spatial dimensions of the output feature map. Larger strides result in smaller output dimensions and faster computations, but can also miss finer details in the input.
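Stride, padding, and filter size together determine the output size through a standard formula: for input size n, filter size k, padding p, and stride s, the output size is floor((n + 2p - k) / s) + 1. The sketch below encodes it in a hypothetical helper `conv_output_size` and checks it against `F.conv2d` on a random 7x7 input.

```python
import torch
import torch.nn.functional as F

def conv_output_size(n: int, k: int, p: int = 0, s: int = 1) -> int:
    """Hypothetical helper: output size for input n, kernel k, padding p, stride s."""
    return (n + 2 * p - k) // s + 1  # integer division implements the floor

x = torch.randn(1, 1, 7, 7)
w = torch.randn(1, 1, 3, 3)

for stride in (1, 2, 3):
    out = F.conv2d(x, w, padding=1, stride=stride)
    predicted = conv_output_size(7, 3, p=1, s=stride)
    # stride 1 -> 7, stride 2 -> 4, stride 3 -> 3
    print(f"stride={stride}: actual {out.shape[-1]}, predicted {predicted}")
```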

### Feature Maps
Feature maps are the outputs of the convolutional layers. Each one records how strongly a particular filter responds at each spatial location of its input. Each filter produces a separate feature map, and stacking these maps along the channel dimension forms the layer's output, which in turn becomes the input to the next layer.
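A short sketch of how filters map to feature maps, using `nn.Conv2d` with 8 filters on a random batch of four 3-channel 32x32 images (sizes chosen only for illustration). It also recomputes the first feature map from the first filter alone, to show that each map depends on exactly one filter.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)  # 8 filters
images = torch.randn(4, 3, 32, 32)  # batch of 4 RGB images, 32x32 pixels

feature_maps = conv(images)
print(feature_maps.shape)  # torch.Size([4, 8, 30, 30]): one 30x30 map per filter

# Recompute the first feature map using only the first filter and its bias
first_map = F.conv2d(images, conv.weight[:1], conv.bias[:1])
print(torch.allclose(first_map, feature_maps[:, :1], atol=1e-5))  # True
```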

### Filters (Kernels)
Filters are small grids of weights that slide over the input to perform the convolution operation. Each filter learns to respond to a particular feature, such as an edge, corner, or texture. The depth of a filter matches the number of channels in its input: 3 for an RGB image in the first layer, or the number of feature maps produced by the previous layer deeper in the network. The filter values are learned during training rather than designed by hand.
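The weight tensor of `nn.Conv2d` makes the filter depth visible. In this sketch (layer sizes are arbitrary), the first layer's 16 filters are 3 channels deep to match an RGB input, and a hypothetical second layer's filters are 16 deep because they read the 16 feature maps produced by the first.

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=5)

# Weight layout: (number of filters, input channels, kernel height, kernel width)
print(conv.weight.shape)          # torch.Size([16, 3, 5, 5])
print(conv.weight.requires_grad)  # True: the optimizer updates these values in training

# A second layer consumes the 16 feature maps, so each of its filters is 16 deep
conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)
print(conv2.weight.shape)         # torch.Size([32, 16, 3, 3])

n_params = sum(p.numel() for p in conv.parameters())
print(n_params)  # 16 * 3 * 5 * 5 weights + 16 biases = 1216
```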

### Pooling Layers
Pooling layers reduce the spatial dimensions of the feature maps by summarizing each small window (for example, 2x2) with a single value. This makes later computation cheaper and can help reduce overfitting. The most common types of pooling are:
- **Max Pooling**: Takes the maximum value in each patch of the feature map.
- **Average Pooling**: Takes the average value in each patch of the feature map.
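Both pooling types can be checked by hand on a small input. This sketch applies 2x2 max and average pooling with PyTorch's functional API to a 4x4 tensor holding the values 0 to 15 (an illustrative input); pooling has no learnable parameters, so the results follow directly from the window values.

```python
import torch
import torch.nn.functional as F

x = torch.arange(16.0).reshape(1, 1, 4, 4)
# [[ 0,  1,  2,  3],
#  [ 4,  5,  6,  7],
#  [ 8,  9, 10, 11],
#  [12, 13, 14, 15]]

max_pooled = F.max_pool2d(x, kernel_size=2)  # stride defaults to kernel_size
avg_pooled = F.avg_pool2d(x, kernel_size=2)

print(max_pooled[0, 0])  # maxima of each 2x2 window: [[5, 7], [13, 15]]
print(avg_pooled[0, 0])  # means of each 2x2 window: [[2.5, 4.5], [10.5, 12.5]]
```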

### Activation Functions
Activation functions introduce non-linearity into the network, allowing it to learn more complex patterns. The most common activation function in CNNs is the Rectified Linear Unit (ReLU), which replaces all negative values with zero.
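ReLU computes max(0, x) element-wise. A minimal check on a handful of illustrative values, comparing `F.relu` with the equivalent `torch.clamp`:

```python
import torch
import torch.nn.functional as F

z = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])
activated = F.relu(z)

print(activated)                                      # values: 0, 0, 0, 1.5, 3
print(torch.equal(activated, torch.clamp(z, min=0)))  # True: ReLU(x) = max(0, x)
```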

### Regularization Techniques
Regularization techniques are used to prevent overfitting and improve the generalization of the model. Common techniques include:
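- **Dropout**: During training, randomly sets a fraction `p` of the units to zero (PyTorch scales the remaining units by 1/(1 - p) so the expected activation is unchanged); dropout is disabled at inference time.
- **Batch Normalization**: Normalizes each channel of a layer's output using the mean and standard deviation of the current mini-batch, then applies a learnable scale and shift.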
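The sketch below shows the training-time behavior of both techniques as implemented by PyTorch's `nn.Dropout` and `nn.BatchNorm2d`. The tensor sizes, the seed, and the synthetic input statistics (mean about 10, standard deviation about 4) are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Dropout: in training mode, zero each element with probability p and scale
# the survivors by 1 / (1 - p); in eval mode inputs pass through unchanged.
dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)
dropout.train()
y_train = dropout(x)
dropout.eval()
y_eval = dropout(x)
print(y_train.unique())        # tensor([0., 2.])
print(torch.equal(y_eval, x))  # True

# BatchNorm2d: normalize each channel with the mini-batch mean and (biased)
# variance, then apply a learnable per-channel scale (weight) and shift (bias).
bn = nn.BatchNorm2d(num_features=3)
feats = torch.randn(8, 3, 5, 5) * 4 + 10  # synthetic features, mean ~10, std ~4
out = bn(feats)                           # a new module starts in training mode
print(out.mean(dim=(0, 2, 3)))                 # close to 0 for every channel
print(out.var(dim=(0, 2, 3), unbiased=False))  # close to 1 for every channel
```

Because `nn.BatchNorm2d` initializes its scale to 1 and its shift to 0, the freshly created layer leaves the normalized values as they are; those parameters are then learned during training. In eval mode the layer uses running statistics accumulated during training instead of batch statistics.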

### Advanced Concepts
- **Transfer Learning**: Using pre-trained models on new tasks to leverage learned features.
- **Backpropagation in CNNs**: Understanding how gradients are calculated and weights are updated during training.
- **Data Augmentation**: Techniques to artificially increase the size of the training dataset by creating modified versions of images.
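Backpropagation through a convolution can be inspected with autograd. This minimal sketch reuses the 3x3 input and 2x2 filter from the convolution example later in the notebook, takes the sum of the output as a toy loss (an assumption purely for illustration), and applies one plain gradient-descent step with an arbitrary learning rate of 0.01.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0]]).reshape(1, 1, 3, 3)
# Filter created directly in (out_ch, in_ch, kH, kW) shape as a trainable leaf tensor
w = torch.tensor([[[[1.0, 0.0],
                    [0.0, -1.0]]]], requires_grad=True)

loss = F.conv2d(x, w).sum()  # toy loss: sum of the 2x2 output
loss.backward()              # autograd fills in d(loss)/d(w)

# Each weight multiplies one input pixel per output position, so its gradient
# is the sum of the four pixels it touched, e.g. 1 + 2 + 4 + 5 = 12.
print(w.grad[0, 0])  # [[12, 16], [24, 28]]

# One plain gradient-descent step
with torch.no_grad():
    w -= 0.01 * w.grad
print(w[0, 0])  # [[0.88, -0.16], [-0.24, -1.28]]
```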

By the end of this notebook, you will have a solid understanding of these fundamental concepts and how they are implemented in PyTorch. This knowledge will serve as a foundation for building and training your own CNNs for various computer vision tasks.

Let's dive in and explore each concept in detail with practical examples and code implementations.


# 2. Convolution

Convolution is a mathematical operation used in deep learning, especially in Convolutional Neural Networks (CNNs), to extract features from input data. It involves sliding a filter (also called a kernel) over the input data and computing the dot product at each position.
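One detail worth knowing: like most deep-learning libraries, `F.conv2d` slides the filter without flipping it, which is strictly a cross-correlation; the textbook convolution flips the kernel in both spatial dimensions first. This sketch uses the same 3x3 input and 2x2 filter as the cells below to show the difference. Because filter values are learned, a network can simply learn the flipped version, so the distinction rarely matters for CNNs.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0]]).reshape(1, 1, 3, 3)
k = torch.tensor([[1.0, 0.0],
                  [0.0, -1.0]]).reshape(1, 1, 2, 2)

cross_corr = F.conv2d(x, k)                          # what F.conv2d computes
true_conv = F.conv2d(x, torch.flip(k, dims=(2, 3)))  # flip the kernel first: textbook convolution

print(cross_corr[0, 0])  # every entry is x[i, j] - x[i+1, j+1] = -4
print(true_conv[0, 0])   # flipped kernel [[-1, 0], [0, 1]] gives +4 everywhere
```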

In [None]:
# Import necessary libraries
import torch
import torch.nn.functional as F

In [None]:
# Define a simple input tensor (3x3)
input_tensor = torch.tensor([[1.0, 2.0, 3.0],
                             [4.0, 5.0, 6.0],
                             [7.0, 8.0, 9.0]])

# Define a simple filter (2x2)
filter_tensor = torch.tensor([[1.0, 0.0],
                              [0.0, -1.0]])

In [None]:
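# Perform convolution without padding and with stride 1.
# F.conv2d expects input of shape (batch, channels, H, W) and weights of shape
# (out_channels, in_channels, kH, kW); unsqueeze(0) twice adds the two leading dims.
conv_output = F.conv2d(input_tensor.unsqueeze(0).unsqueeze(0), filter_tensor.unsqueeze(0).unsqueeze(0))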

In [None]:
# Output the result
print("Convolution Output (no padding, stride 1):\n", conv_output)

# 3. Padding

Padding is the addition of extra pixels (usually zeros) around the border of the input data. It lets you control the spatial size of the output (for example, keeping it equal to the input size) and lets pixels near the edges contribute to more output positions than they would without padding.


In [None]:
# Perform convolution with padding of 1 and stride of 1
# (with this 2x2 filter the output grows to 4x4, since (3 + 2*1 - 2) + 1 = 4)
conv_output_padded = F.conv2d(input_tensor.unsqueeze(0).unsqueeze(0), filter_tensor.unsqueeze(0).unsqueeze(0), padding=1)

# Output the result
print("Convolution Output (padding 1, stride 1):\n", conv_output_padded)
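To see exactly what `padding=1` does, this sketch pads the same 3x3 input with an explicit ring of zeros via `F.pad` and convolves it without further padding; the result should match the `padding=1` call above. With this 2x2 filter the padded output is 4x4, larger than the input, because at stride 1 symmetric padding preserves the size only for odd filter sizes.

```python
import torch
import torch.nn.functional as F

x = torch.arange(1.0, 10.0).reshape(1, 1, 3, 3)  # same values as input_tensor
k = torch.tensor([[1.0, 0.0],
                  [0.0, -1.0]]).reshape(1, 1, 2, 2)

x_padded = F.pad(x, (1, 1, 1, 1))  # (left, right, top, bottom) zeros: 5x5 input
print(x_padded[0, 0])

manual = F.conv2d(x_padded, k)       # convolve the padded input, no extra padding
builtin = F.conv2d(x, k, padding=1)  # let conv2d add the zeros itself
print(torch.allclose(manual, builtin))  # True
print(builtin.shape)                    # torch.Size([1, 1, 4, 4])
```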

# 4. Stride
Stride is the number of pixels by which the filter moves over the input data. A stride of 1 means the filter moves one pixel at a time, while a stride of 2 means the filter moves two pixels at a time. Larger strides result in smaller output dimensions.


In [None]:
# Perform convolution with padding of 1 and stride of 2
# (output size: floor((3 + 2*1 - 2) / 2) + 1 = 2, so a 2x2 result)
conv_output_stride = F.conv2d(input_tensor.unsqueeze(0).unsqueeze(0), filter_tensor.unsqueeze(0).unsqueeze(0), padding=1, stride=2)

# Output the result
print("Convolution Output (padding 1, stride 2):\n", conv_output_stride)
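A stride-s convolution evaluates the filter only at every s-th position. This sketch, using the same input and filter as the cells above, checks that the stride-2 output equals the stride-1 output sampled at every other row and column.

```python
import torch
import torch.nn.functional as F

x = torch.arange(1.0, 10.0).reshape(1, 1, 3, 3)  # same values as input_tensor
k = torch.tensor([[1.0, 0.0],
                  [0.0, -1.0]]).reshape(1, 1, 2, 2)

stride1 = F.conv2d(x, k, padding=1, stride=1)  # 4x4: filter visits every position
stride2 = F.conv2d(x, k, padding=1, stride=2)  # 2x2: filter jumps two pixels at a time

print(stride2[0, 0])                                     # [[-1, -3], [-7, -4]]
print(torch.allclose(stride2, stride1[:, :, ::2, ::2]))  # True
```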