# Understanding Pooling and Padding in CNN

## 1. Describe the purpose and benefits of pooling in CNNs.

* Purpose

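Pooling reduces the spatial size (height and width) of the image or feature map.

Shrinking the feature maps reduces the number of values that later layers have to process, which encourages the model to generalize rather than memorize the training data. In this way it helps reduce overfitting.

Because each pooled output summarizes a small neighborhood, small shifts or distortions in the input change the output very little, which makes the model more robust to such variations.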

* Benefits

Pooling downsamples the feature map, which reduces the number of neurons (and computations) in the following layers.

It helps extract dominant features; for example, max pooling keeps only the strongest activation in each window.

By reducing overfitting, it improves the generalization capability of the model and makes it more effective on unseen data.
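A minimal sketch in plain Python (no deep learning framework assumed) shows the downsampling effect of 2×2 max pooling with stride 2: a 4×4 feature map becomes 2×2, and each output keeps only the strongest activation in its window. `max_pool2d` is a hypothetical helper written for illustration, not a library function.

```python
def max_pool2d(fmap, size=2, stride=2):
    """Max pooling over a 2D feature map given as a list of equal-length rows."""
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # collect the size x size window whose top-left corner is (i*stride, j*stride)
            window = [fmap[i * stride + di][j * stride + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window))  # keep only the dominant value
        out.append(row)
    return out

fmap = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 5, 7],
    [1, 1, 3, 2],
]
pooled = max_pool2d(fmap)
print(pooled)  # [[6, 2], [2, 7]]
```

Sixteen values are reduced to four, so any layer that follows sees a quarter of the inputs.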

## 2. Explain the difference between min pooling and max pooling.

* Key Differences

Focus : Max pooling emphasizes the most significant features and min pooling emphasizes the least significant features.

Impact : Max pooling makes the network more robust to variations by capturing strong signals, while min pooling helps capture low-contrast details that max pooling could overlook.

Usage : Max pooling is more commonly used in standard CNN architectures, while min pooling is less common and is used in niche applications where weaker signals are important.
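The two operations differ only in the reduction applied to each window. A small pure-Python sketch (the helper `pool2d` is hypothetical, written for illustration) runs the same sliding window with `max` and with `min`:

```python
def pool2d(fmap, reduce_fn, size=2, stride=2):
    """Apply reduce_fn (e.g. max or min) to every size x size window of fmap."""
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    return [
        [reduce_fn(fmap[i * stride + di][j * stride + dj]
                   for di in range(size) for dj in range(size))
         for j in range(out_w)]
        for i in range(out_h)
    ]

fmap = [
    [9, 8, 1, 2],
    [7, 2, 3, 4],
    [5, 6, 0, 9],
    [4, 3, 8, 7],
]
print(pool2d(fmap, max))  # [[9, 4], [6, 9]]  strongest value per window
print(pool2d(fmap, min))  # [[2, 1], [3, 0]]  weakest value per window
```

The same window scan serves both; only the choice of which value survives changes.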

## 3. Discuss the concept of padding in CNNs and its significance.

* Definition

Padding is the process of adding extra pixels around the input image or feature map. This is done before the convolution operation is performed.

* Significance of Padding

Preserves spatial dimensions : This is important in cases where the output needs to be the same size as the input.

Retains border information : Without padding, pixels at the border are covered by fewer filter positions than pixels in the center. Padding lets border pixels take part in more convolution operations, so they contribute more fully to the output.

Allows deeper networks : Padding allows the network to have more convolutional layers without shrinking the feature maps too quickly.

Avoids information loss : Without padding, every convolutional layer shrinks the feature map and under-uses the border information; adding padding avoids this loss.
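The effect on border pixels can be checked by counting how many filter windows cover each pixel. The pure-Python sketch below assumes a square n×n input, a k×k kernel and stride 1; `zero_pad` and `coverage` are hypothetical helpers written for illustration.

```python
def zero_pad(fmap, p):
    """Surround a 2D feature map (list of rows) with p rows/columns of zeros."""
    width = len(fmap[0]) + 2 * p
    top = [[0] * width for _ in range(p)]
    middle = [[0] * p + list(row) + [0] * p for row in fmap]
    bottom = [[0] * width for _ in range(p)]
    return top + middle + bottom

def coverage(n, k, p):
    """Count how many k x k windows (stride 1) cover each pixel of an n x n
    input after p pixels of zero padding are added on every side."""
    counts = [[0] * n for _ in range(n)]
    positions = n + 2 * p - k + 1          # window positions per axis
    for i in range(positions):
        for j in range(positions):
            for di in range(k):
                for dj in range(k):
                    r, c = i + di - p, j + dj - p  # map back to unpadded coordinates
                    if 0 <= r < n and 0 <= c < n:  # ignore the padded zeros
                        counts[r][c] += 1
    return counts

print(zero_pad([[1, 2], [3, 4]], 1))
# [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]]

no_pad = coverage(5, 3, 0)
padded = coverage(5, 3, 1)
print(no_pad[0][0], no_pad[2][2])  # 1 9  (corner vs. center, no padding)
print(padded[0][0], padded[2][2])  # 4 9  (corner vs. center, padding 1)
```

Without padding a corner pixel is seen by a single window while the center pixel is seen by nine; one pixel of zero padding raises the corner to four.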

## 4. Compare and contrast zero-padding and valid-padding in terms of their effects on the output feature map size.

Key Differences

* Definition

Zero-Padding : Adds a border of zero-valued pixels around the input image or feature map.

Valid-Padding : No extra pixels are added to the input feature map.

* Effect on Output Size

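Zero-Padding : With enough padding (p = (k − 1)/2 for an odd kernel size k and stride 1, often called "same" padding), the output has the same spatial size as the input.

Valid-Padding : Reduces the size of the output compared to the input; with stride 1, each spatial dimension shrinks by k − 1.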

* Information Retention 

Zero-Padding : Retains the information from the border pixels of the feature map.

Valid-Padding : Border and near-border pixels fall inside fewer filter windows, so their information is under-represented in the output.
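Both cases follow from the standard output-size formula for one spatial axis, output = ⌊(n + 2p − k) / s⌋ + 1, where n is the input size, k the kernel size, p the padding per side and s the stride. A minimal sketch (the function name is illustrative):

```python
def conv_output_size(n, k, p=0, s=1):
    """Spatial output size of a convolution (or pooling) along one axis."""
    return (n + 2 * p - k) // s + 1

n, k = 32, 5
valid = conv_output_size(n, k, p=0)            # no padding
same = conv_output_size(n, k, p=(k - 1) // 2)  # zero-padding with p = 2
print(valid, same)  # 28 32
```

With a 5×5 kernel, valid padding shrinks a 32×32 input to 28×28, while two pixels of zero padding on each side keep it at 32×32.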

# Exploring LeNet

## 1. Provide a brief overview of the LeNet-5 architecture.

* Input Layer:

Accepts a grayscale image of size 32×32 pixels. MNIST images are 28×28, so they are zero-padded to match the input size of LeNet-5.

* C1 - First Convolutional Layer:

Applies 6 convolutional filters (kernels) of size 5×5 with a stride of 1, resulting in 6 feature maps of size 28×28 (since no padding is used).

Each filter detects different local patterns or features, such as edges or textures.

Activation function: Sigmoid or tanh.

* S2 - First Subsampling (Pooling) Layer:

Averages the values in non-overlapping 2×2 windows using average pooling with a stride of 2, reducing the size of each feature map from 28×28 to 14×14.

The number of feature maps remains the same (6).

Activation function: Sigmoid or tanh.

* C3 - Second Convolutional Layer:

Applies 16 convolutional filters of size 5×5 to the pooled output.

Different filters connect to different subsets of the input feature maps from the previous layer (not all 6 maps), resulting in 16 feature maps of size 10×10.

Activation function: Sigmoid or tanh.

* S4 - Second Subsampling (Pooling) Layer:

Applies average pooling with a 2×2 window and a stride of 2, reducing the feature map size from 10×10 to 5×5.

The number of feature maps remains the same (16).

Activation function: Sigmoid or tanh.

* C5 - Fully Connected Convolutional Layer:

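A convolutional layer with 120 filters of size 5×5. Because its input is exactly 5×5, each filter produces a 1×1 output, so the layer is equivalent to a fully connected layer with 120 units: each of the 120 neurons is connected to all 400 inputs (16 feature maps of size 5×5) from the previous layer.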

This layer uses a set of learnable weights to combine features from the input maps.

Activation function: Sigmoid or tanh.

* F6 - Fully Connected Layer:

A fully connected layer with 84 neurons, where each neuron is connected to all 120 inputs from the previous layer.

Activation function: Sigmoid or tanh.

* Output Layer:

A fully connected layer with 10 neurons, corresponding to the 10 possible digit classes (0-9) for the MNIST dataset.

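Activation function: Softmax, which outputs a probability distribution over the 10 classes. This is the usual choice in modern implementations; the original LeNet-5 used Euclidean radial basis function (RBF) output units.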
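The feature-map sizes listed above can be reproduced with the output-size formula ⌊(n + 2p − k) / s⌋ + 1. This pure-Python sketch traces a 32×32 input through C1 to C5, assuming no padding anywhere (as described above); the layer table is written out by hand for illustration.

```python
def out_size(n, k, p=0, s=1):
    return (n + 2 * p - k) // s + 1

# (name, kernel, stride, out_channels); None means pooling keeps the channel count
layers = [
    ("C1", 5, 1, 6),
    ("S2", 2, 2, None),
    ("C3", 5, 1, 16),
    ("S4", 2, 2, None),
    ("C5", 5, 1, 120),
]

size, channels = 32, 1  # 32x32 grayscale input
shapes = []
for name, k, s, c_out in layers:
    size = out_size(size, k, s=s)
    if c_out is not None:
        channels = c_out
    shapes.append((name, size, channels))
    print(f"{name}: {channels} x {size} x {size}")
# C1: 6 x 28 x 28
# S2: 6 x 14 x 14
# C3: 16 x 10 x 10
# S4: 16 x 5 x 5
# C5: 120 x 1 x 1
```

The 1×1 spatial size at C5 is why that layer behaves like a fully connected layer; F6 (84 units) and the output layer (10 units) then operate on the resulting 120-dimensional vector.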

## 2. Describe the key components of LeNet-5 and their respective purposes.

* Input Layer: Takes a grayscale image of size 32×32 pixels as input, typically a zero-padded MNIST digit.

* C1 - First Convolutional Layer: Applies 6 convolutional filters of size 5×5 to extract basic features (e.g., edges, textures). Produces 6 feature maps of size 28×28.

* S2 - First Subsampling (Pooling) Layer: Performs average pooling with a 2×2 window, reducing each feature map size to 14×14 to decrease spatial dimensions and introduce some translation invariance.

* C3 - Second Convolutional Layer: Uses 16 convolutional filters of size 5×5 to learn more complex patterns from pooled features. Produces 16 feature maps of size 10×10.

* S4 - Second Subsampling (Pooling) Layer: Further reduces feature map size to 5×5 through average pooling, maintaining the number of feature maps (16) while reducing computational load.

* C5 - Fully Connected Convolutional Layer: A convolutional layer with 120 filters of size 5×5; because its input is 5×5, it acts as a fully connected layer with 120 neurons that combines all previous features into a compact representation.

* F6 - Fully Connected Layer: Has 84 neurons that learn complex combinations of features, bridging to the output.

* Output Layer: A fully connected layer with 10 neurons representing 10 digit classes. Uses Softmax to output class probabilities.
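The size of each component can be made concrete by counting trainable parameters (weights plus one bias per output). This sketch assumes the common modern variant: C3 is connected to all 6 input maps (the original paper used a sparse connection table, so its C3 has fewer parameters) and the pooling layers have no trainable parameters (the original S2/S4 had a trainable coefficient and bias per map). The helper names are illustrative.

```python
def conv_params(in_ch, out_ch, k):
    # one k x k x in_ch kernel plus one bias per output channel
    return out_ch * (in_ch * k * k + 1)

def fc_params(n_in, n_out):
    # full weight matrix plus one bias per output unit
    return n_out * (n_in + 1)

params = {
    "C1": conv_params(1, 6, 5),      # 156
    "C3": conv_params(6, 16, 5),     # 2416 (full connectivity assumed)
    "C5": conv_params(16, 120, 5),   # 48120
    "F6": fc_params(120, 84),        # 10164
    "Output": fc_params(84, 10),     # 850
}
total = sum(params.values())
print(total)  # 61706
```

Under these assumptions most of the roughly 62k parameters sit in C5 and F6, the layers that combine features, while the feature-extracting convolutions C1 and C3 are comparatively small.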

## 3. Discuss the advantages and limitations of LeNet-5 in the context of image classification tasks.

* Advantages

LeNet-5 has a simple and straightforward architecture.

The use of convolutional layers in LeNet-5 allows it to automatically learn hierarchical features from input images.

By using pooling layers, it achieves some level of translation invariance, which makes it more robust to small distortions in the image.

LeNet-5 performs well on small, clean datasets like MNIST.

* Limitations

LeNet-5 is a relatively shallow network with only a few layers.

The small kernel size (5×5) and shallow depth limit the network's capacity, so it struggles to capture the complex patterns found in larger, higher-resolution images.

LeNet-5 lacks modern regularization techniques like dropout, batch normalization, and data augmentation, making it prone to overfitting when trained on larger and more complex datasets.

## 4. Implement LeNet-5 using a deep learning framework of your choice (e.g., TensorFlow, PyTorch) and train it on a publicly available dataset (e.g., MNIST). Evaluate its performance and provide insights.

The dataset could not be loaded in a local Jupyter Notebook, so I wrote the code in Google Colab. The notebook is linked below:
https://colab.research.google.com/drive/1rsSjgGThh4ncOXgCay49puaGJBgItFtp#scrollTo=gnAX_fG9NbVm

# Analyzing AlexNet

## 1. Present an overview of the AlexNet architecture.

* Input Layer:

Accepts RGB images of size 227×227×3. ImageNet images are first rescaled to 256×256, and random 227×227 patches are cropped from them during training. (The original paper states 224×224, but 227×227 is the size that makes the layer arithmetic below work out.)

* First Convolutional Layer (Conv1):

Applies 96 convolutional filters (kernels) of size 11×11×3 with a stride of 4 and padding of 0.

Produces 96 feature maps of size 55×55.

Followed by a ReLU activation function to introduce non-linearity.

Utilizes Local Response Normalization (LRN) to normalize the activations, mimicking the lateral inhibition in real neurons.

Uses max-pooling with a 3×3 window and a stride of 2 to reduce the size of each feature map to 27×27.

* Second Convolutional Layer (Conv2):

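Applies 256 convolutional filters of size 5×5×48 with a stride of 1 and padding of 2. The depth of 48 reflects the original split of the network across two GPUs (each filter sees only the 48 channels on its own GPU); a single-GPU implementation uses 5×5×96.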

Produces 256 feature maps of size 27×27.

Followed by a ReLU activation and Local Response Normalization (LRN).

Uses max-pooling with a 3×3 window and a stride of 2, reducing the size of each feature map to 13×13.

* Third Convolutional Layer (Conv3):

Applies 384 convolutional filters of size 3×3×256 with a stride of 1 and padding of 1.

Produces 384 feature maps of size 13×13.

Followed by a ReLU activation function.

* Fourth Convolutional Layer (Conv4):

Applies 384 convolutional filters of size 3×3×192 with a stride of 1 and padding of 1 (192 again reflects the two-GPU split; a single-GPU implementation uses 3×3×384).

Produces 384 feature maps of size 13×13.

Followed by a ReLU activation function.

* Fifth Convolutional Layer (Conv5):

Applies 256 convolutional filters of size 3×3×192 with a stride of 1 and padding of 1 (3×3×384 in a single-GPU implementation).

Produces 256 feature maps of size 13×13.

Followed by a ReLU activation function.

Uses max-pooling with a 3×3 window and a stride of 2, reducing the size of each feature map to 6×6.

* First Fully Connected Layer (FC6):

Flattens the output of the last pooling layer into a vector of 6×6×256 = 9216 units.

Fully connected to 4096 neurons.

Followed by a ReLU activation function.

Uses dropout (with a probability of 0.5) to prevent overfitting.

* Second Fully Connected Layer (FC7):

Fully connected to 4096 neurons.

Followed by a ReLU activation function.

Uses dropout (with a probability of 0.5) to further prevent overfitting.

* Output Layer (FC8):

Fully connected to 1000 neurons, corresponding to the 1000 classes in the ImageNet dataset.

Uses a Softmax activation function to output a probability distribution over the 1000 classes.
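The feature-map sizes above follow from ⌊(n + 2p − k) / s⌋ + 1 applied layer by layer. This pure-Python sketch traces a 227×227×3 input through the convolutional and pooling layers, using the kernel, padding and stride values listed above (single-GPU channel counts).

```python
def out_size(n, k, p=0, s=1):
    return (n + 2 * p - k) // s + 1

# (name, kernel, padding, stride, out_channels); None means pooling keeps channels
layers = [
    ("conv1", 11, 0, 4, 96),
    ("pool1", 3, 0, 2, None),
    ("conv2", 5, 2, 1, 256),
    ("pool2", 3, 0, 2, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 0, 2, None),
]

size, channels = 227, 3
trace = []
for name, k, p, s, c_out in layers:
    size = out_size(size, k, p, s)
    channels = c_out if c_out is not None else channels
    trace.append((name, size, channels))
    print(f"{name}: {size} x {size} x {channels}")

flat = size * size * channels
print("flattened:", flat)  # 9216
```

Running the same formula on a 224×224 input with no padding gives (224 − 11) // 4 + 1 = 54 rather than 55 for conv1, which is why many reimplementations use 227×227.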

## 2. Explain the architectural innovations introduced in AlexNet that contributed to its breakthrough performance.

1. ReLU Activation Function : AlexNet used the Rectified Linear Unit (ReLU), f(x) = max(0, x), instead of traditional activation functions like Sigmoid or Tanh. Because ReLU does not saturate for positive inputs, the network trained considerably faster.

2. Dropout Regularization : AlexNet employed dropout in the first two fully connected layers. During each training iteration, each neuron's output is set to zero with probability 0.5, effectively removing it from the network for that iteration. This reduces co-adaptation between neurons and curbs overfitting.

3. Data Augmentation : AlexNet utilized data augmentation techniques to artificially expand the size of the training dataset. This included random cropping, horizontal flipping, and altering the intensities of the RGB channels (a PCA-based color augmentation).

4. Large Convolution Kernels and Strided Convolution : The first convolutional layer in AlexNet used relatively large convolutional kernels of size 11×11 with a stride of 4, followed by subsequent layers with smaller kernel sizes.

5. Use of Multiple Convolutional Layers : AlexNet stacked five convolutional layers, considerably deeper than earlier CNNs such as LeNet-5, to progressively learn hierarchical features from low-level edges to high-level shapes and objects.

6. Softmax Output Layer : The final layer of AlexNet uses a Softmax activation function to produce a probability distribution over the 1000 classes in the ImageNet dataset.
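ReLU and dropout are simple enough to sketch directly. The version below is a pure-Python illustration, not AlexNet's code: it uses *inverted* dropout (survivors are scaled by 1/(1 − p) during training, so nothing changes at test time), which is the formulation common in modern frameworks. The original AlexNet paper instead kept all units at test time and multiplied their outputs by 0.5; the two are equivalent in expectation.

```python
import random

def relu(xs):
    # ReLU: keep positive values, clamp negative values to zero
    return [max(0.0, x) for x in xs]

def dropout(xs, p=0.5, training=True, rng=random):
    """Inverted dropout: during training, zero each unit with probability p and
    scale the survivors by 1 / (1 - p) so the expected activation is unchanged.
    At test time the input passes through untouched."""
    if not training or p == 0.0:
        return list(xs)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in xs]

rng = random.Random(0)                  # seeded so the run is repeatable
h = relu([-2.0, -0.5, 0.0, 1.5, 3.0])
print(h)                                # [0.0, 0.0, 0.0, 1.5, 3.0]
print(dropout(h, p=0.5, rng=rng))       # each entry is either 0.0 or doubled
print(dropout(h, training=False))       # unchanged at test time
```

With p = 0.5 every surviving activation is doubled, which keeps the expected input to the next layer the same as at test time.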

## 3. Discuss the role of convolutional layers, pooling layers, and fully connected layers in AlexNet.

* Convolutional Layers

The convolutional layers are the primary mechanism for feature learning in AlexNet.

By stacking multiple convolutional layers, the network learns hierarchical representations, from low-level features to more abstract patterns.

Deeper layers have larger receptive fields, so they can respond to larger structures such as object parts and whole objects.

* Pooling Layers

By reducing the dimensions of the feature maps, pooling decreases the computational cost and the number of parameters in the fully connected layers that follow.

Max pooling introduces a small degree of translation invariance, which lets the network recognize objects at slightly different positions.

It makes the network more robust to small distortions in the image.
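The parameter saving is easiest to see in FC6, the first fully connected layer. The arithmetic below compares AlexNet's actual input to FC6 (6×6×256 after the last max pooling) with a hypothetical variant that skips that pooling step and feeds the 13×13×256 conv5 output directly.

```python
def fc_params(n_in, n_out):
    return n_out * (n_in + 1)  # weights plus one bias per output unit

with_pool = fc_params(6 * 6 * 256, 4096)       # AlexNet: pooled conv5 output
without_pool = fc_params(13 * 13 * 256, 4096)  # hypothetical: no pooling after conv5
print(f"{with_pool:,}")     # 37,752,832
print(f"{without_pool:,}")  # 177,213,440
```

A single 3×3, stride-2 pooling layer cuts FC6 from about 177 million to about 38 million parameters.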

* Fully Connected Layers

The fully connected layers combine all the features learned by the convolutional and pooling layers to make the final classification decision.

Together with their ReLU activations, these layers learn non-linear combinations of features, which strengthens the network's ability to classify diverse images.

The output layer determines the final class of the input image by taking the class with the highest probability from the softmax activation function.
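The last step, turning output scores into a class, can be sketched in a few lines. This illustration uses three made-up scores instead of AlexNet's 1000; subtracting the maximum score before exponentiating is a standard numerical-stability trick that does not change the result.

```python
import math

def softmax(logits):
    # subtract the max for numerical stability; the probabilities are unchanged
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]   # toy scores for 3 classes
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)  # argmax
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(predicted)  # 0
```

Because softmax is monotonic, the predicted class is the same as the index of the largest raw score; softmax matters for producing calibrated-looking probabilities and for the training loss.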

## 4. Implement AlexNet using a deep learning framework of your choice and evaluate its performance on a dataset of your choice.

https://colab.research.google.com/drive/1GpLb3wCVu6v1DW-AVnUR3ArEyVjKnqJU#scrollTo=DnxTzA5uZk2d