# **TOPIC: Understanding Pooling and Padding in CNN**

## ANSWER 1
**Pooling in CNN:**

**Purpose:**

Pooling is a downsampling operation in Convolutional Neural Networks (CNNs) that reduces the spatial dimensions (height and width) of the feature maps. Shrinking the maps layer by layer cuts the computation in later layers and the number of parameters in any fully connected layers that follow, which in turn helps limit overfitting.

**Benefits:**

Translation Invariance: Pooling makes the network less sensitive to the spatial location of features, providing a degree of translation invariance.

Reduced Computational Complexity: Pooling layers have no trainable parameters of their own, but by shrinking the feature maps they reduce the computation in subsequent layers and the number of weights in any dense layer that follows.

Feature Hierarchy: Successive pooling layers help create a hierarchical representation of features, capturing both local and global information.
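
The saving in downstream parameters can be made concrete with a little arithmetic. The sketch below assumes a hypothetical 24x24 feature map with 6 channels feeding a dense layer of 120 units; the sizes are chosen only for illustration.

```python
# Hypothetical sizes for illustration: a 24x24x6 feature map feeding Dense(120).
def dense_params(height, width, channels, units):
    """Weights plus biases of a dense layer applied to a flattened feature map."""
    return height * width * channels * units + units

without_pool = dense_params(24, 24, 6, 120)  # dense layer sees the full map
with_pool = dense_params(12, 12, 6, 120)     # after 2x2 pooling with stride 2

print(without_pool)  # 414840
print(with_pool)     # 103800
```

Halving each spatial dimension divides the number of dense weights by four, even though the pooling layer itself adds no parameters.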

## ANSWER 2

**Max Pooling:**

 It selects the maximum value from the group of values within the pooling window. Max pooling is effective in capturing the most prominent feature in a local region.

**Min Pooling:**

It selects the minimum value from the group of values within the pooling window. Min pooling is much less common than max pooling. It keeps the weakest activation in each local region, which can be useful when the features of interest have low intensity, for example dark details on a bright background.
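
A minimal NumPy sketch of both operations on a single-channel map follows. The helper name `pool2d` and the sample values are made up for illustration; this is not how Keras implements pooling internally.

```python
import numpy as np

def pool2d(feature_map, size=2, stride=2, reducer=np.max):
    """Slide a size x size window over a 2D map and reduce each window
    with the given function (np.max or np.min)."""
    h, w = feature_map.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            out[i, j] = reducer(window)
    return out

x = np.array([[1, 3, 2, 0],
              [4, 6, 5, 1],
              [7, 2, 9, 8],
              [0, 1, 3, 4]])

max_pooled = pool2d(x, reducer=np.max)
min_pooled = pool2d(x, reducer=np.min)
print(max_pooled)  # [[6 5] [7 9]]
print(min_pooled)  # [[1 0] [0 3]]
```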

## ANSWER 3

**Padding in CNN:**

**Concept:**

Padding adds extra border pixels around the input feature map so that the convolutional filters can also be centred on pixels at the image boundaries. Without it, every convolution shrinks the spatial dimensions and the border pixels contribute to fewer output values than the interior pixels do.

**Significance:**

Preservation of Spatial Information: With enough padding ("same" padding at stride 1), the output keeps the spatial dimensions of the input, so feature maps do not shrink with every convolution.

Edge and Corner Pixel Treatment: Without padding, the pixels at the edges and corners of the input feature map would be underrepresented in the output, impacting the network's ability to capture features at these locations.
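
The under-representation of border pixels can be checked by counting how many filter positions touch each pixel. The sketch below assumes a 5x5 input, a 3x3 kernel and stride 1; the `coverage` helper is written for this illustration only.

```python
import numpy as np

# Zero-padding a 5x5 image by one pixel on every side gives a 7x7 array.
image = np.arange(25).reshape(5, 5)
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)  # (7, 7)

def coverage(n, k, pad):
    """Count how many k x k windows (stride 1) touch each pixel
    of an n x n image that has been padded by `pad` on every side."""
    size = n + 2 * pad
    counts = np.zeros((size, size), dtype=int)
    for i in range(size - k + 1):
        for j in range(size - k + 1):
            counts[i:i + k, j:j + k] += 1
    # Return only the counts for the original (unpadded) pixels.
    return counts[pad:size - pad, pad:size - pad]

no_pad = coverage(5, 3, pad=0)
with_pad = coverage(5, 3, pad=1)
print(no_pad[0, 0], no_pad[2, 2])      # corner vs centre without padding: 1 9
print(with_pad[0, 0], with_pad[2, 2])  # corner vs centre with padding: 4 9
```

With padding, a corner pixel is seen by four filter positions instead of one, while the centre pixel is unaffected.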

## ANSWER 4
**Zero-padding:**

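Effect on Output Size: Adds a border of zeros around the input, so the output is larger than it would be without padding. With a stride of 1 and a padding of (k - 1) / 2 for an odd kernel size k, the output has exactly the same spatial dimensions as the input.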

Advantage: Preserves spatial information at the borders and corners of the input.

Commonly Used: Often used to ensure that the output feature map has the same spatial dimensions as the input.

**Valid-padding:**

Effect on Output Size: Does not add any extra padding to the input, resulting in a smaller output size compared to the input.

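Advantage: Slightly reduces the computational load because the feature maps shrink, and every output value is computed from real input values only rather than from artificial zeros.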

Commonly Used: Frequently used when the spatial dimensions can be reduced without significant loss of information.
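
The effect of the two padding modes on output size can be written as a small function. This sketch follows the conventions Keras uses for `padding='valid'` and `padding='same'`; the function name `conv_output_size` is a placeholder for this example.

```python
import math

def conv_output_size(n, kernel, stride=1, padding="valid"):
    """Output length along one spatial axis for a convolution or pooling layer."""
    if padding == "valid":
        # No padding: the window must fit entirely inside the input.
        return (n - kernel) // stride + 1
    if padding == "same":
        # Enough zero-padding is added so that only the stride shrinks the output.
        return math.ceil(n / stride)
    raise ValueError("padding must be 'valid' or 'same'")

print(conv_output_size(28, 5, padding="valid"))  # 24
print(conv_output_size(28, 5, padding="same"))   # 28
print(conv_output_size(24, 2, stride=2))         # 12 (2x2 pooling, stride 2)
```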

# **TOPIC: Exploring LeNet**

## ANSWER 1
**Overview of LeNet-5 Architecture:**

LeNet-5 is a pioneering convolutional neural network (CNN) architecture developed by Yann LeCun and his collaborators in the late 1990s. It was designed for handwritten digit recognition tasks and played a significant role in the development of deep learning for computer vision.

## ANSWER 2
**Key Components of LeNet-5:**

**Convolutional Layers:**

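LeNet-5 alternates convolutional layers with subsampling (pooling) layers: C1 and C3 are each followed by a subsampling layer (S2 and S4), and a third convolutional layer, C5, feeds the fully connected part. These layers are responsible for learning hierarchical features from the input data.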

**Fully Connected Layers:**

After the convolutional and pooling layers, LeNet-5 has fully connected layers for high-level reasoning and decision-making.

**Activation Functions:**

Tanh and sigmoid activation functions are used in different layers of LeNet-5 to introduce non-linearity and enable the network to learn complex patterns.

**Pooling Layers:**

LeNet-5 uses average pooling in its subsampling layers (S2 and S4) to reduce spatial dimensions and introduce a degree of translation invariance.

**Softmax Layer:**

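In the original paper the output layer is built from Euclidean radial basis function (RBF) units, one per class. Modern reimplementations, including the Keras model in ANSWER 4, typically replace it with a softmax layer that produces a probability for each class, which suits multi-class classification.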
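
As a rough illustration of what a softmax output layer computes, here is a minimal NumPy sketch. The input scores are made up for the example; this is not the Keras implementation.

```python
import numpy as np

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])  # hypothetical scores for three classes
probs = softmax(scores)
print(probs)           # largest score gets the largest probability
print(probs.argmax())  # index of the predicted class
```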

## ANSWER 3
**Advantages and Limitations of LeNet-5:**

**Advantages:**

Pioneering Architecture: LeNet-5 laid the foundation for modern CNNs and demonstrated the effectiveness of convolutional layers for feature extraction.

Applicability: While initially designed for digit recognition, LeNet-5's principles have been applied to various image classification tasks.

**Limitations:**

Simplicity: LeNet-5 is shallow by modern standards, with only a handful of layers, so it lacks the depth of more recent architectures and this limits its performance on highly complex tasks.

Limited Capacity: In comparison to more recent CNNs, LeNet-5 may struggle with capturing intricate patterns in large datasets.

## ANSWER 4

In [None]:
# Import everything from tensorflow.keras so a single Keras installation is used
from tensorflow import keras
from tensorflow.keras.layers import Conv2D, AveragePooling2D, Dense, Flatten
from tensorflow.keras.models import Sequential

**Load MNIST dataset**

In [None]:
(X_train , y_train ) , (X_test , y_test) = keras.datasets.mnist.load_data()

**Preprocess MNIST dataset**

In [None]:
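# Scale pixel values to [0, 1] and add the channel axis expected by Conv2D
X_train = (X_train / 255.0).reshape(-1, 28, 28, 1)

In [None]:
X_test = (X_test / 255.0).reshape(-1, 28, 28, 1)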

**Encode the labels**

In [None]:
y_train = keras.utils.to_categorical(y_train,10)
y_test = keras.utils.to_categorical(y_test,10)

**Build LeNet-5 model**

In [None]:
model = Sequential()

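# C1: 6 feature maps, 5x5 kernels, no padding (28x28 -> 24x24)
model.add(Conv2D(6, kernel_size=(5, 5), padding='valid', activation='tanh', input_shape=(28, 28, 1)))
# S2: 2x2 average pooling halves each spatial dimension
model.add(AveragePooling2D(pool_size=(2, 2), strides=2, padding='valid'))

# C3: 16 feature maps, 5x5 kernels
model.add(Conv2D(16, kernel_size=(5, 5), padding='valid', activation='tanh'))
# S4: 2x2 average pooling
model.add(AveragePooling2D(pool_size=(2, 2), strides=2, padding='valid'))

# Flatten the 4x4x16 feature maps into a 256-element vector
model.add(Flatten())

# Fully connected layers, ending in a 10-way softmax classifier
model.add(Dense(120, activation='tanh'))
model.add(Dense(84, activation='tanh'))
model.add(Dense(10, activation='softmax'))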

**Model Summary**

In [None]:
model.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_6 (Conv2D)           (None, 24, 24, 6)         156       
                                                                 
 average_pooling2d_6 (Avera  (None, 12, 12, 6)         0         
 gePooling2D)                                                    
                                                                 
 conv2d_7 (Conv2D)           (None, 8, 8, 16)          2416      
                                                                 
 average_pooling2d_7 (Avera  (None, 4, 4, 16)          0         
 gePooling2D)                                                    
                                                                 
 flatten_3 (Flatten)         (None, 256)               0         
                                                                 
 dense_9 (Dense)             (None, 120)              
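
The convolutional parameter counts printed in the summary can be reproduced by hand. The sketch below assumes the usual formula for a convolutional layer with one bias per filter; the helper name `conv_params` is a placeholder.

```python
def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Each filter has kernel_h * kernel_w * in_channels weights plus one bias."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

conv1 = conv_params(5, 5, 1, 6)    # first Conv2D: 1 input channel, 6 filters
conv2 = conv_params(5, 5, 6, 16)   # second Conv2D: 6 input channels, 16 filters
flat = 4 * 4 * 16                  # Flatten output: 4x4 maps, 16 channels

print(conv1)  # 156
print(conv2)  # 2416
print(flat)   # 256
```

These match the `156`, `2416` and `256` values in the summary output, and the pooling layers contribute no parameters.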

**Compiling the model**

In [None]:
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

**Training the model**

In [None]:
model.fit(X_train, y_train, epochs=5, batch_size=64, validation_split=0.1)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.src.callbacks.History at 0x7ce01ee7e590>

**Evaluating the model**

In [None]:
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=2)

313/313 - 2s - loss: 0.0477 - accuracy: 0.9854 - 2s/epoch - 6ms/step


In [None]:
print("Test Accuracy : ",test_acc*100 ,"%")

Test Accuracy :  98.54000210762024 %


# **TOPIC: Analyzing AlexNet**

## ANSWER 1
**Overview of AlexNet Architecture:**

AlexNet is a deep convolutional neural network architecture designed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. It gained significant attention and won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012 by achieving a considerable improvement in image classification accuracy.

## ANSWER 2
**Architecture Innovations in AlexNet:**

Depth and Width: AlexNet was one of the first deep CNNs, consisting of eight layers, including five convolutional layers and three fully connected layers. This depth allowed the model to learn hierarchical features.

ReLU Activation: AlexNet used Rectified Linear Units (ReLU) as the activation function, which helped mitigate the vanishing gradient problem and accelerate convergence.

Local Response Normalization (LRN): LRN was applied after the ReLU activation to enhance the model's ability to generalize by normalizing responses across adjacent channels.

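Overlapping Pooling: AlexNet used 3x3 max-pooling windows with a stride of 2, so neighbouring windows overlap, unlike the non-overlapping pooling that was common at the time. The authors reported that this slightly lowered error rates and made the network somewhat harder to overfit.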

Data Augmentation and Dropout: AlexNet employed data augmentation (such as random cropping and flipping) during training to reduce overfitting. Dropout, a regularization technique, was applied to fully connected layers.
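
To see what "overlapping" means for the pooling described above, the sketch below lists which input positions each pooling window covers along one axis. The input length of 7 and the helper name `pooling_windows` are chosen only for illustration.

```python
def pooling_windows(length, size, stride):
    """Return the input indices covered by each pooling window along one axis."""
    return [list(range(start, start + size))
            for start in range(0, length - size + 1, stride)]

non_overlapping = pooling_windows(7, size=2, stride=2)
overlapping = pooling_windows(7, size=3, stride=2)  # AlexNet-style: size 3, stride 2

print(non_overlapping)  # [[0, 1], [2, 3], [4, 5]]
print(overlapping)      # [[0, 1, 2], [2, 3, 4], [4, 5, 6]]
```

With a window larger than the stride, indices 2 and 4 each fall into two windows, so neighbouring outputs share part of their input.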

## ANSWER 3
**Role of Convolutional Layers, Pooling Layers, and Fully Connected Layers:**

Convolutional Layers: The convolutional layers in AlexNet are responsible for learning hierarchical features from the input image. These layers use filters to convolve across the input, capturing patterns and features.

Pooling Layers: Max-pooling layers in AlexNet downsample the spatial dimensions of the feature maps, reducing computational complexity and introducing some degree of translation invariance. The pooling windows overlap (3x3 windows with a stride of 2), which the authors found slightly reduced error compared with non-overlapping pooling.

Fully Connected Layers: The fully connected layers at the end of the network capture high-level reasoning and make predictions based on the features learned by the convolutional and pooling layers. These layers enable the model to make class predictions.

## ANSWER 4

In [1]:
pip install tensorflow



In [2]:
!pip install tflearn

Collecting tflearn
  Downloading tflearn-0.5.0.tar.gz (107 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: tflearn
  Building wheel for tflearn (setup.py) ... done
  Created wheel for tflearn: filename=tflearn-0.5.0-py3-none-any.whl size=127283 sha256=df42014eaca9f28771489b713dba9bd6bddf08a8267b7154554bc72fba2709bd
  Stored in directory: /root/.cache/pip/wheels/55/fb/7b/e06204a0ceefa45443930b9a250cb5ebe31def0e4e8245a465
Successfully built tflearn
Installing collected packages: tflearn
Successfully installed tflearn-0.5.0


In [3]:
import tensorflow as tf
# Import everything from tensorflow.keras so a single Keras installation is used
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout, Flatten, Conv2D, MaxPooling2D, BatchNormalization

In [4]:
from keras.datasets import cifar10
from keras.utils import to_categorical

**Load cifar10 dataset**

In [5]:
(X_train , y_train ) , (X_test , y_test) = keras.datasets.cifar10.load_data()

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz


**Preprocess CIFAR10 dataset**

In [6]:
X_train = X_train / 255.0

In [7]:
X_test = X_test / 255.0

**Encode the labels**

In [8]:
y_train = keras.utils.to_categorical(y_train,10)
y_test = keras.utils.to_categorical(y_test,10)

**Build AlexNet Model**

In [9]:
# Create a sequential model
model = Sequential()

In [10]:
# 1st Convolutional Layer
model.add(Conv2D(filters=96, input_shape=(32,32,3), kernel_size=(11,11), strides=(4,4), padding='valid'))
model.add(Activation('relu'))

# Pooling
model.add(MaxPooling2D(pool_size=(3,3), strides=(2,2), padding='valid'))
# Batch Normalisation before passing it to the next layer
# (used here in place of the Local Response Normalization of the original AlexNet)
model.add(BatchNormalization())

In [12]:
# 2nd Convolutional Layer
model.add(Conv2D(filters=256, kernel_size=(5,5),strides=(1,1),  padding='same'))
model.add(Activation('relu'))

# Pooling
model.add(MaxPooling2D(pool_size=(3,3), strides=(2,2), padding='same'))
# Batch Normalisation
model.add(BatchNormalization())

In [13]:
# 3rd Convolutional Layer
model.add(Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), padding='same'))
model.add(Activation('relu'))
# Batch Normalisation
model.add(BatchNormalization())

In [14]:
# 4th Convolutional Layer
model.add(Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), padding='same'))
model.add(Activation('relu'))
# Batch Normalisation
model.add(BatchNormalization())

In [15]:
# 5th Convolutional Layer
model.add(Conv2D(filters=256, kernel_size=(3,3), strides=(1,1), padding='same'))
model.add(Activation('relu'))
# Pooling
model.add(MaxPooling2D(pool_size=(3,3), strides=(2,2), padding='same'))
# Batch Normalisation
model.add(BatchNormalization())

In [16]:
# Passing it to a dense layer
model.add(Flatten())

In [17]:
# 1st Dense Layer
model.add(Dense(4096))
model.add(Activation('relu'))
# Add Dropout to prevent overfitting
model.add(Dropout(0.5))
# Batch Normalisation
model.add(BatchNormalization())

In [18]:
# 2nd Dense Layer
model.add(Dense(4096))
model.add(Activation('relu'))

In [19]:
# Add Dropout
model.add(Dropout(0.5))
# Batch Normalisation
model.add(BatchNormalization())

In [20]:
# Output Layer
model.add(Dense(10))
model.add(Activation('softmax'))

**Model Summary**

In [21]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 6, 6, 96)          34944     
                                                                 
 activation (Activation)     (None, 6, 6, 96)          0         
                                                                 
 max_pooling2d (MaxPooling2  (None, 2, 2, 96)          0         
 D)                                                              
                                                                 
 batch_normalization (Batch  (None, 2, 2, 96)          384       
 Normalization)                                                  
                                                                 
 conv2d_1 (Conv2D)           (None, 2, 2, 256)         614656    
                                                                 
 activation_1 (Activation)   (None, 2, 2, 256)         0
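
The spatial sizes in the summary can be traced by hand from the layer definitions. The sketch below applies the Keras `'valid'` and `'same'` size rules to the kernel sizes, strides and padding modes used in the cells above; the layer labels (`conv1`, `pool1`, ...) are names chosen for this example, and the values beyond the visible rows of the summary are computed rather than read from the output.

```python
import math

def output_size(n, kernel, stride, padding):
    """Spatial output size along one axis under Keras' 'valid'/'same' rules."""
    if padding == "valid":
        return (n - kernel) // stride + 1
    return math.ceil(n / stride)  # 'same'

# (label, kernel, stride, padding) copied from the layer definitions above
layers = [
    ("conv1", 11, 4, "valid"),
    ("pool1", 3, 2, "valid"),
    ("conv2", 5, 1, "same"),
    ("pool2", 3, 2, "same"),
    ("conv3", 3, 1, "same"),
    ("conv4", 3, 1, "same"),
    ("conv5", 3, 1, "same"),
    ("pool3", 3, 2, "same"),
]

size = 32  # CIFAR-10 images are 32x32
sizes = [size]
for label, kernel, stride, padding in layers:
    size = output_size(size, kernel, stride, padding)
    sizes.append(size)
    print(f"{label}: {size}x{size}")
```

On 32x32 inputs the 11x11, stride-4 first layer already reduces the map to 6x6, and after the second pooling layer only a single spatial position remains. The original AlexNet was designed for much larger ImageNet crops, so this early collapse may limit what the later convolutional layers can learn here.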

**Compiling The Model**

In [22]:
# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

**Training the model**

In [23]:
# Train
model.fit(X_train, y_train, batch_size=128, epochs=5, verbose=1,validation_split=0.2, shuffle=True)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.src.callbacks.History at 0x7c30b6aadde0>

**Evaluating the model**

In [24]:
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=2)

313/313 - 17s - loss: 1.6939 - accuracy: 0.4155 - 17s/epoch - 53ms/step


In [25]:
print("Test Accuracy : ",test_acc*100 ,"%")

Test Accuracy :  41.54999852180481 %
