# TOPIC: Understanding Pooling and Padding in CNN


1. Describe the purpose and benefits of pooling in CNN.


A1. Purpose: Pooling layers reduce the spatial dimensions (width and height) of the feature maps, preserving important information while reducing computation and overfitting.

## Benefits:

1. Dimensionality reduction: Speeds up training and inference.

2. Translation invariance: Helps the model become more robust to small shifts and distortions.

3. Reduces overfitting: By summarizing feature presence rather than location.
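
The first two benefits can be seen on a toy input. The sketch below is a minimal pure-Python illustration, not a framework implementation: `max_pool_2x2` is a hypothetical helper written for this example, and it assumes a 2×2 window with stride 2. It shows a 4×4 map shrinking to 2×2 and a one-pixel shift producing the same pooled output.

```python
def max_pool_2x2(fmap):
    """Apply 2x2 max pooling with stride 2 to a 2-D list of numbers."""
    rows, cols = len(fmap), len(fmap[0])
    return [
        [
            max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]

# A 4x4 feature map with one strong activation in the top-left corner.
original = [
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# The same activation shifted one pixel down and one pixel right.
shifted = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

pooled_original = max_pool_2x2(original)
pooled_shifted = max_pool_2x2(shifted)

print(pooled_original)                    # [[9, 0], [0, 0]]
print(pooled_original == pooled_shifted)  # True
```

The invariance is only local: it holds here because the shifted activation stays inside the same 2×2 window. A shift that crosses a window boundary would change the pooled output.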



2. Explain the difference between min pooling and max pooling.

A2. Max pooling and min pooling differ as follows:

## **Max Pooling**

1. Selects the **maximum** value in each patch.

2. Focuses on **strongest features**.

3. Commonly used in practice.

## **Min Pooling**

1. Selects the **minimum** value in each patch.

2. Retains **least activated features**.

3. Rarely used, typically for experimentation or special cases.
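
To make the contrast concrete, here is a small pure-Python sketch that applies both operations to the same feature map. `pool2d` is a hypothetical helper written for this example, and non-overlapping windows (stride equal to the window size) are an assumption.

```python
def pool2d(fmap, size, reduce_fn):
    """Pool non-overlapping size x size patches of a 2-D list with reduce_fn."""
    rows, cols = len(fmap), len(fmap[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        row = []
        for c in range(0, cols - size + 1, size):
            patch = [fmap[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(reduce_fn(patch))
        pooled.append(row)
    return pooled

feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 0],
    [7, 2, 9, 8],
    [0, 1, 3, 4],
]

max_pooled = pool2d(feature_map, 2, max)
min_pooled = pool2d(feature_map, 2, min)

print(max_pooled)  # [[6, 4], [7, 9]]
print(min_pooled)  # [[1, 0], [0, 3]]
```

Max pooling keeps the strongest response in each patch, while min pooling keeps the weakest, which is one reason it is seldom useful after activations such as ReLU, where low values mostly mean "feature absent".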


3. Discuss the concept of padding in CNN and its significance.

A3. **Padding:** Adding extra pixels (usually zeros) around the border of the input image or feature map before a convolution is applied.

## Significance:

1. Controls output size: Prevents shrinking of feature maps after each convolution.

2. Preserves edge information: Ensures features at the borders are not ignored.

3. Allows deeper networks: Maintains spatial dimensions, enabling more layers without losing too much resolution.
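
The edge-information point can be checked by counting how many kernel windows touch each pixel. The sketch below is a pure-Python illustration under assumed settings (5×5 input, 3×3 kernel, stride 1), and `coverage_counts` is a hypothetical helper, not a library function.

```python
def coverage_counts(size, kernel, pad):
    """Count how many kernel windows (stride 1) include each input pixel."""
    counts = [[0] * size for _ in range(size)]
    out = size + 2 * pad - kernel + 1  # number of window positions per axis
    for top in range(-pad, -pad + out):
        for left in range(-pad, -pad + out):
            for r in range(top, top + kernel):
                for c in range(left, left + kernel):
                    # Only real input pixels are counted, not the padded zeros.
                    if 0 <= r < size and 0 <= c < size:
                        counts[r][c] += 1
    return counts

valid = coverage_counts(size=5, kernel=3, pad=0)
padded = coverage_counts(size=5, kernel=3, pad=1)

print(valid[0][0], valid[2][2])    # 1 9
print(padded[0][0], padded[2][2])  # 4 9
```

Without padding the corner pixel contributes to a single output value, while the center contributes to nine. With one pixel of zero padding the corner is seen four times, so border features carry more weight in the output.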

4. Compare and contrast zero-padding and valid-padding in terms of their effects on the output feature map size.

A4.

| **Zero Padding ("same")**             | **Valid Padding**                                            |
| ------------------------------------- | ------------------------------------------------------------ |
| Pads input with zeros.                | No padding applied.                                          |
| Output size = input size at stride 1. | Output shrinks by (kernel size - 1) per convolution at stride 1. |
| Preserves border information.         | Might lose edge information.                                 |
| Useful for deep networks.             | Used when reducing size is intentional.                      |


Summary:

Use zero-padding when you want to maintain spatial dimensions.

Use valid-padding when you want to reduce dimensions intentionally.
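
The size effect follows from the standard output-size formula, output = floor((N + 2P - K) / S) + 1, for input size N, kernel size K, per-side padding P, and stride S. The sketch below applies it to three stacked 5×5 convolutions on a 32×32 input. The helper names and the chosen sizes are assumptions for illustration.

```python
def conv_output_size(n, kernel, stride=1, pad=0):
    """Output width/height of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * pad - kernel) // stride + 1

def same_pad(kernel):
    """Per-side zero padding that keeps the size fixed at stride 1 (odd kernels)."""
    return (kernel - 1) // 2

size_valid = size_same = 32
for layer in range(1, 4):
    size_valid = conv_output_size(size_valid, kernel=5)
    size_same = conv_output_size(size_same, kernel=5, pad=same_pad(5))
    print(f"after conv {layer}: valid={size_valid}, same={size_same}")
# after conv 1: valid=28, same=32
# after conv 2: valid=24, same=32
# after conv 3: valid=20, same=32
```

With valid padding each 5×5 convolution removes four pixels per axis, so depth is limited by how quickly the map shrinks. With zero padding of two pixels per side the size stays at 32.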




# TOPIC: Exploring LeNet

1. Provide a brief overview of LeNet-5 architecture.

A1. LeNet-5 is a pioneering convolutional neural network (CNN) architecture published by Yann LeCun et al. in 1998 for handwritten digit recognition (MNIST dataset).

It combined convolutions, pooling (subsampling), and fully connected layers in one end-to-end trainable network, a pattern that still forms the backbone of modern CNNs.

2. Describe the key components of LeNet-5 and their respective purposes.

A2.

| **Layer**       | **Type**                                  | **Purpose**                          |
| --------------- | ----------------------------------------- | ------------------------------------ |
| **Input Layer** | 32×32 grayscale image                     | Accepts input image (e.g., digit)    |
| **C1**          | Convolutional (6 filters, 5×5)            | Extracts local features              |
| **S2**          | Subsampling (2×2 avg)                     | Downsamples to reduce dimensionality |
| **C3**          | Convolutional (16 filters, 5×5)           | Extracts deeper patterns             |
| **S4**          | Subsampling (2×2 avg)                     | Further downsampling                 |
| **C5**          | Convolutional (120 filters, 5×5), effectively fully connected | Transition to dense features |
| **F6**          | Fully connected (84)                      | High-level feature combination       |
| **Output**      | Fully connected (10)                      | Predicts digit class (0-9)           |


Note: Activation function used is tanh in the original architecture.
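
The table can be checked with a little arithmetic. The sketch below traces the feature-map shape through each layer and counts trainable parameters. It rests on simplifying assumptions that differ from the 1998 paper: every C3 filter sees all six S2 maps (the paper used a sparser connection scheme), the pooling layers have no parameters, and the output is a plain dense layer. All helper names are hypothetical.

```python
def conv_params(kernel, in_ch, out_ch):
    """Weights plus one bias per filter."""
    return (kernel * kernel * in_ch + 1) * out_ch

def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return (n_in + 1) * n_out

lenet_layers = [
    # (name, kind, kernel or window size, output channels)
    ("C1", "conv", 5, 6),
    ("S2", "pool", 2, 6),
    ("C3", "conv", 5, 16),
    ("S4", "pool", 2, 16),
    ("C5", "conv", 5, 120),
]

size, channels, total_params = 32, 1, 0
shapes = {}
for name, kind, kernel, out_ch in lenet_layers:
    if kind == "conv":
        total_params += conv_params(kernel, channels, out_ch)
        size = size - kernel + 1  # valid convolution, stride 1
    else:
        size = size // kernel     # non-overlapping pooling
    channels = out_ch
    shapes[name] = (size, size, channels)
    print(name, shapes[name])

total_params += dense_params(120, 84)  # F6
total_params += dense_params(84, 10)   # Output
print(total_params)  # 61706
```

The trace gives 28×28×6, 14×14×6, 10×10×16, 5×5×16, and finally 1×1×120, which is why C5 behaves like a fully connected layer over the 400 values coming out of S4.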

3. Discuss the advantages and limitations of LeNet-5 in the context of image classification tasks.

A3. The advantages and limitations of LeNet-5 for image classification are:

## Advantages:

1. **Lightweight:** Fewer parameters compared to modern networks.

2. **Interpretable:** Easy to understand and visualize.

3. **Efficient:** Fast on small datasets (e.g., MNIST).


## Limitations:

1. **Poor scalability:** Not suitable for large or complex datasets (e.g., ImageNet).

2. **Shallow:** Lacks deep feature extraction compared to modern CNNs.

3. **Limited to grayscale:** Originally designed for 1-channel inputs.

4. Implement LeNet-5 using a deep learning framework of your choice (e.g., TensorFlow, PyTorch) and train it on a publicly available dataset (e.g., MNIST). Evaluate its performance and provide insights.

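A4. Implementing LeNet-5 using TensorFlow.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def build_lenet5():
    model = models.Sequential()
    model.add(layers.Input(shape=(32, 32, 1)))  # Input Layer

    model.add(layers.Conv2D(6, kernel_size=(5, 5), activation='tanh'))  # C1
    model.add(layers.AveragePooling2D(pool_size=(2, 2)))  # S2

    model.add(layers.Conv2D(16, kernel_size=(5, 5), activation='tanh'))  # C3
    model.add(layers.AveragePooling2D(pool_size=(2, 2)))  # S4

    model.add(layers.Flatten())
    # C5: a dense layer over the 5x5x16 maps, equivalent to 120 5x5 filters
    model.add(layers.Dense(120, activation='tanh'))
    model.add(layers.Dense(84, activation='tanh'))   # F6
    model.add(layers.Dense(10, activation='softmax'))  # Output Layer

    return model

# Compile and view the model
lenet_model = build_lenet5()
lenet_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
lenet_model.summary()

# Load MNIST, scale to [0, 1], and zero-pad 28x28 -> 32x32 to match the input layer
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

def prepare(images):
    images = images.astype("float32") / 255.0
    images = np.pad(images, ((0, 0), (2, 2), (2, 2)))  # pads with zeros by default
    return images[..., np.newaxis]  # add the channel axis

x_train, x_test = prepare(x_train), prepare(x_test)

# Train with 10% of the training set held out for validation, then evaluate
lenet_model.fit(x_train, y_train, batch_size=128, epochs=5, validation_split=0.1)
test_loss, test_acc = lenet_model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.4f}")
```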


# TOPIC: Analyzing AlexNet

1. Present an overview of the AlexNet architecture.







A1. AlexNet, developed by Alex Krizhevsky et al. in 2012, revolutionized deep learning by winning the ImageNet LSVRC-2012 competition by a large margin.

It demonstrated the power of deep convolutional neural networks on large-scale image classification.

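Input: 227×227×3 image (the paper states 224×224, but 227×227 is the size that makes the first layer's arithmetic work out)

Output: 1000 class probabilities (ImageNet classes)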


2. Explain the architectural innovations introduced in AlexNet that contributed to its breakthrough performance.

A2. Architectural Innovations of AlexNet

## Key Innovations:

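1. **ReLU activation:** Used ReLU instead of tanh/sigmoid; because ReLU does not saturate for positive inputs, training converges faster.

2. **GPU parallelism:** Split the network across two GPUs to fit it in memory and speed up training.

3. **Dropout:** Used to prevent overfitting in fully connected layers.

4. **Data Augmentation:** Random cropping, horizontal flipping, and PCA-based perturbation of RGB intensities.

5. **Local Response Normalization (LRN):** Normalizes each activation by the activations in neighboring channels, a form of lateral inhibition that aided generalization (now rarely used).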
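
The ReLU point comes down to gradient size. The sketch below is a pure-Python illustration with arbitrarily chosen input values. It compares the derivative of tanh with that of ReLU as the pre-activation grows, and the helper names are hypothetical.

```python
import math

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2, which vanishes for large |x|."""
    return 1.0 - math.tanh(x) ** 2

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

for x in (0.5, 2.0, 5.0):
    print(f"x={x}: tanh'={tanh_grad(x):.5f}, relu'={relu_grad(x):.1f}")
```

For large positive inputs the tanh gradient shrinks toward zero, so weight updates through saturated units become tiny. ReLU keeps a gradient of 1 wherever the unit is active, which is the usual explanation for its faster convergence.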

3. Discuss the role of convolutional layers, pooling layers, and fully connected layers in AlexNet.

A3.

| **Layer Type**             | **Purpose**                                                         |
| -------------------------- | ------------------------------------------------------------------- |
| **Convolutional Layers**   | Feature extraction using filters (capture spatial hierarchies).     |
| **Pooling Layers**         | Downsample feature maps and make representations more invariant.    |
| **Fully Connected Layers** | Combine high-level features to make final classification decisions. |
| **ReLU Activation**        | Speeds up convergence and adds non-linearity.                       |
| **Dropout**                | Prevents overfitting by randomly deactivating neurons.              |


4. Implement AlexNet using a deep learning framework of your choice and evaluate its performance on a dataset of your choice.

A4. AlexNet Implementation in TensorFlow/Keras

We'll use CIFAR-10 (10 classes, 32×32 images) for simplicity.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, datasets
from tensorflow.keras.utils import to_categorical

# Load CIFAR-10 and one-hot encode the labels
(x_train, y_train), (x_test, y_test) = datasets.cifar10.load_data()
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Scale and resize lazily, one batch at a time: resizing all 50,000 training
# images to 227x227 up front would need roughly 30 GB of float32 memory.
def preprocess(image, label):
    image = tf.cast(image, tf.float32) / 255.0
    return tf.image.resize(image, (227, 227)), label

def make_dataset(images, labels, shuffle=False, batch_size=64):
    ds = tf.data.Dataset.from_tensor_slices((images, labels))
    if shuffle:
        ds = ds.shuffle(10_000)
    ds = ds.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)

train_ds = make_dataset(x_train, y_train, shuffle=True)
test_ds = make_dataset(x_test, y_test)

# AlexNet Model
def build_alexnet():
    model = models.Sequential()
    model.add(layers.Input(shape=(227, 227, 3)))

    model.add(layers.Conv2D(96, (11, 11), strides=4, activation='relu'))
    model.add(layers.MaxPooling2D((3, 3), strides=2))

    model.add(layers.Conv2D(256, (5, 5), padding="same", activation='relu'))
    model.add(layers.MaxPooling2D((3, 3), strides=2))

    model.add(layers.Conv2D(384, (3, 3), padding="same", activation='relu'))
    model.add(layers.Conv2D(384, (3, 3), padding="same", activation='relu'))
    model.add(layers.Conv2D(256, (3, 3), padding="same", activation='relu'))
    model.add(layers.MaxPooling2D((3, 3), strides=2))

    model.add(layers.Flatten())
    model.add(layers.Dense(4096, activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(4096, activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(10, activation='softmax'))  # For CIFAR-10

    return model

# Compile and train
alexnet = build_alexnet()
alexnet.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
alexnet.fit(train_ds, epochs=5, validation_data=test_ds)
```
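
As a sanity check on the layer stack in `build_alexnet`, the sketch below traces the feature-map shape through the convolution and pooling layers using plain arithmetic. It assumes Keras defaults ("valid" padding unless `padding="same"` is given), and `out_size` and `alexnet_layers` are hypothetical names for this example.

```python
def out_size(n, kernel, stride=1, same=False):
    """Spatial output size of a conv/pool layer under 'same' or 'valid' padding."""
    if same:
        return -(-n // stride)  # ceil(n / stride)
    return (n - kernel) // stride + 1

alexnet_layers = [
    # (name, kernel, stride, same padding?, output channels or None for pooling)
    ("conv1", 11, 4, False, 96),
    ("pool1", 3, 2, False, None),
    ("conv2", 5, 1, True, 256),
    ("pool2", 3, 2, False, None),
    ("conv3", 3, 1, True, 384),
    ("conv4", 3, 1, True, 384),
    ("conv5", 3, 1, True, 256),
    ("pool3", 3, 2, False, None),
]

size, channels = 227, 3
shapes = {}
for name, kernel, stride, same, out_ch in alexnet_layers:
    size = out_size(size, kernel, stride, same)
    if out_ch is not None:
        channels = out_ch
    shapes[name] = (size, size, channels)
    print(name, shapes[name])

flatten_units = size * size * channels
print(flatten_units)  # 9216
```

The spatial size goes 227, 55, 27, 13, 6, so the `Flatten` layer hands 6×6×256 = 9216 values to the first dense layer. This is also why a 227×227 input is needed: the raw 32×32 CIFAR-10 images would be too small for this sequence of strides and pooling windows.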


## Summary Insights

1. AlexNet marked the start of the deep learning boom in computer vision.

2. Its design influenced many later architectures, including ZFNet, VGG, and even early ResNets.

3. You can further improve performance by training on ImageNet-sized datasets or using pretrained weights.