# **CNN Architecture**

**Describe the purpose and benefits of pooling in CNNs.**

Pooling is a downsampling operation used in convolutional neural networks (CNNs) to reduce the dimensionality of feature maps. Its purpose and benefits include:

1. **Dimensionality Reduction:** Pooling reduces the spatial size of the feature maps. Pooling layers have no learnable parameters themselves, but smaller feature maps cut the computation in later layers and the number of parameters in any fully connected layers that follow, which speeds up training and lowers the risk of overfitting.
2. **Translation Invariance:** Pooling makes the network more invariant to small translations of the input image, improving the model's robustness to variations in the input.
3. **Feature Extraction:** By summarizing each local neighborhood with a single value, pooling keeps the dominant responses and discards less relevant detail.
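
The dimensionality reduction described above can be illustrated with a minimal NumPy sketch. The helper `max_pool_2x2` and the sample values are assumptions made for illustration; the sketch only handles a 2-D map with even height and width, a \(2 \times 2\) window, and a stride of 2.

```python
import numpy as np

def max_pool_2x2(feature_map: np.ndarray) -> np.ndarray:
    """2x2 max pooling with stride 2 on a 2-D map with even height and width."""
    h, w = feature_map.shape
    # View the map as a grid of non-overlapping 2x2 blocks,
    # then keep the maximum of each block.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

# Illustrative 4x4 feature map (values chosen arbitrarily)
feature_map = np.array([
    [1, 3, 2, 0],
    [2, 4, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
])

pooled = max_pool_2x2(feature_map)
print(feature_map.shape, "->", pooled.shape)  # (4, 4) -> (2, 2)
print(pooled.tolist())                        # [[4, 2], [2, 8]]
```

The 16 input values are summarized by 4 outputs, so every later layer sees a quarter of the spatial positions.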

**Explain the difference between min pooling and max pooling.**

- **Max Pooling:** This operation selects the maximum value from each local patch of the feature map. It highlights the most prominent features within each region.
  
  Example:
  \[
  \text{Input patch: } \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}, \text{ Max pooled value: } 4
  \]

- **Min Pooling:** This operation selects the minimum value from each local patch of the feature map. It can be used to highlight the least prominent features within each region.
  
  Example:
  \[
  \text{Input patch: } \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}, \text{ Min pooled value: } 1
  \]
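
As a rough sketch, both operations can share one sliding-window routine that differs only in the reducing function. The helper name `pool2d` and its parameters are hypothetical and not taken from any library; the patch is the one used in the examples above.

```python
from typing import Callable, List, Sequence

def pool2d(
    feature_map: Sequence[Sequence[float]],
    size: int,
    stride: int,
    reducer: Callable[[List[float]], float],
) -> List[List[float]]:
    """Slide a size x size window over the map and reduce each window to one value."""
    rows, cols = len(feature_map), len(feature_map[0])
    output = []
    for r in range(0, rows - size + 1, stride):
        out_row = []
        for c in range(0, cols - size + 1, stride):
            # Collect the values under the current window
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            out_row.append(reducer(window))
        output.append(out_row)
    return output

patch = [[1, 3],
         [2, 4]]

max_pooled = pool2d(patch, size=2, stride=2, reducer=max)
min_pooled = pool2d(patch, size=2, stride=2, reducer=min)
print(max_pooled)  # [[4]]
print(min_pooled)  # [[1]]
```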

### Padding in CNN

**Discuss the concept of padding in CNN and its significance.**

Padding involves adding extra pixels around the border of an input feature map before applying convolution operations. The significance of padding includes:

1. **Control Output Size:** Padding controls the spatial dimensions of the output feature maps. Without padding, a \(k \times k\) filter at stride 1 shrinks each spatial dimension by \(k - 1\), so the feature maps get smaller after every convolution.
2. **Preserve Information:** Without padding, border pixels are covered by fewer filter positions than interior pixels. Padding lets the filter be centered on border pixels as well, so the convolutional layers make fuller use of the edges of the input image.
3. **Maintain Spatial Dimensions:** In cases where it is necessary to maintain the same spatial dimensions between the input and output (e.g., for symmetric architectures), padding is essential.
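
One way to see the "preserve information" point is to count how many filter positions touch each input pixel. The sketch below assumes a \(5 \times 5\) input, a \(3 \times 3\) filter, and stride 1; the helper `coverage` is hypothetical and written only for this illustration.

```python
import numpy as np

def coverage(height: int, width: int, kernel: int, pad: int) -> np.ndarray:
    """Count how many stride-1 kernel windows include each original pixel."""
    counts = np.zeros((height + 2 * pad, width + 2 * pad), dtype=int)
    for r in range(counts.shape[0] - kernel + 1):
        for c in range(counts.shape[1] - kernel + 1):
            # Every pixel under this window position is used once more
            counts[r:r + kernel, c:c + kernel] += 1
    # Drop the padded border so only original pixels remain
    return counts[pad:pad + height, pad:pad + width]

no_pad = coverage(5, 5, kernel=3, pad=0)
with_pad = coverage(5, 5, kernel=3, pad=1)

print(no_pad[0, 0], no_pad[2, 2])      # 1 9  (corner vs. center, no padding)
print(with_pad[0, 0], with_pad[2, 2])  # 4 9  (corner vs. center, padding of 1)
```

Without padding the corner pixel contributes to a single output value, while with one pixel of padding it contributes to four.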

**Compare and contrast zero-padding and valid-padding in terms of their effects on the output feature map size.**

- **Zero-padding (same-padding):** Zeros are added around the border of the input feature map. With "same" padding, just enough zeros are added that, at stride 1, the output feature map has the same spatial dimensions as the input. This is useful when the input and output need to be the same size.
  
  Example: 
  If the input is \(5 \times 5\) and we use a \(3 \times 3\) filter with one pixel of zero-padding on each side and stride 1, the output will also be \(5 \times 5\).

- **Valid-padding:** This involves no padding. Convolution is applied only where the filter fits completely inside the input feature map. As a result, the output feature map is smaller than the input.
  
  Example: 
  If the input is \(5 \times 5\) and we use a \(3 \times 3\) filter with valid-padding and stride 1, the output will be \(3 \times 3\), since \(5 - 3 + 1 = 3\).
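
Both examples follow from the standard output-size formula \(\lfloor (n + 2p - k) / s \rfloor + 1\) for input size \(n\), filter size \(k\), padding \(p\) per side, and stride \(s\). The sketch below checks the two cases; the function names are illustrative, and `same_padding` assumes an odd filter size and stride 1.

```python
def conv_output_size(n: int, k: int, padding: int = 0, stride: int = 1) -> int:
    """Output size along one dimension: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - k) // stride + 1

def same_padding(k: int) -> int:
    """Padding per side that preserves the size at stride 1 (odd k assumed)."""
    return (k - 1) // 2

valid_size = conv_output_size(5, 3)                           # no padding
same_size = conv_output_size(5, 3, padding=same_padding(3))   # padding of 1

print(valid_size)  # 3
print(same_size)   # 5
```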

### LeNet-5 Architecture

**Provide a brief overview of the LeNet-5 architecture.**

LeNet-5 is one of the pioneering convolutional neural network architectures designed for handwritten digit recognition. It consists of the following layers:

1. **Input Layer:** Takes a \(32 \times 32\) grayscale image as input.
2. **C1 – Convolutional Layer:** 6 filters of size \(5 \times 5\), producing six \(28 \times 28\) feature maps.
3. **S2 – Subsampling Layer:** Average pooling with a \(2 \times 2\) filter and a stride of 2, producing six \(14 \times 14\) feature maps.
4. **C3 – Convolutional Layer:** 16 filters of size \(5 \times 5\), producing sixteen \(10 \times 10\) feature maps.
5. **S4 – Subsampling Layer:** Average pooling with a \(2 \times 2\) filter and a stride of 2, producing sixteen \(5 \times 5\) feature maps.
6. **C5 – Convolutional Layer:** 120 filters of size \(5 \times 5\), producing a \(1 \times 1 \times 120\) output.
7. **F6 – Fully Connected Layer:** 84 neurons.
8. **Output Layer:** Fully connected layer with 10 neurons (for 10 classes).
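
The feature-map sizes in the list above can be checked by applying the output-size formula layer by layer. This is a small sketch that assumes no padding anywhere and stride 1 for the convolutions, as the listed sizes imply; the names `layer_output_size` and `spatial_layers` are made up for the example.

```python
def layer_output_size(n: int, kernel: int, stride: int) -> int:
    """Output size along one dimension with no padding."""
    return (n - kernel) // stride + 1

# (layer name, kernel size, stride) for the spatial layers of LeNet-5
spatial_layers = [
    ("C1", 5, 1),
    ("S2", 2, 2),
    ("C3", 5, 1),
    ("S4", 2, 2),
    ("C5", 5, 1),
]

size = 32  # input is 32 x 32
sizes = {}
for name, kernel, stride in spatial_layers:
    size = layer_output_size(size, kernel, stride)
    sizes[name] = size
    print(f"{name}: {size} x {size}")
```

The loop prints 28, 14, 10, 5, and 1 for C1 through C5, matching the sizes given for each layer.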

**Describe the key components of LeNet-5 and their respective purposes.**

1. **Convolutional Layers (C1, C3, C5):** Extract spatial features by applying convolutional filters to the input. Each layer detects different types of features (edges, textures, etc.).
2. **Subsampling (Pooling) Layers (S2, S4):** Reduce the dimensionality of the feature maps, preserving the most important information and reducing computational complexity.
3. **Fully Connected Layers (F6, Output):** Integrate the features extracted by the convolutional layers to make final predictions. The output layer provides the class probabilities.

**Discuss the advantages and limitations of LeNet-5 in the context of image classification tasks.**

- **Advantages:**
  - Early example of a successful CNN architecture.
  - Simple and efficient for small datasets like MNIST.
  - Demonstrated the effectiveness of convolutional layers combined with pooling layers.

- **Limitations:**
  - Not scalable to larger and more complex datasets.
  - Lacks depth and capacity compared to modern CNN architectures.
  - Performance may not be competitive for high-resolution images or complex classification tasks.
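
To put a number on the limited capacity, the trainable parameters can be counted from the layer sizes given earlier. The sketch assumes every filter in C3 connects to all six S2 maps (the simplification most framework implementations make), whereas the original design connected each C3 map to only a subset of S2 maps and therefore has fewer parameters in that layer. It also ignores any trainable coefficients in the subsampling layers. The helper names are illustrative.

```python
def conv_params(kernel: int, in_channels: int, out_channels: int) -> int:
    """Weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * out_channels

def dense_params(inputs: int, outputs: int) -> int:
    """Weights plus one bias per output neuron."""
    return (inputs + 1) * outputs

layer_params = {
    "C1": conv_params(5, 1, 6),
    "C3": conv_params(5, 6, 16),     # assumes full connectivity to S2
    "C5": conv_params(5, 16, 120),
    "F6": dense_params(120, 84),
    "Output": dense_params(84, 10),
}

total = sum(layer_params.values())
for name, count in layer_params.items():
    print(f"{name}: {count:,}")
print(f"Total: {total:,}")  # Total: 61,706
```

Under these assumptions the whole network has roughly 62 thousand parameters.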

### AlexNet Architecture

**Present an overview of the AlexNet architecture.**

AlexNet, designed by Alex Krizhevsky et al., is a deep convolutional neural network that won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012. The architecture consists of:

1. **Input Layer:** Takes a \(224 \times 224\) RGB image.
2. **5 Convolutional Layers:** Extract spatial features with different sizes and strides.
3. **Max Pooling Layers:** Reduce the dimensionality of the feature maps.
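4. **3 Fully Connected Layers:** Combine the extracted features for classification; the last layer produces scores for the 1,000 ImageNet classes.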

The following Keras code trains a LeNet-5-style model on MNIST:

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Load the MNIST dataset (28x28 grayscale digits)
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Add a channel dimension: (N, 28, 28) -> (N, 28, 28, 1)
train_images = tf.expand_dims(train_images, axis=-1)
test_images = tf.expand_dims(test_images, axis=-1)

# Resize the images to 32x32, the input size LeNet-5 expects
# (tf.image.resize returns float32 tensors with the default bilinear method)
train_images = tf.image.resize(train_images, [32, 32])
test_images = tf.image.resize(test_images, [32, 32])

# Scale pixel values from [0, 255] to [0, 1]
train_images = tf.cast(train_images, tf.float32) / 255.0
test_images = tf.cast(test_images, tf.float32) / 255.0

# One-hot encode the labels
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

# Define a LeNet-5-style model (ReLU activations and a softmax output
# instead of the original tanh units)
model = models.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(6, (5, 5), activation='relu'),    # C1: 28x28x6
    layers.AveragePooling2D((2, 2)),                # S2: 14x14x6
    layers.Conv2D(16, (5, 5), activation='relu'),   # C3: 10x10x16
    layers.AveragePooling2D((2, 2)),                # S4: 5x5x16
    layers.Flatten(),
    layers.Dense(120, activation='relu'),           # plays the role of C5
    layers.Dense(84, activation='relu'),            # F6
    layers.Dense(10, activation='softmax')          # Output
])

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model (the test set doubles as validation data here)
history = model.fit(train_images, train_labels, epochs=10,
                    batch_size=64,
                    validation_data=(test_images, test_labels))

# Evaluate the model
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print(f'Test accuracy: {test_acc:.4f}')
```


Training output:

```text
Epoch 1/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 20s 13ms/step - accuracy: 0.8254 - loss: 0.5588 - val_accuracy: 0.9725 - val_loss: 0.0847
Epoch 2/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.9724 - loss: 0.0896 - val_accuracy: 0.9804 - val_loss: 0.0622
Epoch 3/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - accuracy: 0.9814 - loss: 0.0606 - val_accuracy: 0.9827 - val_loss: 0.0525
Epoch 4/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - accuracy: 0.9856 - loss: 0.0462 - val_accuracy: 0.9843 - val_loss: 0.0478
Epoch 5/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.9887 - loss: 0.0358 - val_accuracy: 0.9878 - val_loss: 0.0373
Epoch 6/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.9906 - loss: 0.0317 - val_accuracy: 0.9872 - val_loss: 0.0414
Epoch 7/10
...
```
