In [None]:
TOPIC: Understanding Pooling and Padding in CNN
    
1. Pooling in CNN (Convolutional Neural Networks):

Purpose: Pooling downsamples the spatial dimensions (width and height) of feature maps while retaining the most salient 
    information. It reduces the computational cost of the network, makes the model more robust to small shifts and distortions 
    in the input, and can help limit overfitting by shrinking the representation passed to later layers.
Benefits:
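Dimensionality Reduction: Pooling reduces the size of feature maps, which cuts the computation in subsequent layers and the 
    number of parameters in any fully connected layers that follow. The pooling operation itself has no learnable parameters.
Translation Invariance: Pooling gives the network a degree of invariance to small translations, so a pattern is still detected 
    when it shifts by a few pixels within a pooling window.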
Feature Selection: By selecting the most relevant information from a local neighborhood, pooling focuses on essential features 
    while discarding less important ones.
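As a rough illustration of the downsampling described above, the following sketch applies 2x2 max pooling with stride 2 to a small feature map stored as a plain Python list of lists. The helper name `max_pool_2x2` and the sample values are made up for this example; in PyTorch the corresponding call is `torch.max_pool2d(x, 2)`.

```python
def max_pool_2x2(feature_map):
    """Apply 2x2 max pooling with stride 2 to a 2D list (even height and width assumed)."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, height, 2):
        row = []
        for c in range(0, width, 2):
            # Collect the four values covered by the current 2x2 window.
            window = [
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map used only for illustration.
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[6, 2], [2, 9]]
```

The 4x4 input becomes a 2x2 output: each spatial dimension is halved, and only the largest value in each window survives.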

2. Difference between Min Pooling and Max Pooling:

Max Pooling: In max pooling, for each pooling region (e.g., a 2x2 window), the maximum value within that region is selected as 
    the output value. It keeps the strongest activation in each region, that is, the clearest evidence that a feature is present.
Min Pooling: Min pooling is similar to max pooling, but it selects the minimum value within the pooling region instead of the 
    maximum. This can be useful in applications where the presence of a feature is indicated by low values, such as dark 
    details on a light background. It is rarely used after ReLU activations, where the minimum in a region is often zero.
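To make the contrast concrete, here is a small sketch that runs both variants over the same feature map. The generic helper `pool2d` and the sample values are assumptions made for this example, not part of any library. In PyTorch, one way to obtain min pooling is the identity min(x) = -max(-x), i.e. `-torch.max_pool2d(-x, 2)`.

```python
def pool2d(feature_map, size, reducer):
    """Pool non-overlapping size x size windows of a 2D list with `reducer` (e.g. max or min)."""
    height, width = len(feature_map), len(feature_map[0])
    return [
        [
            reducer(
                feature_map[r + dr][c + dc]
                for dr in range(size)
                for dc in range(size)
            )
            for c in range(0, width - size + 1, size)
        ]
        for r in range(0, height - size + 1, size)
    ]


# Hypothetical 4x4 feature map used only for illustration.
feature_map = [
    [5, 8, 1, 4],
    [2, 7, 3, 0],
    [6, 1, 9, 2],
    [3, 4, 5, 8],
]
max_pooled = pool2d(feature_map, 2, max)
min_pooled = pool2d(feature_map, 2, min)
print(max_pooled)  # [[8, 4], [6, 9]]
print(min_pooled)  # [[2, 0], [1, 2]]
```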

3. Padding in CNN:

Concept: Padding in CNN involves adding extra border pixels (usually zeros) around the input image or feature map before 
    applying convolution or pooling operations. Padding is used to control the spatial dimensions of the output feature maps.
Significance: Padding is essential for several reasons:
Preserving Spatial Dimensions: Padding ensures that the spatial dimensions of the output feature maps are the same or similar 
    to the input's spatial dimensions.
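Edge Information: Without padding, pixels at the border are covered by fewer filter positions than pixels in the interior, so 
    information at the edges of the input is underrepresented in the output feature maps.
Control over Output Size: Together with the filter size and stride, the amount of padding determines the size of the 
    convolutional output.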
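The idea of adding a border of zeros can be sketched in a few lines of plain Python. The helper name `zero_pad` and the 3x3 sample values are illustrative assumptions; deep learning frameworks normally apply padding inside the convolution layer itself (for example through the `padding` argument of `nn.Conv2d`).

```python
def zero_pad(feature_map, pad):
    """Return a copy of a 2D list with `pad` rows and columns of zeros added on every side."""
    padded_width = len(feature_map[0]) + 2 * pad
    top = [[0] * padded_width for _ in range(pad)]
    middle = [[0] * pad + list(row) + [0] * pad for row in feature_map]
    bottom = [[0] * padded_width for _ in range(pad)]
    return top + middle + bottom


# Hypothetical 3x3 feature map used only for illustration.
feature_map = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
padded = zero_pad(feature_map, 1)
for row in padded:
    print(row)
# [0, 0, 0, 0, 0]
# [0, 1, 2, 3, 0]
# [0, 4, 5, 6, 0]
# [0, 7, 8, 9, 0]
# [0, 0, 0, 0, 0]
```

With one pixel of padding, the 3x3 map grows to 5x5, so a 3x3 filter can now be centered on every original pixel, including those on the border.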

4. Zero-padding vs. Valid-padding:

Zero-padding: Zero-padding adds zeros around the input feature map, increasing its spatial dimensions before the filter is 
    applied. When the amount of padding is chosen so that the output has the same spatial size as the input (for stride 1 and 
    a k x k filter with odd k, that is (k - 1) / 2 pixels per side), it is usually called "same" padding. It is used when you 
    want to stack convolutional layers without shrinking the feature maps.
Valid-padding: Valid-padding (also known as no padding) does not add any extra pixels around the input, so the filter is only 
    applied where it fits entirely inside the input. With stride 1, a k x k filter shrinks each spatial dimension by k - 1. 
    It is used when some reduction in spatial size is acceptable or desired, as in LeNet-5.
In summary, zero-padding can maintain spatial dimensions, while valid-padding reduces them for any filter larger than 1x1. The 
choice between these padding types depends on the specific architectural requirements of the CNN and the desired output size.
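The effect of each choice can be checked with the standard output-size formula, out = floor((n + 2p - k) / s) + 1, where n is the input size, k the filter size, p the padding per side, and s the stride. The function name `conv_output_size` and the 28x28 input with a 5x5 filter are assumptions chosen for illustration.

```python
def conv_output_size(n, k, p=0, s=1):
    """Output size along one spatial dimension: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1


n, k = 28, 5  # hypothetical 28x28 input and 5x5 filter

# Valid padding: no border is added, so the output shrinks by k - 1.
valid_size = conv_output_size(n, k, p=0)

# Zero-padding with (k - 1) / 2 pixels per side keeps the size unchanged at stride 1.
same_size = conv_output_size(n, k, p=(k - 1) // 2)

print(valid_size)  # 24
print(same_size)   # 28
```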

In [None]:
TOPIC: Exploring LeNet
    
1. Overview of LeNet-5 Architecture:
LeNet-5 is a convolutional neural network (CNN) architecture developed by Yann LeCun and his colleagues in the 1990s and 
described in their 1998 paper "Gradient-Based Learning Applied to Document Recognition". It is one of the pioneering CNN 
architectures and was designed for handwritten digit recognition, a small-scale image classification task. LeNet-5 played a 
crucial role in the development of modern CNNs.

2. Key Components of LeNet-5:
LeNet-5 consists of the following key components:

Input Layer: Accepts grayscale images with a fixed size. The original network used 32x32 inputs; the implementation in this 
    notebook feeds 28x28 MNIST images directly.

Convolutional Layers: LeNet-5 has two convolution-plus-pooling stages:

The first convolutional layer applies 6 filters of size 5x5.
The second convolutional layer applies 16 filters of size 5x5.
Each convolutional layer is followed by a 2x2 pooling (subsampling) layer.

Fully Connected Layers: After the convolutional layers, LeNet-5 has three fully connected layers:

The first fully connected layer has 120 neurons.
The second fully connected layer has 84 neurons.
The final output layer has as many neurons as there are classes in the dataset (10 for digit recognition).

Activation Functions: The original LeNet-5 used a scaled tanh activation. The implementation in this notebook uses ReLU 
    instead, a common modern substitution. Either way, the activation introduces non-linearity into the network.

Pooling Layers: The original LeNet-5 used average-pooling-style subsampling layers. The implementation in this notebook uses 
    max pooling instead. In both cases the pooling layers downsample the feature maps and reduce computational complexity.
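The feature-map sizes produced by this stack can be traced by hand. The sketch below assumes 5x5 convolutions with stride 1 and no padding, each followed by 2x2 pooling with stride 2, and the channel counts 6 and 16 used in the implementation later in this notebook. The function name `trace_lenet_shapes` is made up for this example.

```python
def trace_lenet_shapes(input_size):
    """Return (stage, channels, height/width) after each conv and pool stage for a square input."""
    stages = []
    size = input_size
    for name, channels in (("conv1", 6), ("conv2", 16)):
        size = size - 5 + 1  # 5x5 convolution, stride 1, no padding
        stages.append((name, channels, size))
        size = size // 2     # 2x2 pooling, stride 2
        stages.append((name.replace("conv", "pool"), channels, size))
    return stages


flattened = {}
for input_size in (28, 32):
    stages = trace_lenet_shapes(input_size)
    _, channels, size = stages[-1]
    # Number of features handed to the first fully connected layer.
    flattened[input_size] = channels * size * size

print(trace_lenet_shapes(28))
# [('conv1', 6, 24), ('pool1', 6, 12), ('conv2', 16, 8), ('pool2', 16, 4)]
print(flattened)  # {28: 256, 32: 400}
```

For 28x28 MNIST images the flattened size is 16 * 4 * 4 = 256, which matches the input size of the first fully connected layer in the implementation later in this notebook. With the original 32x32 input it would be 16 * 5 * 5 = 400.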

3. Advantages and Limitations of LeNet-5:

Advantages:
LeNet-5 was groundbreaking for its time and demonstrated the effectiveness of CNNs for image classification.
It helped establish convolutional layers with weight sharing, trained end to end with backpropagation, as the foundation of 
    modern CNN architectures. These ideas predate LeNet-5, but it showed that they work well in practice.
Well-suited for simple image classification tasks, such as handwritten digit recognition.
Limitations:
Limited capacity: LeNet-5 may not perform well on more complex tasks with larger and diverse datasets.
Shallow architecture: Compared to modern CNNs, it has a relatively shallow architecture, which may not capture intricate 
    features in complex images.
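Fixed input size: The fully connected layers expect a fixed number of input features, so the network cannot be applied 
    directly to large, high-resolution images without resizing them or changing the architecture.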

In [2]:
# 4. Implementing and Training LeNet-5:

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from torch.utils.data import DataLoader
from sklearn.metrics import accuracy_score

In [3]:
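# Define a LeNet-5-style architecture (ReLU and max pooling instead of the original tanh and subsampling)
class LeNet5(nn.Module):
    def __init__(self):
        super().__init__()
        # Feature extractor: two 5x5 convolutions without padding
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5)    # 1x28x28 -> 6x24x24
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)   # 6x12x12 -> 16x8x8
        # Classifier: 16 channels of 4x4 features remain after the second pooling step
        self.fc1 = nn.Linear(16 * 4 * 4, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)                   # one logit per digit class

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.max_pool2d(x, 2)                     # 6x24x24 -> 6x12x12
        x = torch.relu(self.conv2(x))
        x = torch.max_pool2d(x, 2)                     # 16x8x8 -> 16x4x4
        x = torch.flatten(x, 1)                        # keep the batch dimension, flatten the rest
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)                                # raw logits; CrossEntropyLoss applies log-softmax
        return x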

In [4]:
# Load and preprocess MNIST dataset
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

In [5]:
# Initialize LeNet-5 model, loss function, and optimizer
model = LeNet5()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

In [6]:
# Training loop
epochs = 10
for epoch in range(epochs):
    model.train()
    running_loss = 0.0
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"Epoch {epoch + 1}, Loss: {running_loss / len(train_loader)}")

Epoch 1, Loss: 0.26368189649904633
Epoch 2, Loss: 0.06924514895774671
Epoch 3, Loss: 0.04824028703518637
Epoch 4, Loss: 0.03878923371809769
Epoch 5, Loss: 0.03059123864248226
Epoch 6, Loss: 0.028147415411266398
Epoch 7, Loss: 0.02500297809285591
Epoch 8, Loss: 0.0213476820416017
Epoch 9, Loss: 0.018093869924655094
Epoch 10, Loss: 0.016143703104969234


In [7]:
# Evaluation on the test dataset
model.eval()  # switch to inference mode
predictions = []
true_labels = []
with torch.no_grad():  # gradients are not needed for evaluation
    for inputs, labels in test_loader:
        outputs = model(inputs)
        predicted = outputs.argmax(dim=1)  # index of the largest logit is the predicted class
        predictions.extend(predicted.cpu().numpy())
        true_labels.extend(labels.cpu().numpy())

In [8]:
# Calculate accuracy
accuracy = accuracy_score(true_labels, predictions)
print(f"Test Accuracy: {accuracy * 100:.2f}%")

Test Accuracy: 98.93%
