### 1. Describe the purpose and benefits of pooling in CNN.

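Pooling in Convolutional Neural Networks (CNNs) reduces the spatial dimensions (width and height) of a feature map and leaves the number of channels unchanged. A pooling operation slides a small window (typically 2x2 with stride 2) over each channel independently. It replaces the contents of each window with a single summary value, such as the maximum or the average. The primary purposes and benefits of pooling include:

- **Dimensionality Reduction**: Pooling has no learnable parameters of its own. However, the smaller feature maps it produces reduce memory use and computation in later layers. They also reduce the number of weights in any fully connected layer that follows.
- **Translation Invariance**: Each window is summarized by a single value, so a small shift of a feature within the window leaves the output unchanged. This makes the network more robust to small translations and distortions in the input.
- **Control Overfitting**: Pooling shrinks the representation and therefore the number of downstream parameters, which can help reduce overfitting.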
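A minimal pure-Python sketch of 2x2 max pooling with stride 2 follows. It uses a hypothetical helper, `max_pool2d`, rather than a framework API, and shows two effects. First, a 4x4 map is reduced to 2x2. Second, moving the strong activation by one pixel *within the same window* leaves the output unchanged. A shift that crosses a window boundary would change the output, so this invariance is only local.

```python
def max_pool2d(x, k=2):
    """Non-overlapping k x k max pooling on a 2D list (stride = k, no padding)."""
    rows, cols = len(x) // k, len(x[0]) // k
    return [
        [max(x[i * k + di][j * k + dj] for di in range(k) for dj in range(k))
         for j in range(cols)]
        for i in range(rows)
    ]

# A strong activation (9) in the top-left 2x2 window...
a = [[9, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 0]]
# ...and the same activation shifted by one pixel, still inside that window.
b = [[0, 0, 0, 0],
     [0, 9, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 0]]

print(max_pool2d(a))  # [[9, 0], [0, 1]]  (4x4 reduced to 2x2)
print(max_pool2d(b))  # [[9, 0], [0, 1]]  (same output despite the shift)
```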

### 2. Explain the difference between min pooling and max pooling.

- **Max Pooling**: Takes the maximum value within each pooling window. It keeps the strongest response in each region, which effectively records whether a feature was detected anywhere in that window. Max pooling is widely used, for example in AlexNet.

- **Min Pooling**: Takes the minimum value within each pooling window. It is rarely used, but it can help in applications where low values carry the signal of interest (e.g., dark regions on a bright background). Many activations are exactly zero after a ReLU, so min pooling applied to ReLU outputs frequently returns zero.
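The sketch below compares the two operations on the same input. It uses a hypothetical `pool2d` helper that takes the reduction (`max` or `min`) as a parameter. The 2x2 window and stride of 2 are assumptions made for the example.

```python
def pool2d(x, k=2, op=max):
    """Non-overlapping k x k pooling on a 2D list; `op` is the reduction (max or min)."""
    return [
        [op(x[i + di][j + dj] for di in range(k) for dj in range(k))
         for j in range(0, len(x[0]) - k + 1, k)]
        for i in range(0, len(x) - k + 1, k)
    ]

x = [[1, 5, 2, 8],
     [3, 4, 7, 6],
     [0, 2, 9, 1],
     [4, 3, 5, 2]]

print(pool2d(x, op=max))  # [[5, 8], [4, 9]]  strongest value per window
print(pool2d(x, op=min))  # [[1, 2], [0, 1]]  weakest value per window
```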

### 3. Discuss the concept of padding in CNN and its significance.

Padding in CNNs involves adding extra pixels to the border of an input image or feature map. The significance of padding includes:

- **Control Output Size**: Padding helps control the spatial dimensions of the output feature maps.
- **Preserve Information at Borders**: Without padding, a border pixel falls inside far fewer convolution windows than a pixel in the center, so edge information contributes less to the output. Padding lets border pixels be covered by more windows.
- **Maintain Spatial Dimensions**: With the right amount of padding, the output keeps the same width and height as the input. This simplifies architecture design, because layers can be stacked without tracking a shrinking feature map.
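The border effect can be made concrete by counting how many 3x3 windows (stride 1) cover each pixel of a 5x5 input. The `coverage` function is a hypothetical illustration, not a library function. Without padding, a corner pixel is covered by a single window and the center pixel by nine. With one pixel of zero-padding on each side, the corner is covered by four.

```python
def coverage(n, k, p=0):
    """For each pixel of an n x n input, count how many k x k windows
    (stride 1, zero-padding p on every side) include it."""
    counts = [[0] * n for _ in range(n)]
    out = n + 2 * p - k + 1  # number of window positions per axis
    for oi in range(out):
        for oj in range(out):
            # Rows/columns covered by this window, in unpadded input coordinates
            for i in range(oi - p, oi - p + k):
                for j in range(oj - p, oj - p + k):
                    if 0 <= i < n and 0 <= j < n:  # padded zeros are not real pixels
                        counts[i][j] += 1
    return counts

no_pad = coverage(5, 3, p=0)
padded = coverage(5, 3, p=1)
print(no_pad[0][0], no_pad[2][2])  # 1 9
print(padded[0][0], padded[2][2])  # 4 9
```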

### 4. Compare and contrast zero-padding and valid-padding in terms of their effects on the output feature map size.

- **Zero-padding**: Adds zeros around the border of the input, which enlarges the area the kernel can slide over. Take stride 1 and an odd kernel size k. Padding of p = (k - 1) / 2 (often called "same" padding) then gives an output feature map the same size as the input. Less padding gives a smaller output, and more padding gives a larger one.

- **Valid-padding**: No padding is added, so the kernel is applied only at positions where it fits entirely inside the input. The output is therefore smaller than the input: with stride 1, an n x n input and a k x k kernel give an (n - k + 1) x (n - k + 1) output.
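Both cases follow from the standard output-size formula floor((n + 2p - k) / s) + 1. Here n is the input size, k the kernel size, s the stride and p the padding per side. The sketch below evaluates this formula for a 28x28 input and a 5x5 kernel. `conv_output_size` is a hypothetical helper.

```python
def conv_output_size(n, k, s=1, p=0):
    """Spatial output size of a convolution or pooling: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

n, k = 28, 5
valid = conv_output_size(n, k, p=0)            # valid: no padding
same = conv_output_size(n, k, p=(k - 1) // 2)  # zero-padding chosen to preserve size
print(valid, same)                                  # 24 28
print(conv_output_size(n, k, s=2, p=(k - 1) // 2))  # 14: with stride 2 the output still shrinks
```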

---

### LeNet-5 Architecture

#### 1. Provide a brief overview of LeNet-5 architecture.

LeNet-5 is a pioneering Convolutional Neural Network (CNN) designed by Yann LeCun and others in 1998 for handwritten digit recognition (MNIST dataset). The architecture consists of seven layers (excluding the input), including convolutional layers, subsampling (pooling) layers, and fully connected layers.

#### 2. Describe the key components of LeNet-5 and their respective purposes.

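- **Input Layer**: Takes a 32x32 pixel grayscale image.
- **C1 (Convolutional Layer)**: Applies six 5x5 filters, resulting in a 28x28x6 output.
- **S2 (Subsampling/Pooling Layer)**: Subsamples each 2x2 window with stride 2, resulting in a 14x14x6 output. In the original, the four values in each window are summed, multiplied by a trainable coefficient, and offset by a trainable bias. Modern reimplementations usually use plain average pooling instead.
- **C3 (Convolutional Layer)**: Applies sixteen 5x5 filters, resulting in a 10x10x16 output. In the original, each C3 map is connected to only a subset of the six S2 maps.
- **S4 (Subsampling/Pooling Layer)**: Subsamples each 2x2 window with stride 2, resulting in a 5x5x16 output.
- **C5 (Convolutional Layer)**: Applies 120 5x5 filters to the 5x5x16 input, resulting in a 1x1x120 output. Because the kernel covers the whole input, this is equivalent to a fully connected layer.
- **F6 (Fully Connected Layer)**: 84 neurons, each connected to all 120 outputs of C5.
- **Output Layer**: 10 units, one for each digit. The original used Euclidean radial basis function (RBF) units. Modern reimplementations typically use a linear layer followed by softmax.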
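The sizes listed above can be checked with the output-size formula floor((n + 2p - k) / s) + 1. This sketch assumes no padding anywhere, stride 1 for the 5x5 convolutions, and stride 2 for the 2x2 subsampling layers. The layer table exists only for illustration.

```python
def out_size(n, k, s=1, p=0):
    """Output width/height: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

# (layer, kernel size, stride, output channels); no padding in LeNet-5
layers = [
    ("C1", 5, 1, 6),
    ("S2", 2, 2, 6),
    ("C3", 5, 1, 16),
    ("S4", 2, 2, 16),
    ("C5", 5, 1, 120),
]

n = 32  # 32x32 grayscale input
shapes = []
for name, k, s, c in layers:
    n = out_size(n, k, s)
    shapes.append((n, n, c))
    print(f"{name}: {n}x{n}x{c}")
# C1: 28x28x6, S2: 14x14x6, C3: 10x10x16, S4: 5x5x16, C5: 1x1x120
```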

#### 3. Discuss the advantages and limitations of LeNet-5 in the context of image classification tasks.

**Advantages**:
- **Pioneering Design**: Demonstrated the effectiveness of CNNs for image classification.
- **Efficiency**: Designed to be computationally efficient for the hardware available at the time.
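- **Good for Small Datasets**: Performs well on small, low-resolution datasets like MNIST.

**Limitations**:
- **Limited Scalability**: Its shallow depth and small capacity make it poorly suited to larger, higher-resolution, and more varied datasets such as ImageNet.
- **Outdated Techniques**: It predates techniques that are now standard in CNNs, such as ReLU activations, dropout for regularization, and batch normalization. Instead, it relies on saturating tanh-style activations.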
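As a rough measure of how small the model is, the sketch below counts trainable parameters for the common modern variant of LeNet-5. It assumes every C3 map is connected to all six S2 maps and that the output is a linear 10-way layer with biases. The 1998 original has a somewhat different count, because its C3 used a sparse connection table and its output layer used RBF units. The helper names are hypothetical.

```python
def conv_params(c_in, c_out, k):
    """Weights plus one bias per filter for a k x k convolution."""
    return c_out * (c_in * k * k + 1)

def linear_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_out * (n_in + 1)

counts = {
    "C1": conv_params(1, 6, 5),
    "C3": conv_params(6, 16, 5),      # assumes dense C3 connectivity
    "C5": conv_params(16, 120, 5),
    "F6": linear_params(120, 84),
    "Output": linear_params(84, 10),  # assumes a linear output layer, not RBF units
}
total = sum(counts.values())
print(counts)
print(total)  # 61706
```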

### AlexNet Architecture

#### 1. Present an overview of the AlexNet architecture.

AlexNet is a deep convolutional neural network that won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012. It consists of five convolutional layers followed by three fully connected layers. It uses ReLU activations throughout and applies dropout in the fully connected layers for regularization.
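The sketch below applies the output-size formula layer by layer to show how a 224x224 RGB input shrinks to the 6x6x256 feature map that feeds the first fully connected layer. The kernel sizes, strides, padding and channel counts come from the PyTorch `AlexNet` class in this notebook, which is a torchvision-style variant. The original paper used 96, 256, 384, 384 and 256 filters in its five convolutional layers. The spatial sizes depend only on kernel, stride and padding.

```python
def out_size(n, k, s=1, p=0):
    """Output width/height: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

# (name, kernel, stride, padding, output channels), as in the notebook's AlexNet class
layers = [
    ("conv1", 11, 4, 2, 64),
    ("pool1", 3, 2, 0, 64),
    ("conv2", 5, 1, 2, 192),
    ("pool2", 3, 2, 0, 192),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 256),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, 256),
]

n = 224
shapes = []
for name, k, s, p, c in layers:
    n = out_size(n, k, s, p)
    shapes.append((name, n, n, c))
    print(f"{name}: {n}x{n}x{c}")
print("flattened:", n * n * shapes[-1][3])  # 9216 = 256 * 6 * 6
```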

#### 2. Explain the architectural innovations introduced in AlexNet that contributed to its breakthrough performance.

- **ReLU Activation**: Used ReLU activation functions instead of sigmoid or tanh, leading to faster training.
- **Dropout**: Applied dropout (probability 0.5) to the first two fully connected layers to reduce overfitting.
- **Data Augmentation**: Applied data augmentation techniques such as random cropping and horizontal flipping to increase the diversity of the training data.
- **GPU Utilization**: Trained the network on two GPUs, with each layer's kernels split between them. This made it feasible to train a network of this size on ImageNet.
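Here is a minimal sketch of dropout on a list of activations. It assumes the "inverted" formulation used by PyTorch's `nn.Dropout`. During training, each unit is zeroed with probability p and the survivors are scaled by 1 / (1 - p), so no rescaling is needed at test time. The AlexNet paper instead multiplied outputs by 0.5 at test time, which is equivalent in expectation. The `dropout` function here is a hypothetical illustration.

```python
import random

def dropout(activations, p=0.5, training=True, rng=random):
    """Inverted dropout: zero each unit with probability p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(activations)  # identity at evaluation time
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)  # seeded for repeatability
h = [1.0] * 10
print(dropout(h, p=0.5, rng=rng))         # each entry is either 0.0 or 2.0
print(dropout(h, p=0.5, training=False))  # unchanged at test time
```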

#### 3. Discuss the role of convolutional layers, pooling layers, and fully connected layers in AlexNet.

- **Convolutional Layers**: Extract features from the input images by applying convolutional filters.
- **Pooling Layers**: Reduce the spatial dimensions of the feature maps, providing translation invariance and reducing the computational load.
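- **Fully Connected Layers**: Map the flattened features to class scores and perform the final classification. The original had 4096, 4096, and 1000 units, with one output per ImageNet class. These layers also hold the large majority of the network's parameters.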
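To see where the parameters sit, the sketch below counts weights and biases for the layer sizes in this notebook's `AlexNet` class (10 output classes for CIFAR-10). The helper names are hypothetical. The convolutional layers hold about 2.5 million parameters, while the fully connected layers hold about 54.6 million.

```python
def conv_params(c_in, c_out, k):
    """Weights plus one bias per filter for a k x k convolution."""
    return c_out * (c_in * k * k + 1)

def linear_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_out * (n_in + 1)

conv = [conv_params(3, 64, 11), conv_params(64, 192, 5), conv_params(192, 384, 3),
        conv_params(384, 256, 3), conv_params(256, 256, 3)]
fc = [linear_params(256 * 6 * 6, 4096), linear_params(4096, 4096), linear_params(4096, 10)]

conv_total, fc_total = sum(conv), sum(fc)
print(conv_total, fc_total)  # 2469696 54575114
print(f"FC share: {fc_total / (conv_total + fc_total):.1%}")  # roughly 96% of all parameters
```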


#### 4. Implement LeNet-5 using a deep learning framework of your choice (e.g., TensorFlow, PyTorch) and train it on a publicly available dataset (e.g., MNIST). Evaluate its performance and provide insights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

# LeNet-5 with modern substitutions: ReLU instead of scaled tanh, average pooling
# instead of trainable subsampling, and a linear output layer instead of RBF units.
class LeNet5(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5, stride=1, padding=0)   # C1: 32x32 -> 28x28
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5, stride=1, padding=0)  # C3: 14x14 -> 10x10
        # Inputs are resized to 32x32, so after two conv/pool stages the maps are 16x5x5
        # (16*4*4 would only be correct for unresized 28x28 inputs).
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # equivalent to C5 on a 5x5 input
        self.fc2 = nn.Linear(120, 84)          # F6
        self.fc3 = nn.Linear(84, 10)           # output: one logit per digit

    def forward(self, x):
        x = F.avg_pool2d(F.relu(self.conv1(x)), 2)  # C1 + S2 -> 6x14x14
        x = F.avg_pool2d(F.relu(self.conv2(x)), 2)  # C3 + S4 -> 16x5x5
        x = torch.flatten(x, 1)                     # flatten, keeping the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)  # raw logits; CrossEntropyLoss applies log-softmax internally

# Data loading and preprocessing (MNIST digits are 28x28; resize to LeNet-5's 32x32 input)
transform = transforms.Compose([transforms.Resize((32, 32)), transforms.ToTensor()])
train_dataset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)

# Training setup
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = LeNet5().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 10
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        loss = criterion(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * images.size(0)  # sum of per-sample losses

    # Report the mean loss over the whole epoch rather than only the last batch
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss / len(train_dataset):.4f}')

# Evaluation
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        predicted = outputs.argmax(dim=1)  # index of the highest logit
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Test Accuracy: {100 * correct / total:.2f}%')
```



#### 4. Implement AlexNet using a deep learning framework of your choice and evaluate its performance on a dataset of your choice.

```python
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

# AlexNet with torchvision-style channel widths, adapted to 10 output classes for CIFAR-10
class AlexNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),  # 224 -> 55
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # overlapping pooling: 55 -> 27
            nn.Conv2d(64, 192, kernel_size=5, padding=2),           # 27 -> 27
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # 27 -> 13
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # 13 -> 6
        )
        self.classifier = nn.Sequential(
            nn.Dropout(),                  # p=0.5 by default
            nn.Linear(256 * 6 * 6, 4096),  # 6x6 maps assume a 224x224 input
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)  # keep the batch dimension
        x = self.classifier(x)
        return x

# Data loading and preprocessing.
# CIFAR-10 images are 32x32; upsampling to 224x224 lets the feature extractor produce
# 256x6x6 maps, but makes each epoch much slower than training at native resolution.
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)

# Training setup
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = AlexNet(num_classes=10).to(device)
criterion = nn.CrossEntropyLoss()
# If the loss stays near ln(10) ≈ 2.30 (chance level), try a smaller learning rate such as 1e-4.
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 10
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for i, (images, labels) in enumerate(train_loader):
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        loss = criterion(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if i % 100 == 99:  # Print the average loss every 100 mini-batches
            print(f'[Epoch {epoch + 1}, Batch {i + 1}] loss: {running_loss / 100:.3f}')
            running_loss = 0.0

print('Finished Training')

# Evaluation
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        predicted = outputs.argmax(dim=1)  # index of the highest logit
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Test Accuracy: {100 * correct / total:.2f}%')
```
