**Question 1:** What is a Convolutional Neural Network (CNN), and how does it differ from
traditional fully connected neural networks in terms of architecture and performance on
image data?
- A Convolutional Neural Network (CNN) is a neural network designed for grid-like data, such as images whereas fully connected networks that flatten images into vectors, CNNs preserve the spatial structure by using convolutional layers with small filters.

- **Architectural differnce**:CNNs include convolutional layers for feature extraction and pooling layers to reduce dimensions and introduce translation invariance. Fully connected networks lack this hierarchical feature learning and treat all inputs equally, making them inefficient for large images.

- **Performance-wise**: CNNs are far superior on image data.CNN an detect edges, shapes, and complex patterns hierarchically, handle larger images efficiently, and generalize better due to fewer parameters much better than fully connected neural networks.

**Question 2:** Discuss the architecture of LeNet-5 and explain how it laid the foundation
for modern deep learning models in computer vision. Include references to its original
research paper.
- It processes 32×32 grayscale images through a series of layers of convolutional layers extract local features, pooling layers reduce spatial dimensions and add translation invariance. It has a total of 7 layers.
**Foundation for deep learning model**:LeNet-5 introduced has features like local receptive fields, weight sharing, hierarchical feature extraction, and spatial pooling, which are central to all modern CNN architectures. These principles made training deep networks on image data feasible and effective.
- **Refernece**:Reference:
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.

**Question 3:** Compare and contrast AlexNet and VGGNet in terms of design principles,
number of parameters, and performance. Highlight key innovations and limitations of
each.
- AlexNet was designed to handle large-scale image classification (ImageNet) and popularized deep CNNs. It uses 8 layers (5 convolutional, 3 fully connected) with ReLU activations and dropout to reduce overfitting. VGGNet emphasizes simplicity and depth by stacking small 3×3 convolutional filters repeatedly, resulting in very deep networks (16 or 19 layers) with uniform design.

- **Number of Parameters:**
AlexNet has around 60 million parameters, largely due to its large fully connected layers. VGGNet has 138 million parameters (VGG-16), because of the deep stack of convolutional layers combined with fully connected layers, making it heavier and more memory-intensive.

- **Performance:**
AlexNet won the 2012 ImageNet competition, achieving a top-5 error of 15.3%, demonstrating the effectiveness of deep CNNs and GPUs for training. VGGNet improved accuracy further with a top-5 error of 7.3%,

- **Key Innovations:**
AlexNet introduced ReLU activations, dropout, and GPU-accelerated training for deep networks. VGGNet’s main innovation was using very small (3×3) convolution filters and increasing depth systematically to improve feature representation.

- **Limitations:**
AlexNet is less uniform and more ad-hoc in design, with relatively large filters and fewer layers, limiting feature richness. VGGNet is computationally expensive and memory-intensive, making it harder to deploy in resource-constrained environments.

**Question 4:** What is transfer learning in the context of image classification? Explain
how it helps in reducing computational costs and improving model performance with
limited data.
- **Transfer learning**:Transfer learning in image classification is a technique where a neural network trained on a large dataset, like ImageNet, is reused as a starting point for a different but related task. Instead of training a model from scratch, we take the pretrained network and either fine-tune it on your smaller dataset or use it as a fixed feature extractor.

- **Cost**:This approach reduces computational costs because most of the network’s parameters are already learned, so you don’t need to perform full training.
- **Performance**:It also improves performance on limited data because the model has already learned general features like edges and shapes which can be adapted to the new task, making it more accurate and robust even with fewer labeled examples.


**Question 5:** Describe the role of residual connections in ResNet architecture. How do
they address the vanishing gradient problem in deep CNNs?
- **Role of residual connection in ResNet**:Residual connections in ResNet are shortcut connections that skip one or more layers and directly add the input of a layer to its output. Instead of learning a full mapping, each residual block learns a residual function.This allows the network to focus on learning only the changes needed at each layer rather than the complete transformation.

- **address the vanishing gradient problem in deep CNNs**:These connections address the vanishing gradient problem by providing a direct path for gradients to flow backward during training.Residual connections ensure that gradients can bypass certain layers, maintaining sufficient magnitude for effective weight updates and enabling training of extremely deep networks, sometimes exceeding hundreds of layers.


**Question 6:** Implement the LeNet-5 architectures using Tensorflow or PyTorch to
classify the MNIST dataset. Report the accuracy and training time.

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
import time
class LeNet5(nn.Module):
    def __init__(self):
        super(LeNet5, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5)
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)
        self.fc1 = nn.Linear(256, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.pool(x)
        x = self.relu(self.conv2(x))
        x = self.pool(x)
        x = x.view(x.size(0), -1)
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.fc3(x)
        return x
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])
train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LeNet5().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
start_time = time.time()
epochs = 10
for epoch in range(epochs):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 100 == 0:
            print(f"Epoch {epoch+1}/{epochs}, Batch {batch_idx}/{len(train_loader)}, Loss: {loss.item()}")
end_time = time.time()
training_time = end_time - start_time
print(f"Training time: {training_time} seconds")
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for data, target in test_loader:
        data, target = data.to(device), target.to(device)
        output = model(data)
        _, predicted = torch.max(output.data, 1)
        total += target.size(0)
        correct += (predicted == target).sum().item()
accuracy = 100 * correct / total
print(f"Test accuracy: {accuracy}%")

100%|██████████| 9.91M/9.91M [00:00<00:00, 53.0MB/s]
100%|██████████| 28.9k/28.9k [00:00<00:00, 1.62MB/s]
100%|██████████| 1.65M/1.65M [00:00<00:00, 13.4MB/s]
100%|██████████| 4.54k/4.54k [00:00<00:00, 6.12MB/s]


Epoch 1/10, Batch 0/938, Loss: 2.2957444190979004
Epoch 1/10, Batch 100/938, Loss: 0.18889173865318298
Epoch 1/10, Batch 200/938, Loss: 0.262501060962677
Epoch 1/10, Batch 300/938, Loss: 0.13307881355285645
Epoch 1/10, Batch 400/938, Loss: 0.25817835330963135
Epoch 1/10, Batch 500/938, Loss: 0.07323181629180908
Epoch 1/10, Batch 600/938, Loss: 0.18460804224014282
Epoch 1/10, Batch 700/938, Loss: 0.11794096976518631
Epoch 1/10, Batch 800/938, Loss: 0.0990828275680542
Epoch 1/10, Batch 900/938, Loss: 0.13926441967487335
Epoch 2/10, Batch 0/938, Loss: 0.1396399736404419
Epoch 2/10, Batch 100/938, Loss: 0.13415051996707916
Epoch 2/10, Batch 200/938, Loss: 0.04058391973376274
Epoch 2/10, Batch 300/938, Loss: 0.10102100670337677
Epoch 2/10, Batch 400/938, Loss: 0.08717747032642365
Epoch 2/10, Batch 500/938, Loss: 0.1171073168516159
Epoch 2/10, Batch 600/938, Loss: 0.009262421168386936
Epoch 2/10, Batch 700/938, Loss: 0.047096285969018936
Epoch 2/10, Batch 800/938, Loss: 0.10298836976289749
E

**Question 7:** Use a pre-trained VGG16 model (via transfer learning) on a small custom
dataset (e.g., flowers or animals). Replace the top layers and fine-tune the model.
Include your code and result discussion.

In [6]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, models, transforms
from torch.utils.data import DataLoader
import time

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],[0.229, 0.224, 0.225])
])
val_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],[0.229, 0.224, 0.225])
])
train_data = datasets.ImageFolder('data/train', transform=train_transforms)
val_data = datasets.ImageFolder('data/val', transform=val_transforms)
train_loader = DataLoader(train_data, batch_size=32, shuffle=True)
val_loader = DataLoader(val_data, batch_size=32, shuffle=False)
num_classes = len(train_data.classes)
model = models.vgg16(weights='IMAGENET1K_V1')
for param in model.features.parameters():
    param.requires_grad = False
model.classifier = nn.Sequential(
    nn.Linear(25088, 4096),
    nn.ReLU(inplace=True),
    nn.Dropout(0.5),
    nn.Linear(4096, 1024),
    nn.ReLU(inplace=True),
    nn.Dropout(0.5),
    nn.Linear(1024, num_classes)
)
model = model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.classifier.parameters(), lr=0.0005)
epochs = 5
start_time = time.time()
for epoch in range(epochs):
    model.train()
    running_loss, correct, total = 0.0, 0, 0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        _, preds = torch.max(outputs, 1)
        correct += torch.sum(preds == labels).item()
        total += labels.size(0)
    train_acc = 100 * correct / total
    print(f"Epoch [{epoch+1}/{epochs}] Loss: {running_loss/len(train_loader):.4f} Train Acc: {train_acc:.2f}%")
training_time = time.time() - start_time
model.eval()
correct, total = 0, 0
with torch.no_grad():
    for images, labels in val_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, preds = torch.max(outputs, 1)
        correct += torch.sum(preds == labels).item()
        total += labels.size(0)
val_acc = 100 * correct / total
print(f"\nValidation Accuracy: {val_acc:.2f}%")
print(f"Training Time: {training_time:.2f} seconds")



FileNotFoundError: [Errno 2] No such file or directory: 'data/train'

**Question 8:** Write a program to visualize the filters and feature maps of the first
convolutional layer of AlexNet on an example input image.

In [5]:
import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np

alexnet = models.alexnet(weights='IMAGENET1K_V1')
alexnet.eval()
img_path = 'sample.jpg'
image = Image.open(img_path).convert('RGB')

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

input_tensor = transform(image).unsqueeze(0)
first_conv = alexnet.features[0]
weights = first_conv.weight.data.clone()
weights = (weights - weights.min()) / (weights.max() - weights.min())

plt.figure(figsize=(12, 6))
for i in range(8):
    plt.subplot(2, 4, i+1)
    w = weights[i].permute(1, 2, 0).numpy()
    plt.imshow(w)
    plt.axis('off')
    plt.title(f'Filter {i+1}')
plt.suptitle("AlexNet - First Conv Layer Filters", fontsize=14)
plt.show()

with torch.no_grad():
    feature_maps = first_conv(input_tensor)

feature_maps = feature_maps.squeeze(0)

feature_maps = (feature_maps - feature_maps.min()) / (feature_maps.max() - feature_maps.min())

plt.figure(figsize=(12, 12))
for i in range(8):
    plt.subplot(3, 3, i+1)
    plt.imshow(feature_maps[i].cpu().numpy(), cmap='viridis')
    plt.axis('off')
    plt.title(f'Map {i+1}')
plt.suptitle("AlexNet - First Conv Layer Feature Maps", fontsize=14)
plt.show()



FileNotFoundError: [Errno 2] No such file or directory: 'sample.jpg'

**Question 9:** Train a GoogLeNet (Inception v1) or its variant using a standard dataset
like CIFAR-10. Plot the training and validation accuracy over epochs and analyze
overfitting or underfitting.

In [12]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import time

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

transform_train = transforms.Compose([
    transforms.Resize((96,96)),  # smaller than 224 for speed
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914,0.4822,0.4465),(0.2023,0.1994,0.2010))
])
transform_test = transforms.Compose([
    transforms.Resize((96,96)),
    transforms.ToTensor(),
    transforms.Normalize((0.4914,0.4822,0.4465),(0.2023,0.1994,0.2010))
])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_train)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True)
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_test)
testloader = torch.utils.data.DataLoader(testset, batch_size=128, shuffle=False)

model = torchvision.models.googlenet(weights='IMAGENET1K_V1', aux_logits=False)
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 10)
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.fc.parameters(), lr=0.001)

epochs = 5
train_acc_list, val_acc_list = [], []

start_time = time.time()
for epoch in range(epochs):
    model.train()
    correct, total = 0, 0
    for images, labels in trainloader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        _, predicted = torch.max(outputs.data,1)
        total += labels.size(0)
        correct += (predicted==labels).sum().item()
    train_acc = 100*correct/total
    train_acc_list.append(train_acc)

    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in testloader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs.data,1)
            total += labels.size(0)
            correct += (predicted==labels).sum().item()
    val_acc = 100*correct/total
    val_acc_list.append(val_acc)

    print(f"Epoch [{epoch+1}/{epochs}] Train Acc: {train_acc:.2f}% Val Acc: {val_acc:.2f}%")

training_time = time.time() - start_time
print(f"Training time: {training_time:.2f}s")

plt.figure(figsize=(8,6))
plt.plot(range(1, epochs+1), train_acc_list, label='Train Accuracy')
plt.plot(range(1, epochs+1), val_acc_list, label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy (%)')
plt.title('GoogLeNet Fine-tuning on CIFAR-10')
plt.legend()
plt.show()


ValueError: The parameter 'aux_logits' expected value True but got False instead.

**Question 10:** You are working in a healthcare AI startup. Your team is tasked with
developing a system that automatically classifies medical X-ray images into normal,
pneumonia, and COVID-19. Due to limited labeled data, what approach would you
suggest using among CNN architectures discussed (e.g., transfer learning with ResNet
or Inception variants)? Justify your approach and outline a deployment strategy for
production use.
- I would use a transfer learning with ResNet as These models are already trained on large datasets (ImageNet) and can extract robust features from images, which is crucial when labeled medical X-ray data is limited.
- **Data Preparation**:Preprocess X-ray images by resizing, normalizing, and augmenting (contrast adjustments) to increase dataset diversity. This improves model generalization and mitigates overfitting due to small datasets.
- **Model Training**:Freeze the initial convolutional layers of the pretrained model and retrain the top layers on your X-ray dataset. Optionally, gradually unfreeze deeper layers for fine-tuning. Use categorical cross-entropy as the loss and monitor metrics like accuracy, precision and F1-score for evaluation.
- **Validation Strategy**:Validate the model using k-fold cross-validation or a separate validation set to ensure robustness of the model.
- **Deployment** DEploy the model using streamlit or flask.
- **Monitor**: monitor the data and change continuoulsy for beter performance.