# Assignment

Q.1. What is a Convolutional Neural Network (CNN), and how does it differ from
traditional fully connected neural networks in terms of architecture and performance on
image data?

Answer ->>

**Convolutional Neural Network :**

A Convolutional Neural Network (CNN) is a deep learning model designed primarily for processing grid-like data such as images, using convolutional layers to automatically extract hierarchical features like edges, textures, and objects.

**Architecture Differences :**
CNNs employ convolutional layers with sliding filters (kernels) that apply shared weights across local regions of the input, reducing parameters compared to fully connected neural networks (FCNNs), where every neuron connects to all neurons in the prior layer. Pooling layers in CNNs further downsample feature maps for efficiency and translation invariance, while FCNNs lack spatial awareness and treat inputs as flat vectors. CNNs often end with fully connected layers for classification, blending local feature extraction with global decisions.
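
To make the idea of a sliding filter with shared weights concrete, here is a minimal pure-Python sketch. The helper `conv2d_valid`, the 4x4 image, and the 2x2 edge kernel are all hypothetical illustrations, and the operation is the cross-correlation that deep learning frameworks call "convolution".

```python
def conv2d_valid(image, kernel):
    """Slide one kernel over a 2D image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # The same kernel weights are reused at every (i, j) position.
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output


# Hypothetical 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical 2x2 vertical-edge detector: right column minus left column.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

Only four weights are learned here regardless of image size, and the response peaks wherever the edge appears, which is the local, position-independent behaviour described above.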

**Performance Differences on Images :**
CNNs generally outperform FCNNs on image data because they need far fewer parameters, which speeds up training and lowers the risk of overfitting on high-dimensional inputs. A 224x224x3 image has 150,528 input values, so a single fully connected layer with 1,000 units already needs about 150 million weights. Parameter sharing and local connectivity also let CNNs capture spatial hierarchies, whereas an FCNN has no built-in notion of pixel position and must relearn a pattern whenever it shifts.
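
The parameter gap can be checked with simple arithmetic. The sketch below assumes a hypothetical fully connected layer with 1,000 units and a hypothetical convolutional layer with 64 filters of size 3x3; both sizes are chosen only for illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a square-kernel convolutional layer."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


height, width, channels = 224, 224, 3
hidden_units = 1000  # hypothetical layer width
num_filters = 64     # hypothetical number of 3x3 filters

fc_count = dense_params(height * width * channels, hidden_units)
conv_count = conv_params(channels, num_filters, 3)

print(f"Fully connected layer: {fc_count:,} parameters")  # 150,529,000
print(f"Convolutional layer:   {conv_count:,} parameters")  # 1,792
```

The convolutional layer's count does not depend on the image resolution at all, while the fully connected count grows with every extra pixel.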

Q.2. Discuss the architecture of LeNet-5 and explain how it laid the foundation
for modern deep learning models in computer vision. Include references to its original
research paper.

Answer ->>

LeNet-5 is a pioneering convolutional neural network architecture introduced in 1998 by Yann LeCun and colleagues, designed for handwritten digit recognition, particularly the MNIST dataset. It contains seven layers excluding the input layer, comprising a mix of convolutional layers, subsampling (pooling) layers, and fully connected layers, establishing the foundational structure for modern CNNs.

**Architecture of LeNet-5**

The network features:

1. Convolutional layers (C1 and C3): C1 has 6 feature maps created by applying 5x5 filters to the input image (32x32 pixels) generating 28x28 feature maps. C3 has 16 feature maps that connect selectively to previous layer maps to encourage feature diversity.

2. Subsampling layers (S2 and S4): These perform average pooling with 2x2 windows and stride 2 to reduce spatial size and improve robustness to distortions.

3. Convolutional layer (C5): This layer has 120 feature maps connected in a fully connected manner but using convolution filters, yielding 1x1 outputs.

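4. Fully connected layers (F6 and Output): The F6 layer has 84 units. The original output layer uses 10 Euclidean radial basis function (RBF) units, one per digit class; modern reimplementations usually replace it with a linear layer followed by softmax.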
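
The spatial sizes listed above follow from the standard output-size formula for convolution and pooling. The sketch below traces them for a 32x32 input; the helper `conv_output_size` and the `layers` table are illustrative names, not part of the original paper.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel size, stride, number of feature maps)
layers = [
    ("C1", 5, 1, 6),
    ("S2", 2, 2, 6),
    ("C3", 5, 1, 16),
    ("S4", 2, 2, 16),
    ("C5", 5, 1, 120),
]

size = 32  # LeNet-5 input is 32x32
shapes = []
for name, kernel, stride, maps in layers:
    size = conv_output_size(size, kernel, stride)
    shapes.append((name, maps, size, size))
    print(f"{name}: {maps} x {size} x {size}")
```

The trace ends at 120 maps of size 1x1, which is why C5 behaves like a fully connected layer on a 32x32 input.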

**Foundation for Modern Deep Learning :**


LeNet-5's architecture laid the groundwork by showing that automatic feature extraction through learned filters can outperform hand-engineered features in computer vision. Its core principles of convolutional layers, parameter sharing, local receptive fields, and pooling inspired later models like AlexNet and ResNet. This architecture demonstrated how hierarchical feature learning enables efficient training and accurate recognition on visual tasks.

**Reference to Original Research Paper :**

Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner
“Gradient-Based Learning Applied to Document Recognition”
Proceedings of the IEEE, 86(11): 2278–2324, 1998.

Q.3. Compare and contrast AlexNet and VGGNet in terms of design principles,
number of parameters, and performance. Highlight key innovations and limitations of
each.

Answer ->>

**Design Principles :**

AlexNet uses larger 11x11 and 5x5 convolutional filters in its 8-layer structure, incorporates ReLU activations for faster training, overlapping pooling, and data augmentation like random cropping to combat overfitting. VGGNet emphasizes simplicity and depth with uniform 3x3 filters across 16 (VGG16) or 19 (VGG19) layers, relying on stacked small convolutions to approximate larger receptive fields while maintaining spatial resolution through stride-1 convolutions.
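
The claim that stacked 3x3 convolutions approximate larger receptive fields can be verified numerically. This sketch assumes stride-1 layers with the same number of input and output channels (a hypothetical 256) and ignores biases.

```python
def stacked_receptive_field(num_layers, kernel_size=3):
    """Receptive field of num_layers stride-1 convolutions stacked in sequence."""
    # Each additional stride-1 layer widens the field by (kernel_size - 1).
    return 1 + num_layers * (kernel_size - 1)


def conv_weights(kernel_size, channels, num_layers=1):
    """Weight count (no biases) when input and output channel counts match."""
    return num_layers * kernel_size * kernel_size * channels * channels


channels = 256  # hypothetical channel width

for stack_depth, single_kernel in [(2, 5), (3, 7)]:
    field = stacked_receptive_field(stack_depth)
    stacked = conv_weights(3, channels, stack_depth)
    single = conv_weights(single_kernel, channels)
    print(
        f"{stack_depth} x (3x3): receptive field {field}x{field}, "
        f"{stacked:,} weights vs one {single_kernel}x{single_kernel}: {single:,} weights"
    )
```

Two 3x3 layers cover a 5x5 field and three cover a 7x7 field, each with fewer weights than the single large kernel and with extra non-linearities in between.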

**Parameters and Performance :**

AlexNet has about 60 million parameters and reached a top-5 ImageNet error of 15.3% (ILSVRC 2012). VGG16 has about 138 million parameters and VGG19 about 144 million; VGGNet achieved a top-5 error of around 7% (ILSVRC 2014) but demands far more training time and memory (over 500 MB of float32 weights).
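
The VGG parameter counts can be reproduced from the published layer configurations. The sketch below assumes the standard 224x224 input, 3x3 convolutions, 2x2 max pooling, and the usual 4096-4096-1000 classifier; the function name `count_vgg_params` is illustrative.

```python
# Numbers are output channels of 3x3 conv layers; "M" marks a 2x2 max pool.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]
VGG19_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
             512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]


def count_vgg_params(cfg, num_classes=1000, input_size=224):
    """Count weights and biases of a VGG-style network."""
    total = 0
    in_channels = 3
    size = input_size
    for item in cfg:
        if item == "M":
            size //= 2  # each pool halves the spatial size
        else:
            total += in_channels * item * 3 * 3 + item
            in_channels = item
    flat = in_channels * size * size  # 512 * 7 * 7 = 25088
    for n_in, n_out in [(flat, 4096), (4096, 4096), (4096, num_classes)]:
        total += n_in * n_out + n_out
    return total


vgg16_params = count_vgg_params(VGG16_CFG)
vgg19_params = count_vgg_params(VGG19_CFG)
vgg16_mib = vgg16_params * 4 / 2**20  # float32 weights

print(f"VGG16: {vgg16_params:,} parameters (~{vgg16_mib:.0f} MiB)")
print(f"VGG19: {vgg19_params:,} parameters")
```

Most of the total comes from the first fully connected layer (25088 x 4096 weights), which is why later architectures replaced it with global average pooling.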

**Key Innovations and Limitations :**

- AlexNet innovations: Pioneered GPU-parallel training (split across two GPUs), used dropout in the fully connected layers, and demonstrated that deep CNNs are viable well beyond LeNet. Limitations: Fewer layers limit feature hierarchy depth; larger kernels increase early computation.

- VGGNet innovations: Uniform architecture enables easy scaling and transfer learning; proved that deeper networks with small filters boost accuracy. Limitations: Very large parameter count (mostly in the fully connected layers) and high compute cost; plain stacking becomes hard to train at greater depth without techniques such as batch normalization or residual connections.

Q.4. What is transfer learning in the context of image classification? Explain
how it helps in reducing computational costs and improving model performance with
limited data.

Answer ->>

Transfer learning in image classification involves reusing a pre-trained model, typically trained on a large dataset like ImageNet, by fine-tuning it for a new, related task instead of training from scratch.

**How It Reduces Computational Costs :**
Pre-trained models provide learned features (e.g., edges, textures) in their earlier layers, so practitioners can freeze those layers and train only the final classification layers. This can cut training time from days or weeks to hours and requires less GPU/CPU power. The approach reuses millions of already-optimized parameters and avoids the cost of processing a massive dataset again.

**Improving Performance with Limited Data :**
With small datasets, models often overfit; transfer learning counters this by starting with robust, generalizable features that boost accuracy even on few samples, as the pre-trained backbone captures transferable visual hierarchies. Fine-tuning adapts these features to the target domain, yielding better generalization than random initialization, especially for tasks like medical imaging or custom object detection.

**Common Approaches :**
- Feature extraction: Freeze base layers, add new classifier; ideal for very limited data.

- Fine-tuning: Unfreeze top layers gradually for domain adaptation; balances speed and customization.
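
The two approaches differ only in which parameters are allowed to update. The following PyTorch sketch uses a small stand-in backbone instead of a real pre-trained network so that it runs without downloading weights; the layer sizes, the 5-class head, and the helper `count_trainable` are all hypothetical.

```python
import torch.nn as nn

# Stand-in backbone (hypothetical): in practice this would be a pre-trained
# network such as torchvision's VGG16 or ResNet-50.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
head = nn.Linear(32, 5)  # new classifier for a hypothetical 5-class task
model = nn.Sequential(backbone, head)


def count_trainable(module):
    """Number of parameters that will receive gradient updates."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# Feature extraction: freeze the whole backbone, train only the new head.
for param in backbone.parameters():
    param.requires_grad = False
feature_extraction_params = count_trainable(model)

# Fine-tuning: additionally unfreeze the last convolutional layer.
for param in backbone[2].parameters():
    param.requires_grad = True
fine_tuning_params = count_trainable(model)

print(f"Feature extraction: {feature_extraction_params} trainable parameters")
print(f"Fine-tuning:        {fine_tuning_params} trainable parameters")
```

With a real backbone the same pattern applies, and fine-tuning is usually paired with a lower learning rate so that the pre-trained weights are not destroyed.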

Q.5. Describe the role of residual connections in ResNet architecture. How do
they address the vanishing gradient problem in deep CNNs?

Answer ->>

Residual connections in the ResNet architecture are shortcut pathways that add the input $x$ directly to the output of the stacked layers $F(x)$, forming the residual block output $x + F(x)$, where the network learns the residual function $F$ rather than the full mapping.

**Role in ResNet :**

These connections enable training of very deep networks (e.g., ResNet-152) by allowing identity mappings, where layers can learn perturbations around the input, simplifying optimization and improving feature reuse across depths. They create direct information highways, preserving spatial details and gradients throughout the network.

**Addressing Vanishing Gradients :**
In deep CNNs, gradients can shrink during backpropagation because they are repeatedly multiplied by factors smaller than 1 (small weights or saturating activation derivatives), which causes vanishing gradients and stalls learning in the earlier layers. Residual connections add a skip path around each block: because the block output is $x + F(x)$, its derivative with respect to $x$ is $1 + \partial F/\partial x$, so the identity term always passes gradient through unchanged. This stabilizes training and allows much deeper architectures without the degradation seen in plain networks, resulting in better convergence and accuracy on tasks like ImageNet classification.
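
A scalar toy model illustrates the effect of the identity term. It assumes every layer contributes the same small local derivative (a hypothetical 0.02) and compares a plain chain with a residual chain of 50 layers; real networks involve Jacobian matrices rather than scalars, so this is only a rough intuition.

```python
def plain_gradient(depth, local_grad):
    """Gradient through `depth` plain layers y = F(x), each with dF/dx = local_grad."""
    grad = 1.0
    for _ in range(depth):
        grad *= local_grad
    return grad


def residual_gradient(depth, local_grad):
    """Gradient through `depth` residual layers y = x + F(x)."""
    grad = 1.0
    for _ in range(depth):
        grad *= 1.0 + local_grad  # d(x + F(x))/dx = 1 + dF/dx
    return grad


depth = 50
branch_grad = 0.02  # hypothetical local derivative of each layer

plain = plain_gradient(depth, branch_grad)
residual = residual_gradient(depth, branch_grad)

print(f"Plain chain:    {plain:.3e}")
print(f"Residual chain: {residual:.3f}")
```

The plain product collapses toward zero, while the residual product stays on the order of 1 because each factor is centred on the identity.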

Q.6. Implement the LeNet-5 architecture using TensorFlow or PyTorch to
classify the MNIST dataset. Report the accuracy and training time.

Answer ->>

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import time

# Define LeNet-5 architecture
class LeNet5(nn.Module):
    def __init__(self):
        super(LeNet5, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5)  # 32x32 input -> 28x28 (no padding, so C5 ends at 1x1)
        self.tanh1 = nn.Tanh()
        self.pool1 = nn.AvgPool2d(kernel_size=2, stride=2)

        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)  # input 14x14, output 10x10
        self.tanh2 = nn.Tanh()
        self.pool2 = nn.AvgPool2d(kernel_size=2, stride=2)

        self.conv3 = nn.Conv2d(16, 120, kernel_size=5)  # input 5x5, output 1x1
        self.tanh3 = nn.Tanh()

        self.fc1 = nn.Linear(120, 84)
        self.tanh4 = nn.Tanh()
        self.fc2 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool1(self.tanh1(self.conv1(x)))
        x = self.pool2(self.tanh2(self.conv2(x)))
        x = self.tanh3(self.conv3(x))
        x = x.view(-1, 120)
        x = self.tanh4(self.fc1(x))
        x = self.fc2(x)
        return x

# Data preparation with resizing to 32x32 as LeNet-5 expects
transform = transforms.Compose([
    transforms.Resize((32,32)),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

train_set = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_set = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=1000, shuffle=False)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LeNet5().to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 10
start_time = time.time()
for epoch in range(num_epochs):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

# Stop the timer here so that evaluation is not counted as training time
training_time = time.time() - start_time

# Testing accuracy
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        predicted = outputs.argmax(dim=1)  # index of the highest logit
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = 100 * correct / total

print(f'Test Accuracy: {accuracy:.2f}%')
print(f'Training Time: {training_time:.2f} seconds')


Q.7.  Use a pre-trained VGG16 model (via transfer learning) on a small custom
dataset (e.g., flowers or animals). Replace the top layers and fine-tune the model.
Include your code and result discussion.

Answer ->>

In [None]:
import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense, Dropout
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam

# -----------------------------
# 1. DATA PREPROCESSING
# -----------------------------
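# The ImageNet weights expect VGG16's own preprocessing (BGR order, mean
# subtraction), so use preprocess_input instead of rescaling to [0, 1].
vgg_preprocess = tf.keras.applications.vgg16.preprocess_input

train_datagen = ImageDataGenerator(
    preprocessing_function=vgg_preprocess,
    rotation_range=20,
    zoom_range=0.2,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True
)

train_gen = train_datagen.flow_from_directory(
    "dataset/train",         # <-- Change your path here
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical'
)

# Validation data gets the same preprocessing but no augmentation
val_datagen = ImageDataGenerator(preprocessing_function=vgg_preprocess)

val_gen = val_datagen.flow_from_directory(
    "dataset/val",          # <-- Change your path here
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical'
)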

num_classes = train_gen.num_classes

# -----------------------------
# 2. LOAD PRETRAINED VGG16 MODEL
# -----------------------------
base_model = VGG16(
    weights="imagenet",
    include_top=False,
    input_shape=(224, 224, 3)
)

# Freeze layers initially
for layer in base_model.layers:
    layer.trainable = False

# -----------------------------
# 3. BUILD CUSTOM CLASSIFIER HEAD
# -----------------------------
model = Sequential([
    base_model,
    Flatten(),
    Dense(256, activation='relu'),
    Dropout(0.5),
    Dense(num_classes, activation='softmax')
])

model.compile(
    optimizer=Adam(1e-3),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

model.summary()

# -----------------------------
# 4. TRAIN TOP LAYERS (FEATURE EXTRACTION)
# -----------------------------
history = model.fit(
    train_gen,
    validation_data=val_gen,
    epochs=10
)

# -----------------------------
# 5. FINE-TUNE LAST FEW LAYERS OF VGG16
# -----------------------------
for layer in base_model.layers[-4:]:  # unfreeze block5: three conv layers plus the final pooling layer
    layer.trainable = True

model.compile(
    optimizer=Adam(1e-5),  # lower LR for fine-tuning
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

history_fine = model.fit(
    train_gen,
    validation_data=val_gen,
    epochs=10
)

# -----------------------------
# 6. FINAL EVALUATION
# -----------------------------
loss, accuracy = model.evaluate(val_gen)
print(f"\nFinal Validation Accuracy: {accuracy * 100:.2f}%")



Q.8. Write a program to visualize the filters and feature maps of the first
convolutional layer of AlexNet on an example input image.

Answer ->>

In [None]:
import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np

# -----------------------------------------
# 1. Load Pretrained AlexNet
# -----------------------------------------
alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
alexnet.eval()  # inference mode

# First convolution layer
conv1 = alexnet.features[0]   # Conv2d(3, 64, kernel_size=11, stride=4, padding=2)

# -----------------------------------------
# 2. Visualize Filters of First Conv Layer
# -----------------------------------------
filters = conv1.weight.data.clone()

# Normalize filters to [0,1] for visualization
min_val = filters.min()
max_val = filters.max()
filters = (filters - min_val) / (max_val - min_val)

# Plot filters
fig = plt.figure(figsize=(12, 12))
for i in range(1, 65):   # 64 filters
    ax = fig.add_subplot(8, 8, i)
    ax.imshow(filters[i-1].permute(1, 2, 0))
    ax.axis("off")

plt.suptitle("AlexNet Conv1 Filters", fontsize=20)
plt.show()

# -----------------------------------------
# 3. Preprocess Input Image
# -----------------------------------------
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ),
])

# Load your image
img_path = "sample.jpg"   # <-- Put your own image here
img = Image.open(img_path).convert("RGB")
input_tensor = transform(img).unsqueeze(0)

# -----------------------------------------
# 4. Pass Image Through First Conv Layer
# -----------------------------------------
with torch.no_grad():
    feature_maps = conv1(input_tensor)

# -----------------------------------------
# 5. Visualize Feature Maps (Activation Maps)
# -----------------------------------------
fm = feature_maps.squeeze(0)

fig = plt.figure(figsize=(12, 12))
for i in range(1, 65):  # 64 feature maps
    ax = fig.add_subplot(8, 8, i)
    ax.imshow(fm[i-1].cpu().numpy(), cmap="gray")
    ax.axis("off")

plt.suptitle("AlexNet Conv1 Feature Maps", fontsize=20)
plt.show()


Q.9. Train a GoogLeNet (Inception v1) or its variant using a standard dataset
like CIFAR-10. Plot the training and validation accuracy over epochs and analyze
overfitting or underfitting.

Answer ->>

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import time

# Inception Module (core of GoogLeNet)
class Inception(nn.Module):
    def __init__(self, in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
        super(Inception, self).__init__()
        # 1x1 branch
        self.branch1 = nn.Conv2d(in_channels, ch1x1, kernel_size=1)

        # 3x3 branch (with 1x1 reduction)
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_channels, ch3x3red, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch3x3red, ch3x3, kernel_size=3, padding=1)
        )

        # 5x5 branch (with 1x1 reduction)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_channels, ch5x5red, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch5x5red, ch5x5, kernel_size=5, padding=2)
        )

        # Pooling branch
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_channels, pool_proj, kernel_size=1)
        )

    def forward(self, x):
        b1 = F.relu(self.branch1(x))
        b2 = F.relu(self.branch2(x))
        b3 = F.relu(self.branch3(x))
        b4 = F.relu(self.branch4(x))
        return torch.cat([b1, b2, b3, b4], 1)

# Simplified GoogLeNet (Inception v1) for CIFAR-10 (32x32 images)
class GoogLeNet(nn.Module):
    def __init__(self, num_classes=10):
        super(GoogLeNet, self).__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
            nn.Conv2d(64, 192, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        )

        self.inceptions = nn.Sequential(
            Inception(192, 64, 96, 128, 16, 32, 32),      # 256 channels total
            Inception(256, 128, 128, 192, 32, 96, 64),     # 480 channels total
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
            Inception(480, 192, 96, 208, 16, 48, 64),      # 512 channels total
            Inception(512, 160, 112, 224, 24, 64, 64),     # 512 channels total
            Inception(512, 112, 144, 288, 32, 64, 64),     # 528 channels total (matches the next module's input)
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
            Inception(528, 256, 160, 320, 32, 128, 128),   # 832 channels total
            Inception(832, 384, 192, 384, 48, 128, 128),   # 1024 channels total
        )

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.dropout = nn.Dropout(0.2)
        self.fc = nn.Linear(1024, num_classes)

    def forward(self, x):
        x = self.stem(x)           # 192 x 4 x 4 for 32x32 CIFAR-10 inputs
        x = self.inceptions(x)     # 1024 x 1 x 1 (two more stride-2 max pools)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.dropout(x)
        x = self.fc(x)
        return x

# Data loading and preprocessing for CIFAR-10
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_train)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_test)
testloader = torch.utils.data.DataLoader(testset, batch_size=100, shuffle=False, num_workers=2)

# Training setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = GoogLeNet(num_classes=10).to(device)
criterion = nn.CrossEntropyLoss()
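# lr lowered from 0.1: this network has no batch normalization, so a high learning rate may diverge
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)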
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

# Training loop with accuracy tracking
num_epochs = 50
train_losses, train_accs = [], []
val_accs = []

start_time = time.time()
for epoch in range(num_epochs):
    # Training
    model.train()
    running_loss, correct, total = 0.0, 0, 0
    for inputs, labels in trainloader:
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        _, predicted = outputs.max(1)
        total += labels.size(0)
        correct += predicted.eq(labels).sum().item()

    train_loss = running_loss / len(trainloader)
    train_acc = 100. * correct / total
    train_losses.append(train_loss)
    train_accs.append(train_acc)

    # Validation
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, labels in testloader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            _, predicted = outputs.max(1)
            total += labels.size(0)
            correct += predicted.eq(labels).sum().item()

    val_acc = 100. * correct / total
    val_accs.append(val_acc)

    scheduler.step()
    print(f'Epoch {epoch+1}: Train Loss={train_loss:.4f}, Train Acc={train_acc:.2f}%, Val Acc={val_acc:.2f}%')

training_time = time.time() - start_time
print(f'\nTraining completed in {training_time/60:.2f} minutes')
print(f'Final Test Accuracy: {val_accs[-1]:.2f}%')

# Plot training curves
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(train_accs, label='Train Acc')
plt.plot(val_accs, label='Val Acc')
plt.title('Accuracy over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy (%)')
plt.legend()
plt.grid(True)

plt.subplot(1, 2, 2)
plt.plot(train_losses, label='Train Loss')
plt.title('Training Loss over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.savefig('googlenet_cifar10_training.png', dpi=150, bbox_inches='tight')
plt.show()
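
To turn the plotted curves into an overfitting or underfitting verdict, a simple heuristic on the final train/validation gap can help. The sketch below is an assumption-laden illustration: the function `diagnose_fit`, its thresholds, and the example curves are hypothetical and are not results from the training run above.

```python
def diagnose_fit(train_accs, val_accs, gap_threshold=10.0, low_acc_threshold=70.0):
    """Rough heuristic: compare final train and validation accuracy (in %)."""
    final_train = train_accs[-1]
    final_val = val_accs[-1]
    gap = final_train - final_val
    # Epoch (1-based) with the best validation accuracy, useful for early stopping.
    best_epoch = max(range(len(val_accs)), key=val_accs.__getitem__) + 1

    if final_train < low_acc_threshold:
        verdict = "underfitting"    # the model cannot even fit the training set
    elif gap > gap_threshold:
        verdict = "overfitting"     # training accuracy far above validation
    else:
        verdict = "reasonable fit"
    return {"gap": gap, "best_epoch": best_epoch, "verdict": verdict}


# Synthetic curves for illustration only.
example_train = [55.0, 70.0, 82.0, 91.0, 97.0]
example_val = [52.0, 65.0, 72.0, 74.0, 73.0]

report = diagnose_fit(example_train, example_val)
print(report)  # {'gap': 24.0, 'best_epoch': 4, 'verdict': 'overfitting'}
```

In the notebook, the same function could be called as `diagnose_fit(train_accs, val_accs)` after training; the thresholds should be tuned to the task rather than taken as fixed rules.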


Q.10. You are working in a healthcare AI startup. Your team is tasked with
developing a system that automatically classifies medical X-ray images into normal,
pneumonia, and COVID-19. Due to limited labeled data, what approach would you
suggest using among CNN architectures discussed (e.g., transfer learning with ResNet
or Inception variants)? Justify your approach and outline a deployment strategy for
production use.

Answer ->>

**Recommended Approach: Transfer Learning with ResNet-50**

For classifying medical X-ray images (normal, pneumonia, COVID-19) with limited labeled data, use transfer learning with ResNet-50. ResNet-50 has been reported to outperform Inception variants on some medical imaging tasks (for example, 93.81% vs 91.76% accuracy on a comparable 3-class fundus classification task), and its residual connections help keep training stable when fine-tuning on small datasets.

**Key Benefits for Limited Data :**

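- Pre-trained ImageNet features capture edges/textures relevant to X-rays

- Freeze the early layers and fine-tune roughly the last 20-30% of the network; accuracy above 90% with 1-5k samples is a plausible target but depends on the dataset

- Class imbalance (COVID-19 is often underrepresented) still has to be handled explicitly, for example with class weights or oversampling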

**Implementation Strategy :**

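# Core approach using PyTorch
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.DEFAULT)  # replaces the deprecated pretrained=True

# Freeze the pre-trained backbone first
for param in model.parameters():
    param.requires_grad = False

# Fine-tune last block
for param in model.layer4.parameters():
    param.requires_grad = True

# Modify for 3 classes; the new layer is created after freezing, so it stays trainable.
# Grayscale X-rays are expanded to 3 channels (grayscale→RGB) during preprocessing.
model.fc = nn.Linear(model.fc.in_features, 3)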


**Training Pipeline :**

1. Data Prep: Resize X-rays to 224×224, convert grayscale→3 channels, heavy augmentation (rotation, brightness for X-ray variability)

2. Two-Stage Training: Feature extraction (5 epochs), fine-tuning top layers (10-15 epochs, lr=1e-4)

3. Metrics: F1-score per class (prioritize pneumonia/COVID recall), stratified k-fold validation
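
Per-class precision, recall, and F1 from step 3 can be computed directly from predictions. The sketch below is a minimal pure-Python version; the function `per_class_metrics` and the ten example labels are hypothetical and do not represent real model output.

```python
def per_class_metrics(y_true, y_pred, classes):
    """Precision, recall and F1 for each class from paired label lists."""
    metrics = {}
    pairs = list(zip(y_true, y_pred))
    for cls in classes:
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        metrics[cls] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


classes = ["normal", "pneumonia", "covid"]

# Hypothetical labels and predictions for ten X-rays.
y_true = ["normal", "normal", "normal", "normal",
          "pneumonia", "pneumonia", "pneumonia",
          "covid", "covid", "covid"]
y_pred = ["normal", "normal", "normal", "pneumonia",
          "pneumonia", "pneumonia", "covid",
          "covid", "covid", "normal"]

metrics = per_class_metrics(y_true, y_pred, classes)
for cls in classes:
    m = metrics[cls]
    print(f"{cls:10s} precision={m['precision']:.2f} recall={m['recall']:.2f} f1={m['f1']:.2f}")
```

Reporting recall per class makes a missed COVID-19 or pneumonia case visible even when overall accuracy looks high.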

**Production Deployment Strategy :**

Healthcare AI Pipeline (MLOps):

- Data Pipeline: DICOM→PNG preprocessing, online augmentation

- Model Serving: FastAPI + TorchServe (target of about 50 ms per inference)

- Monitoring: Class drift detection, accuracy per hospital scanner

- Explainability: Grad-CAM heatmaps for clinician trust

- Regulatory: Model cards, FDA 510(k) pathway documentation

**Deployment Architecture :**


1. PACS System → Preprocessing Microservice → ResNet-50 (GPU pods)

2. Inference service (target 99.9% uptime) → Grad-CAM → Radiologist Dashboard

3. Model Registry (MLflow) ← Retraining Pipeline (new X-rays)