Question 1: What is a Convolutional Neural Network (CNN), and how does it differ from
traditional fully connected neural networks in terms of architecture and performance on
image data?
- A Convolutional Neural Network (CNN) is a specialized type of deep neural network designed primarily for image data (though also used in speech, video, and text). CNNs automatically and adaptively learn spatial hierarchies of features (edges, textures, objects) through convolutional operations, reducing the need for manual feature extraction.
- Differences:
1. Connections
- ANN/MLP: Each neuron is connected to every neuron in the next layer (dense connections).
- CNN: Neurons connect only to a small local region of the input (receptive field).
2. Parameters :
- ANN/MLP: Requires a large number of parameters when input size is high (e.g., images).
- CNN: Has fewer parameters because the same filters (kernels) are shared across the image.
3. Feature Extraction
- ANN/MLP: Often requires manual feature engineering before training.
- CNN: Performs automatic feature extraction using convolutional filters.
4. Spatial Information
- ANN/MLP: Treats all inputs as independent features, ignoring spatial locality.
- CNN: Preserves spatial relationships (nearby pixels stay related in convolution).
5. Scalability
- ANN/MLP: Not suitable for high-dimensional data like large images (becomes computationally expensive).
- CNN: Highly scalable and efficient for image, video, and other structured data.
6. Performance on Images
- ANN/MLP: Performs poorly on image recognition tasks due to lack of spatial awareness.
- CNN: Achieves high accuracy in image classification, object detection, and segmentation.
7. Architecture Structure
- ANN/MLP: Made up of only fully connected layers.
- CNN: Includes convolutional layers, pooling layers, and finally fully connected layers for classification.
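
The parameter gap described in point 2 can be made concrete with a little arithmetic. The sketch below is illustrative only: the 224 × 224 × 3 input, the 1,000 dense units, and the 64 filters of size 3 × 3 are assumed numbers, not taken from a specific model.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


def conv_params(in_channels: int, out_channels: int, kernel: int) -> int:
    """Weights plus biases of a 2D convolution; independent of image size."""
    return in_channels * out_channels * kernel * kernel + out_channels


# Assumed example input: a 224x224 RGB image
height, width, channels = 224, 224, 3
mlp_first_layer = dense_params(height * width * channels, 1000)
cnn_first_layer = conv_params(channels, 64, 3)

print(f"Dense layer with 1000 units:    {mlp_first_layer:,} parameters")
print(f"Conv layer with 64 3x3 filters: {cnn_first_layer:,} parameters")
```

The dense layer's size grows with the number of pixels, while the convolutional layer's size depends only on the kernel size and channel counts, because the same filters are reused at every position.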

Question 2: Discuss the architecture of LeNet-5 and explain how it laid the foundation for modern deep learning models in computer vision. Include references to its original research paper.
- LeNet-5 is one of the earliest Convolutional Neural Networks (CNNs). It was proposed by Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner in the paper “Gradient-Based Learning Applied to Document Recognition” (Proceedings of the IEEE, 1998) and was designed primarily for handwritten digit recognition (MNIST dataset).
- Architecture of LeNet-5 (1998):
1. Input Layer
 - Input: 32 × 32 grayscale image. MNIST digits (28 × 28) were zero-padded to fit this input size.
2. C1 – Convolutional Layer
 - 6 filters (kernels) of size 5 × 5. Output: 28 × 28 × 6 feature maps. Detects simple local features (edges, corners).
3. S2 – Subsampling (Pooling) Layer
 - Average pooling with 2 × 2 filters and stride 2 (in the original paper, the four inputs are summed, scaled by a trainable coefficient, and offset by a trainable bias). Output: 14 × 14 × 6. Reduces dimensionality while preserving important features.
4. C3 – Convolutional Layer
 - 16 filters of size 5 × 5. Output: 10 × 10 × 16 feature maps. Extracts more complex features.
5. S4 – Subsampling Layer
 - Average pooling with 2 × 2 filters and stride 2. Output: 5 × 5 × 16.
6. C5 – Convolutional Layer
 - 120 filters of size 5 × 5. Because the S4 maps are also 5 × 5, each filter covers its whole input, so the layer is equivalent to a fully connected layer. Output: 1 × 1 × 120 (flattened).
7. F6 – Fully Connected Layer
 - 84 units. The count comes from the 7 × 12 bitmaps of stylized characters that the original output layer used as target codes. Activation function: tanh.
8. Output Layer
 - 10 units (for digits 0–9). The original paper used Euclidean radial basis function (RBF) units here; modern re-implementations usually replace them with a softmax classifier.
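
The output sizes listed above follow from the usual formula for an unpadded convolution or pooling window, out = (in − kernel) / stride + 1. The short trace below is a sketch that re-derives them; the helper name `conv_out` and the layer table are written for this illustration.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size after a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


# (layer name, kernel size, stride, output channels) as listed above
layers = [
    ("C1", 5, 1, 6),
    ("S2", 2, 2, 6),
    ("C3", 5, 1, 16),
    ("S4", 2, 2, 16),
    ("C5", 5, 1, 120),
]

size = 32  # 32x32 input image
shapes = []
for name, kernel, stride, channels in layers:
    size = conv_out(size, kernel, stride)
    shapes.append((name, size, size, channels))
    print(f"{name}: {size} x {size} x {channels}")
```

Starting from a 28 × 28 input without padding, the same formula gives 24, 12, 8, and 4, which is why implementations that feed raw MNIST images flatten to 16 × 4 × 4 instead of 16 × 5 × 5.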
- How LeNet-5 Laid the Foundation for Modern Deep Learning:
- Introduction of Convolution + Pooling
  - Showed that CNNs can automatically extract hierarchical features from images. Reduced dependency on manual feature engineering.
- Weight Sharing
  - Convolutional filters are shared across spatial positions, which drastically reduces the number of parameters. This enabled training on limited computational resources (1990s hardware).
- Hierarchical Feature Learning
  - From simple edges in early layers to complex shapes in deeper layers. The concept is still used in modern CNNs (AlexNet, VGG, ResNet).
- Generalization to Vision Tasks
  - Though trained on handwritten digits, the idea extended to image classification, object detection, and recognition.
- Foundation for Modern CNNs
  - AlexNet (2012), which revived deep learning, follows the same pattern of stacked convolution and pooling layers feeding fully connected layers. Later CNN architectures (ResNet, EfficientNet) still rely on convolution, weight sharing, and hierarchical features; Vision Transformers replace convolution with self-attention but keep the idea of learning features end to end.

Question 3: Compare and contrast AlexNet and VGGNet in terms of design principles, number of parameters, and performance. Highlight key innovations and limitations of each.
1. Year & Contribution
- AlexNet (2012)
  - Proposed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. Won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012 by a huge margin. Marked the revival of deep learning in computer vision.
- VGGNet (2014)
  - Proposed by Karen Simonyan and Andrew Zisserman (Oxford VGG group). Focused on studying the effect of network depth on performance. Popularized the use of very deep networks with small filters (3×3).
2. Design Principles
- AlexNet:
 - 8 layers (5 convolutional + 3 fully connected). Used large filters (11×11, 5×5) in early layers. Used ReLU activation, which sped up training compared to sigmoid/tanh. Used Dropout for regularization and data augmentation (image translation, reflection). Trained across two GPUs, showing that GPU training makes large-scale models practical.
- VGGNet:
 - 16–19 layers (very deep compared to AlexNet). Used only 3×3 convolution filters, stacked to simulate larger receptive fields. Used 2×2 max pooling for downsampling. Fully connected layers at the end (similar to AlexNet). Emphasized depth and simplicity as a design choice.
3. Number of Parameters
- AlexNet: ~ 60 million parameters.
- VGGNet (VGG-16): ~ 138 million parameters (much heavier and memory intensive).
4. Performance (ImageNet Classification)
- AlexNet (2012): Top-5 error rate: 15.3%. First to dramatically outperform traditional computer vision methods.
- VGGNet (2014): Top-5 error rate: 7.3% (significant improvement). Depth allowed capturing more complex patterns.
5. Key Innovations
- AlexNet: First large-scale CNN to win ImageNet. Popularized ReLU activation and Dropout (both existed earlier, but AlexNet showed their value at scale). Showed the importance of GPU acceleration.
- VGGNet: Introduced the idea of using small 3×3 filters repeatedly instead of large ones. Demonstrated that network depth is crucial for performance. Provided a simple, modular design (easy to generalize).
6. Limitations
- AlexNet:
 - Shallow by today’s standards (only 8 layers). Still had a very large number of parameters, making it prone to overfitting. Large filter sizes in early layers were less efficient.
- VGGNet:
 - Extremely computationally expensive (138M parameters, ~500MB model size).
 - Slow to train and memory intensive, which makes it hard to deploy on resource-constrained devices.
 - Despite its depth, lacked techniques like residual connections (introduced later in ResNet).
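
VGGNet's choice of stacked 3×3 filters can be checked with simple arithmetic: n stacked stride-1 3×3 convolutions see the same region as one (2n+1)×(2n+1) convolution but need fewer weights. The sketch below assumes equal input and output channel counts and an arbitrary width of 256 channels, chosen only for illustration.

```python
def stacked_receptive_field(num_layers: int, kernel: int = 3) -> int:
    """Receptive field of `num_layers` stride-1 convolutions applied in sequence."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel: int, channels: int, num_layers: int = 1) -> int:
    """Weight count (biases ignored) with `channels` input and output channels."""
    return num_layers * kernel * kernel * channels * channels


C = 256  # assumed channel width, for illustration only
for n, big in [(2, 5), (3, 7)]:
    assert stacked_receptive_field(n) == big
    small_cost = conv_weights(3, C, n)
    big_cost = conv_weights(big, C)
    print(f"{n} stacked 3x3: {small_cost:,} weights | one {big}x{big}: {big_cost:,} weights")
```

The stacked version also inserts a nonlinearity between the small convolutions, which a single large filter does not have.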

Question 4: What is transfer learning in the context of image classification? Explain how it helps in reducing computational costs and improving model performance with limited data.
- Transfer Learning is a deep learning technique where a model trained on a large dataset (e.g., ImageNet with millions of images) is reused (transferred) for a new but related task with a smaller dataset. Instead of training from scratch, we use the pre-trained model’s learned features and adapt (fine-tune) it for the new problem.
- How It Works
1. Pre-trained Model
- A CNN (e.g., ResNet, VGG, Inception) trained on a large dataset learns general features such as edges, textures, shapes.
2. Feature Reuse
- These learned features are applicable to many vision tasks (e.g., cats vs. dogs, medical imaging).
3. Fine-tuning / Feature Extraction
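- Feature Extraction: Freeze the pre-trained convolutional layers and train only the new classifier layers on the new data.
- Fine-Tuning: Unfreeze some of the deeper layers and retrain them on the new dataset, usually with a small learning rate so the pre-trained weights are not destroyed.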
- How Transfer Learning Helps
1. Reduces Computational Cost
- Training large CNNs from scratch requires huge datasets and expensive GPUs. With transfer learning, only a small portion of the model is retrained, saving time and resources.
2. Improves Performance with Limited Data
- In many domains (e.g., medical imaging), labeled data is scarce. Transfer learning leverages knowledge from large datasets, boosting accuracy even with small datasets.
3. Faster Convergence
- Since the model already has good initial weights, training converges faster than training from scratch.
4. Better Generalization
- Pre-trained models provide robust, generalized feature representations, reducing overfitting when data is limited.
- Example : A CNN trained on ImageNet (1,000 categories) can be fine-tuned to classify different flowers, medical scans, or animal species with only a few thousand images.
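
To give a feel for how much of a model is actually trained during feature extraction, the sketch below counts parameters for one assumed setup: the VGG16 convolutional base (thirteen 3×3 convolutions) kept frozen, plus a small new head of Flatten, Dense(256), and Dense(5) on 7 × 7 × 512 features. The head sizes are an assumption for illustration; other heads give different ratios.

```python
# VGG16 convolutional configuration: output channels per 3x3 conv, "M" = max pool
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def conv_base_params(cfg, in_channels: int = 3) -> int:
    """Weights plus biases of all 3x3 convolutions in the configuration."""
    total = 0
    for item in cfg:
        if item == "M":
            continue  # pooling layers have no parameters
        total += in_channels * item * 3 * 3 + item
        in_channels = item
    return total


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


frozen = conv_base_params(VGG16_CFG)
# Assumed head: 7x7x512 features -> Dense(256) -> Dense(5)
trainable = dense_params(7 * 7 * 512, 256) + dense_params(256, 5)

print(f"Frozen base parameters:    {frozen:,}")
print(f"Trainable head parameters: {trainable:,}")
print(f"Share being trained:       {trainable / (frozen + trainable):.1%}")
```

The frozen base still runs in the forward pass, but it needs no gradients or optimizer state, which is where much of the saving in time and memory comes from.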

Question 5: Describe the role of residual connections in ResNet architecture. How do they address the vanishing gradient problem in deep CNNs?   
1. Background Problem: As CNNs got deeper (e.g., VGG with 19 layers), researchers noticed that simply stacking more layers:
 - Did not always improve accuracy.
 - Often caused vanishing/exploding gradients, making training unstable.
 - Sometimes increased the training error with depth (the degradation problem).
2. ResNet (Residual Network) Solution: Proposed by He et al. (2015) in “Deep Residual Learning for Image Recognition”, which introduced residual connections (also called skip connections).
- Role of Residual Connections
 - Definition
  - A residual block computes y = F(x) + x, where:
    - F(x) = the learned transformation (convolution, batch norm, ReLU).
    - x = the block input, added directly to the output.
 - Shortcut/Skip Path
  - The input x bypasses (skips) one or more layers and is added to the output of those layers. If the learned mapping F(x) is zero, the block simply passes the input forward and acts as an identity mapping.
- How They Address Vanishing Gradient Problem
 - Gradient Flow Improvement: Because y = F(x) + x, the derivative of y with respect to x is the derivative of F plus the identity. In backpropagation the gradient can therefore flow directly through the skip connection, which keeps it from shrinking toward zero in very deep networks.
 - Easier Optimization: Instead of forcing layers to learn the full mapping H(x), residual blocks learn the residual function F(x) = H(x) − x. He et al. argue that driving a residual toward zero is easier to optimize than fitting an identity mapping with a stack of nonlinear layers.
 - Training Very Deep Networks: ResNet successfully trained networks with 50, 101, and even 152 layers, depths at which plain stacked networks had shown degraded accuracy.
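
The gradient-flow argument can be illustrated with a deliberately simplified toy: treat every layer as a scalar function F(x) = w·x with a small weight w. This is only a sketch under that assumption (real layers are nonlinear and multi-dimensional), and the weight 0.01 and depth 50 are made-up values.

```python
def plain_gradient(weights):
    """d(output)/d(input) for stacked layers x -> w * x."""
    grad = 1.0
    for w in weights:
        grad *= w            # derivative of w * x is w
    return grad


def residual_gradient(weights):
    """d(output)/d(input) for stacked residual blocks x -> x + w * x."""
    grad = 1.0
    for w in weights:
        grad *= (1.0 + w)    # derivative of x + w * x is 1 + w
    return grad


weights = [0.01] * 50        # assumed: 50 layers with small weights
plain = plain_gradient(weights)
residual = residual_gradient(weights)

print(f"Plain stack gradient:    {plain:.3e}")
print(f"Residual stack gradient: {residual:.3f}")
```

In the plain stack the gradient is a product of small numbers and collapses toward zero; with the skip connection each factor stays close to 1, so the signal reaching the early layers remains usable.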




In [1]:
#Question 6: Implement the LeNet-5 architectures using Tensorflow or PyTorch to classify the MNIST dataset. Report the accuracy and training time.
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
import time

# ---------------------------
# LeNet-5 Model Definition
# ---------------------------
class LeNet5(nn.Module):
    """LeNet-5 variant for unpadded 28x28 MNIST input.

    The original network takes 32x32 input, so the feature maps here are
    smaller than in the paper (C5 sees 16x4x4 instead of 16x5x5).
    """
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5)   # C1: 1x28x28 -> 6x24x24
        self.pool = nn.AvgPool2d(2, stride=2)         # S2: 6x24x24 -> 6x12x12
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)  # C3: 6x12x12 -> 16x8x8
        # S4: 16x8x8 -> 16x4x4 after pooling
        self.fc1 = nn.Linear(16*4*4, 120)             # C5
        self.fc2 = nn.Linear(120, 84)                 # F6
        self.fc3 = nn.Linear(84, 10)                  # Output layer

    def forward(self, x):
        # torch.tanh replaces the deprecated F.tanh
        x = torch.tanh(self.conv1(x))
        x = self.pool(x)
        x = torch.tanh(self.conv2(x))
        x = self.pool(x)
        x = torch.flatten(x, 1)  # Flatten to (batch, 16*4*4)
        x = torch.tanh(self.fc1(x))
        x = torch.tanh(self.fc2(x))
        x = self.fc3(x)          # raw logits; CrossEntropyLoss applies log-softmax
        return x


# ---------------------------
# Data Loading (MNIST)
# ---------------------------
transform = transforms.Compose([transforms.ToTensor()])

trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

testset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=1000, shuffle=False)

# ---------------------------
# Training
# ---------------------------
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LeNet5().to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

start_time = time.time()

for epoch in range(5):  # 5 epochs
    running_loss = 0.0
    for images, labels in trainloader:
        images, labels = images.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

    print(f"Epoch {epoch+1}, Loss: {running_loss/len(trainloader):.4f}")

training_time = time.time() - start_time

# ---------------------------
# Evaluation
# ---------------------------
model.eval()  # switch to inference mode (good practice even without dropout/batch norm)
correct = 0
total = 0
with torch.no_grad():
    for images, labels in testloader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        predicted = outputs.argmax(dim=1)  # index of the highest logit per sample
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = 100 * correct / total

print(f"\nTest Accuracy: {accuracy:.2f}%")
print(f"Training Time: {training_time:.2f} seconds")




Epoch 1, Loss: 0.3136
Epoch 2, Loss: 0.1042
Epoch 3, Loss: 0.0689
Epoch 4, Loss: 0.0541
Epoch 5, Loss: 0.0434

Test Accuracy: 98.22%
Training Time: 123.67 seconds


In [None]:
#Question 7: Use a pre-trained VGG16 model (via transfer learning) on a small custom dataset (e.g., flowers or animals). Replace the top layers and fine-tune the model. Include your code and result discussion.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import time

# ---------------------------
# 1. Load Pre-trained VGG16 (without top layers)
# ---------------------------
base_model = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

# Freeze base model layers (so they don't update during training)
for layer in base_model.layers:
    layer.trainable = False

# ---------------------------
# 2. Add Custom Classifier on Top
# ---------------------------
model = models.Sequential([
    base_model,
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(5, activation='softmax')   # change "5" to number of classes in your dataset
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# ---------------------------
# 3. Data Preparation (Custom Dataset)
# ---------------------------
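# Suppose you have folders: dataset/train & dataset/val with subfolders for each class
# Use VGG16's own preprocessing (RGB -> BGR, ImageNet mean subtraction) so the
# inputs match what the pre-trained weights expect, instead of rescaling to [0, 1].
vgg_preprocess = tf.keras.applications.vgg16.preprocess_input

train_datagen = ImageDataGenerator(preprocessing_function=vgg_preprocess,
                                   rotation_range=20,
                                   zoom_range=0.2,
                                   horizontal_flip=True)

val_datagen = ImageDataGenerator(preprocessing_function=vgg_preprocess)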

train_dir = "dataset/train"   # <-- replace with your path
val_dir = "dataset/val"       # <-- replace with your path

train_generator = train_datagen.flow_from_directory(
    train_dir, target_size=(224, 224), batch_size=32, class_mode="categorical")

val_generator = val_datagen.flow_from_directory(
    val_dir, target_size=(224, 224), batch_size=32, class_mode="categorical")

# ---------------------------
# 4. Training
# ---------------------------
start_time = time.time()

history = model.fit(
    train_generator,
    epochs=5,
    validation_data=val_generator,
    verbose=2
)

stage1_time = time.time() - start_time

# ---------------------------
# 5. Fine-Tuning (Unfreeze some deeper layers)
# ---------------------------
# Unfreeze the last 4 layers of the base (block5 conv1-conv3 and block5_pool)
for layer in base_model.layers[-4:]:
    layer.trainable = True

# Recompile so the change in trainable layers takes effect; use a smaller LR
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

start_time = time.time()

history_finetune = model.fit(
    train_generator,
    epochs=3,
    validation_data=val_generator,
    verbose=2
)

# Total time covers both the feature-extraction and the fine-tuning stage
training_time = stage1_time + (time.time() - start_time)

# ---------------------------
# 6. Evaluation
# ---------------------------
val_loss, val_acc = model.evaluate(val_generator, verbose=0)

print(f"\nValidation Accuracy after Fine-Tuning: {val_acc*100:.2f}%")
print(f"Total Training Time: {training_time:.2f} seconds")


In [None]:
#Question 8: Write a program to visualize the filters and feature maps of the first convolutional layer of AlexNet on an example input image.
import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image
import matplotlib.pyplot as plt

# ---------------------------
# 1. Load Pre-trained AlexNet
# ---------------------------
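# torchvision >= 0.13 uses the `weights` argument; older versions use pretrained=True
alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)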
alexnet.eval()

# First convolutional layer
first_conv = alexnet.features[0]

# ---------------------------
# 2. Load Example Image
# ---------------------------
img_path = "example.jpg"  # <-- replace with your image path
image = Image.open(img_path).convert("RGB")

# Preprocess: resize to 224x224, convert to tensor, normalize with ImageNet statistics
transform = transforms.Compose([
    transforms.Resize((224,224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

img_tensor = transform(image).unsqueeze(0)  # Add batch dimension

# ---------------------------
# 3. Visualize Filters (Weights of Conv1)
# ---------------------------
filters = first_conv.weight.data.clone()  # shape: (64, 3, 11, 11)

fig, axes = plt.subplots(8, 8, figsize=(10, 10))  # 8x8 grid shows all 64 filters
for i, ax in enumerate(axes.flat):
    ax.axis("off")
    if i < filters.shape[0]:
        f = filters[i].cpu()
        f = (f - f.min()) / (f.max() - f.min() + 1e-8)  # normalize to [0,1]
        ax.imshow(f.permute(1, 2, 0).numpy())  # CHW -> HWC for matplotlib
plt.suptitle("First Convolutional Layer Filters (AlexNet)")
plt.show()

# ---------------------------
# 4. Visualize Feature Maps
# ---------------------------
with torch.no_grad():
    feature_maps = first_conv(img_tensor)  # shape: (1, 64, 55, 55)

feature_maps = feature_maps.squeeze(0)  # remove batch dim

fig, axes = plt.subplots(8, 8, figsize=(10, 10))  # one panel per output channel
for i, ax in enumerate(axes.flat):
    ax.axis("off")
    if i < feature_maps.shape[0]:
        fmap = feature_maps[i].cpu()
        # small epsilon avoids division by zero for a constant feature map
        fmap = (fmap - fmap.min()) / (fmap.max() - fmap.min() + 1e-8)
        ax.imshow(fmap.numpy(), cmap="gray")
plt.suptitle("Feature Maps from First Convolutional Layer (AlexNet)")
plt.show()


In [None]:
#Question 9: Train a GoogLeNet (Inception v1) or its variant using a standard dataset like CIFAR-10. Plot the training and validation accuracy over epochs and analyze overfitting or underfitting.
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt

# ---------------------------
# 1. Device config
# ---------------------------
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# ---------------------------
# 2. Data Preprocessing
# ---------------------------
transform_train = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),
                         (0.2023, 0.1994, 0.2010)),
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),
                         (0.2023, 0.1994, 0.2010)),
])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform_train)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=128,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform_test)
testloader = torch.utils.data.DataLoader(testset, batch_size=100,
                                         shuffle=False, num_workers=2)

# ---------------------------
# 3. Load GoogLeNet
# ---------------------------
from torchvision.models import googlenet

net = googlenet(num_classes=10, aux_logits=True)  # CIFAR-10 has 10 classes
net = net.to(device)

# ---------------------------
# 4. Loss & Optimizer
# ---------------------------
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

# ---------------------------
# 5. Training Loop
# ---------------------------
num_epochs = 10
train_acc_list, val_acc_list = [], []

for epoch in range(num_epochs):
    net.train()
    correct, total = 0, 0

    for inputs, labels in trainloader:
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()
        # In train mode GoogLeNet returns (logits, aux_logits2, aux_logits1)
        outputs, aux2, aux1 = net(inputs)
        loss1 = criterion(outputs, labels)
        loss2 = criterion(aux1, labels)
        loss3 = criterion(aux2, labels)
        loss = loss1 + 0.3*(loss2 + loss3)  # weighted loss
        loss.backward()
        optimizer.step()

        _, predicted = outputs.max(1)
        total += labels.size(0)
        correct += predicted.eq(labels).sum().item()

    train_acc = 100.*correct/total
    train_acc_list.append(train_acc)

    # Validation (the CIFAR-10 test split serves as the validation set here)
    net.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, labels in testloader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = net(inputs)
            _, predicted = outputs.max(1)
            total += labels.size(0)
            correct += predicted.eq(labels).sum().item()

    val_acc = 100.*correct/total
    val_acc_list.append(val_acc)

    print(f"Epoch [{epoch+1}/{num_epochs}] "
          f"Train Acc: {train_acc:.2f}% | Val Acc: {val_acc:.2f}%")

# ---------------------------
# 6. Plot Accuracy
# ---------------------------
plt.figure(figsize=(8,6))
plt.plot(range(1, num_epochs+1), train_acc_list, label="Training Accuracy")
plt.plot(range(1, num_epochs+1), val_acc_list, label="Validation Accuracy")
plt.xlabel("Epochs")
plt.ylabel("Accuracy (%)")
plt.title("GoogLeNet on CIFAR-10")
plt.legend()
plt.grid(True)
plt.show()




In [None]:
#Question 10: You are working in a healthcare AI startup. Your team is tasked with developing a system that automatically classifies medical X-ray images into normal, pneumonia, and COVID-19. Due to limited labeled data, what approach would you suggest using among CNN architectures discussed (e.g., transfer learning with ResNet or Inception variants)? Justify your approach and outline a deployment strategy for production use.
# train_resnet_medxray.py
import os
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers, callbacks
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.utils.class_weight import compute_class_weight
import time
import json

# -----------------------
# Config (edit as needed)
# -----------------------
IMG_SIZE = (224, 224)
BATCH_SIZE = 32
NUM_CLASSES = 3
TRAIN_DIR = "dataset/train"   # subfolders: normal/, pneumonia/, covid/
VAL_DIR   = "dataset/val"
TEST_DIR  = "dataset/test"
OUTPUT_DIR = "output_model"
os.makedirs(OUTPUT_DIR, exist_ok=True)
SEED = 42

# -----------------------
# Data generators / augmentation
# -----------------------
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=15,
    width_shift_range=0.08,
    height_shift_range=0.08,
    shear_range=0.05,
    zoom_range=0.08,
    horizontal_flip=True,
    fill_mode='nearest'
)

val_datagen = ImageDataGenerator(rescale=1./255)

train_gen = train_datagen.flow_from_directory(
    TRAIN_DIR, target_size=IMG_SIZE, batch_size=BATCH_SIZE, class_mode='categorical', shuffle=True, seed=SEED
)
val_gen = val_datagen.flow_from_directory(
    VAL_DIR, target_size=IMG_SIZE, batch_size=BATCH_SIZE, class_mode='categorical', shuffle=False
)
test_gen = val_datagen.flow_from_directory(
    TEST_DIR, target_size=IMG_SIZE, batch_size=BATCH_SIZE, class_mode='categorical', shuffle=False
)

# -----------------------
# Compute class weights to address imbalance
# -----------------------
# flow_from_directory exposes the integer label of every file via `.classes`,
# so there is no need to iterate over the (augmented) generator.
y_for_weights = train_gen.classes
class_ids = np.unique(y_for_weights)
class_weights = compute_class_weight('balanced', classes=class_ids, y=y_for_weights)
class_weights = {int(c): float(w) for c, w in zip(class_ids, class_weights)}
print("Class weights:", class_weights)

# -----------------------
# Build model (ResNet50 base)
# -----------------------
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(IMG_SIZE[0], IMG_SIZE[1], 3))
base_model.trainable = False  # feature extraction phase

model = models.Sequential([
    base_model,
    layers.GlobalAveragePooling2D(),
    layers.BatchNormalization(),
    layers.Dropout(0.5),
    layers.Dense(256, activation='relu'),
    layers.BatchNormalization(),
    layers.Dropout(0.4),
    layers.Dense(NUM_CLASSES, activation='softmax')
])

model.compile(
    optimizer=optimizers.Adam(learning_rate=1e-3),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

model.summary()

# -----------------------
# Callbacks
# -----------------------
es = callbacks.EarlyStopping(monitor='val_loss', patience=6, restore_best_weights=True)
mc = callbacks.ModelCheckpoint(os.path.join(OUTPUT_DIR, 'best_resnet50.h5'), monitor='val_loss', save_best_only=True)
lr_cb = callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3)

# -----------------------
# Stage 1: Train top classifier (feature extraction)
# -----------------------
EPOCHS_FE = 10
start = time.time()
history_fe = model.fit(
    train_gen,
    epochs=EPOCHS_FE,
    validation_data=val_gen,
    class_weight=class_weights,
    callbacks=[es, mc, lr_cb],
    verbose=2
)
t_fe = time.time() - start
print(f"Feature-extraction training time: {t_fe:.1f}s")

# -----------------------
# Stage 2: Fine-tuning - unfreeze some top layers
# -----------------------
# Unfreeze last conv block (example)
base_model.trainable = True
# Freeze earlier layers if memory / small data
for layer in base_model.layers[:-30]:
    layer.trainable = False

model.compile(
    optimizer=optimizers.Adam(learning_rate=1e-5),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

EPOCHS_FT = 10
start = time.time()
history_ft = model.fit(
    train_gen,
    epochs=EPOCHS_FT,
    validation_data=val_gen,
    class_weight=class_weights,
    callbacks=[es, mc, lr_cb],
    verbose=2
)
t_ft = time.time() - start
print(f"Fine-tuning time: {t_ft:.1f}s")

# -----------------------
# Evaluate on test set
# -----------------------
model.load_weights(os.path.join(OUTPUT_DIR, 'best_resnet50.h5'))
test_loss, test_acc = model.evaluate(test_gen, verbose=1)
print(f"Test accuracy: {test_acc*100:.2f}%")

# Save final model and class indices
model.save(os.path.join(OUTPUT_DIR, 'resnet50_med_xray_final.h5'))
with open(os.path.join(OUTPUT_DIR, 'class_indices.json'), 'w') as f:
    json.dump(train_gen.class_indices, f)

# -----------------------
# Temperature scaling (simple calibration on validation set)
# -----------------------
from scipy.optimize import minimize

# Softmax probabilities on the validation set. val_gen uses shuffle=False,
# so the prediction order matches val_gen.classes.
val_gen.reset()
probs = model.predict(val_gen, verbose=0)
y_true = val_gen.classes

# log(probs) equals the logits up to a per-sample constant, and softmax ignores
# such constants, so scaling log(probs) by 1/T is equivalent to scaling the logits.
log_probs = np.log(np.clip(probs, 1e-12, 1.0))

def nll_T(T):
    """Negative log-likelihood on the validation set at temperature T[0]."""
    scaled = log_probs / T[0]
    scaled = np.exp(scaled - np.max(scaled, axis=1, keepdims=True))
    scaled = scaled / scaled.sum(axis=1, keepdims=True)
    picked = scaled[np.arange(len(y_true)), y_true]
    return -np.sum(np.log(np.clip(picked, 1e-12, None)))

# Optimize the temperature T to minimize the NLL on the validation set
res = minimize(nll_T, x0=[1.0], bounds=[(0.05, 10.0)])
T_opt = res.x[0]
print("Optimal temperature:", T_opt)

# save temperature
with open(os.path.join(OUTPUT_DIR, 'temperature.txt'), 'w') as f:
    f.write(str(float(T_opt)))


In [None]:
#10
# serve_model.py
from flask import Flask, request, jsonify
import tensorflow as tf
import numpy as np
from PIL import Image
import io
import json

app = Flask(__name__)
MODEL_PATH = "output_model/resnet50_med_xray_final.h5"
CLASS_INDICES = "output_model/class_indices.json"
TEMP_PATH = "output_model/temperature.txt"

model = tf.keras.models.load_model(MODEL_PATH)
with open(CLASS_INDICES, 'r') as f:
    class_indices = json.load(f)
inv_class = {v:k for k,v in class_indices.items()}
with open(TEMP_PATH,'r') as f:
    temperature = float(f.read().strip())

IMG_SIZE = (224,224)

def preprocess_image(file_bytes):
    image = Image.open(io.BytesIO(file_bytes)).convert('RGB').resize(IMG_SIZE)
    arr = np.array(image)/255.0
    arr = np.expand_dims(arr, 0)
    return arr

@app.route('/predict', methods=['POST'])
def predict():
    if 'file' not in request.files:
        return jsonify({'error':'no file uploaded'}), 400
    file = request.files['file']
    try:
        img = preprocess_image(file.read())
    except (OSError, ValueError):
        # PIL raises an OSError subclass when the upload is not a readable image
        return jsonify({'error':'could not read image'}), 400
    probs = model.predict(img, verbose=0)[0]
    # temperature scaling: softmax(log(p) / T)
    logits = np.log(np.clip(probs,1e-12,1.0))
    scaled = np.exp(logits / temperature)
    scaled = scaled / scaled.sum()
    top_idx = int(np.argmax(scaled))
    pred_class = inv_class[top_idx]
    return jsonify({
        'pred_class': pred_class,
        'probabilities': {inv_class[i]: float(scaled[i]) for i in range(len(scaled))}
    })

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)


Q10.
1. Short recommendation & justification
- Suggested approach (high level): Use transfer learning with a modern backbone such as ResNet50 or EfficientNetB0–B3. Start from ImageNet weights, apply heavy domain-specific data augmentation, handle class imbalance (class weights or focal loss), and fine-tune the top convolutional blocks. Calibrate the outputs and estimate uncertainty (e.g., temperature scaling, MC-dropout). Add explainability (Grad-CAM) and rigorous external/clinical validation before deployment.
- Why this approach:
  - Transfer learning from ImageNet is widely reported to outperform training from scratch on medical imaging tasks when labeled data is limited.
  - ResNet variants (ResNet50/101) and the EfficientNet family are frequently reported among the stronger backbones for chest X-ray classification.
  - Model calibration and uncertainty matter in clinical settings: poorly calibrated probabilities can harm triage and decision making, so calibration and uncertainty quantification belong in the pipeline.
  - Regulatory and lifecycle expectations (validation, monitoring, change control) should be planned early; for example, FDA draft guidance emphasizes lifecycle management and post-market monitoring for AI medical devices.
2. Code: transfer learning with ResNet50 (TensorFlow / Keras)
The training script above (train_resnet_medxray.py) expects images under dataset/train, dataset/val, and dataset/test, each with subfolders normal/, pneumonia/, covid/. It:
- builds a ResNet50-based classifier,
- uses augmentation, class weights, and early stopping,
- performs initial training (feature extraction), then fine-tuning,
- saves the final model and shows how to apply temperature scaling for calibration.
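
The calibration step in the script rescales log(probabilities) instead of raw logits, because the Keras model ends in a softmax layer. The small check below is a sketch in plain Python (the logits and the temperature are made-up values) showing that both routes give the same calibrated probabilities, since log-probabilities differ from logits only by a per-sample constant that softmax ignores.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def temperature_scale(scores, temperature: float):
    """Divide scores by T and re-normalize; T > 1 softens, T < 1 sharpens."""
    return softmax([s / temperature for s in scores])


logits = [2.0, 0.5, -1.0]            # made-up example scores
probs = softmax(logits)
log_probs = [math.log(p) for p in probs]

T = 2.0                              # made-up temperature
from_logits = temperature_scale(logits, T)
from_log_probs = temperature_scale(log_probs, T)

print("Original probabilities:", [round(p, 3) for p in probs])
print("Scaled from logits:    ", [round(p, 3) for p in from_logits])
print("Scaled from log-probs: ", [round(p, 3) for p in from_log_probs])
```

With T > 1 the top probability shrinks and the distribution becomes less confident, which is the usual direction needed for over-confident networks.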
3. Deployment strategy (production-ready checklist)
 - Data & Validation
   - Curate diverse, de-identified datasets from multiple hospitals and imaging devices; split into training / internal validation / external validation sets. Perform statistical comparison of cohort distributions.
 - Model lifecycle & Regulatory
   - Document dataset provenance, training hyperparameters, performance on held-out and external sets, and a clinical validation plan. Follow device lifecycle guidance (e.g., FDA AI/ML draft guidance). Plan for post-market monitoring and retraining procedures.
 - Performance & Safety
  - Evaluate not only accuracy but sensitivity/recall for critical classes (e.g., COVID/pneumonia), false negative risks, and calibration/uncertainty for triage decisions. Include a referral threshold: low-confidence predictions should be sent to a radiologist.
 - Explainability & Clinician UX
  - Provide visual explanations (Grad-CAM saliency maps) alongside predictions to help clinicians interpret model focus areas. This increases trust and helps detect dataset/domain shifts.
 - Robustness & Monitoring
  - Run continuous performance monitoring: track input distribution drift, model accuracy trends, and triggered alerts if performance drops. Implement pipelines for safe retraining and versioning (MLOps).
 - Serving & Scalability
  - For production: use TF-Serving / TorchServe behind an API gateway, containerized (Docker), orchestrated (Kubernetes). Use GPU nodes for batch inference; autoscale for peak loads. Use TLS, authentication, logging, and audit trails. Implement asynchronous pipelines where human review is required, and never act on model output alone without clinical oversight.
 - Human-in-the-loop & Fail-safes
  - Require clinician confirmation for action, and route uncertain predictions to specialists. Keep an easy way to collect feedback labels for continuous improvement.
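
The referral rule from the checklist (send low-confidence predictions to a specialist) could be sketched as a small post-processing step on the calibrated probabilities. The function name `route_prediction` and the 0.85 threshold are hypothetical; in practice the threshold would be chosen from validation data and clinical requirements.

```python
def route_prediction(probabilities: dict, threshold: float = 0.85) -> dict:
    """Pick the top class and flag low-confidence cases for specialist review."""
    top_class = max(probabilities, key=probabilities.get)
    confidence = probabilities[top_class]
    return {
        "pred_class": top_class,
        "confidence": confidence,
        # Every prediction still needs clinician confirmation; this flag only
        # decides whether the case is escalated to a specialist queue.
        "route_to_specialist": confidence < threshold,
    }


confident = route_prediction({"normal": 0.95, "pneumonia": 0.03, "covid": 0.02})
uncertain = route_prediction({"normal": 0.50, "pneumonia": 0.30, "covid": 0.20})
print(confident)
print(uncertain)
```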
4. Practical tips, pitfalls & further improvements
- Try the EfficientNet family as an alternative backbone for better parameter efficiency; it is often reported to perform well on chest X-rays.
- Use cross-validation and external test sets from different hospitals for realistic generalization estimates.
- Consider ensembles or MC-dropout for uncertainty estimates to improve reliability.
- Always measure calibration (ECE, NLL) and perform post-hoc calibration if needed.
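
As one way to measure calibration, the Expected Calibration Error (ECE) groups predictions into confidence bins and averages the gap between accuracy and mean confidence, weighted by bin size. The version below is a minimal sketch with equal-width bins; the function name, the default of 10 bins, and the example inputs are assumptions for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins: int = 10) -> float:
    """Bin-size-weighted mean of |accuracy - mean confidence| over equal-width bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Made-up example: the model says 0.9 four times but is right only three times
ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
print(f"ECE: {ece:.3f}")
```

Here `confidences` would be the top-class probability of each prediction and `correct` a 0/1 flag for whether that prediction matched the label; comparing ECE before and after temperature scaling shows whether calibration helped.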