Question 1: What is a Convolutional Neural Network (CNN), and how does it differ from traditional fully connected neural networks in terms of architecture and performance on image data?

    -> A Convolutional Neural Network (CNN) is a deep learning architecture designed to automatically and adaptively learn spatial hierarchies of features from images. Unlike traditional fully connected neural networks where each neuron connects to every input, CNNs use convolutional layers with local receptive fields and shared weights to capture spatial patterns efficiently. This makes them highly effective for image data, reducing the number of parameters and improving performance in tasks like image classification, detection, and recognition.
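    The parameter savings from local receptive fields and shared weights can be made concrete with a little arithmetic. The sketch below is illustrative only: the 224×224×3 input size and the 64-unit / 64-filter layer widths are assumptions chosen for the comparison, not values taken from the answer above.

```python
# Rough parameter-count comparison between a fully connected layer and a
# convolutional layer applied to the same image. All sizes are assumed
# for illustration.

def dense_params(n_inputs, n_units):
    """Weights plus biases when every unit connects to every input."""
    return n_inputs * n_units + n_units

def conv2d_params(kernel_h, kernel_w, in_channels, n_filters):
    """Weights plus biases when each filter is shared across all positions."""
    return (kernel_h * kernel_w * in_channels + 1) * n_filters

image_h, image_w, channels = 224, 224, 3  # assumed input size

dense_count = dense_params(image_h * image_w * channels, 64)
conv_count = conv2d_params(3, 3, channels, 64)

print(f"Dense layer, 64 units:      {dense_count:,} parameters")
print(f"Conv layer, 64 3x3 filters: {conv_count:,} parameters")
```

    The convolutional layer's count does not depend on the image size at all, which is why CNNs scale to large inputs where dense layers do not.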

Question 2: Discuss the architecture of LeNet-5 and explain how it laid the foundation for modern deep learning models in computer vision. Include references to its original research paper.

    -> LeNet-5, introduced by Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner in 1998 (“Gradient-Based Learning Applied to Document Recognition,” Proceedings of the IEEE, 86(11), 2278–2324), is one of the earliest CNN architectures and was designed for handwritten digit recognition. It takes a 32×32 grayscale input and has seven layers (not counting the input): three convolutional layers (C1, C3, C5), two subsampling (pooling) layers (S2, S4), one fully connected layer (F6), and a 10-unit output layer. The paper's output layer used Euclidean radial basis function units; modern reimplementations usually substitute softmax. LeNet-5 demonstrated that alternating convolution and pooling, trained end to end with backpropagation, could extract hierarchical features from raw pixels, laying the foundation for later CNNs such as AlexNet, VGG, and ResNet.
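    To see how the feature maps shrink through the network, the following sketch traces the spatial size from the 32×32 input through the convolution and subsampling stages. It assumes 5×5 kernels with stride 1 and no padding, and 2×2 subsampling with stride 2; the helper names are made up for this illustration.

```python
# Trace feature-map shapes through the convolutional part of LeNet-5.
# Helper names and the layer table are illustrative, not from the paper's code.

def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window=2, stride=2):
    """Output width/height of a pooling (subsampling) layer."""
    return (size - window) // stride + 1

# (name, kind, kernel or window size, number of output feature maps)
lenet_stages = [
    ("C1", "conv", 5, 6),
    ("S2", "pool", 2, 6),
    ("C3", "conv", 5, 16),
    ("S4", "pool", 2, 16),
    ("C5", "conv", 5, 120),
]

size = 32  # input is 32x32x1
shapes = []
for name, kind, k, maps in lenet_stages:
    size = conv_out(size, k) if kind == "conv" else pool_out(size, k)
    shapes.append((name, size, size, maps))
    print(f"{name}: {size}x{size}x{maps}")
```

    After C5 the spatial size is 1×1, so the 120 feature maps act as a 120-dimensional vector feeding the 84-unit layer F6.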


Question 3: Compare and contrast AlexNet and VGGNet in terms of design principles, number of parameters, and performance. Highlight key innovations and limitations of each.

    -> AlexNet, introduced by Krizhevsky et al. in 2012, was a breakthrough CNN that popularized deep learning by winning the ImageNet (ILSVRC) 2012 challenge by a wide margin over traditional methods. It combined five convolutional layers and three fully connected layers (about 60 million parameters) with ReLU activations, dropout, data augmentation, and multi-GPU training. Its main limitations are the large filters in its early layers (11×11 and 5×5) and a relatively shallow, hand-tuned design. VGGNet, proposed by Simonyan and Zisserman in 2014, went deeper (16 to 19 weight layers) using a uniform stack of small 3×3 convolutions, which covers the same receptive field as a larger filter with fewer weights and more nonlinearities. VGGNet achieved higher ImageNet accuracy with a simpler, more uniform design, but with roughly 138 million parameters (VGG-16), most of them in the fully connected layers, it is slower and considerably more memory-intensive than AlexNet.
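    The case for small 3×3 filters can be checked with simple arithmetic. The sketch below compares stacked 3×3 convolutions with a single larger filter that sees the same receptive field; it assumes stride 1, equal input and output channel counts, and ignores biases. The channel count of 256 is an arbitrary illustrative choice.

```python
# Stacked 3x3 convolutions versus one large filter with the same receptive
# field. Assumes stride 1 and C input / C output channels; biases ignored.

def receptive_field(num_layers, kernel=3):
    """Receptive field of num_layers stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1)

def conv_weights(kernel, channels, num_layers=1):
    """Weight count for num_layers convolutions with C -> C channels."""
    return num_layers * kernel * kernel * channels * channels

channels = 256  # assumed for illustration

two_3x3 = conv_weights(3, channels, num_layers=2)
one_5x5 = conv_weights(5, channels)
three_3x3 = conv_weights(3, channels, num_layers=3)
one_7x7 = conv_weights(7, channels)

print(f"Two 3x3 layers:   field {receptive_field(2)}, {two_3x3:,} weights")
print(f"One 5x5 layer:    field 5, {one_5x5:,} weights")
print(f"Three 3x3 layers: field {receptive_field(3)}, {three_3x3:,} weights")
print(f"One 7x7 layer:    field 7, {one_7x7:,} weights")
```

    The stacked version needs 18C² or 27C² weights instead of 25C² or 49C², and it applies a nonlinearity after every layer rather than once.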


Question 4: What is transfer learning in the context of image classification? Explain how it helps in reducing computational costs and improving model performance with limited data.   

    -> Transfer learning in image classification means taking a model such as VGG, ResNet, or Inception that has already been trained on a large dataset like ImageNet and adapting it to a new but related task. The model reuses the visual features it has already learned (edges, textures, object parts), which greatly reduces the need for large labeled datasets and long training runs. In the simplest setup the pretrained layers are frozen and only a new classification head is trained, so far fewer parameters are updated and each epoch is cheaper; optionally, the top layers of the backbone are then unfrozen and fine-tuned with a small learning rate. Because the model starts from useful representations rather than random weights, it typically reaches better accuracy when data is scarce.
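    The mechanics of "freeze the backbone, train only the head" can be sketched without a deep learning framework. In the NumPy example below, a fixed random projection followed by ReLU stands in for a pretrained backbone (an assumption made purely so the example is self-contained; a real backbone would carry ImageNet weights), and the dataset is synthetic. Only the head's weights are updated by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained backbone: a FIXED random projection + ReLU.
# This is a hypothetical placeholder, not a real pretrained network.
backbone_w = rng.normal(size=(20, 32))
backbone_w_before = backbone_w.copy()

def extract_features(x):
    """Frozen feature extractor: never updated during training."""
    return np.maximum(x @ backbone_w, 0.0)

# Small synthetic binary dataset (assumed for illustration).
x = rng.normal(size=(200, 20))
y = (x[:, 0] + x[:, 1] > 0).astype(float)

# Features are computed once because the backbone does not change.
features = extract_features(x)
features = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)

# Trainable head: logistic regression on top of the frozen features.
head_w = np.zeros(features.shape[1])
head_b = 0.0

def head_loss(w, b):
    """Mean binary cross-entropy, written in a numerically stable form."""
    logits = features @ w + b
    return float(np.mean(np.logaddexp(0.0, logits) - y * logits))

initial_loss = head_loss(head_w, head_b)
learning_rate = 0.1
for _ in range(200):
    logits = features @ head_w + head_b
    probs = 1.0 / (1.0 + np.exp(-logits))
    error = probs - y
    # Gradients flow only into the head; backbone_w is never touched.
    head_w -= learning_rate * (features.T @ error) / len(y)
    head_b -= learning_rate * error.mean()

final_loss = head_loss(head_w, head_b)
print(f"Head loss before training: {initial_loss:.4f}")
print(f"Head loss after training:  {final_loss:.4f}")
print("Backbone unchanged:", np.array_equal(backbone_w, backbone_w_before))
```

    Because the backbone is frozen, its features can be cached once and reused every epoch, which is one source of the computational savings described above.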


Question 5: Describe the role of residual connections in ResNet architecture. How do they address the vanishing gradient problem in deep CNNs?   

    -> Residual connections in the ResNet architecture add a shortcut that skips one or more layers and sums the block's input with its output, so a block computes y = F(x) + x. The layers inside the block therefore only need to learn the residual F(x); if the best mapping is close to the identity, the block can simply drive F(x) toward zero. During backpropagation the shortcut contributes an identity term to the block's gradient, so the gradient reaching earlier layers is not forced through a long product of small factors, which mitigates the vanishing gradient problem in very deep CNNs. This makes optimization easier and enables stable training of networks with hundreds of layers while maintaining high accuracy.
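    The effect of the shortcut on gradient flow can be demonstrated numerically. The toy below is a sketch under strong assumptions: each "layer" is a small dense tanh transformation with small random weights (not a real convolutional block), and the depth, width, and weight scale are arbitrary. It backpropagates a gradient of ones through 50 layers with and without the identity shortcut and compares the gradient norm that reaches the input.

```python
import numpy as np

def gradient_norm_at_input(depth, dim, residual, seed=0):
    """Backpropagate through `depth` toy layers and return the norm of the
    gradient with respect to the input. Each layer is F(x) = tanh(W x);
    with residual=True the layer output is x + F(x)."""
    rng = np.random.default_rng(seed)
    weights = [rng.normal(0.0, 0.05, size=(dim, dim)) for _ in range(depth)]

    # Forward pass, remembering pre-activations for the backward pass.
    x = rng.normal(size=dim)
    pre_activations = []
    for w in weights:
        z = w @ x
        pre_activations.append(z)
        fx = np.tanh(z)
        x = x + fx if residual else fx

    # Backward pass, starting from a gradient of ones at the output.
    grad = np.ones(dim)
    for w, z in zip(reversed(weights), reversed(pre_activations)):
        branch = w.T @ (grad * (1.0 - np.tanh(z) ** 2))
        # The shortcut adds the incoming gradient unchanged (identity term).
        grad = grad + branch if residual else branch
    return float(np.linalg.norm(grad))

plain_norm = gradient_norm_at_input(depth=50, dim=16, residual=False)
residual_norm = gradient_norm_at_input(depth=50, dim=16, residual=True)

print(f"Plain stack, gradient norm at input:    {plain_norm:.3e}")
print(f"Residual stack, gradient norm at input: {residual_norm:.3e}")
```

    In the plain stack the gradient is multiplied by a small Jacobian at every layer and collapses toward zero; in the residual stack each layer's Jacobian is the identity plus a small term, so the gradient survives.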
    

In [2]:
# Install TensorFlow into this notebook's kernel
# If you have a supported NVIDIA GPU and drivers, you can replace the second line with:
# %pip install tensorflow[and-cuda]
%pip install --upgrade pip
%pip install tensorflow

Collecting pip
  Downloading pip-25.2-py3-none-any.whl.metadata (4.7 kB)
Downloading pip-25.2-py3-none-any.whl (1.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 30.1 MB/s eta 0:00:00
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 24.1.2
    Uninstalling pip-24.1.2:
      Successfully uninstalled pip-24.1.2
Successfully installed pip-25.2


In [3]:
# Quick environment check after install
import sys, site
import tensorflow as tf
print("Python executable:", sys.executable)
print("TensorFlow version:", tf.__version__)
print("Site-packages:")
for p in site.getsitepackages():
    print(" -", p)

Python executable: /usr/bin/python3
TensorFlow version: 2.19.0
Site-packages:
 - /usr/local/lib/python3.12/dist-packages
 - /usr/lib/python3/dist-packages
 - /usr/lib/python3.12/dist-packages


In [4]:
# Question 6: Implement the LeNet-5 architecture using TensorFlow or PyTorch to classify the MNIST dataset. Report the accuracy and training time.

import time
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist

tf.random.set_seed(42)
np.random.seed(42)

(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = (x_train.astype("float32") / 255.0)[..., np.newaxis]
x_test = (x_test.astype("float32") / 255.0)[..., np.newaxis]

# Pad to 32x32 for a faithful LeNet-5 input size (original used 32x32)
x_train = np.pad(x_train, ((0, 0), (2, 2), (2, 2), (0, 0)), mode="constant")
x_test = np.pad(x_test, ((0, 0), (2, 2), (2, 2), (0, 0)), mode="constant")

# Define LeNet-5 (tanh activations + average pooling)
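model = models.Sequential([
    # Declare the input shape with an explicit Input layer (preferred in recent Keras versions)
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(6, kernel_size=(5, 5), activation='tanh'),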
    layers.AveragePooling2D(pool_size=(2, 2), strides=2),
    layers.Conv2D(16, kernel_size=(5, 5), activation='tanh'),
    layers.AveragePooling2D(pool_size=(2, 2), strides=2),
    # Original LeNet-5 has a conv layer that produces 120 feature maps with 5x5 kernels
    layers.Conv2D(120, kernel_size=(5, 5), activation='tanh'),
    layers.Flatten(),
    layers.Dense(84, activation='tanh'),
    layers.Dense(10, activation='softmax')
])

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

start_time = time.time()
history = model.fit(
    x_train, y_train,
    epochs=10,
    batch_size=128,
    validation_split=0.1,
    verbose=1
)
train_time = time.time() - start_time

test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f"Test Accuracy: {test_acc * 100:.2f}%")
print(f"Training Time: {train_time:.2f} seconds")

Epoch 1/10
422/422 ━━━━━━━━━━━━━━━━━━━━ 28s 65ms/step - accuracy: 0.7264 - loss: 0.9264 - val_accuracy: 0.9492 - val_loss: 0.1799
Epoch 2/10
422/422 ━━━━━━━━━━━━━━━━━━━━ 27s 65ms/step - accuracy: 0.9385 - loss: 0.2042 - val_accuracy: 0.9672 - val_loss: 0.1139
Epoch 3/10
422/422 ━━━━━━━━━━━━━━━━━━━━ 28s 66ms/step - accuracy: 0.9598 - loss: 0.1325 - val_accuracy: 0.9767 - val_loss: 0.0858
Epoch 4/10
422/422 ━━━━━━━━━━━━━━━━━━━━ 34s 80ms/step - accuracy: 0.9696 - loss: 0.0991 - val_accuracy: 0.9787 - val_loss: 0.0714
Epoch 5/10
422/422 ━━━━━━━━━━━━━━━━━━━━ 28s 66ms/step - accuracy: 0.9752 - loss: 0.0799 - val_accuracy: 0.9817 - val_loss: 0.0630
Epoch 6/10
422/422 ━━━━━━━━━━━━━━━━━━━━ 35s 83ms/step - accuracy: 0.9797 - loss: 0.0672 - val_accuracy: 0.9830 - val_loss: 0.0575
Epoch 7/10
...

Question 10: Strategy for X-ray classification with limited labels (normal, pneumonia, COVID-19)

Answer:
- Approach: Use transfer learning with a strong backbone pretrained on ImageNet (e.g., ResNet50, EfficientNet-B0/B3).
  If labels are scarce, optionally continue pretraining on a large chest X-ray dataset (e.g., CheXpert) using
  self-supervised or weakly supervised methods.
- Data handling: Use class-balanced sampling and heavy augmentations (random rotation, shift, CLAHE/contrast,
  random cropping, cutout), and enforce strict patient-wise splits so that no patient appears in both
  training and evaluation data.
- Fine-tuning: Freeze backbone → train new head; then unfreeze the last block(s) with a low learning rate; use
  early stopping and model checkpointing; evaluate with stratified k-fold if the dataset is small.
- Loss/metrics: Focal loss or class-weighted cross-entropy; metrics: AUC, sensitivity/specificity, F1.
- Explainability: Integrate Grad-CAM for clinician trust and error analysis.
- Deployment: Export to ONNX/TF SavedModel; serve behind an API with GPU/CPU autoscaling; add input QA checks,
  DICOM de-identification, and audit logging; monitor drift with periodic re-evaluation and threshold tuning.
- Privacy & compliance: HIPAA-compliant storage, role-based access, and human-in-the-loop review for edge cases.
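
Two of the data-handling points above, patient-wise splitting and class weighting, can be sketched in plain Python. Everything here is an illustrative assumption: the record layout `(patient_id, image_path, label)`, the file names, the function names, and the inverse-frequency weighting formula are hypothetical choices, not part of the strategy as stated. The split shown is not stratified by class; a real pipeline would likely combine grouping with stratification.

```python
import random
from collections import Counter

def patient_wise_split(records, test_fraction=0.2, seed=0):
    """Split (patient_id, image_path, label) records so that all images of
    a given patient land on the same side of the split."""
    patients = sorted({patient_id for patient_id, _, _ in records})
    random.Random(seed).shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train = [r for r in records if r[0] not in test_patients]
    test = [r for r in records if r[0] in test_patients]
    return train, test

def inverse_frequency_weights(labels):
    """Class weights proportional to 1 / class frequency, scaled so that a
    perfectly balanced dataset would give every class a weight of 1."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical records: 10 patients, 2 images each, imbalanced labels.
label_by_patient = ["normal"] * 6 + ["pneumonia"] * 3 + ["covid"]
records = [
    (f"patient_{i:02d}", f"img_{i:02d}_{view}.png", label)
    for i, label in enumerate(label_by_patient)
    for view in ("a", "b")
]

train_records, test_records = patient_wise_split(records, test_fraction=0.2)
class_weights = inverse_frequency_weights([label for _, _, label in train_records])
all_weights = inverse_frequency_weights([label for _, _, label in records])

print("Train images:", len(train_records), "| Test images:", len(test_records))
print("Class weights (train):", class_weights)
```

Splitting by patient rather than by image avoids leakage from near-duplicate scans of the same person, which would otherwise inflate the reported metrics.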

print("Strategy outlined for production-ready, data-limited medical X-ray classification.")