## Question 1: What is a Convolutional Neural Network (CNN), and how does it differ from traditional fully connected neural networks in terms of architecture and performance on image data?
- A Convolutional Neural Network (CNN) is a class of deep neural networks designed to automatically and adaptively learn spatial hierarchies of features from input images. Key components are:

  - Convolutional layers: apply learned kernels (filters) that convolve across spatial dimensions, producing feature maps that capture local patterns (edges, textures).

  - Pooling layers: downsample spatial resolution (max/average pooling), giving approximate invariance to small translations and reducing dimensionality.

  - Activation functions: non-linearities such as ReLU.

  - Fully connected (dense) layers: typically placed near the output to perform classification from learned features.

  - Batch normalization, dropout, etc. for regularization and training stability.

#### Differences vs. fully connected networks (FCNs):

- Parameter sharing: Convolutional kernels are reused across spatial locations → far fewer parameters than FCNs for images. An FCN layer needs (input size × hidden units) weights, which explodes for images.

- Local connectivity: CNNs connect neurons to local patches (receptive fields), capturing spatial locality; FCNs flatten the image and lose spatial relationships.

- Translation equivariance / invariance: Convolution is translation-equivariant (shifting the input shifts the feature map), and pooling adds approximate invariance to small shifts, so a feature can be detected wherever it appears; FCNs must learn each position separately.

- Better performance on images: CNNs exploit the spatial structure of images, which typically yields higher accuracy and more data-efficient training than FCNs of comparable size on vision tasks.

- Computational efficiency: Due to sparse local connections and weight sharing, CNNs remain computationally tractable on high-dimensional images.

- Summary: CNNs are architecturally specialized to handle grid-like data (images), producing strong generalization and efficiency advantages over plain FCNs on vision tasks.
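To make the parameter-sharing argument concrete, the sketch below counts the parameters of one dense layer and one convolutional layer applied to the same input. The 224×224 RGB input, the 1,000 hidden units, and the 64 filters of size 3×3 are illustrative assumptions, not values taken from the answer above.

```python
def dense_params(input_size: int, units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return input_size * units + units


def conv2d_params(kernel: int, in_channels: int, filters: int) -> int:
    """Weights plus biases of a 2D convolution; independent of the image size."""
    return kernel * kernel * in_channels * filters + filters


# Assumed sizes, for illustration only: a 224x224 RGB image
height, width, channels = 224, 224, 3

dense_count = dense_params(height * width * channels, units=1000)
conv_count = conv2d_params(kernel=3, in_channels=channels, filters=64)

print(f"Dense layer: {dense_count:,} parameters")
print(f"Conv layer:  {conv_count:,} parameters")
print(f"Ratio:       {dense_count / conv_count:,.0f}x")
```

The convolutional count does not depend on the image height or width, which is why weight sharing scales to large images while the dense count grows with every added pixel.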

## Question 2: Discuss the architecture of LeNet-5 and explain how it laid the foundation for modern deep learning models in computer vision. Include references to its original research paper.

- LeNet-5 (1998, Yann LeCun et al.) is one of the earliest successful CNNs, built for handwritten digit recognition (MNIST). Core architecture:
  - Input: 32×32 grayscale image.
  - C1: Convolutional layer with 6 filters, kernel 5×5 → output 28×28×6.
  - S2: Average pooling/subsampling (2×2) → output 14×14×6.
  - C3: Convolutional layer with 16 filters (5×5) → output 10×10×16. (Connections between S2 and C3 were sparse: most C3 maps read only a subset of the six S2 maps.)
  - S4: Subsampling 2×2 → output 5×5×16.
  - C5: Convolutional (5×5) producing 1×1×120 (acts like a fully connected layer).
  - F6: Fully connected layer with 84 units.
  - Output: 10 units, one per digit. The original paper used Euclidean radial basis function (RBF) units; modern reimplementations typically use softmax.
- Foundational ideas it demonstrated:
  - Local receptive fields and hierarchical feature extraction.
  - Weight sharing (convolutions) to reduce parameters.
  - Pooling/subsampling to reduce resolution and build invariance.
  - End-to-end training with backpropagation on convolutional layers.

#### Why it mattered:
- LeNet-5 provided a practical, trainable architecture that exploited image structure. Its design pattern (conv → pool → conv → pool → dense) remains central to modern CNNs (AlexNet, VGG, ResNet).
- Original paper: LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). "Gradient-based learning applied to document recognition." Proceedings of the IEEE, 86(11), 2278–2324.
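The layer sizes listed for LeNet-5 follow from the standard output-size formula for unpadded convolution and pooling, `(size - kernel) // stride + 1`. The sketch below traces them and also counts parameters under a simplifying assumption: C3 is connected to all six S2 maps and pooling has no trainable weights, as in typical modern reimplementations. The original network used a sparse connection table and trainable subsampling coefficients, so its exact count differs.

```python
def out_size(size: int, kernel: int, stride: int = 1) -> int:
    """Output width/height of an unpadded ('valid') convolution or pooling."""
    return (size - kernel) // stride + 1


def conv_params(kernel: int, in_ch: int, out_ch: int) -> int:
    """Each filter has kernel*kernel*in_ch weights plus one bias."""
    return (kernel * kernel * in_ch + 1) * out_ch


def dense_params(n_in: int, n_out: int) -> int:
    return (n_in + 1) * n_out


# (name, kind, kernel, stride, output channels)
layers_spec = [
    ("C1", "conv", 5, 1, 6),
    ("S2", "pool", 2, 2, 6),
    ("C3", "conv", 5, 1, 16),
    ("S4", "pool", 2, 2, 16),
    ("C5", "conv", 5, 1, 120),
]

size, channels = 32, 1
trace = [("input", size, channels)]
total = 0
for name, kind, kernel, stride, out_ch in layers_spec:
    if kind == "conv":
        total += conv_params(kernel, channels, out_ch)
    size = out_size(size, kernel, stride)
    channels = out_ch
    trace.append((name, size, channels))

total += dense_params(120, 84)  # F6
total += dense_params(84, 10)   # output layer

for name, s, c in trace:
    print(f"{name:>5}: {s}x{s}x{c}")
print(f"Total parameters (dense-C3 variant): {total:,}")
```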

## Question 3: Compare and contrast AlexNet and VGGNet in terms of design principles, number of parameters, and performance. Highlight key innovations and limitations.

- AlexNet (2012):
  - Architecture: 5 convolutional layers + 3 dense layers; used ReLU, local response normalization (LRN), and overlapping max-pooling.
  - Key innovations: demonstrated that deep CNNs trained on GPUs achieve state-of-the-art results on ImageNet; popularized ReLU for faster training; used dropout and data augmentation for regularization.
  - Parameters: ≈ 60 million.
  - Performance: Won ILSVRC 2012 by a large margin (top-5 test error of 15.3% versus 26.2% for the runner-up); revived interest in deep learning.
  - Limitations: Large number of parameters, early-stage techniques (LRN was dropped by later architectures), relatively large filters (11×11 in the first layer).

- VGGNet (2014):
  - Architecture: Very deep (VGG16 / VGG19), made of stacked 3×3 convolutional layers with periodic max-pooling, ending with three dense layers.
  - Key idea: Replace large filters with stacks of smaller 3×3 filters → same receptive field (two 3×3 layers cover 5×5, three cover 7×7) with fewer parameters and more non-linearities (better representational power).
  - Parameters: VGG16 ≈ 138 million (heavy due to FC layers).
  - Performance: Second place in classification and first in localization at ILSVRC 2014; demonstrated that depth matters.
  - Limitations: Very heavy in storage and compute, slow inference, and high memory use during training.

#### Comparison summary:

- Both advanced deep-learning vision. AlexNet showed the power of deep conv nets with GPUs; VGG emphasized depth and simplicity (3×3 conv stacks).

- VGG typically achieves higher accuracy than AlexNet under the same training pipeline, but at a higher parameter and compute cost.

- Later architectures (ResNet, Inception) improved accuracy while being more parameter-efficient and easier to train.
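The parameter figures in this comparison can be checked with simple arithmetic. The sketch below counts VGG16 parameters from its layer configuration (assuming the standard 224×224 input and 1,000 ImageNet classes) and compares a stack of 3×3 convolutions with a single larger filter covering the same receptive field. The helper names are made up for this illustration.

```python
# VGG16 layer configuration: numbers are conv output channels, "M" is 2x2 max-pooling
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_param_counts(cfg=VGG16_CFG, in_channels=3, image_size=224, num_classes=1000):
    """Return (conv parameters, fully connected parameters), biases included."""
    conv_total = 0
    channels, size = in_channels, image_size
    for item in cfg:
        if item == "M":
            size //= 2                                      # pooling halves the resolution
        else:
            conv_total += 3 * 3 * channels * item + item    # 3x3 kernels + biases
            channels = item
    fc_total = 0
    n_in = size * size * channels                           # flattened 7x7x512 feature map
    for n_out in (4096, 4096, num_classes):
        fc_total += n_in * n_out + n_out
        n_in = n_out
    return conv_total, fc_total


def stacked_3x3_weights(n_layers: int, channels: int) -> int:
    """Weights of n stacked 3x3 convs (receptive field 2n+1), channels -> channels."""
    return n_layers * 3 * 3 * channels * channels


def single_kxk_weights(k: int, channels: int) -> int:
    """Weights of one k x k conv, channels -> channels."""
    return k * k * channels * channels


conv_total, fc_total = vgg16_param_counts()
print(f"Conv layers: {conv_total:,}")
print(f"FC layers:   {fc_total:,}")
print(f"Total:       {conv_total + fc_total:,}")
print("Three 3x3 convs vs one 7x7 (512 channels):",
      stacked_3x3_weights(3, 512), "vs", single_kxk_weights(7, 512))
```

Most of the total sits in the three dense layers, which is why later architectures replaced them with global average pooling.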

## Question 4: What is transfer learning in the context of image classification? Explain how it helps in reducing computational costs and improving model performance with limited data.

- Transfer learning: reuse of a model (or its parameters) trained on a source task/dataset (e.g., ImageNet) as the starting point for a target task. In image classification, a pre-trained CNN’s early and intermediate layers capture general visual features (edges, textures) that are often useful for other vision tasks.

#### Common approaches:
- Feature extraction: freeze pretrained convolutional base; train only new top classifier layers on target dataset.
- Fine-tuning: unfreeze some top convolutional layers and jointly train them with the new classifier, using a smaller learning rate.

#### Why it helps:
- Reduces computational cost: training only top layers or fine-tuning a small subset requires less compute and time than training from scratch.
- Better performance with limited data: pretrained features act as strong priors, reducing overfitting and improving generalization when labeled target data is scarce.
- Faster convergence: model starts close to a useful solution, requiring fewer epochs to reach good performance.

#### When to use:
- When target dataset is small or similar in domain to the source (e.g., natural images).
- Fine-tune when you have moderate data; use pure feature extraction when data is very scarce.
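As a rough illustration of the cost argument, the sketch below compares how many parameters receive gradient updates under each strategy. It assumes a VGG16 convolutional base with a small head (global average pooling → Dense(256) → Dense(5)); the 5-class head is a hypothetical example, and the strategy names are labels invented for this sketch.

```python
VGG16_BASE_PARAMS = 14_714_688                          # 13 conv layers of VGG16 without the top
VGG16_BLOCK5_PARAMS = 3 * (3 * 3 * 512 * 512 + 512)     # last block: three 3x3 convs, 512 -> 512


def head_params(features: int = 512, hidden: int = 256, num_classes: int = 5) -> int:
    """Dense(hidden) + Dense(num_classes) on top of globally pooled features."""
    return (features * hidden + hidden) + (hidden * num_classes + num_classes)


def trainable_params(strategy: str) -> int:
    head = head_params()
    if strategy == "feature_extraction":       # base frozen, only the head trains
        return head
    if strategy == "fine_tune_last_block":     # head + last conv block train
        return head + VGG16_BLOCK5_PARAMS
    if strategy == "train_everything":         # all weights train
        return head + VGG16_BASE_PARAMS
    raise ValueError(f"unknown strategy: {strategy}")


for name in ("feature_extraction", "fine_tune_last_block", "train_everything"):
    print(f"{name:>22}: {trainable_params(name):>12,} trainable parameters")
```

Trainable-parameter count is only a proxy for cost: the forward pass still runs through the frozen base, and the savings come from skipping gradient computation and optimizer state for frozen weights and from needing fewer epochs.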

## Question 5: Describe the role of residual connections in ResNet architecture. How do they address the vanishing gradient problem in deep CNNs?
- Residual connections (skip connections) add the input of a block to its output: if a block computes F(x), the block's output becomes y = F(x) + x (optionally followed by activation). This simple identity shortcut allows the network to learn residual mappings.

#### Why they help:
- Easier optimization: learning F(x) = H(x) - x (residual) is often easier than directly learning H(x). If an extra layer hurts performance, the residual block can learn F(x) = 0 thereby preserving identity mapping.
- Gradient flow: gradients can propagate directly through the identity path back to earlier layers, mitigating vanishing gradients and enabling much deeper networks (ResNet-50/101/152).
- Better accuracy at depth: because networks with a hundred or more layers become trainable, deeper ResNets outperform their shallower counterparts instead of degrading.
- Summary: Residual connections provide direct pathways for information and gradients, stabilizing training of very deep networks and mitigating the degradation problem (deeper plain networks reaching higher training error than shallower ones).
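The gradient-flow argument can be seen in a scalar toy model. For a residual block y = x + F(x), the derivative is dy/dx = 1 + F'(x), so a chain of blocks multiplies factors close to 1 rather than factors close to 0. The sketch below assumes every layer has the same small local slope (0.01) and a depth of 50; both numbers are arbitrary choices for illustration, and real networks involve Jacobian matrices rather than scalars.

```python
def plain_chain_gradient(local_slope: float, depth: int) -> float:
    """d(output)/d(input) for `depth` stacked layers y = F(x) with F'(x) = local_slope."""
    grad = 1.0
    for _ in range(depth):
        grad *= local_slope
    return grad


def residual_chain_gradient(local_slope: float, depth: int) -> float:
    """Same chain, but each block is y = x + F(x), so its derivative is 1 + F'(x)."""
    grad = 1.0
    for _ in range(depth):
        grad *= 1.0 + local_slope
    return grad


depth, slope = 50, 0.01  # assumed values for illustration
plain = plain_chain_gradient(slope, depth)
residual = residual_chain_gradient(slope, depth)

print(f"Plain chain gradient:    {plain:.3e}")
print(f"Residual chain gradient: {residual:.3f}")
```

The plain chain's gradient collapses toward zero, while the residual chain's stays on the order of 1 because of the identity term. With large local slopes the residual product can also grow, which is one reason ResNets pair skip connections with normalization layers.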

## Question 6: Implement LeNet-5 using TensorFlow/Keras to classify the MNIST dataset. Report accuracy and training time.
- Notes: The script below will:
  - Load MNIST, preprocess it, and create a LeNet-5-like model
  - Train and print the training time and validation accuracy
  - Save the final model and training history

```python
# Q6_lenet_mnist.py
# Run: python Q6_lenet_mnist.py  OR run in Colab

import json
import time

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# 1. Load and preprocess MNIST
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Add a channel axis and scale pixel values to [0, 1]
x_train = np.expand_dims(x_train, axis=-1).astype("float32") / 255.0
x_test = np.expand_dims(x_test, axis=-1).astype("float32") / 255.0

# LeNet-5 takes 32x32 inputs; MNIST images are 28x28, so resize them to 32x32
x_train = tf.image.resize(x_train, [32, 32]).numpy()
x_test = tf.image.resize(x_test, [32, 32]).numpy()

# One-hot encode the labels
num_classes = 10
y_train_cat = tf.keras.utils.to_categorical(y_train, num_classes)
y_test_cat = tf.keras.utils.to_categorical(y_test, num_classes)

# 2. LeNet-5-like model
def LeNet5(input_shape=(32, 32, 1), num_classes=10):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(6, kernel_size=5, activation='tanh', padding='valid'),   # C1: 28x28x6
        layers.AveragePooling2D(),                                             # S2: 14x14x6
        layers.Conv2D(16, kernel_size=5, activation='tanh', padding='valid'),  # C3: 10x10x16
        layers.AveragePooling2D(),                                             # S4: 5x5x16
        layers.Conv2D(120, kernel_size=5, activation='tanh'),                  # C5: 1x1x120
        layers.Flatten(),
        layers.Dense(84, activation='tanh'),                                   # F6
        layers.Dense(num_classes, activation='softmax')
    ])
    return model

model = LeNet5()
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()

# 3. Train (10% of the training set is held out for validation)
batch_size = 128
epochs = 10

start = time.time()
history = model.fit(x_train, y_train_cat, batch_size=batch_size, epochs=epochs,
                    validation_split=0.1, verbose=2)
end = time.time()

train_time = end - start
print(f"Training time (s): {train_time:.2f}")

# 4. Evaluate on the test set
test_loss, test_acc = model.evaluate(x_test, y_test_cat, verbose=0)
print(f"Test accuracy: {test_acc:.4f}")

# Save model and history
model.save("lenet_mnist.h5")
with open("lenet_mnist_history.json", "w") as f:
    json.dump({k: [float(x) for x in v] for k, v in history.history.items()}, f)
```

#### How to report after running:
- Training time is printed in seconds. Report it as "Training time: X seconds (on GPU/CPU)".
- Test accuracy is printed (e.g., 0.99). Report the final accuracy and the validation curve (plot `history`).

## Question 7: Use a pre-trained VGG16 model via transfer learning on a small custom dataset (flowers). Replace top layers and fine-tune.
- Notes: This script downloads the TensorFlow flower_photos dataset, creates an 80/20 train/validation split, builds a VGG16-based classifier, and fine-tunes the top layers. It prints the training time of each stage and the final validation accuracy.

```python
# Q7_vgg16_flowers.py
# Run in Colab (GPU recommended)
import os
import time

import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import layers, models

# 1. Download flower dataset (TensorFlow example)
DATA_URL = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
dataset_dir = tf.keras.utils.get_file(origin=DATA_URL, fname="flower_photos", untar=True)
dataset_dir = os.path.join(os.path.dirname(dataset_dir), "flower_photos")

# 2. Create training and validation datasets (same seed so the two subsets do not overlap)
img_size = (224, 224)  # VGG16 expects 224x224
batch_size = 32

train_ds = tf.keras.utils.image_dataset_from_directory(
    dataset_dir, validation_split=0.2, subset="training", seed=123,
    image_size=img_size, batch_size=batch_size
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    dataset_dir, validation_split=0.2, subset="validation", seed=123,
    image_size=img_size, batch_size=batch_size
)

# Read class names before prefetch(), which drops the attribute
class_names = train_ds.class_names
num_classes = len(class_names)
print("Classes:", class_names)

# Prefetch
AUTOTUNE = tf.data.AUTOTUNE
train_ds = train_ds.prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.prefetch(buffer_size=AUTOTUNE)

# 3. Build model: VGG16 base
base_model = tf.keras.applications.VGG16(include_top=False, weights='imagenet',
                                         input_shape=img_size + (3,))
base_model.trainable = False  # freeze base

# Add top classifier (the datasets yield pixels in [0, 255], as preprocess_input expects)
inputs = tf.keras.Input(shape=img_size + (3,))
x = tf.keras.applications.vgg16.preprocess_input(inputs)
x = base_model(x, training=False)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(256, activation='relu')(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(num_classes, activation='softmax')(x)
model = models.Model(inputs, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()

# 4. Train top layers
epochs_head = 8
start = time.time()
history1 = model.fit(train_ds, epochs=epochs_head, validation_data=val_ds)
end = time.time()
print("Feature-extract training time (s):", end - start)

# 5. Fine-tune: unfreeze the last VGG block
base_model.trainable = True
# Freeze everything except the last 4 layers (block5_conv1-3 and block5_pool)
for layer in base_model.layers[:-4]:
    layer.trainable = False

# Recompile so the trainable changes take effect; use a lower learning rate
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss='sparse_categorical_crossentropy', metrics=['accuracy'])

start = time.time()
history2 = model.fit(train_ds, epochs=5, validation_data=val_ds)
end = time.time()
print("Fine-tuning time (s):", end - start)

# 6. Save and report
model.save("vgg16_flowers_finetuned.h5")
# Evaluate on validation set (optionally create a separate test split)
val_loss, val_acc = model.evaluate(val_ds)
print(f"Validation accuracy after fine-tuning: {val_acc:.4f}")

# Plot training history across both stages (optional)
acc_curve = history1.history['accuracy'] + history2.history['accuracy']
val_acc_curve = history1.history['val_accuracy'] + history2.history['val_accuracy']
plt.plot(acc_curve, label='train_acc')
plt.plot(val_acc_curve, label='val_acc')
plt.legend()
plt.show()
```

#### Discussion points to include in the assignment PDF after running:
- Number of classes and dataset size.
- Training times for the feature-extraction stage and the fine-tuning stage (both printed).
- Final validation accuracy and a short analysis: whether the model overfits, whether more augmentation is needed, etc.



## Question 8: Visualize the filters and feature maps of the first convolutional layer of AlexNet on an example input image.
- Notes: We'll implement an AlexNet-like small model in Keras (first conv layer similar to AlexNet) and visualize:
 - The learned filters (weights) of the first Conv2D layer
 - The feature maps (activations) after feeding a sample image

```python
# Q8_alexnet_filters_featuremaps.py
# Run in Colab or local

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# 1. Define small AlexNet-ish model (only first few layers needed)
def AlexNet_small(input_shape=(227, 227, 3), num_classes=1000):
    inputs = tf.keras.Input(shape=input_shape)
    x = layers.Conv2D(96, kernel_size=11, strides=4, activation='relu')(inputs)  # first layer like AlexNet
    x = layers.MaxPooling2D(pool_size=3, strides=2)(x)
    x = layers.Conv2D(256, kernel_size=5, padding='same', activation='relu')(x)
    x = layers.MaxPooling2D(pool_size=3, strides=2)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(4096, activation='relu')(x)
    x = layers.Dense(num_classes, activation='softmax')(x)
    return models.Model(inputs, x)

model = AlexNet_small(input_shape=(227, 227, 3), num_classes=10)
model.summary()

# 2. Randomly initialize or load weights (we'll use random weights here)
# To visualize filters that are meaningful, you would train this model on a dataset.
# But you can still visualize initial filters.

# 3. Visualize filters of first conv layer
# Look the layer up by type rather than by a hard-coded index
first_conv = next(l for l in model.layers if isinstance(l, layers.Conv2D))
weights = first_conv.get_weights()[0]  # shape (11, 11, 3, 96)
print("weights shape:", weights.shape)

# Normalize filter values to 0-1 so each 11x11x3 filter can be shown as an RGB image
w_min, w_max = weights.min(), weights.max()
filters = (weights - w_min) / (w_max - w_min)

n_filters = min(16, filters.shape[3])
fig = plt.figure(figsize=(12, 6))
for i in range(n_filters):
    f = filters[:, :, :, i]
    ax = fig.add_subplot(2, n_filters // 2, i + 1)
    ax.imshow(f)
    ax.axis('off')
plt.suptitle("First-layer filters (initial random weights)")
plt.show()

# 4. Feature maps on one sample image
# Use a sample image from keras or local path
img_path = tf.keras.utils.get_file('elephant.jpg',
                                   'https://upload.wikimedia.org/wikipedia/commons/7/73/Loxodonta_africana_%28cropped%29.jpg')
img = tf.keras.utils.load_img(img_path, target_size=(227, 227))
img_arr = tf.keras.utils.img_to_array(img)
img_input = np.expand_dims(img_arr, axis=0) / 255.0  # add batch axis, scale to [0, 1]

# Create a model that outputs activations of first conv layer
activation_model = models.Model(inputs=model.input, outputs=first_conv.output)
activations = activation_model.predict(img_input)  # shape (1, 55, 55, 96)
print("activations shape:", activations.shape)

# Plot some feature maps
n_maps = 16
fig = plt.figure(figsize=(12, 6))
for i in range(n_maps):
    ax = fig.add_subplot(2, n_maps // 2, i + 1)
    act = activations[0, :, :, i]
    ax.imshow(act, cmap='viridis')
    ax.axis('off')
plt.suptitle("Feature maps from first conv layer (random weights)")
plt.show()
```

- Note: With random weights the filters and feature maps are not semantically meaningful. After training on real data, first-layer filters typically show edges, color blobs, etc.

## Question 9: Train an Inception variant (InceptionV3) on CIFAR-10. Plot training/validation accuracy over epochs and analyze overfitting or underfitting.
- Notes: InceptionV3 expects bigger input, so images are resized. This script shows training and plotting. Use GPU for reasonable speed.

```python
# Q9_inceptionv3_cifar10.py
import time

import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import layers, models

# 1. Load CIFAR-10 (32x32 RGB images, integer labels 0-9)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
num_classes = 10

# Keep pixel values in [0, 255]: inception_v3.preprocess_input rescales them to [-1, 1],
# so dividing by 255 first would squash every input to nearly the same value.
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# InceptionV3 needs inputs of at least 75x75. Resizing inside the model avoids
# holding a 150x150 float copy of the whole dataset in memory.
IMG_SIZE = 150

# 2. Build model with InceptionV3 base
base_model = tf.keras.applications.InceptionV3(include_top=False, weights='imagenet',
                                               input_shape=(IMG_SIZE, IMG_SIZE, 3))
base_model.trainable = False

inputs = tf.keras.Input(shape=(32, 32, 3))
x = layers.Resizing(IMG_SIZE, IMG_SIZE)(inputs)
x = tf.keras.applications.inception_v3.preprocess_input(x)
x = base_model(x, training=False)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(256, activation='relu')(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(num_classes, activation='softmax')(x)
model = models.Model(inputs, outputs)

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()

# 3. Train
epochs = 10
batch_size = 64

start = time.time()
history = model.fit(x_train, y_train, validation_split=0.1, epochs=epochs, batch_size=batch_size, verbose=2)
end = time.time()
print(f"Training time (s): {end-start:.2f}")

# 4. Plot accuracy
plt.plot(history.history['accuracy'], label='train_acc')
plt.plot(history.history['val_accuracy'], label='val_acc')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.title('InceptionV3 on CIFAR-10 (feature-extract)')
plt.show()

# 5. Evaluate on test set
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f"Test accuracy: {test_acc:.4f}")

# 6. (Optional) Fine-tune top base layers for more accuracy (unfreeze some top layers)
```

#### Analysis to include in your PDF after running:
- Plot training vs. validation accuracy: if the gap is large (val << train) → overfitting (suggest augmentation, regularization).
- If both are low → underfitting (suggest a larger model, longer training, or less regularization).
- Report the training time and final test accuracy.
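The overfitting/underfitting reading of the curves can be turned into a small helper. This is a minimal sketch: the function name `diagnose_fit` and both thresholds (a 0.05 train/validation gap and a 0.70 accuracy floor) are hypothetical choices that should be adjusted to the task.

```python
def diagnose_fit(train_acc, val_acc, gap_threshold=0.05, low_threshold=0.70):
    """Classify the final epoch of two accuracy curves.

    train_acc / val_acc: per-epoch accuracies, e.g. history.history['accuracy']
    and history.history['val_accuracy'] from a Keras fit() call.
    The thresholds are illustrative assumptions, not standard values.
    """
    final_train, final_val = train_acc[-1], val_acc[-1]
    gap = final_train - final_val
    if gap > gap_threshold:
        return "overfitting"       # training accuracy well above validation accuracy
    if final_train < low_threshold and final_val < low_threshold:
        return "underfitting"      # both curves stay low
    return "reasonable fit"


# Made-up curves, for demonstration only
print(diagnose_fit([0.70, 0.90, 0.98], [0.68, 0.80, 0.82]))
print(diagnose_fit([0.40, 0.50, 0.55], [0.40, 0.50, 0.54]))
print(diagnose_fit([0.70, 0.85, 0.90], [0.70, 0.84, 0.88]))
```

A single final-epoch check is coarse; looking at where the validation curve flattens or turns down gives a better picture of when overfitting begins.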

## Question 10: You are working in a healthcare AI startup. Task: classify X-ray images into normal, pneumonia, and COVID-19 with limited labeled data. Which approach and deployment strategy?

Answer (recommended approach + code snippet + deployment plan)

#### Recommended approach:
- Use transfer learning with a robust backbone, e.g., ResNet50 or EfficientNet-B0/B3 pretrained on ImageNet. These backbones extract general-purpose visual features and are widely used as starting points for medical imaging models when fine-tuned.

#### Data handling:
- Gather all labeled X-rays (3 classes). If data is extremely limited, augment with domain-similar public datasets (e.g., NIH ChestX-ray, RSNA pneumonia dataset, COVIDx) after verifying licensing and ethical use.
- Data augmentation (with care): small rotations, shifts, and slight brightness/contrast changes; avoid unrealistic transforms that alter clinical features.
- Class imbalance: use class weighting, oversampling, or focal loss.
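One way to apply class weighting is to weight each class inversely to its frequency, using the common "balanced" heuristic `n_samples / (n_classes * count)`. The sketch below is a minimal version of that idea; the label counts are invented for illustration, and the index order (0 = covid, 1 = normal, 2 = pneumonia) is an assumption that should match the label encoding of the actual dataset.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class_index: weight} with weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in sorted(counts.items())}


# Hypothetical label distribution: 50 covid, 600 normal, 350 pneumonia
labels = [0] * 50 + [1] * 600 + [2] * 350
weights = balanced_class_weights(labels)

for cls, w in weights.items():
    print(f"class {cls}: weight {w:.3f}")
```

The resulting dictionary can be passed to Keras through the `class_weight` argument of `model.fit`, so that errors on the rare class contribute more to the loss.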

#### Training strategy:
- Stage 1: Freeze the base model and train only the top classifier head (feature extraction).
- Stage 2: Unfreeze the top few convolutional blocks and fine-tune with a low learning rate.
- Use k-fold cross-validation if the dataset is small; report sensitivity (recall), specificity, and AUC for each class, since accuracy alone is insufficient in medical tasks.

#### Regularization & calibration:
- Use dropout, weight decay, and early stopping.
- Calibrate predicted probabilities (Platt scaling / temperature scaling) before deployment.

#### Explainability:
- Integrate Grad-CAM or other saliency maps to visualize where the model focuses; clinicians generally need this kind of evidence before trusting a prediction.

#### Evaluation:
- Use a held-out test set from a different source (external validation) to estimate generalization.
- Report the confusion matrix, per-class precision/recall, F1, ROC/AUC, and NPV/PPV for clinical relevance.

#### Deployment strategy:
- Model packaging: export a SavedModel (TensorFlow) or ONNX.
- Serving: use TensorFlow Serving or a Dockerized FastAPI/Flask app, with GPU inference if needed.
- Monitoring: log inputs, outputs, and model confidence, and implement data drift detection to catch distributional shifts.
- Human-in-the-loop: route low-confidence cases to clinician review; start as decision support, not autonomous diagnosis.
- Security & compliance: follow HIPAA/GDPR rules, with secure data handling, audit logs, and version control for model and data.
- CI/CD: automated tests, canary rollout, A/B testing, and rollback strategies.

```python
# Q10_resnet_xray.py
import time

import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

# Assume you have train_dir, val_dir with subfolders 'normal','pneumonia','covid'
train_dir = "/path/to/train"
val_dir = "/path/to/val"
img_size = (224, 224)
batch_size = 16

# Labels are integer indices assigned in alphabetical folder order
train_ds = tf.keras.utils.image_dataset_from_directory(train_dir, image_size=img_size, batch_size=batch_size)
val_ds = tf.keras.utils.image_dataset_from_directory(val_dir, image_size=img_size, batch_size=batch_size)

# Data augmentation pipeline (kept mild to preserve clinical features)
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),  # mirrors anatomy (e.g., heart side); drop if laterality matters
    layers.RandomRotation(0.05),
    layers.RandomZoom(0.05),
    layers.RandomContrast(0.05),
])

base_model = tf.keras.applications.ResNet50(include_top=False, weights='imagenet', input_shape=img_size + (3,))
base_model.trainable = False

inputs = tf.keras.Input(shape=img_size + (3,))
x = data_augmentation(inputs)  # active only during training
x = tf.keras.applications.resnet.preprocess_input(x)
x = base_model(x, training=False)  # keep BatchNorm layers in inference mode
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(3, activation='softmax')(x)
model = models.Model(inputs, outputs)

model.compile(optimizer=optimizers.Adam(1e-4), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()

start = time.time()
history = model.fit(train_ds, validation_data=val_ds, epochs=10)
print("Feature-extract time:", time.time() - start)

# Fine-tune: unfreeze roughly the last 50 layers of the base (the top of the network)
base_model.trainable = True
for layer in base_model.layers[:-50]:
    layer.trainable = False

# Recompile with a lower learning rate so the trainable changes take effect
model.compile(optimizer=optimizers.Adam(1e-5), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
start = time.time()
history_ft = model.fit(train_ds, validation_data=val_ds, epochs=10)
print("Fine-tune time:", time.time() - start)

# Save. With Keras 2 (TF <= 2.15) a path without an extension writes a SavedModel directory;
# Keras 3 requires a ".keras" or ".h5" filename here, or model.export(...) for a SavedModel.
model.save("resnet_xray_model")
```

#### Deployment outline:
- Export a SavedModel.
- Create a Docker image using tensorflow/serving or a small FastAPI wrapper that loads the model and exposes /predict.
- Use TLS and authentication, and write audit logs.
- Set up monitoring dashboards (latency, throughput, prediction distributions).
- Conduct clinician validation and a pilot deployment in a controlled clinical setting before large-scale roll-out.
- Ethics & safety: ensure annotated data quality, obtain clinician sign-off, and be cautious: models are decision-support tools, not replacements for clinical judgement.