
1 — What is the role of filters and feature maps in Convolutional Neural Network (CNN)?

Ans: **Filters (Kernels):**
- Filters (also called kernels) are small learnable weight matrices (e.g., 3x3, 5x5) that slide over the input (image or feature map) performing element-wise multiplication and summation (convolution).
- Each filter is trained to detect a particular local pattern such as edges, textures, corners, or more complex motifs in deeper layers.
- Filters are shared spatially (same weights at all image locations), enabling translation equivariance and substantial parameter savings compared to fully-connected layers.

**Feature maps (Activation maps):**
- A feature map is the result of convolving a filter over the input and applying a non-linear activation. It is a 2D map (or 3D when stacked across channels) that highlights where the learned pattern appears in the input.
- Stacking multiple filters produces multiple feature maps — together they form the output volume of a convolutional layer.
- Early-layer feature maps capture low-level features (edges, color blobs). Deeper-layer feature maps capture higher-level, task-specific features (object parts, shapes).
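
As a rough illustration of how one filter produces one feature map, here is a minimal pure-Python sketch. The function name `conv2d_valid`, the toy image, and the hand-written edge filter are assumptions made for this example (a trained network learns its filter weights instead). Like most deep-learning libraries, it computes cross-correlation, meaning the kernel is not flipped.

```python
def conv2d_valid(image, kernel):
    """Slide a square kernel over a 2D image (stride 1, no padding), then apply ReLU."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise multiply the kernel with the window and sum the result.
            acc = 0
            for di in range(k):
                for dj in range(k):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(max(acc, 0))  # ReLU non-linearity
        feature_map.append(row)
    return feature_map


# Toy 5x5 image: dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hypothetical hand-written filter that responds to dark-to-bright vertical edges.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

feature_map = conv2d_valid(image, vertical_edge)
for row in feature_map:
    print(row)
```

The resulting 3x3 feature map is large where the window straddles the edge and zero over the uniform bright region, which is the sense in which a feature map "highlights where the pattern appears".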




2 — Explain the concepts of padding and stride in CNNs (Convolutional Neural Networks). How do they affect the output dimensions of feature maps?

Ans: **Stride:**
- Stride is the step size with which the filter moves across the input. A stride of 1 moves the filter one pixel at a time; stride 2 moves two pixels, etc.
- Larger stride reduces the spatial dimensions of the output (downsampling effect) and reduces computational cost and overlap between receptive fields.

**Padding:**
- Padding adds artificial border pixels around the input (usually zeros — 'zero padding') so that filters can be applied at image edges.
- Common types: 'valid' (no padding), 'same' (padding chosen so output has same spatial dimensions as input for stride=1), and explicit integer padding values.

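**Effects on output dimensions:**
- In general, for input size N, kernel size K, padding P on each side, and stride S, the output size along each spatial axis is floor((N + 2P − K) / S) + 1.
- **No padding + stride 1** → output shrinks by K−1 on each spatial axis (e.g., 28 → 26 for K=3).
- **Same padding + stride 1** → output size equals input size (useful when preserving spatial resolution).
- **Stride > 1** → spatial downsampling; used as an alternative to pooling for reducing feature map size.
- Padding controls boundary behavior; stride controls sampling density and computational cost.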
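
The effect of padding and stride on the output size can be checked with a small helper. This is a sketch using the standard formula floor((N + 2P − K) / S) + 1; the function name `conv_output_size` is made up for this example, and the layer sequence traced at the end is assumed to be a 28x28 input passed through 3x3 convolutions without padding and 2x2 max pooling.

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Output size along one spatial axis: floor((n + 2*padding - k) / stride) + 1."""
    return (n + 2 * padding - k) // stride + 1


print(conv_output_size(28, 3))                       # no padding, stride 1
print(conv_output_size(28, 3, padding=1))            # padding 1 keeps the size for a 3x3 kernel
print(conv_output_size(28, 3, stride=2, padding=1))  # stride 2 halves the size

# Trace conv3x3 -> pool2x2 -> conv3x3 -> pool2x2 on a 28x28 input.
# Pooling follows the same formula with k=2, stride=2.
size = 28
trace = [size]
for k, s in [(3, 1), (2, 2), (3, 1), (2, 2)]:
    size = conv_output_size(size, k, stride=s)
    trace.append(size)
print(trace)
```

The trace shrinks from 28 down to 5, so a layer with 64 output channels at that point would flatten to 5 * 5 * 64 = 1600 values.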



3 — Define receptive field in the context of CNNs. Why is it important for deep architectures?

Ans:
- The receptive field of a unit (neuron) in a CNN is the region of the input image that can affect that unit's activation; it is usually described by its size in pixels.
- For a single convolution layer with kernel size K, the receptive field of a unit is K×K, whatever the stride. In deeper networks the receptive field grows layer by layer: each unit aggregates a window of units from the previous layer, and any stride or pooling in earlier layers multiplies how far apart those units are in the input.

**Importance for deep architectures:**
- Larger receptive fields let deeper neurons "see" a larger area of the original image, enabling recognition of larger patterns or object-level context.
- Proper receptive field design is critical: too small → model cannot capture global context; too large (or too aggressive downsampling) → loss of fine detail or spatial precision.
- Techniques to increase receptive field: stacking layers, using larger kernels, dilated convolutions, or pooling/strided convs. Dilated convolutions enlarge the receptive field without reducing resolution and without adding parameters.
- The effective receptive field (an empirical notion) is often smaller than the theoretical maximum; network design and initialization affect how much of the theoretical receptive field actually influences the output.
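
The theoretical receptive field can be computed layer by layer with the usual recurrence: each layer adds (effective kernel − 1) × jump, where jump is the product of all earlier strides. The sketch below assumes square kernels and describes each layer as a hypothetical `(kernel, stride, dilation)` tuple; pooling layers are treated like convolutions for this purpose.

```python
def receptive_field(layers):
    """Theoretical receptive field (one axis) of a stack of conv/pool layers.

    layers: list of (kernel, stride, dilation) tuples, ordered input to output.
    """
    rf = 1    # a single input pixel sees only itself
    jump = 1  # spacing, in input pixels, between adjacent units of the current layer
    for kernel, stride, dilation in layers:
        effective_k = dilation * (kernel - 1) + 1
        rf += (effective_k - 1) * jump
        jump *= stride
    return rf


print(receptive_field([(3, 1, 1)]))                        # one 3x3 conv
print(receptive_field([(3, 1, 1), (3, 1, 1)]))             # two stacked 3x3 convs
print(receptive_field([(3, 1, 1), (3, 1, 2)]))             # second conv dilated by 2
print(receptive_field([(3, 1, 1), (2, 2, 1), (3, 1, 1)]))  # conv -> 2x2 pool -> conv
```

Two stacked 3x3 convolutions reach 5, a dilation of 2 on the second one reaches 7 with the same number of weights, and inserting a stride-2 pool makes the second convolution's contribution count double.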



 4 — Discuss how filter size and stride influence the number of parameters in a CNN.

Ans: **Parameters in a convolutional layer:**
- For a conv layer with `F_out` output filters, `F_in` input channels, and kernel size KxK, the number of parameters (weights) is:
  `params = F_out * (F_in * K * K) + F_out` (the +F_out term is biases if used).

**Effect of filter size (K):**
- Larger K increases the per-filter parameter count quadratically (K^2). For example, a 5x5 kernel has about 2.78× the weights of a 3x3 kernel (25 vs 9). Smaller kernels (3x3) are parameter-efficient and can be stacked to obtain larger receptive fields with fewer parameters: two stacked 3x3 layers cover a 5x5 receptive field with 2 × 9 = 18 weights per input-output channel pair, versus 25 for a single 5x5 layer.

**Effect of stride:**
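- Stride changes the output spatial dimensions but **does not change** the number of parameters (weights) in the filters: the weight count depends only on kernel size and channel counts. Stride does change the computational cost and the number of activations (fewer positions means fewer runtime operations), which affects memory and speed. One indirect effect: if the conv output is flattened into a fully connected layer, a larger stride shrinks that layer's input and therefore its parameter count.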
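
The parameter formula above translates directly into code. This is a small sketch; `conv_params` is a name chosen for the example, and the channel count of 64 in the comparison is an arbitrary assumption.

```python
def conv_params(f_in, f_out, k, bias=True):
    """Weights (plus optional biases) of a conv layer with a k x k kernel.

    Stride and padding are deliberately not arguments: they do not affect the count.
    """
    return f_out * (f_in * k * k) + (f_out if bias else 0)


# A 3x3 conv from 1 input channel to 32 filters.
print(conv_params(1, 32, 3))

# One 5x5 layer vs two stacked 3x3 layers, both with 64 input and 64 output channels.
one_5x5 = conv_params(64, 64, 5)
two_3x3 = 2 * conv_params(64, 64, 3)
print(one_5x5, two_3x3)
```

With 64 channels, the two 3x3 layers use 73,856 parameters against 102,464 for the single 5x5 layer, while covering the same 5x5 receptive field.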





5 — Compare and contrast different CNN-based architectures like LeNet, AlexNet, and VGG in terms of depth, filter sizes, and performance.

Ans: **LeNet (1990s):**
- Very early CNN (LeNet-5) designed for digit recognition (MNIST).
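- Shallow: two conv layers interleaved with subsampling (pooling), followed by fully connected layers (conv -> pool -> conv -> pool -> FC layers).
- Small kernels (5x5), few filters (6 and 16 feature maps), and roughly 60K parameters, which is tiny compared to modern nets.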
- Performance: Good for small datasets and simple tasks like digit recognition; outdated for large-scale vision tasks.

**AlexNet (2012):**
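- Breakthrough: won the ImageNet challenge (ILSVRC) in 2012 with a top-5 error of about 15%, well ahead of the runner-up at about 26%, and revived deep learning for vision.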
- Deeper and wider than LeNet: ~8 learned layers (5 conv + 3 FC), ReLU activations, dropout, data augmentation, GPU training.
- Larger kernels in early layers (11x11 in first layer) and many filters (e.g., 96, 256).
- Demonstrated the importance of depth, GPUs, ReLU, and regularization for large-scale image classification.

**VGG (2014):**
- Very deep (16–19 weight layers) using a simple repeating pattern: stacks of 3x3 conv layers + pooling.
- Emphasis on depth and uniform small kernels (3x3) which allowed expressive receptive fields with fewer params than larger single kernels at the same depth of abstraction.
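- VGG has many parameters (about 138M for VGG-16, most of them in the large FC layers), which makes it heavy in memory and compute. It performed strongly on ImageNet (roughly 7% top-5 error at ILSVRC 2014) and remains useful as a feature extractor, but it is computationally expensive.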




In [1]:

# Ans: 6
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical


(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape((-1,28,28,1)).astype('float32') / 255.0
x_test = x_test.reshape((-1,28,28,1)).astype('float32') / 255.0
y_train_cat = to_categorical(y_train, 10)
y_test_cat = to_categorical(y_test, 10)


# Shape comments show (height x width x channels) after each layer.
model = models.Sequential([
    layers.Input(shape=(28,28,1)),
    layers.Conv2D(32, (3,3), activation='relu'),   # 26x26x32 (no padding: 28 - 3 + 1)
    layers.MaxPooling2D((2,2)),                    # 13x13x32
    layers.Conv2D(64, (3,3), activation='relu'),   # 11x11x64
    layers.MaxPooling2D((2,2)),                    # 5x5x64
    layers.Flatten(),                              # 1600
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.4),
    layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])


history = model.fit(x_train, y_train_cat, epochs=5, batch_size=128, validation_split=0.1)


test_loss, test_acc = model.evaluate(x_test, y_test_cat, verbose=0)
print(f"Test accuracy: {test_acc:.4f}")


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step




Epoch 1/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 31s 71ms/step - accuracy: 0.8165 - loss: 0.5853 - val_accuracy: 0.9840 - val_loss: 0.0560
Epoch 2/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 41s 70ms/step - accuracy: 0.9732 - loss: 0.0901 - val_accuracy: 0.9872 - val_loss: 0.0436
Epoch 3/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 42s 73ms/step - accuracy: 0.9808 - loss: 0.0618 - val_accuracy: 0.9895 - val_loss: 0.0378
Epoch 4/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 39s 68ms/step - accuracy: 0.9841 - loss: 0.0519 - val_accuracy: 0.9890 - val_loss: 0.0369
Epoch 5/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 27s 63ms/step - accuracy: 0.9873 - loss: 0.0417 - val_accuracy: 0.9900 - val_loss: 0.0342
Test accuracy: 0.9894


In [2]:

# Ans: 7
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = cifar10.load_data()

x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
y_train_cat = to_categorical(y_train, 10)
y_test_cat = to_categorical(y_test, 10)

# Two conv blocks; 'same' padding keeps height and width, pooling halves them.
model = models.Sequential([
    layers.Input(shape=(32,32,3)),
    layers.Conv2D(32, (3,3), activation='relu', padding='same'),   # 32x32x32
    layers.Conv2D(32, (3,3), activation='relu', padding='same'),   # 32x32x32
    layers.MaxPooling2D((2,2)),                                    # 16x16x32
    layers.Conv2D(64, (3,3), activation='relu', padding='same'),   # 16x16x64
    layers.Conv2D(64, (3,3), activation='relu', padding='same'),   # 16x16x64
    layers.MaxPooling2D((2,2)),                                    # 8x8x64
    layers.Flatten(),                                              # 4096
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax'),
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

history = model.fit(x_train, y_train_cat, epochs=10, batch_size=128, validation_split=0.1)

test_loss, test_acc = model.evaluate(x_test, y_test_cat, verbose=0)
print(f"CIFAR-10 test accuracy: {test_acc:.4f}")


Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170498071/170498071 ━━━━━━━━━━━━━━━━━━━━ 3s 0us/step
Epoch 1/10
352/352 ━━━━━━━━━━━━━━━━━━━━ 150s 423ms/step - accuracy: 0.3134 - loss: 1.8526 - val_accuracy: 0.5406 - val_loss: 1.2913
Epoch 2/10
352/352 ━━━━━━━━━━━━━━━━━━━━ 147s 417ms/step - accuracy: 0.5467 - loss: 1.2635 - val_accuracy: 0.6406 - val_loss: 1.0173
Epoch 3/10
352/352 ━━━━━━━━━━━━━━━━━━━━ 204s 423ms/step - accuracy: 0.6266 - loss: 1.0550 - val_accuracy: 0.6972 - val_loss: 0.8769
Epoch 4/10
352/352 ━━━━━━━━━━━━━━━━━━━━ 145s 412ms/step - accuracy: 0.6821 - loss: 0.8993 - val_accuracy: 0.7158 - val_loss: 0.8055
Epoch 5/10
352/352 ━━━━━━━━━━━━━━━━━━━━ 148s 422ms/step - accuracy: 0.7182 - loss: 0.8076 - val_accuracy: 0.7406 - val_loss: 0.7498
Epoch

In [4]:

# Ans: 8
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader


class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1, 1)
        self.pool = nn.MaxPool2d(2)
        self.fc1 = nn.Linear(64*7*7, 128)
        self.fc2 = nn.Linear(128, 10)
        self.dropout = nn.Dropout(0.4)

    def forward(self, x):
        # Pool after both conv layers so the 28x28 input is halved twice.
        x = self.pool(torch.relu(self.conv1(x)))  # 1x28x28 -> 32x28x28 -> 32x14x14
        x = self.pool(torch.relu(self.conv2(x)))  # -> 64x14x14 -> 64x7x7
        x = x.view(x.size(0), -1)                 # flatten to 64*7*7 = 3136, matching fc1
        x = torch.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)  # raw logits; CrossEntropyLoss applies log-softmax itself
        return x


transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
train_ds = datasets.MNIST('.', train=True, download=True, transform=transform)
test_ds  = datasets.MNIST('.', train=False, download=True, transform=transform)
train_loader = DataLoader(train_ds, batch_size=128, shuffle=True)
test_loader  = DataLoader(test_ds, batch_size=256)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = SimpleCNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)


# Training loop (one epoch)
model.train()
for epoch in range(1):
    total_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f"Epoch {epoch+1}, loss={total_loss/len(train_loader):.4f}")

# Evaluation: eval() disables dropout, no_grad() skips gradient tracking
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, preds = torch.max(outputs, 1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
print(f"Test accuracy: {correct/total:.4f}")


In [5]:

# Ans: 9

import os

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import layers, models

# Augment only the training images; validation images are just rescaled.
train_datagen = ImageDataGenerator(rescale=1./255,
                                   rotation_range=15,
                                   width_shift_range=0.1,
                                   height_shift_range=0.1,
                                   shear_range=0.1,
                                   zoom_range=0.1,
                                   horizontal_flip=True,
                                   fill_mode='nearest')

val_datagen = ImageDataGenerator(rescale=1./255)

train_dir = 'dataset/train'  # change to your path
val_dir = 'dataset/val'

# flow_from_directory raises FileNotFoundError when a directory is missing,
# so the generators are only built if both paths exist.
if os.path.isdir(train_dir) and os.path.isdir(val_dir):
    train_gen = train_datagen.flow_from_directory(train_dir, target_size=(150,150), batch_size=32, class_mode='binary')
    val_gen   = val_datagen.flow_from_directory(val_dir,   target_size=(150,150), batch_size=32, class_mode='binary')
else:
    train_gen = val_gen = None
    print(f"Dataset directories not found: {train_dir!r}, {val_dir!r}")

# Simple model
model = models.Sequential([
    layers.Input(shape=(150,150,3)),
    layers.Conv2D(32, (3,3), activation='relu'),
    layers.MaxPooling2D(2,2),
    layers.Conv2D(64, (3,3), activation='relu'),
    layers.MaxPooling2D(2,2),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Fit (example; requires the generators above to have been created)
# history = model.fit(train_gen, validation_data=val_gen, epochs=20)
print('Model compiled. Set the dataset paths and uncomment model.fit to train.')



Ans: 10

### 1) Data preparation
- Collect a labeled chest X-ray dataset (e.g., public sources such as COVID-19 chest X-ray collections or the RSNA pneumonia dataset). Ensure data privacy and consent if using private datasets.
- Preprocess: resize images to a fixed size (e.g., 224x224), apply normalization, convert to 3 channels if the backbone expects RGB input, and split into train/val/test by patient so that images from one patient never land in more than one split (this avoids leakage).
- Augmentation: rotations, flips, and slight brightness/contrast shifts, but avoid unrealistic transforms that change clinical meaning (e.g., large rotations or vertical flips).
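
A patient-level split can be sketched as follows. This is only an illustration: the record layout `(patient_id, image_path, label)`, the function name `split_by_patient`, the split fractions, and the toy file names are all assumptions, not part of any dataset's actual format.

```python
import random


def split_by_patient(records, val_frac=0.15, test_frac=0.15, seed=0):
    """Split (patient_id, image_path, label) records so no patient spans two splits."""
    patients = sorted({pid for pid, _, _ in records})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(patients)

    n_test = max(1, round(len(patients) * test_frac))
    n_val = max(1, round(len(patients) * val_frac))
    test_ids = set(patients[:n_test])
    val_ids = set(patients[n_test:n_test + n_val])

    splits = {"train": [], "val": [], "test": []}
    for record in records:
        pid = record[0]
        if pid in test_ids:
            splits["test"].append(record)
        elif pid in val_ids:
            splits["val"].append(record)
        else:
            splits["train"].append(record)
    return splits


# Hypothetical toy data: 10 patients with 2 images each.
records = [(f"p{i:02d}", f"p{i:02d}_img{j}.png", i % 2)
           for i in range(10) for j in range(2)]
splits = split_by_patient(records)
for name, items in splits.items():
    print(name, len(items), sorted({pid for pid, _, _ in items}))
```

Splitting on patient IDs rather than on individual images is what prevents near-duplicate images of the same person from appearing in both training and evaluation data.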

### 2) Model training
- Use a transfer-learning approach: start from a pretrained backbone (EfficientNet, ResNet50, DenseNet) and fine-tune.
- Replace final classification head with a binary output (sigmoid) and use binary cross-entropy loss.
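- Use class weights if the dataset is imbalanced; consider focal loss when the imbalance is severe. If false negatives are especially costly, weight the positive class more heavily or lower the decision threshold to favor sensitivity.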
- Regularization: data augmentation, dropout, weight decay. Monitor validation metrics (AUC, sensitivity/recall, specificity, precision).
- Use early stopping and model checkpointing (save best by validation AUC or sensitivity).
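
One common way to derive class weights is the "balanced" heuristic, weight = n_samples / (n_classes × class_count). The sketch below assumes that heuristic and a made-up 900/100 label distribution; the helper name `balanced_class_weights` is illustrative.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical imbalanced dataset: 900 normal (0) and 100 pneumonia (1) images.
labels = [0] * 900 + [1] * 100
weights = balanced_class_weights(labels)
print(weights)
```

A dictionary of this form can be passed as the `class_weight` argument of Keras `model.fit`, so that errors on the rare class contribute more to the loss.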

### 3) Explainability & evaluation
- Use Grad-CAM, Integrated Gradients, or saliency maps to produce visual explanations for predictions; include these in the clinician UI for auditability.
- Evaluate clinically relevant metrics: sensitivity (recall), specificity, AUC, and confusion matrix at operating thresholds.
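
The trade-off between sensitivity and specificity at different operating thresholds can be sketched in a few lines. The labels and predicted probabilities below are invented for illustration, and the function names are assumptions.

```python
def confusion_at_threshold(y_true, y_prob, threshold=0.5):
    """Return (tp, fp, tn, fn) when predicting positive for prob >= threshold."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        pred = 1 if prob >= threshold else 0
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(y_true, y_prob, threshold=0.5):
    tp, fp, tn, fn = confusion_at_threshold(y_true, y_prob, threshold)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")  # recall on positives
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")  # recall on negatives
    return sensitivity, specificity


# Hypothetical ground truth and model probabilities for 10 X-rays.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05]

for threshold in (0.5, 0.25):
    sens, spec = sensitivity_specificity(y_true, y_prob, threshold)
    print(f"threshold={threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Lowering the threshold from 0.5 to 0.25 catches the missed positive case at the cost of one more false alarm, which is the kind of trade-off a clinical operating point has to settle.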

### 4) Packaging & serving
- Export model as a saved artifact (TensorFlow SavedModel or PyTorch TorchScript / ONNX) for faster serving.
- Create a lightweight API using FastAPI or Flask that loads the model and serves prediction endpoints (POST image -> JSON probability + explanation heatmap).

### 5) Streamlit web app (simple flow)
- Build a Streamlit UI to upload an X-ray image, display the image, run inference via the loaded model, show the prediction probability and Grad-CAM overlay, and allow clinician feedback (correct/incorrect) to be stored for retraining.
- Example Streamlit script (simplified) — see code cell below.

### 6) Monitoring, security, and deployment
- Deploy via Docker to a cloud provider (Heroku, AWS ECS/Fargate, GCP Cloud Run). For GPUs use a suitable instance (e.g., AWS EC2 GPU or GCP GPU VM).
- Implement logging, input validation, model versioning, and CI/CD for model updates. Monitor concept drift and data distribution over time.
- Ensure compliance with medical device regulations if required; maintain audit logs and clinician sign-off flows.
