# CNN Architecture

1. What is the role of filters and feature maps in a Convolutional Neural Network (CNN)?

 - Filters and feature maps are the foundational components of the Convolutional Layer in a Convolutional Neural Network, working together to extract meaningful, hierarchical features from input data, typically images.

 - Filters are small, learnable arrays of weights: two-dimensional (e.g., 3×3) for a single-channel input, with one such slice per input channel otherwise. They act as feature detectors within the CNN.

    - Feature Detection: The primary role of a filter is to detect and emphasize a specific, localized visual pattern in the input data.
      - In the initial layers of a CNN, different filters learn to detect very low-level features, such as edges, corners, or specific textures.
      - In deeper layers, filters learn to combine these low-level features to recognize increasingly complex, abstract patterns, such as object parts or entire objects.
    - Convolution Operation: The filter slides across the entire width and height of the input. At each position, it performs an element-wise multiplication between its weights and the corresponding input patch, then sums the results. This single sum becomes one pixel in the output.
    - Parameter Sharing: The same filter is applied across the entire input. This not only significantly reduces the number of learnable parameters but also grants the network translational equivariance, meaning it can detect the specific pattern regardless of where it appears in the image.
    - Learned Weights: The numerical values within each filter are not hand-designed; they are learned during the network's training process using backpropagation to optimize performance for the specific task.

  - A Feature Map is the output generated by one filter after it has been convolved over the entire input.
    - Location and Strength of Features: The feature map is a 2D array of activation values. Each value represents the response of the specific filter at a particular spatial location in the input.
      - A high value in the feature map indicates that the feature the filter is designed to detect was strongly present in the corresponding region of the input.
      - A low value indicates the feature was absent or weak.
    - Feature Representation: The feature map is essentially a transformed representation of the input, highlighting where a specific feature exists. If a convolutional layer uses $N$ different filters, it will produce $N$ different feature maps, each highlighting the presence of the pattern learned by its corresponding filter. These maps are often stacked together, forming the 3D output of the convolutional layer.
    - Input to Next Layer: The stack of feature maps from one convolutional layer serves as the input for the next layer, allowing the network to build a hierarchy of features, combining simple features from early layers into more complex ones in deeper layers.
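
To make the sliding-window operation concrete, here is a minimal, dependency-free sketch of a single filter producing a single feature map (stride 1, no padding). The function name `convolve2d`, the toy image, and the hand-written kernel are illustrative assumptions; in a real CNN the kernel weights are learned, not chosen by hand.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise multiply the kernel with the input patch, then sum.
            total = sum(
                kernel[a][b] * image[i + a][j + b]
                for a in range(k_h)
                for b in range(k_w)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-written vertical-edge detector (assumption: stands in for a learned filter).
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)  # each row prints [0, 2, 0]
```

The high value in the middle column marks where the edge sits; the zeros mark regions where the pattern is absent.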


2. Explain the concepts of padding and stride in CNNs (Convolutional Neural Networks). How do they affect the output dimensions of feature maps?

 - Padding and Stride are two crucial hyperparameters in the convolutional layer of a CNN that control the spatial arrangement and size of the output feature maps.
   1. Stride defines the number of pixels the filter shifts at each step as it moves across the input volume during the convolution operation.
   - Concept: If the stride is $S$, the filter moves $S$ pixels at a time horizontally along a row, then shifts down $S$ pixels and scans the next row.
   - Default: The most common stride value is 1.
   - Effect on Output: A larger stride causes the filter to skip over input regions, resulting in a smaller output feature map. It also reduces the computational load.
   2. Padding involves adding extra rows and columns of values to the border of the input volume before the convolution operation.
   - Concept: Padding is used primarily for two reasons:
     - Preserve Spatial Size: It allows you to maintain the spatial dimensions of the input volume after the convolution. Without padding, the output feature map is smaller than the input whenever the filter is larger than $1 \times 1$.
     - Ensure Edge Features: It allows the filter to center itself over the pixels at the boundaries of the input, ensuring that the features at the edges are processed equally, which would otherwise be underrepresented.
   - Types:
     - "Valid" Padding (No Padding): No extra zeros are added. With stride 1, the output shrinks by $F - 1$ pixels in each spatial dimension.
     - "Same" Padding: Enough zero-padding is added to ensure that, at stride 1, the output feature map has the same spatial dimensions as the input feature map. For stride 1 and an odd filter size $F$, the required padding is $P = (F - 1)/2$ on each side.
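
The combined effect of stride and padding on output size follows the standard relation $O = \lfloor (W - F + 2P)/S \rfloor + 1$, where $W$ is the input width (or height), $F$ the filter size, $P$ the padding per side, and $S$ the stride. The helper names below are assumptions chosen for illustration, and the sketch assumes square inputs and symmetric padding.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (input_size - filter_size + 2 * padding) // stride + 1


def same_padding(filter_size):
    """Padding per side that preserves size at stride 1 (odd filter sizes only)."""
    return (filter_size - 1) // 2


# "Valid" padding: a 3x3 filter shrinks a 28-pixel side to 26.
print(conv_output_size(28, 3))                           # 26
# "Same" padding at stride 1 keeps the side at 28.
print(conv_output_size(28, 3, padding=same_padding(3)))  # 28
# Stride 2 roughly halves the side: 32 -> 16.
print(conv_output_size(32, 3, stride=2, padding=1))      # 16
```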


3. Define receptive field in the context of CNNs. Why is it important for deep
architectures?

 - The receptive field of a neuron in a Convolutional Neural Network is the specific region of the input image that influences the activation of that neuron. It defines the maximum spatial extent of the input that the neuron can "see" and use to compute its feature.
   - An individual neuron in the first layer has a receptive field equal to the size of the filter.
   - As you move deeper into the network, a neuron in a subsequent layer is connected to a patch of the previous layer's feature map. Because that patch itself is connected to a larger area of the original input, the receptive field of the deep neuron grows relative to the original input image.

 - The controlled growth of the receptive field is arguably the most critical architectural concept that makes deep CNNs effective for complex image tasks. It facilitates the creation of a spatial feature hierarchy:

   1. Feature Hierarchy and Context: Deep architectures are designed to extract features at multiple levels of abstraction:
      - Shallow Layers: Neurons have small receptive fields, allowing them to detect low-level, localized features like edges, corners, and specific textures.
      - Deep Layers: As the receptive field expands, neurons in deeper layers can combine the low-level features into high-level, complex features that represent object parts or even entire objects. This global context is essential for tasks like object recognition and scene understanding.
   2. Global Understanding: Many computer vision tasks require understanding the context of the entire image:
      - Image Classification: To classify an image as "cat," the final layer must have a receptive field that covers a large portion of the input to confirm the presence of all relevant features.
      - Semantic Segmentation: For pixel-wise prediction, each output pixel needs a large receptive field to incorporate surrounding context, preventing the network from making decisions based only on local texture.

   3. Parameter Efficiency:
      - The CNN achieves this large receptive field by stacking small filters rather than using one massive filter.
      - This stacked approach is more parameter-efficient, yet it achieves the same spatial coverage, making deep models practical to train.
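
How quickly the receptive field grows can be checked with the usual layer-by-layer recurrence: $r_l = r_{l-1} + (k_l - 1)\, j_{l-1}$ and $j_l = j_{l-1}\, s_l$, where $k_l$ is the kernel size, $s_l$ the stride, and $j_l$ the spacing (in input pixels) between neighbouring neurons of layer $l$. The sketch below is a simplified illustration; the function name and the example layer stacks are assumptions, and dilation is ignored.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) after a stack of (kernel, stride) layers."""
    rf, jump = 1, 1  # a single input pixel sees itself; neighbours are 1 pixel apart
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the view by (k - 1) jumps
        jump *= stride             # striding spreads neighbouring neurons apart
    return rf


# Three stacked 3x3 convolutions (stride 1) see a 7x7 input region.
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7
# Conv 3x3 -> 2x2 pooling with stride 2 -> conv 3x3 sees an 8x8 region.
print(receptive_field([(3, 1), (2, 2), (3, 1)]))  # 8
```

Pooling and strided layers enlarge the jump, which is why downsampling makes the receptive field grow faster in the layers that follow.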


4. Discuss how filter size and stride influence the number of parameters in a
CNN.

 - The number of learnable parameters in a Convolutional Neural Network is a crucial factor influencing model complexity, training time, and memory usage. Filter size affects this number directly, whereas stride affects it only indirectly, through the size of the feature maps passed to later layers.

   1. Influence of Filter Size on Parameters
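   - The filter size has a direct, quadratic influence on the number of parameters within a single convolutional layer.
   - Calculation: The total number of parameters in a convolutional layer is calculated as: $$\text{Parameters} = (F \times F \times N_{in} + 1) \times N_{out}$$ where $F$ is the filter size, $N_{in}$ is the number of input channels, $N_{out}$ is the number of filters (output channels), and the $+1$ accounts for one bias per filter.
   - Direct Impact: If you increase the filter size $F$, the $F \times F$ term in the equation grows quadratically, leading to a significant increase in parameters for that layer. For example, moving from $3 \times 3$ to $5 \times 5$ raises the weights per filter from $9 N_{in}$ to $25 N_{in}$.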

   2. Influence of Stride on Parameters
   - The stride has no direct influence on the number of learnable parameters in a convolutional layer.
   - Parameter Definition: The learnable parameters are the weights of the filters and the bias terms. These are defined by the filter size, the input channels, and the output channels. Stride is a hyperparameter that defines how the convolution operation is performed, but it does not change the internal definition of the filter itself.
   - Indirect Impact: While stride doesn't change the parameters of the current layer, it has a profound indirect influence on the parameter count of subsequent layers:
      - A larger stride reduces the spatial dimensions of the current layer's output feature map.
      - If the next layer is a fully connected layer, the total number of connections in that FC layer is proportional to the size of the feature map it receives. A smaller feature map drastically reduces the parameter count in the subsequent FC layer.
      - If the next layer is another convolutional layer, a smaller feature map size reduces the memory and computation required for that subsequent convolution, but the number of parameters in the subsequent convolution itself remains determined only by its own filter size and channel counts.
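
The two effects can be checked numerically. The sketch below applies the parameter formula $(F \times F \times N_{in} + 1) \times N_{out}$ and then shows how halving a feature map (as a stride of 2 would) shrinks a following fully connected layer. The function names, the 28×28×64 feature map, and the 128-unit dense layer are illustrative assumptions.

```python
def conv_params(filter_size, in_channels, out_channels):
    """Weights plus one bias per filter; stride is deliberately not an argument."""
    return (filter_size * filter_size * in_channels + 1) * out_channels


def dense_params(in_features, out_features):
    """Weights plus one bias per output unit of a fully connected layer."""
    return (in_features + 1) * out_features


# Filter size changes the count of the conv layer itself.
print(conv_params(3, 3, 64))  # 1792
print(conv_params(5, 3, 64))  # 4864

# Stride changes only what the next FC layer receives.
stride1_features = 28 * 28 * 64  # feature map kept at 28x28
stride2_features = 14 * 14 * 64  # feature map halved to 14x14
print(dense_params(stride1_features, 128))  # 6422656
print(dense_params(stride2_features, 128))  # 1605760
```

Halving each spatial side cuts the flattened feature count, and therefore the FC weights, by roughly a factor of four.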


5. Compare and contrast different CNN-based architectures like LeNet, AlexNet, and VGG in terms of depth, filter sizes, and performance.

 - LeNet
    - Depth / structure: Shallow: ~5-7 layers (Conv → Pool → Conv → Pool → FC).
    - Filter sizes: Typically 5×5 in conv layers.
    - Dataset / use: Designed for small grayscale images.
    - Activation / normalization: Originally used sigmoid/tanh; no batch norm.
    - Performance: Good for digit recognition; too small for large-scale images.
    - Notes: Pioneering architecture that introduced conv + subsampling and end-to-end training.

 - AlexNet
    - Depth: ~8 learned layers. Much deeper than LeNet.
    - Filter sizes: The first layer used large 11×11 filters, followed by 5×5 and then 3×3 filters in later layers.
    - Dataset: ImageNet — demonstrated large-scale viability.
    - Activation / normalization: ReLU activations, dropout, data augmentation; used GPUs and local response norm.
    - Performance: Big leap in ImageNet accuracy.
    - Notes: Introduced practices still used today: ReLU, aggressive augmentation, dropout, GPU training.

 - VGG
    - Depth: Very deep — common variants: VGG16, VGG19.
    - Filter sizes: Uniform use of 3×3 conv filters.
    - Design philosophy: Replace large filters with multiple 3×3 filters to get more nonlinearity and fewer parameters for the same receptive field.
    - Performance: Strong ImageNet performance; descriptors useful for transfer learning.
    - Cost: High parameter count and compute.
    - Notes: Simplicity and uniformity made it extremely popular for feature extraction.
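
The VGG design argument can be illustrated with simple arithmetic: three stacked 3×3 convolutions cover the same 7×7 receptive field as a single 7×7 convolution, but with fewer weights. The sketch below counts weights only (biases ignored) and assumes the same channel count in and out of every layer; the value of 64 channels is an arbitrary choice for illustration.

```python
def conv_weights(filter_size, channels):
    """Weight count of one conv layer with `channels` inputs and outputs (no bias)."""
    return filter_size * filter_size * channels * channels


channels = 64  # assumption: any fixed channel count gives the same ratio

stacked_3x3 = 3 * conv_weights(3, channels)  # three 3x3 layers -> 7x7 receptive field
single_7x7 = conv_weights(7, channels)       # one 7x7 layer   -> 7x7 receptive field

print(stacked_3x3)  # 110592
print(single_7x7)   # 200704
print(f"stacked / single = {stacked_3x3 / single_7x7:.2f}")  # 0.55
```

The stack also inserts a nonlinearity after each 3×3 layer, which a single large filter cannot do.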

6. Using Keras, build and train a simple CNN model on the MNIST dataset from scratch. Include code for model creation, compilation, training, and evaluation.

- Answer
      
      import tensorflow as tf
      from tensorflow.keras import layers, models, datasets

      # 1 Load + preprocess
      (x_train, y_train), (x_test, y_test) = datasets.mnist.load_data()
      x_train = x_train.reshape(-1,28,28,1).astype("float32")/255.0
      x_test  = x_test.reshape(-1,28,28,1).astype("float32")/255.0

      # 2 Model creation
      model = models.Sequential([
          layers.Conv2D(32, (3,3), activation='relu', input_shape=(28,28,1)),
          layers.BatchNormalization(),
          layers.MaxPooling2D((2,2)),
          layers.Conv2D(64, (3,3), activation='relu'),
          layers.BatchNormalization(),
          layers.MaxPooling2D((2,2)),
          layers.Flatten(),
          layers.Dense(128, activation='relu'),
          layers.Dropout(0.4),
          layers.Dense(10, activation='softmax')
      ])

      # 3 Compile
      model.compile(optimizer='adam',
                   loss='sparse_categorical_crossentropy',
                   metrics=['accuracy'])

      model.summary()

      # 4 Train
      history = model.fit(x_train, y_train, epochs=6, batch_size=128, validation_split=0.1)

      # 5 Evaluate
      test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
      print(f"Test accuracy: {test_acc:.4f}")

7.  Load and preprocess the CIFAR-10 dataset using Keras, and create a
CNN model to classify RGB images. Show your preprocessing and architecture.

 - Answer

       import tensorflow as tf
       from tensorflow.keras import layers, models, datasets
       from tensorflow.keras.preprocessing.image import ImageDataGenerator

       # 1) Load
       (x_train, y_train), (x_test, y_test) = datasets.cifar10.load_data()
       x_train = x_train.astype("float32")/255.0
       x_test  = x_test.astype("float32")/255.0

       # 2) Data augmentation
       datagen = ImageDataGenerator(
           rotation_range=15,
           width_shift_range=0.1,
           height_shift_range=0.1,
           horizontal_flip=True,
           validation_split=0.1
       )
       # Note: both subsets come from the same generator, so validation batches are augmented too.
       train_gen = datagen.flow(x_train, y_train, batch_size=64, subset='training')
       val_gen   = datagen.flow(x_train, y_train, batch_size=64, subset='validation')

       # 3) Model architecture (simple but effective)
       model = models.Sequential([
           layers.Conv2D(32, (3,3), padding='same', activation='relu', input_shape=(32,32,3)),
           layers.BatchNormalization(),
           layers.Conv2D(32, (3,3), padding='same', activation='relu'),
           layers.BatchNormalization(),
           layers.MaxPooling2D((2,2)),
           layers.Dropout(0.25),

           layers.Conv2D(64, (3,3), padding='same', activation='relu'),
           layers.BatchNormalization(),
           layers.Conv2D(64, (3,3), padding='same', activation='relu'),
           layers.BatchNormalization(),
           layers.MaxPooling2D((2,2)),
           layers.Dropout(0.25),

           layers.Flatten(),
           layers.Dense(512, activation='relu'),
           layers.BatchNormalization(),
           layers.Dropout(0.5),
           layers.Dense(10, activation='softmax')
       ])

       model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
       model.summary()

       # 4) Train
       model.fit(train_gen, epochs=30, validation_data=val_gen)

       # 5) Evaluate
       print(model.evaluate(x_test, y_test))

8.  Using PyTorch, write a script to define and train a CNN on the MNIST
dataset. Include model definition, data loaders, training loop, and accuracy evaluation.

- Answer
           
      import torch
      import torch.nn as nn
      import torch.optim as optim
      from torchvision import datasets, transforms
      from torch.utils.data import DataLoader

      device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

      # 1) Data loaders
      transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
      train_ds = datasets.MNIST('data', train=True, download=True, transform=transform)
      test_ds  = datasets.MNIST('data', train=False, transform=transform)

      train_loader = DataLoader(train_ds, batch_size=64, shuffle=True)
      test_loader  = DataLoader(test_ds, batch_size=1000)

      # 2) Model
      class SimpleCNN(nn.Module):
          def __init__(self):
              super().__init__()
              self.conv = nn.Sequential(
                  nn.Conv2d(1, 32, 3, 1), nn.ReLU(),
                  nn.Conv2d(32, 64, 3, 1), nn.ReLU(),
                  nn.MaxPool2d(2),
                  nn.Dropout(0.25)
              )
              self.fc = nn.Sequential(
                  nn.Flatten(),
                  nn.Linear(64*12*12, 128),
                  nn.ReLU(),
                  nn.Dropout(0.5),
                  nn.Linear(128, 10)
              )

          def forward(self, x):
              x = self.conv(x)
              return self.fc(x)
      
      model = SimpleCNN().to(device)
      optimizer = optim.Adam(model.parameters(), lr=1e-3)
      criterion = nn.CrossEntropyLoss()
      
      # 3) Training loop
      def train(epoch):
          model.train()
          total_loss = 0
          for batch_idx, (data, target) in enumerate(train_loader):
              data, target = data.to(device), target.to(device)
              optimizer.zero_grad()
              output = model(data)
              loss = criterion(output, target)
              loss.backward()
              optimizer.step()
              total_loss += loss.item()
          print(f"Epoch {epoch} - Train loss: {total_loss/len(train_loader):.4f}")

      def test():
          model.eval()
          correct = 0
          with torch.no_grad():
              for data, target in test_loader:
                  data, target = data.to(device), target.to(device)
                  output = model(data)
                  pred = output.argmax(dim=1)
                  correct += pred.eq(target).sum().item()
          acc = correct / len(test_ds)
          print(f"Test accuracy: {acc:.4f}")
          return acc

      for epoch in range(1, 6):
          train(epoch)
          test()

9. Given a custom image dataset stored in a local directory, write code using
Keras ImageDataGenerator to preprocess and train a CNN model.

- Answer

      from tensorflow.keras.preprocessing.image import ImageDataGenerator
      from tensorflow.keras import layers, models

      # Directory structure:
      # dataset/
      #   train/
      #     classA/
      #     classB/
      #   val/
      #     classA/
      #     classB/

      train_datagen = ImageDataGenerator(
          rescale=1./255,
          rotation_range=20,
          width_shift_range=0.1,
          height_shift_range=0.1,
          shear_range=0.1,
          zoom_range=0.1,
          horizontal_flip=True
      )

      val_datagen = ImageDataGenerator(rescale=1./255)

      train_generator = train_datagen.flow_from_directory(
          'dataset/train',
          target_size=(150,150),
          batch_size=32,
          class_mode='categorical'
      )

      val_generator = val_datagen.flow_from_directory(
          'dataset/val',
          target_size=(150,150),
          batch_size=32,
          class_mode='categorical'
      )

      # Simple CNN
      model = models.Sequential([
          layers.Conv2D(32, (3,3), activation='relu', input_shape=(150,150,3)),
          layers.MaxPooling2D(2,2),
          layers.Conv2D(64, (3,3), activation='relu'),
          layers.MaxPooling2D(2,2),
          layers.Flatten(),
          layers.Dense(128, activation='relu'),
          layers.Dense(train_generator.num_classes, activation='softmax')
      ])
     
      model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

      # Train
      model.fit(train_generator, validation_data=val_generator, epochs=15)

      # Save model
      model.save('custom_cnn.h5')


10.  You are working on a web application for a medical imaging startup. Your
task is to build and deploy a CNN model that classifies chest X-ray images into “Normal” and “Pneumonia” categories. Describe your end-to-end approach–from data preparation and model training to deploying the model as a web app using Streamlit.

 - Answer
    1. Data & preparation
    - Dataset sources: e.g., public chest X-ray datasets.
    - Directory structure: data/train/Normal, data/train/Pneumonia, similarly for val and test.
    - Preprocessing:
      - Convert to RGB and resize to the input size the model expects.
      - Normalize pixel values.
      - Apply data augmentation, but be conservative to avoid producing unrealistic images.
    - Class imbalance: use class weights or oversampling. Keep a held-out test set from a different hospital/source if possible for robustness.
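
    As a rough sketch of the class-weight option, the snippet below computes "balanced" weights, where each class is weighted by `total / (n_classes * class_count)`. The image counts are hypothetical placeholders, not figures from any real dataset, and the function name is an assumption.

    ```python
    def balanced_class_weights(counts):
        """Weight each class inversely to its frequency: total / (n_classes * count)."""
        total = sum(counts.values())
        n_classes = len(counts)
        return {label: total / (n_classes * n) for label, n in counts.items()}


    # Hypothetical training-set counts, for illustration only.
    counts = {"Normal": 1000, "Pneumonia": 3000}
    weights = balanced_class_weights(counts)
    print(weights)  # the rarer class ("Normal") receives the larger weight
    ```

    Keras `model.fit` accepts a `class_weight` dictionary keyed by integer class index, so the labels would need to be mapped to the indices used during training before passing these values in.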

    2. Model selection & training
    - Recommendation: Use transfer learning with a pretrained backbone, e.g., DenseNet121, ResNet50, or EfficientNetB0. DenseNet variants are popular for chest X-ray tasks.
    - Architecture skeleton (Keras):

          from tensorflow.keras.applications import DenseNet121
          from tensorflow.keras import layers, models
          from tensorflow.keras.optimizers import Adam

          base = DenseNet121(weights='imagenet', include_top=False, input_shape=(224,224,3))
          base.trainable = False  # freeze initially

          model = models.Sequential([
              base,
              layers.GlobalAveragePooling2D(),
              layers.Dropout(0.5),
              layers.Dense(256, activation='relu'),
              layers.Dropout(0.3),
              layers.Dense(1, activation='sigmoid')  # binary classification
          ])

          model.compile(optimizer=Adam(1e-4), loss='binary_crossentropy', metrics=['accuracy', 'AUC'])

    3. Validation & evaluation

    - Evaluate on a held-out test set and, if possible, an external dataset from a different hospital.
    - Report: ROC curve, AUC, sensitivity, specificity, precision, recall, confusion matrix.
    - Use Grad-CAM or other explainability maps to visualize salient regions.
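
    The threshold-based metrics in the report can be derived from the confusion matrix, as in the dependency-free sketch below. The function name, the labels, and the predicted probabilities are made-up illustrations (1 = Pneumonia, 0 = Normal); AUC and the ROC curve need the full score ranking and are left out here.

    ```python
    def binary_metrics(y_true, y_prob, threshold=0.5):
        """Confusion-matrix counts and derived metrics for a binary classifier."""
        tp = fp = tn = fn = 0
        for truth, prob in zip(y_true, y_prob):
            pred = 1 if prob > threshold else 0
            if pred == 1 and truth == 1:
                tp += 1
            elif pred == 1 and truth == 0:
                fp += 1
            elif pred == 0 and truth == 0:
                tn += 1
            else:
                fn += 1

        def ratio(num, den):
            return num / den if den else 0.0  # avoid division by zero on empty classes

        return {
            "confusion": {"tp": tp, "fp": fp, "tn": tn, "fn": fn},
            "sensitivity": ratio(tp, tp + fn),  # recall on the Pneumonia class
            "specificity": ratio(tn, tn + fp),  # recall on the Normal class
            "precision": ratio(tp, tp + fp),
        }


    # Made-up labels and model outputs, for illustration only.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_prob = [0.9, 0.8, 0.4, 0.2, 0.6, 0.1, 0.7, 0.3]
    metrics = binary_metrics(y_true, y_prob)
    print(metrics)
    ```

    In a screening setting, missed pneumonia cases (false negatives) are usually the costlier error, so the threshold may be lowered to trade specificity for sensitivity.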

    4. Model packaging & saving
    - Save model (model.save('xray_model.h5')) and optionally create a lightweight inference wrapper that does preprocessing and returns probability + explanation maps.

    5. Streamlit web app (simple inference UI)

           # app.py (Streamlit)
           import streamlit as st
           import tensorflow as tf
           from PIL import Image
           import numpy as np

           # Load model (ensure the model file is in the same folder)
           model = tf.keras.models.load_model('xray_model.h5')

           st.title("Chest X-ray: Normal vs Pneumonia (Demo)")

           uploaded_file = st.file_uploader("Upload a chest X-ray image (jpg/png)", type=['jpg','jpeg','png'])
           if uploaded_file is not None:
               # Match the training preprocessing: RGB, 224x224, pixel values in [0, 1]
               img = Image.open(uploaded_file).convert('RGB').resize((224,224))
               st.image(img, caption='Uploaded image', use_column_width=True)
               x = np.array(img).astype('float32')/255.0
               x = np.expand_dims(x, axis=0)  # batch dimension

               # The sigmoid output is the predicted probability of Pneumonia
               prob = float(model.predict(x)[0][0])
               st.write(f"Model probability (Pneumonia): {prob:.3f}")

               if prob > 0.5:
                   st.error(f"Prediction: Pneumonia (probability {prob:.2f})")
               else:
                   st.success(f"Prediction: Normal (probability {1 - prob:.2f})")

               # Optionally: show Grad-CAM heatmap (if implemented)
