1. What is the role of filters and feature maps in a Convolutional Neural Network (CNN)?
- Filters (also called kernels) and feature maps are fundamental components of a Convolutional Neural Network (CNN) that work together to enable the network to learn and represent important features from input data, such as images.

- Filters are small learnable matrices of weights that slide (convolve) over the input data. Each filter acts as a specialized feature detector, capturing local patterns like edges, corners, textures, or more complex shapes depending on the network depth. During training, CNNs learn the filter weights that detect the most discriminative features for the task at hand, such as image classification or object detection. Because the same weights are applied at every position (weight sharing), a filter responds to its pattern wherever it appears in the input; this makes convolution translation-equivariant, and pooling layers add a degree of translation invariance on top.

- Feature maps are the outputs generated by convolving these filters over the spatial dimensions of the input. Each feature map corresponds to one filter and highlights the spatial locations where the filter’s pattern is detected. Multiple filters produce multiple feature maps, each emphasizing different aspects or patterns in the input. Feature maps can be seen as spatial activations showing the presence and intensity of specific features, which become progressively more abstract deeper in the network.

- Filters perform localized convolutions to extract specific features from the input.

- Feature maps are the resulting spatial representations that encode how strongly and where those features appear.

- Together, filters and feature maps enable CNNs to transform raw input data into meaningful hierarchical feature representations for further processing and prediction.

- This interaction allows CNNs to recognize complex structures, reduce spatial resolution step by step (through strides and pooling), and generalize across positions in visual data.
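
To make the filter/feature-map relationship concrete, here is a minimal pure-Python sketch. The toy image, the hand-written `vertical_edge` filter, and the helper name `convolve2d` are illustrative assumptions; in a real CNN the filter weights are learned rather than written by hand.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    This is the cross-correlation that CNN layers compute: at each position the
    filter weights are multiplied element-wise with the patch underneath and summed.
    """
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(k_h):
                for dj in range(k_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 5x5 image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 0, 1, 1] for _ in range(5)]

# Hand-written vertical-edge detector (an assumption for illustration).
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)
```

The feature map is zero where the patch is uniform and positive where the dark-to-bright edge falls inside the filter window, which is the "where and how strongly" information described above.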


2. Explain the concepts of padding and stride in CNNs (Convolutional Neural Networks). How do they affect the output dimensions of feature maps?
- Padding and stride in CNNs can be simply explained as follows:

- Padding adds extra pixels (usually zeros) around the border of the input before the filter is applied. Without padding, the feature map shrinks by (filter size - 1) pixels per dimension at stride 1, because the filter cannot be centered on border pixels. Padding offsets this shrinkage; for example, padding a 3x3 convolution with one pixel per side at stride 1 keeps the output the same size as the input.

- Stride is the number of pixels the filter moves each time it slides over the image. A stride of 1 means the filter moves one pixel at a time, capturing detailed information and creating a larger output. A larger stride (like 2) means the filter jumps more pixels, which makes the output smaller because it scans fewer positions.

- How padding and stride affect output size:

  - For an input of width n, filter size k, padding p per side, and stride s, the output width is floor((n + 2p - k) / s) + 1 (the same holds for height).

  - More padding keeps the output larger; with stride 1 and p = (k - 1) / 2, the output is the same size as the input.

  - A larger stride makes the output smaller, because the filter is evaluated at fewer positions.

  - Together, padding and stride let a CNN trade spatial detail against computational cost.
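
The effect of padding and stride on output size follows the standard relation output = floor((n + 2p - k) / s) + 1. The sketch below applies it to a few cases; the function name `conv_output_size` and the example sizes are assumptions chosen for illustration.

```python
def conv_output_size(n, k, p=0, s=1):
    """Output width (or height) of a convolution.

    n: input size, k: filter size, p: padding per side, s: stride.
    """
    return (n + 2 * p - k) // s + 1


# 28x28 input, 3x3 filter
no_padding = conv_output_size(28, 3)             # shrinks by k - 1 = 2
same_padding = conv_output_size(28, 3, p=1)      # padding of 1 preserves the size
strided = conv_output_size(28, 3, p=1, s=2)      # stride 2 roughly halves it

print(no_padding, same_padding, strided)
```

For these inputs the three cases give 26, 28, and 14 respectively.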

3. Define receptive field in the context of CNNs. Why is it important for deep architectures?
- In CNNs, the receptive field is the area of the input image that influences or "sees" the value of a particular feature in a specific layer. It tells us how much of the original input each neuron in the network is looking at.

- The receptive field starts small in early layers, where neurons focus on small parts of the image (like edges or textures). As we go deeper into the network, the receptive field gets larger, allowing neurons to "see" bigger parts of the image and recognize more complex patterns, like objects or scenes.

- The receptive field is important because:

  - It ensures that deeper layers capture enough context of the input to make meaningful decisions.

  - For tasks like object detection or segmentation, a large enough receptive field ensures the network considers the whole object, not just a small part.

  - It guides how deep or complex a network needs to be to capture the necessary information.

  - In short, the receptive field is the size of the input region that affects each output unit; it grows with every convolution and pooling layer, so units deep in the network can combine fine local detail into global context.
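
The growth of the receptive field with depth can be computed layer by layer. The sketch below uses the usual recurrence: each layer adds (k - 1) times the current "jump" (the distance in input pixels between neighbouring units), and the jump is multiplied by the layer's stride. The function name `receptive_field` and the example layer stacks are assumptions for illustration.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one unit after a stack of layers.

    layers: list of (kernel_size, stride) tuples, ordered from input to output.
    Pooling layers are treated like convolutions with the same kernel and stride.
    """
    rf = 1    # a single input pixel sees only itself
    jump = 1  # spacing, in input pixels, between adjacent units of the current layer
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


# Stacks of 3x3 convolutions with stride 1
two_convs = receptive_field([(3, 1), (3, 1)])
three_convs = receptive_field([(3, 1), (3, 1), (3, 1)])

# conv 3x3 -> pool 2x2 (stride 2) -> conv 3x3 -> pool 2x2 (stride 2)
conv_pool_stack = receptive_field([(3, 1), (2, 2), (3, 1), (2, 2)])

print(two_convs, three_convs, conv_pool_stack)
```

Two stacked 3x3 layers see a 5x5 region and three see 7x7; adding stride-2 pooling makes the field grow faster, because every later layer steps over more input pixels.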

4. Discuss how filter size and stride influence the number of parameters in a CNN.
- Here is how filter size and stride influence the number of parameters in a Convolutional Neural Network (CNN):

- Filters and Parameters
  - Filter Size: A filter (or kernel) is a small grid of numbers (weights) used to scan through the input image or feature map and detect features like edges or textures. The filter size is usually something like 3x3 or 5x5.

  - Channels: Images have multiple channels (for example, three channels for RGB color images). Each filter spans all input channels, so a 3x3 filter on a 3-channel input has 3x3x3 = 27 weights.


  - Number of Filters: A CNN layer has many such filters, like 32 or 64 filters. Each filter learns to look for a different feature.

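  - Calculating Parameters: Each weight in the filter is a parameter to learn. We also include one additional parameter per filter called bias. So, the formula for parameters in a convolutional layer is: (filter height x filter width x input channels + 1) x number of filters. For example, 32 filters of size 3x3 on a 3-channel input have (3x3x3 + 1) x 32 = 896 parameters.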

- Stride and Parameters
  - Stride means how many pixels the filter moves at a time while scanning the image.

  - Stride affects the size (width and height) of the output feature map. A larger stride means the filter "jumps" further, resulting in a smaller output size.

  - Importantly, stride does not affect the number of parameters, because the filter weights stay the same regardless of how the filter moves.

- Bigger filters mean more weights to learn and more parameters.

- More filters also mean more parameters.

- Stride changes the output size but NOT the number of parameters.

- This understanding helps design CNNs that balance model complexity (number of parameters) and computational efficiency depending on the task and data.
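
The parameter count of a convolutional layer can be checked with a few lines of Python. This is a minimal sketch; the function name `conv_params` is an assumption, and stride is deliberately absent from its arguments because it does not enter the count.

```python
def conv_params(k_h, k_w, c_in, c_out, bias=True):
    """Learnable parameters in a 2D convolutional layer.

    Each of the c_out filters has k_h * k_w * c_in weights, plus one bias if enabled.
    """
    per_filter = k_h * k_w * c_in + (1 if bias else 0)
    return per_filter * c_out


rgb_3x3 = conv_params(3, 3, c_in=3, c_out=32)      # 3x3 filters on an RGB input
rgb_5x5 = conv_params(5, 5, c_in=3, c_out=32)      # larger filters, same layer width
deeper_3x3 = conv_params(3, 3, c_in=32, c_out=64)  # more input channels and more filters

print(rgb_3x3, rgb_5x5, deeper_3x3)
```

Going from 3x3 to 5x5 filters raises the count from 896 to 2,432, and widening the layer to 64 filters on 32 input channels raises it to 18,496, all independent of the stride used.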


5. Compare and contrast different CNN-based architectures like LeNet, AlexNet, and VGG in terms of depth, filter sizes, and performance.
- Here is a descriptive comparison of CNN-based architectures LeNet, AlexNet, and VGG based on their depth, filter sizes, and performance:

- LeNet
  - Depth: LeNet (LeNet-5) is a shallow network with 7 layers, not counting the input: three convolutional layers (C1, C3, C5) interleaved with two subsampling (pooling) layers (S2, S4), followed by a fully connected layer (F6) and the output layer.

  - Filter Sizes: It uses 5x5 kernels in its convolutional layers.

  - Performance: LeNet was one of the earliest CNNs and was designed for simple image recognition tasks like digit recognition (MNIST). It works well for small images but is less effective on complex datasets.

  - Characteristics: A simple design that originally used sigmoid or tanh activations; computationally inexpensive but shallow.

- AlexNet
  - Depth: AlexNet is deeper than LeNet, with 8 layers (5 convolutional + 3 fully connected).

  - Filter Sizes: The first convolutional layer uses large 11x11 filters with stride 4 for downsampling, followed by smaller 5x5 and 3x3 filters in subsequent layers.

  - Performance: AlexNet marked a breakthrough by winning the 2012 ImageNet challenge, significantly improving image classification accuracy on large, diverse datasets.

  - Characteristics: Popularized ReLU activation for faster training, used overlapping max-pooling (3x3 kernel, stride 2), and leveraged GPUs for efficient training. It has about 60 million parameters.

  - Impact: Started the era of modern deep CNNs with increased depth and computational power.

- VGG
  - Depth: VGG is much deeper, most commonly configured with 16 or 19 weight layers (VGG16 and VGG19).

  - Filter Sizes: Uses 3x3 convolutional filters consistently throughout the network and stacks them to reach large receptive fields: two stacked 3x3 layers cover a 5x5 region and three cover a 7x7 region, with fewer weights than a single large filter.

  - Performance: VGG achieved very high accuracy in image recognition (ImageNet), demonstrating that increasing depth with small filters can improve feature representation and overall performance.

  - Characteristics: Uses a uniform architecture with repeated blocks of convolutions followed by max-pooling; a simple and elegant design, but very computationally expensive, with around 138 million parameters for VGG16.

  - Impact: Popularized the idea that deeper networks with small filters are more effective, influencing many future architectures.
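
The VGG design choice of stacking small filters can be illustrated with a weight count. The sketch below compares three stacked 3x3 layers against a single 7x7 layer covering the same 7x7 receptive field, assuming the same number of channels in and out and ignoring biases. The function name `conv_weights` and the channel count of 64 are assumptions for illustration.

```python
def conv_weights(kernel_size, channels, num_layers=1):
    """Weights (biases ignored) in `num_layers` square conv layers
    that each map `channels` input channels to `channels` output channels."""
    return num_layers * kernel_size * kernel_size * channels * channels


channels = 64
stacked_3x3 = conv_weights(3, channels, num_layers=3)  # three 3x3 layers, 7x7 receptive field
single_7x7 = conv_weights(7, channels)                 # one 7x7 layer, same receptive field

print(stacked_3x3, single_7x7, round(stacked_3x3 / single_7x7, 2))
```

The stack needs 27 x C^2 weights against 49 x C^2 for the single layer, and it also inserts a non-linearity after each of the three convolutions.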

In [1]:
# 6. Using Keras, build and train a simple CNN model on the MNIST dataset from scratch. Include code for model creation, compilation, training, and evaluation.
# Import necessary modules
import numpy as np
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.utils import to_categorical

# Load MNIST dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize pixel values to range [0,1]
train_images = train_images.astype('float32') / 255.0
test_images = test_images.astype('float32') / 255.0

# Reshape images to add channel dimension (28, 28, 1)
train_images = np.expand_dims(train_images, -1)
test_images = np.expand_dims(test_images, -1)

# One-hot encode labels
train_labels = to_categorical(train_labels, 10)
test_labels = to_categorical(test_labels, 10)

# Build CNN model
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),  # Conv layer with 32 filters
    MaxPooling2D((2, 2)),  # Downsample with max pooling
    Conv2D(64, (3, 3), activation='relu'),  # More filters to learn complex features
    MaxPooling2D((2, 2)),
    Flatten(),  # Flatten feature maps into 1D vector
    Dense(64, activation='relu'),  # Fully connected hidden layer
    Dense(10, activation='softmax')  # Output layer with 10 units for 10 classes
])

# Compile model with optimizer, loss, and metrics
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(train_images, train_labels,
          epochs=5,
          batch_size=64,
          validation_split=0.2)

# Evaluate on test data
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f'Test accuracy: {test_acc:.4f}')


Epoch 1/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 42s 54ms/step - accuracy: 0.8541 - loss: 0.4713 - val_accuracy: 0.9732 - val_loss: 0.0843
Epoch 2/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 38s 49ms/step - accuracy: 0.9790 - loss: 0.0674 - val_accuracy: 0.9807 - val_loss: 0.0637
Epoch 3/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 37s 49ms/step - accuracy: 0.9862 - loss: 0.0435 - val_accuracy: 0.9871 - val_loss: 0.0453
Epoch 4/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 36s 48ms/step - accuracy: 0.9905 - loss: 0.0314 - val_accuracy: 0.9879 - val_loss: 0.0433
Epoch 5/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 41s 48ms/step - accuracy: 0.9926 - loss: 0.0235 - val_accuracy: 0.9858 - val_loss: 0.0467
313/313 ━━━━━━━━━━━━━━━━━━━━ 3s 10ms/step - accuracy: 0.9880 - loss: 0.0377
Test accuracy: 0.9904


In [2]:
#7. Load and preprocess the CIFAR-10 dataset using Keras, and create a CNN model to classify RGB images. Show your preprocessing and architecture.

# Import necessary libraries
import numpy as np
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.utils import to_categorical

# Load CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Normalize pixel values to range [0,1]
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# One-hot encode target labels
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Define CNN architecture
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3)),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(128, (3, 3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Summary of the model
model.summary()



In [5]:
# 8. Using PyTorch, write a script to define and train a CNN on the MNIST dataset. Include model definition, data loaders, training loop, and accuracy evaluation.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Define transformations for MNIST dataset
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))  # Mean and std for MNIST
])

# Load MNIST training and test datasets
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

# Create data loaders for batch processing
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1000, shuffle=False)

# Define CNN model
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)  # input channels=1 (grayscale), output=32 filters
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3)
        self.pool = nn.MaxPool2d(2, 2)
        self.dropout1 = nn.Dropout2d(0.25)
        self.fc1 = nn.Linear(9216, 128)  # 9216 = 64 * 12 * 12 (computed after conv and pooling)
        self.dropout2 = nn.Dropout(0.5)
        self.fc2 = nn.Linear(128, 10)  # 10 classes for digits 0-9

    def forward(self, x):
        x = F.relu(self.conv1(x))      # Conv1 + ReLU
        x = F.relu(self.conv2(x))      # Conv2 + ReLU
        x = self.pool(x)               # Max pooling
        x = self.dropout1(x)           # Dropout
        x = torch.flatten(x, 1)        # Flatten from (batch, features, h, w) to (batch, features)
        x = F.relu(self.fc1(x))        # Fully connected + ReLU
        x = self.dropout2(x)           # Dropout
        x = self.fc2(x)                # Output layer
        return F.log_softmax(x, dim=1) # Log probabilities for NLLLoss

# Set device (GPU if available)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create model instance and move to device
model = CNN().to(device)

# Define optimizer and loss function
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.NLLLoss()

# Training loop
def train(model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()          # Zero the gradients
        output = model(data)           # Forward pass
        loss = criterion(output, target)  # Calculate loss
        loss.backward()                # Backpropagation
        optimizer.step()               # Update weights

        if batch_idx % 100 == 0:
            print(f'Train Epoch: {epoch} [{batch_idx * len(data)}/{len(train_loader.dataset)}'
                  f' ({100. * batch_idx / len(train_loader):.0f}%)]\tLoss: {loss.item():.6f}')

# Evaluation function
def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0

    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += criterion(output, target).item() * data.size(0)  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    accuracy = 100. * correct / len(test_loader.dataset)
    print(f'\nTest set: Average loss: {test_loss:.4f}, Accuracy: {correct}/{len(test_loader.dataset)}'
          f' ({accuracy:.2f}%)\n')
    return accuracy

# Run training and testing
num_epochs = 5
for epoch in range(1, num_epochs + 1):
    train(model, device, train_loader, optimizer, epoch)
    test(model, device, test_loader)



Test set: Average loss: 0.0441, Accuracy: 9855/10000 (98.55%)

Test set: Average loss: 0.0405, Accuracy: 9881/10000 (98.81%)

Test set: Average loss: 0.0340, Accuracy: 9880/10000 (98.80%)

Test set: Average loss: 0.0322, Accuracy: 9897/10000 (98.97%)

Test set: Average loss: 0.0289, Accuracy: 9913/10000 (99.13%)



In [12]:
#9. Given a custom image dataset stored in a local directory, write code using Keras ImageDataGenerator to preprocess and train a CNN model.

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Suppose you have image data as numpy arrays
# images: a numpy array of shape (num_samples, height, width, channels)
# labels: corresponding labels, integer encoded (num_samples,)

# Example placeholder arrays (replace with your actual data loaded in numpy arrays)
# Pixel values are drawn from 0..255 so the normalization below behaves as it would on real images
images = np.random.randint(0, 256, size=(1000, 150, 150, 3))  # 1000 RGB images of 150x150
labels = np.random.randint(0, 2, 1000)          # Binary labels

# Normalize pixel values to the range [0, 1]
images = images.astype('float32') / 255.0

# One-hot encode labels assuming two classes
labels = to_categorical(labels, num_classes=2)

# Create an ImageDataGenerator with augmentation
datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True
)

# Flow from numpy arrays (batches of images and labels)
train_generator = datagen.flow(images, labels, batch_size=32)

# Define a simple CNN model
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    MaxPooling2D(2, 2),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(2, activation='softmax')
])

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model using the generator
model.fit(train_generator, steps_per_epoch=len(images)//32, epochs=10)






Epoch 1/10
[1m31/31[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m24s[0m 709ms/step - accuracy: 0.5046 - loss: 0.6938
Epoch 2/10
[1m31/31[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 494us/step - accuracy: 0.5938 - loss: 0.6922 
Epoch 3/10




[1m31/31[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m40s[0m 736ms/step - accuracy: 0.5161 - loss: 0.6930
Epoch 4/10
[1m31/31[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 383us/step - accuracy: 0.5000 - loss: 0.6932 
Epoch 5/10
[1m31/31[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m40s[0m 717ms/step - accuracy: 0.5019 - loss: 0.6932
Epoch 6/10
[1m31/31[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 380us/step - accuracy: 0.5938 - loss: 0.6917 
Epoch 7/10
[1m31/31[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m22s[0m 720ms/step - accuracy: 0.4899 - loss: 0.6934
Epoch 8/10
[1m31/31[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 402us/step - accuracy: 0.5938 - loss: 0.6921 
Epoch 9/10
[1m31/31[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m40s[0m 714ms/step - accuracy: 0.5183 - loss: 0.6930
Epoch 10/10
[1m31/31[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 376us/step - accuracy: 0.5625 - loss: 0.6921 



10. You are working on a web application for a medical imaging startup. Your task is to build and deploy a CNN model that classifies chest X-ray images into “Normal” and “Pneumonia” categories. Describe your end-to-end approach, from data preparation and model training to deploying the model as a web app using Streamlit.

- To build and deploy a CNN model that classifies chest X-ray images into "Normal" and "Pneumonia" categories for a medical imaging web app, here is an end-to-end approach:

- Data Preparation
  - Dataset: Use a publicly available chest X-ray dataset (e.g., the Pneumonia dataset by Kermany et al.) or proprietary clinical data.

  - Data Organization: Organize images into two folders, Normal and Pneumonia, for training, validation, and test sets.

  - Preprocessing: Resize images to a fixed input size (e.g., 150x150 pixels). Normalize pixel values to the range [0, 1]. Use data augmentation like random rotation, shifting, and flipping to increase dataset variability and reduce overfitting.

  - Data Loading: Use Keras ImageDataGenerator to load, batch, and augment images from directories.

- Model Training
  - Model Architecture: Build a CNN with multiple convolutional layers followed by max pooling, ReLU activations, dropout for regularization, and fully connected layers ending with a sigmoid or softmax layer for binary classification.

  - Transfer Learning (optional): Use pre-trained models like VGG16, ResNet, or DenseNet as feature extractors and fine-tune on the pneumonia dataset to boost performance with limited data.

  - Compile: Use Adam optimizer, binary cross-entropy loss for two classes, and metrics like accuracy and AUC.

  - Training: Train the model with mini-batches, validate on a separate set, monitor for overfitting, and save the best model weights.

- Model Evaluation and Testing
  - Evaluate on a held-out test set measuring accuracy, precision, recall, F1-score, and AUC.

  - Visualize confusion matrices and ROC curves to understand model strengths and weaknesses.

  - Perform error analysis to assess misclassified cases and consider lung segmentation or image enhancement preprocessing if needed.
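
The evaluation metrics listed above can all be derived from the four confusion-matrix counts. The sketch below treats "Pneumonia" as the positive class; the function name `classification_metrics` and the example counts are made up for illustration and are not results from any model in this document.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts.

    tp: pneumonia correctly flagged, fp: normal flagged as pneumonia,
    fn: pneumonia missed, tn: normal correctly cleared.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts on a 200-image test set (illustrative only).
metrics = classification_metrics(tp=80, fp=10, fn=20, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a screening setting, recall on the pneumonia class is usually watched most closely, since a false negative means a missed case.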

- Model Deployment Using Streamlit
  - Model Export: Save the trained model using the Keras .h5 format (or the newer .keras format) or the TensorFlow SavedModel format.

  - Streamlit App:

    - Build a UI that allows users to upload chest X-ray images.

    - Preprocess uploaded images to the required input size and normalization.

    - Load the trained CNN model.

    - Run inference on uploaded images to classify them as "Normal" or "Pneumonia."

    - Display the prediction and optionally confidence score.

  - Deployment:

    - Deploy the Streamlit app on a cloud platform (e.g., Streamlit Cloud, AWS, Azure).

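    - Provision GPU acceleration only if inference latency requires it, and make sure uploaded images are handled in line with patient data privacy requirements (for example, no persistent storage of uploads and encrypted transport).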

- Example Components in Code
  - Use ImageDataGenerator for data augmentation and loading.

  - Define CNN architecture or load pre-trained models from Keras.

  - Use model.fit() to train, model.evaluate() to test.

  - Streamlit app structure to handle image upload, preprocessing, and prediction display.
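
For the prediction display step, the app has to turn the model's raw output into a label and a confidence score. The sketch below covers only that step, in plain Python. It assumes a single sigmoid output that represents the probability of "Pneumonia" (so "Normal" is class 0); the function name `interpret_prediction` and the 0.5 default threshold are assumptions, and the Streamlit upload and display calls are intentionally left out.

```python
def interpret_prediction(p_pneumonia, threshold=0.5):
    """Map a sigmoid output to (label, confidence).

    p_pneumonia: model output in [0, 1], assumed to be P(Pneumonia).
    threshold: decision threshold; lowering it favours recall on pneumonia cases.
    """
    if not 0.0 <= p_pneumonia <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if p_pneumonia >= threshold:
        return "Pneumonia", p_pneumonia
    # Confidence is reported for the predicted class, so invert for "Normal".
    return "Normal", 1.0 - p_pneumonia


label, confidence = interpret_prediction(0.92)
print(f"Prediction: {label} (confidence {confidence:.0%})")

label_low, confidence_low = interpret_prediction(0.20)
print(f"Prediction: {label_low} (confidence {confidence_low:.0%})")
```

Keeping this logic in a small function separate from the UI code makes it easy to unit-test and to adjust the threshold after reviewing the precision/recall trade-off on the validation set.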