# CNN Architecture

1. What is the role of filters and feature maps in a Convolutional Neural Network (CNN)?

 Ans. Filters in a Convolutional Neural Network (CNN) are small matrices of learnable weights that slide over the input data (such as an image) to detect specific local patterns, like edges, textures, or shapes. Each filter learns during training to extract a certain feature by performing a convolution operation: at every position, the filter is multiplied element-wise with the patch of input beneath it and the products are summed into a single number. Collecting these numbers across all positions produces a new matrix called a feature map.

 Feature maps are the outputs generated by applying filters to the input data. Each feature map highlights the presence and location of a particular feature that the corresponding filter is specialized to detect. In early layers of a CNN, feature maps might capture simple features like edges or colors, while deeper layers capture more complex and abstract patterns by combining information from previous layers.
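
 The sliding-window operation described above can be sketched in a few lines of plain Python. This is only an illustration: the helper name `conv2d_valid`, the toy image, and the hand-picked edge filter are assumptions made for the example, whereas a real CNN learns its filter weights during training.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel element-wise with the patch under it, then sum.
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 4x4 image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked filter that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(image, vertical_edge)
for row in feature_map:
    print(row)  # each row is [0, 2, 0]
```

 The feature map is non-zero only in the middle column, which is exactly where the vertical edge sits in the input.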


2. Explain the concepts of padding and stride in Convolutional Neural Networks (CNNs). How do they affect the output dimensions of feature maps?

 Ans. Padding and stride are fundamental concepts in Convolutional Neural Networks (CNNs) that control how filters move over input data and determine the dimensions of the output feature maps produced by convolutional layers.

 Padding refers to adding extra pixels (usually zeros) around the border of the input data before the convolution operation. The main reason for using padding is to preserve the spatial dimensions of the input, especially when using bigger filters. Without padding, the output feature map becomes smaller than the input, which could lead to a rapid reduction in feature map size after several layers and possible loss of border information. Common types of padding include 'valid' (no padding, leading to smaller outputs) and 'same' (padding such that the output size remains the same as the input size).

 Stride is the step size with which the filter moves across the input data. A stride of 1 means the filter shifts one pixel at a time, leading to large and detailed output feature maps. Increasing the stride (e.g., stride = 2) causes the filter to move by more than one pixel, reducing the size of the output feature map and effectively downsampling the input.

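 Both padding and stride directly affect the output dimensions. The general formula for the size of the output feature map along one spatial dimension is:

                  O = floor((n + 2p - f) / s) + 1

 where O is the output size, n is the input size, p is the padding applied to each side, f is the filter (kernel) size, s is the stride, and floor rounds down when the stride does not divide the numerator evenly. Adding padding increases the output size, while increasing stride reduces it. For example, a 28 x 28 input with a 3 x 3 filter, no padding, and stride 1 gives O = (28 - 3)/1 + 1 = 26.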
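
 As a quick check of the formula, here is a minimal sketch in Python. The function names `conv_output_size` and `same_padding` are made up for this example, and integer division is used on the assumption that a filter window which does not fully fit is dropped.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output size along one dimension for input n, filter f, padding p, stride s."""
    return (n + 2 * p - f) // s + 1


def same_padding(f):
    """Padding per side that keeps the size unchanged for stride 1 and an odd filter size."""
    return (f - 1) // 2


print(conv_output_size(28, 3))                          # 26: no padding shrinks the map
print(conv_output_size(28, 3, p=same_padding(3)))       # 28: 'same'-style padding preserves size
print(conv_output_size(28, 3, p=1, s=2))                # 14: stride 2 roughly halves the size
print(conv_output_size(32, 5, s=2))                     # 14
```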



3. Define receptive field in the context of CNNs. Why is it important for deep architectures?

 Ans. In Convolutional Neural Networks (CNNs), the receptive field refers to the specific region of the input data (such as an image) that a particular neuron or feature in a given layer "sees" or is influenced by. For the first convolutional layer, a neuron's receptive field might be a small patch, like 3×3 pixels. In deeper layers, each neuron's receptive field is composed of multiple patches from earlier layers, causing it to span a much larger area of the original input. This expansion allows the network to hierarchically learn from low-level local features to more complex and abstract representations of objects.

 The size of the receptive field is crucial for deep CNN architectures because it determines how much context the network uses to make predictions. If the receptive field is too small, neurons may only detect local patterns and miss larger objects or global context, which limits performance on tasks like object detection or scene understanding. Conversely, large receptive fields allow deep layers to capture relationships across the entire input, enabling the model to recognize complex shapes, objects, or even scenes by aggregating information from different regions.

 When designing deep architectures, a properly sized receptive field ensures that the network can adequately model the scale of objects present in the data. Common ways to expand the receptive field are increasing the number of layers, using larger kernels, and employing dilated convolutions. Larger kernels add parameters, whereas dilation enlarges the receptive field without adding any weights. Thus, the receptive field helps balance local feature detection and global context understanding, which is essential for effective image analysis and high-level vision tasks.
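
 The growth of the receptive field with depth can be computed layer by layer. The sketch below uses the standard recurrence (each layer adds (k - 1) x dilation x the accumulated stride of the earlier layers); the function name `receptive_field` and the example layer stacks are assumptions chosen for illustration.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit.

    `layers` is a list of (kernel_size, stride, dilation) tuples,
    ordered from the input towards the output.
    """
    rf, jump = 1, 1  # jump = distance in input pixels between neighbouring units
    for kernel_size, stride, dilation in layers:
        rf += (kernel_size - 1) * dilation * jump
        jump *= stride
    return rf


# Three stacked 3x3 convolutions, stride 1: the field grows 3 -> 5 -> 7.
print(receptive_field([(3, 1, 1)] * 3))  # 7

# conv 3x3 -> pool 2x2 (stride 2) -> conv 3x3 -> pool 2x2 (stride 2).
print(receptive_field([(3, 1, 1), (2, 2, 1), (3, 1, 1), (2, 2, 1)]))  # 10

# Three 3x3 convolutions with dilations 1, 2, 4: same weight count as the first case.
print(receptive_field([(3, 1, 1), (3, 1, 2), (3, 1, 4)]))  # 15
```

 The dilated stack reaches a 15-pixel field with the same number of weights as the plain stack that reaches only 7.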


4. Discuss how filter size and stride influence the number of parameters in a CNN.

 Ans. Filter size in a CNN determines the dimensions of the region each filter covers when scanning the input volume (for example, a 3x3 filter covers a 3-by-3 area). The number of parameters in a convolutional layer depends directly on the filter size: each filter has k1 x k2 x C + 1 parameters, where k1 and k2 are the filter width and height, C is the number of input channels, and the '+1' accounts for the bias term. A layer with F filters therefore has (k1 x k2 x C + 1) x F parameters, so larger filters mean more weights and more parameters in the layer.

 Stride refers to how far the filter shifts across the input with each move. Importantly, stride does not affect the number of parameters in a convolutional layer because the filter weights remain the same regardless of how often they are applied. Changing stride only alters the output feature map's dimensions: a large stride produces a smaller feature map and fewer activations, while a stride of one creates a larger map, but the weights and biases (the parameters) for the filter remain unchanged.
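
 A small sketch makes the point concrete. The helper `conv2d_params` is a hypothetical name introduced here; it simply evaluates the per-filter count given above and multiplies by the number of filters. Stride does not appear in the function at all, which is the whole point.

```python
def conv2d_params(k1, k2, in_channels, filters, use_bias=True):
    """Trainable parameters in a 2D convolutional layer."""
    per_filter = k1 * k2 * in_channels + (1 if use_bias else 0)
    return per_filter * filters


# 32 filters of size 3x3 on a single-channel input.
print(conv2d_params(3, 3, in_channels=1, filters=32))    # 320

# 64 filters of size 3x3 on a 32-channel input.
print(conv2d_params(3, 3, in_channels=32, filters=64))   # 18496

# Same layer with 5x5 filters: the count grows with the filter area.
print(conv2d_params(5, 5, in_channels=32, filters=64))   # 51264
```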



5. Compare and contrast different CNN-based architectures like LeNet, AlexNet, and VGG in terms of depth, filter sizes, and performance.

 Ans. LeNet, AlexNet, and VGG compare as follows:

 Depth: LeNet is one of the earliest and shallowest CNN architectures, typically composed of seven layers including convolution, pooling, and fully connected layers. AlexNet increased depth and complexity significantly, using eight learned layers (five convolutional and three fully connected), which helped it achieve breakthrough accuracy on large-scale datasets like ImageNet. VGG went deeper still, with 16 or 19 weight layers (VGG16/VGG19), most of them convolutional.

 Filter Sizes: LeNet predominantly uses larger filters (e.g., 5 x 5) at its convolutional layers. AlexNet employs various filter sizes, including larger initial filters (often 11 x 11 and 5 x 5) followed by smaller ones. VGG, however, popularized the use of small 3 x 3 filters throughout its entire network, often stacking them to increase depth and capture more complex features. The choice of smaller, uniform filters in VGG has been shown to enhance feature extraction and generalization while keeping individual layers computationally manageable.

 Performance: LeNet works well for simple image classification tasks (like handwritten digit recognition) but is less effective on large, complex datasets due to its shallow structure. AlexNet, with its increased depth, use of ReLU activations, and techniques like dropout and max pooling, delivered dramatic improvements on large-scale images and won the ILSVRC 2012 competition. VGG's much deeper and uniform structure gave even higher accuracy on benchmarks such as ImageNet, with VGG16 attaining about 92.7% top-5 accuracy. However, VGG models have a high computational demand, requiring more memory and training time due to the large number of parameters (roughly 138 million for VGG16).
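
 One frequently cited reason for VGG's stacked 3 x 3 filters can be checked with simple arithmetic: a stack of small filters covers the same input region as one large filter while using fewer weights. The sketch below is illustrative only; the helper `conv_weights` and the channel count of 64 are assumptions, and biases are ignored.

```python
def conv_weights(kernel_size, channels, num_layers=1):
    """Weights (biases ignored) in `num_layers` stacked k x k conv layers,
    each mapping `channels` input channels to `channels` output channels."""
    return num_layers * kernel_size * kernel_size * channels * channels


channels = 64  # illustrative channel count (an assumption for this example)

two_3x3 = conv_weights(3, channels, num_layers=2)    # covers a 5x5 input region
one_5x5 = conv_weights(5, channels)
three_3x3 = conv_weights(3, channels, num_layers=3)  # covers a 7x7 input region
one_7x7 = conv_weights(7, channels)

print(two_3x3, one_5x5)    # 73728 102400
print(three_3x3, one_7x7)  # 110592 200704
```

 The stacked version also inserts a non-linearity after every layer, which a single large filter does not.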


# 6) Using Keras, build and train a simple CNN model on the MNIST dataset from scratch. Include code for model creation, compilation, training, and evaluation.


import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Load and preprocess the MNIST dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0
x_train = np.expand_dims(x_train, -1)  # Shape: (num_samples, 28, 28, 1)
x_test = np.expand_dims(x_test, -1)

num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# Build the CNN model
model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),  # explicit Input layer avoids the input_shape warning
    layers.Conv2D(32, kernel_size=(3, 3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(64, kernel_size=(3, 3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
history = model.fit(
    x_train, y_train,
    batch_size=128,
    epochs=5,
    validation_split=0.1,
    verbose=2
)

# Evaluate the model on the test set
test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
print(f"Test accuracy: {test_accuracy:.4f}")




Epoch 1/5
422/422 - 43s - 102ms/step - accuracy: 0.9303 - loss: 0.2379 - val_accuracy: 0.9807 - val_loss: 0.0630
Epoch 2/5
422/422 - 43s - 101ms/step - accuracy: 0.9816 - loss: 0.0616 - val_accuracy: 0.9858 - val_loss: 0.0534
Epoch 3/5
422/422 - 41s - 96ms/step - accuracy: 0.9863 - loss: 0.0444 - val_accuracy: 0.9883 - val_loss: 0.0445
Epoch 4/5
422/422 - 41s - 97ms/step - accuracy: 0.9895 - loss: 0.0333 - val_accuracy: 0.9882 - val_loss: 0.0402
Epoch 5/5
422/422 - 41s - 98ms/step - accuracy: 0.9918 - loss: 0.0266 - val_accuracy: 0.9888 - val_loss: 0.0378
Test accuracy: 0.9888


# 7) Load and preprocess the CIFAR-10 dataset using Keras, and create a CNN model to classify RGB images. Show your preprocessing and architecture.


import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Load CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

# Preprocess the images: scale pixel values to [0, 1]
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Convert the labels to one-hot encoded vectors
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# Define a simple CNN model for CIFAR-10
model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),  # 32x32 RGB images
    layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), padding='same', activation='relu'),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes, activation='softmax')
])

# Display the model summary
model.summary()

# Optionally, compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])




# 8)  Using PyTorch, write a script to define and train a CNN on the MNIST dataset. Include model definition, data loaders, training loop, and accuracy evaluation.


import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Data transforms and loaders
transform = transforms.Compose([
    transforms.ToTensor(),  # converts each PIL image to a float tensor scaled to [0, 1]
])

train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1000, shuffle=False)

# Define CNN model
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1)  # output: 32x28x28
        self.pool1 = nn.MaxPool2d(2, 2)              # output: 32x14x14
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1) # output: 64x14x14
        self.pool2 = nn.MaxPool2d(2, 2)              # output: 64x7x7
        self.fc1 = nn.Linear(64*7*7, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = self.pool1(x)
        x = torch.relu(self.conv2(x))
        x = self.pool2(x)
        x = x.view(x.size(0), -1)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Instantiate the model, loss function, and optimizer
model = SimpleCNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 5
model.train()
for epoch in range(num_epochs):
    total_loss = 0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f'Epoch {epoch+1}/{num_epochs}, Loss: {total_loss/len(train_loader):.4f}')

# Evaluation (accuracy on test set)
model.eval()
correct, total = 0, 0
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        correct += (predicted == labels).sum().item()
        total += labels.size(0)
print(f'Test Accuracy: {100 * correct / total:.2f}%')




Epoch 1/5, Loss: 0.1830
Epoch 2/5, Loss: 0.0476
Epoch 3/5, Loss: 0.0348
Epoch 4/5, Loss: 0.0257
Epoch 5/5, Loss: 0.0197
Test Accuracy: 99.32%


# 9) Given a custom image dataset stored in a local directory, write code using Keras ImageDataGenerator to preprocess and train a CNN model.



from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import Adam
import os
import numpy as np
from PIL import Image

# Create a dummy dataset directory for demonstration
dataset_dir = 'my_dummy_dataset'
if not os.path.exists(dataset_dir):
    os.makedirs(dataset_dir)
    # Create dummy class subdirectories
    class_names = ['class1', 'class2', 'class3']
    for class_name in class_names:
        class_path = os.path.join(dataset_dir, class_name)
        os.makedirs(class_path, exist_ok=True)
        # Create some dummy images in each class
        for i in range(5):
            # Upper bound is exclusive, so 256 is needed to cover the full 0-255 range
            dummy_image = np.random.randint(0, 256, (150, 150, 3), dtype=np.uint8)
            img = Image.fromarray(dummy_image)
            img.save(os.path.join(class_path, f'image_{i}.png'))
    print(f"Created dummy dataset at '{dataset_dir}'")
else:
    print(f"Dummy dataset directory '{dataset_dir}' already exists.")

# ImageDataGenerator for preprocessing and augmentation
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    validation_split=0.2  # Reserve 20% data for validation
)

# Creating train and validation generators
train_generator = train_datagen.flow_from_directory(
    dataset_dir,
    target_size=(150, 150),
    batch_size=32,
    class_mode='categorical',
    subset='training',
    shuffle=True
)

validation_generator = train_datagen.flow_from_directory(
    dataset_dir,
    target_size=(150, 150),
    batch_size=32,
    class_mode='categorical',
    subset='validation',
    shuffle=False
)

# Define a simple CNN model
model = models.Sequential([
    layers.Input(shape=(150, 150, 3)),  # must match target_size used by the generators
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D(2, 2),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D(2, 2),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D(2, 2),
    layers.Flatten(),
    layers.Dense(512, activation='relu'),
    layers.Dense(train_generator.num_classes, activation='softmax')
])

# Compile the model
model.compile(optimizer=Adam(),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
# Note: With a tiny dummy dataset, training might not be meaningful or may raise warnings.
# This is just to demonstrate the code runs without FileNotFoundError.
model.fit(
    train_generator,
    validation_data=validation_generator,
    epochs=1,
    verbose=2
)

Created dummy dataset at 'my_dummy_dataset'
Found 12 images belonging to 3 classes.
Found 3 images belonging to 3 classes.




1/1 - 4s - 4s/step - accuracy: 0.3333 - loss: 1.1013 - val_accuracy: 0.3333 - val_loss: 2.8586



10. You are working on a web application for a medical imaging startup. Your task is to build and deploy a CNN model that classifies chest X-ray images into “Normal” and “Pneumonia” categories. Describe your end-to-end approach, from data preparation and model training to deploying the model as a web app using Streamlit.

 Ans.

 Data Preparation:

  Dataset Collection: Gather a labeled dataset of chest X-ray images, sorted into "Normal" and "Pneumonia" folders. A popular public dataset is the "Chest X-ray Images (Pneumonia)" dataset from Kaggle.


 Data Directory Structure:

      dataset/
        train/
          Normal/
          Pneumonia/
        val/
          Normal/
          Pneumonia/
        test/
          Normal/
          Pneumonia/


 Preprocessing and Augmentation: Use tools like Keras' ImageDataGenerator to:
    
    a. Rescale pixel values to [0, 1].
    b. Apply real-time augmentation (rotations, zoom, horizontal flips) to reduce overfitting.
    c. Resize images to a suitable input size (e.g., 224×224).

 Model Building and Training:

  Model Architecture: Design a CNN or use transfer learning with pretrained models (e.g., MobileNet, ResNet) for better performance on small medical datasets.

  Example (Transfer Learning):

  

```
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

base_model = MobileNetV2(input_shape=(224, 224, 3), include_top=False, weights='imagenet')
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(128, activation='relu')(x)
predictions = Dense(1, activation='sigmoid')(x)

model = Model(inputs=base_model.input, outputs=predictions)
for layer in base_model.layers:
    layer.trainable = False  # Freeze base model layers at first

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

```

 Training:

  a. Use model.fit() with your training and validation data generators.

  b. Monitor validation accuracy and use callbacks like ModelCheckpoint and EarlyStopping.


 Model Evaluation and Saving
   
   a. Evaluate the model using the separate test set and record the metrics (accuracy, precision, recall, AUC).

   b. Save the trained model as an .h5 or .keras file:
          
          model.save("chest_xray_cnn_model.h5")
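
 The metrics listed in step (a) can be derived from the model's predicted probabilities and the true labels. The following is a minimal pure-Python sketch; the function name `binary_metrics`, the 0.5 threshold, and the toy labels and probabilities are all assumptions for illustration (1 stands for "Pneumonia", 0 for "Normal"), and AUC is left out for brevity.

```python
def binary_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, and recall for binary labels (1 = positive class)."""
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels and predicted probabilities, purely for demonstration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.6, 0.3, 0.4, 0.2, 0.7, 0.1]

metrics = binary_metrics(y_true, y_prob)
print(metrics)
```

 In a medical setting, recall on the "Pneumonia" class usually deserves particular attention, since a missed case is typically costlier than a false alarm.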

 Deploying as a Web App (Streamlit)

   a. Setup Streamlit Project: Install Streamlit (pip install streamlit) and create an app.py file.

   b. App Interface:

       i. Upload X-ray image.
       ii. Preprocess the image as during model training (resize, scale).
       iii. Load the saved model using keras.models.load_model.
       iv. Perform prediction and display the result ("Normal" or "Pneumonia") with probability.

   c. Streamlit Example:



```
import streamlit as st
from PIL import Image
import numpy as np
from tensorflow.keras.models import load_model

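@st.cache_resource  # load the model once instead of on every Streamlit rerun
def get_model():
    return load_model('chest_xray_cnn_model.h5')

model = get_model()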

st.title("Chest X-ray Classifier")
uploaded_file = st.file_uploader("Upload a Chest X-ray image", type=["jpg", "jpeg", "png"])

if uploaded_file is not None:
    image = Image.open(uploaded_file).convert('RGB').resize((224, 224))
    st.image(image, caption='Uploaded Image', use_column_width=True)
    img_array = np.array(image) / 255.0
    img_expanded = np.expand_dims(img_array, axis=0)
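    # The sigmoid output is the probability of the positive class; this assumes
    # "Pneumonia" was encoded as label 1 during training.
    prediction = float(model.predict(img_expanded)[0][0])
    label = "Pneumonia" if prediction > 0.5 else "Normal"
    st.write(f"Prediction: **{label}** (Pneumonia probability: {prediction:.2f})")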

```

  Deployment

     a. Deploy the Streamlit app to a cloud service (e.g., Streamlit Community Cloud, Heroku, AWS EC2, or Google Cloud Run).
     b. Ensure your requirements (requirements.txt) and model file are included.
     c. Optionally, secure the app and monitor for performance/reliability.
