## **CNN Architecture | Vikash Kumar | wiryvikash15@gmail.com**

**1. What is the role of filters and feature maps in Convolutional Neural Network (CNN)?**

In a Convolutional Neural Network (CNN), filters and feature maps are the core components that enable the network to learn and detect patterns in input data (like images).

**Filters (or Kernels):**

- A filter is a small matrix of learnable weights. For example, a $3 \times 3$ or $5 \times 5$ matrix.

- Role: Its role is to act as a feature detector. Each filter is trained to detect a specific, low-level feature, such as a vertical edge, a horizontal edge, a specific color, a corner, or a simple texture.

- Process: The filter "slides" (or convolves) across the entire input image, one patch at a time. At each position, it computes a dot product between its own weights and the pixel values of the image patch it's currently covering. This operation produces a single number.

**Feature Maps (or Activation Maps):**

- A feature map is the 2D matrix (an image) that results from applying one filter across the entire input.

- Role: Its role is to show the presence and location of the specific feature that the filter was trained to detect.

- Process: The collection of numbers from the filter's dot products at every position forms the feature map. A high-value (high activation) in the feature map means the feature was strongly detected at that location in the input. A low value means the feature was not present.

- A single convolutional layer typically learns multiple filters (e.g., 32 or 64), each detecting a different feature. Therefore, a convolutional layer produces a stack of 32 or 64 feature maps as its output, one for each filter.

**2. Explain the concepts of padding and stride in CNNs (Convolutional Neural Network). How do they affect the output dimensions of feature maps?**

Padding and stride are two crucial hyperparameters in a convolutional layer that control the mechanics of the convolution operation and the spatial size of the resulting feature map.

**Padding:**

   - Concept: Padding refers to the process of adding extra pixels (usually with a value of 0) around the border of an input image or feature map before the convolution operation.

   - Purpose:
   
      - Preserving Spatial Dimensions: Without padding, each convolution operation shrinks the output size. This rapid shrinking limits the number of layers a network can have. Padding (specifically "same" padding) allows the output feature map to have the same width and height as the input.
      - Improving Edge Detection: Filters can only be centered on pixels that are not on the edge. This means pixels at the very border of the image are processed fewer times than pixels in the center, and their information is partially lost. Padding allows the filter to be centered on border pixels, ensuring they are given full consideration.

**Stride:**

  - Concept: Stride defines the number of pixels the filter "steps" or "slides" over the input at a time. A stride of 1 ($S=1$) means the filter moves one pixel at a time (horizontally or vertically). A stride of 2 ($S=2$) means it skips every other pixel, moving two pixels at a time.
  - Purpose: Stride is primarily used for downsampling. A stride greater than 1 significantly reduces the spatial dimensions (width and height) of the output feature map, which helps to reduce the computational load and aggregate information over a wider area.

 **Effect on Output Dimensions:**

 The spatial dimensions (Width and Height) of the output feature map are calculated by the following formula:$$Output\_Dimension = \frac{(W - K + 2P)}{S} + 1$$

 Where:$W$ = Input dimension (e.g., input width)

 $K$ = Filter/Kernel dimension (e.g., filter width)

 $P$ = Padding (number of pixels added to one side)

 $S$ = StrideAs we can see, increasing padding

 ($P$) increases the output size, while increasing stride ($S$) decreases the output size.

**3. Define receptive field in the context of CNNs.Why is it important for deep architectures?**

**Receptive Field:** The receptive field of a neuron in a CNN (i.e., a single "pixel" in a feature map) is the specific region or area in the original input image that this neuron "sees" or is affected by.

- In the first convolutional layer, a neuron's receptive field is simply the size of the filter (e.g., $3 \times 3$).

- In the second layer, a neuron is looking at a $3 \times 3$ patch of the first layer's feature map. But each of those 9 neurons in the first layer was looking at a $3 \times 3$ patch of the original image. Therefore, the neuron in the second layer has an effective receptive field of $5 \times 5$ on the original input.

**Importance in Deep Architectures:** The concept of a growing receptive field is fundamental to why deep (multi-layered) architectures work so well:

**1. Hierarchical Feature Learning:** Deep architectures create a hierarchy of features.

- Early Layers: Have small receptive fields. They learn to detect simple, local features like edges, corners, and colors.

- Middle Layers: Have medium receptive fields. They combine the simple features from earlier layers to learn more complex textures and patterns (e.g., an "eye" or a "wheel").

- Deep Layers: Have large receptive fields (sometimes covering the entire image). They combine the complex patterns from middle layers to detect abstract, high-level objects (e.g., a "human face" or a "car").


**2. Efficiency:** This hierarchical stacking is computationally more efficient than using one single, enormous filter to detect a complex object. It allows the network to learn and share low-level feature detectors (like edge detectors) across many different high-level object concepts.

In short, depth in a CNN is a mechanism for systematically increasing the receptive field, allowing the network to learn complex, large-scale patterns from simple, local ones.

**4. Discuss how filter size and stride influence the number of parameters in a CNN.**

 Parameters are the learnable weights and biases within the network.

 - Filter Size:Filter size has a direct and significant influence on the number of parameters. The parameters are the weights inside the filter.

 A $3 \times 3$ filter has $3 \times 3 = 9$ parameters (weights).

 A $5 \times 5$ filter has $5 \times 5 = 25$ parameters.

 A $7 \times 7$ filter has $7 \times 7 = 49$ parameters.

 The number of parameters also scales with the depth of the input (number of input channels, $C_{in}$). A $3 \times 3$ filter on an RGB image (3 channels) actually has $3 \times 3 \times 3 = 27$ weights.

 The total parameters for a single filter (plus its one bias) is: (filter_height * filter_width * C_in) + 1_bias.

 If the layer has $N$ filters (to produce $N$ feature maps), the total parameters for the layer is: $N \times ((filter\_height \times filter\_width \times C_{in}) + 1)$.

 Conclusion: Larger filter sizes dramatically increase the number of parameters.

 **Stride:** Stride has no influence at all on the number of parameters.

 Explanation: The parameters are the weights inside the filter, and these weights are fixed (for that pass) regardless of how the filter moves. Stride only dictates where the filter is applied and how many "steps" it takes to cross the image.

 Effect: Stride only affects the spatial dimensions (width and height) of the output feature map. A larger stride creates a smaller output, but the number of weights used to create that output remains the same.In summary: Filter size (along with input channels and number of filters) determines the parameter count. Stride determines the output size.

**5. Compare and contrast different CNN-based architectures like LeNet, AlexNet, and VGG in terms of depth, filter sizes, and performance.**

LeNet, AlexNet, and VGG represent key evolutionary steps in the development of deep CNNs.

**LeNet (LeNet-5)**
- Year: 1998
- Depth: Shallow (5 layers: 2 Conv, 3 FC)
- Filter Sizes: Primarily $5 \times 5$ filters.
- Activation: Sigmoid or Tanh
- Key Innovations: The first widely successful CNN. Proved the concept of stacked Conv + Pooling layers for feature extraction. Used for digit recognition.
- Performance: State-of-the-art for its time on simple tasks (like MNIST digits).
- Parameters: ~60,000



**AlexNet**
- 2012
- Deeper (8 layers: 5 Conv, 3 FC)
- Varied sizes. Large $11 \times 11$ in the first layer, then $5 \times 5$ and $3 \times 3$.
- ReLU (Rectified Linear Unit). This was a key innovation that solved the vanishing gradient problem and allowed for deeper models.
- Won the 2012 ImageNet competition, proving CNNs' dominance. Used ReLU, Dropout (for regularization), and data augmentation.
- Groundbreaking performance on a complex dataset (ImageNet). Top-5 error of 15.3%.
- ~60 Million

**VGG (VGG-16/19)**
- 2014
- Very Deep (16 or 19 layers: 13/16 Conv, 3 FC)
- Standardized on very small $3 \times 3$ filters stacked together.
- ReLU.
- Proved that depth is critical. Its simple, homogeneous architecture (just stacking $3 \times 3$ conv blocks) showed that networks could be built much deeper and achieve better performance.
- State-of-the-art in 2014. Improved on AlexNet (Top-5 error of ~7.3%). VGG's simplicity and performance made it a very popular "backbone" for many other tasks (like object detection and segmentation).
- ~138 Million (VGG-16). Very "heavy" model with many parameters, mostly in the fully connected layers.

**Contrast Summary:**

- LeNet was the "proof of concept" on a small scale.

- AlexNet was the "breakthrough," proving CNNs could scale to complex problems by introducing key components like ReLU and Dropout.

- VGG was the "scaling" architecture, demonstrating that a simple, standardized, and very deep stack of small filters was a highly effective and generalizable design.

**6. Using keras, build and train a simple CNN model on the MNIST dataset from scratch.Include code for module creation, compilation, training, and evaluation.**

In [1]:
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Input
from tensorflow.keras.utils import to_categorical


# Load the MNIST dataset (handwritten digits)
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Preprocess the images
# Reshape to include the channel dimension (1 for grayscale)
#    MNIST images are 28x28 pixels
x_train = x_train.reshape((x_train.shape[0], 28, 28, 1))
x_test = x_test.reshape((x_test.shape[0], 28, 28, 1))

#  Normalize pixel values from 0-255 to 0.0-1.0
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Preprocess the labels (target)
# We use 'sparse_categorical_crossentropy' as the loss function,
# which accepts integer labels directly, so one-hot encoding is not strictly needed.
# If we used 'categorical_crossentropy', we would uncomment the lines below:
# y_train = to_categorical(y_train, 10)
# y_test = to_categorical(y_test, 10)

print(f"Training data shape: {x_train.shape}")
print(f"Test data shape: {x_test.shape}")

# --- Build the CNN Model (Module Creation) ---

model = Sequential([
    # Define the input shape in the first layer
    Input(shape=(28, 28, 1)),

    # Convolutional Layer 1
    # 32 filters, 3x3 kernel size, ReLU activation
    Conv2D(32, kernel_size=(3, 3), activation='relu'),

    # Pooling Layer 1
    # 2x2 pooling window
    MaxPooling2D(pool_size=(2, 2)),

    # Convolutional Layer 2
    # 64 filters, 3x3 kernel
    Conv2D(64, kernel_size=(3, 3), activation='relu'),

    # Pooling Layer 2
    MaxPooling2D(pool_size=(2, 2)),

    # Flatten the 3D feature maps into a 1D vector
    Flatten(),

    # Fully Connected (Dense) Layer
    Dense(128, activation='relu'),

    # Output Layer
    # 10 units (one for each digit 0-9), softmax for multi-class probability
    Dense(10, activation='softmax')
])

# Display the model architecture
model.summary()

# ---  Compile the Model ---

model.compile(
    optimizer='adam',                         # Efficient optimizer
    loss='sparse_categorical_crossentropy',   # Loss function for integer labels
    metrics=['accuracy']                      # Metric to monitor
)

# ---  Train the Model ---

print("\n--- Starting Model Training ---")
history = model.fit(
    x_train,
    y_train,
    epochs=5,                                # Number of times to iterate over the dataset
    batch_size=64,                           # Number of samples per gradient update
    validation_data=(x_test, y_test)         # Data to evaluate against at the end of each epoch
)
print("--- Model Training Finished ---")

# ---  Evaluate the Model ---

print("\n--- Evaluating Model on Test Data ---")
test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=2)
print(f"\nTest Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy*100:.2f}%")

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
[1m11490434/11490434[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 0us/step
Training data shape: (60000, 28, 28, 1)
Test data shape: (10000, 28, 28, 1)



--- Starting Model Training ---
Epoch 1/5
[1m938/938[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m62s[0m 64ms/step - accuracy: 0.8949 - loss: 0.3674 - val_accuracy: 0.9815 - val_loss: 0.0549
Epoch 2/5
[1m938/938[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m59s[0m 63ms/step - accuracy: 0.9856 - loss: 0.0461 - val_accuracy: 0.9836 - val_loss: 0.0508
Epoch 3/5
[1m938/938[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m56s[0m 60ms/step - accuracy: 0.9899 - loss: 0.0315 - val_accuracy: 0.9880 - val_loss: 0.0363
Epoch 4/5
[1m938/938[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m57s[0m 61ms/step - accuracy: 0.9924 - loss: 0.0231 - val_accuracy: 0.9906 - val_loss: 0.0302
Epoch 5/5
[1m938/938[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m60s[0m 64ms/step - accuracy: 0.9951 - loss: 0.0155 - val_accuracy: 0.9913 - val_loss: 0.0283
--- Model Training Finished ---

--- Evaluating Model on Test Data ---
313/313 - 3s - 10ms/step - accuracy: 0.9913 - loss: 0.0283

Test Loss: 0.0283
Tes

**7. Load and preprocess the CIFAR-10 dataset using Keras, and create a CNN model to classify RGB images.Show your preprocessing and architecture.**




In [1]:
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, Input
import matplotlib.pyplot as plt
import numpy as np


# Load the CIFAR-10 dataset (32x32 color images in 10 classes)
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# CIFAR-10 images are 32x32x3 (RGB), so no reshaping is needed
# Class names for reference
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Preprocessing: Normalize pixel values from 0-255 to 0.0-1.0
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Labels are already integers (0-9), which 'sparse_categorical_crossentropy' can use.

print(f"Training data shape: {x_train.shape}")
print(f"Test data shape: {x_test.shape}")

# ---  Create the CNN Model Architecture ---
# This model needs to be deeper than the MNIST one, as CIFAR-10 is a more complex dataset.

model = Sequential([
    Input(shape=(32, 32, 3)), # 32x32 pixels with 3 color channels

    # Block 1
    Conv2D(32, (3, 3), activation='relu', padding='same'),
    Conv2D(32, (3, 3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25), # Add dropout for regularization

    # Block 2
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),

    # Flatten and Dense Layers
    Flatten(),
    Dense(512, activation='relu'),
    Dropout(0.5),
    Dense(10, activation='softmax') # 10 output classes
])

# Display the model architecture
model.summary()

# ---  Compile the Model ---

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# ---  Train the Model ---

print("\n--- Starting Model Training ---")
history = model.fit(
    x_train,
    y_train,
    epochs=15,  # Needs more epochs than MNIST
    batch_size=64,
    validation_data=(x_test, y_test)
)
print("--- Model Training Finished ---")

# ---  Evaluate the Model ---

print("\n--- Evaluating Model on Test Data ---")
test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=2)
print(f"\nTest Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy*100:.2f}%")

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
[1m170498071/170498071[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 0us/step
Training data shape: (50000, 32, 32, 3)
Test data shape: (10000, 32, 32, 3)



--- Starting Model Training ---
Epoch 1/15
[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m18s[0m 14ms/step - accuracy: 0.3392 - loss: 1.7884 - val_accuracy: 0.5903 - val_loss: 1.1578
Epoch 2/15
[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6s[0m 8ms/step - accuracy: 0.5836 - loss: 1.1727 - val_accuracy: 0.6637 - val_loss: 0.9476
Epoch 3/15
[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 7ms/step - accuracy: 0.6644 - loss: 0.9504 - val_accuracy: 0.6992 - val_loss: 0.8655
Epoch 4/15
[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6s[0m 7ms/step - accuracy: 0.7057 - loss: 0.8386 - val_accuracy: 0.7279 - val_loss: 0.7770
Epoch 5/15
[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 7ms/step - accuracy: 0.7373 - loss: 0.7471 - val_accuracy: 0.7430 - val_loss: 0.7386
Epoch 6/15
[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 7ms/step - accuracy: 0.7572 - loss: 0.6911 - val_accuracy: 0.7373 - val_loss:

**8. Using PyTorch, write a script to define and train a CNN on the MNIST dataset.
Include model definition, data loaders, training loop, and accuracy evaluation**

In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# --- Setup Device (GPU or CPU) ---
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Input channel = 1 (grayscale), Output channels = 32
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1)
        # Input = 32, Output = 64
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)

        # Max pooling layer
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2) # Halves the size

        # After conv1 + pool -> 28x28 -> 14x14
        # After conv2 + pool -> 14x14 -> 7x7
        # 64 channels * 7x7 image size
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10) # 10 output classes
        self.dropout = nn.Dropout(0.25)

    def forward(self, x):
        # Conv 1 -> ReLU -> Pool
        x = self.pool(F.relu(self.conv1(x)))
        # Conv 2 -> ReLU -> Pool
        x = self.pool(F.relu(self.conv2(x)))

        # Flatten the tensor for the dense layer
        x = x.view(-1, 64 * 7 * 7) # -1 infers the batch size

        # Dense 1 -> ReLU -> Dropout
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        # Dense 2 (Output)
        x = self.fc2(x)
        # CrossEntropyLoss applies log_softmax internally
        return x

# Instantiate the model and move it to the device
model = Net().to(device)
print(model)

# ---  Prepare Data Loaders ---

# Transformations: Convert to Tensor and Normalize
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)) # Mean/Stddev for MNIST
])

# Download and load training data
train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

# Download and load test data
test_dataset = datasets.MNIST('./data', train=False, download=True, transform=transform)
test_loader = DataLoader(test_dataset, batch_size=1000, shuffle=False)

# ---  Define Loss Function and Optimizer ---
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# ---  Training Loop ---

def train(epoch):
    model.train() # Set the model to training mode
    for batch_idx, (data, target) in enumerate(train_loader):
        # Move data to the device (GPU/CPU)
        data, target = data.to(device), target.to(device)

        # 1. Zero the gradients
        optimizer.zero_grad()

        # 2. Forward pass
        output = model(data)

        # 3. Calculate loss
        loss = criterion(output, target)

        # 4. Backward pass (compute gradients)
        loss.backward()

        # 5. Update weights
        optimizer.step()

        if batch_idx % 100 == 0:
            print(f'Train Epoch: {epoch} [{batch_idx * len(data)}/{len(train_loader.dataset)}'
                  f' ({100. * batch_idx / len(train_loader):.0f}%)]\tLoss: {loss.item():.6f}')

# --- Accuracy Evaluation Loop ---

def test():
    model.eval() # Set the model to evaluation mode (disables dropout, etc.)
    test_loss = 0
    correct = 0
    with torch.no_grad(): # Disable gradient calculation
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += criterion(output, target).item() # Sum up batch loss
            pred = output.argmax(dim=1, keepdim=True) # Get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    accuracy = 100. * correct / len(test_loader.dataset)
    print(f'\nTest set: Average loss: {test_loss:.4f}, Accuracy: {correct}/{len(test_loader.dataset)}'
          f' ({accuracy:.2f}%)\n')

# --- Run Training and Evaluation ---
num_epochs = 5
for epoch in range(1, num_epochs + 1):
    train(epoch)
    test()

print("Training finished.")

Using device: cuda
Net(
  (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc1): Linear(in_features=3136, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=10, bias=True)
  (dropout): Dropout(p=0.25, inplace=False)
)


100%|██████████| 9.91M/9.91M [00:00<00:00, 17.9MB/s]
100%|██████████| 28.9k/28.9k [00:00<00:00, 484kB/s]
100%|██████████| 1.65M/1.65M [00:00<00:00, 4.47MB/s]
100%|██████████| 4.54k/4.54k [00:00<00:00, 9.77MB/s]



Test set: Average loss: 0.0000, Accuracy: 9871/10000 (98.71%)


Test set: Average loss: 0.0000, Accuracy: 9900/10000 (99.00%)


Test set: Average loss: 0.0000, Accuracy: 9897/10000 (98.97%)


Test set: Average loss: 0.0000, Accuracy: 9896/10000 (98.96%)


Test set: Average loss: 0.0000, Accuracy: 9913/10000 (99.13%)

Training finished.


**9. Given a custom image dataset stored in a local directory, write code using Keras ImageDataGenerator to preprocess and train a CNN model.**

In [4]:
import os

# Create the directory structure
os.makedirs('./dataset/train/class_A', exist_ok=True)
os.makedirs('./dataset/train/class_B', exist_ok=True)
os.makedirs('./dataset/validation/class_A', exist_ok=True)
os.makedirs('./dataset/validation/class_B', exist_ok=True)

print("Dummy folders created!")

Dummy folders created!


In [5]:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, Input

train_dir = './dataset/train'
validation_dir = './dataset/validation'
IMG_HEIGHT = 150
IMG_WIDTH = 150
BATCH_SIZE = 32

# ---  Create ImageDataGenerators ---
# We apply data augmentation to the training data to prevent overfitting.
# We ONLY rescale the validation data (no augmentation).

print("Setting up data generators...")
# Training Data Generator with Augmentation
train_datagen = ImageDataGenerator(
    rescale=1./255,            # Normalize pixel values to [0, 1]
    rotation_range=40,         # Randomly rotate images
    width_shift_range=0.2,     # Randomly shift width
    height_shift_range=0.2,    # Randomly shift height
    shear_range=0.2,           # Apply shear transformation
    zoom_range=0.2,            # Randomly zoom in
    horizontal_flip=True,      # Randomly flip horizontally
    fill_mode='nearest'        # Strategy for filling new pixels
)

# Validation Data Generator (only rescaling)
validation_datagen = ImageDataGenerator(rescale=1./255)

# --- Create Data Flows from Directories ---

# 'flow_from_directory' links the generator to the directory structure
train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(IMG_HEIGHT, IMG_WIDTH), # Resize all images to 150x150
    batch_size=BATCH_SIZE,
    class_mode='binary'  # 'binary' for 2 classes, 'categorical' for >2
)

validation_generator = validation_datagen.flow_from_directory(
    validation_dir,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    class_mode='binary'
)

# --- Build a CNN Model ---
# (Using a simple model for demonstration)

model = Sequential([
    Input(shape=(IMG_HEIGHT, IMG_WIDTH, 3)), # 150x150 RGB images

    Conv2D(32, (3, 3), activation='relu'),
    MaxPooling2D(2, 2),

    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(2, 2),

    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D(2, 2),

    Flatten(),
    Dense(512, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid') # 1 neuron + sigmoid for binary classification
])

# ---  Compile the Model ---
model.compile(
    optimizer='adam',
    loss='binary_crossentropy', # Use binary_crossentropy for 2 classes
    metrics=['accuracy']
)

model.summary()

# --- Train the Model using the Generators ---
# We use 'fit' (which accepts generators) instead of 'fit_generator'
# We must specify steps_per_epoch and validation_steps

# These are often calculated as:
# steps_per_epoch = train_generator.samples // BATCH_SIZE
# validation_steps = validation_generator.samples // BATCH_SIZE
# For this example, we'll hardcode them.

print("\n--- Starting Model Training with Generators ---")
history = model.fit(
    train_generator,
    steps_per_epoch=20,  # Adjust based on the dataset size (e.g., total_train_images // BATCH_SIZE)
    epochs=10,
    validation_data=validation_generator,
    validation_steps=10  # Adjust based on the dataset size (e.g., total_val_images // BATCH_SIZE)
)
print("--- Model Training Finished ---")

# --- Evaluate the Model ---
print("\n--- Evaluating Model ---")
loss, accuracy = model.evaluate(validation_generator, steps=10)
print(f"Validation Loss: {loss:.4f}")
print(f"Validation Accuracy: {accuracy*100:.2f}%")

Setting up data generators...
Found 39 images belonging to 2 classes.
Found 10 images belonging to 2 classes.



--- Starting Model Training with Generators ---


  self._warn_if_super_not_called()


Epoch 1/10
[1m 2/20[0m [32m━━[0m[37m━━━━━━━━━━━━━━━━━━[0m [1m1:33[0m 5s/step - accuracy: 0.6850 - loss: 4.7487



[1m20/20[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m16s[0m 478ms/step - accuracy: 0.5300 - loss: 8.4677 - val_accuracy: 0.5000 - val_loss: 1.7194
Epoch 2/10
[1m20/20[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 358ms/step - accuracy: 0.5158 - loss: 1.6651 - val_accuracy: 0.5000 - val_loss: 0.9716
Epoch 3/10
[1m20/20[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 150ms/step - accuracy: 0.4344 - loss: 0.9526 - val_accuracy: 0.5000 - val_loss: 0.7029
Epoch 4/10
[1m20/20[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 152ms/step - accuracy: 0.5365 - loss: 0.6927 - val_accuracy: 0.6000 - val_loss: 0.7032
Epoch 5/10
[1m20/20[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 339ms/step - accuracy: 0.5401 - loss: 0.6845 - val_accuracy: 0.5000 - val_loss: 0.7075
Epoch 6/10
[1m20/20[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 393ms/step - accuracy: 0.5573 - loss: 0.6745 - val

**10. You are working on a web application for a medical imaging startup. Your task is to build and deploy a CNN model that classifies chest X-ray images into "Normal" and "Pneumonia" categories. Describe your end-to-end approach-from data preparation and model training to deploying the model as a web app using Streamlit.**

This is a complete, end-to-end project. The approach is broken into two main parts: Part 1 (Model Training) and Part 2 (Streamlit Deployment).

**Part 1: End-to-End Approach (Model Training)**

The approach would prioritize Transfer Learning, as medical imaging datasets are often relatively small, and pre-trained models capture low-level features (edges, textures) that are highly relevant to X-rays.

1. Data Preparation:

  - Sourcing: We would use a publicly available dataset, such as the "Chest X-Ray Images (Pneumonia)" dataset from Kaggle.

  - Organization:Structure the data into the Keras-compatible directory format:

chest_xray/

    train/
        NORMAL/
        PNEUMONIA/
    test/
        NORMAL/
        PNEUMONIA/
    val/
        NORMAL/
        PNEUMONIA/

**Preprocessing (using ImageDataGenerator):**

  - Training Generator: Apply resizing (e.g., to $224 \times 224$ to match VGG/ResNet), rescaling (1./255), and data augmentation (slight rotations, width/height shifts, zoom). This is crucial for handling class imbalance (if any) and making the model robust.
  
  - Validation/Test Generator: Apply only resizing and rescaling.
  
 **2. Model Building (Transfer Learning):**
  
  - Base Model: We'd select a powerful, pre-trained model like VGG16, ResNet50, or MobileNetV2 (good for fast inference). Let's use MobileNetV2.
  
  - Loading: We will load the MobileNetV2 base, pre-trained on imagenet, setting include_top=False (to remove its original classification layer) and specifying the input_shape=(224, 224, 3).
  
  - Freezing: We will "freeze" the weights of the base model (base_model.trainable = False) so that only my new, custom classification head is trained during the initial phase.
  
  - Building the "Head": we will stack new layers on top of the base model's output:
  
    - GlobalAveragePooling2D(): To flatten the feature maps.
    
    - Dense(128, activation='relu'): A hidden dense layer.
    
    - Dropout(0.5): For regularization.
    
    - Dense(1, activation='sigmoid'): The final output layer (1 neuron + sigmoid for "Normal" vs. "Pneumonia" binary classification).
    
  **Model Training & Saving:**
  
  - Compilation: We will compile the model using optimizer='adam', loss='binary_crossentropy', and metrics=['accuracy'].
  
  - Training: We will train the model using model.fit() with the train_generator and validation_generator. we would also use callbacks like EarlyStopping (to prevent overfitting) and ModelCheckpoint (to save the best model).
  
  - Fine-Tuning (Optional): After the head is trained,We might "unfreeze" the top few layers of the base model and re-train at a very low learning rate to fine-tune it specifically for X-ray features.
  
  - Saving: Once training is complete,we will save the final, best-performing model: model.save('pneumonia_classifier_model.h5').

**Part 2: Python Code (Training & Deployment)**

This requires two separate Python files.

**File 1: train_model.py (The Model Training Script)**

In [9]:
import os
from PIL import Image

# Define paths
base_dir = 'chest_xray'
train_dir = os.path.join(base_dir, 'train')
val_dir = os.path.join(base_dir, 'val')
test_dir = os.path.join(base_dir, 'test')

# Create directories
os.makedirs(os.path.join(train_dir, 'NORMAL'), exist_ok=True)
os.makedirs(os.path.join(train_dir, 'PNEUMONIA'), exist_ok=True)
os.makedirs(os.path.join(val_dir, 'NORMAL'), exist_ok=True)
os.makedirs(os.path.join(val_dir, 'PNEUMONIA'), exist_ok=True)
os.makedirs(os.path.join(test_dir, 'NORMAL'), exist_ok=True)
os.makedirs(os.path.join(test_dir, 'PNEUMONIA'), exist_ok=True)

# Function to create dummy images (224x224, as expected by MobileNetV2)
def create_dummy_xray_image(path, color):
    img = Image.new('RGB', (224, 224), color=color)
    img.save(path)

# Create a few dummy images
create_dummy_xray_image(os.path.join(train_dir, 'NORMAL', 'dummy_norm_1.jpeg'), 'gray')
create_dummy_xray_image(os.path.join(train_dir, 'NORMAL', 'dummy_norm_2.jpeg'), 'gray')
create_dummy_xray_image(os.path.join(train_dir, 'PNEUMONIA', 'dummy_pneu_1.jpeg'), 'black')
create_dummy_xray_image(os.path.join(train_dir, 'PNEUMONIA', 'dummy_pneu_2.jpeg'), 'black')

create_dummy_xray_image(os.path.join(val_dir, 'NORMAL', 'dummy_norm_val_1.jpeg'), 'gray')
create_dummy_xray_image(os.path.join(val_dir, 'PNEUMONIA', 'dummy_pneu_val_1.jpeg'), 'black')

create_dummy_xray_image(os.path.join(test_dir, 'NORMAL', 'dummy_norm_test_1.jpeg'), 'gray')
create_dummy_xray_image(os.path.join(test_dir, 'PNEUMONIA', 'dummy_pneu_test_1.jpeg'), 'black')


print("Folder structure:")
print("./chest_xray")
print("  .../train/NORMAL")
print("  .../train/PNEUMONIA")
print("  .../val/NORMAL")
print("  .../val/PNEUMONIA")
print("  .../test/NORMAL")
print("  .../test/PNEUMONIA")

Folder structure:
./chest_xray
  .../train/NORMAL
  .../train/PNEUMONIA
  .../val/NORMAL
  .../val/PNEUMONIA
  .../test/NORMAL
  .../test/PNEUMONIA


In [11]:
print("Vikash Kumar")

Vikash Kumar


In [14]:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense, Dropout, Input
from tensorflow.keras.models import Model

TRAIN_DIR = 'chest_xray/train'
VAL_DIR = 'chest_xray/val'
TEST_DIR = 'chest_xray/test'
IMG_SIZE = (224, 224)
BATCH_SIZE = 32

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest'
)
val_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    TRAIN_DIR,
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode='binary'
)
val_generator = val_datagen.flow_from_directory(
    VAL_DIR,
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode='binary',
    shuffle=False
)

# --- Model Building (Transfer Learning) ---

# Load base model
base_model = MobileNetV2(
    input_shape=(*IMG_SIZE, 3),
    include_top=False,
    weights='imagenet'
)
base_model.trainable = False # Freeze the base

# Add custom head
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(128, activation='relu')(x)
x = Dropout(0.5)(x)
predictions = Dense(1, activation='sigmoid')(x) # Binary output

model = Model(inputs=base_model.input, outputs=predictions)

# --- Compile and Train ---
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
    loss='binary_crossentropy',
    metrics=['accuracy']
)

print("--- Starting Training ---")
history = model.fit(
    train_generator,
    epochs=10,
    validation_data=val_generator
)

# --- Save the Model ---
model.save('pneumonia_classifier_model.h5')
print("Model saved as 'pneumonia_classifier_model.h5'")

Found 3959 images belonging to 2 classes.
Found 2 images belonging to 2 classes.
--- Starting Training ---
Epoch 1/10
[1m124/124[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m153s[0m 1s/step - accuracy: 0.7296 - loss: 0.5434 - val_accuracy: 0.5000 - val_loss: 1.1289
Epoch 2/10
[1m124/124[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m102s[0m 825ms/step - accuracy: 0.9125 - loss: 0.2208 - val_accuracy: 0.5000 - val_loss: 1.2179
Epoch 3/10
[1m124/124[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m144s[0m 838ms/step - accuracy: 0.9149 - loss: 0.2115 - val_accuracy: 0.5000 - val_loss: 0.8969
Epoch 4/10
[1m124/124[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m101s[0m 814ms/step - accuracy: 0.9289 - loss: 0.1756 - val_accuracy: 0.5000 - val_loss: 0.8393
Epoch 5/10
[1m124/124[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m96s[0m 773ms/step - accuracy: 0.9338 - loss: 0.1641 - val_accuracy: 0.5000 - val_loss: 0.7347
Epoch 6/10
[1m124/124[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m 



Model saved as 'pneumonia_classifier_model.h5'


**File 2: app.py (The Streamlit Web App)**

(To run this: pip install streamlit and then streamlit run app.py)

In [16]:
!pip install streamlit

Collecting streamlit
  Downloading streamlit-1.51.0-py3-none-any.whl.metadata (9.5 kB)
Collecting pydeck<1,>=0.8.0b4 (from streamlit)
  Downloading pydeck-0.9.1-py2.py3-none-any.whl.metadata (4.1 kB)
Downloading streamlit-1.51.0-py3-none-any.whl (10.2 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m10.2/10.2 MB[0m [31m82.3 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading pydeck-0.9.1-py2.py3-none-any.whl (6.9 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m6.9/6.9 MB[0m [31m132.1 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: pydeck, streamlit
Successfully installed pydeck-0.9.1 streamlit-1.51.0


In [17]:
import streamlit as st
import tensorflow as tf
from tensorflow.keras.models import load_model
from PIL import Image
import numpy as np

# --- Load the Trained Model ---
# Use st.cache_resource to load the model only once
@st.cache_resource
def load_pneumonia_model():
    model = load_model('pneumonia_classifier_model.h5')
    return model

model = load_pneumonia_model()

# ---  Helper Function for Preprocessing ---
def preprocess_image(image):
    # Convert to RGB (in case of RGBA or Grayscale)
    if image.mode != "RGB":
        image = image.convert("RGB")

    # Resize to the model's expected input size
    image = image.resize((224, 224))

    # Convert to numpy array and rescale
    image_array = np.asarray(image)
    image_array = image_array.astype('float32') / 255.0

    # Expand dimensions to create a "batch" of 1
    image_array = np.expand_dims(image_array, axis=0)
    return image_array

# ---  Streamlit App UI ---
st.title("Chest X-Ray Classifier 🩺")
st.write("Upload a chest X-ray image to classify it as **Normal** or **Pneumonia**.")

# File uploader widget
uploaded_file = st.file_uploader("Choose an X-ray image...", type=["jpg", "jpeg", "png"])

if uploaded_file is not None:
    #  Display the uploaded image
    image = Image.open(uploaded_file)
    st.image(image, caption='Uploaded X-Ray', use_column_width=True)

    #  Preprocess the image and make prediction
    st.write("Classifying...")
    processed_image = preprocess_image(image)

    #  Make prediction
    prediction = model.predict(processed_image)
    score = prediction[0][0] # Get the single prediction value

    #  Display the result
    if score > 0.5:
        st.error(f"**Result: Pneumonia** (Confidence: {score*100:.2f}%)")
    else:
        st.success(f"**Result: Normal** (Confidence: {(1-score)*100:.2f}%)")

2025-11-09 06:10:41.664 
  command:

    streamlit run /usr/local/lib/python3.12/dist-packages/colab_kernel_launcher.py [ARGUMENTS]


In [18]:
streamlit run /usr/local/lib/python3.12/dist-packages/colab_kernel_launcher.py

SyntaxError: invalid syntax (ipython-input-1609277159.py, line 1)