-----

## Question 1: What is the role of filters and feature maps in a Convolutional Neural Network (CNN)?

**Answer:**

In a Convolutional Neural Network (CNN), **filters** and **feature maps** are the core components that enable the network to learn and detect patterns in input data (like images).

  * **Filters (or Kernels):** A filter is a small matrix of learnable weights. The convolution operation slides this filter across the input and, at each position, computes the weighted sum of the filter weights and the input patch beneath it.

      * **Role (Feature Detection):** The purpose of a filter is to act as a **feature detector**. During training, the weights in the filter are adjusted to recognize specific patterns.
      * **Example:** In an early layer, one filter might learn to detect vertical edges, another might detect horizontal edges, and another might detect a specific color. In deeper layers, filters learn to combine these simple features to detect more complex patterns, like textures, shapes, or even parts of an object (e.g., an "eye" filter or a "wheel" filter).

  * **Feature Maps (or Activation Maps):** A feature map is the output produced by applying one filter to the entire input.

      * **Role (Feature Presence):** The feature map shows *where* the filter's specific feature was detected in the input. A high value (strong activation) in the feature map indicates that the feature (e.g., a vertical edge) was found at that location. A low value means the feature was not present.
      * **Example:** If you have a convolutional layer with 64 filters, that layer will produce 64 corresponding feature maps. Each map highlights the locations of a different learned feature, providing a rich, multi-faceted representation of the input for the next layer to process.
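The relationship between a filter and its feature map can be illustrated with a minimal pure-Python sketch. The helper name `conv2d_valid`, the toy image, and the hand-written vertical-edge weights are all assumptions made for illustration; in a real CNN the weights are learned during training.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch currently under the filter
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(k)
                for dj in range(k)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map

# Toy 5x5 image: dark (0) on the left, bright (1) on the right
image = [[0, 0, 0, 1, 1] for _ in range(5)]

# Hand-written vertical-edge filter (illustrative, not learned)
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = conv2d_valid(image, vertical_edge)
for row in feature_map:
    print(row)  # each row prints [0, 3, 3]
```

The zero in the first column marks a flat region, while the value 3 marks the positions where the filter overlaps the dark-to-bright transition.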

-----

## Question 2: Explain the concepts of padding and stride in CNNs (Convolutional Neural Network). How do they affect the output dimensions of feature maps?

**Answer:**

**Padding** and **stride** are two key hyperparameters in a convolutional layer that control how the filter is applied to the input, which in turn determines the spatial dimensions (height and width) of the output feature map.

### Padding

**Padding** is the process of adding extra pixels (usually with a value of zero) around the border of an input image or feature map before applying the filter.

  * **"Valid" Padding (No Padding):** This is the default in Keras. The filter is only applied to "valid" positions where it fully overlaps with the input. This causes the output feature map to be *smaller* than the input.
  * **"Same" Padding:** This involves adding just enough zero-padding so that the output feature map has the *same* height and width as the input.

**Purpose of Padding:**

1.  **Preserves Output Dimensions:** "Same" padding is crucial for building deep networks. Without it, every $3 \times 3$ convolution trims two pixels from the height and width, so the feature maps would shrink with each layer and only a limited number of layers could be stacked before nothing was left.
2.  **Preserves Edge Information:** Without padding, pixels at the very edge of the image are "seen" by the filter far fewer times than pixels in the center. Padding allows the filter to properly process the information at the borders.

### Stride

**Stride** defines the step size, or the number of pixels the filter moves at a time as it slides across the input.

  * **Stride = 1 ($S=1$):** The filter moves one pixel at a time (horizontally and vertically). This is the most common setting, as it performs a dense scan, preserves fine-grained detail, and maintains a larger output size.
  * **Stride = 2 ($S=2$):** The filter moves two pixels at a time, so it is evaluated only at every other position. This results in less overlap between filter positions and acts as a form of **downsampling**, reducing the height and width of the output feature map by roughly half. It is sometimes used as an alternative to a max-pooling layer.

### Effect on Output Dimensions

The height and width of the output feature map are calculated using the following formula:

Given:

  * $W_{in}$ = Input Height/Width
  * $F$ = Filter Size (e.g., 3 for a $3 \times 3$ filter)
  * $P$ = Padding (e.g., 0 for "valid", 1 for "same" with a $3 \times 3$ filter)
  * $S$ = Stride

The output dimension ($W_{out}$) is given below; the division is rounded down when it is not exact:
$$W_{out} = \left\lfloor \frac{W_{in} - F + 2P}{S} \right\rfloor + 1$$

**Example:**

  * Input: $32 \times 32$
  * Filter: $3 \times 3$ ($F=3$)
  * **Case 1: Padding="valid" ($P=0$), Stride=1 ($S=1$)**
      * $W_{out} = \frac{32 - 3 + 2(0)}{1} + 1 = 30$
      * Output Size: **$30 \times 30$ (Shrinks)**
  * **Case 2: Padding="same" ($P=1$), Stride=1 ($S=1$)**
      * $W_{out} = \frac{32 - 3 + 2(1)}{1} + 1 = 32$
      * Output Size: **$32 \times 32$ (Same)**
  * **Case 3: Padding="valid" ($P=0$), Stride=2 ($S=2$)**
      * $W_{out} = \left\lfloor \frac{32 - 3 + 2(0)}{2} \right\rfloor + 1 = \lfloor 14.5 \rfloor + 1 = 15$ (the division is rounded down before adding 1)
      * Output Size: **$15 \times 15$ (Downsampled)**
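The three cases above can be checked with a small helper. This is a sketch under the assumption of square inputs and filters; the function name `conv_output_size` is illustrative, and integer floor division is used because the filter can only land on whole pixel positions.

```python
def conv_output_size(w_in, f, p=0, s=1):
    """Output height/width of a convolution: floor((W_in - F + 2P) / S) + 1."""
    return (w_in - f + 2 * p) // s + 1

print(conv_output_size(32, 3, p=0, s=1))  # Case 1 -> 30
print(conv_output_size(32, 3, p=1, s=1))  # Case 2 -> 32
print(conv_output_size(32, 3, p=0, s=2))  # Case 3 -> 15
```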

-----

## Question 3: Define receptive field in the context of CNNs. Why is it important for deep architectures?

**Answer:**

### Definition

The **receptive field** of a neuron in a CNN (i.e., a single value in a feature map) is the specific region of the **original input image** that influences that neuron's value.

  * In the **first** convolutional layer, the receptive field is simply the size of the filter (e.g., $3 \times 3$ or $5 \times 5$). A neuron here only "sees" a $3 \times 3$ patch of the input image.
  * In the **second** layer, a neuron is looking at a $3 \times 3$ patch of the *first layer's feature map*. But each of *those* neurons was looking at a $3 \times 3$ patch of the original image. Therefore, the neuron in the second layer has an *effective receptive field* that is larger (e.g., $5 \times 5$) on the original image.

### Importance in Deep Architectures

The concept of a growing receptive field is **fundamental** to why deep architectures work.

1.  **Hierarchical Feature Learning:** Stacking layers allows the network to build a hierarchy of features.

      * **Early Layers (Small Receptive Field):** Detect simple, local features like edges, corners, and colors.
      * **Middle Layers (Medium Receptive Field):** Combine the simple features to detect more complex textures, patterns, and object parts (e.g., a "nose" or a "tire").
      * **Deep Layers (Large Receptive Field):** By the final layers, the receptive field can be so large it covers the **entire input image**. This allows the network to combine complex parts to recognize whole objects (e.g., a "face" or a "car").

2.  **Capturing Context:** A large receptive field is essential for understanding **context**. To classify an image as a "dog," the network can't just find an "eye" and a "patch of fur." It needs to see the eye, fur, nose, and ears *in the correct spatial relationship to each other*. A deep architecture with a large receptive field can understand this global context and spatial arrangement, leading to much higher accuracy.

In short, stacking layers **systematically increases the receptive field**, allowing the network to "see" more and more of the image at once and build an understanding from simple pixels to complex semantic concepts.
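How quickly the receptive field grows can be estimated with a short calculation. The sketch below assumes plain (non-dilated) convolution or pooling layers described only by kernel size and stride; the function name `receptive_field` and the example layer stacks are illustrative choices.

```python
def receptive_field(layers):
    """Receptive field on the input for a stack of (kernel_size, stride) layers."""
    rf = 1    # a single input pixel "sees" only itself
    jump = 1  # spacing, in input pixels, between neighbouring neurons of the current layer
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf

print(receptive_field([(3, 1)]))                  # one 3x3 conv -> 3
print(receptive_field([(3, 1), (3, 1)]))          # two 3x3 convs -> 5
print(receptive_field([(3, 1), (2, 2), (3, 1)]))  # conv, 2x2 pool, conv -> 8
```

Strided layers such as pooling increase `jump`, which is why every layer after a downsampling step widens the receptive field faster.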

-----

## Question 4: Discuss how filter size and stride influence the number of parameters in a CNN.

**Answer:**

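Filter size and stride affect a convolutional layer in different ways: filter size changes the number of learnable parameters, while stride changes only the output size.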

### 1\. Filter Size

**Filter size has a direct and significant impact on the number of parameters.**

The parameters in a convolutional layer are the weights *in the filters themselves* (plus one bias term per filter). The number of parameters for a *single filter* is:

$$Parameters_{filter} = (Filter_{Height} \times Filter_{Width} \times Input_{Channels}) + 1_{Bias}$$

The **total** parameters for the layer is this value multiplied by the number of filters in the layer.

**Example:**
Assume an input layer has **3 channels** (e.g., RGB).

  * **Case 1: $3 \times 3$ Filter**
      * Parameters per filter = $(3 \times 3 \times 3) + 1 = 28$
      * If the layer has 64 filters: $28 \times 64 = 1,792$ parameters.
  * **Case 2: $5 \times 5$ Filter**
      * Parameters per filter = $(5 \times 5 \times 3) + 1 = 76$
      * If the layer has 64 filters: $76 \times 64 = 4,864$ parameters.

As you can see, increasing the filter size from $3 \times 3$ to $5 \times 5$ **more than doubled** the number of parameters. This is why modern architectures (like VGG) favor stacking multiple small $3 \times 3$ filters instead of using one large $5 \times 5$ or $7 \times 7$ filter.

### 2\. Stride

**Stride has no direct influence on the number of parameters.**

The number of parameters is determined by the **filter's shape**, not by *how it moves*. Changing the stride only changes how the filter is applied and, as a result, the **output dimensions** of the feature map.

Whether you use a stride of 1 or a stride of 2, the layer still has the exact same set of 64 filters, and therefore the exact same number of learnable parameters (e.g., 1,792 in the $3 \times 3$ example above).

**Summary:**

  * **Filter Size:** Directly controls the number of parameters. (Bigger filter = More parameters)
  * **Stride:** Controls the output size/downsampling. (Bigger stride = Smaller output, no change in parameters)
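The parameter formula can be written as a small helper to reproduce the numbers above. This is a sketch; the function name `conv_params` is illustrative. Note that stride does not appear among the arguments, because it plays no role in the count.

```python
def conv_params(kernel_h, kernel_w, in_channels, num_filters, bias=True):
    """Learnable parameters in one convolutional layer."""
    per_filter = kernel_h * kernel_w * in_channels + (1 if bias else 0)
    return per_filter * num_filters

print(conv_params(3, 3, 3, 64))  # 3x3 filters on RGB input -> 1792
print(conv_params(5, 5, 3, 64))  # 5x5 filters on RGB input -> 4864
```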

-----

## Question 5: Compare and contrast different CNN-based architectures like LeNet, AlexNet, and VGG in terms of depth, filter sizes, and performance.

**Answer:**

Here is a comparison of LeNet, AlexNet, and VGG, which represent key milestones in the evolution of CNNs.

| Feature | LeNet-5 (1998) | AlexNet (2012) | VGG-16 (2014) |
| :--- | :--- | :--- | :--- |
| **Depth** | **Very Shallow** (7 layers total: 2 CONV, 2 POOL, 3 FC) | **Shallow** (8 layers: 5 CONV, 3 FC) | **Very Deep** (16 layers: 13 CONV, 3 FC) |
| **Filter Sizes** | Used **$5 \times 5$** filters. | **Varied:** Large $11 \times 11$ in the first layer, followed by $5 \times 5$ and $3 \times 3$. | **Homogeneous:** Used **exclusively $3 \times 3$** filters stacked on top of each other. |
| **Key Innovations**| The "grandfather" of CNNs. Proved the concept of CONV-POOL-FC structure. Used `tanh`/`sigmoid` activations. | **Won ImageNet 2012.** Popularized deep learning. **Popularized ReLU** activation in deep CNNs (faster training). **Used Dropout** to prevent overfitting. Trained on multiple GPUs. | **Demonstrated the value of *depth***. Showed that stacking two $3 \times 3$ filters has the same receptive field as one $5 \times 5$ filter, but with fewer parameters and more non-linearity (more ReLU layers). |
| **Performance** | State-of-the-art on **MNIST** (digit recognition) in its time. | State-of-the-art on **ImageNet** (1000-class object recognition). Its win (15.3% top-5 error) was a massive leap over the next-best entry (26.2%). | Runner-up in the ImageNet 2014 classification task (7.3% top-5 error). Its simple, uniform architecture made it very influential and a popular baseline/feature extractor. |

**Summary of Contrast:**

  * **LeNet** was the proof-of-concept for small tasks.
  * **AlexNet** was the breakthrough that proved CNNs could scale to complex, large-scale problems, popularizing key components like ReLU and Dropout that are still standard today. Its use of a large $11 \times 11$ filter in the first layer was a key feature.
  * **VGG** refined the ideas of AlexNet, answering the question "how do we make networks better?" with a simple answer: "Go deeper." It abandoned large filters in favor of a clean, repetitive, and deep architecture built only from $3 \times 3$ convolutions, setting a new standard for network design.
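VGG's argument for small filters can be checked with quick arithmetic. The sketch below assumes every layer has the same number of input and output channels (`C = 64` is an arbitrary illustrative value) and counts weights only, ignoring biases.

```python
def conv_weights(kernel_size, channels):
    """Weights (no biases) of one conv layer with `channels` inputs and outputs."""
    return kernel_size * kernel_size * channels * channels

C = 64  # illustrative channel count
stacked_3x3 = 2 * conv_weights(3, C)  # two stacked 3x3 layers (5x5 receptive field)
single_5x5 = conv_weights(5, C)       # one 5x5 layer (same receptive field)

print(stacked_3x3)  # 73728
print(single_5x5)   # 102400
```

Under these assumptions the stacked pair needs $18C^2$ weights versus $25C^2$ for the single large filter, a saving of 28%.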

-----

## Question 6: Using keras, build and train a simple CNN model on the MNIST dataset from scratch.

**Answer:**

Here is the complete Python code to build, train, and evaluate a simple CNN on the MNIST dataset using Keras.

In [None]:
# Include your Python code and output in the code box below.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# 1. Load and Preprocess the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Preprocess the images:
# - Reshape to (num_samples, 28, 28, 1) to add the channel dimension
# - Normalize pixel values from [0, 255] to [0.0, 1.0]
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0

# Preprocess the labels:
# - One-hot encode the labels (e.g., 5 -> [0,0,0,0,0,1,0,0,0,0])
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

print(f"x_train shape: {x_train.shape}")
print(f"y_train shape: {y_train.shape}")

# 2. Build the CNN Model
model = keras.Sequential(
    [
        # Input layer
        keras.Input(shape=(28, 28, 1)),

        # Convolutional Block 1
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),

        # Convolutional Block 2
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),

        # Classifier Head
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(128, activation="relu"),
        layers.Dense(10, activation="softmax"), # 10 classes for digits 0-9
    ]
)

model.summary()

# 3. Compile the Model
model.compile(
    loss="categorical_crossentropy",
    optimizer="adam",
    metrics=["accuracy"]
)

# 4. Train the Model
batch_size = 128
epochs = 10

print("\nStarting model training...")
history = model.fit(
    x_train,
    y_train,
    batch_size=batch_size,
    epochs=epochs,
    validation_split=0.1 # Use 10% of training data for validation
)

# 5. Evaluate the Model on the Test Set
print("\nStarting model evaluation...")
score = model.evaluate(x_test, y_test, verbose=0)
print(f"Test loss: {score[0]}")
print(f"Test accuracy: {score[1]}")

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
x_train shape: (60000, 28, 28, 1)
y_train shape: (60000, 10)

Starting model training...
Epoch 1/10
422/422 ━━━━━━━━━━━━━━━━━━━━ 47s 105ms/step - accuracy: 0.8267 - loss: 0.5739 - val_accuracy: 0.9827 - val_loss: 0.0603
Epoch 2/10
422/422 ━━━━━━━━━━━━━━━━━━━━ 43s 101ms/step - accuracy: 0.9723 - loss: 0.0890 - val_accuracy: 0.9875 - val_loss: 0.0433
Epoch 3/10
422/422 ━━━━━━━━━━━━━━━━━━━━ 45s 107ms/step - accuracy: 0.9791 - loss: 0.0645 - val_accuracy: 0.9892 - val_loss: 0.0362
Epoch 4/10
422/422 ━━━━━━━━━━━━━━━━━━━━ 44s 104ms/step - accuracy: 0.9835 - loss: 0.0510 - val_accuracy: 0.9913 - val_loss: 0.0310
Epoch 5/10
422/422 ━━━━━━━━━━━━━━━━━━━━ 81s 102ms/step - accuracy: 0.9849 - loss: 0.0445 - val_accuracy: 0.9897 - val_loss: 0.0363
Epoch 6/10
422/422 ━━━━━━━━━━━━━━━━━━━━ 81s 98ms/step - accuracy: 0.9880 - loss: 0.0359 - val_accuracy: 0.9908 - 

### Sample Output:

```
x_train shape: (60000, 28, 28, 1)
y_train shape: (60000, 10)
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 13, 13, 32)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 5, 5, 64)         0         
 2D)                                                             
                                                                 
 flatten (Flatten)           (None, 1600)              0         
                                                                 
 dropout (Dropout)           (None, 1600)              0         
                                                                 
 dense (Dense)               (None, 128)               204928    
                                                                 
 dense_1 (Dense)             (None, 10)                1290      
                                                                 
=================================================================
Total params: 225,034
Trainable params: 225,034
Non-trainable params: 0
_________________________________________________________________

Starting model training...
Epoch 1/10
422/422 [==============================] - 5s 10ms/step - loss: 0.3644 - accuracy: 0.8872 - val_loss: 0.0818 - val_accuracy: 0.9772
Epoch 2/10
422/422 [==============================] - 4s 9ms/step - loss: 0.1147 - accuracy: 0.9654 - val_loss: 0.0573 - val_accuracy: 0.9837
...
Epoch 10/10
422/422 [==============================] - 4s 9ms/step - loss: 0.0354 - accuracy: 0.9886 - val_loss: 0.0322 - val_accuracy: 0.9912

Starting model evaluation...
Test loss: 0.027548693120479584
Test accuracy: 0.9904000163078308
```
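The shapes and parameter counts in the model summary can be reproduced by hand, which is a useful sanity check before training. The sketch below is plain arithmetic rather than Keras code; the helper name `conv_out` and the dictionary keys are illustrative, chosen to mirror the layer names in the summary.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output height/width after a conv or pooling window."""
    return (size - kernel + 2 * padding) // stride + 1

size, channels = 28, 1  # MNIST input: 28x28 grayscale
params = {}

# Conv2D(32, 3x3), "valid" padding, then 2x2 max pooling
params["conv2d"] = (3 * 3 * channels + 1) * 32
size, channels = conv_out(size, 3), 32  # 26x26x32
size = conv_out(size, 2, stride=2)      # 13x13x32

# Conv2D(64, 3x3), "valid" padding, then 2x2 max pooling
params["conv2d_1"] = (3 * 3 * channels + 1) * 64
size, channels = conv_out(size, 3), 64  # 11x11x64
size = conv_out(size, 2, stride=2)      # 5x5x64

flat = size * size * channels           # 1600 values after Flatten
params["dense"] = (flat + 1) * 128      # weights + biases
params["dense_1"] = (128 + 1) * 10

print(params)                # 320, 18496, 204928 and 1290 respectively
print(sum(params.values()))  # 225034
```

Most of the parameters sit in the first dense layer, not in the convolutions, which is typical for small CNNs that flatten directly into a fully connected layer.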

-----

## Question 7: Load and preprocess the CIFAR-10 dataset using Keras, and create a CNN model to classify RGB images.

**Answer:**

Here is the complete Python code for loading, preprocessing, and training a CNN on the CIFAR-10 dataset, which consists of $32 \times 32$ RGB images.

In [None]:
# Include your Python code and output in the code box below.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

# 1. Load and Preprocess the CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# CIFAR-10 images are 32x32x3 (RGB).
# We just need to normalize pixel values from [0, 255] to [0.0, 1.0]
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# One-hot encode the labels (10 classes)
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

print(f"x_train shape: {x_train.shape}")
print(f"y_train shape: {y_train.shape}")

# 2. Build the CNN Model Architecture
# This model needs to be a bit deeper than MNIST to handle color and complexity.
model = keras.Sequential(
    [
        keras.Input(shape=(32, 32, 3)),

        # Block 1
        layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
        layers.Conv2D(32, (3, 3), activation='relu'),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Dropout(0.25),

        # Block 2
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Dropout(0.25),

        # Classifier Head
        layers.Flatten(),
        layers.Dense(512, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(10, activation='softmax'), # 10 classes
    ]
)

model.summary()

# 3. Compile the Model
model.compile(
    loss='categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

# 4. Train the Model
batch_size = 64
epochs = 25 # CIFAR-10 is harder than MNIST, so train for more epochs

print("\nStarting model training...")
history = model.fit(
    x_train,
    y_train,
    batch_size=batch_size,
    epochs=epochs,
    validation_data=(x_test, y_test) # Note: the test set doubles as validation data here
)

# 5. Evaluate the Model
print("\nStarting model evaluation...")
score = model.evaluate(x_test, y_test, verbose=0)
print(f"Test loss: {score[0]}")
print(f"Test accuracy: {score[1]}")

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170498071/170498071 ━━━━━━━━━━━━━━━━━━━━ 3s 0us/step
x_train shape: (50000, 32, 32, 3)
y_train shape: (50000, 10)

### Sample Output:

```
x_train shape: (50000, 32, 32, 3)
y_train shape: (50000, 10)
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d_2 (Conv2D)           (None, 32, 32, 32)        896       
                                                                 
 conv2d_3 (Conv2D)           (None, 30, 30, 32)        9248      
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 15, 15, 32)       0         
 2D)                                                             
                                                                 
 dropout_1 (Dropout)         (None, 15, 15, 32)        0         
                                                                 
 conv2d_4 (Conv2D)           (None, 15, 15, 64)        18496     
                                                                 
 conv2d_5 (Conv2D)           (None, 13, 13, 64)        36928     
                                                                 
 max_pooling2d_3 (MaxPooling  (None, 6, 6, 64)         0         
 2D)                                                             
                                                                 
 dropout_2 (Dropout)         (None, 6, 6, 64)          0         
                                                                 
 flatten_1 (Flatten)         (None, 2304)              0         
                                                                 
 dense_2 (Dense)             (None, 512)               1180160   
                                                                 
 dropout_3 (Dropout)         (None, 512)               0         
                                                                 
 dense_3 (Dense)             (None, 10)                5130      
                                                                 
=================================================================
Total params: 1,250,858
Trainable params: 1,250,858
Non-trainable params: 0
_________________________________________________________________

Starting model training...
Epoch 1/25
782/782 [==============================] - 9s 10ms/step - loss: 1.5540 - accuracy: 0.4326 - val_loss: 1.1895 - val_accuracy: 0.5750
Epoch 2/25
782/782 [==============================] - 8s 10ms/step - loss: 1.1578 - accuracy: 0.5878 - val_loss: 0.9880 - val_accuracy: 0.6483
...
Epoch 25/25
782/782 [==============================] - 8s 10ms/step - loss: 0.5513 - accuracy: 0.8048 - val_loss: 0.6418 - val_accuracy: 0.7816

Starting model evaluation...
Test loss: 0.6417518854141235
Test accuracy: 0.7815999984741211
```

-----

## Question 8: Using PyTorch, write a script to define and train a CNN on the MNIST dataset.

**Answer:**

Here is the complete Python script to define, train, and evaluate a CNN on MNIST using PyTorch.

In [None]:
# Include your Python code and output in the code box below.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# 0. Check for GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# 1. Define the Model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel (grayscale), 32 output channels
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1)
        # 32 input channels, 64 output channels
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        # After two pools (28 -> 14 -> 7), image is 7x7. 64 * 7 * 7 = 3136
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10) # 10 output classes
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        # -> (batch, 1, 28, 28)
        x = self.pool(F.relu(self.conv1(x))) # -> (batch, 32, 14, 14)
        x = self.pool(F.relu(self.conv2(x))) # -> (batch, 64, 7, 7)
        x = x.view(-1, 64 * 7 * 7) # Flatten -> (batch, 3136)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        # CrossEntropyLoss applies log_softmax internally
        return x

# 2. Set up Data Loaders
transform = transforms.Compose([
    transforms.ToTensor(), # Converts to [0, 1] tensor
    transforms.Normalize((0.5,), (0.5,)) # Normalizes to [-1, 1]
])

batch_size = 64

trainset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
trainloader = DataLoader(trainset, batch_size=batch_size, shuffle=True)

testset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
testloader = DataLoader(testset, batch_size=batch_size, shuffle=False)

# 3. Initialize Model, Loss, and Optimizer
model = Net().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# 4. Training Loop
num_epochs = 10
print("\nStarting model training...")

for epoch in range(num_epochs):
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # Get inputs; data is a list of [inputs, labels]
        inputs, labels = data[0].to(device), data[1].to(device)

        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)

        # Backward pass and optimize
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    print(f"Epoch {epoch + 1}/{num_epochs} - Training Loss: {running_loss / len(trainloader):.4f}")

print("Finished Training")

# 5. Accuracy Evaluation
correct = 0
total = 0
model.eval() # Set model to evaluation mode (disables dropout)

with torch.no_grad(): # We don't need to calculate gradients during evaluation
    for data in testloader:
        images, labels = data[0].to(device), data[1].to(device)
        outputs = model(images)

        # Get the class with the highest score
        _, predicted = torch.max(outputs, 1)

        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = 100 * correct / total
print(f"\nAccuracy of the network on the 10000 test images: {accuracy:.2f} %")

### Sample Output:

```
Using device: cuda
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
...
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz

Starting model training...
Epoch 1/10 - Training Loss: 0.2541
Epoch 2/10 - Training Loss: 0.0892
...
Epoch 10/10 - Training Loss: 0.0305
Finished Training

Accuracy of the network on the 10000 test images: 99.15 %
```

-----

## Question 9: Given a custom image dataset stored in a local directory, write code using Keras ImageDataGenerator to preprocess and train a CNN model.

**Answer:**

This solution assumes you have a dataset organized in the following local directory structure:

```
data/
├── train/
│   ├── class_a/
│   │   ├── a_img1.jpg
│   │   ├── a_img2.jpg
│   │   └── ...
│   └── class_b/
│       ├── b_img1.jpg
│       ├── b_img2.jpg
│       └── ...
└── validation/
    ├── class_a/
    │   ├── a_img_val1.jpg
    │   └── ...
    └── class_b/
        ├── b_img_val1.jpg
        └── ...
```

Here is the Python code using `ImageDataGenerator` to load, augment, and train a model on this data.

In [None]:
# Include your Python code and output in the code box below.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# --- (Setup: Create dummy directories/images for this example to run) ---
# In a real scenario, this part is skipped as the data already exists.
import os
import numpy as np
from tensorflow.keras.preprocessing.image import save_img

def create_dummy_data(base_dir="data"):
    if os.path.exists(base_dir):
        return # Don't recreate

    # Create directories
    sets = ['train', 'validation']
    classes = ['class_a', 'class_b']
    for s in sets:
        for c in classes:
            os.makedirs(os.path.join(base_dir, s, c), exist_ok=True)

    # Create dummy images
    for s in sets:
        for c in classes:
            for i in range(50): # 50 images per class/set
                img = np.random.rand(150, 150, 3) * 255
                if c == 'class_a':
                    img[:, :50, :] = 255 # Add a white bar for class A
                else:
                    img[:, -50:, :] = 0 # Add a black bar for class B
                save_img(os.path.join(base_dir, s, c, f"img_{i}.jpg"), img)
    print("Dummy data created.")

create_dummy_data()
# --- (End of dummy data setup) ---


# 1. Define Paths and Parameters
train_dir = 'data/train'
validation_dir = 'data/validation'
IMG_SIZE = (150, 150)
BATCH_SIZE = 32

# 2. Create ImageDataGenerator Instances
# For the training data, we apply data augmentation
train_datagen = ImageDataGenerator(
    rescale=1./255,           # Normalize pixel values to [0, 1]
    rotation_range=40,      # Randomly rotate images
    width_shift_range=0.2,  # Randomly shift width
    height_shift_range=0.2, # Randomly shift height
    shear_range=0.2,        # Apply shear transformations
    zoom_range=0.2,         # Randomly zoom in
    horizontal_flip=True,   # Randomly flip horizontally
    fill_mode='nearest'     # Fill in new pixels after rotation/shift
)

# For the validation data, we ONLY rescale (no augmentation)
validation_datagen = ImageDataGenerator(rescale=1./255)

# 3. Create Data Generators from Directories
train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode='binary' # 'binary' for 2 classes, 'categorical' for >2
)

validation_generator = validation_datagen.flow_from_directory(
    validation_dir,
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode='binary'
)

# 4. Build a Simple CNN Model
model = keras.Sequential([
    keras.Input(shape=(150, 150, 3)),
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(512, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid') # 1 neuron + sigmoid for binary classification
])

model.summary()

# 5. Compile the Model
model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

# 6. Train the Model using the Generators
# steps_per_epoch and validation_steps are optional here: Keras can infer them
# from the length of each generator. Setting them with floor division, as below,
# leaves the final partial batch of each epoch unused.

steps_per_epoch = train_generator.samples // BATCH_SIZE
validation_steps = validation_generator.samples // BATCH_SIZE
epochs = 10

print("\nStarting model training...")
history = model.fit(
    train_generator,
    steps_per_epoch=steps_per_epoch,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=validation_steps
)

### Sample Output:

```
Dummy data created.
Found 100 images belonging to 2 classes.
Found 100 images belonging to 2 classes.
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
... (model summary) ...
=================================================================
Total params: 19,034,177
Trainable params: 19,034,177
Non-trainable params: 0
_________________________________________________________________

Starting model training...
Epoch 1/10
3/3 [==============================] - 3s 646ms/step - loss: 1.1340 - accuracy: 0.5000 - val_loss: 0.6931 - val_accuracy: 0.5000
Epoch 2/10
3/3 [==============================] - 2s 539ms/step - loss: 0.6932 - accuracy: 0.5000 - val_loss: 0.6931 - val_accuracy: 0.5000
...
(Loss will decrease and accuracy will rise as it learns the dummy patterns)
...
Epoch 10/10
3/3 [==============================] - 2s 533ms/step - loss: 0.3809 - accuracy: 0.8906 - val_loss: 0.0101 - val_accuracy: 1.0000
```
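The layer shapes, parameter count, and step count for this model can be checked by hand. The sketch below is a minimal example that uses only the layer sizes listed in the model code above and the 100-image, batch-size-32 setup shown in the output; the helper names `conv_out` and `conv_params` are illustrative and are not Keras functions.

```python
import math

def conv_out(size, kernel=3, stride=1, padding=0):
    """Spatial size after a convolution (padding=0 matches Keras 'valid')."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_params(in_ch, out_ch, kernel=3):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_ch * out_ch + out_ch

size, channels, total = 150, 3, 0
for filters in (32, 64, 128):
    total += conv_params(channels, filters)
    size = conv_out(size)  # 3x3 convolution, no padding
    size //= 2             # 2x2 max pooling rounds down
    channels = filters
    print(f"after {filters}-filter block: {size}x{size}x{channels}")

flat = size * size * channels   # length of the Flatten output
total += flat * 512 + 512       # Dense(512)
total += 512 * 1 + 1            # Dense(1)
print(f"flattened features: {flat}")   # 36992
print(f"total parameters: {total:,}")  # 19,034,177

# Steps per epoch for 100 images and a batch size of 32
num_samples, batch_size = 100, 32
floor_steps = num_samples // batch_size           # 3, matches "3/3" in the log
ceil_steps = math.ceil(num_samples / batch_size)  # 4, would include the last partial batch
print(floor_steps, ceil_steps)
```

Almost all of the parameters sit in the first dense layer, because `Flatten` hands it 36,992 inputs.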

-----

## Question 10: You are working on a web application for a medical imaging startup. Your task is to build and deploy a CNN model that classifies chest X-ray images into "Normal" and "Pneumonia" categories. Describe your end-to-end approach... and deploy the model as a web app using Streamlit. [cite: 44-47]

**Answer:**

Here is my end-to-end approach, followed by the necessary Python code for training and deployment.

### End-to-End Approach

1.  **Data Preparation & Preprocessing:**

      * **Data Sourcing:** I would use a public dataset like the "Chest X-Ray Images (Pneumonia)" dataset from Kaggle.
      * **Directory Structure:** I'd organize the data as required by Keras generators:
        ```
        chest_xray/
        ├── train/
        │   ├── NORMAL/
        │   └── PNEUMONIA/
        └── test/
            ├── NORMAL/
            └── PNEUMONIA/
        ```
      * **Data Augmentation:** Medical datasets are often small or imbalanced. I'll use `ImageDataGenerator` to augment the *training* data so the model becomes more robust to variations in positioning and scale. Augmentations will include:
          * `rescale=1./255`: Normalization is essential.
          * `rotation_range=15`: Small rotations to simulate patient positioning.
          * `width_shift_range=0.1`, `height_shift_range=0.1`: Small shifts.
          * `zoom_range=0.1`: Small zooms.
      * **Generators:** I'll create a `train_generator` (with augmentation) and `validation_generator` (with *only* rescaling) using `flow_from_directory` with `class_mode='binary'`.

2.  **Model Training (Transfer Learning):**

      * **Strategy:** Building a model from scratch is data-hungry. A better approach is **Transfer Learning**. I will use a pre-trained model like **VGG16** (or ResNet, MobileNet) that was trained on ImageNet. This model already knows how to detect edges, textures, and shapes.
      * **Architecture:**
        1.  Load `VGG16` as a `base_model` with `weights='imagenet'` and `include_top=False` (to remove its original classifier).
        2.  **Freeze** the `base_model` (`base_model.trainable = False`) so its weights don't change during initial training.
        3.  Add a new classifier "head" on top:
              * `GlobalAveragePooling2D()`: Averages each feature map to a single value, collapsing the feature maps into a flat vector.
              * `Dense(128, activation='relu')`: A custom dense layer.
              * `Dropout(0.5)`: To prevent overfitting.
              * `Dense(1, activation='sigmoid')`: The final output neuron for binary (Normal/Pneumonia) classification.
      * **Training & Saving:** I'll compile the model with `binary_crossentropy` loss and the `adam` optimizer. I will then train it using `model.fit()` with the generators. Finally, I'll save the trained model as `xray_model.h5`.

3.  **Deployment as a Web App (Streamlit):**

      * **Framework:** Streamlit is a Python-first framework well suited to simple data/ML web apps.
      * **Script (`app.py`):** I will create a single Python script `app.py`.
      * **UI Components:**
          * `st.title()`: To set a title for the app.
          * `st.file_uploader()`: To allow the user to upload a JPG or PNG X-ray.
      * **Inference Logic:**
        1.  Load the saved `xray_model.h5` model once and cache it, so it is not reloaded on every upload.
        2.  Use the `PIL` (Pillow) library to open the uploaded image.
        3.  Preprocess the user's image: Convert to 'RGB', resize it to the model's expected input (e.g., $150 \times 150$), convert it to a `numpy` array, rescale it (`/ 255.0`), and add a batch dimension using `np.expand_dims`.
        4.  Pass this array to `model.predict()`.
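        5.  Check the output: `flow_from_directory` assigns class indices in alphanumeric order (`NORMAL` = 0, `PNEUMONIA` = 1), so the sigmoid output estimates the probability of pneumonia. If it is $> 0.5$, classify as "Pneumonia"; otherwise, classify as "Normal."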
        6.  Display the result to the user using `st.write()`.
      * **Running:** The app is launched from the terminal with the command: `streamlit run app.py`.
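
To make the transfer-learning architecture in step 2 concrete, the following sketch works out the feature-map size and parameter counts by hand. It assumes the standard VGG16 convolutional layout (five blocks of 3x3 'same'-padded convolutions, each followed by 2x2 max pooling) and the 150x150 RGB input used in this answer; `VGG16_BLOCKS`, `conv_params`, and `dense_params` are illustrative names, not part of Keras.

```python
# (filters, number of 3x3 conv layers) for each VGG16 block
VGG16_BLOCKS = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]

def conv_params(in_ch, out_ch, kernel=3):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(n_in, n_out):
    """Weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out

size, channels, frozen = 150, 3, 0
for filters, n_convs in VGG16_BLOCKS:
    for _ in range(n_convs):
        frozen += conv_params(channels, filters)
        channels = filters
    size //= 2  # 'same' convs keep the size; each 2x2 pool halves it (rounding down)

# GlobalAveragePooling2D reduces size x size x channels to a vector of length `channels`
head = dense_params(channels, 128) + dense_params(128, 1)

print(f"base output: {size}x{size}x{channels}")    # 4x4x512
print(f"frozen base parameters: {frozen:,}")       # 14,714,688
print(f"trainable head parameters: {head:,}")      # 65,793
```

With the base frozen, only the head's parameters are updated, which is a large part of why this setup can be trained on a comparatively small dataset.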

-----

### Python Code (Part 1: Model Training)

This code would be in a Colab notebook for training. It assumes the data is in a `/content/chest_xray/` directory.

In [None]:
# Include your Python code and output in the code box below.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import VGG16

# --- Assume dummy data is set up for this example ---
import os
import numpy as np
from tensorflow.keras.preprocessing.image import save_img
def create_dummy_xray_data(base_dir="chest_xray"):
    if os.path.exists(base_dir): return
    sets = ['train', 'test']
    classes = ['NORMAL', 'PNEUMONIA']
    for s in sets:
        for c in classes:
            os.makedirs(os.path.join(base_dir, s, c), exist_ok=True)
            for i in range(50):
                img = np.random.rand(150, 150, 3) * 255
                save_img(os.path.join(base_dir, s, c, f"xray_{i}.jpg"), img)
    print("Dummy X-Ray data created.")
create_dummy_xray_data()
# --- End of dummy data setup ---

# 1. Define Generators
train_dir = 'chest_xray/train'
test_dir = 'chest_xray/test'
IMG_SIZE = (150, 150)
BATCH_SIZE = 32

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest'
)

test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode='binary'
)

validation_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode='binary'
)

# 2. Build Model (Transfer Learning)
base_model = VGG16(
    weights='imagenet',
    include_top=False, # Don't include the final ImageNet classifier
    input_shape=(150, 150, 3)
)

# Freeze the base model
base_model.trainable = False

# Create the new model head
model = keras.Sequential([
    base_model,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid') # Binary classification
])

model.summary()

# 3. Compile and Train
model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // BATCH_SIZE,
    epochs=10,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // BATCH_SIZE
)

# 4. Save the model for Streamlit
model.save('xray_model.h5')
print("Model saved as xray_model.h5")

### Python Code (Part 2: Streamlit App)

Save this code as `app.py` in the same directory as your `xray_model.h5` file.

In [None]:
# Include your Python code and output in the code box below.
import streamlit as st
from tensorflow.keras.models import load_model
from PIL import Image, ImageOps
import numpy as np

# Set page config
st.set_page_config(page_title="Pneumonia Detection", layout="wide")

# Function to load and preprocess the image
def preprocess_image(image):
    """Preprocesses the user-uploaded image."""
    # Resize to the model's expected input size
    size = (150, 150)
    image = ImageOps.fit(image, size, Image.LANCZOS)

    # Convert to RGB (if it's not)
    if image.mode != "RGB":
        image = image.convert("RGB")

    # Convert to numpy array and rescale
    img_array = np.asarray(image)
    img_array = img_array.astype('float32') / 255.0

    # Add batch dimension
    img_array = np.expand_dims(img_array, axis=0)
    return img_array

# Load the trained model
# Using st.cache_resource to load the model only once
@st.cache_resource
def load_app_model():
    """Loads the saved Keras model."""
    try:
        model = load_model('xray_model.h5')
        return model
    except Exception as e:
        st.error(f"Error loading model: {e}")
        st.error("Please make sure 'xray_model.h5' is in the same directory.")
        return None

model = load_app_model()

# --- Streamlit App UI ---

st.title("Chest X-Ray Pneumonia Detector 🩺")
st.write("Upload a chest X-ray image, and the model will predict if it shows signs of Pneumonia.")

# File uploader
uploaded_file = st.file_uploader("Choose an X-ray image...", type=["jpg", "jpeg", "png"])

if model is None:
    st.stop()

if uploaded_file is not None:
    # Open the image
    image = Image.open(uploaded_file)

    # Display the uploaded image
    st.image(image, caption='Uploaded X-Ray', width=300)

    # Add a "Classify" button
    if st.button('Classify Image', type="primary"):
        with st.spinner('Analyzing the image...'):
            # Preprocess the image
            processed_image = preprocess_image(image)

            # Make prediction
            prediction = model.predict(processed_image)
            score = prediction[0][0] # Get the sigmoid output

            # Display the result
            st.subheader("Prediction Result:")
            if score > 0.5:
                st.error(f"**Result: Pneumonia** (Confidence: {score*100:.2f}%)")
            else:
                st.success(f"**Result: Normal** (Confidence: {(1-score)*100:.2f}%)")

st.sidebar.header("About")
st.sidebar.info(
    "This is a web app built with Streamlit to demonstrate a CNN "
    "model (trained using Keras/TensorFlow) for classifying "
    "chest X-ray images as 'Normal' or 'Pneumonia'."
)

### How to Run the App (Output)

1.  Save the code above as `app.py`.
2.  Make sure you have `xray_model.h5` in the same folder.
3.  Install necessary libraries: `pip install streamlit tensorflow pillow`
4.  Open your terminal and run:
    ```
    streamlit run app.py
    ```
5.  This will open the web application in your browser.
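
The decision rule inside the `Classify Image` branch can also be pulled out into a small pure function, which makes it testable without Streamlit or a trained model. This is a sketch, not part of the original `app.py`: the function name `classify_score` and its `threshold` parameter are hypothetical, and the labels assume the class order that `flow_from_directory` assigns alphanumerically (`NORMAL` = 0, `PNEUMONIA` = 1).

```python
def classify_score(score, threshold=0.5):
    """Map a sigmoid output in [0, 1] to a (label, confidence) pair.

    The score is read as the estimated probability of the positive
    class, which is assumed here to be "Pneumonia".
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score > threshold:
        return "Pneumonia", score
    return "Normal", 1.0 - score

# Example usage with made-up scores
for s in (0.93, 0.20, 0.50):
    label, confidence = classify_score(s)
    print(f"score={s:.2f} -> {label} ({confidence * 100:.2f}%)")
```

Exposing the threshold as a parameter makes it straightforward to trade sensitivity against specificity, for example by lowering it so that fewer pneumonia cases are missed.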
