**Question 1: What is a Convolutional Neural Network (CNN), and how does it differ from traditional fully connected neural networks in terms of architecture and performance on image data?**

**What is a CNN?**

* A CNN is a type of neural network designed to automatically and adaptively learn spatial hierarchies of features through backpropagation by using:

**Convolutional layers**

**Pooling (subsampling) layers**

**Fully connected layers (at the end, for classification or regression)**

**Key Architectural Differences Between CNN and FCN**

| **Feature** | **Fully Connected Network (FCN)** | **Convolutional Neural Network (CNN)** |
| --- | --- | --- |
| **Input Handling** | Input is flattened into a 1D vector (e.g., a 28×28 image becomes a 784-dimensional vector) | Input retains its 2D or 3D spatial structure (e.g., height × width × channels) |
| **Layer Connections** | Every neuron is connected to every neuron in the previous layer | Uses local connections (filters) over small regions of the input |
| **Weight Sharing** | Each connection has a unique weight | Filters (kernels) are shared across the input image, drastically reducing parameters |
| **Parameter Efficiency** | Large number of parameters, prone to overfitting | Fewer parameters, more scalable to larger inputs |
| **Translation Invariance** | Poor (must learn each variation separately) | Good due to convolution and pooling operations capturing local patterns |
| **Spatial Hierarchy** | Does not preserve spatial relationships | Preserves and exploits spatial structure of the input |
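
To make the parameter-efficiency row concrete, the sketch below counts trainable parameters for one dense layer and one convolutional layer applied to a 28×28 grayscale image. The layer sizes (100 hidden units, six 5×5 filters) are illustrative assumptions, not values taken from a specific model.

```python
def dense_params(n_inputs: int, n_outputs: int) -> int:
    """One weight per input-output pair, plus one bias per output unit."""
    return n_inputs * n_outputs + n_outputs


def conv_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Each filter holds kernel_size x kernel_size x in_channels weights plus one bias."""
    return out_channels * (kernel_size * kernel_size * in_channels + 1)


# Assumed sizes: a flattened 28x28 image feeding 100 hidden units,
# versus six 5x5 filters sliding over the same single-channel image.
fc_count = dense_params(28 * 28, 100)
conv_count = conv_params(1, 6, 5)

print(f"Dense layer parameters: {fc_count}")
print(f"Conv layer parameters:  {conv_count}")
```

The convolutional count does not depend on the image size, which is why weight sharing scales better to larger inputs.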


**Question 2: Discuss the architecture of LeNet-5 and explain how it laid the foundation for modern deep learning models in computer vision. Include references to its original research paper.**



*  The LeNet-5 architecture, proposed by Yann LeCun et al. in 1998, is one of the earliest and most influential Convolutional Neural Networks (CNNs). It was originally developed for handwritten digit recognition, particularly the MNIST dataset (digits 0–9), and its architecture introduced core design principles that underpin modern deep learning models in computer vision today.

**LeNet-5 Architecture Breakdown**

* LeNet-5 is a 7-layer network (excluding the input), composed of:

**1. Input Layer**

* Input: 32×32 grayscale image

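* MNIST images (28×28) are zero-padded to 32×32 so that distinctive features such as stroke end-points and corners can fall in the center of the receptive field of the highest-level feature detectors.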

**2. C1 – First Convolutional Layer**

* Type: Convolution

* Filter size: 5×5

* Number of filters: 6

* Output size: 28×28×6

* Activation: Tanh (a scaled hyperbolic tangent; ReLU only became common in CNNs later)

* Trainable parameters: 6×(5×5 + 1) = 156

* Captures low-level features like edges and corners.

**3. S2 – Subsampling (Pooling) Layer**

* Type: Average pooling (not max pooling)

* Kernel size: 2×2

* Stride: 2

* Output size: 14×14×6

* Activation: Tanh

* Reduces spatial resolution, introduces translation invariance.

**4. C3 – Second Convolutional Layer**

* Filter size: 5×5

* Number of filters: 16

* Output size: 10×10×16

* Connectivity: Not all input maps are connected to all output maps (sparse connections to reduce computation and encourage feature diversity)

* Learns more complex features by combining input maps in selective patterns.

**5. S4 – Subsampling Layer**

* Same as S2

* Output size: 5×5×16

**6. C5 – Fully Connected Convolution Layer**

* Input size: 5×5×16 = 400

* Filter size: 5×5 (entire input map)

* Number of filters: 120

* Output size: 1×1×120 (essentially a dense layer)

* Activation: Tanh

* Transitions from spatial to abstract representation for classification.

**7. F6 – Fully Connected Layer**

* Input: 120

* Output: 84

* Activation: Tanh

* The 84 units correspond to the 7×12 stylized character bitmaps that the original output layer uses as target patterns.

**8. Output Layer**

* Input: 84

* Output: 10 (for digit classification 0–9)

* Activation: Euclidean radial basis function (RBF) units in the original paper; modern re-implementations typically use softmax
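
The output sizes and parameter counts listed above can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: it models the common modern variant (C3 fully connected to all six S2 maps, pooling layers without trainable parameters, a plain linear output layer), so the C3 count here is higher than in the original paper, which used sparse connections.

```python
def conv_out(size: int, kernel: int, stride: int = 1) -> int:
    """Output width of an unpadded convolution or pooling window."""
    return (size - kernel) // stride + 1


def conv_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weights plus one bias per filter."""
    return c_out * (kernel * kernel * c_in + 1)


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus one bias per output unit."""
    return n_out * (n_in + 1)


# Spatial size after each layer, starting from the 32x32 input
c1 = conv_out(32, 5)            # C1: 5x5 convolution
s2 = conv_out(c1, 2, stride=2)  # S2: 2x2 pooling, stride 2
c3 = conv_out(s2, 5)            # C3: 5x5 convolution
s4 = conv_out(c3, 2, stride=2)  # S4: 2x2 pooling, stride 2
c5 = conv_out(s4, 5)            # C5: 5x5 convolution over the whole 5x5 map

# Trainable parameters per layer (modern fully connected variant, an assumption)
params = {
    "C1": conv_params(1, 6, 5),
    "C3": conv_params(6, 16, 5),
    "C5": conv_params(16, 120, 5),
    "F6": dense_params(120, 84),
    "Output": dense_params(84, 10),
}
total = sum(params.values())

print("Spatial sizes:", [c1, s2, c3, s4, c5])
print("Parameters per layer:", params)
print("Total parameters:", total)
```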

**Reference**

* LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.

**Question 3: Compare and contrast AlexNet and VGGNet in terms of design principles, number of parameters, and performance. Highlight key innovations and limitations of each.**

**Architecture Comparison**

**AlexNet (2012)**

**Architecture:**

5 convolutional layers

3 fully connected layers

ReLU activation (first major use in CNNs)

Local Response Normalization (LRN)

Max pooling after some conv layers

Dropout in fully connected layers

**Design Highlights:**

Introduced ReLU → enabled faster training than sigmoid/tanh

GPU parallelism: Split model across 2 GPUs due to hardware limits

Dropout → reduced overfitting in dense layers

Trained on ImageNet (1.2M images)

**Limitations:**

Large fully connected layers → most parameters come from here

Irregular filter sizes (e.g., 11×11, 5×5)

Model size is large for modest depth

**VGGNet (2014)**

**Architecture:**

VGG-16: 13 convolutional + 3 fully connected layers

All conv layers use 3×3 filters

Max pooling every 2 or 3 conv layers

ReLU activation

No local response normalization (unlike AlexNet)

**Design Highlights:**

Deep but simple: Stacked many small filters instead of fewer large ones

Demonstrated that depth improves performance significantly

Used only 3×3 convs and 2×2 max pooling, making it clean and modular

**Limitations:**

Very high number of parameters (~138M) → computationally expensive

Not efficient for real-time or mobile applications

Fully connected layers are again parameter-heavy

**Number of Parameters**

| Model | Parameters (approx.) |
| --- | --- |
| AlexNet | ~60 million |
| VGG-16 | ~138 million |
| VGG-19 | ~144 million |
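
The VGG-16 figure can be reproduced by summing the layer sizes, which also shows how much of the total sits in the fully connected layers. The sketch assumes the standard 13-convolution layout (3×3 kernels with size-preserving padding, 2×2 max pooling), a 224×224 RGB input, and a 1000-class head; the names `VGG16_CFG` and `count_vgg16_params` are made up for this illustration.

```python
# Output channels of each conv layer; "M" marks a 2x2 max pooling step
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def count_vgg16_params(num_classes: int = 1000, in_channels: int = 3,
                       input_size: int = 224):
    """Return (conv_params, fc_params) for the assumed VGG-16 layout."""
    conv_total = 0
    channels, size = in_channels, input_size
    for item in VGG16_CFG:
        if item == "M":
            size //= 2  # pooling halves height and width
        else:
            conv_total += item * (3 * 3 * channels + 1)  # weights + bias
            channels = item

    flat = channels * size * size  # 512 * 7 * 7 = 25088 features
    fc_total = 0
    for n_in, n_out in [(flat, 4096), (4096, 4096), (4096, num_classes)]:
        fc_total += n_out * (n_in + 1)
    return conv_total, fc_total


conv_total, fc_total = count_vgg16_params()
total = conv_total + fc_total
print(f"Conv layers: {conv_total:,}")
print(f"FC layers:   {fc_total:,} ({100 * fc_total / total:.1f}% of total)")
print(f"Total:       {total:,}")
```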

**Performance (Top-5 Accuracy on ImageNet)**

| Model | Top-5 Accuracy |
| --- | --- |
| AlexNet | ~83.6% |
| VGG-16 | ~92.7% |
| VGG-19 | ~92.8% |


**Key Innovations**

**AlexNet**

ReLU Activation: Introduced ReLU to CNNs, accelerating convergence.

Dropout: Used dropout to combat overfitting in fully connected layers.

GPU Training: Trained on two GPUs in parallel (split model), enabling deep learning on large datasets.

Local Response Normalization (LRN): Introduced LRN for lateral inhibition (now rarely used).

**VGGNet**

Deep but Simple Design: Used a uniform 3×3 filter size throughout, demonstrating that depth improves performance if designed carefully.

Stacked Small Filters: Two stacked 3×3 convs cover the same receptive field as one 5×5 filter (and three cover 7×7), but with fewer parameters and more non-linearities.

Modular Design: Easy to extend, which influenced later models like ResNet and EfficientNet.
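
The trade-off behind stacked small filters can be checked numerically. The sketch below compares receptive field and weight count for stacked 3×3 convolutions against a single larger filter, assuming stride 1, equal input and output channels, and no biases; the channel count of 64 is an arbitrary choice for illustration.

```python
def stacked_receptive_field(num_layers: int, kernel: int = 3) -> int:
    """Receptive field of num_layers stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel: int, channels: int) -> int:
    """Weight count (no bias) of a kernel x kernel conv with equal in/out channels."""
    return kernel * kernel * channels * channels


C = 64  # assumed channel count

two_3x3 = 2 * conv_weights(3, C)
one_5x5 = conv_weights(5, C)
three_3x3 = 3 * conv_weights(3, C)
one_7x7 = conv_weights(7, C)

print(f"Two 3x3:   field {stacked_receptive_field(2)}, weights {two_3x3}")
print(f"One 5x5:   field 5, weights {one_5x5}")
print(f"Three 3x3: field {stacked_receptive_field(3)}, weights {three_3x3}")
print(f"One 7x7:   field 7, weights {one_7x7}")
```

Each additional 3×3 layer also adds another ReLU, which is where the extra non-linearity comes from.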

**Limitations**

**AlexNet**

Large Filters: Initial 11×11 and 5×5 filters are less efficient at capturing fine details.

Shallow by Modern Standards: Only 8 layers; less expressive than deeper networks.

Split Model for GPUs: Architecture was constrained by hardware limits (split across two GPUs).

**VGGNet**

Very Large Model Size: Over 500MB for VGG-16; not practical for deployment on resource-constrained devices.

Training Time: Slow to train due to depth and high number of parameters.

No Batch Normalization: Lacked techniques that later became standard (e.g., batch norm in ResNet).

**Question 4: What is transfer learning in the context of image classification? Explain how it helps in reducing computational costs and improving model performance with limited data.**

**In the Context of Image Classification**

**In image classification, transfer learning involves:**

Starting with a CNN pre-trained on a large dataset (e.g., ImageNet with 1.2M images and 1000 classes).

Adapting it to a new task, such as classifying medical images, satellite imagery, or custom categories (e.g., cats vs. dogs).

Either:

Fine-tuning the whole model on the new dataset, or

Freezing earlier layers and only training the final few layers (often the classification head).

**How It Works: Two Main Strategies**

1. Feature Extraction

Freeze all convolutional layers (they act as a generic feature extractor).

Replace the final classification layer (e.g., Dense(1000)) with one suited for your task (e.g., Dense(5) for 5 classes).

Only train the final layer(s).

2. Fine-tuning

Unfreeze some of the later layers of the pre-trained model.

Train them (often at a lower learning rate) to adapt to the new data.

This improves performance when your dataset is moderately sized and similar in domain to the pretraining data.
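
The two strategies differ only in which parameters are left trainable and at what learning rate. The sketch below uses a tiny hypothetical stand-in for a pre-trained network (the `backbone` and `head` modules and all sizes are invented for illustration, and no real pre-trained weights are loaded) to show freezing, selective unfreezing, and per-group learning rates.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical stand-in for a pre-trained CNN: an early stage and a late stage
backbone = nn.Sequential(
    nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
    nn.Sequential(nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1)),
)
head = nn.Linear(16, 5)  # new classification head for an assumed 5-class task
model = nn.Sequential(backbone, nn.Flatten(), head)

# Strategy 1 (feature extraction): freeze the whole backbone, train only the head
for p in backbone.parameters():
    p.requires_grad = False

# Strategy 2 (fine-tuning): additionally unfreeze the late stage
for p in backbone[1].parameters():
    p.requires_grad = True

# Parameter groups: a lower learning rate for the unfrozen pre-trained layers
optimizer = optim.Adam([
    {"params": backbone[1].parameters(), "lr": 1e-4},
    {"params": head.parameters(), "lr": 1e-3},
])

# One training step on random data, only to show that the setup runs
x = torch.randn(4, 3, 32, 32)
y = torch.randint(0, 5, (4,))
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()
optimizer.step()

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable}")
```

With a real model, the same pattern applies to whichever submodules hold the later convolutional blocks; the learning rates shown are placeholders, not recommendations.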

**Benefits of Transfer Learning**

1. Reduced Computational Cost

No need to train from scratch: Training a large CNN (e.g., ResNet, VGG) from scratch can take days or weeks.

Transfer learning uses the heavy lifting already done by pretraining, saving time and compute.

2. Better Performance with Limited Data

Pre-trained models have already learned rich, general features (edges, shapes, textures).

These features can be reused, especially in early layers, leading to higher accuracy even with small datasets (few thousand or even hundreds of images).

3. Faster Convergence

Training starts from an already-optimized set of weights.

Fewer epochs are needed to reach high performance.

**Question 5: Describe the role of residual connections in ResNet architecture. How do they address the vanishing gradient problem in deep CNNs?**

**Role of Residual Connections in ResNet**

Residual connections, the key innovation in ResNet (Residual Network), are a simple yet powerful technique that allows very deep neural networks to be trained effectively. They were introduced by He, K., Zhang, X., Ren, S., & Sun, J. (2015) in the paper "Deep Residual Learning for Image Recognition".

ResNet won the ImageNet 2015 challenge and enabled networks as deep as 152 layers to outperform shallower ones — something previously difficult due to training challenges.

What Are Residual Connections?

A residual connection (or skip connection) bypasses one or more layers by directly adding the input of a layer to its output:

$$\text{Output} = F(x) + x$$

Where:

$x$ is the input,

$F(x)$ is the residual mapping (the transformation the layers are trying to learn),

The result is the element-wise addition of the input and the output of the residual block.

**Purpose of Residual Connections**

1. Easier Optimization of Deep Networks

Without residual connections, very deep CNNs tend to saturate or degrade in performance — accuracy gets worse as depth increases.

Residual blocks allow the network to learn the "difference" (residual) from the identity mapping rather than the full transformation.

If learning an identity mapping is optimal, the residual function can learn to output zeros, making the block effectively skip itself.

2. Mitigating the Vanishing Gradient Problem

In deep networks, during backpropagation:

Gradients can become very small (vanish) as they propagate backward through many layers.

This slows or prevents effective weight updates in early layers.

With residual connections:

The gradient has a shortcut path to flow directly backward through the identity connection.

This helps preserve gradient strength, enabling effective training of very deep networks (50, 101, 152+ layers).

Mathematically, the gradient flows through both:

$\frac{\partial F(x)}{\partial x}$, through the normal layers

$\frac{\partial x}{\partial x} = 1$, through the identity shortcut

This ensures that some gradient always flows, avoiding complete vanishing.
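
A toy scalar model illustrates the effect. Assume every layer contributes the same small local derivative: a plain stack multiplies those derivatives together, while a residual stack multiplies factors of the form 1 + ∂F/∂x. Real networks involve Jacobian matrices whose values vary by layer, so this is only a simplified illustration with made-up numbers.

```python
def plain_gradient(local_grad: float, depth: int) -> float:
    """Gradient through `depth` stacked layers, each with derivative `local_grad`."""
    return local_grad ** depth


def residual_gradient(local_grad: float, depth: int) -> float:
    """Same stack, but each layer computes x + F(x), so each factor is 1 + dF/dx."""
    return (1.0 + local_grad) ** depth


# Assumed local derivative of 0.1 per layer
for depth in (5, 20, 50):
    plain = plain_gradient(0.1, depth)
    residual = residual_gradient(0.1, depth)
    print(f"depth={depth:3d}  plain={plain:.3e}  residual={residual:.3e}")
```

Even when the residual branch contributes no gradient at all (a local derivative of 0), the identity path still passes the gradient through with a factor of 1 per block.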

3. Improved Generalization and Training Speed

Residual connections make it easier for the network to converge during training.

Deeper ResNets often generalize better because they can represent complex functions without increasing training error.

Residual Block (Basic Unit of ResNet)

A basic residual block looks like this:

    Input → [Conv → BN → ReLU → Conv → BN] → Add(Input) → ReLU → Output
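
The diagram above maps directly onto a small PyTorch module. This is a minimal sketch, assuming the input and output have the same number of channels and spatial size so the identity shortcut needs no projection; the class name `BasicResidualBlock` is chosen here for illustration.

```python
import torch
import torch.nn as nn


class BasicResidualBlock(nn.Module):
    """Conv -> BN -> ReLU -> Conv -> BN, then add the input and apply ReLU."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = self.relu(self.bn1(self.conv1(x)))  # first half of F(x)
        residual = self.bn2(self.conv2(residual))      # F(x)
        return self.relu(residual + x)                 # F(x) + x, then ReLU


block = BasicResidualBlock(16)
x = torch.randn(2, 16, 8, 8)
out = block(x)
print(out.shape)  # same shape as the input
```

When the block changes the channel count or stride, the shortcut is usually replaced by a 1×1 convolution so the two tensors can still be added.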


**Question 6: Implement the LeNet-5 architecture using TensorFlow or PyTorch to classify the MNIST dataset. Report the accuracy and training time.**


In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import time

# Check for GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# LeNet-5 Model Definition
class LeNet5(nn.Module):
    def __init__(self):
        super(LeNet5, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5)
        self.pool1 = nn.AvgPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)
        self.pool2 = nn.AvgPool2d(kernel_size=2, stride=2)
        self.conv3 = nn.Conv2d(16, 120, kernel_size=5)
        self.fc1 = nn.Linear(120, 84)
        self.fc2 = nn.Linear(84, 10)
        self.relu = nn.ReLU()  # ReLU is used here in place of the original tanh

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.pool1(x)
        x = self.relu(self.conv2(x))
        x = self.pool2(x)
        x = self.relu(self.conv3(x))
        x = x.view(-1, 120)
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Data preprocessing and loading
transform = transforms.Compose([
    transforms.Pad(2),  # Convert 28x28 to 32x32 as expected by LeNet-5
    transforms.ToTensor()
])

train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = datasets.MNIST(root='./data', train=False, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1000, shuffle=False)

# Initialize model, loss, optimizer
model = LeNet5().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
start_time = time.time()
epochs = 5

for epoch in range(epochs):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"Epoch [{epoch+1}/{epochs}], Loss: {running_loss/len(train_loader):.4f}")

training_time = time.time() - start_time

# Evaluation
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = 100 * correct / total

print(f"\nTest Accuracy: {accuracy:.2f}%")
print(f"Training Time: {training_time:.2f} seconds")



Epoch [1/5], Loss: 0.3420
Epoch [2/5], Loss: 0.0944
Epoch [3/5], Loss: 0.0677
Epoch [4/5], Loss: 0.0519
Epoch [5/5], Loss: 0.0430

Test Accuracy: 98.43%
Training Time: 150.53 seconds


**Question 7: Use a pre-trained VGG16 model (via transfer learning) on a small custom dataset (e.g., flowers or animals). Replace the top layers and fine-tune the model. Include your code and result discussion.**

Step-by-Step: Transfer Learning with VGG16

Install and Import Dependencies

    pip install torchvision


    import torch
    import torch.nn as nn
    import torch.optim as optim
    from torchvision import datasets, transforms, models
    from torch.utils.data import DataLoader
    import time
    


In [3]:
# Prepare the Dataset (Example: Flowers)

# Transform: Resize and normalize (as VGG expects ImageNet input)
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225])
])

# Download Oxford Flowers dataset (or use your own)
train_dataset = datasets.Flowers102(root='./data', split='train', transform=transform, download=True)
val_dataset = datasets.Flowers102(root='./data', split='val', transform=transform, download=True)
test_dataset = datasets.Flowers102(root='./data', split='test', transform=transform, download=True)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)



In [None]:
# Load VGG16 and Replace Classifier

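# Load pre-trained VGG16 (the `weights` argument replaces the deprecated
# `pretrained=True` in torchvision >= 0.13)
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)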

# Freeze feature extractor
for param in model.features.parameters():
    param.requires_grad = False

# Replace the classifier
model.classifier = nn.Sequential(
    nn.Linear(25088, 4096),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(4096, 102),  # 102 flower classes
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)


In [None]:
# Training the Model

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.classifier.parameters(), lr=0.001)

# Training loop
epochs = 5
start_time = time.time()

for epoch in range(epochs):
    model.train()
    running_loss = 0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"Epoch {epoch+1}/{epochs}, Loss: {running_loss/len(train_loader):.4f}")

training_time = time.time() - start_time


**Evaluate Accuracy**

In [None]:
def evaluate(loader):
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    return 100 * correct / total

val_acc = evaluate(val_loader)
test_acc = evaluate(test_loader)

print(f"\nValidation Accuracy: {val_acc:.2f}%")
print(f"Test Accuracy: {test_acc:.2f}%")
print(f"Training Time: {training_time:.2f} seconds")


**Example Output (You May See Results Like):**

In [None]:
Epoch 1/5, Loss: 3.1152
Epoch 2/5, Loss: 2.3524
...
Validation Accuracy: 74.85%
Test Accuracy: 73.20%
Training Time: 310.45 seconds


**Result Discussion**

Observations:

VGG16, even with a few training epochs and frozen base layers, achieves good accuracy on a small dataset.

Fine-tuning only the classifier layers gives a strong starting point.

You can unfreeze some deeper layers to further improve performance (with more training time and data).

**Question 8: Write a program to visualize the filters and feature maps of the first convolutional layer of AlexNet on an example input image.**

Goal

Load a pre-trained AlexNet model.

Feed an input image into the network.

Visualize:

The filters (kernels) of the first convolutional layer.

The feature maps (activations) produced by that layer.

Prerequisites

Install PyTorch and torchvision if not already:

     pip install torch torchvision matplotlib


In [None]:
import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
from torchvision import models
from PIL import Image

# Device configuration
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load pre-trained AlexNet
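# `weights=` replaces the deprecated `pretrained=True` in torchvision >= 0.13
alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).to(device)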
alexnet.eval()

# Load and preprocess an input image
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],  # ImageNet mean
        std=[0.229, 0.224, 0.225]    # ImageNet std
    )
])

# Load a sample image (replace with your own image path)
image_path = 'sample.jpg'  # Ensure this image exists
image = Image.open(image_path).convert('RGB')
input_tensor = transform(image).unsqueeze(0).to(device)

# ================================
# 🔍 1. Visualize Filters of First Conv Layer
# ================================

def visualize_filters(layer, num_filters=16):
    filters = layer.weight.data.cpu()
    fig, axes = plt.subplots(1, num_filters, figsize=(20, 5))
    for i in range(num_filters):
        f = filters[i]
        f = (f - f.min()) / (f.max() - f.min())  # Normalize
        axes[i].imshow(f.permute(1, 2, 0))
        axes[i].axis('off')
    plt.suptitle("Filters from First Conv Layer")
    plt.show()

first_conv = alexnet.features[0]
visualize_filters(first_conv, num_filters=8)  # Show 8 filters

# ================================
# 🔍 2. Visualize Feature Maps (Activations)
# ================================

# Hook to extract feature maps
activation = {}

def hook_fn(module, input, output):
    activation["conv1"] = output.detach()

# Register hook
alexnet.features[0].register_forward_hook(hook_fn)

# Forward pass
_ = alexnet(input_tensor)

# Visualize feature maps
feature_maps = activation["conv1"].squeeze().cpu()
num_maps = 8  # Limit for visualization

fig, axes = plt.subplots(1, num_maps, figsize=(20, 5))
for i in range(num_maps):
    axes[i].imshow(feature_maps[i], cmap='viridis')
    axes[i].axis('off')
plt.suptitle("Feature Maps from First Conv Layer")
plt.show()


**Question 9: Train a GoogLeNet (Inception v1) or its variant using a standard dataset like CIFAR-10. Plot the training and validation accuracy over epochs and analyze overfitting or underfitting.**

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt

# Device config
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Data transforms
transform_train = transforms.Compose([
    transforms.Resize(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

transform_test = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# Datasets and loaders
train_dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_train)
test_dataset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_test)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

# Load GoogLeNet (Inception v1)
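# `weights=` replaces the deprecated `pretrained=True` in torchvision >= 0.13
model = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)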
model.fc = nn.Linear(1024, 10)  # 10 classes in CIFAR-10
model = model.to(device)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
epochs = 10
train_acc_history = []
val_acc_history = []

for epoch in range(epochs):
    model.train()
    correct_train = 0
    total_train = 0

    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)

        optimizer.zero_grad()
        # torchvision drops the auxiliary classifiers when pre-trained weights are
        # loaded without aux_logits=True, so the model returns only the main logits
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        _, predicted = torch.max(outputs, 1)
        total_train += labels.size(0)
        correct_train += (predicted == labels).sum().item()

    train_acc = 100 * correct_train / total_train
    train_acc_history.append(train_acc)

    # Validation
    model.eval()
    correct_val = 0
    total_val = 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total_val += labels.size(0)
            correct_val += (predicted == labels).sum().item()

    val_acc = 100 * correct_val / total_val
    val_acc_history.append(val_acc)

    print(f"Epoch {epoch+1}/{epochs} - Train Acc: {train_acc:.2f}%, Val Acc: {val_acc:.2f}%")

# Plot results
plt.figure(figsize=(10, 5))
plt.plot(train_acc_history, label='Train Accuracy')
plt.plot(val_acc_history, label='Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy (%)')
plt.title('GoogLeNet on CIFAR-10: Accuracy over Epochs')
plt.legend()
plt.grid(True)
plt.show()
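
To turn the plotted curves into a rough diagnosis, the final training and validation accuracies can be compared. The helper below is a heuristic sketch: the function name `diagnose_fit` and both thresholds are assumptions chosen for illustration, and the sample histories are made-up numbers, not results from the run above.

```python
def diagnose_fit(train_acc, val_acc, gap_threshold=10.0, low_threshold=70.0):
    """Return a rough label based on final-epoch accuracies (in percent).

    gap_threshold and low_threshold are arbitrary illustrative cut-offs.
    """
    final_train, final_val = train_acc[-1], val_acc[-1]
    gap = final_train - final_val
    if final_train < low_threshold and final_val < low_threshold:
        return "underfitting"   # both curves stay low
    if gap > gap_threshold:
        return "overfitting"    # training accuracy far above validation
    return "reasonable fit"


# Made-up histories for illustration only
example_train = [60.0, 85.0, 98.0]
example_val = [58.0, 75.0, 80.0]
print(diagnose_fit(example_train, example_val))
```

In the notebook, the function would be called as `diagnose_fit(train_acc_history, val_acc_history)` after training finishes.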


**Question 10: You are working in a healthcare AI startup. Your team is tasked with developing a system that automatically classifies medical X-ray images into normal, pneumonia, and COVID-19. Due to limited labeled data, what approach would you suggest using among CNN architectures discussed (e.g., transfer learning with ResNet or Inception variants)? Justify your approach and outline a deployment strategy for production use.**



In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader
import os
import matplotlib.pyplot as plt

# Device setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Path to dataset (assumes structure: /train/class_name/, /val/class_name/)
data_dir = "./chest_xray_data"
train_dir = os.path.join(data_dir, "train")
val_dir = os.path.join(data_dir, "val")

# Transforms
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225])
])

val_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225])
])

# Datasets and loaders
train_dataset = datasets.ImageFolder(train_dir, transform=train_transform)
val_dataset = datasets.ImageFolder(val_dir, transform=val_transform)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)

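# Load ResNet-50 (`weights=` replaces the deprecated `pretrained=True`
# in torchvision >= 0.13; IMAGENET1K_V1 matches the old behavior)
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Freeze all pre-trained layers; the new fc head created below stays trainable
for param in model.parameters():
    param.requires_grad = False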

# Replace the classifier
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, 3)  # 3 classes: Normal, Pneumonia, COVID-19
model = model.to(device)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.fc.parameters(), lr=0.001)

# Training
train_acc_list = []
val_acc_list = []
num_epochs = 5

for epoch in range(num_epochs):
    model.train()
    correct = 0
    total = 0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        _, preds = torch.max(outputs, 1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)

    train_acc = 100 * correct / total
    train_acc_list.append(train_acc)

    # Validation
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, preds = torch.max(outputs, 1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)

    val_acc = 100 * correct / total
    val_acc_list.append(val_acc)

    print(f"Epoch {epoch+1}: Train Acc = {train_acc:.2f}%, Val Acc = {val_acc:.2f}%")

# Plot training and validation accuracy
plt.figure(figsize=(8,5))
plt.plot(train_acc_list, label="Train Accuracy")
plt.plot(val_acc_list, label="Validation Accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy (%)")
plt.title("Training vs Validation Accuracy")
plt.legend()
plt.grid(True)
plt.show()
