1.What is a Convolutional Neural Network (CNN), and how does it differ from
traditional fully connected neural networks in terms of architecture and performance on
image data?
- A Convolutional Neural Network (CNN) is a type of deep learning model specifically designed to process grid-like data, such as images. CNNs automatically learn spatial features (edges, textures, shapes, objects) by applying convolution operations using learnable filters.

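- A fully connected neural network (also called a multilayer perceptron) connects every neuron in one layer to every neuron in the next. To process an image it must flatten the pixels into a vector, which discards spatial structure and makes the number of weights grow with image size. A CNN instead uses local connectivity, weight sharing, and pooling, so it needs far fewer parameters and typically reaches higher accuracy on image data.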
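
To make the architectural difference concrete, the sketch below counts the parameters of one fully connected layer and one convolutional layer applied to the same image. The 224x224 RGB input, the 1,000 dense units, and the 64 filters of size 3x3 are illustrative assumptions, not values taken from the question.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases for a fully connected layer."""
    return n_inputs * n_units + n_units


def conv_params(kernel_size, in_channels, n_filters):
    """Weights plus biases for a 2D convolution; independent of image size."""
    return kernel_size * kernel_size * in_channels * n_filters + n_filters


# Assumed example input: a 224x224 RGB image
height, width, channels = 224, 224, 3

fc = dense_params(height * width * channels, 1000)  # flattened image -> 1000 units
conv = conv_params(3, channels, 64)                 # 64 filters of size 3x3

print(f"Fully connected layer: {fc:,} parameters")
print(f"Convolutional layer:   {conv:,} parameters")
```

Because each filter is reused at every image position, the convolutional layer's parameter count does not change with image resolution, whereas the dense layer's count scales with the number of pixels.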

2.Discuss the architecture of LeNet-5 and explain how it laid the foundation
for modern deep learning models in computer vision. Include references to its original
research paper.

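- LeNet-5 is one of the earliest and most influential Convolutional Neural Networks (CNNs), proposed by Yann LeCun et al. in 1998. It was designed for handwritten digit recognition (for example, reading ZIP codes and bank cheques). The network takes a 32×32 grayscale image and passes it through two convolution-plus-subsampling stages (C1/S2 with 6 feature maps and C3/S4 with 16 feature maps, using 5×5 kernels and 2×2 subsampling), a third convolutional layer C5 with 120 feature maps, a fully connected layer F6 with 84 units, and a 10-way output layer. This pattern of alternating convolution and pooling followed by fully connected layers, with shared weights trained end to end by backpropagation, is the template that later computer vision models such as AlexNet and VGGNet scaled up.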

- Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner (1998)
“Gradient-Based Learning Applied to Document Recognition”
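
As a quick check on the LeNet-5 layer sizes, the following sketch traces the spatial size of the feature maps from a 32x32 input through C1, S2, C3, S4, and C5. The layer table is written by hand from the commonly cited LeNet-5 configuration, and the helper name `conv_out` is hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kind, kernel, stride, output channels)
LENET5_LAYERS = [
    ("C1", "conv", 5, 1, 6),
    ("S2", "pool", 2, 2, 6),
    ("C3", "conv", 5, 1, 16),
    ("S4", "pool", 2, 2, 16),
    ("C5", "conv", 5, 1, 120),
]

size = 32  # LeNet-5 expects 32x32 inputs
shapes = []
for name, kind, kernel, stride, out_ch in LENET5_LAYERS:
    size = conv_out(size, kernel, stride)
    shapes.append((name, size, size, out_ch))
    print(f"{name} ({kind}): {size}x{size}x{out_ch}")
```

The trace ends at a 1x1 map with 120 channels, which is why C5 can be flattened directly into the 84-unit fully connected layer.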

3.Compare and contrast AlexNet and VGGNet in terms of design principles,
number of parameters, and performance. Highlight key innovations and limitations of
each.

- AlexNet and VGGNet are two landmark convolutional neural network architectures that significantly influenced the evolution of deep learning in computer vision. While both were designed for large-scale image classification on the ImageNet dataset, they differ notably in their design philosophy, complexity, and performance.
- AlexNet, introduced in 2012 by Alex Krizhevsky and colleagues, was the first deep CNN to demonstrate a dramatic improvement in image classification accuracy on ImageNet. Its architecture consists of eight weight layers: five convolutional layers followed by three fully connected layers, with roughly 60 million parameters. AlexNet employs relatively large convolution filters (11×11 and 5×5) in the initial layers, which reduce spatial dimensions quickly. Its key innovations were ReLU activations, dropout in the fully connected layers, data augmentation, and GPU-based training, which together made deep learning feasible on the hardware of the time. Its main limitations are the coarse, hand-tuned large filters and its comparatively shallow depth.
- VGGNet, proposed in 2014 by Simonyan and Zisserman, was designed on the principle that increasing network depth with small, consistent convolution filters leads to better performance. VGGNet uses only 3×3 convolution kernels stacked sequentially, which enlarges the receptive field while keeping the architecture simple and uniform. Popular versions such as VGG-16 and VGG-19 contain 16 and 19 weight layers respectively, making them much deeper than AlexNet and more accurate on ImageNet. The cost is size: VGG-16 has roughly 138 million parameters, most of them in the fully connected layers, so it is slow to train and heavy in memory.
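
The claim that stacked 3×3 kernels grow the receptive field cheaply can be checked with simple arithmetic. This is a minimal sketch assuming stride-1 convolutions and 256 input and output channels; the channel width is an illustrative choice, not a figure from either paper.

```python
def receptive_field(num_layers, kernel=3):
    """Receptive field of a stack of stride-1 convolutions with the same kernel."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel, channels):
    """Weight count (no biases) for a conv layer with equal in/out channels."""
    return kernel * kernel * channels * channels


channels = 256  # assumed channel width for illustration

stacked = 3 * conv_weights(3, channels)  # three stacked 3x3 layers
single = conv_weights(7, channels)       # one 7x7 layer with the same receptive field

print(f"Receptive field of three 3x3 layers: {receptive_field(3)}")
print(f"Weights, three 3x3 layers: {stacked:,}")
print(f"Weights, one 7x7 layer:    {single:,}")
```

Three 3×3 layers cover the same 7×7 window as a single 7×7 layer with fewer weights, and they add two extra nonlinearities along the way.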

4.What is transfer learning in the context of image classification? Explain
how it helps in reducing computational costs and improving model performance with
limited data.
- Transfer learning in the context of image classification is a deep learning technique in which a model trained on a large and general image dataset is reused for a new but related task. Instead of training a convolutional neural network (CNN) from scratch, a pretrained model such as VGG, ResNet, Inception, or MobileNet, already trained on a large dataset like ImageNet, is adapted to classify images from a new target dataset.
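- In image classification, the early layers of a CNN learn generic visual features such as edges, corners, textures, and shapes, while the deeper layers learn task-specific features. Transfer learning leverages this property by reusing the learned weights of the early layers and modifying or retraining only the final layers for the new classification problem. Because most weights stay frozen, far fewer parameters are updated, which shortens training, and the pretrained features reduce the risk of overfitting when labeled data is limited.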
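
To illustrate the computational saving, the sketch below counts how many parameters stay frozen and how many are trained when a VGG16 convolutional base is reused with a new classifier head. The head sizes (256 hidden units, 4 classes) are assumptions chosen for illustration.

```python
# VGG16 convolutional configuration: output channels per 3x3 conv layer,
# with "M" marking a 2x2 max-pooling layer (which has no parameters).
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def conv_base_params(cfg, in_channels=3):
    """Total weights plus biases of the 3x3 conv layers described by cfg."""
    total = 0
    for item in cfg:
        if item == "M":
            continue
        total += 3 * 3 * in_channels * item + item
        in_channels = item
    return total


def dense_params(n_inputs, n_units):
    """Weights plus biases for a fully connected layer."""
    return n_inputs * n_units + n_units


frozen = conv_base_params(VGG16_CFG)

# Assumed new head: flattened 7x7x512 features -> 256 units -> 4 classes
flat_features = 7 * 7 * 512
trainable = dense_params(flat_features, 256) + dense_params(256, 4)

print(f"Frozen backbone parameters: {frozen:,}")
print(f"Trainable head parameters:  {trainable:,}")
print(f"Share being trained:        {trainable / (frozen + trainable):.1%}")
```

Only the head receives gradient updates; the frozen backbone needs a forward pass but no gradients or optimizer state for its weights.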

5.Describe the role of residual connections in ResNet architecture. How do
they address the vanishing gradient problem in deep CNNs?
- Residual connections are the core idea behind the ResNet (Residual Network) architecture, introduced by He et al. in 2015, and they enable the successful training of very deep convolutional neural networks (50, 101, or even 152 layers).
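- Mathematically, instead of learning a target mapping $H(x)$ directly, the network learns the residual $F(x) = H(x) - x$, and the final output becomes $y = F(x) + x$. This makes it easier for the network to learn identity mappings when deeper layers are not needed. Because $\partial y / \partial x = \partial F / \partial x + 1$, the skip connection also gives the gradient a direct path back to earlier layers, so it is far less likely to vanish as depth grows.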
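
The effect of the skip connection on gradients can be seen in a deliberately simplified toy model. The sketch assumes scalar linear layers with the same small weight in every layer; real residual blocks are nonlinear and multi-dimensional, so this only illustrates the identity term in the derivative. The depth and weight values are arbitrary.

```python
def plain_gradient(weight, depth):
    """d(output)/d(input) for a chain of plain layers h = weight * x."""
    grad = 1.0
    for _ in range(depth):
        grad *= weight            # each layer scales the gradient by `weight`
    return grad


def residual_gradient(weight, depth):
    """Same chain, but every layer is a residual block y = weight * x + x."""
    grad = 1.0
    for _ in range(depth):
        grad *= weight + 1.0      # the skip connection adds the identity term
    return grad


def residual_block(x, weight):
    """Residual block with a linear branch F(x) = weight * x."""
    return weight * x + x


depth, weight = 20, 0.1  # assumed toy values

print(f"Plain network gradient:    {plain_gradient(weight, depth):.3e}")
print(f"Residual network gradient: {residual_gradient(weight, depth):.3e}")
print(residual_block(3.0, 0.0))  # with F(x) = 0 the block acts as the identity
```

With a small per-layer weight the plain chain's gradient collapses toward zero, while the residual chain keeps a gradient of at least 1 because of the added identity term.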

6.Implement the LeNet-5 architecture using TensorFlow or PyTorch to
classify the MNIST dataset. Report the accuracy and training time.

In [1]:
import tensorflow as tf
from tensorflow.keras import layers, models
import time

# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Preprocessing
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0

# Pad images to 32x32 as required by LeNet-5
x_train = tf.pad(x_train, [[0,0],[2,2],[2,2],[0,0]])
x_test = tf.pad(x_test, [[0,0],[2,2],[2,2],[0,0]])

# Build LeNet-5 model
model = models.Sequential([
    layers.Input(shape=(32, 32, 1)),

    # C1 + S2: 6 feature maps, then 2x2 average pooling
    layers.Conv2D(6, kernel_size=5, activation='tanh'),
    layers.AveragePooling2D(pool_size=2),  # pool_size passed explicitly to avoid the TypeError

    # C3 + S4: 16 feature maps, then 2x2 average pooling
    layers.Conv2D(16, kernel_size=5, activation='tanh'),
    layers.AveragePooling2D(pool_size=2),

    # C5: a 5x5 convolution on a 5x5 map yields 120 maps of size 1x1
    layers.Conv2D(120, kernel_size=5, activation='tanh'),
    layers.Flatten(),

    # F6 and the 10-way output layer
    layers.Dense(84, activation='tanh'),
    layers.Dense(10, activation='softmax')
])

# Compile model
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Train model and measure time
start_time = time.time()

history = model.fit(
    x_train, y_train,
    epochs=10,
    batch_size=128,
    validation_split=0.1,
    verbose=1
)

training_time = time.time() - start_time

# Evaluate model
test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)

print(f"\nTest Accuracy: {test_accuracy * 100:.2f}%")
print(f"Training Time: {training_time:.2f} seconds")

7.Use a pre-trained VGG16 model (via transfer learning) on a small custom
dataset (e.g., flowers or animals). Replace the top layers and fine-tune the model.
Include your code and result discussion.

In [None]:
import tensorflow as tf
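from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import time

# Image parameters
IMG_SIZE = (224, 224)
BATCH_SIZE = 32
NUM_CLASSES = 4

# Data generators
# VGG16's ImageNet weights expect inputs processed by preprocess_input
# (RGB -> BGR with ImageNet mean subtraction), not a plain 1/255 rescale.
train_gen = ImageDataGenerator(
    preprocessing_function=preprocess_input,
    rotation_range=20,
    zoom_range=0.2,
    horizontal_flip=True
)

val_gen = ImageDataGenerator(preprocessing_function=preprocess_input)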

train_data = train_gen.flow_from_directory(
    "flowers/train",
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode="categorical"
)

val_data = val_gen.flow_from_directory(
    "flowers/validation",
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode="categorical"
)

# Load pre-trained VGG16 (without top layers)
base_model = VGG16(
    weights="imagenet",
    include_top=False,
    input_shape=(224, 224, 3)
)

# Freeze base model
base_model.trainable = False

# Custom classifier
model = models.Sequential([
    base_model,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax")
])

# Compile model
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy"]
)

# Train model
start_time = time.time()

history = model.fit(
    train_data,
    validation_data=val_data,
    epochs=10
)

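training_time = time.time() - start_time

# Fine-tuning: unfreeze only the last four VGG16 layers (block5)
base_model.trainable = True

for layer in base_model.layers[:-4]:
    layer.trainable = False

# Recompile so the new trainable flags take effect; use a low learning rate
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss="categorical_crossentropy",
    metrics=["accuracy"]
)

finetune_start = time.time()

history_finetune = model.fit(
    train_data,
    validation_data=val_data,
    epochs=5
)

# Include the fine-tuning phase in the reported total
training_time += time.time() - finetune_start

print(f"Total Training Time: {training_time:.2f} seconds")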


8.Write a program to visualize the filters and feature maps of the first
convolutional layer of AlexNet on an example input image.

In [None]:
import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np

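# Load pretrained AlexNet (the `weights` argument replaces the deprecated
# `pretrained=True` in torchvision >= 0.13)
alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)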
alexnet.eval()

# Extract first convolutional layer
first_conv = alexnet.features[0]

# -------------------------------
# 1. Visualize Filters (Kernels)
# -------------------------------
filters = first_conv.weight.data.clone()

# Normalize for visualization
filters = (filters - filters.min()) / (filters.max() - filters.min())

plt.figure(figsize=(8, 8))
for i in range(16):  # show first 16 filters
    plt.subplot(4, 4, i+1)
    kernel = filters[i]
    kernel = kernel.permute(1, 2, 0)  # CxHxW -> HxWxC
    plt.imshow(kernel)
    plt.axis("off")

plt.suptitle("AlexNet First Convolution Layer Filters")
plt.show()

# -------------------------------
# 2. Visualize Feature Maps
# -------------------------------

# Load example image
image = Image.open("example.jpg").convert("RGB")

# Preprocessing (AlexNet input format)
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])

input_tensor = transform(image).unsqueeze(0)

# Forward pass through first conv layer
with torch.no_grad():
    feature_maps = first_conv(input_tensor)

# Drop the batch dimension and convert to NumPy for plotting
feature_maps = feature_maps.squeeze(0).cpu().numpy()

plt.figure(figsize=(8, 8))
for i in range(16):  # show first 16 feature maps
    plt.subplot(4, 4, i+1)
    plt.imshow(feature_maps[i], cmap="gray")
    plt.axis("off")

plt.suptitle("Feature Maps from First Conv Layer")
plt.show()


9.Train a GoogLeNet (Inception v1) or its variant using a standard dataset
like CIFAR-10. Plot the training and validation accuracy over epochs and analyze
overfitting or underfitting.


In [None]:
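from tensorflow.keras import layers, models


# Inception block. The argument order (1x1, 3x3 reduce, 3x3, 5x5 reduce, 5x5,
# pool projection) is an assumption based on the usual GoogLeNet convention.
def inception_module(x, f1, f3_reduce, f3, f5_reduce, f5, pool_proj):
    branch1 = layers.Conv2D(f1, (1,1), padding='same', activation='relu')(x)

    branch3 = layers.Conv2D(f3_reduce, (1,1), padding='same', activation='relu')(x)
    branch3 = layers.Conv2D(f3, (3,3), padding='same', activation='relu')(branch3)

    branch5 = layers.Conv2D(f5_reduce, (1,1), padding='same', activation='relu')(x)
    branch5 = layers.Conv2D(f5, (5,5), padding='same', activation='relu')(branch5)

    pool = layers.MaxPooling2D((3,3), strides=(1,1), padding='same')(x)
    pool = layers.Conv2D(pool_proj, (1,1), padding='same', activation='relu')(pool)

    # Concatenate the four branches along the channel axis
    return layers.Concatenate()([branch1, branch3, branch5, pool])


def GoogLeNet_CIFAR10():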
    inputs = layers.Input(shape=(32, 32, 3))

    x = layers.Conv2D(64, (3,3), padding='same', activation='relu')(inputs)
    x = layers.MaxPooling2D((2,2))(x)

    x = inception_module(x, 32, 48, 64, 8, 16, 16)
    x = inception_module(x, 64, 64, 96, 16, 32, 32)

    x = layers.MaxPooling2D((2,2))(x)

    x = inception_module(x, 96, 64, 96, 16, 32, 32)

    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(256, activation='relu')(x)
    x = layers.Dropout(0.4)(x)

    outputs = layers.Dense(10, activation='softmax')(x)

    return models.Model(inputs, outputs)

model = GoogLeNet_CIFAR10()
model.summary()
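
For the overfitting and underfitting analysis, one simple approach is to compare the final training and validation accuracy. The helper below is a minimal sketch: the function name `diagnose_fit` and both thresholds are hypothetical choices, and the accuracy curves are made-up numbers used only to exercise the function.

```python
def diagnose_fit(train_acc, val_acc, gap_threshold=0.10, low_threshold=0.70):
    """Label a training run from per-epoch accuracy lists (values in [0, 1]).

    The thresholds are illustrative assumptions, not standard values.
    """
    final_train, final_val = train_acc[-1], val_acc[-1]
    gap = final_train - final_val
    if final_train < low_threshold:
        return "underfitting"      # the model does not even fit the training set
    if gap > gap_threshold:
        return "overfitting"       # training accuracy far above validation accuracy
    return "reasonable fit"


# Made-up accuracy curves, for illustration only
print(diagnose_fit([0.55, 0.80, 0.95, 0.99], [0.52, 0.70, 0.74, 0.73]))
print(diagnose_fit([0.30, 0.45, 0.55, 0.60], [0.29, 0.44, 0.53, 0.58]))
```

If the model were trained with `model.fit` using `metrics=['accuracy']` and validation data, the two lists would come from `history.history['accuracy']` and `history.history['val_accuracy']`.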


10.You are working in a healthcare AI startup. Your team is tasked with
developing a system that automatically classifies medical X-ray images into normal,
pneumonia, and COVID-19. Due to limited labeled data, what approach would you
suggest using among CNN architectures discussed (e.g., transfer learning with ResNet
or Inception variants)? Justify your approach and outline a deployment strategy for
production use?

- Recommended Approach
- Model Choice Justification
- Data Preparation
- Explainability
- Continuous Learning in Production