# Image Classification using CNN Architectures

Question 1: What is a Convolutional Neural Network (CNN), and how does it differ from
traditional fully connected neural networks in terms of architecture and performance on
image data?

**Answer:**

A **Convolutional Neural Network (CNN)** is a type of deep learning model mainly used for **image and visual data processing**. It automatically learns spatial features such as edges, textures, and shapes from images using convolution operations.

**Difference from Fully Connected Neural Networks:**

* CNNs use **convolutional layers and pooling layers**, whereas traditional neural networks use only fully connected layers.
* CNNs have **fewer parameters** due to weight sharing and local connections, making them more efficient.
* CNNs preserve the **spatial structure** of images, while fully connected networks flatten images and lose spatial information.
* CNNs generally perform **better on image data** because they learn hierarchical features (edges, then textures, then shapes), and their smaller parameter count lowers the risk of overfitting.
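
To make the parameter savings from weight sharing concrete, here is a rough back-of-the-envelope sketch. The layer sizes (a 28×28 grayscale input, 100 hidden units, six 5×5 filters) and the helper names are assumptions chosen for illustration only.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units


def conv_params(kernel_h, kernel_w, in_channels, n_filters):
    """Weights plus one bias per filter; independent of the image size."""
    return (kernel_h * kernel_w * in_channels + 1) * n_filters


# Hypothetical sizes, chosen only for illustration.
fc = dense_params(28 * 28 * 1, 100)  # flattened 28x28 image -> 100 units
conv = conv_params(5, 5, 1, 6)       # six 5x5 filters on one input channel

print("fully connected layer:", fc)
print("convolutional layer:  ", conv)
```

Under these assumptions, the convolutional layer needs several hundred times fewer parameters, because each filter is reused at every image position.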

**Conclusion:**
CNNs are specifically designed for image data and significantly outperform traditional fully connected networks in computer vision tasks.


Question 2: Discuss the architecture of LeNet-5 and explain how it laid the foundation
for modern deep learning models in computer vision. Include references to its original
research paper.

**Answer:**

LeNet-5 is one of the earliest convolutional neural network (CNN) architectures, proposed by **Yann LeCun et al.** in 1998 for handwritten digit recognition. It was designed to classify images such as digits from the MNIST dataset.

**Architecture of LeNet-5:**
LeNet-5 takes a 32×32 grayscale image and alternates **convolutional layers and pooling (subsampling) layers**, followed by fully connected layers. The first convolution layer (C1, six 5×5 filters) extracts basic features, and an average-pooling layer (S2) halves the spatial dimensions. The pattern repeats with C3 (sixteen 5×5 filters) and S4, after which C5 (120 units) and F6 (84 units) feed a 10-way output layer that performs classification.
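
The alternating pattern can be traced numerically. The sketch below assumes the original paper's 32×32 input, 5×5 kernels without padding, and 2×2 subsampling; the helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution on a square input."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2):
    """Output width/height of non-overlapping pooling."""
    return (size - window) // window + 1


# (layer name, layer kind, kernel or window size)
layers = [
    ("C1", "conv", 5),
    ("S2", "pool", 2),
    ("C3", "conv", 5),
    ("S4", "pool", 2),
]

size = 32
sizes = {"input": size}
for name, kind, k in layers:
    size = conv_out(size, k) if kind == "conv" else pool_out(size, k)
    sizes[name] = size

# S4 has 16 feature maps, so this is the vector fed to the dense layers.
flattened = sizes["S4"] ** 2 * 16

print(sizes)
print("flattened features:", flattened)
```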

**Foundation for Modern Deep Learning:**
LeNet-5 introduced key concepts such as **local receptive fields**, **shared weights**, and **subsampling (pooling)**, which are fundamental to modern CNNs. It demonstrated that convolutional architectures can effectively learn spatial hierarchies in images. Modern deep learning models build on these ideas by using deeper networks, better activation functions, and improved optimization techniques.

**Reference:**
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). *Gradient-based learning applied to document recognition*. Proceedings of the IEEE, 86(11), 2278–2324.


Question 3: Compare and contrast AlexNet and VGGNet in terms of design principles,
number of parameters, and performance. Highlight key innovations and limitations of
each.


**Answer:**

**AlexNet** and **VGGNet** are both influential CNN architectures, but they differ in design and complexity.

**Design Principles:**
AlexNet uses **larger convolution filters** (11×11 in the first layer, then 5×5 and 3×3) and popularized key ideas such as **ReLU activation**, **dropout**, and **data augmentation** for large-scale image classification.
VGGNet follows a **simpler and deeper design**, using only **small 3×3 convolution filters** stacked together to increase depth.
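
One way to see why stacking small filters is attractive: a stack of 3×3 convolutions covers the same receptive field as one larger filter while using fewer weights. The sketch below assumes stride-1 convolutions with the same number of input and output channels (64 is an arbitrary illustrative value) and ignores biases.

```python
def stacked_receptive_field(n_layers, kernel=3):
    """Receptive field of n stacked stride-1 convolutions."""
    return 1 + n_layers * (kernel - 1)


def conv_weights(kernel, channels, n_layers=1):
    """Weight count (biases ignored) with `channels` in and out."""
    return n_layers * kernel * kernel * channels * channels


C = 64  # hypothetical channel count

rf_two = stacked_receptive_field(2)    # same field as one 5x5 filter
rf_three = stacked_receptive_field(3)  # same field as one 7x7 filter

w_stack3 = conv_weights(3, C, n_layers=3)
w_single7 = conv_weights(7, C)

print("receptive fields:", rf_two, rf_three)
print("three 3x3 layers:", w_stack3, "weights")
print("one 7x7 layer:   ", w_single7, "weights")
```

The stacked version also inserts a non-linearity between each pair of layers, which a single large filter does not.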

**Number of Parameters:**
AlexNet has about **60 million parameters**, while VGGNet has significantly more (around **138 million parameters** in VGG-16), making VGGNet more memory-intensive.

**Performance:**
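VGGNet generally achieves **higher accuracy** than AlexNet due to its greater depth and uniform architecture: on ImageNet (ILSVRC), AlexNet reported a top-5 error of about 15.3% in 2012, while VGG reached roughly 7.3% in 2014. However, it requires more computation and memory.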

**Key Innovations and Limitations:**
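AlexNet’s main contribution was showing that deep CNNs trained on GPUs are practical at ImageNet scale, but with 8 learned layers (5 convolutional, 3 fully connected) it is relatively shallow by today’s standards.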
VGGNet improves feature learning through depth but is limited by its **high computational cost and large model size**.


Question 4: What is transfer learning in the context of image classification? Explain
how it helps in reducing computational costs and improving model performance with
limited data.

**Answer:**
Transfer learning in image classification is a technique where a **pre-trained model** (trained on a large dataset like ImageNet) is reused for a new but related task. Instead of training a model from scratch, the learned features are transferred to the new problem.

It reduces **computational cost** because most layers are already trained and do not need extensive retraining. It also **improves performance with limited data** by using previously learned features such as edges, textures, and shapes, which helps prevent overfitting and speeds up training.
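
As a rough illustration of the cost saving, the sketch below counts how many parameters actually need training when a VGG16 convolutional base is frozen and only a small classifier head is trained. The head layout (flatten, a 128-unit dense layer, then the output layer) and the 5-class dataset are assumptions for illustration; 14,714,688 is the parameter count of the Keras VGG16 convolutional base.

```python
VGG16_BASE_PARAMS = 14_714_688  # Keras VGG16 with include_top=False


def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit."""
    return n_inputs * n_units + n_units


n_classes = 5               # hypothetical small dataset
flattened = 7 * 7 * 512     # VGG16 feature map for a 224x224 input

# Only the new classifier head is trained while the base is frozen.
head = dense_params(flattened, 128) + dense_params(128, n_classes)
total = VGG16_BASE_PARAMS + head
trainable_fraction = head / total

print("trainable parameters:", head)
print(f"fraction of model trained: {trainable_fraction:.1%}")
```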


Question 5: Describe the role of residual connections in ResNet architecture. How do
they address the vanishing gradient problem in deep CNNs?

**Answer:**
Residual connections in ResNet add **shortcut paths** that pass the input of a block directly to its output. Instead of learning the full mapping H(x), the stacked layers learn a **residual function** F(x) = H(x) − x, and the block outputs H(x) = F(x) + x.

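These connections mitigate the **vanishing gradient problem**: because the derivative of the shortcut term is 1, gradients can flow directly through the identity paths during backpropagation instead of being repeatedly multiplied by small layer derivatives. As a result, very deep CNNs can be trained without the accuracy degradation seen in plain deep networks.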
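
A scalar toy model can illustrate the effect. Suppose each layer's function F has a small local derivative (0.1 here, an arbitrary illustrative value). In a plain network the backpropagated gradient is the product of those derivatives; with a shortcut, each block contributes F'(x) + 1 instead. This is a simplified sketch, not a faithful model of a real ResNet.

```python
def chain_gradient(local_grads, residual=False):
    """Gradient through a stack of layers via the chain rule.

    Plain layer:    d/dx F(x)       = F'(x)
    Residual block: d/dx (F(x) + x) = F'(x) + 1
    """
    g = 1.0
    for d in local_grads:
        g *= (1.0 + d) if residual else d
    return g


local_grads = [0.1] * 20  # 20 layers, each with a small local derivative

plain = chain_gradient(local_grads)
with_skip = chain_gradient(local_grads, residual=True)

print(f"plain network gradient:    {plain:.3e}")
print(f"residual network gradient: {with_skip:.3e}")
```

In the plain stack the gradient shrinks towards zero, while the identity term keeps the residual gradient from vanishing.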


Question 6: Implement the LeNet-5 architecture using TensorFlow or PyTorch to
classify the MNIST dataset. Report the accuracy and training time.


In [None]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, AveragePooling2D, Dense, Flatten
from tensorflow.keras.datasets import mnist

# Load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(-1,28,28,1)/255.0
X_test = X_test.reshape(-1,28,28,1)/255.0

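# LeNet-5 Model
model = Sequential([
    # padding='same' keeps the 28x28 MNIST input at 28x28 after the first conv,
    # matching the feature-map sizes of the original 32x32 design
    Conv2D(6, (5,5), activation='tanh', padding='same', input_shape=(28,28,1)),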
    AveragePooling2D(),
    Conv2D(16, (5,5), activation='tanh'),
    AveragePooling2D(),
    Flatten(),
    Dense(120, activation='tanh'),
    Dense(84, activation='tanh'),
    Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

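# Train model and measure wall-clock training time
import time
start = time.time()
history = model.fit(X_train, y_train, epochs=5, batch_size=64)
training_time = time.time() - start

# Evaluate on the held-out test set
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {accuracy:.4f}")
print(f"Training Time: {training_time:.1f} s")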


Question 7: Use a pre-trained VGG16 model (via transfer learning) on a small custom
dataset (e.g., flowers or animals). Replace the top layers and fine-tune the model.
Include your code and result discussion.


In [None]:
import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Flatten
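from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.vgg16 import preprocess_input

# Data loading: use VGG16's own preprocessing (BGR channel order, ImageNet
# mean subtraction) so inputs match what the pre-trained weights expect
datagen = ImageDataGenerator(preprocessing_function=preprocess_input,
                             validation_split=0.2)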

train = datagen.flow_from_directory(
    "dataset", target_size=(224,224),
    class_mode="categorical", subset="training")

val = datagen.flow_from_directory(
    "dataset", target_size=(224,224),
    class_mode="categorical", subset="validation")

# Load VGG16
base_model = VGG16(weights="imagenet", include_top=False, input_shape=(224,224,3))
base_model.trainable = False

# Custom classifier
x = Flatten()(base_model.output)
x = Dense(128, activation='relu')(x)
output = Dense(train.num_classes, activation='softmax')(x)

model = Model(base_model.input, output)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train
model.fit(train, validation_data=val, epochs=5)

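# Fine-tuning: unfreeze only the last convolutional block (block5) and
# recompile with a low learning rate to preserve the pre-trained features
base_model.trainable = True
for layer in base_model.layers:
    layer.trainable = layer.name.startswith("block5")
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train, validation_data=val, epochs=3)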


Question 8: Write a program to visualize the filters and feature maps of the first
convolutional layer of AlexNet on an example input image.


In [None]:
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
from tensorflow.keras.layers import Conv2D, Input
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing import image

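# AlexNet-style first conv layer (96 filters, 11x11, stride 4).
# Note: Keras ships no pre-trained AlexNet, so these weights are randomly
# initialized; load trained weights into this layer to see learned filters.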
inputs = Input(shape=(224,224,3))
conv1 = Conv2D(96, (11,11), strides=4, activation='relu')(inputs)
model = Model(inputs, conv1)

# Load and preprocess image
img = image.load_img("sample.jpg", target_size=(224,224))
img = image.img_to_array(img) / 255.0
img = np.expand_dims(img, axis=0)

# Get feature maps
feature_maps = model.predict(img)

# Plot feature maps
plt.figure(figsize=(10,5))
for i in range(6):
    plt.subplot(2,3,i+1)
    plt.imshow(feature_maps[0,:,:,i], cmap='gray')
    plt.axis('off')
plt.show()


In [None]:
filters, bias = model.layers[1].get_weights()

plt.figure(figsize=(10,5))
for i in range(6):
    f = filters[:,:,:,i]
    f = (f - f.min()) / (f.max() - f.min())
    plt.subplot(2,3,i+1)
    plt.imshow(f)
    plt.axis('off')
plt.show()


Question 9: Train a GoogLeNet (Inception v1) or its variant using a standard dataset
like CIFAR-10. Plot the training and validation accuracy over epochs and analyze
overfitting or underfitting.

In [None]:
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras.layers import Conv2D, MaxPooling2D, concatenate, Input, GlobalAveragePooling2D, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

# Load CIFAR-10
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
X_train, X_test = X_train/255.0, X_test/255.0
y_train, y_test = to_categorical(y_train,10), to_categorical(y_test,10)

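# Simplified Inception block: parallel 1x1, 3x3, and 5x5 convolutions plus
# 3x3 max pooling, concatenated along the channel axis. The 1x1
# dimension-reduction convolutions of the original GoogLeNet are omitted.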
def inception_block(x):
    p1 = Conv2D(32, (1,1), activation='relu', padding='same')(x)
    p2 = Conv2D(32, (3,3), activation='relu', padding='same')(x)
    p3 = Conv2D(32, (5,5), activation='relu', padding='same')(x)
    p4 = MaxPooling2D((3,3), strides=(1,1), padding='same')(x)
    return concatenate([p1, p2, p3, p4])

# Model
inputs = Input(shape=(32,32,3))
x = inception_block(inputs)
x = GlobalAveragePooling2D()(x)
outputs = Dense(10, activation='softmax')(x)

model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

history = model.fit(
    X_train, y_train,
    validation_data=(X_test, y_test),
    epochs=10,
    batch_size=64
)

# Plot Accuracy
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.legend()
plt.show()
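
For the overfitting or underfitting analysis, one simple approach is to compare the final training and validation accuracy. The helper below is a minimal sketch: the function name, the 0.05 gap tolerance, and the 0.60 accuracy floor are all hypothetical choices, and the example curves are made up. In the notebook, the inputs would be `history.history['accuracy']` and `history.history['val_accuracy']`.

```python
def diagnose_fit(train_acc, val_acc, gap_tol=0.05, low_acc=0.60):
    """Classify a training run from its per-epoch accuracy curves.

    A large train/validation gap suggests overfitting; low training
    accuracy with a small gap suggests underfitting.
    """
    final_train, final_val = train_acc[-1], val_acc[-1]
    gap = final_train - final_val
    if gap > gap_tol:
        return "overfitting"
    if final_train < low_acc:
        return "underfitting"
    return "reasonable fit"


# Made-up curves, for illustration only.
print(diagnose_fit([0.50, 0.75, 0.92], [0.48, 0.65, 0.68]))
print(diagnose_fit([0.30, 0.38, 0.42], [0.29, 0.37, 0.41]))
print(diagnose_fit([0.55, 0.70, 0.78], [0.54, 0.69, 0.76]))
```

The thresholds should be adjusted to the task; a fixed cutoff is only a starting point for reading the plotted curves.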


Question 10: You are working in a healthcare AI startup. Your team is tasked with
developing a system that automatically classifies medical X-ray images into normal,
pneumonia, and COVID-19. Due to limited labeled data, what approach would you
suggest using among CNN architectures discussed (e.g., transfer learning with ResNet
or Inception variants)? Justify your approach and outline a deployment strategy for
production use.

In [None]:
# Model Training (Transfer Learning - ResNet50)
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)

train = datagen.flow_from_directory(
    "dataset", target_size=(224,224),
    class_mode="categorical", subset="training")

val = datagen.flow_from_directory(
    "dataset", target_size=(224,224),
    class_mode="categorical", subset="validation")

base = ResNet50(weights="imagenet", include_top=False, input_shape=(224,224,3))
base.trainable = False

x = GlobalAveragePooling2D()(base.output)
out = Dense(3, activation="softmax")(x)

model = Model(base.input, out)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train, validation_data=val, epochs=5)

model.save("xray_model.h5")


In [None]:
# Deployment
from fastapi import FastAPI, File, UploadFile
import tensorflow as tf, numpy as np
from PIL import Image
import io

app = FastAPI()
model = tf.keras.models.load_model("xray_model.h5")
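# Class order must match train.class_indices from training
# (flow_from_directory sorts class folder names alphabetically)
classes = ["Normal", "Pneumonia", "COVID"]

@app.post("/predict")
async def predict(file: UploadFile = File(...)):
    # Convert to RGB so grayscale X-rays get the 3 channels the model expects
    img = Image.open(io.BytesIO(await file.read())).convert("RGB").resize((224,224))
    img = np.expand_dims(np.array(img)/255, axis=0)
    pred = model.predict(img)
    return {"class": classes[int(np.argmax(pred))]}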
