#                 Internship Task Submission

---

###  Task Title: **Comparative Study of Deep Learning Models on MNIST Dataset**



###  Submitted By:
**Name**: Dhiraj Kumar  
**Department**: BTech(Honours) – Computer Science & Engineering in Artificial Intelligence  
**College**: Chhattisgarh Swami Vivekanand Technical University,Bhilai



###  Submitted To:
**Professor's Name**: Dr Antriksh Goswami  
**Designation**: Assistant Professor  
**Department**: CSE Department   
**Institution**:National Institute of Technology, Patna



##  Task Overview
**Section 1**:   Dataset Loading & Preprocessing   
**Section 2**:   LeNet Model – Training & Evaluation on MNIST   
**Section 3**:   ResNet Model – Training & Evaluation on MNIST  
**Section 4**:   VGG16 Model – Training & Evaluation on MNIST  
**Section 5**:   Transformer Model – Training & Evaluation on MNIST  
**Section 6**:   Details about task  








In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


#Section 1: Dataset Loading & Preprocessing

In [2]:
import numpy as np
import struct

def load_images(filename):
    with open(filename, 'rb') as f:
        magic, num, rows, cols = struct.unpack('>IIII', f.read(16))
        images = np.frombuffer(f.read(), dtype=np.uint8)
        images = images.reshape(num, rows, cols, 1)  # Shape: (num, 28, 28, 1)
        return images.astype(np.float32) / 255.0  # Normalize to [0, 1]

def load_labels(filename):
    with open(filename, 'rb') as f:
        magic, num = struct.unpack('>II', f.read(8))
        labels = np.frombuffer(f.read(), dtype=np.uint8)
        return labels


In [3]:
base_path = '/content/drive/MyDrive/mnist_data'

x_train = load_images(f'{base_path}/train-images-idx3-ubyte/train-images-idx3-ubyte')
y_train = load_labels(f'{base_path}/train-labels-idx1-ubyte/train-labels-idx1-ubyte')
x_test = load_images(f'{base_path}/t10k-images-idx3-ubyte/t10k-images-idx3-ubyte')
y_test = load_labels(f'{base_path}/t10k-labels-idx1-ubyte/t10k-labels-idx1-ubyte')
print("Train shape:", x_train.shape, y_train.shape)
print("Test shape:", x_test.shape, y_test.shape)


Train shape: (60000, 28, 28, 1) (60000,)
Test shape: (10000, 28, 28, 1) (10000,)


#Section 2: LeNet Model – Training & Evaluation on MNIST

In [4]:
import tensorflow as tf
from tensorflow.keras import layers, models
from sklearn.metrics import classification_report, accuracy_score, precision_score, recall_score

# Define LeNet model architecture (with proper Input layer and pooling config)
def create_lenet():
    model = models.Sequential([
        layers.Input(shape=(28, 28, 1)),  # Recommended way to define input shape
        layers.Conv2D(6, kernel_size=5, activation='relu', padding='same'),
        layers.AveragePooling2D(pool_size=2),
        layers.Conv2D(16, kernel_size=5, activation='relu'),
        layers.AveragePooling2D(pool_size=2),
        layers.Flatten(),
        layers.Dense(120, activation='relu'),
        layers.Dense(84, activation='relu'),
        layers.Dense(10, activation='softmax')
    ])
    return model

# Build and compile
lenet = create_lenet()
lenet.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
lenet.fit(x_train, y_train, epochs=5, batch_size=64, validation_split=0.1)

# Evaluate on test data
test_loss, test_acc = lenet.evaluate(x_test, y_test, verbose=0)
print(f"\n LeNet Test Accuracy: {test_acc:.4f}")

# Generate predictions
y_pred = lenet.predict(x_test).argmax(axis=1)

# Print detailed classification report
print("\n LeNet Classification Report:")
print(classification_report(y_test, y_pred, digits=4))

# Optional: Store metrics if comparing later
lenet_metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, average='weighted'),
    "recall": recall_score(y_test, y_pred, average='weighted')
}


Epoch 1/5
[1m844/844[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 6ms/step - accuracy: 0.8278 - loss: 0.5891 - val_accuracy: 0.9770 - val_loss: 0.0848
Epoch 2/5
[1m844/844[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 3ms/step - accuracy: 0.9701 - loss: 0.0959 - val_accuracy: 0.9835 - val_loss: 0.0540
Epoch 3/5
[1m844/844[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 3ms/step - accuracy: 0.9796 - loss: 0.0664 - val_accuracy: 0.9867 - val_loss: 0.0469
Epoch 4/5
[1m844/844[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 3ms/step - accuracy: 0.9842 - loss: 0.0504 - val_accuracy: 0.9847 - val_loss: 0.0536
Epoch 5/5
[1m844/844[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 3ms/step - accuracy: 0.9881 - loss: 0.0404 - val_accuracy: 0.9898 - val_loss: 0.0411

 LeNet Test Accuracy: 0.9879
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 2ms/step

 LeNet Classification Report:
              precision    recall  f1-score   support

   

#Section 3: ResNet  Model – Training & Evaluation on MNIST

In [5]:
def residual_block(x, filters, kernel_size=3):
    shortcut = x
    x = layers.Conv2D(filters, kernel_size, padding='same', activation='relu')(x)
    x = layers.BatchNormalization()(x)
    x = layers.Conv2D(filters, kernel_size, padding='same')(x)
    x = layers.BatchNormalization()(x)

    # Add skip connection
    x = layers.add([x, shortcut])
    x = layers.Activation('relu')(x)
    return x

def create_resnet_mnist(input_shape=(28, 28, 1), num_classes=10):
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, 3, padding='same', activation='relu')(inputs)
    x = residual_block(x, 32)
    x = layers.MaxPooling2D()(x)

    x = residual_block(x, 32)
    x = layers.MaxPooling2D()(x)

    x = layers.Flatten()(x)
    x = layers.Dense(128, activation='relu')(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)

    model = models.Model(inputs, outputs)
    return model

# Build and compile ResNet
resnet = create_resnet_mnist()
resnet.compile(optimizer='adam',
               loss='sparse_categorical_crossentropy',
               metrics=['accuracy'])

# Train the model
resnet.fit(x_train, y_train, epochs=5, batch_size=64, validation_split=0.1)

# Evaluate
test_loss, test_acc = resnet.evaluate(x_test, y_test, verbose=0)
print(f"\nResNet Test Accuracy: {test_acc:.4f}")

# Predictions
y_pred = resnet.predict(x_test).argmax(axis=1)
print("\nResNet Classification Report:")
print(classification_report(y_test, y_pred, digits=4))

# Save metrics for comparison
resnet_metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, average='weighted'),
    "recall": recall_score(y_test, y_pred, average='weighted')
}


Epoch 1/5
[1m844/844[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m16s[0m 12ms/step - accuracy: 0.8962 - loss: 0.4015 - val_accuracy: 0.9833 - val_loss: 0.0573
Epoch 2/5
[1m844/844[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m13s[0m 7ms/step - accuracy: 0.9848 - loss: 0.0531 - val_accuracy: 0.9852 - val_loss: 0.0528
Epoch 3/5
[1m844/844[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 7ms/step - accuracy: 0.9889 - loss: 0.0349 - val_accuracy: 0.9793 - val_loss: 0.0747
Epoch 4/5
[1m844/844[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m11s[0m 7ms/step - accuracy: 0.9902 - loss: 0.0329 - val_accuracy: 0.9907 - val_loss: 0.0342
Epoch 5/5
[1m844/844[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 6ms/step - accuracy: 0.9928 - loss: 0.0217 - val_accuracy: 0.9878 - val_loss: 0.0436

ResNet Test Accuracy: 0.9829
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 4ms/step

ResNet Classification Report:
              precision    recall  f1-score   support


#Section 4: VGG16  Model – Training & Evaluation on MNIST

In [6]:
def create_vgg_mnist(input_shape=(32, 32, 3), num_classes=10):
    model = models.Sequential()

    model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=input_shape))
    model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))

    model.add(layers.Conv2D(128, (3, 3), activation='relu', padding='same'))
    model.add(layers.Conv2D(128, (3, 3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))

    model.add(layers.Flatten())
    model.add(layers.Dense(512, activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(num_classes, activation='softmax'))

    return model


In [7]:
# Resize and convert grayscale to 3 channels (RGB) for VGG-style input
x_train_vgg = tf.image.resize(x_train, [32, 32])
x_train_vgg = tf.image.grayscale_to_rgb(x_train_vgg)

x_test_vgg = tf.image.resize(x_test, [32, 32])
x_test_vgg = tf.image.grayscale_to_rgb(x_test_vgg)

# Create and compile VGG model
vgg = create_vgg_mnist()
vgg.compile(optimizer='adam',
            loss='sparse_categorical_crossentropy',
            metrics=['accuracy'])

# Train the model
vgg.fit(x_train_vgg, y_train, epochs=5, batch_size=64, validation_split=0.1)

# Evaluate
test_loss, test_acc = vgg.evaluate(x_test_vgg, y_test, verbose=0)
print(f"\nVGG16-style Test Accuracy: {test_acc:.4f}")

# Predictions
y_pred = vgg.predict(x_test_vgg).argmax(axis=1)
print("\n VGG16-style Classification Report:")
print(classification_report(y_test, y_pred, digits=4))

# Store metrics
vgg_metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, average='weighted'),
    "recall": recall_score(y_test, y_pred, average='weighted')
}


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


Epoch 1/5
[1m844/844[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m19s[0m 17ms/step - accuracy: 0.9104 - loss: 0.2842 - val_accuracy: 0.9898 - val_loss: 0.0402
Epoch 2/5
[1m844/844[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m13s[0m 12ms/step - accuracy: 0.9852 - loss: 0.0484 - val_accuracy: 0.9915 - val_loss: 0.0373
Epoch 3/5
[1m844/844[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 12ms/step - accuracy: 0.9899 - loss: 0.0328 - val_accuracy: 0.9925 - val_loss: 0.0292
Epoch 4/5
[1m844/844[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 12ms/step - accuracy: 0.9921 - loss: 0.0245 - val_accuracy: 0.9910 - val_loss: 0.0311
Epoch 5/5
[1m844/844[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m20s[0m 12ms/step - accuracy: 0.9932 - loss: 0.0215 - val_accuracy: 0.9917 - val_loss: 0.0342

VGG16-style Test Accuracy: 0.9908
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 3ms/step

 VGG16-style Classification Report:
              precision    recall  f1-

#Section 5: Transformer  Model – Training & Evaluation on MNIST


In [12]:
import tensorflow as tf
from tensorflow.keras import layers, models
from sklearn.metrics import classification_report, accuracy_score, precision_score, recall_score

# Parameters
NUM_CLASSES = 10
D_MODEL = 64
NUM_HEADS = 4
FF_DIM = 128
NUM_LAYERS = 4
SEQ_LENGTH = 28
FEATURES = 28

# Positional Encoding Layer
class PositionalEncoding(layers.Layer):
    def __init__(self, seq_len, d_model):
        super().__init__()
        self.pos_encoding = self.get_positional_encoding(seq_len, d_model)

    def get_positional_encoding(self, position, d_model):
        angle_rads = self.get_angles(
            pos=tf.range(position, dtype=tf.float32)[:, tf.newaxis],
            i=tf.range(d_model, dtype=tf.float32)[tf.newaxis, :],
            d_model=d_model
        )
        # Create sin and cos separately
        sines = tf.math.sin(angle_rads[:, 0::2])
        cosines = tf.math.cos(angle_rads[:, 1::2])

        # Interleave sin and cos
        pos_encoding = tf.concat([sines, cosines], axis=-1)
        return pos_encoding[tf.newaxis, ...]

    def get_angles(self, pos, i, d_model):
        angle_rates = 1 / tf.pow(10000., (2 * (i // 2)) / tf.cast(d_model, tf.float32))
        return pos * angle_rates

    def call(self, x):
        return x + self.pos_encoding[:, :tf.shape(x)[1], :]

# Transformer Encoder Block
def transformer_encoder(inputs, d_model, num_heads, ff_dim):
    # Multi-Head Self Attention
    x = layers.LayerNormalization(epsilon=1e-6)(inputs)
    x = layers.MultiHeadAttention(num_heads=num_heads, key_dim=d_model)(x, x)
    x = layers.Add()([x, inputs])

    # Feed Forward Network
    ffn = layers.LayerNormalization(epsilon=1e-6)(x)
    ffn = layers.Dense(ff_dim, activation='relu')(ffn)
    ffn = layers.Dense(d_model)(ffn)
    return layers.Add()([ffn, x])

# Create Transformer model for MNIST
def create_transformer_model():
    inputs = layers.Input(shape=(SEQ_LENGTH, FEATURES))  # (batch_size, 28, 28)
    x = layers.Dense(D_MODEL)(inputs)
    x = PositionalEncoding(SEQ_LENGTH, D_MODEL)(x)

    for _ in range(NUM_LAYERS):
        x = transformer_encoder(x, D_MODEL, NUM_HEADS, FF_DIM)

    x = layers.GlobalAveragePooling1D()(x)
    x = layers.Dropout(0.1)(x)
    x = layers.Dense(128, activation='relu')(x)
    outputs = layers.Dense(NUM_CLASSES, activation='softmax')(x)
    return models.Model(inputs=inputs, outputs=outputs)

# Load and preprocess MNIST data
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255.0  # shape: (60000, 28, 28)
x_test = x_test.astype("float32") / 255.0

# Create, compile, and train the model
transformer_model = create_transformer_model()
transformer_model.compile(optimizer='adam',
                          loss='sparse_categorical_crossentropy',
                          metrics=['accuracy'])

# Train
transformer_model.fit(x_train, y_train, epochs=5, batch_size=64, validation_split=0.1)

# Evaluate
test_loss, test_acc = transformer_model.evaluate(x_test, y_test, verbose=0)
print(f"\nTransformer Test Accuracy: {test_acc:.4f}")

# Predict and report
y_pred = transformer_model.predict(x_test).argmax(axis=1)
print("\nTransformer Classification Report:")
print(classification_report(y_test, y_pred, digits=4))

# Store metrics for comparison
transformer_metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, average='weighted'),
    "recall": recall_score(y_test, y_pred, average='weighted')
}



Epoch 1/5
[1m844/844[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m31s[0m 18ms/step - accuracy: 0.7328 - loss: 0.7664 - val_accuracy: 0.9688 - val_loss: 0.1055
Epoch 2/5
[1m844/844[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 9ms/step - accuracy: 0.9619 - loss: 0.1250 - val_accuracy: 0.9727 - val_loss: 0.0925
Epoch 3/5
[1m844/844[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m7s[0m 9ms/step - accuracy: 0.9696 - loss: 0.0998 - val_accuracy: 0.9800 - val_loss: 0.0679
Epoch 4/5
[1m844/844[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 9ms/step - accuracy: 0.9768 - loss: 0.0772 - val_accuracy: 0.9827 - val_loss: 0.0610
Epoch 5/5
[1m844/844[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 9ms/step - accuracy: 0.9753 - loss: 0.0768 - val_accuracy: 0.9742 - val_loss: 0.0873

Transformer Test Accuracy: 0.9730
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 6ms/step

Transformer Classification Report:
              precision    recall  f1-score   

#Section 6: Details about task


#  Comparative Study of Deep Learning Models on MNIST Dataset
# Objective
To compare the performance of four deep learning models — **LeNet**, **ResNet**, **VGG16**, **Transformer(Vision Transformer (ViT))** — on the MNIST handwritten digits dataset using evaluation metrics like accuracy, precision, recall, and F1-score.



 # Dataset: MNIST  
- 70,000 grayscale images (28×28 pixels) of handwritten digits (0–9)  
- Training Set: 60,000 images  
- Test Set: 10,000 images  
- Classes: 10 (digits 0 to 9)


# Different Models

| Model          | Description                                          |
|----------------|------------------------------------------------------|
| LeNet          | A classical CNN architecture for digit recognition. |
| ResNet         | Deep residual network with skip connections.         |
| VGG16          | CNN with 16 layers and small 3×3 kernels.            |
      |
|Transformer | Transformer leverages self-attention mechanisms to process and understand sequential data.|







 # Comparison Table

| Model         | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
|---------------|--------------|----------------|-------------|---------------|
| LeNet         | 98.81        | 99.51           | 99.69 | 99.56          |
| ResNet        | 98.89         | 99.71           | 99.91   | 99.46           |
| VGG16         | 99.37          | 99.80             | 99.70        | 99.69           |
| Transformer  |    97.30    | 99.58       | 99.59       | 98.94      |





# Conclusion  
In this comparative study on the MNIST dataset, we evaluated four deep learning models: LeNet, VGG16, ResNet, and Transformer.

LeNet performed well with fast training and low resource usage, making it ideal for simple tasks.

VGG16 achieved slightly better accuracy but required high computational power.

ResNet delivered the best accuracy and balance between performance and efficiency using residual connections.

Transformer showed competitive results but needed more data and compute to outperform CNNs on MNIST.

 For MNIST, ResNet is the most effective choice. LeNet is best for low-resource scenarios, while VGG16 and Transformer are better suited for complex datasets.


