# Assignment: Deep Neural Network for Handwritten Digit Recognition

**Objective:** Implement a multi-layer Deep Neural Network (DNN) to classify handwritten digits from the MNIST dataset with at least **97% accuracy** on the test set.

## Dataset Overview
The MNIST dataset consists of 70,000 grayscale images of handwritten digits (0–9).
- **Image size:** 28 × 28 pixels
- **Training set:** 60,000 images
- **Testing set:** 10,000 images

In [None]:
# Import required libraries
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.utils import to_categorical

# Set random seed for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

print("TensorFlow version:", tf.__version__)

## Task 1: Data Preprocessing

The data must be formatted correctly before it is fed into the DNN:
- **Normalization:** Convert pixel values from [0, 255] to [0, 1]
- **Flattening:** Reshape the 28×28 2D images into a 1D vector of size 784
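
The two steps above can be tried on a small synthetic batch before touching the real dataset. This is a minimal sketch: `fake_images` is a made-up stand-in for MNIST (random pixels, not real digits), used only to show the shapes, dtypes, and value ranges involved.

```python
import numpy as np

# Synthetic stand-in for MNIST: 3 "images" of 28x28 uint8 pixels (not real data)
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)

# Normalization: cast to float32 first, then divide, so values land in [0, 1]
scaled = fake_images.astype("float32") / 255.0

# Flattening: each 28x28 grid becomes one row of 784 values (row-major order)
flat = scaled.reshape(-1, 28 * 28)

print(flat.shape)  # (3, 784)
print(flat.dtype)  # float32
```

Because the reshape is row-major, pixel `(r, c)` of an image ends up at position `28 * r + c` of its flattened vector.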

In [None]:
# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

print("Raw data shapes:")
print(f"  x_train: {x_train.shape}, y_train: {y_train.shape}")
print(f"  x_test:  {x_test.shape},  y_test:  {y_test.shape}")
print(f"  Pixel value range before normalization: [{x_train.min()}, {x_train.max()}]")

# --- Normalization: scale pixel values from [0, 255] to [0, 1] ---
x_train = x_train.astype('float32') / 255.0
x_test  = x_test.astype('float32')  / 255.0

# --- Flattening: reshape 28x28 images to 1D vectors of size 784 ---
x_train_flat = x_train.reshape(-1, 784)
x_test_flat  = x_test.reshape(-1, 784)

print("\nAfter preprocessing:")
print(f"  x_train_flat: {x_train_flat.shape}")
print(f"  x_test_flat:  {x_test_flat.shape}")
print(f"  Pixel value range after normalization: [{x_train_flat.min():.1f}, {x_train_flat.max():.1f}]")

# One-hot encode labels (required for categorical_crossentropy)
y_train_cat = to_categorical(y_train, num_classes=10)
y_test_cat  = to_categorical(y_test,  num_classes=10)
print(f"\n  y_train_cat shape: {y_train_cat.shape}  (one-hot encoded)")

# Visualize a few sample images
fig, axes = plt.subplots(2, 5, figsize=(10, 4))
for i, ax in enumerate(axes.flat):
    ax.imshow(x_train[i], cmap='gray')
    ax.set_title(f"Label: {y_train[i]}")
    ax.axis('off')
plt.suptitle("Sample MNIST Images", fontsize=14)
plt.tight_layout()
plt.show()

## Task 2: Architecture Design

Build a Sequential model with the following layers:
| Layer | Details |
|-------|---------|
| Input | 784 units |
| Hidden Layer 1 | 512 neurons, ReLU activation |
| Dropout | 20% rate (reduces overfitting) |
| Hidden Layer 2 | 256 neurons, ReLU activation |
| Output Layer | 10 neurons, Softmax activation |
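
As a sanity check on the table above, the number of trainable parameters can be worked out by hand. The sketch below assumes every Dense layer has a bias term (the Keras default); the helper `dense_params` is a hypothetical name introduced here for illustration. Dropout adds no parameters.

```python
# Layer widths from the table: input, hidden 1, hidden 2, output
layer_sizes = [784, 512, 256, 10]

def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

# Pair each layer width with the next one to get (inputs, outputs) per Dense layer
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [401920, 131328, 2570]
print(total)      # 535818
```

The total should match the "Total params" line printed by `model.summary()`.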

In [None]:
# Build the DNN model
model = keras.Sequential(
    [
        # Input layer (784 units — one per flattened pixel)
        layers.Input(shape=(784,)),

        # Hidden Layer 1: 512 neurons with ReLU activation
        layers.Dense(512, activation='relu'),

        # Dropout Layer: 20% rate to reduce overfitting
        layers.Dropout(0.2),

        # Hidden Layer 2: 256 neurons with ReLU activation
        layers.Dense(256, activation='relu'),

        # Output Layer: 10 neurons with Softmax (one probability per digit class)
        layers.Dense(10, activation='softmax'),
    ],
    name="DNN_MNIST",
)

model.summary()

## Task 3: Training Configuration

Hyperparameters:
- **Loss Function:** `categorical_crossentropy`
- **Optimizer:** Adam
- **Metrics:** accuracy
- **Batch Size:** 128
- **Epochs:** 15
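
To make the loss and batch size concrete, here is a small sketch in plain Python. The function below is an illustrative re-implementation for a single sample, not the Keras code, and the probabilities are made-up values. The step count assumes 10% of the 60,000 training images are held out for validation, as in the training cell of this notebook.

```python
import math

def categorical_crossentropy(y_true_onehot, y_pred_probs):
    """Cross-entropy for one sample: -sum(y_k * log(p_k)).

    Terms with y_k == 0 contribute nothing, so they are skipped.
    """
    return -sum(y * math.log(p)
                for y, p in zip(y_true_onehot, y_pred_probs) if y > 0)

# Hypothetical softmax output for a 3-class toy problem; the true class is index 1
probs = [0.1, 0.7, 0.2]
one_hot = [0.0, 1.0, 0.0]
loss = categorical_crossentropy(one_hot, probs)
print(round(loss, 4))  # 0.3567, i.e. -log(0.7)

# Weight updates per epoch: 54,000 training samples in batches of 128
train_samples = 60_000 - 60_000 // 10
steps_per_epoch = math.ceil(train_samples / 128)
print(steps_per_epoch)  # 422
```

With one-hot labels the loss reduces to the negative log of the probability assigned to the true class, so a confident correct prediction gives a loss near 0.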

In [None]:
# Compile the model
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)

# Train the model
history = model.fit(
    x_train_flat, y_train_cat,
    batch_size=128,
    epochs=15,
    validation_split=0.1,   # last 10% of the training samples held out for validation (split taken before shuffling)
    verbose=1,
)

In [None]:
# Evaluate on the test set
test_loss, test_accuracy = model.evaluate(x_test_flat, y_test_cat, verbose=0)
print(f"Test Loss:     {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy * 100:.2f}%")

if test_accuracy >= 0.97:
    print("\n✅ Target of ≥97% accuracy achieved!")
else:
    print("\n⚠️  Target accuracy not yet reached. Consider training for more epochs or tuning the layer sizes or Dropout rate.")

## Analysis Questions

### Q1 – Overfitting Check
Plot training accuracy vs. validation accuracy. Does the model overfit?
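
Besides reading the plot, the gap can be inspected epoch by epoch. The sketch below uses a hypothetical helper, `accuracy_gaps`, and made-up numbers in `toy_history`; it assumes a dict with the same `accuracy` and `val_accuracy` keys that Keras stores in `history.history`.

```python
def accuracy_gaps(history_dict):
    """Return the per-epoch (training - validation) accuracy gap."""
    return [t - v for t, v in zip(history_dict["accuracy"],
                                  history_dict["val_accuracy"])]

# Made-up numbers for illustration only, not results from this notebook
toy_history = {
    "accuracy":     [0.90, 0.95, 0.98, 0.99],
    "val_accuracy": [0.92, 0.95, 0.97, 0.97],
}

gaps = accuracy_gaps(toy_history)
# 1-based epoch at which training accuracy leads validation accuracy the most
widest_epoch = max(range(len(gaps)), key=gaps.__getitem__) + 1

print([round(g, 2) for g in gaps])  # [-0.02, 0.0, 0.01, 0.02]
print(widest_epoch)                 # 4
```

A gap that keeps widening while validation accuracy stalls is the usual sign of overfitting. A negative gap in early epochs is common when Dropout is active, because it is applied during training but not during validation.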

In [None]:
epochs_range = range(1, len(history.history['accuracy']) + 1)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Accuracy
axes[0].plot(epochs_range, history.history['accuracy'],     label='Training Accuracy',   marker='o')
axes[0].plot(epochs_range, history.history['val_accuracy'], label='Validation Accuracy', marker='s')
axes[0].set_title('Training vs Validation Accuracy')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Accuracy')
axes[0].legend()
axes[0].grid(True)

# Loss
axes[1].plot(epochs_range, history.history['loss'],     label='Training Loss',   marker='o')
axes[1].plot(epochs_range, history.history['val_loss'], label='Validation Loss', marker='s')
axes[1].set_title('Training vs Validation Loss')
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Loss')
axes[1].legend()
axes[1].grid(True)

plt.tight_layout()
plt.show()

# Overfitting analysis
final_train_acc = history.history['accuracy'][-1]
final_val_acc   = history.history['val_accuracy'][-1]
gap = final_train_acc - final_val_acc

print(f"Final Training Accuracy:   {final_train_acc * 100:.2f}%")
print(f"Final Validation Accuracy: {final_val_acc * 100:.2f}%")
print(f"Accuracy Gap (train - val): {gap * 100:.2f} percentage points")
print()
if gap < 0.02:
    print("✅ No significant overfitting: final training and validation accuracy differ by "
          "less than 2 percentage points. Check the loss plot as well, since a rising "
          "validation loss can appear before the accuracy gap widens.")
else:
    print("⚠️  Training accuracy exceeds validation accuracy by 2 percentage points or more, "
          "which suggests some overfitting. Increasing the Dropout rate or adding L2 "
          "regularisation could help.")

### Q2 – Activation Functions: ReLU vs Sigmoid

What happens to convergence speed if ReLU is replaced by Sigmoid in the hidden layers?
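
The difference comes down to the derivatives that backpropagation multiplies together. The sketch below compares them in plain Python; the function names are introduced here for illustration and are not part of Keras.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: s * (1 - s), which peaks at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

print(sigmoid_grad(0.0))  # 0.25, the largest value the sigmoid derivative can take
print(relu_grad(2.0))     # 1.0

# The sigmoid derivative shrinks quickly as the unit saturates
for x in (0.0, 2.0, 5.0):
    print(x, sigmoid_grad(x), relu_grad(x))
```

Each sigmoid layer scales the backpropagated gradient by a factor of at most 0.25, so the two hidden layers contribute at most 0.25 × 0.25 = 0.0625 through their activations, whereas an active ReLU unit passes the gradient through unchanged.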

In [None]:
# Build an identical model but using Sigmoid in the hidden layers
model_sigmoid = keras.Sequential(
    [
        layers.Input(shape=(784,)),
        layers.Dense(512, activation='sigmoid'),
        layers.Dropout(0.2),
        layers.Dense(256, activation='sigmoid'),
        layers.Dense(10, activation='softmax'),
    ],
    name="DNN_MNIST_Sigmoid",
)

model_sigmoid.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)

history_sigmoid = model_sigmoid.fit(
    x_train_flat, y_train_cat,
    batch_size=128,
    epochs=15,
    validation_split=0.1,
    verbose=0,
)

# Compare accuracy curves
plt.figure(figsize=(10, 5))
plt.plot(epochs_range, history.history['val_accuracy'],         label='ReLU – Validation Accuracy',   linewidth=2)
plt.plot(epochs_range, history_sigmoid.history['val_accuracy'], label='Sigmoid – Validation Accuracy', linewidth=2, linestyle='--')
plt.title('Validation Accuracy: ReLU vs Sigmoid')
plt.xlabel('Epoch')
plt.ylabel('Validation Accuracy')
plt.legend()
plt.grid(True)
plt.show()

print("ReLU   final val accuracy:", f"{history.history['val_accuracy'][-1]*100:.2f}%")
print("Sigmoid final val accuracy:", f"{history_sigmoid.history['val_accuracy'][-1]*100:.2f}%")
print()
print("""
Analysis:
---------
ReLU usually converges faster than Sigmoid because:
  1. ReLU mitigates the vanishing gradient problem: its derivative is 1 for
     positive inputs and 0 otherwise, so gradients through active units are
     not shrunk from layer to layer.
  2. The Sigmoid derivative never exceeds 0.25 and approaches 0 when a unit
     saturates (output → 0 or 1), so gradients shrink and weight updates
     slow down, especially in deeper networks.
  3. ReLU is also computationally cheaper (a simple max(0, x) operation).
As a result, the ReLU model typically reaches a given accuracy in fewer epochs.
""")

### Q3 – Error Analysis

Identify three images the model classified incorrectly and explain why it may have struggled.
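
Before picking individual images, it can help to see which digit pairs are confused most often. The sketch below uses a hypothetical helper, `most_confused_pairs`, and toy labels that are not model output.

```python
from collections import Counter

def most_confused_pairs(y_true, y_pred, top_n=3):
    """Count (true, predicted) pairs among the errors, most frequent first."""
    errors = Counter((int(t), int(p))
                     for t, p in zip(y_true, y_pred) if t != p)
    return errors.most_common(top_n)

# Toy labels for illustration only, not predictions from this notebook
toy_true = [4, 9, 4, 3, 7, 4, 1]
toy_pred = [9, 9, 9, 8, 7, 4, 7]

print(most_confused_pairs(toy_true, toy_pred))
# [((4, 9), 2), ((3, 8), 1), ((1, 7), 1)]
```

The same function should accept the integer test labels and the argmax predictions computed in the next cell, since it only iterates over both sequences in parallel.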

In [None]:
# Get model predictions on the test set
y_pred_probs = model.predict(x_test_flat, verbose=0)
y_pred       = np.argmax(y_pred_probs, axis=1)
y_true       = y_test  # original integer labels

# Find misclassified samples
misclassified_idx = np.where(y_pred != y_true)[0]
print(f"Total misclassified images: {len(misclassified_idx)} / {len(y_true)}")

# Display the first 3 misclassified images
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, idx in zip(axes, misclassified_idx[:3]):  # stops early if fewer than 3 errors exist
    ax.imshow(x_test[idx], cmap='gray')
    ax.set_title(
        f"True: {y_true[idx]}  Predicted: {y_pred[idx]}\n"
        f"Confidence: {y_pred_probs[idx, y_pred[idx]]*100:.1f}%",
        fontsize=11,
    )
    ax.axis('off')
plt.suptitle("Misclassified Test Images", fontsize=14)
plt.tight_layout()
plt.show()

print("""
Why might the model struggle with these images?
------------------------------------------------
Common reasons for misclassification in MNIST:
  1. Ambiguous handwriting: Digits like 4 vs 9, 3 vs 8, or 1 vs 7 share
     similar stroke patterns and are visually close.
  2. Unusual writing styles: Some writers form digits in non-standard ways
     (e.g., a '2' with an extra loop resembling a '3').
  3. Image noise or skew: A digit written at an angle or with heavy pen
     pressure can look atypical to the model.
Because this is a fully connected network on flattened pixels (no convolutions),
it has no built-in notion of which pixels are neighbours, so it is more
sensitive to shifts and rotations of a digit than a CNN would be.
""")