# Three-Layer Neural Network for MNIST Classification

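This notebook implements a **three-layer neural network** in low-level **TensorFlow** (variables and `tf.GradientTape`, with no Keras model, layer, or optimizer APIs) to classify handwritten digits from the **MNIST dataset**. `tf.keras.datasets` is used only to download the data.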

## **Approach**
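- **Feedforward Propagation**: Computes each layer's activations from the previous layer's output, using that layer's weights and biases.
- **Backpropagation**: Computes the gradient of the loss with respect to every weight and bias (via `tf.GradientTape`), then updates the parameters to reduce the classification loss.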

## **Network Architecture**
1. **Input Layer**: 784 neurons (since images are 28x28 pixels)
2. **Hidden Layer 1**: 128 neurons (with ReLU activation)
3. **Hidden Layer 2**: 64 neurons (with ReLU activation)
4. **Output Layer**: 10 neurons (one per digit class; the network returns raw logits, and softmax is applied inside the loss function)
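
As a rough check on the size of this architecture, the sketch below counts the trainable parameters implied by the layer sizes listed above. The helper name `count_parameters` is hypothetical and is not part of the notebook's code; it only assumes fully connected layers with one bias per output neuron.

```python
# Layer sizes taken from the architecture list above.
layer_sizes = [784, 128, 64, 10]


def count_parameters(sizes):
    """Return (weights, biases) counts for each fully connected layer."""
    counts = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        # A dense layer has a fan_in x fan_out weight matrix and fan_out biases.
        counts.append((fan_in * fan_out, fan_out))
    return counts


per_layer = count_parameters(layer_sizes)
total = sum(weights + biases for weights, biases in per_layer)

for index, (weights, biases) in enumerate(per_layer, start=1):
    print(f"Layer {index}: {weights} weights + {biases} biases")
print(f"Total trainable parameters: {total}")
```

Most of the parameters sit in the first weight matrix, because it connects all 784 input pixels to the 128 units of the first hidden layer.
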

## **Training Details**
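- **Loss Function**: Softmax cross-entropy, averaged over each mini-batch
- **Optimization Algorithm**: Mini-batch gradient descent applied manually (learning rate 0.01, batch size 100, 10 epochs)
- **Evaluation**: Accuracy on the test set, computed after every epoch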
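
To make the loss and the update rule concrete, here is a minimal NumPy sketch of softmax cross-entropy computed from raw logits, together with a single gradient descent step. It is a stand-in for illustration, not the TensorFlow implementation; the function names `softmax_cross_entropy` and `sgd_step` are hypothetical.

```python
import numpy as np


def softmax_cross_entropy(logits, one_hot_labels):
    """Mean cross-entropy over a batch, computed from raw logits."""
    # Subtract the row-wise maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick out the log-probability of the true class and average over the batch.
    return float(-(one_hot_labels * log_probs).sum(axis=1).mean())


def sgd_step(param, grad, learning_rate=0.01):
    """One plain gradient descent update: param - learning_rate * grad."""
    return param - learning_rate * grad


# With all-zero logits the softmax is uniform over 10 classes,
# so the loss equals ln(10) regardless of the true labels.
logits = np.zeros((2, 10))
labels = np.eye(10)[[3, 7]]
uniform_loss = softmax_cross_entropy(logits, labels)
print(uniform_loss, np.log(10))
```

The uniform case is a useful reference point: an untrained 10-class classifier that guesses evenly has a loss of ln(10), so per-batch losses well below that indicate the network has learned something.
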



import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# Load MNIST data using tf.keras.datasets (for convenience)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Flatten each 28x28 image into a 784-vector and scale pixel values to [0, 1]
x_train = x_train.reshape(-1, 784).astype(np.float32) / 255.0
x_test = x_test.reshape(-1, 784).astype(np.float32) / 255.0

# Convert labels to one-hot
y_train = tf.one_hot(y_train, depth=10)
y_test = tf.one_hot(y_test, depth=10)

# Parameters
input_size = 784
hidden1_size = 128
hidden2_size = 64
output_size = 10
learning_rate = 0.01
epochs = 10
batch_size = 100

# Initialize weights and biases
W1 = tf.Variable(tf.random.normal([input_size, hidden1_size], stddev=0.1))
b1 = tf.Variable(tf.zeros([hidden1_size]))

W2 = tf.Variable(tf.random.normal([hidden1_size, hidden2_size], stddev=0.1))
b2 = tf.Variable(tf.zeros([hidden2_size]))

W3 = tf.Variable(tf.random.normal([hidden2_size, output_size], stddev=0.1))
b3 = tf.Variable(tf.zeros([output_size]))

# Feed-forward function: returns raw logits (softmax is applied inside the loss)
def forward_pass(x):
    z1 = tf.matmul(x, W1) + b1
    a1 = tf.nn.relu(z1)

    z2 = tf.matmul(a1, W2) + b2
    a2 = tf.nn.relu(z2)

    logits = tf.matmul(a2, W3) + b3
    return logits

# Loss function: mean softmax cross-entropy over the batch
def compute_loss(logits, labels):
    return tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))

# Accuracy function
def compute_accuracy(logits, labels):
    preds = tf.argmax(logits, axis=1)
    actual = tf.argmax(labels, axis=1)
    return tf.reduce_mean(tf.cast(tf.equal(preds, actual), tf.float32))

# Training loop
num_batches = x_train.shape[0] // batch_size

for epoch in range(epochs):
    epoch_loss = 0.0
    for i in range(num_batches):
        start = i * batch_size
        end = start + batch_size
        x_batch = x_train[start:end]
        y_batch = y_train[start:end]

        # Record the forward pass so GradientTape can backpropagate through it
        with tf.GradientTape() as tape:
            logits = forward_pass(x_batch)
            loss = compute_loss(logits, y_batch)

        params = [W1, b1, W2, b2, W3, b3]
        grads = tape.gradient(loss, params)
        # Plain gradient descent step: param <- param - learning_rate * grad
        for param, grad in zip(params, grads):
            param.assign_sub(learning_rate * grad)

        # Sum of per-batch mean losses; the value printed below is this total, not an average
        epoch_loss += loss.numpy()

    # Evaluate on test set
    test_logits = forward_pass(x_test)
    test_acc = compute_accuracy(test_logits, y_test)
    print(f"Epoch {epoch+1}/{epochs}, Loss: {epoch_loss:.4f}, Test Accuracy: {test_acc:.4f}")


Epoch 1/10, Loss: 795.5376, Test Accuracy: 0.8138
Epoch 2/10, Loss: 328.9186, Test Accuracy: 0.8749
Epoch 3/10, Loss: 249.5390, Test Accuracy: 0.8938
Epoch 4/10, Loss: 217.2374, Test Accuracy: 0.9063
Epoch 5/10, Loss: 197.7055, Test Accuracy: 0.9141
Epoch 6/10, Loss: 183.6171, Test Accuracy: 0.9186
Epoch 7/10, Loss: 172.4661, Test Accuracy: 0.9222
Epoch 8/10, Loss: 163.1165, Test Accuracy: 0.9246
Epoch 9/10, Loss: 155.0205, Test Accuracy: 0.9264
Epoch 10/10, Loss: 147.8261, Test Accuracy: 0.9279
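
The `Loss` column above is the sum of the per-batch losses within each epoch, so it is easier to interpret once divided by the number of batches. The sketch below does this for the first and last epochs. It assumes the standard 60,000-image MNIST training split and the notebook's batch size of 100; the variable names are illustrative only.

```python
# Summed epoch losses copied from the first and last lines of the training log above.
summed_losses = {1: 795.5376, 10: 147.8261}

# Assumption: 60,000 training images split into batches of 100.
num_batches = 60000 // 100

# Convert each summed loss into a mean loss per batch.
mean_losses = {epoch: total / num_batches for epoch, total in summed_losses.items()}

for epoch, mean_loss in mean_losses.items():
    print(f"Epoch {epoch}: mean loss per batch = {mean_loss:.4f}")
```

Under these assumptions the mean cross-entropy falls from about 1.33 in the first epoch to about 0.25 in the tenth, consistent with the rise in test accuracy from 0.8138 to 0.9279.
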
