# Three-Layer Neural Network (MNIST Dataset)

## Objectives
To implement a three-layer neural network for classifying the MNIST handwritten digits dataset using low-level TensorFlow operations (tf.Variable and tf.GradientTape) instead of Keras layers, showcasing the feed-forward and back-propagation steps.

## Description of the Model

**This neural network consists of:**
   - Input Layer: 784 neurons (flattened 28x28 images).
   - Hidden Layer 1: 128 neurons with sigmoid activation.
   - Hidden Layer 2: 64 neurons with sigmoid activation.
   - Output Layer: 10 neurons (digit classes 0–9), followed by softmax.
   - Feed-Forward: Passes the input through the layers to generate class probabilities.
   - Back-Propagation: Computes gradients of the loss with tf.GradientTape and updates the weights with the Adam optimizer.
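As a quick sanity check on the architecture listed above, the number of trainable parameters can be worked out from the layer sizes alone. The sketch below is plain Python; the helper name `count_parameters` is hypothetical and not part of the notebook.

```python
# Layer sizes taken from the model description: 784 -> 128 -> 64 -> 10
layer_sizes = [784, 128, 64, 10]


def count_parameters(sizes):
    """Return the number of weights plus biases in a fully connected network."""
    total = 0
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        total += fan_in * fan_out  # weight matrix of shape (fan_in, fan_out)
        total += fan_out           # one bias per output neuron
    return total


total_params = count_parameters(layer_sizes)
print(total_params)  # 109386
```

Most of the parameters (100,480 of 109,386) sit in the first layer, because it connects all 784 input pixels to 128 hidden neurons.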


## Python Implementation

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm

# Step 1: Load MNIST data
mnist = tf.keras.datasets.mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Normalize the pixel values to be between 0 and 1
X_train = X_train.astype(np.float32) / 255.0
X_test = X_test.astype(np.float32) / 255.0

# Flatten the 28x28 images into a 1D array of size 784 (28*28)
X_train = X_train.reshape(-1, 784)
X_test = X_test.reshape(-1, 784)

# Convert the labels to one-hot encoded vectors
y_train_one_hot = np.eye(10)[y_train]
y_test_one_hot = np.eye(10)[y_test]


# Step 2: Define the model using tf.Variable
class NeuralNetwork:
    def __init__(self):
        # Initialize weights and biases
        self.W1 = tf.Variable(tf.random.normal([784, 128], stddev=0.1))
        self.b1 = tf.Variable(tf.zeros([128]))
        self.W2 = tf.Variable(tf.random.normal([128, 64], stddev=0.1))
        self.b2 = tf.Variable(tf.zeros([64]))
        self.W3 = tf.Variable(tf.random.normal([64, 10], stddev=0.1))
        self.b3 = tf.Variable(tf.zeros([10]))

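    def forward(self, x):
        # Hidden layer 1: affine transform followed by a sigmoid activation
        x = tf.matmul(x, self.W1) + self.b1
        x = tf.nn.sigmoid(x)
        # Hidden layer 2: affine transform followed by a sigmoid activation
        x = tf.matmul(x, self.W2) + self.b2
        x = tf.nn.sigmoid(x)
        # Output layer: returns softmax probabilities, not raw logits
        x = tf.matmul(x, self.W3) + self.b3
        return tf.nn.softmax(x)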


# Step 3: Instantiate the model
model = NeuralNetwork()

# Step 4: Define the loss function and optimizer
loss_fn = tf.nn.softmax_cross_entropy_with_logits
optimizer = tf.optimizers.Adam(learning_rate=0.01)


# Step 5: Training function
def train_step(model, x_batch, y_batch):
    variables = [model.W1, model.b1, model.W2, model.b2, model.W3, model.b3]
    with tf.GradientTape() as tape:
        # NOTE: forward() already applies softmax, and loss_fn applies softmax
        # again internally, so the loss is computed on doubly normalized outputs.
        outputs = model.forward(x_batch)
        loss = tf.reduce_mean(loss_fn(labels=y_batch, logits=outputs))
    grads = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(grads, variables))
    return loss


# Step 6: Evaluate the model on test data
def evaluate(model, X_test, y_test):
    predictions = model.forward(X_test)
    accuracy = np.mean(np.argmax(predictions.numpy(), axis=1) == np.argmax(y_test, axis=1))
    return accuracy


# Step 7: Training loop
num_epochs = 10
batch_size = 64
num_batches = X_train.shape[0] // batch_size
loss_history = []

for epoch in range(num_epochs):
    avg_cost = 0.0
    progress_bar = tqdm(range(num_batches), desc=f"Epoch {epoch + 1}")
    for batch in progress_bar:
        start = batch * batch_size
        end = (batch + 1) * batch_size
        batch_X = X_train[start:end]
        batch_Y = y_train_one_hot[start:end]

        loss = train_step(model, batch_X, batch_Y)
        avg_cost += loss.numpy() / num_batches

        progress_bar.set_postfix(loss=avg_cost)

    loss_history.append(avg_cost)
    print(f"Epoch {epoch + 1}, Cost: {avg_cost:.4f}")

# Step 8: Evaluate the model on the test data
test_accuracy = evaluate(model, X_test, y_test_one_hot)
print(f"Test Accuracy: {test_accuracy * 100:.2f}%")



Epoch 1: 100%|██████████| 937/937 [00:53<00:00, 17.55it/s, loss=1.61] 


Epoch 1, Cost: 1.6095


Epoch 2: 100%|██████████| 937/937 [00:57<00:00, 16.42it/s, loss=1.52] 


Epoch 2, Cost: 1.5188


Epoch 3: 100%|██████████| 937/937 [01:17<00:00, 12.03it/s, loss=1.51] 


Epoch 3, Cost: 1.5090


Epoch 4: 100%|██████████| 937/937 [01:15<00:00, 12.41it/s, loss=1.5]  


Epoch 4, Cost: 1.5045


Epoch 5: 100%|██████████| 937/937 [01:20<00:00, 11.60it/s, loss=1.5]  


Epoch 5, Cost: 1.5023


Epoch 6: 100%|██████████| 937/937 [01:12<00:00, 12.89it/s, loss=1.5]  


Epoch 6, Cost: 1.5015


Epoch 7: 100%|██████████| 937/937 [00:56<00:00, 16.67it/s, loss=1.5]  


Epoch 7, Cost: 1.4973


Epoch 8: 100%|██████████| 937/937 [01:01<00:00, 15.31it/s, loss=1.5]  


Epoch 8, Cost: 1.4968


Epoch 9: 100%|██████████| 937/937 [01:15<00:00, 12.47it/s, loss=1.5]  


Epoch 9, Cost: 1.4962


Epoch 10: 100%|██████████| 937/937 [01:04<00:00, 14.52it/s, loss=1.49] 


Epoch 10, Cost: 1.4935
Test Accuracy: 96.02%


## Description of Code

1. **Data Preparation:**
   - Loads MNIST dataset.
   - Normalizes pixel values to [0,1].
   - Flattens images and applies one-hot encoding to labels.
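The three preparation steps above can be tried on a small synthetic array without downloading MNIST. This is a NumPy-only sketch: the random `images` and the `labels` array are hypothetical stand-ins for the real dataset.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 5 random 28x28 uint8 "images" with labels
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
labels = np.array([3, 0, 9, 3, 7])

# Scale pixel values from [0, 255] to [0, 1]
x = images.astype(np.float32) / 255.0

# Flatten each 28x28 image into a 784-element row vector
x = x.reshape(-1, 784)

# Row i of the 10x10 identity matrix is the one-hot vector for class i
y_one_hot = np.eye(10, dtype=np.float32)[labels]

print(x.shape, y_one_hot.shape)  # (5, 784) (5, 10)
```

Indexing the identity matrix with the label array is the same trick the notebook uses with `np.eye(10)[y_train]`.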

2. **Neural Network Class (NeuralNetwork):**
   - Feed-Forward:
      - Layer 1: L1 = sigmoid(X * W1 + b1)
      - Layer 2: L2 = sigmoid(L1 * W2 + b2)
      - Output: softmax(L2 * W3 + b3) (class probabilities).
   - Back-Propagation:
      - Uses tf.GradientTape for automatic differentiation.
      - Optimizes with Adam (learning rate 0.01).
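To make the automatic differentiation step less of a black box, the sketch below shows the kind of quantity back-propagation produces at the output layer. For softmax cross-entropy computed on raw logits, the gradient with respect to the logits has the closed form (p - y) / batch_size. The code checks that form against finite differences in NumPy. It illustrates the math only, it is not the notebook's TensorFlow code, and all function names are hypothetical.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cross_entropy(logits, y_one_hot):
    """Mean softmax cross-entropy over a batch of raw logits."""
    p = softmax(logits)
    return -np.mean(np.sum(y_one_hot * np.log(p), axis=-1))


def analytic_grad(logits, y_one_hot):
    """Closed-form gradient of the mean loss with respect to the logits."""
    return (softmax(logits) - y_one_hot) / logits.shape[0]


def numeric_grad(logits, y_one_hot, eps=1e-6):
    """Central finite differences, used only to check the closed form."""
    grad = np.zeros_like(logits)
    for idx in np.ndindex(*logits.shape):
        plus, minus = logits.copy(), logits.copy()
        plus[idx] += eps
        minus[idx] -= eps
        diff = cross_entropy(plus, y_one_hot) - cross_entropy(minus, y_one_hot)
        grad[idx] = diff / (2 * eps)
    return grad


rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 10))      # float64 keeps the check accurate
targets = np.eye(10)[[1, 7, 3, 3]]     # one-hot labels for four samples

max_diff = np.abs(analytic_grad(logits, targets) - numeric_grad(logits, targets)).max()
print(max_diff)  # should be tiny if the closed form is right
```

The chain rule then carries this output-layer gradient back through each weight matrix and sigmoid, and tf.GradientTape automates that bookkeeping.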

3. **Training Loop:**
   - Processes batches, computes loss, and updates weights.
   - Accumulates and displays the average training loss per epoch.
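The batching arithmetic behind the 937 iterations per epoch can be sketched as follows. The notebook slices the training set in its original order. The optional per-epoch shuffle shown here is an assumption added for illustration, and `iterate_batches` is a hypothetical helper name.

```python
import numpy as np


def iterate_batches(num_samples, batch_size, rng=None):
    """Yield index arrays for full mini-batches; shuffle when rng is given."""
    order = np.arange(num_samples)
    if rng is not None:
        order = rng.permutation(num_samples)  # new sample order for this epoch
    num_batches = num_samples // batch_size   # incomplete final batch is dropped
    for b in range(num_batches):
        yield order[b * batch_size:(b + 1) * batch_size]


num_samples, batch_size = 60000, 64
batches = list(iterate_batches(num_samples, batch_size))
leftover = num_samples - len(batches) * batch_size
print(len(batches), leftover)  # 937 32
```

With integer division, the last 32 training images are never used. Shuffling each epoch would at least rotate which samples are left out.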

4. **Model Evaluation:**
   - Predicts on test data and computes test accuracy.
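The accuracy computation reduces to comparing the arg-max of each prediction row with the arg-max of the matching one-hot label. A toy NumPy sketch with made-up scores for three classes:

```python
import numpy as np


def accuracy(scores, y_one_hot):
    """Fraction of rows whose highest-scoring class matches the label."""
    predicted = np.argmax(scores, axis=1)
    actual = np.argmax(y_one_hot, axis=1)
    return float(np.mean(predicted == actual))


# Made-up scores for four samples; the last one is misclassified
scores = np.array([[0.1, 0.7, 0.2],
                   [0.8, 0.1, 0.1],
                   [0.3, 0.3, 0.4],
                   [0.2, 0.5, 0.3]])
targets = np.eye(3)[[1, 0, 2, 2]]

print(accuracy(scores, targets))  # 0.75
```

Because arg-max is unchanged by softmax, the same function gives identical results on raw logits and on probabilities.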



## Performance Evaluation
- Training Loss: Falls from 1.6095 in epoch 1 to 1.4935 in epoch 10; training accuracy is not measured.
- Test Accuracy: 96.02% on the 10,000-image MNIST test set.
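The training loss levels off near 1.49 even though test accuracy is about 96%. A likely explanation, inferred from the code rather than stated in the notebook, is that `forward` returns softmax probabilities while `tf.nn.softmax_cross_entropy_with_logits` applies softmax again. Since probabilities lie in [0, 1], the second softmax cannot produce a confident distribution, and for 10 classes the loss cannot fall below ln(1 + 9/e), roughly 1.461. The NumPy sketch below reproduces that floor; the function names are hypothetical.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cross_entropy_from_logits(logits, y_one_hot):
    """Mean cross-entropy; applies softmax internally, like the TF loss."""
    return -np.mean(np.sum(y_one_hot * np.log(softmax(logits)), axis=-1))


y = np.eye(10)[[4]]                         # one sample, true class 4
confident_logits = np.full((1, 10), -20.0)
confident_logits[0, 4] = 20.0               # very confident, correct prediction

# Softmax applied once (raw logits passed to the loss): loss approaches 0
loss_single = cross_entropy_from_logits(confident_logits, y)

# Softmax applied twice (probabilities passed to the loss): loss is stuck
loss_double = cross_entropy_from_logits(softmax(confident_logits), y)

floor = np.log(1.0 + 9.0 / np.e)            # theoretical minimum, about 1.461
print(loss_single, loss_double, floor)
```

If this reading is right, returning the raw output of the last layer from `forward` would let the loss approach zero, while the arg-max predictions used for accuracy would be unaffected.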


## Limitations & Improvements
- **No Backpropagation Customization:** Relies solely on tf.GradientTape.
- **Static Hyperparameters:** Fixed learning rate and architecture.
- **No Regularization:** Lacks dropout or L2 regularization for better generalization.
- **Add Dropout Layers:** To prevent overfitting.
- **Dynamic Learning Rate:** Implement learning rate decay for optimization.
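Two of the suggested improvements, learning rate decay and dropout, can be sketched in a few lines of NumPy. This is illustrative only: the decay rate of 0.9, the drop rate of 0.2, and both function names are assumptions, not settings from the notebook. The decay follows the common exponential form lr = initial_lr * decay_rate^(step / decay_steps).

```python
import numpy as np


def exponential_decay(initial_lr, step, decay_steps, decay_rate):
    """Learning rate after `step` updates under exponential decay."""
    return initial_lr * decay_rate ** (step / decay_steps)


def inverted_dropout(activations, drop_rate, rng, training=True):
    """Zero a random subset of activations and rescale the survivors."""
    if not training or drop_rate == 0.0:
        return activations                  # dropout is disabled at test time
    keep_prob = 1.0 - drop_rate
    mask = rng.random(activations.shape) < keep_prob
    # Dividing by keep_prob keeps the expected activation unchanged
    return activations * mask / keep_prob


# Decay once per epoch, assuming 937 updates per epoch as in the training loop
lr_start = exponential_decay(0.01, step=0, decay_steps=937, decay_rate=0.9)
lr_epoch_2 = exponential_decay(0.01, step=2 * 937, decay_steps=937, decay_rate=0.9)
print(lr_start, lr_epoch_2)

rng = np.random.default_rng(0)
hidden = np.ones((1000, 128))               # stand-in for a hidden-layer output
dropped = inverted_dropout(hidden, drop_rate=0.2, rng=rng)
print(dropped.mean())                       # stays close to 1.0 on average
```

In the notebook's setting, dropout would be applied to the hidden-layer outputs during training only, and the decayed rate would replace the fixed 0.01 passed to the optimizer.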


## My Comments
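- The model performs well, reaching 96.02% accuracy on the test data.
- It has three layers and uses sigmoid activations in the two hidden layers, which lets it learn non-linear patterns.
- The average training loss decreases in every epoch (from 1.6095 to 1.4935), meaning the model keeps getting better at recognizing digits.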