### Write a program to implement a three-layer neural network using the TensorFlow library (only, no Keras) to classify the MNIST handwritten digits dataset. Demonstrate the implementation of feed-forward and back-propagation approaches.

### Description of the code

* The TensorFlow library provides the building blocks (tensors, variables, automatic differentiation) used to implement the artificial neural network.
* The MNIST dataset is loaded.
* Feature engineering: pixel values are normalized to the range 0 to 1.
* An input layer with 784 neurons (flattened 28x28 images)
* Two hidden layers with 128 and 64 neurons, using ReLU activation
* An output layer with 10 neurons (corresponding to digit classes). It produces raw logits; softmax is applied inside the loss function.
* Epochs: If the model keeps improving until the final epoch, it is advisable to try a higher number of epochs. If the model stops improving well before the final epoch, it is advisable to try a lower number of epochs.
* Batch size: the number of samples used in one iteration (one weight update). With 60,000 training images and a batch size of 100, each epoch runs 600 iterations.
* Optimization via Stochastic Gradient Descent (SGD). The optimizer updates the weights and biases in the direction that reduces the loss, with a step size set by the learning rate.
* Softmax converts the logits of the output layer into a vector of class probabilities.
* Loss function: softmax cross-entropy is used.
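
To make the softmax and cross-entropy bullets concrete, here is a minimal NumPy sketch of the two computations. It is an illustration only, not the TensorFlow code used in this notebook: the helper names `softmax` and `cross_entropy` and the sample logits are assumptions chosen for the example.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise maximum before exponentiating for numerical stability.
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

def cross_entropy(probs, one_hot_labels):
    # Mean negative log-probability assigned to the correct class.
    eps = 1e-12  # guards against log(0)
    return float(-np.mean(np.sum(one_hot_labels * np.log(probs + eps), axis=1)))

# Hypothetical logits for two samples and three classes (illustrative values only).
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

probs = softmax(logits)
loss = cross_entropy(probs, labels)
print(probs.sum(axis=1))  # each row of probabilities sums to 1
print(loss)
```

The predicted class is the index of the largest probability, which is also the index of the largest logit, so taking `argmax` of the logits directly gives the same prediction.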

import tensorflow as tf
import numpy as np

# Load MNIST dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalize input data
x_train, x_test = x_train / 255.0, x_test / 255.0

# Reshape input data (flattening 28x28 images into 784-dimensional vectors)
x_train = x_train.reshape(-1, 784).astype(np.float32)
x_test = x_test.reshape(-1, 784).astype(np.float32)

# Convert labels to one-hot encoding (float32 to match the logits dtype)
y_train = np.eye(10, dtype=np.float32)[y_train]
y_test = np.eye(10, dtype=np.float32)[y_test]

# Define network parameters
input_size = 784
hidden1_size = 128
hidden2_size = 64
output_size = 10
learning_rate = 0.1
epochs = 50
batch_size = 100

# Initialize weights and biases
W1 = tf.Variable(tf.random.normal([input_size, hidden1_size], stddev=0.1))
b1 = tf.Variable(tf.zeros([hidden1_size]))
W2 = tf.Variable(tf.random.normal([hidden1_size, hidden2_size], stddev=0.1))
b2 = tf.Variable(tf.zeros([hidden2_size]))
W3 = tf.Variable(tf.random.normal([hidden2_size, output_size], stddev=0.1))
b3 = tf.Variable(tf.zeros([output_size]))

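def forward_pass(x):
    # Feed-forward: two ReLU hidden layers, then a linear output layer.
    hidden1 = tf.nn.relu(tf.matmul(x, W1) + b1)
    hidden2 = tf.nn.relu(tf.matmul(hidden1, W2) + b2)
    # Return raw logits; softmax is applied inside the loss function.
    output = tf.matmul(hidden2, W3) + b3
    return output

def compute_loss(logits, labels):
    # Mean softmax cross-entropy over the batch.
    return tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))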

# Optimizer
optimizer = tf.optimizers.SGD(learning_rate)

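def train_step(x_batch, y_batch):
    # Record the forward pass so gradients can be computed by back-propagation.
    with tf.GradientTape() as tape:
        logits = forward_pass(x_batch)
        loss = compute_loss(logits, y_batch)
    # Back-propagate: gradients of the loss with respect to every parameter.
    gradients = tape.gradient(loss, [W1, b1, W2, b2, W3, b3])
    # SGD update: parameter -= learning_rate * gradient.
    optimizer.apply_gradients(zip(gradients, [W1, b1, W2, b2, W3, b3]))
    return loss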

# Training loop
num_batches = x_train.shape[0] // batch_size
for epoch in range(epochs):
    avg_loss = 0
    for i in range(num_batches):
        batch_x = x_train[i * batch_size:(i + 1) * batch_size]
        batch_y = y_train[i * batch_size:(i + 1) * batch_size]
        loss = train_step(batch_x, batch_y)
        avg_loss += float(loss) / num_batches
    print(f"Epoch {epoch + 1}, Loss: {avg_loss:.4f}")

# Evaluate model
logits_test = forward_pass(x_test)
predictions = tf.argmax(logits_test, axis=1)
y_true = tf.argmax(y_test, axis=1)
accuracy = tf.reduce_mean(tf.cast(tf.equal(predictions, y_true), tf.float32))
print(f"Test Accuracy: {accuracy.numpy() * 100:.2f}%")

Epoch 1, Loss: 0.4721
Epoch 2, Loss: 0.2131
Epoch 3, Loss: 0.1595
Epoch 4, Loss: 0.1290
Epoch 5, Loss: 0.1085
Epoch 6, Loss: 0.0931
Epoch 7, Loss: 0.0811
Epoch 8, Loss: 0.0713
Epoch 9, Loss: 0.0631
Epoch 10, Loss: 0.0560
Epoch 11, Loss: 0.0499
Epoch 12, Loss: 0.0447
Epoch 13, Loss: 0.0399
Epoch 14, Loss: 0.0356
Epoch 15, Loss: 0.0317
Epoch 16, Loss: 0.0282
Epoch 17, Loss: 0.0251
Epoch 18, Loss: 0.0222
Epoch 19, Loss: 0.0198
Epoch 20, Loss: 0.0175
Epoch 21, Loss: 0.0156
Epoch 22, Loss: 0.0138
Epoch 23, Loss: 0.0123
Epoch 24, Loss: 0.0110
Epoch 25, Loss: 0.0098
Epoch 26, Loss: 0.0088
Epoch 27, Loss: 0.0079
Epoch 28, Loss: 0.0071
Epoch 29, Loss: 0.0064
Epoch 30, Loss: 0.0058
Epoch 31, Loss: 0.0053
Epoch 32, Loss: 0.0048
Epoch 33, Loss: 0.0044
Epoch 34, Loss: 0.0041
Epoch 35, Loss: 0.0037
Epoch 36, Loss: 0.0034
Epoch 37, Loss: 0.0032
Epoch 38, Loss: 0.0030
Epoch 39, Loss: 0.0027
Epoch 40, Loss: 0.0026
Epoch 41, Loss: 0.0024
Epoch 42, Loss: 0.0022
Epoch 43, Loss: 0.0021
Epoch 44, Loss: 0.00
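
The training step relies on `tf.GradientTape` to perform back-propagation automatically. As a rough illustration of what that computes for a single softmax output layer, the NumPy sketch below derives the gradients by hand, checks one of them against a finite-difference estimate, and applies one SGD update. The tiny layer sizes, the random data, and the names `loss_fn`, `grad_W`, and `grad_b` are assumptions made for this example; the notebook's network has two additional hidden layers.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_fn(W, b, x, y):
    # Mean softmax cross-entropy of a single linear layer.
    probs = softmax(x @ W + b)
    return -np.mean(np.sum(y * np.log(probs), axis=1))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5))               # 4 hypothetical samples, 5 features
y = np.eye(3)[rng.integers(0, 3, 4)]      # one-hot labels for 3 classes
W = rng.normal(scale=0.1, size=(5, 3))
b = np.zeros(3)

# Back-propagation through softmax + cross-entropy:
# d(loss)/d(logits) = (probs - labels) / batch_size
probs = softmax(x @ W + b)
d_logits = (probs - y) / x.shape[0]
grad_W = x.T @ d_logits                   # chain rule through x @ W
grad_b = d_logits.sum(axis=0)             # chain rule through + b

# Finite-difference check on a single weight.
eps = 1e-6
W_plus, W_minus = W.copy(), W.copy()
W_plus[0, 0] += eps
W_minus[0, 0] -= eps
numeric = (loss_fn(W_plus, b, x, y) - loss_fn(W_minus, b, x, y)) / (2 * eps)
print(grad_W[0, 0], numeric)              # the two values should agree closely

# One SGD step: parameter -= learning_rate * gradient.
learning_rate = 0.1
W_new = W - learning_rate * grad_W
b_new = b - learning_rate * grad_b
print(loss_fn(W, b, x, y), loss_fn(W_new, b_new, x, y))
```

For the hidden layers, the same chain rule continues backwards through each ReLU and matrix multiplication, which is the bookkeeping the gradient tape handles.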

### Description of the code

 **Data Preprocessing**:
   - Loads the MNIST dataset.
   - Normalizes pixel values to the range 0 to 1; small, consistently scaled inputs help gradient descent train more stably.
   - Reshapes images into 784-dimensional vectors and converts labels into one-hot encoding.
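
The preprocessing steps can be tried on a small stand-in array without downloading MNIST. In this sketch the four random "images" and their labels are hypothetical placeholders; only the normalize, flatten, and one-hot operations mirror the notebook.

```python
import numpy as np

# Four hypothetical 28x28 grayscale images with uint8 pixels, standing in for MNIST.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
labels = np.array([3, 0, 9, 3])

# Scale pixels from [0, 255] to [0, 1], then flatten each image to 784 values.
x = (images / 255.0).reshape(-1, 784).astype(np.float32)

# One-hot encode: row i of the 10x10 identity matrix is the encoding of class i.
y = np.eye(10, dtype=np.float32)[labels]

print(x.shape, x.dtype)
print(y[0])  # one-hot vector with a 1 at index 3
```

Indexing the identity matrix with the label array is a compact way to build the one-hot matrix: each label selects the matching row.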

 **Network Initialization**:
   - Defines the structure with two hidden layers.
   - Initializes weights and biases.
      * W1, b1 and W2, b2 are the weights and biases of the first and second hidden layers.
      * W3 and b3 are the weights and biases of the output layer.
      * Weights are drawn from a normal distribution with mean 0 and standard deviation 0.1; biases are initialized to zero.
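
To get a feel for the size of the network, the sketch below rebuilds the same parameter shapes in NumPy and counts them. It mirrors the notebook's `tf.random.normal(..., stddev=0.1)` weights and zero biases, but NumPy is used here only as a stand-in, and the names `layer_sizes`, `params`, and `total` are assumptions made for the example.

```python
import numpy as np

layer_sizes = [784, 128, 64, 10]  # input, hidden 1, hidden 2, output
rng = np.random.default_rng(0)

params = []
for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    # Weights: normal distribution with mean 0 and standard deviation 0.1.
    W = rng.normal(loc=0.0, scale=0.1, size=(fan_in, fan_out)).astype(np.float32)
    # Biases: all zeros.
    b = np.zeros(fan_out, dtype=np.float32)
    params.append((W, b))

for i, (W, b) in enumerate(params, start=1):
    print(f"Layer {i}: W{W.shape}, b{b.shape}")

total = sum(W.size + b.size for W, b in params)
print(total)  # 109386 trainable parameters
```

The count breaks down as 784×128 + 128 for the first hidden layer, 128×64 + 64 for the second, and 64×10 + 10 for the output layer.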
