EXPERIMENT 3

OBJECTIVE : Write a program (WAP) to implement a three-layer neural network using only the TensorFlow library (no Keras) to classify the MNIST handwritten digits dataset. Demonstrate the implementation of the feed-forward and back-propagation steps.

DESCRIPTION OF MODEL
The model is a fully connected (feedforward) neural network with three layers (two hidden layers and one output layer), designed to classify handwritten digits (0-9) from the MNIST dataset.

Model architecture
- Input layer : each MNIST image is 28 × 28 = 784 pixels, so the input size is 784.
- Hidden Layer 1 : 128 neurons, activation function: Sigmoid
- Hidden Layer 2 : 64 neurons, activation function: Sigmoid
- Output Layer : 10 neurons (one per digit from 0 to 9), with no activation, because the softmax is applied inside the loss function.
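As a quick check on model size, the number of trainable parameters follows directly from the layer sizes above: each dense layer has fan_in × fan_out weights plus fan_out biases. A minimal plain-Python sketch (the helper name count_parameters is hypothetical):

```python
# Layer sizes taken from the architecture above: input, hidden 1, hidden 2, output
layer_sizes = [784, 128, 64, 10]

def count_parameters(sizes):
    """Return weights (fan_in * fan_out) plus biases (fan_out) for each dense layer."""
    per_layer = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        per_layer.append(fan_in * fan_out + fan_out)
    return per_layer

per_layer = count_parameters(layer_sizes)
for name, n in zip(["W1, b1", "W2, b2", "W_out, b_out"], per_layer):
    print(f"{name}: {n}")      # 100480, 8256, 650
print("total:", sum(per_layer))  # 109386
```

So the network has 109,386 trainable parameters, about 92% of them in W1.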

Model parameters (hyperparameters)
- Number of epochs : 10
- Learning rate : 0.01
- Batch size : 100
- Loss function : Softmax Cross-Entropy
- Optimiser : Adam

- The model is trained by repeating four steps on every mini-batch: forward propagation, loss calculation, backpropagation, and an optimizer update of the weights and biases.

PYTHON IMPLEMENTATION

import tensorflow as tf
import tensorflow_datasets as tfds

# Load and preprocess the MNIST dataset
def preprocess(image, label):
    image = tf.cast(image, tf.float32) / 255.0  # Normalize to [0,1]
    image = tf.reshape(image, [-1])  # Flatten to 784
    label = tf.one_hot(label, depth=10)  # Convert to one-hot encoding
    return image, label

# Load dataset and apply preprocessing
mnist_dataset = tfds.load("mnist", split=["train", "test"], as_supervised=True)
train_data = mnist_dataset[0].map(preprocess).shuffle(10000).batch(100)
test_data = mnist_dataset[1].map(preprocess).batch(100)

# Define neural network parameters
input_size = 784
hidden_layer1_size = 128
hidden_layer2_size = 64
output_size = 10
learning_rate = 0.01
epochs = 10

# Initialize weights and biases
W1 = tf.Variable(tf.random.normal([input_size, hidden_layer1_size]))
b1 = tf.Variable(tf.zeros([hidden_layer1_size]))

W2 = tf.Variable(tf.random.normal([hidden_layer1_size, hidden_layer2_size]))
b2 = tf.Variable(tf.zeros([hidden_layer2_size]))

W_out = tf.Variable(tf.random.normal([hidden_layer2_size, output_size]))
b_out = tf.Variable(tf.zeros([output_size]))

# Forward pass function
def forward_pass(X):
    layer1 = tf.nn.sigmoid(tf.matmul(X, W1) + b1)
    layer2 = tf.nn.sigmoid(tf.matmul(layer1, W2) + b2)
    output_layer = tf.matmul(layer2, W_out) + b_out  # No activation (logits)
    return output_layer

# Define loss function
def loss_fn(logits, labels):
    return tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))

# Define optimizer
optimizer = tf.optimizers.Adam(learning_rate)

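# Training step: one forward pass and one backward pass (back-propagation) per batch
def train_step(X, Y):
    with tf.GradientTape() as tape:
        # Forward pass: the tape records every operation involving W1..b_out
        logits = forward_pass(X)
        loss = loss_fn(logits, Y)
    # Back-propagation: gradients of the loss w.r.t. every weight and bias
    gradients = tape.gradient(loss, [W1, b1, W2, b2, W_out, b_out])
    # Adam update of all parameters
    optimizer.apply_gradients(zip(gradients, [W1, b1, W2, b2, W_out, b_out]))
    return loss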

# Compute accuracy
def compute_accuracy(dataset):
    total_correct = 0
    total_samples = 0
    for X, Y in dataset:
        logits = forward_pass(X)
        correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(Y, 1))
        total_correct += tf.reduce_sum(tf.cast(correct_pred, tf.float32))
        total_samples += X.shape[0]
    return total_correct / total_samples

# Training loop
for epoch in range(epochs):
    avg_loss = 0
    total_batches = 0

    for batch_x, batch_y in train_data:
        loss = train_step(batch_x, batch_y)
        avg_loss += loss
        total_batches += 1

    avg_loss /= total_batches
    train_acc = compute_accuracy(train_data)
    print(f"Epoch {epoch+1}, Loss: {avg_loss:.4f}, Training Accuracy: {train_acc:.4f}")

# Test the model
test_acc = compute_accuracy(test_data)
print(f"Test Accuracy: {test_acc:.4f}")


Epoch 1, Loss: 0.4513, Training Accuracy: 0.9366
Epoch 2, Loss: 0.1944, Training Accuracy: 0.9579
Epoch 3, Loss: 0.1431, Training Accuracy: 0.9670
Epoch 4, Loss: 0.1221, Training Accuracy: 0.9692
Epoch 5, Loss: 0.1043, Training Accuracy: 0.9718
Epoch 6, Loss: 0.0924, Training Accuracy: 0.9775
Epoch 7, Loss: 0.0797, Training Accuracy: 0.9785
Epoch 8, Loss: 0.0768, Training Accuracy: 0.9791
Epoch 9, Loss: 0.0732, Training Accuracy: 0.9760
Epoch 10, Loss: 0.0730, Training Accuracy: 0.9844
Test Accuracy: 0.9655


DESCRIPTION OF CODE
- We are using TensorFlow, an open-source machine learning library developed by Google for building and training models, especially deep learning models such as neural networks.
- Imports : tensorflow, and tensorflow_datasets, which provides ready-to-use datasets such as MNIST.

def preprocess() -->
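- Pixel values are cast to float32 and normalised to the range [0, 1] by dividing by 255.
- Each 28 × 28 image is flattened into a 1D vector of 784 values.
- Each label is one-hot encoded into a vector of length 10 (for example, label 3 becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]).
- tfds.load("mnist", split=["train", "test"], as_supervised=True) returns the train and test splits as TensorFlow dataset objects whose elements are (image, label) tuples.
- map(preprocess) : applies the preprocessing to every (image, label) pair.
- shuffle(10000) : shuffles the training data using a buffer of 10,000 examples (reshuffled every epoch by default), so the model does not see the examples in a fixed order, which helps generalisation.
- batch(100) : groups the data into mini-batches of 100. Each batch goes through one forward pass and one backward pass, so the weights are updated once per batch, i.e. 600 times per epoch for the 60,000 training images.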
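The effect of preprocess(), shuffle() and batch() on tensor shapes can be checked without downloading MNIST. The sketch below feeds 250 random uint8 images through the same pipeline; these images and labels are a hypothetical stand-in with the same 28 × 28 × 1 shape as MNIST, not real data.

```python
import tensorflow as tf

def preprocess(image, label):
    image = tf.cast(image, tf.float32) / 255.0  # normalise to [0, 1]
    image = tf.reshape(image, [-1])             # flatten 28x28x1 -> 784
    label = tf.one_hot(label, depth=10)         # one-hot encode
    return image, label

# Hypothetical stand-in for MNIST: 250 random 28x28x1 uint8 images, labels 0-9
images = tf.cast(tf.random.uniform([250, 28, 28, 1], maxval=256, dtype=tf.int32), tf.uint8)
labels = tf.random.uniform([250], maxval=10, dtype=tf.int64)

ds = tf.data.Dataset.from_tensor_slices((images, labels))
ds = ds.map(preprocess).shuffle(10000).batch(100)

# Shapes of every batch: (images, labels)
batch_shapes = [(x.shape.as_list(), y.shape.as_list()) for x, y in ds]
for x_shape, y_shape in batch_shapes:
    print(x_shape, y_shape)

first_x, first_y = next(iter(ds))
print("max pixel:", float(tf.reduce_max(first_x)))
print("each label row sums to 1:", bool(tf.reduce_all(tf.reduce_sum(first_y, axis=1) == 1.0)))
```

Each full batch holds [100, 784] images and [100, 10] one-hot labels; the final batch holds the remaining 50 examples. With MNIST's 60,000 training images, every batch is full.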

Neural network parameters are defined : 
- input_size = 784
- hidden_layer1_size = 128
- hidden_layer2_size = 64
- output_size = 10
- learning_rate = 0.01
- epochs = 10

Weight and bias initialisation :
- Weights (W1, W2, W_out) : randomly initialised from a standard normal distribution (tf.random.normal, mean 0, standard deviation 1)
- Biases (b1, b2, b_out) : initialised to zero

def forward_pass() -->
- Layer 1 output : A1 = sigmoid(X·W1 + b1), shape [batch, 128]
- Layer 2 output : A2 = sigmoid(A1·W2 + b2), shape [batch, 64]
- Output layer (logits) : Z = A2·W_out + b_out, shape [batch, 10]
- No activation in the last layer because it will be handled by softmax in the loss function.

def loss_fn() -->
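- Softmax cross-entropy loss : measures how far the predicted class probabilities are from the true (one-hot) labels.
- For one example with logits Z and one-hot label Y : Loss = −∑_i Y_i log(softmax(Z)_i), where softmax(Z)_i = exp(Z_i) / ∑_j exp(Z_j).
- tf.reduce_mean() : averages the per-example losses over all examples in the batch.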
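To confirm that tf.nn.softmax_cross_entropy_with_logits applies the softmax itself (which is why the output layer has no activation), the sketch below compares it with the formula written out by hand. The logits and label are a hypothetical single example with three classes, kept small for readability.

```python
import math
import tensorflow as tf

# Hypothetical logits Z for one example and its one-hot label (true class 0)
logits = tf.constant([[2.0, 1.0, 0.1]])
labels = tf.constant([[1.0, 0.0, 0.0]])

# Manual formula: Loss = -sum_i Y_i * log(softmax(Z)_i)
probs = tf.nn.softmax(logits)
manual_loss = -tf.reduce_sum(labels * tf.math.log(probs), axis=1)

# TensorFlow's fused op, as used in loss_fn() (takes raw logits, not probabilities)
tf_loss = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)

# Same value with plain Python: -log(e^2 / (e^2 + e^1 + e^0.1))
by_hand = -math.log(math.exp(2.0) / (math.exp(2.0) + math.exp(1.0) + math.exp(0.1)))

print(float(manual_loss[0]), float(tf_loss[0]), by_hand)
```

All three values agree at about 0.417. For a batch, tf.reduce_mean over such per-example values gives what loss_fn() returns.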

Adam Optimizer : adjusts the weights and biases to minimise the loss during training, giving each parameter its own step size based on running averages of its gradient and squared gradient.
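The objective rules out Keras, but in TensorFlow 2.x `tf.optimizers` is, to our understanding, exposed through `tf.keras.optimizers`. If that matters, the Adam update can be written with plain TensorFlow ops. The sketch below follows the standard Adam rule with bias correction; the class name ManualAdam and its defaults (beta1 = 0.9, beta2 = 0.999, eps = 1e-7) are assumptions, and the Keras implementation places epsilon slightly differently, so results may differ from tf.optimizers.Adam in the last digits.

```python
import tensorflow as tf

class ManualAdam:
    """Hypothetical helper: the Adam update rule written with plain TF ops (no Keras)."""

    def __init__(self, variables, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-7):
        self.variables = variables
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        # One running first moment (m) and second moment (v) per variable
        self.m = [tf.Variable(tf.zeros_like(var)) for var in variables]
        self.v = [tf.Variable(tf.zeros_like(var)) for var in variables]
        self.t = 0  # step counter used for bias correction

    def apply_gradients(self, grads):
        self.t += 1
        for var, g, m, v in zip(self.variables, grads, self.m, self.v):
            # Exponential moving averages of the gradient and the squared gradient
            m.assign(self.beta1 * m + (1.0 - self.beta1) * g)
            v.assign(self.beta2 * v + (1.0 - self.beta2) * tf.square(g))
            # Bias correction: m and v start at zero, so early estimates are scaled up
            m_hat = m / (1.0 - self.beta1 ** self.t)
            v_hat = v / (1.0 - self.beta2 ** self.t)
            var.assign_sub(self.lr * m_hat / (tf.sqrt(v_hat) + self.eps))

# Usage on a toy problem: minimise (w - 3)^2 starting from w = 0
w = tf.Variable(0.0)
opt = ManualAdam([w], lr=0.01)
for step in range(100):
    with tf.GradientTape() as tape:
        loss = (w - 3.0) ** 2
    opt.apply_gradients(tape.gradient(loss, [w]))
    if step == 0:
        first_step_w = float(w.numpy())  # Adam's first step has size ~lr
final_w = float(w.numpy())
print(first_step_w, final_w)
```

To use it in train_step(), one would construct `ManualAdam([W1, b1, W2, b2, W_out, b_out], lr=learning_rate)` and pass the gradient list directly (this sketch takes a list of gradients, not the (gradient, variable) pairs that `apply_gradients` in TensorFlow expects).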

def train_step() -->
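- tf.GradientTape() : records the operations of the forward pass, so that tape.gradient() can compute the gradient of the loss with respect to every weight and bias by back-propagation (reverse-mode automatic differentiation).
- optimizer.apply_gradients() : updates all weights and biases using the computed gradients and the Adam update rule.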
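Because GradientTape performs back-propagation internally, the backward pass itself never appears in the code. The sketch below writes it out by hand for the same architecture (sigmoid, sigmoid, logits, mean softmax cross-entropy) and checks it against tape.gradient. The batch of 5 random inputs and the weight standard deviation of 0.1 are assumptions for this check only (the smaller scale keeps the sigmoids out of saturation so the comparison is meaningful); the model itself uses N(0, 1).

```python
import tensorflow as tf

tf.random.set_seed(0)

# Stand-in parameters with the model's shapes (stddev=0.1 is an assumption for this check)
W1 = tf.Variable(tf.random.normal([784, 128], stddev=0.1))
b1 = tf.Variable(tf.zeros([128]))
W2 = tf.Variable(tf.random.normal([128, 64], stddev=0.1))
b2 = tf.Variable(tf.zeros([64]))
W_out = tf.Variable(tf.random.normal([64, 10], stddev=0.1))
b_out = tf.Variable(tf.zeros([10]))
params = [W1, b1, W2, b2, W_out, b_out]

# Hypothetical mini-batch: 5 random "images" and one-hot labels
X = tf.random.uniform([5, 784])
Y = tf.one_hot([3, 1, 4, 1, 5], depth=10)

def forward(X):
    A1 = tf.sigmoid(tf.matmul(X, W1) + b1)
    A2 = tf.sigmoid(tf.matmul(A1, W2) + b2)
    Z = tf.matmul(A2, W_out) + b_out  # logits
    return A1, A2, Z

def manual_backprop(X, Y):
    n = tf.cast(tf.shape(X)[0], tf.float32)  # batch size (reduce_mean divides by it)
    A1, A2, Z = forward(X)
    # Output layer: d(mean cross-entropy)/dZ = (softmax(Z) - Y) / n
    dZ = (tf.nn.softmax(Z) - Y) / n
    dW_out = tf.matmul(A2, dZ, transpose_a=True)
    db_out = tf.reduce_sum(dZ, axis=0)
    # Hidden layer 2: chain rule through the sigmoid, sigmoid'(z) = A * (1 - A)
    dZ2 = tf.matmul(dZ, W_out, transpose_b=True) * A2 * (1.0 - A2)
    dW2 = tf.matmul(A1, dZ2, transpose_a=True)
    db2 = tf.reduce_sum(dZ2, axis=0)
    # Hidden layer 1: same pattern, one layer further back
    dZ1 = tf.matmul(dZ2, W2, transpose_b=True) * A1 * (1.0 - A1)
    dW1 = tf.matmul(X, dZ1, transpose_a=True)
    db1 = tf.reduce_sum(dZ1, axis=0)
    return [dW1, db1, dW2, db2, dW_out, db_out]

manual_grads = manual_backprop(X, Y)

# Automatic gradients, computed the same way as in train_step()
with tf.GradientTape() as tape:
    _, _, logits = forward(X)
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))
tape_grads = tape.gradient(loss, params)

max_diff = max(float(tf.reduce_max(tf.abs(m - t)))
               for m, t in zip(manual_grads, tape_grads))
print(f"largest |manual - tape| difference: {max_diff:.2e}")
```

The differences should be at the level of float32 rounding, which indicates that tape.gradient computes exactly these chain-rule products.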

def compute_accuracy() --> 
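- Predictions (argmax) : tf.argmax(logits, 1) returns the index of the largest logit in each row, which is also the class with the highest softmax probability.
- Correct predictions (equal()) : compares each predicted index with the index of the 1 in the one-hot label.
- Accuracy = correct predictions / total predictions, accumulated over all batches of the dataset.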
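A small worked example of the accuracy computation, using hypothetical logits for four examples over three classes (three instead of ten only to keep it short):

```python
import tensorflow as tf

# Hypothetical logits for 4 examples (rows) over 3 classes, and one-hot labels
logits = tf.constant([[2.0, 0.5, 0.1],
                      [0.2, 1.5, 0.3],
                      [0.1, 0.2, 3.0],
                      [1.0, 2.0, 0.5]])
labels = tf.one_hot([0, 1, 2, 0], depth=3)

pred_class = tf.argmax(logits, 1)  # index of the largest logit per row
true_class = tf.argmax(labels, 1)  # index of the 1 in each one-hot row
correct = tf.equal(pred_class, true_class)
accuracy = tf.reduce_sum(tf.cast(correct, tf.float32)) / logits.shape[0]

print(pred_class.numpy(), true_class.numpy(), float(accuracy))
```

Three of the four argmax predictions match the labels (the last row predicts class 1 instead of 0), so the accuracy is 3/4 = 0.75.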

Training loop :
- Loops over 10 epochs.
- Calls train_step() on each batch.
- After each epoch, prints the average loss over all training batches and the accuracy on the full training set.

Finally, the accuracy on the 10,000 test images is computed, which comes out to 96.55%.


MY COMMENTS (limitations and scope of improvement)
- The model has been trained to classify the MNIST dataset and achieved a final training accuracy of 98.44% and a test accuracy of 96.55%.
- Accuracy could be improved by replacing Sigmoid in the hidden layers, since its derivative is at most 0.25 and it is therefore prone to vanishing gradients. ReLU activation could have been used instead.
- Accuracy could also be improved by increasing the number of epochs, changing the number of hidden neurons, or trying a different optimiser or loss function.
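To make the vanishing-gradient remark concrete, the sketch below compares the derivatives of sigmoid and ReLU at a few sample points using tf.GradientTape, and shows a hypothetical ReLU version of forward_pass(). The sample points, the weight standard deviations and the name forward_pass_relu are assumptions for illustration, not part of the original experiment.

```python
import tensorflow as tf

xs = tf.constant([-2.0, 0.5, 5.0])  # sample pre-activation values (assumed)

with tf.GradientTape(persistent=True) as tape:
    tape.watch(xs)
    sig = tf.nn.sigmoid(xs)
    relu = tf.nn.relu(xs)

sig_grad = tape.gradient(sig, xs)    # sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)) <= 0.25
relu_grad = tape.gradient(relu, xs)  # 0 for x < 0, 1 for x > 0
del tape

print("sigmoid'(x):", sig_grad.numpy())
print("relu'(x):   ", relu_grad.numpy())

# Hypothetical ReLU variant of forward_pass(); parameters are passed in explicitly
def forward_pass_relu(X, W1, b1, W2, b2, W_out, b_out):
    layer1 = tf.nn.relu(tf.matmul(X, W1) + b1)
    layer2 = tf.nn.relu(tf.matmul(layer1, W2) + b2)
    return tf.matmul(layer2, W_out) + b_out  # logits; softmax stays in the loss

# Usage with small random weights (standard deviations are assumptions)
params = [tf.random.normal([784, 128], stddev=0.05), tf.zeros([128]),
          tf.random.normal([128, 64], stddev=0.1), tf.zeros([64]),
          tf.random.normal([64, 10], stddev=0.1), tf.zeros([10])]
logits = forward_pass_relu(tf.random.uniform([2, 784]), *params)
print(logits.shape)
```

Because sigmoid'(x) never exceeds 0.25, back-propagating through the two sigmoid layers multiplies the gradient by two derivative factors whose product is at most 0.25 × 0.25 = 0.0625, whereas ReLU passes the gradient through unchanged wherever a unit is active.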