Objective:- Write a program to implement a three-layer neural network using the TensorFlow library (only, no Keras) to classify the MNIST handwritten digits dataset. Demonstrate the implementation of the feed-forward and back-propagation approaches.

Description of the model:-  This is a three-layer neural network implemented using TensorFlow (without Keras) for classifying handwritten digits from the MNIST dataset.

The model consists of:-
-> Input Layer (784 neurons/nodes):- Takes flattened 28x28 pixel images.
-> Hidden Layer 1 (128 neurons/nodes):- Sigmoid activation function.
-> Hidden Layer 2 (64 neurons/nodes):- Sigmoid activation function.
-> Output Layer (10 neurons/nodes):- Produces one logit per digit class (0-9); softmax is applied to these logits inside the loss function.
-> Loss Function:- Softmax categorical cross-entropy.
-> Optimizer:- Adam optimizer.
-> Epochs:- 10
-> Learning Rate:- 0.01
-> Batch Size:- 100
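
As a quick sanity check on the architecture listed above, the number of trainable parameters can be counted by hand. The sketch below is plain Python and only assumes the layer sizes given in the list; the helper name count_parameters is introduced here for illustration.

```python
# Layer sizes taken from the model description above.
layer_sizes = [784, 128, 64, 10]


def count_parameters(sizes):
    """Count the weights and biases of a fully connected network."""
    total = 0
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        total += fan_in * fan_out  # entries of the weight matrix
        total += fan_out           # entries of the bias vector
    return total


print(count_parameters(layer_sizes))  # 109386
```

Most of the parameters (100,480 of them) sit in the first layer, because it connects all 784 input pixels to 128 hidden units.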

In [58]:
import tensorflow as tf
import tensorflow_datasets as tfds
import numpy as np

In [None]:
dataset, info = tfds.load("mnist", as_supervised=True, with_info=True)
train_dataset, test_dataset = dataset["train"], dataset["test"]

BATCH_SIZE = 100

train_dataset


<_PrefetchDataset element_spec=(TensorSpec(shape=(28, 28, 1), dtype=tf.uint8, name=None), TensorSpec(shape=(), dtype=tf.int64, name=None))>

In [48]:
def preprocess(image, label):
    image = tf.cast(image, tf.float32) / 255.0  # Normalize (0 to 1)
    image = tf.reshape(image, [-1])  # Flatten (28x28 → 784)
    label = tf.one_hot(label, depth=10)  # One-hot encode labels
    return image, label


In [49]:
train_dataset = train_dataset.map(preprocess).shuffle(10000).batch(BATCH_SIZE)
test_dataset = test_dataset.map(preprocess).batch(BATCH_SIZE)


In [5]:
input_dim = 784
hidden_dim1 = 128
hidden_dim2 = 64
output_dim = 10


W1 = tf.Variable(tf.random.normal([input_dim, hidden_dim1], stddev=0.1))
b1 = tf.Variable(tf.zeros([hidden_dim1]))
W2 = tf.Variable(tf.random.normal([hidden_dim1, hidden_dim2], stddev=0.1))
b2 = tf.Variable(tf.zeros([hidden_dim2]))
W3 = tf.Variable(tf.random.normal([hidden_dim2, output_dim], stddev=0.1))
b3 = tf.Variable(tf.zeros([output_dim]))


In [28]:
def model(x):
    hidden_layer1 = tf.sigmoid(tf.matmul(x, W1) + b1)  # First Hidden Layer
    hidden_layer2 = tf.sigmoid(tf.matmul(hidden_layer1, W2) + b2)  # Second Hidden Layer
    logits = tf.matmul(hidden_layer2, W3) + b3  # Output layer (logits)
    return logits


In [29]:
def compute_loss(logits, labels):
    return tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))


In [30]:
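def compute_accuracy(dataset):
    # Fraction of samples whose highest-scoring logit matches the one-hot label
    correct_preds, total_samples = 0, 0
    for images, labels in dataset:
        logits = model(images)
        predicted = tf.argmax(logits, axis=1)  # Predicted digit per sample
        actual = tf.argmax(labels, axis=1)  # True digit recovered from the one-hot label
        correct_preds += tf.reduce_sum(tf.cast(tf.equal(predicted, actual), tf.float32)).numpy()
        total_samples += images.shape[0]  # Count the samples in this batch
    return correct_preds / total_samples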


In [31]:
optimizer = tf.optimizers.Adam(learning_rate=0.01)

# tf.optimizers.Adam: Creates the Adam optimizer, a gradient descent variant that keeps running averages of each parameter's gradient and squared gradient and uses them to scale every update individually.
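
To make the idea behind Adam more concrete, here is a minimal plain-Python sketch of the update rule for a single scalar parameter. It is an illustration only, not TensorFlow's implementation. The learning rate 0.01 comes from this notebook; the values of beta1, beta2 and eps are commonly used defaults and are assumptions here, and the quadratic toy problem is hypothetical.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a single scalar parameter (t counts steps from 1)."""
    m = beta1 * m + (1 - beta1) * grad         # running average of gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # running average of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Hypothetical toy problem: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t)

print(w)  # ends up close to the minimum at w = 3
```

Because each update is divided by the square root of the averaged squared gradient, the very first step has a size of roughly lr regardless of how large the gradient is.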


In [32]:
def train_step(images, labels):
    with tf.GradientTape() as tape:
        logits = model(images)
        loss = compute_loss(logits, labels)
    gradients = tape.gradient(loss, [W1, b1, W2, b2, W3, b3])
    optimizer.apply_gradients(zip(gradients, [W1, b1, W2, b2, W3, b3]))
    return loss


tf.GradientTape(): Tracks the operations performed on the tensors in the forward pass to calculate gradients during backpropagation.

tape.gradient(loss, ...): Computes the gradients of the loss with respect to the model parameters (weights and biases).

optimizer.apply_gradients: Applies the gradients to update the model parameters.
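
What tape.gradient returns can be checked by hand on a much smaller problem. The sketch below is plain Python and uses a single sigmoid neuron with a binary cross-entropy loss as a toy stand-in for the network; it is not the notebook's model. It compares the chain-rule gradient with a finite-difference estimate. All function names and the sample values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_fn(w, b, x, y):
    """Binary cross-entropy of one sigmoid neuron on one sample."""
    a = sigmoid(w * x + b)
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))


def backprop(w, b, x, y):
    """Chain rule by hand: dL/dz = a - y, then dz/dw = x and dz/db = 1."""
    a = sigmoid(w * x + b)
    dz = a - y
    return dz * x, dz


def numerical_grad(w, b, x, y, h=1e-6):
    """Central finite differences, used only to verify backprop()."""
    dw = (loss_fn(w + h, b, x, y) - loss_fn(w - h, b, x, y)) / (2 * h)
    db = (loss_fn(w, b + h, x, y) - loss_fn(w, b - h, x, y)) / (2 * h)
    return dw, db


w, b, x, y = 0.5, -0.2, 1.5, 1.0
print(backprop(w, b, x, y))
print(numerical_grad(w, b, x, y))
```

The two printed pairs should agree to several decimal places. Automatic differentiation applies the same chain rule, only across every operation recorded during the forward pass.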

In [33]:
epochs = 10
for epoch in range(epochs):
    total_loss = 0.0
    for images, labels in train_dataset:
        loss = train_step(images, labels)
        total_loss += loss.numpy()
    
    print(f"Epoch {epoch+1}, Loss: {total_loss:.4f}")


Epoch 1, Loss: 53.7979
Epoch 2, Loss: 35.1091
Epoch 3, Loss: 27.0394
Epoch 4, Loss: 22.7045
Epoch 5, Loss: 22.5453
Epoch 6, Loss: 20.0231
Epoch 7, Loss: 19.7101
Epoch 8, Loss: 15.4758
Epoch 9, Loss: 16.7661
Epoch 10, Loss: 16.9674


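Epoch loop: The model is trained for 10 epochs. In each epoch:

The model processes all batches of training data.

The train_step function is called on each batch to perform a forward pass, calculate the loss, compute the gradients, and apply them.

The per-batch mean losses are summed over the epoch and the sum is printed at the end of each epoch, so the printed "Loss" is a total over all batches (600 per epoch for 60,000 training images at batch size 100), not a per-sample average.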
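
Since the printed value is a sum over batches, dividing it by the number of batches gives a number that is easier to compare across batch sizes. The short sketch below does this for the sums printed above, assuming the standard 60,000-image MNIST training split.

```python
# Summed per-batch losses printed by the training loop above.
epoch_loss_sums = [53.7979, 35.1091, 27.0394, 22.7045, 22.5453,
                   20.0231, 19.7101, 15.4758, 16.7661, 16.9674]

TRAIN_SAMPLES = 60000  # size of the MNIST training split
BATCH_SIZE = 100
num_batches = TRAIN_SAMPLES // BATCH_SIZE  # 600 batches per epoch

mean_losses = [loss_sum / num_batches for loss_sum in epoch_loss_sums]
for epoch, mean_loss in enumerate(mean_losses, start=1):
    print(f"Epoch {epoch}, mean loss per batch: {mean_loss:.4f}")
```

For example, the first epoch's sum of 53.7979 corresponds to a mean cross-entropy of about 0.0897 per batch, and the last epoch's 16.9674 to about 0.0283.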


In [34]:
train_accuracy = compute_accuracy(train_dataset)
test_accuracy = compute_accuracy(test_dataset)

print(f"Final Training Accuracy (Adam): {train_accuracy:.4f}")
print(f"Final Test Accuracy (Adam): {test_accuracy:.4f}")


Final Training Accuracy (Adam): 0.9940
Final Test Accuracy (Adam): 0.9761


Description of Code:

Load Data:- Loads the MNIST dataset (a collection of handwritten digits) using tensorflow_datasets.

PreProcess Data:- Preprocesses the MNIST dataset by normalizing the images to the range 0 to 1, flattening them into 784-element vectors, and converting the labels to one-hot encoding.
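
The three preprocessing steps can be mirrored in plain Python to see exactly what they do to one sample. This is only an illustrative sketch: preprocess_py and the 2x2 "image" are hypothetical, and the real images are 28x28 with an extra channel dimension.

```python
def preprocess_py(image, label, num_classes=10):
    """Plain-Python mirror of preprocess(): scale, flatten, one-hot encode."""
    flat = [pixel / 255.0 for row in image for pixel in row]  # normalize and flatten
    one_hot = [1.0 if i == label else 0.0 for i in range(num_classes)]
    return flat, one_hot


# Hypothetical 2x2 "image" standing in for a real MNIST digit.
tiny_image = [[0, 255], [51, 102]]
features, target = preprocess_py(tiny_image, label=3)
print(features)
print(target)
```

The label 3 becomes a vector of ten values with a single 1.0 at index 3, which is the format the softmax cross-entropy loss in this notebook expects.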

Initialize Model Parameters:-
    Weights (W1, W2, W3) are initialized with random values drawn from a normal distribution (stddev = 0.1), and biases (b1, b2, b3) are initialized to zeros.

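Feed-Forward Propagation:- (here * denotes matrix multiplication, tf.matmul in the code)
    hidden_layer1 = sigmoid(X * W1 + b1).
    hidden_layer2 = sigmoid(hidden_layer1 * W2 + b2).
    Output Layer:-  logits = hidden_layer2 * W3 + b3 (softmax is not applied here; it is applied inside the loss function).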
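
The same feed-forward computation can be written out in plain Python for one input vector, which makes the order of operations explicit. The sketch below uses a hypothetical tiny network (2 inputs, two hidden layers of 2 units, 3 outputs) with hand-picked weights instead of the 784-128-64-10 model. The names dense, forward and toy_params are introduced here for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, W, b):
    """Compute x * W + b for one input vector x, with W of shape [len(x)][len(b)]."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def forward(x, params):
    """Two sigmoid hidden layers followed by a linear output layer (logits)."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = [sigmoid(z) for z in dense(x, W1, b1)]
    h2 = [sigmoid(z) for z in dense(h1, W2, b2)]
    return dense(h2, W3, b3)  # raw logits, no softmax here


# Hypothetical hand-picked parameters for a 2 -> 2 -> 2 -> 3 network.
toy_params = [
    ([[0.1, -0.2], [0.4, 0.3]], [0.0, 0.0]),
    ([[0.5, -0.1], [-0.3, 0.2]], [0.0, 0.0]),
    ([[0.2, -0.4, 0.1], [0.3, 0.1, -0.2]], [0.0, 0.0, 0.0]),
]
logits = forward([1.0, 0.5], toy_params)
print(logits)
```

The output has one value per class and the values do not sum to 1, because they are logits rather than probabilities.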

Loss Calculation:-
   tf.nn.softmax_cross_entropy_with_logits: Computes the softmax cross-entropy loss, which is appropriate for multi-class classification problems like MNIST.
   Softmax: Converts the raw logits into probability distributions.
   Cross-Entropy: Measures the difference between the predicted probabilities and the actual labels (one-hot encoded).
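
For a single sample, the softmax and cross-entropy steps described above can be reproduced in a few lines of plain Python. This is a sketch of the underlying math, not TensorFlow's implementation; the logits and label are hypothetical, and subtracting the maximum logit is a standard trick to avoid overflow in exp.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax: shift by the max before exponentiating."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def softmax_cross_entropy(logits, one_hot_label):
    """Cross-entropy between a one-hot label and softmax(logits) for one sample."""
    return -sum(y * lp for y, lp in zip(one_hot_label, log_softmax(logits)))


sample_logits = [2.0, 1.0, 0.1]  # hypothetical logits for a 3-class problem
sample_label = [1.0, 0.0, 0.0]   # the true class is 0
probs = [math.exp(lp) for lp in log_softmax(sample_logits)]
print(probs)
print(softmax_cross_entropy(sample_logits, sample_label))
```

The loss is simply the negative log of the probability assigned to the true class, so it approaches 0 as that probability approaches 1.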

Backpropagation & Optimization:-
   tf.GradientTape computes the gradients of the loss with respect to all weights and biases, and tf.optimizers.Adam uses them to update the parameters.

Training:- Iterates through 10 epochs with batch size = 100.
           Prints the training loss, summed over all batches, at the end of each epoch; accuracy is computed separately after training.
           The model is trained using the Adam optimizer and the softmax cross-entropy loss function. The model’s parameters are updated during each training step.

Evaluation:-  After training, the model’s accuracy on the training and test datasets is computed and printed.

This code implements a basic neural network from scratch using TensorFlow and trains it to classify handwritten digits from the MNIST dataset.

My Comments:-

Use the ReLU activation function or one of its variants in the hidden layers; it usually converges faster than sigmoid. Set a low learning rate when using ReLU.
Use softmax in the output layer for multi-class classification.

Use sparse categorical cross-entropy instead of categorical cross-entropy, because it takes integer class labels directly and needs no one-hot encoding.
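
The two label formats lead to the same loss value, which the plain-Python sketch below illustrates for one sample. The function names and logits are hypothetical. In TensorFlow without Keras, the integer-label variant is available as tf.nn.sparse_softmax_cross_entropy_with_logits; the sketch shows only the math, not that function's internals.

```python
import math


def log_softmax(logits):
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def dense_label_loss(logits, one_hot):
    """Categorical cross-entropy: needs a one-hot vector."""
    return -sum(y * lp for y, lp in zip(one_hot, log_softmax(logits)))


def sparse_label_loss(logits, class_index):
    """Sparse categorical cross-entropy: needs only the integer class."""
    return -log_softmax(logits)[class_index]


example_logits = [0.3, 2.5, -1.0, 0.8]  # hypothetical logits for 4 classes
print(dense_label_loss(example_logits, [0.0, 1.0, 0.0, 0.0]))
print(sparse_label_loss(example_logits, 1))
```

Both calls print the same number. The sparse form just picks out one log-probability instead of multiplying by a vector that is zero everywhere else.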

Keep the Adam optimizer's learning rate low. It tends to work well with larger batch sizes and usually converges faster than plain SGD.

Training Accuracy: 0.9940 

Test Accuracy: 0.9761