OBJECTIVE: Write a program (WAP) to implement a three-layer neural network using the TensorFlow library only (no Keras) to classify the MNIST handwritten digits dataset. Demonstrate the implementation of the feed-forward and back-propagation approaches.

Model Description:

Architecture:
Input Layer (784 neurons) → Takes flattened 28×28 grayscale images.
Hidden Layer 1 (128 neurons) → Sigmoid activation.
Hidden Layer 2 (64 neurons) → Sigmoid activation.
Output Layer (10 neurons) → Produces logits, passed to Softmax for classification.

Hyperparameters:
Loss Function: Softmax Cross-Entropy
Optimizer: Adam
Learning Rate: 0.01
Batch Size: 100
Epochs: 10
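As a quick check on the architecture described above, the number of trainable parameters can be worked out from the layer sizes alone. The sketch below is plain Python, not part of the notebook; count_parameters is a hypothetical helper name, and it assumes every layer is fully connected with one bias per neuron.

```python
# Layer sizes taken from the model description: input, hidden 1, hidden 2, output.
layer_sizes = [784, 128, 64, 10]

def count_parameters(sizes):
    """Return the (weights + biases) count of each fully connected layer."""
    counts = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        counts.append(fan_in * fan_out + fan_out)  # weight matrix + bias vector
    return counts

per_layer = count_parameters(layer_sizes)
total = sum(per_layer)
print(per_layer, total)  # [100480, 8256, 650] 109386
```

Most of the parameters sit in the first weight matrix, because it connects all 784 input pixels to the 128 neurons of the first hidden layer.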



In [None]:
import tensorflow as tf
import tensorflow_datasets as tfds

In [5]:
dataset, info = tfds.load("mnist", as_supervised=True, with_info=True)
train_dataset, test_dataset = dataset["train"], dataset["test"]

BATCH_SIZE = 100



Loads the MNIST dataset (a collection of handwritten digits).

as_supervised=True: Returns the dataset as pairs of (image, label).

with_info=True: Also returns metadata (info) about the dataset, including the dataset’s shape, number of classes, etc.


BATCH_SIZE = 100: Defines the batch size for training and testing. Each training step processes 100 samples together, which is more efficient than updating the weights one sample at a time.

In [6]:
def preprocess(image, label):
    image = tf.cast(image, tf.float32) / 255.0  # Normalize (0 to 1)
    image = tf.reshape(image, [-1])  # Flatten (28x28 → 784)
    label = tf.one_hot(label, depth=10)  # One-hot encode labels
    return image, label


image = tf.cast(image, tf.float32) / 255.0:

Converts the image to float32 and normalizes the pixel values to the range [0, 1] (MNIST pixel values are originally integers between 0 and 255).

image = tf.reshape(image, [-1]):

Flattens the 28x28 image into a vector of length 784 (28 * 28 = 784), the shape a fully connected layer expects.

The -1 tells TensorFlow to infer the size of that dimension from the total number of elements, so the new size (784 here) does not have to be written out. This also works unchanged for the (28, 28, 1) shape that tensorflow_datasets uses for MNIST images, because the trailing channel dimension has size 1.

label = tf.one_hot(label, depth=10):

Converts the integer label (0 to 9) into a vector of length 10 with a 1 at the label's index and 0 everywhere else, matching the label format of the softmax cross-entropy loss used later.
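The three preprocessing steps can be mirrored in plain Python to make the shapes concrete. This is only an illustration of what the TensorFlow calls compute, not the notebook's code; preprocess_plain and the toy image are hypothetical.

```python
def preprocess_plain(image_rows, label, num_classes=10):
    """Plain-Python mirror of the preprocessing steps (illustration only)."""
    # Normalize: scale each 0-255 pixel value into [0, 1].
    normalized = [[pixel / 255.0 for pixel in row] for row in image_rows]
    # Flatten: concatenate the rows into one vector, row by row.
    flat = [pixel for row in normalized for pixel in row]
    # One-hot: zeros everywhere except a single 1.0 at the label's index.
    one_hot = [1.0 if i == label else 0.0 for i in range(num_classes)]
    return flat, one_hot

# Hypothetical 28x28 image: all black except one white pixel at row 0, column 1.
image = [[0] * 28 for _ in range(28)]
image[0][1] = 255

flat, one_hot = preprocess_plain(image, label=3)
print(len(flat))  # 784
print(flat[1])    # 1.0
print(one_hot)    # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```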

In [7]:
train_dataset = train_dataset.map(preprocess).shuffle(10000).batch(BATCH_SIZE)
test_dataset = test_dataset.map(preprocess).batch(BATCH_SIZE)


train_dataset.map(preprocess): Applies the preprocess function to each image and label pair in the train_dataset.

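.shuffle(10000): Shuffles the training data before batching, using a buffer of 10,000 examples. Because the buffer is smaller than the 60,000-example training set, the shuffle is approximate rather than a full permutation. Shuffling keeps the batches from following the stored order of the data, which helps the model generalize.

.batch(BATCH_SIZE): Divides the data into batches of size 100 (600 batches per epoch for the training set).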

For test dataset: The same preprocessing (map(preprocess)) is applied, but no shuffling is performed (since it's for evaluation), and the data is batched.
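The effect of a shuffle buffer followed by batching can be sketched in plain Python. This is a simplified model of the behaviour, not TensorFlow's implementation; buffered_shuffle and batch are hypothetical helpers, and the sizes are scaled down to 1000 examples with a buffer of 100.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in a randomized order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) > buffer_size:
            # Buffer is over capacity: emit one randomly chosen element.
            index = rng.randrange(len(buffer))
            buffer[index], buffer[-1] = buffer[-1], buffer[index]
            yield buffer.pop()
    # Input exhausted: drain what is left in random order.
    rng.shuffle(buffer)
    yield from buffer

def batch(items, batch_size):
    """Group consecutive items into lists of at most batch_size elements."""
    current = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:  # keep the final partial batch, if any
        yield current

examples = range(1000)  # stand-in for 1000 preprocessed examples
batches = list(batch(buffered_shuffle(examples, buffer_size=100), batch_size=100))
print(len(batches), len(batches[0]))  # 10 100
```

With a buffer of 100, every element of the first batch comes from the first 200 examples, which is why a buffer smaller than the dataset only mixes the data locally.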

In [8]:
input_dim = 784
hidden_dim1 = 128
hidden_dim2 = 64
output_dim = 10


W1 = tf.Variable(tf.random.normal([input_dim, hidden_dim1], stddev=0.1))
b1 = tf.Variable(tf.zeros([hidden_dim1]))
W2 = tf.Variable(tf.random.normal([hidden_dim1, hidden_dim2], stddev=0.1))
b2 = tf.Variable(tf.zeros([hidden_dim2]))
W3 = tf.Variable(tf.random.normal([hidden_dim2, output_dim], stddev=0.1))
b3 = tf.Variable(tf.zeros([output_dim]))


In [9]:
def model(x):
    hidden_layer1 = tf.sigmoid(tf.matmul(x, W1) + b1)  # First Hidden Layer
    hidden_layer2 = tf.sigmoid(tf.matmul(hidden_layer1, W2) + b2)  # Second Hidden Layer
    logits = tf.matmul(hidden_layer2, W3) + b3  # Output layer (logits)
    return logits


In [10]:
def compute_loss(logits, labels):
    return tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))


tf.nn.softmax_cross_entropy_with_logits: Computes the softmax cross-entropy loss, which is appropriate for multi-class classification problems like MNIST.

Softmax: Converts the raw logits into probability distributions.

Cross-Entropy: Measures the difference between the predicted probabilities and the actual labels (one-hot encoded).
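The two steps described above can be worked through by hand for a single example. The sketch below uses plain Python rather than TensorFlow; the three-class logits are hypothetical values chosen for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, one_hot_label):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(y * math.log(p) for y, p in zip(one_hot_label, probs) if y > 0)

logits = [2.0, 1.0, 0.1]  # hypothetical raw outputs for 3 classes
label = [1.0, 0.0, 0.0]   # true class is index 0

probs = softmax(logits)
loss = cross_entropy(probs, label)
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(round(loss, 3))                # 0.417
```

With a one-hot label the loss reduces to the negative log of the probability assigned to the true class, so it approaches 0 as that probability approaches 1.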


In [11]:
def compute_accuracy(dataset):
    correct_preds, total_samples = 0, 0
    for images, labels in dataset:
        logits = model(images)
        correct_preds += tf.reduce_sum(tf.cast(tf.equal(tf.argmax(logits, axis=1), tf.argmax(labels, axis=1)), tf.float32)).numpy()
        total_samples += images.shape[0]
    return correct_preds / total_samples


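tf.argmax(logits, axis=1): Finds the index of the largest logit for each example, which is the predicted class. Softmax preserves the ordering of the logits, so this is also the class with the highest probability.

tf.equal(...): Compares the predicted class with the actual class, which is recovered from the one-hot encoded labels by tf.argmax(labels, axis=1).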

tf.reduce_sum(...): Counts the number of correct predictions in the batch.

compute_accuracy: Iterates through the dataset and computes the fraction of correct predictions.
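The same accuracy computation on a tiny batch, written in plain Python to show what each step contributes. The helper names argmax and batch_accuracy and the four-example batch are hypothetical.

```python
def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])

def batch_accuracy(logits_batch, labels_batch):
    """Fraction of rows whose predicted class matches the one-hot label."""
    correct = sum(
        1
        for logits, label in zip(logits_batch, labels_batch)
        if argmax(logits) == argmax(label)
    )
    return correct / len(logits_batch)

# Hypothetical batch of 4 examples with 3 classes each.
logits_batch = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [0.9, 0.8, 0.1], [0.1, 0.2, 3.0]]
labels_batch = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]

print(batch_accuracy(logits_batch, labels_batch))  # 0.75
```

The third example is the only miss: its largest logit is at index 0 while its label is class 1.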

In [12]:
optimizer = tf.optimizers.Adam(learning_rate=0.01)

# tf.optimizers.Adam: Initializes the Adam optimizer, a gradient descent variant that adapts the step size of each parameter using running averages of the gradient and of its square.


In [13]:
def train_step(images, labels):
    with tf.GradientTape() as tape:
        logits = model(images)
        loss = compute_loss(logits, labels)
    gradients = tape.gradient(loss, [W1, b1, W2, b2, W3, b3])
    optimizer.apply_gradients(zip(gradients, [W1, b1, W2, b2, W3, b3]))
    return loss


tf.GradientTape(): Records the operations applied to the tf.Variable weights and biases during the forward pass, so that gradients can be computed by backpropagation (reverse-mode automatic differentiation).

tape.gradient(loss, ...): Computes the gradients of the loss with respect to the model parameters (weights and biases).

optimizer.apply_gradients: Applies the gradients to update the model parameters.
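To make the gradient step less of a black box, the sketch below checks a hand-derived gradient against a numerical estimate for one example. It is a simplification in plain Python: it differentiates with respect to the logits rather than the weights, it applies one plain gradient-descent update instead of Adam, and all names and values are hypothetical.

```python
import math

def softmax(logits):
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def loss_fn(logits, one_hot_label):
    """Softmax cross-entropy for a single example."""
    probs = softmax(logits)
    return -sum(y * math.log(p) for y, p in zip(one_hot_label, probs) if y > 0)

def analytic_gradient(logits, one_hot_label):
    """Hand-derived result: d(loss)/d(logits) = softmax(logits) - label."""
    return [p - y for p, y in zip(softmax(logits), one_hot_label)]

def numeric_gradient(logits, one_hot_label, eps=1e-6):
    """Central finite differences, perturbing one logit at a time."""
    grads = []
    for i in range(len(logits)):
        plus = list(logits)
        minus = list(logits)
        plus[i] += eps
        minus[i] -= eps
        diff = loss_fn(plus, one_hot_label) - loss_fn(minus, one_hot_label)
        grads.append(diff / (2 * eps))
    return grads

logits = [2.0, 1.0, 0.1]
label = [1.0, 0.0, 0.0]
analytic = analytic_gradient(logits, label)
numeric = numeric_gradient(logits, label)

# One plain gradient-descent update (Adam would rescale this step per parameter).
learning_rate = 0.01
updated = [z - learning_rate * g for z, g in zip(logits, analytic)]
print(loss_fn(logits, label) > loss_fn(updated, label))  # True
```

The two gradients agree to within numerical error, and stepping against the gradient lowers the loss. tape.gradient computes such a gradient automatically and extends it through every layer back to the weights.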

In [14]:
epochs = 10
for epoch in range(epochs):
    total_loss = 0.0
    for images, labels in train_dataset:
        loss = train_step(images, labels)
        total_loss += loss.numpy()
    
    print(f"Epoch {epoch+1}, Loss: {total_loss:.4f}")


Epoch 1, Loss: 170.0491
Epoch 2, Loss: 67.4476
Epoch 3, Loss: 50.1181
Epoch 4, Loss: 39.9755
Epoch 5, Loss: 35.3683
Epoch 6, Loss: 31.6751
Epoch 7, Loss: 26.0732
Epoch 8, Loss: 24.7058
Epoch 9, Loss: 23.7283
Epoch 10, Loss: 24.6713


Epoch loop: The model is trained for 10 epochs. In each epoch:
    
The model processes all batches of training data.

The train_step function is called to perform a forward pass, calculate the loss, and apply gradients.

The per-batch mean losses are summed over the epoch and the sum is printed at the end of each epoch, so each reported value is a total over all 600 batches, not an average.
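Because the printed loss is accumulated over every batch in the epoch, dividing it by the number of batches gives a mean loss that is easier to compare across batch sizes. The sketch below does this for the values in the log above; it assumes the standard 60,000-image MNIST training split.

```python
# Values copied from the training log above.
epoch_loss_sums = [170.0491, 67.4476, 50.1181, 39.9755, 35.3683,
                   31.6751, 26.0732, 24.7058, 23.7283, 24.6713]

TRAIN_EXAMPLES = 60000  # size of the MNIST training split (assumption stated above)
BATCH_SIZE = 100
batches_per_epoch = TRAIN_EXAMPLES // BATCH_SIZE  # 600

mean_losses = [total / batches_per_epoch for total in epoch_loss_sums]
for epoch, mean_loss in enumerate(mean_losses, start=1):
    print(f"Epoch {epoch}, Mean batch loss: {mean_loss:.4f}")
```

The mean loss falls from about 0.2834 in epoch 1 to about 0.0411 in epoch 10.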


In [15]:
train_accuracy = compute_accuracy(train_dataset)
test_accuracy = compute_accuracy(test_dataset)

print(f"Final Training Accuracy (Adam): {train_accuracy:.4f}")
print(f"Final Test Accuracy (Adam): {test_accuracy:.4f}")


Final Training Accuracy (Adam): 0.9848
Final Test Accuracy (Adam): 0.9696


Summary:

Data Loading and Preprocessing: Loads and preprocesses the MNIST dataset using tensorflow_datasets, normalizing the images, flattening them, and one-hot encoding the labels.

Model Definition: A 3-layer neural network with two hidden layers and an output layer.

Training: The model is trained using the Adam optimizer and the softmax cross-entropy loss function. The model’s parameters are updated during each training step.

Evaluation: After training, the model’s accuracy on the training and test datasets is computed and printed.

This code implements a basic neural network from low-level TensorFlow operations (no Keras) and trains it to classify handwritten digits from the MNIST dataset.

MY COMMENTS:

- Adam optimizer's learning rate must be low.
- Use ReLU instead of Sigmoid.
- It works better with a greater batch size.
- Faster compared to SGD.

Training Accuracy: 0.9906 

Test Accuracy: 0.9738
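One commonly cited reason behind the ReLU suggestion in the comments is the size of the activation's derivative, which scales the gradient as it passes back through each layer. The sketch below is plain Python with hypothetical helper names; it compares the derivatives at a few inputs and does not reproduce the accuracy figures above.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """Derivative of the sigmoid: s * (1 - s), at most 0.25 (reached at x = 0)."""
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_derivative(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

for x in [0.0, 2.0, 5.0]:
    print(x, round(sigmoid_derivative(x), 4), relu_derivative(x))
```

With two sigmoid layers, the gradient reaching the first layer is scaled by at most 0.25 * 0.25 = 0.0625 from the activations alone, while ReLU passes the gradient through unchanged for positive inputs.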