# Evaluation of a Three-Layer Neural Network in TensorFlow

## Objective
Evaluate how the activation function, hidden layer sizes, learning rate, batch size, and number of epochs affect the performance of a three-layer neural network implemented in TensorFlow and trained on MNIST.

## Description of the Model
This neural network consists of:
- Input layer with 784 neurons (one per pixel of a flattened 28x28 image)
- Two hidden layers (sizes vary in experiments)
- Output layer with 10 neurons (one per class)

Experiments vary activation functions (`ReLU`, `Sigmoid`), hidden layer sizes, learning rate, batch size, and number of epochs to evaluate performance impact.
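
To give a sense of how much the hidden layer sizes change the model, the sketch below counts the trainable parameters (weights plus biases) of a fully connected network. The helper `count_parameters` is a hypothetical name introduced here for illustration; the two layer configurations are the ones used in the experiment cell of this notebook.

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix + bias vector
    return total

small = count_parameters([784, 128, 64, 10])
large = count_parameters([784, 256, 128, 10])
print(small, large)
```

The larger configuration has roughly twice as many parameters, almost all of them in the first weight matrix.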

In [1]:

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype(np.float32) / 255.0
x_test = x_test.reshape(-1, 784).astype(np.float32) / 255.0
y_train = tf.one_hot(y_train, depth=10)
y_test = tf.one_hot(y_test, depth=10)
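
The preprocessing above can be checked on a tiny synthetic batch without downloading MNIST. This is a NumPy-only sketch: the random `images` and the hand-picked `labels` are made-up stand-ins, and `np.eye(10)[labels]` is used here in place of `tf.one_hot`.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for a few MNIST samples: uint8 images and integer labels.
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
labels = np.array([3, 0, 9, 3])

# Flatten each 28x28 image to 784 values and scale pixels to [0, 1].
x = images.reshape(-1, 784).astype(np.float32) / 255.0

# One-hot encode: row i has a single 1 at column labels[i].
y = np.eye(10, dtype=np.float32)[labels]

print(x.shape, x.dtype)
print(y.shape, y[0])
```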


In [2]:

def train_model(hidden1=128, hidden2=64, activation=tf.nn.relu, lr=0.01, epochs=10, batch_size=100):
    input_size = 784
    output_size = 10

    W1 = tf.Variable(tf.random.normal([input_size, hidden1], stddev=0.1))
    b1 = tf.Variable(tf.zeros([hidden1]))
    W2 = tf.Variable(tf.random.normal([hidden1, hidden2], stddev=0.1))
    b2 = tf.Variable(tf.zeros([hidden2]))
    W3 = tf.Variable(tf.random.normal([hidden2, output_size], stddev=0.1))
    b3 = tf.Variable(tf.zeros([output_size]))

    def forward(x):
        z1 = tf.matmul(x, W1) + b1
        a1 = activation(z1)
        z2 = tf.matmul(a1, W2) + b2
        a2 = activation(z2)
        return tf.matmul(a2, W3) + b3

    def compute_loss(logits, labels):
        return tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))

    def compute_accuracy(logits, labels):
        preds = tf.argmax(logits, axis=1)
        actual = tf.argmax(labels, axis=1)
        return tf.reduce_mean(tf.cast(tf.equal(preds, actual), tf.float32))

    # Number of full mini-batches per epoch; leftover samples are dropped.
    num_batches = x_train.shape[0] // batch_size
    variables = [W1, b1, W2, b2, W3, b3]
    for epoch in range(epochs):
        total_loss = 0.0  # Sum of mini-batch losses over the epoch (not an average).
        for i in range(num_batches):
            start, end = i * batch_size, (i + 1) * batch_size
            x_batch = x_train[start:end]
            y_batch = y_train[start:end]
            with tf.GradientTape() as tape:
                logits = forward(x_batch)
                loss = compute_loss(logits, y_batch)
            # Plain SGD step: var <- var - lr * grad.
            grads = tape.gradient(loss, variables)
            for var, grad in zip(variables, grads):
                var.assign_sub(lr * grad)
            total_loss += loss.numpy()

        # Evaluate on the full test set after each epoch.
        test_logits = forward(x_test)
        test_acc = compute_accuracy(test_logits, y_test)
        print(f"Epoch {epoch+1}/{epochs}, Loss: {total_loss:.4f}, Accuracy: {test_acc.numpy():.4f}")
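
For reference, the loss and accuracy computed inside `train_model` can be written out in plain NumPy. This is a minimal sketch of the standard definitions, not the TensorFlow implementation; the function names and the toy `logits` and `labels` are assumptions made for illustration.

```python
import numpy as np

def softmax_cross_entropy(logits, one_hot_labels):
    """Mean cross-entropy between softmax(logits) and one-hot labels."""
    # Subtract the row-wise max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_example = -(one_hot_labels * log_probs).sum(axis=1)
    return per_example.mean()

def accuracy(logits, one_hot_labels):
    """Fraction of rows whose largest logit matches the labeled class."""
    return float((logits.argmax(axis=1) == one_hot_labels.argmax(axis=1)).mean())

# Toy batch: the first row is classified correctly, the second is not.
logits = np.array([[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(softmax_cross_entropy(logits, labels), accuracy(logits, labels))
```

With all-zero logits the softmax is uniform, so the per-example loss equals the log of the number of classes; this is a handy sanity check for an untrained network.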


In [4]:

# Try different hidden layer sizes, learning rates, and batch sizes (both runs use ReLU)
train_model(hidden1=128, hidden2=64, activation=tf.nn.relu, lr=0.01, epochs=10, batch_size=100)
train_model(hidden1=256, hidden2=128, activation=tf.nn.relu, lr=0.005, epochs=10, batch_size=64)



Epoch 1/10, Loss: 829.9273, Accuracy: 0.8280
Epoch 2/10, Loss: 322.9185, Accuracy: 0.8793
Epoch 3/10, Loss: 247.2450, Accuracy: 0.8952
Epoch 4/10, Loss: 216.1263, Accuracy: 0.9055
Epoch 5/10, Loss: 196.8300, Accuracy: 0.9113
Epoch 6/10, Loss: 182.5364, Accuracy: 0.9183
Epoch 7/10, Loss: 171.0022, Accuracy: 0.9229
Epoch 8/10, Loss: 161.2271, Accuracy: 0.9257
Epoch 9/10, Loss: 152.7163, Accuracy: 0.9292
Epoch 10/10, Loss: 145.1577, Accuracy: 0.9316
Epoch 1/10, Loss: 923.6167, Accuracy: 0.8587
Epoch 2/10, Loss: 417.6165, Accuracy: 0.8958
Epoch 3/10, Loss: 339.0749, Accuracy: 0.9097
Epoch 4/10, Loss: 300.9982, Accuracy: 0.9166
Epoch 5/10, Loss: 275.8647, Accuracy: 0.9227
Epoch 6/10, Loss: 256.8271, Accuracy: 0.9271
Epoch 7/10, Loss: 241.4531, Accuracy: 0.9305
Epoch 8/10, Loss: 228.4697, Accuracy: 0.9340
Epoch 9/10, Loss: 217.1358, Accuracy: 0.9367
Epoch 10/10, Loss: 207.0874, Accuracy: 0.9384
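
The `Loss` values printed above are sums over all mini-batches in an epoch, so the two runs are not directly comparable: the second run uses a smaller batch size and therefore adds up more terms. The sketch below divides the final-epoch sums by the number of batches to get a mean per-batch loss. It assumes the standard MNIST training set size of 60,000; `mean_batch_loss` is a hypothetical helper.

```python
train_size = 60000  # MNIST training set size (assumed)

def mean_batch_loss(summed_loss, batch_size, n=train_size):
    """Convert an epoch's summed mini-batch loss to a per-batch mean."""
    num_batches = n // batch_size  # same formula as in train_model
    return summed_loss / num_batches

run1 = mean_batch_loss(145.1577, batch_size=100)  # 600 batches per epoch
run2 = mean_batch_loss(207.0874, batch_size=64)   # 937 batches per epoch
print(f"{run1:.4f} {run2:.4f}")
```

On this scale the second run ends with the lower mean training loss, which is consistent with its higher test accuracy.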


## Performance Evaluation
Each experiment logs, per epoch, the training loss summed over all mini-batches and the accuracy on the test set. The evaluation can be extended with a confusion matrix or loss-curve plots.
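
A confusion matrix can be built from integer class labels as sketched below. This is a NumPy-only illustration with toy labels; `confusion_matrix` is a hypothetical helper, and using it on real predictions would require `train_model` to return the test logits (or the trained weights), which it does not do as written.

```python
import numpy as np

def confusion_matrix(true_labels, pred_labels, num_classes=10):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(cm, (true_labels, pred_labels), 1)
    return cm

# Toy labels for illustration only; in the notebook these would come from
# argmax over y_test and over the logits computed for x_test.
true = np.array([0, 1, 2, 2, 1, 0])
pred = np.array([0, 2, 2, 2, 1, 0])
cm = confusion_matrix(true, pred, num_classes=3)
print(cm)
print("accuracy:", np.trace(cm) / cm.sum())
```

Off-diagonal entries show which classes are confused with each other, which a single accuracy number hides.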

## My Comments
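- The larger network (256/128 hidden units) reached 0.9384 test accuracy after 10 epochs versus 0.9316 for the 128/64 network. Learning rate and batch size changed in the same run, so the gain cannot be attributed to layer size alone.
- Sigmoid is expected to perform worse than ReLU because of vanishing gradients; the runs logged above use ReLU only, so this still needs to be confirmed with a Sigmoid run.
- Learning rate and batch size affect convergence speed: a batch size of 64 gives 937 weight updates per epoch, compared with 600 for a batch size of 100.
- Adding dropout or batch normalization may improve accuracy further.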
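
The vanishing-gradient argument for Sigmoid can be made concrete by comparing activation derivatives. The sketch below is a NumPy illustration under a simplifying assumption: it looks only at the derivative of each activation and ignores the weight matrices, which also scale the backpropagated gradient.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # peaks at 0.25 when z = 0

def relu_grad(z):
    return (z > 0).astype(np.float64)  # exactly 1 for positive inputs

z = np.linspace(-6.0, 6.0, 13)  # integer steps from -6 to 6, includes 0
print("max sigmoid gradient:", sigmoid_grad(z).max())
print("max relu gradient:   ", relu_grad(z).max())
# Upper bound on the activation-derivative factor across two hidden layers.
print("two-layer sigmoid bound:", 0.25 ** 2)
```

Each Sigmoid layer multiplies the backpropagated signal by at most 0.25, so two hidden layers shrink it by a factor of at least 16, whereas ReLU passes it through unchanged for active units.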