Objective - <br>
To evaluate the performance of a three-layer neural network by applying different activation functions and hidden layer sizes.

In [None]:
import tensorflow as tf
import numpy as np
from tensorflow.keras.datasets import mnist


TensorFlow provides the tools for building and training the models. <br>
NumPy handles arrays and mathematical operations. <br>
The MNIST dataset is imported through the tensorflow.keras.datasets module.

In [None]:
# Load and preprocess the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 28 * 28).astype(np.float32) / 255.0
x_test = x_test.reshape(-1, 28 * 28).astype(np.float32) / 255.0
num_classes = 10
y_train = np.eye(num_classes)[y_train]
y_test = np.eye(num_classes)[y_test]

This cell loads the MNIST dataset. <br>
Reshapes the images from 28x28 pixels into a 1D array of 784 features. <br>
Converts the data type to float32, the default floating-point type used by Keras layers.<br>
Normalizes pixel values to the range [0, 1] by dividing by 255. <br>
Defines the number of classes (digits 0-9). <br>
Applies one-hot encoding to the labels.
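
The one-hot step relies on indexing an identity matrix with the label array. A minimal sketch of that trick, using three made-up labels (the `labels` array is hypothetical, not taken from MNIST):

```python
import numpy as np

# Hypothetical labels standing in for a few MNIST digits
labels = np.array([3, 0, 9])
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with the label array selects one row per label.
one_hot = np.eye(num_classes)[labels]

print(one_hot.shape)  # (3, 10)
print(one_hot[0])     # a single 1.0 at index 3, zeros elsewhere
```

Note that np.eye returns float64 by default, so the encoded labels are float64 unless cast.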


In [None]:
# Hyperparameters
batch_size = 64
num_epochs = 10
learning_rate = 0.001

batch_size = 64 <br>
Specifies the number of samples processed before the model updates its weights. <br>
num_epochs = 10 <br>
Specifies the number of times the model iterates over the entire dataset. <br>
learning_rate = 0.001 <br>
Sets the learning rate for the Adam optimizer.<br>
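
As a rough check on what these values imply, the number of weight updates can be worked out by hand. The sketch below assumes the standard MNIST training split of 60,000 images; the variable names `num_train`, `steps_per_epoch` and `total_updates` are illustrative only.

```python
import math

num_train = 60000   # assumed size of the standard MNIST training split
batch_size = 64
num_epochs = 10

# The last batch of an epoch may be smaller, so round up.
steps_per_epoch = math.ceil(num_train / batch_size)
last_batch = num_train - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * num_epochs

print(steps_per_epoch)  # 938
print(last_batch)       # 32
print(total_updates)    # 9380
```

The 938 steps per epoch match the `938/938` counter shown in the training log.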


In [None]:
# Function to create models with different activation functions and layer sizes
def create_model(hidden_size, activation):
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(hidden_size, activation=activation, input_shape=(784,)),
        tf.keras.layers.Dense(10)
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                  loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])
    return model

Defines a function to create a sequential model, which is a linear stack of layers. <br>
Adds a dense layer with: <br>
hidden_size: Number of neurons.<br>
activation: Activation function (sigmoid, relu, or tanh).<br>
input_shape=(784,): Specifies the input size (28x28 image flattened to 784 features).<br>
Adds the output layer with 10 neurons (one for each digit class).<br>
No activation is applied to the output layer, so it returns raw logits. Because the loss is created with from_logits=True, CategoricalCrossentropy applies the softmax internally when computing the loss.<br>
Configures the model with:<br>
optimizer: Adam optimizer with the specified learning rate.<br>
loss: Categorical cross-entropy loss function.<br>
metrics: Tracks accuracy during training.<br>
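
To illustrate what computing the loss from logits means, here is a NumPy-only sketch (not the Keras implementation) comparing an explicit softmax followed by cross-entropy with a fused log-softmax version. The helper names and the example `logits` are made up for illustration.

```python
import numpy as np

def softmax(z):
    # Subtract the row maximum for numerical stability
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy_from_logits(logits, one_hot):
    # Log-softmax computed directly, without forming probabilities first
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -(one_hot * log_probs).sum(axis=-1)

logits = np.array([[2.0, 1.0, 0.1]])   # hypothetical raw model outputs
target = np.array([[1.0, 0.0, 0.0]])   # one-hot label for class 0

two_step = -(target * np.log(softmax(logits))).sum(axis=-1)
fused = cross_entropy_from_logits(logits, target)

print(np.allclose(two_step, fused))  # True
```

One consequence of this design is that the model's predictions are logits; to read them as probabilities, a softmax has to be applied to the output separately.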



In [None]:
# Layer sizes and activations
layer_sizes = [256, 128, 64]
activations = ['sigmoid', 'relu', 'tanh']

These are the hidden layer sizes and activation functions to compare, as specified in the objective.
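
For a sense of how much the hidden layer size changes the model, the trainable parameter count of each configuration can be computed by hand. This sketch assumes the 784-input, 10-output architecture defined in create_model; `count_params` is a hypothetical helper, not a Keras function.

```python
def count_params(hidden_size, num_inputs=784, num_outputs=10):
    # Each Dense layer has (inputs * units) weights plus one bias per unit
    hidden = num_inputs * hidden_size + hidden_size
    output = hidden_size * num_outputs + num_outputs
    return hidden + output

for size in [256, 128, 64]:
    print(size, count_params(size))
# 256 203530
# 128 101770
# 64 50890
```

Halving the hidden layer roughly halves the parameter count, because the input-to-hidden weights dominate.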

In [None]:
# Training and evaluation loop
results = {}

for activation in activations:
    print(f"\n🔹 Using {activation.upper()} activation:")
    for size in layer_sizes:
        print(f"Training with {size} hidden units...")

        # Create and train the model
        model = create_model(size, activation)
        model.fit(x_train, y_train, epochs=num_epochs, batch_size=batch_size, verbose=1)

        # Evaluate on the test set
        test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
        print(f"Test Accuracy with {activation} (Hidden: {size}): {test_acc:.4f}")

        # Store the results
        results[f"{activation}_{size}"] = test_acc



🔹 Using SIGMOID activation:
Training with 256 hidden units...




Epoch 1/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 10s 10ms/step - accuracy: 0.8287 - loss: 0.6769
Epoch 2/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 6s 5ms/step - accuracy: 0.9349 - loss: 0.2290
Epoch 3/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 8s 8ms/step - accuracy: 0.9516 - loss: 0.1683
Epoch 4/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 8s 5ms/step - accuracy: 0.9638 - loss: 0.1273
Epoch 5/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 7s 8ms/step - accuracy: 0.9698 - loss: 0.1048
Epoch 6/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 9s 7ms/step - accuracy: 0.9765 - loss: 0.0834
Epoch 7/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 9s 6ms/step - accuracy: 0.9811 - loss: 0.0665
Epoch 8/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 10s 6ms/step - accuracy: 0.9840 - loss: 0.0580
Epoch 9/10
938/938 ━━━━━

Iterates through each activation function. <br>
Prints the current activation function being used. <br>
.upper() converts the name to uppercase. <br>
for size in layer_sizes: <br>
Iterates through each layer size. <br>
Prints the current hidden layer size being trained. <br>
model = create_model(size, activation) <br>
Creates a new model with the specified layer size and activation function. <br>
model.fit()<br>
Trains the model on the training data.<br>
x_train, y_train: Training data.<br>
epochs: Number of times the model iterates over the entire dataset.<br>
batch_size: Number of samples processed before updating the weights.<br>
verbose=1: Displays training progress.<br>
model.evaluate()<br>
Evaluates the model on the test data.<br>
x_test, y_test: Test images and labels.<br>
verbose=0: Suppresses output during evaluation.<br>
test_loss, test_acc<br>
Stores the test loss and accuracy.<br>
Displays the test accuracy with the current activation and hidden layer size. <br>
Stores the accuracy in the dictionary using the combination of activation and layer size as the key. <br>
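
With one-hot labels, the accuracy reported during training and evaluation should correspond to comparing the index of the largest model output with the index of the 1 in the label. A small NumPy sketch of that calculation, using made-up logits for a three-class problem (all values here are hypothetical):

```python
import numpy as np

# Hypothetical logits for four samples and three classes
logits = np.array([
    [0.1, 2.3, -1.0],
    [1.5, 0.2, 0.3],
    [0.0, 0.1, 3.0],
    [2.0, 2.5, 0.1],
])
# One-hot labels for true classes 1, 0, 2, 0
labels = np.eye(3)[[1, 0, 2, 0]]

predicted = logits.argmax(axis=1)   # class with the largest logit
actual = labels.argmax(axis=1)      # position of the 1 in each label
accuracy = (predicted == actual).mean()

print(predicted)  # [1 0 2 1]
print(accuracy)   # 0.75
```

Since softmax preserves the ordering of the logits, taking the argmax of the raw outputs gives the same prediction as taking it after softmax.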


In [None]:
# Display results
print("\n🔹 Final Results:")
for key, value in results.items():
    print(f"{key}: {value:.4f}")


🔹 Final Results:
sigmoid_256: 0.9776
sigmoid_128: 0.9750
sigmoid_64: 0.9681
relu_256: 0.9787
relu_128: 0.9790
relu_64: 0.9727
tanh_256: 0.9802
tanh_128: 0.9758
tanh_64: 0.9726


Activation Function Behavior <br>

Sigmoid:<br>
Saturates at extreme input values, where its gradient approaches zero (the vanishing gradient problem), which can slow convergence.<br>
This effect is more pronounced in deeper networks than in this single-hidden-layer model.<br>
Lowest test accuracy at every hidden layer size in this run.<br>
Test Accuracy: Peaks at 0.9776 (256 hidden units).<br>
Tanh:<br>
Zero-centered output, which generally makes optimization easier than with Sigmoid.<br>
Still saturates at extreme values, so it remains prone to gradient issues.<br>
Test Accuracy: Peaks at 0.9802 (256 hidden units), the highest single result in this run.<br>
ReLU:<br>
Does not saturate for positive inputs, so it handles the vanishing gradient problem better.<br>
Highest accuracy at 128 and 64 hidden units, so it performs well even with fewer hidden units.<br>
Test Accuracy: Peaks at 0.9790 (128 hidden units).<br>
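
The saturation behavior described above can be seen by evaluating the derivative of each activation at a few points. This is a NumPy sketch with hand-picked sample inputs, not part of the training code:

```python
import numpy as np

x = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])  # hand-picked sample inputs

sig = 1.0 / (1.0 + np.exp(-x))
d_sigmoid = sig * (1.0 - sig)        # peaks at 0.25 when x = 0
d_tanh = 1.0 - np.tanh(x) ** 2       # peaks at 1.0 when x = 0
d_relu = (x > 0).astype(float)       # 1 for positive inputs, else 0

for name, grad in [("sigmoid", d_sigmoid), ("tanh", d_tanh), ("relu", d_relu)]:
    print(name, np.round(grad, 4))
```

The sigmoid derivative never exceeds 0.25 and both sigmoid and tanh derivatives shrink toward zero at |x| = 6, while the ReLU derivative stays at 1 for any positive input.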

In [None]:
import pandas as pd

# Convert the results dictionary into a DataFrame
results_df = pd.DataFrame(list(results.items()), columns=["Configuration", "Accuracy"])

# Sort by accuracy (optional)
results_df = results_df.sort_values(by="Accuracy", ascending=False).reset_index(drop=True)

# Display the table
print("\n🔹 Final Results:")
print(results_df)



🔹 Final Results:
  Configuration  Accuracy
0      tanh_256    0.9802
1      relu_128    0.9790
2      relu_256    0.9787
3   sigmoid_256    0.9776
4      tanh_128    0.9758
5   sigmoid_128    0.9750
6       relu_64    0.9727
7       tanh_64    0.9726
8    sigmoid_64    0.9681
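
The sorted list can be hard to compare across activations. One possible alternative view, sketched here with the accuracies copied from the results above, splits each key back into activation and size and pivots them into a grid. The column names and the added `Mean` column are illustrative choices.

```python
import pandas as pd

# Accuracies copied from the final results printed earlier
results = {
    "sigmoid_256": 0.9776, "sigmoid_128": 0.9750, "sigmoid_64": 0.9681,
    "relu_256": 0.9787, "relu_128": 0.9790, "relu_64": 0.9727,
    "tanh_256": 0.9802, "tanh_128": 0.9758, "tanh_64": 0.9726,
}

rows = []
for key, acc in results.items():
    # Keys have the form "<activation>_<size>"
    activation, size = key.rsplit("_", 1)
    rows.append({"Activation": activation, "Hidden units": int(size), "Accuracy": acc})

# One row per activation, one column per hidden layer size
grid = pd.DataFrame(rows).pivot(index="Activation", columns="Hidden units", values="Accuracy")
grid["Mean"] = grid.mean(axis=1)

print(grid.round(4))
```

In this layout each column shows which activation did best at a given size, and the mean column gives a rough per-activation summary of a single run.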


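# My comments - <br>
All nine configurations reach a test accuracy between 0.9681 and 0.9802, so the differences between activation functions are small, and each comes from a single training run.<br>
ReLU is the most consistent choice: it has the highest accuracy at 128 and 64 hidden units and is within 0.0015 of the best result at 256.<br>
Tanh gives the single best result (0.9802 at 256 hidden units) but falls behind ReLU at 128 hidden units.<br>
Sigmoid has the lowest accuracy at every hidden layer size, which is consistent with its tendency to saturate.<br>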




