# Tutorial 3 ANN

Step 1: Importing Necessary Libraries

First, import the required libraries for building and training the neural network. These include NumPy for numerical operations, Matplotlib for data visualization, and modules from TensorFlow/Keras to handle the dataset, model creation, and layers.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.optimizers import Adam

Step 2: Loading and Preprocessing the MNIST Dataset

This step involves loading the MNIST dataset and preparing it for the neural network. It consists of three parts: loading the data, normalizing the pixel values from a range of 0-255 to 0-1, and one-hot encoding the integer labels into a binary matrix format. For example, the label '3' is converted to [0, 0, 0, 1, 0, 0, 0, 0, 0, 0].

In [None]:
# Load the MNIST dataset, splitting it into training and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalize pixel values to the range [0, 1] for stable and efficient training
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# One-hot encode the labels (0-9) for categorical classification
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
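
To make the one-hot encoding concrete, the following is a minimal NumPy sketch of the idea. The helper name `one_hot` is hypothetical and only for illustration; the tutorial itself keeps using `to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    # Put a 1 in the column given by each label
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

encoded = one_hot([3, 0, 9])
print(encoded[0])               # the row for the label 3
print(encoded.argmax(axis=1))   # argmax recovers the integer labels
```

Taking the argmax of each row reverses the encoding, which is why `np.argmax` is used later when true and predicted labels are displayed.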

Step 3: Building the Neural Network Model

We'll build a sequential neural network model. The architecture includes a Flatten layer to convert the 2D image data into a 1D vector, two Dense hidden layers with a ReLU activation function, and a final Dense output layer with a softmax activation for multi-class classification. The model summary provides a breakdown of the layers, their output shapes, and the number of parameters.

In [None]:
# Create a sequential neural network model
model = Sequential([
    # Flatten the 28x28 pixel images into a 1D vector of 784 features
    Flatten(input_shape=(28, 28)),
    # A hidden layer with 128 neurons and ReLU activation function
    Dense(128, activation='relu'),
    # A second hidden layer with 64 neurons and ReLU activation
    Dense(64, activation='relu'),
    # The output layer with 10 neurons (one for each digit) and softmax activation for probabilities
    Dense(10, activation='softmax')
])

# Print the model summary to see the architecture and parameter count
model.summary()
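
The parameter counts reported by the summary can be checked by hand: a Dense layer has one weight per input-unit pair plus one bias per unit. The short sketch below does that arithmetic for the architecture above; `dense_params` is a hypothetical helper name, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Flatten output (28 * 28 = 784), two hidden layers, output layer
layer_sizes = [28 * 28, 128, 64, 10]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [100480, 8256, 650]
print(total)      # 109386
```

The Flatten layer has no parameters of its own, so almost all of the weights sit in the first hidden layer.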

Step 4: Compiling the Model

Before training, the model must be compiled. We specify the optimizer, the loss function, and the metrics to monitor. The Adam optimizer is chosen for its adaptive learning rate. Categorical cross-entropy is used as the loss function because it suits multi-class classification with one-hot encoded labels, and accuracy is the metric used to track performance.

In [None]:
# Compile the model by specifying the optimizer, loss function, and evaluation metrics
model.compile(optimizer='Adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
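
As an illustration of what the loss measures, here is a small NumPy sketch of categorical cross-entropy for one-hot labels. It is a simplified stand-in, not the Keras implementation; the clipping constant `eps` and the example probabilities are assumptions chosen for the demonstration.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample loss: minus the log of the probability given to the true class."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=1)

# One sample whose true label is 3
y_true = np.array([[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]], dtype=np.float32)

# A confident, correct prediction: 0.91 on class 3, 0.01 elsewhere
confident = np.full((1, 10), 0.01)
confident[0, 3] = 0.91

# A prediction that spreads probability evenly over all ten classes
uncertain = np.full((1, 10), 0.1)

loss_confident = categorical_crossentropy(y_true, confident)
loss_uncertain = categorical_crossentropy(y_true, uncertain)
print(loss_confident, loss_uncertain)
```

The uniform prediction gives a loss of ln(10), roughly 2.30, which is about what an untrained ten-class model starts from; the loss falls toward zero as more probability is placed on the correct class.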

Step 5: Training the Model

The model is now trained using the prepared training data. Training involves passing the data through the network for a specified number of epochs (full passes through the training data), in small batches of a defined batch size. A portion of the training data is also set aside as a validation split to monitor the model's performance on unseen data during training.

In [None]:
# Train the model on the training data
history = model.fit(x_train, y_train,
                    epochs=10,
                    batch_size=32,
                    validation_split=0.2)
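
The settings above translate into concrete numbers, sketched below with plain arithmetic. The figure of 60,000 is the size of the MNIST training set; per the Keras documentation, `validation_split` takes the validation samples from the end of the arrays before shuffling.

```python
import math

n_samples = 60_000        # MNIST training images
validation_split = 0.2
batch_size = 32
epochs = 10

n_val = round(n_samples * validation_split)         # samples held out for validation
n_train = n_samples - n_val                         # samples used for weight updates
steps_per_epoch = math.ceil(n_train / batch_size)   # batches per epoch
total_updates = steps_per_epoch * epochs            # weight updates over the whole run

print(n_train, n_val, steps_per_epoch, total_updates)  # 48000 12000 1500 15000
```

The 1500 steps per epoch should match the progress counter that `model.fit` prints for each epoch.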

Step 6: Evaluating the Model

After training, the model's performance is evaluated on the test dataset to see how well it generalizes to new, unseen data. The model.evaluate function returns the loss and accuracy on the test set.

In [None]:
# Evaluate the model on the test dataset
test_loss, test_accuracy = model.evaluate(x_test, y_test)
# Print the test accuracy with 4 decimal places
print(f"Test Accuracy: {test_accuracy:.4f}")

Step 7: Visualizing Training and Validation Performance

To understand the training process and identify potential issues like overfitting or underfitting, we visualize the training and validation accuracy and loss over the epochs. This is done by plotting the values stored in the history object returned from the model.fit method.

In [None]:
# Create a figure with two subplots side-by-side
plt.figure(figsize=(12, 5))

# Plot training & validation accuracy values
plt.subplot(1, 2, 1) # Create the first subplot for accuracy
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Val Accuracy')
plt.title('Model Accuracy') # Set the title of the plot
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend() # Display the legend
plt.grid() # Add a grid to the plot

# Plot training & validation loss values
plt.subplot(1, 2, 2) # Create the second subplot for loss
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Val Loss')
plt.title('Model Loss') # Set the title of the plot
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend() # Display the legend
plt.grid() # Add a grid to the plot

# Adjust subplots to give a tight layout and display the plots
plt.tight_layout()
plt.show()
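
Besides plotting, the same curves can be summarized numerically. The sketch below works on a dictionary shaped like `history.history`; the numbers in `example_history` are hypothetical values made up for illustration and are not results from this tutorial, and `generalization_gap` is a hypothetical helper name.

```python
def generalization_gap(history_dict):
    """Validation loss minus training loss for each epoch."""
    return [val - train
            for train, val in zip(history_dict['loss'], history_dict['val_loss'])]

# Hypothetical values for illustration only
example_history = {
    'loss':     [0.30, 0.15, 0.10, 0.07, 0.05],
    'val_loss': [0.18, 0.13, 0.11, 0.11, 0.12],
}

gaps = generalization_gap(example_history)

# 1-based epoch with the lowest validation loss (first one if there is a tie)
val_loss = example_history['val_loss']
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1

print([round(g, 2) for g in gaps])
print(best_epoch)
```

A gap that keeps growing while the validation loss stops improving is the usual sign of overfitting; training and validation losses that both stay high point toward underfitting instead.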

Step 8: Making Predictions and Visualizing Results

Finally, we use the trained model to make predictions on the test set. The predicted labels are then compared with the true labels by visualizing a few sample images from the test set. We display each image along with its true and predicted labels to see how the model performed on individual examples.

In [None]:
# Make predictions on the test set using the trained model
predictions = model.predict(x_test)

# Display the first test image and its predicted label
plt.figure(figsize=(5, 5))
# Show the first test image in grayscale
plt.imshow(x_test[0], cmap='gray')
# Display the true and predicted labels
plt.title(f"True Label: {np.argmax(y_test[0])}, Predicted: {np.argmax(predictions[0])}")
plt.axis('off') # Hide the axis
plt.show()

# Display a grid of images with their true and predicted labels
num_images = 9 # Number of images to display
plt.figure(figsize=(10, 10))
for i in range(num_images):
    # Create a 3x3 grid of subplots
    plt.subplot(3, 3, i + 1)
    # Show each test image in grayscale
    plt.imshow(x_test[i], cmap='gray')
    # Display the true and predicted labels for each image
    plt.title(f"True: {np.argmax(y_test[i])}, Predicted: {np.argmax(predictions[i])}")
    plt.axis('off') # Hide the axis
plt.tight_layout() # Adjust subplots to fit into figure area
plt.show()
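
The first nine images may all be classified correctly, so it can be more informative to look for the misclassified ones. The sketch below shows the idea on a tiny made-up example with three samples and four classes; the arrays are hypothetical stand-ins for `predictions` and `y_test`.

```python
import numpy as np

# Hypothetical softmax outputs for three images and four classes
probs = np.array([
    [0.05, 0.80, 0.10, 0.05],
    [0.60, 0.10, 0.20, 0.10],
    [0.10, 0.20, 0.30, 0.40],
])
# Hypothetical one-hot true labels for the same three images
true_one_hot = np.array([
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
])

predicted = probs.argmax(axis=1)         # most probable class per image
true = true_one_hot.argmax(axis=1)       # undo the one-hot encoding
wrong = np.flatnonzero(predicted != true)  # indices of misclassified images
accuracy = np.mean(predicted == true)

print(predicted, true, wrong, accuracy)
```

Applied to the real arrays, the indices in `wrong` could be passed to `plt.imshow(x_test[i], cmap='gray')` to inspect the digits the model gets wrong.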

Task 1: Experiment with Different Architectures

To experiment with different neural network architectures, we can modify the number of layers and neurons or change the activation functions. The code below retrains the original model as a baseline and compares it with four alternative architectures.

In [None]:
# Task 1: Experiment with Different Architectures

# Load and preprocess the data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

def create_and_train_model(layers, activations, title):
    """
    Creates, compiles, and trains a new model with the specified architecture.

    Args:
        layers (list): A list of neuron counts for each hidden layer.
        activations (list): A list of activation functions for each hidden layer.
        title (str): The title for the performance plot.
    """
    model = Sequential([Flatten(input_shape=(28, 28))])
    for num_neurons, activation_func in zip(layers, activations):
        model.add(Dense(num_neurons, activation=activation_func))
    model.add(Dense(10, activation='softmax'))

    model.compile(optimizer='Adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    print(f"\n--- Training Model: {title} ---")
    history = model.fit(x_train, y_train, epochs=10, batch_size=32, validation_split=0.2, verbose=0)

    test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
    print(f"Test Accuracy: {test_accuracy:.4f}")

    # Plotting the performance
    plt.figure(figsize=(12, 5))
    plt.subplot(1, 2, 1)
    plt.plot(history.history['accuracy'], label='Train Accuracy')
    plt.plot(history.history['val_accuracy'], label='Val Accuracy')
    plt.title(f'Model Accuracy: {title}')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.legend()
    plt.grid()

    plt.subplot(1, 2, 2)
    plt.plot(history.history['loss'], label='Train Loss')
    plt.plot(history.history['val_loss'], label='Val Loss')
    plt.title(f'Model Loss: {title}')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.grid()

    plt.tight_layout()
    plt.show()

# Original Model (from the tutorial)
create_and_train_model(layers=[128, 64], activations=['relu', 'relu'], title="Original Model (128, 64 neurons)")

# Experiment 1: More neurons in each layer
create_and_train_model(layers=[256, 128], activations=['relu', 'relu'], title="More Neurons (256, 128 neurons)")

# Experiment 2: Deeper network (more layers)
create_and_train_model(layers=[128, 64, 32], activations=['relu', 'relu', 'relu'], title="Deeper Network (128, 64, 32 neurons)")

# Experiment 3: Different activation functions
create_and_train_model(layers=[128, 64], activations=['tanh', 'tanh'], title="Tanh Activation Function")

# Experiment 4: Deeper network (three hidden layers) with sigmoid activations
# Note: this changes both the depth and the activation relative to the original model
create_and_train_model(layers=[128, 64, 32], activations=['sigmoid', 'sigmoid', 'sigmoid'], title="Sigmoid Activation Function")

Based on the outputs and plots from the previous cell:

Original Model (128, 64 neurons): Achieved a test accuracy of 0.9761. The plots show good learning, with a gap opening between training and validation accuracy/loss towards the end, which suggests slight overfitting.

More Neurons (256, 128 neurons): Achieved a test accuracy of 0.9784. This model performed slightly better than the original, possibly due to the increased capacity. The plots show similar trends to the original model, perhaps with a bit more overfitting.

Deeper Network (128, 64, 32 neurons): Achieved the highest test accuracy at 0.9796. Adding a third, smaller hidden layer seems to have improved performance slightly, which may indicate that the extra depth helped capture more complex patterns. The plots show a similar pattern of slight overfitting.

Tanh Activation Function: Achieved a test accuracy of 0.9732. Using Tanh instead of ReLU resulted in slightly lower accuracy than the ReLU models, indicating that ReLU might be a better choice for this specific task and architecture. The plots show a slower initial learning phase.

Sigmoid Activation Function: Achieved a test accuracy of 0.9743. This experiment uses the three-layer architecture (128, 64, 32), so it differs from the original model in both depth and activation. As with Tanh, accuracy was slightly lower than for the ReLU models. Sigmoid can suffer from the vanishing gradient problem, which might explain the slightly lower performance. The plots show a smoother but potentially slower learning curve compared to ReLU.

**In summary, for this specific MNIST classification task, increasing the number of neurons or adding another layer with ReLU activation slightly improved performance. Using Tanh or Sigmoid activation functions resulted in slightly lower accuracy compared to ReLU. The deeper network with ReLU achieved the best performance among the experimented architectures.**

All of these accuracies lie within about 0.6 percentage points of each other and come from a single run per configuration with no fixed random seed, so the ranking should be read as indicative rather than conclusive.
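
The vanishing gradient remark can be illustrated with the derivatives of the three activation functions. The sketch below is a simplified view that looks only at the activation derivatives and ignores the weights; the function names are hypothetical helpers.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2

def relu_grad(x):
    return 1.0 if x > 0 else 0.0

for x in (0.0, 2.0, 4.0):
    print(x, round(sigmoid_grad(x), 4), round(tanh_grad(x), 4), relu_grad(x))

# The sigmoid derivative peaks at x = 0 with value 0.25, so the product of
# activation derivatives across three sigmoid layers is at most 0.25 ** 3.
max_sigmoid_grad = sigmoid_grad(0.0)
bound_three_layers = max_sigmoid_grad ** 3
print(max_sigmoid_grad, bound_three_layers)
```

Each sigmoid layer scales the backpropagated signal by at most 0.25, while ReLU passes it through unchanged for positive inputs. With only three hidden layers the effect is mild, which is consistent with the small accuracy differences reported above.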

Task 2: Change the Optimizer

The optimizer is a crucial part of the training process, as it dictates how the model's weights are updated. Here we will use Stochastic Gradient Descent (SGD) and RMSprop as alternatives to the Adam optimizer to see how they impact the model's convergence and final accuracy.

In [None]:

from tensorflow.keras.optimizers import SGD, RMSprop

# Load and preprocess the data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

def train_with_optimizer(optimizer, optimizer_name):
    """
    Creates, compiles, and trains a model with a specified optimizer.

    Args:
        optimizer: The Keras optimizer object to use.
        optimizer_name (str): The name of the optimizer for the plot titles.
    """
    model = Sequential([
        Flatten(input_shape=(28, 28)),
        Dense(128, activation='relu'),
        Dense(64, activation='relu'),
        Dense(10, activation='softmax')
    ])

    # Compile the model with the specified optimizer
    model.compile(optimizer=optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    print(f"\n--- Training with {optimizer_name} Optimizer ---")
    history = model.fit(x_train, y_train, epochs=10, batch_size=32, validation_split=0.2, verbose=0)

    test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
    print(f"Test Accuracy: {test_accuracy:.4f}")

    # Plotting the performance
    plt.figure(figsize=(12, 5))
    plt.subplot(1, 2, 1)
    plt.plot(history.history['accuracy'], label='Train Accuracy')
    plt.plot(history.history['val_accuracy'], label='Val Accuracy')
    plt.title(f'Model Accuracy: {optimizer_name}')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.legend()
    plt.grid()

    plt.subplot(1, 2, 2)
    plt.plot(history.history['loss'], label='Train Loss')
    plt.plot(history.history['val_loss'], label='Val Loss')
    plt.title(f'Model Loss: {optimizer_name}')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.grid()

    plt.tight_layout()
    plt.show()

# Experiment 1: Adam Optimizer (Baseline)
train_with_optimizer(Adam(), "Adam")

# Experiment 2: SGD Optimizer
train_with_optimizer(SGD(), "SGD")

# Experiment 3: RMSprop Optimizer
train_with_optimizer(RMSprop(), "RMSprop")

Task 3: Solve Overfitting

Overfitting occurs when a model learns the training data too well, including its noise and outliers, which negatively impacts its performance on new, unseen data. We can mitigate this using Dropout regularization, a technique that randomly "drops out" neurons during training, forcing the network to learn more robust features.

In [None]:

from tensorflow.keras.layers import Dropout

# Load and preprocess the data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Create a model with Dropout layers to combat overfitting
model_with_dropout = Sequential([
    # Flatten the 28x28 images to a 1D vector
    Flatten(input_shape=(28, 28)),
    # First hidden layer with 128 neurons and ReLU activation
    Dense(128, activation='relu'),
    # Dropout layer, dropping 50% of the neurons during training
    Dropout(0.5),
    # Second hidden layer with 64 neurons and ReLU activation
    Dense(64, activation='relu'),
    # Dropout layer, dropping 30% of the neurons
    Dropout(0.3),
    # Output layer with 10 neurons and softmax activation
    Dense(10, activation='softmax')
])

# Compile the new model
model_with_dropout.compile(optimizer='Adam',
                           loss='categorical_crossentropy',
                           metrics=['accuracy'])

# Train the model with Dropout
print("\n--- Training Model with Dropout for Overfitting Mitigation ---")
history_dropout = model_with_dropout.fit(x_train, y_train,
                                         epochs=20, # Increase epochs to see the effect of dropout
                                         batch_size=32,
                                         validation_split=0.2,
                                         verbose=0)

# Evaluate the model
test_loss_dropout, test_accuracy_dropout = model_with_dropout.evaluate(x_test, y_test, verbose=0)
print(f"Test Accuracy with Dropout: {test_accuracy_dropout:.4f}")

# Plotting the performance to visualize the effect of dropout
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(history_dropout.history['accuracy'], label='Train Accuracy (with Dropout)')
plt.plot(history_dropout.history['val_accuracy'], label='Val Accuracy (with Dropout)')
plt.title('Model Accuracy with Dropout')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.grid()

plt.subplot(1, 2, 2)
plt.plot(history_dropout.history['loss'], label='Train Loss (with Dropout)')
plt.plot(history_dropout.history['val_loss'], label='Val Loss (with Dropout)')
plt.title('Model Loss with Dropout')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.grid()

plt.tight_layout()
plt.show()
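
What a Dropout layer does to its inputs can be sketched in a few lines of NumPy. This is a simplified illustration of inverted dropout, the scheme Keras documents for its Dropout layer (kept units are scaled by 1 / (1 - rate) during training, and nothing is changed at inference). The function name `dropout` and the seed are assumptions for the demonstration.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return activations          # inference: pass the input through unchanged
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob   # True for units that are kept
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones((4, 8), dtype=np.float32)

train_out = dropout(activations, rate=0.5, training=True, rng=rng)
infer_out = dropout(activations, rate=0.5, training=False, rng=rng)

print(train_out)   # entries are either 0.0 (dropped) or 2.0 (kept and rescaled)
print(infer_out)   # identical to the input
```

Because dropout is active only during training, the training accuracy is measured on a handicapped network while the validation accuracy is not, which is worth keeping in mind when comparing the two curves.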

Task 3 (continued): Dropout with EarlyStopping

In [None]:

from tensorflow.keras.layers import Dropout
from tensorflow.keras.callbacks import EarlyStopping

# Load and preprocess the data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Create a model with Dropout layers to combat overfitting
model_with_dropout = Sequential([
    # Flatten the 28x28 images to a 1D vector
    Flatten(input_shape=(28, 28)),
    # First hidden layer with 128 neurons and ReLU activation
    Dense(128, activation='relu'),
    # Dropout layer, dropping 50% of the neurons during training
    Dropout(0.5),
    # Second hidden layer with 64 neurons and ReLU activation
    Dense(64, activation='relu'),
    # Dropout layer, dropping 30% of the neurons
    Dropout(0.3),
    # Output layer with 10 neurons and softmax activation
    Dense(10, activation='softmax')
])

# Compile the new model
model_with_dropout.compile(optimizer='Adam',
                           loss='categorical_crossentropy',
                           metrics=['accuracy'])

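# Implement Early Stopping to limit overfitting
# Monitor validation loss, and stop if it doesn't improve for 5 consecutive epochs
# Note: restore_best_weights defaults to False, so the model keeps the weights from the last epoch run
early_stopping = EarlyStopping(monitor='val_loss', patience=5, verbose=1)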

# Train the model with both Dropout and Early Stopping
print("\n--- Training Model with Dropout and Early Stopping ---")
history_dropout = model_with_dropout.fit(x_train, y_train,
                                         epochs=50, # Set a high number of epochs; Early Stopping will halt training
                                         batch_size=32,
                                         validation_split=0.2,
                                         callbacks=[early_stopping], # Pass the early stopping callback
                                         verbose=1)

# Evaluate the final model
test_loss_dropout, test_accuracy_dropout = model_with_dropout.evaluate(x_test, y_test, verbose=0)
print(f"\nTest Accuracy with Dropout and Early Stopping: {test_accuracy_dropout:.4f}")

# Plotting the performance to visualize the effect of regularization
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(history_dropout.history['accuracy'], label='Train Accuracy (with Dropout)')
plt.plot(history_dropout.history['val_accuracy'], label='Val Accuracy (with Dropout)')
plt.title('Model Accuracy with Dropout and Early Stopping')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.grid()

plt.subplot(1, 2, 2)
plt.plot(history_dropout.history['loss'], label='Train Loss (with Dropout)')
plt.plot(history_dropout.history['val_loss'], label='Val Loss (with Dropout)')
plt.title('Model Loss with Dropout and Early Stopping')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.grid()

plt.tight_layout()
plt.show()
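
The patience rule used by the callback can be sketched in plain Python. This is a simplified model of the behaviour that ignores options such as `min_delta` and `baseline`; the function name and the validation losses are hypothetical and chosen only to illustrate the logic.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (epoch training stops at, epoch with the best loss), both 1-based."""
    best = float('inf')
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0   # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch              # patience exhausted: stop here
    return len(val_losses), best_epoch                # ran out of epochs first

# Hypothetical validation losses for illustration only
val_losses = [0.30, 0.20, 0.15, 0.14, 0.16, 0.15, 0.17, 0.18, 0.19, 0.20]
stopped_at, best_at = early_stopping_epoch(val_losses, patience=5)
print(stopped_at, best_at)  # 9 4
```

Training stops `patience` epochs after the best epoch, not at it. Unless `restore_best_weights=True` is passed to `EarlyStopping`, the model that is evaluated afterwards carries the weights from the stopping epoch.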

Looking at the plots from the last two code cells:

Model with Dropout, 20 epochs:

Overfitting/Underfitting: There is a gap between the training accuracy and validation accuracy, and between the training loss and validation loss, especially in the later epochs. The training accuracy continues to increase while the validation accuracy plateaus or even slightly decreases. Similarly, the training loss continues to decrease while the validation loss plateaus or increases. This indicates that the model is overfitting to the training data.

Effect of Epochs: With 20 epochs, the model has had enough time to learn the training data extensively, leading to overfitting. The validation performance starts to degrade after a certain number of epochs, at which point the model is no longer generalizing well to unseen data.

Model with Dropout and Early Stopping, up to 50 epochs:

Overfitting/Underfitting: Compared to the previous plot, the gap between the training and validation curves is smaller. The validation loss stops decreasing and starts to increase; once it has gone 5 epochs without improving, Early Stopping halts training. The model still shows some signs of overfitting, but Early Stopping has helped to mitigate it by stopping the training before severe overfitting occurs.

Effect of Epochs: By setting a high number of epochs (50) and using Early Stopping, we allow the model to train until the validation loss stops improving for a specified number of epochs (patience=5). This prevents the model from training for too long and overfitting excessively. The training stops at Epoch 17, which with patience=5 implies that the lowest validation loss was reached around Epoch 12 and that further training would likely lead to worse performance on unseen data.

In summary, the plots show that without regularization and early stopping, the model tends to overfit. The introduction of Dropout helps to reduce overfitting, and Early Stopping limits it further by halting training once the validation loss has stopped improving. Because training continues for patience epochs past the best one, the final weights are not those of the best epoch unless restore_best_weights=True is passed to EarlyStopping.