# MNIST Neural Network Training with TensorFlow

## Introduction

This notebook was created by [Jupyter AI](https://github.com/jupyterlab/jupyter-ai) with the following prompt:

> /generate a simple example of training a neural network on the MNIST digits dataset. 

## Summary

This Jupyter notebook walks step by step through training a simple fully connected neural network on the MNIST handwritten-digit dataset with Python and TensorFlow. It begins by importing TensorFlow and Keras. It then loads the MNIST dataset and preprocesses it by scaling pixel values to the range [0, 1] and adding a channel dimension to the images. Next, it builds the network with the Keras Sequential API, specifying the layers and their activation functions, and compiles it by choosing an optimizer, a loss function, and evaluation metrics. The training section fits the model to the training data, setting the number of epochs, the batch size, and an early-stopping callback. After training, the notebook evaluates the model on the test data and prints the loss, accuracy, a classification report, and a confusion matrix. Finally, it uses the trained model to make predictions and visualizes a sample of them alongside their true labels.

## Load and Preprocess Data

In [1]:
import tensorflow as tf
from tensorflow.keras.datasets import mnist



In [2]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
[1m11490434/11490434[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 0us/step


In [3]:
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
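Dividing by 255.0 maps the 8-bit pixel intensities (0 to 255) onto the range [0, 1]. A minimal NumPy sketch of the same transformation, using a small made-up array (`raw` is an illustrative stand-in for `x_train`, not real MNIST data):

```python
import numpy as np

# Illustrative stand-in for MNIST pixels: uint8 values in 0..255 (made-up values)
raw = np.array([[0, 128, 255], [64, 32, 16]], dtype=np.uint8)

# Cast to float32 first so the result stays float32, then scale into [0, 1]
scaled = raw.astype('float32') / 255.0

print(scaled.dtype)                # float32
print(scaled.min(), scaled.max())  # 0.0 1.0
```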

In [4]:
x_train = x_train[..., tf.newaxis]
x_test = x_test[..., tf.newaxis]
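`tf.newaxis` is an alias for `None`, just like `np.newaxis`, so `x[..., tf.newaxis]` appends a trailing axis of size 1 (the grayscale channel). A NumPy sketch showing three equivalent ways to get the `(N, 28, 28, 1)` layout, the last of which is the `reshape` used later in the evaluation section (the zero-filled `images` array is an illustrative placeholder):

```python
import numpy as np

# Illustrative batch of 5 grayscale 28x28 images (zeros as placeholders)
images = np.zeros((5, 28, 28), dtype=np.float32)

# Three equivalent ways to append a trailing channel axis of size 1
a = images[..., np.newaxis]
b = np.expand_dims(images, axis=-1)
c = images.reshape(-1, 28, 28, 1)

print(a.shape)  # (5, 28, 28, 1)
```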

In [6]:
# Keep the labels as integer class indices (0-9). They pair with the
# sparse_categorical_crossentropy loss used when compiling the model;
# one-hot labels from to_categorical would require categorical_crossentropy.
print(f'x_train shape: {x_train.shape}')
print(f'y_train shape: {y_train.shape}')
print(f'x_test shape: {x_test.shape}')
print(f'y_test shape: {y_test.shape}')

x_train shape: (60000, 28, 28, 1)
y_train shape: (60000,)
x_test shape: (10000, 28, 28, 1)
y_test shape: (10000,)


## Build the Neural Network Model

In [7]:
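from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Input
from tensorflow.keras.optimizers import Adam

In [8]:
# Declare the input with an Input object whose shape matches the preprocessed
# images, (28, 28, 1); passing input_shape to a layer is discouraged in Keras 3.
model = Sequential([
    Input(shape=(28, 28, 1)),
    Flatten(),                       # 28 * 28 * 1 -> 784 features
    Dense(128, activation='relu'),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')  # one probability per digit class
])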
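Conceptually, this `Sequential` stack flattens each 28x28 image into a 784-element vector, applies two ReLU `Dense` layers, and ends with a softmax that turns 10 scores into class probabilities. A NumPy sketch of one forward pass with random, untrained weights (the weight names and the normal initialization are illustrative assumptions, not what Keras uses internally):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row max for numerical stability before exponentiating
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Random weights with the same shapes as Dense(128), Dense(64), Dense(10)
W1, b1 = rng.normal(0.0, 0.05, (784, 128)), np.zeros(128)
W2, b2 = rng.normal(0.0, 0.05, (128, 64)), np.zeros(64)
W3, b3 = rng.normal(0.0, 0.05, (64, 10)), np.zeros(10)

batch = rng.random((4, 28, 28, 1))   # 4 fake images with values in [0, 1)
x = batch.reshape(len(batch), -1)    # Flatten: (4, 784)
h1 = relu(x @ W1 + b1)               # (4, 128)
h2 = relu(h1 @ W2 + b2)              # (4, 64)
probs = softmax(h2 @ W3 + b3)        # (4, 10); each row sums to 1

print(probs.shape)
```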



## Compile the Model

In [11]:
from tensorflow.keras.optimizers import Adam

In [12]:
optimizer = Adam(learning_rate=0.001)
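`learning_rate=0.001` is also the Keras default, so `Adam()` and `Adam(learning_rate=0.001)` behave the same. As a rough sketch of what the optimizer does on each step, here is the textbook Adam update (bias-corrected running averages of the gradient and squared gradient) in NumPy, minimizing a toy quadratic. The `beta_1`, `beta_2`, and `epsilon` defaults mirror Keras' `Adam`; the objective, the larger `lr=0.01`, and the step count are illustrative assumptions. Keras folds the bias correction into the step size, so its numbers can differ slightly from this form.

```python
import numpy as np

def adam_minimize(grad_fn, theta, lr=0.001, beta_1=0.9, beta_2=0.999,
                  epsilon=1e-7, steps=5000):
    """Textbook Adam with bias-corrected first and second moment estimates."""
    m = np.zeros_like(theta)  # running mean of gradients
    v = np.zeros_like(theta)  # running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta_1 * m + (1 - beta_1) * g
        v = beta_2 * v + (1 - beta_2) * g * g
        m_hat = m / (1 - beta_1 ** t)   # correct the bias toward zero at early steps
        v_hat = v / (1 - beta_2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + epsilon)
    return theta

# Toy objective f(theta) = sum((theta - 3)^2), minimized at theta = 3
grad = lambda th: 2.0 * (th - 3.0)
result = adam_minimize(grad, np.array([0.0, 10.0]), lr=0.01)
print(result)  # both entries close to 3
```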

In [13]:
model.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
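The loss has to match the label format. `sparse_categorical_crossentropy` expects integer class indices such as the raw MNIST labels, while `categorical_crossentropy` expects one-hot vectors such as those produced by `tf.keras.utils.to_categorical`; pairing one-hot targets with the sparse loss raises a rank-mismatch `ValueError` in Keras 3. For matching inputs, both losses compute the same number, as this NumPy sketch with made-up probabilities shows (`sparse_ce` and `categorical_ce` are illustrative helpers, not Keras functions):

```python
import numpy as np

def sparse_ce(labels, probs):
    """Mean cross-entropy for integer class labels (sparse format)."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

def categorical_ce(one_hot, probs):
    """Mean cross-entropy for one-hot targets (categorical format)."""
    return -np.mean(np.sum(one_hot * np.log(probs), axis=1))

# Made-up softmax outputs for 3 samples over 4 classes (each row sums to 1)
probs = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.2, 0.5, 0.2, 0.1],
                  [0.25, 0.25, 0.25, 0.25]])
labels = np.array([0, 1, 3])   # integer labels, shape (3,)
one_hot = np.eye(4)[labels]    # one-hot labels, shape (3, 4)

print(sparse_ce(labels, probs))
print(categorical_ce(one_hot, probs))  # same value as the line above
```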

In [14]:
model.summary()
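`model.summary()` lists each layer's parameter count, and those numbers follow from the layer sizes: a `Dense` layer with `n_in` inputs and `n_out` units holds `n_in * n_out` weights plus `n_out` biases, while `Flatten` has no parameters. A quick check of the arithmetic (the helper `dense_params` is illustrative):

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per unit."""
    return n_in * n_out + n_out

# Flatten (784 features) -> Dense(128) -> Dense(64) -> Dense(10)
layer_shapes = [(28 * 28, 128), (128, 64), (64, 10)]
counts = [dense_params(n_in, n_out) for n_in, n_out in layer_shapes]

print(counts)       # [100480, 8256, 650]
print(sum(counts))  # 109386
```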

## Train the Model

In [15]:
import tensorflow as tf
import matplotlib.pyplot as plt



In [16]:
# Define the number of epochs and batch size
num_epochs = 10
batch_size = 32
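With `batch_size=32`, each epoch splits the training data into ceil(n / 32) minibatches, the last one possibly smaller; for all 60,000 MNIST training images that is 1,875 weight updates per epoch (fewer if part of the data is held out for validation). A NumPy sketch of one shuffled pass, approximating what `model.fit` does each epoch (the generator `minibatches` and the toy arrays are illustrative):

```python
import math
import numpy as np

def minibatches(x, y, batch_size, rng):
    """Yield shuffled (x, y) minibatches that cover the data once (one epoch)."""
    order = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]

rng = np.random.default_rng(0)
x = np.arange(100).reshape(100, 1)  # 100 toy samples
y = np.arange(100)

sizes = [len(xb) for xb, _ in minibatches(x, y, 32, rng)]
print(sizes)                  # [32, 32, 32, 4]
print(math.ceil(60000 / 32))  # 1875 steps per epoch for the full MNIST training set
```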

In [17]:
# Define callbacks
callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
]
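`EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)` halts training once the validation loss has failed to improve for 3 consecutive epochs, and `restore_best_weights=True` rolls the model back to the weights of the best epoch when it stops. A simplified pure-Python sketch of that rule (it ignores options such as `min_delta`; the function name and the loss values are made up for illustration):

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), 0-based; stop_epoch is None if training runs to completion."""
    best, best_epoch, wait = float('inf'), None, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0  # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # weights would be restored from best_epoch
    return None, best_epoch

# Made-up validation losses: best at epoch 3, then three epochs without improvement
losses = [0.30, 0.20, 0.15, 0.12, 0.13, 0.14, 0.125, 0.11]
print(early_stopping_epoch(losses))  # (6, 3)
```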

In [20]:
# Train the model, holding out 10% of the training images for validation so
# that early stopping never sees the test set used for the final evaluation
history = model.fit(
    x_train,
    y_train,
    epochs=num_epochs,
    batch_size=batch_size,
    validation_split=0.1,
    callbacks=callbacks
)

# Save the trained model so the evaluation section can reload it
model.save('trained_mnist_model.h5')

In [None]:
# Plot training & validation accuracy and loss side by side in one figure.
# Keep these calls in a single cell: the inline backend renders and closes
# the current figure at the end of each cell.
plt.figure(figsize=(12, 4))

# Left panel: accuracy per epoch
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(loc='upper left')

# Right panel: loss per epoch
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(loc='upper right')

plt.show()

## Evaluate the Model

In [None]:
# Import necessary libraries
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import load_model
from sklearn.metrics import classification_report, confusion_matrix
import numpy as np

In [None]:
# Load the MNIST dataset
(_, _), (x_test, y_test) = mnist.load_data()

In [None]:
# Preprocess the test data
x_test = x_test.astype('float32') / 255.0
x_test = x_test.reshape(-1, 28, 28, 1)

In [None]:
# Load the trained model
model = load_model('trained_mnist_model.h5')

In [None]:
# Evaluate the model on the test data
test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=2)

In [None]:
# Print the evaluation metrics
print(f'Test loss: {test_loss}')
print(f'Test accuracy: {test_accuracy}')

In [None]:
# Predict the labels for the test data
y_pred = model.predict(x_test)
y_pred_classes = np.argmax(y_pred, axis=1)
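Each row of `y_pred` is a softmax probability vector over the 10 digits, so `np.argmax(..., axis=1)` picks the most probable class for each image. Comparing those indices with the integer labels gives essentially the accuracy that `model.evaluate` reports. A small NumPy illustration with made-up probabilities (`y_prob` and `y_true` are illustrative stand-ins):

```python
import numpy as np

# Made-up softmax outputs for 4 images over 3 classes
y_prob = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.3, 0.3, 0.4],
                   [0.6, 0.3, 0.1]])
y_true = np.array([0, 1, 2, 1])      # integer labels

y_hat = np.argmax(y_prob, axis=1)    # most probable class per row
accuracy = np.mean(y_hat == y_true)  # fraction of correct predictions

print(y_hat)     # [0 1 2 0]
print(accuracy)  # 0.75
```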

In [None]:
# Print classification report
print(classification_report(y_test, y_pred_classes))

In [None]:
# Print confusion matrix
print(confusion_matrix(y_test, y_pred_classes))
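In scikit-learn's `confusion_matrix`, row *i* counts images whose true label is *i* and column *j* counts images predicted as *j*, so per-class precision and recall (as listed in the classification report) can be read off its columns and rows. A NumPy sketch with made-up labels (the function `confusion` is illustrative, not the scikit-learn implementation):

```python
import numpy as np

def confusion(true_labels, pred_labels, n_classes):
    """Rows = true class, columns = predicted class (same layout as scikit-learn)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (true_labels, pred_labels), 1)  # unbuffered add handles repeated index pairs
    return cm

true_demo = np.array([0, 0, 1, 1, 2, 2])
pred_demo = np.array([0, 1, 1, 1, 2, 0])
cm = confusion(true_demo, pred_demo, 3)

precision = np.diag(cm) / cm.sum(axis=0)  # correct / all predicted as that class
recall = np.diag(cm) / cm.sum(axis=1)     # correct / all truly in that class
print(cm)
print(precision, recall)
```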

## Make Predictions

In [None]:
import matplotlib.pyplot as plt
import numpy as np

In [None]:
predictions = model.predict(x_test)
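The plotting cells that follow draw the first 15 test images in order. To inspect the model's mistakes instead, one option is to collect the indices where the predicted digit differs from the true label. A NumPy sketch with made-up stand-ins for `predictions` and `y_test` (named `probs_demo` and `labels_demo` here so they do not overwrite the notebook's variables):

```python
import numpy as np

# Made-up stand-ins: softmax outputs for 5 images over 3 classes, and true labels
probs_demo = np.array([[0.9, 0.05, 0.05],
                       [0.1, 0.2, 0.7],
                       [0.3, 0.6, 0.1],
                       [0.5, 0.4, 0.1],
                       [0.2, 0.2, 0.6]])
labels_demo = np.array([0, 1, 1, 1, 2])

predicted = np.argmax(probs_demo, axis=1)
wrong = np.flatnonzero(predicted != labels_demo)  # indices of misclassified images
print(wrong)  # [1 3]
```

With the real arrays, `np.flatnonzero(np.argmax(predictions, axis=1) != y_test)` gives the indices; looping over `enumerate(wrong[:num_images])` keeps the subplot positions driven by the loop counter.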

In [None]:
# Digit classes are simply '0'..'9'
class_names = [str(digit) for digit in range(10)]

def plot_image(predictions_array, true_label, img):
    """Show one test image labeled 'predicted (true)': blue if correct, red if not."""
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    plt.imshow(img.squeeze(), cmap=plt.cm.binary)  # drop the channel axis: (28, 28, 1) -> (28, 28)
    predicted_label = int(np.argmax(predictions_array))
    true_label = int(true_label)  # y_test holds integer labels at this point
    color = 'blue' if predicted_label == true_label else 'red'
    plt.xlabel(f"{class_names[predicted_label]} ({class_names[true_label]})", color=color)

In [None]:
def plot_value_array(predictions_array, true_label):
    """Bar chart of the 10 class probabilities: predicted bar red, true bar blue."""
    plt.grid(False)
    plt.xticks(range(10))
    plt.yticks([])
    thisplot = plt.bar(range(10), predictions_array, color="#777777")
    plt.ylim([0, 1])
    predicted_label = int(np.argmax(predictions_array))
    thisplot[predicted_label].set_color('red')
    thisplot[int(true_label)].set_color('blue')

In [None]:
# Plot the first num_images test images with their predicted (true) labels and
# class-probability bars; keep everything in one cell so a single figure is
# drawn and shown
num_rows = 5
num_cols = 3
num_images = num_rows * num_cols
plt.figure(figsize=(2 * 2 * num_cols, 2 * num_rows))
for i in range(num_images):
    plt.subplot(num_rows, 2 * num_cols, 2 * i + 1)
    plot_image(predictions[i], y_test[i], x_test[i])
    plt.subplot(num_rows, 2 * num_cols, 2 * i + 2)
    plot_value_array(predictions[i], y_test[i])
plt.tight_layout()
plt.show()