# Beginner MNIST Neural Net Project

Goal: Train a deep neural network on the MNIST handwritten digit dataset to above 98% accuracy.



## Load a recent version of TensorFlow

In [0]:
# Select TensorFlow 2.x using Colab's tensorflow_version magic command
try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass

## Import Libraries

Import the TensorFlow, NumPy, pandas, and Matplotlib libraries.

Also import the TensorFlow datasets library so we can use the MNIST dataset.


In [28]:
# Import libraries
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Import TensorFlow Datasets
import tensorflow_datasets as tfds
tfds.disable_progress_bar()

# Print version of TensorFlow currently imported
print(tf.__version__)

2.2.0


Problem: Human handwriting is imperfect compared to computerized text. Train a model to correctly predict the digit (0 through 9) shown in each image of a set of handwritten digits.

## Load MNIST
Load with the following arguments:


*   shuffle_files: The MNIST data is only stored in a single file, but for larger datasets with multiple files on disk, it's good practice to shuffle them when training.
*   as_supervised: Returns tuple (img, label) instead of dict {'image': img, 'label': label}

In [13]:
# Load the MNIST dataset using tfds
(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
    try_gcs=True
)

# Print the dataset spec (shapes and types) to verify the dataset has been imported
print(ds_train)

<DatasetV1Adapter shapes: ((28, 28, 1), ()), types: (tf.uint8, tf.int64)>


## Build training pipeline

We apply the following transformations to the dataset.


*   ds.map: TFDS provides the images as tf.uint8, while the model expects tf.float32, so cast the images to float32 and scale the pixel values to the range [0, 1].
*   ds.cache: As the dataset fits into memory, cache it before shuffling for better performance. Note: Random transformations should be applied after caching.
*   ds.shuffle: For true randomness, set the shuffle buffer to the full size of the dataset. Note: For bigger datasets that do not fit in memory, a standard value is 1000 if your system allows it.
*   ds.batch: Batch after shuffling to get unique batches at each epoch.
*   ds.prefetch: It is good practice to end the pipeline by prefetching for performance.
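
To see why the shuffle buffer size matters, here is a simplified pure-Python model of a fixed-size shuffle buffer. It is a sketch of the general idea only, not TensorFlow's implementation; the function name `buffered_shuffle` is made up for illustration, and it assumes `buffer_size` is at least 1.

```python
import random


def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in the order a fixed-size shuffle buffer might emit them.

    Illustrative model only (assumes buffer_size >= 1).
    """
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything.
            buffer.append(item)
            continue
        # Emit a random element and put the incoming item in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain what is left in random order.
    while buffer:
        yield buffer.pop(rng.randrange(len(buffer)))


data = list(range(20))
print(list(buffered_shuffle(data, buffer_size=1)))   # order is unchanged
print(list(buffered_shuffle(data, buffer_size=3)))   # only locally shuffled
print(list(buffered_shuffle(data, buffer_size=20)))  # full shuffle
```

With a small buffer, an element can only move a few positions forward from where it started, so the output stays close to the input order. A buffer as large as the dataset lets any element appear at any position.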



In [0]:
def normalize_img(image, label):
  """Normalizes images: `uint8` -> `float32`."""
  return tf.cast(image, tf.float32) / 255., label

ds_train = ds_train.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(128)
ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)

## Build evaluation pipeline
The evaluation pipeline is similar to the training pipeline, with small tweaks:


*   There is no ds.shuffle() call
*   Caching is done after batching (as batches can be the same between epochs)
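
As a rough sketch of what batching with a batch size of 128 implies, the arithmetic below counts the batches per epoch and shows that, without shuffling, batching the same data twice gives identical batches. It assumes the standard MNIST split sizes (60,000 training and 10,000 test images); the helper names are made up for illustration.

```python
import math

# Assumed MNIST split sizes.
NUM_TRAIN = 60_000
NUM_TEST = 10_000
BATCH_SIZE = 128  # the value passed to ds.batch() in this notebook


def num_batches(num_examples, batch_size):
    """Number of batches per epoch when the final, smaller batch is kept."""
    return math.ceil(num_examples / batch_size)


def make_batches(items, batch_size):
    """Split a list into consecutive batches, in order."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


train_steps = num_batches(NUM_TRAIN, BATCH_SIZE)
test_steps = num_batches(NUM_TEST, BATCH_SIZE)
print(train_steps, test_steps)  # 469 79

# Without shuffling, every pass over the data produces the same batches,
# which is why the batched test set can be cached as is.
examples = list(range(10))
epoch_1 = make_batches(examples, 4)
epoch_2 = make_batches(examples, 4)
print(epoch_1 == epoch_2)  # True
```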


In [0]:
ds_test = ds_test.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_test = ds_test.batch(128)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)

## Define the plotting function
This function will be used to graph model accuracy.

In [25]:
#@title Define the plotting function
def plot_curve(epochs, hist, list_of_metrics):
  """Plot a curve of one or more classification metrics vs. epoch."""  
  # Each entry in list_of_metrics must be a column name in hist, such as
  # 'accuracy' or 'val_accuracy'. More metric names are listed at:
  # https://www.tensorflow.org/tutorials/structured_data/imbalanced_data#define_the_model_and_metrics

  plt.figure()
  plt.xlabel("Epoch")
  plt.ylabel("Value")

  for m in list_of_metrics:
    x = hist[m]
    plt.plot(epochs[1:], x[1:], label=m)

  plt.legend()

print("Loaded the plot_curve function.")

Loaded the plot_curve function.


## Build and train the model

Feed the training pipeline built above into Keras and define the model hyperparameters.
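
Before training, it can help to know how large the network is. The sketch below counts the trainable parameters of a fully connected network with 28x28x1 inputs, one 128-unit hidden layer, and 10 outputs, which are the layer sizes used in this notebook. The helper name `dense_params` is made up for illustration.

```python
def dense_params(inputs, units):
    """Weights plus biases of one fully connected layer."""
    return inputs * units + units


flat_inputs = 28 * 28 * 1                      # Flatten has no parameters
hidden_params = dense_params(flat_inputs, 128)
output_params = dense_params(128, 10)
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)  # 100480 1290 101770
```

Almost all of the parameters sit in the hidden layer, because each of its 128 neurons connects to all 784 pixels.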

In [35]:
# These variables are the hyperparameters for the model.
# The batch size (128) is set in the tf.data pipelines above, so it is not repeated here.
learning_rate = 0.001
epochs = 10


def create_model(learning_rate):

  # The input shape is 28x28x1 because MNIST images are 28x28 grayscale pixels.
  # The hidden layer is a fully connected (dense) layer of 128 neurons with ReLU activation.
  # The output layer uses softmax because we want a probability distribution over the 10 digits.

  model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
    tf.keras.layers.Dense(128,activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
  ])

  # Compile the model with sparse categorical crossentropy (SCC) loss, the Adam optimizer, and the accuracy metric.
  # SCC is used when there are two or more label classes and the labels are integers rather than one-hot vectors.
  # Adam is a variant of stochastic gradient descent that adapts the step size for each parameter.
  # The accuracy metric measures how often the model's predictions equal the true labels.

  model.compile(
      optimizer=tf.keras.optimizers.Adam(learning_rate),
      loss='sparse_categorical_crossentropy',
      metrics=['accuracy'],
  )

  return model

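def train_model(model, epochs):
  """Train the model and return the epoch list and the per-epoch metrics."""

  # Fit the model using the training split, then validate using the test split.
  # ds_train and ds_test are already batched, so batch_size is not passed to fit().
  history = model.fit(
      ds_train,
      epochs=epochs,
      validation_data=ds_test
  )

  # To track the progress of training, take a snapshot of the model metrics at each epoch and add it to a dataframe.
  epochs = history.epoch
  hist = pd.DataFrame(history.history)

  return epochs, hist


# Build the model, then train it on the pipelines defined above.
model = create_model(learning_rate)
epochs, hist = train_model(model, epochs)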



Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
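
The loss and accuracy that Keras reports for each epoch can be reproduced by hand on a toy batch. The sketch below is a minimal pure-Python version of softmax, sparse categorical crossentropy, and accuracy. The scores and labels are hypothetical, only three classes are used instead of ten, and details such as the probability clipping Keras applies are left out.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(probs, label):
    """Negative log-probability assigned to the integer label."""
    return -math.log(probs[label])


def accuracy(batch_probs, labels):
    """Fraction of examples whose most likely class equals the label."""
    correct = sum(1 for p, y in zip(batch_probs, labels) if p.index(max(p)) == y)
    return correct / len(labels)


# Hypothetical scores for a batch of three examples and three classes.
batch_logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3], [1.2, 0.2, 0.1]]
labels = [0, 1, 2]

batch_probs = [softmax(z) for z in batch_logits]
losses = [sparse_categorical_crossentropy(p, y) for p, y in zip(batch_probs, labels)]
mean_loss = sum(losses) / len(losses)
acc = accuracy(batch_probs, labels)  # the third example is misclassified

print(round(mean_loss, 4), round(acc, 4))
```

For reference, a model that assigns equal probability to all ten digits has a loss of ln(10), about 2.303, so a loss well below that value indicates the model has learned something.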


In [38]:
# Plot the collected metrics over time
list_of_metrics_to_plot = ['accuracy']
plot_curve(epochs, hist, list_of_metrics_to_plot)


# model.evaluate(ds_test)  # ds_test is already normalized and batched
