## **Deep Neural Networks for MNIST Classification**

Pipeline ▶
  - Prepare/Preprocess Data ▶ Split Data ▶ Train, Validate, Test.
  - Outline model, choose activation functions.
  - Set the appropriate advanced optimizers and loss function.
  - Make it learn/Train.
  - Test accuracy of the model.

In [1]:
# import relevant packages
import numpy as np
import pandas as pd
import tensorflow as tf

import tensorflow_datasets as tfds

### Acquire Data

In [2]:
# load dataset 
# as_supervised=True loads each example as a 2-tuple (input, target)
# with_info=True also returns a DatasetInfo object with the version, features and number of samples
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)

Downloading and preparing dataset mnist/3.0.1 (download: 11.06 MiB, generated: 21.00 MiB, total: 32.06 MiB) to /root/tensorflow_datasets/mnist/3.0.1...


local data directory. If you'd instead prefer to read directly from our public
GCS bucket (recommended if you're running on GCP), you can instead pass
`try_gcs=True` to `tfds.load` or set `data_dir=gs://tfds-data/datasets`.





Dataset mnist downloaded and prepared to /root/tensorflow_datasets/mnist/3.0.1. Subsequent calls will reuse this data.


In [3]:
# look at dataset
mnist_dataset

{'test': <PrefetchDataset shapes: ((28, 28, 1), ()), types: (tf.uint8, tf.int64)>,
 'train': <PrefetchDataset shapes: ((28, 28, 1), ()), types: (tf.uint8, tf.int64)>}

In [4]:
# get info
mnist_info

tfds.core.DatasetInfo(
    name='mnist',
    version=3.0.1,
    description='The MNIST database of handwritten digits.',
    homepage='http://yann.lecun.com/exdb/mnist/',
    features=FeaturesDict({
        'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
        'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
    }),
    total_num_examples=70000,
    splits={
        'test': 10000,
        'train': 60000,
    },
    supervised_keys=('image', 'label'),
    citation="""@article{lecun2010mnist,
      title={MNIST handwritten digit database},
      author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
      journal={ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist},
      volume={2},
      year={2010}
    }""",
    redistribution_info=,
)

### Prepare Data

In [5]:
mnist_train = mnist_dataset['train']
mnist_test = mnist_dataset['test']

In [6]:
num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples

# cast num_validation_samples to a TensorFlow integer (the product above is a float)
num_validation_samples = tf.cast(num_validation_samples, tf.int64) 

num_test_samples = mnist_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64) 

In [7]:
# create a function to scale our data

# the function below is meant to be passed to dataset.map(*function*),
# which applies it to every (image, label) pair

# image and label are both inputs
def scale(image, label):
  '''
  Takes an image and its label, casts the image to float32 and scales its
  pixel values from the 0-255 range (256 shades of gray) to the 0-1 range
  by dividing by 255. The label is returned unchanged.
  '''
  image = tf.cast(image, tf.float32)
  image /= 255. # the trailing dot makes 255 a float literal

  return image, label


In [8]:
scaled_train_and_validation_data = mnist_train.map(scale)

scaled_test_data = mnist_test.map(scale)
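
To make the effect of the scaling step concrete, here is a minimal plain-Python sketch of the same arithmetic that `scale` applies to every pixel. The helper name `scale_pixels` and the sample values are hypothetical and are not taken from MNIST.

```python
def scale_pixels(pixels):
    """Map 8-bit grayscale values (0-255) to floats in the range 0 to 1."""
    return [p / 255.0 for p in pixels]


# hypothetical pixel values: black, a dark gray, and white
row = [0, 51, 255]
scaled_row = scale_pixels(row)
print(scaled_row)  # [0.0, 0.2, 1.0]
```

The smallest possible input (0) maps to 0.0 and the largest (255) maps to 1.0, so all inputs to the network share the same bounded range.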

**Shuffle Data**

In [9]:
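# shuffle the data before splitting and batching
# the shuffle buffer holds 10_000 samples at a time; since that is smaller than the dataset, the shuffle is only approximate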
BUFFER_SIZE = 10_000


shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE)

validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)
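
The role of `BUFFER_SIZE` is easier to see with a small simulation. The sketch below imitates a fixed-size shuffle buffer in plain Python: the buffer is filled first, then each output is drawn at random from the buffer and its slot is refilled with the next input. This follows the behaviour described in the `tf.data` documentation, but the function `buffered_shuffle` is a hypothetical illustration, not TensorFlow's implementation.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in the order produced by a fixed-size shuffle buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)          # still filling the buffer
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]                # emit a random buffered element
        buffer[idx] = item               # refill the freed slot
    rng.shuffle(buffer)                  # drain what is left at the end
    yield from buffer


# ten ordered samples, buffer of three (toy sizes chosen for illustration)
shuffled = list(buffered_shuffle(range(10), buffer_size=3, seed=0))
print(shuffled)
```

With a buffer of three, the first output can only be one of the first three inputs, which is why a buffer smaller than the dataset mixes samples only locally. Note also that `Dataset.shuffle` reshuffles on every pass by default (`reshuffle_each_iteration`), so a `take`/`skip` split made after shuffling is not guaranteed to contain the same samples in every epoch; splitting before shuffling is one way to avoid that.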

**Batching**
- Why? To improve performance and reduce memory usage

In [10]:
BATCH_SIZE = 100

# split the training data into mini-batches of BATCH_SIZE; batching adds a leading batch dimension to each tensor
train_data = train_data.batch(BATCH_SIZE)

# validation and test data are only evaluated, never used for weight updates, so each is taken as a single batch;
# batching them anyway gives them the same rank as the training batches
validation_data = validation_data.batch(num_validation_samples)
test_data = scaled_test_data.batch(num_test_samples)

validation_inputs, validation_targets = next(iter(validation_data))
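
The split and batch sizes above determine how many weight updates happen per epoch. A quick arithmetic check in plain Python, using the split size reported by `mnist_info` (the variable names here are illustrative):

```python
import math

num_train_examples = 60_000   # size of the 'train' split reported by mnist_info
validation_fraction = 0.1     # share held out for validation
batch_size = 100              # same value as BATCH_SIZE

num_validation = round(validation_fraction * num_train_examples)
num_training = num_train_examples - num_validation
steps_per_epoch = math.ceil(num_training / batch_size)

print(num_validation, num_training, steps_per_epoch)  # 6000 54000 540
```

The 540 steps match the `540/540` counter that Keras prints for each epoch during training.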

### Model

Outline Model

In [11]:
# dimension of data
input_size = 28*28*1
# number of targets (0 to 9)
output_size = 10
# width of each hidden layer (number of neurons)
hidden_layer_size = 50

model_1 = tf.keras.Sequential([
    # flatten each 28x28x1 image into a one-dimensional vector
    tf.keras.layers.Flatten(input_shape=(28,28,1)),

    # first hidden layer, ReLU activation
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),

    # second hidden layer, ReLU activation
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),

    # output layer; softmax turns the ten outputs into class probabilities
    tf.keras.layers.Dense(output_size, activation='softmax')
])
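
To get a feel for how the hidden layer width affects model size, the sketch below counts trainable parameters for this architecture (flatten, two hidden dense layers, one output layer). A dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The helper names are hypothetical; the widths 50, 100 and 200 are the ones tried in this notebook.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def count_params(input_size, hidden_layer_size, output_size):
    """Total trainable parameters for two hidden layers and an output layer."""
    return (dense_params(input_size, hidden_layer_size)
            + dense_params(hidden_layer_size, hidden_layer_size)
            + dense_params(hidden_layer_size, output_size))


param_counts = {width: count_params(28 * 28 * 1, width, 10)
                for width in (50, 100, 200)}
print(param_counts)  # {50: 42310, 100: 89610, 200: 199210}
```

Most of the parameters sit in the first dense layer, because it connects all 784 input pixels to every hidden unit.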

Choose Optimizer and Loss Function

In [12]:
# choose optimizer and loss
# sparse_categorical_crossentropy expects integer class labels, so the targets do not need to be one-hot encoded
model_1.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
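
The difference between the sparse and the one-hot form of cross-entropy can be checked by hand. The sketch below computes softmax and both losses for a single example in plain Python. The logits and label are hypothetical values, and the functions are simplified stand-ins: the Keras losses additionally handle batches and numerical safeguards.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)                       # subtract the max for stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(label, probs):
    """Loss for an integer class label."""
    return -math.log(probs[label])


def categorical_crossentropy(one_hot, probs):
    """Loss for a one-hot encoded label."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


logits = [2.0, 1.0, 0.1]        # hypothetical scores for three classes
probs = softmax(logits)
sparse_loss = sparse_categorical_crossentropy(0, probs)
dense_loss = categorical_crossentropy([1.0, 0.0, 0.0], probs)
print(sparse_loss, dense_loss)
```

Both forms give the same value, so the sparse variant simply saves the step of encoding the integer labels.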

Training

In [13]:
# assign number of epochs
num_epochs = 5

# fit model
model_1.fit(train_data, epochs=num_epochs, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/5
540/540 - 8s - loss: 0.4134 - accuracy: 0.8834 - val_loss: 0.2097 - val_accuracy: 0.9390 - 8s/epoch - 15ms/step
Epoch 2/5
540/540 - 4s - loss: 0.1724 - accuracy: 0.9500 - val_loss: 0.1528 - val_accuracy: 0.9565 - 4s/epoch - 7ms/step
Epoch 3/5
540/540 - 4s - loss: 0.1317 - accuracy: 0.9608 - val_loss: 0.1270 - val_accuracy: 0.9635 - 4s/epoch - 7ms/step
Epoch 4/5
540/540 - 4s - loss: 0.1081 - accuracy: 0.9679 - val_loss: 0.1101 - val_accuracy: 0.9673 - 4s/epoch - 7ms/step
Epoch 5/5
540/540 - 4s - loss: 0.0923 - accuracy: 0.9722 - val_loss: 0.0949 - val_accuracy: 0.9715 - 4s/epoch - 7ms/step


<keras.callbacks.History at 0x7f3a9390c810>

**Run the cell again to continue training (`fit` resumes from the current weights)**

In [14]:
# assign number of epochs
num_epochs = 5

# fit model
model_1.fit(train_data, epochs=num_epochs, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/5
540/540 - 4s - loss: 0.0789 - accuracy: 0.9766 - val_loss: 0.0943 - val_accuracy: 0.9743 - 4s/epoch - 7ms/step
Epoch 2/5
540/540 - 4s - loss: 0.0685 - accuracy: 0.9785 - val_loss: 0.0792 - val_accuracy: 0.9795 - 4s/epoch - 7ms/step
Epoch 3/5
540/540 - 4s - loss: 0.0598 - accuracy: 0.9815 - val_loss: 0.0875 - val_accuracy: 0.9750 - 4s/epoch - 7ms/step
Epoch 4/5
540/540 - 4s - loss: 0.0544 - accuracy: 0.9834 - val_loss: 0.0693 - val_accuracy: 0.9798 - 4s/epoch - 7ms/step
Epoch 5/5
540/540 - 4s - loss: 0.0487 - accuracy: 0.9851 - val_loss: 0.0696 - val_accuracy: 0.9802 - 4s/epoch - 7ms/step


<keras.callbacks.History at 0x7f3a9361d1d0>

In [15]:
# assign number of epochs
num_epochs = 10

# fit model
model_1.fit(train_data, epochs=num_epochs, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/10
540/540 - 4s - loss: 0.0427 - accuracy: 0.9870 - val_loss: 0.0595 - val_accuracy: 0.9825 - 4s/epoch - 7ms/step
Epoch 2/10
540/540 - 4s - loss: 0.0387 - accuracy: 0.9881 - val_loss: 0.0655 - val_accuracy: 0.9820 - 4s/epoch - 7ms/step
Epoch 3/10
540/540 - 4s - loss: 0.0355 - accuracy: 0.9892 - val_loss: 0.0556 - val_accuracy: 0.9828 - 4s/epoch - 7ms/step
Epoch 4/10
540/540 - 4s - loss: 0.0314 - accuracy: 0.9900 - val_loss: 0.0545 - val_accuracy: 0.9827 - 4s/epoch - 7ms/step
Epoch 5/10
540/540 - 4s - loss: 0.0288 - accuracy: 0.9910 - val_loss: 0.0469 - val_accuracy: 0.9855 - 4s/epoch - 7ms/step
Epoch 6/10
540/540 - 4s - loss: 0.0268 - accuracy: 0.9918 - val_loss: 0.0414 - val_accuracy: 0.9868 - 4s/epoch - 7ms/step
Epoch 7/10
540/540 - 4s - loss: 0.0226 - accuracy: 0.9933 - val_loss: 0.0350 - val_accuracy: 0.9888 - 4s/epoch - 7ms/step
Epoch 8/10
540/540 - 4s - loss: 0.0207 - accuracy: 0.9938 - val_loss: 0.0350 - val_accuracy: 0.9888 - 4s/epoch - 7ms/step
Epoch 9/10
540/540 - 4s 

<keras.callbacks.History at 0x7f3a935df710>

## Model 2
- Try with hidden layer size 100

In [19]:
# dimension of data
input_size = 28*28*1
# number of targets (0 to 9)
output_size = 10
# width of each hidden layer (number of neurons)
hidden_layer_size = 100

model_2 = tf.keras.Sequential([
    # flatten each 28x28x1 image into a one-dimensional vector
    tf.keras.layers.Flatten(input_shape=(28,28,1)),

    # first hidden layer, ReLU activation
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),

    # second hidden layer, ReLU activation
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),

    # output layer; softmax turns the ten outputs into class probabilities
    tf.keras.layers.Dense(output_size, activation='softmax')
])


# choose optimizer and loss
# sparse_categorical_crossentropy expects integer class labels, so the targets do not need to be one-hot encoded
model_2.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

In [None]:
# assign number of epochs
num_epochs = 65

# fit model
model_2.fit(train_data, epochs=num_epochs, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/65
540/540 - 5s - loss: 0.3351 - accuracy: 0.9047 - val_loss: 0.1696 - val_accuracy: 0.9488 - 5s/epoch - 9ms/step
Epoch 2/65
540/540 - 4s - loss: 0.1384 - accuracy: 0.9595 - val_loss: 0.1209 - val_accuracy: 0.9642 - 4s/epoch - 8ms/step
Epoch 3/65
540/540 - 4s - loss: 0.0964 - accuracy: 0.9711 - val_loss: 0.0925 - val_accuracy: 0.9717 - 4s/epoch - 7ms/step
Epoch 4/65
540/540 - 4s - loss: 0.0735 - accuracy: 0.9774 - val_loss: 0.0768 - val_accuracy: 0.9760 - 4s/epoch - 8ms/step
Epoch 5/65
540/540 - 4s - loss: 0.0578 - accuracy: 0.9823 - val_loss: 0.0697 - val_accuracy: 0.9788 - 4s/epoch - 7ms/step
Epoch 6/65
540/540 - 4s - loss: 0.0461 - accuracy: 0.9859 - val_loss: 0.0580 - val_accuracy: 0.9827 - 4s/epoch - 8ms/step
Epoch 7/65
540/540 - 4s - loss: 0.0385 - accuracy: 0.9882 - val_loss: 0.0513 - val_accuracy: 0.9848 - 4s/epoch - 8ms/step
Epoch 8/65
540/540 - 4s - loss: 0.0336 - accuracy: 0.9896 - val_loss: 0.0430 - val_accuracy: 0.9870 - 4s/epoch - 8ms/step
Epoch 9/65
540/540 - 4s 

## Model 2
- Hidden layer size of 200

In [23]:
# dimension of data
input_size = 28*28*1
# number of targets (0 to 9)
output_size = 10
# width of each hidden layer (number of neurons)
hidden_layer_size = 200

model_2 = tf.keras.Sequential([
    # flatten each 28x28x1 image into a one-dimensional vector
    tf.keras.layers.Flatten(input_shape=(28,28,1)),

    # first hidden layer, ReLU activation
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),

    # second hidden layer, ReLU activation
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),

    # output layer; softmax turns the ten outputs into class probabilities
    tf.keras.layers.Dense(output_size, activation='softmax')
])


# choose optimizer and loss
# sparse_categorical_crossentropy expects integer class labels, so the targets do not need to be one-hot encoded
model_2.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

In [24]:
# assign number of epochs
num_epochs = 20

# fit model
model_2.fit(train_data, epochs=num_epochs, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/20
540/540 - 5s - loss: 0.2700 - accuracy: 0.9228 - val_loss: 0.1392 - val_accuracy: 0.9595 - 5s/epoch - 9ms/step
Epoch 2/20
540/540 - 5s - loss: 0.1015 - accuracy: 0.9693 - val_loss: 0.0977 - val_accuracy: 0.9702 - 5s/epoch - 9ms/step
Epoch 3/20
540/540 - 4s - loss: 0.0708 - accuracy: 0.9785 - val_loss: 0.0704 - val_accuracy: 0.9783 - 4s/epoch - 8ms/step
Epoch 4/20
540/540 - 4s - loss: 0.0523 - accuracy: 0.9835 - val_loss: 0.0576 - val_accuracy: 0.9827 - 4s/epoch - 8ms/step
Epoch 5/20
540/540 - 4s - loss: 0.0380 - accuracy: 0.9884 - val_loss: 0.0469 - val_accuracy: 0.9857 - 4s/epoch - 8ms/step
Epoch 6/20
540/540 - 4s - loss: 0.0329 - accuracy: 0.9895 - val_loss: 0.0397 - val_accuracy: 0.9895 - 4s/epoch - 8ms/step
Epoch 7/20
540/540 - 4s - loss: 0.0245 - accuracy: 0.9923 - val_loss: 0.0429 - val_accuracy: 0.9863 - 4s/epoch - 8ms/step
Epoch 8/20
540/540 - 5s - loss: 0.0222 - accuracy: 0.9929 - val_loss: 0.0299 - val_accuracy: 0.9902 - 5s/epoch - 8ms/step
Epoch 9/20
540/540 - 5s 

<keras.callbacks.History at 0x7f3a92a38750>
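
The pipeline at the top lists testing the model's accuracy as the last step, and `test_data` is prepared above for that purpose. In Keras this would typically be done with `model.evaluate(test_data)`, which returns the loss and the compiled metrics. The sketch below shows what the accuracy metric computes, using plain Python and hypothetical softmax outputs for a three-class problem; it is an illustration, not output from the models trained here.

```python
def accuracy(probabilities, targets):
    """Share of samples whose most probable class equals the target."""
    predictions = [max(range(len(row)), key=row.__getitem__)
                   for row in probabilities]
    correct = sum(int(p == t) for p, t in zip(predictions, targets))
    return correct / len(targets)


# hypothetical softmax outputs for four samples and their true labels
sample_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.2, 0.6],
]
sample_targets = [0, 1, 2, 2]
test_accuracy = accuracy(sample_probs, sample_targets)
print(test_accuracy)  # 0.75
```

Because the test set plays no part in training or in choosing the hidden layer size, its accuracy is the number to report once tuning is finished.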