# Saying Hello to MNIST Dataset

'Hello MNIST Dataset!👋'

In this notebook we are going to take a look at the MNIST dataset and build a simple model, as well as testing it by using TensorFlow!

### Importing relevant libraries

In [1]:
import tensorflow as tf
import tensorflow_datasets as tfds

### Loading the dataset from TFDS

In [2]:
# Loading the dataset
(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

2022-04-21 17:12:50.455509: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2022-04-21 17:12:50.455729: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 4. Tune using inter_op_parallelism_threads for best performance.


In [3]:
ds_info

tfds.core.DatasetInfo(
    name='mnist',
    version=1.0.0,
    description='The MNIST database of handwritten digits.',
    urls=['https://storage.googleapis.com/cvdf-datasets/mnist/'],
    features=FeaturesDict({
        'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
        'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
    }),
    total_num_examples=70000,
    splits={
        'test': 10000,
        'train': 60000,
    },
    supervised_keys=('image', 'label'),
    citation="""@article{lecun2010mnist,
      title={MNIST handwritten digit database},
      author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
      journal={ATT Labs [Online]. Available: http://yann. lecun. com/exdb/mnist},
      volume={2},
      year={2010}
    }""",
    redistribution_info=,
)

### Scaling and shuffling the data

In [4]:
# A validation data is not provided by TFDS
# To split the ds_train to test & validation we need the sample size,
# so we can skip/take the data into ds_test & ds_valid
num_train_samples = ds_info.splits['train'].num_examples
num_train_samples = tf.cast(num_train_samples, tf.int64)

num_valid_samples = 0.1 * ds_info.splits['train'].num_examples
num_valid_samples = tf.cast(num_valid_samples, tf.int64)

num_test_samples = ds_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64)

In [5]:
# We need to normalize the data because TFDS provide images of type tf.uint8, while the model expects tf.float32
def scale(image, label):
    image = tf.cast(image, tf.float32)
    # Since there are 256 different shades of grey, we divide the pixel values by 255 to get the desired results.
    # All elements are going to be between 0 and 1.
    image /= 255.
    return image, label

In [6]:
# Scaling and shuffling
BUFFER_SIZE = 10000
ds_train_and_valid = ds_train.map(scale)
ds_train_and_valid = ds_train_and_valid.shuffle(buffer_size=BUFFER_SIZE)
# uniformly shuffling casue buffer_size = number of samples

# Assigning validation and train data
ds_valid = ds_train_and_valid.take(num_valid_samples)
ds_train = ds_train_and_valid.skip(num_valid_samples)

# Batching
BATCH_SIZE = 100
ds_train = ds_train.batch(BATCH_SIZE)
ds_valid = ds_valid.batch(num_valid_samples)

# Scaling and batching the test data aswell
ds_test = ds_test.map(scale)
ds_test = ds_test.batch(BATCH_SIZE)

# Taking the only batch
validation_inputs, validation_targets = next(iter(ds_valid))

### Building the model

In [7]:
input_size = 784 # We had (28,28,1) tensors because the images have 28x28 pixel ratio -> 28x28 = 784 input size.
output_size = 10 # For each digit, 0 to 9, we have one output node -> 10.
hidden_layer_size = 50

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)), # input layer
    
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 1st hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 2nd hidden layer
    
    tf.keras.layers.Dense(output_size, activation='softmax') # output layer
])

# Choosing the optimizer and the loss function
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)

### Training the model

In [8]:
NUM_EPOCHS = 6

model.fit(ds_train, 
    epochs=NUM_EPOCHS, 
    validation_data=(validation_inputs, validation_targets), 
    verbose =2,
    validation_steps=10
)

Epoch 1/6


2022-04-21 17:14:02.997763: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
	 [[{{node IteratorGetNext}}]]


540/540 - 60s - loss: 1.6305 - sparse_categorical_accuracy: 0.8594 - val_loss: 0.0000e+00 - val_sparse_categorical_accuracy: 0.0000e+00
Epoch 2/6
540/540 - 60s - loss: 1.5344 - sparse_categorical_accuracy: 0.9343 - val_loss: 1.5232 - val_sparse_categorical_accuracy: 0.9430


2022-04-21 17:15:03.426729: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
	 [[{{node IteratorGetNext}}]]


Epoch 3/6
540/540 - 58s - loss: 1.5201 - sparse_categorical_accuracy: 0.9458 - val_loss: 1.5178 - val_sparse_categorical_accuracy: 0.9480


2022-04-21 17:16:01.462847: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
	 [[{{node IteratorGetNext}}]]


Epoch 4/6
540/540 - 61s - loss: 1.5130 - sparse_categorical_accuracy: 0.9521 - val_loss: 1.5110 - val_sparse_categorical_accuracy: 0.9540


2022-04-21 17:17:02.245350: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
	 [[{{node IteratorGetNext}}]]


Epoch 5/6
540/540 - 60s - loss: 1.5071 - sparse_categorical_accuracy: 0.9575 - val_loss: 1.5076 - val_sparse_categorical_accuracy: 0.9568


2022-04-21 17:18:01.878229: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
	 [[{{node IteratorGetNext}}]]


Epoch 6/6
540/540 - 60s - loss: 1.5023 - sparse_categorical_accuracy: 0.9615 - val_loss: 1.5041 - val_sparse_categorical_accuracy: 0.9597


2022-04-21 17:19:02.155529: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
	 [[{{node IteratorGetNext}}]]


<tensorflow.python.keras.callbacks.History at 0x10f74f4d0>

For the validation data set we have 96% accuracy!

### Testing the model

In [10]:
test_loss, test_accuracy = model.evaluate(ds_test)



2022-04-21 17:22:21.970728: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
	 [[{{node IteratorGetNext}}]]


In [11]:
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))

Test loss: 1.51. Test accuracy: 95.54%


Unfortunately for the test data we got 95% accuracy this time. To improve this we can try to increase our hidden layer size, add more hidden layers or tinker around with the parameters.

Note that all of these methods **should be applied before** we test our model, i.e. in training step. Otherwise we will be overfitting to the test data, which defeats the whole purpose. Also we should be careful not to overfit to the validation data. In this example we can see that our model lost around 0.5% accuracy which is pretty neat nonetheless, considering the best results for MNIST dataset tops around 98% accuracy.