## Neural Network for MNIST Data Set  

#### Importing needed libraries

In [1]:
import tensorflow as tf
import numpy as np

In [2]:
tf.__version__

'2.10.0'

#### Get MNIST Dataset

Obtain the MNIST dataset from the tensorflow_datasets library:

In [3]:
import tensorflow_datasets as tfds



In [4]:
# load the dataset and its metadata into the pair "mnist_dataset, mnist_info";
# passing "with_info=True" makes tfds.load return the metadata alongside the data
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)



"as_supervised": bool 
if True, the returned tf.data.Dataset will have a 2-tuple structure (input, label) according to builder.info.supervised_keys. If False, the default, the returned tf.data.Dataset will have a dictionary with all the features.

Dataset mnist downloaded and prepared to "ABSOLUTE PATH"\tensorflow_datasets\mnist\1.0.0. 
Subsequent calls will reuse this data.

In [5]:
mnist_train, mnist_test = mnist_dataset["train"], mnist_dataset["test"]
mnist_train

<_OptionsDataset element_spec=(TensorSpec(shape=(28, 28, 1), dtype=tf.uint8, name=None), TensorSpec(shape=(), dtype=tf.int64, name=None))>

#### Validation Set Size

In [6]:
mnist_info.splits["train"].num_examples

60000

Take 5 percent of training data to be validation data:

In [7]:
num_validation_samples = tf.cast(0.05*mnist_info.splits["train"].num_examples, tf.int64)
num_validation_samples

<tf.Tensor: shape=(), dtype=int64, numpy=3000>

In [8]:
num_test_samples = tf.cast(mnist_info.splits["test"].num_examples, tf.int64)
num_test_samples

<tf.Tensor: shape=(), dtype=int64, numpy=10000>

#### Scaling the inputs 

Scale the data by
1) defining the scaling in a function, and then
2) applying that function to every (image, label) pair with the "map" method of the "_OptionsDataset" class

In [9]:
def scale(image, label):
    # cast the uint8 pixel values to float32, then rescale them from [0, 255] to [0, 1]
    image = tf.cast(image, tf.float32)
    image /= 255.
    return image, label

In [10]:
scaled_train_and_validation_data = mnist_train.map(scale) 
scaled_test_data = mnist_test.map(scale)
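
To see what the scaling does to individual pixel values, here is a minimal NumPy sketch that is not part of the original notebook. The array `pixels` is made up for illustration; it only mimics the uint8 intensities of a raw MNIST image.

```python
import numpy as np

# Hypothetical 2x2 "image" with uint8 intensities, like the raw MNIST pixels
pixels = np.array([[0, 51], [102, 255]], dtype=np.uint8)

# Cast first, then divide, mirroring the two steps of the scale function
scaled = pixels.astype(np.float32) / 255.0

print(scaled.dtype)  # float32
print(scaled)        # values now lie between 0 and 1
```

Casting before dividing keeps the result in float32, which is the dtype the dense layers will work with.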

#### Shuffling the data

In [11]:
help(mnist_train.shuffle)

Help on method shuffle in module tensorflow.python.data.ops.dataset_ops:

shuffle(buffer_size, seed=None, reshuffle_each_iteration=None, name=None) method of tensorflow.python.data.ops.dataset_ops._OptionsDataset instance
    Randomly shuffles the elements of this dataset.
    
    This dataset fills a buffer with `buffer_size` elements, then randomly
    samples elements from this buffer, replacing the selected elements with new
    elements. For perfect shuffling, a buffer size greater than or equal to the
    full size of the dataset is required.
    
    For instance, if your dataset contains 10,000 elements but `buffer_size` is
    set to 1,000, then `shuffle` will initially select a random element from
    only the first 1,000 elements in the buffer. Once an element is selected,
    its space in the buffer is replaced by the next (i.e. 1,001-st) element,
    maintaining the 1,000 element buffer.
    
    `reshuffle_each_iteration` controls whether the shuffle order should be
    

In [12]:
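BUFFER_SIZE = 10000 # smaller than the 60000-sample dataset, so the shuffle is only approximate
# note: shuffle reshuffles on every iteration by default (reshuffle_each_iteration=True),
# so a take/skip split made after it is not guaranteed to select the same samples on each pass
shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE)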
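
The buffer mechanism described in the help text can be sketched in plain Python. This is an illustration of the idea only, not TensorFlow's implementation; the function name `buffered_shuffle` and its parameters are hypothetical.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)          # still filling the buffer
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]                # emit a random element from the buffer
        buffer[idx] = item               # replace it with the next incoming element
    rng.shuffle(buffer)                  # input exhausted: drain what is left
    yield from buffer

out = list(buffered_shuffle(range(10), buffer_size=3, seed=0))
print(out)
```

With a buffer of 3, the first emitted element can only come from the first three inputs, which is why a buffer smaller than the dataset gives an imperfect shuffle.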

#### Separate validation and training datasets

In [13]:
validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
training_data = shuffled_train_and_validation_data.skip(num_validation_samples)

#### Minibatch Gradient Descent 

In [14]:
BATCH_SIZE = 50

In [15]:
help(training_data.batch)

Help on method batch in module tensorflow.python.data.ops.dataset_ops:

batch(batch_size, drop_remainder=False, num_parallel_calls=None, deterministic=None, name=None) method of tensorflow.python.data.ops.dataset_ops.SkipDataset instance
    Combines consecutive elements of this dataset into batches.
    
    >>> dataset = tf.data.Dataset.range(8)
    >>> dataset = dataset.batch(3)
    >>> list(dataset.as_numpy_iterator())
    [array([0, 1, 2]), array([3, 4, 5]), array([6, 7])]
    
    >>> dataset = tf.data.Dataset.range(8)
    >>> dataset = dataset.batch(3, drop_remainder=True)
    >>> list(dataset.as_numpy_iterator())
    [array([0, 1, 2]), array([3, 4, 5])]
    
    The components of the resulting element will have an additional outer
    dimension, which will be `batch_size` (or `N % batch_size` for the last
    element if `batch_size` does not divide the number of input elements `N`
    evenly and `drop_remainder` is `False`). If your program depends on the
    batches having the

In [16]:
training_data = training_data.batch(BATCH_SIZE)
# after each epoch of training, the loss is calculated over the validation dataset
# hence, the validation dataset doesn't need to be split into mini-batches, as we use all of it just for a forward pass
# that calculates the total loss over it
validation_data = validation_data.batch(num_validation_samples) # still, the model expects the data in batch form, so we use a single batch
test_data = scaled_test_data.batch(num_test_samples)
# a rise in the total loss over the validation set would be the signal to stop training and test the model
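
As a quick check of the sizes involved, the arithmetic can be worked through in plain Python. This is a sketch using the numbers reported above (60000 training examples, a 5 percent validation split, and a batch size of 50); the variable names are chosen here for illustration.

```python
import math

num_train_total = 60000                  # mnist_info.splits["train"].num_examples
num_validation = num_train_total // 20   # 5 percent of the training data
batch_size = 50

num_training = num_train_total - num_validation
steps_per_epoch = math.ceil(num_training / batch_size)

print(num_validation, num_training, steps_per_epoch)  # 3000 57000 1140
```

The 1140 steps agree with the per-epoch step count that appears in the training log further down.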

The validation data must have the same shape and object properties as the training and test data.

Because we loaded the data with "as_supervised=True" (in the .load method of tfds), the MNIST data is iterable and each of its elements is a 2-tuple of (input, target).

Hence,

In [17]:
validation_inputs, validation_targets = next(iter(validation_data)) 
# iter creates an iterator over the batched dataset
# next returns the next element of that iterator: here, the single batch holding all validation samples
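
The same `next(iter(...))` pattern works on any Python iterable. A minimal sketch, where the list `batched` is a hypothetical stand-in for a dataset that contains a single (inputs, targets) batch:

```python
# Hypothetical stand-in for a batched dataset: one (inputs, targets) pair
batched = [([0.0, 0.5, 1.0], [7, 2, 1])]

iterator = iter(batched)           # create an iterator over the batches
inputs, targets = next(iterator)   # fetch the first batch and unpack the 2-tuple

print(inputs)   # [0.0, 0.5, 1.0]
print(targets)  # [7, 2, 1]
```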

#### Model Formation

We have 28-by-28 images whose pixel values range from 0 to 1, after being scaled from the original range of 0 to 255.
We are estimating which digit each image shows, and there are 10 possible digits (0 to 9) that can come out.

Hence,

In [18]:
input_size = 28*28
output_size = 10

Now we set up the layers of the model. The lecturer suggests two hidden layers with 50 nodes each.

The lecturer also notes that this choice is definitely suboptimal, and assigns as homework varying these hyperparameters for better accuracy. Here we use 150 nodes per hidden layer instead.

In [19]:
hidden_layer_size = 150

Now it is time to set up the layers.

In [20]:
model = tf.keras.Sequential([
                            tf.keras.layers.Flatten(input_shape=(28,28,1)), # the input layer
                            tf.keras.layers.Dense(hidden_layer_size, activation="relu"), # first hidden layer
                            tf.keras.layers.Dense(hidden_layer_size, activation="relu"), # second hidden layer
                            tf.keras.layers.Dense(output_size, activation="softmax") # the output layer
                            ])
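
To make the shapes concrete, here is a rough NumPy sketch of the same architecture (flatten, two ReLU layers of 150 nodes, softmax output of 10). It is not how Keras implements the model: the weights are random placeholders rather than trained values, and the names `layer_sizes` and `forward` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [28 * 28, 150, 150, 10]

# Random placeholder weights; a trained model would hold learned values instead
weights = [rng.normal(0, 0.05, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(batch):
    x = batch.reshape(batch.shape[0], -1)      # flatten: (N, 28, 28, 1) -> (N, 784)
    for i, (w, b) in enumerate(zip(weights, biases)):
        x = x @ w + b                          # dense layer
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)             # ReLU on the hidden layers
    x = x - x.max(axis=1, keepdims=True)       # stabilise the softmax numerically
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)    # softmax on the output layer

fake_images = rng.random((4, 28, 28, 1))       # four made-up "images"
probs = forward(fake_images)
num_params = sum(w.size + b.size for w, b in zip(weights, biases))

print(probs.shape, num_params)  # (4, 10) 141910
```

Each row of `probs` sums to 1, so it can be read as a probability distribution over the ten digits. The parameter count follows from inputs times units plus one bias per unit for each dense layer.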

#### Choosing the optimizer and loss functions

In [21]:
help(model.compile)

Help on method compile in module keras.engine.training:

compile(optimizer='rmsprop', loss=None, metrics=None, loss_weights=None, weighted_metrics=None, run_eagerly=None, steps_per_execution=None, jit_compile=None, **kwargs) method of keras.engine.sequential.Sequential instance
    Configures the model for training.
    
    Example:
    
    ```python
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss=tf.keras.losses.BinaryCrossentropy(),
                  metrics=[tf.keras.metrics.BinaryAccuracy(),
                           tf.keras.metrics.FalseNegatives()])
    ```
    
    Args:
        optimizer: String (name of optimizer) or optimizer instance. See
          `tf.keras.optimizers`.
        loss: Loss function. May be a string (name of loss function), or
          a `tf.keras.losses.Loss` instance. See `tf.keras.losses`. A loss
          function is any callable with the signature `loss = fn(y_true,
          y_pred)`, where `y_true` 

For the loss function, we choose "sparse_categorical_crossentropy" because:

1) Binary crossentropy is for problems with only two classes, which is not the case for the ten digit classes here.
2) Categorical crossentropy expects the targets to be one-hot encoded, and our targets are integer labels, not one-hot vectors.
3) Sparse categorical crossentropy computes the same loss as categorical crossentropy but accepts integer labels directly, so no manual one-hot encoding is needed.
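
The relationship between the two categorical losses can be checked by hand for a single sample. A small sketch with made-up numbers (`probs`, `label`, and `one_hot` are illustrative, not taken from the model):

```python
import math

probs = [0.1, 0.7, 0.2]      # hypothetical softmax output over 3 classes
label = 1                    # integer class label (sparse form)
one_hot = [0.0, 1.0, 0.0]    # the same label, one-hot encoded

# Sparse form: pick out the probability of the true class directly
sparse_loss = -math.log(probs[label])

# One-hot form: sum over all classes, only the true class contributes
categorical_loss = -sum(t * math.log(p) for t, p in zip(one_hot, probs))

print(sparse_loss, categorical_loss)
```

Both expressions reduce to the negative log of the probability assigned to the true class, so the choice between them depends only on how the targets are stored.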

Additionally, we can pass the metrics to be calculated during training and evaluation to the compile method.

In [22]:
model.compile(optimizer = "adam", loss = "sparse_categorical_crossentropy", metrics=["accuracy"])

#### Training the model

In [23]:
NUM_EPOCHS = 8

model.fit(training_data, epochs=NUM_EPOCHS, validation_data=(validation_inputs,validation_targets), verbose=2)

Epoch 1/8
1140/1140 - 5s - loss: 0.2448 - accuracy: 0.9282 - val_loss: 0.1499 - val_accuracy: 0.9553 - 5s/epoch - 4ms/step
Epoch 2/8
1140/1140 - 3s - loss: 0.1001 - accuracy: 0.9692 - val_loss: 0.0980 - val_accuracy: 0.9717 - 3s/epoch - 3ms/step
Epoch 3/8
1140/1140 - 3s - loss: 0.0680 - accuracy: 0.9789 - val_loss: 0.0801 - val_accuracy: 0.9750 - 3s/epoch - 3ms/step
Epoch 4/8
1140/1140 - 3s - loss: 0.0499 - accuracy: 0.9847 - val_loss: 0.0546 - val_accuracy: 0.9793 - 3s/epoch - 3ms/step
Epoch 5/8
1140/1140 - 3s - loss: 0.0409 - accuracy: 0.9871 - val_loss: 0.0462 - val_accuracy: 0.9847 - 3s/epoch - 3ms/step
Epoch 6/8
1140/1140 - 3s - loss: 0.0330 - accuracy: 0.9892 - val_loss: 0.0411 - val_accuracy: 0.9853 - 3s/epoch - 3ms/step
Epoch 7/8
1140/1140 - 3s - loss: 0.0277 - accuracy: 0.9905 - val_loss: 0.0319 - val_accuracy: 0.9877 - 3s/epoch - 3ms/step
Epoch 8/8
1140/1140 - 3s - loss: 0.0219 - accuracy: 0.9924 - val_loss: 0.0296 - val_accuracy: 0.9907 - 3s/epoch - 3ms/step


<keras.callbacks.History at 0x22bee910940>
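
The validation losses logged above fall at every epoch, so a rule that stops at the first increase would not have triggered within these 8 epochs. A minimal sketch of such a check, using the logged values; the helper `first_increase` is hypothetical and is not a Keras function:

```python
# val_loss values copied from the training log above
val_losses = [0.1499, 0.0980, 0.0801, 0.0546, 0.0462, 0.0411, 0.0319, 0.0296]

def first_increase(losses):
    """Return the 1-based epoch at which the loss first rises, or None."""
    for epoch in range(1, len(losses)):
        if losses[epoch] > losses[epoch - 1]:
            return epoch + 1
    return None

stop_epoch = first_increase(val_losses)
print(stop_epoch)  # None
```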

#### Testing the model

In [24]:
test_loss, test_accuracy = model.evaluate(test_data)



In [25]:
print("Test loss: {0:.2f}, Test accuracy: {1:.2f}%".format(test_loss,test_accuracy*100))

Test loss: 0.08, Test accuracy: 97.93%
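
For reference, the accuracy reported here is the fraction of samples whose highest-probability class matches the integer label. A small sketch with made-up predictions (`probs` and `labels` are illustrative only):

```python
# Hypothetical softmax outputs for four samples over three classes
probs = [
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.4, 0.1],
]
labels = [1, 0, 2, 1]  # integer class labels

# The predicted class is the index of the largest probability in each row
predicted = [max(range(len(row)), key=row.__getitem__) for row in probs]
accuracy = sum(p == t for p, t in zip(predicted, labels)) / len(labels)

print(predicted, accuracy)  # [1, 0, 2, 0] 0.75
```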
