# **Let's Use TensorFlow to Solve a Common Classification Problem with the MNIST Dataset, a.k.a. "The Hello World! of Machine Learning".**

> The MNIST dataset is an already pre-processed dataset of handwritten digits (from 0 to 9). Each digit is stored as a 28x28 grayscale pixel matrix, and each pixel holds an intensity from 0 (background) to 255 (full ink stroke).
> We'll use supervised learning to teach the machine to classify the handwritten digits correctly. In other words, the machine learns which digit it is looking at from the intensity of each of the 784 (28 x 28) pixels.

* **Importing the Required Libraries.**

In [30]:
import numpy as np
import tensorflow as tf

import tensorflow_datasets as tfds

* **Importing the Dataset.**

In [31]:
MnistData, MnistInfo = tfds.load(name = "mnist", with_info = True, as_supervised = True)

In [32]:
MnistTrain, MnistTest = MnistData["train"], MnistData["test"]

In [33]:
#Calculating the validation size: 10% of the training examples (tf.cast truncates the result to an integer).
ValidationSamples = tf.cast(0.1*MnistInfo.splits["train"].num_examples, tf.int64)

#Storing the Test Size into a Variable
TestSamples = tf.cast(MnistInfo.splits["test"].num_examples, tf.int64)

* **Let's Now Define a Function to Scale the Data in a Numerically Stable Way (from 0 to 1).**

In [34]:
def ScaleData(Image, Label):
    #Casting the Image into Float Type
    Image = tf.cast(Image, tf.float32)
    
    #Scaling the pixels to the [0, 1] range (Image is already float32, so dividing by 255. keeps it float32)
    Image /= 255.
    
    return Image, Label
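To build intuition, here is a NumPy sketch of what `ScaleData` does to a single image outside of TensorFlow. The fake image (pixel values generated with `np.arange`) and the helper name `scale_pixels` are illustrative assumptions. The final reshape mirrors what the `Flatten` layer does later in the model.

```python
import numpy as np

def scale_pixels(image):
    """NumPy analogue of ScaleData: cast uint8 pixels to float32 and map 0..255 to 0..1."""
    return image.astype(np.float32) / np.float32(255.0)

# Hypothetical 28x28x1 "image" whose pixels cycle through every value from 0 to 255.
fake_image = (np.arange(28 * 28).reshape(28, 28, 1) % 256).astype(np.uint8)

scaled = scale_pixels(fake_image)
flat = scaled.reshape(-1)  # 784 values, one per pixel, like tf.keras.layers.Flatten

print(scaled.dtype, scaled.min(), scaled.max(), flat.shape)
```

Keeping the inputs in [0, 1] rather than [0, 255] keeps the first layer's weighted sums small, which is what "numerically stable" refers to here.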

* **Scaling Training (and Validation) Data as well as the Test Data**

In [35]:
ScaledTrain = MnistTrain.map(ScaleData)
ScaledTest = MnistTest.map(ScaleData)

* **Let's Now Shuffle the Data and Split the Training Dataset into Training and Validation Data.**

In [36]:
#Initializing the buffer, which sets how many samples are held in memory and shuffled at a time, since an enormous dataset can't always be shuffled all at once.
#NOTE:
#ShuffleSize = BufferSize = Buffer
#Buffer = 1 -> no shuffling actually happens.
#Buffer >= TotalSampleSize -> the whole dataset is shuffled uniformly at once.
#1 < Buffer < TotalSampleSize -> the data is only partially shuffled: each sample is drawn at random from a sliding window of Buffer samples, trading some randomness for lower memory use.

Buffer = 10000

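#reshuffle_each_iteration = False freezes the shuffled order, so the take()/skip() split below
#gives the same, non-overlapping validation and training samples on every pass over the data.
ShuffledTrain = ScaledTrain.shuffle(Buffer, reshuffle_each_iteration = False)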

#Shuffling the test set isn't necessary (the evaluation order doesn't change the results), but it does no harm.
ShuffledTest = ScaledTest.shuffle(Buffer)
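The buffer rules in the comments above can be sketched in pure Python. This is a simplified model of how a shuffle buffer behaves: fill the buffer, emit a random element, then refill from the input. It is not TensorFlow's actual implementation, and `buffered_shuffle` is a hypothetical helper.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in pseudo-random order using a fixed-size shuffle buffer (buffer_size >= 1)."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) == buffer_size:
            # Pick a random slot, swap it to the end, and pop it (O(1) removal).
            i = rng.randrange(len(buffer))
            buffer[i], buffer[-1] = buffer[-1], buffer[i]
            yield buffer.pop()
    # Input exhausted: drain whatever is left in the buffer in random order.
    rng.shuffle(buffer)
    yield from buffer

print(list(buffered_shuffle(range(10), buffer_size=1)))            # no shuffling
print(list(buffered_shuffle(range(10), buffer_size=3, seed=0)))    # partial shuffle
print(list(buffered_shuffle(range(10), buffer_size=10, seed=0)))   # full shuffle
```

With `buffer_size=1` the output order equals the input order. In general, no element can move more than `buffer_size - 1` positions earlier than it was in the input. This is why a small buffer gives only a weak shuffle of data that arrives sorted.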

In [37]:
#Extracting the Validation DataSet and Re-Creating the Train Data without the Validation Data Points.

ShuffledValidation = ShuffledTrain.take(ValidationSamples)
ShuffledTrainOnly = ShuffledTrain.skip(ValidationSamples)
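By default, `Dataset.shuffle` reshuffles on every pass over the data (`reshuffle_each_iteration=True`). Because `take` and `skip` are evaluated again each time their dataset is iterated, splitting a reshuffling dataset can hand some validation samples to the training set. The pure-Python simulation below illustrates the effect, with seeded `random.Random` orders standing in for two separate passes. The `shuffled` helper is hypothetical.

```python
import random

def shuffled(data, seed):
    """Return a shuffled copy of data; each seed stands in for one pass over the dataset."""
    order = list(data)
    random.Random(seed).shuffle(order)
    return order

data = list(range(100))
n_val = 10

# Reshuffling on every pass: take() and skip() see two different orders.
validation = shuffled(data, seed=0)[:n_val]    # pass 1: take(n_val)
training = shuffled(data, seed=1)[n_val:]      # pass 2: skip(n_val)
leaked = set(validation) & set(training)

# Frozen order: both splits come from the same permutation, so they are disjoint.
fixed_order = shuffled(data, seed=0)
validation_fixed, training_fixed = fixed_order[:n_val], fixed_order[n_val:]

print(len(leaked), len(set(validation_fixed) & set(training_fixed)))
```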

* **Batching the Dataset for Mini-Batch Stochastic Gradient Descent.**

In [38]:
#NOTE:
#BatchSize = 1 -> Stochastic Gradient Descent
#BatchSize = TotalSampleSize -> (Full-)Batch Gradient Descent
# 1 < BatchSize < TotalSampleSize -> MiniBatch Gradient Descent

BatchSize = 100

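#Batching (re-shuffling the training samples first, so that each epoch sees them in a new order)
BatchedTrain = ShuffledTrainOnly.shuffle(Buffer).batch(BatchSize)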

#The validation and test sets are only forward-propagated (no weight updates) and fit in memory, so each can be a single batch.
#Keras still expects batched data, so we batch each set using its total sample size as BatchSize.
BatchedValidation = ShuffledValidation.batch(ValidationSamples)

#The Same Applies for the Test Data:
BatchedTest = ShuffledTest.batch(TestSamples)
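The "540/540" step count in the training log below can be checked by hand. This small sketch assumes the standard MNIST split of 60,000 training and 10,000 test images. `make_batches` is a hypothetical pure-Python stand-in for `Dataset.batch`.

```python
import math

def make_batches(items, batch_size):
    """Split a sequence into consecutive mini-batches; the last one may be smaller."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

train_total = 60_000                            # MnistInfo.splits["train"].num_examples
validation_samples = int(0.1 * train_total)     # truncated like the tf.cast above -> 6000
train_only = train_total - validation_samples   # samples left for training -> 54000
steps_per_epoch = math.ceil(train_only / 100)   # BatchSize = 100 -> 540 weight updates per epoch

print(validation_samples, train_only, steps_per_epoch)
print(make_batches(list(range(10)), 3))         # last batch holds the remainder
```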

In [39]:
#The validation data must have the same shape and properties as the training and test data.
#Because we set as_supervised = True, each element of the dataset is an (image, label) 2-tuple.
#Therefore we must extract the inputs and targets separately.

ValidationInputs, ValidationTargets = next(iter(BatchedValidation))

#iter() turns the batched validation dataset into an iterator, and next() retrieves its next element (a batch).
#Since there is only one batch, this loads all the validation inputs and targets at once.

2023-02-22 17:53:49.073570: W tensorflow/core/kernels/data/cache_dataset_ops.cc:856] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.


* **Outlining the Model.**

In [40]:
InputSize = 784 #One Input Neuron for each Pixel.
OutputSize = 10 #Ten output neurons, one for each digit class (0 to 9).
HiddenLayersSize = 50 #Size of each hidden layer; we'll use 2 hidden layers of the same size.

#Defining the Model.
Model = tf.keras.Sequential([
                            #Flattens each input tensor of shape (28, 28, 1) into a vector of 784 values.
                            tf.keras.layers.Flatten(input_shape = (28, 28, 1)),
                            #Dense computes the dot product of the inputs and the weights, then adds the bias.
                            #This is also where the activation function is applied.
                            #We do this twice, since we have 2 hidden layers of the same size with the same activation function.
                            tf.keras.layers.Dense(HiddenLayersSize, activation = "relu"),
                            tf.keras.layers.Dense(HiddenLayersSize, activation = "relu"),
                            #Output layer of size 10 with the softmax activation function.
                            #Softmax turns the 10 output values into probabilities that sum to 1.
                            tf.keras.layers.Dense(OutputSize, activation = "softmax"),
                            ])

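#Setting the optimizer and the loss function.
#We use sparse_categorical_crossentropy because our targets are integer labels (0 to 9), not one-hot vectors: it computes the same loss as categorical cross-entropy without us having to one-hot encode the targets.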
Model.compile(optimizer = "adam", loss = "sparse_categorical_crossentropy", metrics = ["accuracy"])
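To make the model outline concrete, here is a NumPy sketch of the forward pass and the loss. It assumes randomly initialized weights (not trained ones) and four fake input vectors, and all helper names are illustrative. It also counts the trainable parameters: each Dense layer has inputs * units weights plus one bias per unit, which should match the total that `Model.summary()` reports for the model above. Finally, it checks that sparse categorical cross-entropy on integer labels equals categorical cross-entropy on the one-hot version of the same labels.

```python
import numpy as np

def dense(x, w, b, activation=None):
    """What a Dense layer computes: x @ w + b, followed by an optional activation."""
    z = x @ w + b
    return activation(z) if activation else z

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row maximum before exponentiating, for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(probs, labels):
    # Integer labels index directly into the probability of the true class.
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

def categorical_crossentropy(probs, one_hot):
    return -(one_hot * np.log(probs)).sum(axis=1).mean()

rng = np.random.default_rng(0)
sizes = [784, 50, 50, 10]  # input, two hidden layers, output
params = [(rng.normal(0.0, 0.05, (m, n)), np.zeros(n)) for m, n in zip(sizes[:-1], sizes[1:])]
n_params = sum(w.size + b.size for w, b in params)  # 39250 + 2550 + 510

x = rng.random((4, 784))          # four fake flattened images with values in [0, 1)
labels = np.array([3, 0, 7, 9])   # integer class labels, as in MNIST
h = dense(x, *params[0], relu)
h = dense(h, *params[1], relu)
probs = dense(h, *params[2], softmax)

sparse_loss = sparse_categorical_crossentropy(probs, labels)
onehot_loss = categorical_crossentropy(probs, np.eye(10)[labels])
print(n_params, sparse_loss, onehot_loss)
```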

* **Training the Model.**

In [41]:
#Setting the Number of Epochs.
Epochs = 5

#Training the Model.
Model.fit(BatchedTrain, epochs = Epochs, validation_data = (ValidationInputs, ValidationTargets), verbose = 2)

#1) At the beginning of each epoch, the running training loss is reset.
#2) The algorithm iterates over all the batches of the training data.
#3) The weights and biases are updated once per batch.
#4) We get a value for the loss function, indicating how the training is going.
#5) We also see the training accuracy, since we passed metrics = ["accuracy"]; verbose = 2 prints one line per epoch.
#6) At the end of each epoch, the algorithm forward-propagates the whole validation set and reports val_loss and val_accuracy.
#7) Training stops once the maximum number of epochs is reached.

Epoch 1/5
540/540 - 5s - loss: 0.4245 - accuracy: 0.8800 - val_loss: 0.2337 - val_accuracy: 0.9327 - 5s/epoch - 9ms/step
Epoch 2/5
540/540 - 3s - loss: 0.1949 - accuracy: 0.9438 - val_loss: 0.1627 - val_accuracy: 0.9533 - 3s/epoch - 6ms/step
Epoch 3/5
540/540 - 3s - loss: 0.1467 - accuracy: 0.9577 - val_loss: 0.1348 - val_accuracy: 0.9622 - 3s/epoch - 6ms/step
Epoch 4/5
540/540 - 3s - loss: 0.1206 - accuracy: 0.9651 - val_loss: 0.1107 - val_accuracy: 0.9690 - 3s/epoch - 6ms/step
Epoch 5/5
540/540 - 3s - loss: 0.0982 - accuracy: 0.9710 - val_loss: 0.1059 - val_accuracy: 0.9678 - 3s/epoch - 6ms/step


<keras.callbacks.History at 0x7fce04700490>
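The numbered steps in the training cell describe mini-batch gradient descent. Below is a minimal NumPy sketch of the same loop on a toy linear-regression problem. It uses mean squared error instead of cross-entropy and plain gradient descent instead of Adam. The data and the function name `minibatch_gd` are assumptions for illustration.

```python
import numpy as np

def minibatch_gd(X, y, batch_size, epochs, lr=0.1, seed=0):
    """Fit y ~ X @ w + b with mini-batch gradient descent on mean squared error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    updates, history = 0, []
    for _ in range(epochs):
        epoch_loss = 0.0                      # running loss is reset at the start of each epoch
        order = rng.permutation(n)            # reshuffle the training data every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            xb, yb = X[idx], y[idx]
            err = xb @ w + b - yb             # forward pass on one batch
            grad_w = 2 * xb.T @ err / len(idx)  # gradient of the MSE w.r.t. w
            grad_b = 2 * err.mean()             # gradient of the MSE w.r.t. b
            w -= lr * grad_w
            b -= lr * grad_b
            updates += 1                      # exactly one parameter update per batch
            epoch_loss += float(np.mean(err ** 2)) * len(idx)
        history.append(epoch_loss / n)        # average training loss for the epoch
    return w, b, updates, history

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(1000, 1))
y = 3.0 * X[:, 0] + 2.0                       # noiseless toy target: w = 3, b = 2
w, b, updates, history = minibatch_gd(X, y, batch_size=100, epochs=50)
print(updates, w, b)
```

With 1,000 samples and a batch size of 100, each epoch performs 10 updates, just as the MNIST run above performs 540 per epoch.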

You can play around with the hidden layers' sizes and look at the training results to see whether you can increase the accuracy.

By doing this, you can also try to force overfitting and spot it in the training results: the training loss keeps decreasing, but all of a sudden the validation loss starts increasing.
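That rule of thumb can be written as a small check on the per-epoch losses. The function name is hypothetical. In Keras, the `History` object returned by `Model.fit` stores these values in `history.history["loss"]` and `history.history["val_loss"]`. Applied to the five epochs logged above, the check finds no sign of overfitting yet.

```python
def first_overfitting_epoch(train_loss, val_loss):
    """Return the first (1-based) epoch in which the validation loss rises while the
    training loss keeps falling, or None if that never happens."""
    for i in range(1, min(len(train_loss), len(val_loss))):
        if val_loss[i] > val_loss[i - 1] and train_loss[i] < train_loss[i - 1]:
            return i + 1
    return None

# Losses copied from the training log above.
logged_train = [0.4245, 0.1949, 0.1467, 0.1206, 0.0982]
logged_val = [0.2337, 0.1627, 0.1348, 0.1107, 0.1059]
print(first_overfitting_epoch(logged_train, logged_val))  # None

# Made-up losses from an over-trained model: validation loss turns up at epoch 3.
print(first_overfitting_epoch([0.5, 0.3, 0.2, 0.1], [0.4, 0.3, 0.35, 0.4]))  # 3
```

A single uptick can be noise, so in practice you would look for a sustained rise. Keras's `tf.keras.callbacks.EarlyStopping` callback has a `patience` parameter for exactly this.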

You can, in Short:
- Play around with the width of the model (layer size) in any way you want.
- Play around with the depth of the model (number of layers) in any way you want.
- Try out different activation functions for each hidden layer.
- Tweak the batch size, anywhere from 1 to 10000.
- Adjust the learning rate, trying values as low as 0.0001. This requires passing an optimizer object such as tf.keras.optimizers.Adam(learning_rate = 0.0001) to compile instead of the "adam" string.
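One way to run these experiments without rewriting the model cell each time is a small builder function. This is a sketch under stated assumptions. `build_model` is a hypothetical helper, and it uses one activation for every hidden layer. It sets the learning rate by passing a `tf.keras.optimizers.Adam` instance, whose default learning rate is 0.001, instead of the `"adam"` string. It also declares the input shape with `tf.keras.Input` rather than `input_shape`.

```python
import tensorflow as tf

def build_model(hidden_sizes=(50, 50), activation="relu", learning_rate=0.001):
    """Hypothetical helper: an MNIST classifier with configurable width, depth,
    activation and Adam learning rate."""
    layers = [tf.keras.Input(shape=(28, 28, 1)), tf.keras.layers.Flatten()]
    for size in hidden_sizes:                  # one Dense layer per entry = depth
        layers.append(tf.keras.layers.Dense(size, activation=activation))
    layers.append(tf.keras.layers.Dense(10, activation="softmax"))
    model = tf.keras.Sequential(layers)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

baseline = build_model()                                  # same layout as the model above
wider_slower = build_model(hidden_sizes=(100,), learning_rate=0.0001)
print(baseline.count_params(), wider_slower.count_params())
```

For example, `build_model(hidden_sizes=(200, 200, 200), activation="tanh", learning_rate=0.0001)` could then be trained with the same `fit` call as above.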

* **Testing the Model.**

In [42]:
#We'll Test our Model with the Test Data with the Evaluate Method, which will return the Loss and Metrics for the Model in "Test Mode".
TestLoss, TestAccuracy = Model.evaluate(BatchedTest)



In [43]:
#The test loss and accuracy estimate how the model will perform on data it has never seen, i.e. in real-world scenarios.
print(f"Test Loss: {TestLoss} \nTestAccuracy: {TestAccuracy*100}%")

Test Loss: 0.12049306929111481 
TestAccuracy: 96.46000266075134%
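Here is a rough illustration of what the accuracy metric measures. The model outputs 10 probabilities per image, the predicted digit is the index of the largest one, and accuracy is the fraction of predictions that match the labels. The 96.46% above therefore corresponds to about 9,646 of the 10,000 test images. The NumPy sketch below uses made-up probabilities over three classes, and the `accuracy` helper is hypothetical.

```python
import numpy as np

def accuracy(probs, labels):
    """Fraction of samples whose highest-probability class equals the true label."""
    return float(np.mean(np.argmax(probs, axis=1) == labels))

probs = np.array([
    [0.1, 0.7, 0.2],   # predicts class 1
    [0.6, 0.3, 0.1],   # predicts class 0
    [0.2, 0.2, 0.6],   # predicts class 2
    [0.5, 0.4, 0.1],   # predicts class 0
])
labels = np.array([1, 0, 1, 0])   # the third prediction is wrong

print(accuracy(probs, labels))
print(round(0.9646 * 10_000))     # correct test predictions implied by 96.46%
```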
