# Training a machine learning model on a prebuilt dataset with the [Keras](https://) API

## Setting up TensorFlow


```python
import tensorflow as tf
print("TensorFlow version:", tf.__version__)
```

```
TensorFlow version: 2.18.0
```


## Loading the Dataset

Here I am using the MNIST dataset. The pixel values of the images range from 0 through 255. I will scale these values to a range of 0 to 1 by dividing them by `255.0`. This also converts the data from integers to floating-point numbers.

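```python
mnist = tf.keras.datasets.mnist

# Load the 60,000 training and 10,000 test images with their integer labels.
(X_train, y_train), (x_test, y_test) = mnist.load_data()

# Scale pixel values from [0, 255] to [0.0, 1.0] (this also yields floats).
X_train, x_test = X_train / 255.0, x_test / 255.0
```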
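To see what the division does, here is a minimal pure-Python sketch on a hypothetical 2x3 patch of pixel values (the numbers are made up for illustration, not taken from MNIST). Dividing by the float `255.0` maps 0 to 0.0 and 255 to 1.0, and every integer becomes a float.

```python
# Hypothetical 2x3 patch of 8-bit grayscale pixel values (0 = black, 255 = white).
patch = [[0, 51, 255],
         [102, 204, 153]]

# Dividing by the float 255.0 rescales each value into [0, 1]
# and produces Python floats instead of ints.
scaled = [[value / 255.0 for value in row] for row in patch]

print(scaled)  # [[0.0, 0.2, 1.0], [0.4, 0.8, 0.6]]
```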

## Building a machine learning model

I will build the model with [tf.keras.Sequential](https://), which stacks [Flatten](https://), [Dense](https://), and [Dropout](https://) layers in order. Layers are reusable functions with a known mathematical structure, and some of them (such as `Dense`) have trainable variables.

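```python
model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(28, 28)),              # each example is a 28x28 image
    tf.keras.layers.Flatten(),                   # 28x28 -> 784-element vector
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),                # zeroes 20% of inputs, during training only
    # No softmax here: with a softmax output it is not possible to compute an exact,
    # numerically stable loss for all models, so this layer returns raw logits.
    tf.keras.layers.Dense(10)
])
```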
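The three layer types can be sketched in plain Python to show what each one computes. This is a minimal illustration with tiny, made-up sizes and weights (a 2x2 input, 3 hidden units, 2 outputs), not how Keras implements its layers: `Flatten` reshapes an image into a vector, `Dense` computes `activation(W·x + b)`, and `Dropout` zeroes each input with probability `rate` during training only, scaling the surviving values by `1 / (1 - rate)`.

```python
import random


def flatten(image):
    """Flatten a 2-D list (rows of pixels) into a 1-D list, as Flatten does for 28x28 -> 784."""
    return [value for row in image for value in row]


def dense(inputs, weights, biases, activation=None):
    """Fully connected layer: output_j = activation(sum_i weights[j][i] * inputs[i] + biases[j])."""
    outputs = [sum(w * x for w, x in zip(row, inputs)) + b
               for row, b in zip(weights, biases)]
    if activation == "relu":
        outputs = [max(0.0, o) for o in outputs]
    return outputs


def dropout(inputs, rate, training, rng):
    """Inverted dropout: in training, drop each value with probability `rate` and
    scale the kept values by 1 / (1 - rate); at inference, pass inputs through."""
    if not training:
        return list(inputs)
    keep_prob = 1.0 - rate
    return [x / keep_prob if rng.random() < keep_prob else 0.0 for x in inputs]


# Made-up 2x2 "image" and weights, chosen only for illustration.
image = [[0.0, 0.5],
         [1.0, 0.25]]
hidden_w = [[1.0, -1.0, 0.5, 0.0],
            [0.0, 2.0, -1.0, 1.0],
            [-1.0, 0.0, 0.0, 4.0]]
hidden_b = [0.1, 0.0, -0.5]
out_w = [[1.0, 0.5, -1.0],
         [-0.5, 1.0, 2.0]]
out_b = [0.0, 0.0]

x = flatten(image)                                   # 4 values
h = dense(x, hidden_w, hidden_b, activation="relu")  # 3 hidden units
h = dropout(h, rate=0.2, training=False, rng=random.Random(0))  # identity at inference
logits = dense(h, out_w, out_b)                      # raw scores, no softmax
print(logits)
```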



For each example, the model returns a vector of [logits](https://) or [log-odds](https://) scores, one for each class.

```python
# Logits for the first training image; shape (1, 10).
predictions = model(X_train[:1]).numpy()
```

The [tf.nn.softmax](https://) function converts these logits to *probabilities* for each class.

```python
tf.nn.softmax(predictions).numpy()
```

```
array([[0.09641203, 0.09046596, 0.14854874, 0.12733117, 0.12578098,
        0.1025115 , 0.09339912, 0.06263559, 0.08707506, 0.06583982]],
      dtype=float32)
```
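Conceptually, softmax exponentiates each logit and normalizes so the results sum to 1. The minimal pure-Python sketch below uses made-up logits for three classes. It subtracts the largest logit before exponentiating, a standard trick that leaves the result unchanged but avoids overflow; the same property means that adding one constant to every logit does not change the probabilities.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    m = max(logits)                      # shift by the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]                 # made-up scores for 3 classes
probs = softmax(logits)
print(probs)                             # largest logit -> largest probability
print(softmax([z + 1000.0 for z in logits]))  # same probabilities, no overflow
```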

## Defining a loss function

Here I use [losses.SparseCategoricalCrossentropy](https://), which expects integer class labels such as MNIST's digits 0 through 9.

The loss function takes a vector of ground-truth values and a vector of logits and returns a scalar loss for each example. This loss equals the negative log probability of the true class, so it is zero when the model is certain of the correct class.

This untrained model gives probabilities close to random (1/10 for each class), so the initial loss should be close to `-tf.math.log(1/10) ~= 2.3`.
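The loss described above can be written out directly. This is a minimal pure-Python sketch of sparse categorical cross-entropy on raw logits (the `from_logits=True` case), using the log-sum-exp form `logsumexp(z) - z[label]`, which equals `-log(softmax(z)[label])`. The logits are made up: with all ten logits equal the loss is `log(10) ≈ 2.3`, and with a very confident correct prediction it approaches zero.

```python
import math


def sparse_categorical_crossentropy(label, logits):
    """Negative log-probability of the true class, computed from raw logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


uniform = [0.0] * 10                     # untrained-like: every class equally likely
print(sparse_categorical_crossentropy(3, uniform))    # ~= 2.3026, i.e. -log(1/10)

confident = [0.0] * 10
confident[3] = 20.0                      # very confident in the correct class 3
print(sparse_categorical_crossentropy(3, confident))  # close to 0
```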

```python
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
```

```python
loss_fn(y_train[:1], predictions).numpy()
```

```
np.float32(2.2777803)
```

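## Compiling the model

Before training, I configure the model with `Model.compile`: the Adam optimizer, the `loss_fn` defined above, and accuracy as the metric to report.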

```python
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])
```
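When `compile` receives the string `'accuracy'`, Keras picks a concrete accuracy metric based on the shapes of the labels and the model output; with integer labels and ten logits per example, as here, that should be sparse categorical accuracy. The minimal pure-Python sketch below, on made-up logits and labels, shows what that metric computes: the fraction of examples whose highest-scoring class equals the integer label. Taking the argmax of logits gives the same answer as taking it of softmax probabilities, because softmax preserves the ordering of the scores.

```python
def sparse_categorical_accuracy(labels, logits_batch):
    """Fraction of examples whose argmax over the logits equals the integer label."""
    correct = 0
    for label, logits in zip(labels, logits_batch):
        predicted = max(range(len(logits)), key=lambda i: logits[i])  # argmax
        correct += int(predicted == label)
    return correct / len(labels)


# Made-up logits for 4 examples over 3 classes, and their true labels.
logits_batch = [[2.0, 0.1, -1.0],
                [0.3, 0.2, 1.5],
                [0.0, 3.0, 0.5],
                [1.0, 0.9, 0.8]]
labels = [0, 2, 1, 2]
print(sparse_categorical_accuracy(labels, logits_batch))  # 0.75
```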

## Training and Evaluating the Model

I will use the [Model.fit](https://) method to adjust the model parameters and minimize the loss.

```python
model.fit(X_train, y_train, epochs=10)
```

```
Epoch 1/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 9s 4ms/step - accuracy: 0.8574 - loss: 0.4844
Epoch 2/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 11s 4ms/step - accuracy: 0.9561 - loss: 0.1488
Epoch 3/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 10s 5ms/step - accuracy: 0.9660 - loss: 0.1128
Epoch 4/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 8s 4ms/step - accuracy: 0.9744 - loss: 0.0823
Epoch 5/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 10s 4ms/step - accuracy: 0.9776 - loss: 0.0714
Epoch 6/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 11s 4ms/step - accuracy: 0.9797 - loss: 0.0638
Epoch 7/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 8s 4ms/step - accuracy: 0.9825 - loss: 0.0534
Epoch 8/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 10s 4ms/step - accuracy: 0.9838 - loss: 0.0503
Epoch 9/10
1875
```

```
<keras.src.callbacks.history.History at 0x7a8673f5c950>
```
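The `1875/1875` in the progress bar counts batches, not examples. `model.fit` is called here without `batch_size`, so Keras uses its default of 32; with the 60,000 MNIST training images that is 60,000 / 32 = 1875 batches per epoch. The same default applied to the 10,000 test images gives the `313/313` steps reported by `Model.evaluate`, where the last batch is only partly filled. A quick check of that arithmetic:

```python
import math


def steps_per_epoch(num_examples, batch_size=32):
    """Number of batches needed to cover the dataset once; the last batch may be partial."""
    return math.ceil(num_examples / batch_size)


print(steps_per_epoch(60_000))  # 1875 training batches
print(steps_per_epoch(10_000))  # 313 evaluation batches
```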

The [Model.evaluate](https://) method checks the model's performance, usually on a [validation set](https://) or [test set](https://).

```python
model.evaluate(x_test, y_test, verbose=2)
```

```
313/313 - 1s - 2ms/step - accuracy: 0.9795 - loss: 0.0680
```

```
[0.06801829487085342, 0.9794999957084656]
```

The image classifier now reaches about 98% accuracy on the MNIST test set.

To have the model return probabilities instead of logits, I wrap the trained model and attach a softmax layer to it:

```python
probability_model = tf.keras.Sequential([
    model,
    tf.keras.layers.Softmax()
])
```


```python
probability_model(x_test[:5])
```

```
<tf.Tensor: shape=(5, 10), dtype=float32, numpy=
array([[2.39939091e-08, 3.53864771e-10, 6.87150646e-07, 4.17984047e-05,
        1.68268330e-14, 8.70001937e-09, 2.28211137e-16, 9.99955773e-01,
        2.44033878e-08, 1.65862446e-06],
       [2.60864610e-08, 3.88052331e-06, 9.99996066e-01, 1.04636086e-08,
        1.00581740e-18, 5.56206636e-10, 7.53880680e-10, 3.86680202e-16,
        2.65631361e-09, 5.33507672e-18],
       [5.10079268e-08, 9.99433458e-01, 4.51573942e-06, 8.08093887e-07,
        6.34486241e-06, 9.73586111e-07, 6.85158229e-05, 3.84592859e-04,
        1.00558471e-04, 1.04725366e-07],
       [9.99989986e-01, 6.03072314e-11, 9.65392246e-07, 3.72265135e-10,
        6.59828117e-07, 7.13783891e-07, 7.51251537e-06, 5.07167819e-08,
        7.02665648e-10, 1.62070677e-07],
       [1.09717774e-07, 6.31773822e-14, 6.74385303e-09, 6.14131690e-10,
        9.98772323e-01, 1.07499731e-09, 2.64892464e-07, 5.70834300e-06,
        7.42102269e-10, 1.22158695e-03]], dtype=float32)>
```
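Each row of this output is a probability distribution over the ten digit classes, so the predicted digit is the index of the row's largest entry. As a minimal pure-Python sketch, the five rows printed above are copied verbatim and reduced with an argmax. In TensorFlow the equivalent step would be `tf.argmax(probability_model(x_test[:5]), axis=1)`, whose result you can compare against `y_test[:5]`.

```python
# The five probability rows printed above, copied verbatim (one row per test image).
probabilities = [
    [2.39939091e-08, 3.53864771e-10, 6.87150646e-07, 4.17984047e-05, 1.68268330e-14,
     8.70001937e-09, 2.28211137e-16, 9.99955773e-01, 2.44033878e-08, 1.65862446e-06],
    [2.60864610e-08, 3.88052331e-06, 9.99996066e-01, 1.04636086e-08, 1.00581740e-18,
     5.56206636e-10, 7.53880680e-10, 3.86680202e-16, 2.65631361e-09, 5.33507672e-18],
    [5.10079268e-08, 9.99433458e-01, 4.51573942e-06, 8.08093887e-07, 6.34486241e-06,
     9.73586111e-07, 6.85158229e-05, 3.84592859e-04, 1.00558471e-04, 1.04725366e-07],
    [9.99989986e-01, 6.03072314e-11, 9.65392246e-07, 3.72265135e-10, 6.59828117e-07,
     7.13783891e-07, 7.51251537e-06, 5.07167819e-08, 7.02665648e-10, 1.62070677e-07],
    [1.09717774e-07, 6.31773822e-14, 6.74385303e-09, 6.14131690e-10, 9.98772323e-01,
     1.07499731e-09, 2.64892464e-07, 5.70834300e-06, 7.42102269e-10, 1.22158695e-03],
]


def predicted_class(row):
    """Return the index of the most probable class (an argmax over the row)."""
    return max(range(len(row)), key=row.__getitem__)


predicted_digits = [predicted_class(row) for row in probabilities]
print(predicted_digits)  # [7, 2, 1, 0, 4]
```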