# TensorFlow 2 quickstart for beginners

This introductory notebook uses `Keras` to:
1. Load a prebuilt dataset.
2. Build a neural network machine learning model that classifies images.
3. Train this neural network.
4. Evaluate the accuracy of the model.

### Set up TensorFlow

Import TensorFlow into your program to get started:

In [1]:
import tensorflow as tf
print('TensorFlow version:', tf.__version__) # tf.__version__ is a module attribute holding the version of the library installed in this environment

TensorFlow version: 2.18.0


### Load a dataset

Load and prepare the MNIST dataset. Each pixel value in the images is an integer from 0 to 255. Dividing by `255.0` scales the values to the range 0 to 1 and converts them from integers to floats. Small floating-point inputs like these generally make training more stable in the later neural network steps.

In [2]:
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train/255.0, x_test/255.0

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 4s 0us/step


### Build a machine learning model

Build a `tf.keras.Sequential` model:

* `Sequential`: A linear stack of neural network layers. Imagine a stack of pancakes that you work through from top to bottom: the data passes through each layer in order, hence the term 'linear stack'.

In [3]:
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28,28)), # Converts the 2D image input (28x28 in this case) into a 1D array of 784 values. Typical first layer for image processing
    tf.keras.layers.Dense(128, activation='relu'), # Fully connected layer with 128 neurons, using the ReLU activation function
    tf.keras.layers.Dropout(0.2), # Regularization to reduce overfitting: during training, randomly sets a fraction `rate` (here 20%) of the input units to 0
    tf.keras.layers.Dense(10) # Another fully connected layer with 10 neurons, one per class (the digits 0-9)
])



A `Sequential` model is useful for stacking layers where each layer accepts one input and produces one output. Layers are functions with a known mathematical structure that can be reused and have trainable variables. Most TensorFlow models are composed of layers.
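As a rough check on the "trainable variables" mentioned above, the sketch below counts the parameters of the two `Dense` layers by hand. The helper name `dense_params` is made up for this illustration. It assumes each dense layer holds one weight per input-output pair plus one bias per output, and that `Flatten` and `Dropout` hold no trainable variables.

```python
def dense_params(n_inputs, n_outputs):
    """Hypothetical helper: weights (inputs x outputs) plus one bias per output."""
    return n_inputs * n_outputs + n_outputs

# Flatten turns each 28x28 image into a vector of 784 values.
flattened = 28 * 28

# Dense(128) receives the 784 flattened values.
hidden = dense_params(flattened, 128)

# Dense(10) receives the 128 outputs of the hidden layer.
output = dense_params(128, 10)

total = hidden + output
print("flattened inputs:", flattened)
print("hidden layer parameters:", hidden)
print("output layer parameters:", output)
print("total trainable parameters:", total)
```

If those assumptions hold, calling `model.summary()` on the model above should report the same total.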

In [4]:
predictions = model(x_train[:1]).numpy() # run the untrained model on one training image; returns one row of raw scores as a small demo
predictions

array([[-0.02802001,  0.01520202,  0.170098  ,  0.17979825,  0.1942593 ,
         0.08270757, -0.54096663,  0.0057817 , -0.13115457,  0.05920713]],
      dtype=float32)

These look like random numbers, but the model returns a vector of `logits` or `log-odds` scores, one for each class.

So what are `logits` or `log-odds`?

* `logits`: The output of the final dense layer of the network. These values are not probabilities, but raw scores that can be positive or negative.
* `log-odds`: Short for logarithmic odds. You take the odds of something happening compared to it not happening, p / (1 - p), and then take the logarithm. Where probabilities range from 0 to 1, log-odds range from -inf to +inf.
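To make the log-odds definition concrete, here is a small sketch in plain Python (no TensorFlow needed). The function names `log_odds` and `probability` are my own, picked for this example. The second function is the inverse mapping, often called the sigmoid.

```python
import math

def log_odds(p):
    """Log of the odds p / (1 - p), for a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def probability(z):
    """Inverse of log_odds: maps any real number back into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

for p in (0.1, 0.5, 0.9):
    z = log_odds(p)
    # Probabilities below 0.5 give negative log-odds, above 0.5 positive.
    print(p, round(z, 4), round(probability(z), 4))
```

A probability of 0.5 maps to a log-odds of 0, and the further a probability moves toward 0 or 1, the further its log-odds move toward -inf or +inf.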

The `tf.nn.softmax` function converts these logits to probabilities for each class:

In [5]:
tf.nn.softmax(predictions).numpy()

array([[0.09536039, 0.09957243, 0.11625446, 0.11738764, 0.11909752,
        0.10652619, 0.05709501, 0.09863883, 0.08601561, 0.10405196]],
      dtype=float32)
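To see what softmax does under the hood, here is a plain-Python sketch applied to the logits printed earlier. This is the textbook formula (exponentiate, then divide by the sum), not TensorFlow's actual implementation, and the function name `softmax` is just a local helper.

```python
import math

def softmax(logits):
    """Textbook softmax: exp of each score divided by the sum of all exps."""
    shift = max(logits)  # subtracting the max avoids overflow, result is unchanged
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# The ten logits returned by the untrained model above.
logits = [-0.02802001, 0.01520202, 0.170098, 0.17979825, 0.1942593,
          0.08270757, -0.54096663, 0.0057817, -0.13115457, 0.05920713]

probs = softmax(logits)
print([round(p, 4) for p in probs])
print("sum of probabilities:", sum(probs))
```

The values should match the `tf.nn.softmax` output above up to float32 rounding, and they add up to 1.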

Define a loss function for training using `losses.SparseCategoricalCrossentropy`:

In [8]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# SparseCategoricalCrossentropy: a loss function commonly used for multi-class classification problems where the target labels are integers
# from_logits=True: tells the loss that the model outputs are logits, i.e. raw unnormalized scores rather than probabilities

This function takes a vector of ground-truth values and a vector of logits, and returns a scalar loss. This loss is equal to the negative log probability of the true class: the loss is zero if the model is sure of the correct class.

This untrained model gives probabilities close to random (about 1 out of 10 for each class), so the initial loss should be close to `-tf.math.log(1/10)` ~= 2.3.

In [7]:
loss_fn(y_train[:1], predictions).numpy()

np.float32(2.2393644)
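The loss value can be reproduced by hand. The sketch below computes the negative log probability of the true class directly from the logits printed earlier. Two assumptions: the function `sparse_crossentropy_from_logits` is my own helper, not the Keras implementation, and the label `5` for the first training image is assumed here (it is consistent with the loss printed above).

```python
import math

def sparse_crossentropy_from_logits(label, logits):
    """Negative log softmax probability of the class at index `label`."""
    shift = max(logits)  # for numerical stability
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return log_sum_exp - logits[label]

# A model that scores every class equally gives each class probability 1/10.
uniform_loss = sparse_crossentropy_from_logits(0, [0.0] * 10)
print("uniform guess loss:", uniform_loss)

# The logits of the untrained model from earlier in the notebook.
logits = [-0.02802001, 0.01520202, 0.170098, 0.17979825, 0.1942593,
          0.08270757, -0.54096663, 0.0057817, -0.13115457, 0.05920713]
assumed_label = 5  # assumption: the true digit of the first training image
loss = sparse_crossentropy_from_logits(assumed_label, logits)
print("loss for the first example:", loss)
```

The first number is `log(10)`, the ~2.3 baseline mentioned above, and the second should land very close to the value Keras returned.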

Before training, configure and compile the model using Keras `Model.compile`. Set the `optimizer` class to `adam`, set the `loss` to the `loss_fn` function you defined earlier, and specify a metric to be evaluated for the model by setting the `metrics` parameter to `accuracy`.

In [13]:
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])

So why were these parameters chosen? I asked ChatGPT for its thoughts on the matter:

`optimizer='adam'`: Adam (Adaptive Moment Estimation) combines ideas from two optimization algorithms: **Momentum** and **RMSProp**. It performs well on a wide range of problems and is computationally efficient. Its key features are adaptive per-parameter learning rates, which can speed up convergence; faster movement in directions with consistent gradients; and a correction for bias in the early stages of training.
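To get a feel for how those pieces fit together, here is a minimal sketch of the textbook Adam update for a single number, minimizing the toy function f(x) = (x - 3)^2. It is not the Keras implementation. The function `adam_step`, the toy problem, and the learning rate `0.1` are all choices made for this illustration.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-7):
    """One textbook Adam update for a scalar parameter (illustrative only)."""
    m = beta1 * m + (1 - beta1) * grad         # momentum: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # RMSProp part: running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction for the early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

def grad_f(x):
    """Gradient of the toy loss f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)

x, m, v = 0.0, 0.0, 0.0
history = []
for t in range(1, 501):  # t starts at 1 so the bias correction is defined
    x, m, v = adam_step(x, grad_f(x), m, v, t)
    history.append(x)

print("after 1 step:", history[0])
print("after 500 steps:", history[-1])
```

Note how the very first step has a size of roughly `lr` regardless of how large the gradient is: the update divides the gradient estimate by its own magnitude, which is where the adaptive learning rate comes from.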

`loss=loss_fn`: Uses our loss function from before, `SparseCategoricalCrossentropy(from_logits=True)`, which is suitable for multi-class classification problems. It fits our dataset because the labels are integers, and this loss handles integer labels directly, with no one-hot encoding needed. `from_logits=True` tells the loss that the model output is raw scores rather than probabilities.

`metrics=['accuracy']`: A commonly used metric that measures the model's performance on this classification problem: the number of correct predictions divided by the total number of predictions.
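Accuracy is simple enough to compute by hand. The sketch below does it for a made-up batch of four examples with three classes. The helpers `argmax` and `accuracy` and the toy numbers are invented for this illustration, and it assumes the predicted class is the one with the highest logit.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=values.__getitem__)

def accuracy(labels, logits_batch):
    """Fraction of rows whose highest-scoring class equals the integer label."""
    correct = sum(1 for y, row in zip(labels, logits_batch) if argmax(row) == y)
    return correct / len(labels)

# Toy data: 4 examples, 3 classes (not taken from MNIST).
toy_logits = [
    [2.0, 0.1, -1.0],  # predicted class 0
    [0.3, 0.2, 1.5],   # predicted class 2
    [0.0, 3.0, 0.5],   # predicted class 1
    [1.2, 0.9, 0.1],   # predicted class 0
]
toy_labels = [0, 2, 1, 1]  # the last example is misclassified

toy_accuracy = accuracy(toy_labels, toy_logits)
print("accuracy:", toy_accuracy)
```

Three of the four predictions match their labels, so the accuracy is 0.75.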

### Train and evaluate your model

Use the `model.fit` method to adjust your model parameters and minimize the loss:

In [14]:
model.fit(x_train, y_train, epochs=5) # epochs: the number of full passes over the training data

Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 13s 5ms/step - accuracy: 0.8602 - loss: 0.4844
Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 10s 5ms/step - accuracy: 0.9554 - loss: 0.1528
Epoch 3/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 10s 5ms/step - accuracy: 0.9675 - loss: 0.1066
Epoch 4/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 10s 5ms/step - accuracy: 0.9732 - loss: 0.0882
Epoch 5/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 10s 5ms/step - accuracy: 0.9788 - loss: 0.0729


<keras.src.callbacks.history.History at 0x272dd4a2190>

The `Model.evaluate` method checks the model's performance, usually on a `validation set` or `test set`.

In [15]:
model.evaluate(x_test, y_test, verbose=2)

313/313 - 1s - 5ms/step - accuracy: 0.9763 - loss: 0.0761


[0.07607124000787735, 0.9763000011444092]

The image classifier is now trained to ~98% accuracy on this dataset. If you want your model to return a probability, you can wrap the trained model and attach the softmax to it:

In [16]:
probability_model = tf.keras.Sequential([ # stacks the trained model and a softmax layer into a single pipeline
    model, # outputs the raw logits for the input data
    tf.keras.layers.Softmax() # converts logits to probabilities: every value is between 0 and 1
])

In [17]:
probability_model(x_test[:5])

<tf.Tensor: shape=(5, 10), dtype=float32, numpy=
array([[1.95702705e-06, 1.04122485e-07, 2.57920528e-05, 4.18161129e-04,
        4.18612783e-11, 4.89322531e-07, 3.83468973e-13, 9.99530792e-01,
        1.36535630e-06, 2.14940337e-05],
       [1.66741502e-05, 1.48170147e-04, 9.99769151e-01, 1.56249753e-05,
        4.03221068e-17, 2.28980953e-05, 2.48716970e-05, 1.34645713e-13,
        2.65755421e-06, 3.98775061e-13],
       [1.97221766e-06, 9.98739660e-01, 1.52511959e-04, 3.75516424e-06,
        6.16409743e-05, 2.21513183e-05, 1.77154026e-04, 5.48244861e-04,
        2.92281009e-04, 7.41802523e-07],
       [9.99806464e-01, 4.02196143e-09, 1.64343310e-05, 3.79205067e-09,
        1.00194723e-06, 4.74255984e-07, 1.75137451e-04, 1.40318139e-08,
        3.21859073e-09, 5.92901984e-07],
       [2.35124462e-05, 6.74561761e-07, 1.25323970e-06, 1.17560596e-07,
        9.97681618e-01, 3.85828230e-07, 5.37731376e-06, 1.37466312e-04,
        1.04605874e-06, 2.14859657e-03]], dtype=float32)>

Each row holds the probabilities for one test image, with one value per class. In every row one value is close to 1 and the rest are tiny, which shows the model is confident that those other classes are not the right label for the image.
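To turn a row of probabilities into a predicted digit, pick the index of the largest value. The sketch below does that in plain Python for the first two rows of the output above. The helper `argmax` is a local function written for this example.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=values.__getitem__)

# First two rows of the probability_model output shown above.
rows = [
    [1.95702705e-06, 1.04122485e-07, 2.57920528e-05, 4.18161129e-04,
     4.18612783e-11, 4.89322531e-07, 3.83468973e-13, 9.99530792e-01,
     1.36535630e-06, 2.14940337e-05],
    [1.66741502e-05, 1.48170147e-04, 9.99769151e-01, 1.56249753e-05,
     4.03221068e-17, 2.28980953e-05, 2.48716970e-05, 1.34645713e-13,
     2.65755421e-06, 3.98775061e-13],
]

predicted_labels = [argmax(row) for row in rows]
print("predicted digits:", predicted_labels)

# Each row is a probability distribution, so it should sum to about 1.
print("row sums:", [sum(row) for row in rows])
```

With the real tensor, something like `tf.argmax(probability_model(x_test[:5]), axis=1)` should do the same for all five rows at once.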

## Conclusion

Congrats! I have trained a machine learning model on a prebuilt dataset using the `Keras` API.