# TensorFlow tutorial - Introduction to TensorFlow 2 with MNIST
---

## Intro

This is a guided exercise for learning how to work with TensorFlow, using MNIST, one of the most common datasets for getting started with machine learning.

The MNIST dataset contains labeled images of handwritten digits. The goal is to classify each image as the digit (0-9) it represents, using a machine learning model. Here we use the high-level Keras API included with TensorFlow.

From the dataset's documentation:

> This is a dataset of 60,000 28x28 grayscale images of the 10 digits, along with a test set of 10,000 images. More info can be found at the MNIST homepage.

Let's begin.

---

We start by importing the TensorFlow library:

In [1]:
import tensorflow as tf
try:
    import tensorrt  # Hardware requirement on the author's GPU setup; not needed to run this tutorial
except ImportError:
    pass
print("Tensorflow version: ", tf.__version__)  # Version check

2023-04-29 07:56:20.207833: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


Tensorflow version:  2.12.0


In [2]:
# Enable memory growth so TensorFlow allocates GPU memory as needed instead of reserving all of it at startup

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  try:
    # Currently, memory growth needs to be the same across GPUs
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Memory growth must be set before GPUs have been initialized
    print(e)

1 Physical GPUs, 1 Logical GPUs


2023-04-29 07:56:22.512944: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355

Next, we load the MNIST dataset with a built-in function. TensorFlow already ships with a loader for it (`tf.keras.datasets.mnist`), so we only need to call it; the data files are downloaded and cached the first time it runs.

In [3]:
mnist = tf.keras.datasets.mnist  # The built-in MNIST dataset module

# load_data() returns two tuples: (training images, training labels) and (test images, test labels)
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # Scale pixel values from 0-255 to 0.0-1.0
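Dividing by 255.0 maps the 8-bit pixel range onto [0, 1] and turns the integer arrays into floating-point arrays. A minimal NumPy sketch of the same step, on a made-up 2x2 "image" (the values are illustrative, not real MNIST data):

```python
import numpy as np

# A tiny made-up grayscale "image" stored as 8-bit integers, like MNIST pixels
image = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Dividing by 255.0 promotes the array to float64 and rescales it to [0.0, 1.0]
scaled = image / 255.0

print(scaled.dtype)  # float64
print(scaled.min(), scaled.max())  # 0.0 1.0
```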

## Building the model

After importing the dataset and the necessary dependencies, we can begin building our machine learning model. Keeping the underlying theory of neural networks in mind will make it easier to follow what this code is doing.

---

In [4]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])

From the TensorFlow website:

> Sequential is useful for stacking layers where each layer has one input tensor and one output tensor. Layers are functions with a known mathematical structure that can be reused and have trainable variables. Most TensorFlow models are composed of layers. This model uses the Flatten, Dense, and Dropout layers.

> For each example, the model returns a vector of logits or log-odds scores, one for each class.

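How each of these layers works:

- `Flatten(input_shape=(28, 28))` reshapes each 28x28 image into a one-dimensional vector of 784 pixel values. It has no trainable weights.
- `Dense(128, activation='relu')` is a fully connected layer with 128 units. Each unit computes a weighted sum of all 784 inputs plus a bias, then applies the ReLU activation, `max(0, x)`.
- `Dropout(0.2)` randomly sets 20% of its inputs to zero at each training step (scaling the remaining ones up by 1/0.8) to reduce overfitting. During inference it passes its inputs through unchanged.
- `Dense(10)` is the output layer, with one unit per digit. It has no activation, so it outputs the raw logits described above.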
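To make the layer stack concrete, here is a rough NumPy sketch of the same forward pass with randomly initialized weights. It is not how Keras implements these layers internally; the weight initialization, the seed, and the helper names (`dense`, `dropout`, `forward`) are illustrative assumptions. The parameter count it prints is what the layer shapes imply (784x128 + 128 for the hidden layer, 128x10 + 10 for the output layer).

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b, relu=False):
    # Fully connected layer: weighted sum of all inputs plus a bias
    z = x @ w + b
    return np.maximum(z, 0.0) if relu else z

def dropout(x, rate, training):
    # Inverted dropout: zero out a fraction `rate` of values during training
    # and scale the survivors by 1 / (1 - rate); identity at inference time
    if not training:
        return x
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

# Hypothetical random weights with the same shapes as the Keras model
w1, b1 = rng.normal(0.0, 0.05, (784, 128)), np.zeros(128)
w2, b2 = rng.normal(0.0, 0.05, (128, 10)), np.zeros(10)

def forward(images, training=False):
    x = images.reshape(len(images), -1)  # Flatten: (batch, 28, 28) -> (batch, 784)
    x = dense(x, w1, b1, relu=True)      # Dense(128, activation='relu')
    x = dropout(x, 0.2, training)        # Dropout(0.2)
    return dense(x, w2, b2)              # Dense(10): raw logits, one per digit

batch = rng.random((1, 28, 28))  # One made-up image with values in [0, 1)
logits = forward(batch)
print(logits.shape)  # (1, 10)
print(w1.size + b1.size + w2.size + b2.size)  # 101770
```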

In [5]:
predictions = model(x_train[:1]).numpy()
predictions

2023-04-29 07:56:24.280042: I tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:637] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.


array([[-0.5832413 , -0.24974282,  0.42353064,  0.47082266,  0.28912127,
         0.29625663,  0.9161919 ,  0.20477396,  0.07922244, -0.604723  ]],
      dtype=float32)

The `tf.nn.softmax` function converts these logits to probabilities for each class:

In [6]:
tf.nn.softmax(predictions).numpy()

array([[0.04464163, 0.06231269, 0.12217306, 0.12808968, 0.10680761,
        0.10757245, 0.1999565 , 0.09816816, 0.08658534, 0.04369288]],
      dtype=float32)
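The softmax exponentiates each logit and divides by the sum of the exponentials, so the outputs are positive and add up to 1. A minimal NumPy sketch that applies it to the logits printed by In [5]; subtracting the maximum logit first is a common numerical-stability trick and does not change the result:

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise maximum for numerical stability; softmax is shift-invariant
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Logits copied from the output of In [5]
logits = np.array([[-0.5832413, -0.24974282, 0.42353064, 0.47082266, 0.28912127,
                    0.29625663, 0.9161919, 0.20477396, 0.07922244, -0.604723]])

probs = softmax(logits)
print(probs.round(4))
print(probs.sum())
```

The rounded values should match the `tf.nn.softmax` output above up to float32 precision.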

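Now we define a loss function for training, `tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)`. The `from_logits=True` argument tells it that the model outputs raw logits rather than probabilities.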

In [7]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

The loss function takes a vector of ground truth values and a vector of logits and returns a scalar loss for each example. This loss is equal to the negative log probability of the true class: The loss is zero if the model is sure of the correct class.

This untrained model gives probabilities close to random (1/10 for each class), so the initial loss should be close to `-tf.math.log(1/10) ~= 2.3`.
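Worked out, the estimate follows from the softmax: if all ten logits are equal to some value $z$, every class receives the same probability, and the loss is the negative log of that probability:

$$
p_k = \frac{e^{z}}{\sum_{j=1}^{10} e^{z}} = \frac{1}{10}, \qquad
\text{loss} = -\log p_{\text{true}} = -\log\frac{1}{10} = \log 10 \approx 2.3026.
$$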

In [8]:
loss_fn(y_train[:1], predictions).numpy()

2.2295907
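The value above can be reproduced by hand from the logits of In [5]: compute a numerically stable log-softmax and take the negative log probability of the true class. This NumPy sketch assumes that the first training label `y_train[0]` is 5, which is consistent with the printed loss (In [6] assigns class 5 a probability of about 0.1076, and -log(0.1076) is about 2.2296). The helper name `sparse_ce_from_logits` is hypothetical.

```python
import numpy as np

def sparse_ce_from_logits(labels, logits):
    # Stable log-softmax: z - max(z) - log(sum(exp(z - max(z))))
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Negative log probability of each example's true class, averaged over the batch
    # (Keras also averages over the batch by default)
    per_example = -log_probs[np.arange(len(labels)), labels]
    return per_example.mean()

logits = np.array([[-0.5832413, -0.24974282, 0.42353064, 0.47082266, 0.28912127,
                    0.29625663, 0.9161919, 0.20477396, 0.07922244, -0.604723]])
labels = np.array([5])  # Assumed value of y_train[:1]

print(sparse_ce_from_logits(labels, logits))

# Equal logits for every class give exactly log(10), the "random guess" loss
print(sparse_ce_from_logits(labels, np.zeros((1, 10))), np.log(10))
```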

Before training, we must compile the model: we set the optimizer to Adam, set the loss to the function defined above, and specify a metric to track during training and evaluation, in our case accuracy.
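For intuition about what `optimizer='adam'` does at each step, here is a minimal NumPy sketch of the Adam update rule (as described in the original Adam paper) applied to a single made-up parameter minimizing f(w) = (w - 3)^2. The beta and epsilon values mirror the Keras `Adam` defaults, but the learning rate of 0.1 is an illustrative assumption (Keras defaults to 0.001), and Keras' own implementation differs in minor details such as where epsilon enters. Keras applies the same rule element-wise to every weight tensor in the model.

```python
import numpy as np

def adam_minimize(grad_fn, w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-7, steps=200):
    m, v = 0.0, 0.0  # Running averages of the gradient and of the squared gradient
    history = [w]
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # Bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        history.append(w)
    return w, history

# Hypothetical toy problem: f(w) = (w - 3)^2 with gradient 2 * (w - 3)
def f(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)

w_final, history = adam_minimize(grad, w=0.0)
print(history[1])  # The first step moves w by about lr in the downhill direction
print(w_final, f(w_final))
```

Because the step is normalized by the running gradient magnitude, Adam's step size is governed mostly by the learning rate rather than by the raw gradient scale.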

In [9]:
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])

## Training the model and evaluation

Using the `Model.fit()` method, we train the model: it repeatedly adjusts the model's parameters to minimize the loss.
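`Model.fit()` hides the training loop. Conceptually, each step computes the loss, computes its gradient with respect to the weights, and updates the weights to reduce the loss. The sketch below shows that loop in plain NumPy for a much smaller, hypothetical problem: a single linear layer with softmax cross-entropy on synthetic 2-D data with three classes, trained with full-batch gradient descent instead of Adam and mini-batches. The data, learning rate, and step count are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: three Gaussian blobs in 2-D, 100 points per class
centers = np.array([[0.0, 2.0], [2.0, -1.0], [-2.0, -1.0]])
y = np.repeat(np.arange(3), 100)
X = centers[y] + rng.normal(0.0, 0.5, size=(300, 2))

W = np.zeros((2, 3))  # Weights of a single linear layer producing 3 logits
b = np.zeros(3)

def loss_and_grads(X, y, W, b):
    logits = X @ W + b
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    n = len(y)
    loss = -np.log(probs[np.arange(n), y]).mean()  # Sparse categorical cross-entropy
    # Gradient of the mean cross-entropy w.r.t. the logits is (probs - one_hot) / n
    dlogits = probs.copy()
    dlogits[np.arange(n), y] -= 1.0
    dlogits /= n
    return loss, X.T @ dlogits, dlogits.sum(axis=0)

lr = 0.1
losses = []
for _ in range(200):
    loss, dW, db = loss_and_grads(X, y, W, b)
    losses.append(loss)
    W -= lr * dW  # Gradient descent update
    b -= lr * db

accuracy = ((X @ W + b).argmax(axis=1) == y).mean()
print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}, accuracy {accuracy:.2f}")
```

The initial loss equals log(3), the three-class analogue of the log(10) estimate for the untrained MNIST model.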

---

In [10]:
model.fit(x_train, y_train, epochs=5)

Epoch 1/5


2023-04-29 07:56:25.457168: I tensorflow/compiler/xla/service/service.cc:169] XLA service 0x7f0c180171e0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-04-29 07:56:25.457191: I tensorflow/compiler/xla/service/service.cc:177]   StreamExecutor device (0): NVIDIA GeForce RTX 3070, Compute Capability 8.6
2023-04-29 07:56:25.460850: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-04-29 07:56:26.550754: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:424] Loaded cuDNN version 8900
2023-04-29 07:56:26.656754: I ./tensorflow/compiler/jit/device_compiler.h:180] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.


Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f0daa121d90>

The `Model.evaluate` method checks the model's performance, usually on a validation set or test set.

In [11]:
model.evaluate(x_test, y_test, verbose=2)

313/313 - 0s - loss: 0.0778 - accuracy: 0.9754 - 307ms/epoch - 980us/step


[0.07780385762453079, 0.9753999710083008]
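The `313/313` in the evaluation log counts batches. Assuming Keras' default `batch_size` of 32 (neither `fit` nor `evaluate` sets it in this notebook), the 10,000 test images split into 313 batches, the last one partial; the same arithmetic gives the number of training steps per epoch for the 60,000 training images:

```python
import math

batch_size = 32  # Keras default when batch_size is not passed to fit()/evaluate()
test_steps = math.ceil(10_000 / batch_size)
train_steps = math.ceil(60_000 / batch_size)
print(test_steps)   # 313
print(train_steps)  # 1875
print(10_000 - (test_steps - 1) * batch_size)  # 16 images in the last, partial test batch
```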

The image classifier now reaches about 97.5% accuracy on the test set (0.9754 in the output above). To learn more, read the TensorFlow tutorials.

If you want your model to return a probability, you can wrap the trained model, and attach the softmax to it:

In [12]:
probability_model = tf.keras.Sequential([
  model,
  tf.keras.layers.Softmax()
])

probability_model(x_test[:5])

<tf.Tensor: shape=(5, 10), dtype=float32, numpy=
array([[1.38446623e-07, 7.89907606e-09, 7.89567991e-07, 1.44052858e-04,
        1.02526342e-10, 1.67062471e-08, 1.46812007e-11, 9.99852061e-01,
        1.03396282e-07, 2.88068441e-06],
       [1.05617826e-08, 1.68163257e-04, 9.99828577e-01, 1.49606990e-06,
        3.47938964e-15, 3.34662076e-07, 1.15246701e-06, 2.65052804e-14,
        2.31814084e-07, 2.42127417e-13],
       [7.60897194e-08, 9.99592841e-01, 1.33593985e-05, 3.91417552e-06,
        5.92567849e-05, 1.34732454e-05, 4.52999575e-06, 2.89406453e-04,
        2.21155915e-05, 1.04522394e-06],
       [9.98819411e-01, 6.59730715e-10, 3.47340712e-04, 1.56045047e-07,
        6.55504721e-07, 1.60455761e-06, 8.24973453e-04, 1.63362745e-06,
        5.26611416e-07, 3.71514307e-06],
       [1.32425697e-07, 1.99043369e-11, 1.45191098e-06, 3.96152089e-09,
        9.98261869e-01, 1.91603888e-09, 4.83292979e-07, 4.39803080e-06,
        5.87327634e-07, 1.73115591e-03]], dtype=float32)>
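Each row holds the class probabilities for one test image, so the predicted digit is the index of the largest value in that row. The sketch below copies the probabilities printed above into NumPy and applies `argmax`. In the notebook itself, `probability_model(x_test[:5]).numpy().argmax(axis=1)` should give the same result, and comparing it with `y_test[:5]` shows whether each prediction is correct.

```python
import numpy as np

# Probabilities copied from the probability_model(x_test[:5]) output above
probs = np.array([
    [1.38446623e-07, 7.89907606e-09, 7.89567991e-07, 1.44052858e-04, 1.02526342e-10,
     1.67062471e-08, 1.46812007e-11, 9.99852061e-01, 1.03396282e-07, 2.88068441e-06],
    [1.05617826e-08, 1.68163257e-04, 9.99828577e-01, 1.49606990e-06, 3.47938964e-15,
     3.34662076e-07, 1.15246701e-06, 2.65052804e-14, 2.31814084e-07, 2.42127417e-13],
    [7.60897194e-08, 9.99592841e-01, 1.33593985e-05, 3.91417552e-06, 5.92567849e-05,
     1.34732454e-05, 4.52999575e-06, 2.89406453e-04, 2.21155915e-05, 1.04522394e-06],
    [9.98819411e-01, 6.59730715e-10, 3.47340712e-04, 1.56045047e-07, 6.55504721e-07,
     1.60455761e-06, 8.24973453e-04, 1.63362745e-06, 5.26611416e-07, 3.71514307e-06],
    [1.32425697e-07, 1.99043369e-11, 1.45191098e-06, 3.96152089e-09, 9.98261869e-01,
     1.91603888e-09, 4.83292979e-07, 4.39803080e-06, 5.87327634e-07, 1.73115591e-03],
])

predicted = probs.argmax(axis=1)  # Index of the most likely class in each row
confidence = probs.max(axis=1)    # Probability assigned to that class
print(predicted)  # [7 2 1 0 4]
print(confidence.round(4))
```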

## Conclusion

The model has been successfully trained with Keras and evaluated. As a next step, try modifying the code to visualize a specific example.