<span style="font-size:10pt">AI-ML @ ENSPIMA / v1.2 september 2024 / Jean-Luc CHARLES (Jean-Luc.charles@mailo.com) / CC BY-SA 4.0 /</span>

<div style="color:brown;font-family:arial;font-size:26pt;font-weight:bold;text-align:center"> 
Machine learning with Python tensorflow2/keras modules</div><br>
<hr>
<div style="color:blue;font-family:arial;font-size:22pt;font-weight:bold;text-align:center"> 
Training a Dense Neural Network to classify handwritten digits<br><br>
DNN-Part-1 : Build and train a Dense Neural Network</div>
<hr>
Expected duration : 120 minutes

<div class="alert alert-block alert-danger">
<span style="color:brown;font-family:arial;font-size:12pt"> 
It is important to use a <span style="font-weight:bold;">Python Virtual Environment</span> (PVE) for your Python projects: a PVE makes it possible to control for each project the versions of the Python interpreter and the "sensitive" modules (like tensorflow).</span></div>

All the notebooks must be loaded in a `jupyter notebook` or `jupyter lab` launched within the <b><span style="color: rgb(200, 151, 102);" >pyml</span></b> PVE specially created for the session.<br>
Work through them in this order:
- `ML1_MNIST.ipynb`: check that the <b><span style="color: rgb(200, 151, 102);">pyml</span></b> PVE is fully operational, load and use the data from the MNIST database (images and labels).
- `ML2_DNN_part1.ipynb`: build a Dense Neural Network (DNN), train it with data from the MNIST and evaluate its performance.
- `ML2_DNN_part2.ipynb`: reload a previously trained DNN and evaluate its performance with the MNIST test data.

## Targeted learning objectives
Know how to:
- build a dense neural network with the Python modules **tensorflow** and **keras**.
- train a dense network with data from the MNIST bank.
- display the training performance curves.
- save the structure and the weights of the trained network.

## 1 - Verify importing Python modules
The **keras** module, which allows high-level manipulation of **tensorflow** objects, has been integrated into the **tensorflow** (tf) module since version 2. <br>
The **tf.keras** module documentation to consult is here: https://www.tensorflow.org/api_docs/python/tf/keras.

Importing the `tensorflow` module in the cell below may generate some warning messages...<br>
If errors appear, they must be corrected, possibly by recreating your PVE <b><span style="color: rgb(200, 51, 102);">pyml</span></b>:

In [None]:
import os, sys, cv2

# Suppress the (numerous) warning messages from the tensorflow module
# (this variable must be set before importing tensorflow):
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt

# specific modules:
from utils.tools import scan_dir, plot_loss_accuracy

In [None]:
print(f"Python    : {sys.version.split()[0]}")
print(f"tensorflow: {tf.__version__} including keras {keras.__version__}")
print(f"numpy     : {np.__version__}")
print(f"OpenCV    : {cv2.__version__}")

Embedding matplotlib plots in the notebook:

In [None]:
%matplotlib inline

# Dense network structure to build
In this step you will build a **dense network**, with:
- an **input layer** of 784 values in the range [0 ; 1.]<br>
(the pixels of the MNIST 28 $\times$ 28 images put in the form of a vector of 784 normalized `float` numbers),
- a **hidden layer** of 784 neurons using the `relu` activation function,
- an **output layer** with 10 neurons, for the classification of images into 10 classes associated with the digits {0,1,2...9}, using the `softmax` activation function adapted to classification problems.

<p style="text-align:center; font-style:italic; font-size:12px;">
    <img src="img/ReseauChiffres-2_transp.png" alt="archiNetwork.png" style="width:900px;"><br>
    [image credit: JLC]
</p>

Remarks :
- Each neuron of the hidden layer receives 785 inputs: the 784 pixel values $x_i$ of the image plus the bias input.
- $\leadsto$ There are therefore 785 unknowns for each neuron: the 784 weights $w_i$ assigned to each input $x_i$, plus the weight $b$ assigned to input $-1$.
- $\leadsto$ There are therefore 785 $\times$ 784 unknowns for the hidden layer and 785 $\times$ 10 unknowns for the output layer, i.e. a total of 623290 unknowns whose values must be optimized by the network training algorithm.
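The count above can be checked with a small NumPy sketch. It is only an illustration under stated assumptions, not what Keras does internally: the names `W1`, `b1`, `W2`, `b2` and `forward` are hypothetical, the weights are random, and the bias is simply added to the weighted sum.

```python
import numpy as np

rng = np.random.default_rng(0)

nb_pixel, nb_hidden, nb_classe = 784, 784, 10

# Hypothetical random parameters (training would optimize these values).
W1 = rng.normal(scale=0.01, size=(nb_pixel, nb_hidden))
b1 = np.zeros(nb_hidden)
W2 = rng.normal(scale=0.01, size=(nb_hidden, nb_classe))
b2 = np.zeros(nb_classe)

def forward(x):
    """Forward pass for one flattened image: dense + relu, then dense + softmax."""
    h = np.maximum(0.0, x @ W1 + b1)   # relu activation of the hidden layer
    z = h @ W2 + b2                    # raw scores of the output layer
    e = np.exp(z - z.max())            # subtract the max for numerical stability
    return e / e.sum()                 # softmax: 10 values that sum to 1

x = rng.random(nb_pixel)               # fake "image" with values in [0, 1)
proba = forward(x)

# Total number of unknowns: weights and biases of both layers.
nb_params = W1.size + b1.size + W2.size + b2.size
print(proba.shape, nb_params)
```

The total is (784 + 1) x 784 + (784 + 1) x 10, which matches the 623290 unknowns quoted in the remarks.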

<hr>

## Work to do
### 1 - Load MNIST images and define important parameters
### 2 - Pre-process MNIST images and labels
### 3 - Build the dense Neural Network 
### 4 - Save the structure & weights of the initial Network
### 5 - A first "naive" Network training 
### 6 - Train the Network while measuring its performance at each *epoch*
### 7 - Train the Network while measuring its performance at each *epoch* and managing the *over-fit*
### 8 - Save the structure & weights of the trained Network
<hr>

### 1 - Load MNIST images and define important parameters

The work of loading MNIST images has already been seen in the *notebook* `ML1_MNIST.ipynb`:

In [None]:
(im_train, lab_train), (im_test, lab_test) = tf.keras.datasets.mnist.load_data()
print("im_train -> shape:", im_train.shape, ", dtype:", im_train.dtype,)
print("im_test  -> shape:", im_test.shape,  ", dtype:", im_test.dtype,)
print("lab_train-> shape:", lab_train.shape,  ", dtype:", lab_train.dtype)
print("lab_test -> shape:", lab_test.shape,  ", dtype:", lab_test.dtype)

#### Define important parameters

To avoid hard-coding the **number of training and test images**, the **dimension** of the images and the **number of classes** to recognize, these parameters are retrieved from existing object attributes:
- the `shape` attribute of the `im_train` and `im_test` arrays gives the number of training and test images,
- the `size` attribute of the first training (or test) image gives the number of pixels of the images (784),<br>
- the transformation of the `lab_test` array into a Python `set` (a set) gives the set of labels to recognize, whose size is the number of classes.
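As an illustration of these three attributes, here is a minimal sketch on small synthetic arrays rather than on MNIST. All the `*_demo` names are hypothetical stand-ins chosen for this example.

```python
import numpy as np

# Synthetic stand-ins for the MNIST arrays: 5 "training" and 2 "test" images.
im_train_demo = np.zeros((5, 28, 28), dtype=np.uint8)
im_test_demo = np.zeros((2, 28, 28), dtype=np.uint8)
lab_test_demo = np.array([3, 7])

# First dimension of `shape` = number of images in the array.
nb_train_demo = im_train_demo.shape[0]
nb_test_demo = im_test_demo.shape[0]

# `size` of one image = number of pixels (28 * 28).
nb_pixel_demo = im_train_demo[0].size

# A Python set keeps each distinct label once; its length is the number of classes.
nb_classe_demo = len(set(lab_test_demo))

print(nb_train_demo, nb_test_demo, nb_pixel_demo, nb_classe_demo)
```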

Complete the cell below accordingly:

Checking:

In [None]:
print(f"{nb_im_train} training images and {nb_im_test} test images")
print(f"{nb_pixel} pixels in each image")
print(f"{nb_classe} classes to recognize (the digits from 0 to 9)")

## 2 - Pre-process MNIST images and labels

Two treatments must be applied to the data from the MNIST database:
- on the images: transform the matrices of 28$\,\times\,$28 pixels (`uint8` integers) into **normalized** vectors $(V_i)_{i=0..783}$ of 784 real values $V_i$ with $0 \leqslant V_i \leqslant 1$;
- on labels: transform scalar numbers into *one-hot* vectors.

### Transform input matrices into normalized vectors

Define the arrays `x_train` and `x_test` containing the matrices of the arrays `im_train` and `im_test` *flattened* as normalized vectors (values between 0 and 1).<br>
*tips*:
- use the `reshape` method of the *ndarray* class of *numpy* and the `nb_im_train`, `nb_im_test` and `nb_pixel` parameters previously defined,
- normalization can be handled by dividing arrays by their max value.
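The two tips can be tried first on a small synthetic array. This is only a sketch: `im_demo` and `x_demo` are hypothetical names, and the random pixels merely stand in for real images.

```python
import numpy as np

rng = np.random.default_rng(0)

# 3 fake 28x28 images with uint8 pixel values.
im_demo = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)
im_demo[0, 0, 0] = 255   # force the extreme values to be present
im_demo[0, 0, 1] = 0

# Flatten each 28x28 matrix into a vector of 784 values,
# then divide by the max value to map the pixels into [0, 1].
x_demo = im_demo.reshape(3, 28 * 28) / im_demo.max()

print(x_demo.shape, x_demo.dtype, x_demo.min(), x_demo.max())
```

Dividing an integer array with `/` yields floating-point values, so no explicit type conversion is needed in this sketch.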

Check the dimensions of the `x_train` and `x_test` arrays as well as their *min* and *max* values:

### *one-hot* encoding of labels:

Consult the documentation of the `to_categorical` function on the page [tf.keras.utils.to_categorical](https://www.tensorflow.org/api_docs/python/tf/keras/utils/to_categorical) and deduce how to define the `y_train` and `y_test` arrays containing the *one-hot* encoded version of the `lab_train` and `lab_test` arrays:
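To see what a *one-hot* encoding looks like, here is a NumPy-only sketch on a few hypothetical labels. It does not call `to_categorical`; it only shows the layout of the expected result (one row per label, a single 1 at the index of the digit).

```python
import numpy as np

labels_demo = np.array([5, 0, 4, 1, 9])   # hypothetical labels
nb_classe_demo = 10

# Row k of the identity matrix is the one-hot vector of class k:
# indexing the identity with the labels picks one row per label.
one_hot_demo = np.eye(nb_classe_demo, dtype="float32")[labels_demo]

print(one_hot_demo.shape)
print(one_hot_demo[0])   # one-hot vector for the label 5
```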

In [11]:
from tensorflow.keras.utils import to_categorical



Visually check the first 10 values of the `lab_train` and `y_train` arrays:

## 3 - Build the Dense Neural Network

We will now build a dense **sequential** neural network in **5 lines** of Python using the **keras** module.

Build the network incrementally in the cell below, following the proposed approach (look for the `add` method in the page [guide/keras/sequential_model](https://www.tensorflow.org/guide/keras/sequential_model) if necessary) :
- 1/ Create the object `model` instance of the class `Sequential` (cf [tf.keras.Sequential](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential)).
- 2/ With the `add` method of the `model` object add:
    - the input layer `Input(shape=<number of neurons>)` (cf [tf.keras.layers.Input](https://www.tensorflow.org/api_docs/python/tf/keras/Input) )<br>
    Use the `nb_pixel` parameter to specify the value of the `shape` parameter (which must be a `tuple`)...<br>
    - the intermediate dense layer (cf [tf.keras.layers.Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense)): `Dense(<number of neurons>, activation='relu')` (cf [tf.keras.activations.relu](https://www.tensorflow.org/api_docs/python/tf/keras/activations/relu))
    - the output dense layer: `Dense(<number of neurons>, activation='softmax')` (cf [tf.keras.activations.softmax](https://www.tensorflow.org/api_docs/python/tf/keras/activations/softmax)).<br>
Use the `nb_pixel` and `nb_classe` parameters to indicate the number of neurons and the number of classes without hard-coding them...
- 3/ Once built, the network must be compiled (in the sense of tensorflow) with the `compile` method and the arguments:
    - `loss='categorical_crossentropy'`: choice of the error function (cf [tf.keras.losses.categorical_crossentropy](https://www.tensorflow.org/api_docs/python/tf/keras/losses/categorical_crossentropy))
    - `optimizer='adam'`: choice of Adam optimizer (see page [tf.keras.optimizers.Adam](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adam) )
    - `metrics=['accuracy']` to obtain training statistics to draw performance curves.
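To give an idea of what the categorical cross-entropy measures, here is a NumPy sketch for a *one-hot* target and a `softmax`-like output. The function `categorical_crossentropy_demo` and the 3-class values are hypothetical; the Keras implementation may differ in details such as clipping and averaging.

```python
import numpy as np

def categorical_crossentropy_demo(y_true, y_pred, eps=1e-7):
    """Cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)          # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)  # one loss value per sample

y_true = np.array([[0.0, 1.0, 0.0]])     # the true class is class 1
good = np.array([[0.05, 0.90, 0.05]])    # confident and correct prediction
bad = np.array([[0.70, 0.20, 0.10]])     # prediction favouring the wrong class

loss_good = categorical_crossentropy_demo(y_true, good)[0]
loss_bad = categorical_crossentropy_demo(y_true, bad)[0]
print(loss_good, loss_bad)
```

With a one-hot target, only the probability given to the true class contributes: the loss is -log(0.90) for the first prediction and -log(0.20) for the second, so the wrong prediction is penalized much more.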

In [13]:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

# set the seed of the random generators used by tensorflow:
SEED = 1234
tf.random.set_seed(SEED)
tf.config.experimental.enable_op_determinism()

# the 5 lines to build the neural network:
model = Sequential()
...
...
...
...

*Note*: By using the `name` argument in the `Input` and `Dense` constructors, one can give custom names to the layers, which will appear in the `summary` and `plot_model` outputs.

With the `summary` method of the `model` object, display the description of the model and check the dimensions...

Why are there some `None` in the "Output Shape" column?

Response: Because the number of images that will be used for training is not given at the DNN construction stage but later, at the training stage.

Find the total number of parameters with a simple formula...

The `plot_model` function draws the structure of the network (see the page [tf.keras.utils.plot_model](https://www.tensorflow.org/api_docs/python/tf/keras/utils/plot_model)). <br>
Plot the model structure by adding the `show_shapes=True` option to the `plot_model` call:

## 4 - Save the structure & weights of the initial Network

We can save the weights of the initial untrained network (random values) and its structure with the `model.save` method. <br>
This will be useful later to reset the network to its initial state if we want to compare different trainings:

In [None]:
import os

# Check whether the 'models' directory exists (create it if needed):
if not os.path.exists("models"): os.mkdir("models")

# define a unique key:
key = 'dense1_init'

# define the path where to store the network data:
path = os.path.join('models', key)

# save the structure and the weights of the current neural network:
model.save(path)

# display the tree beginning at f'./models/{key}':
tree = scan_dir(f"./models/{key}")
print(f'\nFiles written:\n{tree}')    

Note: the `save` method uses the `key` part of its argument to prefix the created file names.<br>
$\leadsto$ When loading the DNN structure & weights later with the `tf.keras.models.load_model` method, we just use the same key to retrieve the relevant files.

## 5 - A first "naive" Network training 

If necessary, consult the documentation of the `fit` method on the page [tf.keras.Sequential](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential).

Complete the cell below to train the DNN with the `fit` method of the `model` object using the arguments:
- `x_train`: the 60000 flattened and normalized images
- `y_train`: the 60000 *one-hot* encoded labels.
- `epochs=15`: repeat the training 15 times.
- `batch_size=128`: split the input data set (the 60000 images) into "batches" of size `batch_size` (here: batches of 128 images).<br>
Updating network weights is done after each batch of `batch_size` images.<br>
The value of `batch_size` (by default: 32) is a parameter that influences the quality of the training but also its memory footprint: you can later try different values (64, 128, 256 ...) and observe how the quality of the training evolves.

Name `hist` the data returned by the `fit` method.

In [None]:
# define the key to reload the initial state & structure of the network:
key = 'dense1_init'

# define the path to be used:
path = os.path.join('models', key)

# load the network initial structure and weights:
model = tf.keras.models.load_model(path) 

# Deterministic tensorflow training: 
# set the seed of the random generators involved by tensorflow:
tf.keras.utils.set_random_seed(SEED)
# see https://blog.tensorflow.org/2022/05/whats-new-in-tensorflow-29.html
tf.config.experimental.enable_op_determinism() 

# train the DNN:
hist = model.fit(  ...  )

Can you explain why there are 469 updates of the DNN weights per epoch?

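If we divide the data set size (60000) by the batch size (128), we get 468.75: 468 full batches plus one last, smaller batch with the remaining 96 images, i.e. 469 weight updates per epoch.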
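The batch arithmetic can be checked in plain Python. This is a minimal sketch; the variable names are chosen for this example only.

```python
import math

nb_images, batch_size = 60000, 128

# Number of complete batches and size of the last, incomplete one.
nb_full_batches, last_batch_size = divmod(nb_images, batch_size)

# One weight update per batch, including the incomplete one.
steps_per_epoch = math.ceil(nb_images / batch_size)

print(nb_full_batches, last_batch_size, steps_per_epoch)  # 468 96 469
```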


The `hist` object returned by the `fit` method has a `history` attribute of type `dict` whose keys `'loss'` and `'accuracy'` are associated with the corresponding values at each _epoch_:

In [None]:
hist.history.keys()

In [None]:
hist.history['loss']

In [None]:
hist.history['accuracy']

### Plot of the `loss` and `accuracy` curves

The `plot_loss_accuracy` function of the `utils.tools` module (found in the notebook directory) plots the "Model accuracy" and "Model loss" curves with the data stored in `hist`.<br> Import and use the `plot_loss_accuracy` function to plot these curves:

## 6 - Train the Network while measuring its performance at each *epoch*

To have a better indicator of the quality of the training, you can test at the end of each `epoch` the accuracy of the inferences of the trained network using the test data: just pass the `validation_data` argument to the `fit` method, assigning it the test data tuple `(x_test, y_test)`:

In [None]:
# define the key to reload the initial state & structure of the network:
key = 'dense1_init'

# define the path to be used:
path = os.path.join('models', key)

# load the network initial structure and weights:
model = tf.keras.models.load_model(path) 

# Deterministic tensorflow training: 
# set the seed of the random generators involved by tensorflow:
tf.keras.utils.set_random_seed(SEED)
# see https://blog.tensorflow.org/2022/05/whats-new-in-tensorflow-29.html
tf.config.experimental.enable_op_determinism() 

hist = model.fit(   ...   )

Now the `hist.history` dictionary also has the new keys `val_loss` and `val_accuracy` calculated with the test data:

In [None]:
hist.history.keys()

Plot the curves again with the `plot_loss_accuracy` function:

The accuracy calculated with the test data tends towards a limit close to 98.3 %. You might think that increasing the value of `epochs` would improve the accuracy of the network... but you run the risk of over-training the network (*over-fit*)...

## 7 - Train the Network while measuring its performance at each *epoch* <br>and managing the *over-fit*

The `Keras` module offers tools to automatically stop the training by monitoring, for example, how the accuracy evolves from one `epoch` to another.<br>
You define the parameters of the `EarlyStopping` (cf [EarlyStopping](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/EarlyStopping)) *callback* and pass it to the method `fit` via the `callbacks` argument:
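The idea behind the `patience` parameter can be sketched in plain Python on a hypothetical list of validation losses. This is a simplified illustration, not the Keras implementation: `early_stopping_demo` and the loss values are invented for the example.

```python
def early_stopping_demo(val_losses, patience=2):
    """Return (epoch where training stops, epoch of the best loss), 0-indexed."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0                      # epochs elapsed since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:      # improvement: remember it and reset the counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:                     # no improvement: one more epoch of patience used
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses: the loss stops improving after epoch 2.
val_losses_demo = [0.30, 0.20, 0.15, 0.16, 0.17, 0.14]
stopped_epoch, best_epoch = early_stopping_demo(val_losses_demo, patience=2)
print(stopped_epoch, best_epoch)  # 4 2
```

In this sketch the training stops at epoch 4 and never reaches the last value; restoring the weights of epoch 2 is the role played by `restore_best_weights=True` in the callback.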

In [None]:
from tensorflow.keras.callbacks import EarlyStopping

# define the parameters of the `EarlyStopping`:
callbacks_list = [ 
    EarlyStopping(monitor='val_accuracy',  # the parameter to monitor
                  patience=2,              # stop after 2 epochs in a row without improvement
                  restore_best_weights=True,
                  verbose=1)
]

# define the key to reload the initial state & structure of the network:
key = 'dense1_init'

# define the path to be used:
path = os.path.join('models', key)

# load the network initial structure and weights:
model = tf.keras.models.load_model(path) 

# Deterministic tensorflow training: 
# set the seed of the random generators involved by tensorflow:
tf.keras.utils.set_random_seed(SEED)
# see https://blog.tensorflow.org/2022/05/whats-new-in-tensorflow-29.html
tf.config.experimental.enable_op_determinism() 

hist = model.fit(x_train, y_train,
                 validation_data=(x_test, y_test),
                 epochs=15, 
                 batch_size=128, 
                 callbacks = callbacks_list)

In [None]:
plot_loss_accuracy(hist)

Instead of monitoring the decrease of `val_accuracy` you can also monitor the increase of `val_loss`, which may be a preferred strategy (can you guess why?):

In [None]:
from tensorflow.keras.callbacks import EarlyStopping

# define the parameters of the `EarlyStopping`:
callbacks_list = [ 
    EarlyStopping(monitor='val_loss',  # the parameter to monitor
                  patience=2,          # stop after 2 epochs in a row without improvement
                  restore_best_weights=True,
                  verbose=1)
]

# define the key to reload the initial state & structure of the network:
key = 'dense1_init'

# define the path to be used:
path = os.path.join('models', key)

# load the network initial structure and weights:
model = tf.keras.models.load_model(path) 

# Deterministic tensorflow training: 
# set the seed of the random generators involved by tensorflow:
tf.keras.utils.set_random_seed(SEED)
# see https://blog.tensorflow.org/2022/05/whats-new-in-tensorflow-29.html
tf.config.experimental.enable_op_determinism() 

hist = model.fit(x_train, y_train,
                 validation_data=(x_test, y_test),
                 epochs=15, 
                 batch_size=128, 
                 callbacks = callbacks_list)

In [None]:
plot_loss_accuracy(hist)

## 8 - Save the structure & weights of the trained Network

The `save` method of the `Sequential` class saves **the structure** and the **weights** of the trained DNN.<br>
You can later use the `tf.keras.models.load_model` function to recreate the network and reload its trained weights in order to use it in an operational setting.

In [None]:
import os

# Check whether the 'models' directory exists (create it if needed):
if not os.path.exists("models"): os.mkdir("models")

# define a unique key:
key = 'dense1_trained'

# define the path where to store the network data:
path = os.path.join('models', key)

# save the structure and the weights of the current neural network:
model.save(path)

# display the tree beginning at f'./models/{key}':
tree = scan_dir(f"./models/{key}")
print(f'\nFiles written:\n{tree}')    

### Further work:
You can now load the `ML2_DNN_part2_en.ipynb` notebook to learn how to exploit the DNN you have just trained.

## Other interesting resources... videos:

In [1]:
%%HTML
<iframe src="https://www.youtube.com/embed/trWrEWfhTVg" width="400" height="300" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

In [2]:
%%HTML
<iframe src="https://www.youtube.com/embed/aircAruvnKk" width="400" height="300" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

In [3]:
%%HTML
<iframe src="https://www.youtube.com/embed/IHZwWFHWa-w" width="400" height="300" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

In [4]:
%%HTML
<iframe src="https://www.youtube.com/embed/Ilg3gGewQ5U" width="400" height="300" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>