This short introduction uses Keras to:
1. Load a predefined dataset
2. Build a neural network machine learning model that classifies images
3. Train this neural network
4. Evaluate the accuracy of the model

## Configure TensorFlow

Start by importing TensorFlow:


In [1]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # or any {'0', '1', '2'}
import numpy as np
import tensorflow as tf

print("TensorFlow version:", tf.__version__)

TensorFlow version: 2.11.0


## Load the dataset

Load the MNIST dataset. The pixel values of the images are integers ranging from 0 through 255.
Scale these values to a range of 0 to 1 by dividing them by 255.0. This also converts the sample data from integers to floating-point numbers.

In [18]:
import sys
import numpy

numpy.set_printoptions(threshold=sys.maxsize)

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Pixel values are integers in [0, 255]; dividing by 255.0 yields floats in [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0
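
The effect of the scaling step can be checked on a tiny array. This is a minimal sketch that assumes the images are stored as unsigned 8-bit integers; the 2 by 2 `pixels` array is a hypothetical stand-in for an image.

```python
import numpy as np

# Hypothetical 2x2 "image" stored as unsigned 8-bit integers
pixels = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# True division by a float promotes the result to a floating-point dtype
scaled = pixels / 255.0

print(pixels.dtype, scaled.dtype)
print(scaled.min(), scaled.max())
```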

## Build a machine learning model

Build a ``tf.keras.Sequential`` model by stacking layers.

1. With Keras, there are two common ways of building models:

    - **Sequential** model : builds a model layer by layer. It cannot express models that share layers or have multiple inputs or outputs.
    - **Functional** model : allows far more complex models. Layers are not only connected to the previous and the next one but can be connected to any other layer.

        This makes it possible to create networks such as siamese or residual networks.
    - Sequential is therefore useful for a plain stack of layers where each layer has one input **tensor** and one output **tensor**.
    - A **tensor** is a multi-dimensional array in TensorFlow, similar to a NumPy array (``np.ndarray``). Tensors play the role of matrices in neural network computations. They can be rank-0 (a scalar), rank-1 (a 1-D array), rank-2 (a 2-D array), and so on.
    - Layers are functions with a known mathematical structure that can be reused and have trainable variables. Most TensorFlow models are composed of layers.
    - **Flatten layer** : collapses all dimensions of each example into a single one while keeping the batch dimension. For example, a batch of N images of shape H by W by C becomes an N by (H\*W\*C) array.

        In our example, each 28 by 28 image is flattened into a 784-element vector, which the first Dense layer then maps to 128 outputs.
    - **Dense layer** : a **densely (fully) connected** layer. Each neuron in the Dense layer receives input from all neurons of the previous layer. It is the most commonly used layer in models.

        In the background, the Dense layer performs a matrix-vector multiplication. The values in the matrix are parameters that are trained and updated through backpropagation.

            keras.layers.Dense(units, activation=None, use_bias=True, ...)

        - units : the output size of the layer. It determines the shape of the weight matrix (input size by units) and the length of the bias vector.
        - activation : the activation function, applied to the weighted sum of the inputs plus the bias. It decides how strongly a neuron is activated. With the default, None, no activation is applied.

            The **weights** change the steepness of the activation function (how fast it triggers), whereas the **bias** shifts it, delaying the triggering. The bias may be critical for successful training.

            The purpose of the activation function is to introduce non-linearity into the output of a neuron. Without it, stacked layers would collapse into a single linear transformation. Activation functions are also differentiable almost everywhere, which lets backpropagation compute gradients through them.
            There are many different activation functions, with different outcomes and roles.
        - use_bias : whether the layer uses a bias vector. The default is True.

    - **Dropout layer** : randomly sets input units to 0 with a frequency of ``rate`` at each step during training. Dropping some inputs helps prevent overfitting.

        - overfitting : the model fits the training data too closely. It relies on the training examples themselves instead of learning patterns that generalize to new data.
        - underfitting : the model can neither fit the training data nor generalize to new data.
        - **Ideally, we want a model at the sweet spot between overfitting and underfitting.**
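
To make the layer descriptions above more concrete, here is a NumPy-only sketch of what Flatten, Dense, and Dropout compute on a batch. It is an illustration under stated assumptions, not how Keras implements these layers: the helper names (`flatten`, `dense`, `relu`, `dropout`), the random images, and the random weights are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

def flatten(x):
    # Keep the batch dimension, collapse everything else: (N, 28, 28) -> (N, 784)
    return x.reshape(x.shape[0], -1)

def relu(z):
    # Rectified linear unit: negative values become 0
    return np.maximum(z, 0.0)

def dense(x, weights, bias, activation=None):
    # Weighted sum of the inputs plus the bias, then an optional activation
    z = x @ weights + bias
    return activation(z) if activation is not None else z

def dropout(x, rate, training):
    # During training, zero each unit with probability `rate` and rescale the
    # surviving units by 1 / (1 - rate); otherwise return the input unchanged.
    if not training:
        return x
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

# Hypothetical batch of two images and untrained (random) parameters
images = rng.random((2, 28, 28))
w1, b1 = rng.normal(scale=0.05, size=(784, 128)), np.zeros(128)
w2, b2 = rng.normal(scale=0.05, size=(128, 10)), np.zeros(10)

flat = flatten(images)
hidden = dense(flat, w1, b1, activation=relu)
hidden = dropout(hidden, rate=0.2, training=True)
logits = dense(hidden, w2, b2)

print(flat.shape)    # (2, 784)
print(logits.shape)  # (2, 10)
```

The final `dense` call has no activation, so it returns one raw score per class for each image.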
    

In [3]:

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten (Flatten)           (None, 784)               0         
                                                                 
 dense (Dense)               (None, 128)               100480    
                                                                 
 dropout (Dropout)           (None, 128)               0         
                                                                 
 dense_1 (Dense)             (None, 10)                1290      
                                                                 
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
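
The parameter counts in the summary can be reproduced by hand: a Dense layer has one weight per (input, unit) pair plus one bias per unit. The helper `dense_param_count` below is a hypothetical name used only for this check.

```python
def dense_param_count(n_inputs, units, use_bias=True):
    # One weight per (input, unit) pair, plus one bias per unit if enabled
    return n_inputs * units + (units if use_bias else 0)

flattened_size = 28 * 28                        # 784 inputs after Flatten
first = dense_param_count(flattened_size, 128)  # first Dense layer
second = dense_param_count(128, 10)             # output Dense layer

print(first, second, first + second)  # 100480 1290 101770
```

Flatten and Dropout contribute no parameters, which is why they show 0 in the summary.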



- The model returns a vector of **logits** or **log-odds** scores, one for each class:
    - **logits** : the vector of raw (non-normalized) predictions that the classification model generates. Keep in mind that these predictions must be normalized afterward, for example with softmax, before they can be read as probabilities.
    - **log-odds** : the logarithm of the odds of some event.

In [4]:
# The model expects a batch (an array of examples), so slice with x_train[:1]
# instead of indexing with x_train[0]: [3, 4, 5][0] is 3, whereas [3, 4, 5][:1] is [3].
predictions = model(x_train[:1]).numpy()
predictions

array([[ 0.49722886,  0.65349144,  0.11298701, -0.31919092, -0.61132145,
         0.88351756, -0.1698367 , -0.43340534, -0.32834166, -0.84037584]],
      dtype=float32)

The _tf.nn.softmax_ function converts these logits to probabilities for each class. This is the normalization function.

In [5]:
tf.nn.softmax(predictions).numpy()

array([[0.14921737, 0.174455  , 0.10161207, 0.06595577, 0.04924727,
        0.2195748 , 0.0765802 , 0.05883695, 0.06535499, 0.03916563]],
      dtype=float32)
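
For intuition, the same normalization can be sketched in plain NumPy: exponentiate each logit and divide by the sum of the exponentials. The `softmax` helper below is a hypothetical re-implementation for illustration, not the TensorFlow code; the logits are copied from the output above.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum before exponentiating for numerical stability
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

# Logits copied from the model output above
logits = np.array([[0.49722886, 0.65349144, 0.11298701, -0.31919092, -0.61132145,
                    0.88351756, -0.1698367, -0.43340534, -0.32834166, -0.84037584]])

probs = softmax(logits)
print(probs.round(4))
print(probs.sum())  # the probabilities add up to 1 (up to rounding)
```

The largest logit (index 5) gets the largest probability, and the ordering of the classes is unchanged by the normalization.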

**Note: It is possible to bake the tf.nn.softmax function into the activation function for the last layer of the network. While this can make the model output more directly interpretable, this approach is discouraged as it's impossible to provide an exact and numerically stable loss calculation for all models when using a softmax output.**

Define a loss function for training using **losses.SparseCategoricalCrossentropy**:


In [6]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

The loss function takes a vector of ground truth values (the real labels from the dataset, which we know to be true) and a vector of logits, and returns a scalar loss for each example.
This loss is equal to the negative log probability of the true class: the loss is zero if the model is sure of the correct class.
This untrained model gives probabilities close to random (1/10 for each class), so the initial loss should be close to ``-tf.math.log(1/10) ~= 2.3``

In [7]:
loss_fn(y_train[:1], predictions).numpy()

1.5160624
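
The printed loss can be reproduced by hand as the negative log probability of the true class. The sketch below uses the logits printed earlier and assumes that the label of the first training image is 5, which is consistent with the loss value above; the function name is a hypothetical stand-in, not the Keras implementation.

```python
import numpy as np

def sparse_categorical_crossentropy(logits, label):
    # Log-softmax computed directly from the logits (log-sum-exp form),
    # then the negative log probability of the true class
    shifted = logits - np.max(logits)
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[label]

# Logits copied from the model output shown earlier
logits = np.array([0.49722886, 0.65349144, 0.11298701, -0.31919092, -0.61132145,
                   0.88351756, -0.1698367, -0.43340534, -0.32834166, -0.84037584])
label = 5  # assumption: true class of the first training image

loss = sparse_categorical_crossentropy(logits, label)
uniform_loss = -np.log(1 / 10)  # loss of a model that assigns 1/10 to every class

print(loss)          # about 1.516
print(uniform_loss)  # about 2.303
```

The loss here is below 2.3 because the random initial weights happen to favor the assumed true class slightly.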

Before you start training, configure and compile the model using Keras ``Model.compile``. Set the ``optimizer`` class to ``"adam"``, set the ``loss`` to the ``loss_fn`` function you defined earlier, and specify a metric to be evaluated for the model by setting the ``metrics`` parameter to ``accuracy``.

Adam is an optimization algorithm that can be used instead of the classical stochastic gradient descent procedure to update the network weights iteratively based on the training data.
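
To give an idea of what an Adam update looks like, here is a minimal NumPy sketch of the standard update rule applied to a toy one-parameter problem. The function `adam_step` and the toy objective are hypothetical illustrations; the default hyperparameters are chosen to match the usual Keras defaults, and the real optimizer operates on all model weights at once.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    # Exponential moving averages of the gradient and of its square
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction compensates for m and v starting at zero
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Each parameter gets its own step size, scaled by its gradient history
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy problem: minimize f(x) = x^2, whose gradient is 2x
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.05)

print(abs(x) < 0.1)
```

Unlike plain gradient descent, the step size does not depend directly on the gradient magnitude: on the very first step the parameter moves by roughly `lr`, whatever the size of the gradient.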


In [8]:
model.compile(
    optimizer='adam',
    loss=loss_fn,
    metrics=['accuracy']
)

## Train and evaluate your model

Use the ``Model.fit`` method to adjust your model parameters and minimize the loss:

In [9]:
model.fit(x_train, y_train, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f82ad096c10>

The ``Model.evaluate`` method checks the model's performance, usually on a validation set or test set.

In [10]:
model.evaluate(x_test, y_test, verbose=2)

313/313 - 1s - loss: 0.0770 - accuracy: 0.9774 - 1s/epoch - 3ms/step


[0.07701293379068375, 0.977400004863739]

The image classifier is now trained to ~98% accuracy on this dataset.

If you want your model to return a probability, you can wrap the trained model, and attach the softmax to it :

In [11]:
probability_model = tf.keras.Sequential([
    model,
    tf.keras.layers.Softmax()
])

probability_model(x_test[:5])

<tf.Tensor: shape=(5, 10), dtype=float32, numpy=
array([[6.07538198e-08, 5.60188127e-08, 6.20627861e-06, 3.90672722e-05,
        2.25451241e-10, 4.58226907e-07, 1.14042695e-12, 9.99950528e-01,
        2.69306810e-07, 3.25400561e-06],
       [2.83171753e-09, 6.07930997e-05, 9.99925852e-01, 1.24910584e-05,
        2.34206050e-14, 5.20975163e-07, 1.63246611e-07, 5.64980768e-14,
        7.97295741e-08, 1.63764185e-16],
       [4.14591750e-06, 9.98629928e-01, 3.35483404e-04, 1.18474863e-05,
        2.35423911e-04, 1.22813808e-05, 2.49609111e-05, 6.78999582e-04,
        6.65048210e-05, 4.01648151e-07],
       [9.99186933e-01, 2.41832481e-07, 2.56106432e-04, 6.63312130e-06,
        2.25359054e-06, 4.91436513e-05, 4.24302212e-04, 5.99951054e-05,
        1.88149306e-06, 1.24960125e-05],
       [7.93751667e-07, 8.01753997e-10, 2.52020158e-07, 2.25852208e-08,
        9.92080033e-01, 8.27514484e-08, 1.02170225e-06, 2.29308807e-05,
        2.58874763e-07, 7.89453182e-03]], dtype=float32)>
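
Each row of this output is a probability distribution over the 10 digit classes, so the predicted digit is the index of the largest value in the row. A small NumPy sketch, using the first two rows copied from the output above:

```python
import numpy as np

# First two rows of the probability output shown above
probabilities = np.array([
    [6.07538198e-08, 5.60188127e-08, 6.20627861e-06, 3.90672722e-05,
     2.25451241e-10, 4.58226907e-07, 1.14042695e-12, 9.99950528e-01,
     2.69306810e-07, 3.25400561e-06],
    [2.83171753e-09, 6.07930997e-05, 9.99925852e-01, 1.24910584e-05,
     2.34206050e-14, 5.20975163e-07, 1.63246611e-07, 5.64980768e-14,
     7.97295741e-08, 1.63764185e-16],
])

predicted = np.argmax(probabilities, axis=1)  # index of the most likely class per row
confidence = np.max(probabilities, axis=1)    # probability assigned to that class

print(predicted)  # [7 2]
print(confidence)
```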

In [12]:
for i in range(100):
    # predict expects a batch, so wrap the single image in an array
    q = model.predict(np.array([x_test[i]]))

    print(f"q len = {len(q)}")
    print(f"result = {y_test[i]}")
    # The predicted class is the index of the largest logit
    print(f"found result = {np.argmax(q[0])}")
    # TODO: review the flowers tutorial (what is the value of a neuron?), then do another MNIST TensorFlow tutorial to understand this better

q len = 1
result = 7
found result = 7
q len = 1
result = 2
found result = 2
q len = 1
result = 1
found result = 1
q len = 1
result = 0
found result = 0
q len = 1
result = 4
found result = 4
q len = 1
result = 1
found result = 1
q len = 1
result = 4
found result = 4
q len = 1
result = 9
found result = 9
q len = 1
result = 5
found result = 5
q len = 1
result = 9
found result = 9
q len = 1
result = 0
found result = 0
q len = 1
result = 6
found result = 6
q len = 1
result = 9
found result = 9
q len = 1
result = 0
found result = 0
q len = 1
result = 1
found result = 1
q len = 1
result = 5
found result = 5
q len = 1
result = 9
found result = 9
q len = 1
result = 7
found result = 7
q len = 1
result = 3
found result = 3
q len = 1
result = 4
found result = 4
q len = 1
result = 9
found result = 9
q len = 1
result = 6
found result = 6
q len = 1
result = 6
found result = 6
q len = 1
result = 5
found result = 5
q len = 1
result = 4
found result = 4
q len = 1
result = 0
found result = 0
q len = 1
re

### Save the model
Use the ``model.save(filepath)`` method:

In [15]:
model.save('./basic-ocr-model')

INFO:tensorflow:Assets written to: ./basic-ocr-model/assets


### Load the model


In [16]:
loaded_model = tf.keras.models.load_model('./basic-ocr-model/')
