<a href="https://colab.research.google.com/github/Benjamin-morel/TensorFlow/blob/main/01_classification_image.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



---


# **Machine Learning Model: basic icon and character recognition**

| | |
|------|------|
| Filename | 01_classification_image.ipynb |
| Author(s) | Benjamin Morel (benjaminmorel27@gmail.com) |
| Date | September 2, 2024 |
| Aim(s) | Build, train and evaluate a neural network machine learning model that classifies images. |
| Dataset(s) | Digit MNIST [[1]](https://www.kaggle.com/datasets/hojjatk/mnist-dataset) and Fashion MNIST [[2]](https://www.kaggle.com/datasets/zalando-research/fashionmnist) |
| Version | Python 3.12 - TensorFlow 2.17.0 |


<br> **!!Read before running!!** <br>
1. Fill in the inputs
2. GPU execution recommended if `training_phase="Yes"`.
3. Run all and read comments.

---

#### **Motivation**
Image recognition is one of the possible applications of **weak artificial intelligence** (AI trained for a specific task). To do this, an AI is trained to classify data into different categories (here, multi-class classification with 10 classes). In this Python script, a **neural network** (NN) is built and trained to classify images of handwritten digits first, then fashion images.

#### **Outline**
To achieve this, the neural network adjusts its own internal parameters during the **training phase** to correctly classify the images according to the **label** provided for each input. Then, the neural network goes through a test and **evaluation phase** in which it has to classify unknown images similar to the ones it has learned from during the training phase, but without knowing the label. The prediction made by the neural network is finally compared with the label provided. These steps are shown in the code below.


---


#### **0. Input section**

The model has already been trained: the **parameters** (weights and biases) of each neuron are already known for the base dataset. The user can choose to keep these parameters and **not retrain the model** (No), or to repeat the **training phase** (Yes). The latter choice may be justified if the user wishes to update the neural network against an updated dataset.

In [1]:
training_phase = 'No'



---


#### **1. Import libraries & prebuilt dataset**


###### **1.1. Libraries and dependencies**

In [2]:
pip install pyyaml h5py  # dependencies required to save models in HDF5 format



In [3]:
import tensorflow as tf  # machine learning models
import numpy as np # scientific computing
import plotly.express as px # graphing packages
from plotly.subplots import make_subplots # make subplot graphs in plotly
import os
from PIL import Image

###### **1.2. GitHub imports**

In [4]:
def get_github_files():
  !git clone https://github.com/Benjamin-morel/TensorFlow.git TensorFlow_duplicata # clone the GitHub repository TensorFlow into a local folder
  results_graph_MNIST = Image.open("TensorFlow_duplicata/99_pre_trained_models/01_classification_image/results_graph_MNIST.png") # see section 3
  model_MNIST = tf.keras.models.load_model('TensorFlow_duplicata/99_pre_trained_models/01_classification_image/01_classification_image_digit.keras') # pre-trained model
  model_MNIST_fashion = tf.keras.models.load_model('TensorFlow_duplicata/99_pre_trained_models/01_classification_image/01_classification_image_fashion_model.keras') # pre-trained model
  !rm -rf TensorFlow_duplicata/ # delete the cloned repository
  return results_graph_MNIST, model_MNIST, model_MNIST_fashion

###### **1.3. Graphic functions**

In [5]:
# Function 1: plot a sample of nb images with the label as title
def show_data(image, labels, nb):
  fig = px.imshow(image[:nb, :, :], color_continuous_scale='gray_r', facet_col=0, binary_string=False)
  for i, label in enumerate(labels[:nb]):
    fig.layout.annotations[i]['text'] = 'label: %s' % label
  fig.update_layout(margin=dict(l=20, r=20, t=100, b=100), width=1500, height=300)
  fig.update_yaxes(visible=False, showticklabels=False), fig.update_xaxes(visible=False, showticklabels=False), fig.update(layout_coloraxis_showscale=False)
  fig.show()

In [6]:
# Function 2: plot the accuracy obtained from one dataset at each epoch
def show_evolution(history, val):
  history_dict = history.history

  if val == False: # get either the training set accuracy or the validation set accuracy
    acc_train = history_dict['accuracy']
  else:
    acc_train = history_dict['val_accuracy']

  epochs = range(1, len(acc_train) + 1)

  fig = px.line(x = epochs, y = acc_train, width=600, height=400)
  fig.update_layout(legend=dict(x=0.02, y=0.98, xanchor='left', yanchor='top', bgcolor='rgba(255, 255, 255, 0.8)', bordercolor='black', borderwidth=1))
  if val == False: fig.update_traces(name="training", showlegend=True)
  else: fig.update_traces(name="validation", showlegend=True)
  fig.update_xaxes(title = "epochs"), fig.update_yaxes(title = "accuracy")
  fig.show()

In [7]:
# Function 3: plot the probability computed by the model for each output
def show_proba(i, predictions_array, true_label, label): # plot a bar chart with the probability value computed according to the label
  fig = px.bar(x = list(range(len(label))), y = 100*predictions_array, width=600, height=400, text_auto='.2f', title = "Target: %s" %true_label)
  fig.update_xaxes(title="Label", tickvals=list(range(len(label))), ticktext=label), fig.update_layout(showlegend=False), fig.update_yaxes(title="Probability (%)")
  fig.show()

In [8]:
# Function 4: plot the confusion matrix
def show_confusion_mat(actual, prediction,labels):
  confusion_mat = tf.math.confusion_matrix(actual,np.argmax(prediction, axis=1))
  fig = px.imshow(confusion_mat, x = labels, y = labels, text_auto=True, labels=dict(x="Prediction", y="Actual")) # tf.math.confusion_matrix puts actual labels on rows (y) and predictions on columns (x)
  fig.update(layout_coloraxis_showscale=False)
  fig.show()

###### **1.4. Retrieve data**

Correctly labeled data from the **MNIST database** is used to train and test the network. Images are 28 pixels by 28 pixels and represent **handwritten digits**. Each pixel is assigned a value corresponding to a **gray level** on a gray scale from 0 to 255 (RGB code, where the three primary colors are equal).

In [9]:
mnist = tf.keras.datasets.mnist # import MNIST dataset (70,000 handwritten digit images of 28x28 pixels)

###### **1.5. How data is organized within MNIST?**

In the MNIST database, training and test sets are declared as **tensors**:


*   `x_train` = 60,000 x 28 x 28 : pixel values (0 to 255) of 60,000 training images
*   `y_train` = 60,000 : labels (0 to 9) of the 60,000 training images
*   `x_test` = 10,000 x 28 x 28 : pixel values (0 to 255) of 10,000 test images
*   `y_test` = 10,000 : labels (0 to 9) of the 10,000 test images

The data are then pre-processed by **normalizing** them: the pixel values now range between 0 and 1.

In [10]:
(x_train, y_train), (x_test, y_test) = mnist.load_data() # training + test tensors
x_train, x_test = x_train / 255.0, x_test / 255.0 # normalization of pixel values to [0, 1]

In [11]:
show_data(x_train, y_train, 10)

---


#### **2. Build the neural network machine learning model**

###### **2.1. What is a neural network formed of?**

The *keras* module is used to easily define neural networks by describing them layer by layer. The model is named *model* and is described by a succession of layers (Sequential type). The function `create_model` is used to create a neural network with specific layers:
* `tf.keras.layers.Flatten` : reformats the data by converting a two-dimensional array (28x28) into a one-dimensional array (784 values)
* `tf.keras.layers.Dense` : layer of 128 neurons. The ReLU function is used as the activation function
* `tf.keras.layers.Dropout` : layer used to prevent overfitting. A dropout rate `DR` is defined and determines the probability of any given neuron being temporarily excluded from the neural network [[3]](https://www.scaler.com/topics/dropout-tensorflow/). At each training batch iteration, random neurons are deactivated according to the dropout rate. Therefore, the model must learn redundant representations and cannot rely on specific neurons alone for accurate predictions.
* `tf.keras.layers.Dense` : output layer of 10 neurons (0, 1, 2, ..., 9) that returns a vector of logit scores, one per digit. For example, if the neural network returns `output = (3.6 1.4 11.1 6.2 -6.5 2.3 3.0 -8.7 1.6 -5.5)`, the prediction corresponds to the highest score, found at the 3rd position, which is the digit 2.
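The way a prediction is read from the output layer can be illustrated with a minimal sketch in plain Python. It reuses the example logit vector from the list above; the helper name `argmax` is introduced here for illustration only and is not part of the notebook.

```python
# Logit scores from the example above (one score per digit, 0 to 9)
logits = [3.6, 1.4, 11.1, 6.2, -6.5, 2.3, 3.0, -8.7, 1.6, -5.5]

def argmax(scores):
    """Return the index of the largest score (hypothetical helper)."""
    return max(range(len(scores)), key=lambda k: scores[k])

predicted_digit = argmax(logits)
print(predicted_digit)  # 2: the 3rd position, because indices start at 0
```

In the notebook itself, the same role is played by `np.argmax` in section 5.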

Here, the number of dense layers and neurons is not optimized, but follows the number of dense layers and neurons commonly used in the scientific literature [[4]](https://www.tensorflow.org/tutorials/quickstart/beginner).

The neural network **adjusts** and **optimizes** the thousands of parameters (weights and biases) attached to its neurons. **Parameter optimization** is based on the minimization of the loss function `SparseCategoricalCrossentropy`, which computes the cross-entropy loss between true labels and predicted labels. Optimization is performed using the **mini-batch gradient descent algorithm** *SGD*.
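To make the loss function more concrete, the sketch below computes a sparse categorical cross-entropy from logits for a single sample in plain Python. It is an illustration of the formula under simple assumptions (one sample, no batching), not the Keras implementation; the function name is chosen here for readability.

```python
import math

def sparse_categorical_crossentropy(logits, true_label):
    """Cross-entropy between an integer label and a vector of logits."""
    shift = max(logits)  # subtract the max for numerical stability
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    # Equivalent to -log(softmax(logits)[true_label])
    return log_sum_exp - logits[true_label]

logits = [3.6, 1.4, 11.1, 6.2, -6.5, 2.3, 3.0, -8.7, 1.6, -5.5]
loss_correct = sparse_categorical_crossentropy(logits, true_label=2)  # label matches the top score
loss_wrong = sparse_categorical_crossentropy(logits, true_label=3)    # label does not match
print(loss_correct < loss_wrong)  # True
```

The loss is small when the highest logit sits on the true label and grows when the label points to a lower-scored class, which is what gradient descent exploits during training.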

In [12]:
def create_model():
  DR = 0.2
  model = tf.keras.models.Sequential()
  model.add(tf.keras.layers.Flatten(input_shape=(28, 28))) # input layer
  model.add(tf.keras.layers.Dense(128, activation='relu')) # hidden layer
  model.add(tf.keras.layers.Dropout(DR))
  model.add(tf.keras.layers.Dense(10)) # output (or classification) layer

  loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True) # loss or error function
  model.compile(optimizer="sgd", loss=loss_fn, metrics=['accuracy']) # use a gradient descent algorithm to minimize the loss function

  return model

###### **2.2. How to load the neural network?**

By using `create_model`, the neural network is created. The command `summary` can be used to check that the layers are implemented correctly and in the right order. The number of parameters is also specified. Only the `Dense` layers have parameters that need to be trained.

It is possible to check the number of parameters to be calculated for each fully connected layer. The hidden layer has 128 neurons, each connected to the 784 values of the input layer. Moreover, one bias per neuron must be added. In the end, the first `Dense` layer has 784 x 128 + 128 = 100,480 parameters and the last layer has 128 x 10 + 10 = 1,290 parameters to train.
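The arithmetic above can be reproduced with a short sketch. The helper `dense_param_count` is a hypothetical name used for illustration; it only encodes the rule "one weight per input per neuron, plus one bias per neuron".

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs x n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_param_count(28 * 28, 128)  # first Dense layer
output = dense_param_count(128, 10)       # output Dense layer
total = hidden + output
print(hidden, output, total)  # 100480 1290 101770
```

These values can be compared with the figures reported by `summary` for the model defined in `create_model`.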

In [13]:
model_MNIST = create_model()
model_MNIST.summary()


Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.



---


#### **3. Train the model**

###### **3.1. How do I create a backup of model parameters and architecture?**

To save the model once it has been trained, a save point (checkpoint) is created. Since `save_weights_only=False`, the whole model (architecture and weights) is written to a `.keras` file, which can later be loaded and/or imported.

In [14]:
if training_phase == 'Yes':
  cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath="01_classification_image_digit.keras",
                                                   monitor="accuracy",
                                                   save_best_only=True,
                                                   mode="max",
                                                   save_weights_only=False,
                                                   verbose=0)

###### **3.2. How to train the model?**

The `fit` function is used to compute the weights. Calculations are performed using the mini-batch gradient descent method (chosen previously). The batch size is set to 32, a good threshold to balance computational speed against the precision of the error gradient. The training data used are `x_train` and `y_train`. The `epochs` option sets the number of complete passes over the training set; each epoch consists of 60,000 / 32 = 1,875 gradient descent iterations. The function `show_evolution` is used to visualize the evolution of the training set accuracy (`val = False`) or the validation set accuracy (`val = True`).
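As a worked example of the iteration count, the sketch below derives the number of weight updates from the dataset size, the batch size and the number of epochs used in this notebook (60,000 samples, batches of 32, 15 epochs).

```python
import math

n_samples, batch_size, epochs = 60_000, 32, 15

# One iteration (weight update) per mini-batch; the last batch may be smaller
steps_per_epoch = math.ceil(n_samples / batch_size)
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)  # 1875 28125
```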




In [15]:
if training_phase == 'Yes':
  history = model_MNIST.fit(x_train, y_train, batch_size=32, epochs=15, callbacks=[cp_callback], verbose=1) # save weights in callback
else:
  results_MNIST, model_MNIST, model_MNIST_fashion = get_github_files()

Cloning into 'TensorFlow_duplicata'...
remote: Enumerating objects: 258, done.
remote: Counting objects: 100% (155/155), done.
remote: Compressing objects: 100% (135/135), done.
remote: Total 258 (delta 89), reused 19 (delta 19), pack-reused 103 (from 1)
Receiving objects: 100% (258/258), 31.16 MiB | 18.66 MiB/s, done.
Resolving deltas: 100% (122/122), done.


In [16]:
if training_phase == 'Yes':
  show_evolution(history, False)
else:
  fig = px.imshow(results_MNIST)
  fig.update_layout(width=700, height=700, xaxis=dict(visible=False), yaxis=dict(visible=False))
  fig.show()


---


#### **4. Evaluate the model**

Once the model has been trained, it can be tested and **evaluated** by comparing the predictions it makes from the test set with its true values.

In [17]:
test_loss, test_acc = model_MNIST.evaluate(x_test,  y_test, verbose=0) # give the loss function value and the accuracy computed on the test set
print(f"""{round(100*test_acc, 3)}% of the test set is correctly predicted""")

96.75% of the test set is correctly predicted



---


#### **5. Digit recognition**

###### **5.1. How to access model predictions?**

To predict handwritten digits from the test set or from an external image, it is essential to study the **probability** that the model computes for each class. To do this, the 10-tuple of logit scores calculated at the output layer is converted into a 10-tuple of probabilities. The conversion is done with the **softmax function**. A probability is associated with each digit, and the model's prediction is the digit with the **highest probability**.
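The conversion from logits to probabilities can be sketched in plain Python as follows. This is a minimal illustration of the softmax formula, applied to the example logit vector from section 2.1; it is not the `tf.keras.layers.Softmax` implementation.

```python
import math

def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [3.6, 1.4, 11.1, 6.2, -6.5, 2.3, 3.0, -8.7, 1.6, -5.5]
probabilities = softmax(logits)
predicted_digit = max(range(10), key=lambda k: probabilities[k])
print(predicted_digit)  # 2
```

Softmax preserves the ranking of the logits, so the predicted class is unchanged; it only makes the scores interpretable as probabilities.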

The user can enter the index `i` of the image to know the prediction.  

In [18]:
i = 7388 # from 0 to 9,999

In [19]:
probability_model = tf.keras.Sequential([model_MNIST, tf.keras.layers.Softmax()]) # combine the model and a softmax layer to get probabilities as output
predictions = probability_model.predict(x_test, verbose=0) # predict the label
prediction_label = np.argmax(predictions[i]) # get the label of the max probability
true_label = y_test[i]

show_proba(i, predictions[i], true_label, np.arange(0,10))

###### **5.2. How to explore the entire test set?**

To visualize the set of predictions made on the test set of 10,000 images, it is common to use the confusion matrix. A successful classification model can be confirmed when the sum of the diagonal elements of this matrix is close to or equal to the total number of predictions made (here 10,000).
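The link between the diagonal of the confusion matrix and the accuracy can be sketched on a toy example. The labels below are made up for illustration (3 classes, 8 samples), and both helper functions are hypothetical names; rows hold the actual labels and columns the predicted ones.

```python
def confusion_matrix(actual, predicted, n_classes):
    """Count samples per (actual, predicted) pair; rows = actual, columns = predicted."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

def accuracy_from_confusion(matrix):
    """Accuracy = sum of the diagonal / total number of predictions."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

actual = [0, 1, 2, 2, 1, 0, 2, 1]     # toy labels, not MNIST data
predicted = [0, 1, 2, 1, 1, 0, 2, 2]
matrix = confusion_matrix(actual, predicted, 3)
print(matrix)                           # [[2, 0, 0], [0, 2, 1], [0, 1, 2]]
print(accuracy_from_confusion(matrix))  # 0.75
```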

In [20]:
show_confusion_mat(y_test, predictions, np.arange(0,10))

---


#### **6. Go further with the Fashion MNIST dataset**

###### **6.1. Libraries and dependencies**

To take this a step further, another database is used, with the aim of optimizing the number of neurons in the dense layer and the number of epochs during the training phase. The learning rate of the gradient descent algorithm is also tuned. The Keras Tuner library (`keras_tuner`) is used for this specific task.

In [21]:
pip install -q -U keras-tuner # install the tuner module


In [22]:
import keras_tuner as kt

###### **6.2. Database importation and organization**

The database considered is Fashion MNIST composed of 70,000 labeled **clothing images**. Images have a 28x28 pixels size and they are labeled according to the following classification:
*   T-shirt (label value = 0)
*   Trouser (=1)
*   Pullover (=2)
*   Dress (=3)
*   Coat (=4)
*   Sandal (=5)
*   Shirt (=6)
*   Sneaker (=7)
*   Bag (=8)
*   Ankle boot (=9)

To simplify data reading, training labels are replaced by their corresponding names. The training labels are first converted to 32-bit integers to ensure compatibility with the mapping operation performed by `tf.gather`. The elements of `class_names` are extracted at the indices specified by `train_labels`. Finally, class names stored in bytes format are converted to strings.

A brief visualization of the images in this database is presented by using function `show_data`.

In [23]:
fashion_mnist = tf.keras.datasets.fashion_mnist # import fashion MNIST dataset

(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data() # create training and test tensors
train_images, test_images = train_images / 255.0, test_images / 255.0 # pre-processing training and test data

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot'] # clothing names, indexed by label value

train_labels = tf.cast(train_labels, tf.int32) # convert uint8 into int32
train_labels_name = tf.gather(class_names, train_labels) # map class_names with train_labels
train_labels_name = [name.decode('utf-8') for name in train_labels_name.numpy()] # convert bytes into strings

show_data(train_images, train_labels_name, 10) # function defined in 1.3

###### **6.3. Model and optimization**

The neural network used has the same architecture as the previous one: a sequence of `Flatten`, `Dense`, `Dropout` and `Dense` layers. However, the number of neurons (= units) in the first fully connected `Dense` layer is not fixed but tuned by testing different values to find the best configuration (= hyperparameter tuning process). For this, a hyperparameter named `units` is defined as an integer ranging from 256 to 512 neurons, with a step size of 32.

The learning rate of the optimizer (here the Adam algorithm, an adaptive variant of gradient descent) is also tested for 3 different values: 0.01, 0.001 and 0.0001.
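To get a sense of the size of the search space described above, the sketch below enumerates every combination of `units` and learning rate. It only lists the candidates; it does not use Keras Tuner, and the variable names are chosen here for illustration.

```python
from itertools import product

units_values = list(range(256, 512 + 1, 32))  # 256, 288, ..., 512 (upper bound included)
learning_rates = [1e-2, 1e-3, 1e-4]

search_space = list(product(units_values, learning_rates))
print(len(units_values), len(search_space))  # 9 27
```

With 27 candidate configurations, training each one for the full number of epochs would be costly, which motivates a search strategy that allocates the training budget selectively.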

In [24]:
def create_model_tune(hp):
  model = tf.keras.Sequential()
  model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
  hp_units = hp.Int('units', min_value=256, max_value=512, step=32) # tune the number of neurons in the first dense layer (256 to 512)
  model.add(tf.keras.layers.Dense(units=hp_units, activation='relu'))
  model.add(tf.keras.layers.Dropout(0.2))
  model.add(tf.keras.layers.Dense(10))

  hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4]) # tune the learning rate for the gradient descent algorithm
  model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=hp_learning_rate),
                loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['accuracy'])

  return model

`Hyperband` is a hyperparameter search algorithm that uses the function `create_model_tune` to build and test the model with different hyperparameter combinations [[5]](https://arxiv.org/pdf/1603.06560). The optimal combination is the one that maximizes `val_accuracy`. The maximum number of training epochs for each hyperparameter configuration is set to 10. At the beginning of the search process, Hyperband trains the models for only a few epochs, then gradually increases the number of epochs for the most promising combinations.
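The idea of "few epochs first, more epochs for the most promising candidates" can be sketched with successive halving, the building block that Hyperband runs several times with different budgets. The code below is a simplified toy, not the Keras Tuner implementation: `toy_score` is a hypothetical stand-in for a measured validation accuracy, and the reduction factor of 3 is an assumption chosen for illustration.

```python
from itertools import product

def successive_halving(configs, score_fn, min_epochs=1, max_epochs=10, factor=3):
    """Keep the best 1/factor of the configurations at each round
    while multiplying the training budget (epochs) by `factor`."""
    survivors = list(configs)
    epochs = min_epochs
    while True:
        ranked = sorted(survivors, key=lambda c: score_fn(c, epochs), reverse=True)
        if len(ranked) == 1 or epochs >= max_epochs:
            return ranked[0]
        survivors = ranked[:max(1, len(ranked) // factor)]
        epochs = min(epochs * factor, max_epochs)

def toy_score(config, epochs):
    """Hypothetical score: improves with epochs, larger layers and lr = 1e-3."""
    units, learning_rate = config
    lr_factor = 1.0 if learning_rate == 1e-3 else 0.8
    return (1 - 0.5 ** epochs) * (units / 512) * lr_factor

configs = list(product(range(256, 512 + 1, 32), [1e-2, 1e-3, 1e-4]))
best = successive_halving(configs, toy_score)
print(best)  # (512, 0.001)
```

In a real search, `score_fn` would train the model for the given number of epochs and return its validation accuracy, so only a few configurations ever receive the full budget.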

In [25]:
if training_phase == "Yes":
  tuner = kt.Hyperband(create_model_tune, objective='val_accuracy', max_epochs=10)

The search for optimal hyperparameters can consume a lot of time and sometimes continues unnecessarily. `EarlyStopping` stops training the model when performance on the validation set no longer improves. Thus, if `val_loss` does not decrease by at least 0.001 for 5 consecutive epochs, training stops and the weights of the best epoch are restored.
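The stopping rule can be sketched in plain Python as follows. This is a simplified model of the behavior described above (patience of 5 epochs, minimum improvement of 0.001), not the Keras source code; the function name and the list of validation losses are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.001):
    """Return (epoch at which training stops, epoch with the best val_loss)."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # number of consecutive epochs without sufficient improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# Toy validation losses: no improvement of at least 0.001 after epoch 3
val_losses = [0.50, 0.42, 0.40, 0.3995, 0.41, 0.40, 0.405, 0.41, 0.42, 0.43]
print(early_stopping_epoch(val_losses))  # (8, 3)
```

In this toy run, training stops at epoch 8 and the weights of epoch 3 would be the ones restored.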

In [26]:
if training_phase == "Yes":
  stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True, min_delta=0.001)

In [27]:
if training_phase == "Yes":
  tuner.search(train_images, train_labels, validation_split=0.2, callbacks=[stop_early])

  best_hps=tuner.get_best_hyperparameters(num_trials=1)[0] # get the optimal hyperparameters

  print(f"""
  The hyperparameter search is complete. The optimal number of units in the first densely-connected
  layer is {best_hps.get('units')} and the optimal learning rate for the optimizer
  is {best_hps.get('learning_rate')}.
  """)

###### **6.4. Model training**

It is possible to optimize the number of epochs required for the training phase. A first training run is performed with at most 50 epochs (it may stop sooner because of `EarlyStopping`), then the epoch at which `val_accuracy` is maximal is retained as the optimal number of epochs.

In [28]:
if training_phase == "Yes":
  hypermodel = tuner.hypermodel.build(best_hps)
  history = hypermodel.fit(train_images, train_labels, epochs=50, validation_split=0.2, callbacks=[stop_early], verbose=1)
  val_acc_per_epoch = history.history['val_accuracy']
  best_epoch = val_acc_per_epoch.index(max(val_acc_per_epoch)) + 1
  print('Best epoch: %d' % (best_epoch,))
  hypermodel.save('01_classification_image_fashion_model.keras')
else:
  hypermodel = model_MNIST_fashion

In [29]:
if training_phase == 'Yes':
  show_evolution(history, True)

In the code above, the entire model is saved at the end of this training run, which exports both the model architecture (particularly the number of neurons in the first `Dense` layer) and the weights. Training the model one last time with the optimal hyperparameters and `best_epoch` epochs is a possible refinement that is not performed in this notebook.

###### **6.5. Validation and comments**

It is useful to visualize the architecture and the number of weights after training. Only the parameters of the `Dense` layers have been calculated and updated, since the other layers (`Flatten` and `Dropout`) have no trainable parameters. In total, 356,170 parameters were calculated, and the Adam optimizer stores 2 additional values per trainable parameter (= 712,342 optimizer parameters).
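As a consistency check on the figure of 356,170 parameters, the sketch below searches the tuned range of `units` for the value that yields this total, assuming the `Flatten`, `Dense(units)`, `Dropout`, `Dense(10)` architecture of `create_model_tune`. The helper name is hypothetical, and the resulting value is inferred from the parameter count, not read from the tuner output.

```python
def total_params(units, n_inputs=28 * 28, n_classes=10):
    """Trainable parameters of Flatten -> Dense(units) -> Dropout -> Dense(n_classes)."""
    hidden = n_inputs * units + units        # weights + biases of the hidden layer
    output = units * n_classes + n_classes   # weights + biases of the output layer
    return hidden + output

candidates = range(256, 512 + 1, 32)  # search range of the `units` hyperparameter
matching_units = [u for u in candidates if total_params(u) == 356_170]
print(matching_units)  # [448]
```

Two values per trainable parameter give 2 x 356,170 = 712,340; the remaining 2 in the reported 712,342 are presumably scalar variables kept by the optimizer, which is an assumption rather than something stated in the notebook.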

In [30]:
test_loss, test_acc = hypermodel.evaluate(test_images,  test_labels, verbose=0)
print(f"""{round(100*test_acc, 1)}% of the test set is correctly predicted""")

88.7% of the test set is correctly predicted


It is possible to explore the different inputs of the test set and check what the model has predicted. To do this, the user enters the index `i` of the image whose prediction is of interest. The probabilities calculated by the model are displayed in a bar chart. The model's prediction is the garment with the highest probability. Try `i=12` to study a bad prediction case.


In [31]:
i = 12

In [32]:
probability_hypermodel = tf.keras.Sequential([hypermodel, tf.keras.layers.Softmax()])

predictions = probability_hypermodel.predict(test_images, verbose=0) # predict the label of each test sample according to the probability computed with softmax function
prediction_label = np.argmax(predictions[i]) # get the label of the max probability
true_label = test_labels[i]

show_proba(i, predictions[i,:], class_names[true_label], class_names)

Finally, to carry out a more critical analysis of the neural network, the confusion matrix is calculated to identify the confusions that call for extra training. Here, T-shirt, shirt and pullover are 3 garments that are sometimes confused, because their shapes are very similar. It would be a good idea to perform a second training phase for these garments by adding additional features (sleeve size, shirt buttons, texture, etc.).
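Reading the largest off-diagonal entries of a confusion matrix can be automated with a short helper. The sketch below is illustrative only: the function name is hypothetical and the counts in `toy_matrix` are made up, they are not the results of the model above. Rows hold the actual classes and columns the predicted ones.

```python
def most_confused_pairs(matrix, class_names, top=3):
    """Return the `top` largest off-diagonal entries as (count, actual, predicted)."""
    pairs = []
    for a, row in enumerate(matrix):
        for p, count in enumerate(row):
            if a != p and count > 0:
                pairs.append((count, class_names[a], class_names[p]))
    return sorted(pairs, reverse=True)[:top]

names = ["T-shirt/top", "Pullover", "Shirt"]
toy_matrix = [[80, 5, 15],   # made-up counts for illustration
              [4, 85, 11],
              [14, 9, 77]]

for count, actual, predicted in most_confused_pairs(toy_matrix, names):
    print(f"{actual} predicted as {predicted}: {count}")
```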

In [33]:
show_confusion_mat(test_labels, predictions, class_names)

---


#### **7. Comments and limitations**

The neural network models built in this code achieve satisfactory results, with an accuracy on the validation and test sets of over 88%. However, the low resolution of the images helps the models to achieve this level of performance. An image is defined here by 784 values, which are the inputs to the models. For images with much higher resolutions, the size of the input vector to the model exceeds one million, or even ten million for 8K resolutions. The number of units in the fully connected hidden layers must necessarily increase, and can reach several thousand, with several million parameters to calculate.

Such models, with an excessive number of units, do not seem to be a suitable solution. The use of convolution layers in the Python script `04_convolution_CNN` provides satisfactory results for image classification in reasonable computation times.
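The growth in input size discussed above can be made concrete with a short calculation. The sketch assumes single-channel (grayscale) images and a first dense layer of 128 units, as in section 2; the resolutions for Full HD (1920 x 1080) and 8K (7680 x 4320) are standard values, and the helper name is hypothetical.

```python
def first_layer_params(height, width, channels=1, units=128):
    """Input vector size and parameter count of a first Dense layer with `units` neurons."""
    n_inputs = height * width * channels
    return n_inputs, n_inputs * units + units

resolutions = {"MNIST": (28, 28), "Full HD": (1080, 1920), "8K": (4320, 7680)}
for name, (height, width) in resolutions.items():
    n_inputs, n_params = first_layer_params(height, width)
    print(f"{name}: {n_inputs:,} inputs, {n_params:,} parameters")
```

Even with only 128 units, the first layer alone would need hundreds of millions of parameters for a Full HD input, which illustrates why fully connected layers scale poorly with image resolution.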

---


#### **8. References**

<br>[[6]](https://www.tensorflow.org/tutorials/quickstart/beginner?hl=fr): TensorFlow 2 quickstart for beginners <br>
<br>[[7]](https://www.tensorflow.org/tutorials/keras/classification?hl=fr): Basic classification: Classify images of clothing <br>
<br>[[8]](https://www.tensorflow.org/tutorials/keras/keras_tuner?hl=fr): Introduction to the Keras Tuner <br>
<br>[[9]](https://www.tensorflow.org/tutorials/keras/save_and_load?hl=fr): Save and load models <br>