<a href="https://colab.research.google.com/github/Benjamin-morel/TensorFlow/blob/main/01_classification_image.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



---


# Machine Learning Model: basic icon and character recognition

| | |
|------|------|
| Filename | 01_classification_image.ipynb |
| Author(s) | Benjamin Morel (benjaminmorel27@gmail.com) |
| Date | September 2, 2024 |
| Aim(s) | Build, train and evaluate a neural network machine learning model that classifies images. |
| Dataset(s) | Digit MNIST [[1]](https://www.kaggle.com/datasets/hojjatk/mnist-dataset) and Fashion MNIST [[2]](https://www.kaggle.com/datasets/zalando-research/fashionmnist) |
| Version | Python 3.12 - TensorFlow 2.17.0 |


<br> **!!Read before running!!** <br>
1. Fill in the inputs
2. GPU execution recommended.
3. Run all and read comments.

---



Image recognition is one possible application of **weak artificial intelligence** (AI trained for a specific task). Here, the AI is trained to sort data into several categories (multiclass classification). In this notebook, a **neural network** (NN) is built and trained to classify images of handwritten digits first and fashion images second.

To achieve this, the neural network adjusts its own internal parameters during the **training phase** so that it correctly classifies the images according to the **label** provided for each input. The network then goes through a test and **evaluation phase** in which it has to classify unknown images, similar to the ones it learned from during training, but without knowing the label. The prediction made by the network is finally compared with the label provided. These steps are shown in the code below.

## 0. Input section

The model has already been trained: the parameters (weights and biases) of each neuron are already known for the base dataset. The user can choose to keep these parameters and skip retraining (`'No'`), or repeat the training phase (`'Yes'`). The latter choice makes sense when the neural network needs to be updated against a newer dataset.

In [1]:
training_phase = 'Yes'

## 1. Import libraries & prebuilt dataset

Correctly labeled data from the **MNIST database** is used to train and test the network. Images are 28 pixels by 28 pixels and represent **handwritten digits**. Each pixel is assigned a value corresponding to a **gray level** on a gray scale from 0 to 255 (RGB code, where the three primary colors are equal).

In [2]:
pip install -q -U keras-tuner

In [3]:
import tensorflow as tf  # machine learning models
import numpy as np # scientific computing
import plotly.express as px # graphing packages
from plotly.subplots import make_subplots # make subplot graphs in plotly
import keras_tuner as kt

mnist = tf.keras.datasets.mnist # import MNIST dataset (70,000 handwritten digit images of 28x28 pixels)

In the MNIST database, the training and test sets are provided as multidimensional arrays (**tensors**):


*   `x_train` = 60,000 x 28 x 28 : pixel values (0 to 255) of 60,000 training images
*   `y_train` = 60,000 : label (0 to 9) of each training image
*   `x_test` = 10,000 x 28 x 28 : pixel values (0 to 255) of 10,000 test images
*   `y_test` = 10,000 : label (0 to 9) of each test image

The data are then pre-processed by **normalizing** them, so that the pixel values range between 0 and 1.
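The effect of this normalization can be checked on a small synthetic array before touching the real data. The sketch below is only an illustration: `fake_images` is a made-up stand-in for a batch of MNIST images, not part of the dataset.

```python
import numpy as np

# Synthetic stand-in for a batch of MNIST images: 4 images of 28x28 pixels,
# stored as unsigned 8-bit integers like the real dataset (assumption for illustration).
rng = np.random.default_rng(seed=0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Dividing by 255.0 promotes the array to float64 and maps [0, 255] onto [0, 1].
scaled = fake_images / 255.0

print(scaled.dtype, scaled.shape)
print(scaled.min() >= 0.0, scaled.max() <= 1.0)
```

The division changes only the value range and the data type; the shape of the array is left untouched.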

In [4]:
(x_train, y_train), (x_test, y_test) = mnist.load_data() # training + test tensor
x_train, x_test = x_train / 255.0, x_test / 255.0 # normalization: pixel values scaled to [0, 1]

# plot the first 10 training data
fig = px.imshow(x_train[:10, :, :], color_continuous_scale='gray_r', facet_col=0, binary_string=False)
for i, label in enumerate(y_train[:10]):
    fig.layout.annotations[i]['text'] = 'label: %d' % label
fig.update_layout(margin=dict(l=10, r=10, t=100, b=100), width=1000, height=300)
fig.update_yaxes(visible=False, showticklabels=False), fig.update_xaxes(visible=False, showticklabels=False), fig.update(layout_coloraxis_showscale=False)
fig.show()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 1s 0us/step


## 2. Build the neural network machine learning model

The *keras* module is used to easily define neural networks by describing them layer by layer. The model is named *model* and is described by a succession of layers (Sequential type):


*   Layer 1 `tf.keras.layers.Flatten` : reformats the data by converting a two-dimensional array (28x28) into a one-dimensional array (784 values)
*   Layer 2 `tf.keras.layers.Dense` : layer of 128 neurons. The ReLU function is used as the activation function
*   Layer 3 `tf.keras.layers.Dropout` : layer used to prevent overfitting. A dropout rate `DR` is defined and determines the probability of any given neuron being temporarily excluded from the neural network [[3]](https://www.scaler.com/topics/dropout-tensorflow/). At each training batch iteration, random neurons are deactivated according to the dropout rate. Therefore, the model must learn redundant representations and cannot rely on a few specific neurons for accurate predictions.
*   Layer 4 `tf.keras.layers.Dense` : output layer of 10 neurons (0, 1, 2..., 9) that returns a score for each digit, later converted into a probability.
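To make the dropout layer more concrete, here is a minimal NumPy sketch of "inverted" dropout, in which the surviving activations are rescaled by `1 / (1 - rate)` so that their expected value is unchanged. The `dropout` helper is a hypothetical name written for this illustration; it is not the Keras implementation.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(seed=0)
hidden = np.ones((1000, 128))            # pretend output of a 128-neuron layer
dropped = dropout(hidden, rate=0.2, rng=rng)

zero_fraction = np.mean(dropped == 0.0)
print(f"fraction of zeroed units: {zero_fraction:.3f}")
print(f"mean activation after dropout: {dropped.mean():.3f}")
```

With a rate of 0.2, roughly one unit in five is zeroed on each call, and the mean activation stays close to its original value. Dropout is only active during training; at evaluation time all neurons are used.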



In [5]:
DR = 0.2

model = tf.keras.models.Sequential()
model.add(tf.keras.Input(shape=(28, 28))) # input: 28x28 grayscale image
model.add(tf.keras.layers.Flatten()) # flatten the image into a 784-element vector
model.add(tf.keras.layers.Dense(128, activation='relu')) # hidden layer
model.add(tf.keras.layers.Dropout(DR))
model.add(tf.keras.layers.Dense(10)) # output (or classification) layer
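The layer stack above is small enough that its number of trainable parameters can be counted by hand. The sketch below does that arithmetic in plain Python; `dense_params` is a helper introduced only for this illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

flatten_outputs = 28 * 28                            # Flatten has no trainable parameters
hidden_params = dense_params(flatten_outputs, 128)   # 784 inputs -> 128 neurons
output_params = dense_params(128, 10)                # Dropout adds no parameters either
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)    # 100480 1290 101770
```

Almost all of the parameters sit in the hidden layer, because every one of the 784 pixels is connected to every one of the 128 neurons.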



## 3. Train the model

The neural network **adjusts** and **optimizes** the thousands of parameters attached to the inputs of each neuron. **Parameter optimization** is based on minimizing the loss function `SparseCategoricalCrossentropy`, which computes the cross-entropy loss between the true labels and the predicted labels. Optimization is performed with *Adam*, a variant of the **stochastic gradient descent algorithm**.
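As an illustration of what this loss measures, the following NumPy sketch computes a sparse categorical cross-entropy from raw scores (logits), averaged over the batch. It is a simplified re-implementation written for this explanation, not the TensorFlow code, and the example logits are made up.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, logits):
    """Mean cross-entropy between integer labels and raw logits."""
    # Log-softmax computed in a numerically stable way.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability assigned to the true class of each sample.
    picked = log_probs[np.arange(len(labels)), labels]
    return -picked.mean()

labels = np.array([0, 2])
uniform_logits = np.zeros((2, 3))                                   # no preference
confident_logits = np.array([[10.0, 0.0, 0.0], [0.0, 0.0, 10.0]])   # correct and confident

uniform_loss = sparse_categorical_crossentropy(labels, uniform_logits)
confident_loss = sparse_categorical_crossentropy(labels, confident_logits)
print(uniform_loss)     # ln(3), about 1.0986
print(confident_loss)   # close to 0
```

A model that hesitates equally between three classes gets a loss of ln(3), while a model that is confident and correct gets a loss close to zero. Training pushes the parameters toward the second situation.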

In [None]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy']) # use a stochastic gradient descent to minimize the loss function
model.fit(x_train, y_train, epochs=5)
model.summary()

Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 20s 9ms/step - accuracy: 0.8594 - loss: 0.4853
Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 19s 9ms/step - accuracy: 0.9557 - loss: 0.1511
Epoch 3/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 13s 5ms/step - accuracy: 0.9671 - loss: 0.1111
Epoch 4/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.9726 - loss: 0.0892
Epoch 5/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 11s 3ms/step - accuracy: 0.9773 - loss: 0.0721


## 4. Evaluate the model



**Comments**
<br> Once the model has been trained, it can be tested and **evaluated** by comparing the predictions it makes on the test set with the true labels. It is also possible to study the probabilities it associates with each digit for a particular image. To do this, the raw values (logits) returned by the output layer are converted into probabilities using the ***softmax* function**. Each digit receives a probability, and the model selects the digit with the **highest probability**.
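The softmax conversion can be sketched in a few lines of NumPy. The `softmax` function and the example `logits` below are illustrative assumptions, not values produced by the trained model.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

logits = np.array([1.0, 3.0, 0.5, -2.0])   # made-up scores for 4 classes
probs = softmax(logits)

print(probs.round(3), probs.sum())
print("predicted class:", int(np.argmax(probs)))
```

Softmax preserves the ranking of the scores, so the class with the largest logit is also the class with the highest probability.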

In [None]:
test_loss, test_acc = model.evaluate(x_test,  y_test, verbose=1)
print(" --------------------------------------------- \n", 100*round(test_acc, 3) , "% of the test set is correctly predicted \n", "---------------------------------------------\n")
probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])

313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9738 - loss: 0.0852
 --------------------------------------------- 
 97.8 % of the test set is correctly predicted 
 ---------------------------------------------



**Comments**
<br> It is possible to explore the different inputs of the test set and check what the model has predicted. To do this, the user enters the index `i` of the image whose prediction is of interest. The probabilities calculated by the model are displayed in a bar chart. The model selects the digit with the highest probability. <br>

In [None]:
i = 1

In [None]:
def plot_value_array(i, predictions_array, true_label): # plot a bar chart with the probability value computed according to the label
  fig = px.bar(100*predictions_array, width=800, height=400, text_auto='.2f')
  fig.update_xaxes(title="Digit", tickvals=np.arange(10), ticktext=["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]), fig.update_layout(showlegend=False), fig.update_yaxes(title="Probability (%)")
  fig.show()

predictions = probability_model.predict(x_test) # class probabilities (softmax output) for every test sample
predictions[i] # probabilities predicted for image i
prediction_label = np.argmax(predictions[i]) # get the label of the max probability
true_label = y_test[i]
print("Prediction: ", prediction_label)
print("Label: ", true_label)

if prediction_label == true_label:
  print("\nGREAT!!")
else:
  print("\nBad prediction!")

plot_value_array(i, predictions[i],  true_label)

313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step
Prediction:  2
Label:  2

GREAT!!


## 5. Go further with the Fashion MNIST dataset

**Comments**
<br> Now, the neural network has to classify 28x28 labeled **clothing images** from the Fashion MNIST database, which contains 70,000 of them. There are 10 different items:
*   T-shirt (label value = 0)
*   Trouser (=1)
*   Pullover (=2)
*   Dress (=3)
*   Coat (=4)
*   Sandal (=5)
*   Shirt (=6)
*   Sneaker (=7)
*   Bag (=8)
*   Ankle boot (=9)
<br>

In [None]:
fashion_mnist = tf.keras.datasets.fashion_mnist # import fashion MNIST dataset

(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data() # create training and test tensors
train_images, test_images = train_images / 255.0, test_images / 255.0 # pre-processing training and test data

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot'] # clothing labels present in the dataset

fig = px.imshow(train_images[:12, :, :], color_continuous_scale='gray_r', facet_col=0, binary_string=False) # plot an example of the first 12 training data
for i, label in enumerate(train_labels[:12]): # label of the training data
    fig.layout.annotations[i]['text'] = '%s' % class_names[label]
fig.update_layout(margin=dict(l=10, r=10, t=100, b=100), width=1000, height=300)
fig.update_yaxes(visible=False, showticklabels=False), fig.update_xaxes(visible=False, showticklabels=False), fig.update(layout_coloraxis_showscale=False)
fig.show()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
29515/29515 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
26421880/26421880 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
5148/5148 ━━━━━━━━━━━━━━━━━━━━ 0s 1us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
4422102/4422102 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step


**Comments**
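<br> The loss function is the same as in the previous example, but the model is no longer fixed: the number of neurons in the hidden layer and the learning rate are tuned with Keras Tuner. The training phase is therefore longer than the previous one. The test phase then computes the accuracy of the neural network.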

In [None]:
def model_builder(hp):
  model = tf.keras.Sequential()
  model.add(tf.keras.Input(shape=(28, 28))) # input: 28x28 grayscale image
  model.add(tf.keras.layers.Flatten())

  # Tune the number of units in the first Dense layer
  # Choose an optimal value between 32-512
  hp_units = hp.Int('units', min_value=32, max_value=512, step=32)
  model.add(tf.keras.layers.Dense(units=hp_units, activation='relu'))
  model.add(tf.keras.layers.Dense(10))

  # Tune the learning rate for the optimizer
  # Choose an optimal value from 0.01, 0.001, or 0.0001
  hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])

  model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=hp_learning_rate),
                loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['accuracy'])

  return model

In [None]:
tuner = kt.Hyperband(model_builder,
                     objective='val_accuracy',
                     max_epochs=10,
                     factor=3,
                     directory='my_dir',
                     project_name='intro_to_kt')


In [None]:
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)
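The idea behind the `EarlyStopping` callback above is to stop training once the monitored value has not improved for `patience` consecutive epochs. The plain-Python sketch below illustrates that logic in a simplified form; `early_stopping_epoch` is a hypothetical helper and the loss values are made up, so this is not the Keras implementation.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0  # epochs elapsed since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

losses = [1.00, 0.80, 0.90, 0.85, 0.95]           # made-up validation losses
print(early_stopping_epoch(losses, patience=2))   # 4
print(early_stopping_epoch(losses, patience=5))   # None
```

With `patience=2`, the best loss is reached at epoch 2 and training stops at epoch 4, after two epochs without improvement. A larger patience tolerates longer plateaus at the cost of more training time.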

In [None]:
tuner.search(train_images, train_labels, epochs=20, validation_split=0.2, callbacks=[stop_early])

# Get the optimal hyperparameters
best_hps=tuner.get_best_hyperparameters(num_trials=1)[0]

print(f"""
The hyperparameter search is complete. The optimal number of units in the first densely-connected
layer is {best_hps.get('units')} and the optimal learning rate for the optimizer
is {best_hps.get('learning_rate')}.
""")

Trial 30 Complete [00h 01m 14s]
val_accuracy: 0.8600833415985107

Best val_accuracy So Far: 0.8914999961853027
Total elapsed time: 00h 19m 48s

The hyperparameter search is complete. The optimal number of units in the first densely-connected
layer is 480 and the optimal learning rate for the optimizer
is 0.001.
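To put the search above in perspective, the size of the search space defined in `model_builder` can be enumerated directly. The sketch assumes that the upper bound of `hp.Int` is inclusive; the variable names are introduced only for this illustration.

```python
from itertools import product

# Candidate values declared in model_builder (upper bound assumed inclusive).
unit_choices = list(range(32, 512 + 1, 32))
learning_rates = [1e-2, 1e-3, 1e-4]

search_space = list(product(unit_choices, learning_rates))
print(len(unit_choices), len(learning_rates), len(search_space))  # 16 3 48
```

The log above reports 30 trials, so Hyperband samples configurations from these 48 combinations and trains the more promising ones for longer, rather than training every combination in full.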



In [None]:
# Build the model with the optimal hyperparameters and train it on the data for 30 epochs
model = tuner.hypermodel.build(best_hps)
history = model.fit(train_images, train_labels, epochs=30, validation_split=0.2)

val_acc_per_epoch = history.history['val_accuracy']
best_epoch = val_acc_per_epoch.index(max(val_acc_per_epoch)) + 1
print('Best epoch: %d' % (best_epoch,))

In [None]:
hypermodel = tuner.hypermodel.build(best_hps)

# Retrain the model for the best number of epochs found above
hypermodel.fit(train_images, train_labels, epochs=best_epoch, validation_split=0.2)

In [None]:
test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=1)
print(" --------------------------------------------- \n", 100*round(test_acc, 3) , "% of the test set is correctly predicted \n", "---------------------------------------------\n")
probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])
model.summary()

In [None]:
model.compile(optimizer='adam', loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=10)
test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=1)
print(" --------------------------------------------- \n", 100*round(test_acc, 3) , "% of the test set is correctly predicted \n", "---------------------------------------------\n")
probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])
model.summary()

**Comments**
<br> It is possible to explore the different inputs of the test set and check what the model has predicted. To do this, the user enters the index `i` of the image whose prediction is of interest. The probabilities calculated by the model are displayed in a bar chart. The model selects the outfit with the highest probability. Try `i=12` to study a bad prediction case. <br>


In [None]:
i = 12

In [None]:
def plot_value_array(i, predictions_array, true_label): # plot a bar chart with the probability value computed according to the label
  true_label = true_label[i]
  fig = px.bar(100*predictions_array, width=800, height=400)
  fig.update_xaxes(title="Outfit", tickvals=np.arange(10), ticktext=class_names), fig.update_layout(showlegend=False), fig.update_yaxes(title="Probability (%)")
  fig.show()

predictions = probability_model.predict(test_images) # class probabilities (softmax output) for every test sample
predictions[i] # probabilities predicted for image i
prediction_label = np.argmax(predictions[i]) # get the label of the max probability
true_label = test_labels[i]
print("Prediction: ", class_names[prediction_label] , "(",round(100*np.max(predictions[i]),1), "%)")
print("Label: ", class_names[true_label])

if prediction_label == true_label:
  print("\nGREAT!!")
else:
  print("\nBad prediction!")

plot_value_array(i, predictions[i],  test_labels)

**Comments**
<br> Finally, to analyze the neural network more critically, the confusion matrix is calculated to identify the confusions that require further training. Here, T-shirt, shirt and pullover are three garments that are sometimes confused, because their shapes are very similar. It would be a good idea to perform a second training phase for these garments by adding features that distinguish them (sleeve size, shirt buttons, texture, etc.).

In [None]:
confusion_mat = tf.math.confusion_matrix(test_labels,np.argmax(predictions, axis=1))
fig = px.imshow(confusion_mat, x = class_names, y = class_names, text_auto=True)
fig.update(layout_coloraxis_showscale=False)
fig.show()
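The confusion matrix computed above counts, for each true class (row), how many samples were assigned to each predicted class (column). The NumPy sketch below reproduces that counting on a tiny made-up example with three classes; the `confusion_matrix` helper and the label arrays are illustrative assumptions.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    # Add 1 at (true, predicted) for every sample, including repeated pairs.
    np.add.at(matrix, (true_labels, predicted_labels), 1)
    return matrix

true_labels = np.array([0, 0, 1, 1, 2, 2, 2])        # made-up ground truth
predicted_labels = np.array([0, 1, 1, 1, 2, 0, 2])   # made-up predictions
matrix = confusion_matrix(true_labels, predicted_labels, num_classes=3)
print(matrix)

# Per-class recall: correct predictions divided by the number of samples of that class.
recall = matrix.diagonal() / matrix.sum(axis=1)
print(recall.round(2))
```

Off-diagonal entries are the confusions: a large value in row "Shirt" and column "T-shirt/top", for example, would mean that many shirts were classified as T-shirts.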

## 6. Go further again...

**Comments**
<br> A final legitimate question is: “Would doubling the number of hidden layers increase the accuracy of the model for the same training set?” To answer it, the same Fashion MNIST dataset is used and the neural network is modified to add a second hidden layer of 128 neurons. Computation time increases by 9% compared to the single-hidden-layer model, while accuracy improves by only 0.35%. Adding more layers of neurons is therefore not a worthwhile solution in this case.

In [None]:
DR = 0.2
model = tf.keras.models.Sequential() # stack layers - each layer has 1 input tensor and 1 output tensor
model.add(tf.keras.Input(shape=(28, 28))) # input: 28x28 grayscale image
model.add(tf.keras.layers.Flatten()) # flatten the image into a 784-element vector
model.add(tf.keras.layers.Dense(128, activation='relu')) # hidden layer
model.add(tf.keras.layers.Dense(128, activation='relu')) # hidden layer
model.add(tf.keras.layers.Dropout(DR))
model.add(tf.keras.layers.Dense(10)) # classification layer

model.compile(optimizer='adam', loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=10)
test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=1)
print(" --------------------------------------------- \n", 100*round(test_acc, 3) , "% of the test set is correctly predicted \n", "---------------------------------------------\n")