# Traffic signs detection and classification with Detecto and Tensorflow 

### Part 3 - *Classification*

All the functions and visualizations I used here can be found on my GitHub page: [https://github.com/alexisvannaire/GTSRB_detect-and-predict](https://github.com/alexisvannaire/GTSRB_detect-and-predict)

See the first part (1_detect-and-predict) if you need details on the packages I used.

**For this notebook:**

In [None]:
# libraries
import os
import json
from functools import partial
from PIL import Image
import numpy as np
import pandas as pd
import keras
import tensorflow as tf

## python files
import plots # plots.py
import process_data # process_data.py
import calculations # calculations.py

In [None]:
# False for default visualizations and computations
gtsrb_exists = False # True if you've placed the gtsrb dataset in the "./data/gtsrb/" folder

In [None]:
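# variables
n_classes = 43
seed = 123  # fixed seed for the train/validation split (illustrative value; any integer works if used consistently)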

# IV. Classification Model

## 1. Models and preprocessing

For the classification task, we're going to train two types of models: standard CNNs and MobileNets.

### i) standard CNNs

A CNN (Convolutional Neural Network) is a type of neural network that uses convolutional layers to extract features from the data while reducing its dimensionality.
CNNs are widely used in image classification tasks.

In practice, a CNN is made of successive blocks, each composed of one convolutional layer and one pooling layer. The convolution creates feature maps and the pooling reduces the dimensionality.

So the idea is that we start with large images ("high" dimensions in terms of height and width), then as we progress through the layers we reduce the image size and increase the depth. At the end, when we flatten the last feature map, we get a feature vector which will be used to classify images through a classical neural network (fully connected layer). 

Here's how you can create CNN models with TensorFlow:

In [None]:
def init_simple_CNN_model(batch_size, img_height, img_width, n_classes):
    # batch_size is not part of the input shape: Keras adds the batch dimension itself
    model = tf.keras.Sequential([
        # Input: RGB images of size img_height x img_width
        tf.keras.Input(shape=(img_height, img_width, 3)),
        # First conv layer: 64 filters with size 3x3
        tf.keras.layers.Conv2D(filters=64, kernel_size=3, padding="same", activation="relu", kernel_initializer="he_normal"),
        tf.keras.layers.MaxPool2D(),
        # Second layer: 128 filters with size 3x3
        tf.keras.layers.Conv2D(filters=128, kernel_size=3, padding="same", activation="relu", kernel_initializer="he_normal"),
        tf.keras.layers.MaxPool2D(),
        # Third layer: 256 filters with size 3x3
        tf.keras.layers.Conv2D(filters=256, kernel_size=3, padding="same", activation="relu", kernel_initializer="he_normal"),
        tf.keras.layers.MaxPool2D(),
        # Flatten layer: convert the feature maps into a vector
        tf.keras.layers.Flatten(),
        # The fully connected layers (128->64->n_classes) classify the features into the classes we want
        tf.keras.layers.Dense(units=128, activation="relu", kernel_initializer="he_normal"),
        tf.keras.layers.Dense(units=64, activation="relu", kernel_initializer="he_normal"),
        # The output layer: needs one unit per class
        tf.keras.layers.Dense(units=n_classes, activation="softmax")
    ])
    return model
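To make the "smaller but deeper" idea concrete, here is a small pure-Python sketch that tracks the feature-map shape through a model like the one above for a 30x30 input. It assumes three conv + max-pooling blocks with 64, 128 and 256 filters, "same" padding (the convolution keeps height and width) and Keras' default `MaxPool2D` (2x2 window, stride 2, no padding, so each side is halved and rounded down). The helper `feature_map_shapes` is illustrative only.

```python
def feature_map_shapes(height, width, filters=(64, 128, 256), pool=2):
    """Track (height, width, depth) through conv ('same' padding) + max-pooling blocks."""
    shapes = [(height, width, 3)]  # RGB input
    for n_filters in filters:
        # 'same' padding keeps the spatial size; the depth becomes the number of filters
        shapes.append((height, width, n_filters))
        # 2x2 max-pooling with stride 2 ('valid' padding) halves each side, rounding down
        height, width = height // pool, width // pool
        shapes.append((height, width, n_filters))
    return shapes


shapes = feature_map_shapes(30, 30)
for shape in shapes:
    print(shape)

# Flatten turns the last feature map into a single vector
h, w, d = shapes[-1]
print("flattened vector length:", h * w * d)
```

For 60x60 and 90x90 inputs the last feature map becomes 7x7x256 and 11x11x256, i.e. flattened vectors of 12,544 and 30,976 values, which mostly enlarges the first Dense layer.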

I trained 3 CNNs with the same architecture but with different input shapes.

As we've seen, most images are roughly square, with sides between 30 and 100 pixels.
So I trained the first one on images resized to 30x30 pixels, the second on images resized to 60x60 pixels and the last on images resized to 90x90 pixels (for 50 epochs each).

Then I added Data Augmentation, used these models as a starting point and trained them for more epochs. 

Here's how I named them:

* CNN_30-30_50e
* CNN_60-60_50e
* CNN_90-90_50e
* CNN_30-30_50e_DA-50e
* CNN_60-60_50e_DA-50e
* CNN_90-90_50e_DA-30e

The architecture I used is this one:

In [None]:
def init_CNN_model(batch_size, img_height, img_width, n_classes):
    # batch_size is not part of the input shape: Keras adds the batch dimension itself
    model = tf.keras.Sequential([
        # Input: RGB images of size img_height x img_width
        tf.keras.Input(shape=(img_height, img_width, 3)),
        # Rescaling images: to get pixel values between 0 and 1 instead of 0 and 255 
        tf.keras.layers.Rescaling(1./255),
        # First conv layer: 64 filters with size 7x7
        tf.keras.layers.Conv2D(filters=64, kernel_size=7, padding="same", activation="relu", kernel_initializer="he_normal"),
        tf.keras.layers.MaxPool2D(),
        # Second layer: 2 conv. layers with 128 filters with size 3x3 
        tf.keras.layers.Conv2D(filters=128, kernel_size=3, padding="same", activation="relu", kernel_initializer="he_normal"),
        tf.keras.layers.Conv2D(filters=128, kernel_size=3, padding="same", activation="relu", kernel_initializer="he_normal"),
        tf.keras.layers.MaxPool2D(),
        # Third layer: 2 conv. layers with 256 filters with size 3x3 
        tf.keras.layers.Conv2D(filters=256, kernel_size=3, padding="same", activation="relu", kernel_initializer="he_normal"),
        tf.keras.layers.Conv2D(filters=256, kernel_size=3, padding="same", activation="relu", kernel_initializer="he_normal"),
        tf.keras.layers.MaxPool2D(),
        # Flatten layer: convert the feature maps into a vector
        tf.keras.layers.Flatten(),
        # The fully connected layers (128->64->n_classes) classify the features into the classes we want
        tf.keras.layers.Dense(units=128, activation="relu", kernel_initializer="he_normal"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(units=64, activation="relu", kernel_initializer="he_normal"),
        tf.keras.layers.Dropout(0.5),
        # The output layer: needs one unit per class
        tf.keras.layers.Dense(units=n_classes, activation="softmax")
    ])
    return model

In [None]:
batch_size = 32
img_height, img_width = 30, 30
cnn1 = init_CNN_model(batch_size, img_height, img_width, n_classes)

Compared with the simple model shown earlier, the changes are as follows:

* There's a rescaling layer that normalizes values to the interval $[0, 1]$ (neural networks train better with inputs in the $[0, 1]$ or $[-1, 1]$ range). Of course, you could also do this directly on your data instead.
* There are two convolutional layers before the pooling layer in blocks 2 and 3.
* I added Dropout layers in the fully connected part.
  During training, a Dropout layer randomly sets a fraction of its inputs (here 50%) to zero at each step.
  This helps prevent the model from overfitting.
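A minimal NumPy sketch of what a Dropout layer does, assuming the "inverted dropout" scheme Keras documents: during training each value is zeroed with probability `rate` and the surviving values are scaled by `1 / (1 - rate)` so that the expected sum is unchanged; at inference the input passes through untouched. The function `dropout` is illustrative, not the Keras implementation.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of the values, during training only."""
    if not training:
        return x  # inference: the layer behaves as an identity
    keep_mask = rng.random(x.shape) >= rate  # each unit survives with probability 1 - rate
    return np.where(keep_mask, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
x = np.ones((1, 10))
print(dropout(x, 0.5, training=False, rng=rng))  # unchanged input
print(dropout(x, 0.5, training=True, rng=rng))   # a mix of 0.0 and 2.0 values
```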

### ii) MobileNets

MobileNet models were developed to make computer vision models usable in mobile and embedded applications.

Indeed, you can build a very accurate model that has so many parameters that its predictions are slow and it needs more memory than a phone can handle.

*"MobileNets are based on a streamlined architecture that uses depth-wise separable convolutions to build light weight deep neural networks."*

For more details you can see this article: https://arxiv.org/abs/1704.04861
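To get a feel for why depth-wise separable convolutions are lighter, here is a pure-Python sketch comparing weight counts (biases ignored) for one layer mapping M input channels to N output channels with a K x K kernel. A standard convolution needs K·K·M·N weights; a depth-wise separable one splits it into a K x K depth-wise step (K·K·M weights, one filter per input channel) and a 1x1 point-wise step (M·N weights). The ratio between the two is 1/N + 1/K², the same reduction factor the paper derives for computational cost. The layer sizes chosen below are illustrative.

```python
def standard_conv_weights(k, m, n):
    # one k x k x m kernel per output channel
    return k * k * m * n

def separable_conv_weights(k, m, n):
    depthwise = k * k * m  # one k x k filter per input channel
    pointwise = m * n      # 1x1 convolution mixing the channels
    return depthwise + pointwise

k, m, n = 3, 256, 256
std = standard_conv_weights(k, m, n)
sep = separable_conv_weights(k, m, n)
print(std, sep, round(sep / std, 4))  # the ratio equals 1/n + 1/k**2
```

With 3x3 kernels the separable version needs roughly 8 to 9 times fewer weights, which is where most of MobileNet's savings come from.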
    
Here you can see its architecture:

In [None]:
Image.open("imgs/mobilenet_architecture.png").convert("RGB")

You can easily get this model with TensorFlow:

In [None]:
mobile_net = tf.keras.applications.mobilenet.MobileNet(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

You can then choose whether the pre-trained weights should be trainable or not.

But most of the time, you should first train only the fully connected layers you add on top, before thinking about the pre-trained layers. 
MobileNet is an optimized pre-trained model that can already extract relevant features from your images. 
So at first, just train the fully connected layers: this tells you whether combining these features is enough to predict the right classes.

To do so:

In [None]:
for layer in mobile_net.layers:
    layer.trainable=False

Then, you can add your fully connected layer:

In [None]:
def init_mobilenet_model(mobile_net, n_classes):

    mobile_net_1 = keras.Sequential([
        mobile_net, # The MobileNet model (without the top, because of the 'include_top=False' we provided in its definition)
        # Global average pooling: one average per feature map, returned as a flat vector
        keras.layers.GlobalAveragePooling2D(),
        # Fully connected layer with dropouts
        keras.layers.Dense(1024, activation='relu'),
        keras.layers.Dropout(0.5),
        keras.layers.Dense(512, activation='relu'),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(n_classes, activation='softmax')
    ])
    return mobile_net_1

I trained 3 models:

* the first one for 100 epochs, with only the fully connected layers trainable;
* the second one starting from the first, with some of the MobileNet layers also trainable, for another 20 epochs;
* the last one starting from the second, with Data Augmentation added, for another 50 epochs.

Here are their names:

* MobileNet_224-224_100e
* MobileNet_224-224_120e
* MobileNet_224-224_120e_DA-50e

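To make some of the pre-trained layers trainable, just set their `trainable` attribute before you initialize the model. The snippet below makes the first 61 MobileNet layers trainable and keeps the others frozen (many fine-tuning recipes do the opposite and unfreeze the last layers, which hold the most task-specific features):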

In [None]:
mobile_net = tf.keras.applications.mobilenet.MobileNet(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

trainable_index = 61
for layer in mobile_net.layers[:trainable_index]:
    layer.trainable=True
for layer in mobile_net.layers[trainable_index:]:
    layer.trainable=False
    
model = init_mobilenet_model(mobile_net, n_classes)

### iii) Datasets

Now we're going to see how to create training and validation sets.

As the GTSRB dataset is already structured for learning we're going to use `tf.keras.utils.image_dataset_from_directory`.

This function creates a set just by using the folder structure.

The `Train` folder contains a list of folders corresponding to each class where images are stored:

```
gtsrb
│ 
├── Train
│   ├── 0
│   │   ├── 00000_00000_00000.png
│   │   ├── 00000_00000_00001.png
│   │   ├── ...
│   │   └── 00000_00006_00029.png
│   ├── 1
│   ├── 2
│   ├── ...
│   └── 42
├── Test
│   ├── 00000.png
│   ├── 00001.png
│   ├── ...
│   └── 12629.png
├── Meta
│   ├── 0.png
│   ├── 1.png
│   ├── ...
│   └── 42.png
├── GT-final_test.csv
├── Meta.csv
├── Test.csv
└── Train.csv
```

(I moved the `GT-final_test.csv` file out of the Test folder to make the test part easier to handle.)

It's important to keep the class names in the order you want, so you don't mix up the predicted classes later: label index `i` in the dataset corresponds to `class_names[i]`.

In [None]:
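if gtsrb_exists:
    data_dir = "data/gtsrb/"
    train_data_dir = data_dir+"Train/"
    test_data_dir = data_dir+"Test/"

    # sort the class folders numerically ("0", "1", ..., "42") rather than as strings,
    # so that label index i corresponds to class i (hidden files such as .DS_Store are skipped)
    class_names_int = sorted(int(name) for name in os.listdir(train_data_dir) if name.isdigit())
    class_names = [str(i) for i in class_names_int]
    n_classes = len(class_names)
else:
    class_names = None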
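Why the explicit ordering matters: when `class_names` is not given, `image_dataset_from_directory` orders the subfolders alphanumerically, i.e. as strings. A quick pure-Python check shows how that differs from numeric order for GTSRB-style folder names, in which case label index `i` would no longer correspond to class `i`.

```python
folders = [str(i) for i in range(43)]  # GTSRB-style subfolder names "0" ... "42"

as_strings = sorted(folders)                                # alphanumeric order
as_numbers = [str(i) for i in sorted(map(int, folders))]    # numeric order

print(as_strings[:5])  # ['0', '1', '10', '11', '12']
print(as_numbers[:5])  # ['0', '1', '2', '3', '4']

# with alphanumeric order, label index 2 would actually mean folder "10"
print("label 2 ->", as_strings[2])
```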

Then, you just have to give the folder path of your dataset and choose some parameters:

In [None]:
if gtsrb_exists:
    train_ds = tf.keras.utils.image_dataset_from_directory(
        "data/gtsrb/Train/",
        label_mode='int', # labels are encoded as integers
        class_names=class_names, # list of class names (must match the names of the subdirectories)
        batch_size=batch_size, # number of images per batch
        image_size=(img_height, img_width), # size to resize images to after they are read from disk
        seed=seed, # seed, to ensure reproducibility (~ random_state)
        validation_split=0.2, # percentage of data you want in your validation set
        subset="training", # for training set creation
        crop_to_aspect_ratio=False # if True, resize the images without aspect ratio distortion
    )

    val_ds = tf.keras.utils.image_dataset_from_directory(
        "data/gtsrb/Train/",
        label_mode='int', # labels are encoded as integers
        class_names=class_names, # list of class names (must match the names of the subdirectories)
        batch_size=batch_size, # number of images per batch
        image_size=(img_height, img_width), # size to resize images to after they are read from disk
        seed=seed, # seed, to ensure reproducibility (~ random_state)
        validation_split=0.2, # percentage of data you want in your validation set
        subset="validation", # for validation set creation
        crop_to_aspect_ratio=False # if True, resize the images without aspect ratio distortion
    )

The split is only valid if you give the same seed (and the same `validation_split`) to both calls.
Otherwise, some images may end up in both sets while others end up in neither.
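A pure-Python sketch of why the seeds must match, assuming a "shuffle the file list with the seed, then slice it" scheme, which is the general idea behind the `seed`/`validation_split` pair (this is not Keras' code, and `split_files` is a hypothetical helper). Each call shuffles independently; with the same seed, both calls see the same order, so the two slices are complementary.

```python
import random

def split_files(files, validation_split, subset, seed):
    """Hypothetical helper: shuffle a copy of the file list with `seed`, then slice it."""
    files = list(files)
    random.Random(seed).shuffle(files)
    n_val = int(validation_split * len(files))
    return files[:-n_val] if subset == "training" else files[-n_val:]

files = [f"img_{i:03d}.png" for i in range(100)]

# same seed in both calls: disjoint sets that together cover every file
train = split_files(files, 0.2, "training", seed=123)
val = split_files(files, 0.2, "validation", seed=123)
print(len(train), len(val), len(set(train) & set(val)))  # 80 20 0

# different seeds: the two slices come from different shuffles
val_other = split_files(files, 0.2, "validation", seed=7)
print("overlap with other seed:", len(set(train) & set(val_other)))
print("files in neither set:", len(set(files) - set(train) - set(val_other)))
```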

And if you prefer (TensorFlow 2.10 or later), you can also create both sets in a single call with `subset="both"`:

In [None]:
if gtsrb_exists:
    train_ds, val_ds = tf.keras.utils.image_dataset_from_directory(
        "data/gtsrb/Train/",
        label_mode='int', # labels are encoded as integers
        class_names=class_names, # list of class names (must match the names of the subdirectories)
        batch_size=batch_size, # number of images per batch
        image_size=(img_height, img_width), # size to resize images to after they are read from disk
        seed=seed, # seed, to ensure reproducibility (~ random_state)
        validation_split=0.2, # percentage of data you want in your validation set
        subset="both", # for training and validation sets creation
        crop_to_aspect_ratio=False # if True, resize the images without aspect ratio distortion
    )

(For more details you can take a look here: https://www.tensorflow.org/api_docs/python/tf/keras/utils/image_dataset_from_directory)

That's it! Let's see how to train a model now.

## 2. Training

Training the models is as simple as calling the `compile` and `fit` methods.

But there are a lot of parameters you can tune that can have a strong effect on the quality of the training.

### i) compilation

When you compile a model you configure it for training.

You have to choose an optimizer, which is an algorithm that optimizes model weights to minimize the loss function during the training.

Here are the most commonly used ones:

* SGD (Stochastic Gradient Descent)
* AdaGrad (Adaptive Gradient Descent)
* RMS-Prop (Root Mean Square Propagation)
* AdaDelta
* Adam (Adaptive Moment Estimation)
* Nadam (Adam with Nesterov momentum)
* Ftrl (Follow The Regularized Leader)

Here's the corresponding string list: `'sgd'`, `'adagrad'`, `'rmsprop'`, `'adadelta'`, `'adam'`, `'nadam'`, `'ftrl'`.
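To illustrate what an optimizer does, here is a NumPy sketch (not the Keras implementation) of two update rules on a toy one-parameter loss L(w) = (w - 3)², whose gradient is 2(w - 3). SGD steps against the gradient with a fixed learning rate; Adam rescales each step using running averages of the gradient and of its square. The beta and epsilon values match Keras' Adam defaults; the learning rates and step counts are picked for this toy problem only.

```python
import numpy as np

def grad(w):
    return 2.0 * (w - 3.0)  # derivative of L(w) = (w - 3)**2

# Plain (stochastic) gradient descent: w <- w - lr * gradient
w_sgd, lr = 0.0, 0.1
for _ in range(100):
    w_sgd -= lr * grad(w_sgd)

# Adam: adaptive step from bias-corrected moment estimates
w_adam, lr, beta1, beta2, eps = 0.0, 0.1, 0.9, 0.999, 1e-7
m = v = 0.0
for t in range(1, 501):
    g = grad(w_adam)
    m = beta1 * m + (1 - beta1) * g        # running mean of gradients
    v = beta2 * v + (1 - beta2) * g ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)           # bias corrections for the zero initialisation
    v_hat = v / (1 - beta2 ** t)
    w_adam -= lr * m_hat / (np.sqrt(v_hat) + eps)

print(round(w_sgd, 4), round(float(w_adam), 4))  # both end up close to the minimum at w = 3
```

Nadam, used later in this notebook, is Adam with a Nesterov-style look-ahead on the momentum term.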

<br>

You also have to choose the loss, which defines how the errors are computed; it's the quantity the optimizer optimizes (usually minimizes).

This choice depends on your task, for example if it's a:

* **Regression task:** 
    + Mean Squared Error
    + Mean Absolute Error
    + Log-Cosh Loss
* **Binary classification:**
    + Binary Cross-Entropy
    + Hinge Loss
    + Squared Hinge Loss
* **Multi-class Classification (which is our case):**
    + Categorical Cross-Entropy
    + Sparse Categorical Cross-Entropy
    + Kullback-Leibler Divergence
    
Here's the corresponding string list: `'mean_squared_error'`, `'mean_absolute_error'`, `'logcosh'`, `'binary_crossentropy'`, `'hinge'`, `'squared_hinge'`, `'categorical_crossentropy'`, `'sparse_categorical_crossentropy'`, `'kullback_leibler_divergence'`.
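Since our datasets are built with `label_mode='int'`, each label is a single integer, which is what `'sparse_categorical_crossentropy'` expects; `'categorical_crossentropy'` expects one-hot vectors instead. Both compute the same value, minus the log of the probability given to the true class, as this NumPy sketch (not the Keras code) shows on a made-up 3-class batch.

```python
import numpy as np

probs = np.array([[0.7, 0.2, 0.1],   # softmax outputs for 2 images, 3 classes
                  [0.1, 0.3, 0.6]])
labels = np.array([0, 2])            # integer labels, as with label_mode='int'

# sparse version: pick the probability of the true class directly
sparse_ce = -np.log(probs[np.arange(len(labels)), labels]).mean()

# categorical version: same labels, one-hot encoded
one_hot = np.eye(probs.shape[1])[labels]
categorical_ce = -(one_hot * np.log(probs)).sum(axis=1).mean()

print(sparse_ce, categorical_ce)  # identical values
```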

<br>

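Then, you have to choose the metrics, which define how the performance of the model is measured.

The best known and most used are: 

* **Classification:**
    + accuracy (`'accuracy'`)
    + precision
    + recall
    + F1-score
* **Regression:**
    + `'mean_squared_error'`
    + `'mean_absolute_error'`

(Only `'accuracy'` is used here. Keras also provides metric classes such as `tf.keras.metrics.Precision` and `tf.keras.metrics.Recall`, but they expect binary or one-hot targets rather than integer labels like ours.)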
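For multi-class problems, precision, recall and F1-score are computed per class and then usually averaged ("macro" average). A NumPy sketch on a made-up set of 6 predictions, assuming every class is predicted at least once (otherwise the precision divides by zero):

```python
import numpy as np

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

accuracy = (y_true == y_pred).mean()

precision, recall, f1 = [], [], []
for c in np.unique(y_true):
    tp = np.sum((y_pred == c) & (y_true == c))  # images correctly predicted as c
    p = tp / np.sum(y_pred == c)                # share of "c" predictions that are right
    r = tp / np.sum(y_true == c)                # share of actual c images that are found
    precision.append(p)
    recall.append(r)
    f1.append(2 * p * r / (p + r))              # harmonic mean of precision and recall

print("accuracy:", round(accuracy, 3))
print("precision per class:", np.round(precision, 3))
print("recall per class:", np.round(recall, 3))
print("macro F1:", round(np.mean(f1), 3))
```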

We will use this configuration:

In [None]:
model.compile(
    loss="sparse_categorical_crossentropy", 
    optimizer="nadam",
    metrics=["accuracy"]
)

### ii) fit

When you call the fit function you basically train your model.

There are interesting parameters you can tune:

* **epochs**: the number of times the model goes through the whole training set
* **callbacks**: objects called during training that can, for example, stop the training or save the model when some conditions are met
* **use_multiprocessing**: use process-based parallelism to load the data; it only applies to generator or `keras.utils.Sequence` inputs, so it has no effect with `tf.data` datasets like ours
* **batch_size**: we don't specify it here, because our training and validation datasets are already batched

And many others, which you can check here: https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit

Here's the callback method we're going to use:

In [None]:
models_path = "models/classifier/"
model_name = "MobileNet_224-224"
callbacks = [
    tf.keras.callbacks.ModelCheckpoint(
            models_path+model_name+'/checkpoint',
            monitor="val_accuracy",
            save_weights_only=True,
            save_best_only=True
    )
]

With `monitor="val_accuracy"` and `save_best_only=True`, it saves the model weights each time the validation accuracy improves on its best value so far.
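The logic behind `save_best_only=True` can be sketched in plain Python on a made-up sequence of validation accuracies: weights are saved only at epochs that beat the best value so far. The same bookkeeping with a `patience` counter is the idea behind Keras' `tf.keras.callbacks.EarlyStopping`, which stops training after `patience` epochs without improvement. The helper `simulate_callbacks` is illustrative, not Keras code.

```python
def simulate_callbacks(val_accuracy, patience):
    """Return the epochs (1-based) where a checkpoint is saved, and the epoch where training would stop."""
    best = float("-inf")
    saved_epochs, epochs_without_improvement = [], 0
    for epoch, acc in enumerate(val_accuracy, start=1):
        if acc > best:
            best = acc
            saved_epochs.append(epoch)  # ModelCheckpoint(save_best_only=True) saves here
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return saved_epochs, epoch  # early stopping would end training here
    return saved_epochs, None  # never stopped early

val_accuracy = [0.90, 0.95, 0.94, 0.97, 0.97, 0.96, 0.965, 0.96]  # made-up values
saved, stopped_at = simulate_callbacks(val_accuracy, patience=3)
print("checkpoints saved at epochs:", saved)    # [1, 2, 4]
print("early stopping at epoch:", stopped_at)   # 7
```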

And now we can train our model:

In [None]:
epochs = 10

if gtsrb_exists:
    history = model.fit(
        train_ds,
        validation_data=val_ds,
        epochs=epochs,
        callbacks=callbacks,
        use_multiprocessing=True
    )

You can save the weights of the model by using:

In [None]:
model.save_weights(f"models/classifier/{model_name}/model_weights")

In order to load and use the model later you'll have to:
    
* initialize the model:

In [None]:
model = init_mobilenet_model(mobile_net, n_classes)

* load weights: 

In [None]:
model.load_weights(f"models/classifier/{model_name}/model_weights")

* compile the model: 

In [None]:
model.compile(
    loss="sparse_categorical_crossentropy", 
    optimizer="nadam",
    metrics=["accuracy"]
)

Don't forget the compilation: an uncompiled model can still run `predict`, but `evaluate` and `fit` will fail.

Let's come back to the training part:

The `fit` function returns a History object; its `history` attribute is a dictionary holding the training and validation loss and accuracy values for each epoch.

You can save it this way:

In [None]:
if gtsrb_exists:
    with open(f"models/classifier/{model_name}/history.json", "w") as f:
        json.dump(history.history, f)
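Since `history.history` is a plain dictionary of lists, it can be reloaded with `json.load` for plotting or comparison. A self-contained sketch with a made-up two-epoch history written to a temporary file:

```python
import json
import os
import tempfile

# made-up history with the same structure as History.history
history_dict = {
    "loss": [0.9, 0.3], "accuracy": [0.75, 0.92],
    "val_loss": [0.4, 0.2], "val_accuracy": [0.88, 0.95],
}

path = os.path.join(tempfile.mkdtemp(), "history.json")
with open(path, "w") as f:
    json.dump(history_dict, f)

with open(path) as f:
    loaded = json.load(f)

# one value per epoch for every metric
n_epochs = len(loaded["loss"])
best_epoch = max(range(n_epochs), key=lambda i: loaded["val_accuracy"][i]) + 1
print(n_epochs, best_epoch)  # 2 2
```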

Let's see it for the trained models:

In [None]:
models_training_folderpath = "imgs/models_training/"

*CNN 30x30: 50 epochs*

In [None]:
model_name = "CNN1_30-30_50e"
plots.plot_model_training("image", model_name,
    image_folderpath=models_training_folderpath)

For each model, the loss curves are on the left and the accuracy curves on the right.

You can see here that the loss quickly drops close to zero. 
Accuracy behaves the same way: it rises close to 1 within a few epochs (which is what we want).

Something important you may notice is that the validation loss and accuracy are mostly better than the training ones.
Usually you would expect the opposite. 
The main reason is that we've used dropout layers in these models, and they're active only during training.
So it's naturally harder for the model to be accurate during training than during validation.
A smaller effect adds to this: the training metrics are averaged over the batches of each epoch while the weights are still improving, whereas the validation metrics are computed at the end of the epoch.
You can check this easily by evaluating your model on the training set after training: it'll get much better scores.

The same thing happens for the other standard CNNs.

*CNN 60x60: 50 epochs*

In [None]:
model_name = "CNN1_60-60_50e"
plots.plot_model_training("image", model_name, 
    image_folderpath=models_training_folderpath)

*CNN 90x90: 50 epochs*

In [None]:
model_name = "CNN1_90-90_50e"
plots.plot_model_training("image", model_name, 
    image_folderpath=models_training_folderpath)

*MobileNet 224x224: 120 epochs*

In [None]:
model_name = "MobileNet_1_224-224_120e"
plots.plot_model_training("image", model_name, 
    image_folderpath=models_training_folderpath)

This plot shows two models in a single view: MobileNet 224x224 at 100 epochs and MobileNet 224x224 at 120 epochs.

The first one has been trained on the top layers only (the non-pre-trained part); the second one starts from the first and has been trained for 20 additional epochs on both the top layers and some of the pre-trained layers.

We can see that MobileNet performs better when you allow it to adapt its pre-trained weights. 
But it takes more time to train, because there are many more weights to update.
A good way to approach the training of a pre-trained model is to first train the top layers, then make some of the pre-trained layers trainable once the model already performs rather well or begins to overfit.

**Data Augmentation**

Data Augmentation allows you to add variability to your datasets.
You transform your data with random filters or operations so that the model sees a wider range of possible inputs.
This technique can really help to reduce overfitting.

You can see it as adding noise to your data, so that your model is forced to focus on "relevant" features in order to make the best predictions.

It's also really helpful when you have little data overall and/or few examples in some classes.

In the following graphs you'll see that when you add Data Augmentation to an already trained model (the model you trained without DA), the loss and the accuracy get worse at first.
It's as if the model were learning all over again.
But if your DA is well tuned, it can give your model a better generalization ability,
that is, it'll perform better on data it has never seen.

Here are the methods used to add DA in these models:
    
* **Rescaling** with the value `1./255` simply rescales the data to the [0, 1] range instead of [0, 255]. This isn't a Data Augmentation layer.
* **RandomContrast** randomly adjusts the contrast of each image. With a factor of 0.1, the contrast factor is drawn between 0.9 and 1.1 (the contrast changes by up to ±10%).
* **RandomRotation** randomly rotates images. The factor is a fraction of a full turn: I've chosen 0.2, so rotations are drawn between about -72 and 72 degrees ($\pm 0.2 \times 2 \pi$ rad). This is probably too wide a range for traffic signs, some of which have mirror-image symbols (e.g. left vs right arrows) that large rotations can make look alike.
* **RandomZoom** randomly zooms images in or out. Here the factor is 0.1, so images are zoomed in or out by up to 10% (a factor in the [-0.1, 0.1] range).
* **RandomTranslation** randomly shifts images vertically and horizontally. Here, with 0.1 for height and 0.1 for width, images are translated by up to 10% of their height and width in each direction ([-10%, +10%]).
* **Resizing** isn't a Data Augmentation layer either: it just makes sure the output images keep the right dimensions.
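To make these factors concrete, here is a NumPy sketch of the contrast adjustment described in the `RandomContrast` documentation: for each channel, every pixel value x becomes (x - mean) * factor + mean, with the factor drawn in [1 - 0.1, 1 + 0.1] for `RandomContrast(0.1)`. It also converts the `RandomRotation(0.2)` factor into degrees. The helper `adjust_contrast` and the random image are illustrative only.

```python
import numpy as np

def adjust_contrast(image, factor):
    """Per-channel contrast change: (x - mean) * factor + mean."""
    mean = image.mean(axis=(0, 1), keepdims=True)  # one mean per channel
    return (image - mean) * factor + mean

rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(30, 30, 3))   # fake RGB image already scaled to [0, 1]

factor = rng.uniform(1 - 0.1, 1 + 0.1)            # RandomContrast(0.1) range: [0.9, 1.1]
augmented = adjust_contrast(image, factor)

# RandomRotation(0.2): the factor is a fraction of a full turn (2*pi rad = 360 degrees)
max_angle_deg = 0.2 * 360
print(round(factor, 3), max_angle_deg)
```

Note that the contrast change keeps each channel's mean unchanged; it only spreads the values out (factor above 1) or pulls them together (factor below 1).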

The more variability you add, the more difficult it will be for your model to learn.
So you have to experiment to find the right balance:
too little variability won't have any effect, and too much could confuse the model.

And here's how to add it in your code:

In [None]:
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1./255),
    tf.keras.layers.RandomContrast(0.1),
    tf.keras.layers.RandomRotation(0.2),
    tf.keras.layers.RandomZoom(0.1),
    tf.keras.layers.RandomTranslation(0.1, 0.1),
    tf.keras.layers.Resizing(img_height, img_width),
])

Then, you just have to add it at the beginning of your model initialization:

In [None]:
def init_CNN_model_with_da(batch_size, img_height, img_width, n_classes):
    # batch_size is not part of the input shape: Keras adds the batch dimension itself
    data_augmentation = tf.keras.Sequential([
        tf.keras.layers.Rescaling(1./255),
        tf.keras.layers.RandomContrast(0.1),
        tf.keras.layers.RandomRotation(0.2),
        tf.keras.layers.RandomZoom(0.1),
        tf.keras.layers.RandomTranslation(0.1, 0.1),
        tf.keras.layers.Resizing(img_height, img_width),
    ])
    
    model = tf.keras.Sequential([
        # Input: RGB images of size img_height x img_width
        tf.keras.Input(shape=(img_height, img_width, 3)),
        # Data Augmentation (the random layers are only active during training)
        data_augmentation,
        # First conv layer: 64 filters with size 7x7
        tf.keras.layers.Conv2D(filters=64, kernel_size=7, padding="same", activation="relu", kernel_initializer="he_normal"),
        tf.keras.layers.MaxPool2D(),
        # Second layer: 2 conv. layers with 128 filters with size 3x3 
        tf.keras.layers.Conv2D(filters=128, kernel_size=3, padding="same", activation="relu", kernel_initializer="he_normal"),
        tf.keras.layers.Conv2D(filters=128, kernel_size=3, padding="same", activation="relu", kernel_initializer="he_normal"),
        tf.keras.layers.MaxPool2D(),
        # Third layer: 2 conv. layers with 256 filters with size 3x3 
        tf.keras.layers.Conv2D(filters=256, kernel_size=3, padding="same", activation="relu", kernel_initializer="he_normal"),
        tf.keras.layers.Conv2D(filters=256, kernel_size=3, padding="same", activation="relu", kernel_initializer="he_normal"),
        tf.keras.layers.MaxPool2D(),
        # Flatten layer: convert the feature maps into a vector
        tf.keras.layers.Flatten(),
        # The fully connected layers (128->64->n_classes) classify the features into the classes we want
        tf.keras.layers.Dense(units=128, activation="relu", kernel_initializer="he_normal"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(units=64, activation="relu", kernel_initializer="he_normal"),
        tf.keras.layers.Dropout(0.5),
        # The output layer: needs one unit per class
        tf.keras.layers.Dense(units=n_classes, activation="softmax")
    ])
    return model

In [None]:
def init_mobilenet_model_with_da(mobile_net, img_height, img_width, n_classes):

    data_augmentation = tf.keras.Sequential([
        tf.keras.layers.Rescaling(1./255),
        tf.keras.layers.RandomContrast(0.1),
        tf.keras.layers.RandomRotation(0.2),
        tf.keras.layers.RandomZoom(0.1),
        tf.keras.layers.RandomTranslation(0.1, 0.1),
        tf.keras.layers.Resizing(img_height, img_width),
    ])
    
    mobile_net_1 = keras.Sequential([
        # Input: RGB images of size img_height x img_width (224x224 for this MobileNet)
        keras.Input(shape=(img_height, img_width, 3)),
        # Data Augmentation (the random layers are only active during training)
        data_augmentation,
        # The MobileNet model (without the top, because of the 'include_top=False' we provided in its definition)
        mobile_net, 
        # Global average pooling: one average per feature map, returned as a flat vector
        keras.layers.GlobalAveragePooling2D(),
        # Fully connected layer with dropouts
        keras.layers.Dense(1024, activation='relu'),
        keras.layers.Dropout(0.5),
        keras.layers.Dense(512, activation='relu'),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(n_classes, activation='softmax')
    ])
    return mobile_net_1

If you've already trained your model and you want to continue the training, you just have to initialize your model as above and then load weights.

Let's see what happens during training:

*CNN 30x30: 50 epochs + DA 50 epochs*

In [None]:
model_name = "CNN1_30-30_50e_DA-50e"
plots.plot_model_training("image", model_name, 
    image_folderpath=models_training_folderpath)

As you can see, as soon as the Data Augmentation starts, the model performs worse and then tries to learn again how to classify the images.
The model looks worse, but it may have a better generalization ability; we can only be sure of that at the test evaluation step.
Training for more epochs might have helped here.

You'll see the same phenomenon again with the other models.

*CNN 60x60: 50 epochs + DA 50 epochs*

In [None]:
model_name = "CNN1_60-60_50e_DA-50e"
plots.plot_model_training("image", model_name, 
    image_folderpath=models_training_folderpath)

*CNN 90x90: 50 epochs + DA 30 epochs*

In [None]:
model_name = "CNN1_90-90_50e_DA-30e"
plots.plot_model_training("image", model_name, 
    image_folderpath=models_training_folderpath)

*MobileNets 224x224: 120 epochs + DA 50 epochs*

In [None]:
model_name = "MobileNet_1_224-224_120e_DA-50e"
plots.plot_model_training("image", model_name, 
    image_folderpath=models_training_folderpath)

Let's summarize the results on training and validation sets:

In [None]:
Image.open("imgs/global_accuracy_scores_train-val.png").convert("RGB")

First of all, we have great models! Almost all of them have an accuracy above $98\%$.

For the CNNs, we can see that adding Data Augmentation doesn't improve the accuracy on the validation set.
Judging by the training plots, we should probably have continued the training until the models stopped learning. 
Increasing the image size doesn't seem to improve the models either. 

For MobileNets, Data Augmentation improved the model slightly, but the largest improvement came when we made some of the pre-trained layers trainable.

* The best CNN model is the `CNN_30-30_50e` with $99.9\%$ and $99.4\%$ accuracy on training and validation sets respectively.
* For the MobileNets, the best is the `MobileNet_224-224_120e_DA-50e` with $99.9\%$ and $99.8\%$ accuracy.

## 3. Predictions

Now, let's look at the predictions of our models.

Even if the predictions are rather good here, we can analyze where the models make mistakes.

To do so, we're going to use the **confusion matrix**.
The confusion matrix is a matrix showing predicted labels versus actual labels.
Each row represents an actual class and each column represents the predicted class.

A matrix element $m_{i, j}$ is the number of times data of class $i$ has been predicted as a class $j$.

Then you can see what types of errors your model makes and perhaps try to improve it accordingly.
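A minimal NumPy sketch of how such a matrix can be built from integer labels (the `calculations.get_confusion_matrix` helper used below may be implemented differently): `np.add.at` increments cell (actual, predicted) once per image. The labels are made up for illustration.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """m[i, j] = number of images of actual class i predicted as class j."""
    m = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(m, (y_true, y_pred), 1)  # unbuffered add: repeated (i, j) pairs are all counted
    return m

y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0])
m = confusion_matrix(y_true, y_pred, n_classes=3)
print(m)
# [[1 1 0]
#  [0 2 0]
#  [1 0 2]]
```

The diagonal (here 1 + 2 + 2 = 5) counts the correct predictions; everything off the diagonal is an error.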

But first, let's see how to get predictions:

* You can of course pass images you've loaded yourself, but if there are many of them they might not fit in your computer's memory.

   Let's say you've loaded a set of images with PIL or cv2:
   * resize them so they match your model's input size 
   * convert them into NumPy arrays
   * stack them into a single NumPy array `X` of shape `(n_images, img_height, img_width, n_channels)`
       
   Finally, you just have to run: 

In [None]:
#model.predict(X)

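* But here, because of the large number of images, I'll use the datasets I created for training with the `tf.keras.utils.image_dataset_from_directory` function.

    We have to be careful, though: these datasets are shuffled by default, and the order of the images changes each time you iterate over them. So labels and predictions must be collected in the same pass, and the dataset should be reloaded (with the same seed) whenever we want to get predictions again.
    
    Here is the approach I chose to get predictions (it's what you'll find in the `get_y_predictions` function later):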

In [None]:
if gtsrb_exists:
    y, y_probas, y_pred = [], [], []
    ## loop on batches
    for features, labels in train_ds:
        # batch of labels
        batch_labels = labels.numpy().tolist()
        y.extend(batch_labels) # add to y list, which is the actual labels
        # batch of images
        X = features.numpy()
        # predict 
        y_proba = model.predict(X)
        y_probas.append(y_proba) # add predictions that are probability for each class
        # best probabilities
        best_pred_indexes = y_proba.argmax(axis=1) 
        # classes predictions
        best_pred_classes = np.array(class_names)[best_pred_indexes]
        y_pred.append(best_pred_classes) # add classes predictions

Check this function for more details.

In [None]:
if gtsrb_exists:
    y, y_proba, y_pred = calculations.get_y_predictions(model, train_ds, class_names)
else:
    y, y_proba, y_pred = None, None, None
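The manual route from the first bullet above can be sketched end to end with PIL and NumPy: resize each image, stack the images into one array, then turn class probabilities into class names with `argmax`, as in the loop above. The images are synthetic, the probabilities are random stand-ins for `model.predict(...)` so the snippet runs without a trained model, and both helper names are hypothetical.

```python
import numpy as np
from PIL import Image

def images_to_array(images, img_height, img_width):
    """Resize PIL images and stack them into shape (n_images, img_height, img_width, 3)."""
    # PIL's resize takes (width, height)
    return np.stack([np.asarray(img.convert("RGB").resize((img_width, img_height))) for img in images])

def probabilities_to_names(y_proba, class_names):
    """Map each row of class probabilities to the name of its most likely class."""
    return np.array(class_names)[y_proba.argmax(axis=1)]

# synthetic images of different sizes, standing in for files opened with Image.open(...)
demo_images = [Image.new("RGB", size, color=(200, 30, 30)) for size in [(45, 45), (64, 60), (90, 88)]]
X_demo = images_to_array(demo_images, 30, 30)
print(X_demo.shape)  # (3, 30, 30, 3)

# random probabilities standing in for model.predict(X_demo) on a 43-class model
rng = np.random.default_rng(0)
demo_proba = rng.dirichlet(np.ones(43), size=len(X_demo))
print(probabilities_to_names(demo_proba, [str(i) for i in range(43)]))
```

The class names passed here must be in the same order as during training, which is why the numerically sorted `class_names` list matters.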

Don't forget to reload the dataset this way each time you want to make other predictions on it (for example with another model, to get specific predictions, to make visualizations, etc.):

In [None]:
#train_ds = tf.keras.utils.image_dataset_from_directory(
#    "data/gtsrb/Train/",
#    label_mode='int',
#    class_names=class_names,
#    batch_size=batch_size,
#    image_size=(img_height, img_width),
#    seed=seed,
#    validation_split=0.2,
#    subset="training",
#    crop_to_aspect_ratio=False
#)

Now, let's look at the models results:

### 1) `CNN_30-30_50e`

In [None]:
if gtsrb_exists:
    batch_size = 32
    img_height, img_width = 30, 30
    
    train_ds = tf.keras.utils.image_dataset_from_directory(
        "data/gtsrb/Train/",
        label_mode='int',
        class_names=class_names,
        batch_size=batch_size,
        image_size=(img_height, img_width),
        seed=seed,
        validation_split=0.2,
        subset="training",
        crop_to_aspect_ratio=False
    )
    val_ds = tf.keras.utils.image_dataset_from_directory(
        "data/gtsrb/Train/",
        label_mode='int',
        class_names=class_names,
        batch_size=batch_size,
        image_size=(img_height, img_width),
        seed=seed,
        validation_split=0.2,
        subset="validation",
        crop_to_aspect_ratio=False
    )
    
    y_train, y_train_proba, y_train_pred = calculations.get_y_predictions(model, train_ds, class_names)
    train_conf_mat = calculations.get_confusion_matrix(y_train, y_train_pred)
    
    y_val, y_val_proba, y_val_pred = calculations.get_y_predictions(model, val_ds, class_names)
    val_conf_mat = calculations.get_confusion_matrix(y_val, y_val_pred)
else:
    train_conf_mat, val_conf_mat = None, None

Here's the confusion matrix of the `CNN_30-30_50e` model on the training set:

In [None]:
fig = plots.plot_confusion_matrix_type(
    "confusion", "train", class_names, 
    conf_mat=train_conf_mat, 
    output_folderpath="imgs/CNN_30-30_50e/")
fig

As you can guess, a good model has high values on the diagonal, because diagonal cells count the correctly predicted labels.

Here we can't really see the errors because of the color scale. A good way to visualize them is to divide each row (actual label) by its sum (the number of images of that class) and to remove the diagonal values. This gives, for each actual class, the proportion of its images predicted as each of the other classes (multiply by 100 to get percentages).
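The normalization described above can be sketched in NumPy. This assumes, as in the plots, that rows are actual classes and columns are predicted classes. `error_matrix` and `top_confusions` are hypothetical helpers, not functions from `plots.py` or `calculations.py`. `top_confusions` lists the largest off-diagonal rates, which are the errors analyzed one by one in this section.

```python
import numpy as np

def error_matrix(conf_mat):
    """Row-normalize a confusion matrix (rows = actual, columns = predicted)
    and zero the diagonal, leaving only error proportions per actual class."""
    conf_mat = np.asarray(conf_mat, dtype=float)
    row_sums = conf_mat.sum(axis=1, keepdims=True)
    # divide each row by its total; rows with no images stay at 0
    errors = np.divide(conf_mat, row_sums, out=np.zeros_like(conf_mat), where=row_sums > 0)
    np.fill_diagonal(errors, 0.0)  # drop the correct predictions
    return errors

def top_confusions(conf_mat, class_names, k=3):
    """Return the k largest errors as (actual, predicted, percentage) tuples."""
    errors = error_matrix(conf_mat)
    flat_idx = np.argsort(errors, axis=None)[::-1][:k]  # k largest entries of the flattened matrix
    rows, cols = np.unravel_index(flat_idx, errors.shape)
    return [(class_names[i], class_names[j], 100 * errors[i, j]) for i, j in zip(rows, cols)]
```

For example, `top_confusions(val_conf_mat, class_names)` would list the biggest validation errors, assuming `val_conf_mat` is a plain 2D array.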

Let's see it:

In [None]:
fig = plots.plot_confusion_matrix_type(
    "errors", "train", class_names, 
    conf_mat=train_conf_mat, 
    output_folderpath="imgs/CNN_30-30_50e/")
fig

This is why the errors were hard to see: there are very few of them.
The highest error rate is around 0.5%, which is very low (but keep in mind that these are training set results).

Let's analyze the errors:

- 0.50% of class n°42 images (*end no overtaking by heavy goods vehicles*) have been predicted as class n°6 (*end speed limit 80*). 

Let's see random images from these classes:

In [None]:
plots.plot_class_confusion(60, 10, 6, "42", "6", "data/gtsrb/Train/", 
    output="", set_type="train", random_state=123, gtsrb_exists=gtsrb_exists)

These two classes share the stripes, which could explain some of the confusions.

- 0.46% of class n°29 images (*cyclists*) have been predicted as class n°26 (*traffic signals*). 

In [None]:
plots.plot_class_confusion(60, 10, 6, "29", "26", "data/gtsrb/Train/", 
    output="", set_type="train", random_state=123, gtsrb_exists=gtsrb_exists)

Here, the poor quality of some images may explain the confusion.

- 0.46% of class n°29 images (*cyclists*) have been predicted as class n°31 (*wild animals*). 

In [None]:
plots.plot_class_confusion(60, 10, 6, "29", "31", "data/gtsrb/Train/", 
    output="", set_type="train", random_state=123, gtsrb_exists=gtsrb_exists)

Here, the symbols are somewhat similar.

Let's see the results on the validation set:

In [None]:
fig = plots.plot_confusion_matrix_type(
    "errors", "val", class_names, 
    conf_mat=val_conf_mat, 
    output_folderpath="imgs/CNN_30-30_50e/")
fig

Error percentages are higher, but still low.

There are other types of errors here; let's look at the biggest ones:

* 4.54% of class n°27 images (*pedestrians*) have been predicted as class n°11 (*crossroads minor road*).

In [None]:
plots.plot_class_confusion(60, 10, 6, "27", "11", "data/gtsrb/Train/", 
    output="", set_type="val", random_state=123, gtsrb_exists=gtsrb_exists)

The confusion is easy to understand: the signs have the same shape and very similar symbols, especially once the images are reduced to 30 by 30 pixels.

* 4.17% of class n°0 images (*speed limit 20*) have been predicted as class n°1 (*speed limit 30*).

In [None]:
plots.plot_class_confusion(60, 10, 6, "0", "1", "data/gtsrb/Train/", 
    output="", set_type="val", random_state=123, gtsrb_exists=gtsrb_exists)

Here the only difference is the digit 2 versus 3. These shapes are close, even more so in low-resolution images.

### 2) `MobileNet_224-224_120e_DA-50e`

In [None]:
if gtsrb_exists:
    batch_size = 32
    img_height, img_width = 224, 224
    
    train_ds = tf.keras.utils.image_dataset_from_directory(
        "data/gtsrb/Train/",
        label_mode='int',
        class_names=class_names,
        batch_size=batch_size,
        image_size=(img_height, img_width),
        seed=seed,
        validation_split=0.2,
        subset="training",
        crop_to_aspect_ratio=False
    )
    val_ds = tf.keras.utils.image_dataset_from_directory(
        "data/gtsrb/Train/",
        label_mode='int',
        class_names=class_names,
        batch_size=batch_size,
        image_size=(img_height, img_width),
        seed=seed,
        validation_split=0.2,
        subset="validation",
        crop_to_aspect_ratio=False
    )
    
    y_train, y_train_proba, y_train_pred = calculations.get_y_predictions(model, train_ds, class_names)
    train_conf_mat = calculations.get_confusion_matrix(y_train, y_train_pred)
    
    y_val, y_val_proba, y_val_pred = calculations.get_y_predictions(model, val_ds, class_names)
    val_conf_mat = calculations.get_confusion_matrix(y_val, y_val_pred)
else:
    train_conf_mat, val_conf_mat = None, None

Here's the error matrix of the `MobileNet_224-224_120e_DA-50e` model on the training set:

In [None]:
fig = plots.plot_confusion_matrix_type(
    "errors", "train", class_names, 
    conf_mat=train_conf_mat, 
    output_folderpath="imgs/MobileNet_224-224_120e_DA-50e/")
fig

There are few confusions. The largest is around 1.85%, with class n°29 (*cyclists*) predicted as class n°13 (*give way*):

In [None]:
plots.plot_class_confusion(60, 10, 6, "29", "13", "data/gtsrb/Train/", 
    output="", set_type="train", random_state=123, gtsrb_exists=gtsrb_exists)

This confusion isn't obvious: the triangles point in opposite directions, and the first sign contains a symbol while the second is empty.

Now, the error matrix on the validation set:

In [None]:
fig = plots.plot_confusion_matrix_type(
    "errors", "val", class_names, 
    conf_mat=val_conf_mat, 
    output_folderpath="imgs/MobileNet_224-224_120e_DA-50e/")
fig

The errors are a bit higher, but not as high as with the CNN model.

The highest one is 2.27%, with class n°27 (*pedestrians*) predicted as class n°18 (*other danger*).

In [None]:
plots.plot_class_confusion(60, 10, 6, "27", "18", "data/gtsrb/Train/", 
    output="", set_type="val", random_state=123, gtsrb_exists=gtsrb_exists)

We can guess that the confusion comes from low-resolution pedestrian images being resized.
The pedestrian could then look like a bar, and then like the exclamation mark of *other danger*.

The second-highest error is class n°29 (*cyclists*) predicted as class n°13 (*give way*). We already saw this confusion in the training set results, where it was the highest.

In [None]:
plots.plot_class_confusion(60, 10, 6, "29", "13", "data/gtsrb/Train/", 
    output="", set_type="val", random_state=123, gtsrb_exists=gtsrb_exists)

For more details, you can look at the specific images that were wrongly predicted (we'll do this in the next part).

So how could we improve the model?

* An obvious way would be to train on images with higher dimensions.
Our results are already really good, so it isn't required here.
It could still improve the model, though, as could training with data augmentation for more epochs than I did.

* Another way could be to change the structure of the CNN models: adding layers, changing the number of neurons, tuning training parameters such as the optimizer, and so on.

* You could also find and add new data to the training set.

* You could try other classification models too.

* An interesting option is to create an ensemble model. For example, we could build an ensemble from the CNN and the MobileNet we've just analyzed. This method can be very efficient and improve accuracy if the models make different types of errors. Each model in the ensemble makes a prediction. A hard vote (the class predicted by the most models) or a soft vote (the class with the highest average probability) then gives the final prediction.
See this notebook, which accompanies the ensemble learning chapter of the book *Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow* by Aurélien Géron: https://github.com/ageron/handson-ml3/blob/main/07_ensemble_learning_and_random_forests.ipynb
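Hard and soft voting can be sketched in a few lines of NumPy. This sketch assumes each model's probabilities have been concatenated into a single `(n_images, n_classes)` array, for example with `np.concatenate(y_val_proba)` if `get_y_predictions` returns a list of per-batch arrays (as the loop shown earlier suggests). It also assumes all arrays cover the same images in the same order. With `image_dataset_from_directory`, that would mean loading the data with `shuffle=False`. `soft_vote` and `hard_vote` are hypothetical helpers. scikit-learn's `VotingClassifier` implements the same idea for its own estimators.

```python
import numpy as np

def soft_vote(probas_list):
    """Average the class probabilities of all models, then pick the best class."""
    mean_proba = np.mean(np.stack(probas_list), axis=0)  # (n_images, n_classes)
    return mean_proba.argmax(axis=1)

def hard_vote(probas_list, n_classes):
    """Each model votes for its most probable class; the majority wins
    (ties go to the lowest class index)."""
    votes = np.stack([p.argmax(axis=1) for p in probas_list], axis=1)  # (n_images, n_models)
    # count the votes received by each class, image by image
    counts = np.apply_along_axis(np.bincount, 1, votes, minlength=n_classes)
    return counts.argmax(axis=1)
```

When two models weakly prefer one class and a third model strongly prefers another, hard voting follows the majority, while soft voting can follow the confident model.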

## 4. Test set and model selection

The GTSRB dataset contains a Test directory with 12630 images.

The `Test.csv` file gives their actual classes.

Let's load it:

In [None]:
if gtsrb_exists:
    test_metadata = pd.read_csv(data_dir+"Test.csv")

In order to load these images as we did with training and validation sets we need to put the images in subfolders corresponding to their classes.

I coded a function for that, you can find it in the `calculations.py` file.

In [None]:
delete = False

I set `delete` to `False`. In that case, the function copies the files into the class subfolders without deleting the originals, so you'll have to delete the files left outside the class subfolders (at the root of the Test folder) yourself.

With `delete=True`, the files are deleted once copied (in other words, they are moved).

In [None]:
if gtsrb_exists:
    _ = calculations.load_test_dataset_labels_and_create_folders(
        test_metadata, 
        test_folderpath="data/gtsrb/Test/", 
        new_data_location="data/gtsrb/Test/",
        delete=delete
    )
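The helper's code isn't shown here, but the idea can be sketched with the standard library. This is a hypothetical version, not the content of `calculations.py`. It assumes `Test.csv` provides a `ClassId` column and a `Path` column whose file name matches the image in the Test folder. It also assumes the class subfolders are named after the class indices, so that they match `class_names`.

```python
import os
import shutil

def sort_images_by_class(metadata, src_dir, dst_dir, delete=False):
    """Hypothetical helper: put each image into dst_dir/<ClassId>/.

    `metadata` can be a pandas DataFrame (or any mapping of column name to
    sequence) with assumed 'ClassId' and 'Path' columns.
    Returns the number of files processed."""
    n_done = 0
    for class_id, path in zip(metadata["ClassId"], metadata["Path"]):
        filename = os.path.basename(path)           # e.g. 'Test/00000.png' -> '00000.png'
        src = os.path.join(src_dir, filename)
        class_dir = os.path.join(dst_dir, str(class_id))
        os.makedirs(class_dir, exist_ok=True)       # create the class subfolder if needed
        dst = os.path.join(class_dir, filename)
        if delete:
            shutil.move(src, dst)   # copy then delete: the file is moved
        else:
            shutil.copy2(src, dst)  # keep the original file in place
        n_done += 1
    return n_done
```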

Then you can load the test dataset:

In [None]:
if gtsrb_exists:
    test_ds = tf.keras.utils.image_dataset_from_directory(
        "data/gtsrb/Test/",
        label_mode='int', # labels are encoded as integers
        class_names=class_names, # list of class names (they have to match the names of the subdirectories)
        batch_size=batch_size, # number of images per batch
        image_size=(img_height, img_width), # size to resize images to after they are read from disk
        seed=seed, # seed for shuffling, to ensure reproducibility (~ random_state)
        crop_to_aspect_ratio=False # if True, resize the images without aspect ratio distortion
    )

As we have to load the test data each time we want to get predictions, we can use `functools.partial` to fix the default parameters once and reuse them for every load:

In [None]:
if gtsrb_exists:
    batch_size = 32
    img_height, img_width = 224, 224
    
    load_test_ds = partial(
        tf.keras.utils.image_dataset_from_directory, ## the function
        label_mode='int',                            ## the default parameters
        class_names=class_names,                     #
        color_mode=color_mode,                       #
        batch_size=batch_size,                       #
        image_size=(img_height, img_width),          #
        shuffle=True,                                #
        seed=seed,                                   #
        crop_to_aspect_ratio=False                   #
    )
else:
    load_test_ds = None

Then, to load the test data before making predictions, we just call it this way:

In [None]:
if gtsrb_exists:
    test_ds = load_test_ds("data/gtsrb/Test/")
    
    y_test, y_test_proba, y_test_pred = calculations.get_y_predictions(model, test_ds, class_names)
    test_conf_mat = calculations.get_confusion_matrix(y_test, y_test_pred)
else:
    test_conf_mat = None
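One property of `functools.partial` is worth knowing here. Keyword arguments bound in the partial are only defaults, and any of them can be overridden at call time. So the same loader could serve the 30 by 30 CNN, for example with `load_test_ds("data/gtsrb/Test/", image_size=(30, 30))`. The sketch below shows this with a hypothetical stand-in function, `load_images`, so that it runs without TensorFlow.

```python
from functools import partial

def load_images(directory, image_size=(256, 256), batch_size=32, shuffle=True, seed=None):
    """Hypothetical stand-in for tf.keras.utils.image_dataset_from_directory:
    it just returns the arguments it received."""
    return {"directory": directory, "image_size": image_size,
            "batch_size": batch_size, "shuffle": shuffle, "seed": seed}

# freeze the defaults once, as done with load_test_ds
load_test = partial(load_images, image_size=(224, 224), batch_size=32, seed=123)

mobilenet_cfg = load_test("data/gtsrb/Test/")                 # uses the frozen defaults
cnn_cfg = load_test("data/gtsrb/Test/", image_size=(30, 30))  # overrides one of them
```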

Let's come back to the results. Here are the accuracies on the test set:

In [None]:
Image.open("imgs/global_accuracy_scores_train-val-test.png").convert("RGB")

The accuracies aren't as good as on the training and validation sets, but they're still high.

Again, the models using data augmentation don't perform better than the "originals", except for the MobileNet one.

Note that now, the CNN models perform better with higher image dimensions.
The best CNN model is then `CNN_90-90_50e`, with 97.0% accuracy.

But the `MobileNet_224-224_120e_DA-50e` model is even better, with 98.0% accuracy.
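Both a global accuracy and per-class recall can be read off a confusion matrix whose rows are actual classes. Accuracy is the diagonal sum divided by the total. The recall of a class is its diagonal value divided by its row sum. Here is a minimal NumPy sketch with hypothetical helper names; it is not necessarily how the chart above was produced.

```python
import numpy as np

def accuracy_from_confusion(conf_mat):
    """Share of correct predictions: diagonal sum over total count."""
    conf_mat = np.asarray(conf_mat, dtype=float)
    return np.trace(conf_mat) / conf_mat.sum()

def per_class_recall(conf_mat):
    """For each actual class (row), the share of its images predicted correctly."""
    conf_mat = np.asarray(conf_mat, dtype=float)
    return np.diag(conf_mat) / conf_mat.sum(axis=1)
```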

Let's look at its results on this test set:

First, the confusion matrix:

In [None]:
fig = plots.plot_confusion_matrix_type(
    "confusion", "test", class_names, 
    conf_mat=test_conf_mat, 
    output_folderpath="imgs/MobileNet_224-224_120e_DA-50e/")
fig

And the error matrix:

In [None]:
fig = plots.plot_confusion_matrix_type(
    "errors", "test", class_names, 
    conf_mat=test_conf_mat, 
    output_folderpath="imgs/MobileNet_224-224_120e_DA-50e/")
fig

There's a big error here: 19.17% of class n°22 (*uneven surface*) images have been predicted as class n°25 (*roadworks*).

In [None]:
plots.plot_class_confusion(60, 10, 6, "22", "25", "data/gtsrb/Test/", 
    output="", set_type="test", random_state=123, gtsrb_exists=gtsrb_exists)

These symbols aren't similar, but they share the *"bump"* on the right.
We could also guess that some shadows on *uneven surface* signs confuse the model into seeing a person on the left, as in *roadworks*.

Let's see the images causing the errors:

In [None]:
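batch_size = 32
actual, predicted = 22, 25
# use the test set labels/predictions computed above (y_test, y_test_pred),
# not y and y_pred, which come from the training set
y_true_test = y_test if gtsrb_exists else None
y_pred_test = y_test_pred if gtsrb_exists else None
plots.plot_wrong_predictions(
    load_test_ds, "data/gtsrb/Test/", 
    f"imgs/confusions/MN224-224-120da50_classes-{actual}-{predicted}.png", 
    y_true_test, y_pred_test, actual, predicted, batch_size, 
    gtsrb_exists=gtsrb_exists)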

These images contain shadows and tree backgrounds, which may be what confuses the model. Data augmentation or adding more data could help fix these errors.

Another error, made 9.33% of the time, is class n°30 (*ice snow*) predicted as class n°28 (*children*).

In [None]:
plots.plot_class_confusion(60, 10, 6, "30", "28", "data/gtsrb/Test/", 
    output="", set_type="test", random_state=123, gtsrb_exists=gtsrb_exists)

Here, the only reason I can think of is the poor quality of some *ice snow* images.

In [None]:
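batch_size = 32
actual, predicted = 30, 28
# use the test set labels/predictions computed above (y_test, y_test_pred),
# not y and y_pred, which come from the training set
y_true_test = y_test if gtsrb_exists else None
y_pred_test = y_test_pred if gtsrb_exists else None
plots.plot_wrong_predictions(
    load_test_ds, "data/gtsrb/Test/", 
    f"imgs/confusions/MN224-224-120da50_classes-{actual}-{predicted}.png", 
    y_true_test, y_pred_test, actual, predicted, batch_size, 
    gtsrb_exists=gtsrb_exists)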

These errors seem to happen when signs are too bright or overexposed against a very white background, which also seems to wash out the symbols.

Finally, let's end with the 6.11% of class n°17 (*no entry*) images predicted as class n°12 (*priority road*).

In [None]:
plots.plot_class_confusion(60, 10, 6, "17", "12", "data/gtsrb/Test/", 
    output="", set_type="test", random_state=123, gtsrb_exists=gtsrb_exists)

Here again, the signs aren't similar at all, so the only explanation seems to be poor image quality.

In [None]:
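batch_size = 32
actual, predicted = 17, 12
# use the test set labels/predictions computed above (y_test, y_test_pred),
# not y and y_pred, which come from the training set
y_true_test = y_test if gtsrb_exists else None
y_pred_test = y_test_pred if gtsrb_exists else None
plots.plot_wrong_predictions(
    load_test_ds, "data/gtsrb/Test/", 
    f"imgs/confusions/MN224-224-120da50_classes-{actual}-{predicted}.png", 
    y_true_test, y_pred_test, actual, predicted, batch_size, 
    gtsrb_exists=gtsrb_exists)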

These images are desaturated, which might explain the confusion.

Again, data augmentation or adding more data could help avoid these errors.
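This sketch shows augmentations that target the issues seen above (shadows, overexposure, washed-out colors), using Keras preprocessing layers. It is not the augmentation used for the `DA` models in this notebook. The factors are arbitrary assumptions, and `RandomBrightness` requires TensorFlow 2.9 or later. Note the absence of horizontal flips: mirroring would turn some signs into other classes (e.g. *keep right* into *keep left*).

```python
import tensorflow as tf

def build_augmentation(seed=123):
    """Random photometric and small geometric changes, active only in training."""
    return tf.keras.Sequential([
        # simulate over- or under-exposed signs (pixel values assumed in [0, 255])
        tf.keras.layers.RandomBrightness(factor=0.3, value_range=(0, 255), seed=seed),
        # simulate washed-out or high-contrast images
        tf.keras.layers.RandomContrast(factor=0.3, seed=seed),
        # small rotations and zooms; no flips, which would change a sign's meaning
        tf.keras.layers.RandomRotation(factor=0.03, seed=seed),
        tf.keras.layers.RandomZoom(height_factor=0.1, seed=seed),
    ], name="augmentation")

augment = build_augmentation()
# apply it to the training pipeline only, e.g.:
# train_ds = train_ds.map(lambda x, y: (augment(x, training=True), y))
```

Desaturated images like the *no entry* ones could additionally be simulated with `tf.image.random_saturation` inside the same `map` call.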

Anyway, the `MobileNet_224-224_120e_DA-50e` model performs very well and gives the best results, so we'll keep it for the full process.