# Traffic signs detection and classification with Detecto and TensorFlow

### Part 2 - *Object Detection*

All the functions and visualizations I used here can be found on my GitHub page: [https://github.com/alexisvannaire/GTSRB_detect-and-predict](https://github.com/alexisvannaire/GTSRB_detect-and-predict)

See the first part (1_detect-and-predict) if you need details on the packages I used.

**For this notebook:**

In [None]:
# libraries
from PIL import Image
from IPython import display

import process_data # process_data.py

In [None]:
# False for default visualizations and computations
update_xml_filepaths = False # True if you want to update the paths in the xml files

# III. Object Detection Model

## 1. Models and preprocessing

Some of the best-known models for this task are **R-CNN** (Region-based CNN) and its successors.

R-CNNs extract ROIs (Regions of Interest) from an image, where each one is a rectangle corresponding to the boundary of a potential object. Then, each ROI is passed through a (usually pre-trained) CNN in order to extract features. The feature vector is fed into a set of SVM (Support Vector Machine) classifiers, one for each object class, and they determine whether a specific object is present or not.

Its direct successors are Fast R-CNN and Faster R-CNN. Other popular detectors, such as YOLO, SSD (often paired with a MobileNet backbone), RetinaNet and EfficientDet, are single-stage architectures rather than R-CNN derivatives.
All of these models try to improve the speed of object detection, and often the accuracy too, compared to the standard R-CNN.

The Detecto package provides a pre-trained *Faster R-CNN ResNet-50 FPN* from PyTorch's model zoo (you can find it here: https://pytorch.org/serve/model_zoo.html).
It allows us to easily build our first object detection model on a custom dataset and see how it works.

The more images you have, the better the model will be.
A common suggestion is to label at least one hundred images per class.
With these 92 images randomly cropped 3 times each, we get a dataset of 276 images (likely containing more than one hundred traffic signs).
That's fine for an introduction, but for a more professional application you would want more labeled images, around 400-500.
But keep in mind that labeling images takes time!
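
As an illustration of the "cropped 3 times each" step, here is a minimal sketch that only computes random crop boxes. The function name, the frame size and the crop size are all assumptions for the example; the actual cropping code used for this dataset isn't shown in this notebook.

```python
import random

def random_crop_boxes(img_width, img_height, crop_width, crop_height, n_crops=3, seed=None):
    """Return n_crops boxes (left, upper, right, lower) that fit inside the image.

    Hypothetical helper: sizes and the uniform sampling are assumptions.
    """
    if crop_width > img_width or crop_height > img_height:
        raise ValueError("crop size must not exceed image size")
    rng = random.Random(seed)
    boxes = []
    for _ in range(n_crops):
        # pick the top-left corner so that the whole crop stays inside the image
        left = rng.randint(0, img_width - crop_width)
        upper = rng.randint(0, img_height - crop_height)
        boxes.append((left, upper, left + crop_width, upper + crop_height))
    return boxes

# 92 source images x 3 crops each = 276 images, as in the text
boxes = random_crop_boxes(1920, 1080, 800, 800, n_crops=3, seed=0)
print(len(boxes), 92 * len(boxes))  # prints: 3 276
```

Each box can then be passed to PIL's `Image.crop(box)` to produce the cropped image.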

Here's how to proceed:

1. create your own dataset (manually, with web scraping, from a video, or with an existing dataset)
2. label the images
3. train your model

Of course, you can tune some hyperparameters if you want to train several models.
You can also go deeper by training models directly with the TensorFlow or PyTorch libraries.

This dataset comes from a video I recorded and split.
Let's see how to label each image.

We're going to use `labelImg` (https://github.com/HumanSignal/labelImg).

Install the package, for example with:

In [None]:
#pip install labelImg

Then, in your conda prompt run it as follows:

In [None]:
#labelImg

A window will open and you'll have to choose your dataset directory.

In [None]:
Image.open("imgs/labelImg_open_dir.png").convert("RGB")

Your images will appear; just click on `Create RectBox` to label them (or use the shortcut, mine was `W`).

Label the object you want your model to detect by drawing a rectangle around it, then assign the class name you want (keep a list of all your classes somewhere, you'll need to specify them before training the model).

In [None]:
Image.open("imgs/labelImg_label_imgs.png").convert("RGB")

When you've finished labeling an image, just save it and labelImg will create a `.xml` file with the same name. Of course, if an image doesn't contain any object you want to detect, just move on to the next one (no file will be created).
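
To get a feel for what these files contain, here is a small sketch that reads the boxes back from an annotation. It assumes labelImg's default Pascal VOC layout (`object` / `name` / `bndbox`); the sample XML, its values and the `read_boxes` name are made up for the example.

```python
import xml.etree.ElementTree as ET

# Illustrative annotation, trimmed to the fields used below (values are made up)
SAMPLE_XML = """<annotation>
    <filename>0-0.jpg</filename>
    <object>
        <name>traffic_sign</name>
        <bndbox>
            <xmin>120</xmin><ymin>45</ymin><xmax>180</xmax><ymax>110</ymax>
        </bndbox>
    </object>
</annotation>"""

def read_boxes(xml_text):
    """Return a list of (class_name, (xmin, ymin, xmax, ymax)) from one annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bndbox = obj.find("bndbox")
        coords = tuple(int(float(bndbox.findtext(tag)))
                       for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((name, coords))
    return boxes

print(read_boxes(SAMPLE_XML))  # prints: [('traffic_sign', (120, 45, 180, 110))]
```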

---

**Note:** If you want to use the dataset I created:
    
I wrote a Python function that modifies the XML files so that they work on your machine.

When you create an XML file with labelImg and save it, the path is written in each file and therefore depends on your computer. If you open any XML file you'll see it at the beginning:

`<path>project_path\data\google_maps_test\labeled_imgs\0-0.xml</path>`

So if you want to use it, you can either change the "project_path" part into the absolute path where you placed it, e.g.: 

`<path>C:\Users\your_user_name\Documents\project_path\data\google_maps_test\labeled_imgs\0-0.xml</path>`

Or, run the `update_xml_paths` function I created in the `process_data.py` file.

In [None]:
if update_xml_filepaths:
    process_data.update_xml_paths()
    print("xml paths updated.")
else:
    print("xml paths not updated.")
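
If you're curious about what such an update involves, here is a minimal sketch. It is not the `update_xml_paths` function from `process_data.py`; the name `rewrite_xml_paths` and the choice to point each `<path>` at the folder containing the XML file are assumptions for the example.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

def rewrite_xml_paths(labels_dir):
    """Point the <path> tag of every .xml file in labels_dir at labels_dir itself.

    Hypothetical helper; returns the number of files updated.
    """
    labels_dir = Path(labels_dir).resolve()
    updated = 0
    for xml_file in sorted(labels_dir.glob("*.xml")):
        tree = ET.parse(str(xml_file))
        path_node = tree.getroot().find("path")
        if path_node is None or not path_node.text:
            continue  # nothing to rewrite in this file
        # keep only the file name, whatever separator the original machine used
        file_name = path_node.text.replace("\\", "/").rsplit("/", 1)[-1]
        path_node.text = str(labels_dir / file_name)
        tree.write(str(xml_file))
        updated += 1
    return updated

# Small demo on a throwaway file
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "0-0.xml"
    sample.write_text(r"<annotation><path>project_path\data\0-0.xml</path></annotation>")
    print(rewrite_xml_paths(tmp))
    print(ET.parse(str(sample)).getroot().findtext("path"))
```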

## 2. Training

I didn't tune the 3 models I trained. I just focused on labeling correctly, and it took time! But feel free to try other numbers of epochs, learning rates, custom transformations, or even TensorFlow or PyTorch models.

For the first model, I labeled two classes, `traffic_sign` and `vehicle`, and trained the model for 10 epochs.

For the second one, I just added one class, `google_maps`, because of the Google Maps icons that appear in some places. I also deleted some labels on traffic signs that might be too small and that I didn't want the model to predict (thinking they would cause confusion with other small objects). I trained it for 10 epochs too.

And the third was just like the second one but with 20 epochs.

Here are the few lines you have to write in order to train your model:

* Load your dataset just by giving the folder path:

In [None]:
#from detecto import core
#dataset = core.Dataset('data/google_maps_test/labeled_imgs/')

* Initialize your model by giving the class names you gave during the labeling step:

In [None]:
#model = core.Model(['traffic_sign', 'vehicle'])

* Train the model:

In [None]:
#model.fit(dataset)
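
The three steps above can be wrapped in one function if you want to experiment with the number of epochs or the learning rate. This is only a sketch: `train_detector` is a hypothetical name, the default values are assumed to mirror Detecto's own defaults, and `detecto` is imported inside the function so that the cell still runs when the package isn't installed.

```python
def train_detector(labels_dir, class_names, epochs=10, learning_rate=0.005):
    """Load a labeled folder, build a Detecto model and train it.

    Hypothetical wrapper; requires `pip install detecto` when actually called.
    """
    from detecto import core  # imported lazily, only needed for training

    dataset = core.Dataset(labels_dir)
    model = core.Model(class_names)
    model.fit(dataset, epochs=epochs, learning_rate=learning_rate)
    return model

# Example call (commented out because training can take hours on a CPU):
# model = train_detector('data/google_maps_test/labeled_imgs/',
#                        ['traffic_sign', 'vehicle', 'google_maps'],
#                        epochs=20)
```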

If you don't have a GPU, you'll get this warning:

`It looks like you're training your model on a CPU. Consider switching to a GPU; otherwise, this method can take hours upon hours or even days to finish.`

You can try anyway, because it may not take that long. For example, for my first two models, trained for 10 epochs each, it took around 3 h 45 min.

<br> 

Don't forget to save your model so you can use it later to make predictions:

In [None]:
#model.save('models/object_detection/model.pth')

Later, you can load it this way:

In [None]:
#model = core.Model.load('models/object_detection/model.pth', ['traffic_sign', 'vehicle'])

## 3. Predictions

We have one or several trained models, and now we want to see what the predictions look like.

For a given image, the model returns:

* a list of classes
* a list of associated box coordinates (xmin, ymin, xmax, and ymax)
* a list of associated probabilities

Here's how you can get the predictions:

In [None]:
#labels, boxes, scores = model.predict(image)

Depending on what you're interested in, you can filter the predictions with whatever criteria you want.

For example, you might want to keep a single class, with a probability above 95% and with predicted boxes that are at least 30px in both width and height.

(Note that the boxes and probabilities are returned as `torch.Tensor` objects; if you want to work with NumPy, just call `.numpy()` on them.)
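
A filter like the one described above could look like the sketch below. `filter_predictions` is a hypothetical helper, not a Detecto function; it only assumes that `labels`, `boxes` and `scores` can be iterated together and that each box holds (xmin, ymin, xmax, ymax), so it works with plain lists as well as with the tensors returned by `model.predict`.

```python
def filter_predictions(labels, boxes, scores,
                       keep_class="traffic_sign", min_score=0.95, min_size=30):
    """Keep predictions of one class, above a score, with large enough boxes.

    Returns a list of (label, (xmin, ymin, xmax, ymax), score) tuples.
    """
    kept = []
    for label, box, score in zip(labels, boxes, scores):
        xmin, ymin, xmax, ymax = (float(v) for v in box)
        width, height = xmax - xmin, ymax - ymin
        if (label == keep_class
                and float(score) > min_score
                and width >= min_size
                and height >= min_size):
            kept.append((label, (xmin, ymin, xmax, ymax), float(score)))
    return kept

# Made-up predictions, only to show the behavior
labels = ["traffic_sign", "vehicle", "traffic_sign", "traffic_sign"]
boxes = [[0, 0, 50, 50], [0, 0, 100, 100], [0, 0, 20, 60], [10, 10, 45, 50]]
scores = [0.99, 0.99, 0.98, 0.90]
print(filter_predictions(labels, boxes, scores))
# prints: [('traffic_sign', (0.0, 0.0, 50.0, 50.0), 0.99)]
```

Only the first prediction survives: the second has the wrong class, the third is narrower than 30px, and the fourth is below the 95% threshold.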

I trained 3 models, each one meant to improve on the predictions of the previous one.
It worked a bit, thanks to removing labels on signs that were too small and to training for more epochs.
But the best way to improve the model would surely be to label more images.

Let's take some images and look at the predictions.

The third model without any filter gave these predictions:

In [None]:
display.Image("imgs/start_video2.gif")

There are a lot of them, and most (especially for traffic signs) aren't good ones.

So I've selected only the `traffic_sign` class and tried with several probability thresholds.

The best threshold was a probability above 95%.
I still get false predictions, but far fewer, and I don't lose the really obvious ones (which happened with a 99% threshold).
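
One simple way to compare thresholds is to count how many predictions each one keeps. The sketch below uses made-up scores and a hypothetical `count_kept` helper; with real predictions you would pass the scores returned by the model instead.

```python
def count_kept(scores, thresholds=(0.5, 0.8, 0.95, 0.99)):
    """Return {threshold: number of scores strictly above it}."""
    return {t: sum(float(s) > t for s in scores) for t in thresholds}

# Illustrative scores only, not actual model output
example_scores = [0.999, 0.97, 0.96, 0.91, 0.62, 0.55]
print(count_kept(example_scores))  # prints: {0.5: 6, 0.8: 4, 0.95: 3, 0.99: 1}
```

Counting alone doesn't say which predictions are correct, so you still need to look at the images to pick the threshold.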

Here are the predictions with the 95% filter:

In [None]:
display.Image("imgs/start_video3.gif")

We can see some false predictions, like triangle-shaped roofs.
But fortunately they aren't the majority!
Also, we can expect that the classifier we're going to train in the next part won't give them a high probability, so we could just filter them out.

Overall, all the traffic signs have been detected.

To create labeled images with the red rectangles, class names and associated probabilities, see the `labeled_image_with_object_detection_predictions` function in the `plots.py` file.

To create a GIF from a set of images, see `make_gif` in the `plots.py` file.