<h1> Visualizing YOLO efficiency </h1>

Run the following command before running this notebook:

In [None]:
! python pre_process_lisa.py

Also make sure you have the following folder structure:
- LISA_TS/
    - aiua120214-0/
    - aiua120214-1/
    - ...
    - readme.txt
    - videoSources.txt
    
    
- LISA_TS_extension/
    - 2014-04-24_10-59/
    - 2014-04-24_11-43/
    - ...
    - 2014-07-11_13-47/
    - allTrainingAnnotations.csv
    
    
- weights/
    - trained_weights_final.h5

To obtain *trained_weights_final.h5*, either:
- train the network by running *python train.py* (see the readme for details; it needs a GPU and takes ~2 days for 150 epochs)
- or ask Louis for the latest trained weights
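
A quick way to confirm the layout above before going further is to check each expected path. The sketch below is only an illustration: `missing_paths` and `EXPECTED_PATHS` are hypothetical names, not part of the repository, and the list covers only the entries named explicitly above (the elided `...` folders are not checked).

```python
import os

# Entries named explicitly in the layout above; the elided "..." folders are skipped.
EXPECTED_PATHS = [
    "LISA_TS/readme.txt",
    "LISA_TS/videoSources.txt",
    "LISA_TS_extension/allTrainingAnnotations.csv",
    "weights/trained_weights_final.h5",
]


def missing_paths(root=".", expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under `root` (hypothetical helper)."""
    return [p for p in expected if not os.path.exists(os.path.join(root, p))]
```

Calling `missing_paths()` from the project root returns an empty list when everything is in place, and otherwise the entries that still need to be downloaded or generated.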

<h2> Utils definition </h2>

In [124]:
import os
from tqdm import tqdm_notebook as tqdm
import numpy as np
import pandas as pd
from keras import backend as K
from PIL import Image, ImageFont, ImageDraw
from timeit import default_timer as timer

from yolo import YOLO
from yolo3.utils import letterbox_image
from yolo3.model import yolo_eval

In [125]:
class YOLOPlus(YOLO):
    def __init__(self, **kwargs):
        super(YOLOPlus, self).__init__(**kwargs)

    def detect_image_plus(self, image):
        start = timer()

        if self.model_image_size != (None, None):
            assert self.model_image_size[0]%32 == 0, 'Multiples of 32 required'
            assert self.model_image_size[1]%32 == 0, 'Multiples of 32 required'
            boxed_image = letterbox_image(image, tuple(reversed(self.model_image_size)))
        else:
            new_image_size = (image.width - (image.width % 32),
                              image.height - (image.height % 32))
            boxed_image = letterbox_image(image, new_image_size)
        image_data = np.array(boxed_image, dtype='float32')

        #print(image_data.shape)
        image_data /= 255.
        image_data = np.expand_dims(image_data, 0)  # Add batch dimension.

        out_boxes, out_scores, out_classes = self.sess.run(
            [self.boxes, self.scores, self.classes],
            feed_dict={
                self.yolo_model.input: image_data,
                self.input_image_shape: [image.size[1], image.size[0]],
                K.learning_phase(): 0
            })

        #print('Found {} boxes for {}'.format(len(out_boxes), 'img'))

        font = ImageFont.truetype(font='font/FiraMono-Medium.otf',
                    size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32'))
        thickness = (image.size[0] + image.size[1]) // 300

        for i, c in reversed(list(enumerate(out_classes))):
            predicted_class = self.class_names[c]
            box = out_boxes[i]
            score = out_scores[i]

            label = '{} {:.2f}'.format(predicted_class, score)
            draw = ImageDraw.Draw(image)
            label_size = draw.textsize(label, font)

            top, left, bottom, right = box
            top = max(0, np.floor(top + 0.5).astype('int32'))
            left = max(0, np.floor(left + 0.5).astype('int32'))
            bottom = min(image.size[1], np.floor(bottom + 0.5).astype('int32'))
            right = min(image.size[0], np.floor(right + 0.5).astype('int32'))
            #print(label, (left, top), (right, bottom))

            if top - label_size[1] >= 0:
                text_origin = np.array([left, top - label_size[1]])
            else:
                text_origin = np.array([left, top + 1])

            # My kingdom for a good redistributable image drawing library.
            # Draw the outline as nested 1-px rectangles; `t` avoids shadowing the outer `i`.
            for t in range(thickness):
                draw.rectangle(
                    [left + t, top + t, right - t, bottom - t],
                    outline=self.colors[c])
            draw.rectangle(
                [tuple(text_origin), tuple(text_origin + label_size)],
                fill=self.colors[c])
            draw.text(text_origin, label, fill=(0, 0, 0), font=font)
            del draw
            
            # For now, return only the first box visited. The loop runs in reverse,
            # so this is the last entry of out_classes. The LISA dataset has at most
            # 1 labeled sign per image. /!\ need to change this
            return image, c, score, (left, top, right, bottom)  # x_min, y_min, x_max, y_max
        
        return image, None, None, None  # no sign was found
        

In [126]:
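def detect_img(img, yolo):
    """Open the image at path `img` and run YOLO detection on it.

    Returns (annotated_image, label, score, box), with box given as
    (x_min, y_min, x_max, y_max). All four values are None if the file
    cannot be opened, so callers can always unpack the result.
    """
    try:
        image = Image.open(img)
    except OSError:  # missing file or unreadable image
        print('Open Error! Try again!')
        return None, None, None, None
    return yolo.detect_image_plus(image)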

In [127]:
def IoU(boxA, boxB):
    # determine the (x, y)-coordinates of the intersection rectangle
    xA = max(boxA[0], boxB[0])
    yA = max(boxA[1], boxB[1])
    xB = min(boxA[2], boxB[2])
    yB = min(boxA[3], boxB[3])
 
    # compute the area of intersection rectangle
    interArea = max(0, xB - xA + 1) * max(0, yB - yA + 1)
 
    # compute the area of both the prediction and ground-truth
    # rectangles
    boxAArea = (boxA[2] - boxA[0] + 1) * (boxA[3] - boxA[1] + 1)
    boxBArea = (boxB[2] - boxB[0] + 1) * (boxB[3] - boxB[1] + 1)
 
    # compute the intersection over union by dividing the intersection
    # area by the union area (prediction area + ground-truth area
    # - intersection area)
    iou = interArea / float(boxAArea + boxBArea - interArea)
 
    # return the intersection over union value
    return iou

def prediction_ok(score, true_label, true_box, label, box):
    # Wrong prediction if there is no box predicted,
    # (/!\ might need to change this if considering the "no label" images)
    if score is None or label is None or box is None:
        return False
    
    if score < 0.3:
        return False
    
    if label != true_label:
        return False
    
    if IoU(true_box, box) < 0.5:
        return False
    
    return True
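
As a sanity check on the `IoU` definition above, here is a small worked example with hand-picked boxes (the coordinates are made up for illustration). The function is repeated in compact form under the name `iou` so that the snippet runs on its own; note the `+ 1` terms, which treat coordinates as inclusive pixel indices.

```python
def iou(box_a, box_b):
    """Same computation as IoU above; boxes are (x_min, y_min, x_max, y_max)."""
    x_a, y_a = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x_b, y_b = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x_b - x_a + 1) * max(0, y_b - y_a + 1)
    area_a = (box_a[2] - box_a[0] + 1) * (box_a[3] - box_a[1] + 1)
    area_b = (box_b[2] - box_b[0] + 1) * (box_b[3] - box_b[1] + 1)
    return inter / float(area_a + area_b - inter)


truth = (0, 0, 9, 9)        # 10 x 10 pixels, area 100
shifted = (5, 0, 14, 9)     # same size, shifted 5 pixels to the right
far = (20, 20, 29, 29)      # no overlap with `truth`

# truth vs shifted: intersection 5 x 10 = 50, union 100 + 100 - 50 = 150
print(iou(truth, truth), iou(truth, shifted), iou(truth, far))
```

The shifted box overlaps half of the ground truth yet only reaches an IoU of 1/3, so it would be rejected by the 0.5 threshold in `prediction_ok`.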

<h2> Accuracy computation </h2>

Load the YOLO model:

In [None]:
model_path = "weights/trained_weights_final.h5"
classes_path = "model_data/lisa_classes.txt"

yolo = YOLOPlus(model_path=model_path, classes_path=classes_path)

<h3> Accuracy on train + validation dataset </h3>

In [128]:
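# engine='python' is required for a regex separator; stating it avoids a ParserWarning
train_df = pd.read_csv("train_lisa.txt", sep=' |,', engine='python', names=["file_path", "x_min", "y_min", "x_max", "y_max", "label"])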
train_df.head()

|   | file_path | x_min | y_min | x_max | y_max | label |
|---|-----------|-------|-------|-------|-------|-------|
| 0 | LISA_TS/aiua120214-0/frameAnnotations-DataLog0... | 862 | 104 | 916 | 158 | 0 |
| 1 | LISA_TS/aiua120214-0/frameAnnotations-DataLog0... | 425 | 197 | 438 | 213 | 1 |
| 2 | LISA_TS/aiua120214-0/frameAnnotations-DataLog0... | 922 | 88 | 982 | 148 | 0 |
| 3 | LISA_TS/aiua120214-0/frameAnnotations-DataLog0... | 447 | 193 | 461 | 210 | 2 |
| 4 | LISA_TS/aiua120214-0/frameAnnotations-DataLog0... | 469 | 189 | 483 | 207 | 2 |
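
The regular-expression separator `' |,'` used above splits each row on either a space or a comma. Here is a minimal sketch of that split on a single made-up row, using only the standard library; the file name and `parse_row` are hypothetical, and the row format (path, space, five comma-separated values) is inferred from the column names.

```python
import re

COLUMNS = ["file_path", "x_min", "y_min", "x_max", "y_max", "label"]


def parse_row(line):
    """Split one annotation row on spaces and commas (hypothetical helper)."""
    fields = re.split(r" |,", line.strip())
    if len(fields) != len(COLUMNS):
        # e.g. a path containing a space or a comma produces extra fields
        raise ValueError("expected {} fields, got {}".format(len(COLUMNS), len(fields)))
    return dict(zip(COLUMNS, fields))  # values stay strings; pandas would infer types


row = parse_row("some_folder/frame_0001.png 862,104,916,158,0\n")
print(row["file_path"], row["x_min"], row["label"])
```

One consequence of this separator is that a file path containing a space or a comma would be split into several columns, so such paths are best avoided in the annotation files.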


In [132]:
n_examples = 100
indices = np.random.choice(len(train_df), n_examples, replace=False)

n_good_predictions = 0
for index in tqdm(indices):
    input_path = train_df["file_path"][index]
    r_image, label, score, box = detect_img(input_path, yolo)

    true_box = (train_df["x_min"][index],
                train_df["y_min"][index],
                train_df["x_max"][index],
                train_df["y_max"][index])
    true_label = train_df["label"][index]

    n_good_predictions += prediction_ok(score, true_label, true_box, label, box)
    
    # *** Uncomment these 2 lines to print the wrong predictions ***
    # if not prediction_ok(score, true_label, true_box, label, box):
    #     r_image.show()

print("average accuracy (train): ", round((n_good_predictions * 100)/n_examples, 2), "%")

average accuracy:  74.0 %


In [138]:
print("average accuracy (train): ", round((n_good_predictions * 100)/n_examples, 2), "%")
# average accuracy:  79.89

average accuracy (train):  74.0 %


<h3> Accuracy on extension dataset </h3>

In [163]:
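# engine='python' is required for a regex separator; stating it avoids a ParserWarning
test_df = pd.read_csv("test_lisa.txt", sep=' |,', engine='python', names=["file_path", "x_min", "y_min", "x_max", "y_max", "label"])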
test_df.head()

|   | file_path | x_min | y_min | x_max | y_max | label |
|---|-----------|-------|-------|-------|-------|-------|
| 0 | LISA_TS_extension\2014-04-24_10-59/frameAnnota... | 1071 | 384 | 1116 | 432 | 36.0 |
| 1 | LISA_TS_extension\2014-04-24_10-59/frameAnnota... | 1129 | 367 | 1182 | 425 | 36.0 |
| 2 | LISA_TS_extension\2014-04-24_10-59/frameAnnota... | 691 | 429 | 728 | 467 | 18.0 |
| 3 | LISA_TS_extension\2014-04-24_10-59/frameAnnota... | 703 | 426 | 743 | 467 | 18.0 |
| 4 | LISA_TS_extension\2014-04-24_10-59/frameAnnota... | 720 | 417 | 764 | 466 | 18.0 |


In [177]:
n_examples = len(test_df)
indices = np.random.choice(len(test_df), n_examples, replace=False)

n_good_predictions = 0
for index in tqdm(indices):
    input_path = test_df["file_path"][index]
    r_image, label, score, box = detect_img(input_path, yolo)

    true_box = (test_df["x_min"][index],
                test_df["y_min"][index],
                test_df["x_max"][index],
                test_df["y_max"][index])
    true_label = test_df["label"][index]

    n_good_predictions += prediction_ok(score, true_label, true_box, label, box)
    
    # *** Uncomment these 2 lines to print the wrong predictions ***
    #if not prediction_ok(score, true_label, true_box, label, box):
    #    r_image.show()

print("average accuracy (test): ", round((n_good_predictions * 100)/n_examples, 2), "%")

average accuracy (test):  48.56 %


In [179]:
print("average accuracy (test): ", round((n_good_predictions * 100)/n_examples, 2), "%")

average accuracy (test):  48.56 %


<h2> Results summary </h2>

| Metric   | train + validation | test       |
|----------|--------------------|------------|
| Accuracy | **74.0%**          | **48.56%** |

- Accuracy on the train + validation dataset (the original LISA dataset), estimated on a random sample of 100 images: 74.0%
- Accuracy on the test dataset (the extension LISA dataset, all 3672 images): 48.56%
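
Because the train figure comes from `n_examples = 100` randomly drawn rows, it carries sampling noise. The sketch below gives a rough idea of its size. It assumes the rows behave like independent draws and uses the normal approximation to the binomial; both are simplifying assumptions, and `accuracy_interval` is a hypothetical helper.

```python
import math


def accuracy_interval(accuracy, n, z=1.96):
    """Approximate 95% interval for an accuracy measured on n samples (hypothetical helper)."""
    std_err = math.sqrt(accuracy * (1.0 - accuracy) / n)
    return accuracy - z * std_err, accuracy + z * std_err


low, high = accuracy_interval(0.74, 100)
print("train accuracy 74.0%, approx. interval {:.1f}% to {:.1f}%".format(100 * low, 100 * high))
```

With these numbers the interval spans roughly 65% to 83%, so differences of a few points between two runs on 100 images should not be over-interpreted.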

<h2> Analysis & comments </h2>

The model clearly overfits the original dataset, so data augmentation should help. The extension dataset was not used for training, so we can also try training on it and measure the change in accuracy (on the LISA dataset as well as on real-world pictures).

The main factors that lower the accuracy are:
- Quite a few pictures contain two or more signs, but only one is labelled. YOLO may detect all of them or only some, and the accuracy computation only considers the first detected sign
- YOLO sometimes reports no sign on a picture that contains signs
- YOLO sometimes gets the sign and its position right but predicts a similar, wrong class (such as speed limit 35 instead of speed limit 25)
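
The first limitation in the list above (only one detection is compared with the ground truth) could be relaxed by checking every detection. The sketch below is one possible approach, not what the notebook currently does. It assumes a hypothetical `detections` list of `(label, score, box)` tuples, which `detect_image_plus` does not return today, and it reuses the thresholds from `prediction_ok`. The sample boxes and class ids are made up.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    x_a, y_a = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x_b, y_b = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x_b - x_a + 1) * max(0, y_b - y_a + 1)
    area_a = (box_a[2] - box_a[0] + 1) * (box_a[3] - box_a[1] + 1)
    area_b = (box_b[2] - box_b[0] + 1) * (box_b[3] - box_b[1] + 1)
    return inter / float(area_a + area_b - inter)


def any_prediction_ok(true_label, true_box, detections, min_score=0.3, min_iou=0.5):
    """Return True if at least one detection matches the ground truth (hypothetical helper)."""
    return any(
        label == true_label and score >= min_score and iou(true_box, box) >= min_iou
        for label, score, box in detections
    )


true_box = (862, 104, 916, 158)
detections = [
    (3, 0.90, (200, 50, 240, 90)),     # another sign elsewhere in the picture
    (0, 0.80, (860, 100, 918, 160)),   # the labelled sign
]
print(any_prediction_ok(0, true_box, detections))      # all detections checked
print(any_prediction_ok(0, true_box, detections[:1]))  # only the first one checked
```

Note that this variant never penalizes extra detections, so on pictures with unlabelled signs it measures whether the labelled sign was found rather than whether every box is correct.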

Since there is no obvious way to compute the accuracy for a detection task, a custom function was used to decide whether a prediction counts as good; it might need to be changed.

A prediction is considered good when it meets all of the criteria below. Only the first prediction is considered and all others are ignored (there must be a smarter way):
- There must be at least 1 prediction
- The predicted label must be the same as the true label
- The confidence score must be at least 0.3
- The Intersection over Union of the predicted and ground-truth bounding boxes must be at least 0.5

/!\ The accuracy was only computed for images that actually contain signs.