###**YOLO definition:**

YOLO (You Only Look Once) is a very fast multi-object detection algorithm that uses a convolutional neural network (CNN) to detect and identify objects.

###**YOLO v3 architecture:**

![](https://drive.google.com/uc?id=1ZF9U7tt9YBd0Fb_krEweYdYK0xT48CAX)

**Single forward pass :**

In the example below, the input image is divided into a 13 x 13 grid of cells.  
Consider what happens for a single grid cell.  
Because YOLO v3 detects at multiple scales, a detection kernel is applied to feature maps of three different sizes at three different places in the network, and each grid cell predicts 3 boxes at each scale (i.e. B = 3).

**yolo_82** : 13x13x255.  
**yolo_94** : 26x26x255.   
**yolo_106** : 52x52x255.

![](https://drive.google.com/uc?id=1sd66UoqruTPQQ9Qe4ivaND3NPv-9BZ6e)
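
The depth of 255 and the number of rows each output layer produces both follow from B = 3 boxes per cell and the 80 COCO classes. A minimal sketch of that arithmetic, where the helper name `yolo_output_shape` is hypothetical and only used for illustration:

```python
# Each box carries 4 coordinates + 1 objectness score + one score per class.
BOXES_PER_CELL = 3   # B = 3
NUM_CLASSES = 80     # COCO

def yolo_output_shape(grid_size, boxes_per_cell=BOXES_PER_CELL, num_classes=NUM_CLASSES):
    """Return (depth, rows) for a square detection grid (hypothetical helper)."""
    values_per_box = 4 + 1 + num_classes           # 85 values per box
    depth = boxes_per_cell * values_per_box        # channels of the raw feature map
    rows = grid_size * grid_size * boxes_per_cell  # one row per predicted box
    return depth, rows

for name, grid in [('yolo_82', 13), ('yolo_94', 26), ('yolo_106', 52)]:
    depth, rows = yolo_output_shape(grid)
    print(name, grid, depth, rows)
```

For the 13 x 13 grid this gives 507 rows of 85 values, which matches the (507, 85) shape noted for `yolo_82` in the main loop.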

In [None]:
import numpy as np
import cv2
import time

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


###**Reading input video**

In [None]:
video = cv2.VideoCapture('/content/gdrive/MyDrive/datasets/YOLO-3/videos/video-foot.mp4')

# Preparing variable for writer
# that we will use to write processed frames
writer = None

# Preparing variables for spatial dimensions of the frames
h, w = None, None

###**Creating a YOLO v3 network:**
The YOLO v3 weights used here were trained on the COCO dataset, which has 80 object classes.

In [None]:
with open('/content/gdrive/MyDrive/datasets/YOLO-3/yolo-coco-data/coco.names') as f:
    labels = [line.strip() for line in f]

print('Labels :')
print()
print(labels)

# Loading trained YOLO v3 Objects Detector
network = cv2.dnn.readNetFromDarknet('/content/gdrive/MyDrive/datasets/YOLO-3/yolo-coco-data/yolov3.cfg',
                                     '/content/gdrive/MyDrive/datasets/YOLO-3/yolo-coco-data/yolov3.weights')

layers_names_all = network.getLayerNames()


print()
print(layers_names_all)

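# getUnconnectedOutLayers() returns an Nx1 array in older OpenCV releases
# and a flat array in newer ones, so flatten before indexing
layers_names_output = \
    [layers_names_all[i - 1] for i in np.array(network.getUnconnectedOutLayers()).flatten()]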

print()
print(layers_names_output)  # ['yolo_82', 'yolo_94', 'yolo_106']

# Minimum probability used to discard weak predictions
probability_minimum = 0.5

# Setting threshold for filtering weak bounding boxes
# with non-maximum suppression
threshold = 0.3

# Generate one random colour per class (80 here) to draw each detected object
colours = np.random.randint(0, 255, size=(len(labels), 3), dtype='uint8')


Labels :

['person', 'bicycle', 'car', 'motorbike', 'aeroplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'sofa', 'pottedplant', 'bed', 'diningtable', 'toilet', 'tvmonitor', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']

['conv_0', 'bn_0', 'relu_0', 'conv_1', 'bn_1', 'relu_1', 'conv_2', 'bn_2', 'relu_2', 'conv_3', 'bn_3', 'relu_3', 'shortcut_
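
The `threshold` value set in the cell above is an overlap (IoU) threshold for non-maximum suppression. As an illustration only, here is a minimal NumPy sketch of greedy NMS on `[x_min, y_min, width, height]` boxes. The function names `iou` and `greedy_nms` and the sample boxes are hypothetical, and `cv2.dnn.NMSBoxes` may differ in details such as tie-breaking:

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over union of two [x_min, y_min, width, height] boxes."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    inter_w = max(0.0, min(ax1 + aw, bx1 + bw) - max(ax1, bx1))
    inter_h = max(0.0, min(ay1 + ah, by1 + bh) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, score_threshold, nms_threshold):
    """Keep the best-scoring boxes, dropping any box that overlaps a kept one too much."""
    # Visit candidates from highest to lowest score, skipping weak ones
    order = [i for i in np.argsort(scores)[::-1] if scores[i] > score_threshold]
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= nms_threshold for k in kept):
            kept.append(int(i))
    return kept

# Two heavily overlapping boxes and one separate box (made-up values)
boxes = [[100, 100, 50, 80], [104, 102, 50, 80], [300, 40, 60, 60]]
scores = [0.9, 0.75, 0.6]
print(greedy_nms(boxes, scores, score_threshold=0.5, nms_threshold=0.3))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.81, well above 0.3, so only the higher-scoring one survives.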

###**Main loop**

In [None]:
# Frame counter
f = 0

# Total time spent on the forward passes of all frames of the video
t = 0

# Defining loop for catching frames
while True:
    # Capturing frame-by-frame
    ret, frame = video.read()

    # If no frame was retrieved, exit the loop
    if not ret:
        break

    # Spatial dimensions of the frame
    if w is None or h is None:
        h, w = frame.shape[:2]

    """
    Start of:
    Getting blob from current frame
    """
    # Signature: cv2.dnn.blobFromImage(image, scalefactor, size, mean, swapRB, crop)
    # Here: scale pixels to [0, 1], resize to 416 x 416, swap BGR to RGB
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)

    """
    End of:
    Getting blob from current frame
    """

    """
    Start of:
    Forward pass
    """
    # Run the forward pass with our blob, collecting only the output layers
    network.setInput(blob)
    start = time.time()
    output_from_network = network.forward(layers_names_output)
    end = time.time()

    f += 1
    t += end - start

    # Time spent on the forward pass of the current frame
    print('Frame number {0} took {1:.5f} seconds'.format(f, end - start))

    """
    End of:
    Forward pass
    """

    """
    Start of:
    Getting bounding boxes
    Layer yolo_82 output shape: (507, 85)
    output_from_network[i][j] = [x_center, y_center, box_width, box_height, confidence,
                                 confidence_c1, ..., confidence_cn]

    """

    # Preparing lists for detected bounding boxes,
    # obtained confidences and class's number
    bounding_boxes = []
    confidences = []
    class_numbers = []

    for result in output_from_network:
        for detected_objects in result:
            scores = detected_objects[5:]
            # Index of the class with the highest probability
            class_current = np.argmax(scores)
            confidence_current = scores[class_current]

            if confidence_current > probability_minimum:
                box_current = detected_objects[0:4] * np.array([w, h, w, h])
                # Now, from YOLO data format, we can get top left corner coordinates
                # that are x_min and y_min
                x_center, y_center, box_width, box_height = box_current
                x_min = int(x_center - (box_width / 2))
                y_min = int(y_center - (box_height / 2))

                # Append the results to the lists
                bounding_boxes.append([x_min, y_min,
                                       int(box_width), int(box_height)])
                confidences.append(float(confidence_current))
                class_numbers.append(class_current)

    """
    End of:
    Getting bounding boxes
    """

    """
    Start of:
    Non-maximum suppression :
    Discard bounding boxes whose confidence is low, or for which another
    bounding box covering the same region has a higher confidence
    """

    results = cv2.dnn.NMSBoxes(bounding_boxes, confidences,
                               probability_minimum, threshold)

    """
    End of:
    Non-maximum suppression
    """

    """
    Start of:
    Drawing bounding boxes and labels
    """

    # Check that at least one object was detected
    if len(results) > 0:
        for i in np.array(results).flatten():
            x_min, y_min = bounding_boxes[i][0], bounding_boxes[i][1]
            box_width, box_height = bounding_boxes[i][2], bounding_boxes[i][3]

            colour_box_current = colours[class_numbers[i]].tolist()

            # Drawing bounding box 
            cv2.rectangle(frame, (x_min, y_min),
                          (x_min + box_width, y_min + box_height),
                          colour_box_current, 2)

            # text with label and confidence for bounding box
            text_box_current = '{}: {:.4f}'.format(labels[int(class_numbers[i])],
                                                   confidences[i])

            # Putting text with label and confidence on the original image
            cv2.putText(frame, text_box_current, (x_min, y_min - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, colour_box_current, 2)

    """
    End of:
    Drawing bounding boxes and labels
    """

    """
    Start of:
    Writing processed frame into the file
    """

    # Initializing writer
    # we do it only once from the very beginning
    # when we get spatial dimensions of the frames
    if writer is None:
        # Constructing code of the codec
        # to be used in the function VideoWriter
        fourcc = cv2.VideoWriter_fourcc(*'mp4v')

        # Creating the writer for the output video file
        # (30 FPS, same size as the input frames)
        writer = cv2.VideoWriter('/content/gdrive/MyDrive/datasets/YOLO-3/videos/result.mp4', fourcc, 30,
                                 (frame.shape[1], frame.shape[0]), True)

    # Write processed current frame to the file
    writer.write(frame)

    """
    End of:
    Writing processed frame into the file
    """

Frame number 1 took 2.31020 seconds
Frame number 2 took 1.75818 seconds
Frame number 3 took 1.74134 seconds
Frame number 4 took 1.75920 seconds
Frame number 5 took 1.74248 seconds
Frame number 6 took 1.77252 seconds
Frame number 7 took 1.73431 seconds
Frame number 8 took 1.75792 seconds
Frame number 9 took 1.81000 seconds
Frame number 10 took 1.74864 seconds
Frame number 11 took 1.78209 seconds
Frame number 12 took 1.75312 seconds
Frame number 13 took 1.76254 seconds
Frame number 14 took 1.73499 seconds
Frame number 15 took 1.76249 seconds
Frame number 16 took 1.73985 seconds
Frame number 17 took 1.72545 seconds
Frame number 18 took 1.71955 seconds
Frame number 19 took 1.75319 seconds
Frame number 20 took 1.74332 seconds
Frame number 21 took 1.73740 seconds
Frame number 22 took 1.73029 seconds
Frame number 23 took 1.71398 seconds
Frame number 24 took 1.73478 seconds
Frame number 25 took 1.74353 seconds
Frame number 26 took 1.72533 seconds
Frame number 27 took 1.73502 seconds
Frame numb
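
The bounding-box step of the loop can be tried in isolation on a synthetic detection row. The sketch below assumes the row layout described in the loop (normalised centre and size, one confidence value, then the class scores). The function name `decode_detection` and the sample values are made up for illustration:

```python
import numpy as np

def decode_detection(row, frame_width, frame_height, probability_minimum=0.5):
    """Return ([x_min, y_min, width, height], confidence, class_id), or None if too weak."""
    scores = row[5:]
    class_id = int(np.argmax(scores))
    confidence = float(scores[class_id])
    if confidence <= probability_minimum:
        return None
    # Scale the normalised box back to pixel units of the original frame
    x_center, y_center, box_width, box_height = row[0:4] * np.array(
        [frame_width, frame_height, frame_width, frame_height])
    # Convert the centre-based YOLO format to a top-left corner
    x_min = int(x_center - box_width / 2)
    y_min = int(y_center - box_height / 2)
    return [x_min, y_min, int(box_width), int(box_height)], confidence, class_id

# Synthetic row with 3 classes instead of 80, to keep the example short
row = np.array([0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.8, 0.05])
print(decode_detection(row, frame_width=640, frame_height=480))
# ([240, 120, 160, 240], 0.8, 1)
```

The box is centred in a 640 x 480 frame, a quarter of the frame wide and half of it high, so its top-left corner lands at (240, 120).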

###**Results**

In [None]:
print()
print('Total number of frames', f)
print('Total amount of time {:.5f} seconds'.format(t))
print('FPS:', round((f / t), 1))


Total number of frames 2750
Total amount of time 4803.90394 seconds
FPS: 0.6
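
The reported FPS is the number of frames divided by the accumulated forward-pass time, so it measures inference only, not reading, drawing or writing frames. A small sketch of that arithmetic using the totals printed above (the helper name `throughput` is hypothetical):

```python
def throughput(frame_count, total_seconds):
    """Return (frames per second, mean seconds per frame); zeros if nothing was timed."""
    if frame_count == 0 or total_seconds == 0:
        return 0.0, 0.0
    return frame_count / total_seconds, total_seconds / frame_count

fps, seconds_per_frame = throughput(2750, 4803.90394)
print('FPS: {:.1f}, mean time per frame: {:.3f} s'.format(fps, seconds_per_frame))
# FPS: 0.6, mean time per frame: 1.747 s
```

The guard also avoids a division by zero when the video could not be opened and no frame was processed.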


###**Releasing video reader and writer**

In [None]:
video.release()
# writer is still None if no frame was read
if writer is not None:
    writer.release()