# Video Object Detection with YOLOv3 (OpenCV)

In this notebook, we show step by step how to detect objects in a video (e.g. footage from a highway camera).

## Importing Libraries

First, we import the OpenCV, NumPy and Matplotlib libraries, plus the Video helper from IPython.display to preview video files inside the notebook.

In [116]:
import cv2 
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import Video

## Video

Let's have a look at the video...

Link (click on "download" and watch it):   [https://github.com/buropas/Object_Detection/blob/main/test.mp4](https://github.com/buropas/Object_Detection/blob/main/test.mp4)



In [117]:
# Take a look at the input video
Video("test.mp4")

## Loading Model Configuration and Pre-trained Weights

The YOLOv3 pre-trained weights can be downloaded here: https://pjreddie.com/media/files/yolov3.weights

Let's load the YOLOv3 model configuration and pre-trained weights by reading "yolov3.cfg" and "yolov3.weights".
We also need to extract the class names from the file "coco.names" (one class name per line).

In [3]:
## Loading network configuration and pre-trained weights 
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg") 

## Save all the class names in a list  (80 CLASSES)
with open("coco.names", "r") as f:     
    classes = [word.strip() for word in f.readlines()] 
    
## Get layer names of the network 
layer_names = net.getLayerNames() 

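## Determine the output layer names from the YOLO model  
# net.getUnconnectedOutLayers() returns 1-based layer indices; depending on the
# OpenCV version they come as a flat array or as an Nx1 array, so we flatten first
output_layers = [layer_names[i - 1] for i in np.array(net.getUnconnectedOutLayers()).flatten()] 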

print("YOLOv3 LOADED SUCCESSFULLY")

YOLOv3 LOADED SUCCESSFULLY


This is all we need in order to load the model configuration and pre-trained weights.  
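
To make the 1-based indexing of the output layers concrete, here is a minimal sketch that does not need the model files. The layer names and indices are hypothetical stand-ins for what net.getLayerNames() and net.getUnconnectedOutLayers() return, and output_layer_names is a helper introduced only for this illustration.

```python
# Toy stand-ins for net.getLayerNames() and net.getUnconnectedOutLayers();
# the names and indices below are hypothetical, not read from yolov3.cfg.
layer_names = ["conv_0", "conv_1", "yolo_2", "conv_3", "yolo_4"]
out_layer_ids_flat = [3, 5]        # flat form (one index per output layer)
out_layer_ids_nested = [[3], [5]]  # nested form (one single-element row per output layer)


def output_layer_names(names, ids):
    """Map 1-based layer ids (flat or nested) to layer names."""
    flat = []
    for i in ids:
        # Unwrap one level of nesting if present
        flat.append(i[0] if isinstance(i, (list, tuple)) else i)
    # The ids start at 1, while Python lists start at 0
    return [names[i - 1] for i in flat]


print(output_layer_names(layer_names, out_layer_ids_flat))
print(output_layer_names(layer_names, out_layer_ids_nested))
```

Both calls select the same two layers, which is why subtracting 1 from each index is needed whatever shape the indices come in.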

## Capturing video 
The next step is to load the video and create a VideoWriter object, which saves the final video with the detected objects. The output video is saved as "output.avi".

In the VideoWriter object we specify:
- the output file name (output.avi), 
- the FourCC code (a 4-byte code used to specify the video codec), 
- the number of frames per second (fps), 
- the frame size.
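
To illustrate what a FourCC code is, the sketch below packs four characters into a single integer, one byte per character with the first character in the lowest byte. The helpers fourcc and fourcc_to_str are written only for this illustration; in the notebook itself we rely on cv2.VideoWriter_fourcc.

```python
def fourcc(c1, c2, c3, c4):
    """Pack four ASCII characters into one 32-bit integer (first char in the lowest byte)."""
    return ord(c1) | (ord(c2) << 8) | (ord(c3) << 16) | (ord(c4) << 24)


def fourcc_to_str(code):
    """Recover the four characters from a packed code."""
    return "".join(chr((code >> (8 * i)) & 0xFF) for i in range(4))


mjpg = fourcc("M", "J", "P", "G")
print(mjpg, fourcc_to_str(mjpg))   # prints: 1196444237 MJPG
```

The reverse helper can be handy for checking which codec an existing video reports, assuming the code is available as an integer.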

In [148]:
## Loading video 
filename = "test.mp4"                         # filename
cap = cv2.VideoCapture(filename)              # loading video

# We get the resolution of our video (width and height) and convert it from float to integer
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))      # same as cap.get(3)
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))    # same as cap.get(4)

# We create VideoWriter object and define the codec. The output is stored in 'output.avi' file.
out_video = cv2.VideoWriter("output.avi",                                # output name
                            cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'),  # 4-byte code used to specify the video codec
                                                                         # (we pass MJPG)
                            15,                                          # number of frames per second (fps) 
                            (frame_width, frame_height)                  # frame size
                            )

# set font and color of text and bounding boxes
font = cv2.FONT_HERSHEY_PLAIN     # font
color = (0, 255, 0)               # green color

## Object Detection

The goal is to perform object detection only in a specific region (RoI - Region of Interest) of the video: we want to detect the incoming vehicles in the right part of the video.     
So, the idea is that we process the video frame by frame.   
For each frame:
- we select the region of interest, preprocess it and pass it as input into the network,
- we obtain detected objects in the region of interest as output of our network,
- we discard objects detected with a confidence score below a given threshold and apply Non-maximum Suppression (NMS), which removes overlapping boxes that refer to the same object and keeps only the highest-scoring one,
- we show detected objects with bounding boxes, classes and confidence scores.      

At the end, we save the video with detected objects.
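
To show the idea behind the NMS step, here is a minimal pure-Python sketch of greedy Non-maximum Suppression based on IoU (Intersection over Union). It is a simplified illustration and not necessarily identical to what cv2.dnn.NMSBoxes does internally; the functions iou and greedy_nms and the sample boxes are made up for this example. Boxes use the same [x, y, w, h] layout as in the notebook.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as [x, y, w, h]."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, score_threshold=0.5, nms_threshold=0.4):
    """Return the indices of the boxes that are kept, highest score first."""
    # Drop low-confidence boxes, then visit the rest from best to worst
    order = sorted(
        (i for i, s in enumerate(scores) if s > score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes on the same vehicle, plus one separate box
boxes = [[100, 100, 50, 50], [105, 102, 50, 50], [300, 200, 40, 60]]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
print(kept)   # prints: [0, 2]
```

The second box is suppressed because its IoU with the first one (about 0.76) exceeds the 0.4 threshold, while the third box does not overlap with anything and is kept.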

In [149]:
while cap.isOpened():     # while the capture is correctly initialized...

    # We process the video frame-by-frame
    
    ret, img = cap.read()           # we read each frame (img) from the video
                                    # we also retrieve ret, which is a boolean value. 
                                    # ret is True if the frame is read correctly
    
    if ret:    # if the frame is read correctly, go on...
        
    
        ## EXTRACT REGION OF INTEREST(ROI)
        roi = img[120:, 450:]           # consider only a slice in pixels of the entire frame 

        height, width, _ = roi.shape    # retrieve height and width from the region of interest
                                        # (we need height and width to build bounding boxes later)

        ## IMAGE PREPROCESSING
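        # The cv2.dnn.blobFromImage function returns a blob, which is our input image after
        # scaling by a scale factor (1/255 maps pixel values to [0, 1]), resizing,
        # and channel swapping (OpenCV reads frames as BGR, the network expects RGB).
        # Here the RoI is resized to 416x416, a common input size for YOLOv3
        # (the network accepts input sizes that are multiples of 32)
        blob = cv2.dnn.blobFromImage(roi, 1/255.0, (416, 416), (0, 0, 0), swapRB=True, crop=False)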

        ## OBJECT DETECTION
        net.setInput(blob)                      # set blob as input to the network
        outs = net.forward(output_layers)       # runs a forward pass to compute the network output  


        boxes = []
        confidences = []
        class_ids = []

        for out in outs:            # for each output...
            for detection in out:               # for each detection...
                scores = detection[5:]             # array with 80 scores (1 score for each class)
                class_id = np.argmax(scores)       # take the id of the maximum score
                confidence = scores[class_id]      # confidence of the class with the maximum score

                if confidence > 0.5:    # if the confidence of the detected object is above the threshold
                                        # we start to create the bounding box...
                    # Object detected
                    center_x = int(detection[0] * width)             # x of the center point
                    center_y = int(detection[1] * height)            # y of the center point
                    w = int(detection[2] * width)                    # width of the detected object
                    h = int(detection[3] * height)                   # height of the detected object

                    # Rectangle coordinates
                    x = int(center_x - w / 2)                         # x of the top left point
                    y = int(center_y - h / 2)                         # y of the top left point

                    boxes.append([x, y, w, h])
                    confidences.append(float(confidence))
                    class_ids.append(class_id)


        ## NMS - NON-MAXIMUM SUPPRESSION
        # We use the NMSBoxes function in OpenCV to perform Non-maximum Suppression,
        # given the boxes and their corresponding confidence scores.
        # We pass two thresholds as arguments:
        # score_threshold: keep only boxes with a confidence score higher than this value
        # nms_threshold: IoU (Intersection over Union) above which the lower-scoring
        #                of two overlapping boxes is suppressed
        # The function returns the indices of the bounding boxes that survive NMS.
        indexes = cv2.dnn.NMSBoxes(boxes, confidences, score_threshold = 0.5, nms_threshold = 0.4)


        ## DRAW IMAGE WITH BOUNDING BOXES, CLASSES AND CONFIDENCE SCORES
        # np.array(...).flatten() also handles the empty result returned when no box survives
        for i in np.array(indexes).flatten():     # for each detected object...

            x,y,w,h = boxes[i]                                      # bounding box coordinates
            label = str(classes[class_ids[i]])                      # class label
            confidence = str(round(confidences[i], 2))              # confidence score
            cv2.rectangle(roi, (x, y), (x + w, y + h), color, 2)    # drawing rectangular bounding box
                                                                # (x,y) is the top left corner of the box
                                                                # (x + w, y + h) is the bottom right corner of the box
            cv2.putText(roi, label + " " + confidence, (x, y - 5), font, 0.8, color, 2)   # text of the box 
            
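        # roi is a view into img, so the boxes drawn on roi already appear in img.
        # Show every frame, outside the drawing loop, even when nothing was detected.
        cv2.imshow("out", img)      # display the current frame with detected objects  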

        out_video.write(img)        # the frame is saved for the final video

        key = cv2.waitKey(1)        # wait 1 millisecond for a key press between frames
        if key == 27:               # if the Esc key (code 27) is pressed, break and close
            break
    
    
    else:   # if the frame is not read correctly, break...
        break
    
# Release everything when the job is finished
cap.release()
out_video.release()
cv2.destroyAllWindows()
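
Note that the boxes computed in the loop are expressed in RoI coordinates, which is fine there because we draw directly on the RoI. If the positions were needed in full-frame coordinates (for example, to log them), the RoI offset would have to be added back. The sketch below shows this under the assumption that the RoI starts at column 450 and row 120, as in the slice img[120:, 450:]; the function detection_to_frame_box is a hypothetical helper, not part of the notebook.

```python
def detection_to_frame_box(detection, roi_width, roi_height, roi_x0=450, roi_y0=120):
    """Convert a YOLO detection (normalized cx, cy, w, h) to [x, y, w, h] in full-frame pixels."""
    # Scale the normalized values to RoI pixels, as done in the loop
    center_x = int(detection[0] * roi_width)
    center_y = int(detection[1] * roi_height)
    w = int(detection[2] * roi_width)
    h = int(detection[3] * roi_height)
    # Top-left corner inside the RoI, then shifted by the RoI offset
    x = int(center_x - w / 2) + roi_x0
    y = int(center_y - h / 2) + roi_y0
    return [x, y, w, h]


# A detection centred in a 400x300 RoI, a quarter of its width and half of its height
print(detection_to_frame_box([0.5, 0.5, 0.25, 0.5], roi_width=400, roi_height=300))
# prints: [600, 195, 100, 150]
```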

Now, let's have a look at the final result...

## Video with Detected Objects

Link to Output Video (click on "download" and just watch or download it):   

[https://github.com/buropas/Object_Detection/blob/main/output.avi](https://github.com/buropas/Object_Detection/blob/main/output.avi)