## Detecting Objects (on video) with OpenCV and YOLO v3 in Python

**References** 

* [Deep Learning based Object Detection using YOLOv3 with OpenCV](https://www.learnopencv.com/deep-learning-based-object-detection-using-yolov3-with-opencv-python-c/) 
* [YOLO Object Detection with OpenCV and Python](https://www.arunponnusamy.com/yolo-object-detection-opencv-python.html)

In [1]:
import os
import cv2
import numpy as np
import matplotlib.pyplot as plt
%matplotlib

print( "OpenCV version:", cv2.__version__)

Using matplotlib backend: Qt5Agg
OpenCV version: 3.4.3


### YOLO files

The files used here were downloaded from:
 * configuration file: https://github.com/pjreddie/darknet/blob/master/cfg/yolov3.cfg?raw=true
 * pre-trained weights file: https://pjreddie.com/media/files/yolov3.weights
 * text file containing class names: https://github.com/pjreddie/darknet/blob/master/data/coco.names?raw=true
 
You can download them with `filename = wget.download(url)` from the third-party `wget` package.
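
If the `wget` package is not installed, the standard library can do the same job. The following is a minimal sketch, not part of the original notebook: the helper names `filename_from_url` and `download_if_missing` are hypothetical, and the destination folder is assumed to be the `../yolo_files` directory used in the next cell.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve

# URLs copied from the list above
YOLO_URLS = [
    "https://github.com/pjreddie/darknet/blob/master/cfg/yolov3.cfg?raw=true",
    "https://pjreddie.com/media/files/yolov3.weights",
    "https://github.com/pjreddie/darknet/blob/master/data/coco.names?raw=true",
]

def filename_from_url(url):
    """Return the last path component of a URL, ignoring any query string."""
    return os.path.basename(urlparse(url).path)

def download_if_missing(url, dest_dir):
    """Download `url` into `dest_dir` unless the file is already there."""
    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, filename_from_url(url))
    if not os.path.isfile(path):
        urlretrieve(url, path)
    return path

# Usage (commented out because the weights file is over 200 MB):
# for url in YOLO_URLS:
#     print(download_if_missing(url, "../yolo_files"))
```

Skipping files that already exist keeps repeated notebook runs from fetching the large weights file again.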

In [2]:
# Path to yolo configuration file
model_configuration = "../yolo_files/yolov3.cfg"

# Path to yolo pre-trained weights
model_weights = "../yolo_files/yolov3.weights"

# Path to text file containing class names
classes_file = "../yolo_files/coco.names"

**Read class names from text file**

In [3]:
classes = None
with open(classes_file, "r") as f:
    classes = [line.strip() for line in f.readlines()]

In [4]:
print( classes[:5], "of", len(classes) )

['person', 'bicycle', 'car', 'motorbike', 'aeroplane'] of 80


**Generate different colors for different classes**

In [5]:
colors = np.random.uniform( 0, 255, size = ( len(classes), 3 ) )

In [6]:
colors[:3]

array([[1.48137798e+02, 8.12286869e-02, 1.02033610e+02],
       [2.52689469e+02, 2.21725636e+02, 2.25909836e+02],
       [4.50061552e+01, 2.07078898e+02, 2.09335059e+02]])

**Read pre-trained model and config file**

In [7]:
net = cv2.dnn.readNetFromDarknet( model_configuration, model_weights)
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

### Auxiliary functions

**Drawing the bounding box**

The `draw_bounding_box()` function draws a rectangle around the predicted region and writes the class name and confidence value above the box.

In [8]:
# Function to draw bounding box on the detected object with class name
def draw_bounding_box(img, class_id, confidence, x, y, x_plus_w, y_plus_h):
    
    label = '%.2f' % confidence
    color = colors[class_id]
         
    # Get the label for the class name and its confidence
    if classes:
        assert(class_id < len(classes))
        label = '%s:%s' % (classes[class_id], label)
        
    # Draw a bounding box.
    cv2.rectangle(img, (x, y), (x_plus_w, y_plus_h), color, 2)
    
    #Display the label at the top of the bounding box    
    cv2.putText(img, label, (x-10,y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
    
    return img

**Post-processing the network’s output**

The `post_process()` function discards detections with low confidence and then applies non-maximum suppression to remove redundant overlapping boxes.
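
As background for the cell below, here is a hedged sketch of how a single detection row can be decoded. It assumes the COCO-trained YOLO v3 layout of 85 values per row: box centre x, centre y, width, height (all normalised to the frame), an objectness score, then 80 class scores. The helper name `decode_detection` and the sample row are made up for illustration. Like the notebook's code, it uses the best class score as the confidence and ignores the objectness value.

```python
import numpy as np

def decode_detection(detection, frame_width, frame_height):
    """Convert one YOLO output row to (class_id, confidence, [x, y, w, h]) in pixels."""
    scores = detection[5:]            # per-class scores follow the 5 box values
    class_id = int(np.argmax(scores))
    confidence = float(scores[class_id])
    # Box centre and size are normalised to [0, 1] relative to the frame
    center_x = detection[0] * frame_width
    center_y = detection[1] * frame_height
    w = detection[2] * frame_width
    h = detection[3] * frame_height
    # Convert centre coordinates to the top-left corner
    x = int(center_x - w / 2)
    y = int(center_y - h / 2)
    return class_id, confidence, [x, y, int(w), int(h)]

# Made-up row: a box centred in the frame whose best class is index 2
row = np.zeros(85, dtype=np.float32)
row[:5] = [0.5, 0.5, 0.25, 0.5, 0.9]
row[5 + 2] = 0.8
print(decode_detection(row, 640, 480))
```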

In [9]:
def post_process( frame, outputs,  conf_threshold = 0.5, nms_threshold = 0.4):
    frame_height, frame_width = frame.shape[:2]
    
    class_ids = []
    confidences = []
    bounding_boxes = []
    
    # For each detection from each output layer, get the confidence,
    # class id and bounding box parameters, and ignore weak detections
    for out in outputs:
        for detection in out:
            scores = detection[5:]
            class_id = np.argmax( scores )
            conf = scores[ class_id ]
            if conf > conf_threshold:
                center_x = int(detection[0] * frame_width)
                center_y = int(detection[1] * frame_height)
                w = int(detection[2] * frame_width)
                h = int(detection[3] * frame_height)
                x = center_x - w / 2
                y = center_y - h / 2

                class_ids.append( class_id )
                confidences.append( float(conf) )
                bounding_boxes.append( [x, y, w, h] )
    
    # Perform non maximum suppression to eliminate redundant overlapping boxes with
    # lower confidences.
    indices = cv2.dnn.NMSBoxes( bounding_boxes, confidences, conf_threshold, nms_threshold )

    frame_out = frame.copy()
        
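    # NMSBoxes returns an Nx1 array in OpenCV 3.x and a flat array in newer 4.x releases;
    # flattening handles both shapes as well as an empty result
    for i in np.array(indices).flatten():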
        x, y, w, h = bounding_boxes[i]
        frame_out = draw_bounding_box( frame_out, class_ids[i], confidences[i], 
                                       round(x), round(y), round(x+w), round(y+h) )
    
    return frame_out   
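
To make the suppression step less of a black box, the following is a plain greedy non-maximum suppression sketch in NumPy. It is an illustration of the general technique under simple assumptions, not OpenCV's implementation of `cv2.dnn.NMSBoxes`; the function names `iou` and `nms` and the sample boxes are hypothetical.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over union of two [x, y, w, h] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, confidences, conf_threshold=0.5, nms_threshold=0.4):
    """Greedy NMS: return indices of the kept boxes, highest confidence first."""
    # Visit boxes from most to least confident, skipping weak ones
    order = [i for i in np.argsort(confidences)[::-1]
             if confidences[i] > conf_threshold]
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in keep):
            keep.append(int(i))
    return keep

# Two heavily overlapping boxes and one separate box (made-up values)
boxes = [[100, 100, 50, 50], [105, 105, 50, 50], [300, 300, 40, 40]]
confidences = [0.9, 0.8, 0.7]
print(nms(boxes, confidences))   # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, above the 0.4 threshold, so only the more confident of the two survives.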

**Output layer**

A purely sequential CNN usually has a single output layer at the end. The YOLO v3 architecture used here has several output layers, each producing its own predictions. The `get_output_layers()` function returns their names; an output layer is one that is not connected to any following layer.

In [10]:
# Function to get output layers names in the architecture
def get_output_layers( net ):
    layer_names = net.getLayerNames( )
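    # Layer ids are 1-based; flatten() copes with both the Nx1 (OpenCV 3.x) and flat (newer 4.x) return shapes
    output_layers = [ layer_names[i - 1] for i in np.array( net.getUnconnectedOutLayers() ).flatten() ]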
    return output_layers

In [11]:
get_output_layers( net )

['yolo_82', 'yolo_94', 'yolo_106']
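
Each of these three layers predicts boxes on a grid of a different resolution. As a rough sketch, assuming the standard YOLO v3 configuration (strides 32, 16 and 8, three anchor boxes per grid cell), the number of detection rows per layer can be worked out as follows; the function name is hypothetical.

```python
def yolo_v3_detection_counts(input_size=416, strides=(32, 16, 8), anchors_per_cell=3):
    """Number of detection rows produced by each YOLO v3 output layer."""
    return [anchors_per_cell * (input_size // s) ** 2 for s in strides]

counts = yolo_v3_detection_counts(416)
print(counts, sum(counts))   # [507, 2028, 8112] 10647
```

With a 416x416 input, the grids are 13x13, 26x26 and 52x52, so post-processing has to scan 10647 candidate rows per frame.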

### Processing a video

In [15]:
inputFile = "../data/run.mp4"
outputFile = "yolo_out_py.avi"

inpWidth = 416       #Width of network's input image
inpHeight = 416      #Height of network's input image
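
These two values set the size of the blob passed to the network in the next cell. The sketch below approximates in NumPy what `cv2.dnn.blobFromImage` does with the arguments used there (scale 1/255, zero mean, red/blue swap), assuming the frame has already been resized to the network input size. The helper name `frame_to_blob` and the dummy frame are hypothetical, and the resize step itself is left out.

```python
import numpy as np

def frame_to_blob(resized_bgr):
    """Approximate blobFromImage(frame, 1/255, size, [0,0,0], swapRB=True)
    for a frame that has ALREADY been resized to the network input size."""
    rgb = resized_bgr[:, :, ::-1]               # BGR -> RGB (swapRB)
    scaled = rgb.astype(np.float32) / 255.0     # scale factor 1/255, zero mean
    chw = np.transpose(scaled, (2, 0, 1))       # HWC -> CHW
    return chw[np.newaxis, ...]                 # add batch dimension -> NCHW

dummy = np.zeros((416, 416, 3), dtype=np.uint8)
dummy[..., 0] = 255                             # pure blue in BGR order
blob = frame_to_blob(dummy)
print(blob.shape)                               # (1, 3, 416, 416)
```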

In [16]:
import time

# Open the video file
if os.path.isfile( inputFile ):
    
    cap = cv2.VideoCapture( inputFile )
    outputFile = inputFile[:-4]+'_yolo_out_py.avi'
    
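    video_width = round( cap.get(cv2.CAP_PROP_FRAME_WIDTH) )
    video_height = round( cap.get(cv2.CAP_PROP_FRAME_HEIGHT) )
    # Keep the source frame rate; fall back to 30 if the file does not report one
    fps = cap.get(cv2.CAP_PROP_FPS) or 30
    vid_writer = cv2.VideoWriter( outputFile, cv2.VideoWriter_fourcc('M','J','P','G'), fps, (video_width, video_height) )
else:
    # Stop here: the loop below needs `cap` and `vid_writer`
    raise FileNotFoundError( "Input video file " + inputFile + " doesn't exist" )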

start_time = time.time()
print( "Processing..." )

# get frame from the video
hasFrame, frame = cap.read()

while hasFrame:
    # Create a 4D blob from a frame.
    blob = cv2.dnn.blobFromImage(frame, 1/255, (inpWidth, inpHeight), [0,0,0], 1, crop=False)
 
    # Sets the input to the network
    net.setInput(blob)
 
    # Runs the forward pass to get output of the output layers
    outs = net.forward( get_output_layers(net) )
 
    # Remove the bounding boxes with low confidence
    frame = post_process(frame, outs)
    
    # Put efficiency information. The function getPerfProfile returns the 
    # overall time for inference(t) and the timings for each of the layers(in layersTimes)
    t, _ = net.getPerfProfile()
    label = 'Inference time: %.2f ms' % (t * 1000.0 / cv2.getTickFrequency())
    cv2.putText(frame, label, (0, 15), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255))
    
    vid_writer.write(frame.astype(np.uint8))
    
    hasFrame, frame = cap.read()

# Release the input video and finalize the output file
cap.release()
vid_writer.release()

print( "Done processing !!!" )
print( "Output file is stored as ", outputFile )
print( "--- %s seconds ---" % (time.time() - start_time) )

Processing...
Done processing !!!
Output file is stored as  ../data/run_yolo_out_py.avi
--- 166.96527457237244 seconds ---


**Outputs**

<img src="../data/run_yolo_out.PNG" width="800" />

<img src="../data/pigeons_yolo_out.PNG" width="800" />