# Detecting Objects on Video with OpenCV deep learning library


by Vishal Shasi





I have tried to explain each step as clearly as possible, so take a look and let me know what you think!

Make sure you have the three files YOLO v3 needs: the class names (coco.names), the network configuration (yolov3.cfg) and the trained weights (yolov3.weights). Here they are all kept in a folder called yolo-coco-data.

Keep all the files in a single directory, or use the full path of each file on your PC.
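Before loading anything, it can help to stop early if one of the three files is missing. Below is a minimal standard-library sketch, assuming the folder name `yolo-coco-data` used in this notebook; the helper `missing_yolo_files` and the constant `REQUIRED_FILES` are hypothetical names chosen for illustration.

```python
import os

# File names used in this notebook (assumption: adjust them if yours differ)
REQUIRED_FILES = ("coco.names", "yolov3.cfg", "yolov3.weights")


def missing_yolo_files(folder="yolo-coco-data"):
    """Return the required file names that are not present in 'folder'."""
    return [name for name in REQUIRED_FILES
            if not os.path.isfile(os.path.join(folder, name))]


missing = missing_yolo_files()
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All YOLO files found")
```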

# Algorithm:
Reading input video --> Loading YOLO v3 Network -->
--> Reading frames in the loop --> Getting blob from the frame -->
--> Implementing Forward Pass --> Getting Bounding Boxes -->
--> Non-maximum Suppression --> Drawing Bounding Boxes with Labels -->
--> Writing processed frames



Result: a new video file with the detected objects, their bounding boxes and labels.

# Importing needed libraries
import numpy as np
import cv2
import time

# Reading input video

# Defining 'VideoCapture' object
# and reading video from a file
# Pay attention! If you're using Windows, the path might look like:
# r'videos\traffic-cars.mp4'
# or:
# 'videos\\traffic-cars.mp4'
video = cv2.VideoCapture('videos/06-19_first_team_data_Trim.mp4')
# Preparing variable for writer
# that we will use to write processed frames
writer = None
# Preparing variables for spatial dimensions of the frames
h, w = None, None
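Why the Windows note matters: a plain backslash in a normal Python string can be read as an escape sequence. The standard-library illustration below reuses the example file name from the comment; `portable_path` and the other variable names are illustrative only. Passing `str(portable_path)` to `cv2.VideoCapture` is the safe choice, since older OpenCV builds may accept only plain strings.

```python
from pathlib import Path, PureWindowsPath

# A raw string and an escaped backslash give the same Windows path
raw_path = r'videos\traffic-cars.mp4'
escaped_path = 'videos\\traffic-cars.mp4'

# Without 'r' or '\\', Python reads '\t' as a TAB character,
# which silently breaks the path
broken_path = 'videos\traffic-cars.mp4'

# pathlib joins the parts with the separator of the current OS
portable_path = Path('videos') / 'traffic-cars.mp4'

print(raw_path == escaped_path)               # True
print('\t' in broken_path)                    # True
print(PureWindowsPath(raw_path).as_posix())   # videos/traffic-cars.mp4
print(str(portable_path))
```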

# Loading YOLO v3 network

# Loading COCO class labels from file
# Opening file
with open('yolo-coco-data/coco.names') as f:
    # Getting labels reading every line
    # and putting them into the list
    labels = [line.strip() for line in f]
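# Loading the trained YOLO v3 network from the configuration and weights files
network = cv2.dnn.readNetFromDarknet('yolo-coco-data/yolov3.cfg',
                                     'yolo-coco-data/yolov3.weights')

# Getting the names of the YOLO output layers
# (the loop passes them to 'network.forward')
layers_names_output = network.getUnconnectedOutLayersNames()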


# Setting thresholds and colours

# Setting minimum probability to eliminate weak predictions
probability_minimum = 0.5

# Setting threshold for filtering weak bounding boxes
# with non-maximum suppression
threshold = 0.3

# Generating a random colour for every class label
# np.random.randint(low, high=None, size=None, dtype='l') excludes 'high',
# so each colour channel gets a value from 0 to 254
colours = np.random.randint(0, 255, size=(len(labels), 3), dtype='uint8')
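Because `np.random.randint` is called without a seed, the colour assigned to each class changes from run to run. If stable colours are preferred, one option is a seeded generator. A pure-Python sketch, where the function name `make_colours` and the seed value 42 are arbitrary choices (80 is the number of classes in COCO's coco.names):

```python
import random


def make_colours(num_classes, seed=42):
    """Return one (B, G, R) colour per class, identical on every run."""
    rng = random.Random(seed)  # private generator, global state is untouched
    # random.randint includes both ends, so each channel is in 0..255
    return [tuple(rng.randint(0, 255) for _ in range(3))
            for _ in range(num_classes)]


colours_sketch = make_colours(80)
print(colours_sketch[0])
```

Plain Python ints also avoid converting with `.tolist()` before passing a colour to `cv2.rectangle`.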

# Defining counters for processed frames and time taken

# Defining variable for counting frames
# To show total amount of processed frames
f = 0

# Defining variable for counting total time
# To show time spent for processing all frames
t = 0
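The counters `f` and `t` are updated around `network.forward` only. `time.time()` works, but `time.perf_counter()` is the standard-library clock intended for measuring short intervals. A minimal sketch of the same counting pattern, where `fake_forward` is a hypothetical stand-in for the forward pass:

```python
import time


def fake_forward():
    """Hypothetical stand-in for network.forward(); just does some work."""
    return sum(i * i for i in range(10_000))


frames_done = 0
total_time = 0.0
for _ in range(3):
    start = time.perf_counter()
    fake_forward()
    end = time.perf_counter()
    frames_done += 1
    total_time += end - start
    print('Frame number {0} took {1:.5f} seconds'.format(frames_done, end - start))

print('FPS:', round(frames_done / total_time, 1))
```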

# Defining loop for catching the frames

# Defining loop for catching frames
while True:
    # Capturing frame-by-frame
    ret, frame = video.read()

    # If the frame was not retrieved
    # e.g.: at the end of the video,
    # then we break the loop
    if not ret:
        break

    # Getting spatial dimensions of the frame;
    # all other frames have the same dimensions
    if w is None or h is None:
        # Slicing only the first two elements (height, width) of the shape tuple
        h, w = frame.shape[:2]

    # Getting a blob from the current frame: 'cv2.dnn.blobFromImage' returns
    # a 4-dimensional blob after optional mean subtraction, scaling
    # and swapping of the R and B channels
    # Resulting shape: (number of images, number of channels, height, width),
    # here (1, 3, 416, 416)
    # General form:
    # blob = cv2.dnn.blobFromImage(image, scalefactor, size, mean, swapRB, crop)
    blob = cv2.dnn.blobFromImage(frame,
                                 1 / 255.0, (416, 416),
                                 swapRB=True,
                                 crop=False)
    # Implementing a forward pass through the output layers only,
    # measuring the time it takes
    network.setInput(blob)  # setting blob as input to the network
    start = time.time()
    output_from_network = network.forward(layers_names_output)
    end = time.time()

    # Increasing counters for frames and total time
    f += 1
    t += end - start

    # Showing the time spent on the current frame
    print('Frame number {0} took {1:.5f} seconds'.format(f, end - start))

    # Preparing lists for detected bounding boxes,
    # obtained confidences and class numbers
    bounding_boxes = []
    confidences = []
    class_numbers = []

    # Going through all output layers after the forward pass
    for result in output_from_network:
        # Going through all detections from the current output layer
        for detected_objects in result:
            # Each row holds the box (4 values), an objectness score
            # and then one probability per class
            scores = detected_objects[5:]
            # Getting index of the class with the maximum value of probability
            class_current = np.argmax(scores)
            # Getting value of probability for defined class
            confidence_current = scores[class_current]
            # Eliminating weak predictions with minimum probability
            if confidence_current > probability_minimum:
                # Scaling bounding box coordinates to the original frame size
                # YOLO returns the centre of the box, its width and its height
                # as fractions of the input size, so multiplying them
                # element-wise by the width and height of the original frame
                # gives the centre, width and height in pixels
                box_current = detected_objects[0:4] * np.array([w, h, w, h])

                # Now, from YOLO data format, we can get top left corner coordinates
                # that are x_min and y_min
                x_center, y_center, box_width, box_height = box_current
                x_min = int(x_center - (box_width / 2))
                y_min = int(y_center - (box_height / 2))

                # Adding results into prepared lists
                bounding_boxes.append(
                    [x_min, y_min,
                     int(box_width),
                     int(box_height)])
                confidences.append(float(confidence_current))
                class_numbers.append(class_current)
    # Implementing non-maximum suppression of the bounding boxes:
    # a box is dropped if its confidence is not above 'probability_minimum'
    # or if it overlaps (IoU above 'threshold') a box with higher confidence
    # The boxes must be lists of ints and the confidences floats
    results = cv2.dnn.NMSBoxes(bounding_boxes, confidences,
                               probability_minimum, threshold)
    # Checking if there is at least one detected object
    # after non-maximum suppression
    if len(results) > 0:
        # Going through indexes of results
        for i in results.flatten():
            # Getting current bounding box coordinates,
            # its width and height
            x_min, y_min = bounding_boxes[i][0], bounding_boxes[i][1]
            box_width, box_height = bounding_boxes[i][2], bounding_boxes[i][3]
            # Preparing colour for current bounding box
            # and converting from numpy array to list
            colour_box_current = colours[class_numbers[i]].tolist()
            # Drawing bounding box on the original current frame
            cv2.rectangle(frame, (x_min, y_min),
                          (x_min + box_width, y_min + box_height),
                          colour_box_current, 2)
            # Preparing text with label and confidence for current bounding box
            text_box_current = '{}: {:.4f}'.format(
                labels[int(class_numbers[i])], confidences[i])
            # Putting text with label and confidence on the original image
            cv2.putText(frame, text_box_current, (x_min, y_min - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, colour_box_current, 2)
    # Initializing the writer only once,
    # as soon as the spatial dimensions of the frames are known
    if writer is None:
        # Constructing the codec code for 'cv2.VideoWriter'
        # FOURCC ("four character code") identifies a video codec,
        # compression format, colour or pixel format used in media files
        # (see http://www.fourcc.org)
        fourcc = cv2.VideoWriter_fourcc(*'mp4v')

        # Parameters of cv2.VideoWriter():
        #   filename  - name of the output video file
        #   fourcc    - 4-character code of the codec used to compress the frames
        #   fps       - frame rate of the created video (30 is hard-coded here;
        #               video.get(cv2.CAP_PROP_FPS) would match the input video)
        #   frameSize - size of the video frames as (width, height)
        #   isColor   - if True, the encoder expects and encodes colour frames
        writer = cv2.VideoWriter('videos/End_result.mp4', fourcc, 30,
                                 (frame.shape[1], frame.shape[0]), True)

    # Writing the processed current frame to the file
    writer.write(frame)

Frame number 1 took 0.53065 seconds
Frame number 2 took 0.38507 seconds
Frame number 3 took 0.39795 seconds
Frame number 4 took 0.38543 seconds
Frame number 5 took 0.37977 seconds
Frame number 6 took 0.40109 seconds
Frame number 7 took 0.38565 seconds
Frame number 8 took 0.38512 seconds
Frame number 9 took 0.38666 seconds
Frame number 10 took 0.38744 seconds
Frame number 11 took 0.37559 seconds
Frame number 12 took 0.39971 seconds
Frame number 13 took 0.36868 seconds
Frame number 14 took 0.36861 seconds
Frame number 15 took 0.38510 seconds
Frame number 16 took 0.40506 seconds
Frame number 17 took 0.36948 seconds
Frame number 18 took 0.38509 seconds
Frame number 19 took 0.41373 seconds
Frame number 20 took 0.42106 seconds
Frame number 21 took 0.43321 seconds
Frame number 22 took 0.40136 seconds
Frame number 23 took 0.37650 seconds
Frame number 24 took 0.41936 seconds
Frame number 25 took 0.41524 seconds
Frame number 26 took 0.39662 seconds
Frame number 27 took 0.37945 seconds
...

Frame number 220 took 0.57767 seconds
Frame number 221 took 0.54882 seconds
Frame number 222 took 0.54690 seconds
Frame number 223 took 0.58514 seconds
Frame number 224 took 0.54119 seconds
Frame number 225 took 0.54502 seconds
Frame number 226 took 0.53994 seconds
Frame number 227 took 0.53581 seconds
Frame number 228 took 0.52231 seconds
Frame number 229 took 0.55440 seconds
Frame number 230 took 0.53213 seconds
Frame number 231 took 0.55130 seconds
Frame number 232 took 0.54296 seconds
Frame number 233 took 0.53956 seconds
Frame number 234 took 0.53965 seconds
Frame number 235 took 0.53567 seconds
Frame number 236 took 0.58450 seconds
Frame number 237 took 0.56332 seconds
Frame number 238 took 0.55119 seconds
Frame number 239 took 0.55484 seconds
Frame number 240 took 0.57872 seconds
Frame number 241 took 0.53266 seconds
Frame number 242 took 0.53823 seconds
Frame number 243 took 0.53809 seconds
Frame number 244 took 0.52900 seconds
Frame number 245 took 0.57246 seconds
...

Frame number 436 took 0.54024 seconds
Frame number 437 took 0.53610 seconds
Frame number 438 took 0.53223 seconds
Frame number 439 took 0.57083 seconds
Frame number 440 took 0.53830 seconds
Frame number 441 took 0.53248 seconds
Frame number 442 took 0.59848 seconds
Frame number 443 took 0.53643 seconds
Frame number 444 took 0.53025 seconds
Frame number 445 took 0.53508 seconds
Frame number 446 took 0.52940 seconds
Frame number 447 took 0.53193 seconds
Frame number 448 took 0.53633 seconds
Frame number 449 took 0.55302 seconds
Frame number 450 took 0.58132 seconds
Frame number 451 took 0.53539 seconds
Frame number 452 took 0.56445 seconds
Frame number 453 took 0.55337 seconds
Frame number 454 took 0.56214 seconds
Frame number 455 took 0.57539 seconds
Frame number 456 took 0.56012 seconds
Frame number 457 took 0.55261 seconds
Frame number 458 took 0.54866 seconds
Frame number 459 took 0.56702 seconds
Frame number 460 took 0.56056 seconds
Frame number 461 took 0.58830 seconds
...

Frame number 652 took 0.57053 seconds
Frame number 653 took 0.57962 seconds
Frame number 654 took 0.57083 seconds
Frame number 655 took 0.55941 seconds
Frame number 656 took 0.56550 seconds
Frame number 657 took 0.55487 seconds
Frame number 658 took 0.56935 seconds
Frame number 659 took 0.55399 seconds
Frame number 660 took 0.55448 seconds
Frame number 661 took 0.56627 seconds
Frame number 662 took 0.55613 seconds
Frame number 663 took 0.54720 seconds
Frame number 664 took 0.60268 seconds
Frame number 665 took 0.63576 seconds
Frame number 666 took 0.60929 seconds
Frame number 667 took 0.57083 seconds
Frame number 668 took 0.58774 seconds
Frame number 669 took 0.57227 seconds
Frame number 670 took 0.72810 seconds
Frame number 671 took 0.62773 seconds
Frame number 672 took 0.63795 seconds
Frame number 673 took 0.58872 seconds
Frame number 674 took 0.59252 seconds
Frame number 675 took 0.62295 seconds
Frame number 676 took 0.68522 seconds
Frame number 677 took 0.66662 seconds
...

Frame number 868 took 0.58641 seconds
Frame number 869 took 0.51055 seconds
Frame number 870 took 0.53253 seconds
Frame number 871 took 0.52450 seconds
Frame number 872 took 0.52313 seconds
Frame number 873 took 0.51698 seconds
Frame number 874 took 0.53230 seconds
Frame number 875 took 0.58608 seconds
Frame number 876 took 0.55226 seconds
Frame number 877 took 0.54812 seconds
Frame number 878 took 0.52857 seconds
Frame number 879 took 0.53407 seconds
Frame number 880 took 0.56379 seconds
Frame number 881 took 0.57066 seconds
Frame number 882 took 0.53176 seconds
Frame number 883 took 0.51746 seconds
Frame number 884 took 0.53295 seconds
Frame number 885 took 0.51686 seconds
Frame number 886 took 0.53261 seconds
Frame number 887 took 0.51655 seconds
Frame number 888 took 0.52013 seconds
Frame number 889 took 0.53230 seconds
Frame number 890 took 0.53191 seconds
Frame number 891 took 0.52356 seconds
Frame number 892 took 0.52349 seconds
Frame number 893 took 0.52327 seconds
...

Frame number 1082 took 0.54792 seconds
Frame number 1083 took 0.51719 seconds
Frame number 1084 took 0.53228 seconds
Frame number 1085 took 0.51660 seconds
Frame number 1086 took 0.53231 seconds
Frame number 1087 took 0.50669 seconds
Frame number 1088 took 0.51659 seconds
Frame number 1089 took 0.53195 seconds
Frame number 1090 took 0.55512 seconds
Frame number 1091 took 0.54778 seconds
Frame number 1092 took 0.53203 seconds
Frame number 1093 took 0.57053 seconds
Frame number 1094 took 0.53896 seconds
Frame number 1095 took 0.53889 seconds
Frame number 1096 took 0.56380 seconds
Frame number 1097 took 0.55487 seconds
Frame number 1098 took 0.53897 seconds
Frame number 1099 took 0.52343 seconds
Frame number 1100 took 0.52272 seconds
Frame number 1101 took 0.52314 seconds
Frame number 1102 took 0.52363 seconds
Frame number 1103 took 0.50795 seconds
Frame number 1104 took 0.51629 seconds
Frame number 1105 took 0.51690 seconds
Frame number 1106 took 0.53231 seconds
...

Frame number 1293 took 0.52318 seconds
Frame number 1294 took 0.52330 seconds
Frame number 1295 took 0.52308 seconds
Frame number 1296 took 0.52305 seconds
Frame number 1297 took 0.53931 seconds
Frame number 1298 took 0.53225 seconds
Frame number 1299 took 0.51646 seconds
Frame number 1300 took 0.51536 seconds
Frame number 1301 took 0.53284 seconds
Frame number 1302 took 0.54403 seconds
Frame number 1303 took 0.51683 seconds
Frame number 1304 took 0.53894 seconds
Frame number 1305 took 0.55501 seconds
Frame number 1306 took 0.53188 seconds
Frame number 1307 took 0.53822 seconds
Frame number 1308 took 0.53232 seconds
Frame number 1309 took 0.53282 seconds
Frame number 1310 took 0.58589 seconds
Frame number 1311 took 0.51676 seconds
Frame number 1312 took 0.57044 seconds
Frame number 1313 took 0.52343 seconds
Frame number 1314 took 0.52322 seconds
Frame number 1315 took 0.51692 seconds
Frame number 1316 took 0.53269 seconds
Frame number 1317 took 0.51682 seconds
...

Frame number 1504 took 0.57123 seconds
Frame number 1505 took 0.55891 seconds
Frame number 1506 took 0.54837 seconds
Frame number 1507 took 0.53782 seconds
Frame number 1508 took 0.55171 seconds
Frame number 1509 took 0.54693 seconds
Frame number 1510 took 0.54480 seconds
Frame number 1511 took 0.51795 seconds
Frame number 1512 took 0.53239 seconds
Frame number 1513 took 0.53572 seconds
Frame number 1514 took 0.53320 seconds
Frame number 1515 took 0.56894 seconds
Frame number 1516 took 0.56738 seconds
Frame number 1517 took 0.56500 seconds
Frame number 1518 took 0.57954 seconds
Frame number 1519 took 0.58839 seconds
Frame number 1520 took 0.54402 seconds
Frame number 1521 took 0.57898 seconds
Frame number 1522 took 0.55886 seconds
Frame number 1523 took 0.58646 seconds
Frame number 1524 took 0.53916 seconds
Frame number 1525 took 0.54241 seconds
Frame number 1526 took 0.53284 seconds
Frame number 1527 took 0.54479 seconds
Frame number 1528 took 0.58462 seconds
...

Frame number 1715 took 0.52296 seconds
Frame number 1716 took 0.52641 seconds
Frame number 1717 took 0.52361 seconds
Frame number 1718 took 0.51636 seconds
Frame number 1719 took 0.53229 seconds
Frame number 1720 took 0.53197 seconds
Frame number 1721 took 0.51669 seconds
Frame number 1722 took 0.55094 seconds
Frame number 1723 took 0.52585 seconds
Frame number 1724 took 0.51663 seconds
Frame number 1725 took 0.55396 seconds
Frame number 1726 took 0.52087 seconds
Frame number 1727 took 0.56434 seconds
Frame number 1728 took 0.56584 seconds
Frame number 1729 took 0.54826 seconds
Frame number 1730 took 0.54793 seconds
Frame number 1731 took 0.55463 seconds
Frame number 1732 took 0.57112 seconds
Frame number 1733 took 0.51770 seconds
Frame number 1734 took 0.55161 seconds
Frame number 1735 took 0.55556 seconds
Frame number 1736 took 0.53176 seconds
Frame number 1737 took 0.53174 seconds
Frame number 1738 took 0.52398 seconds
Frame number 1739 took 0.51901 seconds
...

Frame number 1926 took 0.53243 seconds
Frame number 1927 took 0.51585 seconds
Frame number 1928 took 0.51807 seconds
Frame number 1929 took 0.53181 seconds
Frame number 1930 took 0.51662 seconds
Frame number 1931 took 0.53238 seconds
Frame number 1932 took 0.54770 seconds
Frame number 1933 took 0.53170 seconds
Frame number 1934 took 0.53273 seconds
Frame number 1935 took 0.54831 seconds
Frame number 1936 took 0.52315 seconds
Frame number 1937 took 0.53850 seconds
Frame number 1938 took 0.52313 seconds
Frame number 1939 took 0.51065 seconds
Frame number 1940 took 0.57075 seconds
Frame number 1941 took 0.59773 seconds
Frame number 1942 took 0.56363 seconds
Frame number 1943 took 0.54798 seconds
Frame number 1944 took 0.54856 seconds
Frame number 1945 took 0.56299 seconds
Frame number 1946 took 0.54857 seconds
Frame number 1947 took 0.53197 seconds
Frame number 1948 took 0.55363 seconds
Frame number 1949 took 0.53918 seconds
Frame number 1950 took 0.52829 seconds
...

Frame number 2137 took 0.53263 seconds
Frame number 2138 took 0.53259 seconds
Frame number 2139 took 0.53346 seconds
Frame number 2140 took 0.51663 seconds
Frame number 2141 took 0.55432 seconds
Frame number 2142 took 0.53231 seconds
Frame number 2143 took 0.53225 seconds
Frame number 2144 took 0.51725 seconds
Frame number 2145 took 0.51651 seconds
Frame number 2146 took 0.53232 seconds
Frame number 2147 took 0.57039 seconds
Frame number 2148 took 0.53245 seconds
Frame number 2149 took 0.54734 seconds
Frame number 2150 took 0.55418 seconds
Frame number 2151 took 0.52356 seconds
Frame number 2152 took 0.60105 seconds
Frame number 2153 took 0.57047 seconds
Frame number 2154 took 0.60165 seconds
Frame number 2155 took 0.54822 seconds
Frame number 2156 took 0.53401 seconds
Frame number 2157 took 0.53239 seconds
Frame number 2158 took 0.52343 seconds
Frame number 2159 took 0.53632 seconds
Frame number 2160 took 0.53884 seconds
Frame number 2161 took 0.52360 seconds
...
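The box decoding inside the loop can be checked without OpenCV. Below is a plain-Python sketch of the same steps for one detection row, assuming YOLO v3's row layout (centre x, centre y, width, height as fractions of the input size, then an objectness score, then one score per class). The function `decode_detection` and the sample row are made up for illustration.

```python
def decode_detection(row, frame_w, frame_h, probability_minimum=0.5):
    """Turn one YOLO output row into ([x_min, y_min, w, h], confidence, class_id).

    Returns None when the best class score is not above the threshold.
    """
    scores = row[5:]
    # Index of the class with the highest score (same idea as np.argmax)
    class_id = max(range(len(scores)), key=lambda k: scores[k])
    confidence = scores[class_id]
    if confidence <= probability_minimum:
        return None
    # Scale the relative centre and size to pixels of the original frame
    x_center = row[0] * frame_w
    y_center = row[1] * frame_h
    box_width = row[2] * frame_w
    box_height = row[3] * frame_h
    # Convert the centre to the top-left corner
    x_min = int(x_center - box_width / 2)
    y_min = int(y_center - box_height / 2)
    return [x_min, y_min, int(box_width), int(box_height)], confidence, class_id


# Made-up row: box centred in a 1280x720 frame, class 2 scores 0.9
row = [0.5, 0.5, 0.25, 0.5, 0.95, 0.01, 0.02, 0.9]
print(decode_detection(row, 1280, 720))  # ([480, 180, 320, 360], 0.9, 2)
```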

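`cv2.dnn.NMSBoxes` performs the suppression step in the loop. To show the idea, here is a simplified greedy non-maximum suppression in plain Python on `[x_min, y_min, width, height]` boxes: discard boxes below the score threshold, keep the most confident remaining box, drop every box whose IoU (intersection over union) with a kept box exceeds the NMS threshold, and repeat. This is a sketch of the general technique, not necessarily OpenCV's exact implementation; `iou`, `greedy_nms` and the sample boxes are illustrative.

```python
def iou(a, b):
    """Intersection over union of two [x_min, y_min, width, height] boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, confidences, score_threshold, nms_threshold):
    """Return indexes of the boxes kept after greedy non-maximum suppression."""
    # Discard weak boxes, then visit the rest from most to least confident
    order = sorted((i for i, c in enumerate(confidences) if c > score_threshold),
                   key=lambda i: confidences[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one
        if all(iou(boxes[i], boxes[k]) <= nms_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [[100, 100, 200, 200], [110, 110, 200, 200], [400, 400, 50, 50]]
confidences = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, confidences, 0.5, 0.3))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, which exceeds 0.3, so it is suppressed; the third box does not overlap anything and survives.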

# Printing the number of frames and time taken

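# Printing final results
# Note: 't' only sums the forward-pass times,
# so the FPS below excludes reading, drawing and writing frames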
print()
print('Total number of frames', f)
print('Total amount of time {:.5f} seconds'.format(t))
print('FPS:', round((f / t), 1))




Total number of frames 2336
Total amount of time 1272.92444 seconds
FPS: 1.8
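The reported FPS is the frame count divided by the accumulated forward-pass time. Plugging in the numbers printed above:

```python
frames = 2336
seconds = 1272.92444  # forward-pass time only, as accumulated in 't'

fps = frames / seconds
print(round(fps, 1))                # 1.8
print(round(seconds / frames, 3))   # 0.545 seconds per frame
```

This matches the per-frame log (roughly 0.5 s per frame). Since reading, drawing and writing are not included in `t`, the end-to-end processing rate can only be lower than 1.8 FPS.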


# Releasing the reader and writer to finish the new video file

# Releasing video reader and writer
video.release()
writer.release()