<a href="https://colab.research.google.com/github/AnjaliSidharthanD/YOLOv3/blob/main/YOLOV3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Introduction to Object Detection Using YOLOv3**

Author: Anjali

Object detection is a computer vision technique that identifies the presence, location, and type of one or more objects in an image. YOLO, which stands for "You Only Look Once", is a real-time object detection algorithm built on a deep convolutional neural network. In this notebook, we discuss YOLOv3, a variant of the original YOLO model that achieves near state-of-the-art (SOTA) results. It is one of the fastest object detection algorithms, notably faster than the R-CNN family.

>The official neural network implementation, written from the ground up in C by the original author, is available at [Darknet](https://pjreddie.com/darknet/). The source code is available on [GitHub](https://github.com/pjreddie/darknet).

>We have two options to get started with object detection:

*  Using the pre-trained model
*  Training a custom object detector from scratch

In this notebook, we build an object detector for videos using the pre-trained model. Let us dive into the code.

*Useful intro about [Colab](https://colab.research.google.com/notebooks/welcome.ipynb)*

*Useful intro about [OpenCV](https://opencv.org/)*


## Setting up our notebook

### Mounting Google Drive locally

In [1]:
from google.colab import drive
drive.mount('/content/gdrive/')

Mounted at /content/gdrive/


### Choose the necessary directory from Drive



In [2]:
!pwd

/content


In [3]:
%cd /content/gdrive/MyDrive/yoloV3

/content/gdrive/MyDrive/yoloV3


## Python code .py files



### Scripting yolo.py

In [18]:
%%writefile yolo.py
##### SCRIPT STARTS HERE #####
#!/usr/bin/env python3
# Importing required packages
# Usage: python3 yolo.py --video videos/car_on_road.mp4

# import the necessary packages
import numpy as np
import argparse
import time
import cv2
import os
import re  # used by framesToVideo() to sort frame file names numerically

# Initialize the parameters
confidenceThreshold = 0.5  #Confidence threshold
nmsThreshold = 0.4         #Non-maximum suppression threshold
inputWidth = 416           #Width of network's input image
inputHeight = 416          #Height of network's input image

# construct the argument parse and parse the arguments
parser = argparse.ArgumentParser(description='Object Detection using YOLO in OPENCV')
parser.add_argument('--image', help="True/False", default=False)
parser.add_argument('--video', help="Path to video file", default="videos/car_on_road.mp4")
parser.add_argument('--verbose', help="To print statements", default=True)
args = parser.parse_args()

#Load YOLO V3

def loadYolo():

    # load our YOLO object detector trained on COCO dataset (80 classes)
    print("[INFO] loading YOLO from disk...")
    # derive the paths to the YOLO weights and model configuration
    configPath = '/content/gdrive/MyDrive/yoloV3/yolov3.cfg'
    weightsPath = '/content/gdrive/MyDrive/yoloV3/yolov3.weights'
    net = cv2.dnn.readNetFromDarknet(configPath, weightsPath)
    #cv2.dnn.readNetFromDarknet("yolo3.weights","yolov3.cfg")

    # load the COCO class labels our YOLO model was trained on
    classes=[]
    # load the COCO class labels our YOLO model was trained on
    classesPath = '/content/gdrive/MyDrive/yoloV3/coco.names'
    with open(classesPath, "r") as f:
      classes = [line.strip() for line in f.readlines()]

    # initialize a list of colors to represent each possible class label
    np.random.seed(42)
    colors = np.random.uniform(0, 255, size=(len(classes), 3))
    #np.random.randint(0, 255, size=(len(LABELS), 3),dtype="uint8")
    

    # and determine only the *output* layer names that we need from YOLO
    
    # Get the names of all the layers in the network
    layersNames = net.getLayerNames()
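    # Get the names of the output layers, i.e. the layers with unconnected outputs.
    # getUnconnectedOutLayers() returns nested (Nx1) indices in older OpenCV releases
    # and a flat array in newer ones, so flatten before indexing (indices are 1-based).
    outLayerIds = np.array(net.getUnconnectedOutLayers()).flatten()
    outputLayers = [layersNames[i - 1] for i in outLayerIds]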
    
    return net, classes, colors, outputLayers

def detectObjects(image,net,outputLayers):
    # Create a 4D blob from a frame and then perform a forward
    # pass of the YOLO object detector, giving us our bounding boxes
    # and associated probabilities
    blob = cv2.dnn.blobFromImage(image, scalefactor=0.00392, size=(inputWidth, inputHeight), mean=(0, 0, 0), swapRB=True, crop=False)
    net.setInput(blob)
    layerOutputs = net.forward(outputLayers)
    return blob, layerOutputs

def getBoundingbox(layerOutputs, height, width):
    # initialize our lists of detected bounding boxes, confidences,
    # and class IDs, respectively
    boxes = []
    confidences = []
    classIDs = []

    # loop over each of the layer outputs
    for output in layerOutputs:
        #loop over each of the detections
        for detection in output:
            # extract the class ID and confidence (i.e., probability)
            # of the current object detection
            scores = detection[5:]
            classID = np.argmax(scores)
            confidence = scores[classID]

            #filter out weak predictions by ensuring the detected
            # probability is greater than the minimum probability

            if confidence > confidenceThreshold:
                # scale the bounding box coordinates back relative to
                # the size of the image, keeping in mind that YOLO
                # actually returns the center (x, y)-coordinates of
                # the bounding box followed by the boxes' width and
                # height
                box = detection[0:4] * np.array([width, height, width, height])
                # use separate names for the box size so that the frame's width
                # and height are not overwritten for the following detections
                (centerX, centerY, boxWidth, boxHeight) = box.astype("int")
                
                # use the center (x, y)-coordinates to derive the top
                # and left corner of the bounding box
                x = int(centerX - (boxWidth / 2))
                y = int(centerY - (boxHeight / 2))

                # update our list of bounding box coordinates,
                # confidences, and class IDs
                boxes.append([x, y, int(boxWidth), int(boxHeight)])
                confidences.append(float(confidence))
                classIDs.append(classID)

    return boxes, confidences, classIDs


# Draw the predicted bounding boxes

#def drawPredictedBB(boxes, confidences, colors, classIDs,classes, image):
    # apply non-maxima suppression to suppress weak, overlapping
    # bounding boxes with lower confidences
 #   indices = cv2.dnn.NMSBoxes(boxes, confidences, confidenceThreshold, nmsThreshold)
 #   font = cv2.FONT_HERSHEY_PLAIN
    
    # ensure at least one detection exists
 #   if len(indices) > 0:
        #loop over indices we are keeping
 #       for index in indices.flatten():
            # extract the bounding box coordinates
 #           (x, y) = (boxes[index][0], boxes[index][1])
 #           (width,height) = (boxes[index][2], boxes[index][3])

            # Draw a bounding box
 #           text = "{}: {:.4f}".format(classes[classIDs[index]],confidences[index])
 #           color = [int(c) for c in colors[classIDs[index]]]
 #           cv2.rectangle(image, (x,y), (x+width, y+height), color,2)
 #           cv2.putText(image, text, (x, y - 5), font, 1, color, 1)
 #
def videoToFrames(videoPath): 
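    # shell magics (!rm, !mkdir) are not valid Python inside a .py file, so
    # create the folder and clear old frames with the standard library instead
    os.makedirs('inputFrames', exist_ok=True)
    for oldFrame in os.listdir('inputFrames'):
        os.remove(os.path.join('inputFrames', oldFrame))
    frameCount=0
    inputFrames=[]
    capture = cv2.VideoCapture(videoPath)
    while(True):
        # Capture the video frame by frame from the file
        (grabbed,frame) = capture.read()
        # if the frame was not grabbed, then we have reached the end
        #of the stream
        if not grabbed:
          print("[INFO] All frames appended !!!")
          break
        inputFrames.append(frame)
        frameCount = frameCount + 1
        cv2.imwrite('inputFrames/'+str(frameCount)+'.png', frame)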
    return inputFrames 

def framesToVideo(videoPath):
  outputFrames = os.listdir('output/')
  # sort the frame files by the number in their name (raw string avoids an invalid escape)
  outputFrames.sort(key=lambda f: int(re.sub(r'\D', '', f)))
  frames=[]
  writer = None
  (width,height)= (None, None)
  for index in range(len(outputFrames)):
    #reading each files
    image = cv2.imread('output/'+outputFrames[index])
    height, width = image.shape[:2]
    size = (width,height)
    
    #inserting the frames into an image array
    frames.append(image)
  
  size = (width,height)
  outputFile = videoPath[:-4]+'_yolo_out_py.mp4'
  # initialize our video writer
  fourcc = cv2.VideoWriter_fourcc(*'DIVX')
  writer = cv2.VideoWriter(outputFile,fourcc, 29, size)

  for index in range(len(frames)):
    # writing to a image array
         writer.write(frames[index])
  writer.release()


def startVideo(videoPath):
    outputFile = "yolo_out_py.avi"
    net, classes, colors, outputLayers = loadYolo()
    # initialize the video stream, pointer to output video file, and
    # frame dimensions
    capture = cv2.VideoCapture(videoPath)
    outputFile = videoPath[:-4]+'_yolo_out_py.avi'
    writer = None
    (width,height) = (None, None)
    i= 0
    # loop over frames from the video file stream
    while(True):
        # Capture the video frame by frame from the file
        (grabbed,frame) = capture.read()
        # if the frame was not grabbed, then we have reached the end
        #of the stream
        if not grabbed:
          print("Done processing !!!")
          print("Output file is stored as ", outputFile)
          break

        # if the frame dimensions are empty, grab them
        if height is None or width is None:
            height,width = frame.shape[:2]
            
        # construct a blob from the input frame and then perform a forward
        # pass of the YOLO object detector, giving us our bounding boxes
        # and associated probabilities
        blob, layerOutputs = detectObjects(frame, net, outputLayers)
        boxes, confidences, classIDs = getBoundingbox(layerOutputs, height, width)
        # apply non-maxima suppression to suppress weak, overlapping
        # bounding boxes with lower confidences
        indices = cv2.dnn.NMSBoxes(boxes, confidences, confidenceThreshold, nmsThreshold)
        font = cv2.FONT_HERSHEY_PLAIN
        # ensure at least one detection exists
        if len(indices) > 0:
            #loop over indices we are keeping
            for index in np.array(indices).flatten():
                # extract the bounding box coordinates; use w and h so that the
                # frame's width and height are not overwritten
                (x, y) = (boxes[index][0], boxes[index][1])
                (w, h) = (boxes[index][2], boxes[index][3])

                # Draw a bounding box
                text = "{}: {:.4f}".format(classes[classIDs[index]],confidences[index])
                color = [int(c) for c in colors[classIDs[index]]]
                cv2.rectangle(frame, (x,y), (x+w, y+h), color,2)
                cv2.putText(frame, text, (x, y - 5), font, 1, color, 1)

        #Check if the video writer is None
        if writer is None:
            # initialize our video writer with the frame size (width, height)
            fourcc = cv2.VideoWriter_fourcc(*'DIVX')
            writer = cv2.VideoWriter(outputFile,fourcc, 29, (width, height))

        # write the output frame to disk
        writer.write(frame.astype(np.uint8))
        print("[INFO] frame",i)
        i +=1
        if cv2.waitKey(1) & 0xFF == ord('s'):
            break
    # release the file pointers
    print("[INFO] cleaning up...")
    writer.release()
    capture.release()

startVideo(args.video)
# Closes all the frames
cv2.destroyAllWindows()
   
print("The video was successfully saved")


Overwriting yolo.py




### Scripting yoloF.py


In [91]:
%%writefile yoloF.py
##### SCRIPT STARTS HERE #####
#!/usr/bin/env python3
# Importing required packages
#!python3 yoloF.py --video=/content/gdrive/MyDrive/yoloV3/PP.mp4

# import the necessary packages
import numpy as np
import argparse
import time
import cv2
import os
import re
from google.colab.patches import cv2_imshow

# Initialize the parameters
confidenceThreshold = 0.3  #Confidence threshold
nmsThreshold = 0.4         #Non-maximum suppression threshold
inputWidth = 416           #Width of network's input image
inputHeight = 416          #Height of network's input image

# construct the argument parse and parse the arguments
parser = argparse.ArgumentParser(description='Object Detection using YOLO in OPENCV')
parser.add_argument('--image', help="True/False", default=False)
parser.add_argument('--video', help="Path to video file", default="videos/car_on_road.mp4")
parser.add_argument('--verbose', help="To print statements", default=True)
args = parser.parse_args()

#Load YOLO V3

def loadYolo():

    # load our YOLO object detector trained on COCO dataset (80 classes)
    print("[INFO] loading YOLO from disk...")
    # derive the paths to the YOLO weights and model configuration
    configPath = '/content/gdrive/MyDrive/yoloV3/yolov3.cfg'
    weightsPath = '/content/gdrive/MyDrive/yoloV3/yolov3.weights'
    net = cv2.dnn.readNetFromDarknet(configPath, weightsPath)
    #cv2.dnn.readNetFromDarknet("yolo3.weights","yolov3.cfg")

    # load the COCO class labels our YOLO model was trained on
    classes=[]
    # load the COCO class labels our YOLO model was trained on
    classesPath = '/content/gdrive/MyDrive/yoloV3/coco.names'
    with open(classesPath, "r") as f:
      classes = [line.strip() for line in f.readlines()]

    # initialize a list of colors to represent each possible class label
    np.random.seed(42)
    colors = np.random.uniform(0, 255, size=(len(classes), 3))
    #np.random.randint(0, 255, size=(len(LABELS), 3),dtype="uint8")
    

    # and determine only the *output* layer names that we need from YOLO
    
    # Get the names of all the layers in the network
    layersNames = net.getLayerNames()
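    # Get the names of the output layers, i.e. the layers with unconnected outputs.
    # getUnconnectedOutLayers() returns nested (Nx1) indices in older OpenCV releases
    # and a flat array in newer ones, so flatten before indexing (indices are 1-based).
    outLayerIds = np.array(net.getUnconnectedOutLayers()).flatten()
    outputLayers = [layersNames[i - 1] for i in outLayerIds]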
    
    return net, classes, colors, outputLayers

def detectObjects(image,net,outputLayers):
    # Create a 4D blob from a frame and then perform a forward
    # pass of the YOLO object detector, giving us our bounding boxes
    # and associated probabilities
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, size=(inputWidth, inputHeight), mean=(0, 0, 0), swapRB=True, crop=False)
    net.setInput(blob)
    layerOutputs = net.forward(outputLayers)
    return blob, layerOutputs

def getBoundingbox(layerOutputs, height, width):
    # initialize our lists of detected bounding boxes, confidences,
    # and class IDs, respectively
    boxes = []
    confidences = []
    classIDs = []

    # loop over each of the layer outputs
    for output in layerOutputs:
        #loop over each of the detections
        for detection in output:
            # extract the class ID and confidence (i.e., probability)
            # of the current object detection
            scores = detection[5:]
            classID = np.argmax(scores)
            confidence = scores[classID]

            #filter out weak predictions by ensuring the detected
            # probability is greater than the minimum probability

            if confidence > confidenceThreshold:
                # scale the bounding box coordinates back relative to
                # the size of the image, keeping in mind that YOLO
                # actually returns the center (x, y)-coordinates of
                # the bounding box followed by the boxes' width and
                # height
                box = detection[0:4] * np.array([width, height, width, height])
                # use separate names for the box size so that the frame's width
                # and height are not overwritten for the following detections
                (centerX, centerY, boxWidth, boxHeight) = box.astype("int")
                
                # use the center (x, y)-coordinates to derive the top
                # and left corner of the bounding box
                x = int(centerX - (boxWidth / 2))
                y = int(centerY - (boxHeight / 2))

                # update our list of bounding box coordinates,
                # confidences, and class IDs
                boxes.append([x, y, int(boxWidth), int(boxHeight)])
                confidences.append(float(confidence))
                classIDs.append(classID)

    return boxes, confidences, classIDs


# Draw the predicted bounding boxes

#def drawPredictedBB(boxes, confidences, colors, classIDs,classes, image):
    # apply non-maxima suppression to suppress weak, overlapping
    # bounding boxes with lower confidences
 #   indices = cv2.dnn.NMSBoxes(boxes, confidences, confidenceThreshold, nmsThreshold)
 #   font = cv2.FONT_HERSHEY_PLAIN
    
    # ensure at least one detection exists
 #   if len(indices) > 0:
        #loop over indices we are keeping
 #       for index in indices.flatten():
            # extract the bounding box coordinates
 #           (x, y) = (boxes[index][0], boxes[index][1])
 #           (width,height) = (boxes[index][2], boxes[index][3])

            # Draw a bounding box
 #           text = "{}: {:.4f}".format(classes[classIDs[index]],confidences[index])
 #           color = [int(c) for c in colors[classIDs[index]]]
 #           cv2.rectangle(image, (x,y), (x+width, y+height), color,2)
 #           cv2.putText(image, text, (x, y - 5), font, 1, color, 1)
 #
def videoToFrames(videoPath): 
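    # make sure the folder exists; cv2.imwrite does not create missing folders
    os.makedirs('inputFrames', exist_ok=True)
    frameCount=0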
    inputFrames=[]
    capture = cv2.VideoCapture(videoPath)
    while(True):
        # Capture the video frame by frame from the file
        (grabbed,frame) = capture.read()
        # if the frame was not grabbed, then we have reached the end
        #of the stream
        if not grabbed:
          print("[INFO] All frames appended !!!")
          break
        inputFrames.append(frame)
        frameCount = frameCount + 1
        cv2.imwrite('inputFrames/'+str(frameCount)+'.png', frame)
    print("[INFO] inputFrames directory formed successfully")
    return inputFrames 

def framesToVideo():
  outputFrames = os.listdir('output/')
  outputFrames.sort(key=lambda f: int(re.sub(r'\D', '', f)))
  frames=[]
  writer = None
  (width,height)= (None, None)
  for index in range(len(outputFrames)):
    #reading each files
    image = cv2.imread('output/'+outputFrames[index])
    height, width = image.shape[:2]
    size = (width,height)
    
    #inserting the frames into an image array
    frames.append(image)
  
  size = (width,height)
  outputFile = videoPath[:-4]+'_yolo_out_py.mp4'
  # initialize our video writer
  fourcc = cv2.VideoWriter_fourcc(*'DIVX')
  writer = cv2.VideoWriter(outputFile,fourcc, 29, size)

  for index in range(len(frames)):
    # writing to a image array
         writer.write(frames[index])
  writer.release()
  print("[INFO] Frames are stitched to video successfully")


def personDetection(inputFrames):
  net, classes, colors, outputLayers = loadYolo()
  # make sure the output folder exists before writing frames to it
  os.makedirs('output', exist_ok=True)
  person = []
  for i in range(len(inputFrames)):
    frame = cv2.imread('inputFrames/'+str(i+1)+'.png')
    height,width = frame.shape[:2]
    # construct a blob from the input frame and then perform a forward
    # pass of the YOLO object detector, giving us our bounding boxes
    # and associated probabilities
    blob, layerOutputs = detectObjects(frame, net, outputLayers)
    boxes, confidences, classIDs = getBoundingbox(layerOutputs, height, width)
    # apply non-maxima suppression to suppress weak, overlapping
    # bounding boxes with lower confidences
    indices = cv2.dnn.NMSBoxes(boxes, confidences, confidenceThreshold, nmsThreshold)
    font = cv2.FONT_HERSHEY_PLAIN

    # loop over the indices kept by non-maxima suppression
    for index in np.array(indices).flatten():
      label = str(classes[classIDs[index]])
      if label == 'person':
        # extract the bounding box coordinates
        (x, y) = (boxes[index][0], boxes[index][1])
        (w,h) = (boxes[index][2], boxes[index][3])

        # Draw a bounding box; colors has one row per class, so index it by class ID
        text = "{}: {:.4f}".format(classes[classIDs[index]],confidences[index])
        color = [int(c) for c in colors[classIDs[index]]]
        cv2.rectangle(frame, (x,y), (x+w, y+h), color,2)
        cv2.putText(frame, text, (x, y - 5), font, 1, color, 1)
        person.append([i,x, y, int(w), int(h)])

    # write each frame exactly once, whether or not a person was detected
    cv2.imwrite('output/'+str(i+1)+'.png', frame)
  return person

videoPath = args.video
inputFrames= videoToFrames(videoPath)
person = personDetection(inputFrames)
framesToVideo()

# Closes all the frames
cv2.destroyAllWindows()
   
print("The video was successfully saved")


Overwriting yoloF.py


In [None]:
!python3 yoloF.py --video=/content/gdrive/MyDrive/yoloV3/car_chase_02.mp4



## Explaining yoloF.py
### Importing Necessary packages

```
import numpy as np 
```


In [92]:
import numpy as np
import argparse
import time
import cv2
import os
import re
# from google.colab.patches import cv2_imshow

### Prerequisites

1.   Weights (yolov3.weights): https://pjreddie.com/darknet/yolo
2.   Configuration (yolov3.cfg): https://pjreddie.com/darknet/yolo
3.   Classes (coco.names): https://github.com/pjreddie/darknet/blob/master/data/coco.names

The model has been trained for different input image sizes. We will download the weights and cfg files for YOLOv3-416 for now.

Save all three files in /content/gdrive/MyDrive/yoloV3.
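
Before loading the model, it can help to confirm that the three files are where the code expects them. The helper below is a minimal sketch; `missingFiles` is a hypothetical name introduced here for illustration and is not part of the notebook's scripts.

```python
import os

def missingFiles(directory, names=("yolov3.weights", "yolov3.cfg", "coco.names")):
    """Return the required file names that are not present in `directory`."""
    return [name for name in names
            if not os.path.isfile(os.path.join(directory, name))]
```

Calling `missingFiles('/content/gdrive/MyDrive/yoloV3')` should return an empty list once all three downloads are in place.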



### Initialize the parameters

In [93]:
# Initialize the parameters
confidenceThreshold = 0.3  #Confidence threshold
nmsThreshold = 0.4         #Non-maximum suppression threshold
inputWidth = 416           #Width of network's input image
inputHeight = 416          #Height of network's input image
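
To see what the two thresholds control, here is a small sketch of greedy non-maximum suppression in plain NumPy. It is an illustration of the idea only, not the implementation behind `cv2.dnn.NMSBoxes`; the names `iou` and `greedyNms` are hypothetical, and boxes are assumed to be in `[x, y, width, height]` form as in the rest of this notebook.

```python
import numpy as np

def iou(boxA, boxB):
    """Intersection over union of two [x, y, width, height] boxes."""
    ax, ay, aw, ah = boxA
    bx, by, bw, bh = boxB
    interW = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    interH = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = interW * interH
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def greedyNms(boxes, confidences, scoreThreshold, overlapThreshold):
    """Return the indices of the kept boxes, highest confidence first."""
    # drop boxes below the confidence threshold, then visit the rest best-first
    order = [i for i in np.argsort(confidences)[::-1]
             if confidences[i] > scoreThreshold]
    kept = []
    for i in order:
        # keep a box only if it does not overlap too much with a kept box
        if all(iou(boxes[i], boxes[k]) <= overlapThreshold for k in kept):
            kept.append(int(i))
    return kept

boxes = [[100, 100, 50, 50], [105, 105, 50, 50], [300, 300, 40, 40]]
confidences = [0.9, 0.8, 0.2]
print(greedyNms(boxes, confidences, 0.3, 0.4))  # prints [0]
```

The second box overlaps the first with an IoU of about 0.68, which is above the 0.4 overlap threshold, so it is suppressed. The third box is dropped because its confidence is below 0.3.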

### Define functions


1.  loadYolo



*   INPUT : none
*   OUTPUT : net, classes, colors, outputLayers


We load the **YOLOv3 weights and configuration file** with the help of the **dnn module of OpenCV**. The **coco.names** file contains the names of the objects that the model has been trained to identify. We store them in a list called **classes**.

The steps are explained alongside the code.
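
One detail of loadYolo worth a closer look is how the output layer names are picked. The indices returned by `getUnconnectedOutLayers()` are 1-based, and depending on the OpenCV release they arrive either nested (one index per row) or as a flat array. The sketch below shows the lookup with a made-up list of layer names, so it runs without OpenCV; `outputLayerNames` is a hypothetical helper name.

```python
import numpy as np

def outputLayerNames(layerNames, unconnectedOutLayers):
    """Map 1-based layer indices (nested or flat) to layer names."""
    ids = np.array(unconnectedOutLayers).flatten()
    return [layerNames[i - 1] for i in ids]

layerNames = ["conv_0", "yolo_1", "conv_2", "yolo_3"]  # made-up names for illustration
print(outputLayerNames(layerNames, [[2], [4]]))  # nested indices
print(outputLayerNames(layerNames, [2, 4]))      # flat indices
```

Both calls print `['yolo_1', 'yolo_3']`, which is why flattening first makes the lookup independent of the return shape.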

---

2.   detectObjects

*   INPUT : image,net,outputLayers
*   OUTPUT : blob, layerOutputs
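
The blob passed to the network is a 4D array in batch, channel, height, width order. The following NumPy sketch approximates what the blob creation step does for an image that is already 416x416, so the resizing step is left out. It is an illustration under that assumption, not a replacement for `cv2.dnn.blobFromImage`, and `toBlob` is a hypothetical name.

```python
import numpy as np

def toBlob(imageBgr, scalefactor=1 / 255.0):
    """Approximate the non-resizing part of blob creation with NumPy."""
    rgb = imageBgr[:, :, ::-1]                      # BGR -> RGB (swapRB=True)
    scaled = rgb.astype(np.float32) * scalefactor   # pixel range 0..255 -> 0..1
    chw = np.transpose(scaled, (2, 0, 1))           # height,width,channel -> channel,height,width
    return chw[np.newaxis, ...]                     # add the batch axis

image = np.zeros((416, 416, 3), dtype=np.uint8)
image[..., 0] = 255          # pure blue, since OpenCV stores images in BGR order
blob = toBlob(image)
print(blob.shape)            # prints (1, 3, 416, 416)
```

After the channel swap the blue values end up in the last channel of the blob, scaled to roughly 1.0.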

----

3.  getBoundingbox


*   INPUT : layerOutputs, height, width
*   OUTPUT : boxes, confidences, classIDs
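
Each detection is a vector whose first four values are the box center and size relative to the frame, followed by an objectness value and one score per class. The sketch below decodes a single made-up detection with three classes; `decodeDetection` is a hypothetical helper that mirrors the arithmetic used in getBoundingbox.

```python
import numpy as np

def decodeDetection(detection, frameWidth, frameHeight):
    """Turn one detection vector into ([x, y, w, h], confidence, classID)."""
    scores = detection[5:]
    classID = int(np.argmax(scores))
    confidence = float(scores[classID])
    # scale the relative center and size back to pixel units
    centerX, centerY, boxWidth, boxHeight = detection[0:4] * np.array(
        [frameWidth, frameHeight, frameWidth, frameHeight])
    # convert the center to the top-left corner
    x = int(centerX - boxWidth / 2)
    y = int(centerY - boxHeight / 2)
    return [x, y, int(boxWidth), int(boxHeight)], confidence, classID

# made-up detection: center (0.5, 0.5), size (0.25, 0.5), objectness 0.9, three class scores
detection = np.array([0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.8, 0.05])
box, confidence, classID = decodeDetection(detection, 640, 480)
print(box, confidence, classID)  # prints [240, 120, 160, 240] 0.8 1
```

For a 640x480 frame the center lands at (320, 240) and the box measures 160x240 pixels, so the top-left corner is (240, 120).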

---

4.  videoToFrames

*   INPUT : path to the video file
*   OUTPUT : the input frames are stored in the inputFrames folder, and the list of frames is returned as inputFrames

Writing the frames to the inputFrames folder can be skipped if it is not required.


---

5.   framesToVideo


*   INPUT : path to the video file
*   OUTPUT : output file will be created in the directory
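
The frame files are named 1.png, 2.png, and so on, so they have to be sorted by their numeric value before being stitched together. A plain string sort would put 10.png before 2.png. The sketch below shows the difference; `sortFrameNames` is a hypothetical helper that uses the same sort key as framesToVideo.

```python
import re

def sortFrameNames(names):
    """Sort file names by the digits they contain, e.g. '2.png' before '10.png'."""
    return sorted(names, key=lambda f: int(re.sub(r'\D', '', f)))

names = ['10.png', '2.png', '1.png']
print(sorted(names))          # prints ['1.png', '10.png', '2.png']
print(sortFrameNames(names))  # prints ['1.png', '2.png', '10.png']
```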

---

6.  personDetection

*   INPUT : inputFrames, which is the output of videoToFrames()
*   OUTPUT : person, the bounding box coordinates of the detected people






In [120]:
#Load YOLO V3
def loadYolo():

    # load our YOLO object detector trained on COCO dataset (80 classes)
    print("[INFO] loading YOLO from disk...")
    # derive the paths to the YOLO weights and model configuration
    configPath = '/content/gdrive/MyDrive/yoloV3/yolov3.cfg'
    weightsPath = '/content/gdrive/MyDrive/yoloV3/yolov3.weights'
    net = cv2.dnn.readNetFromDarknet(configPath, weightsPath)
    #cv2.dnn.readNetFromDarknet("yolo3.weights","yolov3.cfg")

    # load the COCO class labels our YOLO model was trained on
    classes=[]
    # load the COCO class labels our YOLO model was trained on
    classesPath = '/content/gdrive/MyDrive/yoloV3/coco.names'
    with open(classesPath, "r") as f:
      classes = [line.strip() for line in f.readlines()]

    # initialize a list of colors to represent each possible class label
    np.random.seed(42)
    colors = np.random.uniform(0, 255, size=(len(classes), 3))
    #np.random.randint(0, 255, size=(len(LABELS), 3),dtype="uint8")
    

    # and determine only the *output* layer names that we need from YOLO
    
    # Get the names of all the layers in the network
    layersNames = net.getLayerNames()
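    # Get the names of the output layers, i.e. the layers with unconnected outputs.
    # getUnconnectedOutLayers() returns nested (Nx1) indices in older OpenCV releases
    # and a flat array in newer ones, so flatten before indexing (indices are 1-based).
    outLayerIds = np.array(net.getUnconnectedOutLayers()).flatten()
    outputLayers = [layersNames[i - 1] for i in outLayerIds]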
    
    return net, classes, colors, outputLayers

In [121]:
def detectObjects(image,net,outputLayers):
    # Create a 4D blob from a frame and then perform a forward
    # pass of the YOLO object detector, giving us our bounding boxes
    # and associated probabilities
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, size=(inputWidth, inputHeight), mean=(0, 0, 0), swapRB=True, crop=False)
    net.setInput(blob)
    layerOutputs = net.forward(outputLayers)
    return blob, layerOutputs

In [122]:
def getBoundingbox(layerOutputs, height, width):
    # initialize our lists of detected bounding boxes, confidences,
    # and class IDs, respectively
    boxes = []
    confidences = []
    classIDs = []

    # loop over each of the layer outputs
    for output in layerOutputs:
        #loop over each of the detections
        for detection in output:
            # extract the class ID and confidence (i.e., probability)
            # of the current object detection
            scores = detection[5:]
            classID = np.argmax(scores)
            confidence = scores[classID]

            #filter out weak predictions by ensuring the detected
            # probability is greater than the minimum probability

            if confidence > confidenceThreshold:
                # scale the bounding box coordinates back relative to
                # the size of the image, keeping in mind that YOLO
                # actually returns the center (x, y)-coordinates of
                # the bounding box followed by the boxes' width and
                # height
                box = detection[0:4] * np.array([width, height, width, height])
                # use separate names for the box size so that the frame's width
                # and height are not overwritten for the following detections
                (centerX, centerY, boxWidth, boxHeight) = box.astype("int")
                
                # use the center (x, y)-coordinates to derive the top
                # and left corner of the bounding box
                x = int(centerX - (boxWidth / 2))
                y = int(centerY - (boxHeight / 2))
#(x,y) -----------------------
#|                            |
#|     (centerX, centerY )    |boxHeight
#|                            |
#-----------------------------(x + boxWidth, y + boxHeight)
# <------boxWidth------------>
                # update our list of bounding box coordinates,
                # confidences, and class IDs
                boxes.append([x, y, int(boxWidth), int(boxHeight)])
                confidences.append(float(confidence))
                classIDs.append(classID)

    return boxes, confidences, classIDs

In [123]:
def videoToFrames(videoPath): 
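    # make sure the folder exists; cv2.imwrite does not create missing folders
    os.makedirs('inputFrames', exist_ok=True)
    frameCount=0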
    inputFrames=[]
    capture = cv2.VideoCapture(videoPath)
    while(True):
        # Capture the video frame by frame from the file
        (grabbed,frame) = capture.read()
        # if the frame was not grabbed, then we have reached the end
        #of the stream
        if not grabbed:
          print("[INFO] All frames appended !!!")
          break
        inputFrames.append(frame)
        frameCount = frameCount + 1
        cv2.imwrite('inputFrames/'+str(frameCount)+'.png', frame)
    print("[INFO] inputFrames directory formed successfully")
    return inputFrames

In [124]:
def framesToVideo(videoPath):
  outputFrames = os.listdir('output/')
  outputFrames.sort(key=lambda f: int(re.sub(r'\D', '', f)))
  frames=[]
  writer = None
  (width,height)= (None, None)
  for index in range(len(outputFrames)):
    #reading each files
    image = cv2.imread('output/'+outputFrames[index])
    height, width = image.shape[:2]
    size = (width,height)
    
    #inserting the frames into an image array
    frames.append(image)
  
  size = (width,height)
  outputFile = videoPath[:-4]+'_yolo_out_py.mp4'
  # initialize our video writer
  fourcc = cv2.VideoWriter_fourcc(*'DIVX')
  writer = cv2.VideoWriter(outputFile,fourcc, 29, size)

  for index in range(len(frames)):
    # writing to a image array
         writer.write(frames[index])
  writer.release()
  print("[INFO] Frames are stitched to video successfully")

In [125]:
def personDetection(inputFrames):
  net, classes, colors, outputLayers = loadYolo()
  # make sure the output folder exists before writing frames to it
  os.makedirs('output', exist_ok=True)
  person = []
  for i in range(len(inputFrames)):
    frame = cv2.imread('inputFrames/'+str(i+1)+'.png')
    height,width = frame.shape[:2]
    # construct a blob from the input frame and then perform a forward
    # pass of the YOLO object detector, giving us our bounding boxes
    # and associated probabilities
    blob, layerOutputs = detectObjects(frame, net, outputLayers)
    boxes, confidences, classIDs = getBoundingbox(layerOutputs, height, width)
    # apply non-maxima suppression to suppress weak, overlapping
    # bounding boxes with lower confidences
    indices = cv2.dnn.NMSBoxes(boxes, confidences, confidenceThreshold, nmsThreshold)
    font = cv2.FONT_HERSHEY_PLAIN

    # loop over the indices kept by non-maxima suppression
    for index in np.array(indices).flatten():
      label = str(classes[classIDs[index]])
      if label == 'person':
        # extract the bounding box coordinates
        (x,y) = (boxes[index][0], boxes[index][1])
        (w,h) = (boxes[index][2], boxes[index][3])

        # Draw a bounding box; colors has one row per class, so index it by class ID
        text = "{}: {:.4f}".format(classes[classIDs[index]],confidences[index])
        color = [int(c) for c in colors[classIDs[index]]]
        cv2.rectangle(frame, (x,y), (x+w, y+h), color,2)
        cv2.putText(frame, text, (x, y - 5), font, 1, color, 1)
        person.append([i,x, y, int(w), int(h)])

    # write each frame exactly once, whether or not a person was detected
    cv2.imwrite('output/'+str(i+1)+'.png', frame)
  return person

In [130]:
def finalResult(videoPath):
  inputFrames= videoToFrames(videoPath)
  person = personDetection(inputFrames)
  framesToVideo(videoPath)
  print("[INFO] Image detection done successfully")
  return person

### Desired result
Call finalResult("videoPath"), replacing "videoPath" with the path to the input video.



In [174]:
personBB = finalResult("car_chase_02.mp4")

[INFO] All frames appended !!!
[INFO] inputFrames directory formed successfully
[INFO] loading YOLO from disk...
[INFO] Frames are stitched to video successfully
[INFO] Image detection done successfully


### Bounding box coordinates of the people in the video

[index, x, y, width, height]

In [None]:
personBB

In [170]:
personPedestrians = finalResult("pedestrians.mp4")

[INFO] All frames appended !!!
[INFO] inputFrames directory formed successfully
[INFO] loading YOLO from disk...
[INFO] Frames are stitched to video successfully
[INFO] Image detection done successfully


### Checking whether the bounding box result is right

In [None]:
from google.colab.patches import cv2_imshow
image = cv2.imread('inputFrames/'+str(100)+'.png')
cv2_imshow(image)

In [None]:
image=cv2.rectangle(image, (100,100), (300, 300), (0,0,255), 2)
cv2_imshow(image)

In [138]:
personPark = finalResult("park.mp4")

[INFO] All frames appended !!!
[INFO] inputFrames directory formed successfully
[INFO] loading YOLO from disk...
[INFO] Frames are stitched to video successfully
[INFO] Image detection done successfully


In [None]:
personPark

### Distance measurement using centroids



We have an array of arrays whose elements have the form [index, x, y, width, height], where

*   index - frame index
*   (x, y) - coordinates of the top-left corner of the bounding box
*   width, height - width and height of the bounding box



In [141]:
personBB[0]

[0, 639, 375, 67, 93]
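
As a worked example of the idea, the centroid of a box is its top-left corner shifted by half its width and height, and the distance between two people in the same frame can be taken as the Euclidean distance between their centroids. The sketch below is one possible way to do this and measures pixels, not physical distance. The names `centroid` and `pairwiseDistances` are hypothetical, and the second entry in `entries` is made up for illustration.

```python
import itertools
import math

def centroid(entry):
    """Center point of an [index, x, y, width, height] entry."""
    _, x, y, w, h = entry
    return (x + w / 2, y + h / 2)

def pairwiseDistances(entries):
    """(frame index, pixel distance) for every pair of boxes in the same frame."""
    results = []
    for a, b in itertools.combinations(entries, 2):
        if a[0] == b[0]:  # compare only boxes from the same frame
            (ax, ay), (bx, by) = centroid(a), centroid(b)
            results.append((a[0], math.hypot(ax - bx, ay - by)))
    return results

# first entry is personBB[0] from above; the second is a made-up box in the same frame
entries = [[0, 639, 375, 67, 93], [0, 339, 375, 67, 93]]
print(centroid(entries[0]))        # prints (672.5, 421.5)
print(pairwiseDistances(entries))  # prints [(0, 300.0)]
```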

In [None]:
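def distance(pointA, pointB):  # assumed intent: Euclidean distance in pixels between two (x, y) points
    return float(np.hypot(pointA[0] - pointB[0], pointA[1] - pointB[1]))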

In [None]:
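def distanceObject(boundingBoxList):
    # assumed intent: centroid (x + w/2, y + h/2) of every [index, x, y, width, height] entry
    centroids = []
    for index, x, y, w, h in boundingBoxList:
        centroids.append((index, x + w / 2, y + h / 2))
    return centroids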