# Object Detection

In this notebook I use the YOLOv3 model to detect objects. Many of the ideas in this notebook are described in the YOLOv3 paper: Redmon et al., 2018 (https://arxiv.org/abs/1804.02767). The particular model I will be using is YOLOv3 trained on the COCO dataset.

The COCO dataset consists of 80 labels, including, but not limited to:

* People
* Bicycles
* Cars and Trucks
* Airplanes
* Stop Signs and Fire Hydrants
* Animals, including cats, dogs, birds, horses, cows, and sheep, to name a few
* Kitchen and dining objects, such as wine glasses, cups, forks, knives, spoons, etc.
* ...and much more!

The full list of classes that YOLO trained on the COCO dataset can detect is available at https://github.com/pjreddie/darknet/blob/master/data/coco.names.

**Objective**:
- Use object detection on a car detection dataset
- Deal with bounding boxes
- Use object detection on various objects

### Steps to Accomplish This:
1. Define the path to the video to run detection on and the configuration files for the YOLO model (this step requires user input)
2. Define labels and bounding-box colors, and load YOLOv3 along with its pretrained weights
3. Load video frames and send them through the network
4. Calculate object bounding boxes
5. Apply non-max suppression to overlapping bounding boxes
6. Draw the bounding boxes and write the annotated frames to the output video
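
Step 5 is the least obvious of these, so here is a minimal pure-Python sketch of greedy non-max suppression. It is only an illustration of the idea: the helper names `iou` and `greedy_nms` and the example boxes are made up for this sketch, and OpenCV's `cv2.dnn.NMSBoxes` (used later in the notebook) may differ in its details. Boxes use the same `[x, y, w, h]` layout as the rest of the notebook, and the default thresholds mirror the 0.5 and 0.3 values used below.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as [x, y, w, h]."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping rectangle (0 if the boxes are disjoint)
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, score_threshold=0.5, iou_threshold=0.3):
    """Return the indices of the boxes to keep, highest score first."""
    # Drop weak boxes, then visit the rest from most to least confident
    order = sorted(
        (i for i, s in enumerate(scores) if s > score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical example: the first two boxes overlap heavily, the third is separate
boxes = [[100, 100, 50, 50], [105, 105, 50, 50], [300, 300, 40, 40]]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
print(kept)
```

In this example the second box overlaps the first with an IoU of about 0.68, which is above the 0.3 threshold, so only the higher-scoring first box and the separate third box are kept.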

In [1]:
# Import the required packages
import numpy as np
import argparse
import time
import cv2
import os

In [2]:
# Function to request the paths to the input video and the necessary YOLO model configuration files
def enter_required_paths():
    
    video_path = input("Base path to Image we will train on ")
    yolo_labels_path = input("Base path to YOLO stored labels , looking for coco.names ")
    yolo_weights_path = input("Base path to YOLO pretrained weights, looking for yolov3.weights ")
    yolo_config_path = input("Base path to YOLO configuration file, looking for yolov3.cfg ")
    
    return video_path,yolo_labels_path,yolo_weights_path,yolo_config_path

In [3]:
# Initialize required paths
video_path,yolo_labels_path,yolo_weights_path,yolo_config_path = enter_required_paths()

Base path to Image we will train on ./video/overpass.mp4
Base path to YOLO stored labels , looking for coco.names yolo-coco/coco.names
Base path to YOLO pretrained weights, looking for yolov3.weights yolo-coco/yolov3.weights
Base path to YOLO configuration file, looking for yolov3.cfg yolo-coco/yolov3.cfg


### Define Labels, colors for bounding boxes, and load YOLOv3 along with its pretrained weights

In [4]:
# Resolve the path to the labels file, read it, and store the labels in classLabels
labelsPath = os.path.abspath(yolo_labels_path)
with open(labelsPath) as f:
    classLabels = f.read().strip().split("\n")

# Initialize a list of colors to represent each possible class label
np.random.seed(42)
classColors = np.random.randint(0, 255, size=(len(classLabels), 3), dtype="uint8")

In [17]:
# derive the paths to the YOLO weights and model configuration
weightsPath = os.path.abspath(yolo_weights_path)
configPath = os.path.abspath(yolo_config_path)
 
# load our YOLO object detector trained on the COCO dataset (80 classes)
# using OpenCV's DNN function cv2.dnn.readNetFromDarknet
print("Loading YOLOv3 from disk...")
yolo_net = cv2.dnn.readNetFromDarknet(configPath, weightsPath)

# determine only the *output* layer names that we need from YOLO
ln_outer_names = yolo_net.getUnconnectedOutLayersNames()

Loading YOLOv3 from disk...
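
To get a feel for what those output layers return, the following sketch works out how many candidate detections a single forward pass produces. It assumes the standard YOLOv3 setup from the paper (three output scales with strides 32, 16, and 8, and 3 anchor boxes per grid cell) together with the 416x416 input size and 80 COCO classes used in this notebook. If your `yolov3.cfg` differs, the numbers will differ too. The constant and function names are made up for this illustration.

```python
# Assumed values: 416x416 input (as used in this notebook), the three
# YOLOv3 strides, 3 anchor boxes per grid cell, and 80 COCO classes.
INPUT_SIZE = 416
STRIDES = (32, 16, 8)
ANCHORS_PER_CELL = 3
NUM_CLASSES = 80


def detections_per_scale(input_size, stride, anchors_per_cell):
    """Number of candidate boxes predicted at one output scale."""
    grid = input_size // stride  # e.g. 416 // 32 = 13 cells per side
    return grid * grid * anchors_per_cell


per_scale = [detections_per_scale(INPUT_SIZE, s, ANCHORS_PER_CELL) for s in STRIDES]
total_detections = sum(per_scale)

# Each detection row holds 4 box values, 1 objectness score, and one score per class
values_per_detection = 4 + 1 + NUM_CLASSES

print(per_scale, total_detections, values_per_detection)
```

Under these assumptions each frame yields 10647 candidate rows of 85 values each, which is why the confidence filter and non-max suppression later in the notebook are needed to reduce them to a handful of boxes.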


### Load Video and send through network

In [40]:
video_path = input("Base path to video we will train on ")

Base path to video we will train on /home/mohammad/Documents/Deeplearning_ai/Projects/Object Detection/videos/road_video.mp4


In [41]:
output_path = input("Output path for video we train on ")

Output path for video we train on /home/mohammad/Documents/Deeplearning_ai/Projects/Object Detection/output/road_video.avi


In [42]:
# initialize the video stream, pointer to output video file, and
# frame dimensions

# Open a file pointer to the video file for reading frames
vs = cv2.VideoCapture(video_path)

# Initialize the video writer and frame dimensions
writer = None
(W, H) = (None, None)

In [43]:
# loop over frames from the video file stream
while True:
    # read the next frame from the file
    (grabbed, frame) = vs.read()

    # if the frame was not grabbed, then we have reached the end
    # of the stream
    if not grabbed:
        break

    # if the frame dimensions are empty, grab them
    if W is None or H is None:
        (H, W) = frame.shape[:2]

    # construct a blob from the input frame and then perform a forward
    # pass of the YOLO object detector, giving us our bounding boxes
    # and associated probabilities
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
        swapRB=True, crop=False)
    yolo_net.setInput(blob)
    start = time.time()
    layerOutputs = yolo_net.forward(ln_outer_names)
    end = time.time()

    # initialize our lists of detected bounding boxes, confidences,
    # and class IDs, respectively
    boxes = []
    confidences = []
    classIDs = []

    # loop over each of the layer outputs
    for output in layerOutputs:
        # loop over each of the detections
        for detection in output:
            # extract the per-class scores for the current detection,
            # then the class ID and confidence (i.e., probability) of
            # the most likely class
            scores = detection[5:]
            classID = np.argmax(scores)
            confidence = scores[classID]

            # filter out weak predictions by ensuring the detected
            # probability is greater than the minimum probability
            if confidence > 0.5:
                # scale the bounding box coordinates back relative to
                # the size of the image, keeping in mind that YOLO
                # actually returns the center (x, y)-coordinates of
                # the bounding box followed by the boxes' width and
                # height
                box = detection[0:4] * np.array([W, H, W, H])
                (centerX, centerY, width, height) = box.astype("int")

                # use the center (x, y)-coordinates to derive the
                # top-left corner of the bounding box
                x = int(centerX - (width / 2))
                y = int(centerY - (height / 2))

                # update our list of bounding box coordinates,
                # confidences, and class IDs
                boxes.append([x, y, int(width), int(height)])
                confidences.append(float(confidence))
                classIDs.append(classID)

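    # apply non-maxima suppression to suppress weak, overlapping
    # bounding boxes (0.5 is the minimum confidence score, 0.3 is the
    # overlap threshold above which a lower-scoring box is dropped)
    idxs = cv2.dnn.NMSBoxes(boxes, confidences, 0.5,
        0.3)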

    # ensure at least one detection exists
    if len(idxs) > 0:
        # loop over the indexes we are keeping
        for i in idxs.flatten():
            # extract the bounding box coordinates
            (x, y) = (boxes[i][0], boxes[i][1])
            (w, h) = (boxes[i][2], boxes[i][3])

            # draw a bounding box rectangle and label on the frame
            color = [int(c) for c in classColors[classIDs[i]]]
            cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
            text = "{}: {:.4f}".format(classLabels[classIDs[i]],
                confidences[i])
            cv2.putText(frame, text, (x, y - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

    # check if the video writer is None
    if writer is None:
        # initialize our video writer
        fourcc = cv2.VideoWriter_fourcc(*"MJPG")
        writer = cv2.VideoWriter(output_path, fourcc, 30,
            (frame.shape[1], frame.shape[0]), True)

        # total number of frames reported by the video file (this can
        # be 0 or negative if the count is unavailable)
        total = int(vs.get(cv2.CAP_PROP_FRAME_COUNT))

        # some information on processing single frame
        if total > 0:
            elap = (end - start)
            print("[INFO] single frame took {:.4f} seconds".format(elap))
            print("[INFO] estimated total time to finish: {:.4f}".format(
                elap * total))

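    # write the output frame to disk
    writer.write(frame)

# release the file pointers once all frames have been processed
if writer is not None:
    writer.release()
vs.release()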
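
The box arithmetic inside the loop is easier to check in isolation. The sketch below decodes a single YOLO output row into a pixel-space `[x, y, w, h]` box, following the same layout the loop relies on: four normalized box values (center x, center y, width, height), one objectness score, then one score per class. The function name `decode_detection`, the example row, and the 1280x720 frame size are made up for this illustration, and rounding may differ by a pixel from the loop above, which casts to `int` before subtracting.

```python
def decode_detection(detection, frame_w, frame_h, min_confidence=0.5):
    """Convert one output row into (class_id, confidence, [x, y, w, h]).

    Returns None when the best class score does not exceed min_confidence.
    """
    # Class scores start after the 4 box values and the objectness score
    scores = detection[5:]
    class_id = max(range(len(scores)), key=scores.__getitem__)
    confidence = scores[class_id]
    if confidence <= min_confidence:
        return None

    # Box values are normalized to [0, 1], so scale them to pixels
    center_x = detection[0] * frame_w
    center_y = detection[1] * frame_h
    width = detection[2] * frame_w
    height = detection[3] * frame_h

    # Convert from center coordinates to the top-left corner
    x = int(center_x - width / 2)
    y = int(center_y - height / 2)
    return class_id, float(confidence), [x, y, int(width), int(height)]


# Hypothetical row: a box centered in the frame, with class index 2 scoring 0.8
row = [0.5, 0.5, 0.25, 0.5, 0.9] + [0.0] * 80
row[5 + 2] = 0.8
result = decode_detection(row, frame_w=1280, frame_h=720)
print(result)
```

For the example row this gives a 320x360 box whose top-left corner sits at (480, 180), assigned to class index 2.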