# Object Detection using OpenCV

This code template performs object detection on a live webcam feed using the OpenCV library in Python. Object detection is a computer vision technique for locating objects (faces, eyes, or any inanimate object) in an image or video. It draws a bounding box around each detected object and can also attach a class label to it. Real-world applications include image retrieval and video surveillance.

### **Required Packages**

```python
!pip install opencv-python imutils
```

```python
import time

import cv2
import imutils
import numpy as np
from imutils.video import VideoStream
```

## **Initialization**

### Image Labels

Here, the COCO dataset supplies the class labels, since the YOLO model used below was trained on it. COCO is a widely used object detection dataset with 80 object categories, including oven, toaster, bench and car.

The label file can be downloaded from [coco.names](https://opencv-tutorial.readthedocs.io/en/latest/_downloads/a9fb13cbea0745f3d11da9017d1b8467/coco.names).

```python
classFile = ''  # path to coco.names
```

```python
# one class label per line; the line index is the class ID
with open(classFile, 'rt') as f:
    classes = f.read().rstrip('\n').split('\n')
```
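The label file is plain text with one class name per line, so a class ID returned by the model is simply an index into this list. A minimal sketch, using a hypothetical helper `load_labels` and an assumed three-line excerpt of `coco.names` (the real file has 80 lines):

```python
import io


def load_labels(file_obj):
    """Return one label per line; list index == class ID.

    Trailing blank lines are dropped, but interior lines are kept so that
    label positions stay aligned with the model's class IDs.
    """
    lines = [line.strip() for line in file_obj.read().splitlines()]
    while lines and not lines[-1]:
        lines.pop()
    return lines


# Assumed excerpt of coco.names, read from memory instead of disk.
sample = io.StringIO("person\nbicycle\ncar\n\n")
labels = load_labels(sample)
print(labels[2])  # car
```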

### Model
OpenCV uses the function <Code>cv2.dnn.readNet()</Code> to load pre-trained weights and a network configuration in a supported format and build the detection model. The function infers the framework the model was trained in (from the file extensions or the explicit framework tag) and calls the matching loader, such as readNetFromCaffe, readNetFromTensorflow, readNetFromTorch or readNetFromDarknet.

#### Parameters
1. **model**: const String &
>Binary file containing the trained weights.

2. **config**: const String &
>Text file containing the network configuration.

3. **framework**: const String &
>Explicit framework name tag used to determine the format.

More details at the [API](https://docs.opencv.org/4.5.1/d6/d0f/group__dnn.html)
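The format detection described above can be pictured as a lookup on file extensions. The sketch below is a simplified, hypothetical illustration (the helper `guess_framework` and its extension table are assumptions for this example), not OpenCV's actual implementation; it only shows why `readNet` needs no explicit framework for YOLO's `.weights`/`.cfg` pair.

```python
import os

# Hypothetical extension table, loosely following the file types listed in
# the OpenCV dnn documentation. Not OpenCV's real code.
EXTENSION_TO_FRAMEWORK = {
    ".caffemodel": "caffe", ".prototxt": "caffe",
    ".pb": "tensorflow", ".pbtxt": "tensorflow",
    ".t7": "torch", ".net": "torch",
    ".weights": "darknet", ".cfg": "darknet",
    ".onnx": "onnx",
}


def guess_framework(model, config="", framework=""):
    """Return the framework tag a readNet-style loader would pick."""
    if framework:
        return framework.lower()  # an explicit tag takes precedence
    for path in (model, config):
        ext = os.path.splitext(path)[1].lower()
        if ext in EXTENSION_TO_FRAMEWORK:
            return EXTENSION_TO_FRAMEWORK[ext]
    raise ValueError("cannot infer framework from file extensions")


print(guess_framework("yolov3.weights", "yolov3.cfg"))  # darknet
```

Because either file's extension is checked, the lookup still succeeds if the two YOLO paths are passed in the opposite order.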

#### YOLO
In this tutorial, the model used is YOLOv3 (You Only Look Once, version 3). YOLO is a fast multi-object detection algorithm that uses a convolutional neural network (CNN) to locate and classify all objects in an image in a single forward pass.

>**config_path**: [YOLO Configuration File](https://opencv-tutorial.readthedocs.io/en/latest/_downloads/10e685aad953495a95c17bfecd1649e5/yolov3.cfg)

>**weights_path**: [YOLO Weights](https://pjreddie.com/media/files/yolov3.weights)

Set the paths to the configuration and weights files:

```python
config_path = ''  # path to yolov3.cfg
weights_path = ''  # path to yolov3.weights
```

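```python
# readNet(model, config): the .weights file is the model, the .cfg is the config
net = cv2.dnn.readNet(weights_path, config_path)
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)

# names of the unconnected output layers (the YOLO detection heads); this avoids
# the i[0] indexing that fails on newer OpenCV releases, where
# getUnconnectedOutLayers() returns a flat array
output_layers = net.getUnconnectedOutLayersNames()
```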
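To make the output layers concrete, here is a small arithmetic sketch, assuming the standard YOLOv3 head: three detection scales with strides 32, 16 and 8, three anchor boxes per scale, and rows of 4 box values + 1 objectness score + 80 COCO class scores. The helper name `yolov3_output_shapes` is hypothetical; it only computes how many candidate rows each output layer returns for a given blob size.

```python
def yolov3_output_shapes(input_w, input_h, num_classes=80, anchors_per_scale=3):
    """(rows, columns) of each YOLOv3 output layer for a given blob size."""
    if input_w % 32 or input_h % 32:
        raise ValueError("YOLOv3 input sides should be multiples of 32")
    shapes = []
    for stride in (32, 16, 8):
        # one grid cell per stride x stride patch, several anchors per cell
        cells = (input_w // stride) * (input_h // stride)
        shapes.append((cells * anchors_per_scale, 5 + num_classes))
    return shapes


shapes = yolov3_output_shapes(256, 256)
print(shapes)
print(sum(rows for rows, _ in shapes))
```

For the 256x256 blob used later this gives 192 + 768 + 3072 = 4032 candidate rows per frame, versus 10647 at 416x416, so a smaller blob means fewer candidates and less computation, typically at some cost in accuracy on small objects.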

### BLOB
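OpenCV's dnn models take their input as a blob. The name comes from "binary large object" (BLOB), a collection of binary data stored as a single entity; in OpenCV's dnn module a blob is an N-dimensional array, and an image blob is 4-dimensional in NCHW order (batch, channels, height, width).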

<Code>cv2.dnn.blobFromImage()</Code> creates a 4-dimensional blob from an image. It can optionally resize and center-crop the image, subtract mean values, scale values by scalefactor, and swap the blue and red channels.

#### Parameters

1. **image**: InputArray
>input image (with 1, 3 or 4 channels).

2. **size**: const Size &
>spatial size of the output image.

3. **scalefactor**: double
>multiplier for image values.

4. **swapRB**: bool
>flag indicating whether the first and last channels of a 3-channel image should be swapped (BGR to RGB).

5. **crop**: bool
>flag indicating whether the image is cropped after resizing.

More details at the [API](https://docs.opencv.org/4.5.1/d6/d0f/group__dnn.html)
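The preprocessing steps above can be approximated in plain NumPy. This is a simplified sketch for a 3-channel image with `crop=False`; the function name `blob_from_image_sketch` is hypothetical, and it uses nearest-neighbour resizing for brevity, whereas OpenCV uses bilinear interpolation, so values differ slightly from `cv2.dnn.blobFromImage`.

```python
import numpy as np


def blob_from_image_sketch(image, scalefactor=1.0, size=None,
                           mean=(0.0, 0.0, 0.0), swap_rb=False):
    """Approximate blobFromImage for an H x W x 3 image (crop=False)."""
    img = image.astype(np.float32)
    if size is not None:
        out_w, out_h = size  # OpenCV sizes are (width, height)
        in_h, in_w = img.shape[:2]
        # nearest-neighbour resize (assumption; OpenCV uses bilinear)
        rows = np.arange(out_h) * in_h // out_h
        cols = np.arange(out_w) * in_w // out_w
        img = img[rows][:, cols]
    if swap_rb:
        img = img[:, :, ::-1]  # BGR -> RGB
    # subtract the mean first, then scale
    img = (img - np.asarray(mean, dtype=np.float32)) * scalefactor
    # HWC -> CHW, then add a batch axis -> NCHW
    return img.transpose(2, 0, 1)[np.newaxis, ...]


frame = np.zeros((4, 6, 3), dtype=np.uint8)
frame[..., 0] = 255  # pure blue in BGR order
blob = blob_from_image_sketch(frame, scalefactor=1 / 255.0,
                              size=(3, 2), swap_rb=True)
print(blob.shape)  # (1, 3, 2, 3)
```

With `swap_rb=True` the blue values end up in the last channel, and `scalefactor=1/255.0` maps pixel values into the [0, 1] range, which matches the settings used in the inference loop.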

## **Inference**
The input video stream is captured from the webcam, and each frame is processed by the following code section ([Reference](https://opencv-tutorial.readthedocs.io/en/latest/yolo/yolo.html)), which draws a bounding box around each detected object together with its label and confidence score. Here, the confidence score is the model's score for the predicted class of a box, i.e. how likely the box contains an object of that class.
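As a sketch of the per-detection step, the hypothetical helper `decode_detection` below converts one YOLO output row (normalized center x, center y, width, height, then objectness and class scores) into a pixel box in top-left `[x, y, w, h]` form. It uses a toy 3-class row rather than COCO's 80 classes and plain Python lists instead of NumPy arrays.

```python
def decode_detection(detection, frame_w, frame_h, conf_threshold=0.5):
    """Return ([x, y, w, h], class_id, confidence) or None if below threshold."""
    scores = detection[5:]
    class_id = max(range(len(scores)), key=scores.__getitem__)  # argmax
    confidence = scores[class_id]
    if confidence <= conf_threshold:
        return None
    # box values are relative to the frame, so scale them to pixels
    cx, cy = detection[0] * frame_w, detection[1] * frame_h
    bw, bh = detection[2] * frame_w, detection[3] * frame_h
    # convert the center point to the top-left corner
    x = int(cx - bw / 2)
    y = int(cy - bh / 2)
    return [x, y, int(bw), int(bh)], class_id, float(confidence)


row = [0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.8, 0.05]
print(decode_detection(row, 400, 300))  # ([150, 75, 100, 150], 1, 0.8)
```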

<Code>cv2.dnn.NMSBoxes()</Code> performs non-maximum suppression (NMS) on the given boxes and their scores: among boxes that overlap strongly, it keeps the highest-scoring one and discards the rest.

#### Parameters
1. **bboxes**: const std::vector< Rect > &
>the bounding boxes to apply NMS to.
2. **scores**: const std::vector< float > &
>the corresponding confidences.
3. **score_threshold**: const float
>a threshold used to filter boxes by score.
4. **nms_threshold**: const float
>the overlap (IoU) threshold used in non-maximum suppression.

More details at the [API](https://docs.opencv.org/4.5.1/d6/d0f/group__dnn.html)
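A minimal pure-Python sketch of greedy NMS in the spirit of `cv2.dnn.NMSBoxes`, assuming boxes in `[x, y, w, h]` form; the helpers `iou` and `nms` are hypothetical names and this is not OpenCV's implementation. Boxes below `score_threshold` are dropped, the rest are visited from highest to lowest score, and a box is kept only if its intersection over union (IoU) with every already-kept box is at most `nms_threshold`.

```python
def iou(a, b):
    """Intersection over union of two boxes given as [x, y, w, h]."""
    inter_w = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_threshold, nms_threshold):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = sorted((i for i, s in enumerate(scores) if s > score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= nms_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [[10, 10, 100, 100], [12, 12, 100, 100], [300, 300, 50, 50], [0, 0, 10, 10]]
scores = [0.9, 0.8, 0.7, 0.3]
print(nms(boxes, scores, score_threshold=0.4, nms_threshold=0.3))  # [0, 2]
```

Box 1 is suppressed because it almost coincides with the higher-scoring box 0, box 2 survives because it does not overlap anything, and box 3 never enters the loop because its score is below the score threshold.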

```python
# initialize the video stream and allow the camera sensor to warm up
print("[INFO] starting video stream...")
vs = VideoStream(src=0).start()
time.sleep(2.0)

# one random color per class, generated once so colors do not flicker
colors = np.random.randint(0, 255, size=(len(classes), 3), dtype='uint8')

while True:
    # grab the frame from the threaded video stream and resize it
    # to a width of 400 pixels
    frame = vs.read()
    if frame is None:
        break
    frame = imutils.resize(frame, width=400)

    boxes = []
    confidences = []
    classIDs = []

    # grab the frame dimensions and convert it to a blob
    (h, w) = frame.shape[:2]

    # YOLOv3 expects input sides that are multiples of 32
    blob = cv2.dnn.blobFromImage(image=frame,
                                 scalefactor=1/255.0,
                                 size=(256, 256),
                                 swapRB=True,
                                 crop=False)

    # pass the blob through the network and obtain the detections
    net.setInput(blob)
    outputs = net.forward(output_layers)

    # each row: centerX, centerY, width, height (normalized to [0, 1]),
    # objectness, then one score per class
    for output in outputs:
        for detection in output:
            scores = detection[5:]
            classID = np.argmax(scores)
            confidence = scores[classID]
            if confidence > 0.5:
                # scale the box to frame pixels and convert the center
                # point to the top-left corner
                box = detection[:4] * np.array([w, h, w, h])
                (centerX, centerY, width, height) = box.astype("int")
                x = int(centerX - (width / 2))
                y = int(centerY - (height / 2))
                boxes.append([x, y, int(width), int(height)])
                confidences.append(float(confidence))
                classIDs.append(classID)

    # non-maximum suppression removes overlapping duplicate boxes
    indices = cv2.dnn.NMSBoxes(bboxes=boxes,
                               scores=confidences,
                               score_threshold=0.4,
                               nms_threshold=0.3)
    if len(indices) > 0:
        for i in np.array(indices).flatten():
            (x, y) = (boxes[i][0], boxes[i][1])
            (bw, bh) = (boxes[i][2], boxes[i][3])
            color = [int(c) for c in colors[classIDs[i]]]
            cv2.rectangle(frame, (x, y), (x + bw, y + bh), color, 2)
            text = "{}: {:.4f}".format(classes[classIDs[i]], confidences[i])
            cv2.putText(frame, text, (x, y - 5), cv2.FONT_HERSHEY_DUPLEX, 0.7, color, 2)

    # show the output frame
    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF

    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break

# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()
```

#### Creator: Vamsi Mukkamala, GitHub: [Profile](https://github.com/vmc99)