# GRIP 2020

### Name    : Ashutosh Dattatray Salekar
### Task 1   : Object Detection in image and video
### Github  : https://github.com/ashutosh-salekar/GRIP-Tasks/tree/main/Task1

# Prerequisite: 
- You Only Look Once, more popularly known as YOLO, is one of the fastest real-time object detection algorithms
   (45 frames per second) compared with the R-CNN family (R-CNN, Fast R-CNN, Faster R-CNN, etc.).
   
   
- We have two options to get started with object detection:
   - Using a pre-trained model
   - Training a custom object detector from scratch
   
   
- In this notebook, we create an object detector using the pre-trained model for images,
   videos, and a real-time webcam feed.
   
   
- Because we are using the pre-trained model approach, we also need to download a few files:
   the pre-trained YOLOv3 weights, the configuration file, and the names file.
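
Before running the cells below, it can help to confirm that the three files are actually in place. The following is a minimal sketch, assuming the files sit in the current working directory under the names used later in this notebook; the helper `missing_model_files` is a hypothetical name introduced only for illustration.

```python
from pathlib import Path

# File names as they are used later in this notebook.
# Assumption: they are stored in the directory passed to the helper.
REQUIRED_FILES = ("yolov3.weights", "yolov3.cfg", "coco.names")


def missing_model_files(directory=".", required=REQUIRED_FILES):
    """Return the required file names that are not present in `directory`."""
    base = Path(directory)
    return [name for name in required if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All YOLOv3 files found.")
```

An empty list means all three files were found; otherwise the list names the files that still need to be downloaded.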


In [1]:
import numpy as np     # version : 1.19.2
import cv2             # version : 4.4.0

In [2]:
# The coco.names file contains the names of the objects that the model has been trained to identify.
with open("coco.names", "r") as f:   # The with block closes the file automatically
    classes = f.read().rstrip("\n").split("\n")

In [3]:
# Load the YOLOv3 weights and configuration file with the dnn module of OpenCV.
net = cv2.dnn.readNet('yolov3.weights', 'yolov3.cfg')
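
The video loop in the next cell turns every detection row returned by the network into a pixel-space box. The conversion can be sketched on its own in plain Python, without OpenCV. The function name `decode_detection` and the sample row are hypothetical and only serve as an illustration; the row layout (center x, center y, width, height, objectness, then one score per class) and the 0.5 threshold follow the loop below.

```python
def decode_detection(detection, width, height, threshold=0.5):
    """Convert one YOLO output row to (box, confidence, class_id), or None.

    `detection` is [cx, cy, w, h, objectness, score_0, score_1, ...], where
    the four box values are normalized to the range 0..1. At least one class
    score is assumed to be present.
    """
    scores = list(detection[5:])
    # Index of the highest class score (same role as np.argmax in the loop)
    class_id = max(range(len(scores)), key=scores.__getitem__)
    confidence = float(scores[class_id])
    if confidence <= threshold:
        return None

    # Scale the normalized values back to the size of the original frame
    center_x = int(detection[0] * width)
    center_y = int(detection[1] * height)
    w = int(detection[2] * width)
    h = int(detection[3] * height)

    # Move from the box center to its top-left corner
    x = int(center_x - w / 2)
    y = int(center_y - h / 2)
    return [x, y, w, h], confidence, class_id


# Hypothetical row with three class scores, for a 640x480 frame
row = [0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.8, 0.05]
print(decode_detection(row, width=640, height=480))
# ([240, 120, 160, 240], 0.8, 1)
```

Returning `None` for weak detections keeps the caller simple: it only appends a result to `boxes`, `confidences`, and `class_ids` when one is returned.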

In [4]:
cap = cv2.VideoCapture(r"car.mp4")    # Load the video file 

while True:
    ret, img = cap.read()
    if not ret:      # No frame returned: end of the video or a read error
        break
    height, width, _ = img.shape

    # We need to preprocess the frame, so we use blobFromImage. The function performs scaling, mean subtraction and channel swap
    blob = cv2.dnn.blobFromImage(img, 1 / 255, (416, 416), (0, 0, 0), swapRB=True, crop=False)

    """
    # optional part
    for b in blob:
        for n,img_blob in enumerate(b):
            cv2.imshow(str(n),img_blob)
    """

    net.setInput(blob)
    output_layers_names = net.getUnconnectedOutLayersNames()
    
    """
    net.forward() returns one array per output layer. Each row holds the information about one detected object:
    the x and y coordinates of the centre of the object, the width and height of the bounding box,
    an objectness confidence and the scores for all the classes of objects listed in coco.names.
    """
    layerOutputs = net.forward(output_layers_names)

    boxes = []
    confidences = []
    class_ids = []

    for output in layerOutputs:
        for detection in output:
            scores = detection[5:]
            class_id = np.argmax(scores)  # returns index of max value
            confidence = scores[class_id]

            if confidence > 0.5:
                """
                # These are normalized values (0 to 1), so we have to scale them back to the original
                 values. For that we multiply them by the width and height of the original image.

                center_x  = int(detection[0]) 
                center_y =  int(detection[1])
                w = int(detection[2])
                h = int(detection[3]) 
                """
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                w = int(detection[2] * width)
                h = int(detection[3] * height)

                x = int(center_x - w / 2)
                y = int(center_y - h / 2)

                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(class_id)
    
    """
    Although we removed the low confidence bounding boxes, there is a possibility that we will still 
    have duplicate detections around an object.
    To fix this situation we’ll need to apply Non-Maximum Suppression (NMS). We pass in confidence threshold value 
    and NMS threshold value as parameters to select one bounding box.
    
    """
    indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)  # Removing unnecessary boxes

    
    font = cv2.FONT_HERSHEY_PLAIN
    colors = np.random.uniform(0, 255, size=(len(boxes), 3))  # Random colors for bounding boxes

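    # NMSBoxes can return an empty tuple when nothing is detected, so convert to an array before flattening
    for i in np.array(indexes).flatten():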
        x, y, w, h = boxes[i]
        label = str(classes[class_ids[i]])
        confidence = str(round(confidences[i], 2))
        color = colors[i]
        cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
        cv2.putText(img, label + " " + confidence, (x, y - 5), font, 0.8, (255, 255, 255), 2)

    cv2.imshow("image", img)
    key = cv2.waitKey(1)
    if key == 27:   # Esc key stops the playback
        break

cap.release()
cv2.destroyAllWindows()
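
To make the Non-Maximum Suppression step used in the loop above more concrete, here is a minimal pure-Python sketch of a greedy NMS over `[x, y, w, h]` boxes. It is not OpenCV's implementation of `cv2.dnn.NMSBoxes` and may differ from it in details; the names `iou` and `greedy_nms` and the sample boxes are hypothetical. The thresholds 0.5 and 0.4 mirror the values passed in the notebook.

```python
def iou(box_a, box_b):
    """Intersection over union of two [x, y, w, h] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, confidences, score_threshold=0.5, nms_threshold=0.4):
    """Return the indices of the boxes to keep, highest confidence first."""
    # Drop weak boxes, then visit the rest from most to least confident
    order = sorted(
        (i for i, c in enumerate(confidences) if c > score_threshold),
        key=lambda i: confidences[i],
        reverse=True,
    )
    kept = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box
        if all(iou(boxes[i], boxes[k]) <= nms_threshold for k in kept):
            kept.append(i)
    return kept


# Two heavily overlapping boxes and one separate box (made-up values)
boxes = [[100, 100, 50, 50], [105, 105, 50, 50], [300, 300, 40, 40]]
confidences = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, confidences))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, which is above the 0.4 threshold, so only the more confident of the two survives.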