# Detecting objects with YOLOv3 on the COCO dataset

YOLO (You Only Look Once) is an object detection model proposed by [Redmon et al. in 2015](https://pjreddie.com/media/files/papers/yolo_1.pdf). Since its first version, YOLO has gone through several changes to its backbone, and the YOLO family now counts five versions. The two most recent ones, [YOLOv4](https://arxiv.org/pdf/2004.10934.pdf) and [YOLOv5](https://github.com/ultralytics/yolov5), do not come from the original authors.

YOLO is a unified model: a single convolutional neural network both detects objects and classifies them. Before YOLO, object detection models worked in two steps: they first found candidate bounding boxes and then classified each one.

YOLO changed the way objects are detected and became one of the most powerful real-time object detectors, faster than other models such as SSD or Faster R-CNN. On the other hand, YOLO may not work well for small objects, and its accuracy is sometimes worse than that of SSD and Faster R-CNN.

This notebook does not intend to explain in detail how the model is built and trained. Instead, we use a pre-trained [YOLOv3 from Darknet](https://pjreddie.com/media/files/papers/YOLOv3.pdf) to make predictions over the classes of the COCO dataset, on which the model was trained. The main objective is to explore the trained model and use OpenCV to draw the detections on images and video streams.

**Note:** The file with the YOLOv3 weights is not included in this GitHub repository because of its size. To download it, [click here](https://pjreddie.com/media/files/yolov3.weights).
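
Since the weights have to be downloaded separately, it can help to check that every model file is in place before loading the network. The following is a minimal sketch, assuming the three files used later in this notebook live in the `yolo-coco` folder; the helper name `missing_model_files` is hypothetical and not part of OpenCV or of this repository.

```python
from pathlib import Path


def missing_model_files(model_dir, required=("coco.names", "yolov3.cfg", "yolov3.weights")):
    """Return the names in `required` that are not regular files inside `model_dir`.

    Hypothetical helper for illustration; the default file names are the ones
    this notebook loads from the model folder.
    """
    model_dir = Path(model_dir)
    return [name for name in required if not (model_dir / name).is_file()]


# Usage: report anything that still needs to be downloaded or copied
missing = missing_model_files("yolo-coco")
if missing:
    print("Missing model files:", ", ".join(missing))
else:
    print("All model files found.")
```

An empty list means the model folder is complete and the cells below should find what they need.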

## Importing libraries

In [1]:
import numpy as np
import time
import cv2
import os
import imutils
import matplotlib.pyplot as plt
from imutils import paths

## Setting the model

In [2]:
# Load the COCO class labels that our YOLO model was trained on
yolo = "yolo-coco"
labelsPath = os.path.sep.join([yolo, "coco.names"])
with open(labelsPath) as f:
    LABELS = f.read().strip().split("\n")
# The COCO dataset has 80 classes

In [3]:
# Initialize the color list to represent each possible class label
np.random.seed(42)
COLORS = np.random.randint(0, 255, size=(len(LABELS), 3), dtype="uint8") # one color per class, used for its bounding boxes

In [4]:
# Derive the paths to the YOLO weights and model configuration
weightsPath = os.path.sep.join([yolo, "yolov3.weights"])
configPath = os.path.sep.join([yolo, "yolov3.cfg"])

In [5]:
net = cv2.dnn.readNetFromDarknet(configPath, weightsPath) # load the network with OpenCV's dnn module

In [6]:
# setting the paths for all images to be tested
dataset = "../dataset/images"
pathImages = list(paths.list_images(dataset))

## Main loop over the image

Object detection with a pre-trained model in OpenCV involves the following steps:

- Read the image and grab its height and width.
- Get the names of the YOLOv3 output layers, which produce the predictions.
- Convert the input image into a blob (the preprocessed image the network expects) and set it as the network input.
- Run the predictions with the `forward` method, which returns all predictions made by the model.
- Loop over the detections to extract the class scores, the class ID, and the confidence associated with that class.
- Keep only the detections whose confidence exceeds a threshold, scale the bounding box coordinates back to the image size, and fill the lists of boxes, confidences, and class IDs.
- Once we have all detections, apply non-maximum suppression (NMS) to remove redundant boxes: among boxes whose overlap (IoU) exceeds a threshold, only the one with the highest confidence is kept.
- From the list of indexes returned by NMS, if at least one detection remains, extract the box coordinates and draw them on the image.
- Finally, visualize each detection with its bounding box and confidence.
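
To make the non-maximum suppression step more concrete, here is a minimal pure-Python sketch of a greedy NMS over boxes in the `[x, y, w, h]` format used in this notebook. It illustrates the idea only and is not claimed to match the internals of `cv2.dnn.NMSBoxes`; the function names `iou` and `nms` and the sample boxes are made up for this example, while the default thresholds mirror the values passed to `cv2.dnn.NMSBoxes` in the loop.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as [x, y, w, h]."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    # Width and height of the overlapping region (zero if the boxes do not overlap)
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_threshold=0.5, iou_threshold=0.3):
    """Greedy NMS: return the indexes of the boxes that survive suppression."""
    # Drop low-confidence boxes, then visit the rest from highest to lowest score
    order = sorted(
        (i for i, s in enumerate(scores) if s > score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a box already kept
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box (illustrative values)
boxes = [[100, 100, 50, 50], [105, 105, 50, 50], [300, 300, 40, 40]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, above the 0.3 threshold, so only the higher-scoring one is kept. The same mechanism can discard a correct detection when two real objects sit very close to each other.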

In [None]:
plt.figure(figsize=(120, 120))
count = 1
for img in pathImages:
    
    image = cv2.imread(img)
    (H, W) = image.shape[:2]
    # Determine only the output layer names that we need from YOLO
    ln = net.getLayerNames()
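    # getUnconnectedOutLayers() returns nested or flat indexes depending on the OpenCV version; flatten() handles both
    ln = [ln[i - 1] for i in np.array(net.getUnconnectedOutLayers()).flatten()]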
    # Construct a blob from the input image, perform a forward pass of the YOLO object detector and that will give us
    # bounding boxes alongside its associated probabilities
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    layerOutputs = net.forward(ln)
    # Initialize the list of detected bounding boxes, confidences and class IDs respectively
    boxes = []
    confidences = []
    classIDs = []
    # Loop over each one of the layer outputs
    for output in layerOutputs:
        # loop over each one of the detections
        for detection in output:
            # extract the class ID and confidence (i.e., probability) of the current object detection
            scores = detection[5:]
            classID = np.argmax(scores)
            confidence = scores[classID]
            # filter out weak predictions by ensuring the detected probability is greater than the minimum probability
            if confidence > 0.5:
                # scale the bounding box coordinates back relative to the size of the image, keeping in mind that YOLO
                # actually returns the center (x, y) coordinates of the bounding box followed by the box's width and height
                box = detection[0:4] * np.array([W, H, W, H])
                (centerX, centerY, width, height) = box.astype("int")
                # use the center (x, y) coordinates to derive the top-left corner of the bounding box
                x = int(centerX - (width / 2))
                y = int(centerY - (height / 2))
                # update the list of bounding box coordinates, confidences and class IDs
                boxes.append([x, y, int(width), int(height)])
                confidences.append(float(confidence))
                classIDs.append(classID)
    # Apply non-maxima suppression to suppress weak, overlapping bounding boxes
    idxs = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.3)
    # Ensure at least one detection exists
    if len(idxs) > 0:
        # loop over the indexes we are keeping
        for i in idxs.flatten():
            # extract the bounding box coordinates
            (x, y) = (boxes[i][0], boxes[i][1])
            (w, h) = (boxes[i][2], boxes[i][3])
            # draw a bounding box rectangle and label on the image
            color = [int(c) for c in COLORS[classIDs[i]]]
            cv2.rectangle(image, (x, y), (x + w, y + h), color, 2)
            text = "{}: {:.4f}".format(LABELS[classIDs[i]], confidences[i])
            cv2.putText(image, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.8, color, 2)
    ax = plt.subplot(4, 3, count)
    plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    plt.axis("off")
    count += 1
    # To visualize with OpenCV instead, uncomment the two lines below
    #cv2.imshow("prediction", image)
    #cv2.waitKey(0)
plt.savefig("detections.jpg")
plt.show()
cv2.destroyAllWindows()
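
To illustrate the coordinate handling in the inner loop, the sketch below decodes a single detection row without OpenCV or NumPy. It assumes the row layout `[center_x, center_y, width, height, objectness, class scores...]` with box values normalized to the image size, and, like the loop above, it uses the best class score as the confidence. The function name `decode_detection` and the sample values are hypothetical.

```python
def decode_detection(detection, img_w, img_h, min_conf=0.5):
    """Turn one YOLO output row into ([x, y, w, h], confidence, class_id).

    Returns None when the best class score does not exceed `min_conf`.
    Assumes the first four values are normalized center/size coordinates
    and the class scores start at index 5.
    """
    scores = detection[5:]
    class_id = max(range(len(scores)), key=lambda k: scores[k])
    confidence = scores[class_id]
    if confidence <= min_conf:
        return None
    # Scale the normalized box back to pixel units
    center_x = int(detection[0] * img_w)
    center_y = int(detection[1] * img_h)
    width = int(detection[2] * img_w)
    height = int(detection[3] * img_h)
    # Convert the box center to its top-left corner
    x = int(center_x - width / 2)
    y = int(center_y - height / 2)
    return [x, y, width, height], float(confidence), class_id


# A made-up row with three classes, on a 400x200 image
row = [0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.8, 0.05]
print(decode_detection(row, img_w=400, img_h=200))  # ([150, 50, 100, 100], 0.8, 1)
```

The top-left format is what both `cv2.dnn.NMSBoxes` and `cv2.rectangle` are given in the loop, which is why the conversion happens before the box is appended to the list.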

## Conclusions

Here, we ran detection on twelve images, and the results are very accurate in general. YOLO is fast and strong, but it evidently has its drawbacks and limitations. For example, in image number 11 there are some small objects that were not recognized by the model, which is a known problem with YOLOv3. Another problem is that when objects are very close together, the model has difficulty performing well, as in the last example image.

On the other hand, good predictions predominate across all images. Looking closely at the image with the two little boys, one of them is clearly blurred, yet the model recognized him. YOLO is, in my opinion, the best model for object detection. Other models must of course be considered, but I would only look for them when YOLO does not work well.