<h1 style="font-size:40px;">Object Detection using YOLOv5</h1>

In this module, we will learn how to use YOLOv5, a state-of-the-art object detector, with the OpenCV DNN module.

## 1. Why YOLOv5?

YOLOv5 was released in 2020, shortly after YOLOv4. Unlike previous versions of YOLO, it is based on the PyTorch framework. Although YOLOv5 is not an upgraded version of YOLOv4, it does have several advantages:
 - Plethora of models
 - Multi-platform support
 - Faster than previous versions
 - Highly active development
 - The PyTorch community is larger than that of Darknet (the YOLOv4 framework)
 - Very easy to implement

You can choose from five models: nano, small, medium, large, and extra large, each offering a different speed-accuracy tradeoff. YOLOv5 officially supports [11 platforms](https://github.com/ultralytics/yolov5/releases) so far, and it recently extended support to OpenCV. By default, YOLOv5 provides models in PyTorch (.pt) format. However, the OpenCV DNN module requires the model to be in ONNX (.onnx) format, so let's begin by converting the PyTorch models to ONNX.


## 2. Convert PyTorch Models to ONNX

We are going to use [git](https://git-scm.com/downloads) and [wget](https://www.gnu.org/software/wget/). Install them from the links provided if they are not already installed.

### 2.0 Dependency Update [22/07/2022]

In [1]:
# Tested with torch==1.11.
# Tested with opencv-contrib-python versions [4.5.4.58 to 4.6.0.66].
!pip install torch==1.11 torchvision
!pip install opencv-contrib-python==4.6.0.66

ERROR: Could not find a version that satisfies the requirement torch==1.11 (from versions: 1.13.0, 1.13.1, 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.4.0)
ERROR: No matching distribution found for torch==1.11
Collecting opencv-contrib-python==4.6.0.66
  Obtaining dependency information for opencv-contrib-python==4.6.0.66 from https://files.pythonhosted.org/packages/34/45/c8bc145b1541d1fbbf25d5494cd76453d9855971cfe571b9ad7e13cdb4c8/opencv_contrib_python-4.6.0.66-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
  Downloading opencv_contrib_python-4.6.0.66-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (18 kB)
Downloading opencv_contrib_python-4.6.0.66-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (67.1 MB)
Installing collected packages: o

### 2.1 Clone the Ultralytics YOLOv5 repository

In [None]:
!git clone https://github.com/ultralytics/yolov5
%cd yolov5
!pip install -r requirements.txt

Cloning into 'yolov5'...
remote: Enumerating objects: 16824, done.
remote: Total 16824 (delta 0), reused 0 (delta 0), pack-reused 16824
Receiving objects: 100% (16824/16824), 15.53 MiB | 3.64 MiB/s, done.
Resolving deltas: 100% (11543/11543), done.
/home/xabier/Git_repos/MyOpenCVProjects/JupyterLab/yolov5


Collecting gitpython>=3.1.30 (from -r requirements.txt (line 5))
  Obtaining dependency information for gitpython>=3.1.30 from https://files.pythonhosted.org/packages/e9/bd/cc3a402a6439c15c3d4294333e13042b915bbeab54edc457c723931fed3f/GitPython-3.1.43-py3-none-any.whl.metadata
  Downloading GitPython-3.1.43-py3-none-any.whl.metadata (13 kB)
Collecting opencv-python>=4.1.1 (from -r requirements.txt (line 8))
  Obtaining dependency information for opencv-python>=4.1.1 from https://files.pythonhosted.org/packages/3f/a4/d2537f47fd7fcfba966bd806e3ec18e7ee1681056d4b0a9c8d983983e4d5/opencv_python-4.10.0.84-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
  Downloading opencv_python-4.10.0.84-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (20 kB)
Collecting pillow>=10.3.0 (from -r requirements.txt (line 9))
  Obtaining dependency information for pillow>=10.3.0 from https://files.pythonhosted.org/packages/ba/e5/8c68ff608a4203085158cff5cc2a3c534ec384536d9438c40

### 2.2 Download YOLOv5 PyTorch Models

In [None]:
%cd models
!wget https://github.com/ultralytics/yolov5/releases/download/v6.1/yolov5n.pt -nv
!wget https://github.com/ultralytics/yolov5/releases/download/v6.1/yolov5s.pt -nv
!wget https://github.com/ultralytics/yolov5/releases/download/v6.1/yolov5m.pt -nv
!wget https://github.com/ultralytics/yolov5/releases/download/v6.1/yolov5l.pt -nv
!wget https://github.com/ultralytics/yolov5/releases/download/v6.1/yolov5x.pt -nv
%cd ..

/content/yolov5/models
2024-05-10 09:55:42 URL:https://objects.githubusercontent.com/github-production-release-asset-2e65be/264818686/e3cd2b74-b7e5-491d-901a-58234d5f948f?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAVCODYLSA53PQK4ZA%2F20240510%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240510T095541Z&X-Amz-Expires=300&X-Amz-Signature=4a1502545521854a6ff761f8c74d44640f8831d437c25608c064003970b3caf2&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=264818686&response-content-disposition=attachment%3B%20filename%3Dyolov5n.pt&response-content-type=application%2Foctet-stream [4062133/4062133] -> "yolov5n.pt" [1]
2024-05-10 09:55:42 URL:https://objects.githubusercontent.com/github-production-release-asset-2e65be/264818686/76813c2d-b52b-47af-95fb-e92c1b0b2783?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAVCODYLSA53PQK4ZA%2F20240510%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240510T095542Z&X-Amz-Expires=300&X-Amz-Signature=30ab7ebdd9ac3fb5a4ef3b93f0ac30dc53527ca45dbe0c7

### 2.3 Export to ONNX

In [None]:
# The default input size is 640x640.
!python export.py --weights models/yolov5n.pt --include onnx --opset 12
!python export.py --weights models/yolov5s.pt --include onnx --opset 12
!python export.py --weights models/yolov5m.pt --include onnx --opset 12
!python export.py --weights models/yolov5l.pt --include onnx --opset 12
!python export.py --weights models/yolov5x.pt --include onnx --opset 12

# Add the flag --imgsz to export for a custom input size.
# !python export.py --weights models/yolov5n.pt --include onnx --imgsz 320 320

# Add the flag --dynamic for a dynamic input size. Compatible with ONNX Runtime.
# !python export.py --weights models/yolov5n.pt --include onnx --dynamic

[34m[1mexport: [0mdata=data/coco128.yaml, weights=['models/yolov5n.pt'], imgsz=[640, 640], batch_size=1, device=cpu, half=False, inplace=False, keras=False, optimize=False, int8=False, per_tensor=False, dynamic=False, simplify=False, opset=12, verbose=False, workspace=4, nms=False, agnostic_nms=False, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['onnx']
YOLOv5 🚀 v7.0-307-g920c721e Python-3.10.12 torch-1.11.0+cu102 CPU

Fusing layers... 
YOLOv5n summary: 213 layers, 1867405 parameters, 0 gradients

[34m[1mPyTorch:[0m starting from models/yolov5n.pt with output shape (1, 25200, 85) (3.9 MB)
[31m[1mrequirements:[0m Ultralytics requirement ['onnx>=1.12.0'] not found, attempting AutoUpdate...
Collecting onnx>=1.12.0
  Downloading onnx-1.16.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (15.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.9/15.9 MB 70.0 MB/s eta 0:00:00
Installing collected packages: onnx
Successfully installed onnx

## 3. Object Detection with YOLOv5 and OpenCV DNN module
Let us now see how to use YOLOv5 in OpenCV to perform object detection.

### 3.1 Why use YOLOv5 with OpenCV?
1. **Easy integration with an OpenCV application:** If your application already uses OpenCV and you simply want to use YOLOv5, you don't have to worry about building it from source.
2. **Faster:** OpenCV is optimized for Intel CPUs, so the OpenCV DNN module can give a significant performance boost on those processors.
3. **C++ support:** As mentioned above, YOLOv5 is Python based. Although Python is a popular language, industry-grade solutions are still often built in C++ for its efficiency. YOLOv5 cannot be used from C++ directly, but with OpenCV it is possible.

### 3.2 Download Resources

In [None]:
print("Downloading Resources...")
!wget https://www.dropbox.com/s/hhx3pbacs7rixid/street.jpg?dl=1 -O street.jpg -nv
!wget https://www.dropbox.com/s/wb5nkwuml526bqa/coco.names?dl=1 -O coco.names -nv

### 3.3 Import Libraries

In [None]:
import numpy as np
import cv2
import sys
import requests
from os import path

import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (15.0,15.0)
plt.rcParams['image.cmap'] = 'gray'
plt.rcParams['axes.titlesize'] = 14
plt.rcParams['axes.labelsize'] = 14

### 3.4 Initialize the parameters
The YOLOv5 algorithm generates bounding boxes as the predicted detection outputs. Every predicted box is associated with a confidence score. In the first stage, all the boxes below the confidence threshold parameter are ignored for further processing.

Every detection also has class scores associated with it. A class score is the probability that the detection belongs to a given class of the dataset the model was trained on. YOLOv5 is trained on the COCO 2017 dataset, which has 80 classes.

Each object can have multiple bounding boxes. Since we need only one, the overlapping boxes are filtered with **non-maximum suppression**, which is controlled by the parameter **NMS_THRESHOLD**. You can try changing these values and see how the number of predicted boxes changes.

Next, the default values for the width (**INPUT_WIDTH**) and height (**INPUT_HEIGHT**) of the network’s input image are set. We set each of them to 640 (the default). You can use a smaller size (a multiple of 32) to increase the speed, provided the model was exported for that size.
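A small helper can keep a custom input size valid. The sketch below assumes the "multiple of 32" rule mentioned above; `round_to_stride` is a hypothetical helper name used only for illustration and is not part of OpenCV or YOLOv5.

```python
import math

def round_to_stride(size, stride=32):
    """Round `size` up to the nearest positive multiple of `stride`."""
    if size <= 0:
        raise ValueError("size must be positive")
    return max(stride, math.ceil(size / stride) * stride)

print(round_to_stride(300))  # 320
print(round_to_stride(640))  # 640
```

The rounded value would then be used both for `INPUT_WIDTH`/`INPUT_HEIGHT` and for the size passed to `export.py`, since a model exported with a fixed input size expects exactly that size.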


In [None]:
# Constants.
INPUT_WIDTH = 640             # Width of network's input image, larger is slower but more accurate
INPUT_HEIGHT = 640            # Height of network's input image, larger is slower but more accurate
SCORE_THRESHOLD = 0.5         # Class score threshold, accepts only if score is above the threshold.
NMS_THRESHOLD = 0.45          # Non-maximum suppression threshold, higher values result in duplicate boxes per object
CONFIDENCE_THRESHOLD = 0.45   # Confidence threshold, high values filter out low confidence detections

In [None]:
# Text parameters.
FONT_FACE = cv2.FONT_HERSHEY_SIMPLEX
FONT_SCALE = 2.5
THICKNESS = 4

In [None]:
# Colors
BLACK  = (0,0,0)
BLUE   = (255,178,50)
YELLOW = (0,255,255)
RED = (0,0,255)
WHITE = (255,255,255)

### 3.5 Function to pre-process the image

The function `pre_process` takes the image and the network as arguments. First, the image is converted to a blob, which is then set as the input to the network. The function `getUnconnectedOutLayersNames()` provides the names of the output layers. The image is forward propagated through the network up to these layers to acquire the detections, and the function returns the raw detection results.

In [None]:
def pre_process(input_image, net):
    # Create a 4D blob from a frame.
    blob = cv2.dnn.blobFromImage(input_image, 1/255, (INPUT_WIDTH, INPUT_HEIGHT), [0,0,0], 1, crop=False)

    # Sets the input to the network.
    net.setInput(blob)

    # Runs the forward pass to get output of the output layers.
    output_layers = net.getUnconnectedOutLayersNames()
    outputs = net.forward(output_layers)
    # print(outputs[0].shape)

    return outputs
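To make the blob layout concrete, here is a rough NumPy approximation of what `blobFromImage` does with the arguments used above (scale by 1/255, zero mean, swap the red and blue channels). It is only a sketch: it skips the resize step and assumes the image is already `INPUT_WIDTH` x `INPUT_HEIGHT`, and `to_blob` is a hypothetical name, not an OpenCV function.

```python
import numpy as np

def to_blob(image_bgr):
    """Convert an already-resized HxWx3 BGR uint8 image to a 1x3xHxW float32 blob."""
    rgb = image_bgr[..., ::-1]                  # BGR -> RGB (the swapRB step)
    chw = np.transpose(rgb, (2, 0, 1))          # HWC -> CHW
    blob = chw.astype(np.float32) / 255.0       # scale pixel values to [0, 1]
    return blob[np.newaxis, ...]                # add batch dimension -> NCHW

# A dummy 640x640 image that is pure red (channel index 2 in BGR order).
dummy = np.zeros((640, 640, 3), dtype=np.uint8)
dummy[..., 2] = 255

blob = to_blob(dummy)
print(blob.shape)        # (1, 3, 640, 640)
print(blob[0, :, 0, 0])  # [1. 0. 0.]
```

The printed pixel shows that red ends up in the first channel of the blob, which is the RGB order the network expects.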


### 3.6 Post-processing the network’s output

Each bounding box the network outputs is represented by 5 elements plus a vector of 80 class scores. The first 4 elements represent the **center_x**, **center_y**, **width** and **height**. The fifth element represents the confidence that the bounding box encloses an object.

The rest of the elements are the scores associated with each class (i.e. object type). The box is assigned to the class with the highest score. If the **confidence** of a box (its fifth element) is less than the confidence threshold, or its highest class score does not exceed the score threshold, the bounding box is dropped and not considered for further processing.

The remaining boxes are then subjected to non-maximum suppression, which reduces the number of overlapping boxes.
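The output shape `(1, 25200, 85)` seen in the export log can be reproduced with a little arithmetic. The sketch below assumes the usual YOLOv5 detection head with three output scales at strides 8, 16 and 32 and three anchors per grid cell. These values are an assumption about the exported models rather than something stated in this module, and `num_predictions` is a hypothetical helper.

```python
def num_predictions(width, height, strides=(8, 16, 32), anchors_per_cell=3):
    """Count predicted boxes: one per anchor per grid cell at every stride."""
    return sum(anchors_per_cell * (width // s) * (height // s) for s in strides)

# Each row holds 4 box values, 1 objectness confidence and 80 class scores.
ROW_LENGTH = 4 + 1 + 80

print(num_predictions(640, 640), ROW_LENGTH)  # 25200 85
print(num_predictions(320, 320))              # 6300
```

Under these assumptions, halving the input size cuts the number of rows to post-process by a factor of four, which is part of why smaller inputs run faster.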

In [None]:
def post_process(input_image, outputs):
    # Lists to hold respective values while unwrapping.
    class_ids = []
    confidences = []
    boxes = []

    # Rows.
    rows = outputs[0].shape[1]

    image_height, image_width = input_image.shape[:2]

    # Resizing factor.
    x_factor = image_width / INPUT_WIDTH
    y_factor =  image_height / INPUT_HEIGHT

    # Iterate through 25200 detections.
    for r in range(rows):
        row = outputs[0][0][r]
        confidence = row[4]

        # Discard bad detections and continue.
        if confidence >= CONFIDENCE_THRESHOLD:
            classes_scores = row[5:]

            # Get the index of max class score.
            class_id = np.argmax(classes_scores)

            #  Continue if the class score is above threshold.
            if (classes_scores[class_id] > SCORE_THRESHOLD):
                confidences.append(confidence)
                class_ids.append(class_id)

                cx, cy, w, h = row[0], row[1], row[2], row[3]

                left = int((cx - w/2) * x_factor)
                top = int((cy - h/2) * y_factor)
                width = int(w * x_factor)
                height = int(h * y_factor)

                box = np.array([left, top, width, height])
                boxes.append(box)

    # Perform non maximum suppression to eliminate redundant overlapping boxes with
    # lower confidences.
    indices = cv2.dnn.NMSBoxes(boxes, confidences, CONFIDENCE_THRESHOLD, NMS_THRESHOLD)
    for i in indices:
        box = boxes[i]
        left = box[0]
        top = box[1]
        width = box[2]
        height = box[3]
        cv2.rectangle(input_image, (left, top), (left + width, top + height), BLUE, 4*THICKNESS)
        label = "{}:{:.2f}".format(classes[class_ids[i]], confidences[i])
        draw_label(input_image, label, left, top)

    return input_image
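The per-row loop in `post_process` can also be written with NumPy array operations, which is usually faster for thousands of rows. The following is a minimal sketch of the filtering and rescaling step only (no NMS or drawing). `filter_detections` is a hypothetical helper, and the two hand-made rows use 3 classes instead of 80 to keep the example short.

```python
import numpy as np

def filter_detections(preds, x_factor, y_factor, conf_thres=0.45, score_thres=0.5):
    """preds: (N, 5 + num_classes) rows of [cx, cy, w, h, objectness, class scores...]."""
    objectness = preds[:, 4]
    class_scores = preds[:, 5:]
    class_ids = class_scores.argmax(axis=1)
    best_scores = class_scores[np.arange(len(preds)), class_ids]

    # Same two tests as the loop: objectness first, then the best class score.
    keep = (objectness >= conf_thres) & (best_scores > score_thres)

    cx, cy, w, h = preds[keep, 0], preds[keep, 1], preds[keep, 2], preds[keep, 3]
    boxes = np.stack([(cx - w / 2) * x_factor,   # left
                      (cy - h / 2) * y_factor,   # top
                      w * x_factor,              # width
                      h * y_factor],             # height
                     axis=1).astype(int)
    return boxes, objectness[keep], class_ids[keep]

preds = np.array([
    [320, 320, 100, 50, 0.9, 0.1, 0.8, 0.1],   # confident detection of class 1
    [100, 100,  20, 20, 0.2, 0.9, 0.0, 0.1],   # objectness too low, dropped
], dtype=np.float32)

boxes, confidences, class_ids = filter_detections(preds, x_factor=2.0, y_factor=1.5)
print(boxes.tolist())      # [[540, 442, 200, 75]]
print(class_ids.tolist())  # [1]
```

With the real network output, `preds` would be `outputs[0][0]`, and the returned boxes and confidences could be passed to `cv2.dnn.NMSBoxes` as in the function above.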

Non-maximum suppression is controlled by the `NMS_THRESHOLD` parameter. If it is set too low, e.g. 0.1, we might not detect overlapping objects of the same or different classes. But if it is set too high, e.g. 1, we get multiple boxes for the same object. So we used an intermediate value of 0.45 in our code above. The gif below shows the effect of varying the NMS threshold.

<br>
<center>
<a href="https://courses.opencv.org/asset-v1:OpenCV+101+Beginners+type@asset+block@nms-car.gif"><img src="https://courses.opencv.org/asset-v1:OpenCV+101+Beginners+type@asset+block@nms-car.gif"/> </a>
</center>
<br>

<center>non maximum suppression threshold object detection</center>
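To see the effect of the threshold without running the network, here is a minimal NumPy sketch of greedy non-maximum suppression. It illustrates the general technique and is not OpenCV's implementation of `cv2.dnn.NMSBoxes`. The helper names `iou` and `nms` are hypothetical, and boxes are assumed to be `[left, top, width, height]` with positive area.

```python
import numpy as np

def iou(box, others):
    """IoU between one [left, top, width, height] box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[0] + box[2], others[:, 0] + others[:, 2])
    y2 = np.minimum(box[1] + box[3], others[:, 1] + others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = box[2] * box[3] + others[:, 2] * others[:, 3] - inter
    return inter / union

def nms(boxes, scores, iou_threshold):
    """Keep the best-scoring box, drop boxes overlapping it too much, repeat."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]        # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep

# Boxes 0 and 1 overlap heavily (IoU of about 0.82); box 2 is far away.
boxes = [[100, 100, 200, 200], [110, 110, 200, 200], [400, 400, 100, 100]]
scores = [0.9, 0.8, 0.7]

print(nms(boxes, scores, 0.45))  # [0, 2]
print(nms(boxes, scores, 0.9))   # [0, 1, 2]
```

With the lower threshold the near-duplicate box is suppressed, while the higher threshold lets both boxes for the same object through.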

### 3.7 Function to draw labels

In [None]:
def draw_label(input_image, label, left, top):
    """Draw text onto image at location."""

    # Get text size.
    text_size = cv2.getTextSize(label, FONT_FACE, FONT_SCALE, THICKNESS)
    dim, baseline = text_size[0], text_size[1]
    # Use text size to create a BLACK rectangle.
    cv2.rectangle(input_image, (left, top), (left + dim[0], top + dim[1] + baseline), BLACK, cv2.FILLED)
    # Display text inside the rectangle.
    cv2.putText(input_image, label, (left, top + dim[1]), FONT_FACE, FONT_SCALE, YELLOW, THICKNESS, cv2.LINE_AA)

### 3.8 Function to put efficiency information

In [None]:
def put_efficiency(input_img, net):
  t, _ = net.getPerfProfile()
  label = 'Inference time: %.2f ms' % (t * 1000.0 / cv2.getTickFrequency())
  print(label)
  cv2.putText(input_img, label, (20, 80), FONT_FACE, FONT_SCALE, RED, THICKNESS, cv2.LINE_AA)

### 3.9 Load image and class names

In [None]:
frame = cv2.imread('street.jpg')

classesFile = "coco.names"
classes = None
with open(classesFile, 'rt') as f:
  classes = f.read().rstrip('\n').split('\n')

plt.imshow(frame[...,::-1]);

## 4. Results


### 4.1 YOLOv5 Nano

In [None]:
# Load the network.
modelWeights = "models/yolov5n.onnx"
net = cv2.dnn.readNet(modelWeights)

# Process image.
detections = pre_process(frame, net)
img = post_process(frame.copy(), detections)

# Put efficiency information.
put_efficiency(img, net)

plt.imshow(img[...,::-1]);

### 4.2 YOLOv5 Small

In [None]:
# Load the network.
modelWeights = "models/yolov5s.onnx"
net = cv2.dnn.readNet(modelWeights)

# Process image.
detections = pre_process(frame, net)
img = post_process(frame.copy(), detections)

# Put efficiency information.
put_efficiency(img, net)

plt.imshow(img[...,::-1]);

### 4.3 YOLOv5 Medium

In [None]:
# Load the network.
modelWeights = "models/yolov5m.onnx"
net = cv2.dnn.readNet(modelWeights)

# Process image.
detections = pre_process(frame, net)
img = post_process(frame.copy(), detections)

# Put efficiency information.
put_efficiency(img, net)

plt.imshow(img[...,::-1]);

### 4.4 YOLOv5 Large

In [None]:
# Load the network.
modelWeights = "models/yolov5l.onnx"
net = cv2.dnn.readNet(modelWeights)

# Process image.
detections = pre_process(frame, net)
img = post_process(frame.copy(), detections)

# Put efficiency information.
put_efficiency(img, net)

plt.imshow(img[...,::-1]);

### 4.5 YOLOv5 Extra Large

In [None]:
# Load the network.
modelWeights = "models/yolov5x.onnx"
net = cv2.dnn.readNet(modelWeights)

# Process image.
detections = pre_process(frame, net)
img = post_process(frame.copy(), detections)

# Put efficiency information.
put_efficiency(img, net)

plt.imshow(img[...,::-1]);