# Object Detection with YOLO and OpenCV

This repo contains notes and implementations based on the [Object Detection with YOLO and OpenCV](https://www.geeksforgeeks.org/object-detection-with-yolo-and-opencv/) tutorial. It covers using `YOLO` for object detection in images and videos, with `OpenCV` for visualization. The guide includes steps for detecting objects in both recorded videos and real-time camera feeds.



### Step 1: Set Up the Environment

In [3]:
!pip install opencv-python
!pip install ultralytics lapx

Defaulting to user installation because normal site-packages is not writeable
Defaulting to user installation because normal site-packages is not writeable


### Step 2: Importing Necessary Libraries

In [5]:
import cv2 as cv
from ultralytics import YOLO

* `cv2`: the OpenCV Python library, imported here under the alias `cv`.
* `YOLO`: the Ultralytics class used to load the model and run it.

### Step 3: Define Function to Get Class Colors
To give each class its own color for the label and bounding box, we use the following function. The colors are not random: the same class ID always maps to the same color.

In [50]:
# Function to get class colors
def getColors(cls_num):
    # Palette of base colors as 3-tuples (OpenCV drawing functions read them in BGR order)
    base_colors = [
        (255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0),
        (0, 255, 255), (255, 0, 255), (255, 128, 0), (128, 255, 0),
        (0, 128, 255), (255, 128, 128), (128, 255, 128), (128, 128, 255),
        (128, 0, 0), (0, 128, 0), (0, 0, 128)
    ]
    
    # Pick a base color; class IDs beyond the palette size wrap around
    color_index = cls_num % len(base_colors)
    
    # Increments to add some variation to the base colors
    increments = [(1, -2, 1), (-2, 1, -1), (1, -1, 2)]
    
    # Use modulo on color_index to ensure it's within the range of the increments list
    increment_index = color_index % len(increments)
    
    # Create the final color by applying the increments, clamped to the valid 0-255 range
    step = cls_num // len(base_colors) % 256
    color = [min(255, max(0, base_colors[color_index][i] + increments[increment_index][i] * step))
             for i in range(3)]
    
    return tuple(color)
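
As an alternative to a fixed palette, colors can be derived directly from the class ID by stepping the hue around the color wheel. The sketch below is not part of the original tutorial: `hue_color` and its golden-ratio hue step are assumptions chosen for illustration, and it relies only on the standard-library `colorsys` module. It returns channels in BGR order, which is what OpenCV drawing functions expect, and every channel stays within 0-255.

```python
import colorsys

# Hypothetical helper, not from the tutorial: one color per class ID.
GOLDEN_RATIO_STEP = 0.618033988749895  # assumed hue step that spreads successive IDs apart


def hue_color(cls_num):
    """Map a class ID to a (B, G, R) tuple with channels in 0-255."""
    hue = (cls_num * GOLDEN_RATIO_STEP) % 1.0
    r, g, b = colorsys.hsv_to_rgb(hue, 0.9, 1.0)
    # OpenCV expects BGR order, so the channels are reversed here
    return (int(b * 255), int(g * 255), int(r * 255))


if __name__ == "__main__":
    for cls_id in (0, 1, 15, 79):
        print(cls_id, hue_color(cls_id))
```

Because the result depends only on the class ID, a given class keeps the same color from frame to frame.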


### Step 4: Load YOLO Model
We initialize the YOLO object detector with the model file `yolov8s.pt`, which contains the pre-trained weights and configuration for the YOLOv8s model. If the file is not found locally, Ultralytics downloads it on first use.

In [53]:
yolo = YOLO('yolov8s.pt')

### Step 5: Open Video Capture

In [61]:
# 0 selects the default camera; pass a video file path instead to read a recorded video
videoCap = cv.VideoCapture(0)
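
`cv.VideoCapture` accepts either an integer camera index or a string such as a video file path, so the same loop can serve both live and recorded input. The helper below is a hypothetical convenience, not part of the tutorial: `parse_source` is an assumed name, and all it does is decide which of the two forms a text argument represents.

```python
def parse_source(arg):
    """Return an int camera index for digit-only input, else the string unchanged."""
    text = str(arg).strip()
    # "0", "1", ... are treated as camera indices; anything else as a path or URL
    return int(text) if text.isdigit() else text


if __name__ == "__main__":
    print(parse_source("0"))            # camera index
    print(parse_source("traffic.mp4"))  # hypothetical file name
```

The result would then be passed along as `cv.VideoCapture(parse_source(arg))`.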

### Step 6: Process Video Frames

In [63]:
while True:
    ret, frame = videoCap.read()
    # Stop when no frame is returned (camera unavailable or end of video)
    if not ret:
        break

    # Run detection with tracking on the current frame
    results = yolo.track(frame, stream=True)

    for result in results:
        # get the class names
        classes_names = result.names

        # iterate over each box
        for box in result.boxes:
            # check if confidence is greater than 40 percent
            if box.conf[0] > 0.4:
                # get coordinates
                [x1, y1, x2, y2] = box.xyxy[0]

                # convert to int
                x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)

                # get the class
                cls = int(box.cls[0])

                # get the class name
                class_name = classes_names[cls]

                # get the respective color
                color = getColors(cls)

                # draw the rectangle and the label above it
                cv.rectangle(frame, (x1, y1), (x2, y2), color, 2)
                cv.putText(frame, f'{class_name} {box.conf[0]:.2f}', (x1, y1 - 10), cv.FONT_HERSHEY_SIMPLEX, 1, color, 2)

    # Display the frame with the detected objects' bounding boxes
    cv.imshow("Display Frame", frame)

    # Get the pressed key
    key = cv.waitKey(1) & 0xFF

    # Break the loop when the `q` key is pressed
    if key == ord('q'):
        break

# Release the capture device and close the display window
videoCap.release()
cv.destroyAllWindows()
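
The confidence check and label formatting in the loop above can be tried without a camera or a model. The sketch below is an illustration under stated assumptions, not the tutorial's code: `select_detections` is a hypothetical name, detections are plain tuples rather than Ultralytics box objects, and the sample values are made up. The default `min_conf=0.4` corresponds to the 40 percent mentioned in the loop's comment.

```python
def select_detections(detections, names, min_conf=0.4):
    """Keep detections above min_conf and build the label text for each.

    detections: iterable of (class_id, confidence, (x1, y1, x2, y2)) tuples.
    names: mapping from class ID to class name.
    """
    selected = []
    for cls_id, conf, (x1, y1, x2, y2) in detections:
        if conf <= min_conf:
            continue  # skip low-confidence boxes
        corners = (int(x1), int(y1), int(x2), int(y2))  # drawing needs integer pixels
        selected.append((f"{names[cls_id]} {conf:.2f}", corners))
    return selected


if __name__ == "__main__":
    names = {0: "person", 56: "chair"}  # made-up subset of a class-name mapping
    sample = [
        (0, 0.91, (10.7, 20.2, 110.9, 220.5)),
        (56, 0.25, (300.0, 180.0, 380.0, 260.0)),
    ]
    print(select_detections(sample, names))
```

Each returned pair holds the text for `cv.putText` and the corner coordinates for `cv.rectangle`. Running the full loop against a camera, by contrast, makes Ultralytics log a summary line for every frame, as in the output that follows.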


0: 480x640 3 persons, 1 chair, 2 tvs, 1 laptop, 478.2ms
Speed: 4.0ms preprocess, 478.2ms inference, 0.0ms postprocess per image at shape (1, 3, 480, 640)

0: 480x640 3 persons, 1 chair, 2 tvs, 1 laptop, 415.7ms
Speed: 6.3ms preprocess, 415.7ms inference, 0.0ms postprocess per image at shape (1, 3, 480, 640)

0: 480x640 3 persons, 2 chairs, 1 tv, 1 laptop, 460.5ms
Speed: 4.9ms preprocess, 460.5ms inference, 1.5ms postprocess per image at shape (1, 3, 480, 640)

0: 480x640 3 persons, 2 chairs, 2 tvs, 1 laptop, 1 keyboard, 379.8ms
Speed: 8.9ms preprocess, 379.8ms inference, 2.0ms postprocess per image at shape (1, 3, 480, 640)

0: 480x640 3 persons, 3 chairs, 2 tvs, 2 laptops, 360.1ms
Speed: 3.0ms preprocess, 360.1ms inference, 0.0ms postprocess per image at shape (1, 3, 480, 640)

0: 480x640 3 persons, 3 chairs, 2 tvs, 1 laptop, 379.6ms
Speed: 4.0ms preprocess, 379.6ms inference, 3.0ms postprocess per image at shape (1, 3, 480, 640)

0: 480x640 3 persons, 3 chairs, 2 tvs, 368.6ms
Speed: