# Real-time Object Detection with YOLO and OpenCV

## Overview:

This script demonstrates real-time object detection using the You Only Look Once (YOLO) model, the DEtection TRansformer (DETR) model, and OpenCV. YOLO is a popular deep-learning-based object detection algorithm known for combining speed with good accuracy. DETR is an object detection model that directly predicts object bounding boxes and class labels using a transformer-based encoder-decoder architecture. OpenCV is a library for computer vision tasks, including image processing and object detection.

## Concepts:

### YOLO (You Only Look Once):
   - YOLO is a real-time object detection algorithm that detects objects in images or video frames.
   - It divides the input image into a grid and predicts bounding boxes and class probabilities for each grid cell simultaneously.
   - YOLO can detect multiple objects in a single pass through the neural network, making it extremely fast.
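
To make the grid idea concrete, the sketch below maps the center of a box to the grid cell that would be responsible for predicting it. It assumes the original YOLO formulation with an S x S grid; the function name `responsible_cell` and the 7 x 7 grid size are illustrative choices and are not part of this script.

```python
def responsible_cell(cx: float, cy: float, img_w: int, img_h: int, s: int = 7) -> tuple[int, int]:
    """Return (row, col) of the grid cell that contains the box center (cx, cy), given in pixels."""
    # Scale the center to grid units, then clamp so a center on the far edge stays inside the grid.
    col = min(int(cx / img_w * s), s - 1)
    row = min(int(cy / img_h * s), s - 1)
    return row, col


# A 640x480 image with an object centered at pixel (320, 120)
print(responsible_cell(320, 120, 640, 480))  # prints (1, 3)
```

Every cell makes its predictions in the same forward pass, which is why the number of objects in the image does not change how many times the network has to run.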

### OpenCV:
   - OpenCV (Open Source Computer Vision Library) is an open-source computer vision and machine learning software library.
   - It provides a wide range of tools and algorithms for image and video processing tasks.
   - OpenCV is widely used for tasks such as object detection, facial recognition, and image segmentation.


### Object Detection:
   - Object detection is a computer vision task that involves detecting and locating objects within an image or video frame.
   - It differs from image classification, which assigns a single label to the whole image, by providing the location of each object along with its class label.
   - Object detection algorithms typically use machine learning techniques, such as deep neural networks, to perform this task.

### Real-time Object Detection vs Batch Object Detection:
   - Real-time object detection means running detection on a live video stream fast enough to keep up with it; 30 frames per second (FPS) is a commonly used target.
   - Batch object detection, on the other hand, involves processing a batch of images or video frames offline, without the constraint of real-time processing.
   - Real-time object detection is often used in applications such as video surveillance, autonomous driving, and augmented reality, where timely detection of objects is critical.
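
As a rough illustration of the real-time constraint, the sketch below computes the time budget per frame for a target frame rate and measures the frame rate actually achieved by some per-frame work. The helper names `frame_budget_ms` and `measure_fps` are hypothetical, and the `time.sleep` call is only a stand-in for reading a frame and running the model.

```python
import time


def frame_budget_ms(target_fps: float) -> float:
    """Time available to process one frame at the given frame rate, in milliseconds."""
    return 1000.0 / target_fps


def measure_fps(process_frame, n_frames: int = 50) -> float:
    """Call process_frame() n_frames times and return the achieved frames per second."""
    start = time.perf_counter()
    for _ in range(n_frames):
        process_frame()
    elapsed = time.perf_counter() - start
    return n_frames / elapsed


# At 30 FPS, capture, inference, and drawing together must fit in about 33 ms.
print(f"{frame_budget_ms(30):.1f} ms per frame")  # prints 33.3 ms per frame

# Stand-in workload; replace the lambda with the real per-frame work to profile it.
fps = measure_fps(lambda: time.sleep(0.001), n_frames=20)
print(f"measured {fps:.0f} FPS")
```

A batch job has no such budget: it only needs to finish eventually, so it can trade latency for throughput.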

### YOLOv8 Architecture:
   - YOLOv8 (You Only Look Once version 8) is a version of the YOLO family released by Ultralytics that builds on earlier versions with the aim of better efficiency and accuracy in object detection tasks.
   - YOLOv8 is based on a deep convolutional neural network. Unlike the original YOLO, which predicted boxes per cell of a fixed grid, it uses an anchor-free detection head that predicts bounding boxes and class probabilities at every location of several feature maps at different scales.
   - Like earlier versions, it uses a single neural network and a single forward pass to predict the boxes and class probabilities for all objects in the image, which is what makes it fast.


### Importing Libraries:
  - The script imports necessary libraries including `cv2` for OpenCV, `YOLO` from `ultralytics` for object detection, and `supervision` for annotations.

In [None]:
import cv2
from ultralytics import YOLO
import supervision as sv

### Initializing YOLO Model:
  - The YOLO model is initialized using pre-trained weights (`yolov8s.pt`, the small YOLOv8 variant). These weights come from training on the COCO dataset, so the model can detect COCO's 80 object classes without further training.

### Initializing Webcam Capture:
  - The script initializes webcam capture using OpenCV's `VideoCapture` class. If the webcam cannot be opened, an error message is printed and the script exits.
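
The capture backend differs between operating systems, so one option is to pick it at run time instead of editing the code by hand. The following is a minimal sketch under that assumption; the helper `capture_backend_name` is hypothetical and only returns the name of an OpenCV constant, so it runs without a camera or OpenCV installed.

```python
import platform


def capture_backend_name(system: str = "") -> str:
    """Return the name of the OpenCV backend constant to try on this operating system.

    'CAP_DSHOW' (DirectShow) only exists on Windows; 'CAP_ANY' lets OpenCV choose.
    """
    system = system or platform.system()
    return "CAP_DSHOW" if system == "Windows" else "CAP_ANY"


print(capture_backend_name())

# With OpenCV installed, the name can be turned into the actual constant:
#   import cv2
#   cap = cv2.VideoCapture(0, getattr(cv2, capture_backend_name()))
#   if not cap.isOpened():
#       raise RuntimeError("Cannot open camera")
```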

In [None]:
# Initialize the YOLO model
model = YOLO('yolov8s.pt')
# Initialize webcam capture.
# On Windows, the DirectShow backend (cv2.CAP_DSHOW) is used here.
# On macOS or Linux, use the default backend instead:
# cap = cv2.VideoCapture(0)
cap = cv2.VideoCapture(0, cv2.CAP_DSHOW)

### Real-time Object Detection Loop:
  - The script enters a while loop to continuously capture frames from the webcam and perform object detection on each frame.
  - A video is a sequence of frames (images).
  - Each frame captured from the webcam is passed through the YOLO model to detect objects.
  - Detected objects are annotated with bounding boxes and labels using the `supervision` library.
  - Annotated frames are displayed in real-time using OpenCV's `imshow` function.
  - The loop continues until the user presses the 'q' key, at which point the webcam is released and OpenCV windows are closed.
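
The labeling and filtering steps can be tried without a camera or a model. The sketch below uses plain Python lists as hypothetical stand-ins for one frame's detections: `names` mimics the class id to class name mapping that the model exposes, and the ids and confidences are made-up sample values.

```python
# Hypothetical sample data standing in for the detections of a single frame.
names = {0: "person", 16: "dog", 41: "cup"}  # class id -> class name
class_ids = [0, 41, 16]
confidences = [0.912, 0.657, 0.483]


def build_labels(names, class_ids, confidences):
    """Build one 'name confidence' label per detection."""
    return [f"{names[c]} {conf:.2f}" for c, conf in zip(class_ids, confidences)]


def drop_class(class_ids, confidences, excluded_id):
    """Return the ids and confidences of all detections except those of excluded_id."""
    kept = [(c, conf) for c, conf in zip(class_ids, confidences) if c != excluded_id]
    return [c for c, _ in kept], [conf for _, conf in kept]


print(build_labels(names, class_ids, confidences))
# prints ['person 0.91', 'cup 0.66', 'dog 0.48']

ids, confs = drop_class(class_ids, confidences, excluded_id=0)
print(build_labels(names, ids, confs))
# prints ['cup 0.66', 'dog 0.48']
```

In the script itself the same filtering is a single boolean-mask expression on the detections object, and the labels are built with the same kind of list comprehension.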

In [None]:
# cap.isOpened() checks whether the camera was opened successfully
if not cap.isOpened():
    print("Cannot open camera")
    exit()
while True:
    #ret is boolean which is True if the camera stream can be read
    #frame is the picture captured during streaming
    ret, frame =  cap.read()

    #If ret is equal to False the loop will break
    if not ret:
        print("Can't receive frame (stream end?), Exiting ...")
        break
    
    #predict with the model
    results = model(frame)[0]

    # Convert the Ultralytics results into a supervision Detections object,
    # which holds the boxes (xyxy), confidences, and class ids for this frame
    detections = sv.Detections.from_ultralytics(results)

    # Define the annotator that will draw the bounding boxes on the image
    bounding_box_annotator = sv.BoundingBoxAnnotator(
        thickness=4
    )

    #Define the label annotator which will add label to the annotations
    label_annotator = sv.LabelAnnotator()

    # Remove the person class (class id 0 in the COCO-trained weights)
    # detections = detections[detections.class_id != 0]

    # Build one label per detection for the label annotator.
    # model.model.names is a dict that maps class id to class name, for example:
    # {0: 'person', ...}
    labels = [
            f"{model.model.names[class_id]} {confidence:.2f}"
            for class_id, confidence
            in zip(detections.class_id, detections.confidence)
        ]

    # Annotate (draw on) the frame (image) with bounding boxes.
    # A bounding box is defined by the pixel coordinates of two corners:
    # x1, y1 (top left) and x2, y2 (bottom right)
    annotated_image = bounding_box_annotator.annotate(
        scene=frame, detections=detections)
    
    # Annotate each bounding box with its label.
    # Each detection carries a class id (an int), which was used above
    # to look up the class name when building the labels list.
    annotated_image = label_annotator.annotate(
        scene=annotated_image, detections=detections, labels=labels)

    #Show the frame with opencv
    # When pressing  q the popup window with the frame will close
    cv2.imshow("frame", annotated_image)
    if cv2.waitKey(1) == ord("q"):
        break
#disconnect from the camera
cap.release()
#Close all cv2 opened windows
cv2.destroyAllWindows()

### An example of how the code can be structured as a Python class
- In this class:
  - **`__init__`**: Loads the model and opens the webcam; raises an error if the webcam cannot be opened.
  - **`__del__`**: Disconnects from the camera and closes all windows opened by cv2.
  - **`detect_objects`**: Runs a loop that:
    - Reads a frame from the webcam and stops if no frame could be read
    - Runs the model on the frame and annotates it with bounding boxes and labels
    - Displays the result with cv2; pressing "q" ends the loop
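
Because Python does not guarantee when `__del__` runs, a context manager is a common alternative for releasing a camera. The following is only a sketch of that design choice, not how the class in this notebook is written: `FakeCamera` is a hypothetical stand-in for `cv2.VideoCapture` so the example runs without a webcam, and `CameraSession` is an illustrative name.

```python
class FakeCamera:
    """Hypothetical stand-in for cv2.VideoCapture; only tracks whether it was released."""

    def __init__(self):
        self.released = False

    def release(self):
        self.released = True


class CameraSession:
    """Context manager that releases the camera when the with-block ends."""

    def __init__(self, camera):
        self.camera = camera

    def __enter__(self):
        return self.camera

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit and when an exception is raised inside the block.
        self.camera.release()
        return False  # do not swallow exceptions


camera = FakeCamera()
with CameraSession(camera) as cam:
    pass  # the detection loop would run here

print(camera.released)  # prints True
```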


In [None]:
import cv2
from ultralytics import YOLO  # Import YOLO model from Ultralytics
import supervision as sv  # Import the supervision library for annotations

class ObjectDetectionWithWebcam:
    """
    This class performs real-time object detection using a webcam and YOLO model.

    Attributes:
        model (YOLO): YOLO object detection model.
        webcam (cv2.VideoCapture): Webcam object for capturing frames.
    """

    def __init__(self, model_weights: str = 'yolov8s.pt'):
        """
        Initializes the ObjectDetectionWithWebcam class.

        Args:
            model_weights (str): Path to the YOLO model weights file (default is 'yolov8s.pt').
        """
        self.model = YOLO(model_weights)

        # On Windows, use the DirectShow backend:
        self.webcam = cv2.VideoCapture(0, cv2.CAP_DSHOW)

        # On macOS or Linux, use the default backend instead:
        # self.webcam = cv2.VideoCapture(0)

        if not self.webcam.isOpened():
            raise RuntimeError("Cannot open webcam")

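    def __del__(self):
        """
        Cleans up resources by releasing the webcam and closing OpenCV windows.
        """
        # self.webcam is missing if __init__ failed before it was assigned
        webcam = getattr(self, "webcam", None)
        if webcam is not None:
            webcam.release()
        cv2.destroyAllWindows()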

    def detect_objects(self):
        """
        Performs real-time object detection using the webcam and displays the annotated frames.
        """
        while True:
            # Read frame from webcam
            ret, frame = self.webcam.read()

            if not ret:
                print("Can't receive frame (stream end?), Exiting ...")
                break
            
            # Perform object detection on the frame using the YOLO model
            results = self.model(frame)[0]

            # Convert YOLO detections to Supervision Detections format
            detections = sv.Detections.from_ultralytics(results)

            # Create a bounding box annotator with specified thickness
            bounding_box_annotator = sv.BoundingBoxAnnotator(
                thickness=4
            )

            # Create a label annotator
            label_annotator = sv.LabelAnnotator()

            # Remove detections of class 0 ('person' in the COCO-trained weights);
            # delete this line to keep people in the output
            detections = detections[detections.class_id != 0]

            # Get labels for each detected object
            labels = [
                f"{self.model.model.names[class_id]} {confidence:.2f}"
                for class_id, confidence
                in zip(detections.class_id, detections.confidence)
            ]

            # Annotate the frame with bounding boxes
            annotated_image = bounding_box_annotator.annotate(
                scene=frame, detections=detections)

            # Annotate the frame with labels
            annotated_image = label_annotator.annotate(
                scene=annotated_image, detections=detections, labels=labels)

            # Display the annotated frame
            cv2.imshow("Object Detection", annotated_image)

            # Exit loop if 'q' key is pressed
            if cv2.waitKey(1) == ord("q"):
                break

# Usage example:
if __name__ == "__main__":
    # Initialize ObjectDetectionWithWebcam class
    detector = ObjectDetectionWithWebcam()

    # Perform real-time object detection
    detector.detect_objects()

    # Drop the last reference so that __del__ releases the webcam and closes the windows
    del detector