# People Detection and Tracking (YOLOv8)

## Download YOLOv8 weights

This notebook requires YOLOv8 weights. Download them from Ultralytics and save the file as `yolov8n.pt`.

## Get up and running

Read frames with OpenCV, detect people with YOLO, track them with DeepSORT, and draw a bounding box for each tracked person.

In [2]:
import random

import cv2
from deep_sort_realtime.deepsort_tracker import DeepSort
import numpy as np
from ultralytics import YOLO

In [3]:
model = YOLO('yolov8n.pt')  # load pretrained YOLOv8 weights

Either download a sample video to run inference on, or stream from a webcam.

In [4]:
# from file
!gdown 14n0an0gxm14TJFoYKMdhv3Sz-afffAZ6
cap = cv2.VideoCapture("people.mp4")

# from webcam
# cap = cv2.VideoCapture(0)

Downloading...
From: https://drive.google.com/uc?id=14n0an0gxm14TJFoYKMdhv3Sz-afffAZ6
To: C:\Users\Admin\Downloads\Workspace\putt's works\people-detection-tracking\people.mp4

100%|##########| 4.23M/4.23M [00:02<00:00, 1.53MB/s]
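
`cv2.VideoCapture` typically does not raise when the file is missing or the camera is unavailable. It returns an object whose `isOpened()` is `False`, so the frame loop later in this notebook would exit without processing anything. A minimal sketch of a fail-fast check follows. The helper name `ensure_opened` is hypothetical and not part of OpenCV; it assumes only that the object exposes `isOpened()`.

```python
def ensure_opened(cap, source):
    """Return cap unchanged, or raise if the capture failed to open.

    `ensure_opened` is a hypothetical helper; `cap` is any object with an
    `isOpened()` method, such as a `cv2.VideoCapture`.
    """
    if not cap.isOpened():
        raise RuntimeError(f"could not open video source: {source!r}")
    return cap


# Intended usage in this notebook (requires OpenCV):
# cap = ensure_opened(cv2.VideoCapture("people.mp4"), "people.mp4")
```

Wrapping the constructor call this way surfaces a wrong path or a busy webcam immediately instead of producing a silent no-op loop.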


Initialize the DeepSORT tracker.

In [5]:
# docs: https://pypi.org/project/deep-sort-realtime/
tracker = DeepSort()  # initialize deep sort (tracker)

Reuse the color already assigned to a tracked object; if the track ID is new, pick a random color for it.

In [6]:
def get_track_color(colors, track_id):
    """Return the color already assigned to track_id, assigning a random color first if the ID is new."""
    if track_id not in colors:
        colors[track_id] = (
            random.randint(0, 255),
            random.randint(0, 255),
            random.randint(0, 255)
        )
    return colors[track_id]
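
Colors from `get_track_color` change between runs because they come from the global random generator. If stable colors are useful (for example, when comparing two recordings), one option is to derive the color from the track ID itself. The sketch below is an alternative, not the notebook's method; `stable_track_color` is a hypothetical name.

```python
import random


def stable_track_color(track_id):
    """Derive a BGR color from the track ID alone.

    Hypothetical alternative to `get_track_color`: seeding a private
    generator with the ID gives the same color for the same ID on every
    run, without a shared dictionary.
    """
    rng = random.Random(str(track_id))  # local generator; global state untouched
    return (rng.randint(0, 255), rng.randint(0, 255), rng.randint(0, 255))
```

Because the seed is the string form of the ID, `stable_track_color(7)` and `stable_track_color("7")` return the same color.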

Read each frame from the video capture, run detection with a 50% confidence threshold, and collect the bounding box, confidence, and class of every detected person. Pass these detections to DeepSORT, then draw each confirmed track on the frame.
When the loop ends, either because the video has no more frames or because the user pressed `q`, release the video capture and destroy all OpenCV windows.
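
YOLO reports each box as two corners `(x1, y1, x2, y2)`, while the tracker in the loop is given `((left, top, width, height), confidence, class_id)` tuples. The conversion is a subtraction per axis; a minimal sketch with plain numbers follows. The helper names `xyxy_to_ltwh` and `to_detection` are hypothetical and belong to neither library.

```python
def xyxy_to_ltwh(x1, y1, x2, y2):
    """Convert corner coordinates to (left, top, width, height)."""
    return (x1, y1, x2 - x1, y2 - y1)


def to_detection(xyxy, confidence, class_id):
    """Build one ((left, top, w, h), confidence, class_id) tuple.

    `to_detection` is a hypothetical helper name; values are cast to plain
    Python numbers so the tuple does not carry one-element arrays.
    """
    x1, y1, x2, y2 = (float(v) for v in xyxy)
    return (xyxy_to_ltwh(x1, y1, x2, y2), float(confidence), int(class_id))


print(to_detection((10, 20, 50, 100), 0.9, 0))
# prints ((10.0, 20.0, 40.0, 80.0), 0.9, 0)
```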

In [7]:
WIDTH = 2
track_colors = {}

while cap.isOpened():
    ok, frame = cap.read()  # read frame from the video capture
    if not ok:
        break

    results = model(frame, conf=0.5)

    # detections contain bounding box, confidence, and class ID
    detections = []
    for result in results:
        for box in result.boxes:
            x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()  # get bounding box
            confidence = float(box.conf[0])  # scalar, not a one-element array
            class_id = int(box.cls[0])  # scalar class index

            # classes: https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/coco.yaml
            # 0 = person
            if class_id != 0:
                continue

            detection = ((x1, y1, x2 - x1, y2 - y1), confidence, class_id)
            detections.append(detection)

    # update tracker with current detections
    tracked_objs = tracker.update_tracks(detections, frame=frame)

    for obj in tracked_objs:
        if not obj.is_confirmed():
            continue

        track_id = obj.track_id

        # get left-top-right-bottom bounding box in integers
        ltrb = np.array(obj.to_ltrb()).astype(np.int32)
        x1, y1, x2, y2 = ltrb

        # get color for each tracked object
        color = get_track_color(track_colors, track_id)

        # draw a bounding box to the image
        cv2.rectangle(frame, (x1, y1), (x2, y2), color, WIDTH)
        
        # draw a text to the image with tracked ID
        cv2.putText(frame, f'ID {track_id}', (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 1, color, WIDTH)

    cv2.imshow('output', frame)

    # press 'q' to exit the loop
    if cv2.waitKey(1) & 0xFF == ord('q'):
        print('user stopped the process')
        break

cap.release()  # release video capture resource
cv2.destroyAllWindows()


0: 384x640 22 persons, 2 birds, 76.3ms
Speed: 3.7ms preprocess, 76.3ms inference, 1.0ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 20 persons, 2 birds, 62.3ms
Speed: 2.0ms preprocess, 62.3ms inference, 1.0ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 20 persons, 2 birds, 71.9ms
Speed: 3.1ms preprocess, 71.9ms inference, 2.0ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 20 persons, 2 birds, 58.8ms
Speed: 2.0ms preprocess, 58.8ms inference, 1.0ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 21 persons, 2 birds, 56.0ms
Speed: 1.0ms preprocess, 56.0ms inference, 1.0ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 20 persons, 2 birds, 58.1ms
Speed: 1.5ms preprocess, 58.1ms inference, 1.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 20 persons, 2 birds, 62.3ms
Speed: 2.0ms preprocess, 62.3ms inference, 1.0ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 22 persons, 2 birds, 72.1ms
S

---