<a href="https://colab.research.google.com/github/Mechanics-Mechatronics-and-Robotics/CV-2025/blob/main/Week_12/Lab9_object_tracking_multi_object.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 9. Object Tracking. YOLO and DeepSORT.

Let's start by installing the packages; this takes some time...

In [None]:
!pip install ultralytics
!git clone https://github.com/nwojke/deep_sort.git


Import libraries...

In [None]:
import random
import cv2
from ultralytics import YOLO

from deep_sort.deep_sort.tracker import Tracker as DeepSortTracker
from deep_sort.tools import generate_detections as gdet
from deep_sort.deep_sort import nn_matching
from deep_sort.deep_sort.detection import Detection
import numpy as np

More details about YOLO and DeepSORT:

https://habr.com/ru/articles/514450/

![](https://habrastorage.org/r/w1560/webt/q_/6e/hz/q_6ehzuef4yaip17avcipq8wevg.png)
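Before matching new detections to existing tracks, DeepSORT uses a Kalman filter to predict where each track should be in the current frame. The following is a minimal NumPy sketch of that prediction step only. It assumes DeepSORT's constant-velocity model over the state (center x, center y, aspect ratio, height) and their velocities. The helper names and the noise matrix are hypothetical and are not the `deep_sort` API. DeepSORT's own filter scales its noise with the box height.

```python
import numpy as np

def make_transition(dt=1.0):
    """8x8 constant-velocity transition for state (cx, cy, a, h, vcx, vcy, va, vh)."""
    F = np.eye(8)
    for i in range(4):
        F[i, 4 + i] = dt  # each position-like entry += its velocity * dt
    return F

def kalman_predict(mean, cov, process_noise):
    """One Kalman prediction step: propagate mean and covariance forward one frame."""
    F = make_transition()
    mean = F @ mean
    cov = F @ cov @ F.T + process_noise
    return mean, cov

# A box centred at (100, 50), aspect ratio 0.5, height 80, moving 2 px/frame to the right
mean = np.array([100.0, 50.0, 0.5, 80.0, 2.0, 0.0, 0.0, 0.0])
cov = np.eye(8)
Q = np.eye(8) * 0.01  # hypothetical, constant process noise (not DeepSORT's values)

mean, cov = kalman_predict(mean, cov, Q)
print(mean[:4])  # predicted (cx, cy, a, h): cx moves from 100 to 102
```

The covariance grows at every prediction. This growth is why a track that goes unmatched for several frames becomes less certain until a new detection corrects it.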

Download the input video (`people.mp4`) and the weights of the DeepSORT appearance encoder (`mars-small128.pb`):

In [None]:
!gdown 1jcWKM2sIPqIdNtNj0rKmLPgHVvCP1djj
!gdown 1JJDZMgj6zPYI7YcRazwB3nz7bg6tnMW4

Downloading...
From: https://drive.google.com/uc?id=1jcWKM2sIPqIdNtNj0rKmLPgHVvCP1djj
To: /content/people.mp4
100% 4.23M/4.23M [00:00<00:00, 25.7MB/s]
Downloading...
From: https://drive.google.com/uc?id=1JJDZMgj6zPYI7YcRazwB3nz7bg6tnMW4
To: /content/mars-small128.pb
100% 11.2M/11.2M [00:00<00:00, 29.4MB/s]
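`mars-small128.pb` is a re-identification CNN that maps each image crop to a 128-dimensional appearance embedding. The tracker below compares these embeddings with `nn_matching.NearestNeighborDistanceMetric("cosine", ...)`. Each track keeps a gallery of its past embeddings, and its distance to a detection is the smallest cosine distance over that gallery. Here is a NumPy sketch of that idea. The helper names are hypothetical, and it uses the same 0.4 threshold that the `Tracker` class passes as `max_cosine_distance`.

```python
import numpy as np

def cosine_distance(a, b):
    """Pairwise cosine distance (1 - cosine similarity) between rows of a (N, D) and b (M, D)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T

def nn_cosine_cost(track_galleries, det_features, max_distance=0.4):
    """Cost matrix (tracks x detections): for each track, the smallest cosine distance
    between any of its stored features and each detection; pairs above max_distance are gated out."""
    cost = np.zeros((len(track_galleries), len(det_features)))
    for i, gallery in enumerate(track_galleries):
        cost[i] = cosine_distance(gallery, det_features).min(axis=0)
    cost[cost > max_distance] = np.inf  # too dissimilar to be the same object
    return cost

rng = np.random.default_rng(0)
track_a = rng.normal(size=(3, 128))  # three past embeddings of track A
track_b = rng.normal(size=(2, 128))  # two past embeddings of track B
detections = np.vstack([track_a[0] + 0.01, rng.normal(size=128)])
print(nn_cosine_cost([track_a, track_b], detections))
```

Only the first detection lies close to one of track A's stored embeddings. Every other pair is gated out, so the tracker would not link them on appearance alone.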


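The YOLOv8 weights (`yolov8n.pt`) do not need a separate download. Ultralytics fetches them automatically the first time `YOLO("yolov8n.pt")` is called below.

Details on training YOLO and working with it can be found here:

https://colab.research.google.com/github/ultralytics/ultralytics/blob/main/examples/tutorial.ipynb

The next cell wraps DeepSORT in a small `Tracker` class. It converts detector boxes into DeepSORT `Detection` objects, computes their appearance features with the `mars-small128.pb` encoder, and exposes the confirmed tracks.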

In [None]:
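class Tracker:
    """Thin wrapper around DeepSORT: appearance encoder + Kalman/assignment tracker."""

    def __init__(self, encoder_model_filename='mars-small128.pb',
                 max_cosine_distance=0.4, nn_budget=None):
        # Appearance metric: a detection can only be matched to a track if the cosine
        # distance between their embeddings is below max_cosine_distance.
        # nn_budget=None means no limit on how many past features a track keeps.
        metric = nn_matching.NearestNeighborDistanceMetric("cosine", max_cosine_distance, nn_budget)
        self.tracker = DeepSortTracker(metric)
        # CNN that turns each image crop into an appearance embedding
        self.encoder = gdet.create_box_encoder(encoder_model_filename, batch_size=1)
        self.tracks = []

    def update(self, frame, detections):
        """detections: list of [x1, y1, x2, y2, score] in pixel coordinates."""
        if len(detections) == 0:
            # No detections: still advance the Kalman filters so that tracks age out
            self.tracker.predict()
            self.tracker.update([])
            self.update_tracks()
            return

        # DeepSORT expects boxes as (top-left x, top-left y, width, height)
        bboxes = np.asarray([d[:-1] for d in detections])
        bboxes[:, 2:] = bboxes[:, 2:] - bboxes[:, 0:2]
        scores = [d[-1] for d in detections]

        # One appearance feature vector per box
        features = self.encoder(frame, bboxes)

        dets = [Detection(bbox, score, feature)
                for bbox, score, feature in zip(bboxes, scores, features)]

        # Kalman prediction, then matching and correction with the new detections
        self.tracker.predict()
        self.tracker.update(dets)
        self.update_tracks()

    def update_tracks(self):
        tracks = []
        for track in self.tracker.tracks:
            # Skip tentative tracks and tracks unmatched for two or more frames
            if not track.is_confirmed() or track.time_since_update > 1:
                continue
            bbox = track.to_tlbr()  # back to (x1, y1, x2, y2)
            tracks.append(Track(track.track_id, bbox))

        self.tracks = tracks


class Track:
    """Minimal container for a confirmed track's ID and bounding box."""

    def __init__(self, track_id, bbox):
        self.track_id = track_id
        self.bbox = bbox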
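Matching detections to tracks is an assignment problem, which DeepSORT solves with the Hungarian algorithm. In `nwojke/deep_sort`, tentative tracks and tracks left unmatched by the appearance-based cascade go through a second pass. That pass uses the cost 1 - IoU and is gated by `max_iou_distance`. The sketch below illustrates that IoU pass and makes several assumptions. It uses SciPy's `linear_sum_assignment` as the solver and works with (x1, y1, x2, y2) boxes for simplicity, whereas DeepSORT uses (x, y, w, h) internally. The helpers `iou` and `match_by_iou` are hypothetical names.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(box, candidates):
    """IoU between one (x1, y1, x2, y2) box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], candidates[:, 0])
    y1 = np.maximum(box[1], candidates[:, 1])
    x2 = np.minimum(box[2], candidates[:, 2])
    y2 = np.minimum(box[3], candidates[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_cand = (candidates[:, 2] - candidates[:, 0]) * (candidates[:, 3] - candidates[:, 1])
    return inter / (area_box + area_cand - inter)

def match_by_iou(track_boxes, det_boxes, max_iou_distance=0.7):
    """Return (matches, unmatched_tracks, unmatched_detections) using cost = 1 - IoU."""
    if len(track_boxes) == 0 or len(det_boxes) == 0:
        return [], list(range(len(track_boxes))), list(range(len(det_boxes)))
    det_boxes = np.asarray(det_boxes, dtype=float)
    cost = np.array([1.0 - iou(np.asarray(t, dtype=float), det_boxes) for t in track_boxes])
    rows, cols = linear_sum_assignment(cost)  # minimum-total-cost assignment
    # Reject assigned pairs whose overlap is too small to be the same object
    matches = [(int(r), int(c)) for r, c in zip(rows, cols) if cost[r, c] <= max_iou_distance]
    matched_t = {r for r, _ in matches}
    matched_d = {c for _, c in matches}
    unmatched_t = [i for i in range(len(track_boxes)) if i not in matched_t]
    unmatched_d = [j for j in range(len(det_boxes)) if j not in matched_d]
    return matches, unmatched_t, unmatched_d

tracks = [[0, 0, 10, 10], [50, 50, 60, 60]]
dets = [[52, 51, 62, 61], [1, 0, 11, 10], [200, 200, 210, 210]]
print(match_by_iou(tracks, dets))  # ([(0, 1), (1, 0)], [], [2])
```

Unmatched detections, such as the third box here, are the ones from which DeepSORT would start new tentative tracks.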

In [None]:
video_path = 'people.mp4'
video_out_path = 'out.mp4'

cap = cv2.VideoCapture(video_path)
ret, frame = cap.read()
if not ret:
    raise RuntimeError(f"Could not read the first frame of {video_path}")

# Output video with the same FPS and frame size as the input
cap_out = cv2.VideoWriter(video_out_path, cv2.VideoWriter_fourcc(*'mp4v'), cap.get(cv2.CAP_PROP_FPS),
                          (frame.shape[1], frame.shape[0]))

# Pretrained YOLOv8 nano detector (COCO classes); weights are downloaded on first use
model = YOLO("yolov8n.pt")

tracker = Tracker()

# Ten random BGR colors; each track ID is mapped to one of them
colors = [(random.randint(0, 255), random.randint(0, 255), random.randint(0, 255)) for _ in range(10)]

detection_threshold = 0.5

while ret:

    results = model(frame)

    for result in results:
        detections = []
        # Each row of result.boxes.data is [x1, y1, x2, y2, confidence, class_id]
        for r in result.boxes.data.tolist():
            x1, y1, x2, y2, score, class_id = r
            x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
            class_id = int(class_id)  # not used below: detections of every class are tracked
            if score > detection_threshold:
                detections.append([x1, y1, x2, y2, score])

        tracker.update(frame, detections)

        # Draw every confirmed track, colored by its ID
        for track in tracker.tracks:
            x1, y1, x2, y2 = track.bbox
            track_id = track.track_id
            cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)), colors[track_id % len(colors)], 3)

    cap_out.write(frame)
    ret, frame = cap.read()

cap.release()
cap_out.release()
cv2.destroyAllWindows()


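In the tracking loop, `class_id` is parsed but never used to filter detections. As a result, every COCO class that YOLO reports above the threshold is passed to the tracker, not just people. The following is a minimal sketch of a class filter, assuming you only want to track persons (class 0 in the COCO list used by `yolov8n.pt`). The helper `filter_detections` is hypothetical. Its input rows have the same layout as `result.boxes.data.tolist()`, and its output has the format `Tracker.update` expects.

```python
PERSON_CLASS_ID = 0  # 'person' in the COCO class list of the pretrained YOLOv8 weights

def filter_detections(rows, score_threshold=0.5, keep_classes=(PERSON_CLASS_ID,)):
    """rows: [x1, y1, x2, y2, score, class_id] lists.
    Returns [x1, y1, x2, y2, score] lists for confident detections of the wanted classes."""
    detections = []
    for x1, y1, x2, y2, score, class_id in rows:
        if score > score_threshold and int(class_id) in keep_classes:
            detections.append([int(x1), int(y1), int(x2), int(y2), score])
    return detections

rows = [
    [10.2, 20.7, 50.9, 120.1, 0.91, 0.0],    # confident person
    [300.0, 40.0, 330.0, 60.0, 0.77, 14.0],  # confident bird (COCO id 14)
    [80.0, 15.0, 110.0, 90.0, 0.32, 0.0],    # low-confidence person
]
print(filter_detections(rows))  # [[10, 20, 50, 120, 0.91]]
```

Ultralytics can also do this filtering itself through the `classes` prediction argument, e.g. `model(frame, classes=[0])`, which avoids running the tracker's encoder on unwanted boxes.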

In [None]:
import imageio
import matplotlib.animation as animation
import matplotlib.pyplot as plt
from IPython.display import HTML
from skimage.transform import resize

def display_video(video):
    fig = plt.figure(figsize=(3, 3))  # display size

    mov = []
    for frame in video:  # one animation frame per video frame
        img = plt.imshow(frame, animated=True)
        plt.axis('off')
        mov.append([img])

    # Build the animation: 50 ms between frames, 1 s pause before repeating
    anim = animation.ArtistAnimation(fig, mov, interval=50, repeat_delay=1000)

    plt.close()  # prevent the notebook from also showing a static figure
    return anim

video = imageio.mimread('out.mp4', memtest=False)  # load all frames into memory
video = [resize(frame, (256, 256))[..., :3] for frame in video]  # resize for display (if necessary)
HTML(display_video(video).to_html5_video())  # inline HTML5 video

# Homework (4 points)

In this homework, you will build your own object-tracking pipeline, with a detector trained on a custom dataset.

Requirements:

1. Choose a detector model and a tracking algorithm to pair with it.

2. Choose or record a video for detection and tracking (e.g., footage recorded on your smartphone or a webcam).

3. Prepare your own dataset and retrain your model so that it can detect the chosen class/object.

4. Present your results as follows:

* Explain each part of the architecture (detector + tracker) used in your solution. For example, if you use YOLOv8, what is special about this model?

* Briefly describe your dataset and the object you detect/track in the video.

* Provide a Jupyter Notebook or Python script that covers all the steps, from reading the video and training the model to saving the output video with the detections and tracks drawn on it.

* Make sure that your code runs correctly, and DON'T forget to state where it should be executed (Colab, locally, etc.).
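For step 3, if you pick an Ultralytics YOLO model, each training image needs a text label file with one line per object. Each line has the format `class x_center y_center width height`, with coordinates normalized by the image width and height. The following is a minimal sketch, assuming you have box corners in pixels from your own annotation. It converts one box into such a line. The helper `to_yolo_label` is hypothetical.

```python
def to_yolo_label(class_id, box, img_w, img_h):
    """Convert a pixel box (x1, y1, x2, y2) into one YOLO label line:
    'class x_center y_center width height', coordinates normalized to [0, 1]."""
    x1, y1, x2, y2 = box
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# A 100x200 px box in a 640x480 image, single custom class with id 0
print(to_yolo_label(0, (50, 40, 150, 240), 640, 480))  # 0 0.156250 0.291667 0.156250 0.416667
```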

You may use any framework for the detection and tracking steps.

Example:

https://medium.com/@serurays/object-detection-and-tracking-using-yolov8-and-deepsort-47046fc914e9

GOOD LUCK!

In [None]:
# ** YOUR CODE HERE **