# ByteTrack

ByteTrack is a multi-object tracking (MOT) algorithm that associates objects across video frames even when they temporarily disappear because of occlusions or because the detector briefly misses them.

It was introduced in 2021 as an improvement over the classic SORT (Simple Online and Realtime Tracking) and DeepSORT trackers, and it works particularly well in complex scenes with a high density of objects.

Key features of ByteTrack:

1. Use of "weak" detections

* Unlike SORT/DeepSORT, which discard detections with low confidence (for example, < 0.5), ByteTrack keeps them and tries to match them to tracks that are still unmatched.

* This helps avoid losing objects when the detector produces temporary false negatives.

2. Two-stage association

* First stage: tracks are matched to "reliable" detections (high confidence).

* Second stage: the remaining tracks are matched to "weak" detections (low confidence).

3. Tracking by bounding box (no ReID)

ByteTrack relies on IoU (Intersection over Union) and object motion (via a Kalman filter), but does not use re-identification (ReID) models, which makes it faster than DeepSORT.
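
The two-stage association described above can be sketched in plain Python. This is an illustration under stated assumptions, not the reference implementation: the function names (`iou`, `greedy_match`, `two_stage_associate`) and the threshold values are hypothetical, motion prediction with a Kalman filter is omitted, and greedy matching by descending IoU is swapped in for the linear assignment step a full tracker would normally use.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, dets, iou_thresh):
    """One-to-one matching by descending IoU (a stand-in for linear assignment)."""
    pairs = sorted(
        ((iou(t, d[:4]), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(dets)),
        reverse=True,
    )
    matches, used_tracks, used_dets = [], set(), set()
    for score, ti, di in pairs:
        if score < iou_thresh:
            break  # all remaining pairs overlap even less
        if ti in used_tracks or di in used_dets:
            continue
        matches.append((ti, di))
        used_tracks.add(ti)
        used_dets.add(di)
    return matches


def two_stage_associate(tracks, detections, high_thresh=0.5, low_thresh=0.1,
                        iou_thresh=0.3):
    """Return {track_index: detection_index}; thresholds are illustrative."""
    high = [i for i, d in enumerate(detections) if d[4] >= high_thresh]
    low = [i for i, d in enumerate(detections) if low_thresh <= d[4] < high_thresh]

    assignment = {}
    # Stage 1: all tracks against high-confidence detections
    for ti, di in greedy_match(tracks, [detections[i] for i in high], iou_thresh):
        assignment[ti] = high[di]

    # Stage 2: tracks still unmatched against low-confidence detections
    rest = [ti for ti in range(len(tracks)) if ti not in assignment]
    for ri, di in greedy_match([tracks[ti] for ti in rest],
                               [detections[i] for i in low], iou_thresh):
        assignment[rest[ri]] = low[di]
    return assignment


tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [
    (1, 1, 11, 11, 0.9),    # confident detection near track 0
    (21, 21, 31, 31, 0.3),  # weak detection near track 1 (e.g. partly occluded)
    (50, 50, 60, 60, 0.2),  # weak detection far from every track
]
result = two_stage_associate(tracks, detections)
print(result)
```

In this toy case track 1 survives only because of the second stage: its sole nearby detection has a score of 0.3 and would have been thrown away by a single high threshold. The weak detection that overlaps no track is left unmatched and does not start a new track.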

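## Implementing a simplified ByteTrack tracker

The tracker below is deliberately simplified: it matches tracks to detections by IoU only, with no Kalman filter and no second association stage for low-confidence detections.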

In [None]:
!pip install numpy opencv-python Pillow onemetric
!pip install git+https://github.com/ifzhang/ByteTrack.git --no-deps

Collecting git+https://github.com/ifzhang/ByteTrack.git
  Cloning https://github.com/ifzhang/ByteTrack.git to /tmp/pip-req-build-ia5t5xmk
  Running command git clone --filter=blob:none --quiet https://github.com/ifzhang/ByteTrack.git /tmp/pip-req-build-ia5t5xmk
  Resolved https://github.com/ifzhang/ByteTrack.git to commit d1bf0191adff59bc8fcfeaa0b33d3d1642552a99
  Preparing metadata (setup.py) ... done


In [None]:
!pip install ultralytics



In [None]:
import cv2
import numpy as np
from datetime import datetime
from ultralytics import YOLO

# Custom pairwise IoU implemented in NumPy
def iou_batch(bboxes1, bboxes2):
    x11, y11, x12, y12 = np.split(bboxes1, 4, axis=1)
    x21, y21, x22, y22 = np.split(bboxes2, 4, axis=1)

    xA = np.maximum(x11, x21.T)
    yA = np.maximum(y11, y21.T)
    xB = np.minimum(x12, x22.T)
    yB = np.minimum(y12, y22.T)

    interArea = np.maximum(0, xB - xA) * np.maximum(0, yB - yA)
    boxAArea = (x12 - x11) * (y12 - y11)
    boxBArea = (x22 - x21) * (y22 - y21)

    iou = interArea / (boxAArea + boxBArea.T - interArea)
    return iou

class SimpleByteTracker:
    def __init__(self, track_thresh=0.5, match_thresh=0.8, max_misses=5):
        self.track_thresh = track_thresh  # Min. confidence for a detection to be used
        self.match_thresh = match_thresh  # Min. IoU to match a detection to a track
        self.max_misses = max_misses  # Max. number of frames without an update
        self.tracks = []
        self.next_id = 1

    def update(self, detections, img_size):
        # Note: img_size is currently unused
        valid_dets = [d for d in detections if d[4] >= self.track_thresh]
        matched = set()
        matched_tracks = set()

        # Increment the miss counter for every track
        for track in self.tracks:
            track['misses'] += 1

        if self.tracks and valid_dets:
            track_boxes = np.array([t['bbox'] for t in self.tracks])
            det_boxes = np.array([d[:4] for d in valid_dets])

            iou_matrix = iou_batch(track_boxes, det_boxes)

            for i, track in enumerate(self.tracks):
                best_match = int(np.argmax(iou_matrix[i]))
                if iou_matrix[i, best_match] > self.match_thresh:
                    track['bbox'] = valid_dets[best_match][:4]
                    track['misses'] = 0  # Reset the counter
                    matched.add(best_match)
                    matched_tracks.add(i)
                    # Mask the column so no other track can claim this detection
                    iou_matrix[:, best_match] = -1.0

        # Drop tracks that have not been updated for too long
        self.tracks = [
            t for t in self.tracks
            if t['misses'] <= self.max_misses
        ]

        # Start new tracks from unmatched detections
        for i, det in enumerate(valid_dets):
            if i not in matched:
                self.tracks.append({
                    'id': self.next_id,
                    'bbox': det[:4],
                    'score': det[4],
                    'misses': 0  # Initialize the counter
                })
                self.next_id += 1

        return self.tracks
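
To get a feel for what `match_thresh=0.8` means in practice, here is a small worked IoU example. It is a sketch: `pairwise_iou` is a hypothetical helper that follows the same broadcasting idea as `iou_batch` above and additionally guards against division by zero for degenerate boxes.

```python
import numpy as np


def pairwise_iou(boxes_a, boxes_b):
    """IoU matrix of shape (len(boxes_a), len(boxes_b)); boxes are [x1, y1, x2, y2]."""
    a = np.asarray(boxes_a, dtype=float)[:, None, :]  # shape (N, 1, 4)
    b = np.asarray(boxes_b, dtype=float)[None, :, :]  # shape (1, M, 4)

    # Width and height of the overlap, clipped at 0 for disjoint boxes
    inter_w = np.clip(np.minimum(a[..., 2], b[..., 2]) - np.maximum(a[..., 0], b[..., 0]), 0, None)
    inter_h = np.clip(np.minimum(a[..., 3], b[..., 3]) - np.maximum(a[..., 1], b[..., 1]), 0, None)
    inter = inter_w * inter_h

    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    union = area_a + area_b - inter

    # Return 0 instead of NaN when both boxes have zero area
    return np.where(union > 0, inter / np.maximum(union, 1e-12), 0.0)


track_boxes = [[0, 0, 10, 10]]
det_boxes = [
    [0, 0, 10, 10],    # identical box
    [1, 0, 11, 10],    # shifted by 1 px (10% of the width)
    [2, 0, 12, 10],    # shifted by 2 px (20% of the width)
    [20, 20, 30, 30],  # no overlap
]
ious = pairwise_iou(track_boxes, det_boxes)
print(np.round(ious, 3))
```

A shift of 10% of the box width still gives an IoU of about 0.82, while a 20% shift drops to about 0.67, which is below the 0.8 threshold. Without motion prediction, a fast object can therefore fail to match its own track and receive a new ID.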

In [None]:
from google.colab.patches import cv2_imshow

In [None]:
# Initialize the YOLOv8 model
model = YOLO("yolov8n.pt")  # Load the nano model (yolov8s.pt, yolov8m.pt, etc. also work)

# Open the input video
cap = cv2.VideoCapture('/content/drive/MyDrive/datasets/cv/pedestrian.mp4')  # Path to the video file

# Output video settings
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = int(cap.get(cv2.CAP_PROP_FPS))

# Build a file name from the current date and time
output_filename = f"tracking_result_{datetime.now().strftime('%Y%m%d_%H%M%S')}.mp4"
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter(output_filename, fourcc, fps, (frame_width, frame_height))

tracker = SimpleByteTracker()

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Object detection; conf is the confidence threshold (detections below it are dropped)
    results = model(frame, conf=0.5)

    # Convert the results to the format [x1, y1, x2, y2, conf, class_id]
    detections = []
    for result in results:
        boxes = result.boxes.xyxy.cpu().numpy()  # bounding boxes
        confs = result.boxes.conf.cpu().numpy()  # confidence scores
        class_ids = result.boxes.cls.cpu().numpy()  # class IDs

        for box, conf, cls_id in zip(boxes, confs, class_ids):
            x1, y1, x2, y2 = box
            detections.append([x1, y1, x2, y2, conf, cls_id])


    # Update the tracker
    tracks = tracker.update(detections, (frame.shape[1], frame.shape[0]))

    # Draw the result on the frame
    for track in tracks:
        x1, y1, x2, y2 = map(int, track['bbox'])
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, f"ID: {track['id']}", (x1, y1-10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

    # Write the frame to the output video
    out.write(frame)

# Release resources
cap.release()
out.release()
# cv2.destroyAllWindows()

print(f"Video saved as: {output_filename}")


0: 384x640 2 persons, 10 cars, 9.7ms
Speed: 2.1ms preprocess, 9.7ms inference, 1.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 10 cars, 8.5ms
Speed: 2.2ms preprocess, 8.5ms inference, 1.4ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 11 cars, 6.9ms
Speed: 2.1ms preprocess, 6.9ms inference, 1.4ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 13 cars, 7.4ms
Speed: 2.1ms preprocess, 7.4ms inference, 1.4ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 10 cars, 10.4ms
Speed: 2.1ms preprocess, 10.4ms inference, 1.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 10 cars, 7.4ms
Speed: 2.0ms preprocess, 7.4ms inference, 1.4ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 10 cars, 10.0ms
Speed: 2.1ms preprocess, 10.0ms inference, 1.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 10 cars, 7.4ms
Speed: 2.0ms preprocess, 

# Using the code above, complete the tasks below (all within a single combined solution)

Task 1

Make the tracker work only within a given region (for example, inside the rectangle [x1, y1, x2, y2]).

Task 2

Make the bbox color change depending on misses.

Task 3

Add a history field to each track that stores the last N coordinates, and visualize them with fading brightness.

Task 4

For each frame, output:

* The number of active tracks.
* The average speed (pixels/frame) of each object.

In [1]:
!pip install numpy opencv-python Pillow onemetric > /dev/null
!pip install git+https://github.com/ifzhang/ByteTrack.git --no-deps > /dev/null
!pip install ultralytics > /dev/null

  Running command git clone --filter=blob:none --quiet https://github.com/ifzhang/ByteTrack.git /tmp/pip-req-build-w3luqt6u


In [2]:
import cv2
import numpy as np
from datetime import datetime
from ultralytics import YOLO
from google.colab.patches import cv2_imshow

Creating new Ultralytics Settings v0.0.6 file ✅ 
View Ultralytics Settings with 'yolo settings' or at '/root/.config/Ultralytics/settings.json'
Update Settings with 'yolo settings key=value', i.e. 'yolo settings runs_dir=path/to/dir'. For help see https://docs.ultralytics.com/quickstart/#ultralytics-settings.


In [3]:
def iou_batch(bboxes1, bboxes2):
    x11, y11, x12, y12 = np.split(bboxes1, 4, axis=1)
    x21, y21, x22, y22 = np.split(bboxes2, 4, axis=1)

    xA = np.maximum(x11, x21.T)
    yA = np.maximum(y11, y21.T)
    xB = np.minimum(x12, x22.T)
    yB = np.minimum(y12, y22.T)

    interArea = np.maximum(0, xB - xA) * np.maximum(0, yB - yA)
    boxAArea = (x12 - x11) * (y12 - y11)
    boxBArea = (x22 - x21) * (y22 - y21)

    iou = interArea / (boxAArea + boxBArea.T - interArea)
    return iou

In [4]:
class SimpleByteTracker:
    def __init__(self, track_thresh=0.5, match_thresh=0.8, max_misses=5, history_size=10):
        self.track_thresh = track_thresh
        self.match_thresh = match_thresh
        self.max_misses = max_misses
        self.history_size = history_size
        self.tracks = []
        self.next_id = 1

    def update(self, detections, img_size):
        valid_dets = [d for d in detections if d[4] >= self.track_thresh]
        matched = set()
        matched_tracks = set()

        for track in self.tracks:
            track['misses'] += 1

        if self.tracks and valid_dets:
            track_boxes = np.array([t['bbox'] for t in self.tracks])
            det_boxes = np.array([d[:4] for d in valid_dets])

            iou_matrix = iou_batch(track_boxes, det_boxes)

            for i, track in enumerate(self.tracks):
                best_match = int(np.argmax(iou_matrix[i]))
                if iou_matrix[i, best_match] > self.match_thresh:
                    track['bbox'] = valid_dets[best_match][:4]
                    track['misses'] = 0
                    # Store the box center and keep only the last history_size points
                    x1, y1, x2, y2 = track['bbox']
                    cx = int((x1 + x2) / 2)
                    cy = int((y1 + y2) / 2)
                    track['history'].append((cx, cy))
                    if len(track['history']) > self.history_size:
                        track['history'].pop(0)
                    matched.add(best_match)
                    matched_tracks.add(i)
                    # Mask the column so no other track can claim this detection
                    iou_matrix[:, best_match] = -1.0

        self.tracks = [
            t for t in self.tracks
            if t['misses'] <= self.max_misses
        ]

        for i, det in enumerate(valid_dets):
            if i not in matched:
                x1, y1, x2, y2 = det[:4]
                cx = int((x1 + x2) / 2)
                cy = int((y1 + y2) / 2)
                self.tracks.append({
                    'id': self.next_id,
                    'bbox': det[:4],
                    'score': det[4],
                    'misses': 0,
                    'history': [(cx, cy)]
                })
                self.next_id += 1

        return self.tracks
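
The fixed-length `history` field can also be kept with `collections.deque`, which drops the oldest entry automatically. This is an optional alternative to the `append` plus `pop(0)` pattern used in the class above, shown here as a standalone sketch with made-up center coordinates.

```python
from collections import deque

history_size = 3  # N: how many recent centers to keep (illustrative value)
history = deque(maxlen=history_size)

# Simulated box centers over five consecutive frames
for center in [(10, 10), (12, 11), (14, 12), (16, 13), (18, 14)]:
    history.append(center)  # once full, the oldest center is discarded

print(list(history))
```

A `deque` supports `len()`, iteration and integer indexing, so the drawing and speed code can use it the same way it uses a list. Slicing is not supported, so convert with `list(history)` first if a slice is needed.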

In [5]:
roi = (100, 100, 600, 600)
model = YOLO("yolov8n.pt")

cap = cv2.VideoCapture('pedestrian.mp4')

frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = int(cap.get(cv2.CAP_PROP_FPS))

output_filename = f"tracking_result_{datetime.now().strftime('%Y%m%d_%H%M%S')}.mp4"
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter(output_filename, fourcc, fps, (frame_width, frame_height))

tracker = SimpleByteTracker()

while True:
    ret, frame = cap.read()
    if not ret:
        break

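    # Run detection on the clean frame first, so the ROI overlay
    # does not become part of the detector input
    results = model(frame, conf=0.5)

    # Task 1: draw the region of interest
    cv2.rectangle(frame, (roi[0], roi[1]), (roi[2], roi[3]), (255, 0, 0), 2)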

    detections = []
    for result in results:
        boxes = result.boxes.xyxy.cpu().numpy()
        confs = result.boxes.conf.cpu().numpy()
        class_ids = result.boxes.cls.cpu().numpy()

        for box, conf, cls_id in zip(boxes, confs, class_ids):
            x1, y1, x2, y2 = box
            cx = (x1 + x2) / 2
            cy = (y1 + y2) / 2
            if roi[0] <= cx <= roi[2] and roi[1] <= cy <= roi[3]:
                detections.append([x1, y1, x2, y2, conf, cls_id])

    tracks = tracker.update(detections, (frame.shape[1], frame.shape[0]))

    active_count = len(tracks)
    cv2.putText(frame, f"Active tracks: {active_count}", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)

    for track in tracks:
        x1, y1, x2, y2 = map(int, track['bbox'])
        # Task 2: BGR color shifts from green (0 misses) to red (max_misses)
        misses = track['misses']
        ratio = min(misses / tracker.max_misses, 1.0)
        color = (0, int(255 * (1 - ratio)), int(255 * ratio))
        cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)

        # Task 4: mean displacement between consecutive stored centers
        # (centers are stored only on frames where the track was matched)
        history = track['history']
        if len(history) > 1:
            total_dist = 0.0
            for i in range(1, len(history)):
                dx = history[i][0] - history[i - 1][0]
                dy = history[i][1] - history[i - 1][1]
                total_dist += np.hypot(dx, dy)
            avg_speed = total_dist / (len(history) - 1)
        else:
            avg_speed = 0.0
        cv2.putText(frame, f"ID:{track['id']} S:{avg_speed:.1f}", (x1, y1 - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

        # Task 3: draw the trail, older points darker and newer points brighter
        for idx, (hx, hy) in enumerate(history):
            brightness = int(255 * (idx + 1) / len(history))
            cv2.circle(frame, (hx, hy), 3, (brightness, brightness, brightness), -1)

    out.write(frame)

cap.release()
out.release()
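
The per-task logic in the loop above can be checked without a video by pulling it into small pure functions. The sketch below does that; the helper names (`in_roi`, `color_for_misses`, `average_speed`) are hypothetical and the inputs are made-up values chosen so the results are easy to verify by hand.

```python
import math


def in_roi(bbox, roi):
    """Task 1: True if the center of bbox (x1, y1, x2, y2) lies inside roi."""
    cx = (bbox[0] + bbox[2]) / 2
    cy = (bbox[1] + bbox[3]) / 2
    return roi[0] <= cx <= roi[2] and roi[1] <= cy <= roi[3]


def color_for_misses(misses, max_misses):
    """Task 2: BGR color going from green (0 misses) to red (max_misses or more)."""
    ratio = min(misses / max(max_misses, 1), 1.0)
    return (0, int(255 * (1 - ratio)), int(255 * ratio))


def average_speed(history):
    """Task 4: mean distance in pixels between consecutive stored centers."""
    points = list(history)
    if len(points) < 2:
        return 0.0
    total = sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )
    return total / (len(points) - 1)


roi = (100, 100, 600, 600)
print(in_roi((150, 150, 250, 250), roi))          # center (200, 200)
print(in_roi((0, 0, 50, 50), roi))                # center (25, 25)
print(color_for_misses(0, 5), color_for_misses(5, 5))
print(average_speed([(0, 0), (3, 4), (6, 8)]))    # two steps of length 5
```

Keeping these pieces separate from the OpenCV drawing calls makes it easier to change one rule, for example the ROI test, without touching the rest of the loop.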

Downloading https://github.com/ultralytics/assets/releases/download/v8.3.0/yolov8n.pt to 'yolov8n.pt'...


100%|██████████| 6.25M/6.25M [00:00<00:00, 90.9MB/s]



0: 384x640 2 persons, 10 cars, 46.3ms
Speed: 14.7ms preprocess, 46.3ms inference, 302.9ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 10 cars, 7.0ms
Speed: 2.4ms preprocess, 7.0ms inference, 1.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 11 cars, 6.9ms
Speed: 2.0ms preprocess, 6.9ms inference, 1.4ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 12 cars, 6.9ms
Speed: 2.6ms preprocess, 6.9ms inference, 1.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 10 cars, 6.8ms
Speed: 2.5ms preprocess, 6.8ms inference, 1.4ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 10 cars, 6.7ms
Speed: 2.5ms preprocess, 6.7ms inference, 1.3ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 10 cars, 9.4ms
Speed: 2.3ms preprocess, 9.4ms inference, 1.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 10 cars, 9.2ms
Speed: 2.1ms preprocess, 

In [6]:
output_filename

'tracking_result_20250606_144834.mp4'

In [7]:
!ffmpeg -i tracking_result_20250606_144834.mp4 -c:v libx264 -c:a aac output.mp4

ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enab
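
The `ffmpeg` call above hard-codes the timestamped file name, so it has to be edited after every run. A possible alternative is to build the command from the `output_filename` variable. In this sketch `build_reencode_command` is a hypothetical helper, and the `-y` flag (overwrite the destination without asking) is an addition that is not part of the original command.

```python
import shlex


def build_reencode_command(src, dst):
    """Return the ffmpeg argument list for re-encoding src to H.264/AAC."""
    return ["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-c:a", "aac", dst]


# In the notebook this would be the output_filename variable from the tracking cell
output_filename = "tracking_result_20250606_144834.mp4"
cmd = build_reencode_command(output_filename, "output.mp4")
print(shlex.join(cmd))

# To actually run it (requires ffmpeg to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems if the file name ever contains spaces.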

In [8]:
from IPython.display import Video, display
video_path = 'output.mp4'
display(Video(video_path, embed=True, width=640, height=480))