### Executive Summary: Car Detection with ByteTrack - An Introductory Guide

This guide is a beginner-friendly introduction to using ByteTrack to track cars in video footage. ByteTrack is a multi-object tracking algorithm: it takes the bounding boxes produced by an object detector, here a YOLO (You Only Look Once) model, and associates them across video frames so that each object keeps a consistent ID.

For more information on YOLO and ultralytics, visit [this link](https://github.com/ultralytics/ultralytics).

For more information on ByteTrack, visit [this link](https://github.com/ifzhang/ByteTrack).

1. **Frame Extraction**: 
   The video is decomposed into frames, turning continuous footage into discrete snapshots for analysis.

2. **Detection and tracking**: 
   We initialize the `BYTETracker` object and load the pre-trained YOLO model with its parameters. For each frame of the video, the YOLO model detects objects; ByteTrack then takes the resulting bounding boxes and assigns each one an ID, which lets us follow the object's movement across frames.

3. **Visualization of Tracking**: 
   The video is reassembled from the frames with the tracked bounding boxes drawn on them and written as an MP4 file to the same folder as the input video.


In [None]:
%load_ext autoreload
%autoreload 2
import glob
import matplotlib.pyplot as plt
import cv2
import numpy as np
import pandas as pd

# YOLO and video packages 
from ultralytics import YOLO
from bytetracker import BYTETracker
from bytetracker.basetrack import BaseTrack
from utils import draw_all_bbox_on_image, yolo_results_to_bytetrack_format, scale_bbox_as_xyxy
from IPython.display import Video, display

In [None]:
# Download the video
VIDEO_PATH = 'videos/traffic.mp4'
!if [ ! -f $VIDEO_PATH ]; then mkdir -p videos && wget https://storage.googleapis.com/bytetrack-data-public/traffic.mp4 -O $VIDEO_PATH; fi

#### Reading video

In [None]:
Video(VIDEO_PATH, width=800, embed=True)

#### 1. Frame Extraction 

In [None]:
# You only need to run this once:
# Extract frames from VIDEO_PATH at 12 fps and save them as numbered PNG files under frames/
!mkdir -p frames && ffmpeg -i $VIDEO_PATH -vf fps=12 frames/%d.png -hide_banner -loglevel panic

In [None]:
# - List the PNG frames in the 'frames' directory and sort them numerically for subsequent processing.
# - glob finds all PNG files; sorting on the numeric part of each filename avoids lexicographic order issues (e.g. 10.png before 2.png).

In [None]:
available_frames = glob.glob("frames/*.png")
available_frames = sorted(available_frames, key=lambda x: int(x.split("/")[-1].split(".")[0]))
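
To see why the numeric sort key matters, here is a minimal, self-contained sketch. The helper name `sort_frames_numerically` and the example paths are illustrative and not part of the notebook; it uses `pathlib` instead of splitting on `/`, which is one possible variation rather than the notebook's approach.

```python
from pathlib import Path


def sort_frames_numerically(paths):
    """Sort frame paths by the integer in the file stem ('frames/10.png' -> 10)."""
    return sorted(paths, key=lambda p: int(Path(p).stem))


# Illustrative paths only; real frames come from the ffmpeg step above.
example_paths = ["frames/10.png", "frames/2.png", "frames/1.png"]

lexicographic = sorted(example_paths)                # plain string ordering
numeric = sort_frames_numerically(example_paths)     # ordering by frame number

print(lexicographic)
print(numeric)
```

With a plain string sort, `10.png` lands between `1.png` and `2.png`, which would scramble the frame order seen by the tracker.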

In [None]:
%matplotlib inline

MODEL_WEIGHTS = "yolov8m.pt"

model = YOLO(MODEL_WEIGHTS)
results = model(available_frames[0])[0]

plt.imshow(cv2.cvtColor(results.plot(), cv2.COLOR_BGR2RGB))
plt.show()

#### Classes for prediction, indicating which object to detect


In [None]:
# We will track only cars (class ID 2 in the COCO label set used by the pre-trained model)
CAR_CLASS_ID = 2
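
Rather than hard-coding the ID, it can be looked up by name. With Ultralytics, the loaded model exposes its label map as `model.names` (a dict of ID to name). The sketch below uses a hard-coded excerpt of the COCO label map so that it runs without downloading weights; `COCO_NAMES_EXCERPT` and `class_ids_for` are illustrative names, not part of the notebook.

```python
# Excerpt of the COCO label map (ID -> name); the full map has 80 entries.
# In the notebook, `model.names` would play the role of this dict.
COCO_NAMES_EXCERPT = {
    0: "person",
    1: "bicycle",
    2: "car",
    3: "motorcycle",
    5: "bus",
    7: "truck",
}


def class_ids_for(names, wanted):
    """Return the sorted class IDs whose label is in `wanted`."""
    wanted = set(wanted)
    return sorted(class_id for class_id, label in names.items() if label in wanted)


car_ids = class_ids_for(COCO_NAMES_EXCERPT, ["car"])
vehicle_ids = class_ids_for(COCO_NAMES_EXCERPT, ["car", "bus", "truck"])
print(car_ids, vehicle_ids)
```

The resulting list can be passed directly as the `classes` argument of `model.predict`, which makes it easy to extend tracking to other vehicle types.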


#### BYTETracker Parameters
- `track_thresh`: Confidence threshold above which a detection is treated as a high-confidence candidate for tracking.
- `track_buffer`: Number of frames for which a lost track is kept before it is discarded.
- `match_thresh`: Threshold used when matching detections to existing tracks between consecutive frames.
- `frame_rate`: Frame rate of the video or sequence being processed.
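
To make the parameters above more concrete, the following sketch mirrors two rules from the reference ByteTrack implementation. It assumes the `bytetracker` package follows that implementation: detections are split into a high- and a low-confidence pool around `track_thresh` (with a fixed lower bound of 0.1), and the number of frames a lost track survives is `track_buffer` scaled by `frame_rate / 30`. The function names are illustrative.

```python
import numpy as np


def split_by_confidence(scores, track_thresh, low_thresh=0.1):
    """Return boolean masks for the high- and low-confidence detection pools.

    Assumption: the 0.1 lower bound matches the reference ByteTrack code.
    """
    scores = np.asarray(scores, dtype=float)
    high = scores > track_thresh
    low = (scores > low_thresh) & (scores < track_thresh)
    return high, low


def max_frames_lost(track_buffer, frame_rate):
    """Frames a lost track is kept, assuming the reference scaling by frame_rate / 30."""
    return int(frame_rate / 30.0 * track_buffer)


high, low = split_by_confidence([0.9, 0.4, 0.12, 0.05], track_thresh=0.15)
buffer_frames = max_frames_lost(track_buffer=3, frame_rate=12)
print(high, low, buffer_frames)
```

Under these assumptions, the values used in this guide (`track_buffer=3`, `frame_rate=12`) keep a lost track for a single frame, so a car hidden for longer than that would receive a new ID when it reappears.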

In [None]:
tracker = BYTETracker(track_thresh=0.15, track_buffer=3, match_thresh=0.85, frame_rate=12)
# Reset the shared track ID counter so that IDs restart when this cell is re-run
BaseTrack._count = 0

In [None]:
model = YOLO(MODEL_WEIGHTS, task="detect")

#### 2. Detection and tracking

In [None]:
all_tracked_objects = []
for frame_id, image_filename in enumerate(available_frames):
    img = cv2.imread(image_filename)
    # Detect cars only; conf matches the tracker's track_thresh of 0.15
    detections = model.predict(img, classes=[CAR_CLASS_ID], conf=0.15, verbose=False)[0]
    detections_bytetrack_format = yolo_results_to_bytetrack_format(detections)
    # Associate this frame's detections with the existing tracks
    tracked_objects = tracker.update(detections_bytetrack_format, frame_id)
    if len(tracked_objects) > 0:
        # Prepend the frame index as the first column of every row
        tracked_objects = np.insert(tracked_objects, 0, frame_id, axis=1)
        all_tracked_objects.append(tracked_objects)
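
Inside `tracker.update`, ByteTrack decides which detection continues which track mainly by comparing bounding-box overlap (intersection over union, IoU). The sketch below covers only that overlap computation with NumPy; the real tracker additionally predicts track positions with a Kalman filter and solves the assignment globally, neither of which is shown. `iou_matrix` and the example boxes are illustrative.

```python
import numpy as np


def iou_matrix(boxes_a, boxes_b):
    """Pairwise IoU between two sets of (x1, y1, x2, y2) boxes."""
    a = np.asarray(boxes_a, dtype=float).reshape(-1, 4)
    b = np.asarray(boxes_b, dtype=float).reshape(-1, 4)
    # Corners of every pairwise intersection, shape (len(a), len(b), 2)
    top_left = np.maximum(a[:, None, :2], b[None, :, :2])
    bottom_right = np.minimum(a[:, None, 2:], b[None, :, 2:])
    # Negative extents mean the boxes do not overlap
    wh = np.clip(bottom_right - top_left, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    # Guard against division by zero for degenerate boxes
    return np.where(union > 0, inter / np.maximum(union, 1e-12), 0.0)


# One existing track and three new detections (made-up coordinates)
track_boxes = [[0, 0, 10, 10]]
new_boxes = [[0, 0, 10, 10], [5, 0, 15, 10], [20, 20, 30, 30]]
ious = iou_matrix(track_boxes, new_boxes)
print(ious)
```

A detection with a high IoU against a track's box is a likely continuation of that track, while a pair with zero overlap is never matched.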

#### Scaling the bounding boxes to match with original image size 

In [None]:
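# Column order follows the tracker output, with frame_id prepended in the tracking loop
df_tracked = pd.DataFrame(np.concatenate(all_tracked_objects), columns=["frame_id", "x1", "y1", "x2", "y2", "track_id", "class", "confidence"])
# `detections` is the result for the last processed frame; all frames share the same original shape
df_tracked[["x1", "y1", "x2", "y2"]] = df_tracked[["x1", "y1", "x2", "y2"]].apply(
    lambda x: scale_bbox_as_xyxy(x[0:4], detections.orig_shape), axis=1, result_type="expand"
    )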
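
The `scale_bbox_as_xyxy` helper lives in `utils` and is not shown in this guide, so the following is not its implementation. It is only a hypothetical sketch of what rescaling a box between two image sizes can look like, assuming shapes are given as `(height, width)` and that coordinates should be clamped to the target image. The name `rescale_xyxy` is made up for illustration.

```python
def rescale_xyxy(box, from_shape, to_shape):
    """Map an (x1, y1, x2, y2) box from one image size to another.

    Hypothetical helper: shapes are (height, width); the result is clamped
    to the bounds of the target image.
    """
    from_h, from_w = from_shape
    to_h, to_w = to_shape
    scale_x = to_w / from_w
    scale_y = to_h / from_h

    def clamp(value, upper):
        return min(max(float(value), 0.0), float(upper))

    x1, y1, x2, y2 = box
    return (
        clamp(x1 * scale_x, to_w),
        clamp(y1 * scale_y, to_h),
        clamp(x2 * scale_x, to_w),
        clamp(y2 * scale_y, to_h),
    )


# A box in a 640x360 image mapped onto a 1280x720 image
rescaled = rescale_xyxy((64, 36, 320, 180), from_shape=(360, 640), to_shape=(720, 1280))
# A box that partly leaves the image is clamped to its borders
clamped = rescale_xyxy((-10, 0, 700, 100), from_shape=(360, 640), to_shape=(720, 1280))
print(rescaled, clamped)
```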


#### 3. Visualization of Tracking

In [None]:
# H264 support depends on the OpenCV build; 'mp4v' is a common fallback codec
fourcc = cv2.VideoWriter_fourcc(*'H264')
OUTPUT_WITH_BBOX = "videos/traffic_tracked.mp4"
# Frame rate (12) and frame size (width, height) must match the extracted frames
out = cv2.VideoWriter(OUTPUT_WITH_BBOX, fourcc, 12, (1280, 720))
for frame_id, image_filename in enumerate(available_frames):
    image = cv2.imread(image_filename)
    # Draw boxes only on frames that have at least one tracked object
    if frame_id in df_tracked.frame_id.astype('int').values:
        df_current_frame = df_tracked[df_tracked.frame_id == frame_id][["x1", "y1", "x2", "y2", "track_id", "class", "confidence"]].to_numpy()
        image = draw_all_bbox_on_image(image, df_current_frame)
    out.write(image)
out.release()
print("Video with bounding boxes saved at:", OUTPUT_WITH_BBOX)
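
The frame size passed to `cv2.VideoWriter` is given as `(width, height)`, whereas image arrays loaded by OpenCV have shape `(height, width, channels)`. If the two do not agree, OpenCV will typically produce an empty or unreadable file without raising an error. A minimal sketch of deriving the size from a frame instead of hard-coding it is shown below; `writer_frame_size` is an illustrative name and the zero-filled array stands in for a real frame.

```python
import numpy as np


def writer_frame_size(frame):
    """Return (width, height) for a frame array of shape (height, width[, channels])."""
    height, width = frame.shape[:2]
    return (width, height)


# Stand-in for cv2.imread(available_frames[0]) on a 1280x720 video
dummy_frame = np.zeros((720, 1280, 3), dtype=np.uint8)
size = writer_frame_size(dummy_frame)
print(size)
```

In the cell above, the result of `writer_frame_size(cv2.imread(available_frames[0]))` could replace the literal `(1280, 720)` so the code also works for videos with other resolutions.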

In [None]:
print("Number of tracked objects (unique track IDs):", df_tracked.track_id.nunique())
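
Beyond the total count, the same table can show how long each track lasted, which helps spot short-lived IDs caused by missed detections. The sketch below uses a small synthetic table with the same `frame_id` and `track_id` columns as `df_tracked`; the values are made up for illustration.

```python
import pandas as pd

# Synthetic stand-in for df_tracked (only the two columns needed here)
toy_tracks = pd.DataFrame(
    {
        "frame_id": [0, 1, 2, 1, 2, 5],
        "track_id": [1, 1, 1, 2, 2, 3],
    }
)

# First frame, last frame, and number of frames seen for every track
track_summary = (
    toy_tracks.groupby("track_id")["frame_id"]
    .agg(first_frame="min", last_frame="max", n_frames="count")
    .reset_index()
)
n_tracks = toy_tracks["track_id"].nunique()

print(track_summary)
print("Unique tracks:", n_tracks)
```

Tracks that appear in only one or two frames are often fragments of a longer trajectory, so the number of unique IDs tends to overestimate the number of distinct cars.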

In [None]:
video_path = "videos/traffic_tracked.mp4"
display(Video(video_path, embed=True, width=800))