# Object Detection and Tracking Tutorial
In this tutorial, we will visualize object detection and tracking labels for a video file. We will draw the labeled objects on the video frames and save the output as an annotated video.
## Steps Involved:
1. Importing the required packages
2. Displaying the video
3. Reading metadata from the video
4. Extracting and displaying frames from the video
5. Annotating frames with labels
6. Coloring objects by category
7. Adding text labels
8. Saving the annotated video

### Important Note ⚠️
**Do not change the name of the video file**. The labels are extracted from a CSV file that contains annotations for multiple videos, and the video name is used to match the labels with the correct video. Make sure to use the video file with its original name (e.g., `026c7465-309f6d33.mp4`).

### Step 1: Import Important Packages
We will import the necessary packages: pandas, NumPy, OpenCV, and Matplotlib. We will also use IPython to display the video within the notebook, `tqdm` for progress bars, and `subprocess` for running external commands.

In [None]:
import pandas as pd
import numpy as np
import cv2
import matplotlib.pyplot as plt
from glob import glob
import IPython.display as ipd
from tqdm import tqdm
import subprocess
plt.style.use('ggplot')

### Step 2: Display Video
To display the video, we use IPython's display functionality.

In [None]:
ipd.Video('026c7465-309f6d33.mp4', width=700)

### Step 3: Read Video Metadata
We will open the video file using OpenCV, and retrieve its properties like total number of frames, height, width, and frames per second (FPS).

In [None]:
cap = cv2.VideoCapture('026c7465-309f6d33.mp4')
# Get total number of frames (cap.get returns floats, so cast to int)
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
# Get video height and width
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
# Get frames per second
fps = cap.get(cv2.CAP_PROP_FPS)
cap.release()
print(f'Frames: {frame_count}, Height: {height}, Width: {width}, FPS: {fps}')
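
As a quick sanity check on these values, the clip's duration follows from the frame count and the FPS. The sketch below is only an illustration: `video_duration_seconds` is a hypothetical helper, and the numbers in the usage line are made up rather than read from the tutorial video.

```python
def video_duration_seconds(frame_count: float, fps: float) -> float:
    """Return the approximate duration in seconds, or 0.0 if FPS is unavailable."""
    # cap.get() returns 0.0 for properties it cannot read, so guard the division.
    if fps <= 0:
        return 0.0
    return frame_count / fps

# Illustrative values only (not taken from the tutorial video).
print(video_duration_seconds(1200, 60.0))  # 20.0
```

In the notebook, the same helper could be called with the `frame_count` and `fps` values read above.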

### Step 4: Extract and Display Video Frames
We will now extract a frame from the video and display it using Matplotlib. Because OpenCV reads images in BGR order, the helper converts them to RGB before plotting.

In [None]:
# Helper function to display OpenCV images in a notebook
def display_cv2_img(img, figsize=(10, 10)):
    img_ = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    fig, ax = plt.subplots(figsize=figsize)
    ax.imshow(img_)
    ax.axis('off')

# Extract and display a frame
cap = cv2.VideoCapture('026c7465-309f6d33.mp4')
ret, img = cap.read()
cap.release()
if ret:
    display_cv2_img(img)

### Step 5: Annotate Frames with Labels
We will annotate the detected objects with bounding boxes using OpenCV, and add category labels to the video frames.

In [None]:
# Read the labels from CSV
labels = pd.read_csv('labels.csv', low_memory=False)
video_labels = labels.query('videoName == "026c7465-309f6d33"').reset_index(drop=True)
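# Map each label's frameIndex to a frame number in the video (about 11.9 video frames per labeled frame)
video_labels['video_frame'] = (video_labels['frameIndex'] * 11.9).round().astype(int)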
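
The multiplication above maps each label's `frameIndex` to a frame number in the video, which implies that the labels are sparser than the video frames. A minimal pure-Python sketch of the same arithmetic follows; `label_to_video_frame` is a hypothetical helper, and its default factor of 11.9 is simply the constant used above.

```python
def label_to_video_frame(frame_index: int, frames_per_label: float = 11.9) -> int:
    """Map a label frame index to the nearest video frame number."""
    return int(round(frame_index * frames_per_label))

# Label index 87 maps to video frame 1035, the frame number used later in this step.
print(label_to_video_frame(87))  # 1035
```

Under this mapping, consecutive label indices land roughly 12 video frames apart, so most video frames have no labels of their own.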


In [None]:
video_labels["category"].value_counts()

We will annotate and display frame 1035 (chosen arbitrarily).

In [None]:

cap = cv2.VideoCapture('026c7465-309f6d33.mp4')
for frame in range(int(cap.get(cv2.CAP_PROP_FRAME_COUNT))):
    ret, img = cap.read()
    if frame == 1035:
        break
cap.release()
# Annotate the frame
frame_labels = video_labels.query('video_frame == 1035')
for _, d in frame_labels.iterrows():
    pt1 = int(d['box2d.x1']), int(d['box2d.y1'])
    pt2 = int(d['box2d.x2']), int(d['box2d.y2'])
    cv2.rectangle(img, pt1, pt2, (0, 0, 255), 3)
display_cv2_img(img)

### Step 6: Display Objects Colored by Category

In [None]:
# Colors are BGR tuples, as expected by OpenCV drawing functions
color_map = {
    "car": (0, 0, 255),
    "truck": (0, 0, 100),
    "pedestrian": (255, 0, 0),
    "other vehicle": (0, 0, 150),
    "rider": (200, 0, 0),
    "bicycle": (0, 255, 0),
    "other person": (200, 0, 0),
    "trailer": (0, 150, 150),
    "motorcycle": (0, 150, 0),
    "bus": (0, 0, 100),
}

img_example = img.copy()
frame_labels = video_labels.query('video_frame == 1035')
for i, d in frame_labels.iterrows():
    pt1 = int(d['box2d.x1']), int(d['box2d.y1'])
    pt2 = int(d['box2d.x2']), int(d['box2d.y2'])
    color = color_map[d['category']]
    cv2.rectangle(img_example, pt1, pt2, color, 3)

display_cv2_img(img_example)
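
Indexing `color_map[...]` raises a `KeyError` if the CSV contains a category that is missing from the dictionary. One defensive option, sketched below with a shortened map, is `dict.get` with a fallback color. The names `example_color_map`, `DEFAULT_COLOR`, and `color_for` are hypothetical and not part of the original notebook.

```python
# Shortened copy of the category-to-color map (colors are BGR, as OpenCV expects).
example_color_map = {
    "car": (0, 0, 255),
    "pedestrian": (255, 0, 0),
}
DEFAULT_COLOR = (255, 255, 255)  # hypothetical fallback: white

def color_for(category: str) -> tuple:
    """Return the BGR color for a category, or the fallback if it is unmapped."""
    return example_color_map.get(category, DEFAULT_COLOR)

print(color_for("car"))            # (0, 0, 255)
print(color_for("traffic light"))  # (255, 255, 255)
```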

### Step 7: Adding Text

In [None]:
# Reuse the frame selected in Step 5 (frame 1035)
frame_labels = video_labels.query("video_frame == 1035")
font = cv2.FONT_HERSHEY_TRIPLEX
img_example = img.copy()
for i, d in frame_labels.iterrows():
    pt1 = int(d["box2d.x1"]), int(d["box2d.y1"])
    pt2 = int(d["box2d.x2"]), int(d["box2d.y2"])
    color = color_map[d["category"]]
    img_example = cv2.rectangle(img_example, pt1, pt2, color, 3)
    # Place the category name just inside the top-left corner of the box
    pt_text = int(d["box2d.x1"]) + 5, int(d["box2d.y1"] + 10)
    img_example = cv2.putText(img_example, d["category"], pt_text, font, 0.5, color)
display_cv2_img(img_example)

### Step 8: Save Annotated Video
We will write the annotations back to the video and save the output using OpenCV's `VideoWriter`. Finally, we compress the video using `ffmpeg`.

In [None]:
# Function to add annotations to a video frame
def add_annotations(img, frame, video_labels):
    # Not every video frame has labels, so use the most recent labeled frame at or before `frame`
    max_frame = video_labels.query('video_frame <= @frame')['video_frame'].max()
    frame_labels = video_labels.query('video_frame == @max_frame')
    for _, d in frame_labels.iterrows():
        pt1 = int(d['box2d.x1']), int(d['box2d.y1'])
        pt2 = int(d['box2d.x2']), int(d['box2d.y2'])
        color = color_map[d["category"]]
        img = cv2.rectangle(img, pt1, pt2, color, 3)
    return img

# Write the annotated frames to a new video.
# The frame size (1280, 720) must match the frames written; 59.94 is the output FPS.
out = cv2.VideoWriter('out_test.mp4', cv2.VideoWriter_fourcc(*'mp4v'), 59.94, (1280, 720))
cap = cv2.VideoCapture('026c7465-309f6d33.mp4')
for frame in tqdm(range(int(cap.get(cv2.CAP_PROP_FRAME_COUNT)))):
    ret, img = cap.read()
    if not ret:
        break
    img = add_annotations(img, frame, video_labels)
    out.write(img)
cap.release()
out.release()
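
The compression step mentioned above is not shown in the code. A minimal sketch using `subprocess` could look like the following, assuming `ffmpeg` is installed and on the `PATH`. The output name `out_test_compressed.mp4`, the `libx264` codec, and the `-crf 18` and `-preset veryfast` settings are illustrative assumptions rather than values confirmed by this tutorial, and `build_ffmpeg_command` is a hypothetical helper.

```python
import os
import shutil
import subprocess

def build_ffmpeg_command(src: str, dst: str, crf: int = 18) -> list:
    """Build an ffmpeg command that re-encodes `src` to H.264 at the given CRF."""
    return [
        "ffmpeg",
        "-y",                  # overwrite the output file if it exists
        "-loglevel", "error",  # only print errors
        "-i", src,
        "-vcodec", "libx264",
        "-crf", str(crf),      # lower CRF means higher quality and larger files
        "-preset", "veryfast",
        dst,
    ]

cmd = build_ffmpeg_command("out_test.mp4", "out_test_compressed.mp4")

# Only run when ffmpeg and the input file are actually available.
if shutil.which("ffmpeg") and os.path.exists("out_test.mp4"):
    subprocess.run(cmd, check=True)
else:
    print("Skipping compression: ffmpeg or out_test.mp4 not found")
```

Passing the command as a list avoids shell quoting issues, and `check=True` raises an error if `ffmpeg` exits with a failure code.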

## Conclusion
In this tutorial, we visualized object detection and tracking labels using OpenCV. We read the annotations from a CSV file, drew them on the video frames, and saved the results in a video file.
[Source](https://www.kaggle.com/code/kirollosashraf/driving-video-object-tracking/notebook#Step1%7C-Import-important-packages) for this tutorial.
