In [None]:
%pip install torch torchvision yolov5 deep_sort_realtime

In [None]:
import cv2
import torch
from yolov5 import YOLOv5
from deep_sort_realtime.deepsort_tracker import DeepSort
# Import the necessary function for displaying images in Colab
from google.colab.patches import cv2_imshow

# Initialize YOLOv5 model
model_path = 'yolov5s.pt'  # Adjust path to your YOLOv5 model file
device = 'cuda' if torch.cuda.is_available() else 'cpu'
yolo_model = YOLOv5(model_path, device)

# Initialize Deep SORT tracker
tracker = DeepSort(max_age=30, n_init=3, nn_budget=100)

# Open video file
cap = cv2.VideoCapture('/content/5538137-hd_1920_1080_25fps.mp4')

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Run object detection. OpenCV decodes frames as BGR, while YOLOv5 expects RGB arrays.
    results = yolo_model.predict(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

    # Prepare detections for the tracker
    detections = []
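    for *xyxy, conf, cls in results.pred[0]:
        # Deep SORT expects ([left, top, width, height], confidence, class)
        bbox = [int(xyxy[0]), int(xyxy[1]), int(xyxy[2]) - int(xyxy[0]), int(xyxy[3]) - int(xyxy[1])]
        detections.append((bbox, float(conf), int(cls)))  # plain Python numbers, not tensors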

    # Pass detections to the tracker
    tracks = tracker.update_tracks(detections, frame=frame)

    # Loop over the tracks and draw boxes with IDs
    for track in tracks:
        if not track.is_confirmed():
            continue
        track_id = track.track_id
        bbox = track.to_ltrb()  # Bounding box as (left, top, right, bottom) = (x1, y1, x2, y2)

        # Draw bounding box and label on the frame
        cv2.rectangle(frame, (int(bbox[0]), int(bbox[1])), (int(bbox[2]), int(bbox[3])), (0, 255, 0), 2)
        cv2.putText(frame, f"ID: {track_id}", (int(bbox[0]), int(bbox[1]) - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

    # Display the frame using cv2_imshow (cv2.imshow is not supported in Colab)
    cv2_imshow(frame)

    # Quit on 'q'; this only has an effect in a desktop OpenCV window, not in Colab
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

**Import Required Libraries:**

cv2: OpenCV library for handling video and image processing.

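torch: PyTorch, used here to check whether a CUDA-capable GPU is available.

YOLOv5: Wrapper class from the yolov5 package that loads a YOLOv5 model for object detection.

DeepSort: Multi-object tracker from deep_sort_realtime. It combines Kalman-filter motion prediction with a deep appearance (association) metric to link detections across frames.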

cv2_imshow: Colab-specific function for displaying frames.


**Load YOLOv5 Model:**


```python
model_path = 'yolov5s.pt'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
yolo_model = YOLOv5(model_path, device)
```


model_path: Specifies the path to the pretrained YOLOv5 model file (yolov5s.pt), which is used for detecting objects in each frame.

device: Determines whether to use the GPU (cuda) or CPU (cpu) based on availability.

yolo_model: Instantiates the YOLOv5 model on the specified device.


**Initialize Deep SORT Tracker:**


```python
tracker = DeepSort(max_age=30, n_init=3, nn_budget=100)
```

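max_age: Maximum number of consecutive frames a track may go without a matched detection before it is deleted.

n_init: Number of frames in a row a new track must be matched before it is confirmed; until then it is tentative and is deleted as soon as it misses a frame.

nn_budget: Maximum number of appearance embeddings kept per track (the gallery used for appearance matching), which caps memory use.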
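To make max_age and n_init concrete, here is a toy model of a single track's life cycle. It is a simplified sketch modelled on the Track class of the original Deep SORT reference implementation; `TrackLifecycle` and its `step` method are hypothetical names, and deep_sort_realtime's internals may differ in detail.

```python
from dataclasses import dataclass

TENTATIVE, CONFIRMED, DELETED = "tentative", "confirmed", "deleted"


@dataclass
class TrackLifecycle:
    """Toy model of how n_init and max_age govern one Deep SORT track."""
    n_init: int = 3
    max_age: int = 30
    state: str = TENTATIVE
    hits: int = 1               # the detection that created the track is the first hit
    time_since_update: int = 0  # frames since the last matched detection

    def step(self, matched: bool) -> str:
        """Advance one frame; `matched` says whether a detection was assigned to this track."""
        if self.state == DELETED:
            return self.state
        if matched:
            self.hits += 1
            self.time_since_update = 0
            if self.state == TENTATIVE and self.hits >= self.n_init:
                self.state = CONFIRMED
        else:
            self.time_since_update += 1
            # Tentative tracks die on their first miss; confirmed ones after max_age misses.
            if self.state == TENTATIVE or self.time_since_update > self.max_age:
                self.state = DELETED
        return self.state
```

With n_init=3, an object has to be matched on three consecutive frames (the one that created the track plus two more) before the notebook draws it. With max_age=30, a confirmed track survives up to 30 missed frames, about 1.2 s of the 25 fps example video.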


**Open the Video File:**


```python
cap = cv2.VideoCapture('/content/5538137-hd_1920_1080_25fps.mp4')
```

cap: Video capture object to read the video file frame-by-frame.

You can replace the path with any video file in your Colab workspace.

**Process Each Frame:**

The while cap.isOpened() loop keeps running while the capture is open and handles one frame per iteration.

ret, frame = cap.read(): Reads the next frame; ret is False if no frame could be read.

if not ret: break: Exits the loop at the end of the video (or on a read error).
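The read loop above can also be wrapped in a generator, which makes it easy to cap the number of processed frames. This matters in Colab, where pressing 'q' cannot stop the loop. A minimal sketch; `iter_frames` and `max_frames` are hypothetical names, and the only assumption about `cap` is that it offers VideoCapture's `isOpened()` and `read()` methods.

```python
from typing import Any, Iterator, Optional


def iter_frames(cap: Any, max_frames: Optional[int] = None) -> Iterator[Any]:
    """Yield frames from `cap` until it is exhausted or `max_frames` have been yielded.

    `cap` is any object with OpenCV VideoCapture's isOpened()/read() interface.
    """
    count = 0
    while cap.isOpened():
        if max_frames is not None and count >= max_frames:
            break
        ret, frame = cap.read()
        if not ret:  # end of file or read error
            break
        yield frame
        count += 1


# Usage in the notebook (hypothetical limit of 250 frames, about 10 s at 25 fps):
# for frame in iter_frames(cap, max_frames=250):
#     results = yolo_model.predict(...)
```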


**Run Object Detection:**


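```python
results = yolo_model.predict(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
```

Detects objects in the current frame. OpenCV delivers frames in BGR channel order, while YOLOv5 expects RGB images, so the frame is converted first. The results include bounding box coordinates, confidence scores, and class labels.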
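The notebook passes every detection to the tracker. If you only care about confident detections of certain classes, a post-filter like the one below could run before the tracker. It is a sketch: `filter_detections` is a hypothetical helper, the 0.4 threshold is an arbitrary assumption, and it only relies on each row following the `[x1, y1, x2, y2, confidence, class]` layout of `results.pred[0]`. For yolov5s.pt, which is trained on COCO, class 0 is "person".

```python
from typing import Iterable, List, Optional, Sequence, Set

Row = Sequence[float]  # [x1, y1, x2, y2, confidence, class_id]


def filter_detections(rows: Iterable[Row], min_conf: float = 0.4,
                      keep_classes: Optional[Set[int]] = None) -> List[Row]:
    """Keep rows whose confidence is >= min_conf and whose class is in keep_classes (if given)."""
    kept = []
    for row in rows:
        conf, cls = float(row[4]), int(row[5])  # float()/int() also accept 0-d tensors
        if conf < min_conf:
            continue
        if keep_classes is not None and cls not in keep_classes:
            continue
        kept.append(row)
    return kept


# Usage (hypothetical): track only people
# rows = filter_detections(results.pred[0], min_conf=0.4, keep_classes={0})
```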

**Prepare Detections for Tracking:**


```python
detections = []
for *xyxy, conf, cls in results.pred[0]:
    bbox = [int(xyxy[0]), int(xyxy[1]), int(xyxy[2]) - int(xyxy[0]), int(xyxy[3]) - int(xyxy[1])]
    detections.append((bbox, float(conf), int(cls)))
```

results.pred[0] holds the detections for the input frame as an N×6 tensor, one row per object: x1, y1, x2, y2, confidence, class.

bbox: Converts the corner coordinates to (left, top, width, height), the format deep_sort_realtime expects.

detections: Collects a (bbox, confidence, class) tuple for each detected object; confidence and class are converted from tensors to plain Python numbers.
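The conversion step above, pulled out into small testable functions. This is a sketch with hypothetical helper names (`xyxy_to_ltwh`, `to_deepsort_detections`); it mirrors the notebook by truncating each coordinate to an integer before computing width and height, and produces the `([left, top, w, h], confidence, detection_class)` tuples that the deep_sort_realtime README documents for `update_tracks`.

```python
from typing import Iterable, List, Sequence, Tuple


def xyxy_to_ltwh(xyxy: Sequence[float]) -> List[int]:
    """Convert corner coordinates (x1, y1, x2, y2) to [left, top, width, height]."""
    x1, y1, x2, y2 = (int(v) for v in xyxy[:4])
    return [x1, y1, x2 - x1, y2 - y1]


def to_deepsort_detections(rows: Iterable[Sequence[float]]) -> List[Tuple[List[int], float, int]]:
    """Build (ltwh, confidence, class) tuples from [x1, y1, x2, y2, conf, cls] rows."""
    return [(xyxy_to_ltwh(r[:4]), float(r[4]), int(r[5])) for r in rows]


# Usage in the notebook:
# detections = to_deepsort_detections(results.pred[0])
```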

**Track Objects Using Deep SORT:**


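```python
tracks = tracker.update_tracks(detections, frame=frame)
```

update_tracks: Deep SORT predicts where each existing track should be with a Kalman filter, matches the new detections to tracks using motion and appearance cues, and keeps a persistent ID for each object across frames. Because no embeddings are passed in, the tracker's built-in embedder computes them from the detection crops in frame.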
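The appearance side of that matching, and the role of nn_budget, can be illustrated with a per-track gallery of embeddings. In Deep SORT the appearance cost between a track and a detection is the smallest cosine distance between the detection's embedding and the embeddings stored for that track. The sketch below shows only that part; it omits Kalman gating and the assignment step, and `AppearanceGallery` and `cosine_distance` are hypothetical names, not deep_sort_realtime APIs.

```python
import math
from collections import deque
from typing import Deque, List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


class AppearanceGallery:
    """Recent embeddings of one track, capped at `budget` samples (cf. nn_budget)."""

    def __init__(self, budget: int = 100):
        self.samples: Deque[List[float]] = deque(maxlen=budget)

    def add(self, embedding: Sequence[float]) -> None:
        self.samples.append(list(embedding))  # the oldest sample is evicted when full

    def distance(self, embedding: Sequence[float]) -> float:
        """Smallest cosine distance to any stored sample (inf if the gallery is empty)."""
        if not self.samples:
            return math.inf
        return min(cosine_distance(s, embedding) for s in self.samples)
```

A larger budget lets a track remember more past appearances (useful after occlusions) at the cost of memory and distance computations per frame.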


**Draw Bounding Boxes and Track IDs:**


```python
for track in tracks:
    if not track.is_confirmed():
        continue
    track_id = track.track_id
    bbox = track.to_ltrb()  # Bounding box as (left, top, right, bottom) = (x1, y1, x2, y2)

    # Draw bounding box and label on the frame
    cv2.rectangle(frame, (int(bbox[0]), int(bbox[1])), (int(bbox[2]), int(bbox[3])), (0, 255, 0), 2)
    cv2.putText(frame, f"ID: {track_id}", (int(bbox[0]), int(bbox[1]) - 10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
```

Tentative tracks, which have not yet been matched for n_init frames, are skipped. For each confirmed track, the code draws its bounding box and labels it with the ID that Deep SORT assigned.

track.to_ltrb(): Returns the track's current box as (left, top, right, bottom), i.e. (x1, y1, x2, y2), the corner format that cv2.rectangle uses.
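Two small refinements to the drawing step, as a sketch: a stable color per track ID (so different objects are easier to tell apart than with one shared green), and a label position that does not slip above the top edge of the frame when a box touches it. Both helpers are hypothetical; the 64..255 channel range and the 15 px minimum text baseline are arbitrary assumptions for a 0.6-scale Hershey font.

```python
import zlib
from typing import Tuple


def color_for_id(track_id) -> Tuple[int, int, int]:
    """Stable BGR color for a track ID (same ID gives the same color on every run)."""
    h = zlib.crc32(str(track_id).encode("utf-8"))  # unlike hash(), not randomized per process
    # Keep each channel in 64..255 so boxes stay visible on dark footage.
    return (64 + h % 192, 64 + (h >> 8) % 192, 64 + (h >> 16) % 192)


def label_origin(x1: float, y1: float, offset: int = 10, min_y: int = 15) -> Tuple[int, int]:
    """Bottom-left text origin `offset` px above the box, clamped to stay inside the frame."""
    return int(x1), max(int(y1) - offset, min_y)


# Usage inside the track loop:
# color = color_for_id(track_id)
# cv2.rectangle(frame, (int(bbox[0]), int(bbox[1])), (int(bbox[2]), int(bbox[3])), color, 2)
# cv2.putText(frame, f"ID: {track_id}", label_origin(bbox[0], bbox[1]),
#             cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
```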

**Display Frame with Detections and Tracks:**


```python
cv2_imshow(frame)
```

Displays the processed frame with bounding boxes and object IDs in Colab.
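Each cv2_imshow call appends a new image to the cell output, so showing every frame of a 25 fps clip quickly produces thousands of full-HD images. One option, sketched below under the assumption that a few updates per second are enough, is to display only every n-th frame and clear the previous image first. `display_stride` and `should_display` are hypothetical helpers; `clear_output` comes from IPython.display.

```python
def display_stride(source_fps: float, display_fps: float) -> int:
    """Number of frames between displays so that roughly `display_fps` frames are shown per second."""
    if source_fps <= 0 or display_fps <= 0:
        raise ValueError("fps values must be positive")
    return max(1, round(source_fps / display_fps))


def should_display(frame_index: int, stride: int) -> bool:
    """True for frame 0, stride, 2*stride, ..."""
    return frame_index % stride == 0


# Usage in the notebook loop (frame_idx counts processed frames):
# from IPython.display import clear_output
# stride = display_stride(cap.get(cv2.CAP_PROP_FPS) or 25, 2)
# if should_display(frame_idx, stride):
#     clear_output(wait=True)  # replace the previous image instead of appending
#     cv2_imshow(frame)
```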

**Exit Condition:**


```python
if cv2.waitKey(1) & 0xFF == ord('q'):
    break
```

Lets the user stop processing by pressing the 'q' key. This only works when frames are shown in a desktop OpenCV window (cv2.imshow); in Colab there is no such window, so the check has no effect.

**Release Video Capture and Close Windows:**


```python
cap.release()
cv2.destroyAllWindows()
```

Ensures resources are released and any OpenCV windows are closed (mainly for non-Colab environments).
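If the loop raises an exception midway (for example, a CUDA out-of-memory error), the cap.release() line is never reached. A context manager guarantees the release. This is a sketch with a hypothetical `released` helper; it works for any object with a release() method, which includes both cv2.VideoCapture and cv2.VideoWriter.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def released(resource: Any) -> Iterator[Any]:
    """Yield `resource` and call its release() on exit, even if the body raises."""
    try:
        yield resource
    finally:
        resource.release()


# Usage in the notebook:
# with released(cv2.VideoCapture('/content/5538137-hd_1920_1080_25fps.mp4')) as cap:
#     while cap.isOpened():
#         ...
```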

