<h1>YOLOv5 Person Tracking with Unique ID Assignment</h1>

<p>This notebook demonstrates a YOLOv5-based person tracking system. The goal is to detect individuals in video frames and assign a unique tracking ID to each person, allowing consistent monitoring throughout the video.</p>

<h2>Objective:</h2>
<ul>
    <li>Detect individuals in video frames using YOLOv5.</li>
    <li>Assign a unique ID to each detected person.</li>
    <li>Track each person throughout the video frames based on the assigned ID.</li>
</ul>


<h2>Dataset Description:</h2>
<p>The dataset consists of video recordings where multiple individuals are present. The goal is to track each person throughout the video. Bounding boxes around each individual are generated and a unique ID is assigned to track their movements.</p>

<p>For privacy reasons, the dataset is not publicly available.</p>


In [None]:
from google.colab import files
file = files.upload()

Saving video.mp4 to video.mp4


In [None]:
!pip install tqdm




In [None]:
import cv2
import torch
import numpy as np
import matplotlib.pyplot as plt
from torchvision import models, transforms
from collections import deque
from tqdm import tqdm

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print(device)

cuda


<h2>YOLOv5 Model Description:</h2>
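<p>The detector locates individuals in each video frame, and a tracker then assigns each detection a unique ID and follows it across frames. Note that although this notebook is framed around YOLOv5, the cell below loads torchvision's Faster R-CNN (ResNet-50 FPN backbone) with pretrained COCO weights, so every detection in this notebook comes from that model.</p>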


In [None]:
model = models.detection.fasterrcnn_resnet50_fpn(pretrained=True).to(device)
model.eval()


Downloading: "https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth" to /root/.cache/torch/hub/checkpoints/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth
100%|██████████| 160M/160M [00:02<00:00, 81.6MB/s]


FasterRCNN(
  (transform): GeneralizedRCNNTransform(
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
      Resize(min_size=(800,), max_size=1333, mode='bilinear')
  )
  (backbone): BackboneWithFPN(
    (body): IntermediateLayerGetter(
      (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (bn1): FrozenBatchNorm2d(64, eps=0.0)
      (relu): ReLU(inplace=True)
      (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (layer1): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(64, eps=0.0)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(64, eps=0.0)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(256, eps=0.0)
          (relu): ReLU(

<h2>Pre-processing:</h2>
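<p>Before a video frame is passed to the detection model, the following pre-processing applies:</p>
<ul>
    <li>Each frame is converted to a float tensor with values in [0, 1] and given a batch dimension.</li>
    <li>Resizing (shorter side to 800 pixels, longer side capped at 1333) and normalization happen inside the model, in the GeneralizedRCNNTransform shown in the model summary above.</li>
    <li>No data augmentation is applied, because the pretrained model is used for inference only.</li>
</ul>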
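
<p>OpenCV delivers frames as height x width x channel arrays in BGR order with 0-255 values, whereas torchvision detection models expect RGB tensors laid out as channel x height x width with values in [0, 1]. The sketch below is a pure-Python illustration of that layout change on a tiny made-up image. The function name and the sample pixels are hypothetical and are not part of the notebook's pipeline.</p>

```python
def bgr_hwc_to_rgb_chw(frame):
    """Convert a BGR height x width x channel image with 0-255 values
    into an RGB channel x height x width image with values in [0, 1]."""
    height, width = len(frame), len(frame[0])
    # OpenCV stores channels as (B, G, R); read them back in the order R, G, B.
    return [
        [[frame[y][x][c] / 255.0 for x in range(width)] for y in range(height)]
        for c in (2, 1, 0)
    ]


# Hypothetical 1 x 2 image: one pure-blue pixel and one pure-red pixel (BGR order).
tiny_frame = [[[255, 0, 0], [0, 0, 255]]]
rgb_chw = bgr_hwc_to_rgb_chw(tiny_frame)

print(rgb_chw[0])  # red channel
print(rgb_chw[2])  # blue channel
```

<p>In the notebook itself, the same effect would come from a BGR-to-RGB conversion with OpenCV followed by torchvision's ToTensor.</p>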


In [None]:

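def preprocess_frame(frame):
    # OpenCV reads frames as BGR; the torchvision detector expects RGB.
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # ToTensor converts an HxWxC uint8 image to a CxHxW float tensor in [0, 1].
    tensor = transforms.ToTensor()(rgb_frame)
    # Add a batch dimension and move the tensor to the selected device.
    return tensor.unsqueeze(0).to(device)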


In [None]:

def detect_persons(frame, model, threshold=0.8):
    preprocessed_frame = preprocess_frame(frame)
    with torch.no_grad():
        predictions = model(preprocessed_frame)[0]

    boxes = predictions['boxes'].cpu().numpy()
    labels = predictions['labels'].cpu().numpy()
    scores = predictions['scores'].cpu().numpy()

    # Keep only confident detections of the COCO "person" class (label 1);
    # the model also returns other classes such as cars or dogs.
    indices = np.where((scores > threshold) & (labels == 1))[0]

    return boxes[indices], labels[indices], scores[indices]
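
<p>The detector returns boxes for every COCO class it recognizes, so a person tracker needs to filter on the class label as well as on the confidence score. The following is a minimal pure-Python sketch of that filtering on made-up detections. The helper name and sample values are hypothetical; label 1 is the "person" index in the COCO label map used by torchvision detection models.</p>

```python
PERSON_LABEL = 1  # COCO class index for "person" in torchvision detection models


def filter_person_detections(boxes, labels, scores, threshold=0.8):
    """Return the boxes and scores of detections that are persons
    and whose confidence exceeds the threshold."""
    keep = [
        i for i, (label, score) in enumerate(zip(labels, scores))
        if label == PERSON_LABEL and score > threshold
    ]
    return [boxes[i] for i in keep], [scores[i] for i in keep]


# Hypothetical detections: a confident person, a confident dog, an uncertain person.
boxes = [[10, 20, 50, 120], [60, 30, 90, 80], [200, 40, 240, 160]]
labels = [1, 18, 1]  # 18 is "dog" in the same label map
scores = [0.95, 0.99, 0.60]

person_boxes, person_scores = filter_person_detections(boxes, labels, scores)
print(person_boxes, person_scores)
```

<p>Only the first detection survives: the dog is rejected by the label test and the third box by the score threshold.</p>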


In [None]:

class Tracker:
    def __init__(self, max_lost=30, max_distance=50):
        self.next_id = 1
        self.tracks = {}
        self.max_lost = max_lost          # frames a track may stay unmatched before removal
        self.max_distance = max_distance  # max centroid shift (pixels) to keep the same ID

    def update(self, boxes):
        new_tracks = {}
        for box in boxes:
            centroid = (int((box[0] + box[2]) / 2), int((box[1] + box[3]) / 2))
            # Pick the nearest existing track that no other box has claimed this frame.
            best_id, best_dist = None, self.max_distance
            for track_id, track in self.tracks.items():
                if track_id in new_tracks:
                    continue
                dist = np.linalg.norm(np.array(track['centroid']) - np.array(centroid))
                if dist < best_dist:
                    best_id, best_dist = track_id, dist
            if best_id is None:
                # No track is close enough, so start a new one.
                best_id = self.next_id
                self.next_id += 1
            new_tracks[best_id] = {'centroid': centroid, 'box': box, 'lost': 0}

        # Carry unmatched tracks forward until they have been lost for max_lost frames.
        for track_id, track in self.tracks.items():
            if track_id not in new_tracks and track['lost'] < self.max_lost:
                track['lost'] += 1
                new_tracks[track_id] = track

        self.tracks = new_tracks
        return self.tracks
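
<p>The tracker above associates detections with tracks by centroid distance, using a fixed pixel threshold that depends on the video resolution and on how fast people move. A commonly used alternative, which is not what this notebook does, is to match on box overlap (intersection over union, IoU). The sketch below shows a greedy IoU matcher in pure Python. The function names, the 0.3 overlap threshold, and the sample boxes are illustrative assumptions.</p>

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_by_iou(track_boxes, detections, min_iou=0.3):
    """Greedily assign each detection to the unclaimed track with the highest IoU.

    track_boxes maps track_id -> last known box. Returns a dict mapping each
    detection index to a track_id, or to None when no track overlaps enough.
    """
    assignments = {}
    claimed = set()
    for det_index, det_box in enumerate(detections):
        best_id, best_iou = None, min_iou
        for track_id, track_box in track_boxes.items():
            if track_id in claimed:
                continue
            overlap = iou(track_box, det_box)
            if overlap > best_iou:
                best_id, best_iou = track_id, overlap
        if best_id is not None:
            claimed.add(best_id)
        assignments[det_index] = best_id
    return assignments


# Hypothetical state: two existing tracks and three detections in the next frame.
track_boxes = {1: (0, 0, 10, 10), 2: (100, 100, 120, 120)}
detections = [(102, 101, 122, 121), (1, 1, 11, 11), (300, 300, 320, 320)]
assignments = match_by_iou(track_boxes, detections)
print(assignments)
```

<p>A detection mapped to None would start a new track. Greedy assignment is order-dependent; a globally optimal assignment would need a method such as the Hungarian algorithm.</p>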


In [None]:
video_path = '/content/video.mp4'
cap = cv2.VideoCapture(video_path)

output_path = '/content/output_video.mp4'
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
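# Write the output at the source frame rate (fall back to 30 fps if it is unavailable).
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
frame_size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
out = cv2.VideoWriter(output_path, fourcc, fps, frame_size)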

tracker = Tracker()

total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

for _ in tqdm(range(total_frames), desc="Processing frames"):
    ret, frame = cap.read()
    if not ret:
        break

    boxes, labels, scores = detect_persons(frame, model)
    tracks = tracker.update(boxes)

    for track_id, track in tracks.items():
        # Skip tracks that were not matched in this frame so stale boxes are not drawn.
        if track['lost'] > 0:
            continue
        box = track['box']
        cv2.rectangle(frame, (int(box[0]), int(box[1])), (int(box[2]), int(box[3])), (0, 255, 0), 2)
        cv2.putText(frame, f"ID: {track_id}", (int(box[0]), int(box[1]) - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

    out.write(frame)

cap.release()
out.release()

Processing frames: 100%|██████████| 331/331 [00:45<00:00,  7.29it/s]


<h2>Results:</h2>
<p>The tracking system detects individuals in the video frames and assigns each one a unique ID that follows them over time. In the run above, all 331 frames were processed at roughly 7 frames per second.</p>

<p>For the video demo, see the main README.md file of the GitHub repository.</p>