# **Object Tracking and Video Generation - CSI4533 Project**

This notebook implements a simplified version of the ByteTrack algorithm for object tracking as part of the CSI4533 university class. The project was completed by:
**Bruno Kazadi** (Student #: 300210848)

The implementation tracks objects across video frames using an IoU-based approach and creates visualizations of the tracking results. The system is designed to run in Google Colab for easy access to GPU resources.

## Project Overview

This project implements a simplified version of the ByteTrack algorithm and tests it on a sequence from the vivaFF database. The tracking performance is evaluated using the ID-switch metric, which counts how many times the tracker chooses the wrong box during the association phase.
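
The course's evaluation script is not part of this notebook, so the following is only a minimal sketch of one common way to count ID switches. It assumes a hypothetical input of `(frame, gt_id, track_id)` triples, where `gt_id` is the ground-truth identity matched to a tracker ID in that frame; the function name `count_id_switches` is also an assumption.

```python
from typing import Dict, List, Tuple

def count_id_switches(assignments: List[Tuple[int, int, int]]) -> int:
    """Count how often a ground-truth object changes tracker ID.

    assignments: (frame, gt_id, track_id) triples (hypothetical format).
    """
    last_track_for_gt: Dict[int, int] = {}
    switches = 0
    # Sorting the triples orders them by frame first
    for _frame, gt_id, track_id in sorted(assignments):
        previous = last_track_for_gt.get(gt_id)
        if previous is not None and previous != track_id:
            switches += 1
        last_track_for_gt[gt_id] = track_id
    return switches

example = [
    (1, 10, 1), (2, 10, 1), (3, 10, 2),  # object 10 moves from track 1 to track 2
    (1, 11, 3), (2, 11, 3), (3, 11, 3),  # object 11 keeps track 3
]
print(count_id_switches(example))  # 1
```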

The notebook consists of three main parts:
1. **Data Preparation**: Unzipping annotation and image files from the vivaFF database
2. **Object Tracking**: Implementing a simplified ByteTrack algorithm
3. **Video Generation**: Creating a demonstration video from the tracking results

### Algorithm Specifications

- Use detections from the vivaFF database with confidence scores ≥ 0.5
- Use Intersection over Union (IoU) for similarity measurement
- Predict the next position of an object by moving the last bounding box by (dx,dy)
  - (dx,dy) is the difference between the centers of the last box and the penultimate box
  - If the tracklet contains only one box, then dx=dy=0
- A tracklet that has not been linked for 20 frames is considered finished
- Only tracklets containing more than 50 boxes are included in the final output
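
As a worked illustration of the prediction rule above, the sketch below shifts the last box by the displacement between the two most recent box centers. It assumes boxes in `[x, y, w, h]` format; the helper names `box_center` and `predict_next_box` are hypothetical and are not the notebook's own function.

```python
def box_center(bbox):
    """Return the center (cx, cy) of a box given as [x, y, w, h]."""
    x, y, w, h = bbox
    return (x + w / 2.0, y + h / 2.0)

def predict_next_box(boxes):
    """Predict the next box from a time-ordered list of [x, y, w, h] boxes."""
    last = boxes[-1]
    if len(boxes) < 2:
        # Only one box: dx = dy = 0, so the prediction is the box itself
        return list(last)
    cx_last, cy_last = box_center(last)
    cx_prev, cy_prev = box_center(boxes[-2])
    dx, dy = cx_last - cx_prev, cy_last - cy_prev
    # Move the last box by (dx, dy); width and height are unchanged
    return [last[0] + dx, last[1] + dy, last[2], last[3]]

history = [[100, 50, 40, 80], [110, 54, 40, 80]]  # centers (120, 90) and (130, 94)
print(predict_next_box(history))  # [120.0, 58.0, 40, 80]
```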

## Prerequisites

- Google Colab account
- Annotation file (in JSON format)
- Image dataset (frame sequence)

## Setup Instructions

1. Upload the required files to your Google Colab session:
   - `annotations.zip`: Contains the JSON annotation file
   - `images.zip`: Contains the frame images

2. Run the data preparation cell to unzip these files:
   ```python
   # Unzip annotation and image files
   !unzip -o /content/annotations.zip -d /content/annotations
   !unzip -o /content/images.zip -d /content/images
   ```
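
The `!unzip` lines rely on Colab's shell escape. If the notebook is run as a plain Python script, a rough equivalent can be sketched with the standard `zipfile` module. The helper name `extract_archive` and the demo archive built here are assumptions for illustration only.

```python
import os
import tempfile
import zipfile

def extract_archive(zip_path, destination):
    """Extract zip_path into destination and return the member names."""
    os.makedirs(destination, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(destination)
        return archive.namelist()

# Build a tiny stand-in archive so the sketch runs without the real data
work_dir = tempfile.mkdtemp()
demo_zip = os.path.join(work_dir, "annotations.zip")
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("demo.json", "{}")

names = extract_archive(demo_zip, os.path.join(work_dir, "annotations"))
print(names)  # ['demo.json']
```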

## Object Tracking

The tracking algorithm works as follows:

1. Loads annotation file (with object detections)
2. Processes each frame sequentially
3. Associates detections across frames using IoU (Intersection over Union)
4. Creates and manages object tracks
5. Generates annotated images with bounding boxes and track IDs
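
The association step (step 3) can be sketched for a single frame as a greedy match: each track, in order, takes the unassigned detection with the highest IoU against its predicted box, provided the IoU reaches the threshold. This is a simplified sketch with hypothetical names (`iou`, `associate`) and made-up boxes, assuming `[x, y, w, h]` boxes.

```python
def iou(a, b):
    """Intersection over Union of two [x, y, w, h] boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def associate(predicted_boxes, detections, threshold=0.5):
    """Greedy matching; returns {track_index: detection_index}."""
    taken = set()
    matches = {}
    for t_idx, predicted in enumerate(predicted_boxes):
        best_iou, best_d = 0.0, None
        for d_idx, det in enumerate(detections):
            if d_idx in taken:
                continue  # a detection can feed only one track
            score = iou(predicted, det)
            if score > best_iou:
                best_iou, best_d = score, d_idx
        if best_d is not None and best_iou >= threshold:
            matches[t_idx] = best_d
            taken.add(best_d)
    return matches

predicted = [[0, 0, 10, 10], [100, 100, 10, 10]]
detections = [[101, 100, 10, 10], [1, 0, 10, 10], [50, 50, 10, 10]]
matches = associate(predicted, detections)
unmatched = [i for i in range(len(detections)) if i not in matches.values()]
print(matches, unmatched)  # {0: 1, 1: 0} [2]
```

Detections left in `unmatched` are the ones that would start new tracks. Because the matching is greedy, the result can depend on the order in which tracks are visited.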

### Key Parameters

- `IOU_THRESHOLD` (default: 0.5): Minimum IoU required to associate a detection with an existing track
- `MAX_MISSING_FRAMES` (default: 20): Maximum number of consecutive frames a track can be missing before termination
- `MIN_TRACK_LENGTH` (default: 50): Minimum number of detections required for a track to be considered valid
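
To see how `MAX_MISSING_FRAMES` and `MIN_TRACK_LENGTH` interact, the sketch below replays a per-frame matched/missed sequence for a single track. The function `track_outcome` is hypothetical; its comparisons (`>` for termination, `>=` for the length filter) mirror the tracking cell, whereas a strict reading of "more than 50 boxes" would use `>` for the length filter.

```python
MAX_MISSING_FRAMES = 20
MIN_TRACK_LENGTH = 50

def track_outcome(hits, max_missing=MAX_MISSING_FRAMES, min_length=MIN_TRACK_LENGTH):
    """Replay a track's life.

    hits: one boolean per frame after the track was created (True = matched).
    Returns (length, terminated, kept).
    """
    length = 1      # the detection that created the track
    missing = 0     # consecutive frames without a match
    terminated = False
    for matched in hits:
        if matched:
            length += 1
            missing = 0
        else:
            missing += 1
            if missing > max_missing:
                terminated = True
                break
    return length, terminated, length >= min_length

# 30 matches, then 21 misses in a row: terminated and too short to keep
print(track_outcome([True] * 30 + [False] * 21))               # (31, True, False)
# A 20-frame gap is tolerated, so the track survives and is long enough
print(track_outcome([True] * 49 + [False] * 20 + [True] * 5))  # (55, False, True)
```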

### Output

- Annotated images in `/content/output_images/`
- Tracking results in `/content/output.json`
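
The tracking cell writes `output.json` as a single `tracklets` list whose entries carry `object_id`, `tracklet_id`, and `image_id` (taken from each track's first detection). The sketch below reads a file of that shape; the values and the temporary path are made up so the example runs on its own.

```python
import json
import os
import tempfile

# Made-up sample in the same shape as the notebook's output
sample = {"tracklets": [
    {"object_id": 101, "tracklet_id": 1, "image_id": 5},
    {"object_id": 245, "tracklet_id": 7, "image_id": 12},
]}

path = os.path.join(tempfile.mkdtemp(), "output.json")
with open(path, "w") as f:
    json.dump(sample, f)

with open(path) as f:
    results = json.load(f)

# Map each tracklet ID to the frame in which it first appeared
start_frame_by_track = {t["tracklet_id"]: t["image_id"] for t in results["tracklets"]}
print(len(results["tracklets"]), start_frame_by_track)  # 2 {1: 5, 7: 12}
```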

## Video Generation

The notebook includes functionality to create a video from the annotated images:

### Video Parameters

- `MAX_FRAMES`: Limit the number of frames to include (set to `None` to use all frames)
- `FPS`: Frames per second for the output video (default: 15.0)
- `IMAGE_DIR`: Directory containing the annotated images (default: '/content/output_images')
- `OUTPUT_VIDEO_PATH`: Path for the output video file (default: '/content/tracking_video.mp4')

### Usage

Simply run the video generation cell after the tracking has completed. Modify the parameters at the top of the cell to customize the video output:

```python
# Parameters to modify
MAX_FRAMES = 500  # Set to None to use all frames
FPS = 15.0
```

The video will be automatically downloaded to your local machine when complete.
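
The resulting video length follows directly from these two parameters: duration is the number of frames divided by `FPS`. A small sketch of that arithmetic, using a hypothetical helper `video_duration_seconds` and the frame counts reported by the video generation cell:

```python
def video_duration_seconds(num_frames, fps):
    """Duration of a video with num_frames frames played at fps frames per second."""
    return num_frames / fps

print(f"{video_duration_seconds(500, 15.0):.2f} s")    # 33.33 s
print(f"{video_duration_seconds(10468, 15.0):.2f} s")  # 697.87 s
```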

## Code Structure

1. **Unzip Data**
   - Extracts annotation and image files

2. **Object Tracking**
   - `calculate_intersection_over_union()`: Calculates IoU between bounding boxes
   - `predict_bounding_box()`: Predicts the next position of a track based on motion
   - Main tracking loop: Associates detections with existing tracks or creates new ones
   - Visualization: Draws bounding boxes and track IDs on images

3. **Video Creation**
   - Sorts images by frame number
   - Creates a video file using OpenCV
   - Automatically downloads the video to your local machine
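
The numeric sort in the video step matters because a plain string sort would place `frame_10.jpg` before `frame_2.jpg`. The sketch below contrasts the two orderings; `frame_number` is a hypothetical helper that follows the `frame_X.jpg` naming used for the annotated images.

```python
import re

def frame_number(filename):
    """Extract X from 'frame_X.jpg'; files that do not match sort first."""
    match = re.search(r"frame_(\d+)\.jpg$", filename)
    return int(match.group(1)) if match else -1

names = ["frame_10.jpg", "frame_2.jpg", "frame_1.jpg"]
print(sorted(names))                    # ['frame_1.jpg', 'frame_10.jpg', 'frame_2.jpg']
print(sorted(names, key=frame_number))  # ['frame_1.jpg', 'frame_2.jpg', 'frame_10.jpg']
```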

## Troubleshooting

- **Missing images**: Ensure the image paths in the annotation file match the actual image locations
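- **Low association quality**: Adjust the `IOU_THRESHOLD` parameter. Lowering it accepts looser matches, which helps when fast-moving objects lose their track; raising it rejects weak matches, which helps when tracks jump between nearby objects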
- **Tracks ending prematurely**: Increase the `MAX_MISSING_FRAMES` parameter
- **Too many short tracks**: Increase the `MIN_TRACK_LENGTH` parameter
- **Video creation issues**: Make sure all annotated images were generated successfully
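
For the missing-images case, one way to check the data before tracking is to compare the file names listed in the annotation file against the image directory. This is a sketch under assumptions: `find_missing_images` is a hypothetical helper, it keeps only the base name of each `file_name` (as the tracking cell does), and the demo entries and directory are made up.

```python
import os
import tempfile

def find_missing_images(image_entries, image_dir):
    """Return base names listed in the annotations but absent from image_dir."""
    missing = []
    for entry in image_entries:
        base_name = entry["file_name"].split("/")[-1]
        if not os.path.isfile(os.path.join(image_dir, base_name)):
            missing.append(base_name)
    return missing

# Demo: a directory holding one of the two referenced images
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "a.jpg"), "w").close()
entries = [
    {"id": 1, "file_name": "images/a.jpg"},
    {"id": 2, "file_name": "images/b.jpg"},
]
print(find_missing_images(entries, demo_dir))  # ['b.jpg']
```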

## Example Usage

1. Upload annotation and image files
2. Run data preparation cell
3. Run object tracking cell
4. Inspect a few annotated images to verify tracking quality
5. Run video generation cell with desired parameters
6. Download and review the final video

## Submission Requirements

As per the course requirements, the following items need to be submitted:
1. **Code**: The complete Python implementation
2. **JSON File**: The generated output.json file containing the tracking results
3. **Demo Video** (optional): A short demonstration video showing the tracking implementation



In [49]:
# Unzip annotations
!unzip -o /content/annotations.zip -d /content/annotations

# Unzip images
!unzip -o /content/images.zip -d /content/images

Streaming output truncated to the last 5000 lines.
  inflating: /content/images/images/1637434200508810600.jpg  
  inflating: /content/images/images/1637434201383782100.jpg  
  inflating: /content/images/images/1637434206164871600.jpg  
  inflating: /content/images/images/1637434209274141000.jpg  
  inflating: /content/images/images/1637434210367855800.jpg  
  inflating: /content/images/images/1637434213102137900.jpg  
  inflating: /content/images/images/1637434213211511300.jpg  
  inflating: /content/images/images/1637434213867739800.jpg  
  inflating: /content/images/images/1637434214195856200.jpg  
  inflating: /content/images/images/1637434217758236500.jpg  
  inflating: /content/images/images/1637434217867608500.jpg  
  inflating: /content/images/images/1637434218633205300.jpg  
  inflating: /content/images/images/1637434219289433500.jpg  
  inflating: /content/images/images/1637434220273774200.jpg  
  inflating: /content/images/images/1637434220930003800.jpg  
  inf

In [50]:
import os

# Check annotations directory (absolute paths, so the cell does not depend on the working directory)
print("Annotations files:")
print(os.listdir('/content/annotations'))

# Check images directory
print("\nImages files:")
print(os.listdir('/content/images'))

Annotations files:
['2021-11-20_lunch_2_cam0.json']

Images files:
['images']


In [51]:
import json
import os
import cv2
import numpy as np
from collections import defaultdict

# Configuration parameters
# Minimum IoU threshold required to associate a detection with an existing track
IOU_THRESHOLD = 0.5
# Maximum number of consecutive frames a track can be missing before termination
MAX_MISSING_FRAMES = 20
# Minimum number of detections required for a track to be considered valid
MIN_TRACK_LENGTH = 50

# Output directory for annotated images in Google Colab environment
OUTPUT_IMAGES_DIR = "/content/output_images"
os.makedirs(OUTPUT_IMAGES_DIR, exist_ok=True)

def calculate_intersection_over_union(box1, box2):
    """
    Calculate the Intersection over Union (IoU) between two bounding boxes.

    Args:
        box1 (list): First bounding box in format [x, y, w, h]
        box2 (list): Second bounding box in format [x, y, w, h]

    Returns:
        float: IoU score between 0 and 1
    """
    x1, y1, w1, h1 = box1
    x2, y2, w2, h2 = box2

    # Calculate coordinates of box corners
    x1_max, y1_max = x1 + w1, y1 + h1
    x2_max, y2_max = x2 + w2, y2 + h2

    # Calculate intersection area
    x_intersection = max(0, min(x1_max, x2_max) - max(x1, x2))
    y_intersection = max(0, min(y1_max, y2_max) - max(y1, y2))
    intersection_area = x_intersection * y_intersection

    # Calculate union area
    union_area = (w1 * h1) + (w2 * h2) - intersection_area

    # Return IoU score
    return intersection_area / union_area if union_area != 0 else 0

def predict_bounding_box(track):
    """
    Predict the next position of a track's bounding box based on its movement pattern.
    Uses linear motion model based on the last two detections.

    Args:
        track (dict): Dictionary containing track information and past detections

    Returns:
        list: Predicted bounding box in format [x, y, w, h]
    """
    if len(track["detections"]) == 1:
        # If only one detection exists, reuse the same bounding box
        return track["detections"][-1]["bbox"]

    # Get the last two bounding boxes
    last_bbox = track["detections"][-1]["bbox"]
    second_last_bbox = track["detections"][-2]["bbox"]

    # Calculate centers of both bounding boxes
    last_center = (last_bbox[0] + last_bbox[2] / 2, last_bbox[1] + last_bbox[3] / 2)
    second_last_center = (second_last_bbox[0] + second_last_bbox[2] / 2,
                          second_last_bbox[1] + second_last_bbox[3] / 2)

    # Calculate displacement vector
    dx = last_center[0] - second_last_center[0]
    dy = last_center[1] - second_last_center[1]

    # Apply displacement to last bounding box to predict next position
    predicted_bbox = [
        last_bbox[0] + dx,
        last_bbox[1] + dy,
        last_bbox[2],  # Width remains the same
        last_bbox[3]   # Height remains the same
    ]
    return predicted_bbox

# Load annotation file from Colab environment
annotations_file = '/content/annotations/2021-11-20_lunch_2_cam0.json'
with open(annotations_file, 'r') as f:
    annotations = json.load(f)

# Organize detections by frame for easier access
detections_by_frame = defaultdict(list)
for detection in annotations['annotations']:
    # Filter detections by confidence threshold
    if detection['confidence'] >= 0.5:
        detections_by_frame[detection['image_id']].append(detection)

# Get sorted list of frames by image_id
frames = sorted(annotations['images'], key=lambda x: x['id'])

# Initialize tracking data structures
active_object_tracks = []     # Currently active tracks
completed_object_tracks = []  # Tracks that have ended
next_track_id = 1             # Counter for generating unique track IDs

# Process each frame sequentially
for frame in frames:
    image_id = frame['id']
    current_detections = detections_by_frame.get(image_id, [])

    print(f"\n=== Processing frame with ID: {image_id} ===")
    print(f"Number of detections in this frame: {len(current_detections)}")

    # Track whether each detection has been assigned to a track
    detection_assigned = [False] * len(current_detections)

    # First, try to associate each active track with current frame detections
    print(f"Active tracks before association: {len(active_object_tracks)}")
    for track in active_object_tracks:
        # Predict where the track's bounding box should appear in current frame
        predicted_box = predict_bounding_box(track)
        best_iou = 0
        best_detection_idx = -1

        # Find the detection with highest IoU to predicted box
        for i, det in enumerate(current_detections):
            if detection_assigned[i]:
                continue  # Skip already assigned detections

            iou = calculate_intersection_over_union(predicted_box, det['bbox'])
            if iou > best_iou:
                best_iou = iou
                best_detection_idx = i

        # If best IoU exceeds threshold, associate detection with track
        if best_iou >= IOU_THRESHOLD and best_detection_idx != -1:
            det = current_detections[best_detection_idx]
            track["detections"].append({
                "frame": image_id,
                "bbox": det['bbox'],
                "object_id": det['id']
            })
            track["last_frame"] = image_id
            track["missing"] = 0
            detection_assigned[best_detection_idx] = True

            print(f"   → Track {track['tracklet_id']} associated with detection {det['id']} (IoU={best_iou:.2f})")
        else:
            # Increment missing frame counter for this track
            track["missing"] += 1
            if best_iou > 0:
                print(f"   → Track {track['tracklet_id']} NOT associated (max IoU={best_iou:.2f} < threshold={IOU_THRESHOLD})")
            else:
                print(f"   → Track {track['tracklet_id']} NOT associated (no compatible detection)")

    # Terminate tracks inactive for too long
    still_active_tracks = []
    for track in active_object_tracks:
        if track["missing"] > MAX_MISSING_FRAMES:
            completed_object_tracks.append(track)
            print(f"      X Track {track['tracklet_id']} terminated (missing > {MAX_MISSING_FRAMES})")
        else:
            still_active_tracks.append(track)
    active_object_tracks = still_active_tracks

    # Create new tracks for unassigned detections
    for i, det in enumerate(current_detections):
        if not detection_assigned[i]:
            new_object_track = {
                "tracklet_id": next_track_id,
                "detections": [{
                    "frame": image_id,
                    "bbox": det['bbox'],
                    "object_id": det['id']
                }],
                "last_frame": image_id,
                "missing": 0
            }
            active_object_tracks.append(new_object_track)
            print(f"   → New track created: ID={next_track_id} for detection {det['id']}")
            next_track_id += 1

    print(f"Active tracks after association: {len(active_object_tracks)}")

    #--------------------------------------------------------------------------
    # Generate and save annotated images with bounding boxes
    #--------------------------------------------------------------------------

    file_name_fix = frame["file_name"].split("/")[-1]
    # Adjust image path for Colab environment
    image_path = os.path.join("/content/images/images", file_name_fix)
    img = cv2.imread(image_path)

    if img is None:
        # Handle image reading issues (invalid path, etc.)
        print(f"Unable to read image: {image_path}. Skipping to next frame.")
        continue

    # Draw bounding box for each active track with detection in current frame
    for track in active_object_tracks:
        # Skip if track has no detections
        if len(track["detections"]) == 0:
            continue

        last_det = track["detections"][-1]
        # Only visualize if last detection corresponds to current frame
        if last_det["frame"] == image_id:
            (x, y, w, h) = last_det["bbox"]

            # Draw bounding box rectangle; bbox is [x, y, w, h], so the
            # opposite corner is (x + w, y + h), not (w, h)
            cv2.rectangle(
                img,
                (int(x), int(y)),
                (int(x + w), int(y + h)),
                (0, 255, 0),  # Green color (BGR)
                2              # Line thickness
            )
            # Add track ID text above bounding box
            cv2.putText(
                img,
                f"ID: {track['tracklet_id']}",
                (int(x), int(y) - 5),
                cv2.FONT_HERSHEY_SIMPLEX,
                0.6,          # Text size
                (0, 255, 0),  # Green color (BGR)
                2             # Line thickness
            )

    # Save annotated image to output directory
    output_path = os.path.join(OUTPUT_IMAGES_DIR, f"frame_{image_id}.jpg")
    cv2.imwrite(output_path, img)
    print(f"Annotated image saved: {output_path}")
    #--------------------------------------------------------------------------

# Add any remaining active tracks to completed tracks list
completed_object_tracks.extend(active_object_tracks)

# Filter to keep only tracks with sufficient length
valid_object_tracks = []
for track in completed_object_tracks:
    if len(track["detections"]) >= MIN_TRACK_LENGTH:
        # Store key information from the first detection
        first_det = track["detections"][0]
        valid_object_tracks.append({
            "object_id": first_det["object_id"],
            "tracklet_id": track["tracklet_id"],
            "image_id": first_det["frame"]
        })

# Save final tracking results to JSON file
output = {"tracklets": valid_object_tracks}
output_file = '/content/output.json'
with open(output_file, 'w') as f:
    json.dump(output, f)

print(f"\nTracking completed! {len(valid_object_tracks)} valid tracks have been saved to {output_file}")
print(f"Annotated images are saved in directory: {OUTPUT_IMAGES_DIR}")

Streaming output truncated to the last 5000 lines.
   → Track 1289 associated with detection 189791 (IoU=0.96)
   → Track 1300 associated with detection 189795 (IoU=1.00)
   → Track 1328 associated with detection 189797 (IoU=0.99)
   → Track 1329 associated with detection 189796 (IoU=0.94)
   → Track 1375 associated with detection 189800 (IoU=0.94)
   → Track 1400 associated with detection 189798 (IoU=1.00)
   → Track 1428 associated with detection 189793 (IoU=0.98)
   → Track 1431 associated with detection 189801 (IoU=0.82)
   → Track 1439 NOT associated (no compatible detection)
   → Track 1444 associated with detection 189792 (IoU=0.94)
   → Track 1450 NOT associated (no compatible detection)
Active tracks after association: 13
Annotated image saved: /content/output_images/frame_31153.jpg

=== Processing frame with ID: 31154 ===
Number of detections in this frame: 11
Active tracks before association: 13
   → Track 989 associated with detection 189811 (IoU=0.86)
   → Tr

In [52]:
# Create video from tracking images
import cv2
import os
import re
import numpy as np
from google.colab import files

# ===== PARAMETERS (MODIFY THESE) =====
# Set to a number (e.g., 500) to limit frames, or None to use all images
MAX_FRAMES = 500
# FPS for the output video
FPS = 15.0
# Directory containing the images
IMAGE_DIR = '/content/output_images'
# Output video path
OUTPUT_VIDEO_PATH = '/content/tracking_video.mp4'
# =====================================

# Get all image files
image_files = [f for f in os.listdir(IMAGE_DIR) if f.endswith('.jpg')]

# Sort images by frame number
def get_frame_number(filename):
    # Extract number from "frame_X.jpg" format
    match = re.search(r'frame_(\d+)\.jpg', filename)
    if match:
        return int(match.group(1))
    return 0

image_files.sort(key=get_frame_number)

if not image_files:
    print("No images found in the directory!")
else:
    # Limit the number of frames if MAX_FRAMES is specified
    if MAX_FRAMES is not None and MAX_FRAMES > 0:
        original_count = len(image_files)
        image_files = image_files[:MAX_FRAMES]
        print(f"Limiting video to the first {len(image_files)} images (from {original_count} total images)")

    # Read first image to get dimensions
    first_image_path = os.path.join(IMAGE_DIR, image_files[0])
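    first_image = cv2.imread(first_image_path)
    if first_image is None:
        # cv2.imread returns None instead of raising when a file cannot be read
        raise RuntimeError(f"Unable to read first image: {first_image_path}")
    height, width = first_image.shape[:2]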

    # Define codec and create VideoWriter object
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    video = cv2.VideoWriter(OUTPUT_VIDEO_PATH, fourcc, FPS, (width, height))

    # Write each image to the video
    num_frames = len(image_files)
    for i, image_file in enumerate(image_files):
        image_path = os.path.join(IMAGE_DIR, image_file)
        frame = cv2.imread(image_path)

        if frame is not None:
            video.write(frame)

        # Print progress
        if i % 10 == 0 or i == num_frames - 1:
            print(f"Processing: {i+1}/{num_frames} images ({(i+1)/num_frames*100:.1f}%)")

    # Release the video writer
    video.release()

    print(f"Video created successfully at: {OUTPUT_VIDEO_PATH}")
    print(f"Video contains {num_frames} frames at {FPS} FPS (duration: {num_frames/FPS:.2f} seconds)")

    # Download video to local machine
    files.download(OUTPUT_VIDEO_PATH)
    print("Video download initiated. Check your browser's download section.")

Limiting video to the first 500 images (from 10468 total images)
Processing: 1/500 images (0.2%)
Processing: 11/500 images (2.2%)
Processing: 21/500 images (4.2%)
Processing: 31/500 images (6.2%)
Processing: 41/500 images (8.2%)
Processing: 51/500 images (10.2%)
Processing: 61/500 images (12.2%)
Processing: 71/500 images (14.2%)
Processing: 81/500 images (16.2%)
Processing: 91/500 images (18.2%)
Processing: 101/500 images (20.2%)
Processing: 111/500 images (22.2%)
Processing: 121/500 images (24.2%)
Processing: 131/500 images (26.2%)
Processing: 141/500 images (28.2%)
Processing: 151/500 images (30.2%)
Processing: 161/500 images (32.2%)
Processing: 171/500 images (34.2%)
Processing: 181/500 images (36.2%)
Processing: 191/500 images (38.2%)
Processing: 201/500 images (40.2%)
Processing: 211/500 images (42.2%)
Processing: 221/500 images (44.2%)
Processing: 231/500 images (46.2%)
Processing: 241/500 images (48.2%)
Processing: 251/500 images (50.2%)
Processing: 261/500 images (52.2%)
Proces

Video download initiated. Check your browser's download section.
