# Player Tracking with ByteTrack

**SC549: Neural Networks - Programming Assignment 03**

In this notebook, we'll:
1. Understand multi-object tracking (MOT)
2. Apply the ByteTrack algorithm using the tracker built into Ultralytics
3. Assign unique IDs to players
4. Handle occlusions and re-identification
5. Generate tracked videos

---

## 🎯 What is Object Tracking?

**Object Tracking** maintains the identity of each object across video frames:
- **Input**: Detections (bounding boxes) from each frame
- **Output**: The same object keeps the same ID across all frames
- **Challenge**: Handling occlusions, camera motion, and similar appearances

### Why Tracking is Hard:
1. **Occlusion**: Players overlap → disappear temporarily
2. **Similar Appearance**: Players look alike
3. **Fast Motion**: Large displacement between frames
4. **ID Switches**: Tracker might confuse players

### How ByteTrack Works:
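1. **Get detections** from YOLO for the current frame and split them into high- and low-confidence groups
2. **Predict** where each existing track should be now, using a Kalman filter motion model
3. **Match** high-confidence detections to the predicted positions by IoU, then match the still-unmatched tracks against the low-confidence detections
4. **Assign IDs**: a matched detection inherits the ID of its track
5. **Handle unmatched** items: unmatched high-confidence detections start new tracks (new players), and unmatched tracks are kept as "lost" for a number of frames before being dropped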
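
The matching steps can be sketched in plain Python. This is a minimal illustration, not the Ultralytics implementation: it takes the predicted boxes as given (ByteTrack obtains them from a Kalman filter) and swaps in greedy IoU matching for the Hungarian assignment. The names `iou`, `greedy_match`, and `associate`, and the thresholds `high`, `low`, and `iou_thresh`, are assumptions chosen for this example. What it does show is ByteTrack's two-pass idea: match tracks to high-confidence detections first, then give leftover tracks a second chance against low-confidence detections.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, boxes, iou_thresh):
    """Pair track IDs with box indices, best IoU first (greedy, not Hungarian)."""
    pairs = sorted(
        ((iou(tbox, dbox), tid, di)
         for tid, tbox in tracks.items()
         for di, dbox in enumerate(boxes)),
        reverse=True,
    )
    matches, used_tracks, used_boxes = {}, set(), set()
    for score, tid, di in pairs:
        if score < iou_thresh:
            break  # remaining pairs overlap too little to count as a match
        if tid in used_tracks or di in used_boxes:
            continue
        matches[tid] = di
        used_tracks.add(tid)
        used_boxes.add(di)
    return matches


def associate(predicted, detections, high=0.5, low=0.1, iou_thresh=0.3):
    """
    predicted:  {track_id: box} predicted for the current frame
    detections: list of (box, score) for the current frame
    Returns ({track_id: detection_index}, [detection indices that start new tracks]).
    Threshold values are illustrative assumptions.
    """
    high_idx = [i for i, (_, s) in enumerate(detections) if s >= high]
    low_idx = [i for i, (_, s) in enumerate(detections) if low <= s < high]

    # Pass 1: every track against the high-confidence detections
    first = greedy_match(predicted, [detections[i][0] for i in high_idx], iou_thresh)
    assigned = {tid: high_idx[j] for tid, j in first.items()}

    # Pass 2: still-unmatched tracks against the low-confidence detections
    leftover = {tid: box for tid, box in predicted.items() if tid not in assigned}
    second = greedy_match(leftover, [detections[i][0] for i in low_idx], iou_thresh)
    assigned.update({tid: low_idx[j] for tid, j in second.items()})

    # Only unmatched high-confidence detections are allowed to start new tracks
    matched = set(assigned.values())
    new_tracks = [i for i in high_idx if i not in matched]
    return assigned, new_tracks
```

In this sketch a partially occluded player whose detection score drops below `high` can still keep their ID through the second pass, while a low-score box that matches no track is simply ignored.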

## 1. Import Libraries

In [1]:
import os
import cv2
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
from tqdm import tqdm
import torch
from collections import defaultdict

# YOLOv8 with tracking
from ultralytics import YOLO

# Set device
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Using device: {device}")

print("✅ Libraries imported successfully!")

Using device: cpu
✅ Libraries imported successfully!


## 2. Setup Paths

In [3]:
# Directories
PROJECT_ROOT = Path('../')
DATA_DIR = PROJECT_ROOT / 'data'
VIDEOS_DIR = DATA_DIR / 'videos'
OUTPUTS_DIR = PROJECT_ROOT / 'outputs'

# Create directories
(OUTPUTS_DIR / 'videos').mkdir(parents=True, exist_ok=True)
(OUTPUTS_DIR / 'screenshots').mkdir(parents=True, exist_ok=True)
(OUTPUTS_DIR / 'metrics').mkdir(parents=True, exist_ok=True)

# Find videos
video_files = list(VIDEOS_DIR.glob('*.mp4')) + list(VIDEOS_DIR.glob('*.avi'))
print(f"📹 Found {len(video_files)} video(s)")

📹 Found 6 video(s)


## 3. Load Models

We'll use:
- **YOLOv8** for detection
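- **ByteTrack** for tracking (shipped with Ultralytics; selected by passing `tracker='bytetrack.yaml'` to `model.track()`, since the default tracker is BoT-SORT)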

In [4]:
# Load models
print("📥 Loading models...")

# Detection model
detection_model = YOLO('yolov8s.pt')
detection_model.to(device)

# Pose model (for combined tracking)
pose_model = YOLO('yolov8s-pose.pt')
pose_model.to(device)

print("✅ Models loaded!")
print("   - Detection: YOLOv8s")
print("   - Pose: YOLOv8s-Pose")
print("   - Tracker: ByteTrack (built-in)")

📥 Loading models...
✅ Models loaded!
   - Detection: YOLOv8s
   - Pose: YOLOv8s-Pose
   - Tracker: ByteTrack (built-in)


## 4. Test Tracking on Single Video

Let's see how tracking works on one video.

In [5]:
def track_players_simple(video_path, model, output_path=None, show_trajectory=True):
    """
    Track players in a video with ByteTrack.
    
    Args:
        video_path: Path to input video
        model: YOLO model
        output_path: Path to save output (optional)
        show_trajectory: Draw trajectory trails
    
    Returns:
        tuple: (stats dict, track_history mapping each track ID to its last 30 centre points)
    """
    # Open video
    cap = cv2.VideoCapture(str(video_path))
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    
    # Video writer
    out = None
    if output_path:
        fourcc = cv2.VideoWriter_fourcc(*'mp4v')
        out = cv2.VideoWriter(str(output_path), fourcc, fps, (width, height))
    
    print(f"Tracking: {video_path.name}")
    
    # Track history for trajectory
    track_history = defaultdict(list)
    
    # Statistics
    unique_ids = set()
    frame_count = 0
    total_detections = 0
    
    pbar = tqdm(total=total_frames, desc="Tracking")
    
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        
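        # Run tracking
        # persist=True keeps track state between calls (needed for frame-by-frame input)
        # tracker='bytetrack.yaml' selects ByteTrack (the Ultralytics default is BoT-SORT)
        # classes=[0] = only track persons
        results = model.track(frame, persist=True, tracker='bytetrack.yaml',
                              classes=[0], conf=0.3, verbose=False)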
        
        # Get tracking results
        if results[0].boxes.id is not None:
            boxes = results[0].boxes.xyxy.cpu().numpy()
            track_ids = results[0].boxes.id.cpu().numpy().astype(int)
            
            total_detections += len(track_ids)
            unique_ids.update(track_ids)
            
            # Draw boxes and track IDs
            annotated = results[0].plot()
            
            if show_trajectory:
                for box, track_id in zip(boxes, track_ids):
                    x1, y1, x2, y2 = box
                    cx = int((x1 + x2) / 2)
                    cy = int((y1 + y2) / 2)
                    
                    # Add to history
                    track_history[track_id].append((cx, cy))
                    
                    # Keep only last 30 points
                    if len(track_history[track_id]) > 30:
                        track_history[track_id].pop(0)
                    
                    # Draw trajectory
                    points = track_history[track_id]
                    for i in range(1, len(points)):
                        # FIX: Ensure thickness is always at least 1
                        thickness = max(1, int(np.sqrt(i / 30) * 3))
                        cv2.line(annotated, points[i-1], points[i], (0, 255, 255), thickness)
        else:
            annotated = frame.copy()
        
        # Write frame
        if out:
            out.write(annotated)
        
        frame_count += 1
        pbar.update(1)
    
    cap.release()
    if out:
        out.release()
    pbar.close()
    
    # Stats
    stats = {
        'total_frames': frame_count,
        'unique_ids': len(unique_ids),
        'total_detections': total_detections,
        'avg_detections_per_frame': total_detections / frame_count if frame_count > 0 else 0
    }
    
    print("✅ Tracking complete!")
    print(f"   Unique player IDs: {stats['unique_ids']}")
    print(f"   Total detections: {stats['total_detections']}")
    print(f"   Avg per frame: {stats['avg_detections_per_frame']:.2f}\n")
    
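    return stats, track_history

# Run on the first video; later cells use `test_video` and `history`
if len(video_files) > 0:
    test_video = video_files[0]
    stats, history = track_players_simple(test_video, detection_model)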

Tracking: input_video_1.mp4


Tracking:   0%|▏                                                             | 1/297 [00:02<12:03,  2.44s/it]

error: OpenCV(4.13.0) D:\a\opencv-python\opencv-python\opencv\modules\imgproc\src\drawing.cpp:1835: error: (-215:Assertion failed) 0 < thickness && thickness <= MAX_THICKNESS in function 'cv::line'


## 5. Understanding Tracking Results

Let's analyze what tracking gives us beyond detection.
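
As a small illustration of what persistent IDs make possible, the sketch below summarizes per-frame ID lists into per-player counts. The function name `summarize_ids` and its input format (one list of track IDs per frame) are assumptions for this example; in the notebook such a list could be collected from `results[0].boxes.id` inside the tracking loop.

```python
from collections import Counter


def summarize_ids(ids_per_frame):
    """Summarize tracking output given one list of track IDs per frame."""
    frames_seen = Counter()
    for ids in ids_per_frame:
        # set() guards against an ID being listed twice in the same frame
        frames_seen.update(set(ids))

    total = sum(len(ids) for ids in ids_per_frame)
    return {
        "unique_ids": len(frames_seen),             # how many distinct players
        "frames_seen": dict(frames_seen),           # frames in which each ID appears
        "avg_per_frame": total / len(ids_per_frame) if ids_per_frame else 0.0,
    }


summary = summarize_ids([[1, 2, 3], [1, 2], [1, 3]])
print(summary["unique_ids"], summary["frames_seen"])
```

With detection alone, only `avg_per_frame` could be computed; the per-ID counts exist only because the tracker links boxes across frames.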

In [None]:
if len(video_files) > 0:
    print("📊 Tracking vs Detection:\n")
    print("Detection Only:")
    print("  - Frame 1: [Person, Person, Person]")
    print("  - Frame 2: [Person, Person, Person]")
    print("  → Who is who? No idea!\n")
    
    print("With Tracking:")
    print("  - Frame 1: [Player ID=1, Player ID=2, Player ID=3]")
    print("  - Frame 2: [Player ID=1, Player ID=2, Player ID=3]")
    print("  → Same ID = same player!\n")
    
    print("📈 Benefits:")
    print("  1. Count unique players")
    print("  2. Analyze individual player movement")
    print("  3. Create player trajectories")
    print("  4. Measure player-specific statistics")

## 6. Visualize Player Trajectories

Show movement paths of each player.
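
The trajectories are built from box centres, and the tracking function keeps only the last 30 points per player. A possible alternative to the manual `pop(0)` trimming is a bounded `deque`, sketched below. The names `TRAIL_LENGTH`, `box_center`, and `update_trails` are hypothetical helpers for this example, not part of the notebook.

```python
from collections import defaultdict, deque

TRAIL_LENGTH = 30  # assumed to mirror the 30-point limit used in the tracking loop


def box_center(box):
    """Centre of an (x1, y1, x2, y2) box as integer pixel coordinates."""
    x1, y1, x2, y2 = box
    return int((x1 + x2) / 2), int((y1 + y2) / 2)


def update_trails(trails, boxes, track_ids):
    """Append the current centre of every tracked box to its trail."""
    for box, track_id in zip(boxes, track_ids):
        trails[int(track_id)].append(box_center(box))
    return trails


# A deque with maxlen discards its oldest point automatically when full
trails = defaultdict(lambda: deque(maxlen=TRAIL_LENGTH))

# Simulate one player (ID 7) moving right for 35 frames
for i in range(35):
    update_trails(trails, [(i, 0, i + 10, 10)], [7])

print(len(trails[7]), trails[7][0], trails[7][-1])
```

If the full path is wanted for plotting rather than a short trail, the same structure works with `maxlen=None`, at the cost of memory growing with video length.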

In [None]:
if len(video_files) > 0 and history:
    # Plot trajectories
    fig, ax = plt.subplots(figsize=(12, 8))
    
    # Get video frame for background
    cap = cv2.VideoCapture(str(test_video))
    ret, frame = cap.read()
    cap.release()
    
    if ret:
        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        ax.imshow(frame_rgb, alpha=0.5)
    
    # Plot each player's trajectory
    colors = plt.cm.rainbow(np.linspace(0, 1, len(history)))
    
    for (track_id, points), color in zip(history.items(), colors):
        if len(points) > 1:
            xs = [p[0] for p in points]
            ys = [p[1] for p in points]
            
            # Plot trajectory
            ax.plot(xs, ys, '-o', color=color, linewidth=2, 
                   markersize=4, label=f'Player {track_id}', alpha=0.7)
            
            # Mark start and end
            ax.plot(xs[0], ys[0], 'go', markersize=10, label='_nolegend_')  # Start: green
            ax.plot(xs[-1], ys[-1], 'ro', markersize=10, label='_nolegend_')  # End: red
    
    ax.set_title(f"Player Trajectories - {test_video.name}", fontsize=14)
    ax.legend(loc='best')
    ax.set_xlabel('X position (pixels)')
    ax.set_ylabel('Y position (pixels)')
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    
    # Save
    save_path = OUTPUTS_DIR / 'screenshots' / 'player_trajectories.png'
    plt.savefig(save_path, dpi=150, bbox_inches='tight')
    plt.show()
    
    print(f"💾 Saved: {save_path}")
    print("\n🟢 Green dot = Start position")
    print("🔴 Red dot = End position")

## 7. Combined: Detection + Pose + Tracking

Now let's combine everything: detect players, estimate poses, and track with IDs!

In [None]:
def track_with_pose(video_path, pose_model, output_path):
    """
    Full pipeline: detection + pose estimation + tracking.
    
    Args:
        video_path: Path to input video
        pose_model: YOLOv8-Pose model
        output_path: Path to save output
    """
    cap = cv2.VideoCapture(str(video_path))
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter(str(output_path), fourcc, fps, (width, height))
    
    print(f"Processing: {video_path.name}")
    print("  With: Detection + Pose + Tracking")
    
    pbar = tqdm(total=total_frames, desc="Processing")
    
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        
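        # Track with pose estimation
        # tracker='bytetrack.yaml' selects ByteTrack (the Ultralytics default is BoT-SORT)
        results = pose_model.track(frame, persist=True, tracker='bytetrack.yaml',
                                   conf=0.3, verbose=False)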
        
        # Annotate (includes boxes, IDs, and skeletons)
        annotated = results[0].plot()
        
        out.write(annotated)
        pbar.update(1)
    
    cap.release()
    out.release()
    pbar.close()
    
    print(f"✅ Complete! Saved: {output_path}\n")

# Process all videos with full pipeline
if len(video_files) > 0:
    print("🎬 Processing all videos with FULL PIPELINE...\n")
    
    for video_path in video_files:
        output_path = OUTPUTS_DIR / 'videos' / f"final_{video_path.name}"
        track_with_pose(video_path, pose_model, output_path)
    
    print("\n✅ All videos processed with full pipeline!")
    print(f"📁 Final outputs: {OUTPUTS_DIR / 'videos'}")
else:
    print("⚠️  No videos found")

## 8. Create Side-by-Side Comparison

Show: Original → Detection → Pose → Full Pipeline

In [None]:
if len(video_files) > 0:
    video_path = video_files[0]
    cap = cv2.VideoCapture(str(video_path))
    
    # Get a middle frame
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    cap.set(cv2.CAP_PROP_POS_FRAMES, total_frames // 2)
    ret, frame = cap.read()
    cap.release()
    
    if ret:
        fig, axes = plt.subplots(2, 2, figsize=(14, 10))
        
        # 1. Original
        axes[0, 0].imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        axes[0, 0].set_title("1. Original Frame", fontsize=12, fontweight='bold')
        axes[0, 0].axis('off')
        
        # 2. Detection only
        det_results = detection_model(frame, classes=[0], conf=0.3, verbose=False)
        det_annotated = det_results[0].plot()
        axes[0, 1].imshow(cv2.cvtColor(det_annotated, cv2.COLOR_BGR2RGB))
        axes[0, 1].set_title("2. Player Detection", fontsize=12, fontweight='bold')
        axes[0, 1].axis('off')
        
        # 3. Pose estimation
        pose_results = pose_model(frame, conf=0.3, verbose=False)
        pose_annotated = pose_results[0].plot()
        axes[1, 0].imshow(cv2.cvtColor(pose_annotated, cv2.COLOR_BGR2RGB))
        axes[1, 0].set_title("3. Pose Estimation", fontsize=12, fontweight='bold')
        axes[1, 0].axis('off')
        
        # 4. Full pipeline with tracking (ByteTrack selected explicitly)
        track_results = pose_model.track(frame, persist=True, tracker='bytetrack.yaml',
                                         conf=0.3, verbose=False)
        track_annotated = track_results[0].plot()
        axes[1, 1].imshow(cv2.cvtColor(track_annotated, cv2.COLOR_BGR2RGB))
        axes[1, 1].set_title("4. Detection + Pose + Tracking", fontsize=12, fontweight='bold')
        axes[1, 1].axis('off')
        
        plt.suptitle("Complete Player Tracking Pipeline", fontsize=16, fontweight='bold', y=0.98)
        plt.tight_layout()
        
        # Save
        save_path = OUTPUTS_DIR / 'screenshots' / 'pipeline_comparison.png'
        plt.savefig(save_path, dpi=150, bbox_inches='tight')
        plt.show()
        
        print(f"💾 Saved: {save_path}")

## 9. Tracking Quality Metrics

Analyze tracking consistency and ID switches.
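
Without ground-truth labels, one rough proxy for consistency is how long each ID lives and how often it drops out. The sketch below computes this from per-frame ID lists. The names `track_lifespans` and `short_tracks`, and the `min_frames` cutoff, are assumptions for this example. Many short-lived IDs hint at fragmentation, but this cannot prove an ID switch, so it complements visual inspection rather than replacing it.

```python
def track_lifespans(ids_per_frame):
    """
    ids_per_frame: one list of track IDs per frame, in frame order.
    Returns {track_id: {"first", "last", "seen", "gaps"}} where "gaps"
    counts how many times the ID vanished and later reappeared.
    """
    spans = {}
    for frame_idx, ids in enumerate(ids_per_frame):
        for track_id in set(ids):
            span = spans.setdefault(
                track_id,
                {"first": frame_idx, "last": frame_idx, "seen": 0, "gaps": 0},
            )
            # A jump of more than one frame since the last sighting is a gap
            if span["seen"] and frame_idx - span["last"] > 1:
                span["gaps"] += 1
            span["last"] = frame_idx
            span["seen"] += 1
    return spans


def short_tracks(spans, min_frames=10):
    """IDs observed in fewer than min_frames frames (possible fragments)."""
    return sorted(tid for tid, span in spans.items() if span["seen"] < min_frames)


spans = track_lifespans([[1, 2], [1], [1, 2], [1, 3]])
print(spans[2], short_tracks(spans, min_frames=2))
```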

In [None]:
if len(video_files) > 0:
    print("📊 Tracking Quality Metrics:\n")
    
    print("Key Metrics to Evaluate:")
    print("1. **ID Consistency**: Same player keeps same ID")
    print("2. **ID Switches**: When tracker confuses players")
    print("3. **Fragmentation**: Player disappears and reappears with new ID")
    print("4. **MOTA** (Multiple Object Tracking Accuracy)")
    print("5. **MOTP** (Multiple Object Tracking Precision)\n")
    
    print("💡 For this assignment:")
    print("   - Count unique IDs per video")
    print("   - Check if IDs remain consistent visually")
    print("   - Note any obvious ID switches in report")
    print("\n   (Ground truth not available for full MOTA/MOTP)")

## 🎓 Key Takeaways

### What We Learned:
1. **Multi-Object Tracking**: Assigning persistent IDs to objects
2. **ByteTrack Algorithm**: Motion-based association that also uses low-confidence detections
3. **Trajectory Visualization**: Understanding player movement
4. **Combined Pipeline**: Detection + Pose + Tracking

### Tracking Challenges:
1. **Occlusion**: Players overlap → temporary ID loss
2. **Similar Appearance**: Uniform colors make tracking harder
3. **Fast Motion**: Large movements between frames
4. **Camera Motion**: Background movement confuses tracker

### Applications:
1. **Player Statistics**: Count touches, distance run, speed
2. **Team Formation**: Analyze positioning over time
3. **Event Detection**: Passes, tackles, shots
4. **Heatmaps**: Where players spend most time
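
As one example of the player statistics above, distance and speed can be estimated from a trajectory of centre points. This is a sketch under stated assumptions: the points are consecutive frames of a single track, and the results are in pixels, since converting to metres would require a camera-to-pitch calibration that this notebook does not cover. The function names `path_length` and `average_speed` are hypothetical.

```python
import math


def path_length(points):
    """Total distance in pixels along a sequence of (x, y) points."""
    pts = list(points)  # also accepts deques, which do not support slicing
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))


def average_speed(points, fps):
    """Average speed in pixels per second, assuming one point per frame."""
    pts = list(points)
    if len(pts) < 2 or fps <= 0:
        return 0.0
    duration = (len(pts) - 1) / fps  # seconds between first and last point
    return path_length(pts) / duration


trajectory = [(0, 0), (3, 4), (6, 8)]
print(path_length(trajectory), average_speed(trajectory, fps=2))
```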

---

## ✅ Checklist
- [ ] ByteTrack tracking implemented
- [ ] Unique IDs assigned to players
- [ ] Trajectories visualized
- [ ] Full pipeline working (detection + pose + tracking)
- [ ] Output videos generated with IDs

---

**Next**: Notebook 05 - Performance Evaluation & Metrics