# BananaTracker - Optimized Multi-Object Tracking

This notebook demonstrates the **optimized** BananaTracker pipeline with:
- **Batch YOLO detection** for GPU efficiency
- **FP16 inference** for speed
- **Optimized ECC** camera motion compensation (BoxMOT pattern)
- **SAM2.1 + Cutie** for mask-enhanced tracking

## Optimizations Applied:
| Optimization | Before | After |
|-------------|--------|-------|
| Detection Batching | 1 frame | 16 frames |
| Precision | FP32 | FP16 |
| ECC Iterations | 5000 | 100 |
| ECC Scale | 0.5 (2x downscale) | 0.25 (4x downscale) |
| Kalman motion_cov | Loop | Vectorized |
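
The FP32 to FP16 row trades numeric resolution for speed. As a rough illustration of that granularity (not a description of how the detector stores its outputs), the sketch below rounds two sample values to IEEE 754 half precision using only the standard library. The helper name `to_fp16` and the sample values are hypothetical.

```python
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # The "e" struct format is a 16-bit half-precision float.
    return struct.unpack("e", struct.pack("e", value))[0]


# Hypothetical sample values: a detection confidence and a pixel coordinate
samples = [("confidence", 0.4001), ("x coordinate (px)", 1919.7)]

for label, value in samples:
    print(f"{label}: {value} -> {to_fp16(value)}")
```

Half precision has an 11-bit significand, so values between 1024 and 2048 are representable only in whole steps of 1. Whether this matters for a given model is best checked empirically by comparing FP16 and FP32 tracking results on the same clip.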

## Cell 1: Install Dependencies

In [None]:
# Install required packages
!pip install ultralytics opencv-python-headless tqdm
!pip install lap cython_bbox  # For ByteTrack tracker core

# Install SAM2.1 dependencies (HuggingFace transformers)
!pip install "transformers>=4.35.0" huggingface_hub  # Quoted so the shell does not treat >= as a redirect

# Install Cutie dependencies
!pip install omegaconf hydra-core

## Cell 2: Clone Repositories

In [None]:
import os

# Clone BananaTracker repository
if not os.path.exists('bananatracker'):
    !git clone https://github.com/USER/bananatracker.git

# Clone Cutie for temporal mask propagation
if not os.path.exists('Cutie'):
    !git clone https://github.com/hkchengrex/Cutie.git

# Create symlink for Cutie in mask_propagation folder
os.makedirs('bananatracker/bananatracker/mask_propagation', exist_ok=True)
if not os.path.exists('bananatracker/bananatracker/mask_propagation/Cutie'):
    os.symlink('/content/Cutie', 'bananatracker/bananatracker/mask_propagation/Cutie')

# Download Cutie weights
os.makedirs('Cutie/weights', exist_ok=True)
if not os.path.exists('Cutie/weights/cutie-base-mega.pth'):
    !wget -P Cutie/weights https://github.com/hkchengrex/Cutie/releases/download/v1.0/cutie-base-mega.pth

# Install BananaTracker in development mode
%cd bananatracker
!pip install -e .
%cd ..

## Cell 3: Configuration with Optimizations

Configure the optimized tracking pipeline with batch detection and improved ECC.
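
To make the effect of `ECC_SCALE` concrete: when camera motion is estimated on a downscaled frame, the pixel count drops quadratically, and the estimated translation has to be mapped back to full resolution. The sketch below shows that arithmetic for a 2x3 affine warp stored as nested lists. The helpers `downscaled_size` and `rescale_warp` are hypothetical and are not part of the BananaTracker API; how the library handles this internally is an assumption.

```python
def downscaled_size(width: int, height: int, scale: float) -> tuple:
    """Return the frame size used for alignment at the given scale."""
    return max(1, round(width * scale)), max(1, round(height * scale))


def rescale_warp(warp, scale: float):
    """Map a 2x3 affine warp estimated on a downscaled frame to full resolution.

    The linear part (rotation, shear, zoom) is unaffected by uniform scaling;
    only the translation column is divided by the scale.
    """
    (a, b, tx), (c, d, ty) = warp
    return [[a, b, tx / scale], [c, d, ty / scale]]


scale = 0.25
small_w, small_h = downscaled_size(1920, 1080, scale)
pixel_ratio = (1920 * 1080) / (small_w * small_h)
print(f"Alignment runs on {small_w}x{small_h} ({pixel_ratio:.0f}x fewer pixels)")

# Hypothetical warp estimated on the downscaled frame: shift of (3, -1.5) px
warp_small = [[1.0, 0.0, 3.0], [0.0, 1.0, -1.5]]
print(rescale_warp(warp_small, scale))
```

A smaller scale makes each ECC iteration cheaper but quantizes the recovered motion more coarsely, so very small camera shifts may be lost.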

In [None]:
#@title Configuration { display-mode: "form" }

# Model Settings
YOLO_WEIGHTS = "/content/HockeyAI_model_weight.pt"  #@param {type:"string"}
SAM2_MODEL_ID = "facebook/sam2.1-hiera-large"  #@param ["facebook/sam2.1-hiera-tiny", "facebook/sam2.1-hiera-small", "facebook/sam2.1-hiera-base-plus", "facebook/sam2.1-hiera-large"]
CUTIE_WEIGHTS = "/content/Cutie/weights/cutie-base-mega.pth"  #@param {type:"string"}
HF_TOKEN = ""  #@param {type:"string"}

# Optimization Settings
USE_BATCH_DETECTION = True  #@param {type:"boolean"}
DETECTION_BATCH_SIZE = 16  #@param {type:"integer"}
USE_FP16 = True  #@param {type:"boolean"}

# ECC Settings (optimized defaults)
ECC_MAX_ITERATIONS = 100  #@param {type:"integer"}
ECC_EPS = 1e-5  #@param {type:"number"}
ECC_SCALE = 0.25  #@param {type:"number"}

# Mask Settings
ENABLE_MASKS = True  #@param {type:"boolean"}

print("Optimization Settings:")
print(f"  Batch Detection: {USE_BATCH_DETECTION} (batch_size={DETECTION_BATCH_SIZE})")
print(f"  FP16 Inference: {USE_FP16}")
print(f"  ECC: iterations={ECC_MAX_ITERATIONS}, eps={ECC_EPS}, scale={ECC_SCALE} ({int(1/ECC_SCALE)}x downscale)")
print(f"  Mask Module: {ENABLE_MASKS}")

In [None]:
import sys
sys.path.insert(0, '/content/bananatracker')
sys.path.insert(0, '/content/Cutie')

from bananatracker import BananaTrackerConfig

# Optimized configuration
config = BananaTrackerConfig(
    # Detection Settings (with FP16 and batch support)
    yolo_weights=YOLO_WEIGHTS,
    class_names=["Center Ice", "Faceoff", "Goalpost", "Goaltender", "Player", "Puck", "Referee"],
    track_classes=[3, 4, 5, 6],  # Goaltender, Player, Puck, Referee
    special_classes=[5],          # Puck - max-conf only
    detection_conf_thresh=0.4,
    detection_iou_thresh=0.7,
    detection_batch_size=DETECTION_BATCH_SIZE,
    use_half_precision=USE_FP16,

    # Centroid deduplication
    centroid_dedup_enabled=True,
    centroid_dedup_max_distance=36,

    # Tracker Settings
    track_thresh=0.5,
    track_buffer=90,
    match_thresh=0.8,

    # Camera Motion Compensation (optimized ECC)
    cmc_method="ecc",
    ecc_max_iterations=ECC_MAX_ITERATIONS,
    ecc_termination_eps=ECC_EPS,
    ecc_downscale=ECC_SCALE,
    cmc_downscale=2,

    # Mask Module Settings
    enable_masks=ENABLE_MASKS,
    sam2_model_id=SAM2_MODEL_ID,
    cutie_weights_path=CUTIE_WEIGHTS,
    hf_token=HF_TOKEN if HF_TOKEN else None,
    mask_start_frame=1,
    mask_bbox_overlap_threshold=0.6,
    sam2_use_fp16=USE_FP16,

    # Visualization
    class_colors={
        "Goaltender": (255, 165, 0),
        "Player": (255, 0, 0),
        "Puck": (0, 255, 0),
        "Referee": (0, 0, 255),
    },
    show_track_id=True,
    show_masks=True,
    mask_alpha=0.5,
    line_thickness=2,

    # Output
    output_video_path="/content/output_optimized.mp4",
    output_txt_path="/content/results.txt",
    device="cuda:0",
)

print("\nConfiguration created with optimizations!")

## Cell 4: Initialize Pipeline

In [None]:
import torch
from bananatracker import BananaTrackerPipeline

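# Check GPU (the configuration above targets cuda:0)
if not torch.cuda.is_available():
    raise RuntimeError("No CUDA device found; select a GPU runtime before running this notebook.")
print(f"GPU: {torch.cuda.get_device_name(0)}")
print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
print()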

# Initialize pipeline
print("Initializing pipeline...")
pipeline = BananaTrackerPipeline(config)

print("\nPipeline initialized!")
print(f"  - Detector: YOLOv8 ({'FP16' if config.use_half_precision else 'FP32'})")
print(f"  - Batch size: {config.detection_batch_size}")
print(f"  - CMC: ECC (iter={config.ecc_max_iterations}, scale={config.ecc_downscale})")
print(f"  - Mask module: {'Enabled' if pipeline.mask_manager else 'Disabled'}")

## Cell 5: Run Optimized Batch Processing

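This cell calls `process_video_batched()`, which batches YOLO inference for GPU efficiency. Set `USE_BATCH_DETECTION = False` in Cell 3 to fall back to the sequential `process_video()` path.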
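
Conceptually, batching means grouping consecutive frames into fixed-size chunks and sending each chunk to the detector in one call. The sketch below shows only that grouping step with a generic generator. `iter_batches` is a hypothetical helper, and the internals of `process_video_batched()` are not shown in this notebook, so this is an illustration rather than the library's implementation.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of up to batch_size items; the last may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 100 stand-in "frames" grouped with the notebook's default batch size of 16
sizes = [len(batch) for batch in iter_batches(range(100), 16)]
print(sizes)  # [16, 16, 16, 16, 16, 16, 4]
```

Because the generator consumes frames lazily, only one batch needs to be held in memory at a time, which matters for long videos.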

In [None]:
import time
import torch

INPUT_VIDEO = "/content/sample_video.mp4"  # Update with your video path

print(f"Processing: {INPUT_VIDEO}")
print(f"Batch size: {config.detection_batch_size}")
print(f"Using {'batched' if USE_BATCH_DETECTION else 'sequential'} detection")
print()

# Clear GPU cache
torch.cuda.empty_cache()
torch.cuda.reset_peak_memory_stats()

start_time = time.time()

if USE_BATCH_DETECTION:
    # Use batched processing for GPU efficiency
    all_tracks = pipeline.process_video_batched(
        INPUT_VIDEO,
        batch_size=config.detection_batch_size
    )
else:
    # Standard sequential processing
    all_tracks = pipeline.process_video(INPUT_VIDEO)

elapsed = time.time() - start_time
fps = len(all_tracks) / elapsed

print(f"\n{'='*50}")
print(f"Processing Complete!")
print(f"{'='*50}")
print(f"Frames processed: {len(all_tracks)}")
print(f"Total time: {elapsed:.1f}s")
print(f"FPS: {fps:.1f}")
print(f"Peak GPU memory: {torch.cuda.max_memory_allocated() / 1e9:.1f} GB")
print(f"\nOutput video: {config.output_video_path}")
print(f"MOT results: {config.output_txt_path}")

## Cell 6: Benchmark - Batch vs Sequential

In [None]:
import time
import torch
import cv2

INPUT_VIDEO = "/content/sample_video.mp4"  # Update with your video path
TEST_FRAMES = 100  # Number of frames to benchmark

# Get frames for testing
cap = cv2.VideoCapture(INPUT_VIDEO)
frames = []
for _ in range(TEST_FRAMES):
    ret, frame = cap.read()
    if not ret:
        break
    frames.append(frame)
cap.release()

print(f"Benchmarking with {len(frames)} frames...")
print()

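if not frames:
    raise RuntimeError(f"No frames could be read from {INPUT_VIDEO}")

# Warm up both code paths so one-time CUDA initialization is not timed
_ = pipeline.detector.detect(frames[0])
_ = pipeline.detector.detect_batch(frames[:config.detection_batch_size])

# Benchmark sequential detection
torch.cuda.synchronize()
start = time.perf_counter()
for frame in frames:
    _ = pipeline.detector.detect(frame)
torch.cuda.synchronize()
seq_time = time.perf_counter() - start
seq_fps = len(frames) / seq_time

# Benchmark batch detection
torch.cuda.synchronize()
start = time.perf_counter()
for i in range(0, len(frames), config.detection_batch_size):
    batch = frames[i:i + config.detection_batch_size]
    _ = pipeline.detector.detect_batch(batch)
torch.cuda.synchronize()
batch_time = time.perf_counter() - start
batch_fps = len(frames) / batch_time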

print("Detection Benchmark Results:")
print(f"{'='*50}")
print(f"Sequential:  {seq_fps:.1f} FPS  ({seq_time:.2f}s)")
print(f"Batched:     {batch_fps:.1f} FPS  ({batch_time:.2f}s)")
print(f"Speedup:     {batch_fps/seq_fps:.1f}x")

## Cell 7: Display Output Video

In [None]:
%%capture
# Compress video for notebook display
OUTPUT_COMPRESSED = "/content/output_compressed.mp4"
!ffmpeg -y -i {config.output_video_path} -vcodec libx264 -crf 28 {OUTPUT_COMPRESSED}

In [None]:
from IPython.display import HTML
from base64 import b64encode

OUTPUT_COMPRESSED = "/content/output_compressed.mp4"

# Read and encode video
with open(OUTPUT_COMPRESSED, 'rb') as f:
    mp4 = f.read()
data_url = f"data:video/mp4;base64,{b64encode(mp4).decode()}"

# Display video
HTML(f'''
<video width="800" controls>
  <source src="{data_url}" type="video/mp4">
</video>
''')

## Cell 8: GPU Memory Monitoring

In [None]:
import torch

print("GPU Memory Status:")
print(f"{'='*50}")
print(f"Allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
print(f"Reserved:  {torch.cuda.memory_reserved() / 1e9:.2f} GB")
print(f"Peak:      {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
# Ask the CUDA driver for free memory; total minus allocated ignores the allocator cache and other processes
free_bytes, _total_bytes = torch.cuda.mem_get_info()
print(f"Free:      {free_bytes / 1e9:.2f} GB")

# Clear cache if needed
# torch.cuda.empty_cache()
# print("\nCache cleared!")

## Cell 9: Fast Mode (No Masks)

In [None]:
# Configuration for maximum speed (no masks)
config_fast = BananaTrackerConfig(
    yolo_weights=YOLO_WEIGHTS,
    class_names=["Center Ice", "Faceoff", "Goalpost", "Goaltender", "Player", "Puck", "Referee"],
    track_classes=[3, 4, 5, 6],
    special_classes=[5],
    detection_conf_thresh=0.4,
    detection_batch_size=32,  # Larger batch for max GPU utilization
    use_half_precision=True,
    
    # Fast CMC
    cmc_method="orb",  # ORB is faster than ECC
    cmc_downscale=4,
    
    # Disable masks for speed
    enable_masks=False,
    
    # Output
    output_video_path="/content/output_fast.mp4",
    device="cuda:0",
)

print("Fast mode config created:")
print(f"  - Batch size: {config_fast.detection_batch_size}")
print(f"  - FP16: {config_fast.use_half_precision}")
print(f"  - CMC: {config_fast.cmc_method} ({config_fast.cmc_downscale}x downscale)")
print(f"  - Masks: Disabled")
print("\nUncomment below to run fast mode:")

# pipeline_fast = BananaTrackerPipeline(config_fast)
# all_tracks_fast = pipeline_fast.process_video_batched(INPUT_VIDEO, batch_size=32)

## Expected Performance

| Configuration | GPU Memory | FPS (estimate) |
|--------------|------------|----------------|
| Original (FP32, no batch, ECC 5000 iter) | ~7 GB | ~5-10 |
| Optimized (FP16, batch=16, ECC 100 iter) | ~15-20 GB | ~25-40 |
| Fast mode (FP16, batch=32, ORB, no masks) | ~10-15 GB | ~50-80 |

Actual performance depends on:
- GPU model (A100, V100, T4, etc.)
- Video resolution
- Number of objects per frame
- Whether masks are enabled
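
To translate the FPS estimates above into wall-clock time, the sketch below computes a best-case and worst-case runtime for a clip of a given length. The 5-minute, 30 fps clip is a hypothetical example, and `estimated_runtime_s` is an illustrative helper, not part of BananaTracker.

```python
def estimated_runtime_s(num_frames: int, fps_low: float, fps_high: float) -> tuple:
    """Return (best_case_s, worst_case_s) for processing num_frames."""
    if min(fps_low, fps_high) <= 0:
        raise ValueError("FPS values must be positive")
    return num_frames / fps_high, num_frames / fps_low


# Hypothetical clip: 5 minutes recorded at 30 frames per second
NUM_FRAMES = 5 * 60 * 30

# FPS ranges copied from the table above (estimates, not measurements)
FPS_ESTIMATES = {
    "Original": (5, 10),
    "Optimized": (25, 40),
    "Fast mode": (50, 80),
}

for name, (fps_low, fps_high) in FPS_ESTIMATES.items():
    best, worst = estimated_runtime_s(NUM_FRAMES, fps_low, fps_high)
    print(f"{name:<10} {best / 60:.1f} to {worst / 60:.1f} min")
```

Replacing the table's ranges with the FPS measured in Cell 5 on your own GPU gives a more reliable estimate.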