# FitCoach Live Feedback - Benchmark Evaluation

This notebook evaluates the live feedback system on the videos from the QEVD-FIT-COACH benchmark.

IMPORTANT NOTES:
Update all file paths to match your own Drive and repository layout.
This notebook also assumes the [QEVD Dataset Benchmark Data](https://www.qualcomm.com/developer/software/qevd-dataset/downloads) has been downloaded and stored in a "QEVD-FIT-COACH-Benchmark" subdirectory of the Drive root set in Step 2.

Running on an A100 GPU is recommended.

---

## Step 1: Install Dependencies
Note: pip may report dependency-resolver errors during installation; these can be ignored.

After installation finishes, restart the runtime and go directly to Step 2.
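
After the restart, a quick check along the following lines can confirm that the pinned versions are the ones actually installed. This is a minimal sketch: `EXPECTED` and `check_versions` are hypothetical names introduced here, and the pins are copied from the install cell in this step.

```python
from importlib import metadata

# Pins copied from the install cell in this step.
EXPECTED = {
    "PyYAML": "6.0",
    "datasets": "2.14.6",
    "evaluate": "0.4.1",
    "opencv-python": "4.9.0.80",
    "transformers": "4.36.0",
    "accelerate": "0.24.1",
    "peft": "0.5.0",
}


def check_versions(expected):
    """Return {package: (installed_or_None, wanted)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (installed, wanted)
    return mismatches


for name, (installed, wanted) in check_versions(EXPECTED).items():
    print(f"{name}: installed {installed}, expected {wanted}")
```

An empty result means every pin matched; any printed line points at a package whose install step is worth re-checking.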

In [None]:
# Core dependencies
print("[1/12] Installing PyYAML...")
!pip install -q PyYAML==6.0

print("[2/12] Installing datasets...")
!pip install -q datasets==2.14.6

print("[3/12] Installing evaluate...")
!pip install -q evaluate==0.4.1

print("[4/12] Installing OpenCV...")
!pip install -q opencv-python==4.9.0.80

print("[5/12] Installing transformers...")
!pip install -q transformers==4.36.0

print("[6/12] Installing accelerate...")
!pip install -q accelerate==0.24.1

print("[7/12] Installing peft...")
!pip install -q peft==0.5.0

print("[8/12] Installing bitsandbytes...")
# Quote the requirement so the shell does not treat ">" as an output redirect
!pip install -q "bitsandbytes>=0.44.0"

print("[9/12] Installing tqdm...")
!pip install -q tqdm

print("[10/12] Installing rouge_score...")
!pip install -q rouge_score

print("[11/12] Installing bert_score...")
!pip install -q bert_score

# Fix NumPy compatibility (OpenCV requires NumPy 1.x)
print("[12/12] Fixing NumPy compatibility...")
!pip install -q "numpy<2"

print("\n" + "="*60)
print("Setup complete! All required dependencies installed.")
print("="*60)
print("\nIMPORTANT: Restart Runtime")
print("After restart, go directly to Step 2")

[1/12] Installing PyYAML...
  Installing build dependencies ... done
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  Getting requirements to build wheel ... error
[2/12] Installing datasets...

## Step 2: Mount Google Drive and Setup Paths

IMPORTANT: Update the paths in the cell below to match your own Drive layout.

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Navigate to shared drive
%cd /content/drive/Shareddrives/'CIS6800 final project'

import os
DRIVE_ROOT = os.getcwd()
BENCHMARK_DIR = os.path.join(DRIVE_ROOT, "QEVD-FIT-COACH-Benchmark")
DOWNLOADS_DIR = os.path.join(DRIVE_ROOT, "downloads")
VIDEOS_DIR = os.path.join(BENCHMARK_DIR, "long_range_videos")
FEEDBACKS_JSON = os.path.join(BENCHMARK_DIR, "feedbacks_long_range.json")

# Create downloads directory
os.makedirs(DOWNLOADS_DIR, exist_ok=True)

print(f"Drive root: {DRIVE_ROOT}")
print(f"Benchmark dir: {BENCHMARK_DIR}")
print(f"Downloads dir: {DOWNLOADS_DIR}")
print(f"\nVerifying paths...")
print(f"Benchmark exists: {os.path.exists(BENCHMARK_DIR)}")
print(f"Videos dir exists: {os.path.exists(VIDEOS_DIR)}")
print(f"Feedbacks JSON exists: {os.path.exists(FEEDBACKS_JSON)}")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/Shareddrives/CIS6800 final project
Drive root: /content/drive/Shareddrives/CIS6800 final project
Benchmark dir: /content/drive/Shareddrives/CIS6800 final project/QEVD-FIT-COACH-Benchmark
Downloads dir: /content/drive/Shareddrives/CIS6800 final project/downloads

Verifying paths...
Benchmark exists: True
Videos dir exists: True
Feedbacks JSON exists: True
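
The verification above only prints `True` or `False`, so a missing path would surface much later as a harder-to-read error. A minimal sketch of a fail-fast check follows; `missing_paths` and `require_paths` are hypothetical helpers, not part of the FitCoach repository.

```python
import os


def missing_paths(paths):
    """Return the labels of entries in `paths` that do not exist on disk."""
    return [label for label, path in paths.items() if not os.path.exists(path)]


def require_paths(paths):
    """Raise FileNotFoundError listing every missing path, if any."""
    missing = missing_paths(paths)
    if missing:
        details = ", ".join(f"{label} ({paths[label]})" for label in missing)
        raise FileNotFoundError(f"Missing benchmark paths: {details}")
```

In this notebook it could be called as `require_paths({"benchmark": BENCHMARK_DIR, "videos": VIDEOS_DIR, "feedbacks": FEEDBACKS_JSON})` right after the paths are defined.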


## Step 3: Clone Repository

In [None]:
# Navigate to /content for repo
%cd /content

# Remove if exists
import shutil
if os.path.exists('/content/FitCoach'):
    print("Removing existing FitCoach directory...")
    shutil.rmtree('/content/FitCoach')

print("Cloning repository...\n")
!git clone -b live-feedback https://github.com/bryanaalfaro/FitCoach.git
%cd FitCoach
print("\nRepository cloned!")

/content
Removing existing FitCoach directory...
Cloning repository...

Cloning into 'FitCoach'...
remote: Enumerating objects: 136, done.
remote: Counting objects: 100% (33/33), done.
remote: Compressing objects: 100% (14/14), done.
remote: Total 136 (delta 23), reused 19 (delta 19), pack-reused 103 (from 1)
Receiving objects: 100% (136/136), 922.63 KiB | 1.36 MiB/s, done.
Resolving deltas: 100% (57/57), done.
/content/FitCoach

Repository cloned!


## Step 4: Download Models

### 4a: Login to HuggingFace

**Required:** Get access to LLaMA-2
1. Go to: https://huggingface.co/meta-llama/Llama-2-7b-hf
2. Click "Request access"
3. Get token: https://huggingface.co/settings/tokens

In [None]:
from huggingface_hub import notebook_login

print("Please login with your HuggingFace token:")
print("Get token: https://huggingface.co/settings/tokens\n")

notebook_login()

Please login with your HuggingFace token:
Get token: https://huggingface.co/settings/tokens




### 4b: Download LLaMA-2-7B

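**Cached in Google Drive** - downloads only once. Note that the download block in the cell below is commented out, so the cell only creates the symlink; uncomment that block on a first run, before the model exists in Drive.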

In [None]:
from huggingface_hub import snapshot_download
import os

# Re-define paths in case runtime was restarted
DRIVE_ROOT = "/content/drive/Shareddrives/CIS6800 final project"
DOWNLOADS_DIR = os.path.join(DRIVE_ROOT, "downloads")
LLAMA_DIR = os.path.join(DOWNLOADS_DIR, "models/Llama-2-7b-hf")

# # Check if already downloaded - verify actual model files exist
# model_file = os.path.join(LLAMA_DIR, "model-00001-of-00002.safetensors")
# config_file = os.path.join(LLAMA_DIR, "config.json")

# if os.path.exists(config_file) and os.path.exists(model_file):
#     # Check file size to ensure it's complete (~10GB)
#     model_size = os.path.getsize(model_file) / (1024**3)  # Size in GB
#     if model_size > 9.5:  # Should be ~9.98GB
#         print("LLaMA-2-7B already downloaded in Drive")
#         print(f"Location: {LLAMA_DIR}")
#     else:
#         print(f"Partial download detected ({model_size:.2f}GB / 9.98GB)")
#         print("Resuming download...\n")
#         download_needed = True
# else:
#     print("Downloading LLaMA-2-7B (~13GB)...")
#     print("This will take 15-25 minutes...")
#     print("(If download stalls, just stop and re-run this cell - it will resume)\n")
#     download_needed = True

# if 'download_needed' in locals() and download_needed:
#     os.makedirs(LLAMA_DIR, exist_ok=True)

#     # Download with resume capability
#     try:
#         snapshot_download(
#             repo_id="meta-llama/Llama-2-7b-hf",
#             local_dir=LLAMA_DIR,
#             local_dir_use_symlinks=False,
#             resume_download=True,
#             max_workers=4  # Limit parallel downloads to reduce stalling
#         )
#         print("\nLLaMA-2-7B downloaded to Drive!")
#     except Exception as e:
#         print(f"\nDownload interrupted: {e}")
#         print("Re-run this cell to resume download")
#         raise

# Create symlink in FitCoach directory
# (lexists also detects a dangling link, which exists() would miss)
%cd /content/FitCoach
if os.path.lexists("./Llama-2-7b-hf"):
    os.remove("./Llama-2-7b-hf")
os.symlink(LLAMA_DIR, "./Llama-2-7b-hf")
print("\nSymlinked to /content/FitCoach/Llama-2-7b-hf")

/content/FitCoach

Symlinked to /content/FitCoach/Llama-2-7b-hf


### 4c: Download 3D CNN weights

In [None]:
import os

CNN_DIR = os.path.join(DOWNLOADS_DIR, "models/ckpts_efficientnet")
CNN_WEIGHTS = os.path.join(CNN_DIR, "fitness_ally_hypermodel/efficientnet4Lite_1.8.3.checkpoint")

if os.path.exists(CNN_WEIGHTS):
    print("3D CNN weights already downloaded in Drive")
    print(f"Location: {CNN_DIR}")
else:
    print("Downloading 3D CNN weights...")
    print("(urllib does not resume partial downloads - if interrupted, re-run this cell to start over)\n")
    os.makedirs(CNN_DIR, exist_ok=True)

    # Use Python to download instead of wget to avoid path issues
    import urllib.request
    import shutil

    url = "https://github.com/Qualcomm-AI-research/FitCoach/releases/download/v1.0/efficientnet_3d_cnn_weights.tar.gz"
    tar_file = os.path.join(CNN_DIR, "efficientnet_3d_cnn_weights.tar.gz")

    print("Downloading...")
    urllib.request.urlretrieve(url, tar_file)

    print("\nExtracting (note: file is .tar not .tar.gz despite name)...")
    import tarfile
    with tarfile.open(tar_file, 'r') as tar:
        tar.extractall(path=CNN_DIR)

    print("3D CNN weights downloaded to Drive!")

# Create symlink (lexists also detects a dangling link, which exists() would miss)
%cd /content/FitCoach
if os.path.lexists("./ckpts_efficientnet"):
    os.remove("./ckpts_efficientnet")
os.symlink(CNN_DIR, "./ckpts_efficientnet")
print("\nSymlinked to /content/FitCoach/ckpts_efficientnet")

3D CNN weights already downloaded in Drive
Location: /content/drive/Shareddrives/CIS6800 final project/downloads/models/ckpts_efficientnet
/content/FitCoach

Symlinked to /content/FitCoach/ckpts_efficientnet


### 4d: Download Stream-VLM weights

In [None]:
import os

STREAMVLM_DIR = os.path.join(DOWNLOADS_DIR, "models/ckpts_streamvlm")
STREAMVLM_WEIGHTS = os.path.join(STREAMVLM_DIR, "ckpts_streamvlm/state_dict.pth.tar")

if os.path.exists(STREAMVLM_WEIGHTS):
    print("Stream-VLM weights already downloaded in Drive")
    print(f"Location: {STREAMVLM_DIR}")
else:
    print("Downloading Stream-VLM weights (6 parts, ~3.5GB total)...")
    print("This will take 5-10 minutes...")
    print("(If interrupted, just re-run this cell)\n")
    os.makedirs(STREAMVLM_DIR, exist_ok=True)

    # Use Python to download to avoid path issues with wget
    import urllib.request

    parts = ['aa', 'ab', 'ac', 'ad', 'ae', 'af']
    base_url = "https://github.com/Qualcomm-AI-research/FitCoach/releases/download/v1.0/streamvlm_weights.tar.gz."

    # Download each part
    for i, part in enumerate(parts, 1):
        print(f"Downloading part {i}/6...")
        url = base_url + part
        dest = os.path.join(STREAMVLM_DIR, f"streamvlm_weights.tar.gz.{part}")
        urllib.request.urlretrieve(url, dest)

    print("\nExtracting...")
    import subprocess

    # Combine and extract
    part_files = [os.path.join(STREAMVLM_DIR, f"streamvlm_weights.tar.gz.{p}") for p in parts]
    combined = os.path.join(STREAMVLM_DIR, "streamvlm_weights.tar.gz")

    # Concatenate parts
    with open(combined, 'wb') as outfile:
        for part_file in part_files:
            with open(part_file, 'rb') as infile:
                outfile.write(infile.read())

    # Extract
    import tarfile
    with tarfile.open(combined, 'r:gz') as tar:
        tar.extractall(path=STREAMVLM_DIR)

    print("Stream-VLM weights downloaded to Drive!")

# Create symlink (lexists also detects a dangling link, which exists() would miss)
%cd /content/FitCoach
if os.path.lexists("./ckpts_streamvlm"):
    os.remove("./ckpts_streamvlm")
os.symlink(STREAMVLM_DIR, "./ckpts_streamvlm")
print("\nSymlinked to /content/FitCoach/ckpts_streamvlm")

print("\n" + "="*60)
print("All models downloaded! Ready for evaluation.")
print("="*60)

Stream-VLM weights already downloaded in Drive
Location: /content/drive/Shareddrives/CIS6800 final project/downloads/models/ckpts_streamvlm
/content/FitCoach

Symlinked to /content/FitCoach/ckpts_streamvlm

All models downloaded! Ready for evaluation.
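
The concatenation step in the cell above calls `infile.read()` on each part, which loads a whole part into memory at once. A streaming variant, sketched below, copies in fixed-size chunks instead. `concatenate_parts` and `chunk_size` are hypothetical names introduced for this sketch.

```python
import os
import shutil


def concatenate_parts(part_files, combined_path, chunk_size=16 * 1024 * 1024):
    """Join split archive parts into one file, copying chunk by chunk.

    Returns the size in bytes of the combined file.
    """
    with open(combined_path, "wb") as outfile:
        for part_file in part_files:
            with open(part_file, "rb") as infile:
                # copyfileobj reads and writes at most chunk_size bytes at a time
                shutil.copyfileobj(infile, outfile, chunk_size)
    return os.path.getsize(combined_path)
```

It would slot in where the cell builds `combined` from `part_files`, leaving the `tarfile` extraction unchanged.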


## Step 5: Load Model and Initialize Coach

In [None]:
import sys
sys.path.insert(0, '/content/FitCoach')

import yaml
import torch
from src.model_helpers import make_model
from scripts.live_feedback_lightweight import LightweightFeedbackCoach

%cd /content/FitCoach

# Use lightweight config for evaluation
config_path = "/content/FitCoach/configs/live_lightweight.yaml"
print(f"Using config: {config_path}")

with open(config_path, 'r') as f:
    config = yaml.safe_load(f)

print("\nConfig settings:")
print(f"  Feedback interval: {config['evaluator']['sampling_kwargs']['feedback_interval']}s")
print(f"  Feature frequency: {config['evaluator']['sampling_kwargs']['feats_frequency']} fps")
print(f"  Max feedback length: {config['evaluator']['sampling_kwargs']['max_feedback_length']}")

# Load model
print("\nLoading Stream-VLM model...")
llama2_7b_path = config["model"]["llama2_7b_path"]
model_kwargs = config["model"]["kwargs"]
stream_vlm = make_model(llama2_7b_path, **model_kwargs)
stream_vlm.eval()
print("Model loaded")

# Initialize coach
cnn_weights_path = "./ckpts_efficientnet/fitness_ally_hypermodel/efficientnet4Lite_1.8.3.checkpoint"
coach = LightweightFeedbackCoach(
    model=stream_vlm,
    config=config,
    cnn_weights_path=cnn_weights_path,
    max_buffer_size=200
)
print("Coach initialized")

print("\n" + "="*60)
print("Ready for evaluation!")
print("="*60)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


/content/FitCoach
Using config: /content/FitCoach/configs/live_lightweight.yaml

Config settings:
  Feedback interval: 15.0s
  Feature frequency: 2 fps
  Max feedback length: 48

Loading Stream-VLM model...


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


'str' object has no attribute 'model' <class 'function'>  cannot be sent to  cuda
Model loaded
Loading 3D CNN for feature extraction...
3D CNN loaded successfully!
Coach initialized

Ready for evaluation!
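
The cell above indexes directly into nested config keys, so a config file with a different layout fails with a bare `KeyError`. The sketch below shows one way to make that failure more descriptive. `get_nested` is a hypothetical helper, and the toy config only mirrors the three keys and values printed above; the real `live_lightweight.yaml` contains more.

```python
def get_nested(config, dotted_key):
    """Look up 'a.b.c' in nested dicts; the KeyError names the full path."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"Config is missing '{dotted_key}' (stopped at '{part}')")
        node = node[part]
    return node


# Toy config mirroring the keys and values printed by the cell above.
example_config = {
    "evaluator": {
        "sampling_kwargs": {
            "feedback_interval": 15.0,
            "feats_frequency": 2,
            "max_feedback_length": 48,
        }
    }
}

interval = get_nested(example_config, "evaluator.sampling_kwargs.feedback_interval")
print(f"Feedback interval: {interval}s")
```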


## Step 6: Load Benchmark Dataset

In [None]:
import json
import os

def get_feedback_spans(feedbacks):
    """
    Extract unique feedback spans from dense frame-aligned feedback array.
    Based on src.fitness_datasets.fitcoach.get_feedback_span

    Returns list of unique feedback strings (one per span).
    """
    feedback_spans = []
    current_feedback = None

    for feedback in feedbacks:
        # When we encounter a new feedback (different from current)
        if current_feedback is not None and current_feedback != feedback:
            feedback_spans.append(current_feedback)
            current_feedback = None

        # Start tracking a new feedback when we encounter non-empty text
        if feedback and not current_feedback:
            current_feedback = feedback

    # Don't forget the last feedback if video ends while feedback is active
    if current_feedback is not None:
        feedback_spans.append(current_feedback)

    return feedback_spans

# Re-define paths
DRIVE_ROOT = "/content/drive/Shareddrives/CIS6800 final project"
BENCHMARK_DIR = os.path.join(DRIVE_ROOT, "QEVD-FIT-COACH-Benchmark")
VIDEOS_DIR = os.path.join(BENCHMARK_DIR, "long_range_videos")
FEEDBACKS_JSON = os.path.join(BENCHMARK_DIR, "feedbacks_long_range.json")

print("Loading benchmark dataset...")
with open(FEEDBACKS_JSON, 'r') as f:
    benchmark_data = json.load(f)

print(f"Total videos in benchmark: {len(benchmark_data)}")

# Evaluate on all 74 videos
NUM_EVAL_VIDEOS = 74
eval_videos = benchmark_data[:NUM_EVAL_VIDEOS]

print(f"\nEvaluating on first {NUM_EVAL_VIDEOS} videos:")
for i, video_data in enumerate(eval_videos, 1):
    video_file = video_data['long_range_video_file']

    # Extract unique feedback spans from dense array
    feedback_spans = get_feedback_spans(video_data['feedbacks'])

    # Count non-transition feedbacks (is_transition aligns with feedback_timestamps and spans)
    num_feedbacks = sum(
        1 for is_trans in video_data['is_transition']
        if not is_trans
    )

    print(f"  {i}. {os.path.basename(video_file)} - {num_feedbacks} ground truth feedbacks (spans: {len(feedback_spans)})")

Loading benchmark dataset...
Total videos in benchmark: 74

Evaluating on first 74 videos:
  1. 0006.mp4 - 28 ground truth feedbacks (spans: 35)
  2. 0009.mp4 - 34 ground truth feedbacks (spans: 41)
  3. 0010.mp4 - 30 ground truth feedbacks (spans: 37)
  4. 0011.mp4 - 27 ground truth feedbacks (spans: 34)
  5. 0012.mp4 - 31 ground truth feedbacks (spans: 38)
  6. 0013.mp4 - 22 ground truth feedbacks (spans: 29)
  7. 0014.mp4 - 24 ground truth feedbacks (spans: 31)
  8. 0015.mp4 - 31 ground truth feedbacks (spans: 38)
  9. 0016.mp4 - 26 ground truth feedbacks (spans: 33)
  10. 0017.mp4 - 16 ground truth feedbacks (spans: 23)
  11. 0018.mp4 - 31 ground truth feedbacks (spans: 38)
  12. 0019.mp4 - 23 ground truth feedbacks (spans: 30)
  13. 0023.mp4 - 25 ground truth feedbacks (spans: 32)
  14. 0024.mp4 - 13 ground truth feedbacks (spans: 20)
  15. 0025.mp4 - 14 ground truth feedbacks (spans: 21)
  16. 0026.mp4 - 34 ground truth feedbacks (spans: 41)
  17. 0027.mp4 - 24 ground truth feedb
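
To make the span extraction concrete, here is a small worked example. `get_feedback_spans` is copied from the cell above; the dense array, transition flags, and timestamps are invented toy data, not taken from the benchmark. It also shows the transition filtering and timestamp normalization that the evaluation step applies.

```python
def get_feedback_spans(feedbacks):
    """Collapse a dense frame-aligned feedback array into one string per span."""
    feedback_spans = []
    current_feedback = None
    for feedback in feedbacks:
        # A change in text (including to empty) closes the active span
        if current_feedback is not None and current_feedback != feedback:
            feedback_spans.append(current_feedback)
            current_feedback = None
        # Non-empty text opens a new span
        if feedback and not current_feedback:
            current_feedback = feedback
    if current_feedback is not None:
        feedback_spans.append(current_feedback)
    return feedback_spans


# Invented toy data: one entry per frame, "" means no feedback is active.
dense = ["", "Back straight", "Back straight", "", "Next: squats", "Good depth", ""]
is_transition = [False, True, False]   # one flag per span
timestamps = [100.0, 104.5, 110.0]     # one timestamp per span

spans = get_feedback_spans(dense)
kept = [s for s, t in zip(spans, is_transition) if not t]
start = timestamps[0]
kept_times = [ts - start for ts, t in zip(timestamps, is_transition) if not t]

print(spans)
print(kept, kept_times)
```

Note that "unique" here means one entry per contiguous span: the same text appearing in two separate spans is kept twice.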

## Step 7: Run Evaluation

Process each video frame by frame and generate predictions.
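
The evaluation loop in this step gates feature extraction and feedback generation on elapsed time derived from the frame count, not on wall-clock time. The sketch below reproduces just that gating to show when each action would fire. `trigger_times` is a hypothetical helper, and it assumes every call succeeds (the real loop only advances its timer after a successful call). The frame count and frame rate are those logged for the first video.

```python
def trigger_times(total_frames, fps, interval):
    """Elapsed times (in seconds) at which an interval-gated action fires."""
    times = []
    last = -999.0  # same sentinel as the evaluation loop, so the first frame fires
    for frame_count in range(1, total_frames + 1):
        elapsed = (frame_count - 1) / fps
        if elapsed - last >= interval:
            times.append(elapsed)
            last = elapsed
    return times


# Values from the first video's log: 4874 frames at 30 fps.
feature_times = trigger_times(total_frames=4874, fps=30.0, interval=0.5)
feedback_times = trigger_times(total_frames=4874, fps=30.0, interval=15.0)

print(f"feature extractions: {len(feature_times)}")
print(f"feedback calls: {len(feedback_times)}, "
      f"first at {feedback_times[0]:.2f}s, last at {feedback_times[-1]:.2f}s")
```

Under these assumptions the first feedback call lands at 0.00s, when the buffer holds a single frame. The logged prediction range starts at 15.00s, which would be consistent with that first call returning no text, although the log itself does not confirm this.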

In [None]:
import cv2
import numpy as np
import time
from tqdm import tqdm

def get_feedback_spans(feedbacks):
    """Extract unique feedback spans from dense frame-aligned array."""
    feedback_spans = []
    current_feedback = None

    for feedback in feedbacks:
        if current_feedback is not None and current_feedback != feedback:
            feedback_spans.append(current_feedback)
            current_feedback = None

        if feedback and not current_feedback:
            current_feedback = feedback

    if current_feedback is not None:
        feedback_spans.append(current_feedback)

    return feedback_spans

# Get evaluation parameters from config
feedback_interval = config['evaluator']['sampling_kwargs']['feedback_interval']
feats_frequency = config['evaluator']['sampling_kwargs']['feats_frequency']
feature_interval = 1.0 / feats_frequency

print(f"Evaluation settings:")
print(f"  Feedback interval: {feedback_interval}s")
print(f"  Feature extraction: {feats_frequency} fps (every {feature_interval:.2f}s)")
print(f"\nStarting evaluation...\n")

all_results = []
system_prompt = "You are an expert fitness coaching AI who coaches users as they exercise. You assess their performance, count repetitions, and proactively provide feedback."

START_IDX = 1
for video_idx, video_data in enumerate(eval_videos[START_IDX-1:], START_IDX):
    video_file_rel = video_data['long_range_video_file']
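    # normpath drops a leading "./"; lstrip('./') would strip any run of '.' and '/' characters
    video_file = os.path.join(BENCHMARK_DIR, os.path.normpath(video_file_rel).lstrip(os.sep))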

    print(f"\n{'='*60}")
    print(f"Video {video_idx}/{NUM_EVAL_VIDEOS}: {os.path.basename(video_file)}")
    print(f"{'='*60}")

    # Reset coach state manually (no reset() method)
    coach.feature_buffer.clear()
    coach.feedback_history.clear()

    # Use the FIRST GROUND TRUTH TIMESTAMP as the video start reference
    # (video_timestamps.npy has corrupted values)
    if len(video_data['feedback_timestamps']) > 0:
        video_start_timestamp = video_data['feedback_timestamps'][0]
    else:
        print("Warning: No feedback timestamps found, skipping video")
        continue

    # Open video
    cap = cv2.VideoCapture(video_file)
    if not cap.isOpened():
        print(f"Error: Could not open video {video_file}")
        continue

    fps = cap.get(cv2.CAP_PROP_FPS)
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    duration = total_frames / fps if fps > 0 else 0

    # Extract unique feedback spans from dense array
    feedback_spans = get_feedback_spans(video_data['feedbacks'])

    # Filter non-transition feedbacks (is_transition aligns with feedback_timestamps and spans)
    gt_feedbacks = [
        fb for fb, is_trans in zip(feedback_spans, video_data['is_transition'])
        if not is_trans
    ]
    gt_timestamps = [
        ts for ts, is_trans in zip(video_data['feedback_timestamps'], video_data['is_transition'])
        if not is_trans
    ]

    # Normalize timestamps to be relative to video start
    gt_timestamps_normalized = [ts - video_start_timestamp for ts in gt_timestamps]

    print(f"Video info: {total_frames} frames, {fps:.2f} fps, {duration:.1f}s")
    print(f"Ground truth feedbacks: {len(gt_feedbacks)}")
    if len(gt_timestamps_normalized) > 0:
        print(f"GT timestamp range: {gt_timestamps_normalized[0]:.2f}s to {gt_timestamps_normalized[-1]:.2f}s")
    else:
        print("GT timestamp range: N/A (no non-transition feedbacks)")

    # Process video
    predictions = []
    pred_timestamps_normalized = []
    frame_count = 0
    oom_count = 0
    start_time = time.time()

    # Use FRAME-BASED timing for intervals
    last_feedback_elapsed = -999.0
    last_feature_elapsed = -999.0

    with tqdm(total=total_frames, desc="Processing") as pbar:
        while True:
            ret, frame = cap.read()
            if not ret:
                break

            frame_count += 1
            pbar.update(1)

            # Calculate elapsed time from frame count
            elapsed_time = (frame_count - 1) / fps if fps > 0 else 0

            # Extract features at specified rate (every 0.5s for 2 fps)
            if elapsed_time - last_feature_elapsed >= feature_interval:
                try:
                    preprocessed = coach.preprocess_frame(frame)
                    coach.feature_buffer.append(preprocessed)
                    last_feature_elapsed = elapsed_time
                except torch.cuda.OutOfMemoryError:
                    oom_count += 1
                    if torch.cuda.is_available():
                        torch.cuda.empty_cache()
                    while len(coach.feature_buffer) > 20:
                        coach.feature_buffer.popleft()
                except Exception as e:
                    print(f"\nError at frame {frame_count}: {e}")

            # Generate feedback at intervals (every 15s)
            if elapsed_time - last_feedback_elapsed >= feedback_interval:
                try:
                    feedback, _ = coach.generate_feedback(system_prompt)
                    if feedback and feedback.strip():
                        predictions.append(feedback)
                        pred_timestamps_normalized.append(elapsed_time)
                        pbar.set_postfix({'feedbacks': len(predictions)})
                    last_feedback_elapsed = elapsed_time
                except torch.cuda.OutOfMemoryError:
                    oom_count += 1
                    if torch.cuda.is_available():
                        torch.cuda.empty_cache()
                    while len(coach.feature_buffer) > 20:
                        coach.feature_buffer.popleft()
                except Exception as e:
                    print(f"\nFeedback error at frame {frame_count}: {e}")

    cap.release()
    processing_time = time.time() - start_time

    print(f"\nProcessed {frame_count} frames in {processing_time:.1f}s")
    print(f"Generated {len(predictions)} predictions (expected ~{int(duration/feedback_interval)})")
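    if predictions:
        print(f"Pred timestamp range: {pred_timestamps_normalized[0]:.2f}s to {pred_timestamps_normalized[-1]:.2f}s")
    else:
        # Avoid an IndexError when the coach produced no feedback for this video
        print("Pred timestamp range: N/A (no predictions generated)")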
    if oom_count > 0:
        print(f"GPU OOM events: {oom_count}")

    # Store results with normalized timestamps
    all_results.append({
        'video_file': video_file_rel,
        'predictions': predictions,
        'pred_timestamps': pred_timestamps_normalized,
        'ground_truth': {
            'feedbacks': gt_feedbacks,
            'feedback_timestamps': gt_timestamps_normalized
        },
        'stats': {
            'frames_processed': frame_count,
            'processing_time': processing_time,
            'oom_events': oom_count
        }
    })

    # Save checkpoint after each video
    checkpoint_file = os.path.join(DRIVE_ROOT, f"checkpoint_video_{video_idx}.json")
    with open(checkpoint_file, 'w') as f:
        json.dump(all_results, f, indent=2)
    print(f"Checkpoint saved: {checkpoint_file}")

print("\n" + "="*60)
print("Video processing complete!")
print("="*60)

Evaluation settings:
  Feedback interval: 15.0s
  Feature extraction: 2 fps (every 0.50s)

Starting evaluation...


Video 1/74: 0006.mp4
Video info: 4874 frames, 30.00 fps, 162.5s
Ground truth feedbacks: 28
GT timestamp range: 2.89s to 184.13s


Processing: 100%|██████████| 4874/4874 [00:18<00:00, 266.41it/s, feedbacks=10]



Processed 4874 frames in 18.3s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_1.json

Video 2/74: 0009.mp4
Video info: 4700 frames, 30.00 fps, 156.7s
Ground truth feedbacks: 34
GT timestamp range: 5.08s to 178.39s


Processing: 100%|██████████| 4700/4700 [00:17<00:00, 266.57it/s, feedbacks=10]



Processed 4700 frames in 17.6s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_2.json

Video 3/74: 0010.mp4
Video info: 4775 frames, 30.00 fps, 159.2s
Ground truth feedbacks: 30
GT timestamp range: 3.08s to 181.56s


Processing: 100%|██████████| 4775/4775 [00:17<00:00, 274.27it/s, feedbacks=10]



Processed 4775 frames in 17.4s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_3.json

Video 4/74: 0011.mp4
Video info: 4934 frames, 30.00 fps, 164.5s
Ground truth feedbacks: 27
GT timestamp range: 3.18s to 184.96s


Processing: 100%|██████████| 4934/4934 [00:17<00:00, 276.40it/s, feedbacks=10]



Processed 4934 frames in 17.9s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_4.json

Video 5/74: 0012.mp4
Video info: 4988 frames, 30.00 fps, 166.3s
Ground truth feedbacks: 31
GT timestamp range: 2.38s to 178.37s


Processing: 100%|██████████| 4988/4988 [00:15<00:00, 324.84it/s, feedbacks=10]



Processed 4988 frames in 15.4s
Generated 10 predictions (expected ~11)
Pred timestamp range: 15.00s to 165.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_5.json

Video 6/74: 0013.mp4
Video info: 4818 frames, 30.00 fps, 160.6s
Ground truth feedbacks: 22
GT timestamp range: 4.69s to 169.58s


Processing: 100%|██████████| 4818/4818 [00:15<00:00, 312.74it/s, feedbacks=10]



Processed 4818 frames in 15.4s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_6.json

Video 7/74: 0014.mp4
Video info: 4786 frames, 30.00 fps, 159.5s
Ground truth feedbacks: 24
GT timestamp range: 1.68s to 163.71s


Processing: 100%|██████████| 4786/4786 [00:14<00:00, 341.84it/s, feedbacks=10]



Processed 4786 frames in 14.0s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_7.json

Video 8/74: 0015.mp4
Video info: 4883 frames, 30.00 fps, 162.8s
Ground truth feedbacks: 31
GT timestamp range: 2.09s to 187.22s


Processing: 100%|██████████| 4883/4883 [00:21<00:00, 231.18it/s, feedbacks=10]



Processed 4883 frames in 21.1s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_8.json

Video 9/74: 0016.mp4
Video info: 4915 frames, 30.00 fps, 163.8s
Ground truth feedbacks: 26
GT timestamp range: 2.18s to 174.89s


Processing: 100%|██████████| 4915/4915 [00:20<00:00, 245.16it/s, feedbacks=10]



Processed 4915 frames in 20.1s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_9.json

Video 10/74: 0017.mp4
Video info: 4821 frames, 30.00 fps, 160.7s
Ground truth feedbacks: 16
GT timestamp range: 33.22s to 179.40s


Processing: 100%|██████████| 4821/4821 [00:19<00:00, 244.10it/s, feedbacks=10]



Processed 4821 frames in 19.8s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_10.json

Video 11/74: 0018.mp4
Video info: 4783 frames, 30.00 fps, 159.4s
Ground truth feedbacks: 31
GT timestamp range: 2.78s to 177.37s


Processing: 100%|██████████| 4783/4783 [00:21<00:00, 222.98it/s, feedbacks=10]



Processed 4783 frames in 21.5s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_11.json

Video 12/74: 0019.mp4
Video info: 4726 frames, 30.00 fps, 157.5s
Ground truth feedbacks: 23
GT timestamp range: 4.18s to 169.35s


Processing: 100%|██████████| 4726/4726 [00:16<00:00, 293.43it/s, feedbacks=10]



Processed 4726 frames in 16.1s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_12.json

Video 13/74: 0023.mp4
Video info: 4868 frames, 30.00 fps, 162.3s
Ground truth feedbacks: 25
GT timestamp range: 3.28s to 174.68s


Processing: 100%|██████████| 4868/4868 [00:19<00:00, 249.26it/s, feedbacks=10]



Processed 4868 frames in 19.5s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_13.json

Video 14/74: 0024.mp4
Video info: 4745 frames, 30.00 fps, 158.2s
Ground truth feedbacks: 13
GT timestamp range: 4.99s to 79.36s


Processing: 100%|██████████| 4745/4745 [00:17<00:00, 265.52it/s, feedbacks=10]



Processed 4745 frames in 17.9s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_14.json

Video 15/74: 0025.mp4
Video info: 4706 frames, 30.00 fps, 156.9s
Ground truth feedbacks: 14
GT timestamp range: 3.98s to 115.18s


Processing: 100%|██████████| 4706/4706 [00:15<00:00, 300.60it/s, feedbacks=10]



Processed 4706 frames in 15.7s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_15.json

Video 16/74: 0026.mp4
Video info: 4741 frames, 30.00 fps, 158.0s
Ground truth feedbacks: 34
GT timestamp range: 2.48s to 181.17s


Processing: 100%|██████████| 4741/4741 [00:15<00:00, 305.31it/s, feedbacks=10]



Processed 4741 frames in 15.5s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_16.json

Video 17/74: 0027.mp4
Video info: 4725 frames, 30.00 fps, 157.5s
Ground truth feedbacks: 24
GT timestamp range: 2.98s to 177.37s


Processing: 100%|██████████| 4725/4725 [00:20<00:00, 233.78it/s, feedbacks=10]



Processed 4725 frames in 20.2s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_17.json

Video 18/74: 0028.mp4
Video info: 4803 frames, 30.00 fps, 160.1s
Ground truth feedbacks: 17
GT timestamp range: 38.92s to 168.52s


Processing: 100%|██████████| 4803/4803 [00:21<00:00, 219.81it/s, feedbacks=10]



Processed 4803 frames in 21.9s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_18.json

Video 19/74: 0029.mp4
Video info: 4823 frames, 30.00 fps, 160.8s
Ground truth feedbacks: 32
GT timestamp range: 3.58s to 176.17s


Processing: 100%|██████████| 4823/4823 [00:21<00:00, 226.25it/s, feedbacks=10]



Processed 4823 frames in 21.3s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_19.json

Video 20/74: 0030.mp4
Video info: 4791 frames, 30.00 fps, 159.7s
Ground truth feedbacks: 29
GT timestamp range: 3.48s to 168.96s


Processing: 100%|██████████| 4791/4791 [00:21<00:00, 217.79it/s, feedbacks=10]



Processed 4791 frames in 22.0s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_20.json

Video 21/74: 0031.mp4
Video info: 4863 frames, 30.00 fps, 162.1s
Ground truth feedbacks: 33
GT timestamp range: 3.18s to 185.22s


Processing: 100%|██████████| 4863/4863 [00:21<00:00, 221.58it/s, feedbacks=10]



Processed 4863 frames in 22.0s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_21.json

Video 22/74: 0032.mp4
Video info: 4733 frames, 30.00 fps, 157.8s
Ground truth feedbacks: 35
GT timestamp range: 5.08s to 175.64s


Processing: 100%|██████████| 4733/4733 [00:18<00:00, 250.43it/s, feedbacks=10]



Processed 4733 frames in 18.9s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_22.json

Video 23/74: 0033.mp4
Video info: 4751 frames, 30.00 fps, 158.4s
Ground truth feedbacks: 35
GT timestamp range: 4.58s to 177.58s


Processing: 100%|██████████| 4751/4751 [00:19<00:00, 242.14it/s, feedbacks=10]



Processed 4751 frames in 19.6s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_23.json

Video 24/74: 0039.mp4
Video info: 4748 frames, 30.00 fps, 158.3s
Ground truth feedbacks: 22
GT timestamp range: 38.39s to 181.83s


Processing: 100%|██████████| 4748/4748 [00:21<00:00, 223.58it/s, feedbacks=10]



Processed 4748 frames in 21.2s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_24.json

Video 25/74: 0040.mp4
Video info: 4757 frames, 30.00 fps, 158.6s
Ground truth feedbacks: 25
GT timestamp range: 4.28s to 176.00s


Processing: 100%|██████████| 4757/4757 [00:21<00:00, 217.92it/s, feedbacks=10]



Processed 4757 frames in 21.8s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_25.json

Video 26/74: 0041.mp4
Video info: 4836 frames, 30.00 fps, 161.2s
Ground truth feedbacks: 23
GT timestamp range: 3.48s to 180.66s


Processing: 100%|██████████| 4836/4836 [00:22<00:00, 219.50it/s, feedbacks=10]



Processed 4836 frames in 22.0s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_26.json

Video 27/74: 0043.mp4
Video info: 4806 frames, 30.00 fps, 160.2s
Ground truth feedbacks: 24
GT timestamp range: 4.28s to 149.26s


Processing: 100%|██████████| 4806/4806 [00:20<00:00, 231.25it/s, feedbacks=10]



Processed 4806 frames in 20.8s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_27.json

Video 28/74: 0045.mp4
Video info: 4827 frames, 30.00 fps, 160.9s
Ground truth feedbacks: 19
GT timestamp range: 3.08s to 181.38s


Processing: 100%|██████████| 4827/4827 [00:19<00:00, 251.12it/s, feedbacks=10]



Processed 4827 frames in 19.2s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_28.json

Video 29/74: 0048.mp4
Video info: 4643 frames, 30.00 fps, 154.8s
Ground truth feedbacks: 26
GT timestamp range: 3.48s to 172.52s


Processing: 100%|██████████| 4643/4643 [00:19<00:00, 233.93it/s, feedbacks=10]



Processed 4643 frames in 19.9s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_29.json

Video 30/74: 0049.mp4
Video info: 4702 frames, 30.00 fps, 156.7s
Ground truth feedbacks: 27
GT timestamp range: 3.58s to 177.81s


Processing: 100%|██████████| 4702/4702 [00:22<00:00, 212.65it/s, feedbacks=10]



Processed 4702 frames in 22.1s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_30.json

Video 31/74: 0050.mp4
Video info: 4810 frames, 30.00 fps, 160.3s
Ground truth feedbacks: 24
GT timestamp range: 4.38s to 177.65s


Processing: 100%|██████████| 4810/4810 [00:20<00:00, 237.60it/s, feedbacks=10]



Processed 4810 frames in 20.2s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_31.json

Video 32/74: 0051.mp4
Video info: 4661 frames, 30.00 fps, 155.4s
Ground truth feedbacks: 23
GT timestamp range: 2.68s to 170.48s


Processing: 100%|██████████| 4661/4661 [00:21<00:00, 214.85it/s, feedbacks=10]



Processed 4661 frames in 21.7s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_32.json

Video 33/74: 0052.mp4
Video info: 4822 frames, 30.00 fps, 160.7s
Ground truth feedbacks: 18
GT timestamp range: 2.28s to 181.31s


Processing: 100%|██████████| 4822/4822 [00:21<00:00, 221.53it/s, feedbacks=10]



Processed 4822 frames in 21.8s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_33.json

Video 34/74: 0053.mp4
Video info: 4730 frames, 30.00 fps, 157.7s
Ground truth feedbacks: 17
GT timestamp range: 4.48s to 160.27s


Processing: 100%|██████████| 4730/4730 [00:20<00:00, 233.43it/s, feedbacks=10]



Processed 4730 frames in 20.3s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_34.json

Video 35/74: 0054.mp4
Video info: 4772 frames, 30.00 fps, 159.1s
Ground truth feedbacks: 27
GT timestamp range: 3.38s to 179.79s


Processing: 100%|██████████| 4772/4772 [00:16<00:00, 292.15it/s, feedbacks=10]



Processed 4772 frames in 16.3s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_35.json

Video 36/74: 0055.mp4
Video info: 4836 frames, 30.00 fps, 161.2s
Ground truth feedbacks: 24
GT timestamp range: 2.59s to 178.10s


Processing: 100%|██████████| 4836/4836 [00:22<00:00, 216.41it/s, feedbacks=10]



Processed 4836 frames in 22.4s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_36.json

Video 37/74: 0056.mp4
Video info: 4757 frames, 30.00 fps, 158.6s
Ground truth feedbacks: 24
GT timestamp range: 2.98s to 185.20s


Processing: 100%|██████████| 4757/4757 [00:22<00:00, 213.65it/s, feedbacks=10]



Processed 4757 frames in 22.3s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_37.json

Video 38/74: 0073.mp4
Video info: 4899 frames, 30.00 fps, 163.3s
Ground truth feedbacks: 16
GT timestamp range: 65.62s to 154.41s


Processing: 100%|██████████| 4899/4899 [00:20<00:00, 242.89it/s, feedbacks=10]



Processed 4899 frames in 20.2s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_38.json

Video 39/74: 0074.mp4
Video info: 4804 frames, 30.00 fps, 160.1s
Ground truth feedbacks: 30
GT timestamp range: 1.18s to 185.82s


Processing: 100%|██████████| 4804/4804 [00:17<00:00, 268.45it/s, feedbacks=10]



Processed 4804 frames in 17.9s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_39.json

Video 40/74: 0077.mp4
Video info: 4800 frames, 30.00 fps, 160.0s
Ground truth feedbacks: 24
GT timestamp range: 2.28s to 187.70s


Processing: 100%|██████████| 4800/4800 [00:19<00:00, 242.80it/s, feedbacks=10]



Processed 4800 frames in 19.8s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_40.json

Video 41/74: 0078.mp4
Video info: 4808 frames, 30.00 fps, 160.3s
Ground truth feedbacks: 21
GT timestamp range: 3.29s to 151.84s


Processing: 100%|██████████| 4808/4808 [00:21<00:00, 220.24it/s, feedbacks=10]



Processed 4808 frames in 21.8s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_41.json

Video 42/74: 0079.mp4
Video info: 4751 frames, 30.00 fps, 158.4s
Ground truth feedbacks: 19
GT timestamp range: 2.58s to 153.64s


Processing: 100%|██████████| 4751/4751 [00:23<00:00, 205.61it/s, feedbacks=10]



Processed 4751 frames in 23.1s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_42.json

Video 43/74: 0087.mp4
Video info: 4742 frames, 30.00 fps, 158.1s
Ground truth feedbacks: 20
GT timestamp range: 2.38s to 121.31s


Processing: 100%|██████████| 4742/4742 [00:20<00:00, 230.13it/s, feedbacks=10]



Processed 4742 frames in 20.6s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_43.json

Video 44/74: 0088.mp4
Video info: 4687 frames, 30.00 fps, 156.2s
Ground truth feedbacks: 14
GT timestamp range: 1.98s to 121.76s


Processing: 100%|██████████| 4687/4687 [00:22<00:00, 208.82it/s, feedbacks=10]



Processed 4687 frames in 22.4s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_44.json

Video 45/74: 0089.mp4
Video info: 4809 frames, 30.00 fps, 160.3s
Ground truth feedbacks: 0
GT timestamp range: N/A (no non-transition feedbacks)


Processing: 100%|██████████| 4809/4809 [00:21<00:00, 219.17it/s, feedbacks=10]



Processed 4809 frames in 21.9s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_45.json

Video 46/74: 0145.mp4
Video info: 4612 frames, 30.00 fps, 153.7s
Ground truth feedbacks: 34
GT timestamp range: 1.28s to 183.81s


Processing: 100%|██████████| 4612/4612 [00:14<00:00, 321.21it/s, feedbacks=10]



Processed 4612 frames in 14.4s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_46.json

Video 47/74: 0146.mp4
Video info: 4619 frames, 30.00 fps, 154.0s
Ground truth feedbacks: 29
GT timestamp range: 2.08s to 169.21s


Processing: 100%|██████████| 4619/4619 [00:12<00:00, 381.09it/s, feedbacks=10]



Processed 4619 frames in 12.1s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_47.json

Video 48/74: 0148.mp4
Video info: 4894 frames, 30.00 fps, 163.1s
Ground truth feedbacks: 38
GT timestamp range: 2.08s to 187.52s


Processing: 100%|██████████| 4894/4894 [00:13<00:00, 371.62it/s, feedbacks=9]



Processed 4894 frames in 13.2s
Generated 9 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_48.json

Video 49/74: 0149.mp4
Video info: 4631 frames, 30.00 fps, 154.4s
Ground truth feedbacks: 38
GT timestamp range: 2.28s to 177.18s


Processing: 100%|██████████| 4631/4631 [00:10<00:00, 440.58it/s, feedbacks=6]



Processed 4631 frames in 10.5s
Generated 6 predictions (expected ~10)
Pred timestamp range: 15.00s to 120.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_49.json

Video 50/74: 0154.mp4
Video info: 4765 frames, 30.00 fps, 158.8s
Ground truth feedbacks: 32
GT timestamp range: 2.49s to 181.45s


Processing: 100%|██████████| 4765/4765 [00:14<00:00, 328.90it/s, feedbacks=10]



Processed 4765 frames in 14.5s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_50.json

Video 51/74: 0155.mp4
Video info: 4754 frames, 30.00 fps, 158.5s
Ground truth feedbacks: 29
GT timestamp range: 3.18s to 181.91s


Processing: 100%|██████████| 4754/4754 [00:10<00:00, 434.45it/s, feedbacks=10]



Processed 4754 frames in 10.9s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_51.json

Video 52/74: 0158.mp4
Video info: 4684 frames, 30.00 fps, 156.1s
Ground truth feedbacks: 27
GT timestamp range: 2.98s to 184.02s


Processing: 100%|██████████| 4684/4684 [00:13<00:00, 337.77it/s, feedbacks=9]



Processed 4684 frames in 13.9s
Generated 9 predictions (expected ~10)
Pred timestamp range: 15.00s to 135.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_52.json

Video 53/74: 0165.mp4
Video info: 4702 frames, 30.00 fps, 156.7s
Ground truth feedbacks: 37
GT timestamp range: 2.59s to 183.45s


Processing: 100%|██████████| 4702/4702 [00:14<00:00, 324.32it/s, feedbacks=9]



Processed 4702 frames in 14.5s
Generated 9 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_53.json

Video 54/74: 0166.mp4
Video info: 4542 frames, 30.00 fps, 151.4s
Ground truth feedbacks: 27
GT timestamp range: 1.88s to 168.31s


Processing: 100%|██████████| 4542/4542 [00:11<00:00, 387.88it/s, feedbacks=9]



Processed 4542 frames in 11.7s
Generated 9 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_54.json

Video 55/74: 0167.mp4
Video info: 4659 frames, 30.00 fps, 155.3s
Ground truth feedbacks: 28
GT timestamp range: 1.98s to 176.24s


Processing: 100%|██████████| 4659/4659 [00:11<00:00, 405.66it/s, feedbacks=9]



Processed 4659 frames in 11.5s
Generated 9 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_55.json

Video 56/74: 0168.mp4
Video info: 4597 frames, 30.00 fps, 153.2s
Ground truth feedbacks: 30
GT timestamp range: 1.98s to 181.15s


Processing: 100%|██████████| 4597/4597 [00:10<00:00, 444.42it/s, feedbacks=9]



Processed 4597 frames in 10.3s
Generated 9 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_56.json

Video 57/74: 0169.mp4
Video info: 4835 frames, 30.00 fps, 161.2s
Ground truth feedbacks: 31
GT timestamp range: 4.98s to 193.32s


Processing: 100%|██████████| 4835/4835 [00:09<00:00, 523.31it/s, feedbacks=9]



Processed 4835 frames in 9.2s
Generated 9 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_57.json

Video 58/74: 0170.mp4
Video info: 4517 frames, 30.00 fps, 150.6s
Ground truth feedbacks: 33
GT timestamp range: 1.98s to 180.27s


Processing: 100%|██████████| 4517/4517 [00:11<00:00, 406.09it/s, feedbacks=9]



Processed 4517 frames in 11.1s
Generated 9 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_58.json

Video 59/74: 0171.mp4
Video info: 4667 frames, 30.00 fps, 155.6s
Ground truth feedbacks: 34
GT timestamp range: 2.88s to 186.31s


Processing: 100%|██████████| 4667/4667 [00:09<00:00, 506.43it/s, feedbacks=7]



Processed 4667 frames in 9.2s
Generated 7 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_59.json

Video 60/74: 0179.mp4
Video info: 4793 frames, 30.00 fps, 159.8s
Ground truth feedbacks: 39
GT timestamp range: 2.78s to 183.61s


Processing: 100%|██████████| 4793/4793 [00:11<00:00, 430.51it/s, feedbacks=10]



Processed 4793 frames in 11.1s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_60.json

Video 61/74: 0180.mp4
Video info: 4591 frames, 30.00 fps, 153.0s
Ground truth feedbacks: 34
GT timestamp range: 2.29s to 177.87s


Processing: 100%|██████████| 4591/4591 [00:12<00:00, 372.87it/s, feedbacks=10]



Processed 4591 frames in 12.3s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_61.json

Video 62/74: 0181.mp4
Video info: 4625 frames, 30.00 fps, 154.2s
Ground truth feedbacks: 31
GT timestamp range: 1.48s to 175.68s


Processing: 100%|██████████| 4625/4625 [00:10<00:00, 431.03it/s, feedbacks=9]



Processed 4625 frames in 10.7s
Generated 9 predictions (expected ~10)
Pred timestamp range: 30.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_62.json

Video 63/74: 0185.mp4
Video info: 4595 frames, 30.00 fps, 153.2s
Ground truth feedbacks: 30
GT timestamp range: 2.18s to 176.73s


Processing: 100%|██████████| 4595/4595 [00:10<00:00, 451.15it/s, feedbacks=6]



Processed 4595 frames in 10.2s
Generated 6 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_63.json

Video 64/74: 0192.mp4
Video info: 4669 frames, 30.00 fps, 155.6s
Ground truth feedbacks: 30
GT timestamp range: 3.78s to 184.65s


Processing: 100%|██████████| 4669/4669 [00:12<00:00, 359.29it/s, feedbacks=9]



Processed 4669 frames in 13.0s
Generated 9 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_64.json

Video 65/74: 0193.mp4
Video info: 4704 frames, 30.00 fps, 156.8s
Ground truth feedbacks: 26
GT timestamp range: 2.88s to 174.99s


Processing: 100%|██████████| 4704/4704 [00:10<00:00, 464.04it/s, feedbacks=9]



Processed 4704 frames in 10.1s
Generated 9 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_65.json

Video 66/74: 0194.mp4
Video info: 4589 frames, 30.00 fps, 153.0s
Ground truth feedbacks: 26
GT timestamp range: 3.48s to 182.68s


Processing: 100%|██████████| 4589/4589 [00:15<00:00, 303.34it/s, feedbacks=9]



Processed 4589 frames in 15.1s
Generated 9 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_66.json

Video 67/74: 0195.mp4
Video info: 4519 frames, 30.00 fps, 150.6s
Ground truth feedbacks: 31
GT timestamp range: 1.28s to 180.62s


Processing: 100%|██████████| 4519/4519 [00:10<00:00, 418.19it/s, feedbacks=9]



Processed 4519 frames in 10.8s
Generated 9 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_67.json

Video 68/74: 0196.mp4
Video info: 4519 frames, 30.00 fps, 150.6s
Ground truth feedbacks: 31
GT timestamp range: 3.78s to 181.12s


Processing: 100%|██████████| 4519/4519 [00:11<00:00, 406.25it/s, feedbacks=9]



Processed 4519 frames in 11.1s
Generated 9 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_68.json

Video 69/74: 0197.mp4
Video info: 4819 frames, 30.00 fps, 160.6s
Ground truth feedbacks: 34
GT timestamp range: 2.18s to 188.29s


Processing: 100%|██████████| 4819/4819 [00:12<00:00, 374.47it/s, feedbacks=9]



Processed 4819 frames in 12.9s
Generated 9 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_69.json

Video 70/74: 0204.mp4
Video info: 4625 frames, 30.00 fps, 154.2s
Ground truth feedbacks: 29
GT timestamp range: 2.09s to 172.89s


Processing: 100%|██████████| 4625/4625 [00:11<00:00, 419.18it/s, feedbacks=9]



Processed 4625 frames in 11.0s
Generated 9 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_70.json

Video 71/74: 0205.mp4
Video info: 4656 frames, 30.00 fps, 155.2s
Ground truth feedbacks: 32
GT timestamp range: 3.88s to 177.93s


Processing: 100%|██████████| 4656/4656 [00:11<00:00, 404.34it/s, feedbacks=9]



Processed 4656 frames in 11.5s
Generated 9 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_71.json

Video 72/74: 0209.mp4
Video info: 4670 frames, 30.00 fps, 155.7s
Ground truth feedbacks: 31
GT timestamp range: 2.69s to 180.27s


Processing: 100%|██████████| 4670/4670 [00:11<00:00, 393.32it/s, feedbacks=9]



Processed 4670 frames in 11.9s
Generated 9 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_72.json

Video 73/74: 0212.mp4
Video info: 4569 frames, 30.00 fps, 152.3s
Ground truth feedbacks: 32
GT timestamp range: 4.98s to 180.49s


Processing: 100%|██████████| 4569/4569 [00:12<00:00, 356.32it/s, feedbacks=8]



Processed 4569 frames in 12.8s
Generated 8 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_73.json

Video 74/74: 0214.mp4
Video info: 4742 frames, 30.00 fps, 158.1s
Ground truth feedbacks: 29
GT timestamp range: 2.38s to 181.16s


Processing: 100%|██████████| 4742/4742 [00:15<00:00, 299.86it/s, feedbacks=10]


Processed 4742 frames in 15.8s
Generated 10 predictions (expected ~10)
Pred timestamp range: 15.00s to 150.00s
Checkpoint saved: /content/drive/Shareddrives/CIS6800 final project/checkpoint_video_74.json

Video processing complete!
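
The log reports "expected ~10" predictions for every video. A quick sanity check of that figure is sketched below. It assumes one feedback attempt every 15 seconds, an interval inferred from the logged 15.00s to 150.00s prediction range rather than read from the notebook configuration; the helper name `expected_predictions` is illustrative.

```python
# Assumption: one feedback attempt every 15 s, starting at t = 15 s.
# The interval is inferred from the logged prediction range, not from the config.
ASSUMED_INTERVAL_S = 15.0


def expected_predictions(duration_s, interval_s=ASSUMED_INTERVAL_S):
    """Count how many full intervals fit inside a video of the given duration."""
    return int(duration_s // interval_s)


# Durations copied from the log above.
for name, duration_s in [("0025.mp4", 156.9), ("0073.mp4", 163.3), ("0170.mp4", 150.6)]:
    print(name, expected_predictions(duration_s))
```

Under this assumption every video between 150 s and 165 s long yields 10 slots. Videos that report fewer predictions (for example 6 for 0149.mp4) appear to have produced no feedback at some slots; the log does not say why.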





## Step 8: Compute Evaluation Metrics
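
The cell below pairs each ground-truth timestamp with its nearest prediction. It accepts the pair only if the two are less than `tolerance / 2` seconds apart (1.5 s for the default `tolerance=3.0`), the prediction comes after the previously matched one, and the prediction text is non-empty. The following is a minimal standalone sketch of that rule on made-up timestamps; the helper name `match_indices` and the toy data are illustrative and not part of the notebook.

```python
import numpy as np


def match_indices(gt_ts, pred_ts, pred_texts, tolerance=3.0):
    """Return (gt_idx, pred_idx) pairs using the same acceptance rule as temporal_match."""
    gt_ts = np.asarray(gt_ts, dtype=float)
    pred_ts = np.asarray(pred_ts, dtype=float)
    pairs, last_pred = [], -1
    if len(pred_ts) == 0:
        return pairs
    for i, t in enumerate(gt_ts):
        j = int(np.argmin(np.abs(pred_ts - t)))  # nearest prediction
        close_enough = abs(t - pred_ts[j]) < tolerance / 2.0
        if close_enough and j > last_pred and pred_texts[j] != "":
            pairs.append((i, j))
            last_pred = j
    return pairs


# Toy data: predictions every 15 s, ground truth at irregular times.
gt = [3.0, 14.2, 16.0, 31.0, 44.0]
pred = [15.0, 30.0, 45.0]
texts = ["keep your back straight", "", "good pace"]
print(match_indices(gt, pred, texts))  # [(1, 0), (4, 2)]
```

Here the ground truth at 16.0 s is rejected because prediction 0 is already taken, and the one at 31.0 s is rejected because its nearest prediction is empty. Requiring strictly increasing prediction indices also guarantees that no prediction is matched twice.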

In [None]:
import os

import evaluate
from bert_score import score as bert_score_fn
import numpy as np

print("Loading evaluation metrics...\n")
meteor = evaluate.load('meteor')
rouge = evaluate.load('rouge')

def temporal_match(gt_feedbacks, pred_feedbacks, gt_timestamps, pred_timestamps, tolerance=3.0):
    """
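    Match predictions to ground truth within a temporal window.
    Based on InteractiveFeedbackEvaluator._get_temporally_aligned_feedbacks

    For each GT timestamp, the closest prediction is accepted only if it is
    less than tolerance / 2 seconds away, comes after the previously matched
    prediction, and has non-empty text.

    Returns (matched_pairs, num_matched_gt, num_matched_pred)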
    """
    gt_timestamps = np.array(gt_timestamps)
    pred_timestamps = np.array(pred_timestamps)

    matched_feedbacks = []
    matching_row_idxs = []
    matching_col_idxs = []

    if len(pred_timestamps) > 0:
        # For each GT timestamp, find the closest prediction within tolerance / 2
        last_match_idx = -1
        for idx_gt, ts_gt in enumerate(gt_timestamps):
            min_idx = np.argmin((pred_timestamps - ts_gt) ** 2)
            if (
                np.abs(ts_gt - pred_timestamps[min_idx]) < (tolerance / 2.0)
                and min_idx > last_match_idx
                and min_idx not in matching_col_idxs
                and pred_feedbacks[min_idx] != ""
            ):
                matching_row_idxs.append(idx_gt)
                matching_col_idxs.append(min_idx)
                last_match_idx = min_idx

    # Build matched pairs
    for match_idx, match_jdx in zip(matching_row_idxs, matching_col_idxs):
        matched_feedbacks.append((gt_feedbacks[match_idx], pred_feedbacks[match_jdx]))

    return matched_feedbacks, len(matching_row_idxs), len(matching_col_idxs)

# Compute metrics for all videos
print("Computing metrics...\n")

all_matched_pairs = []
meteor_scores = []
rouge_scores = []
bert_scores = []

# Temporal F-score running stats (like the original code)
t_f_score_stats = {
    'total_num_gt_feedbacks': 0,
    'total_num_pred_feedbacks': 0,
    'total_matched_gt_feedbacks': 0,
    'total_matched_pred_feedbacks': 0
}

for result in all_results:
    video_file = result['video_file']
    predictions = result['predictions']
    pred_timestamps = result['pred_timestamps']
    gt_feedbacks = result['ground_truth']['feedbacks']
    gt_timestamps = result['ground_truth']['feedback_timestamps']

    # Temporal matching (for recall: GT -> Pred)
    matched_feedbacks, num_matched_gt, _ = temporal_match(
        gt_feedbacks, predictions, gt_timestamps, pred_timestamps, tolerance=3.0
    )

    # Temporal matching (for precision: Pred -> GT)
    _, num_matched_pred, _ = temporal_match(
        predictions, gt_feedbacks, pred_timestamps, gt_timestamps, tolerance=3.0
    )

    # Update running stats
    t_f_score_stats['total_matched_gt_feedbacks'] += num_matched_gt
    t_f_score_stats['total_matched_pred_feedbacks'] += num_matched_pred
    t_f_score_stats['total_num_gt_feedbacks'] += len(gt_feedbacks)
    t_f_score_stats['total_num_pred_feedbacks'] += len(predictions)

    # Compute text quality metrics on matched pairs
    for gt_fb, pred_fb in matched_feedbacks:
        # METEOR
        meteor_scores.append(
            meteor.compute(references=[gt_fb], predictions=[pred_fb])['meteor']
        )
        # ROUGE-L
        rouge_scores.append(
            rouge.compute(references=[gt_fb], predictions=[pred_fb])['rougeL']
        )

    all_matched_pairs.extend([
        {'video': video_file, 'gt': gt_fb, 'pred': pred_fb}
        for gt_fb, pred_fb in matched_feedbacks
    ])

    print(f"{os.path.basename(video_file)}: Matched={num_matched_gt}, GT={len(gt_feedbacks)}, Pred={len(predictions)}")

# Compute BERT scores in batch (more efficient)
if len(all_matched_pairs) > 0:
    bert_results = bert_score_fn(
        [p['pred'] for p in all_matched_pairs],
        [p['gt'] for p in all_matched_pairs],
        lang='en'
    )
    bert_scores = bert_results[2].tolist()  # F1 scores

# Aggregate metrics
print("\n" + "="*60)
print("EVALUATION RESULTS")
print("="*60)

# Temporal F-Score
eps = 1e-12
precision = t_f_score_stats['total_matched_pred_feedbacks'] / (
    t_f_score_stats['total_num_pred_feedbacks'] + eps
)
recall = t_f_score_stats['total_matched_gt_feedbacks'] / (
    t_f_score_stats['total_num_gt_feedbacks'] + eps
)
f1_score = 2 * ((precision * recall) / (precision + recall + eps))

print("\nTemporal Metrics:")
print(f"  Precision: {precision:.4f}")
print(f"  Recall: {recall:.4f}")
print(f"  F1-Score: {f1_score:.4f}")
print(f"  Matched GT feedbacks: {t_f_score_stats['total_matched_gt_feedbacks']}")
print(f"  Matched Pred feedbacks: {t_f_score_stats['total_matched_pred_feedbacks']}")
print(f"  Total GT feedbacks: {t_f_score_stats['total_num_gt_feedbacks']}")
print(f"  Total Pred feedbacks: {t_f_score_stats['total_num_pred_feedbacks']}")

# Text Quality Metrics
def mean(values):
    # eps keeps the division safe for empty lists
    return sum(values) / (len(values) + eps)
print("\nText Quality Metrics (on matched pairs):")
if len(meteor_scores) > 0:
    print(f"  METEOR: {mean(meteor_scores):.4f}")
    print(f"  ROUGE-L: {mean(rouge_scores):.4f}")
    print(f"  BERT Score: {mean(bert_scores):.4f}")
    print(f"  Number of matched pairs: {len(meteor_scores)}")
else:
    print("  No matched pairs found")

# System metrics
total_predictions = sum(len(r['predictions']) for r in all_results)
total_oom = sum(r['stats']['oom_events'] for r in all_results)
total_time = sum(r['stats']['processing_time'] for r in all_results)

print("\nSystem Metrics:")
print(f"  Total predictions: {total_predictions}")
print(f"  GPU OOM events: {total_oom}")
print(f"  Total processing time: {total_time:.1f}s")
print(f"  Avg time per video: {total_time / max(len(all_results), 1):.1f}s")

print("\n" + "="*60)

Loading evaluation metrics...



Downloading builder script: 0.00B [00:00, ?B/s]

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...


Downloading builder script: 0.00B [00:00, ?B/s]

Computing metrics...

0006.mp4: Matched=4, GT=28, Pred=10
0009.mp4: Matched=4, GT=34, Pred=10
0010.mp4: Matched=5, GT=30, Pred=10
0011.mp4: Matched=4, GT=27, Pred=10
0012.mp4: Matched=1, GT=31, Pred=10
0013.mp4: Matched=5, GT=22, Pred=10
0014.mp4: Matched=5, GT=24, Pred=10
0015.mp4: Matched=7, GT=31, Pred=10
0016.mp4: Matched=2, GT=26, Pred=10
0017.mp4: Matched=3, GT=16, Pred=10
0018.mp4: Matched=4, GT=31, Pred=10
0019.mp4: Matched=3, GT=23, Pred=10
0023.mp4: Matched=5, GT=25, Pred=10
0024.mp4: Matched=2, GT=13, Pred=10
0025.mp4: Matched=3, GT=14, Pred=10
0026.mp4: Matched=4, GT=34, Pred=10
0027.mp4: Matched=3, GT=24, Pred=10
0028.mp4: Matched=3, GT=17, Pred=10
0029.mp4: Matched=5, GT=32, Pred=10
0030.mp4: Matched=5, GT=29, Pred=10
0031.mp4: Matched=5, GT=33, Pred=10
0032.mp4: Matched=4, GT=35, Pred=10
0033.mp4: Matched=5, GT=35, Pred=10
0039.mp4: Matched=4, GT=22, Pred=10
0040.mp4: Matched=4, GT=25, Pred=10
0041.mp4: Matched=5, GT=23, Pred=10
0043.mp4: Matched=7, GT=24, Pred=10
0045.m

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



EVALUATION RESULTS

Temporal Metrics:
  Precision: 0.3879
  Recall: 0.1380
  F1-Score: 0.2036
  Matched GT feedbacks: 275
  Matched Pred feedbacks: 275
  Total GT feedbacks: 1993
  Total Pred feedbacks: 709

Text Quality Metrics (on matched pairs):
  METEOR: 0.1119
  ROUGE-L: 0.0879
  BERT Score: 0.8618
  Number of matched pairs: 275

System Metrics:
  Total predictions: 709
  GPU OOM events: 0
  Total processing time: 1236.4s
  Avg time per video: 16.7s
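
The temporal metrics above follow directly from the reported counts, so reproducing them by hand is a quick consistency check. The counts are copied from the output above; the variable names are illustrative.

```python
# Counts copied from the evaluation output above.
matched_gt, matched_pred = 275, 275
total_gt, total_pred = 1993, 709

precision = matched_pred / total_pred  # share of predictions aligned with a GT feedback
recall = matched_gt / total_gt         # share of GT feedbacks that received a prediction
f1 = 2 * precision * recall / (precision + recall)

print(f"Precision: {precision:.4f}")  # 0.3879
print(f"Recall:    {recall:.4f}")     # 0.1380
print(f"F1-Score:  {f1:.4f}")         # 0.2036
```

Because each prediction can match at most one ground-truth feedback, recall cannot exceed 709 / 1993 ≈ 0.356 for this run, even if every prediction were matched.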



## Step 9: Save Results

In [None]:
import json
from datetime import datetime

# Prepare final results
final_results = {
    'evaluation_config': {
        'num_videos': NUM_EVAL_VIDEOS,
        'feedback_interval': feedback_interval,
        'feature_frequency': feats_frequency,
        'temporal_window': 3.0,  # seconds; hard-coded here, keep in sync with the matching step
        'timestamp': datetime.now().isoformat()
    },
    'metrics': {
        'temporal': {
            'precision': float(precision),
            'recall': float(recall),
            'f1_score': float(f1_score),
            'matched_gt_feedbacks': int(t_f_score_stats['total_matched_gt_feedbacks']),
            'matched_pred_feedbacks': int(t_f_score_stats['total_matched_pred_feedbacks']),
            'total_gt_feedbacks': int(t_f_score_stats['total_num_gt_feedbacks']),
            'total_pred_feedbacks': int(t_f_score_stats['total_num_pred_feedbacks'])
        },
        'text_quality': {
            'meteor_mean': float(mean(meteor_scores)) if len(meteor_scores) > 0 else None,
            'rouge_l_mean': float(mean(rouge_scores)) if len(rouge_scores) > 0 else None,
            'bert_score_mean': float(mean(bert_scores)) if len(bert_scores) > 0 else None,
            'num_matched_pairs': len(meteor_scores)
        },
        'system': {
            'total_predictions': int(total_predictions),
            'oom_events': int(total_oom),
            'total_processing_time': float(total_time),
            'avg_time_per_video': float(total_time / len(all_results)) if all_results else 0
        }
    },
    'per_video_results': all_results,
    'matched_pairs': all_matched_pairs
}

# Save to Drive
results_file = os.path.join(DRIVE_ROOT, "evaluation_results.json")
with open(results_file, 'w') as f:
    json.dump(final_results, f, indent=2)

print(f"Results saved to: {results_file}")
print("\nEvaluation complete!")

Results saved to: /content/drive/Shareddrives/CIS6800 final project/evaluation_results.json

Evaluation complete!
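
To reuse the saved metrics outside this notebook, the JSON file can be read back with the standard library. This is a minimal sketch, assuming the file has the layout built in the cell above. The temporary directory and the trimmed `sample` dictionary are stand-ins so the snippet runs on its own, and the `None` value only illustrates the no-match case. In practice `results_file` would point at the saved `evaluation_results.json`.

```python
import json
import os
import tempfile

# Stand-in for the saved file: a trimmed copy of the structure written above.
sample = {
    "metrics": {
        "temporal": {"precision": 0.3879, "recall": 0.1380, "f1_score": 0.2036},
        # Illustrative no-match case: means are stored as None (JSON null).
        "text_quality": {"meteor_mean": None, "num_matched_pairs": 0},
    },
    "per_video_results": [],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    results_file = os.path.join(tmp_dir, "evaluation_results.json")
    with open(results_file, "w") as f:
        json.dump(sample, f, indent=2)

    # Read the results back; Python None round-trips through JSON null.
    with open(results_file) as f:
        loaded = json.load(f)

temporal = loaded["metrics"]["temporal"]
print(f"F1-Score: {temporal['f1_score']:.4f}")
print(loaded["metrics"]["text_quality"]["meteor_mean"])
```

Because missing text-quality means are stored as `None`, any downstream code that formats them should check for `None` first, as the summary cell in Step 10 does.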


## Step 10: Print Final Results Summary

In [None]:
import json

print("\n" + "="*80)
print(" "*25 + "FINAL EVALUATION RESULTS")
print("="*80)

print("\nEVALUATION CONFIGURATION")
print("-" * 80)
print(f"Number of videos evaluated: {final_results['evaluation_config']['num_videos']}")
print(f"Feedback interval: {final_results['evaluation_config']['feedback_interval']}s")
print(f"Feature extraction frequency: {final_results['evaluation_config']['feature_frequency']} fps")
print(f"Temporal matching window: {final_results['evaluation_config']['temporal_window']}s")
print(f"Evaluation timestamp: {final_results['evaluation_config']['timestamp']}")

print("\nTEMPORAL METRICS")
print("-" * 80)
temporal = final_results['metrics']['temporal']
print(f"Precision:        {temporal['precision']:.4f}")
print(f"Recall:           {temporal['recall']:.4f}")
print(f"F1-Score:         {temporal['f1_score']:.4f}")
print(f"\nMatched GT feedbacks:   {temporal['matched_gt_feedbacks']}")
print(f"Matched Pred feedbacks: {temporal['matched_pred_feedbacks']}")
print(f"Total GT feedbacks:     {temporal['total_gt_feedbacks']}")
print(f"Total Pred feedbacks:   {temporal['total_pred_feedbacks']}")

print("\nTEXT QUALITY METRICS (on matched pairs)")
print("-" * 80)
text_quality = final_results['metrics']['text_quality']
if text_quality['meteor_mean'] is not None:
    print(f"METEOR:           {text_quality['meteor_mean']:.4f}")
    print(f"ROUGE-L:          {text_quality['rouge_l_mean']:.4f}")
    print(f"BERT Score:       {text_quality['bert_score_mean']:.4f}")
    print(f"Matched pairs:    {text_quality['num_matched_pairs']}")
else:
    print("No matched pairs found for text quality evaluation")

print("\nSYSTEM METRICS")
print("-" * 80)
system = final_results['metrics']['system']
print(f"Total predictions:        {system['total_predictions']}")
print(f"GPU OOM events:           {system['oom_events']}")
print(f"Total processing time:    {system['total_processing_time']:.1f}s ({system['total_processing_time']/60:.1f} min)")
print(f"Avg time per video:       {system['avg_time_per_video']:.1f}s")

print("\nPER-VIDEO BREAKDOWN")
print("-" * 80)
print(f"{'Video':<15} {'Predictions':<15} {'GT Feedbacks':<15} {'Processing Time'}")
print("-" * 80)
for result in final_results['per_video_results']:
    video_name = os.path.basename(result['video_file'])
    num_preds = len(result['predictions'])
    num_gt = len(result['ground_truth']['feedbacks'])
    proc_time = result['stats']['processing_time']
    print(f"{video_name:<15} {num_preds:<15} {num_gt:<15} {proc_time:.1f}s")

print("\n" + "="*80)
print(f"Results saved to: {results_file}")
print("="*80)

# Pretty print the JSON metrics
print("\nJSON METRICS:")
print(json.dumps(final_results['metrics'], indent=2))


                         FINAL EVALUATION RESULTS

EVALUATION CONFIGURATION
--------------------------------------------------------------------------------
Number of videos evaluated: 74
Feedback interval: 15.0s
Feature extraction frequency: 2 fps
Temporal matching window: 3.0s
Evaluation timestamp: 2025-12-19T18:24:29.955194

TEMPORAL METRICS
--------------------------------------------------------------------------------
Precision:        0.3879
Recall:           0.1380
F1-Score:         0.2036

Matched GT feedbacks:   275
Matched Pred feedbacks: 275
Total GT feedbacks:     1993
Total Pred feedbacks:   709

TEXT QUALITY METRICS (on matched pairs)
--------------------------------------------------------------------------------
METEOR:           0.1119
ROUGE-L:          0.0879
BERT Score:       0.8618
Matched pairs:    275

SYSTEM METRICS
--------------------------------------------------------------------------------
Total predictions:        709
GPU OOM events:           0
Total pr