# BeneAI Performance Testing Notebook

This notebook tests the end-to-end performance of BeneAI's pipeline across a grid of parameter settings. It measures:
- **Latency**: Time from frame capture to emotion detection
- **Throughput**: Frames processed per second
- **Accuracy**: Precision, recall, F1 score compared to ground truth annotations
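
As a rough illustration of what these three metrics mean, the sketch below computes them from raw measurements using only the standard library. The function names and the nearest-rank percentile are assumptions made for this example; the notebook's actual calculations live in `MetricsCalculator` in `test_utils` and may differ.

```python
import math

def summarize_latency(latencies_ms):
    """Return the mean and the nearest-rank 95th percentile of per-frame latencies."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered))  # nearest-rank percentile definition
    return {
        "avg_latency_ms": sum(ordered) / len(ordered),
        "p95_latency_ms": ordered[rank - 1],
    }

def throughput_fps(frames_processed, elapsed_seconds):
    """Frames processed per second of wall-clock time."""
    return frames_processed / elapsed_seconds

def precision_recall_f1(y_true, y_pred, label):
    """Per-label precision, recall, and F1 from aligned true/predicted labels."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Illustrative values only, not measured results
latency_summary = summarize_latency([100, 200, 300, 400])
fps = throughput_fps(frames_processed=30, elapsed_seconds=15.0)
prf = precision_recall_f1(
    ["happy", "happy", "neutral", "neutral"],
    ["happy", "neutral", "happy", "neutral"],
    label="happy",
)
print(latency_summary, fps, prf)
```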

## Workflow
1. Configure test parameters (frame rates, resolutions, MediaPipe settings)
2. Load ground truth annotations
3. Run parameter sweep tests
4. Calculate metrics and visualize results
5. Export results and recommendations

## Prerequisites
- BeneAI backend running at configured URL (default: ws://localhost:8000/ws)
- Test video file in `videos/` directory
- Ground truth annotations JSON file (use `ground_truth_template.json` as reference)
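
A preflight check for these prerequisites could look like the sketch below. `check_prerequisites` is a hypothetical helper, not part of `test_utils`. It only inspects the URL scheme and the file paths, so it does not confirm that the backend is actually running.

```python
from pathlib import Path
from urllib.parse import urlparse

def check_prerequisites(video_path, ground_truth_path, backend_url):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if urlparse(backend_url).scheme not in ("ws", "wss"):
        problems.append(f"Backend URL is not a WebSocket URL: {backend_url}")
    if not Path(video_path).is_file():
        problems.append(f"Video file not found: {video_path}")
    if not Path(ground_truth_path).is_file():
        problems.append(f"Ground truth file not found: {ground_truth_path}")
    return problems

# Paths and URL mirror the defaults used in this notebook's CONFIG
problems = check_prerequisites(
    "videos/test_video.mp4",
    "videos/test_video_annotations.json",
    "ws://localhost:8000/ws",
)
for problem in problems:
    print(f"⚠ {problem}")
```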

---
## 1. Setup & Configuration
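
The cells in this section build `TestConfig` objects from five parameters and identify each one with `config_id()`. The real class is defined in `test_utils` and is not shown here, so the sketch below is only a guess at its shape: the class name `SketchTestConfig` and the ID format are hypothetical, while the field names are taken from the constructor call used in this notebook.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class SketchTestConfig:
    """Hypothetical stand-in for test_utils.TestConfig."""
    frame_rate: int
    resolution: Tuple[int, int]
    jpeg_quality: int
    mediapipe_complexity: int
    mediapipe_confidence: float

    def config_id(self) -> str:
        # Assumed ID format; the real config_id() may differ
        width, height = self.resolution
        return (f"fps{self.frame_rate}_{width}x{height}_q{self.jpeg_quality}"
                f"_c{self.mediapipe_complexity}_conf{self.mediapipe_confidence}")

example = SketchTestConfig(
    frame_rate=2,
    resolution=(480, 360),
    jpeg_quality=60,
    mediapipe_complexity=1,
    mediapipe_confidence=0.5,
)
print(example.config_id())  # fps2_480x360_q60_c1_conf0.5
```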

In [None]:
# Import required libraries
import sys
import os
import json
import asyncio
from datetime import datetime
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm.notebook import tqdm
import nest_asyncio

# Enable nested event loops for Jupyter
nest_asyncio.apply()

# Import custom utilities
from test_utils import (
    TestConfig, TestResult, VideoProcessor, BeneAIClient,
    GroundTruth, MetricsCalculator, run_single_test, aggregate_results
)

# Set plotting style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)

print("✓ All imports successful")

In [None]:
# Configuration
CONFIG = {
    # Paths
    'video_path': 'videos/test_video.mp4',  # Update with your video filename
    'ground_truth_path': 'videos/test_video_annotations.json',  # Update with your annotations
    'results_dir': 'results',  # No trailing slash: output paths are built as f"{results_dir}/..."
    
    # Backend
    'backend_url': 'ws://localhost:8000/ws',
    
    # Test timestamp
    'test_run_id': datetime.now().strftime('%Y%m%d_%H%M%S')
}

# Ensure paths exist
Path(CONFIG['results_dir']).mkdir(parents=True, exist_ok=True)

print("Configuration:")
for key, value in CONFIG.items():
    print(f"  {key}: {value}")

In [None]:
# Define parameter grid for testing
PARAM_GRID = {
    'frame_rates': [1, 2, 3, 5, 10],  # FPS to test
    'resolutions': [
        (320, 240),   # Low
        (480, 360),   # Medium (current default)
        (640, 480)    # High
    ],
    'jpeg_qualities': [60],  # Keep constant (0.6 * 100)
    'mediapipe_complexities': [0, 1, 2],  # 0=fast, 1=medium, 2=accurate
    'mediapipe_confidences': [0.3, 0.5, 0.7]  # Detection confidence thresholds
}

# Generate all test configurations
from itertools import product

def generate_test_configs(param_grid):
    """Generate all combinations of test parameters"""
    # product() varies the last parameter fastest, matching nested for-loops
    return [
        TestConfig(
            frame_rate=fps,
            resolution=res,
            jpeg_quality=quality,
            mediapipe_complexity=complexity,
            mediapipe_confidence=confidence
        )
        for fps, res, quality, complexity, confidence in product(
            param_grid['frame_rates'],
            param_grid['resolutions'],
            param_grid['jpeg_qualities'],
            param_grid['mediapipe_complexities'],
            param_grid['mediapipe_confidences']
        )
    ]

TEST_CONFIGS = generate_test_configs(PARAM_GRID)

print(f"Generated {len(TEST_CONFIGS)} test configurations")
print(f"\nExample config: {TEST_CONFIGS[0].config_id()}")
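# Rough estimate: frames sent per config = video seconds * target FPS, at ~500ms per frame
configs_per_fps = len(TEST_CONFIGS) // len(PARAM_GRID['frame_rates'])
est_minutes = sum(30 * fps * 0.5 for fps in PARAM_GRID['frame_rates']) * configs_per_fps / 60
print(f"\nEstimated time (assuming 30s video, ~500ms per frame):")
print(f"  ~{est_minutes:.0f} minutes total, excluding connection and decoding overhead")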

---
## 2. Ground Truth Management
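
The cells below read `video_name`, `fps`, and a list of annotations that each carry a `timestamp`, an `emotion`, and an `investor_state`. The sketch below shows one JSON layout that would provide those fields. The key names are inferred from how this notebook uses them and the label values are placeholders, so treat `ground_truth_template.json` as the authoritative reference.

```python
import json

# Assumed layout; label values are placeholders, not BeneAI's actual label set
sketch_ground_truth = {
    "video_name": "test_video.mp4",
    "fps": 30,
    "annotations": [
        {"timestamp": 0.0, "emotion": "neutral", "investor_state": "neutral"},
        {"timestamp": 4.5, "emotion": "happy", "investor_state": "interested"},
        {"timestamp": 9.0, "emotion": "neutral", "investor_state": "interested"},
    ],
}

# Round-trip through JSON text, as a file on disk would
loaded = json.loads(json.dumps(sketch_ground_truth, indent=2))

unique_emotions = sorted({a["emotion"] for a in loaded["annotations"]})
unique_states = sorted({a["investor_state"] for a in loaded["annotations"]})
print(f"{len(loaded['annotations'])} annotations")
print(f"Unique emotions: {unique_emotions}")
print(f"Unique states: {unique_states}")
```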

In [None]:
# Load ground truth annotations
try:
    ground_truth = GroundTruth(CONFIG['ground_truth_path'])
    print(f"✓ Loaded {len(ground_truth.annotations)} annotations")
    print(f"  Video: {ground_truth.video_name}")
    print(f"  FPS: {ground_truth.fps}")
    print(f"  Unique emotions: {len(ground_truth.unique_emotions)}")
    print(f"  Unique states: {len(ground_truth.unique_states)}")
    
    # Display annotations DataFrame
    annotations_df = ground_truth.to_dataframe()
    print(f"\nAnnotations preview:")
    display(annotations_df.head(10))
    
except FileNotFoundError:
    print(f"⚠ Ground truth file not found: {CONFIG['ground_truth_path']}")
    print("  Tests will run without accuracy metrics")
    print("  Use ground_truth_template.json to create annotations")
    ground_truth = None
except Exception as e:
    print(f"❌ Error loading ground truth: {e}")
    ground_truth = None

In [None]:
# Visualize ground truth timeline
if ground_truth is not None:
    fig, axes = plt.subplots(2, 1, figsize=(14, 8), sharex=True)
    
    # Emotion timeline
    ax1 = axes[0]
    emotion_colors = {emotion: f"C{i}" for i, emotion in enumerate(ground_truth.unique_emotions)}
    
    for annotation in ground_truth.annotations:
        ax1.axvline(annotation['timestamp'], color=emotion_colors[annotation['emotion']], 
                    alpha=0.6, linewidth=2, label=annotation['emotion'])
    
    # Remove duplicate labels
    handles, labels = ax1.get_legend_handles_labels()
    by_label = dict(zip(labels, handles))
    ax1.legend(by_label.values(), by_label.keys(), loc='upper right')
    
    ax1.set_ylabel('Emotions')
    ax1.set_title('Ground Truth Emotion Timeline')
    ax1.grid(True, alpha=0.3)
    
    # Investor state timeline
    ax2 = axes[1]
    state_colors = {state: f"C{i}" for i, state in enumerate(ground_truth.unique_states)}
    
    for annotation in ground_truth.annotations:
        ax2.axvline(annotation['timestamp'], color=state_colors[annotation['investor_state']], 
                    alpha=0.6, linewidth=2, label=annotation['investor_state'])
    
    handles, labels = ax2.get_legend_handles_labels()
    by_label = dict(zip(labels, handles))
    ax2.legend(by_label.values(), by_label.keys(), loc='upper right')
    
    ax2.set_xlabel('Time (seconds)')
    ax2.set_ylabel('Investor State')
    ax2.set_title('Ground Truth Investor State Timeline')
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Distribution of emotions and states
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Emotion distribution
    emotion_counts = annotations_df['emotion'].value_counts()
    axes[0].bar(emotion_counts.index, emotion_counts.values, color='steelblue')
    axes[0].set_xlabel('Emotion')
    axes[0].set_ylabel('Count')
    axes[0].set_title('Emotion Distribution')
    axes[0].tick_params(axis='x', rotation=45)
    
    # State distribution
    state_counts = annotations_df['investor_state'].value_counts()
    axes[1].bar(state_counts.index, state_counts.values, color='coral')
    axes[1].set_xlabel('Investor State')
    axes[1].set_ylabel('Count')
    axes[1].set_title('Investor State Distribution')
    axes[1].tick_params(axis='x', rotation=45)
    
    plt.tight_layout()
    plt.show()

---
## 3. Backend Communication Test
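
The quick test in this section relies on `client.send_frame` returning both the response and a latency in milliseconds. The sketch below shows one way such a round-trip time can be measured with `time.perf_counter`. `fake_backend` and `timed_send` are hypothetical stand-ins so the example runs without a server; the response shape copies the fields this notebook reads (`type`, `data.primary_emotion`, `data.investor_state`, `data.confidence`), and the values are placeholders.

```python
import asyncio
import time

async def fake_backend(frame_b64, timestamp):
    """Hypothetical stand-in for the backend: waits briefly, then replies."""
    await asyncio.sleep(0.01)
    return {
        "type": "emotion_update",
        "data": {
            "primary_emotion": "neutral",
            "investor_state": "neutral",
            "confidence": 0.9,
        },
    }

async def timed_send(send, frame_b64, timestamp):
    """Await `send` and return (response, round-trip latency in milliseconds)."""
    start = time.perf_counter()
    response = await send(frame_b64, timestamp)
    latency_ms = (time.perf_counter() - start) * 1000
    return response, latency_ms

# In a notebook cell you can also write `await timed_send(...)` directly
response, latency_ms = asyncio.run(timed_send(fake_backend, "<base64 jpeg>", 0.0))
print(response["type"], f"{latency_ms:.1f}ms")
```

Measured this way, latency includes everything between sending the frame and receiving the reply, so network and serialization time count toward it as well as model inference.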

In [None]:
# Test backend connectivity
async def test_backend_connection():
    """Verify backend is accessible"""
    try:
        async with BeneAIClient(CONFIG['backend_url']) as client:
            print(f"✓ Successfully connected to {CONFIG['backend_url']}")
            return True
    except Exception as e:
        print(f"❌ Failed to connect to backend: {e}")
        print(f"\nMake sure the backend is running:")
        print(f"  cd backend && python -m uvicorn app.main:app --reload")
        return False

# Run connectivity test
backend_available = await test_backend_connection()

In [None]:
# Quick test with first frame
if backend_available:
    print("Running quick test with first frame...\n")
    
    try:
        # Extract first frame
        with VideoProcessor(CONFIG['video_path']) as vp:
            print(f"Video info:")
            print(f"  FPS: {vp.original_fps:.2f}")
            print(f"  Total frames: {vp.total_frames}")
            print(f"  Duration: {vp.duration:.2f}s\n")
            
            frames = vp.extract_frames(target_fps=1, resolution=(480, 360))
            if frames:
                timestamp, frame = frames[0]
                frame_b64 = vp.encode_frame_jpeg(frame, quality=60)
                
                # Send to backend
                async with BeneAIClient(CONFIG['backend_url']) as client:
                    response, latency_ms = await client.send_frame(frame_b64, timestamp)
                    
                print(f"✓ Test frame processed successfully")
                print(f"  Latency: {latency_ms:.1f}ms")
                print(f"  Response type: {response.get('type')}")
                
                if response.get('type') == 'emotion_update':
                    data = response.get('data', {})
                    print(f"  Emotion: {data.get('primary_emotion')}")
                    print(f"  State: {data.get('investor_state')}")
                    print(f"  Confidence: {data.get('confidence', 0):.2f}")
            else:
                print("❌ No frames extracted from video")
                
    except FileNotFoundError:
        print(f"❌ Video file not found: {CONFIG['video_path']}")
        print(f"   Please add your test video to the videos/ directory")
    except Exception as e:
        print(f"❌ Error during test: {e}")

---
## 4. Test Execution - Parameter Sweep
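
Before choosing between the full sweep and a subset, it helps to know how large the grid is. The sketch below rebuilds the combinations from the parameter values defined in Section 1 using plain tuples instead of `TestConfig` objects, then applies the same every-10th stride as `SUBSET_CONFIGS`.

```python
from itertools import product

# Same values as PARAM_GRID in Section 1
grid = {
    "frame_rates": [1, 2, 3, 5, 10],
    "resolutions": [(320, 240), (480, 360), (640, 480)],
    "jpeg_qualities": [60],
    "mediapipe_complexities": [0, 1, 2],
    "mediapipe_confidences": [0.3, 0.5, 0.7],
}

combos = list(product(*grid.values()))
every_tenth = combos[::10]

print(len(combos))       # 5 * 3 * 1 * 3 * 3 = 135
print(len(every_tenth))  # indices 0, 10, ..., 130 -> 14

# Which frame rates does the strided subset actually touch?
covered_frame_rates = sorted({combo[0] for combo in every_tenth})
print(covered_frame_rates)
```

A fixed stride samples the grid unevenly, and coverage depends on the grid's dimensions. If you change `PARAM_GRID`, check that the subset still includes every value you care about.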

In [None]:
# Option 1: Run all tests (can take a while)
RUN_ALL_TESTS = False  # Set to True to run all configurations

# Option 2: Run subset of tests for quick iteration
RUN_SUBSET = True
SUBSET_CONFIGS = TEST_CONFIGS[::10]  # Every 10th config

# Option 3: Run specific configs
RUN_SPECIFIC = False
SPECIFIC_INDICES = [0, 10, 20, 30, 40]  # Specific config indices

# Select configs to run
if RUN_ALL_TESTS:
    configs_to_run = TEST_CONFIGS
    print(f"Running ALL {len(configs_to_run)} test configurations")
elif RUN_SUBSET:
    configs_to_run = SUBSET_CONFIGS
    print(f"Running SUBSET of {len(configs_to_run)} test configurations")
elif RUN_SPECIFIC:
    configs_to_run = [TEST_CONFIGS[i] for i in SPECIFIC_INDICES if i < len(TEST_CONFIGS)]
    print(f"Running SPECIFIC {len(configs_to_run)} test configurations")
else:
    configs_to_run = TEST_CONFIGS[:3]  # Just first 3 by default
    print(f"Running first {len(configs_to_run)} test configurations (default)")

print(f"\nConfigurations to test:")
for i, config in enumerate(configs_to_run[:5]):
    print(f"  {i+1}. {config.config_id()}")
if len(configs_to_run) > 5:
    print(f"  ... and {len(configs_to_run) - 5} more")

In [None]:
# Run tests with progress tracking
async def run_all_tests(configs, video_path, backend_url, ground_truth):
    """Run all test configurations"""
    results = []
    
    for i, config in enumerate(tqdm(configs, desc="Running tests")):
        print(f"\n[{i+1}/{len(configs)}] Testing: {config.config_id()}")
        
        try:
            result = await run_single_test(
                video_path=video_path,
                config=config,
                backend_url=backend_url,
                ground_truth=ground_truth,
                progress_callback=lambda current, total: None  # Silent progress
            )
            results.append(result)
            
        except Exception as e:
            print(f"❌ Test failed: {e}")
            continue
    
    return results

# Run tests if backend is available
if backend_available:
    print("\n" + "="*60)
    print("STARTING TEST EXECUTION")
    print("="*60)
    
    test_results = await run_all_tests(
        configs=configs_to_run,
        video_path=CONFIG['video_path'],
        backend_url=CONFIG['backend_url'],
        ground_truth=ground_truth
    )
    
    print("\n" + "="*60)
    print(f"✓ COMPLETED {len(test_results)}/{len(configs_to_run)} TESTS")
    print("="*60)
else:
    print("⚠ Backend not available. Skipping tests.")
    test_results = []

---
## 5. Metrics Calculation & Analysis
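
Accuracy metrics depend on pairing each ground truth annotation with a prediction made at about the same time. Later in this notebook, `MetricsCalculator.align_predictions_with_ground_truth` is called with `tolerance=0.5` for that purpose. Its implementation is not shown here, so the sketch below is an assumption about how nearest-timestamp matching within a tolerance could work; `align_by_timestamp` is a hypothetical name and the sample data is illustrative.

```python
def align_by_timestamp(predictions, annotations, tolerance=0.5):
    """Pair each annotation with the nearest prediction within `tolerance` seconds.

    Annotations with no prediction close enough are skipped.
    """
    y_true, y_pred = [], []
    for annotation in annotations:
        nearest = min(
            predictions,
            key=lambda p: abs(p["timestamp"] - annotation["timestamp"]),
            default=None,
        )
        if nearest is None:
            continue
        if abs(nearest["timestamp"] - annotation["timestamp"]) <= tolerance:
            y_true.append(annotation["emotion"])
            y_pred.append(nearest["emotion"])
    return y_true, y_pred

# Illustrative data only
annotations = [
    {"timestamp": 1.0, "emotion": "happy"},
    {"timestamp": 5.0, "emotion": "neutral"},
]
predictions = [
    {"timestamp": 1.2, "emotion": "happy"},
    {"timestamp": 3.0, "emotion": "sad"},
]

y_true, y_pred = align_by_timestamp(predictions, annotations, tolerance=0.5)
print(y_true, y_pred)  # the 5.0s annotation has no prediction within 0.5s
```

With this kind of matching, a low frame rate can leave annotations unmatched, so the number of aligned pairs is worth reporting next to the F1 score.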

In [None]:
# Aggregate results into DataFrame
if test_results:
    results_df = aggregate_results(test_results)
    
    print(f"Results Summary ({len(results_df)} tests):")
    print("\n" + "="*60)
    
    # Overall statistics
    print("\nLatency Statistics (ms):")
    print(results_df[['avg_latency_ms', 'p95_latency_ms']].describe())
    
    print("\nThroughput Statistics (fps):")
    print(results_df['throughput_fps'].describe())
    
    if 'f1_score' in results_df.columns and results_df['f1_score'].notna().any():
        print("\nAccuracy Statistics (F1 Score):")
        print(results_df['f1_score'].describe())
    
    # Display top performing configurations (only when F1 scores were computed)
    if 'f1_score' in results_df.columns and results_df['f1_score'].notna().any():
        print("\n" + "="*60)
        print("Top 5 Configurations by F1 Score:")
        print("="*60)

        top_by_f1 = results_df.nlargest(5, 'f1_score')[[
            'config_id', 'frame_rate', 'resolution', 'mediapipe_complexity',
            'f1_score', 'avg_latency_ms', 'throughput_fps'
        ]]
        display(top_by_f1)
    
    print("\n" + "="*60)
    print("Top 5 Configurations by Latency (lowest):")
    print("="*60)
    
    top_by_latency = results_df.nsmallest(5, 'avg_latency_ms')[[
        'config_id', 'frame_rate', 'resolution', 'mediapipe_complexity',
        'avg_latency_ms', 'p95_latency_ms', 'throughput_fps'
    ]]
    display(top_by_latency)
    
    # Full results table
    print("\n" + "="*60)
    print("All Results:")
    print("="*60)
    display(results_df)
    
else:
    print("No test results available")
    results_df = None

---
## 6. Visualization & Analysis
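
The tradeoff plot in this section marks a single configuration chosen by a normalized score. A Pareto front is a different thing: the set of configurations that no other configuration beats on both latency and F1 at once. The sketch below computes that set with plain Python. The function name, the dictionary keys (borrowed from the results columns), and the numbers are illustrative assumptions, not measured results.

```python
def pareto_front(configs):
    """Return configs not dominated on (lower latency, higher F1)."""
    front = []
    for candidate in configs:
        dominated = any(
            other["avg_latency_ms"] <= candidate["avg_latency_ms"]
            and other["f1_score"] >= candidate["f1_score"]
            and (
                other["avg_latency_ms"] < candidate["avg_latency_ms"]
                or other["f1_score"] > candidate["f1_score"]
            )
            for other in configs
        )
        if not dominated:
            front.append(candidate)
    return front

# Illustrative data only
configs = [
    {"config_id": "A", "avg_latency_ms": 100, "f1_score": 0.6},
    {"config_id": "B", "avg_latency_ms": 200, "f1_score": 0.8},
    {"config_id": "C", "avg_latency_ms": 300, "f1_score": 0.7},
    {"config_id": "D", "avg_latency_ms": 150, "f1_score": 0.5},
]

front_ids = [c["config_id"] for c in pareto_front(configs)]
print(front_ids)  # ['A', 'B']
```

Here C is dominated by B and D is dominated by A, which leaves A and B as the real choices. Picking between them is a judgment about how much latency a given gain in F1 is worth.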

In [None]:
# Latency Distribution
if test_results and results_df is not None:
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    
    # 1. Latency by Frame Rate
    ax1 = axes[0, 0]
    results_df.boxplot(column='avg_latency_ms', by='frame_rate', ax=ax1)
    ax1.set_xlabel('Frame Rate (FPS)')
    ax1.set_ylabel('Average Latency (ms)')
    ax1.set_title('Latency by Frame Rate')
    plt.sca(ax1)
    plt.xticks(rotation=0)
    
    # 2. Latency by Resolution
    ax2 = axes[0, 1]
    results_df.boxplot(column='avg_latency_ms', by='resolution', ax=ax2)
    ax2.set_xlabel('Resolution')
    ax2.set_ylabel('Average Latency (ms)')
    ax2.set_title('Latency by Resolution')
    plt.sca(ax2)
    plt.xticks(rotation=45)
    
    # 3. Latency by MediaPipe Complexity
    ax3 = axes[1, 0]
    results_df.boxplot(column='avg_latency_ms', by='mediapipe_complexity', ax=ax3)
    ax3.set_xlabel('MediaPipe Complexity')
    ax3.set_ylabel('Average Latency (ms)')
    ax3.set_title('Latency by MediaPipe Complexity')
    plt.sca(ax3)
    plt.xticks(rotation=0)
    
    # 4. Latency by MediaPipe Confidence
    ax4 = axes[1, 1]
    results_df.boxplot(column='avg_latency_ms', by='mediapipe_confidence', ax=ax4)
    ax4.set_xlabel('MediaPipe Confidence Threshold')
    ax4.set_ylabel('Average Latency (ms)')
    ax4.set_title('Latency by Confidence Threshold')
    plt.sca(ax4)
    plt.xticks(rotation=0)
    
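    fig.suptitle('')  # Drop the automatic "Boxplot grouped by ..." title added by pandas
    plt.tight_layout()
    latency_plot_path = f"{CONFIG['results_dir']}/latency_analysis_{CONFIG['test_run_id']}.png"
    plt.savefig(latency_plot_path, dpi=150)
    plt.show()
    print(f"✓ Saved: {latency_plot_path}")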

In [None]:
# Throughput Analysis
if test_results and results_df is not None:
    fig, axes = plt.subplots(1, 2, figsize=(16, 6))
    
    # Throughput by frame rate
    ax1 = axes[0]
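    # Average over the other parameters so each resolution has one point per frame rate
    for resolution, subset in results_df.groupby('resolution'):
        mean_throughput = subset.groupby('frame_rate')['throughput_fps'].mean()
        ax1.plot(mean_throughput.index, mean_throughput.values,
                marker='o', label=str(resolution), alpha=0.7)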
    ax1.set_xlabel('Target Frame Rate (FPS)')
    ax1.set_ylabel('Actual Throughput (FPS)')
    ax1.set_title('Throughput vs Target Frame Rate')
    ax1.legend(title='Resolution')
    ax1.grid(True, alpha=0.3)
    
    # Throughput distribution
    ax2 = axes[1]
    ax2.hist(results_df['throughput_fps'], bins=20, color='steelblue', edgecolor='black')
    ax2.axvline(results_df['throughput_fps'].mean(), color='red', 
               linestyle='--', label=f"Mean: {results_df['throughput_fps'].mean():.2f}")
    ax2.set_xlabel('Throughput (FPS)')
    ax2.set_ylabel('Count')
    ax2.set_title('Throughput Distribution')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    throughput_plot_path = f"{CONFIG['results_dir']}/throughput_analysis_{CONFIG['test_run_id']}.png"
    plt.savefig(throughput_plot_path, dpi=150)
    plt.show()
    print(f"✓ Saved: {throughput_plot_path}")

In [None]:
# Accuracy Analysis (if ground truth available)
if (test_results and results_df is not None
        and 'f1_score' in results_df.columns
        and results_df['f1_score'].notna().any()):
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    
    # 1. F1 Score by Frame Rate
    ax1 = axes[0, 0]
    results_df.boxplot(column='f1_score', by='frame_rate', ax=ax1)
    ax1.set_xlabel('Frame Rate (FPS)')
    ax1.set_ylabel('F1 Score')
    ax1.set_title('F1 Score by Frame Rate')
    plt.sca(ax1)
    plt.xticks(rotation=0)
    
    # 2. F1 Score by Resolution
    ax2 = axes[0, 1]
    results_df.boxplot(column='f1_score', by='resolution', ax=ax2)
    ax2.set_xlabel('Resolution')
    ax2.set_ylabel('F1 Score')
    ax2.set_title('F1 Score by Resolution')
    plt.sca(ax2)
    plt.xticks(rotation=45)
    
    # 3. F1 Score by MediaPipe Complexity
    ax3 = axes[1, 0]
    results_df.boxplot(column='f1_score', by='mediapipe_complexity', ax=ax3)
    ax3.set_xlabel('MediaPipe Complexity')
    ax3.set_ylabel('F1 Score')
    ax3.set_title('F1 Score by MediaPipe Complexity')
    plt.sca(ax3)
    plt.xticks(rotation=0)
    
    # 4. Precision vs Recall scatter
    ax4 = axes[1, 1]
    scatter = ax4.scatter(results_df['recall'], results_df['precision'], 
                         c=results_df['f1_score'], cmap='viridis', s=100, alpha=0.6)
    ax4.set_xlabel('Recall')
    ax4.set_ylabel('Precision')
    ax4.set_title('Precision vs Recall')
    ax4.plot([0, 1], [0, 1], 'r--', alpha=0.3, label='Perfect balance')
    ax4.legend()
    ax4.grid(True, alpha=0.3)
    plt.colorbar(scatter, ax=ax4, label='F1 Score')
    
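    fig.suptitle('')  # Drop the automatic "Boxplot grouped by ..." title added by pandas
    plt.tight_layout()
    accuracy_plot_path = f"{CONFIG['results_dir']}/accuracy_analysis_{CONFIG['test_run_id']}.png"
    plt.savefig(accuracy_plot_path, dpi=150)
    plt.show()
    print(f"✓ Saved: {accuracy_plot_path}")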

In [None]:
# Latency vs Accuracy Tradeoff
if (test_results and results_df is not None
        and 'f1_score' in results_df.columns
        and results_df['f1_score'].notna().any()):
    fig, ax = plt.subplots(figsize=(12, 8))
    
    # One color per frame rate (marker size is constant)
    for fps in results_df['frame_rate'].unique():
        subset = results_df[results_df['frame_rate'] == fps]
        ax.scatter(subset['avg_latency_ms'], subset['f1_score'], 
                  label=f"{fps} FPS", s=100, alpha=0.6)
    
    ax.set_xlabel('Average Latency (ms)')
    ax.set_ylabel('F1 Score (Accuracy)')
    ax.set_title('Latency vs Accuracy Tradeoff')
    ax.grid(True, alpha=0.3)
    
    # Highlight the best tradeoff: highest normalized F1 minus normalized latency.
    # This is a single scalar score, not a Pareto-front computation.
    best_idx = (results_df['f1_score'] / results_df['f1_score'].max() - 
                results_df['avg_latency_ms'] / results_df['avg_latency_ms'].max()).idxmax()
    best_config = results_df.loc[best_idx]
    ax.scatter(best_config['avg_latency_ms'], best_config['f1_score'], 
              color='red', s=300, marker='*', edgecolors='black', linewidths=2,
              label='Best Tradeoff', zorder=10)
    
    # Build the legend once, after the highlighted point has been added
    ax.legend()
    plt.tight_layout()
    tradeoff_plot_path = f"{CONFIG['results_dir']}/latency_accuracy_tradeoff_{CONFIG['test_run_id']}.png"
    plt.savefig(tradeoff_plot_path, dpi=150)
    plt.show()
    
    print("\n" + "="*60)
    print("Best Tradeoff Configuration:")
    print("="*60)
    print(f"Config ID: {best_config['config_id']}")
    print(f"Frame Rate: {best_config['frame_rate']} FPS")
    print(f"Resolution: {best_config['resolution']}")
    print(f"MediaPipe Complexity: {best_config['mediapipe_complexity']}")
    print(f"MediaPipe Confidence: {best_config['mediapipe_confidence']}")
    print(f"\nMetrics:")
    print(f"  Latency: {best_config['avg_latency_ms']:.1f}ms (p95: {best_config['p95_latency_ms']:.1f}ms)")
    print(f"  Throughput: {best_config['throughput_fps']:.2f} fps")
    print(f"  F1 Score: {best_config['f1_score']:.3f}")
    print(f"  Precision: {best_config['precision']:.3f}")
    print(f"  Recall: {best_config['recall']:.3f}")
    print(f"✓ Saved: {CONFIG['results_dir']}/latency_accuracy_tradeoff_{CONFIG['test_run_id']}.png")

In [None]:
# Confusion Matrix for Best Configuration
if test_results:
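    # Find best result by F1 score (requires at least one non-NaN score)
    if results_df is not None and 'f1_score' in results_df.columns and results_df['f1_score'].notna().any():
        # Convert the index label to a position; assumes results_df rows follow test_results order
        best_idx = results_df.index.get_loc(results_df['f1_score'].idxmax())
        best_result = test_results[best_idx]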
        
        if best_result.confusion_mat is not None:
            fig, ax = plt.subplots(figsize=(10, 8))
            
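            # Get labels (emotions) from both true and predicted values
            y_true, y_pred = MetricsCalculator.align_predictions_with_ground_truth(
                best_result.emotion_results, ground_truth, tolerance=0.5
            )
            labels = sorted(set(y_true) | set(y_pred))
            if len(labels) != len(best_result.confusion_mat):
                # Fall back if the matrix was built from ground truth labels only
                labels = sorted(set(y_true))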
            
            # Plot confusion matrix
            sns.heatmap(best_result.confusion_mat, annot=True, fmt='d', cmap='Blues',
                       xticklabels=labels, yticklabels=labels, ax=ax)
            ax.set_xlabel('Predicted Emotion')
            ax.set_ylabel('True Emotion')
            ax.set_title(f'Confusion Matrix - Best Configuration\n{best_result.config.config_id()}')
            
            plt.tight_layout()
            confusion_plot_path = f"{CONFIG['results_dir']}/confusion_matrix_{CONFIG['test_run_id']}.png"
            plt.savefig(confusion_plot_path, dpi=150)
            plt.show()
            print(f"✓ Saved: {confusion_plot_path}")

---
## 7. Results Export & Recommendations
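
The "best balanced" recommendation in this section averages a normalized F1 score with a normalized, inverted latency. The worked example below applies the same formula to three made-up rows so the arithmetic is easy to follow. `balance_scores` is a hypothetical helper written in plain Python and the numbers are illustrative only.

```python
def balance_scores(rows):
    """Score each row as the mean of normalized F1 and inverted normalized latency."""
    max_f1 = max(row["f1_score"] for row in rows)
    max_latency = max(row["avg_latency_ms"] for row in rows)
    scores = {}
    for row in rows:
        normalized_f1 = row["f1_score"] / max_f1
        normalized_latency = 1 - row["avg_latency_ms"] / max_latency
        scores[row["config_id"]] = (normalized_f1 + normalized_latency) / 2
    return scores

# Illustrative data only
rows = [
    {"config_id": "A", "avg_latency_ms": 100, "f1_score": 0.6},
    {"config_id": "B", "avg_latency_ms": 200, "f1_score": 0.8},
    {"config_id": "C", "avg_latency_ms": 300, "f1_score": 0.7},
]

scores = balance_scores(rows)
best = max(scores, key=scores.get)
for config_id, score in scores.items():
    print(f"{config_id}: {score:.4f}")
print(f"Best balanced: {best}")
```

Two properties follow from the formula. The slowest configuration always receives a latency score of 0 because latency is divided by the maximum. The ranking also matches the tradeoff score in Section 6, since `(normalized_f1 + 1 - latency / max_latency) / 2` is that score shifted by one and halved.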

In [None]:
# Export results to CSV
if results_df is not None:
    csv_path = f"{CONFIG['results_dir']}/test_results_{CONFIG['test_run_id']}.csv"
    results_df.to_csv(csv_path, index=False)
    print(f"✓ Exported results to: {csv_path}")
    
    # Export detailed results as JSON
    json_path = f"{CONFIG['results_dir']}/test_results_detailed_{CONFIG['test_run_id']}.json"
    
    detailed_results = []
    for result in test_results:
        result_dict = result.to_dict()
        # Convert emotion results to dicts
        result_dict['emotion_results'] = [
            {
                'timestamp': er.timestamp,
                'emotion': er.emotion,
                'investor_state': er.investor_state,
                'confidence': er.confidence,
                'latency_ms': er.latency_ms
            }
            for er in result.emotion_results
        ]
        detailed_results.append(result_dict)
    
    with open(json_path, 'w') as f:
        json.dump({
            'test_run_id': CONFIG['test_run_id'],
            'video_path': CONFIG['video_path'],
            'ground_truth_path': CONFIG['ground_truth_path'],
            'total_tests': len(test_results),
            'results': detailed_results
        }, f, indent=2)
    
    print(f"✓ Exported detailed results to: {json_path}")

In [None]:
# Generate recommendations summary
if results_df is not None:
    print("\n" + "="*60)
    print("RECOMMENDATIONS")
    print("="*60)
    
    # Best for latency
    best_latency = results_df.loc[results_df['avg_latency_ms'].idxmin()]
    print("\n🚀 LOWEST LATENCY Configuration:")
    print(f"   {best_latency['config_id']}")
    print(f"   Latency: {best_latency['avg_latency_ms']:.1f}ms avg")
    print(f"   Use when: Responsiveness is critical")
    
    # Best for accuracy (if available)
    if 'f1_score' in results_df.columns and results_df['f1_score'].notna().any():
        best_accuracy = results_df.loc[results_df['f1_score'].idxmax()]
        print("\n🎯 HIGHEST ACCURACY Configuration:")
        print(f"   {best_accuracy['config_id']}")
        print(f"   F1 Score: {best_accuracy['f1_score']:.3f}")
        print(f"   Use when: Accuracy is most important")
        
        # Best balanced (if F1 scores available)
        # Normalize and combine metrics
        normalized_f1 = results_df['f1_score'] / results_df['f1_score'].max()
        normalized_latency = 1 - (results_df['avg_latency_ms'] / results_df['avg_latency_ms'].max())
        balance_score = (normalized_f1 + normalized_latency) / 2
        best_balanced = results_df.loc[balance_score.idxmax()]
        
        print("\n⚖️  BEST BALANCED Configuration:")
        print(f"   {best_balanced['config_id']}")
        print(f"   F1 Score: {best_balanced['f1_score']:.3f}")
        print(f"   Latency: {best_balanced['avg_latency_ms']:.1f}ms avg")
        print(f"   Use when: Need good balance of speed and accuracy")
    
    # Best for throughput
    best_throughput = results_df.loc[results_df['throughput_fps'].idxmax()]
    print("\n📊 HIGHEST THROUGHPUT Configuration:")
    print(f"   {best_throughput['config_id']}")
    print(f"   Throughput: {best_throughput['throughput_fps']:.2f} fps")
    print(f"   Use when: Processing many frames quickly")
    
    print("\n" + "="*60)
    
    # Save recommendations
    recommendations_path = f"{CONFIG['results_dir']}/recommendations_{CONFIG['test_run_id']}.txt"
    with open(recommendations_path, 'w') as f:
        f.write("BeneAI Performance Testing Recommendations\n")
        f.write("="*60 + "\n\n")
        f.write(f"Test Run ID: {CONFIG['test_run_id']}\n")
        f.write(f"Video: {CONFIG['video_path']}\n")
        f.write(f"Total Tests: {len(test_results)}\n\n")
        f.write(f"Lowest Latency: {best_latency['config_id']}\n")
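        # best_accuracy and best_balanced exist only when F1 scores were computed
        if 'f1_score' in results_df.columns and results_df['f1_score'].notna().any():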
            f.write(f"Highest Accuracy: {best_accuracy['config_id']}\n")
            f.write(f"Best Balanced: {best_balanced['config_id']}\n")
        f.write(f"Highest Throughput: {best_throughput['config_id']}\n")
    
    print(f"\n✓ Saved recommendations to: {recommendations_path}")

---
## Summary

This notebook tested BeneAI's latency, throughput, and accuracy across a grid of frame rates, resolutions, and MediaPipe settings.

**Key Outputs:**
- Performance metrics (latency, throughput, accuracy)
- Visualizations showing parameter impacts
- Configuration recommendations for different use cases
- Exported results for further analysis

**Next Steps:**
1. Apply recommended configurations to your production deployment
2. Re-run tests with additional videos to validate findings
3. Monitor real-world performance and adjust as needed
4. Consider A/B testing different configurations with users