# Leaderboard Generation

This notebook demonstrates how to create comprehensive performance leaderboards from benchmark results, showing how to rank estimators and generate publication-ready comparisons.

## Overview

The leaderboard generation system allows you to:

1. **Load Benchmark Results**: Import results from multiple benchmark runs
2. **Create Rankings**: Generate performance rankings across different metrics
3. **Composite Scoring**: Combine multiple metrics into overall scores
4. **Visualization**: Create publication-ready plots and tables
5. **Export Results**: Save leaderboards in various formats

## Table of Contents

1. [Setup and Imports](#setup)
2. [Loading Benchmark Results](#loading)
3. [Creating Performance Rankings](#rankings)
4. [Visualization and Export](#visualization)
5. [Summary and Next Steps](#summary)


## 1. Setup and Imports {#setup}

First, let's import all necessary libraries and set up the leaderboard generation system.


In [None]:
# Standard scientific computing imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import time
import warnings
warnings.filterwarnings('ignore')

# Set up plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 12

# Set random seed for reproducibility
np.random.seed(42)

# Import LRDBenchmark leaderboard system
from lrdbenchmark.analysis.benchmark import ComprehensiveBenchmark

# Import data models for generating test data
from lrdbenchmark.models.data_models.fbm.fbm_model import FractionalBrownianMotion
from lrdbenchmark.models.data_models.fgn.fgn_model import FractionalGaussianNoise

# Import estimators for testing
from lrdbenchmark.analysis.temporal.rs.rs_estimator_unified import RSEstimator
from lrdbenchmark.analysis.temporal.dfa.dfa_estimator_unified import DFAEstimator
from lrdbenchmark.analysis.spectral.gph.gph_estimator_unified import GPHEstimator
from lrdbenchmark.analysis.spectral.whittle.whittle_estimator_unified import WhittleEstimator
from lrdbenchmark.analysis.machine_learning.random_forest_estimator_unified import RandomForestEstimator
from lrdbenchmark.analysis.machine_learning.svr_estimator_unified import SVREstimator

print("✅ All imports successful!")
print("🏆 Ready to generate performance leaderboards")


## 2. Loading Benchmark Results {#loading}

Let's run comprehensive benchmarks to generate data for our leaderboard, then load and process the results.


In [None]:
# Initialize benchmark system
print("🔧 Initializing Benchmark System for Leaderboard Generation...")
print("=" * 70)

benchmark = ComprehensiveBenchmark(output_dir="leaderboard_results")

# Run comprehensive benchmarks
print("\n🚀 Running Comprehensive Benchmarks...")
print("=" * 70)

# Run classical benchmark
print("📊 Running Classical Estimator Benchmark...")
classical_results = benchmark.run_classical_benchmark(
    data_length=1000,
    n_runs=10,
    save_results=True
)

print("✅ Classical benchmark completed!")
print(f"Success rate: {classical_results['success_rate']:.1%}")
print(f"Total tests: {classical_results['total_tests']}")

# Run ML benchmark
print("\n📊 Running ML Estimator Benchmark...")
ml_results = benchmark.run_ml_benchmark(
    data_length=1000,
    n_runs=5,
    save_results=True
)

print("✅ ML benchmark completed!")
print(f"Success rate: {ml_results['success_rate']:.1%}")
print(f"Total tests: {ml_results['total_tests']}")

# Run neural benchmark
print("\n📊 Running Neural Network Benchmark...")
neural_results = benchmark.run_neural_benchmark(
    data_length=1000,
    n_runs=3,
    save_results=True
)

print("✅ Neural benchmark completed!")
print(f"Success rate: {neural_results['success_rate']:.1%}")
print(f"Total tests: {neural_results['total_tests']}")

# Run comprehensive benchmark
print("\n📊 Running Comprehensive Benchmark...")
comprehensive_results = benchmark.run_comprehensive_benchmark(
    data_length=1000,
    n_runs=5,
    save_results=True
)

print("✅ Comprehensive benchmark completed!")
print(f"Success rate: {comprehensive_results['success_rate']:.1%}")
print(f"Total tests: {comprehensive_results['total_tests']}")

print("\n🎯 All benchmarks completed successfully!")
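
Before building the leaderboard, it can help to check that the returned dictionaries have the shape the ranking code expects. The sketch below is a minimal, hedged example: the key names (`summary`, `estimator_results`, `estimator`, `true_hurst`, and so on) are copied from the ranking code in the next section, but whether `ComprehensiveBenchmark` actually returns this layout is an assumption. `flatten_results` and `mock_results` are hypothetical names, and the numbers are made up.

```python
import pandas as pd

def flatten_results(all_results):
    """Turn {category: results_dict} into one row per successful estimator run."""
    rows = []
    for category, results in all_results.items():
        # Missing keys yield an empty list instead of raising KeyError
        records = results.get("summary", {}).get("estimator_results", [])
        for rec in records:
            if not rec.get("success", False):
                continue  # skip failed runs, as the leaderboard code does
            rows.append({
                "Category": category,
                "Estimator": rec["estimator"],
                "True_H": rec["true_hurst"],
                "Estimated_H": rec["estimated_hurst"],
                "Error": rec["error"],
                "Execution_Time": rec["execution_time"],
                "Data_Model": rec.get("data_model", "Unknown"),
            })
    return pd.DataFrame(rows)

# Hypothetical results with made-up numbers, only to exercise the function
mock_results = {
    "Classical": {
        "summary": {
            "estimator_results": [
                {"estimator": "RS", "success": True, "true_hurst": 0.7,
                 "estimated_hurst": 0.66, "error": 0.04, "execution_time": 0.02},
                {"estimator": "DFA", "success": False},
            ]
        }
    },
    "ML": {"success_rate": 1.0},  # no 'summary' key: contributes no rows
}

mock_df = flatten_results(mock_results)
print(mock_df)
```

If the real result dictionaries produce an empty DataFrame here, the leaderboard step will report that no performance data is available, so this is a cheap way to find a layout mismatch early.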


## 3. Creating Performance Rankings {#rankings}

Now let's create comprehensive performance rankings and leaderboards from our benchmark results.


In [None]:
# Create comprehensive leaderboard
print("🏆 Creating Performance Leaderboard...")
print("=" * 70)

# Combine all benchmark results
all_results = {
    'Classical': classical_results,
    'ML': ml_results,
    'Neural': neural_results,
    'Comprehensive': comprehensive_results
}

# Create performance summary
performance_data = []

for category, results in all_results.items():
    if 'summary' in results and 'estimator_results' in results['summary']:
        for estimator_result in results['summary']['estimator_results']:
            if estimator_result['success']:
                performance_data.append({
                    'Category': category,
                    'Estimator': estimator_result['estimator'],
                    'True_H': estimator_result['true_hurst'],
                    'Estimated_H': estimator_result['estimated_hurst'],
                    'Error': estimator_result['error'],
                    'Execution_Time': estimator_result['execution_time'],
                    'Data_Model': estimator_result.get('data_model', 'Unknown')
                })

# Create DataFrame
performance_df = pd.DataFrame(performance_data)

if len(performance_df) > 0:
    print(f"📊 Loaded {len(performance_df)} performance records")
    
    # Calculate performance metrics
    performance_metrics = performance_df.groupby(['Category', 'Estimator']).agg({
        'Error': ['mean', 'std', 'min', 'max'],
        'Execution_Time': ['mean', 'std'],
        'True_H': 'count'
    }).round(4)
    
    print("\n📈 Performance Metrics Summary:")
    print(performance_metrics)
    
    # Create overall leaderboard
    print("\n🏆 Overall Performance Leaderboard:")
    print("=" * 70)
    
    # Calculate composite scores
    leaderboard_data = []
    
    for (category, estimator), group in performance_df.groupby(['Category', 'Estimator']):
        mean_error = group['Error'].mean()
        std_error = group['Error'].std()
        mean_time = group['Execution_Time'].mean()
        count = len(group)
        
        # Composite score (higher is better): rewards low mean error, a high number of
        # successful runs (count, scaled by a fixed factor of 10), and low mean execution time
        composite_score = (1 / (1 + mean_error)) * (count / 10) * (1 / (1 + mean_time))
        
        leaderboard_data.append({
            'Category': category,
            'Estimator': estimator,
            'Mean_Error': mean_error,
            'Std_Error': std_error,
            'Mean_Time': mean_time,
            'Count': count,
            'Composite_Score': composite_score
        })
    
    leaderboard_df = pd.DataFrame(leaderboard_data)
    leaderboard_df = leaderboard_df.sort_values('Composite_Score', ascending=False)
    
    print(leaderboard_df.round(4))
    
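    # Save leaderboard (create the output directory first so to_csv cannot fail on a missing folder)
    import os
    os.makedirs('outputs', exist_ok=True)
    leaderboard_df.to_csv('outputs/performance_leaderboard.csv', index=False)
    print("\n💾 Leaderboard saved to outputs/performance_leaderboard.csv")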
    
else:
    print("❌ No performance data available for leaderboard generation")
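
To see how the three factors of the composite score interact, here is a small worked example. It reuses the formula from the cell above on made-up aggregates; `composite_score`, `expected_runs`, and the `toy` table are hypothetical names introduced only for illustration.

```python
import pandas as pd

def composite_score(mean_error, count, mean_time, expected_runs=10):
    """Same formula as the leaderboard cell; expected_runs=10 mirrors its fixed divisor."""
    return (1 / (1 + mean_error)) * (count / expected_runs) * (1 / (1 + mean_time))

# Hypothetical per-estimator aggregates (made-up numbers, for illustration only)
toy = pd.DataFrame({
    "Estimator": ["A", "B", "C"],
    "Mean_Error": [0.05, 0.05, 0.20],
    "Count": [10, 5, 10],
    "Mean_Time": [0.10, 0.10, 0.01],
})

toy["Composite_Score"] = composite_score(toy["Mean_Error"], toy["Count"], toy["Mean_Time"])
toy = toy.sort_values("Composite_Score", ascending=False).reset_index(drop=True)
toy["Rank"] = toy.index + 1  # rank 1 is the highest score
print(toy.round(4))
```

Estimators A and B have identical error and runtime, yet B scores half as much because it has only 5 records. Since the benchmarks above are run with different `n_runs` values (10, 5, 3, and 5), the count factor may favor categories with more runs; dividing by each category's own run count instead of a fixed 10 is one possible way to offset that.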


## 4. Visualization and Export {#visualization}

Let's create comprehensive visualizations of our leaderboard results and export them in various formats.


In [None]:
# Create comprehensive visualizations
if len(performance_df) > 0:
    print("📊 Creating Performance Visualizations...")
    print("=" * 70)
    
    # Create figure with subplots
    fig, axes = plt.subplots(2, 3, figsize=(20, 12))
    
    # 1. Error distribution by category
    ax1 = axes[0, 0]
    for category in performance_df['Category'].unique():
        category_data = performance_df[performance_df['Category'] == category]['Error']
        ax1.hist(category_data, alpha=0.7, label=category, bins=15)
    ax1.set_xlabel('Absolute Error')
    ax1.set_ylabel('Frequency')
    ax1.set_title('Error Distribution by Category')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    
    # 2. Execution time by category
    ax2 = axes[0, 1]
    for category in performance_df['Category'].unique():
        category_data = performance_df[performance_df['Category'] == category]['Execution_Time']
        ax2.hist(category_data, alpha=0.7, label=category, bins=15)
    ax2.set_xlabel('Execution Time (seconds)')
    ax2.set_ylabel('Frequency')
    ax2.set_title('Execution Time Distribution by Category')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    
    # 3. Error vs True H
    ax3 = axes[0, 2]
    for category in performance_df['Category'].unique():
        category_data = performance_df[performance_df['Category'] == category]
        ax3.scatter(category_data['True_H'], category_data['Error'], 
                   alpha=0.7, label=category, s=50)
    ax3.set_xlabel('True Hurst Parameter')
    ax3.set_ylabel('Absolute Error')
    ax3.set_title('Error vs True Hurst Parameter')
    ax3.legend()
    ax3.grid(True, alpha=0.3)
    
    # 4. Performance by estimator
    ax4 = axes[1, 0]
    estimator_performance = performance_df.groupby('Estimator')['Error'].mean().sort_values()
    ax4.bar(range(len(estimator_performance)), estimator_performance.values, alpha=0.7)
    ax4.set_xlabel('Estimator')
    ax4.set_ylabel('Mean Absolute Error')
    ax4.set_title('Mean Error by Estimator')
    ax4.set_xticks(range(len(estimator_performance)))
    ax4.set_xticklabels(estimator_performance.index, rotation=45, ha='right')
    ax4.grid(True, alpha=0.3)
    
    # 5. Execution time by estimator
    ax5 = axes[1, 1]
    time_performance = performance_df.groupby('Estimator')['Execution_Time'].mean().sort_values()
    ax5.bar(range(len(time_performance)), time_performance.values, alpha=0.7)
    ax5.set_xlabel('Estimator')
    ax5.set_ylabel('Mean Execution Time (seconds)')
    ax5.set_title('Mean Execution Time by Estimator')
    ax5.set_xticks(range(len(time_performance)))
    ax5.set_xticklabels(time_performance.index, rotation=45, ha='right')
    ax5.grid(True, alpha=0.3)
    
    # 6. Composite score ranking
    ax6 = axes[1, 2]
    if len(leaderboard_df) > 0:
        top_10 = leaderboard_df.head(10)
        ax6.barh(range(len(top_10)), top_10['Composite_Score'], alpha=0.7)
        ax6.set_xlabel('Composite Score')
        ax6.set_ylabel('Rank')
        ax6.set_title('Top 10 Estimators by Composite Score')
        ax6.set_yticks(range(len(top_10)))
        ax6.set_yticklabels([f"{row['Category']} - {row['Estimator']}" for _, row in top_10.iterrows()])
        ax6.invert_yaxis()  # put the highest-scoring estimator at the top
        ax6.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.savefig('outputs/leaderboard_visualization.png', dpi=300, bbox_inches='tight')
    plt.show()
    
    # Create category-specific leaderboards
    print("\n📊 Category-Specific Leaderboards:")
    print("=" * 70)
    
    for category in performance_df['Category'].unique():
        category_data = performance_df[performance_df['Category'] == category]
        category_leaderboard = category_data.groupby('Estimator').agg({
            'Error': ['mean', 'std'],
            'Execution_Time': 'mean',
            'True_H': 'count'
        }).round(4)
        
        print(f"\n{category} Category Leaderboard:")
        print(category_leaderboard)
    
    # Export results in multiple formats
    print("\n💾 Exporting Results...")
    print("=" * 70)
    
    # CSV export
    performance_df.to_csv('outputs/performance_data.csv', index=False)
    print("✅ Performance data exported to CSV")
    
    # JSON export
    performance_df.to_json('outputs/performance_data.json', orient='records', indent=2)
    print("✅ Performance data exported to JSON")
    
    # LaTeX table export
    if len(leaderboard_df) > 0:
        latex_table = leaderboard_df.to_latex(index=False, float_format='%.4f')
        with open('outputs/leaderboard_table.tex', 'w') as f:
            f.write(latex_table)
        print("✅ Leaderboard table exported to LaTeX")
    
    print("\n🎯 All visualizations and exports completed successfully!")
    
else:
    print("❌ No performance data available for visualization")
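
After exporting, a quick round-trip check confirms that the files can be read back with the same content. The sketch below writes a made-up table to a temporary directory rather than to `outputs/`, so it does not touch the real exports; `toy_leaderboard` and the file names are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up leaderboard rows, used only to demonstrate the round trip
toy_leaderboard = pd.DataFrame({
    "Estimator": ["A", "B"],
    "Mean_Error": [0.05, 0.12],
    "Composite_Score": [0.87, 0.43],
})

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "outputs"
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder before writing

    csv_path = out_dir / "leaderboard.csv"
    json_path = out_dir / "leaderboard.json"

    toy_leaderboard.to_csv(csv_path, index=False)
    toy_leaderboard.to_json(json_path, orient="records", indent=2)

    # Read both files back while the temporary directory still exists
    from_csv = pd.read_csv(csv_path)
    from_json = pd.read_json(json_path, orient="records")

print(from_csv)
print(from_json)
```

The same pattern applies to the real files: read `outputs/performance_data.csv` back with `pd.read_csv` and compare its row count with `len(performance_df)`.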


## 5. Summary and Next Steps {#summary}

### Key Takeaways

1. **Leaderboard Generation**: LRDBenchmark provides comprehensive tools for creating performance leaderboards:
   - **Multi-category Comparison**: Classical, ML, and Neural estimators
   - **Composite Scoring**: Combined accuracy, speed, and reliability metrics
   - **Statistical Analysis**: Confidence intervals and significance tests
   - **Publication-ready Output**: LaTeX, CSV, JSON formats

2. **Performance Rankings**: The system generates multiple types of leaderboards:
   - **Overall Leaderboard**: Combined performance across all categories
   - **Category-specific**: Rankings within each estimator category
   - **Metric-specific**: Rankings by accuracy, speed, or reliability
   - **Composite Scoring**: A single score that multiplies accuracy, run-count, and speed factors

3. **Visualization**: Comprehensive plots and tables for:
   - **Error Distributions**: Performance across different scenarios
   - **Execution Time Analysis**: Computational efficiency comparison
   - **Scatter Plots**: Error vs true Hurst parameter relationships
   - **Bar Charts**: Direct performance comparisons
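
The composite score in this notebook multiplies its three factors. A weighted alternative, together with per-metric rankings, could look like the sketch below. This is one possible design and not the library's method: `weighted_composite`, the `spec` layout, the weights, the `Success_Rate` column, and all numbers are assumptions made for illustration.

```python
import pandas as pd

def weighted_composite(df, spec):
    """Weighted sum of min-max normalized metrics.

    spec maps a column name to (weight, higher_is_better).
    """
    total_weight = sum(weight for weight, _ in spec.values())
    score = pd.Series(0.0, index=df.index)
    for column, (weight, higher_is_better) in spec.items():
        lo, hi = df[column].min(), df[column].max()
        if hi > lo:
            normalized = (df[column] - lo) / (hi - lo)
        else:
            normalized = pd.Series(0.5, index=df.index)  # no spread: neutral value
        if not higher_is_better:
            normalized = 1.0 - normalized  # flip so that 1.0 is always best
        score = score + (weight / total_weight) * normalized
    return score

# Made-up aggregates for three hypothetical estimators
toy = pd.DataFrame({
    "Estimator": ["X", "Y", "Z"],
    "Mean_Error": [0.02, 0.10, 0.06],
    "Mean_Time": [2.0, 0.1, 1.05],
    "Success_Rate": [1.0, 0.9, 0.95],
})

spec = {
    "Mean_Error": (0.6, False),   # accuracy: lower error is better
    "Mean_Time": (0.2, False),    # speed: lower time is better
    "Success_Rate": (0.2, True),  # reliability: higher is better
}

toy["Weighted_Score"] = weighted_composite(toy, spec)

# Metric-specific rankings (1 = best)
toy["Error_Rank"] = toy["Mean_Error"].rank(method="min").astype(int)
toy["Time_Rank"] = toy["Mean_Time"].rank(method="min").astype(int)

print(toy.sort_values("Weighted_Score", ascending=False))
```

Min-max normalization keeps each metric in the range 0 to 1, so the weights express relative importance directly. The resulting scores depend on which estimators are in the table, so they are comparable only within one leaderboard.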

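### Leaderboard Results

The generated outputs let you read off:

- **Top Performers**: The rows with the highest `Composite_Score` in the overall leaderboard
- **Performance Trade-offs**: Accuracy versus speed, from the `Mean_Error` and `Mean_Time` columns
- **Category Strengths**: How estimators rank within each category-specific leaderboard
- **Statistical Significance**: Not computed in this notebook; `Std_Error` gives only a rough indication of the spread behind each mean error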

### Next Steps

1. **Real-world Application**: Apply leaderboards to actual time series data
2. **Advanced Analysis**: Explore statistical significance and confidence intervals
3. **Custom Metrics**: Create domain-specific performance measures
4. **Interactive Dashboards**: Build web-based leaderboard interfaces
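
As a starting point for the statistical analysis in step 2, the sketch below computes a percentile bootstrap confidence interval for a mean error and runs a paired Wilcoxon signed-rank test between two estimators. The error arrays are synthetic, and `bootstrap_mean_ci` is a hypothetical helper; in practice the inputs would be the `Error` values of two estimators evaluated on the same series.

```python
import numpy as np
from scipy import stats

def bootstrap_mean_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Each row of idx is one resample (with replacement) of the data
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    lower, upper = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return values.mean(), lower, upper

# Synthetic absolute errors for two hypothetical estimators on the same 30 series
rng = np.random.default_rng(42)
errors_a = np.abs(rng.normal(0.05, 0.02, size=30))
errors_b = np.abs(rng.normal(0.08, 0.02, size=30))

mean_a, lo_a, hi_a = bootstrap_mean_ci(errors_a)
mean_b, lo_b, hi_b = bootstrap_mean_ci(errors_b)
print(f"Estimator A: mean error {mean_a:.4f}, 95% CI [{lo_a:.4f}, {hi_a:.4f}]")
print(f"Estimator B: mean error {mean_b:.4f}, 95% CI [{lo_b:.4f}, {hi_b:.4f}]")

# Paired, non-parametric test: are the per-series errors systematically different?
statistic, p_value = stats.wilcoxon(errors_a, errors_b)
print(f"Wilcoxon signed-rank: statistic={statistic:.1f}, p-value={p_value:.4g}")
```

The Wilcoxon test is used here because it makes no normality assumption about the error differences, but it requires paired observations. For estimators evaluated on different series, an unpaired test such as `stats.mannwhitneyu` would be the closer fit.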

### Files Generated

- `outputs/performance_leaderboard.csv`: Complete leaderboard data
- `outputs/performance_data.csv`: Raw performance data
- `outputs/performance_data.json`: JSON format data
- `outputs/leaderboard_table.tex`: LaTeX table for publications
- `outputs/leaderboard_visualization.png`: Comprehensive visualization

---

**Congratulations!** You've completed the comprehensive LRDBenchmark demonstration series. You now have a complete understanding of:
- Data generation and visualization
- Estimation and statistical validation
- Custom model and estimator development
- Comprehensive benchmarking
- Leaderboard generation and analysis
