# Prediction Accuracy Analysis Report

This notebook demonstrates how to load, analyze, and visualize prediction accuracy data from the SenseQuant trading system.

## Overview

The accuracy audit system captures telemetry during backtests and live trading, recording:
- Predicted vs. actual trade directions (LONG/SHORT/NOOP)
- Entry and exit prices
- Holding periods
- Realized returns
- Feature values and metadata
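
As a rough picture of what one recorded trace might hold, the sketch below models the fields listed above as a dataclass. The class name `TraceSketch` and the fields `entry_price`, `exit_price`, `holding_minutes`, `features`, and `metadata` are assumptions made for illustration; the remaining field names are the ones this notebook reads later. It is not the SenseQuant trace class.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import Any


@dataclass
class TraceSketch:
    """Hypothetical stand-in for one prediction trace (not the SenseQuant class)."""

    timestamp: datetime
    symbol: str
    strategy: str
    predicted_direction: str  # "LONG", "SHORT", or "NOOP"
    actual_direction: str
    entry_price: float        # assumed field name
    exit_price: float         # assumed field name
    holding_minutes: float    # assumed field name
    realized_return_pct: float
    features: dict[str, float] = field(default_factory=dict)
    metadata: dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> dict[str, Any]:
        """Flatten the record so a list of traces can feed pd.DataFrame."""
        return asdict(self)


example = TraceSketch(
    timestamp=datetime(2024, 10, 12, 14, 30),
    symbol="EXAMPLE",
    strategy="intraday",
    predicted_direction="LONG",
    actual_direction="LONG",
    entry_price=100.0,
    exit_price=101.0,
    holding_minutes=45.0,
    realized_return_pct=1.0,
)
row = example.to_dict()
```

A list of such dictionaries converts directly into a DataFrame, which is the pattern section 3 relies on.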

This notebook shows how to:
1. Load prediction traces from CSV files
2. Compute accuracy metrics (precision, recall, F1)
3. Analyze financial performance (Sharpe ratio, drawdown, profit factor)
4. Generate visualizations (confusion matrix, return distributions)

## Configuration

Set the `batch_id` parameter to analyze a specific backtest run. To aggregate multiple runs, widen the glob pattern stored in `telemetry_dir` (for example, replace part of the batch ID with `*`).
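
To see which files a pattern would pick up before touching the filesystem, the sketch below applies the same shell-style wildcard rules with `fnmatch`. The file names are hypothetical and exist only to show what matches and what does not.

```python
from fnmatch import fnmatchcase

batch_id = "20241012_143000"
pattern = f"predictions_{batch_id}_*.csv*"

# Hypothetical file names, used only to illustrate the matching rules.
candidates = [
    "predictions_20241012_143000_intraday.csv",
    "predictions_20241012_143000_swing.csv.gz",
    "predictions_20241013_090000_intraday.csv",
    "predictions_20241012_143000.csv",
]

# The trailing "*" after ".csv" lets the pattern match ".csv.gz" as well.
matched = [name for name in candidates if fnmatchcase(name, pattern)]

# Widening the batch portion aggregates several runs from the same day.
day_pattern = "predictions_20241012_*.csv*"
day_matched = [name for name in candidates if fnmatchcase(name, day_pattern)]
```

Note that the last candidate is skipped by the first pattern because it has no `_` suffix after the batch ID, but it is picked up by the wider day-level pattern.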

In [None]:
# Configuration - modify these parameters
import os

batch_id = "20241012_143000"  # Replace with your batch ID
# Note: despite its name, telemetry_dir holds a glob pattern, not a directory path
telemetry_dir = f"../data/analytics/predictions_{batch_id}_*.csv*"  # Matches .csv and .csv.gz trace files
output_dir = "../data/reports"  # Output directory for reports and plots

# Create output directory if needed
os.makedirs(output_dir, exist_ok=True)

print(f"Batch ID: {batch_id}")
print(f"Telemetry pattern: {telemetry_dir}")
print(f"Output directory: {output_dir}")

## 1. Import Dependencies

Import required libraries and modules from the SenseQuant codebase.

In [None]:
import glob
import json
import sys
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Make the SenseQuant source tree importable, then import its modules
sys.path.insert(0, '../src')

from services.accuracy_analyzer import AccuracyAnalyzer

# Set plotting style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)

print("Dependencies loaded successfully!")

## 2. Load Prediction Traces

Load telemetry data from CSV files. The analyzer supports both compressed (.csv.gz) and uncompressed (.csv) formats.
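
How `AccuracyAnalyzer.load_traces` handles the two formats internally is not shown in this notebook. As a point of comparison, the sketch below uses plain pandas, which infers gzip compression from the `.gz` suffix, to confirm that both formats yield the same rows. The sample columns and file names are made up for the demonstration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample rows, for illustration only.
sample = pd.DataFrame(
    {
        "symbol": ["EXAMPLE", "EXAMPLE"],
        "predicted_direction": ["LONG", "SHORT"],
        "actual_direction": ["LONG", "NOOP"],
        "realized_return_pct": [1.25, -0.4],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    plain_path = Path(tmp) / "predictions_demo.csv"
    gzip_path = Path(tmp) / "predictions_demo.csv.gz"

    # to_csv and read_csv both infer gzip compression from the ".gz" suffix.
    sample.to_csv(plain_path, index=False)
    sample.to_csv(gzip_path, index=False)

    from_plain = pd.read_csv(plain_path)
    from_gzip = pd.read_csv(gzip_path)

same_content = from_plain.equals(from_gzip)
```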

In [None]:
# Initialize analyzer
analyzer = AccuracyAnalyzer()

# Find all matching telemetry files
trace_files = glob.glob(telemetry_dir)

if not trace_files:
    print(f"WARNING: No trace files found matching pattern: {telemetry_dir}")
    print("Please check the batch_id and telemetry_dir configuration.")
else:
    print(f"Found {len(trace_files)} trace file(s):")
    for f in trace_files:
        print(f"  - {f}")

# Load all traces
all_traces = []
for trace_file in trace_files:
    try:
        traces = analyzer.load_traces(Path(trace_file))
        all_traces.extend(traces)
        print(f"Loaded {len(traces)} traces from {trace_file}")
    except Exception as e:
        print(f"Error loading {trace_file}: {e}")

print(f"\nTotal traces loaded: {len(all_traces)}")

if len(all_traces) == 0:
    print("\nNo traces available for analysis. This notebook requires telemetry data.")
    print("Run a backtest with enable_telemetry=True to generate trace files.")

## 3. Explore Trace Data

Let's examine the structure and basic statistics of the loaded traces.

In [None]:
if len(all_traces) > 0:
    # Convert traces to DataFrame for easier analysis
    traces_df = pd.DataFrame([trace.to_dict() for trace in all_traces])

    print("=== Trace Data Summary ===")
    print(f"\nTotal trades: {len(traces_df)}")
    print(f"Date range: {traces_df['timestamp'].min()} to {traces_df['timestamp'].max()}")
    print(f"\nSymbols: {traces_df['symbol'].unique()}")
    print(f"Strategies: {traces_df['strategy'].unique()}")

    print("\n=== Direction Distribution ===")
    print("\nPredicted directions:")
    print(traces_df['predicted_direction'].value_counts())
    print("\nActual directions:")
    print(traces_df['actual_direction'].value_counts())

    print("\n=== Return Statistics ===")
    print(traces_df['realized_return_pct'].describe())

    # Display first few traces
    print("\n=== Sample Traces (first 5) ===")
    display(traces_df[[
        'timestamp', 'symbol', 'strategy',
        'predicted_direction', 'actual_direction',
        'realized_return_pct'
    ]].head())
else:
    print("No traces to display.")

## 4. Compute Accuracy Metrics

Calculate comprehensive accuracy metrics including:
- Classification metrics: precision, recall, F1 score per direction
- Confusion matrix showing prediction vs. actual outcomes
- Hit ratio (overall accuracy)
- Win rate (percentage of profitable trades)
- Financial metrics: Sharpe ratio, max drawdown, profit factor
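
The exact formulas used by `compute_metrics` are not visible here. The sketch below shows one common way to compute hit ratio, win rate, profit factor, and a per-trade Sharpe ratio from raw values. The input data is hypothetical, and the Sharpe calculation assumes a zero risk-free rate and no annualization.

```python
import numpy as np

# Hypothetical per-trade data, for illustration only.
predicted = ["LONG", "SHORT", "LONG", "NOOP", "LONG"]
actual = ["LONG", "LONG", "LONG", "NOOP", "SHORT"]
returns_pct = np.array([2.0, -1.0, 3.0, 0.0, -2.0])

# Hit ratio: share of trades whose predicted direction matched the outcome.
hit_ratio = float(np.mean([p == a for p, a in zip(predicted, actual)]))

# Win rate: share of trades with a strictly positive realized return.
win_rate = float(np.mean(returns_pct > 0))

# Profit factor: gross profit divided by gross loss (infinite with no losses).
gross_profit = returns_pct[returns_pct > 0].sum()
gross_loss = -returns_pct[returns_pct < 0].sum()
profit_factor = float(gross_profit / gross_loss) if gross_loss > 0 else float("inf")

# Per-trade Sharpe ratio: mean over sample standard deviation, not annualized
# and with a zero risk-free rate (both are simplifying assumptions).
std = returns_pct.std(ddof=1)
sharpe_per_trade = float(returns_pct.mean() / std) if std > 0 else 0.0
```

With this data the hit ratio is 0.6 while the win rate is 0.4, which illustrates that the two measure different things: a correct direction call does not guarantee a profitable trade, and vice versa.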

In [None]:
if len(all_traces) > 0:
    # Compute metrics
    metrics = analyzer.compute_metrics(all_traces)

    print("=== Classification Metrics ===")
    print(f"\nHit Ratio (Accuracy): {metrics.hit_ratio:.2%}")
    print(f"Total Trades: {metrics.total_trades}")

    print("\nPrecision by Direction:")
    for direction, score in metrics.precision.items():
        print(f"  {direction}: {score:.2%}")

    print("\nRecall by Direction:")
    for direction, score in metrics.recall.items():
        print(f"  {direction}: {score:.2%}")

    print("\nF1 Score by Direction:")
    for direction, score in metrics.f1_score.items():
        print(f"  {direction}: {score:.2%}")

    print("\n=== Financial Metrics ===")
    print(f"Win Rate: {metrics.win_rate:.2%}")
    print(f"Average Return: {metrics.avg_return:.2f}%")
    print(f"Sharpe Ratio: {metrics.sharpe_ratio:.2f}")
    print(f"Max Drawdown: {metrics.max_drawdown:.2f}%")
    print(f"Profit Factor: {metrics.profit_factor:.2f}")
    print(f"Average Holding Period: {metrics.avg_holding_minutes:.1f} minutes")
else:
    print("No traces available for metric computation.")

## 5. Confusion Matrix Visualization

The confusion matrix shows how well predictions match actual outcomes:
- Rows: Actual direction (what actually happened)
- Columns: Predicted direction (what we predicted)
- Diagonal values: Correct predictions
- Off-diagonal values: Misclassifications
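
The layout described above can be reproduced by hand. The following sketch builds a small confusion matrix from hypothetical outcomes, using the same rows-as-actual, columns-as-predicted convention, and derives precision and recall from its column and row totals.

```python
import numpy as np

labels = ["LONG", "SHORT", "NOOP"]
index = {label: i for i, label in enumerate(labels)}

# Hypothetical outcomes, for illustration only.
actual = ["LONG", "LONG", "SHORT", "NOOP", "SHORT", "LONG"]
predicted = ["LONG", "SHORT", "SHORT", "NOOP", "LONG", "LONG"]

# Rows index the actual direction, columns the predicted direction.
matrix = np.zeros((len(labels), len(labels)), dtype=int)
for a, p in zip(actual, predicted):
    matrix[index[a], index[p]] += 1

# Diagonal entries are correct predictions.
correct = int(np.trace(matrix))
hit_ratio = correct / matrix.sum()

# Precision divides the diagonal by column totals, recall by row totals.
col_totals = matrix.sum(axis=0)
row_totals = matrix.sum(axis=1)
precision = {
    name: float(matrix[i, i] / col_totals[i]) if col_totals[i] else 0.0
    for name, i in index.items()
}
recall = {
    name: float(matrix[i, i] / row_totals[i]) if row_totals[i] else 0.0
    for name, i in index.items()
}
```

If the matrix were transposed (rows as predicted), the precision and recall formulas would swap, so it is worth confirming the orientation before reading the heatmap.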

In [None]:
if len(all_traces) > 0:
    # Create confusion matrix plot
    plt.figure(figsize=(10, 8))

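    # Label order must match the row/column order of metrics.confusion_matrix
    # (assumed here to be LONG, SHORT, NOOP)
    labels = ["LONG", "SHORT", "NOOP"]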

    sns.heatmap(
        metrics.confusion_matrix,
        annot=True,
        fmt='d',
        cmap='Blues',
        xticklabels=labels,
        yticklabels=labels,
        cbar_kws={'label': 'Count'},
        square=True,
        linewidths=0.5
    )

    plt.title('Prediction Confusion Matrix', fontsize=16, fontweight='bold', pad=20)
    plt.xlabel('Predicted Direction', fontsize=12, labelpad=10)
    plt.ylabel('Actual Direction', fontsize=12, labelpad=10)
    plt.tight_layout()

    # Save plot
    output_path = Path(output_dir) / f"confusion_matrix_{batch_id}.png"
    plt.savefig(output_path, dpi=300, bbox_inches='tight')
    print(f"Saved confusion matrix to: {output_path}")

    plt.show()
else:
    print("No data available for confusion matrix.")

## 6. Return Distribution Analysis

Analyze the distribution of realized returns to understand:
- Overall profitability
- Return symmetry (skewness)
- Outliers and tail risk
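
For a numeric view of the same three questions, the sketch below computes skewness, flags outliers with Tukey's 1.5 × IQR rule, and reads off the 5th-percentile return as a simple tail-risk figure. The return series is hypothetical, and the IQR rule and the 5% level are illustrative choices rather than anything the analyzer is known to use.

```python
import pandas as pd

# Hypothetical per-trade returns in percent, with one large loss.
returns = pd.Series([0.5, 0.8, -0.3, 1.1, 0.2, -0.6, 0.9, -7.5, 0.4, 0.7])

# Negative skewness means the left (loss) tail is heavier than the right.
skewness = returns.skew()

# Tukey's rule: flag points beyond 1.5 * IQR from the quartiles.
q1, q3 = returns.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = returns[(returns < lower) | (returns > upper)]

# Tail-risk proxy: the 5th-percentile return (a historical value-at-risk level).
var_5 = returns.quantile(0.05)
```

Here the single -7.5% trade is the only flagged outlier, and it alone is enough to make the skewness negative.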

In [None]:
if len(all_traces) > 0:
    returns = [t.realized_return_pct for t in all_traces]

    # Create return distribution plot
    plt.figure(figsize=(14, 6))

    # Histogram
    plt.subplot(1, 2, 1)
    plt.hist(returns, bins=50, edgecolor='black', alpha=0.7, color='steelblue')
    plt.axvline(x=0, color='red', linestyle='--', linewidth=2, label='Break-even')
    plt.axvline(x=np.mean(returns), color='green', linestyle='--', linewidth=2,
                label=f'Mean: {np.mean(returns):.2f}%')
    plt.title('Return Distribution', fontsize=14, fontweight='bold')
    plt.xlabel('Return (%)', fontsize=11)
    plt.ylabel('Frequency', fontsize=11)
    plt.legend(fontsize=10)
    plt.grid(True, alpha=0.3)

    # Box plot
    plt.subplot(1, 2, 2)
    plt.boxplot(returns, vert=True, patch_artist=True,
                boxprops=dict(facecolor='lightblue', alpha=0.7),
                medianprops=dict(color='red', linewidth=2))
    plt.axhline(y=0, color='gray', linestyle='--', linewidth=1, alpha=0.5)
    plt.title('Return Box Plot', fontsize=14, fontweight='bold')
    plt.ylabel('Return (%)', fontsize=11)
    plt.grid(True, alpha=0.3, axis='y')

    plt.tight_layout()

    # Save plot
    output_path = Path(output_dir) / f"return_distribution_{batch_id}.png"
    plt.savefig(output_path, dpi=300, bbox_inches='tight')
    print(f"Saved return distribution to: {output_path}")

    plt.show()

    # Print statistics
    print("\n=== Return Statistics ===")
    print(f"Mean: {np.mean(returns):.2f}%")
    print(f"Median: {np.median(returns):.2f}%")
    print(f"Std Dev: {np.std(returns, ddof=1):.2f}%")  # sample std, consistent with describe()
    print(f"Min: {np.min(returns):.2f}%")
    print(f"Max: {np.max(returns):.2f}%")
    print(f"Skewness: {pd.Series(returns).skew():.2f}")
    print(f"Excess Kurtosis: {pd.Series(returns).kurtosis():.2f}")  # pandas uses Fisher's definition (normal = 0)
else:
    print("No data available for return distribution.")

## 7. Precision-Recall Analysis

Compare precision and recall across different prediction directions to identify strengths and weaknesses.
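
When reading the chart, it helps to remember that F1 is the harmonic mean of precision and recall, so it drops sharply when the two diverge. The sketch below works through two hypothetical cases; the helper name `f1_score` is local to this example.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical scores, for illustration only.
balanced = f1_score(0.60, 0.60)  # 0.60
lopsided = f1_score(0.90, 0.30)  # 0.45
```

Both cases have the same arithmetic mean (0.60), but the lopsided one scores 0.45, so a direction with high precision and low recall will show a visibly shorter F1 bar.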

In [None]:
if len(all_traces) > 0:
    # Create precision-recall bar chart
    directions = ['LONG', 'SHORT', 'NOOP']
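    # Fall back to 0.0 if a direction is absent from the metrics dictionaries
    precision_vals = [metrics.precision.get(d, 0.0) for d in directions]
    recall_vals = [metrics.recall.get(d, 0.0) for d in directions]
    f1_vals = [metrics.f1_score.get(d, 0.0) for d in directions]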

    x = np.arange(len(directions))
    width = 0.25

    plt.figure(figsize=(12, 6))

    plt.bar(x - width, precision_vals, width, label='Precision', color='skyblue', edgecolor='black')
    plt.bar(x, recall_vals, width, label='Recall', color='lightcoral', edgecolor='black')
    plt.bar(x + width, f1_vals, width, label='F1 Score', color='lightgreen', edgecolor='black')

    plt.xlabel('Direction', fontsize=12)
    plt.ylabel('Score', fontsize=12)
    plt.title('Precision, Recall, and F1 Score by Direction', fontsize=14, fontweight='bold')
    plt.xticks(x, directions)
    plt.ylim(0, 1.1)
    plt.legend(fontsize=10)
    plt.grid(True, alpha=0.3, axis='y')
    plt.tight_layout()

    # Save plot
    output_path = Path(output_dir) / f"precision_recall_{batch_id}.png"
    plt.savefig(output_path, dpi=300, bbox_inches='tight')
    print(f"Saved precision-recall chart to: {output_path}")

    plt.show()
else:
    print("No data available for precision-recall analysis.")

## 8. Cumulative Return Analysis

Visualize the cumulative returns over time to understand strategy performance trajectory.
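
The sketch below contrasts two ways of building such a curve from hypothetical per-trade returns: a running sum (additive) and a compounded curve. It also derives drawdown as the distance below the running peak. Which convention suits a given strategy depends on position sizing, so treat the choice as an assumption to check.

```python
import numpy as np

# Hypothetical per-trade returns in percent, already sorted by time.
returns_pct = np.array([2.0, -1.0, 3.0, -4.0, 1.0])

# Additive curve: a plain running sum of percentage returns.
additive = np.cumsum(returns_pct)

# Compounded curve: each return applies to the equity left by the previous one.
compounded = (np.cumprod(1 + returns_pct / 100) - 1) * 100

# Drawdown: distance of the curve below its running peak (zero or negative).
running_max = np.maximum.accumulate(additive)
drawdown = additive - running_max
max_drawdown = drawdown.min()
```

For this series the additive curve ends at 1.0% while the compounded curve ends near 0.85%, and the deepest drawdown on the additive curve is -4.0 percentage points.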

In [None]:
if len(all_traces) > 0:
    # Sort traces by timestamp
    sorted_traces = sorted(all_traces, key=lambda t: t.timestamp)

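    # Compute cumulative returns as a running sum of per-trade percentage returns
    # (additive; this does not compound gains and losses)
    timestamps = [t.timestamp for t in sorted_traces]
    returns = [t.realized_return_pct for t in sorted_traces]
    cumulative_returns = np.cumsum(returns)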

    # Plot cumulative returns
    plt.figure(figsize=(14, 6))
    plt.plot(timestamps, cumulative_returns, linewidth=2, color='steelblue', label='Cumulative Return')
    plt.axhline(y=0, color='red', linestyle='--', linewidth=1, alpha=0.5, label='Break-even')

    # Mark drawdown regions
    running_max = np.maximum.accumulate(cumulative_returns)
    drawdown = cumulative_returns - running_max
    plt.fill_between(timestamps, cumulative_returns, running_max,
                     where=(drawdown < 0), color='red', alpha=0.2, label='Drawdown')

    plt.title('Cumulative Returns Over Time', fontsize=14, fontweight='bold')
    plt.xlabel('Date', fontsize=11)
    plt.ylabel('Cumulative Return (%)', fontsize=11)
    plt.legend(fontsize=10)
    plt.grid(True, alpha=0.3)
    plt.xticks(rotation=45)
    plt.tight_layout()

    # Save plot
    output_path = Path(output_dir) / f"cumulative_returns_{batch_id}.png"
    plt.savefig(output_path, dpi=300, bbox_inches='tight')
    print(f"Saved cumulative returns chart to: {output_path}")

    plt.show()
else:
    print("No data available for cumulative return analysis.")

## 9. Export Metrics Report

Save computed metrics to a JSON file for future reference and comparison.
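
The format written by `export_report` is not documented in this notebook. As a generic illustration of the idea, the sketch below serializes a hypothetical metrics container to JSON and reads it back. `MetricsSketch` and `to_report_json` are invented names, not part of SenseQuant.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class MetricsSketch:
    """Hypothetical container; the real metrics object may differ."""

    hit_ratio: float
    win_rate: float
    precision: dict[str, float] = field(default_factory=dict)
    confusion_matrix: list[list[int]] = field(default_factory=list)


def to_report_json(metrics: MetricsSketch, batch_id: str) -> str:
    """Serialize metrics plus the batch ID into an indented JSON string."""
    payload = {"batch_id": batch_id, "metrics": asdict(metrics)}
    return json.dumps(payload, indent=2, sort_keys=True)


report = to_report_json(
    MetricsSketch(
        hit_ratio=0.6,
        win_rate=0.4,
        precision={"LONG": 0.67, "SHORT": 0.5, "NOOP": 1.0},
        confusion_matrix=[[2, 1, 0], [1, 1, 0], [0, 0, 1]],
    ),
    batch_id="20241012_143000",
)

# Round-trip check: the JSON text parses back into plain dicts and lists.
restored = json.loads(report)
```

Storing the batch ID alongside the metrics makes it easier to compare reports from different runs later.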

In [None]:
if len(all_traces) > 0:
    # Export report to JSON
    report_path = Path(output_dir) / f"accuracy_report_{batch_id}.json"
    analyzer.export_report(metrics, report_path)

    print(f"Metrics report exported to: {report_path}")

    # Display report contents
    with open(report_path) as f:
        report_data = json.load(f)

    print("\n=== Report Contents (preview) ===")
    print(json.dumps(report_data, indent=2, default=str)[:1000] + "...")
else:
    print("No data available for report export.")

## 10. Summary and Recommendations

Generate actionable insights based on the analysis.

In [None]:
if len(all_traces) > 0:
    print("=== Analysis Summary ===")
    print(f"\nBatch ID: {batch_id}")
    print(f"Total Trades Analyzed: {metrics.total_trades}")
    print(f"Date Range: {traces_df['timestamp'].min()} to {traces_df['timestamp'].max()}")

    print("\n=== Key Performance Indicators ===")
    print(f"Hit Ratio: {metrics.hit_ratio:.2%}")
    print(f"Win Rate: {metrics.win_rate:.2%}")
    print(f"Sharpe Ratio: {metrics.sharpe_ratio:.2f}")
    print(f"Profit Factor: {metrics.profit_factor:.2f}")

    print("\n=== Recommendations ===")

    # Hit ratio recommendations
    if metrics.hit_ratio < 0.5:
        print("- Low hit ratio (<50%) suggests prediction accuracy needs improvement")
        print("  Consider feature engineering or model tuning")
    elif metrics.hit_ratio > 0.7:
        print("- Strong hit ratio (>70%) indicates good prediction accuracy")

    # Win rate recommendations
    if metrics.win_rate < 0.5:
        print("- Win rate below 50%: profitability depends on winners outsizing losers")
        print("  Check the profit factor and review stop-loss and take-profit settings")

    # Sharpe ratio recommendations
    if metrics.sharpe_ratio < 1.0:
        print("- Sharpe ratio <1.0 indicates poor risk-adjusted returns")
        print("  Focus on reducing volatility or improving returns")
    elif metrics.sharpe_ratio > 2.0:
        print("- Excellent Sharpe ratio (>2.0) shows strong risk-adjusted performance")

    # Direction-specific recommendations
    print("\n=== Direction-Specific Insights ===")
    for direction in ['LONG', 'SHORT', 'NOOP']:
        if metrics.precision[direction] < 0.4:
            print(f"- {direction} predictions have low precision ({metrics.precision[direction]:.2%})")
            print(f"  Many {direction} predictions turn out to be incorrect")
        if metrics.recall[direction] < 0.4:
            print(f"- {direction} predictions have low recall ({metrics.recall[direction]:.2%})")
            print(f"  Missing many actual {direction} opportunities")

    print("\n=== Next Steps ===")
    print("1. Compare this batch with historical batches to identify trends")
    print("2. Analyze misclassifications in confusion matrix for pattern insights")
    print("3. Investigate outlier returns (extreme gains/losses)")
    print("4. Consider parameter optimization based on weak areas")
    print("5. Run additional backtests with adjusted hyperparameters")
else:
    print("No data available for summary generation.")
    print("\nTo generate a report:")
    print("1. Run a backtest with enable_telemetry=True")
    print("2. Update the batch_id parameter in this notebook")
    print("3. Re-run all cells")

## 11. Release Audit Summary (US-022)

This section consolidates telemetry data, optimization results, and student model metrics
for release readiness assessment. It provides a unified view of baseline vs. optimized performance,
student model validation status, and monitoring KPIs.
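
The cells below read precomputed deltas from the audit bundle. How the audit script derives them is not shown here; the sketch below assumes the simplest convention, optimized minus baseline, and uses hypothetical values with the same metric keys.

```python
# Hypothetical values; the key names mirror those read from metrics.json in this section.
baseline = {
    "sharpe_ratio": 1.10,
    "total_return_pct": 8.0,
    "win_rate_pct": 52.0,
    "hit_ratio_pct": 55.0,
}
optimized = {
    "sharpe_ratio": 1.45,
    "total_return_pct": 11.5,
    "win_rate_pct": 54.0,
    "hit_ratio_pct": 58.5,
}

# Assumed convention: delta = optimized - baseline, so positive means improvement.
deltas = {key: round(optimized[key] - baseline[key], 4) for key in baseline}

improved = [key for key, delta in deltas.items() if delta > 0]
regressed = [key for key, delta in deltas.items() if delta < 0]
```

Keep in mind that the Sharpe ratio delta is in ratio units while the other three are in percentage points, so the deltas are not directly comparable to each other.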

In [None]:
# Load latest release audit bundle (if available)
# Relies on glob, json, and Path imported in section 1

# Find latest audit bundle
audit_dirs = sorted(glob.glob("../release/audit_*"), reverse=True)

if audit_dirs:
    latest_audit_dir = Path(audit_dirs[0])
    metrics_path = latest_audit_dir / "metrics.json"
    
    if metrics_path.exists():
        with open(metrics_path) as f:
            audit_metrics = json.load(f)
        
        print(f"\n{'='*70}")
        print("RELEASE AUDIT SUMMARY")
        print(f"{'='*70}")
        print(f"\nAudit ID: {audit_metrics['audit_id']}")
        print(f"Audit Date: {audit_metrics['audit_timestamp']}")
        print(f"Deployment Ready: {'YES' if audit_metrics['deployment_ready'] else 'NO'}")
        
        if audit_metrics.get('risk_flags'):
            print(f"\nRisk Flags ({len(audit_metrics['risk_flags'])}):")
            for flag in audit_metrics['risk_flags']:
                print(f"   - {flag}")
    else:
        print("Latest audit bundle found but metrics.json is missing")
        audit_metrics = None
else:
    print("No release audit bundles found. Run: python scripts/release_audit.py")
    audit_metrics = None

In [None]:
# Baseline vs Optimized Comparison
if audit_metrics and 'baseline' in audit_metrics and 'optimized' in audit_metrics:
    baseline = audit_metrics['baseline']
    optimized = audit_metrics['optimized']
    deltas = audit_metrics.get('deltas', {})
    
    print("\n" + "="*70)
    print("BASELINE vs OPTIMIZED CONFIGURATION")
    print("="*70)
    
    comparison_data = [
        ['Sharpe Ratio', baseline['sharpe_ratio'], optimized['sharpe_ratio'], 
         deltas.get('sharpe_ratio_delta', 0.0)],
        ['Total Return (%)', baseline['total_return_pct'], optimized['total_return_pct'], 
         deltas.get('total_return_delta_pct', 0.0)],
        ['Win Rate (%)', baseline['win_rate_pct'], optimized['win_rate_pct'], 
         deltas.get('win_rate_delta_pct', 0.0)],
        ['Hit Ratio (%)', baseline['hit_ratio_pct'], optimized['hit_ratio_pct'], 
         deltas.get('hit_ratio_delta_pct', 0.0)],
    ]
    
    comp_df = pd.DataFrame(comparison_data, 
                          columns=['Metric', 'Baseline', 'Optimized', 'Delta'])
    
    print("\n" + comp_df.to_string(index=False))
    
    # Visualization
    fig, ax = plt.subplots(figsize=(14, 6))
    
    x = np.arange(len(comparison_data))
    width = 0.35
    
    baseline_vals = [row[1] for row in comparison_data]
    optimized_vals = [row[2] for row in comparison_data]
    
    bars1 = ax.bar(x - width/2, baseline_vals, width, label='Baseline',
                   color='#E74C3C', alpha=0.8, edgecolor='black')
    bars2 = ax.bar(x + width/2, optimized_vals, width, label='Optimized',
                   color='#27AE60', alpha=0.8, edgecolor='black')
    
    ax.set_xlabel('Metrics', fontsize=12, fontweight='bold')
    ax.set_ylabel('Value', fontsize=12, fontweight='bold')
    ax.set_title('Release Audit: Baseline vs Optimized Performance',
                fontsize=14, fontweight='bold')
    ax.set_xticks(x)
    ax.set_xticklabels([row[0] for row in comparison_data], rotation=15, ha='right')
    ax.legend()
    ax.grid(True, alpha=0.3, axis='y')
    
    plt.tight_layout()
    
    output_path = Path(output_dir) / "release_audit_comparison.png"
    plt.savefig(output_path, dpi=300, bbox_inches='tight')
    print(f"\nSaved comparison chart to: {output_path}")
    
    plt.show()
else:
    print("Baseline/optimized metrics not available in audit bundle")

In [None]:
# Student Model Status
if audit_metrics and 'student_model' in audit_metrics:
    student = audit_metrics['student_model']
    
    print("\n" + "="*70)
    print("STUDENT MODEL VALIDATION STATUS")
    print("="*70)
    
    print(f"\nDeployed: {'YES' if student.get('deployed') else 'NO'}")
    if student.get('deployed'):
        print(f"Version: {student.get('version', 'unknown')}")
        print(f"Validation Precision: {student.get('validation_precision', 0.0):.2%}")
        print(f"Validation Recall: {student.get('validation_recall', 0.0):.2%}")
        print(f"Test Accuracy: {student.get('test_accuracy', 0.0):.2%}")
        print(f"Feature Count: {student.get('feature_count', 0)}")
        print(f"Training Samples: {student.get('training_samples', 0):,}")
else:
    print("\nNo student model metrics in audit bundle")

In [None]:
# Monitoring KPIs
if audit_metrics and 'monitoring' in audit_metrics:
    monitoring = audit_metrics['monitoring']
    
    print("\n" + "="*70)
    print("MONITORING KPIs (Rolling Windows)")
    print("="*70)
    
    if 'intraday_30day' in monitoring:
        intra = monitoring['intraday_30day']
        print("\nIntraday Strategy (30-day window):")
        print(f"   Hit Ratio: {intra.get('hit_ratio', 0.0):.2%}")
        print(f"   Sharpe Ratio: {intra.get('sharpe_ratio', 0.0):.2f}")
        print(f"   Alert Count: {intra.get('alert_count', 0)}")
        print(f"   Degradation: {'YES' if intra.get('degradation_detected') else 'NO'}")
    
    if 'swing_90day' in monitoring:
        swing = monitoring['swing_90day']
        print("\nSwing Strategy (90-day window):")
        print(f"   Precision (LONG): {swing.get('precision_long', 0.0):.2%}")
        print(f"   Recall (LONG): {swing.get('recall_long', 0.0):.2%}")
        print(f"   Max Drawdown: {swing.get('max_drawdown_pct', 0.0):.1f}%")
        print(f"   Alert Count: {swing.get('alert_count', 0)}")
        print(f"   Degradation: {'YES' if swing.get('degradation_detected') else 'NO'}")
else:
    print("\nNo monitoring metrics in audit bundle")

### Release Audit Recommendations

Based on the consolidated audit metrics:

1. **If Deployment Ready**: Proceed with gradual rollout as per deployment plan
2. **If Risk Flags Present**: Address each flag before production deployment
3. **Monitor KPIs**: Continue tracking rolling window metrics for early degradation detection
4. **Schedule Next Audit**: Plan monthly audits to maintain release readiness

For full audit details, review:
- `release/audit_<timestamp>/summary.md` - Executive summary
- `release/audit_<timestamp>/metrics.json` - Complete metrics
- `release/audit_<timestamp>/plots/` - All visualizations
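
The loading cell in this section picks the latest bundle by sorting directory names in reverse. That works only if the timestamp part sorts chronologically as text. The sketch below shows the idea with hypothetical names in a zero-padded `YYYYMMDD_HHMMSS` format, which is an assumption about how the audit script names its output.

```python
# Hypothetical directory names; the actual timestamp format is an assumption.
audit_dirs = [
    "release/audit_20241001_090000",
    "release/audit_20241115_173000",
    "release/audit_20241102_120000",
]

# Zero-padded YYYYMMDD_HHMMSS names sort chronologically as plain strings,
# so the first entry of a reverse sort is the most recent bundle.
latest = sorted(audit_dirs, reverse=True)[0]
```

If the names were not zero-padded (for example `audit_2024-1-5`), a plain string sort could pick the wrong bundle, and parsing the timestamps would be the safer route.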

## Conclusion

This notebook provides a comprehensive analysis of prediction accuracy and financial performance. Use these insights to:

- Identify model weaknesses and areas for improvement
- Compare performance across different time periods or parameter settings
- Make data-driven decisions about strategy modifications
- Track improvements over time as you iterate on the trading system

For questions or issues, refer to the SenseQuant documentation or the accuracy analyzer source code.