# Ball Knower v1.2 Model Evaluation

Reproducible evaluation notebook for v1.2 model performance.

## Purpose
- Load backtest results from standardized paths
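- Summarize the v1.2 metrics available in the backtest output (MAE, RMSE, and mean edge vs Vegas)
- Visualize per-season performance trends
- ATS, CLV, EV, and calibration curves are not yet implemented (see Next Steps)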

## Important
- This notebook is for visualization and reporting only
- No model logic or feature engineering
- Reads precomputed metrics from the backtest output; nothing is recomputed here

In [None]:
# Standard imports
import sys
from pathlib import Path
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display

# Add project root to path (assumes the notebook runs from a subdirectory of the repo, e.g. notebooks/)
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root))

# Import Ball Knower modules
from ball_knower import config
from ball_knower.utils import paths

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
%matplotlib inline

print("✓ Imports loaded successfully")

## 1. Load Backtest Results

In [None]:
# Define backtest parameters
MODEL_VERSION = "v1.2"
START_SEASON = 2019
END_SEASON = 2024

# Load backtest results using standardized paths
backtest_path = paths.get_backtest_results_path(
    MODEL_VERSION,
    START_SEASON,
    END_SEASON
)

if backtest_path.exists():
    backtest_df = pd.read_csv(backtest_path)
    print(f"✓ Loaded backtest results: {backtest_path}")
    print(f"  Seasons: {backtest_df['season'].nunique()} ({len(backtest_df)} rows)")
    display(backtest_df.head())
else:
    print(f"⚠ Backtest file not found: {backtest_path}")
    print(f"  Run: python src/bk_build.py backtest --model {MODEL_VERSION} --start-season {START_SEASON} --end-season {END_SEASON}")
    backtest_df = None
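The cells below read five columns from the backtest CSV: `season`, `n_games`, `mae_vs_vegas`, `rmse_vs_vegas`, and `mean_edge`. A small guard like this one fails fast if the file layout changes. `check_backtest_columns` is a hypothetical helper sketched here, not part of `ball_knower`:

```python
import pandas as pd

# Columns read by the cells in this notebook
REQUIRED_COLUMNS = ("season", "n_games", "mae_vs_vegas", "rmse_vs_vegas", "mean_edge")


def check_backtest_columns(df: pd.DataFrame, required=REQUIRED_COLUMNS) -> list:
    """Hypothetical helper: return the required columns missing from df (empty if none)."""
    return [col for col in required if col not in df.columns]


# Illustrative frame with made-up values, missing two columns
demo = pd.DataFrame({"season": [2019], "n_games": [256], "mae_vs_vegas": [1.5]})
print(check_backtest_columns(demo))  # ['rmse_vs_vegas', 'mean_edge']
```

In the notebook it could run right after `pd.read_csv`, raising an error when the returned list is non-empty.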

## 2. Compute Summary Metrics

In [None]:
if backtest_df is not None:
    # Overall metrics
    print("\n" + "="*60)
    print("V1.2 MODEL - OVERALL PERFORMANCE")
    print("="*60)
    
    total_games = backtest_df['n_games'].sum()
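    # Simple (unweighted) means of the per-season values: each season counts equally,
    # and a mean of per-season RMSEs is not the RMSE over all games
    avg_mae = backtest_df['mae_vs_vegas'].mean()
    avg_rmse = backtest_df['rmse_vs_vegas'].mean()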
    
    print(f"\nTotal games analyzed: {total_games:,}")
    print(f"Average MAE vs Vegas: {avg_mae:.2f} points")
    print(f"Average RMSE vs Vegas: {avg_rmse:.2f} points")
    print(f"Average edge: {backtest_df['mean_edge'].mean():.3f} points")
    
    # Per-season breakdown
    print("\n" + "-"*60)
    print("PER-SEASON BREAKDOWN")
    print("-"*60)
    display(backtest_df[['season', 'n_games', 'mae_vs_vegas', 'rmse_vs_vegas', 'mean_edge']])
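The averages above weight every season equally. Since a season's MAE is its total absolute error divided by `n_games`, and its RMSE² is its total squared error divided by `n_games`, game-weighted figures can be recovered exactly from the season rows: pooled MAE = Σ nᵢ·MAEᵢ / Σ nᵢ and pooled RMSE = √(Σ nᵢ·RMSEᵢ² / Σ nᵢ). A minimal sketch, assuming each row's metrics were computed over exactly `n_games` games; `pooled_metrics` is hypothetical and the demo numbers are made up, not backtest results:

```python
import numpy as np
import pandas as pd


def pooled_metrics(df: pd.DataFrame) -> dict:
    """Hypothetical helper: game-weighted metrics from per-season rows.

    Assumes each row's mae_vs_vegas, rmse_vs_vegas, and mean_edge were
    computed over exactly n_games games.
    """
    n = df["n_games"].to_numpy(dtype=float)
    mae = df["mae_vs_vegas"].to_numpy(dtype=float)
    rmse = df["rmse_vs_vegas"].to_numpy(dtype=float)
    edge = df["mean_edge"].to_numpy(dtype=float)
    total = n.sum()
    return {
        "games": int(total),
        # n * MAE is a season's total absolute error, so pool with an n-weighted mean
        "mae": float((n * mae).sum() / total),
        # n * RMSE^2 is a season's total squared error; pool first, then take the root
        "rmse": float(np.sqrt((n * rmse**2).sum() / total)),
        # Mean edge pools the same way as MAE
        "mean_edge": float((n * edge).sum() / total),
    }


# Illustrative seasons of unequal size (made-up numbers)
demo = pd.DataFrame({
    "season": [2019, 2020],
    "n_games": [100, 300],
    "mae_vs_vegas": [2.0, 1.0],
    "rmse_vs_vegas": [3.0, 1.0],
    "mean_edge": [0.4, -0.2],
})
m = pooled_metrics(demo)
print(f"pooled MAE {m['mae']:.2f}, unweighted {demo['mae_vs_vegas'].mean():.2f}")
# pooled MAE 1.25, unweighted 1.50
```

The two agree only when every season has the same number of games.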

## 3. Visualizations

### 3.1 MAE vs Vegas by Season

In [None]:
if backtest_df is not None:
    fig, ax = plt.subplots(figsize=(10, 6))
    
    ax.plot(backtest_df['season'], backtest_df['mae_vs_vegas'], 
            marker='o', linewidth=2, markersize=8)
    ax.axhline(y=avg_mae, color='r', linestyle='--', label=f'Average MAE: {avg_mae:.2f}')
    
    ax.set_xlabel('Season', fontsize=12)
    ax.set_ylabel('MAE vs Vegas (points)', fontsize=12)
    ax.set_title('Ball Knower v1.2 - MAE vs Vegas by Season', fontsize=14, fontweight='bold')
    ax.grid(True, alpha=0.3)
    ax.legend()
    
    plt.tight_layout()
    plt.show()
    
    print(f"Average MAE across all seasons: {avg_mae:.2f} points")

### 3.2 Games Analyzed by Season

In [None]:
if backtest_df is not None:
    fig, ax = plt.subplots(figsize=(10, 6))
    
    ax.bar(backtest_df['season'], backtest_df['n_games'], alpha=0.7, edgecolor='black')
    
    ax.set_xlabel('Season', fontsize=12)
    ax.set_ylabel('Number of Games', fontsize=12)
    ax.set_title('Ball Knower v1.2 - Games Analyzed by Season', fontsize=14, fontweight='bold')
    ax.grid(True, alpha=0.3, axis='y')
    
    plt.tight_layout()
    plt.show()

### 3.3 Mean Edge by Season

In [None]:
if backtest_df is not None:
    fig, ax = plt.subplots(figsize=(10, 6))
    
    # One mean-edge value per season: a bar per season is more readable than
    # a 20-bin histogram of only a handful of points
    ax.bar(backtest_df['season'], backtest_df['mean_edge'], alpha=0.7, edgecolor='black')
    ax.axhline(y=0, color='r', linestyle='--', linewidth=2, label='Zero edge')
    
    ax.set_xlabel('Season', fontsize=12)
    ax.set_ylabel('Mean Edge (points)', fontsize=12)
    ax.set_title('Ball Knower v1.2 - Mean Edge by Season', fontsize=14, fontweight='bold')
    ax.grid(True, alpha=0.3, axis='y')
    ax.legend()
    
    plt.tight_layout()
    plt.show()
    
    print(f"\nEdge Statistics:")
    print(f"  Mean: {backtest_df['mean_edge'].mean():.3f} points")
    print(f"  Std: {backtest_df['mean_edge'].std():.3f} points")
    print(f"  Min: {backtest_df['mean_edge'].min():.3f} points")
    print(f"  Max: {backtest_df['mean_edge'].max():.3f} points")

## 4. Model Performance Summary

In [None]:
if backtest_df is not None:
    print("\n" + "="*60)
    print("BALL KNOWER V1.2 - EVALUATION SUMMARY")
    print("="*60)
    print(f"\nModel Version: {MODEL_VERSION}")
    print(f"Evaluation Period: {START_SEASON}-{END_SEASON}")
    print(f"Total Games: {total_games:,}")
    print(f"\nKey Metrics:")
    print(f"  MAE vs Vegas: {avg_mae:.2f} points")
    print(f"  RMSE vs Vegas: {avg_rmse:.2f} points")
    print(f"  Mean Edge: {backtest_df['mean_edge'].mean():.3f} points")
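    print("\nInterpretation:")
    print("  - MAE vs Vegas is the average absolute gap between BK's line and the Vegas line")
    print("  - Lower MAE means closer agreement with the market, not accuracy against final scores")
    print("  - Edge is the signed difference between BK and Vegas")
    print("  - Near-zero mean edge means no systematic lean vs Vegas; large edges of opposite sign can still cancel")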
    print("\n" + "="*60)
else:
    print("⚠ No backtest data available for evaluation")
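Averaging signed edges lets positive and negative disagreements cancel, so a near-zero mean edge does not imply that BK and Vegas agree closely. A toy example with made-up per-game edges (the backtest file used here stores only season aggregates, and the BK-minus-Vegas sign convention is an assumption):

```python
import numpy as np

# Hypothetical per-game edges (assumed: BK line minus Vegas line, in points)
edges = np.array([3.0, -3.0, 2.0, -2.0])

mean_edge = edges.mean()              # signed: positive and negative gaps cancel
mean_abs_edge = np.abs(edges).mean()  # unsigned: typical size of a disagreement

print(f"mean edge: {mean_edge:.2f}, mean |edge|: {mean_abs_edge:.2f}")
# mean edge: 0.00, mean |edge|: 2.50
```

Reporting both numbers, once game-level data is available, separates "no systematic bias" from "small disagreements".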

## 5. Next Steps

To improve this evaluation:
1. Add game-level predictions (not just season aggregates)
2. Compute ATS (against the spread) win rate
3. Calculate CLV (closing line value)
4. Add expected value (EV) calculations
5. Create calibration curves for probability predictions
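Items 2 to 4 need game-level data. A minimal sketch of the arithmetic, assuming spreads quoted for the home team (negative = home favored), a known bet side per game, and American odds for pricing. None of these functions exist in `ball_knower`; all names and conventions here are hypothetical:

```python
import numpy as np


def american_to_decimal(odds: float) -> float:
    """Convert American odds (e.g. -110, +150) to decimal odds."""
    return 1 + (100 / abs(odds) if odds < 0 else odds / 100)


def breakeven_prob(odds: float) -> float:
    """Win probability at which a bet at these odds has zero expected value."""
    return 1 / american_to_decimal(odds)


def expected_value(p_win: float, odds: float) -> float:
    """Expected profit per 1 unit staked, ignoring pushes."""
    return p_win * (american_to_decimal(odds) - 1) - (1 - p_win)


def grade_ats(home_margin, home_spread, bet_home) -> np.ndarray:
    """Grade spread bets as +1 (win), -1 (loss), or 0 (push).

    home_margin: home score minus away score
    home_spread: spread quoted for the home team (negative = home favored)
    bet_home:    True for a bet on the home side, False for the away side
    """
    home_cover = np.sign(np.asarray(home_margin, float) + np.asarray(home_spread, float))
    return np.where(bet_home, home_cover, -home_cover).astype(int)


def ats_win_rate(results) -> float:
    """Share of decided bets won; pushes (0) are excluded."""
    results = np.asarray(results)
    decided = results[results != 0]
    return float((decided == 1).mean()) if decided.size else float("nan")


def clv_points(bet_spread, close_spread, bet_home) -> np.ndarray:
    """Closing line value in points (positive = the bet beat the closing line).

    Both spreads are quoted for the home team; a home bet at -3 that closes -4 gives +1.
    """
    diff = np.asarray(bet_spread, float) - np.asarray(close_spread, float)
    return np.where(bet_home, diff, -diff)


# Illustrative games (made-up)
results = grade_ats(home_margin=[7, -3, 3, 10],
                    home_spread=[-3.0, -3.0, -3.0, -7.0],
                    bet_home=[True, True, True, False])
print(results)                              # [ 1 -1  0 -1]
print(f"{ats_win_rate(results):.3f}")       # 0.333
print(f"{breakeven_prob(-110):.4f}")        # 0.5238
print(f"{expected_value(0.55, -110):.4f}")  # 0.0500
```

At standard -110 pricing, an ATS win rate has to exceed about 52.4% before EV turns positive, which is why ATS rate and EV are worth reporting side by side.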

For now, this notebook provides:
- Basic performance metrics
- Temporal trends
- Visual diagnostics
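For item 5 in the Next Steps list, a calibration curve bins predicted probabilities and compares each bin's mean prediction with the observed frequency. A minimal sketch, assuming game-level predicted probabilities in [0, 1] (for example, a hypothetical BK probability that the home team covers) and 0/1 outcomes; `calibration_table` is a hypothetical helper, and scikit-learn's `sklearn.calibration.calibration_curve` offers a ready-made alternative:

```python
import numpy as np


def calibration_table(p_pred, outcome, n_bins: int = 10):
    """Return (mean predicted prob, observed rate, count) for each non-empty bin.

    Assumes p_pred holds probabilities in [0, 1] and outcome holds 0/1 results.
    """
    p_pred = np.asarray(p_pred, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Digitizing against the interior edges gives bin indices 0..n_bins-1;
    # a prediction of exactly 1.0 lands in the top bin
    idx = np.digitize(p_pred, edges[1:-1])
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((p_pred[mask].mean(), outcome[mask].mean(), int(mask.sum())))
    return rows


# Illustrative predictions and outcomes (made-up)
rows = calibration_table([0.1, 0.15, 0.8, 0.85, 0.9], [0, 0, 1, 1, 0], n_bins=2)
for mean_pred, observed, count in rows:
    print(f"pred {mean_pred:.3f}  observed {observed:.3f}  n={count}")
# pred 0.125  observed 0.000  n=2
# pred 0.850  observed 0.667  n=3
```

Plotting observed rate against mean prediction alongside the y = x diagonal gives the calibration curve; points off the diagonal indicate miscalibration in that probability range.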