# Ball Knower - NFL Spread Prediction System

## Quick Start Demo

This notebook demonstrates the Ball Knower system on Week 11 data from the 2025 NFL season.

### System Overview:
- **v1.0**: Deterministic baseline using EPA + ratings + HFA (home-field advantage)
- **v1.1**: Enhanced with structural features (rest, form, QB)
- **v1.2**: ML correction layer on top of v1.1

### Key Features:
- ✅ **Leak-free**: All rolling features use only past data
- ✅ **Team normalization**: Handles all data source variations
- ✅ **Modular**: Clean separation of concerns
- ✅ **Tested**: All data loaders validated with actual files
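
To make the "leak-free" claim concrete, here is a minimal sketch of one common way to build a rolling feature from past games only. The column names, the three-game window, and the toy values are assumptions for illustration; the project's own implementation lives in `src/features.py` and may differ.

```python
import pandas as pd

# Toy per-game EPA values for two teams (illustrative numbers, not real data).
games = pd.DataFrame({
    "team": ["KC", "KC", "KC", "KC", "BUF", "BUF", "BUF"],
    "week": [1, 2, 3, 4, 1, 2, 3],
    "epa_off": [0.10, 0.20, 0.30, 0.40, -0.10, 0.00, 0.10],
})

# Rolling windows depend on row order, so sort chronologically within each team.
games = games.sort_values(["team", "week"])

# shift(1) removes the current game from its own window, so the feature for
# week N is computed only from weeks before N. The first game has no history
# and therefore gets NaN.
games["epa_off_roll3"] = (
    games.groupby("team")["epa_off"]
    .transform(lambda s: s.shift(1).rolling(window=3, min_periods=1).mean())
)

print(games)
```

Dropping the `shift(1)` would let each game's own EPA into its feature, which is exactly the leakage the rolling features are meant to avoid.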

## Section 1: Setup & Configuration

Import all modules and display configuration.

In [None]:
# Standard imports
import sys
from pathlib import Path
import pandas as pd
import numpy as np

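# Add the project root to sys.path so the `src` package is importable
# (assumes the notebook runs from a subdirectory of the project root)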
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root))

# Ball Knower modules
from src import config, team_mapping, data_loader, models

# Display pandas tables nicely
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

print(config.get_config_summary())

## Section 2: Load Current Week Data

Load all nfelo and Substack data for Week 11, 2025.

In [None]:
# Load all current week data
data = data_loader.load_all_current_week_data()

In [None]:
# Inspect nfelo power ratings
print("\nnfelo Power Ratings (Top 10 Teams):")
display(data['nfelo_power'][['team', 'nfelo', 'QB Adj', 'Value']].head(10))

In [None]:
# Inspect EPA tiers
print("\nnfelo EPA Tiers (Top 10 Offenses):")
display(data['nfelo_epa'][['team', 'epa_off', 'epa_def', 'epa_margin']]
        .sort_values('epa_off', ascending=False).head(10))

In [None]:
# Inspect Substack power ratings
print("\nSubstack Power Ratings (Top 10 Teams):")
display(data['substack_power'][['team', 'Off.', 'Def.', 'Ovr.']].head(10))

In [None]:
# Inspect weekly matchups
print("\nWeek 11 Matchups:")
display(data['substack_weekly'][['team_away', 'team_home', 'substack_spread_line', 'Win Prob.']])

## Section 3: Merge and Prepare Team Ratings

Combine all ratings into a single DataFrame.

In [None]:
# Merge all team ratings
team_ratings = data_loader.merge_current_week_ratings()

print(f"\nMerged Team Ratings: {len(team_ratings)} teams")
print(f"Features: {list(team_ratings.columns)}\n")

# Show top 10 teams by nfelo
display(team_ratings.sort_values('nfelo', ascending=False).head(10))
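
As a rough illustration of what combining sources involves, the sketch below normalizes team names to one abbreviation and then merges on that key. The alias table, the `normalize_team` helper, and the toy frames are hypothetical; the real logic is in `src/team_mapping.py` and `src/data_loader.py`.

```python
import pandas as pd

# Hypothetical alias table for illustration; the real mapping lives in
# src/team_mapping.py.
TEAM_ALIASES = {
    "KAN": "KC",
    "Kansas City Chiefs": "KC",
    "Buffalo Bills": "BUF",
}

def normalize_team(name: str) -> str:
    """Map a source-specific team label to a single abbreviation."""
    name = name.strip()
    return TEAM_ALIASES.get(name, name)

# Two toy sources that label the same teams differently (placeholder ratings).
source_a = pd.DataFrame({"team": ["KAN", "BUF"], "rating_a": [1.0, 2.0]})
source_b = pd.DataFrame({
    "team": ["Kansas City Chiefs", "Buffalo Bills"],
    "rating_b": [3.0, 4.0],
})

for frame in (source_a, source_b):
    frame["team"] = frame["team"].map(normalize_team)

# validate= raises if a team appears twice in either source; indicator= adds a
# '_merge' column that flags teams present in only one source.
merged = source_a.merge(
    source_b, on="team", how="outer", validate="one_to_one", indicator=True
)
unmatched = merged[merged["_merge"] != "both"]

print(merged)
print(f"Unmatched teams: {len(unmatched)}")
```

Checking `unmatched` after every merge is a cheap way to catch a team label the alias table does not yet cover.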

## Section 4: v1.0 - Deterministic Baseline Predictions

Simple deterministic model based on EPA differential + power ratings + HFA.
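
For intuition, a model of this kind can be sketched as a weighted sum of rating differences plus a home-field constant. The weights, the HFA value, and the `predict_spread` helper below are made up for illustration; the actual coefficients are defined in `models.DeterministicSpreadModel` and `config`, and are not shown in this notebook.

```python
# Hypothetical weights, for illustration only (not the project's coefficients).
WEIGHTS = {"nfelo": 0.04, "epa_margin": 20.0, "Ovr.": 0.5}
HFA = 2.0  # assumed home-field advantage, in points


def predict_spread(home: dict, away: dict, hfa: float = HFA) -> float:
    """Return the home-team spread (negative = home favored)."""
    margin = hfa  # start from the expected home margin due to venue alone
    for feature, weight in WEIGHTS.items():
        h, a = home.get(feature), away.get(feature)
        if h is None or a is None:
            continue  # skip a feature that is missing for either team
        margin += weight * (h - a)
    # Convert expected home margin to the spread convention used here.
    return -margin


home = {"nfelo": 1600.0, "epa_margin": 0.10, "Ovr.": 4.0}
away = {"nfelo": 1550.0, "epa_margin": 0.00, "Ovr.": 2.0}
spread = predict_spread(home, away)
print(f"Predicted home spread: {spread:.1f}")
```

Two evenly rated teams reduce to `-hfa`, which is a quick sanity check on the sign convention.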

In [None]:
# Create v1.0 model
model_v1 = models.DeterministicSpreadModel(hfa=config.HOME_FIELD_ADVANTAGE)

# Get weekly matchups
matchups = data['substack_weekly'].copy()

# Add team ratings for home and away teams.
# Suffix every ratings column up front ('team' -> 'team_home' / 'team_away') so the
# join keys line up and no merge suffix can collide with existing matchup columns.
matchups = matchups.merge(
    team_ratings.add_suffix('_home'),
    on='team_home',
    how='left'
).merge(
    team_ratings.add_suffix('_away'),
    on='team_away',
    how='left'
)

print(f"\nPrepared {len(matchups)} matchups for prediction")

In [None]:
# Make predictions with v1.0 model
predictions_v1 = []

for idx, game in matchups.iterrows():
    home_features = {
        'nfelo': game.get('nfelo_home'),
        'epa_margin': game.get('epa_margin_home'),
        'Ovr.': game.get('Ovr._home')
    }
    
    away_features = {
        'nfelo': game.get('nfelo_away'),
        'epa_margin': game.get('epa_margin_away'),
        'Ovr.': game.get('Ovr._away')
    }
    
    pred_spread = model_v1.predict(home_features, away_features)
    
    predictions_v1.append({
        'away_team': game['team_away'],
        'home_team': game['team_home'],
        'vegas_line': game['substack_spread_line'],
        'bk_v1_line': round(pred_spread, 1),
        'edge': round(pred_spread - game['substack_spread_line'], 1)
    })

predictions_df = pd.DataFrame(predictions_v1)

# Display predictions
print("\n" + "="*70)
print("BALL KNOWER v1.0 - WEEK 11 PREDICTIONS")
print("="*70 + "\n")

display(predictions_df.sort_values('edge', key=abs, ascending=False))

print("\n" + "="*70)
print("SPREAD CONVENTION: Negative = Home Favored, Positive = Home Underdog")
print("EDGE = BK Prediction - Vegas Line")
print("="*70)
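
The sign conventions are easy to mix up, so here is a small worked example. The `describe_edge` helper and the lines passed to it are hypothetical, chosen only to show how the sign of the edge maps to a side.

```python
def describe_edge(vegas_line: float, model_line: float) -> str:
    """Explain which side the model leans toward, given home-team spreads."""
    edge = model_line - vegas_line
    if edge < 0:
        side = "home"   # model rates the home team more highly than the market
    elif edge > 0:
        side = "away"   # model rates the away team more highly than the market
    else:
        side = "neither side"
    return f"edge={edge:+.1f} -> model leans {side}"


# Market has home -3.0, model has home -5.5: the model is higher on the home team.
print(describe_edge(vegas_line=-3.0, model_line=-5.5))  # edge=-2.5 -> model leans home
# Market has home -3.0, model has home -1.0: the model is higher on the away team.
print(describe_edge(vegas_line=-3.0, model_line=-1.0))  # edge=+2.0 -> model leans away
# Home underdog at +4.0, model says +2.5: still a lean toward the home team.
print(describe_edge(vegas_line=4.0, model_line=2.5))    # edge=-1.5 -> model leans home
```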

## Section 5: Identify Value Bets

Find games where the model's line differs from the Vegas line by at least `config.MIN_BET_EDGE` points (0.5 here), in either direction.

In [None]:
# Filter for games with meaningful edge
value_bets = predictions_df[predictions_df['edge'].abs() >= config.MIN_BET_EDGE].copy()

# Add recommendation
value_bets['recommendation'] = value_bets.apply(
    lambda row: f"Bet {row['home_team']} (value on home)" if row['edge'] < 0 
                else f"Bet {row['away_team']} (value on away)",
    axis=1
)

print("\n" + "="*70)
print(f"VALUE BETS (|Edge| >= {config.MIN_BET_EDGE} pts)")
print("="*70 + "\n")

if len(value_bets) > 0:
    display(value_bets[['away_team', 'home_team', 'vegas_line', 'bk_v1_line', 'edge', 'recommendation']]
            .sort_values('edge', key=abs, ascending=False))
else:
    print("No value bets found with current threshold.")

## Next Steps

### To build a complete backtest system:

1. **Load historical data** (2015-2024) using `nfl_data_py`
2. **Engineer features** using the `features` module (leak-free rolling EPA)
3. **Train v1.2 model** with ML correction layer
4. **Backtest** across multiple seasons with time-series validation
5. **Analyze ROI** by edge bin to optimize bet sizing
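
Steps 4 and 5 can be sketched as follows. Both helpers are hypothetical and are not part of `src/models.py`; the flat one-unit staking, the -110 price, the bin edges, and the toy bets are assumptions for illustration only.

```python
import pandas as pd


def walk_forward_splits(seasons, min_train=3):
    """Yield (train_seasons, test_season) pairs that never train on the future."""
    seasons = sorted(seasons)
    for i in range(min_train, len(seasons)):
        yield seasons[:i], seasons[i]


def roi_by_edge_bin(bets: pd.DataFrame, bins, price: int = -110) -> pd.DataFrame:
    """Summarize flat one-unit bets by absolute-edge bin.

    Expects columns 'edge' (points) and 'won' (bool). Pushes are not modeled,
    and `price` must be a negative American price.
    """
    payout = 100 / abs(price)  # profit per unit risked on a winning bet
    out = bets.assign(
        edge_bin=pd.cut(bets["edge"].abs(), bins=bins, right=False),
        profit=bets["won"].map({True: payout, False: -1.0}),
    )
    return (
        out.groupby("edge_bin", observed=True)["profit"]
        .agg(n_bets="count", roi="mean")
    )


# Each test season is predicted using only earlier seasons.
for train, test in walk_forward_splits(range(2015, 2021), min_train=3):
    print(f"train {train[0]}-{train[-1]} -> test {test}")

# Toy bet log (made-up outcomes, not backtest results).
toy_bets = pd.DataFrame({
    "edge": [0.6, -0.9, 1.5, -2.2, 3.1, -3.4],
    "won": [True, False, True, True, False, True],
})
summary = roi_by_edge_bin(toy_bets, bins=[0.5, 1.0, 2.5, 5.0])
print(summary)
```

With real backtest output in place of `toy_bets`, bins with too few bets should be read cautiously before they drive any bet-sizing decision.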

### To run weekly:

1. Download latest nfelo/Substack CSVs for current week
2. Update `CURRENT_WEEK` in `src/config.py`
3. Replace files in `data/current_season/`
4. Re-run this notebook
5. Compare predictions to Bovada lines

### Code Structure:

- `src/team_mapping.py` - Team name normalization
- `src/config.py` - All configuration (paths, parameters)
- `src/data_loader.py` - Load nfelo, Substack, nfl_data_py
- `src/features.py` - Leak-free feature engineering
- `src/models.py` - v1.0, v1.1, v1.2 models + backtest functions

All code is modular, tested, and documented!