## Custom-Rank Gradient Boosting: Notebook Driver (Overview)

This notebook is the top-level driver for the **custom rank-guided Gradient Boosting search**.  
It wraps `gradientboosting_custom_rank.py`, which performs the actual modeling work.

### **What this driver does**
- Locates the Dynasty project root (the closest directory at or above the working directory that contains both `src/models/` and `data/Bakery/`).
- Loads project utilities and the main training function `run_seed_for_subsets()`.
- Defines the experiment configuration:
  - NFL position (`RB`, `WR`, `TE`, `QB`)
  - Seeds to run
  - Number of random feature subsets to explore
  - Search settings:
    - Maximum number of base features per model (`max_base_grid`)
    - Number of `RandomizedSearchCV` iterations per model (`n_iter_grid`)
    - Interaction hierarchy (`"strong"`, `"weak"`, or `"none"`)
  - Any **must-use** or **banned** features or interactions.
- Loads the ranked dataset used as supervision:
  - File: `data/Rankings/ranked_by_position/master_list_with_ranks_{pos}.csv`
  - Target: `Train_Target = -Rank_raw`, so a lower (better) rank number maps to a larger target
- Calls `run_seed_for_subsets()` which:
  - Samples feature subsets
  - Builds interaction features
  - Runs cross-validated Gradient Boosting
  - Evaluates on held-out test data
  - Saves:
    - Leaderboards  
    - Best model  
    - Predictions  
    - SHAP contributions  
    - Metadata JSON  
- After all runs finish, aggregates a summary of performance across seeds and subset sizes.
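
The sampling step inside `run_seed_for_subsets()` is not shown in this notebook. As a minimal sketch of how must-use and banned lists could constrain a random feature subset, the hypothetical helper `sample_feature_subset` (not part of the project) always keeps the must-use features, drops banned ones, and fills the remaining slots at random up to the base-feature cap:

```python
import random

def sample_feature_subset(candidates, must, banned, max_base, rng):
    """Hypothetical helper: draw one base-feature subset under must/ban constraints."""
    banned_set = set(banned)
    # Must-use features are always kept (unless they are also banned).
    required = [f for f in must if f not in banned_set]
    # Every other non-banned candidate is eligible to fill the remaining slots.
    pool = [f for f in candidates if f not in banned_set and f not in required]
    n_free = max(0, max_base - len(required))
    k = rng.randint(0, min(n_free, len(pool)))  # randint is inclusive on both ends
    return required + rng.sample(pool, k)

# Illustrative inputs taken from the QB configuration and column list in this notebook.
candidates = ["PDOM+", "RDOM+", "Draft Capital", "Comp%", "BTT%", "ADJ%", "BMI", "DOM+", "ELU"]
must = ["PDOM+", "RDOM+", "Draft Capital", "Comp%"]
banned = ["DOM+", "ELU"]

rng = random.Random(21)  # seeded so the draw is repeatable
subset = sample_feature_subset(candidates, must, banned, max_base=6, rng=rng)
print(subset)
```

The real search may sample differently (for example, a fixed subset size); the point is only that the constraints are applied before any model is fit.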

### **Outputs**
All generated artifacts are saved under: `data/Training/_derived/{pos}`
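
Once the summary CSV has been written, it can be collapsed by subset count to compare configurations across seeds. The sketch below is an illustration, not part of the driver; it uses the first five QB rows from this notebook's output and a subset of the summary columns:

```python
import pandas as pd

# First five QB summary rows as displayed in this notebook's output.
summary = pd.DataFrame({
    "seed": [21, 34, 56, 21, 34],
    "n_subsets": [10, 10, 10, 20, 20],
    "best_test_R2": [0.734617, 0.790161, 0.818338, 0.757043, 0.790161],
    "best_test_MAE": [21.13659, 14.453672, 14.918386, 19.873673, 14.453672],
})

# Mean and spread of the test metrics for each subset count, across seeds.
by_subsets = (
    summary.groupby("n_subsets")[["best_test_R2", "best_test_MAE"]]
    .agg(["mean", "std"])
)
print(by_subsets.round(3))
```

In practice the frame would come from `pd.read_csv` on the saved summary file rather than being typed in by hand.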

***QB***

In [None]:
# === Custom-Rank Gradient Boosting: Notebook Driver ===

from pathlib import Path
import sys
import pandas as pd
import numpy as np
import pickle
import json
import warnings

from sklearn.exceptions import ConvergenceWarning

warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=RuntimeWarning)
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=ConvergenceWarning)

# ---------------------------------------------------------------------
# Locate Dynasty repo root (must contain BOTH src/models and data/Bakery)
# ---------------------------------------------------------------------
def find_repo_root(start: Path) -> Path:
    for p in [start, *start.parents]:
        if (p / "src" / "models").exists() and (p / "data" / "Bakery").exists():
            return p
    raise FileNotFoundError(
        "Could not locate the Dynasty repo root (needs both 'src/models' and 'data/Bakery')."
    )

REPO_ROOT = find_repo_root(Path.cwd())
print("✅ REPO_ROOT:", REPO_ROOT)

# Make sure we can import from src/
if str(REPO_ROOT) not in sys.path:
    sys.path.insert(0, str(REPO_ROOT))

# ---------------------------------------------------------------------
# Imports from your project
#   Uses gradientboosting_custom_rank.py, which:
#   - reads master_list_with_ranks_{pos}.csv
#   - uses Rank only as supervision (Train_Target = -Rank_raw)
# ---------------------------------------------------------------------
from src.models.gradientboosting_custom_rank import run_seed_for_subsets
from src.utils import default_out_dir  # still used for _derived/{pos} outputs

# ---------------------------------------------------------------------
# Config
# ---------------------------------------------------------------------
position = "QB"            # RB / WR / TE / QB
seeds = [21,34,56]
subset_grid = [10,20,40]

# Hyperparameter grids
max_base_grid = [15]
n_iter_grid   = [15]

# Optional constraints (leave empty if none)
must_feats  = ["PDOM+","RDOM+", "Draft Capital", "Comp%"]      # e.g. ["DOM+", "YPC"]
ban_feats   = ["Conference Rank", "Draft Age", "DOM+", 
               "ELU","Drop%","Break%","10+","FUM"
               ]
must_inters = []                           # e.g. ["SpeedxBMI"]
ban_inters  = []                                      # e.g. ["Wide%xSlot%"]
hierarchy   = "none"                                  # "strong" | "weak" | "none"

# ---------------------------------------------------------------------
# Confirm ranked CSV location (training source)
#   data/Rankings/ranked_by_position/master_list_with_ranks_{pos}.csv
# ---------------------------------------------------------------------
ranked_dir = REPO_ROOT / "data" / "Rankings" / "ranked_by_position"
ranked_csv_path = ranked_dir / f"master_list_with_ranks_{position}.csv"

print("Ranked CSV path:", ranked_csv_path)
assert ranked_csv_path.exists(), f"Ranked CSV not found at {ranked_csv_path}"

df_ranked = pd.read_csv(ranked_csv_path)
print(f"\nLoaded ranked CSV for position={position}:")
print(f"Shape: {df_ranked.shape}")
print("Columns:", df_ranked.columns.tolist())
print("First 5 rows:")
display(df_ranked.head())

# ---------------------------------------------------------------------
# Run wide search over subsets / seeds
#   Inside run_seed_for_subsets:
#     - target = Train_Target = -Rank_raw
#     - model outputs are later rescaled to a 0–15 score
# ---------------------------------------------------------------------

all_runs = []
for n in subset_grid:
    for max_base in max_base_grid:
        for n_iter in n_iter_grid:
            print(f"\n🔹 Running n_subsets = {n}, "
                  f"max_base_feats = {max_base}, n_iter_per_model = {n_iter}")
            try:
                res = run_seed_for_subsets(
                    position=position,
                    project_root=REPO_ROOT,
                    n_subsets=int(n),
                    seeds=seeds,
                    max_base_feats=int(max_base),
                    max_interactions=3,
                    n_iter_per_model=int(n_iter),
                    cv_folds=5,
                    test_size=0.20,
                    must_feats=must_feats,
                    ban_feats=ban_feats,
                    must_inters=must_inters,
                    ban_inters=ban_inters,
                    interaction_hierarchy=hierarchy,
                    draft_cap_cap=0.10,          # adjust as needed
                    draft_cap_lower_q=0.05,
                    draft_cap_upper_q=0.95,
                    draft_cap_importance_cap=0.1,
                    breakout_age_importance_cap=0.1,
                    draft_age_importance_cap=None
                )

                # Tag the results with the hyperparameters
                res = res.copy()
                res["n_subsets"] = int(n)
                res["max_base_feats"] = int(max_base)
                res["n_iter_per_model"] = int(n_iter)

                all_runs.append(res)

            except UnboundLocalError as e:
                print(f"⚠️ Warning: UnboundLocalError for "
                      f"n={n}, max_base={max_base}, n_iter={n_iter}: {e}. Skipping this run.")
                continue
            except Exception as e:
                print(f"⚠️ Error running subsets={n}, max_base={max_base}, n_iter={n_iter}: {e}. Skipping this run.")
                continue

# ---------------------------------------------------------------------
# Aggregate summary
# ---------------------------------------------------------------------
if all_runs:
    summary = pd.concat(all_runs, ignore_index=True)
else:
    summary = pd.DataFrame()
    print("⚠️ No successful runs completed; summary is an empty DataFrame.")

# Save summary under the per-position output directory from default_out_dir()
out_dir = default_out_dir(REPO_ROOT, position)
out_dir.mkdir(parents=True, exist_ok=True)

summary_path = out_dir / f"{position.lower()}_runtime_accuracy_summary.csv"
summary.to_csv(summary_path, index=False)

print("\n✅ Custom-rank run complete!")
print("Summary CSV:", summary_path)
print("💾 Trained models saved as .pkl files in:", out_dir)

# ---------------------------------------------------------------------
# Show available files
# ---------------------------------------------------------------------
model_files = list(out_dir.glob("*.pkl"))
json_files = list(out_dir.glob("*_wide_best_meta_*.json"))

if model_files:
    print("\n📁 Saved model files:")
    for model_file in sorted(model_files):
        print(f"  - {model_file.name}")
else:
    print("\n⚠️ No .pkl model files found yet.")

if json_files:
    print("\n📄 JSON metadata files with SHAP analysis:")
    for json_file in sorted(json_files):
        print(f"  - {json_file.name}")
        # Quick preview of SHAP data (if present)
        try:
            with open(json_file, 'r') as f:
                metadata = json.load(f)
            shap_info = metadata.get("shap_analysis", {})
            if "base_importance_sum" in shap_info:
                print(f"    💡 SHAP: Base importance = {shap_info['base_importance_sum']:.4f}, "
                      f"Interaction = {shap_info['interaction_importance_sum']:.4f}")
                top5 = shap_info.get("top_5_features") or []
                if top5:
                    top_feat, top_val = top5[0]
                    print(f"    ⭐ Top feature: {top_feat} ({top_val:.4f})")
        except Exception as e:
            print(f"    (Could not read SHAP metadata: {e})")
else:
    print("\n⚠️ No JSON metadata files found yet.")

summary.head()


✅ REPO_ROOT: /Users/chasesiegel/Desktop/Comp_Sci/Capstone/Dynasty
Ranked CSV path: /Users/chasesiegel/Desktop/Comp_Sci/Capstone/Dynasty/data/Rankings/ranked_by_position/master_list_with_ranks_QB.csv

Loaded ranked CSV for position=QB:
Shape: (118, 78)
Columns: ['player_name', 'pos', 'rank', 'birth_date', 'team', 'conf', 'draft_year', 'Draft Capital', 'breakout_year', 'Breakout Age', 'DOM', 'DOM+', 'PDOM', 'PDOM+', 'RDOM', 'RDOM+', 'ADJ%', 'aimed_passes', 'attempts', 'aDOT', 'ATT', 'MTF', 'big_time_throws', 'breakaway_attempts', 'Break%', 'breakaway_yards', 'BTT%', 'REC%', 'Comp%', 'completions', 'CC%', 'contested_receptions', 'contested_targets', 'def_gen_pressures', 'designed_yards', 'Drop%', 'dropbacks', 'drops', 'elu_recv_mtf', 'elu_rush_mtf', 'elu_yco', 'ELU', '10+', 'FUM', 'gap_attempts', 'hit_as_threw', 'interceptions', 'passing_snaps', 'pressure_to_sack_rate', 'qb_rating', 'rec_yards', 'receptions', 'route_rate', 'routes', 'run_plays', 'sack_percent', 'sacks', 'scramble_yards'

Unnamed: 0,player_name,pos,rank,birth_date,team,conf,draft_year,Draft Capital,breakout_year,Breakout Age,DOM,DOM+,PDOM,PDOM+,RDOM,RDOM+,ADJ%,aimed_passes,attempts,aDOT,ATT,MTF,big_time_throws,breakaway_attempts,Break%,breakaway_yards,BTT%,REC%,Comp%,completions,CC%,contested_receptions,contested_targets,def_gen_pressures,designed_yards,Drop%,dropbacks,drops,elu_recv_mtf,elu_rush_mtf,elu_yco,ELU,10+,FUM,gap_attempts,hit_as_threw,interceptions,passing_snaps,pressure_to_sack_rate,qb_rating,rec_yards,receptions,route_rate,routes,run_plays,sack_percent,sacks,scramble_yards,scrambles,Slot%,slot_snaps,targets,total_touches,turnover_worthy_plays,TWP%,Wide%,wide_snaps,yards_after_catch,YAC/R,yards_after_contact,Y/REC,YCO/A,ypa,Y/RR,zone_attempts,Speed,BMI,SpeedxBMI
0,Jameis Winston,QB,13.0,1994-01-06,Florida State,ACC,2015,1,2013.0,19.652,0.0,0.0,0.941,0.894,0.072,0.069,72.7,450.0,467.0,8.8,2.74,8.0,20.0,2.0,23.5,48.0,4.2,,65.3,305.0,,,,148.0,45.0,6.7,508.0,22.0,0.0,5.0,32.0,27.8,7.0,9.0,0.0,2.0,18.0,530.0,12.2,93.1,0.0,0.0,,0.0,365.0,3.5,18.0,159.0,23.0,,,0.0,24.0,18.0,3.4,,,,,152.0,,3.23,8.3,,0.0,4.97,28.115,139.73155
1,Marcus Mariota,QB,9.0,1993-10-30,Oregon,Pac-12,2015,1,2012.0,18.839,0.007,0.007,0.96,0.912,0.248,0.235,76.7,434.0,444.0,9.8,2.83,15.0,26.0,12.0,36.3,342.0,5.5,,68.2,303.0,,,,134.0,617.0,9.0,513.0,30.0,1.0,12.0,179.0,46.8,33.0,8.0,7.0,0.0,4.0,540.0,23.9,128.5,26.0,1.0,,1.0,505.0,6.2,32.0,325.0,37.0,,,1.0,71.0,14.0,2.6,,,,,339.0,,3.17,10.0,26.0,1.0,4.52,27.02,122.1304
2,Garrett Grayson,QB,99.0,1991-05-29,Colorado State,MWC,2015,3,2013.0,22.261,0.009,0.007,0.962,0.731,0.018,0.014,75.4,395.0,423.0,10.1,2.71,2.0,26.0,0.0,0.0,0.0,5.9,,64.5,273.0,,,,140.0,-21.0,8.4,476.0,25.0,0.0,1.0,3.0,1.4,4.0,8.0,0.0,2.0,7.0,497.0,20.0,113.6,39.0,1.0,,1.0,329.0,5.9,28.0,146.0,25.0,,,1.0,15.0,18.0,3.6,,,,,14.0,,0.36,9.5,39.0,0.0,4.75,27.345,129.88875
3,Sean Mannion,QB,97.0,1992-04-25,Oregon State,Pac-12,2015,3,2011.0,19.351,0.0,0.0,0.981,0.932,-0.065,-0.062,71.3,436.0,456.0,8.9,2.76,0.0,27.0,0.0,0.0,0.0,5.6,,61.8,282.0,,,,147.0,-27.0,9.3,499.0,29.0,0.0,0.0,0.0,0.0,0.0,8.0,0.0,3.0,8.0,527.0,24.5,86.2,0.0,0.0,,0.0,334.0,7.2,36.0,37.0,7.0,,,0.0,16.0,16.0,3.0,,,,,2.0,,0.09,6.9,,0.0,5.14,26.461,136.00954
4,Bryce Petty,QB,81.0,1991-05-31,Baylor,Big 12,2015,4,2013.0,22.256,0.0,0.0,0.847,0.805,0.134,0.127,72.9,410.0,430.0,12.0,2.38,5.0,26.0,3.0,21.2,53.0,5.8,,62.8,270.0,,,,97.0,151.0,9.7,470.0,29.0,0.0,3.0,52.0,6.8,5.0,7.0,9.0,2.0,7.0,495.0,22.7,107.5,0.0,0.0,,0.0,425.0,4.7,22.0,99.0,18.0,,,0.0,48.0,12.0,2.4,,,,,89.0,,1.35,9.0,,0.0,4.87,28.745,139.98815



🔹 Running n_subsets = 10, max_base_feats = 15, n_iter_per_model = 15

Position          : QB
CSV path          : /Users/chasesiegel/Desktop/Comp_Sci/Capstone/Dynasty/data/Rankings/ranked_by_position/master_list_with_ranks_QB.csv
Output directory  : /Users/chasesiegel/Desktop/Comp_Sci/Capstone/Dynasty/data/Training/_derived/QB
Hierarchy         : none
Must feats        : ['PDOM+', 'RDOM+', 'Draft Capital', 'Comp%']
Ban feats         : ['Conference Rank', 'Draft Age', 'DOM+', 'ELU', 'Drop%', 'Break%', '10+', 'FUM']
Must inters       : []
Ban inters        : []
n_subsets         : 10
max_base_feats    : 15
n_iter_per_model  : 15
DraftCap limiter  : cap=0.1, lower_q=0.05, upper_q=0.95

[QB] Rows after filtering: 118 | Feature cols: 21
💾 Saved model: /Users/chasesiegel/Desktop/Comp_Sci/Capstone/Dynasty/data/Training/_derived/QB/qb_model_seed21_subs10_base15_iter15.pkl
📄 Saved metadata with SHAP results: /Users/chasesiegel/Desktop/Comp_Sci/Capstone/Dynasty/data/Training/_derived/QB

Unnamed: 0,position,seed,n_subsets,max_base_feats,n_iter_per_model,best_test_R2,best_test_MAE,best_test_RMSE,runtime_sec,leaderboard_csv,predictions_csv,metadata_json,model_pickle,best_model_tag,best_bases,best_interactions
0,QB,21,10,15,15,0.734617,21.13659,28.468268,19.793282,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,GB,PDOM+|RDOM+|Draft Capital|Comp%|BMI|Y/RR|BTT%|...,SpeedxBMI
1,QB,34,10,15,15,0.790161,14.453672,18.717587,17.830963,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,GB,PDOM+|RDOM+|Draft Capital|Comp%|BTT%|ADJ%|Brea...,
2,QB,56,10,15,15,0.818338,14.918386,20.213047,17.176747,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,GB,PDOM+|RDOM+|Draft Capital|Comp%|ADJ%|YCO/A|BMI...,SpeedxBMI
3,QB,21,20,15,15,0.757043,19.873673,27.238845,34.085847,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,GB,PDOM+|RDOM+|Draft Capital|Comp%|YCO/A|Breakout...,
4,QB,34,20,15,15,0.790161,14.453672,18.717587,34.493618,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,GB,PDOM+|RDOM+|Draft Capital|Comp%|BTT%|ADJ%|Brea...,


***RB***

In [2]:
# === Custom-Rank Gradient Boosting: Notebook Driver ===

from pathlib import Path
import sys
import pandas as pd
import numpy as np
import pickle
import json
import warnings

from sklearn.exceptions import ConvergenceWarning

warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=RuntimeWarning)
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=ConvergenceWarning)

# ---------------------------------------------------------------------
# Locate Dynasty repo root (must contain BOTH src/models and data/Bakery)
# ---------------------------------------------------------------------
def find_repo_root(start: Path) -> Path:
    for p in [start, *start.parents]:
        if (p / "src" / "models").exists() and (p / "data" / "Bakery").exists():
            return p
    raise FileNotFoundError(
        "Could not locate the Dynasty repo root (needs both 'src/models' and 'data/Bakery')."
    )

REPO_ROOT = find_repo_root(Path.cwd())
print("✅ REPO_ROOT:", REPO_ROOT)

# Make sure we can import from src/
if str(REPO_ROOT) not in sys.path:
    sys.path.insert(0, str(REPO_ROOT))

# ---------------------------------------------------------------------
# Imports from your project
#   Uses gradientboosting_custom_rank.py, which:
#   - reads master_list_with_ranks_{pos}.csv
#   - uses Rank only as supervision (Train_Target = -Rank_raw)
#   - derives a 0–15 score from model predictions internally
# ---------------------------------------------------------------------
from src.models.gradientboosting_custom_rank import run_seed_for_subsets
from src.utils import default_out_dir  # still used for _derived/{pos} outputs

# ---------------------------------------------------------------------
# Config
# ---------------------------------------------------------------------
position = "RB"            # RB / WR / TE / QB
seeds = [21,34,56]
subset_grid = [10,20,40]

# Hyperparameter grids
max_base_grid = [15]
n_iter_grid   = [15]

# Optional constraints (leave empty if none)
must_feats  = ["DOM+","RDOM+", "Draft Capital", "ELU"]      # e.g. ["DOM+", "YPC"]
ban_feats   = ["Conference Rank", "Draft Age","PDOM+"]
must_inters = ["SpeedxBMI"]                           # e.g. ["SpeedxBMI"]
ban_inters  = []                                      # e.g. ["Wide%xSlot%"]
hierarchy   = "none"                                  # "strong" | "weak" | "none"

# ---------------------------------------------------------------------
# Confirm ranked CSV location (training source)
#   data/Rankings/ranked_by_position/master_list_with_ranks_{pos}.csv
# ---------------------------------------------------------------------
ranked_dir = REPO_ROOT / "data" / "Rankings" / "ranked_by_position"
ranked_csv_path = ranked_dir / f"master_list_with_ranks_{position}.csv"

print("Ranked CSV path:", ranked_csv_path)
assert ranked_csv_path.exists(), f"Ranked CSV not found at {ranked_csv_path}"

df_ranked = pd.read_csv(ranked_csv_path)
print(f"\nLoaded ranked CSV for position={position}:")
print(f"Shape: {df_ranked.shape}")
print("Columns:", df_ranked.columns.tolist())
print("First 5 rows:")
display(df_ranked.head())

# ---------------------------------------------------------------------
# Run wide search over subsets / seeds
#   Inside run_seed_for_subsets:
#     - target = Train_Target = -Rank_raw
#     - model outputs are later rescaled to a 0–15 score
# ---------------------------------------------------------------------

all_runs = []
for n in subset_grid:
    for max_base in max_base_grid:
        for n_iter in n_iter_grid:
            print(f"\n🔹 Running n_subsets = {n}, "
                  f"max_base_feats = {max_base}, n_iter_per_model = {n_iter}")
            try:
                res = run_seed_for_subsets(
                    position=position,
                    project_root=REPO_ROOT,
                    n_subsets=int(n),
                    seeds=seeds,
                    max_base_feats=int(max_base),
                    max_interactions=3,
                    n_iter_per_model=int(n_iter),
                    cv_folds=15,
                    test_size=0.20,
                    must_feats=must_feats,
                    ban_feats=ban_feats,
                    must_inters=must_inters,
                    ban_inters=ban_inters,
                    interaction_hierarchy=hierarchy,
                    draft_cap_cap=0.10,          # adjust as needed
                    draft_cap_lower_q=0.05,
                    draft_cap_upper_q=0.95,
                    draft_cap_importance_cap=0.1,
                    breakout_age_importance_cap=0.1,
                    draft_age_importance_cap=None
                )

                # Tag the results with the hyperparameters for easier analysis
                res = res.copy()
                res["n_subsets"] = int(n)
                res["max_base_feats"] = int(max_base)
                res["n_iter_per_model"] = int(n_iter)

                all_runs.append(res)

            except UnboundLocalError as e:
                print(f"⚠️ Warning: UnboundLocalError for "
                      f"n={n}, max_base={max_base}, n_iter={n_iter}: {e}. Skipping this run.")
                continue
            except Exception as e:
                print(f"⚠️ Error running subsets={n}, max_base={max_base}, n_iter={n_iter}: {e}. Skipping this run.")
                continue

# ---------------------------------------------------------------------
# Aggregate summary
# ---------------------------------------------------------------------
if all_runs:
    summary = pd.concat(all_runs, ignore_index=True)
else:
    summary = pd.DataFrame()
    print("⚠️ No successful runs completed; summary is an empty DataFrame.")

# Save summary under the per-position output directory from default_out_dir()
out_dir = default_out_dir(REPO_ROOT, position)
out_dir.mkdir(parents=True, exist_ok=True)

summary_path = out_dir / f"{position.lower()}_runtime_accuracy_summary.csv"
summary.to_csv(summary_path, index=False)

print("\n✅ Custom-rank run complete!")
print("Summary CSV:", summary_path)
print("💾 Trained models saved as .pkl files in:", out_dir)

# ---------------------------------------------------------------------
# Show available files
# ---------------------------------------------------------------------
model_files = list(out_dir.glob("*.pkl"))
json_files = list(out_dir.glob("*_wide_best_meta_*.json"))

if model_files:
    print("\n📁 Saved model files:")
    for model_file in sorted(model_files):
        print(f"  - {model_file.name}")
else:
    print("\n⚠️ No .pkl model files found yet.")

if json_files:
    print("\n📄 JSON metadata files with SHAP analysis:")
    for json_file in sorted(json_files):
        print(f"  - {json_file.name}")
        # Quick preview of SHAP data (if present)
        try:
            with open(json_file, 'r') as f:
                metadata = json.load(f)
            shap_info = metadata.get("shap_analysis", {})
            if "base_importance_sum" in shap_info:
                print(f"    💡 SHAP: Base importance = {shap_info['base_importance_sum']:.4f}, "
                      f"Interaction = {shap_info['interaction_importance_sum']:.4f}")
                top5 = shap_info.get("top_5_features") or []
                if top5:
                    top_feat, top_val = top5[0]
                    print(f"    ⭐ Top feature: {top_feat} ({top_val:.4f})")
        except Exception as e:
            print(f"    (Could not read SHAP metadata: {e})")
else:
    print("\n⚠️ No JSON metadata files found yet.")

summary.head()


✅ REPO_ROOT: /Users/chasesiegel/Desktop/Comp_Sci/Capstone/Dynasty
Ranked CSV path: /Users/chasesiegel/Desktop/Comp_Sci/Capstone/Dynasty/data/Rankings/ranked_by_position/master_list_with_ranks_RB.csv

Loaded ranked CSV for position=RB:
Shape: (212, 78)
Columns: ['player_name', 'pos', 'rank', 'birth_date', 'team', 'conf', 'draft_year', 'Draft Capital', 'breakout_year', 'Breakout Age', 'DOM', 'DOM+', 'PDOM', 'PDOM+', 'RDOM', 'RDOM+', 'ADJ%', 'aimed_passes', 'attempts', 'aDOT', 'ATT', 'MTF', 'big_time_throws', 'breakaway_attempts', 'Break%', 'breakaway_yards', 'BTT%', 'REC%', 'Comp%', 'completions', 'CC%', 'contested_receptions', 'contested_targets', 'def_gen_pressures', 'designed_yards', 'Drop%', 'dropbacks', 'drops', 'elu_recv_mtf', 'elu_rush_mtf', 'elu_yco', 'ELU', '10+', 'FUM', 'gap_attempts', 'hit_as_threw', 'interceptions', 'passing_snaps', 'pressure_to_sack_rate', 'qb_rating', 'rec_yards', 'receptions', 'route_rate', 'routes', 'run_plays', 'sack_percent', 'sacks', 'scramble_yards'

Unnamed: 0,player_name,pos,rank,birth_date,team,conf,draft_year,Draft Capital,breakout_year,Breakout Age,DOM,DOM+,PDOM,PDOM+,RDOM,RDOM+,ADJ%,aimed_passes,attempts,aDOT,ATT,MTF,big_time_throws,breakaway_attempts,Break%,breakaway_yards,BTT%,REC%,Comp%,completions,CC%,contested_receptions,contested_targets,def_gen_pressures,designed_yards,Drop%,dropbacks,drops,elu_recv_mtf,elu_rush_mtf,elu_yco,ELU,10+,FUM,gap_attempts,hit_as_threw,interceptions,passing_snaps,pressure_to_sack_rate,qb_rating,rec_yards,receptions,route_rate,routes,run_plays,sack_percent,sacks,scramble_yards,scrambles,Slot%,slot_snaps,targets,total_touches,turnover_worthy_plays,TWP%,Wide%,wide_snaps,yards_after_catch,YAC/R,yards_after_contact,Y/REC,YCO/A,ypa,Y/RR,zone_attempts,Speed,BMI,SpeedxBMI
0,Todd Gurley,RB,2.0,1994-08-03,Georgia,SEC,2015,1,2012.0,18.081,0.013,0.013,0.005,0.005,0.395,0.395,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,4.52,29.286,132.37272
1,Melvin Gordon,RB,15.0,1993-04-13,Wisconsin,Big Ten,2015,1,2013.0,20.386,0.137,0.137,0.0,0.0,0.604,0.604,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,4.52,28.363,128.20076
2,Ameer Abdullah,RB,78.0,1993-06-13,Nebraska,Big Ten,2015,2,2012.0,19.22,0.111,0.111,0.0,0.0,0.529,0.529,,,263.0,,,57.0,,25.0,45.9,741.0,,,,,,,,,1614.0,,,0.0,9.0,57.0,810.0,71.3,45.0,2.0,45.0,,,,,,269.0,22.0,,227.0,366.0,,,0.0,0.0,,,32.0,285.0,,,,,,,810.0,,3.08,6.1,1.19,0.0,4.6,30.27,139.242
3,Tevin Coleman,RB,42.0,1993-04-16,Indiana,Big Ten,2015,3,2013.0,20.378,0.042,0.042,0.0,0.0,0.58,0.58,,,270.0,,,50.0,,29.0,57.0,1156.0,,,,,,,,,2027.0,,,2.0,4.0,50.0,1086.0,73.6,46.0,5.0,68.0,,,,,,141.0,25.0,,161.0,310.0,,,0.0,0.0,,,31.0,295.0,,,,,,,1086.0,,4.02,7.5,0.88,0.0,4.4,28.728,126.4032
4,Duke Johnson,RB,59.0,1993-09-23,Miami (FL),ACC,2015,3,2012.0,18.94,0.122,0.116,0.0,0.0,0.615,0.585,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,4.54,30.565,138.7651



🔹 Running n_subsets = 10, max_base_feats = 15, n_iter_per_model = 15

Position          : RB
CSV path          : /Users/chasesiegel/Desktop/Comp_Sci/Capstone/Dynasty/data/Rankings/ranked_by_position/master_list_with_ranks_RB.csv
Output directory  : /Users/chasesiegel/Desktop/Comp_Sci/Capstone/Dynasty/data/Training/_derived/RB
Hierarchy         : none
Must feats        : ['DOM+', 'RDOM+', 'Draft Capital', 'ELU']
Ban feats         : ['Conference Rank', 'Draft Age', 'PDOM+']
Must inters       : ['SpeedxBMI']
Ban inters        : []
n_subsets         : 10
max_base_feats    : 15
n_iter_per_model  : 15
DraftCap limiter  : cap=0.1, lower_q=0.05, upper_q=0.95

[RB] Rows after filtering: 212 | Feature cols: 14
💾 Saved model: /Users/chasesiegel/Desktop/Comp_Sci/Capstone/Dynasty/data/Training/_derived/RB/rb_model_seed21_subs10_base15_iter15.pkl
📄 Saved metadata with SHAP results: /Users/chasesiegel/Desktop/Comp_Sci/Capstone/Dynasty/data/Training/_derived/RB/rb_wide_best_meta_seed21_subs1

Unnamed: 0,position,seed,n_subsets,max_base_feats,n_iter_per_model,best_test_R2,best_test_MAE,best_test_RMSE,runtime_sec,leaderboard_csv,predictions_csv,metadata_json,model_pickle,best_model_tag,best_bases,best_interactions
0,RB,21,10,15,15,0.544992,46.930073,58.928715,70.556659,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,GB,DOM+|RDOM+|Draft Capital|ELU|YCO/A|Breakout Ag...,SpeedxBMI
1,RB,34,10,15,15,0.509393,39.947175,52.716707,71.108591,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,GB,DOM+|RDOM+|Draft Capital|ELU|10+|Break%|FUM|MT...,SpeedxBMI
2,RB,56,10,15,15,0.537565,39.497374,49.598888,71.43229,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,GB,DOM+|RDOM+|Draft Capital|ELU|MTF|FUM|10+|YCO/A...,SpeedxBMI
3,RB,21,20,15,15,0.544992,46.930073,58.928715,119.278899,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,GB,DOM+|RDOM+|Draft Capital|ELU|YCO/A|Breakout Ag...,SpeedxBMI
4,RB,34,20,15,15,0.509393,39.947175,52.716707,145.004356,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,GB,DOM+|RDOM+|Draft Capital|ELU|10+|Break%|FUM|MT...,SpeedxBMI


***WR***

In [7]:
# === Custom-Rank Gradient Boosting: Notebook Driver ===

from pathlib import Path
import sys
import pandas as pd
import numpy as np
import pickle
import json
import warnings

from sklearn.exceptions import ConvergenceWarning

warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=RuntimeWarning)
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=ConvergenceWarning)

# ---------------------------------------------------------------------
# Locate Dynasty repo root (must contain BOTH src/models and data/Bakery)
# ---------------------------------------------------------------------
def find_repo_root(start: Path) -> Path:
    for p in [start, *start.parents]:
        if (p / "src" / "models").exists() and (p / "data" / "Bakery").exists():
            return p
    raise FileNotFoundError(
        "Could not locate the Dynasty repo root (needs both 'src/models' and 'data/Bakery')."
    )

REPO_ROOT = find_repo_root(Path.cwd())
print("✅ REPO_ROOT:", REPO_ROOT)

# Make sure we can import from src/
if str(REPO_ROOT) not in sys.path:
    sys.path.insert(0, str(REPO_ROOT))

# ---------------------------------------------------------------------
# Imports from your project
#   Uses gradientboosting_custom_rank.py, which:
#   - reads master_list_with_ranks_{pos}.csv
#   - uses Rank only as supervision (Train_Target = -Rank_raw)
#   - derives a 0–15 score from model predictions internally
# ---------------------------------------------------------------------
from src.models.gradientboosting_custom_rank import run_seed_for_subsets
from src.utils import default_out_dir  # still used for _derived/{pos} outputs

# ---------------------------------------------------------------------
# Config
# ---------------------------------------------------------------------
position = "WR"            # RB / WR / TE / QB
seeds = [21,34,56]
subset_grid = [10,20,40]

# Hyperparameter grids
max_base_grid = [15]
n_iter_grid   = [15]

# Optional constraints (leave empty if none)
must_feats  = ["DOM+", "Draft Capital"]      # e.g. ["DOM+", "YPC"]
ban_feats   = ["Conference Rank", "Draft Age","RDOM+","PDOM+","Wide%","Slot%","ELU"]
must_inters = ["SpeedxBMI"]                           # e.g. ["SpeedxBMI"]
ban_inters  = ["Wide%xSlot%"]                                      # e.g. ["Wide%xSlot%"]
hierarchy   = "none"                                  # "strong" | "weak" | "none"

# ---------------------------------------------------------------------
# Confirm ranked CSV location (training source)
#   data/Rankings/ranked_by_position/master_list_with_ranks_{pos}.csv
# ---------------------------------------------------------------------
ranked_dir = REPO_ROOT / "data" / "Rankings" / "ranked_by_position"
ranked_csv_path = ranked_dir / f"master_list_with_ranks_{position}.csv"

print("Ranked CSV path:", ranked_csv_path)
assert ranked_csv_path.exists(), f"Ranked CSV not found at {ranked_csv_path}"

df_ranked = pd.read_csv(ranked_csv_path)
print(f"\nLoaded ranked CSV for position={position}:")
print(f"Shape: {df_ranked.shape}")
print("Columns:", df_ranked.columns.tolist())
print("First 5 rows:")
display(df_ranked.head())

# ---------------------------------------------------------------------
# Run wide search over subsets / seeds
#   Inside run_seed_for_subsets:
#     - target = Train_Target = -Rank_raw
#     - model outputs are later rescaled to a 0–15 score
# ---------------------------------------------------------------------

all_runs = []
for n in subset_grid:
    for max_base in max_base_grid:
        for n_iter in n_iter_grid:
            print(f"\n🔹 Running n_subsets = {n}, "
                  f"max_base_feats = {max_base}, n_iter_per_model = {n_iter}")
            try:
                res = run_seed_for_subsets(
                    position=position,
                    project_root=REPO_ROOT,
                    n_subsets=int(n),
                    seeds=seeds,
                    max_base_feats=int(max_base),
                    max_interactions=3,
                    n_iter_per_model=int(n_iter),
                    cv_folds=15,
                    test_size=0.20,
                    must_feats=must_feats,
                    ban_feats=ban_feats,
                    must_inters=must_inters,
                    ban_inters=ban_inters,
                    interaction_hierarchy=hierarchy,
                    draft_cap_cap=0.10,          # adjust as needed
                    draft_cap_lower_q=0.05,
                    draft_cap_upper_q=0.95,
                    draft_cap_importance_cap=0.1,
                    breakout_age_importance_cap=0.1,
                    draft_age_importance_cap=None
                )

                # Tag the results with the hyperparameters for easier analysis
                res = res.copy()
                res["n_subsets"] = int(n)
                res["max_base_feats"] = int(max_base)
                res["n_iter_per_model"] = int(n_iter)

                all_runs.append(res)

            except UnboundLocalError as e:
                print(f"⚠️ Warning: UnboundLocalError for "
                      f"n={n}, max_base={max_base}, n_iter={n_iter}: {e}. Skipping this run.")
                continue
            except Exception as e:
                print(f"⚠️ Error running subsets={n}, max_base={max_base}, n_iter={n_iter}: {e}. Skipping this run.")
                continue

# ---------------------------------------------------------------------
# Aggregate summary
# ---------------------------------------------------------------------
if all_runs:
    summary = pd.concat(all_runs, ignore_index=True)
else:
    summary = pd.DataFrame()
    print("⚠️ No successful runs completed; summary is an empty DataFrame.")

# Save summary under REPO_ROOT/data/Training/_derived/<POS>/
out_dir = default_out_dir(REPO_ROOT, position)
out_dir.mkdir(parents=True, exist_ok=True)

summary_path = out_dir / f"{position.lower()}_runtime_accuracy_summary.csv"
summary.to_csv(summary_path, index=False)

print("\n✅ Custom-rank run complete!")
print("Summary CSV:", summary_path)
print("💾 Trained models saved as .pkl files in:", out_dir)

# ---------------------------------------------------------------------
# Show available files
# ---------------------------------------------------------------------
model_files = list(out_dir.glob("*.pkl"))
json_files = list(out_dir.glob("*_wide_best_meta_*.json"))

if model_files:
    print("\n📁 Saved model files:")
    for model_file in sorted(model_files):
        print(f"  - {model_file.name}")
else:
    print("\n⚠️ No .pkl model files found yet.")

if json_files:
    print("\n📄 JSON metadata files with SHAP analysis:")
    for json_file in sorted(json_files):
        print(f"  - {json_file.name}")
        # Quick preview of SHAP data (if present)
        try:
            with open(json_file, 'r') as f:
                metadata = json.load(f)
            shap_info = metadata.get("shap_analysis", {})
            if "base_importance_sum" in shap_info:
                print(f"    💡 SHAP: Base importance = {shap_info['base_importance_sum']:.4f}, "
                      f"Interaction = {shap_info['interaction_importance_sum']:.4f}")
                top5 = shap_info.get("top_5_features") or []
                if top5:
                    top_feat, top_val = top5[0]
                    print(f"    ⭐ Top feature: {top_feat} ({top_val:.4f})")
        except Exception as e:
            print(f"    (Could not read SHAP metadata: {e})")
else:
    print("\n⚠️ No JSON metadata files found yet.")

summary.head()


✅ REPO_ROOT: /Users/chasesiegel/Desktop/Comp_Sci/Capstone/Dynasty
Ranked CSV path: /Users/chasesiegel/Desktop/Comp_Sci/Capstone/Dynasty/data/Rankings/ranked_by_position/master_list_with_ranks_WR.csv

Loaded ranked CSV for position=WR:
Shape: (332, 78)
Columns: ['player_name', 'pos', 'rank', 'birth_date', 'team', 'conf', 'draft_year', 'Draft Capital', 'breakout_year', 'Breakout Age', 'DOM', 'DOM+', 'PDOM', 'PDOM+', 'RDOM', 'RDOM+', 'ADJ%', 'aimed_passes', 'attempts', 'aDOT', 'ATT', 'MTF', 'big_time_throws', 'breakaway_attempts', 'Break%', 'breakaway_yards', 'BTT%', 'REC%', 'Comp%', 'completions', 'CC%', 'contested_receptions', 'contested_targets', 'def_gen_pressures', 'designed_yards', 'Drop%', 'dropbacks', 'drops', 'elu_recv_mtf', 'elu_rush_mtf', 'elu_yco', 'ELU', '10+', 'FUM', 'gap_attempts', 'hit_as_threw', 'interceptions', 'passing_snaps', 'pressure_to_sack_rate', 'qb_rating', 'rec_yards', 'receptions', 'route_rate', 'routes', 'run_plays', 'sack_percent', 'sacks', 'scramble_yards'

Unnamed: 0,player_name,pos,rank,birth_date,team,conf,draft_year,Draft Capital,breakout_year,Breakout Age,DOM,DOM+,PDOM,PDOM+,RDOM,RDOM+,ADJ%,aimed_passes,attempts,aDOT,ATT,MTF,big_time_throws,breakaway_attempts,Break%,breakaway_yards,BTT%,REC%,Comp%,completions,CC%,contested_receptions,contested_targets,def_gen_pressures,designed_yards,Drop%,dropbacks,drops,elu_recv_mtf,elu_rush_mtf,elu_yco,ELU,10+,FUM,gap_attempts,hit_as_threw,interceptions,passing_snaps,pressure_to_sack_rate,qb_rating,rec_yards,receptions,route_rate,routes,run_plays,sack_percent,sacks,scramble_yards,scrambles,Slot%,slot_snaps,targets,total_touches,turnover_worthy_plays,TWP%,Wide%,wide_snaps,yards_after_catch,YAC/R,yards_after_contact,Y/REC,YCO/A,ypa,Y/RR,zone_attempts,Speed,BMI,SpeedxBMI
0,Mike Williams,WR,32.0,1980-01-11,Nebraska,Big Ten,2005,1,2000.0,20.641,0.371,0.371,0.002,0.002,0.002,0.002,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,4.53,29.154,132.06762
1,Mike Thomas,WR,334.0,1987-06-04,Arizona,Pac-10,2009,4,2005.0,18.245,0.275,0.165,0.0,0.0,0.104,0.062,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,4.3,29.646,127.4778
2,Mike Williams,WR,32.0,1980-01-11,Nebraska,Big Ten,2010,4,2000.0,20.641,0.371,0.371,0.002,0.002,0.002,0.002,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,4.53,29.154,132.06762
3,Kyle Williams,WR,82.0,1983-06-10,Washington State,Ind,2010,6,,,0.393,0.236,0.036,0.021,0.026,0.015,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,4.4,26.972,118.6768
4,Amari Cooper,WR,5.0,1994-06-17,Alabama,SEC,2015,1,2012.0,18.209,0.472,0.472,0.0,0.0,0.004,0.004,,,,9.9,,26.0,,,,,,71.3,,,,0.0,0.0,,,6.1,,8.0,,,,69.5,,0.0,,,2.0,,,,,124.0,97.1,433.0,,,,,,19.5,87.0,174.0,,,,79.6,355.0,878.0,7.1,,13.9,,,3.99,,4.42,27.835,123.0307



🔹 Running n_subsets = 10, max_base_feats = 15, n_iter_per_model = 15

Position          : WR
CSV path          : /Users/chasesiegel/Desktop/Comp_Sci/Capstone/Dynasty/data/Rankings/ranked_by_position/master_list_with_ranks_WR.csv
Output directory  : /Users/chasesiegel/Desktop/Comp_Sci/Capstone/Dynasty/data/Training/_derived/WR
Hierarchy         : none
Must feats        : ['DOM+', 'Draft Capital']
Ban feats         : ['Conference Rank', 'Draft Age', 'RDOM+', 'PDOM+', 'Wide%', 'Slot%', 'ELU']
Must inters       : ['SpeedxBMI']
Ban inters        : ['Wide%xSlot%']
n_subsets         : 10
max_base_feats    : 15
n_iter_per_model  : 15
DraftCap limiter  : cap=0.1, lower_q=0.05, upper_q=0.95

[WR] Rows after filtering: 332 | Feature cols: 19
💾 Saved model: /Users/chasesiegel/Desktop/Comp_Sci/Capstone/Dynasty/data/Training/_derived/WR/wr_model_seed21_subs10_base15_iter15.pkl
📄 Saved metadata with SHAP results: /Users/chasesiegel/Desktop/Comp_Sci/Capstone/Dynasty/data/Training/_derived/WR

Unnamed: 0,position,seed,n_subsets,max_base_feats,n_iter_per_model,best_test_R2,best_test_MAE,best_test_RMSE,runtime_sec,leaderboard_csv,predictions_csv,metadata_json,model_pickle,best_model_tag,best_bases,best_interactions
0,WR,21,10,15,15,0.679551,64.818845,81.002542,82.19519,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,GB,DOM+|Draft Capital|Y/REC|YAC/R|Drop%|CC%|REC%|...,SpeedxBMI
1,WR,34,10,15,15,0.785378,53.68051,73.481785,67.398777,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,GB,DOM+|Draft Capital|REC%|CC%|Y/REC|YAC/R|Drop%|...,SpeedxBMI
2,WR,56,10,15,15,0.750752,54.326516,74.669429,90.675533,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,GB,DOM+|Draft Capital|Breakout Age|Y/RR|CC%|FUM|Y...,SpeedxBMI
3,WR,21,20,15,15,0.679551,64.818845,81.002542,158.257297,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,GB,DOM+|Draft Capital|YAC/R|FUM|REC%|aDOT|Breakou...,SpeedxBMI
4,WR,34,20,15,15,0.785378,53.68051,73.481785,160.375904,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,GB,DOM+|Draft Capital|REC%|CC%|Y/REC|YAC/R|Drop%|...,SpeedxBMI
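
The five rows above are only `summary.head()`, but they already show how results vary across seeds and subset sizes. As a minimal, dependency-free sketch (the tuple layout and the names `rows`, `aggregated`, and `unchanged_seeds` are illustrative and not part of the project), the rows can be grouped by `n_subsets` and compared per seed:

```python
from collections import defaultdict
from statistics import mean

# Values copied from the WR summary rows printed above:
# (seed, n_subsets, best_test_R2, runtime_sec)
rows = [
    (21, 10, 0.679551, 82.19519),
    (34, 10, 0.785378, 67.398777),
    (56, 10, 0.750752, 90.675533),
    (21, 20, 0.679551, 158.257297),
    (34, 20, 0.785378, 160.375904),
]

by_subsets = defaultdict(list)   # n_subsets -> [(r2, runtime), ...]
r2_by_seed = defaultdict(dict)   # seed -> {n_subsets: r2}
for seed, n_subsets, r2, runtime in rows:
    by_subsets[n_subsets].append((r2, runtime))
    r2_by_seed[seed][n_subsets] = r2

# Mean test R2 and mean runtime for each subset size
aggregated = {
    n: {
        "n_seeds": len(vals),
        "mean_r2": mean(r2 for r2, _ in vals),
        "mean_runtime_sec": mean(rt for _, rt in vals),
    }
    for n, vals in sorted(by_subsets.items())
}

# Seeds whose best test R2 is identical at every subset size seen so far
unchanged_seeds = sorted(
    seed for seed, d in r2_by_seed.items()
    if len(d) > 1 and len(set(d.values())) == 1
)

for n, stats in aggregated.items():
    print(n, stats)
print("Seeds with unchanged R2 across subset sizes:", unchanged_seeds)
```

With the full DataFrame, a comparable pandas call would be `summary.groupby("n_subsets")["best_test_R2"].agg(["mean", "std"])`. In the rows shown, seeds 21 and 34 report the same `best_test_R2` at 10 and 20 subsets while `runtime_sec` roughly doubles, which is worth checking before running larger subset grids.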


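The model file logged in the WR run (`wr_model_seed21_subs10_base15_iter15.pkl`) appears to encode the run settings in its name. The sketch below parses those settings back out so a `.pkl` file can be matched to a summary row. The pattern is inferred from that single logged file name, and `MODEL_NAME_RE` and `parse_model_name` are hypothetical helpers, not functions from `src/`:

```python
import re
from pathlib import Path

# Pattern inferred from one logged file name; other artifacts may be named differently.
MODEL_NAME_RE = re.compile(
    r"^(?P<pos>[a-z]+)_model_seed(?P<seed>\d+)_subs(?P<n_subsets>\d+)"
    r"_base(?P<max_base_feats>\d+)_iter(?P<n_iter_per_model>\d+)\.pkl$"
)

def parse_model_name(path):
    """Return the run settings encoded in a model file name, or None if it does not match."""
    match = MODEL_NAME_RE.match(Path(path).name)
    if match is None:
        return None
    info = match.groupdict()
    return {
        "position": info["pos"].upper(),
        "seed": int(info["seed"]),
        "n_subsets": int(info["n_subsets"]),
        "max_base_feats": int(info["max_base_feats"]),
        "n_iter_per_model": int(info["n_iter_per_model"]),
    }

parsed = parse_model_name("wr_model_seed21_subs10_base15_iter15.pkl")
print(parsed)
```

The returned keys mirror the `seed`, `n_subsets`, `max_base_feats`, and `n_iter_per_model` columns of the summary, so the dictionary can be used to filter it directly.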
***TE***

In [6]:
# === Custom-Rank Gradient Boosting - Notebook Driver ===

from pathlib import Path
import sys
import pandas as pd
import numpy as np
import pickle
import json
import warnings

from sklearn.exceptions import ConvergenceWarning

warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=RuntimeWarning)
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=ConvergenceWarning)

# ---------------------------------------------------------------------
# Locate Dynasty repo root (must contain BOTH src/models and data/Bakery)
# ---------------------------------------------------------------------
def find_repo_root(start: Path) -> Path:
    for p in [start, *start.parents]:
        if (p / "src" / "models").exists() and (p / "data" / "Bakery").exists():
            return p
    raise FileNotFoundError(
        "Could not locate the Dynasty repo root (needs both 'src/models' and 'data/Bakery')."
    )

REPO_ROOT = find_repo_root(Path.cwd())
print("✅ REPO_ROOT:", REPO_ROOT)

# Make sure we can import from src/
if str(REPO_ROOT) not in sys.path:
    sys.path.insert(0, str(REPO_ROOT))

# ---------------------------------------------------------------------
# Imports from your project
#   Uses gradientboosting_custom_rank.py, which:
#   - reads master_list_with_ranks_{pos}.csv
#   - uses Rank only as supervision (Train_Target = -Rank_raw)
#   - derives a 0–15 score from model predictions internally
# ---------------------------------------------------------------------
from src.models.gradientboosting_custom_rank import run_seed_for_subsets
from src.utils import default_out_dir  # still used for _derived/{pos} outputs

# ---------------------------------------------------------------------
# Config
# ---------------------------------------------------------------------
position = "TE"            # RB / WR / TE / QB
seeds = [21,34,56]
subset_grid = [10,20,40]

# Hyperparameter grids
max_base_grid = [15]
n_iter_grid   = [15]

# Constraints
must_feats  = ["DOM+", "Draft Capital"]      # e.g. ["DOM+", "YPC"]
ban_feats   = ["Conference Rank", "Draft Age","RDOM+","PDOM+","Wide%","Slot%","ELU"]
must_inters = ["SpeedxBMI"]                           # e.g. ["SpeedxBMI"]
ban_inters  = ["Wide%xSlot%"]                                      # e.g. ["Wide%xSlot%"]
hierarchy   = "none"                                  # "strong" | "weak" | "none"

# ---------------------------------------------------------------------
# Confirm ranked CSV location (training source)
#   data/Rankings/ranked_by_position/master_list_with_ranks_{pos}.csv
# ---------------------------------------------------------------------
ranked_dir = REPO_ROOT / "data" / "Rankings" / "ranked_by_position"
ranked_csv_path = ranked_dir / f"master_list_with_ranks_{position}.csv"

print("Ranked CSV path:", ranked_csv_path)
assert ranked_csv_path.exists(), f"Ranked CSV not found at {ranked_csv_path}"

df_ranked = pd.read_csv(ranked_csv_path)
print(f"\nLoaded ranked CSV for position={position}:")
print(f"Shape: {df_ranked.shape}")
print("Columns:", df_ranked.columns.tolist())
print("First 5 rows:")
display(df_ranked.head())

# ---------------------------------------------------------------------
# Run wide search over subsets / seeds
#   Inside run_seed_for_subsets:
#     - target = Train_Target = -Rank_raw
#     - model outputs are later rescaled to a 0–15 score
# ---------------------------------------------------------------------

all_runs = []
for n in subset_grid:
    for max_base in max_base_grid:
        for n_iter in n_iter_grid:
            print(f"\n🔹 Running n_subsets = {n}, "
                  f"max_base_feats = {max_base}, n_iter_per_model = {n_iter}")
            try:
                res = run_seed_for_subsets(
                    position=position,
                    project_root=REPO_ROOT,
                    n_subsets=int(n),
                    seeds=seeds,
                    max_base_feats=int(max_base),
                    max_interactions=3,
                    n_iter_per_model=int(n_iter),
                    cv_folds=15,
                    test_size=0.20,
                    must_feats=must_feats,
                    ban_feats=ban_feats,
                    must_inters=must_inters,
                    ban_inters=ban_inters,
                    interaction_hierarchy=hierarchy,
                    draft_cap_cap=0.10,          # adjust as needed
                    draft_cap_lower_q=0.05,
                    draft_cap_upper_q=0.95,
                    draft_cap_importance_cap=0.1,
                    breakout_age_importance_cap=0.1,
                    draft_age_importance_cap=None
                )

                # Tag the results with the hyperparameters for easier analysis
                res = res.copy()
                res["n_subsets"] = int(n)
                res["max_base_feats"] = int(max_base)
                res["n_iter_per_model"] = int(n_iter)

                all_runs.append(res)

            except UnboundLocalError as e:
                print(f"⚠️ Warning: UnboundLocalError for "
                      f"n={n}, max_base={max_base}, n_iter={n_iter}: {e}. Skipping this run.")
                continue
            except Exception as e:
                print(f"⚠️ Error running subsets={n}, max_base={max_base}, n_iter={n_iter}: {e}. Skipping this run.")
                continue

# ---------------------------------------------------------------------
# Aggregate summary
# ---------------------------------------------------------------------
if all_runs:
    summary = pd.concat(all_runs, ignore_index=True)
else:
    summary = pd.DataFrame()
    print("⚠️ No successful runs completed; summary is an empty DataFrame.")

# Save summary under REPO_ROOT/data/Training/_derived/<POS>/
out_dir = default_out_dir(REPO_ROOT, position)
out_dir.mkdir(parents=True, exist_ok=True)

summary_path = out_dir / f"{position.lower()}_runtime_accuracy_summary.csv"
summary.to_csv(summary_path, index=False)

print("\n✅ Custom-rank run complete!")
print("Summary CSV:", summary_path)
print("💾 Trained models saved as .pkl files in:", out_dir)

# ---------------------------------------------------------------------
# Show available files
# ---------------------------------------------------------------------
model_files = list(out_dir.glob("*.pkl"))
json_files = list(out_dir.glob("*_wide_best_meta_*.json"))

if model_files:
    print("\n📁 Saved model files:")
    for model_file in sorted(model_files):
        print(f"  - {model_file.name}")
else:
    print("\n⚠️ No .pkl model files found yet.")

if json_files:
    print("\n📄 JSON metadata files with SHAP analysis:")
    for json_file in sorted(json_files):
        print(f"  - {json_file.name}")
        # Quick preview of SHAP data (if present)
        try:
            with open(json_file, 'r') as f:
                metadata = json.load(f)
            shap_info = metadata.get("shap_analysis", {})
            if "base_importance_sum" in shap_info:
                print(f"    💡 SHAP: Base importance = {shap_info['base_importance_sum']:.4f}, "
                      f"Interaction = {shap_info['interaction_importance_sum']:.4f}")
                top5 = shap_info.get("top_5_features") or []
                if top5:
                    top_feat, top_val = top5[0]
                    print(f"    ⭐ Top feature: {top_feat} ({top_val:.4f})")
        except Exception as e:
            print(f"    (Could not read SHAP metadata: {e})")
else:
    print("\n⚠️ No JSON metadata files found yet.")

summary.head()


✅ REPO_ROOT: /Users/chasesiegel/Desktop/Comp_Sci/Capstone/Dynasty
Ranked CSV path: /Users/chasesiegel/Desktop/Comp_Sci/Capstone/Dynasty/data/Rankings/ranked_by_position/master_list_with_ranks_TE.csv

Loaded ranked CSV for position=TE:
Shape: (144, 78)
Columns: ['player_name', 'pos', 'rank', 'birth_date', 'team', 'conf', 'draft_year', 'Draft Capital', 'breakout_year', 'Breakout Age', 'DOM', 'DOM+', 'PDOM', 'PDOM+', 'RDOM', 'RDOM+', 'ADJ%', 'aimed_passes', 'attempts', 'aDOT', 'ATT', 'MTF', 'big_time_throws', 'breakaway_attempts', 'Break%', 'breakaway_yards', 'BTT%', 'REC%', 'Comp%', 'completions', 'CC%', 'contested_receptions', 'contested_targets', 'def_gen_pressures', 'designed_yards', 'Drop%', 'dropbacks', 'drops', 'elu_recv_mtf', 'elu_rush_mtf', 'elu_yco', 'ELU', '10+', 'FUM', 'gap_attempts', 'hit_as_threw', 'interceptions', 'passing_snaps', 'pressure_to_sack_rate', 'qb_rating', 'rec_yards', 'receptions', 'route_rate', 'routes', 'run_plays', 'sack_percent', 'sacks', 'scramble_yards'

Unnamed: 0,player_name,pos,rank,birth_date,team,conf,draft_year,Draft Capital,breakout_year,Breakout Age,DOM,DOM+,PDOM,PDOM+,RDOM,RDOM+,ADJ%,aimed_passes,attempts,aDOT,ATT,MTF,big_time_throws,breakaway_attempts,Break%,breakaway_yards,BTT%,REC%,Comp%,completions,CC%,contested_receptions,contested_targets,def_gen_pressures,designed_yards,Drop%,dropbacks,drops,elu_recv_mtf,elu_rush_mtf,elu_yco,ELU,10+,FUM,gap_attempts,hit_as_threw,interceptions,passing_snaps,pressure_to_sack_rate,qb_rating,rec_yards,receptions,route_rate,routes,run_plays,sack_percent,sacks,scramble_yards,scrambles,Slot%,slot_snaps,targets,total_touches,turnover_worthy_plays,TWP%,Wide%,wide_snaps,yards_after_catch,YAC/R,yards_after_contact,Y/REC,YCO/A,ypa,Y/RR,zone_attempts,Speed,BMI,SpeedxBMI
0,Maxx Williams,TE,10.0,1994-04-12,Minnesota,Big Ten,2015,2,2013.0,19.389,0.488,0.488,0.0,0.0,0.0,0.0,,,,11.8,,4.0,,,,,,56.3,,,,0.0,0.0,,,7.7,,3.0,,,,,,0.0,,,2.0,,,,,36.0,94.0,205.0,,,,,,39.9,87.0,64.0,,,,5.0,11.0,302.0,8.4,,15.8,,,2.78,,4.78,30.306,144.86268
1,Clive Walford,TE,45.0,1991-10-01,Miami (FL),ACC,2015,3,2014.0,22.919,0.239,0.227,0.0,0.0,0.0,0.0,,,,9.5,,8.0,,,,,,77.2,,,,0.0,0.0,,,4.3,,2.0,,,,,,1.0,,,3.0,,,,,44.0,80.8,206.0,,,,,,41.2,105.0,57.0,,,,4.7,12.0,295.0,6.7,,15.3,,,3.28,,4.79,30.549,146.32971
2,Tyler Kroft,TE,109.0,1992-10-15,Rutgers,Big Ten,2015,3,,,0.184,0.184,0.0,0.0,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,4.75,29.168,138.548
3,Jeff Heuerman,TE,124.0,1992-11-24,Ohio State,Big Ten,2015,3,,,0.135,0.135,0.0,0.0,0.001,0.001,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,4.81,30.117,144.86277
4,Blake Bell,TE,174.0,1991-08-07,Oklahoma,Big 12,2015,4,2011.0,20.068,0.158,0.15,0.0,0.0,0.001,0.001,,,,11.8,,1.0,,,,,,55.2,,,,0.0,0.0,,,11.1,,2.0,,,,8.3,,0.0,,,3.0,,,,,16.0,79.6,257.0,,,,,,44.3,143.0,29.0,,,,0.0,0.0,76.0,4.8,,13.4,,,0.83,,4.8,29.118,139.7664



🔹 Running n_subsets = 10, max_base_feats = 15, n_iter_per_model = 15

Position          : TE
CSV path          : /Users/chasesiegel/Desktop/Comp_Sci/Capstone/Dynasty/data/Rankings/ranked_by_position/master_list_with_ranks_TE.csv
Output directory  : /Users/chasesiegel/Desktop/Comp_Sci/Capstone/Dynasty/data/Training/_derived/TE
Hierarchy         : none
Must feats        : ['DOM+', 'Draft Capital']
Ban feats         : ['Conference Rank', 'Draft Age', 'RDOM+', 'PDOM+', 'Wide%', 'Slot%', 'ELU']
Must inters       : ['SpeedxBMI']
Ban inters        : ['Wide%xSlot%']
n_subsets         : 10
max_base_feats    : 15
n_iter_per_model  : 15
DraftCap limiter  : cap=0.1, lower_q=0.05, upper_q=0.95

[TE] Rows after filtering: 144 | Feature cols: 19
💾 Saved model: /Users/chasesiegel/Desktop/Comp_Sci/Capstone/Dynasty/data/Training/_derived/TE/te_model_seed21_subs10_base15_iter15.pkl
📄 Saved metadata with SHAP results: /Users/chasesiegel/Desktop/Comp_Sci/Capstone/Dynasty/data/Training/_derived/TE

Unnamed: 0,position,seed,n_subsets,max_base_feats,n_iter_per_model,best_test_R2,best_test_MAE,best_test_RMSE,runtime_sec,leaderboard_csv,predictions_csv,metadata_json,model_pickle,best_model_tag,best_bases,best_interactions
0,TE,21,10,15,15,0.728786,22.083745,28.332641,51.876865,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,GB,DOM+|Draft Capital|Y/REC|Drop%|CC%|aDOT|Y/RR|R...,SpeedxBMI
1,TE,34,10,15,15,0.627565,28.634647,36.048779,54.765969,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,GB,DOM+|Draft Capital|REC%|CC%|Y/REC|YAC/R|Drop%|...,SpeedxBMI
2,TE,56,10,15,15,0.601757,31.810377,41.218913,48.122377,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,GB,DOM+|Draft Capital|Y/RR|aDOT|Y/REC|CC%|MTF,SpeedxBMI
3,TE,21,20,15,15,0.728786,22.083745,28.332641,92.629151,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,GB,DOM+|Draft Capital|Y/REC|Drop%|CC%|aDOT|Y/RR|R...,SpeedxBMI
4,TE,34,20,15,15,0.627565,28.634647,36.048779,95.182849,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,/Users/chasesiegel/Desktop/Comp_Sci/Capstone/D...,GB,DOM+|Draft Capital|REC%|CC%|Y/REC|YAC/R|Drop%|...,SpeedxBMI
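
The driver comments say the target is `Train_Target = -Rank_raw` and that model outputs are later rescaled to a 0–15 score, but the notebook does not show how. The sketch below illustrates one plausible version of both steps. It assumes `Rank_raw` corresponds to the `rank` column and uses plain min-max scaling, which may differ from what `gradientboosting_custom_rank.py` actually does; `to_train_target` and `rescale_0_15` are hypothetical names:

```python
def to_train_target(rank_raw):
    """Negate a rank so that better (lower) ranks get larger targets."""
    return -float(rank_raw)

def rescale_0_15(values, lo=0.0, hi=15.0):
    """Min-max rescale values onto [lo, hi] (assumed scheme, not taken from the project)."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Degenerate case: every value is identical, so place them mid-scale
        return [(lo + hi) / 2.0 for _ in values]
    span = v_max - v_min
    return [lo + (hi - lo) * (v - v_min) / span for v in values]

# `rank` values of the five TE rows displayed earlier in this output
ranks = [10.0, 45.0, 109.0, 124.0, 174.0]
targets = [to_train_target(r) for r in ranks]

# For illustration only, treat the targets as if they were model predictions
scores = rescale_0_15(targets)

for r, s in zip(ranks, scores):
    print(f"rank {r:>5.0f} -> score {s:5.2f}")
```

Negating the rank means a higher prediction always indicates a better player, so the rescaled score keeps the same ordering as the original ranking.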
