# CFP Resume Analysis - Team Sheets

This notebook provides committee-style resume analysis for all teams, focusing on:
- Strength of Record (SOR) - quality of wins given schedule
- Strength of Schedule (SOS) - difficulty of opponents faced
- Quality Wins - wins vs top-tier opponents
- Bad Losses - losses to teams ranked outside the Top 25
- Conference championship status
- Resume vs Predictive comparison

This separates **what teams have accomplished** (resume) from **how good they are** (predictive power).

In [20]:
# Cell 1: Setup and Imports
import sys
import os
import pandas as pd
import numpy as np
import json
from pathlib import Path

# Add src to path
sys.path.insert(0, os.path.abspath('..'))

from src.utils.metrics import (
    calculate_sor,
    calculate_sos,
    calculate_quality_wins,
    identify_bad_losses,
    build_resume_dataframe,
    compare_resume_vs_predictive
)

# Create output directories
output_dir = Path('./data/output')
rankings_dir = output_dir / 'rankings'
exports_dir = output_dir / 'exports'
rankings_dir.mkdir(parents=True, exist_ok=True)
exports_dir.mkdir(parents=True, exist_ok=True)

print("✅ Imports loaded successfully")
print(f"Output directory: {output_dir}")

✅ Imports loaded successfully
Output directory: data/output


In [21]:
# Cell 2: Load Previous Results
year = 2025
week = 15
# Load composite rankings from previous notebook
final_rankings = pd.read_csv(rankings_dir / f'composite_rankings_{year}_week{week}.csv')

# Load games data
games_df = pd.read_csv(Path(f'./data/cache/{year}') / f'games_w{week}.csv')

print(f'Loaded rankings for {len(final_rankings)} teams')
print(f'Loaded {len(games_df)} games')
print(f'\nTop 5 teams:')
print(final_rankings[['rank', 'team', 'wins', 'losses', 'composite_score']].head())

Loaded rankings for 136 teams
Loaded 752 games

Top 5 teams:
   rank        team  wins  losses  composite_score
0     1  Ohio State    11       0         0.976192
1     2     Indiana    11       0         0.967141
2     3         BYU    10       1         0.892947
3     4      Oregon    10       1         0.864353
4     5  Texas Tech    10       1         0.849449


---

## Strength of Record (SOR)

**Definition:** SOR is based on the probability that an average Top-25 team would achieve at least this team's record against this exact schedule. The score reported here is `-log10` of that probability, so a higher score means a less likely, and therefore more impressive, record.

**Why it matters:** SOR evaluates **how impressive** a team's record is given who they played. A 10-2 team with a brutal schedule can have better SOR than an 11-1 team with weak opponents.

**Calculation:** Uses game-by-game win probabilities for an average Top-25 team (baseline rating ~0.75) against each opponent, computes the probability of reaching at least the observed number of wins, and reports the negative base-10 logarithm of that probability.
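
As a rough illustration of this calculation, the sketch below runs a small dynamic program over per-game win probabilities to get the chance of at least `wins` victories, then takes `-log10` of it (matching the later code comment that `sor_score = -log10(prob)`). The normal-CDF link in `win_probability`, all function names, and the `1e-12` clamp are assumptions made for this example; the actual `calculate_sor` in `src.utils.metrics` may work differently.

```python
import math

def win_probability(baseline, opp_rating, scale=0.25):
    """Hypothetical link: normal CDF of the rating gap (an assumption)."""
    z = (baseline - opp_rating) / scale
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_at_least(win_probs, k):
    """P(wins >= k) for independent games with different win probabilities."""
    dist = [1.0]  # dist[j] = P(exactly j wins in the games processed so far)
    for p in win_probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            nxt[j] += q * (1.0 - p)   # lose this game
            nxt[j + 1] += q * p       # win this game
        dist = nxt
    return sum(dist[k:])

def sor_sketch(wins, opp_ratings, baseline=0.75, scale=0.25):
    """-log10 of the chance that the baseline team wins at least `wins` games."""
    probs = [win_probability(baseline, r, scale) for r in opp_ratings]
    p = prob_at_least(probs, wins)
    return -math.log10(max(p, 1e-12))  # clamp avoids log10(0)

example_score = sor_sketch(wins=2, opp_ratings=[0.75, 0.75])
```

Against two opponents rated 0.75, the baseline team is a coin flip in each game, so a 2-0 record has probability 0.25 and a score of about 0.602. A winless record has probability 1 and a score of 0, which is consistent with the 0-11 teams at the bottom of the SOR output.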

In [22]:
# Cell 3: Calculate Strength of Record (SOR)
from scipy.stats import norm
from sklearn.preprocessing import MinMaxScaler

# Build opponent ratings dictionary from PREDICTIVE scores (not composite!)
# Use predictive rankings to represent team strength for SOR calculation
scaler = MinMaxScaler()

# Option 1: Use the predictive_score from loaded rankings
if 'predictive_score' in final_rankings.columns:
    # Normalize predictive scores to 0-1
    final_rankings['normalized_predictive'] = scaler.fit_transform(
        final_rankings[['predictive_score']]
    )
    opponent_ratings = dict(zip(
        final_rankings['team'],
        final_rankings['normalized_predictive']
    ))
    print("✅ Using predictive_score for opponent ratings")

# Option 2: Calculate from Massey + Elo if predictive_score not available
else:
    # Load component ratings here; they are not otherwise read until Cell 8
    massey_rankings = pd.read_csv(rankings_dir / f'massey_rankings_{year}_week{week}.csv')
    elo_rankings = pd.read_csv(rankings_dir / f'elo_rankings_{year}_week{week}.csv')
    components = massey_rankings[['team', 'massey_rating']].merge(
        elo_rankings[['team', 'elo_rating']], on='team'
    )

    # Normalize each rating across ALL teams at once
    # (fitting the scaler on a single value always returns 0)
    normalized = scaler.fit_transform(components[['massey_rating', 'elo_rating']])

    # Average the two normalized ratings per team
    opponent_ratings = dict(zip(components['team'], normalized.mean(axis=1)))
    print("✅ Using Massey + Elo for opponent ratings")

print(f"Opponent ratings range: {min(opponent_ratings.values()):.4f} to {max(opponent_ratings.values()):.4f}")

# Calculate SOR for each team
sor_scores = {}

for _, team_row in final_rankings.iterrows():
    team = team_row['team']
    
    # Get team's games
    team_games = games_df[
        (games_df['home_team'] == team) | (games_df['away_team'] == team)
    ]
    
    # Build opponent ratings list
    opp_ratings_list = []
    for _, game in team_games.iterrows():
        opponent = game['away_team'] if game['home_team'] == team else game['home_team']
        opp_rating = opponent_ratings.get(opponent, 0.5)
        opp_ratings_list.append(opp_rating)
    
    # Calculate SOR
    team_record = {
        'wins': int(team_row['wins']),
        'losses': int(team_row['losses'])
    }
    
    sor_score = calculate_sor(
        team_record=team_record,
        opponent_ratings=opp_ratings_list,
        baseline_rating=0.75,
        rating_scale=0.25
    )
    
    sor_scores[team] = sor_score

# Rank SOR scores (higher SOR score = better record given schedule)
sor_series = pd.Series(sor_scores)
sor_ranks = sor_series.rank(method='min', ascending=False).to_dict()

print("\n✅ SOR calculated for all teams")
print(f"SOR range: {sor_series.min():.4f} to {sor_series.max():.4f}")

print("\nTop 10 Teams by SOR (best records given schedule):")
top_sor = sor_series.sort_values(ascending=False).head(10)
for team, score in top_sor.items():
    rank = int(sor_ranks[team])
    wins = final_rankings[final_rankings['team'] == team]['wins'].values[0]
    losses = final_rankings[final_rankings['team'] == team]['losses'].values[0]
    print(f"  {rank:2d}. {team:<25} {wins}-{losses}  SOR={score:.4f}")

print("\nBottom 10 Teams by SOR (worst records given schedule):")
bottom_sor = sor_series.sort_values().head(10)
for team, score in bottom_sor.items():
    rank = int(sor_ranks[team])
    wins = final_rankings[final_rankings['team'] == team]['wins'].values[0]
    losses = final_rankings[final_rankings['team'] == team]['losses'].values[0]
    print(f"  {rank:2d}. {team:<25} {wins}-{losses}  SOR={score:.4f}")

✅ Using predictive_score for opponent ratings
Opponent ratings range: 0.0000 to 1.0000

✅ SOR calculated for all teams
SOR range: -0.0000 to 1.3045

Top 10 Teams by SOR (best records given schedule):
   1. Indiana                   11-0  SOR=1.3045
   2. Ohio State                11-0  SOR=1.1105
   3. BYU                       10-1  SOR=0.9398
   4. Oregon                    10-1  SOR=0.8235
   5. Texas A&M                 10-1  SOR=0.6852
   6. Georgia                   10-1  SOR=0.6323
   7. Ole Miss                  10-1  SOR=0.5892
   8. Texas Tech                10-1  SOR=0.5673
   9. Oklahoma                  9-2  SOR=0.4791
  10. Alabama                   9-2  SOR=0.4772

Bottom 10 Teams by SOR (worst records given schedule):
  133. Massachusetts             0-11  SOR=-0.0000
  133. Charlotte                 0-11  SOR=-0.0000
  133. Georgia State             0-11  SOR=-0.0000
  133. Oklahoma State            0-11  SOR=-0.0000
  132. UTEP                      1-10  SOR=0.0000
  

---

## Strength of Schedule (SOS)

**Definition:** Average quality of opponents faced, accounting for both:
1. Opponents' records (direct strength)
2. Opponents' opponents' records (OOR - prevents inflated opponent records)

**Why it matters:** Prevents rewarding teams whose opponents only looked good because they beat weak teams.

**Calculation:** Weighted combination of opponents' win percentage (66.7%) and opponents' opponents' win percentage (33.3%).
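
A minimal sketch of this weighting, assuming OR and OOR are simple averages of win percentages computed from `(wins, losses)` tuples. The helper names here are hypothetical and the real `calculate_sos` in `src.utils.metrics` may aggregate records differently.

```python
def win_pct(wins, losses):
    """Win percentage for a (wins, losses) record; 0.0 when no games were played."""
    games = wins + losses
    return wins / games if games else 0.0

def mean(values):
    """Arithmetic mean that returns 0.0 for an empty input."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0

def sos_sketch(opponents_records, opponents_opp_records):
    """SOS = (2 * OR + OOR) / 3, assuming plain averages of win percentages."""
    or_pct = mean(win_pct(w, l) for w, l in opponents_records)
    oor_pct = mean(
        mean(win_pct(w, l) for w, l in records)
        for records in opponents_opp_records
    )
    return (2 * or_pct + oor_pct) / 3

# Two opponents (8-2 and 5-5); their own opponents average a .500 record
example_sos = sos_sketch(
    opponents_records=[(8, 2), (5, 5)],
    opponents_opp_records=[[(6, 4), (4, 6)], [(5, 5)]],
)
```

Here OR is 0.65 and OOR is 0.50, so the result is (2 × 0.65 + 0.50) / 3 = 0.60.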

In [23]:
# Cell 4: Calculate Strength of Schedule (SOS) using Win Percentages
# Formula: SOS = (2*OR + OOR)/3 where OR and OOR are win percentages
def calculate_team_sos(team, games_df, include_oor=True, oor_weight=0.33):
    """
    Calculate Strength of Schedule using metrics module.
    Includes opponent's opponents' records (OOR) to prevent inflated schedules.
    Formula: SOS = (2*OR + OOR)/3 where OR and OOR are win percentages
    """
    team_games = games_df[(games_df['home_team'] == team) | (games_df['away_team'] == team)]
    
    opponents = []
    for _, game in team_games.iterrows():
        opponent = game['away_team'] if game['home_team'] == team else game['home_team']
        opponents.append(opponent)
    
    # Get opponent records
    opponents_records = []
    opponents_opp_records = []
    
    for opp in opponents:
        opp_games = games_df[(games_df['home_team'] == opp) | (games_df['away_team'] == opp)]
        
        opp_wins = 0
        opp_losses = 0
        
        # Get opponent's record - EXCLUDE the game against the original team
        # This prevents circular dependency (team beating opponent makes opponent look worse)
        for _, g in opp_games.iterrows():
            # Skip the game between this team and the opponent
            if (g['home_team'] == team and g['away_team'] == opp) or \
               (g['home_team'] == opp and g['away_team'] == team):
                continue
            
            if g['home_team'] == opp:
                if g['home_score'] > g['away_score']:
                    opp_wins += 1
                else:
                    opp_losses += 1
            else:
                if g['away_score'] > g['home_score']:
                    opp_wins += 1
                else:
                    opp_losses += 1
        
        opponents_records.append((opp_wins, opp_losses))
        
        # Get opponent's opponents' records (for OOR calculation)
        opp_opp_records = []
        for _, g in opp_games.iterrows():
            opp_opp = g['away_team'] if g['home_team'] == opp else g['home_team']
            
            # Skip if opponent's opponent is the original team (circular)
            if opp_opp == team:
                continue
            
            opp_opp_games = games_df[
                (games_df['home_team'] == opp_opp) | (games_df['away_team'] == opp_opp)
            ]
            
            opp_opp_wins = 0
            opp_opp_losses = 0
            
            for _, gg in opp_opp_games.iterrows():
                if gg['home_team'] == opp_opp:
                    if gg['home_score'] > gg['away_score']:
                        opp_opp_wins += 1
                    else:
                        opp_opp_losses += 1
                else:
                    if gg['away_score'] > gg['home_score']:
                        opp_opp_wins += 1
                    else:
                        opp_opp_losses += 1
            
            opp_opp_records.append((opp_opp_wins, opp_opp_losses))
        
        opponents_opp_records.append(opp_opp_records)
    
    # Use metrics module SOS calculation
    sos_score = calculate_sos(
        opponents_records=opponents_records,
        opponents_opp_records=opponents_opp_records,
        include_oor=include_oor,
        oor_weight=oor_weight
    )
    
    return sos_score

# Calculate SOS for each team using win percentages
sos_scores = {}
for team in final_rankings['team']:
    sos_score = calculate_team_sos(team, games_df, include_oor=True, oor_weight=0.33)
    sos_scores[team] = sos_score

# Rank SOS scores (higher SOS = tougher schedule)
sos_series = pd.Series(sos_scores)
sos_ranks = sos_series.rank(method='min', ascending=False).to_dict()

print("✅ SOS calculated for all teams using win percentages")
print(f"   Formula: SOS = (2*OR + OOR)/3")
print(f"   Where OR = Opponents' win%, OOR = Opponents' opponents' win%")
print(f"SOS range: {sos_series.min():.3f} to {sos_series.max():.3f}")

print("\nTop 10 Toughest Schedules:")
top_sos = sos_series.sort_values(ascending=False).head(10)
for team, score in top_sos.items():
    rank = int(sos_ranks[team])
    wins = final_rankings[final_rankings['team'] == team]['wins'].values[0]
    losses = final_rankings[final_rankings['team'] == team]['losses'].values[0]
    print(f"  {rank:2d}. {team:<25} {wins}-{losses}  SOS={score:.3f}")

print("\nTop 10 Easiest Schedules:")
bottom_sos = sos_series.sort_values().head(10)
for team, score in bottom_sos.items():
    rank = int(sos_ranks[team])
    wins = final_rankings[final_rankings['team'] == team]['wins'].values[0]
    losses = final_rankings[final_rankings['team'] == team]['losses'].values[0]
    print(f"  {rank:2d}. {team:<25} {wins}-{losses}  SOS={score:.3f}")


✅ SOS calculated for all teams using win percentages
   Formula: SOS = (2*OR + OOR)/3
   Where OR = Opponents' win%, OOR = Opponents' opponents' win%
SOS range: 0.385 to 0.618

Top 10 Toughest Schedules:
   1. Florida                   3-8  SOS=0.618
   2. Wisconsin                 4-8  SOS=0.617
   3. Purdue                    1-10  SOS=0.607
   4. South Carolina            3-8  SOS=0.599
   5. UCLA                      3-9  SOS=0.595
   6. Arkansas                  1-10  SOS=0.589
   7. LSU                       6-5  SOS=0.585
   8. West Virginia             3-8  SOS=0.585
   9. Virginia Tech             2-9  SOS=0.584
  10. Oklahoma                  9-2  SOS=0.580

Top 10 Easiest Schedules:
  136. Akron                     4-7  SOS=0.385
  135. UConn                     8-3  SOS=0.391
  134. Fresno State              7-4  SOS=0.413
  133. Ohio                      7-4  SOS=0.414
  132. Bowling Green             3-8  SOS=0.422
  131. Central Michigan          6-5  SOS=0.422
  130. No

---

## Quality Wins and Bad Losses

**Quality Wins:** Wins against high-ranked opponents demonstrate ability to beat elite competition.
- Top 5 wins: Elite wins
- Top 12 wins: Playoff-caliber wins
- Top 25 wins: Quality wins
- **Road/Neutral Quality Wins:** The committee values wins against Top 25 teams away from home significantly more than home wins. These are tracked separately as a key differentiator.

**Bad Losses:** Losses to teams ranked outside Top 25 hurt resume significantly.

**FCS Handling:** Games against lower-division (FCS) opponents are treated as rank 999 and do not count toward quality wins, ensuring accurate resume evaluation.

**Evaluation Method:** Opponents are evaluated at their **current strength** (from final rankings) rather than at-the-time-played rank, aligning with the committee's "body of work" philosophy.
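
The bucketing described above can be sketched as follows. This assumes the tiers are cumulative (a Top 5 win also counts as a Top 12 and Top 25 win) and that the bad-loss helper returns a count plus the offending ranks; both function names are hypothetical stand-ins, not the actual `calculate_quality_wins` and `identify_bad_losses` from `src.utils.metrics`.

```python
def quality_wins_sketch(win_opp_ranks, thresholds=None):
    """Count wins at or above each rank cutoff (tiers assumed cumulative)."""
    thresholds = thresholds or {'top_5': 5, 'top_12': 12, 'top_25': 25}
    return {
        name: sum(1 for rank in win_opp_ranks if rank <= cutoff)
        for name, cutoff in thresholds.items()
    }

def bad_losses_sketch(loss_opp_ranks, threshold=25):
    """Return (count, ranks) for losses to opponents ranked below the threshold."""
    bad = [rank for rank in loss_opp_ranks if rank > threshold]
    return len(bad), bad

# Rank 999 stands in for an FCS or otherwise unranked opponent
wins_example = quality_wins_sketch([3, 10, 20, 40, 999])
losses_example = bad_losses_sketch([8, 30, 999])
```

With these inputs the win counts are 1 (Top 5), 2 (Top 12), and 3 (Top 25). The FCS win at rank 999 adds nothing, while a loss to that same opponent would register as a bad loss.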

In [24]:
# Cell 5: Calculate Quality Wins, Bad Losses, and Road Context
# Enhanced with road/neutral quality wins tracking per CFP Committee protocols
quality_wins_data = {}
bad_losses_data = {}
vs_top25_data = {}
road_top25_wins_data = {}  # NEW: Track road/neutral ranked wins (key differentiator)

# Create a lookup for quick rank access (evaluates opponents at current strength)
team_rank_dict = dict(zip(final_rankings['team'], final_rankings['rank']))

for team in final_rankings['team']:
    team_games = games_df[
        (games_df['home_team'] == team) | (games_df['away_team'] == team)
    ]
    
    win_opponents = []
    loss_opponents = []
    road_ranked_wins = 0  # Counter for road/neutral quality wins
    
    for _, game in team_games.iterrows():
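        # Determine if neutral site (critical for CFP evaluation).
        # A missing value (NaN) is truthy in Python, so coerce it to False explicitly.
        neutral_flag = game.get('neutral_site', False)
        is_neutral = bool(neutral_flag) if pd.notna(neutral_flag) else False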
        
        if game['home_team'] == team:
            opponent = game['away_team']
            won = game['home_score'] > game['away_score']
            is_road_win = False  # Home win
        else:
            opponent = game['home_team']
            won = game['away_score'] > game['home_score']
            is_road_win = True  # True road game
            
        opp_rank = team_rank_dict.get(opponent, 999)  # FCS/unranked teams = rank 999
        
        if won:
            win_opponents.append(opponent)
            # Check for High Quality Road/Neutral Win (Committee Differentiator)
            # The committee values wins against Top 25 teams away from home significantly more
            if opp_rank <= 25 and (is_road_win or is_neutral):
                road_ranked_wins += 1
        else:
            loss_opponents.append(opponent)
    
    # Calculate quality wins (using existing function)
    win_opp_ranks = [team_rank_dict.get(opp, 999) for opp in win_opponents]
    qw = calculate_quality_wins(
        opponent_ranks=win_opp_ranks,
        thresholds={'top_5': 5, 'top_12': 12, 'top_25': 25}
    )
    quality_wins_data[team] = qw
    road_top25_wins_data[team] = road_ranked_wins  # Store road/neutral quality wins
    
    # Calculate bad losses (losses to teams outside Top 25)
    loss_opp_ranks = [team_rank_dict.get(opp, 999) for opp in loss_opponents]
    bl_count, bl_ranks = identify_bad_losses(
        loss_opponent_ranks=loss_opp_ranks,
        threshold=25
    )
    bad_losses_data[team] = bl_count
    
    # Format Top 25 Record string (e.g., "3-1")
    top25_wins = qw['top_25']
    top25_losses = sum(1 for rank in loss_opp_ranks if rank <= 25)
    vs_top25_data[team] = f"{top25_wins}-{top25_losses}"

print("✅ Quality wins, road value, and bad losses calculated")
print("\nTeams with Most Top 25 Wins:")
top25_wins_sorted = sorted(
    [(team, qw['top_25']) for team, qw in quality_wins_data.items()],
    key=lambda x: x[1],
    reverse=True
)[:10]
for team, count in top25_wins_sorted:
    print(f"  {team:<25} {count} wins vs Top 25")

print("\nTeams with Most Road/Neutral Top 25 Wins (Committee Differentiator):")
road_top25_sorted = sorted(
    [(team, count) for team, count in road_top25_wins_data.items()],
    key=lambda x: x[1],
    reverse=True
)[:10]
for team, count in road_top25_sorted:
    if count > 0:
        print(f"  {team:<25} {count} road/neutral wins vs Top 25")


✅ Quality wins, road value, and bad losses calculated

Teams with Most Top 25 Wins:
  Ohio State                3 wins vs Top 25
  Indiana                   3 wins vs Top 25
  Texas                     3 wins vs Top 25
  BYU                       2 wins vs Top 25
  Oregon                    2 wins vs Top 25
  Texas Tech                2 wins vs Top 25
  Georgia                   2 wins vs Top 25
  Ole Miss                  2 wins vs Top 25
  Oklahoma                  2 wins vs Top 25
  Alabama                   2 wins vs Top 25

Teams with Most Road/Neutral Top 25 Wins (Committee Differentiator):
  Ohio State                2 road/neutral wins vs Top 25
  Indiana                   2 road/neutral wins vs Top 25
  BYU                       1 road/neutral wins vs Top 25
  Oregon                    1 road/neutral wins vs Top 25
  Texas Tech                1 road/neutral wins vs Top 25
  Texas A&M                 1 road/neutral wins vs Top 25
  Ole Miss                  1 road/neutral wins 

---

## Conference Champions

Conference championship status is critical for CFP selection (automatic bid consideration).

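**Simulation Logic:**
1. **Identify Participants:** Top 2 teams in each conference based on:
   - Conference win percentage (primary)
   - Head-to-head record among the tied teams, applied first only when one team beat every other tied team or the tied teams all played the same number of games against each other
   - Conference strength of schedule (cumulative win percentage of conference opponents), applied first when the head-to-head records are unbalanced
   - Composite rank (final tiebreaker)
2. **Simulate Championship Game:** The team with the higher predictive score (Massey + Elo) wins; the pick is deterministic
3. **Allow Upsets:** The second seed wins, and is flagged as an upset, when its predictive score is at least as high as the first seed's
4. **Skip Independents:** Teams without a conference cannot be conference champions; the code also excludes the Pac-12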

This ensures accurate playoff selection per the 5+7 format, where the 5 highest-ranked conference champions receive automatic bids.
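
The championship-game step can be sketched in a few lines. This mirrors the comparison used in the cell that follows (the first seed wins only with a strictly higher predictive score); the function name and the two teams are hypothetical.

```python
def simulate_title_game(seed1, seed2):
    """Pick a winner from two seeds, each a dict with 'team' and 'predictive'.

    Returns (winner_name, is_upset). Ties go to the second seed, matching
    the `spread > 0` test in the notebook cell.
    """
    spread = seed1['predictive'] - seed2['predictive']
    winner = seed1 if spread > 0 else seed2
    return winner['team'], winner is seed2

# Hypothetical seeds for illustration only
favorite = {'team': 'Team A', 'predictive': 0.90}
underdog = {'team': 'Team B', 'predictive': 0.80}

chalk_result = simulate_title_game(favorite, underdog)
upset_result = simulate_title_game(underdog, favorite)
```

Because seeding is decided by conference record and tiebreakers while the winner is decided by predictive score, the second seed can carry the stronger rating, and that is the only way an upset is produced.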

In [25]:
# Cell 6: Simulate Conference Championship Games (Advanced "Waterfall" Protocol)

import numpy as np

# 1. SETUP: Build Data Structures
team_conferences = {}
for _, game in games_df.iterrows():
    if pd.notna(game.get('home_conference')):
        team_conferences[game['home_team']] = game['home_conference']
    if pd.notna(game.get('away_conference')):
        team_conferences[game['away_team']] = game['away_conference']

# Pre-calculate ranks for O(1) lookup
team_ranks = dict(zip(final_rankings['team'], final_rankings['rank']))
predictive_scores = dict(zip(final_rankings['team'], final_rankings.get('predictive_score', [0.5]*len(final_rankings))))

# 2. HELPER: Calculate Basic Conference Record
def get_conf_record(team, conf, games_df):
    """Returns (wins, losses, win_pct, list_of_conf_opponents)"""
    team_games = games_df[
        ((games_df['home_team'] == team) | (games_df['away_team'] == team))
    ]
    wins = 0
    losses = 0
    opponents = []
    
    for _, g in team_games.iterrows():
        opp = g['away_team'] if g['home_team'] == team else g['home_team']
        opp_conf = team_conferences.get(opp)
        
        if opp_conf == conf:
            opponents.append(opp)
            # Check if won
            if g['home_team'] == team and g['home_score'] > g['away_score']: wins += 1
            elif g['away_team'] == team and g['away_score'] > g['home_score']: wins += 1
            else: losses += 1
            
    win_pct = wins / (wins + losses) if (wins + losses) > 0 else 0
    return wins, losses, win_pct, opponents

# 3. HELPER: Advanced Tiebreaker Metrics
def calculate_pool_stats(team, pool_teams, games_df, team_opponents_map):
    """
    Step 1 of Tiebreaker: Calculate record ONLY against other tied teams.
    Returns: (win_pct, games_played, wins)
    """
    pool_wins = 0
    pool_games = 0
    
    # Get pre-calculated list of conf opponents
    conf_opps = team_opponents_map.get(team, [])
    
    for opp in conf_opps:
        if opp in pool_teams and opp != team:
            pool_games += 1
            # Find specific game result
            game = games_df[
                ((games_df['home_team'] == team) & (games_df['away_team'] == opp)) |
                ((games_df['home_team'] == opp) & (games_df['away_team'] == team))
            ]
            if not game.empty:
                g = game.iloc[0]
                if g['home_team'] == team and g['home_score'] > g['away_score']: pool_wins += 1
                elif g['away_team'] == team and g['away_score'] > g['home_score']: pool_wins += 1
    
    pct = pool_wins / pool_games if pool_games > 0 else 0.0
    return pct, pool_games, pool_wins

def calculate_conf_sos(team, team_records_map, team_opponents_map):
    """
    Step 2 of Tiebreaker: Cumulative Opponents' Winning Percentage.
    Formula: Sum(Opponents' Wins) / Sum(Opponents' Games)
    """
    conf_opps = team_opponents_map.get(team, [])
    if not conf_opps: return 0.0
    
    total_opp_wins = 0
    total_opp_games = 0
    
    for opp in conf_opps:
        if opp in team_records_map:
            rec = team_records_map[opp]
            total_opp_wins += rec['wins']
            total_opp_games += (rec['wins'] + rec['losses'])
            
    return total_opp_wins / total_opp_games if total_opp_games > 0 else 0.0

# 4. CORE LOGIC: Resolve the Top 2 Teams
def resolve_conference_seeds(teams, games_df, team_opponents_map, team_records_map):
    """
    Returns the top 2 teams sorted by cascading tiebreakers.
    """
    df = pd.DataFrame(teams)
    # Initial Sort: Conf Win % (Primary)
    df = df.sort_values('win_pct', ascending=False)
    
    final_order = []
    
    # Process groups of tied teams
    grouped = df.groupby('win_pct', sort=False)
    
    for win_pct, group in grouped:
        if len(final_order) >= 2:
            break
            
        group_teams = group.to_dict('records')
        
        if len(group_teams) == 1:
            final_order.append(group_teams[0])
        else:
            # --- TIEBREAKER LOGIC ---
            pool_names = [t['team'] for t in group_teams]
            
            # 1. Calculate Pool Stats
            for t in group_teams:
                t['pool_pct'], t['pool_games'], t['pool_wins'] = calculate_pool_stats(
                    t['team'], pool_names, games_df, team_opponents_map
                )
            
            # 2. Calculate SOS (Conference Opponents Only)
            for t in group_teams:
                t['conf_sos'] = calculate_conf_sos(t['team'], team_records_map, team_opponents_map)
            
            # 3. Analyze Balance
            game_counts = [t['pool_games'] for t in group_teams]
            is_balanced = len(set(game_counts)) == 1 and game_counts[0] > 0
            
            # Check for a "Sweeper" (Beat everyone else in the pool)
            # Necessary for unbalanced schedules where one team went 2-0 vs a pool of 4
            sweeper = None
            for t in group_teams:
                if t['pool_wins'] == len(group_teams) - 1:
                    sweeper = t['team']
                    break
            
            # --- SORTING STRATEGY ---
            if sweeper:
                # Scenario A: One team beat EVERYONE. They win H2H automatically.
                print(f"  [Tiebreaker] {sweeper} sweeps the {len(group_teams)}-team pool.")
                sorted_group = sorted(group_teams, key=lambda x: (x['team'] != sweeper, -x['conf_sos']))
                
            elif is_balanced:
                # Scenario B: Round Robin / Balanced. Use Pool Win %.
                sorted_group = sorted(
                    group_teams,
                    key=lambda x: (-x['pool_pct'], -x['conf_sos'], x['rank'])
                )
            else:
                # Scenario C (The Duke Fix): Unbalanced. Skip H2H, use SOS.
                # Duke (0-0 in pool) vs SMU (1-0 in pool). SMU didn't play everyone, so H2H is invalid.
                # We sort primarily by SOS.
                if len(group_teams) > 2:
                    print(f"  [Tiebreaker] Unbalanced H2H at {win_pct:.3f} ({len(group_teams)} teams). Skipping to SOS.")
                    
                sorted_group = sorted(
                    group_teams,
                    key=lambda x: (-x['conf_sos'], -x['pool_pct'], x['rank'])
                )
                
                # Debug output for verification
                if len(group_teams) > 2:
                    for i, t in enumerate(sorted_group):
                        print(f"    {i+1}. {t['team']}: SOS {t['conf_sos']:.3f} | Pool {t['pool_pct']:.3f} ({t['pool_games']}g)")

            final_order.extend(sorted_group)
            
    return final_order[:2]

# 5. EXECUTION
conf_champions = {}
valid_conferences = [c for c in set(team_conferences.values()) if c not in ['FBS Independents', 'Independent', 'Pac-12']]

print(f"{'CONFERENCE CHAMPIONSHIP SIMULATION':^80}")
print("=" * 80)

for conf in sorted(valid_conferences):
    # Get teams
    conf_team_names = [t for t, c in team_conferences.items() if c == conf]
    if not conf_team_names: continue
    
    # Pre-calculate data
    conf_data = []
    team_opponents_map = {} 
    team_records_map = {}   
    
    for team in conf_team_names:
        if team not in team_ranks: continue
        w, l, pct, opps = get_conf_record(team, conf, games_df)
        
        data = {
            'team': team, 'conference': conf,
            'wins': w, 'losses': l, 'win_pct': pct,
            'rank': team_ranks[team],
            'predictive': predictive_scores.get(team, 0.5)
        }
        conf_data.append(data)
        team_opponents_map[team] = opps
        team_records_map[team] = data

    # Resolve Top 2
    top_2 = resolve_conference_seeds(conf_data, games_df, team_opponents_map, team_records_map)
    
    if len(top_2) == 2:
        t1, t2 = top_2[0], top_2[1]
        
        # Simulate Game
        spread = t1['predictive'] - t2['predictive']
        winner = t1['team'] if spread > 0 else t2['team']
        upset = " (UPSET!)" if spread <= 0 else ""
        
        conf_champions[conf] = winner
        
        print(f"{conf:<20} #{t1['rank']} {t1['team']:<15} vs  #{t2['rank']} {t2['team']:<15} -> {winner}{upset}")
        
    elif len(top_2) == 1:
        conf_champions[conf] = top_2[0]['team']
        print(f"{conf:<20} {top_2[0]['team']} (Automatic)")

print("-" * 80)
print(f"Total Champions: {len(conf_champions)}")

                       CONFERENCE CHAMPIONSHIP SIMULATION                       
  [Tiebreaker] Unbalanced H2H at 0.750 (5 teams). Skipping to SOS.
    1. Duke: SOS 0.500 | Pool 0.000 (1g)
    2. Miami: SOS 0.446 | Pool 0.500 (2g)
    3. Georgia Tech: SOS 0.446 | Pool 0.500 (2g)
    4. Pittsburgh: SOS 0.431 | Pool 0.500 (2g)
    5. SMU: SOS 0.422 | Pool 1.000 (1g)
ACC                  #20 Virginia        vs  #39 Duke            -> Virginia
  [Tiebreaker] Unbalanced H2H at 0.875 (3 teams). Skipping to SOS.
    1. North Texas: SOS 0.438 | Pool 1.000 (1g)
    2. Navy: SOS 0.438 | Pool 0.000 (1g)
    3. Tulane: SOS 0.391 | Pool 0.000 (0g)
American Athletic    #16 North Texas     vs  #26 Navy            -> North Texas
  [Tiebreaker] Texas Tech sweeps the 2-team pool.
Big 12               #5 Texas Tech      vs  #3 BYU             -> Texas Tech
Big Ten              #1 Ohio State      vs  #2 Indiana         -> Ohio State
  [Tiebreaker] Jacksonville State sweeps the 2-team pool.
Conference USA 

---

## Resume Team Sheets

Complete resume view for all teams, showing:
- Rank and record
- SOR rank (how impressive is the record?)
- SOS rank (how tough was the schedule?)
- Performance vs Top 25 (W-L)
- Bad losses count
- Conference champion status

In [26]:
# Cell 7: Build Resume DataFrame
# Prepare data for resume builder
teams_data = final_rankings[['team', 'wins', 'losses']].copy()

# Add conference information from games data
if 'conference' not in final_rankings.columns:
    # Extract conference from games dataframe if not in final_rankings
    team_conferences = {}
    for _, game in games_df.iterrows():
        if pd.notna(game.get('home_conference')):
            team_conferences[game['home_team']] = game['home_conference']
        if pd.notna(game.get('away_conference')):
            team_conferences[game['away_team']] = game['away_conference']
    
    # Map conferences to teams_data
    teams_data['conference'] = teams_data['team'].map(team_conferences)
else:
    teams_data['conference'] = final_rankings['conference']

# Build composite ranks dict
composite_ranks = dict(zip(final_rankings['team'], final_rankings['rank']))

# Build resume DataFrame
resume_df = build_resume_dataframe(
    teams_data=teams_data,
    sor_scores=sor_scores,
    sos_scores=sos_scores,
    quality_wins=quality_wins_data,
    bad_losses=bad_losses_data,
    conf_champions=conf_champions,
    composite_ranks=composite_ranks
)

# Add vs_top_25 column with our calculated data
resume_df['vs_top_25'] = resume_df['team'].map(vs_top25_data)

# Save resume rankings
output_path = rankings_dir / f'resume_rankings_{year}_week{week}.csv'
resume_df.to_csv(output_path, index=False)

print(f"✅ Resume rankings saved to {output_path}")
print("\n" + "=" * 100)
print("TOP 25 RESUME TEAM SHEETS".center(100))
print("=" * 100)
print()

# Display top 25
display_cols = ['rank', 'team', 'record', 'sor_rank', 'sos_rank', 'vs_top_25', 'bad_losses', 'conf_champ']
resume_df[display_cols].head(25)

✅ Resume rankings saved to data/output/rankings/resume_rankings_2025_week15.csv

                                     TOP 25 RESUME TEAM SHEETS                                      



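| | rank | team | record | sor_rank | sos_rank | vs_top_25 | bad_losses | conf_champ |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Ohio State | 11-0 | 2 | 35 | 3-0 | 0 | Yes (Big Ten) |
| 1 | 2 | Indiana | 11-0 | 1 | 48 | 3-0 | 0 | No |
| 2 | 3 | BYU | 10-1 | 3 | 15 | 2-1 | 0 | No |
| 3 | 4 | Oregon | 10-1 | 4 | 60 | 2-1 | 0 | No |
| 4 | 5 | Texas Tech | 10-1 | 8 | 81 | 2-0 | 1 | Yes (Big 12) |
| 5 | 6 | Georgia | 10-1 | 6 | 34 | 2-1 | 0 | Yes (SEC) |
| 6 | 7 | Texas A&M | 10-1 | 5 | 65 | 1-1 | 0 | No |
| 7 | 8 | Ole Miss | 10-1 | 7 | 67 | 2-1 | 0 | No |
| 8 | 9 | Oklahoma | 9-2 | 9 | 10 | 2-2 | 0 | No |
| 9 | 10 | Notre Dame | 10-2 | 11 | 49 | 1-2 | 0 | No |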


---

## Resume vs Predictive Comparison

This section separates **deserving** (resume) from **best** (predictive):

**Resume Rankings** (what you've accomplished):
- Based on: Colley Matrix (win/loss only), Win %, SOR
- No margin of victory
- Rewards quality wins and tough schedules

**Predictive Rankings** (how good you are):
- Based on: Massey Ratings (MOV), Elo, efficiency metrics
- Uses capped margin of victory (28 points)
- HFA-adjusted
- Predicts future game outcomes

**Composite Rankings:**
- Weighted blend of resume and predictive
- Balances accomplishment with power
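
A small sketch of a weighted blend, assuming both inputs are min-max normalized first (as the component scores are in this notebook). The composite weights are not stated in this section, so `resume_weight=0.5` is a placeholder, and the team names and scores are made up.

```python
import pandas as pd


def min_max(series):
    """Scale a Series to [0, 1]; a constant Series maps to 0.5."""
    span = series.max() - series.min()
    if span == 0:
        return pd.Series(0.5, index=series.index)
    return (series - series.min()) / span


def blend(resume_score, predictive_score, resume_weight=0.5):
    """Weighted blend of normalized scores; resume_weight is a hypothetical value."""
    return resume_weight * min_max(resume_score) + (1 - resume_weight) * min_max(predictive_score)


scores = pd.DataFrame(
    {
        "resume_score": [0.9, 0.7, 0.3],
        "predictive_score": [0.5, 0.8, 0.2],
    },
    index=["Team A", "Team B", "Team C"],
)
scores["composite_score"] = blend(scores["resume_score"], scores["predictive_score"])
scores["composite_rank"] = scores["composite_score"].rank(method="min", ascending=False).astype(int)
print(scores)
```

In this toy example Team A has the best resume, but Team B's predictive edge is large enough to put it first in the blend.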

In [28]:
# Cell 8: Resume vs Predictive Split
from sklearn.preprocessing import MinMaxScaler

# Load the component rankings needed to rebuild the resume and predictive scores.
# final_rankings (the composite rankings CSV) does not include colley_rating,
# win_pct, massey_rating, or elo_rating, so read each component file instead.
colley_rankings = pd.read_csv(rankings_dir / f'colley_rankings_{year}_week{week}.csv')
massey_rankings = pd.read_csv(rankings_dir / f'massey_rankings_{year}_week{week}.csv')
elo_rankings = pd.read_csv(rankings_dir / f'elo_rankings_{year}_week{week}.csv')
win_pct_rankings = pd.read_csv(rankings_dir / f'win_pct_rankings_{year}_week{week}.csv')

# Build resume-only rankings (Colley + Win% + SOR)
resume_data = colley_rankings.merge(
    win_pct_rankings[['team', 'wins', 'losses', 'win_pct']],
    on='team'
)
resume_data['sor_score'] = resume_data['team'].map(sor_scores)

# Normalize resume components
scaler_resume = MinMaxScaler()
resume_normalized = resume_data.copy()
resume_normalized[['colley_rating', 'win_pct', 'sor_score']] = scaler_resume.fit_transform(
    resume_data[['colley_rating', 'win_pct', 'sor_score']]
)

# SOR is already in correct direction (higher sor_score = harder achievement = better)
# No inversion needed - sor_score = -log10(prob), so higher is better

# Calculate resume score (weights: 40% Colley, 30% win %, 30% SOR)
resume_normalized['resume_score'] = (
    0.40 * resume_normalized['colley_rating'] +
    0.30 * resume_normalized['win_pct'] +
    0.30 * resume_normalized['sor_score']
)

resume_normalized['resume_rank'] = resume_normalized['resume_score'].rank(
    method='min', ascending=False
).astype(int)

# Build predictive-only rankings (Massey + Elo)
predictive_data = massey_rankings.merge(
    elo_rankings[['team', 'elo_rating']],
    on='team'
)

# Normalize predictive components
scaler_predictive = MinMaxScaler()
predictive_normalized = predictive_data.copy()
predictive_normalized[['massey_rating', 'elo_rating']] = scaler_predictive.fit_transform(
    predictive_data[['massey_rating', 'elo_rating']]
)

# Calculate predictive score
predictive_normalized['predictive_score'] = (
    0.55 * predictive_normalized['massey_rating'] +
    0.45 * predictive_normalized['elo_rating']
)

predictive_normalized['predictive_rank'] = predictive_normalized['predictive_score'].rank(
    method='min', ascending=False
).astype(int)

# Create comparison dataframe
resume_ranks_dict = dict(zip(resume_normalized['team'], resume_normalized['resume_rank']))
predictive_ranks_dict = dict(zip(predictive_normalized['team'], predictive_normalized['predictive_rank']))
composite_scores_dict = dict(zip(final_rankings['team'], final_rankings['composite_score']))

comparison_df = compare_resume_vs_predictive(
    resume_ranks=resume_ranks_dict,
    predictive_ranks=predictive_ranks_dict,
    composite_ranks=composite_ranks,
    composite_scores=composite_scores_dict,
    top_n=25
)

# Add delta columns (positive = ranked worse than composite, negative = ranked better)
comparison_df['resume_vs_composite'] = comparison_df['resume_rank'] - comparison_df['composite_rank']
comparison_df['predictive_vs_composite'] = comparison_df['predictive_rank'] - comparison_df['composite_rank']

# Save comparison
comparison_path = exports_dir / f'resume_vs_predictive_{year}_week{week}.csv'
comparison_df.to_csv(comparison_path, index=False)

print(f"✅ Comparison saved to {comparison_path}")
print("\n" + "=" * 100)
print("RESUME VS PREDICTIVE COMPARISON - TOP 25".center(100))
print("=" * 100)
print()
print("Legend:")
print("  Resume Rank:     Based on what teams have accomplished (wins, SOR, schedule)")
print("  Predictive Rank: Based on how good teams are (power ratings, efficiency)")
print("  Composite Rank:  Balanced combination of both")
print()

comparison_df[[
    'team', 'resume_rank', 'predictive_rank', 'composite_rank',
    'resume_vs_composite', 'predictive_vs_composite'
]]


✅ Comparison saved to data/output/exports/resume_vs_predictive_2025_week15.csv

                              RESUME VS PREDICTIVE COMPARISON - TOP 25                              

Legend:
  Resume Rank:     Based on what teams have accomplished (wins, SOR, schedule)
  Predictive Rank: Based on how good teams are (power ratings, efficiency)
  Composite Rank:  Balanced combination of both



         team  resume_rank  predictive_rank  composite_rank  resume_vs_composite  predictive_vs_composite
0  Ohio State            2                1               1                    1                        0
1     Indiana            1                3               2                   -1                        1
2         BYU            3                8               3                    0                        5
3      Oregon            4                5               4                    0                        1
4  Texas Tech            8                2               5                    3                       -3
5     Georgia            6                9               6                    0                        3
6   Texas A&M            5               13               7                   -2                        6
7    Ole Miss            7               17               8                   -1                        9
8    Oklahoma            9               20               9                    0                       11
9  Notre Dame           11                4              10                    1                       -6
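
The sign convention of the two delta columns is easy to misread, so here is a short worked example using three rows from the output above. The helper `describe` is a hypothetical name added for illustration.

```python
import pandas as pd

# Ranks copied from the comparison output for three teams.
rows = pd.DataFrame(
    {
        "team": ["Ohio State", "Texas Tech", "Oklahoma"],
        "resume_rank": [2, 8, 9],
        "predictive_rank": [1, 2, 20],
        "composite_rank": [1, 5, 9],
    }
)
rows["resume_vs_composite"] = rows["resume_rank"] - rows["composite_rank"]
rows["predictive_vs_composite"] = rows["predictive_rank"] - rows["composite_rank"]


def describe(delta):
    """Translate a rank delta into words (lower rank numbers are better)."""
    if delta > 0:
        return f"{delta} spots worse than composite"
    if delta < 0:
        return f"{-delta} spots better than composite"
    return "same as composite"


for _, row in rows.iterrows():
    print(f"{row['team']}: predictive rank is {describe(row['predictive_vs_composite'])}")
```

A positive delta means the component rank is a larger number, so the team sits lower on that list than on the composite. Oklahoma's `predictive_vs_composite` of 11 says the predictive systems like it far less than the composite does.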


---

## Biggest Resume vs Predictive Discrepancies

Teams in the composite top 25 whose resume and predictive ranks differ the most (the 10 largest absolute gaps):

In [29]:
# Cell 9: Analyze Discrepancies
comparison_full = comparison_df.copy()
comparison_full['resume_pred_diff'] = abs(
    comparison_full['resume_rank'] - comparison_full['predictive_rank']
)

biggest_discrepancies = comparison_full.nlargest(10, 'resume_pred_diff')

print("Teams with Biggest Resume vs Predictive Discrepancies:")
print("=" * 100)
print()

for _, row in biggest_discrepancies.iterrows():
    team = row['team']
    resume_rank = int(row['resume_rank'])
    pred_rank = int(row['predictive_rank'])
    comp_rank = int(row['composite_rank'])
    diff = int(row['resume_pred_diff'])
    
    # Determine if resume-favored or predictive-favored (equal ranks favor neither)
    if resume_rank < pred_rank:
        favor = "RESUME-FAVORED"
        reason = "Strong wins/schedule but lower power rating"
    elif resume_rank > pred_rank:
        favor = "PREDICTIVE-FAVORED"
        reason = "High power rating but weaker resume"
    else:
        favor = "EVEN"
        reason = "Resume and predictive ranks agree"
    
    print(f"{team}")
    print(f"  Status: {favor}")
    print(f"  Resume Rank: #{resume_rank}")
    print(f"  Predictive Rank: #{pred_rank}")
    print(f"  Composite Rank: #{comp_rank}")
    print(f"  Difference: {diff} spots")
    print(f"  Why: {reason}")
    print()

Teams with Biggest Resume vs Predictive Discrepancies:

Illinois
  Status: RESUME-FAVORED
  Resume Rank: #28
  Predictive Rank: #43
  Composite Rank: #24
  Difference: 15 spots
  Why: Strong wins/schedule but lower power rating

Tulane
  Status: RESUME-FAVORED
  Resume Rank: #20
  Predictive Rank: #34
  Composite Rank: #25
  Difference: 14 spots
  Why: Strong wins/schedule but lower power rating

South Florida
  Status: PREDICTIVE-FAVORED
  Resume Rank: #24
  Predictive Rank: #12
  Composite Rank: #22
  Difference: 12 spots
  Why: High power rating but weaker resume

Oklahoma
  Status: RESUME-FAVORED
  Resume Rank: #9
  Predictive Rank: #20
  Composite Rank: #9
  Difference: 11 spots
  Why: Strong wins/schedule but lower power rating

Ole Miss
  Status: RESUME-FAVORED
  Resume Rank: #7
  Predictive Rank: #17
  Composite Rank: #8
  Difference: 10 spots
  Why: Strong wins/schedule but lower power rating

Texas A&M
  Status: RESUME-FAVORED
  Resume Rank: #5
  Predictive Rank: #13
  Compos
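
A vectorized variant of the classification in the cell above, sketched with `numpy.select` so the labels live in a column rather than in printed text. The Illinois and South Florida ranks come from the output above; the tied row, the `EVEN` label, and the `status` column name are hypothetical additions.

```python
import numpy as np
import pandas as pd

ranks = pd.DataFrame(
    {
        "team": ["Illinois", "South Florida", "Example U"],  # last row is made up
        "resume_rank": [28, 24, 10],
        "predictive_rank": [43, 12, 10],
    }
)

# Positive gap: the resume rank is the better (smaller) number.
gap = ranks["predictive_rank"] - ranks["resume_rank"]
ranks["resume_pred_diff"] = gap.abs()
ranks["status"] = np.select(
    [gap > 0, gap < 0],
    ["RESUME-FAVORED", "PREDICTIVE-FAVORED"],
    default="EVEN",
)
print(ranks.sort_values("resume_pred_diff", ascending=False))
```

Keeping the status as a column makes it straightforward to filter or export, for example to list only the resume-favored teams.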

---

## Export Team Sheets

Export the full team sheets to CSV and the top 25 to JSON for external use.

In [30]:
# Cell 10: Export Team Sheets
# Combine resume data with quality wins detail
team_sheets = resume_df.copy()

# Add quality wins breakdown
team_sheets['top_5_wins'] = team_sheets['team'].apply(
    lambda t: quality_wins_data.get(t, {}).get('top_5', 0)
)
team_sheets['top_12_wins'] = team_sheets['team'].apply(
    lambda t: quality_wins_data.get(t, {}).get('top_12', 0)
)
team_sheets['top_25_wins'] = team_sheets['team'].apply(
    lambda t: quality_wins_data.get(t, {}).get('top_25', 0)
)

# Add resume vs predictive ranks
team_sheets['resume_rank'] = team_sheets['team'].map(resume_ranks_dict)
team_sheets['predictive_rank'] = team_sheets['team'].map(predictive_ranks_dict)

# Export to CSV
team_sheets_path = exports_dir / f'team_sheets_{year}_week{week}.csv'
team_sheets.to_csv(team_sheets_path, index=False)

# Export top 25 to JSON for external consumption
top25_json = team_sheets.head(25).to_dict('records')
json_path = exports_dir / f'top25_team_sheets_{year}_week{week}.json'
with open(json_path, 'w') as f:
    json.dump(top25_json, f, indent=2)

print(f"✅ Team sheets exported:")
print(f"   CSV: {team_sheets_path}")
print(f"   JSON: {json_path}")
print()
print("Summary Statistics:")
print(f"  Total teams analyzed: {len(team_sheets)}")
print(f"  Conference champions: {len(conf_champions)}")
print(f"  Teams with Top 25 wins: {sum(1 for qw in quality_wins_data.values() if qw.get('top_25', 0) > 0)}")
print(f"  Teams with bad losses: {sum(1 for bl in bad_losses_data.values() if bl > 0)}")

✅ Team sheets exported:
   CSV: data/output/exports/team_sheets_2025_week15.csv
   JSON: data/output/exports/top25_team_sheets_2025_week15.json

Summary Statistics:
  Total teams analyzed: 136
  Conference champions: 9
  Teams with Top 25 wins: 30
  Teams with bad losses: 118
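
One caveat for external consumers: by default `json.dump` writes missing values as the literal `NaN`, which strict JSON parsers reject. Whether the team sheets contain any missing values is not shown here, so the following is only a precautionary sketch. It routes the records through `DataFrame.to_json`, which emits `null` instead. The helper name `to_json_records` and the sample frame are hypothetical.

```python
import json

import pandas as pd


def to_json_records(df, n):
    """Top-n rows as plain Python records, with missing values converted to None."""
    # DataFrame.to_json serializes NaN as null; json.loads turns that into None.
    return json.loads(df.head(n).to_json(orient="records"))


# Made-up frame with one missing rank, as could happen if a team is absent from a lookup dict.
sheets = pd.DataFrame(
    {
        "team": ["Team A", "Team B"],
        "rank": [1, 2],
        "predictive_rank": [3.0, float("nan")],
    }
)
records = to_json_records(sheets, 25)

# allow_nan=False makes json.dumps raise if any NaN slipped through.
payload = json.dumps(records, indent=2, allow_nan=False)
print(payload)
```

The same records could then be written to the JSON path used above, with `allow_nan=False` acting as a guard.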


---

## Summary

Resume analysis complete. Key outputs:

1. **Resume Rankings** - Teams ranked by accomplishments (SOR, SOS, quality wins)
2. **Team Sheets** - Complete resume view for all teams
3. **Resume vs Predictive** - Clear separation of deserving vs best
4. **Quality Metrics** - Detailed breakdown of wins and losses

Next step: Proceed to **05_playoff_selection.ipynb** for 12-team bracket selection.