# CFP Ranking Algorithms - Resume vs Predictive

This notebook implements ranking algorithms split into two categories:

**RESUME Rankings** - What teams have accomplished:
- Colley Matrix - Pure win/loss analysis (no MOV)
- Win Percentage - Raw performance

**PREDICTIVE Rankings** - How good teams are:
- Massey Ratings - MOV-based power rating (capped at 28, HFA-adjusted by 3.75)
- Elo System - Dynamic game-by-game updates

Keeping the two apart preserves both the "deserve-to-be-in" (resume) and the "best-team" (predictive) perspectives.

In [1]:
# Setup and Imports
import pandas as pd
import numpy as np
from pathlib import Path
import warnings
import sys
import os
warnings.filterwarnings('ignore')

# Add notebooks directory to path for notebook_utils
notebook_dir = Path.cwd()
if notebook_dir.name != 'notebooks':
    notebook_dir = notebook_dir / 'notebooks'
if str(notebook_dir) not in sys.path:
    sys.path.insert(0, str(notebook_dir))

# Add project root to path
project_root = notebook_dir.parent
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

from notebook_utils import (
    setup_notebook_env,
    load_cached_games,
    create_output_dirs,
    print_ranking_summary,
    validate_config
)

setup_notebook_env()

# Create output directory
output_dirs = create_output_dirs()
output_dir = output_dirs['rankings']

# Configuration
year = 2025
week = 15

# Load data
games_df = load_cached_games(year, week)
print(f'Loaded {len(games_df)} FBS games for {year} season, week {week}')
print(f'Output directory: {output_dir}')

Loaded 752 FBS games for 2025 season, week 15
Output directory: data/output/rankings


---

## RESUME RANKINGS

Resume rankings evaluate what teams have accomplished based purely on wins and losses.

**No margin of victory** - prevents blowout stat-padding and rewards quality wins regardless of score.

In [2]:
# Cell 2: Colley Matrix (Resume)
class ColleyMatrix:
    """
    Resume-based ranking using only wins/losses.
    No margin of victory - pure record evaluation.
    """
    
    def __init__(self, games_df):
        self.games = games_df
        self.teams = sorted(list(set(
            games_df['home_team'].unique().tolist() + 
            games_df['away_team'].unique().tolist()
        )))
        self.n_teams = len(self.teams)
        self.team_idx = {team: i for i, team in enumerate(self.teams)}
        
    def build_system(self):
        """
        Build Colley matrix C and vector b.
        
        Formula: b_i = 1 + 0.5*(w_i - l_i)
        Matrix C: diagonal = 2 + t_i, off-diagonal = -n_{ij}
        """
        C = np.zeros((self.n_teams, self.n_teams))
        
        # Initialize b vector with 1 (as per formula: 1 + 0.5*(w-l))
        b = np.ones(self.n_teams)
        
        # Count wins and losses for each team
        wins = {team: 0 for team in self.teams}
        losses = {team: 0 for team in self.teams}
        
        for _, game in self.games.iterrows():
            home_idx = self.team_idx[game['home_team']]
            away_idx = self.team_idx[game['away_team']]
            
            # Update diagonal (games played)
            C[home_idx, home_idx] += 1
            C[away_idx, away_idx] += 1
            
            # Update off-diagonal (negative games between teams)
            C[home_idx, away_idx] -= 1
            C[away_idx, home_idx] -= 1
            
            # Count wins and losses
            if game['home_score'] > game['away_score']:
                wins[game['home_team']] += 1
                losses[game['away_team']] += 1
            else:
                wins[game['away_team']] += 1
                losses[game['home_team']] += 1
        
        # Add 2 to diagonal (Laplace rule: C_ii = 2 + t_i)
        np.fill_diagonal(C, C.diagonal() + 2)
        
        # Build b vector using exact formula: b_i = 1 + 0.5*(w_i - l_i)
        for i, team in enumerate(self.teams):
            b[i] = 1 + 0.5 * (wins[team] - losses[team])
        
        return C, b
    
    def solve(self):
        """Solve Cr = b for ratings"""
        C, b = self.build_system()
        ratings = np.linalg.solve(C, b)
        
        results = pd.DataFrame({
            'team': self.teams,
            'colley_rating': ratings
        }).sort_values('colley_rating', ascending=False)
        
        return results

# Calculate Colley rankings
colley = ColleyMatrix(games_df)
colley_rankings = colley.solve()

print('Top 10 Colley Rankings (Resume):')
print(colley_rankings.head(10))

Top 10 Colley Rankings (Resume):
           team  colley_rating
81   Ohio State       0.977465
41      Indiana       0.961253
10          BYU       0.925632
34      Georgia       0.903087
86       Oregon       0.887842
107   Texas A&M       0.883712
85     Ole Miss       0.882280
82     Oklahoma       0.872103
2       Alabama       0.853312
109  Texas Tech       0.843921
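
As a sanity check on the system built above, here is a minimal sketch that solves the same Colley equations for a three-team round robin. The teams, results, and `toy_` names are made up for illustration and are not part of the notebook's data.

```python
import numpy as np

# Hypothetical round robin, each tuple is (winner, loser)
toy_games = [("A", "B"), ("A", "C"), ("B", "C")]
toy_teams = sorted({team for game in toy_games for team in game})
toy_idx = {team: i for i, team in enumerate(toy_teams)}
n = len(toy_teams)

# Start from C = 2I and b = 1, then add one game at a time
toy_C = 2.0 * np.eye(n)
toy_b = np.ones(n)
for winner, loser in toy_games:
    w, l = toy_idx[winner], toy_idx[loser]
    toy_C[w, w] += 1          # diagonal: 2 + games played
    toy_C[l, l] += 1
    toy_C[w, l] -= 1          # off-diagonal: -(games between the pair)
    toy_C[l, w] -= 1
    toy_b[w] += 0.5           # b_i = 1 + 0.5 * (wins - losses)
    toy_b[l] -= 0.5

toy_ratings = np.linalg.solve(toy_C, toy_b)
print(dict(zip(toy_teams, toy_ratings.round(3))))
```

With identical schedules the ratings simply order the teams by record, and they average 0.5, the baseline that Colley ratings are centered on.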


In [3]:
# Cell 3: Win Percentage (Resume)
def calculate_win_percentage(games_df):
    """
    Calculate simple win percentage for each team.
    Pure resume metric - no MOV consideration.
    """
    teams = set(games_df['home_team'].unique()) | set(games_df['away_team'].unique())
    
    records = {team: {'wins': 0, 'losses': 0} for team in teams}
    
    for _, game in games_df.iterrows():
        home = game['home_team']
        away = game['away_team']
        
        if game['home_score'] > game['away_score']:
            records[home]['wins'] += 1
            records[away]['losses'] += 1
        else:
            records[away]['wins'] += 1
            records[home]['losses'] += 1
    
    results = []
    for team, record in records.items():
        total = record['wins'] + record['losses']
        win_pct = record['wins'] / total if total > 0 else 0
        results.append({
            'team': team,
            'wins': record['wins'],
            'losses': record['losses'],
            'win_pct': win_pct
        })
    
    return pd.DataFrame(results).sort_values('win_pct', ascending=False)

# Calculate win percentages
win_pct_rankings = calculate_win_percentage(games_df)

print('Top 10 Win Percentage Rankings (Resume):')
print(win_pct_rankings.head(10))

Top 10 Win Percentage Rankings (Resume):
              team  wins  losses   win_pct
3       Ohio State    11       0  1.000000
77         Indiana    11       0  1.000000
53         Georgia    10       1  0.909091
8    James Madison    10       1  0.909091
87             BYU    10       1  0.909091
85       Texas A&M    10       1  0.909091
112    North Texas    10       1  0.909091
14      Texas Tech    10       1  0.909091
37          Oregon    10       1  0.909091
4         Ole Miss    10       1  0.909091


In [4]:
# Cell 4: Combine Resume Rankings
# Merge Colley and Win Percentage
resume_rankings = colley_rankings.merge(
    win_pct_rankings[['team', 'wins', 'losses', 'win_pct']],
    on='team'
)

# Add rank columns
resume_rankings['colley_rank'] = resume_rankings['colley_rating'].rank(method='min', ascending=False).astype(int)
resume_rankings['win_pct_rank'] = resume_rankings['win_pct'].rank(method='min', ascending=False).astype(int)

# Calculate combined resume score (60% Colley, 40% Win%)
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()

resume_rankings['colley_norm'] = scaler.fit_transform(resume_rankings[['colley_rating']])
resume_rankings['win_pct_norm'] = resume_rankings['win_pct']  # Already 0-1

resume_rankings['resume_score'] = (
    0.60 * resume_rankings['colley_norm'] +
    0.40 * resume_rankings['win_pct_norm']
)

resume_rankings['resume_rank'] = resume_rankings['resume_score'].rank(method='min', ascending=False).astype(int)

# Sort by resume score
resume_rankings = resume_rankings.sort_values('resume_score', ascending=False)

print('='*80)
print('COMBINED RESUME RANKINGS')
print('='*80)
print()
print(resume_rankings[[
    'team', 'wins', 'losses', 'win_pct', 
    'colley_rating', 'resume_score', 'resume_rank'
]].head(15))

COMBINED RESUME RANKINGS

             team  wins  losses   win_pct  colley_rating  resume_score  \
0      Ohio State    11       0  1.000000       0.977465      1.000000   
1         Indiana    11       0  1.000000       0.961253      0.990270   
2             BYU    10       1  0.909091       0.925632      0.932528   
3         Georgia    10       1  0.909091       0.903087      0.918998   
4          Oregon    10       1  0.909091       0.887842      0.909848   
5       Texas A&M    10       1  0.909091       0.883712      0.907369   
6        Ole Miss    10       1  0.909091       0.882280      0.906510   
9      Texas Tech    10       1  0.909091       0.843921      0.883488   
12    North Texas    10       1  0.909091       0.813086      0.864982   
7        Oklahoma     9       2  0.818182       0.872103      0.864038   
8         Alabama     9       2  0.818182       0.853312      0.852761   
10     Notre Dame    10       2  0.833333       0.830524      0.845145   
19  James Ma
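
The 60/40 blend used in this cell can be reproduced on a toy table. This is a minimal sketch with made-up values; it replaces `MinMaxScaler` with the equivalent `(x - min) / (max - min)` arithmetic so it runs with pandas alone.

```python
import pandas as pd

# Made-up ratings for three hypothetical teams
toy_resume = pd.DataFrame({
    "team": ["A", "B", "C"],
    "colley_rating": [0.9, 0.6, 0.3],
    "win_pct": [1.0, 0.5, 0.25],
})

# Min-max scale Colley to 0-1; win_pct is used as-is because it is already 0-1
lo = toy_resume["colley_rating"].min()
hi = toy_resume["colley_rating"].max()
toy_resume["colley_norm"] = (toy_resume["colley_rating"] - lo) / (hi - lo)

toy_resume["resume_score"] = (
    0.60 * toy_resume["colley_norm"] + 0.40 * toy_resume["win_pct"]
)
print(toy_resume[["team", "colley_norm", "resume_score"]])
```

Because only the Colley column is rescaled, the lowest-rated team keeps a nonzero score whenever it has won at least one game.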

---

## PREDICTIVE RANKINGS

Predictive rankings evaluate how good teams are using margin of victory.

**MOV Capping at 28 points** - Prevents excessive blowout stat-padding

**HFA Adjustment of 3.75 points** - Neutralizes home field advantage for fair comparison
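
The two adjustments can be sketched in a few lines. This is an illustration only: `toy_adjust_margin` is a hypothetical helper that follows the subtract-HFA-then-clip order used by `mov_multiplier` in the Elo cell of this notebook, and the helpers imported from `src.utils.metrics` may behave differently.

```python
import numpy as np

def toy_adjust_margin(home_margin, is_neutral, hfa=3.75, cap=28):
    """Hypothetical helper: remove home-field advantage, then cap the margin."""
    # On a neutral field there is no home edge to remove
    adjusted = home_margin - (0.0 if is_neutral else hfa)
    # Cap in both directions so blowouts count no more than `cap` points
    return float(np.clip(adjusted, -cap, cap))

blowout = toy_adjust_margin(45, is_neutral=False)     # 41.25 before the cap
narrow_home_win = toy_adjust_margin(3, is_neutral=False)
neutral_loss = toy_adjust_margin(-10, is_neutral=True)
print(blowout, narrow_home_win, neutral_loss)
```

Under these assumptions a three-point home win becomes a slightly negative adjusted margin, since it falls short of the 3.75-point home edge.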

In [5]:
# Cell 5: Massey Ratings (Predictive) with MOV Cap and HFA Adjustment
from src.utils.metrics import calculate_home_field_adjusted_mov, cap_margin_of_victory

class MasseyRatings:
    """
    Predictive power rating using margin of victory.
    Applies MOV capping at 28 points and HFA adjustment of 3.75 points.
    """
    
    def __init__(self, games_df, mov_cap=28, hfa_adjustment=3.75):
        self.games = games_df.copy()
        self.mov_cap = mov_cap
        self.hfa = hfa_adjustment
        self.teams = sorted(list(set(
            games_df['home_team'].unique().tolist() + 
            games_df['away_team'].unique().tolist()
        )))
        self.n_teams = len(self.teams)
        self.team_idx = {team: i for i, team in enumerate(self.teams)}
        
    def apply_adjustments(self):
        """
        Apply MOV cap and home field adjustment.
        Uses metrics module functions for consistency.
        """
        adjusted_margins = []
        
        for _, game in self.games.iterrows():
            raw_margin = game['home_score'] - game['away_score']
            
            # Apply HFA adjustment first
            adjusted_margin = calculate_home_field_adjusted_mov(
                margin=raw_margin,
                is_home=True,
                is_neutral=game['neutral_site'],
                hfa_points=self.hfa
            )
            
            # Then apply MOV cap
            capped_margin = cap_margin_of_victory(adjusted_margin, cap=self.mov_cap)
            
            adjusted_margins.append(capped_margin)
        
        self.games['adj_margin'] = adjusted_margins
        
    def build_system(self):
        """
        Build Colleyized Massey system: Cr = p
        Uses Colley matrix structure (C = 2I + M) with point differential vector p.
        This is the "Colleyized Massey" approach.
        """
        self.apply_adjustments()
        
        # Build Colley matrix structure (same as Colley method)
        C = np.zeros((self.n_teams, self.n_teams))
        p = np.zeros(self.n_teams)
        
        for _, game in self.games.iterrows():
            home_idx = self.team_idx[game['home_team']]
            away_idx = self.team_idx[game['away_team']]
            margin = game['adj_margin']
            
            # Update diagonal (games played)
            C[home_idx, home_idx] += 1
            C[away_idx, away_idx] += 1
            
            # Update off-diagonal (negative games between teams)
            C[home_idx, away_idx] -= 1
            C[away_idx, home_idx] -= 1
            
            # Update point differential vector (cumulative)
            p[home_idx] += margin
            p[away_idx] -= margin
        
        # Add 2 to diagonal (Colley structure: C_ii = 2 + t_i)
        np.fill_diagonal(C, C.diagonal() + 2)
        
        return C, p
    
    def solve(self):
        """Solve Cr = p for ratings (Colleyized Massey)"""
        C, p = self.build_system()
        
        try:
            ratings = np.linalg.solve(C, p)
        except np.linalg.LinAlgError:
            ratings, residuals, rank, s = np.linalg.lstsq(C, p, rcond=None)
            if rank < self.n_teams:
                print(f'Warning: Massey matrix rank {rank} < {self.n_teams} teams. Using least squares solution.')
        
        results = pd.DataFrame({
            'team': self.teams,
            'massey_rating': ratings
        }).sort_values('massey_rating', ascending=False)
        
        return results

# Calculate Massey ratings with MOV cap and HFA adjustment
massey = MasseyRatings(games_df, mov_cap=28, hfa_adjustment=3.75)
massey_rankings = massey.solve()

print('Top 10 Massey Rankings (Predictive):')
print('(MOV capped at 28 points, HFA adjusted by 3.75 points)')
print(massey_rankings.head(10))

Top 10 Massey Rankings (Predictive):
(MOV capped at 28 points, HFA adjusted by 3.75 points)
           team  massey_rating
81   Ohio State      21.659482
109  Texas Tech      20.370698
41      Indiana      19.444894
79   Notre Dame      18.044307
123        Utah      16.091021
60        Miami      15.994943
86       Oregon      15.767610
10          BYU      13.578249
34      Georgia      13.409187
125  Vanderbilt      13.343915
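
To see what the `2 + t_i` diagonal does to the ratings, here is a minimal sketch on a hypothetical three-team round robin. The margins are made up and assumed to be already HFA-adjusted and capped. The comparison solve against the unregularized matrix is added for illustration and is not part of the notebook.

```python
import numpy as np

# Hypothetical games: (home, away, adjusted home margin)
toy_games = [("A", "B", 7.0), ("A", "C", 14.0), ("B", "C", 7.0)]
toy_idx = {"A": 0, "B": 1, "C": 2}

toy_C = 2.0 * np.eye(3)       # Colley structure: C = 2I + M
toy_p = np.zeros(3)           # cumulative point differential
for home, away, margin in toy_games:
    h, a = toy_idx[home], toy_idx[away]
    toy_C[h, h] += 1
    toy_C[a, a] += 1
    toy_C[h, a] -= 1
    toy_C[a, h] -= 1
    toy_p[h] += margin
    toy_p[a] -= margin

colleyized = np.linalg.solve(toy_C, toy_p)

# Unregularized Massey matrix M is singular, so take the
# minimum-norm least-squares solution (ratings sum to zero)
toy_M = toy_C - 2.0 * np.eye(3)
plain_massey = np.linalg.lstsq(toy_M, toy_p, rcond=None)[0]

print(colleyized.round(2), plain_massey.round(2))
```

On this toy data the Colleyized ratings come out at 4.2, 0, and -4.2, against 7, 0, and -7 for the unregularized system, so the extra 2 on the diagonal shrinks ratings toward zero while keeping the order.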


In [6]:
# Cell 6: Elo Ratings (Predictive) with MOV Multiplier
class EloRatings:
    """
    Dynamic rating system that updates game-by-game with margin of victory multiplier.
    Predictive power rating.
    
    Formulas:
    - Expected Score: E_A = 1 / (1 + 10^(-(R_A - R_B)/400))
    - MOV-adjusted result: S_adj = 1 / (1 + 10^(-(ScoreDiff - HFA)/C)),
      with (ScoreDiff - HFA) clipped to [-mov_cap, mov_cap]
    - Rating Update: R'_A = R_A + K * (S_adj - E_A)
    """
    
    def __init__(self, k_factor=85, hfa_points=3.75, mov_scale=17, mov_cap=28):
        self.k = k_factor  # Higher K for short season (default 85 for CFB)
        self.hfa_points = hfa_points  # Home field advantage in points
        self.mov_scale = mov_scale  # Scaling constant C for MOV multiplier
        self.mov_cap = mov_cap  # Cap margin of victory
        self.ratings = {}
        
    def initialize_ratings(self, teams, prev_ratings=None):
        """Initialize team ratings"""
        if prev_ratings:
            for team in teams:
                if team in prev_ratings:
                    self.ratings[team] = 1500 + 0.95 * (prev_ratings[team] - 1500)
                else:
                    self.ratings[team] = 1500
        else:
            self.ratings = {team: 1500 for team in teams}
    
    def expected_score(self, rating_a, rating_b):
        """Calculate expected win probability"""
        return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    
    def mov_multiplier(self, score_diff, is_neutral):
        """
        Calculate MOV-adjusted score using logistic function.
        Formula: S_adj = 1 / (1 + 10^(-(ScoreDiff - HFA)/C))
        """
        # Apply HFA adjustment (subtract from home team's margin)
        hfa_adjusted_diff = score_diff - (0 if is_neutral else self.hfa_points)
        
        # Cap the margin
        hfa_adjusted_diff = np.clip(hfa_adjusted_diff, -self.mov_cap, self.mov_cap)
        
        # Calculate MOV multiplier (maps score difference to 0-1 scale)
        # This represents how "impressive" the win was
        s_adj = 1 / (1 + 10 ** (-hfa_adjusted_diff / self.mov_scale))
        
        return s_adj
    
    def update_game(self, home_team, away_team, home_score, away_score, is_neutral):
        """Update ratings based on game result with MOV multiplier"""
        # Apply home field advantage (unless neutral site)
        hfa_bonus = 0 if is_neutral else 55  # 55 Elo points ≈ 3.75 points
        home_rating = self.ratings[home_team] + hfa_bonus
        away_rating = self.ratings[away_team]
        
        # Calculate expected scores
        home_expected = self.expected_score(home_rating, away_rating)
        away_expected = 1 - home_expected
        
        # Calculate score difference (from home team's perspective)
        score_diff = home_score - away_score
        
        # Calculate MOV-adjusted score
        s_adj = self.mov_multiplier(score_diff, is_neutral)
        
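        # Use s_adj directly as the actual score.
        # s_adj > 0.5 means the home team beat the HFA-adjusted margin
        # (won by more than hfa_points, or won at all on a neutral field);
        # a home win by fewer than hfa_points yields s_adj < 0.5.
        # The magnitude reflects the margin of victory.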
        home_actual = s_adj
        away_actual = 1 - s_adj
        
        # Update ratings
        self.ratings[home_team] += self.k * (home_actual - home_expected)
        self.ratings[away_team] += self.k * (away_actual - away_expected)
    
    def process_season(self, games_df):
        """Process all games chronologically"""
        teams = set(games_df['home_team'].unique()) | set(games_df['away_team'].unique())
        self.initialize_ratings(teams)
        
        # Sort games by week, then by game_id within each week
        games_sorted = games_df.sort_values(['week', 'game_id'])
        
        # Process each game
        for _, game in games_sorted.iterrows():
            is_neutral = game['neutral_site']
            self.update_game(
                game['home_team'], 
                game['away_team'],
                game['home_score'],
                game['away_score'],
                is_neutral
            )
        
        # Return final ratings
        results = pd.DataFrame([
            {'team': team, 'elo_rating': rating} 
            for team, rating in self.ratings.items()
        ]).sort_values('elo_rating', ascending=False)
        
        return results

# Calculate Elo ratings with MOV multiplier
elo = EloRatings(k_factor=85, hfa_points=3.75, mov_scale=17, mov_cap=28)
elo_rankings = elo.process_season(games_df)

print('Top 10 Elo Rankings (Predictive with MOV multiplier):')
print('(K=85, HFA=3.75 points, MOV scale=17, MOV cap=28)')
print(elo_rankings.head(10))

Top 10 Elo Rankings (Predictive with MOV multiplier):
(K=85, HFA=3.75 points, MOV scale=17, MOV cap=28)
              team   elo_rating
3       Ohio State  1753.790259
14      Texas Tech  1751.111530
57      Notre Dame  1731.480859
77         Indiana  1726.425235
37          Oregon  1699.335811
7             Utah  1690.924666
8    James Madison  1676.987695
87             BYU  1673.836894
107          Miami  1673.571954
53         Georgia  1664.600878
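
A single-game update makes the formulas above concrete. The function below is a standalone sketch that restates the arithmetic of `update_game` with the same default constants. `toy_elo_update` and the scores are hypothetical, chosen only for illustration.

```python
def toy_elo_update(r_home, r_away, home_score, away_score, is_neutral,
                   k=85, hfa_points=3.75, hfa_elo=55, mov_scale=17, mov_cap=28):
    """Return (new_home, new_away) after one game, mirroring update_game."""
    # Expected home result, with the Elo home bonus unless the site is neutral
    boosted_home = r_home + (0 if is_neutral else hfa_elo)
    expected_home = 1 / (1 + 10 ** ((r_away - boosted_home) / 400))

    # HFA-adjusted, capped margin mapped onto a 0-1 "actual" result
    diff = home_score - away_score - (0 if is_neutral else hfa_points)
    diff = max(-mov_cap, min(mov_cap, diff))
    s_adj = 1 / (1 + 10 ** (-diff / mov_scale))

    # The update is zero-sum: what the home team gains, the visitor loses
    delta = k * (s_adj - expected_home)
    return r_home + delta, r_away - delta

big_home_win = toy_elo_update(1500, 1500, 31, 17, is_neutral=False)
narrow_home_win = toy_elo_update(1500, 1500, 20, 17, is_neutral=False)
neutral_draw = toy_elo_update(1500, 1500, 24, 24, is_neutral=True)
print(big_home_win, narrow_home_win, neutral_draw)
```

With evenly rated teams, the 14-point home win raises the home rating, while the 3-point home win lowers it, because the adjusted margin falls short of what the home bonus predicts.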


In [7]:
# Cell 7: Combine Predictive Rankings
# Merge Massey and Elo
predictive_rankings = massey_rankings.merge(
    elo_rankings,
    on='team'
)

# Add rank columns
predictive_rankings['massey_rank'] = predictive_rankings['massey_rating'].rank(method='min', ascending=False).astype(int)
predictive_rankings['elo_rank'] = predictive_rankings['elo_rating'].rank(method='min', ascending=False).astype(int)

# Calculate combined predictive score (50% Massey, 50% Elo)
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()

predictive_rankings['massey_norm'] = scaler.fit_transform(predictive_rankings[['massey_rating']])
predictive_rankings['elo_norm'] = scaler.fit_transform(predictive_rankings[['elo_rating']])

predictive_rankings['predictive_score'] = (
    0.50 * predictive_rankings['massey_norm'] +
    0.50 * predictive_rankings['elo_norm']
)

predictive_rankings['predictive_rank'] = predictive_rankings['predictive_score'].rank(method='min', ascending=False).astype(int)

# Sort by predictive score
predictive_rankings = predictive_rankings.sort_values('predictive_score', ascending=False)

print('='*80)
print('COMBINED PREDICTIVE RANKINGS')
print('='*80)
print()
print(predictive_rankings[[
    'team', 'massey_rating', 'elo_rating', 
    'predictive_score', 'predictive_rank'
]].head(15))

COMBINED PREDICTIVE RANKINGS

             team  massey_rating   elo_rating  predictive_score  \
0      Ohio State      21.659482  1753.790259          1.000000   
1      Texas Tech      20.370698  1751.111530          0.983603   
2         Indiana      19.444894  1726.425235          0.951107   
3      Notre Dame      18.044307  1731.480859          0.940539   
6          Oregon      15.767610  1699.335811          0.886623   
4            Utah      16.091021  1690.924666          0.882470   
5           Miami      15.994943  1673.571954          0.865636   
7             BYU      13.578249  1673.836894          0.839702   
8         Georgia      13.409187  1664.600878          0.829465   
9      Vanderbilt      13.343915  1661.512784          0.825947   
15  James Madison      10.949889  1676.987695          0.814103   
11  South Florida      12.098995  1657.172090          0.808513   
10      Texas A&M      13.111509  1639.816184          0.803682   
12        Alabama      11.765617

---

## Export Rankings

In [8]:
# Cell 8: Export All Rankings
# Individual component rankings
colley_rankings.to_csv(output_dir / f'colley_rankings_{year}_week{week}.csv', index=False)
massey_rankings.to_csv(output_dir / f'massey_rankings_{year}_week{week}.csv', index=False)
elo_rankings.to_csv(output_dir / f'elo_rankings_{year}_week{week}.csv', index=False)
win_pct_rankings.to_csv(output_dir / f'win_pct_rankings_{year}_week{week}.csv', index=False)

# Combined rankings
resume_rankings.to_csv(output_dir / f'resume_rankings_{year}_week{week}.csv', index=False)
predictive_rankings.to_csv(output_dir / f'predictive_rankings_{year}_week{week}.csv', index=False)

print('Rankings exported:')
print(f'  Resume rankings: {output_dir}/resume_rankings_{year}_week{week}.csv')
print(f'  Predictive rankings: {output_dir}/predictive_rankings_{year}_week{week}.csv')
print(f'  Individual components:')
print(f'    - Colley (resume): {output_dir}/colley_rankings_{year}_week{week}.csv')
print(f'    - Win% (resume): {output_dir}/win_pct_rankings_{year}_week{week}.csv')
print(f'    - Massey (predictive): {output_dir}/massey_rankings_{year}_week{week}.csv')
print(f'    - Elo (predictive): {output_dir}/elo_rankings_{year}_week{week}.csv')

Rankings exported:
  Resume rankings: data/output/rankings/resume_rankings_2025_week15.csv
  Predictive rankings: data/output/rankings/predictive_rankings_2025_week15.csv
  Individual components:
    - Colley (resume): data/output/rankings/colley_rankings_2025_week15.csv
    - Win% (resume): data/output/rankings/win_pct_rankings_2025_week15.csv
    - Massey (predictive): data/output/rankings/massey_rankings_2025_week15.csv
    - Elo (predictive): data/output/rankings/elo_rankings_2025_week15.csv


---

## Summary

Rankings complete!

**Resume Rankings (What teams accomplished):**
- Colley Matrix - Win/loss using linear algebra
- Win Percentage - Raw performance

**Predictive Rankings (How good teams are):**
- Massey Ratings - MOV-based (capped at 28, HFA-adjusted by 3.75)
- Elo System - Dynamic game-by-game updates

**Next Step:**
- `03_composite_rankings.ipynb` - Combine resume + predictive + SOR/SOS