# Elo ratings based on regular-season games

This notebook is nearly identical to https://www.kaggle.com/code/lpkirwin/fivethirtyeight-elo-ratings/notebook, which I used as the starting point for implementing these Elo ratings.

This notebook implements Elo ratings for NCAA regular-season games using the same formula as FiveThirtyEight's NBA Elo ratings. My resources for this were:

- https://en.wikipedia.org/wiki/Elo_rating_system
- https://fivethirtyeight.com/features/how-we-calculate-nba-elo-ratings/
- https://github.com/fivethirtyeight/nfl-elo-game/blob/master/forecast.py

(The last link above is for 538's NFL Elo ratings, not the NBA ones, but it was a useful code example of the approach.)

The idea here is to get another feature to be plugged in (alongside seeds, etc.) when predicting tournament games.

In [7]:
# %pip install numpy
# %pip install pandas
# %pip install -U scikit-learn


In [8]:
import numpy as np
import pandas as pd
from sklearn.metrics import log_loss

The following parameter `K` controls how quickly the Elo adjusts to new information. Here I'm just using the value that 538 found most appropriate for the NBA. I haven't done any analysis of whether this value is also the best one for college basketball.

I also use the same home-court advantage as 538: the host team gets an extra 100 points added to their Elo.

In [11]:
K = 20.
HOME_ADVANTAGE = 100.
DATA_PATH = "../../march-machine-learning-mania-2024-data"
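
To put the 100-point home-court advantage in perspective, here is a minimal sketch that converts an Elo difference into a win probability using the standard Elo logistic curve. The helper name `win_probability` is made up for this illustration and is not part of the notebook's pipeline.

```python
HOME_ADVANTAGE = 100.0

def win_probability(elo_diff):
    # Standard Elo logistic curve: a 400-point edge means 10-to-1 odds
    return 1.0 / (10.0 ** (-elo_diff / 400.0) + 1.0)

# Two otherwise equal teams, one of them playing at home
p_home = win_probability(HOME_ADVANTAGE)
print(f"{p_home:.3f}")  # 0.640
```

So under these settings the host of a game between evenly rated teams is treated as roughly a 64% favourite.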

In [12]:
rs = pd.read_csv(DATA_PATH + "/MRegularSeasonCompactResults.csv")
rs.head(3)

Unnamed: 0,Season,DayNum,WTeamID,WScore,LTeamID,LScore,WLoc,NumOT
0,1985,20,1228,81,1328,64,N,0
1,1985,25,1106,77,1354,70,H,0
2,1985,25,1112,63,1223,56,H,0


In [13]:
team_ids = set(rs.WTeamID).union(set(rs.LTeamID))
len(team_ids)

378

I'm going to initialise all teams with a rating of 1500. There are two differences here with the 538 approach:

- New entrants (when and where there are any) will start at the average 1500 Elo rather than a lower rating probably more appropriate for a new team.
- There is no reversion to the mean between seasons. Each team's Elo starts exactly where it left off the previous season. My justification here is that we only care about the end-of-season rating when making predictions for the NCAA tournament, so even if ratings are a little off at first, they have the entire regular season to converge to something more appropriate.
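
For comparison, this is a sketch of what a between-season reversion could look like if I wanted to add it. The carry-over fraction (0.75) and the target mean (1505) are what I understand the 538 NBA write-up to use, so treat them as assumptions to check against the linked article. The function name and the team IDs are made up for the example.

```python
def revert_to_mean(elo, mean=1505.0, carry_over=0.75):
    # Keep `carry_over` of last season's rating and pull the rest toward `mean`.
    # Both defaults are assumptions taken from the 538 NBA description.
    return carry_over * elo + (1.0 - carry_over) * mean

# Hypothetical end-of-season ratings keyed by made-up team IDs
end_of_season = {1101: 1712.0, 1102: 1400.0}
start_of_next = {team: revert_to_mean(r) for team, r in end_of_season.items()}
print(start_of_next)
```

This notebook skips that step, so a rating like 1712 simply carries over unchanged.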

In [14]:
# This dictionary will be used as a lookup for current
# scores while the algorithm is iterating through each game
elo_dict = {team_id: 1500 for team_id in team_ids}

In [15]:
# Elo updates will be scaled based on the margin of victory
rs['margin'] = rs.WScore - rs.LScore

The three functions below contain the meat of the Elo calculation:

In [16]:
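def elo_pred(elo1, elo2):
    # Probability that team 1 beats team 2 under the standard Elo logistic curve
    return 1. / (10. ** (-(elo1 - elo2) / 400.) + 1.)

def expected_margin(elo_diff):
    # Denominator of the margin-of-victory multiplier; it grows with the
    # winner's Elo edge, so favourites gain less for a given margin
    return 7.5 + 0.006 * elo_diff

def elo_update(w_elo, l_elo, margin):
    # Returns the winner's pre-game win probability and the number of
    # points to move from the loser to the winner
    elo_diff = w_elo - l_elo
    pred = elo_pred(w_elo, l_elo)
    mult = ((margin + 3.) ** 0.8) / expected_margin(elo_diff)
    update = K * mult * (1 - pred)
    return pred, update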
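
To see how these functions behave, here is a small worked example that repeats the three definitions so it can run on its own. The ratings (1700 and 1500) and the 10-point margin are made-up inputs. It compares a favourite winning with the same game ending in an upset.

```python
K = 20.0

def elo_pred(elo1, elo2):
    return 1.0 / (10.0 ** (-(elo1 - elo2) / 400.0) + 1.0)

def expected_margin(elo_diff):
    return 7.5 + 0.006 * elo_diff

def elo_update(w_elo, l_elo, margin):
    elo_diff = w_elo - l_elo
    pred = elo_pred(w_elo, l_elo)
    mult = ((margin + 3.0) ** 0.8) / expected_margin(elo_diff)
    return pred, K * mult * (1.0 - pred)

# Favourite (1700) beats underdog (1500) by 10 points
pred_fav, update_fav = elo_update(1700.0, 1500.0, 10)

# Same margin, but the underdog wins
pred_upset, update_upset = elo_update(1500.0, 1700.0, 10)

print(f"favourite wins: pred={pred_fav:.3f}, update={update_fav:.2f}")
print(f"underdog wins:  pred={pred_upset:.3f}, update={update_upset:.2f}")
```

The upset moves several times as many points as the expected result, for two reasons: `1 - pred` is larger, and the margin multiplier's denominator is smaller when the winner had the lower rating.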

In [17]:
# I'm going to iterate over the games dataframe using 
# index numbers, so want to check that nothing is out
# of order before I do that.
assert np.array_equal(rs.index.values, np.arange(len(rs))), "Index is out of order."

In [18]:
preds = []
w_elo = []
l_elo = []

# Loop over all rows of the games dataframe
for row in rs.itertuples():
    
    # Get key data from current row
    w = row.WTeamID
    l = row.LTeamID
    margin = row.margin
    wloc = row.WLoc
    
    # Does either team get a home-court advantage?
    w_ad, l_ad = 0., 0.
    if wloc == "H":
        w_ad += HOME_ADVANTAGE
    elif wloc == "A":
        l_ad += HOME_ADVANTAGE
    
    # Get elo updates as a result of the game
    pred, update = elo_update(elo_dict[w] + w_ad,
                              elo_dict[l] + l_ad, 
                              margin)
    elo_dict[w] += update
    elo_dict[l] -= update
    
    # Save the prediction and the post-game Elos for each game
    preds.append(pred)
    w_elo.append(elo_dict[w])
    l_elo.append(elo_dict[l])
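
Two properties of this loop are worth spelling out: the home-court bonus only enters the prediction and is never stored in a team's rating, and every update is taken from the loser and given to the winner, so the total number of rating points stays constant. Here is a toy run that checks the second property. The three teams and their games are invented, and `elo_update` is a condensed copy of the function above.

```python
K = 20.0
HOME_ADVANTAGE = 100.0

def elo_update(w_elo, l_elo, margin):
    # Condensed version of the notebook's update rule
    diff = w_elo - l_elo
    pred = 1.0 / (10.0 ** (-diff / 400.0) + 1.0)
    mult = ((margin + 3.0) ** 0.8) / (7.5 + 0.006 * diff)
    return pred, K * mult * (1.0 - pred)

ratings = {"A": 1500.0, "B": 1500.0, "C": 1500.0}

# (winner, loser, margin, winner's location), all invented
games = [("A", "B", 8, "H"), ("B", "C", 3, "A"), ("C", "A", 12, "N")]

for winner, loser, margin, wloc in games:
    # The bonus is added only for this calculation, never to `ratings` itself
    w_ad = HOME_ADVANTAGE if wloc == "H" else 0.0
    l_ad = HOME_ADVANTAGE if wloc == "A" else 0.0
    _, update = elo_update(ratings[winner] + w_ad, ratings[loser] + l_ad, margin)
    ratings[winner] += update
    ratings[loser] -= update

total = sum(ratings.values())
print(ratings, total)
```

The total stays at 4500 (up to floating-point error), which is why the average rating across all teams remains at 1500 in this notebook.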

In [19]:
rs['w_elo'] = w_elo
rs['l_elo'] = l_elo

Let's take a look at the last few games in the games dataframe to check that the Elo ratings look reasonable.

In [20]:
rs.tail(10)

Unnamed: 0,Season,DayNum,WTeamID,WScore,LTeamID,LScore,WLoc,NumOT,margin,w_elo,l_elo
187279,2024,131,1433,66,1386,60,N,0,6,1740.599065,1610.559505
187280,2024,131,1436,66,1262,61,H,0,5,1706.829704,1521.851879
187281,2024,131,1443,78,1431,71,N,0,7,1606.102006,1509.687131
187282,2024,131,1458,76,1345,75,N,1,1,1891.365469,2077.683888
187283,2024,131,1463,69,1165,57,N,0,12,1702.498239,1603.064254
187284,2024,132,1120,86,1196,67,N,0,19,1974.905278,1855.421014
187285,2024,132,1182,57,1433,51,N,0,6,1671.421584,1730.490537
187286,2024,132,1228,93,1458,87,N,0,6,1962.80138,1885.385706
187287,2024,132,1412,85,1396,69,N,0,16,1678.45386,1524.918378
187288,2024,132,1463,62,1135,61,N,0,1,1703.887346,1464.923589


Looks OK. How well do they generally predict games? Since all of the Elo predictions calculated above are for the team that actually won (a true outcome of 1), it's really simple to check what the log loss would be on these roughly 187k games:

In [21]:
np.mean(-np.log(preds))

0.5362111392019109

(This is a pretty rough measure, because this is looking only at regular-season games, which is not really what we're ultimately interested in predicting.)
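
As a sanity check on that shortcut, the sketch below shows that the general binary log-loss formula reduces to the mean of `-log(p)` when every label is 1, and compares it with the score of always predicting 0.5. The four probabilities are invented, and `binary_log_loss` is a hand-written helper rather than a library function.

```python
import math

def binary_log_loss(y_true, p_pred):
    # Mean binary cross-entropy over all games
    total = 0.0
    for y, p in zip(y_true, p_pred):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

toy_preds = [0.9, 0.6, 0.35, 0.75]  # invented winner probabilities

full = binary_log_loss([1] * len(toy_preds), toy_preds)
shortcut = sum(-math.log(p) for p in toy_preds) / len(toy_preds)
coin_flip = -math.log(0.5)  # log loss of always predicting 50/50

print(full, shortcut, coin_flip)
```

The coin-flip baseline is about 0.693, so the 0.536 above is a clear improvement over having no information at all.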

Final step: for each team, pull out the final Elo rating at the end of each regular season. This is a bit annoying because the team ID could be in either the winner or the loser column for their last game of the season.

In [22]:
def final_elo_per_season(df, team_id):
    # Keep only this team's games, then its last game of each season
    d = df.loc[(df.WTeamID == team_id) | (df.LTeamID == team_id)]
    d = d.sort_values(['Season', 'DayNum']).drop_duplicates(['Season'], keep='last')
    # The team's post-game Elo is in w_elo if it won that game, l_elo otherwise
    season_elo = np.where(d.WTeamID == team_id, d.w_elo, d.l_elo)
    return pd.DataFrame({
        'team_id': team_id,
        'season': d.Season,
        'season_elo': season_elo
    })

In [23]:
df_list = [final_elo_per_season(rs, team_id) for team_id in team_ids]
season_elos = pd.concat(df_list)
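
An alternative that avoids the per-team loop is to stack the winner and loser columns into one long table and take the last row per team and season. This is only a sketch on a made-up toy table with the same column names as `rs`. It assumes a team never plays twice on the same `DayNum`, and I have not benchmarked it against the function above.

```python
import pandas as pd

# Toy games table with the same columns as `rs` (all values are made up)
games = pd.DataFrame({
    "Season":  [2023, 2023, 2023, 2024],
    "DayNum":  [10, 20, 30, 15],
    "WTeamID": [1101, 1102, 1101, 1102],
    "LTeamID": [1102, 1103, 1103, 1101],
    "w_elo":   [1510.0, 1498.0, 1521.0, 1505.0],
    "l_elo":   [1490.0, 1492.0, 1483.0, 1514.0],
})

# One row per (game, team): stack the winner view on top of the loser view
winners = games[["Season", "DayNum", "WTeamID", "w_elo"]].rename(
    columns={"WTeamID": "team_id", "w_elo": "season_elo"})
losers = games[["Season", "DayNum", "LTeamID", "l_elo"]].rename(
    columns={"LTeamID": "team_id", "l_elo": "season_elo"})
long_form = pd.concat([winners, losers], ignore_index=True)

# Last game of each team in each season holds its end-of-season Elo
final_elos = (
    long_form.sort_values(["Season", "DayNum"])
    .drop_duplicates(["Season", "team_id"], keep="last")
    .drop(columns="DayNum")
    .rename(columns={"Season": "season"})
    .sort_values(["team_id", "season"])
    .reset_index(drop=True)
)
print(final_elos)
```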

In [24]:
season_elos.sample(10)

Unnamed: 0,team_id,season,season_elo
11351,1333,1987,1571.507739
56260,1174,1998,1491.21798
15206,1190,1988,1345.84368
92502,1407,2006,1489.115907
43834,1253,1995,1662.596313
113325,1266,2010,1889.845536
155987,1152,2018,1034.87917
128835,1169,2013,1380.167543
43824,1400,1995,1817.16618
181511,1156,2023,1517.37151


Before I save the end-of-season Elo ratings to a CSV file, I'm going to add the team names as a column.

In [26]:
team_name_df = pd.read_csv(DATA_PATH + "/MTeams.csv")
season_elos = season_elos.merge(team_name_df, left_on='team_id', right_on='TeamID', how='left')
season_elos.drop('TeamID', axis=1, inplace=True) # Drop redundant 'TeamID' column

season_elos = season_elos[['team_id', 'TeamName', 'season', 'season_elo']]
season_elos.to_csv("results/season_elos.csv", index=False)