

# Classic Elo

The classic Elo algorithm is great for establishing a baseline against which to compare other methods. Given its age and simplicity, it will almost certainly not beat a liquid market. However, it shines when you have a lot of historical data and there are jumps in class. For example, in college football, classic Elo will properly rate a strong Group of Five team even after it has played 10 easy opponents, while still favoring teams from historically stronger conferences. Another example is a newly promoted soccer side entering a better division: despite the promoted team having won 60%+ of its games the prior year, Elo will not project a gigantic winning percentage going forward against tougher competition. Elo also has the advantage that you do not need to know the underlying score distributions, only win/loss results.
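For reference, this is the update that `_rating_period_update` in the `EloClassic` class applies. Here $R_A$ and $R_B$ are the pregame ratings of a team and its opponent, $S_A \in \{0, 0.5, 1\}$ is the result (loss, tie, win), and $K$ is the K factor (`k=10` by default in the class):

$$
E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}, \qquad R_A' = R_A + K\,(S_A - E_A)
$$

Only the result $S_A$ enters the update, which is why win/loss data is enough. A 400-point rating gap corresponds to an expected score of $10/11$, i.e. 10-to-1 odds.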

### Advantages

- Simplicity
- Speed
- Good at handling cross-league, cross-conference play
- Computationally cheap

### Disadvantages

- No measure of uncertainty. This causes problems with new players whose strength is unknown, or with a player coming off an extended absence.
- Cannot account for score difference. A win by 50 counts the same as a win by 5.
- Constant K factor. If players try harder in some games than others, the algorithm has no intrinsic way to account for that.
- As with any rating system, if players are allowed to selectively choose opponents, the fidelity of the ratings can suffer.
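A minimal sketch of a single-game classic Elo update makes the score-difference limitation concrete. The function `elo_update` is hypothetical and mirrors the formula in `_rating_period_update`. Because the margin is collapsed into win/loss/tie before the update, a 5-point win and a 50-point win move the rating by exactly the same amount.

```python
def elo_update(rating, opp_rating, score_diff, k=10):
    """Hypothetical single-game classic Elo update."""
    # Only the sign of the margin survives: win = 1, tie = 0.5, loss = 0
    if score_diff > 0:
        result = 1.0
    elif score_diff == 0:
        result = 0.5
    else:
        result = 0.0
    # Expected score from the rating gap (400-point logistic scale)
    expected = 1 / (1 + 10 ** ((opp_rating - rating) / 400))
    return rating + k * (result - expected)


print(elo_update(1500, 1500, 5))   # 1505.0
print(elo_update(1500, 1500, 50))  # 1505.0
```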



```python
import os

import numpy as np
import pandas as pd

from tqdm import tqdm
from pandas.api.types import is_datetime64_any_dtype as is_datetime

# Customize to your own data directory
DATA_PATH = 'D://Medium/'
```



```python
# Inspect the raw columns in the sample file
raw_data = pd.read_csv(os.path.join(DATA_PATH, 'ncaam_sample_data.csv'))
print(list(raw_data))
```

```
['season', 'team_score', 'opp_score', 'is_home', 'numot', 'team_fgm', 'team_fga', 'team_fgm3', 'team_fga3', 'team_ftm', 'team_fta', 'team_or', 'team_dr', 'team_ast', 'team_to', 'team_stl', 'team_blk', 'team_pf', 'opp_fgm', 'opp_fga', 'opp_fgm3', 'opp_fga3', 'opp_ftm', 'opp_fta', 'opp_or', 'opp_dr', 'opp_ast', 'opp_to', 'opp_stl', 'opp_blk', 'opp_pf', 'team_name', 'opp_name', 'date']
```


```python
def load_data():
    return pd.read_csv(os.path.join(DATA_PATH, 'ncaam_sample_data.csv'))

def process(data):
    # Work on a copy so the raw DataFrame is left untouched
    data = data.copy()
    data['team_reb'] = data['team_or'] + data['team_dr']
    data['opp_reb'] = data['opp_or'] + data['opp_dr']
    data = data[['date', 'season', 'team_name', 'opp_name', 'is_home',
                 'team_score', 'opp_score', 'team_reb', 'opp_reb']].copy()
    data['date'] = pd.to_datetime(data['date'])
    data['score_diff'] = data['team_score'] - data['opp_score']
    data['reb_diff'] = data['team_reb'] - data['opp_reb']

    # Long format preferred: one row per (game, team, stat)
    data = data.melt(
        id_vars=['date', 'season', 'team_name', 'opp_name', 'is_home'],
        value_vars=['score_diff', 'reb_diff'],
        var_name='stat',
        value_name='difference'
    )

    # 1 = win, 0.5 = tie, 0 = loss on that stat
    data['result'] = np.where(data['difference'] > 0, 1.0,
                              np.where(data['difference'] == 0, 0.5, 0.0))

    data = data.sort_values(by=['date', 'team_name', 'stat']).reset_index(drop=True)

    return data

m_data = load_data()
m_data = process(m_data)
```
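To make the long format concrete, here is the melt-and-result step from `process` applied by hand to a single game. The toy DataFrame is hypothetical and skips the raw box-score columns. The margins (Kansas +3 points and -15 rebounds against North Carolina) are taken from the final rating period inspected further down.

```python
import numpy as np
import pandas as pd

# Hypothetical toy input: one game, one row per team's perspective
games = pd.DataFrame({
    'date': pd.to_datetime(['2022-04-04', '2022-04-04']),
    'team_name': ['Kansas', 'North Carolina'],
    'opp_name': ['North Carolina', 'Kansas'],
    'score_diff': [3, -3],
    'reb_diff': [-15, 15],
})

# Long format: one row per (game, team, stat)
long_games = games.melt(
    id_vars=['date', 'team_name', 'opp_name'],
    value_vars=['score_diff', 'reb_diff'],
    var_name='stat',
    value_name='difference',
)
# 1 = win, 0.5 = tie, 0 = loss on that stat
long_games['result'] = np.where(long_games['difference'] > 0, 1.0,
                                np.where(long_games['difference'] == 0, 0.5, 0.0))
long_games = long_games.sort_values(['date', 'team_name', 'stat']).reset_index(drop=True)
print(long_games[['team_name', 'stat', 'difference', 'result']])
```

Each game becomes four rows (two teams × two stats). A team can win one stat while losing the other, and each stat gets its own independent rating.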


```python
class EloClassic():

    """
    Implements the classic Elo algorithm for multiple stats.

    Expects long-format data: for each stat, two rows per game,
    one from each team's perspective.
    """

    def __init__(self,
                 data,
                 k=10,
                 protag_id='team_name',
                 antag_id='opp_name',
                 hfa=True,            # home field advantage (not used yet)
                 result_col='result',
                 priors=None          # prior ratings (not used yet)
                ):

        # Copy so the caller's DataFrame is not modified, and normalize
        # column names before any column is looked up
        self.data = data.copy()
        self.data.columns = [cn.lower().strip() for cn in self.data.columns]
        col_names = list(self.data)

        self.k = k
        self.protag_id = protag_id.lower().strip()
        self.antag_id = antag_id.lower().strip()
        self.hfa = hfa
        self.result_col = result_col.lower().strip()
        self.priors = priors

        assert 'stat' in col_names, 'No stat column'
        self.stats = list(self.data['stat'].unique())

        results = self.data[self.result_col].unique()
        assert all(np.isclose(r, 0) or np.isclose(r, 1) or np.isclose(r, 0.5) for r in results), \
            "Results must be zero (for loss) or one (for win) or 0.5 (for tie)"

        assert ('date' in col_names) or ('rating_period' in col_names), \
            "Need either a date column or a rating period column"

        if 'date' in col_names:
            assert is_datetime(self.data['date'].dtype), "Date column must be of type datetime"
            if 'rating_period' not in col_names:
                # Each distinct date becomes one rating period
                self.data['rating_period'] = self.data['date'].rank(method='dense')

        self.protag_ids = set(self.data[self.protag_id].unique())
        self.antag_ids = set(self.data[self.antag_id].unique())

        assert len(self.protag_ids.symmetric_difference(self.antag_ids)) == 0, \
            "In SPR format, need a row for each team in dataframe (two rows per game)"

        self.num_stats = len(self.stats)
        self.num_protags = len(self.protag_ids)
        # Each game has two rows (one per team) for every stat
        self.num_games = len(self.data) // (2 * self.num_stats)

        self.protag2index = {protag: i for i, protag in enumerate(self.protag_ids)}
        self.stat2index = {stat: j for j, stat in enumerate(self.stats)}

        self.data['protag_idx'] = self.data[self.protag_id].map(self.protag2index)
        self.data['antag_idx'] = self.data[self.antag_id].map(self.protag2index)
        self.data['stat_idx'] = self.data['stat'].map(self.stat2index)

        # Every team starts at 1500 for every stat
        self.rating_matrix = np.full((self.num_protags, self.num_stats), 1500.0)

    def _rating_period_update(self, protag_ratings, antag_ratings, results):
        # Expected score of the protagonist, then move toward the actual result
        probs = 1 / (1 + 10 ** ((antag_ratings - protag_ratings) / 400))
        return protag_ratings + self.k * (results - probs)

    def info(self):
        print(f"There are {self.num_stats} stats: {self.stats}")
        print(f"There are {self.num_protags} unique players/teams.")
        print(f"There are {self.num_games} games.")

    def history(self):
        history = []
        id_col = 'date' if 'date' in self.data else 'rating_period'
        quick_iterator = self.data.groupby('rating_period')
        for _, rating_period in tqdm(quick_iterator, total=len(quick_iterator)):
            protag_idx = rating_period['protag_idx'].values
            antag_idx = rating_period['antag_idx'].values
            stat_idx = rating_period['stat_idx'].values

            # Record pregame ratings before any update in this period
            pregame_protag_ratings = self.rating_matrix[protag_idx, stat_idx]
            pregame_antag_ratings = self.rating_matrix[antag_idx, stat_idx]

            to_append = rating_period[[id_col, self.protag_id, self.antag_id, 'stat', self.result_col]].copy()
            to_append['pregame_rating'] = pregame_protag_ratings
            to_append['pregame_opp_rating'] = pregame_antag_ratings
            history.append(to_append)

            # All rows in a period are updated simultaneously from pregame ratings
            new_ratings = self._rating_period_update(
                pregame_protag_ratings, pregame_antag_ratings, rating_period[self.result_col].values)
            self.rating_matrix[protag_idx, stat_idx] = new_ratings

        # Note: this replaces the history() method on this instance with the
        # resulting DataFrame, so call it only once per EloClassic instance.
        self.history = pd.concat(history, axis=0).reset_index(drop=True)
        return self.history

    def optimize(self):
        """
        Optimizes k value and home field advantage (not implemented yet).
        """
        return

    def update(self):
        # Not implemented yet
        return

    def predict(self):
        # Not implemented yet
        return


EC = EloClassic(m_data)
EC.info()
```

```
There are 2 stats: ['reb_diff', 'score_diff']
There are 363 unique players/teams.
There are 103280 games.
```
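The `hfa` flag is accepted but not used yet, and the `optimize` docstring mentions tuning home field advantage. One common way to add it, sketched here under assumptions, is to add a fixed number of rating points to the home side inside the expected-score formula. The function name, `hfa_points=100`, and the encoding of `is_home` (1 = home, 0 = neutral, -1 = away) are all hypothetical choices and are not taken from the dataset.

```python
def expected_score_with_hfa(rating, opp_rating, is_home, hfa_points=100):
    """Hypothetical expected score with a home field bonus.

    Assumes is_home is 1 for home, 0 for neutral, -1 for away;
    hfa_points is an illustrative value, not a fitted one.
    """
    # Shift the rating gap by the home bonus (negative when away)
    gap = (rating + is_home * hfa_points) - opp_rating
    return 1 / (1 + 10 ** (-gap / 400))


home = expected_score_with_hfa(1500, 1500, is_home=1)
away = expected_score_with_hfa(1500, 1500, is_home=-1)
print(round(home, 4), round(away, 4))
print(expected_score_with_hfa(1500, 1500, is_home=0))  # 0.5
```

In the class, this would stand in for the expected-score line of `_rating_period_update`. A parameter like `hfa_points` is the kind of value `optimize` could fit alongside `k`.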


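```python
# Sanity check of the expected-score formula for a 200-point gap
protag_ratings = np.array([1600, 1400])
antag_ratings = np.array([1400, 1600])
results = np.array([1, 0])
probs = 1 / (1 + 10 ** ((antag_ratings - protag_ratings) / 400))
probs
```

```
array([0.75974693, 0.24025307])
```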

```python
EC.history()
```

| | date | team_name | opp_name | stat | result | pregame_rating | pregame_opp_rating |
|---|---|---|---|---|---|---|---|
| 0 | 2002-11-14 | Alabama | Oklahoma | reb_diff | 1.0 | 1500.000000 | 1500.000000 |
| 1 | 2002-11-14 | Alabama | Oklahoma | score_diff | 1.0 | 1500.000000 | 1500.000000 |
| 2 | 2002-11-14 | Memphis | Syracuse | reb_diff | 0.0 | 1500.000000 | 1500.000000 |
| 3 | 2002-11-14 | Memphis | Syracuse | score_diff | 1.0 | 1500.000000 | 1500.000000 |
| 4 | 2002-11-14 | Oklahoma | Alabama | reb_diff | 0.0 | 1500.000000 | 1500.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 413115 | 2022-04-02 | Villanova | Kansas | score_diff | 0.0 | 1949.472836 | 1976.384518 |
| 413116 | 2022-04-04 | Kansas | North Carolina | reb_diff | 0.0 | 1779.906395 | 1960.127020 |
| 413117 | 2022-04-04 | Kansas | North Carolina | score_diff | 1.0 | 1980.998000 | 1893.447282 |
| 413118 | 2022-04-04 | North Carolina | Kansas | reb_diff | 1.0 | 1960.127020 | 1779.906395 |


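```python
history_test = EC.history.copy()
```

```python
# Correlate results with pregame ratings (numeric columns only)
history_test[['result', 'pregame_rating', 'pregame_opp_rating']].corr()
```

| | result | pregame_rating | pregame_opp_rating |
|---|---|---|---|
| result | 1.0 | 0.214158 | -0.214158 |
| pregame_rating | 0.214158 | 1.0 | 0.402717 |
| pregame_opp_rating | -0.214158 | 0.402717 | 1.0 |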
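Correlation only shows that the ratings carry signal. A more direct check, sketched here, converts each row's pregame ratings into an expected score with the same formula as `_rating_period_update`. It then scores those probabilities with the Brier score: the mean squared error between probability and result. Always predicting 0.5 earns 0.25 on decisive games. The helper `brier_score` is hypothetical, and the four rows are copied from the tail of the history output.

```python
import numpy as np
import pandas as pd


def brier_score(history):
    """Hypothetical helper: Brier score of Elo pregame expected scores."""
    # Same expected-score formula as the classic Elo update
    probs = 1 / (1 + 10 ** ((history['pregame_opp_rating'] - history['pregame_rating']) / 400))
    return float(np.mean((history['result'] - probs) ** 2))


# Rows copied from the tail of the history output
sample = pd.DataFrame({
    'stat': ['score_diff', 'reb_diff', 'score_diff', 'reb_diff'],
    'result': [0.0, 0.0, 1.0, 1.0],
    'pregame_rating': [1949.472836, 1779.906395, 1980.998000, 1960.127020],
    'pregame_opp_rating': [1976.384518, 1960.127020, 1893.447282, 1779.906395],
})

# Score each stat separately, since each stat has its own ratings
for stat, group in sample.groupby('stat'):
    print(stat, round(brier_score(group), 4))
```

Applied per stat to the full `history_test`, the same helper gives a single number to compare against the 0.25 baseline and against other rating methods.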


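```python
# Inspect the final rating period
EC.data[EC.data['rating_period'] == EC.data['rating_period'].max()]
```

| | date | season | team_name | opp_name | is_home | stat | difference | result | rating_period | protag_idx | antag_idx | stat_idx |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 413116 | 2022-04-04 | 2022 | Kansas | North Carolina | 0 | reb_diff | -15 | 0 | 2674.0 | 211 | 60 | 0 |
| 413117 | 2022-04-04 | 2022 | Kansas | North Carolina | 0 | score_diff | 3 | 1 | 2674.0 | 211 | 60 | 1 |
| 413118 | 2022-04-04 | 2022 | North Carolina | Kansas | 0 | reb_diff | 15 | 1 | 2674.0 | 60 | 211 | 0 |
| 413119 | 2022-04-04 | 2022 | North Carolina | Kansas | 0 | score_diff | -3 | 0 | 2674.0 | 60 | 211 | 1 |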
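Within a rating period, every row is updated from the pregame ratings at once. A small NumPy sketch replays this final period. The pregame ratings are copied from the history output. The layout of the toy `rating_matrix` is hypothetical (rows 0 = Kansas, 1 = North Carolina; columns 0 = reb_diff, 1 = score_diff) and differs from the real `protag_idx` values. Integer-array indexing returns a copy, so the pregame arrays are unaffected by the assignment.

```python
import numpy as np

k = 10  # same default K factor as EloClassic

# Hypothetical layout: rows 0 = Kansas, 1 = North Carolina;
# columns 0 = reb_diff, 1 = score_diff (pregame ratings from the history output)
rating_matrix = np.array([[1779.906395, 1980.998000],
                          [1960.127020, 1893.447282]])

# One row per (team, stat) in the period, as in the long-format data
protag_idx = np.array([0, 0, 1, 1])
antag_idx = np.array([1, 1, 0, 0])
stat_idx = np.array([0, 1, 0, 1])
results = np.array([0.0, 1.0, 1.0, 0.0])

# Integer-array indexing returns copies, so these stay fixed as pregame values
pre = rating_matrix[protag_idx, stat_idx]
pre_opp = rating_matrix[antag_idx, stat_idx]

# Update every row at once from the pregame ratings
probs = 1 / (1 + 10 ** ((pre_opp - pre) / 400))
rating_matrix[protag_idx, stat_idx] = pre + k * (results - probs)
print(rating_matrix)
```

Because each game contributes one row per team, the two updates for a stat cancel out and the total rating in each column is conserved. If a team appeared twice in the same period, only one of its updates would be kept for that cell.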



