





# Classic Elo

The classic Elo algorithm is great for establishing a baseline for comparison of other methods. Due to its age and simplicity, it will almost certainly not be smarter than any liquid markets or more advanced algorithms. However, it is still fairly powerful. It shines when you have a lot of historical data and there are jumps in class. I.e., in college football, classic Elo will properly rate a strong group of five team even after they've played 10 easy opponents. In other words, it can store information about historically good conferences for a long time despite no interplay between conferences. Another example is in soccer when a newly promoted side enters a better division. Despite the promoted team having won 60%+ of their games the prior year, Elo will not project a gigantic winning percentage going forward against tougher competition. Elo also has the advantage that you do not need to know underlying distributions, only win/loss. Since most competitions have a winner and loser, it's very universal.

### Advantages

- Simplicity
- Computational speed
- Universality
- Ratings 'memory' - league heirarchies develop with enough data
- Robust to param choices - a k between 10 and 30 will usually give you results that pass an eye test

### Disadvantages

- No uncertainty. This causes problems when there are new players that you have no idea how good they are, or a player coming off extended absence
- Cannot account for score difference. A win by 50 is the same as a win by 5.
- There is no attempt to model style of play or player idiosyncracies
- Constant K factor. Without some edits, it will not put more weight on certain games than others, i.e., if a team or player is particularly motivated. 
- As with any rating system, if players are allowed to selectively choose opponents, it can ruin the fidelity of the ratings
- Is meant for 1v1 ratings, or team v team. Cannot innately handle 1vMany or Many vs. Many.



In [35]:

import gc
import os

import numpy as np
import pandas as pd

from copy import copy
from tqdm import tqdm
from oddsmaker.state_space import Elo
from pandas.api.types import is_datetime64_any_dtype as is_datetime


### customize to your own
DATA_PATH = 'D://Medium/'



## Quick Start

In this brief introduction, I'll show you how to use the oddsmaker Elo class. The most difficult part, which hopefully isn't that difficult, is to format the data in the correct way. It then takes two primary parameters, k value (how much to update the rankings) and home field advantage (optional). I have some sample men's basketball data below that I will use for demonstration. 



In [3]:


def load_data():
    return pd.read_csv(os.path.join(DATA_PATH, 'ncaam_sample_data.csv'))

m_data = load_data()
m_data.head()


Unnamed: 0,season,team_score,opp_score,is_home,numot,team_fgm,team_fga,team_fgm3,team_fga3,team_ftm,...,opp_or,opp_dr,opp_ast,opp_to,opp_stl,opp_blk,opp_pf,team_name,opp_name,date
0,2003,68,62,0,0,27,58,3,14,11,...,10,22,8,18,9,2,20,Alabama,Oklahoma,2002-11-14
1,2003,70,63,0,0,26,62,8,20,10,...,20,25,7,12,8,6,16,Memphis,Syracuse,2002-11-14
2,2003,62,68,0,0,22,53,2,10,16,...,14,24,13,23,7,1,22,Oklahoma,Alabama,2002-11-14
3,2003,63,70,0,0,24,67,6,24,9,...,15,28,16,13,4,4,18,Syracuse,Memphis,2002-11-14
4,2003,55,81,-1,0,20,46,3,11,12,...,12,24,12,9,9,3,18,E Washington,Wisconsin,2002-11-15



Like I alluded to above, the most difficult part is getting this data in a format which the class accepts. After that, you can take advantage of all the functions. The columns it requires are:

1. **"date"/"rating_period"** - will accept either. a player should only play once per date or once per rating period.  
2. **protag_id** (i.e, player_id or team_id)  
3. **antag_id** (i.e, opp_team_id or opp_player_id)  
4. **"stat"** - name of stat  
5. **"result"** - 0 for loss, 0.5 for tie, 1 for win.

Optional columns are:  
**is_home** - for home field advantage. Assumes -1 for away, 0 for neutral, 1 for home. 

Below, I'll show an example of a function that takes my sample data and processes it into an acceptable format. 



In [4]:

def process(data):
    
    ## date must be a datetime. rating period also acceptable as a numeric
    data['date'] = pd.to_datetime(data['date'])
    
    ## In this example, I'm interested in the difference between two stats - score and rebounds
    data['team_reb'] = data['team_or'].copy()+data['team_dr'].copy()
    data['opp_reb'] = data['opp_or'].copy()+data['opp_dr'].copy()
    data = data.copy()[['date','season','team_name','opp_name','is_home','team_score','opp_score', 'team_reb','opp_reb']]
    data['score_diff'] = data['team_score'].copy()-data['opp_score'].copy()
    data['reb_diff'] = data['team_reb'].copy()-data['opp_reb'].copy()
    
    ## long format necessary
    ## if you are not familiar with the melt function, I would recommend reading about it
    ## https://towardsdatascience.com/reshape-pandas-dataframe-with-melt-in-python-tutorial-and-visualization-29ec1450bb02
    data =data.melt(
        id_vars=['date','season','team_name','opp_name','is_home'], 
        value_vars=['score_diff','reb_diff'], 
        var_name='stat', 
        value_name='difference'
    )
    
    data['result'] = np.where(data['difference']>0, 1, 0)
    data['result'] = np.where(data['difference']==0, 0.5, data['result'].copy())
    data = data.drop(columns=['difference'])
    data = data.sort_values(by=['date','team_name','stat']).reset_index(drop=True)
    
    return data

m_data = process(m_data)
m_data.head(10)



Unnamed: 0,date,season,team_name,opp_name,is_home,stat,result
0,2002-11-14,2003,Alabama,Oklahoma,0,reb_diff,1.0
1,2002-11-14,2003,Alabama,Oklahoma,0,score_diff,1.0
2,2002-11-14,2003,Memphis,Syracuse,0,reb_diff,0.0
3,2002-11-14,2003,Memphis,Syracuse,0,score_diff,1.0
4,2002-11-14,2003,Oklahoma,Alabama,0,reb_diff,0.0
5,2002-11-14,2003,Oklahoma,Alabama,0,score_diff,0.0
6,2002-11-14,2003,Syracuse,Memphis,0,reb_diff,1.0
7,2002-11-14,2003,Syracuse,Memphis,0,score_diff,0.0
8,2002-11-15,2003,E Washington,Wisconsin,-1,reb_diff,0.0
9,2002-11-15,2003,E Washington,Wisconsin,-1,score_diff,0.0



I'll divide the data into historical, holdout, and future to demonstrate some functions. 


In [5]:

historical = m_data.copy().loc[m_data['season']<2021].reset_index(drop=True)
holdout =  m_data.copy().loc[m_data['season']==2021].reset_index(drop=True)
future =  m_data.copy().loc[m_data['season']>2021].reset_index(drop=True)


In [6]:

### for chess they historically use a k of between 12-30. I'll start with 20 as a guess
### home field advantage differs between stats and sports. I think anything between 20-50 is a good guess

elo = Elo(historical, k=20, hfa=20)

## use info to inspect 
elo.info()


There are 2 stats: ['reb_diff', 'score_diff']
There are 358 unique players/teams.
There are 187,894 games from 2002-11-14 00:00:00 to 2020-03-11 00:00:00.


In [7]:

## use run_history to get historical ratings that are not leaky (i.e., don't use the current game's info to cheat)
## takes 1 second for 200,000 games, not bad
## Arpad Elo used to do these calculations by hand, poor guy
history, score = elo.run_history()
history.head()


100%|████████████████████████████████████████████████████████████████████████████| 2419/2419 [00:01<00:00, 1379.42it/s]


Unnamed: 0,date,rating_period,team_name,opp_name,is_home,hfa,stat,result,pregame_rating,pregame_opp_rating,rating_adjustment,probability
0,2002-11-14,1.0,Alabama,Oklahoma,0,0,reb_diff,1.0,1500.0,1500.0,10.0,0.5
1,2002-11-14,1.0,Alabama,Oklahoma,0,0,score_diff,1.0,1500.0,1500.0,10.0,0.5
2,2002-11-14,1.0,Memphis,Syracuse,0,0,reb_diff,0.0,1500.0,1500.0,-10.0,0.5
3,2002-11-14,1.0,Memphis,Syracuse,0,0,score_diff,1.0,1500.0,1500.0,10.0,0.5
4,2002-11-14,1.0,Oklahoma,Alabama,0,0,reb_diff,0.0,1500.0,1500.0,-10.0,0.5


In [8]:

## current ratings gives you the last known rating of each team
current_ratings = elo.current_ratings()
print("End of season 2020 ratings")
## some reasonable numbers, Virginia is probably way too high
current_ratings.sort_values(by=['rating'],ascending=False).head(5)


End of season 2020 ratings


Unnamed: 0,rating,team_name,stat
111,2076.956609,Virginia,score_diff
213,2072.528765,Kansas,score_diff
415,2055.879704,Gonzaga,score_diff
137,2042.261178,Duke,score_diff
273,2009.089036,Villanova,score_diff


In [9]:

## we can then use some plotting helpers to make sure things look reasonable



In [10]:

EC = Elo(train, k=17, hfa=29)
EC.info()
# EC.optimize()

score


NameError: name 'train' is not defined

In [11]:

new_data = test.copy()
col_names= list(new_data)
if 'date' in col_names:
    date_dtype = new_data.date.dtype
    assert(is_datetime(date_dtype)), "Date column must be of type datetime"
    oldest_date = new_data.date.min()
    assert('date' in list(self.data)), "Old data does not contain date column, new data does. Confused whether to use rating period or date."
    print(f"Oldest date in new data is {oldest_date}, starting update from that date...")
    old_data = self.data.copy().loc[self.data['date']<oldest_date].reset_index(drop=True)
    old_data['rating_period'] = old_data.date.copy().rank(method='dense')
else:
    ### use rating period instead
    oldest_rp = new_data.rating_period.min()
    




NameError: name 'test' is not defined

In [12]:

historical



Unnamed: 0,date,season,team_name,opp_name,is_home,stat,result
0,2002-11-14,2003,Alabama,Oklahoma,0,reb_diff,1.0
1,2002-11-14,2003,Alabama,Oklahoma,0,score_diff,1.0
2,2002-11-14,2003,Memphis,Syracuse,0,reb_diff,0.0
3,2002-11-14,2003,Memphis,Syracuse,0,score_diff,1.0
4,2002-11-14,2003,Oklahoma,Alabama,0,reb_diff,0.0
...,...,...,...,...,...,...,...
375783,2020-03-11,2020,Washington St,Colorado,0,score_diff,1.0
375784,2020-03-11,2020,Weber St,CS Sacramento,0,reb_diff,0.0
375785,2020-03-11,2020,Weber St,CS Sacramento,0,score_diff,0.0
375786,2020-03-11,2020,Xavier,DePaul,0,reb_diff,0.0


In [36]:
class Elo():
    
    """
    
    Implements classic Elo algorithm for a single stat or multiple stats simultaneously

    Parameters
    ----------
    data : pandas dataframe
        Dataframe containing the following columns:
            - protag_id: id of protagonist (player or team)
            - antag_id: id of antagonist (player or team)
            - stat: name of stat
            - result: 0 for loss, 1 for win, 0.5 for tie
            - is_home: 1 for home, -1 for away, 0 for neutral
            - date: date of game
            - rating_period: rating period of game (if no date)
    k : int, float, or dict
        Elo k value(s) to use for each stat. If int or float, applies to all stats. If dict, keys must be stat names, values are k values for each stat
    hfa : int, float, or dict
        Home field advantage value(s) to use for each stat. If int or float, applies to all stats. If dict, keys must be stat names, values are hfa values for each stat
    protag_id : str, optional
        Name of protagonist id column in data. The default is 'team_name'.
    antag_id : str, optional
        Name of antagonist id column in data. The default is 'opp_name'.
    result_col : str, optional
        Name of result column in data. The default is 'result'.
    priors : dict, optional
        Dictionary of prior ratings to use for each stat. Keys are protag_ids, values are dicts of stat:rating pairs. The default is None.  
    SRPM: True, False, or None
        "Single Row Per Match" parameter. If True, when team A plays team B, there is only row. When False, the class assumes that there is also a team B vs team A row with mirrored statistics. If None, tries to auto-detect. 
    
    """
    
    
    def __init__(self, 
                 data, 
                 k,
                 hfa=None,
                 protag_id='team_name', 
                 antag_id='opp_name',
                 result_col='result',
                 priors = None,
                 SRPM = None
                ):
        
        self.data = data.copy()
        self.k = k
        self.hfa = hfa
        self.protag_id = protag_id
        self.antag_id = antag_id
        self.result_col = result_col
        self.priors = priors
        self.SRPM = SRPM
        
        self.col_names = list(self.data)
        self.col_names = [cn.lower().strip() for cn in self.col_names]
        self.data.columns=self.col_names
        
        ### column checks ###
        assert(self.protag_id in list(data)), f"{self.protag_id} not in columns, please specify team column name with protag_id argument"
        assert(self.antag_id in list(data)), f"{self.antag_id} not in columns, please specify opponent column name with antag_id argument"
        assert('stat' in list(self.data)), 'No stat column, please add a stat name column to your data '
        assert(self.result_col in list(self.data)), 'Please include an outcome/result column, can specify the name with result col argument'
        assert((('rating_period' in list(self.data))|('date' in list(self.data)))), 'Please make sure either rating_period or date is in columns'
        
        self.stats = sorted(list(self.data.stat.unique()))
        
        ### parameter check ###
        if self.priors is None:
            print("No priors provided, starting every entity with the same rating...")
        else:
            assert(type(self.priors)==dict), "Priors must be a dictionary type"
            
        if self.k is None:
            raise ValueError("Must provide K value")
        else:
            assert(type(self.k) in [dict, float, int]), "K must be a single numeric quantity, or a dict that has unique values for each stat"
            self._add_k(k)
            
        if (('is_home' in self.col_names)&(hfa is None)):
            raise ValueError("If is_home is provided, then need home field advantage (hfa) parameter")
        if (('is_home' not in self.col_names)&(hfa is not None)):
            raise ValueError("If home field advantage is provided, then also need is_home column to denote whether or not is home")
            
        if self.hfa is None:
            print("No home field provided, assuming no home field advantage, or there is already an 'hfa' column...")
        else:
            assert(type(self.hfa) in [dict, float, int]), "HFA must be a single numeric quantity, or a dict that has unique values for each stat"
            self._add_hfa()
        
        ## figure out date or rating period
        if 'date' in self.col_names:
            self.has_date = True
            date_dtype = self.data.date.dtype
            assert(is_datetime(date_dtype)), "Date column must be of type datetime"
        else:
            self.has_date = False

        if (('date' in self.col_names) & ('rating_period' not in self.col_names)):
            self.data['rating_period'] = self.data.date.copy().rank(method='dense')
            
        ### check if data is symmetrical, i.e., for every team a vs team b there is a team b vs team a
        protags = self.data.groupby(['rating_period','stat'])[self.protag_id].apply(set).reset_index().copy()
        antags =self.data.groupby(['rating_period','stat'])[self.antag_id].apply(set).reset_index().copy()
        sym_test = protags.merge(antags, how='left', on=['rating_period','stat'])
        sym_test['sym_diff'] = sym_test[[self.protag_id,self.antag_id]].apply(lambda x: len(x[self.protag_id].symmetric_difference(x[self.antag_id])), axis=1)
        sym_val = sym_test['sym_diff'].mean()
        
        if self.SRPM is None:
            if sym_val < 0.05:
                print("95%+ of matches have symmetrical partner, inferring that there are two rows per match.")
                self.SRPM = False
            else:
                print("Inferring that there is a single row per match.")
                self.SRPM = True
                
        all_teams = set(self.data[self.protag_id].values).union(set(self.data[self.antag_id].values))    
            
        ### useful meta information
        self.num_stats = len(self.stats)
        self.num_protags = len(all_teams)
        self.num_games = len(self.data)//2
        self.num_rating_periods = len(self.data.rating_period)
        
        print(f"This dataframe has {self.num_stats} unique stats for {self.num_games} matches over {self.num_rating_periods} rating periods played by {self.num_protags} players/teams.")
        
        ### checks are done, building internal variables...
        
        ### index maps
        self.protag2index = {}
        for i,protag_id in enumerate(list(all_teams)):
            self.protag2index[protag_id] = i
            
        self.stat2index = {}
        for j,stat in enumerate(self.stats):
            self.stat2index[stat] = j
            
        ### initialize ratings
        self.data['protag_idx'] = self.data[self.protag_id].copy().map(self.protag2index)
        self.data['antag_idx'] = self.data[self.antag_id].copy().map(self.protag2index)
        self.data['stat_idx'] = self.data['stat'].copy().map(self.stat2index)
        if hfa is not None:
            self.data['hfa'] = self.data['stat'].copy().map(self.hfa).copy()*self.data['is_home'].copy()
        elif hfa not in self.col_names:
            self.data['hfa'] = 0
        assert(len(self.data.loc[self.data.hfa.isnull()])==0), f"{self.data.loc[self.data.hfa.is_null()].stat.unique()} do not have a home field advantage number"
        
        self.history = "Use .run_history() to create history"
        
        self.reset_ratings_mat()
        
    def reset_ratings_mat(self):

        self.rating_matrix = np.ones((self.num_protags, self.num_stats))*1500

        ### add priors for those specified
        ### priors must be a dict in form of {protag_id:{stat:rating}}
        if self.priors is not None:
            assert(type(self.priors)==dict), "Priors must be a dict"
            for protag_id, stat_dict in self.priors.items():
                assert(type(stat_dict)==dict), "Each protag_id key in priors dict must be a dict of stat:rating pairs"
                for stat, rating in stat_dict.items():
                    assert(stat in self.stats), f"{stat} is not a stat in the dataset"
                    self.rating_matrix[self.protag2index[protag_id], self.stat2index[stat]] = rating
        return
    
    def _add_k(self, k=None, data=None):

        """
        
        Adds k to self.data if none is passed, otherwise adds k to passed data
        
        """
        if k is None:
            k = copy(self.k)
        
        ### determine format of provided k values
        if type(k)==dict:
            assert(key in self.stats for key in k.keys()), "each key in k value dict must be a stat"
            kval_not_provided = []
            for stat in self.stats:
                if stat not in k:
                    print(f"No k value provided for {stat}, using average of other kvalues...")
                    kval_not_provided.append(stat)
            self.k = k
            for stat in kval_not_provided:
                self.k[stat] = np.mean(self.k.values())
                
        elif ((isinstance(k, int))|(isinstance(k, float))):
            self.k = {}
            for stat in self.stats:
                self.k[stat] = k
        else:
            raise ValueError("K values must either be a numeric (to assign to all stats) or a dict (where keys are stat names, values to be applied individually)")
        
        if data is None:
            self.data['k'] = self.data['stat'].map(self.k).copy()
            assert(len(self.data.loc[self.data.k.isnull()])==0), f"{self.data.loc[self.data.k.is_null()].stat.unique()} do not have a k factor specified in the k factor dict"
        else:
            data['k'] = data['stat'].map(self.k).copy()
            assert(len(data.loc[data.k.isnull()])==0), f"{data.loc[data.k.is_null()].stat.unique()} do not have a k factor specified in the k factor dict"
            return data
        
        return
    
    def _add_hfa(self, hfa=None, data=None):

        """
        
        Adds home field advantage to self.data if none is passed, otherwise adds hfa to passed data and returns it
        
        """
        
        if hfa is None:
            hfa = copy(self.hfa)
        
        if data is None:
            ### apply to self.data
            ### check for home field advantage
            if 'is_home' not in list(self.data):
                ### if no home field advantage column, assume no home field advantage
                self.data['is_home']=0
                self.has_hfa = False
            else:
                self.has_hfa = True

            locs = (self.data['is_home'].unique())
            assert(all([(np.isclose(l,0)|(np.isclose(l,1)|(np.isclose(l,-1)))) for l in locs])), "is_home col needs either 1 for home, -1 for away, or 0 for neutral"
            
            ### determine format of provided home field advantages
            if ((hfa is None)|(hfa==0)):
                self.hfa = {}
                for stat in self.stats:
                    self.hfa[stat] = 0
            elif type(hfa)==dict:
                assert(key in self.stats for key in hfa.keys()), "each key in home field advantage dict must be a stat"
                for stat in self.stats:
                    if stat not in hfa:
                        print(f"No home field advantage provided for {stat}")
                self.hfa = hfa
            elif ((isinstance(hfa, int))|(isinstance(hfa, float))):
                self.hfa = {}
                for stat in self.stats:
                    self.hfa[stat] = hfa
            else:
                raise ValueError("Home field advantage must either be a numeric (to assign to all stats) or a dict (where keys are stat names, values to be applied individually)")

        else:
            ### apply to passed data
            if 'is_home' not in list(data):
                data['is_home']=0
            if data.is_home.isnull().any():
                data['is_home'].fillna(0, inplace=True)
            locs = (data['is_home'].unique())
            assert(all([(np.isclose(l,0)|(np.isclose(l,1)|(np.isclose(l,-1)))) for l in locs])), "is_home col needs either 1 for home, -1 for away, or 0 for neutral"

        if data is None:
            self.data['hfa'] = self.data['stat'].map(self.hfa).copy()*self.data['is_home'].copy()
            assert(len(self.data.loc[self.data.hfa.isnull()])==0), f"{self.data.loc[self.data.hfa.is_null()].stat.unique()} do not have a home field advantage number"
        else:
            data['hfa'] = data['stat'].map(self.hfa).copy()*data['is_home'].copy()
            assert(len(data.loc[data.hfa.isnull()])==0), f"{data.loc[data.hfa.isnull()].stat.unique()} do not have a home field advantage number"
            return data

        return
        
        
elo = Elo(historical,k=10,hfa=0)


No priors provided, starting every entity with the same rating...
95%+ of matches have symmetrical partner, inferring that there are two rows per match.
This dataframe has 2 unique stats for 187894 matches over 375788 rating periods played by 358 players/teams.


Unnamed: 0,date,season,team_name,opp_name,is_home,stat,result,k,rating_period,protag_idx,antag_idx,stat_idx,hfa
0,2002-11-14,2003,Alabama,Oklahoma,0,reb_diff,1.0,10,1.0,299,328,0,0
1,2002-11-14,2003,Alabama,Oklahoma,0,score_diff,1.0,10,1.0,299,328,1,0
2,2002-11-14,2003,Memphis,Syracuse,0,reb_diff,0.0,10,1.0,36,171,0,0
3,2002-11-14,2003,Memphis,Syracuse,0,score_diff,1.0,10,1.0,36,171,1,0
4,2002-11-14,2003,Oklahoma,Alabama,0,reb_diff,0.0,10,1.0,328,299,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
375783,2020-03-11,2020,Washington St,Colorado,0,score_diff,1.0,10,2419.0,226,88,1,0
375784,2020-03-11,2020,Weber St,CS Sacramento,0,reb_diff,0.0,10,2419.0,116,245,0,0
375785,2020-03-11,2020,Weber St,CS Sacramento,0,score_diff,0.0,10,2419.0,116,245,1,0
375786,2020-03-11,2020,Xavier,DePaul,0,reb_diff,0.0,10,2419.0,157,344,0,0


In [10]:
priors = {}
priors = None
prior_type = type(priors)
prior_type in [dict, str, float]

False

In [22]:
m_data['rating_period'] = m_data.date.copy().rank(method='dense')

In [20]:

stats = list(m_data.stat.unique())
first_stat = stats[0]
m_data.loc[m_data['stat']==first_stat]




Unnamed: 0,date,season,team_name,opp_name,is_home,stat,result
0,2002-11-14,2003,Alabama,Oklahoma,0,reb_diff,1.0
2,2002-11-14,2003,Memphis,Syracuse,0,reb_diff,0.0
4,2002-11-14,2003,Oklahoma,Alabama,0,reb_diff,0.0
6,2002-11-14,2003,Syracuse,Memphis,0,reb_diff,1.0
8,2002-11-15,2003,E Washington,Wisconsin,-1,reb_diff,0.0
...,...,...,...,...,...,...,...
413110,2022-04-02,2022,Kansas,Villanova,0,reb_diff,1.0
413112,2022-04-02,2022,North Carolina,Duke,0,reb_diff,1.0
413114,2022-04-02,2022,Villanova,Kansas,0,reb_diff,0.0
413116,2022-04-04,2022,Kansas,North Carolina,0,reb_diff,0.0


In [37]:

### check if data is symmetrical, i.e., for every team a vs team b there is a team b vs team a
protags = m_data.groupby(['rating_period','stat'])['team_name'].apply(set).reset_index().copy()
antags = m_data.groupby(['rating_period','stat'])['opp_name'].apply(set).reset_index().copy()
sym_test = protags.merge(antags, how='left', on=['rating_period','stat'])
sym_test['sym_diff'] = sym_test[['team_name','opp_name']].apply(lambda x: len(x.team_name.symmetric_difference(x.opp_name)), axis=1)
sym_val = sym_test['sym_diff'].mean()

if sym_val < 0.05:
    print("95%+ of matches have symmetrical partner, assuming symmetrical format (team a vs. team b and vice versa are both included)")
else:
    print("Large amount of matches do not have symmetrical partner, assuming one row per game (team a vs. team b is included, but not the team b vs. team a mirror)")
    
    

95%+ of matches have symmetrical partner, assuming symmetrical format (team a vs. team b and vice versa are both included)


0.0