# Get Elo Rating

The purpose of this notebook is to explore the best way to calculate the ELO rating for teams across the life of the AFL. 

As with any notebook, let's first import the libraries we'll need.

In [1]:
import pandas as pd
import plotly.express as px

Import the dataset

In [2]:
matches = pd.read_csv('../afltables_matches.csv', index_col=0)

To keep things simple while we work out the formulas, let's first trim the dataset to only the features we need and just the first season of the AFL.

In [3]:
columns_wanted = ['year', 'round', 'home_team', 'home_final_score', 'away_team', 'away_final_score']
season_1897 = matches.loc[matches.year == 1897, columns_wanted].copy()

> How crazy that we have the stats of matches from 1897!!

The general gist of ELO is that every team starts with a rating of 1500. When you win a match you take some points from the loser. You gain more points than usual if you beat a team with a higher ELO rating than yours, and fewer if you beat a team with a lower rating.

Over time the better teams will end up with a higher ELO score.

Let's try working out the ELO change for the first match recorded.

In [4]:
first_match = season_1897.head(1).copy()
first_match

Unnamed: 0,year,round,home_team,home_final_score,away_team,away_final_score
0,1897,1,Fitzroy,49.0,Carlton,16.0


## How to calculate an ELO rating
First, how do we actually calculate the new ratings?

For this first match both Fitzroy and Carlton will be given the initial ELO rating of 1500. 

In [5]:
first_match['home_team_current_rating'] = 1500
first_match['away_team_current_rating'] = 1500
first_match

Unnamed: 0,year,round,home_team,home_final_score,away_team,away_final_score,home_team_current_rating,away_team_current_rating
0,1897,1,Fitzroy,49.0,Carlton,16.0,1500,1500


## The Formulas

> The ELO rating formula has a few variables that can be tweaked based on the game it's rating; however, for this initial exploration we will use the standard formula.

To calculate the ELO rating you need two formulas:

1. The expected probability
2. The ELO rating change

### Calculate the expected probability
The formula is:

$$
E_1 = \frac{1}{1 + 10^\frac{R_2 - R_1}{400}}
$$

Where $E_1$ is the expected probability that team 1 wins, $R_1$ is team 1's current ELO rating and $R_2$ is their opponent's current ELO rating. This formula is the magic behind working out the probability that a team will win given its own rating and its competitor's.
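To get a feel for the formula, here is a minimal sketch that evaluates it for a couple of made-up rating pairs. The function name `expected_probability` and the example ratings are only illustrative and are not part of the dataset.

```python
def expected_probability(r1, r2):
    """Expected probability that the team rated r1 beats the team rated r2."""
    return 1 / (1 + 10 ** ((r2 - r1) / 400))

# Illustrative ratings only, not taken from the dataset
even = expected_probability(1500, 1500)       # two equally rated teams
favourite = expected_probability(1600, 1500)  # team rated 100 points higher
underdog = expected_probability(1500, 1600)   # team rated 100 points lower

print(even)                 # 0.5
print(round(favourite, 2))  # 0.64
print(round(underdog, 2))   # 0.36
```

Two equally rated teams each get a 50% chance, and the two probabilities in any match-up add up to 1.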

### Calculate ELO Rating Change

$$
NR = R_1 + K(S_1 - E_1)
$$

Where $NR$ is the new rating for a given team, $R_1$ is that team's current rating, $K$ is a weighting (16 is used here), $S_1$ is the actual result and $E_1$ is the expected probability calculated earlier.

As a football match can be either won, lost or drawn, the result values are as follows:
- Win = 1
- Loss = 0
- Draw = 0.5

Let's create two functions that take the relevant inputs and implement these two formulas.

In [6]:
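def calculate_expected_probability(team1_current_elo, team2_current_elo):
    """Return the expected probability (E_1) that team 1 beats team 2."""
    r1 = team1_current_elo
    r2 = team2_current_elo
    return 1 / (1 + pow(10, ((r2 - r1) / 400)))

def calculate_rating_change(team1_current_elo, team2_current_elo, result):
    """Return team 1's new rating (NR), i.e. its current rating plus the change."""
    e1 = calculate_expected_probability(team1_current_elo, team2_current_elo)
    r1 = team1_current_elo
    k = 16  # weighting applied to every result
    s1 = result  # 1 = win, 0.5 = draw, 0 = loss
    return r1 + k * (s1 - e1)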
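As a rough illustration of how the two formulas interact, the sketch below runs a hypothetical match between a 1450-rated underdog and a 1550-rated favourite through all three possible results. The helper names (`expected_probability`, `new_rating`) and the ratings are assumptions made for this example; the logic mirrors the two functions above with $K = 16$.

```python
def expected_probability(r1, r2):
    """Expected probability that the team rated r1 beats the team rated r2."""
    return 1 / (1 + 10 ** ((r2 - r1) / 400))

def new_rating(r1, r2, result, k=16):
    """New rating for the team rated r1 after playing the team rated r2."""
    return r1 + k * (result - expected_probability(r1, r2))

# Hypothetical ratings, chosen only to show an uneven match-up
underdog, favourite = 1450, 1550
scenarios = {"underdog wins": 1, "draw": 0.5, "favourite wins": 0}

outcomes = {}
for label, underdog_result in scenarios.items():
    underdog_new = new_rating(underdog, favourite, underdog_result)
    # The favourite's result is the mirror image of the underdog's
    favourite_new = new_rating(favourite, underdog, 1 - underdog_result)
    outcomes[label] = (underdog_new, favourite_new)
    print(f"{label:15s} underdog {underdog_new:.2f}, favourite {favourite_new:.2f}")
```

Whatever one team gains the other loses, so the two ratings always sum to 3000 here. The underdog also picks up a few points from a draw, because a draw is better than the result the ratings predicted.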

Let's apply this to the first round

First things first let's work out who actually won the match

In [7]:
# Who won?
def get_winner(match):
    """`match` is a row in the dataset"""
    if match.home_final_score > match.away_final_score:
        return match.home_team
    elif match.home_final_score < match.away_final_score:
        return match.away_team
    else:
        return "Draw" 
    
first_match['winner'] = first_match.apply(get_winner, axis=1)
first_match

Unnamed: 0,year,round,home_team,home_final_score,away_team,away_final_score,home_team_current_rating,away_team_current_rating,winner
0,1897,1,Fitzroy,49.0,Carlton,16.0,1500,1500,Fitzroy


We also need a way to determine the $S$ (result) for a team. If they won they should get a value of 1, if they lost then 0 and if it was a draw then 0.5

In [8]:
def get_result_value(team_name, winner):
    if team_name == winner:
        return 1
    elif winner == "Draw":
        return 0.5
    else:
        return 0
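The first match has a clear winner, so it never exercises the draw branch. The sketch below checks that path with a made-up drawn match. `SimpleNamespace` stands in for a DataFrame row here (an assumption for the sake of a self-contained example), and "Team A" and "Team B" are placeholder names.

```python
from types import SimpleNamespace

def get_winner(match):
    """`match` is any object with the four attributes used below."""
    if match.home_final_score > match.away_final_score:
        return match.home_team
    elif match.home_final_score < match.away_final_score:
        return match.away_team
    else:
        return "Draw"

def get_result_value(team_name, winner):
    if team_name == winner:
        return 1
    elif winner == "Draw":
        return 0.5
    else:
        return 0

# Made-up drawn match, not taken from the dataset
drawn_match = SimpleNamespace(home_team="Team A", home_final_score=50.0,
                              away_team="Team B", away_final_score=50.0)

winner = get_winner(drawn_match)
home_result = get_result_value(drawn_match.home_team, winner)
away_result = get_result_value(drawn_match.away_team, winner)
print(winner, home_result, away_result)  # Draw 0.5 0.5
```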

Perfect! Let's pull this all together and work out the teams' new ratings after the match.

In [9]:
# Calculate Fitzroy's new rating
first_match['home_team_new_rating'] = calculate_rating_change(first_match.home_team_current_rating[0],
                                                              first_match.away_team_current_rating[0],
                                                              get_result_value(first_match.home_team[0], first_match.winner[0]))

# Calculate Carlton's new rating
first_match['away_team_new_rating'] = calculate_rating_change(first_match.away_team_current_rating[0],
                                                              first_match.home_team_current_rating[0],
                                                              get_result_value(first_match.away_team[0], first_match.winner[0]))

print(f"The winner was {first_match.winner[0]}")
print(f"Fitzroy's new rating: {first_match.home_team_new_rating[0]}")
print(f"Carlton's new rating: {first_match.away_team_new_rating[0]}")


The winner was Fitzroy
Fitzroy's new rating: 1508.0
Carlton's new rating: 1492.0


Great, so we can see that Fitzroy gained 8 points from Carlton.

For the sake of demonstration, let's see what would have happened if Fitzroy had started with an ELO rating of 1600.

In [10]:
round(calculate_rating_change(1600, 1500, 1))

1606

As you can see, they would have gained only about 6 points rather than 8. This makes the system slightly fairer, as teams who are expected to win don't gain as much of a reward.
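To see the same effect across a wider range, here is a small sketch that tabulates the winner's gain for several rating gaps against a 1500-rated opponent. The gaps and the helper names are illustrative assumptions; $K$ is kept at 16.

```python
def expected_probability(r1, r2):
    """Expected probability that the team rated r1 beats the team rated r2."""
    return 1 / (1 + 10 ** ((r2 - r1) / 400))

def new_rating(r1, r2, result, k=16):
    """New rating for the team rated r1 after playing the team rated r2."""
    return r1 + k * (result - expected_probability(r1, r2))

opponent = 1500
gains = {}
for gap in (-200, -100, 0, 100, 200):  # winner's rating minus opponent's rating
    winner = opponent + gap
    gains[gap] = new_rating(winner, opponent, 1) - winner
    print(f"rating gap {gap:+4d}: winner gains {gains[gap]:.2f} points")
```

The gain shrinks as the winner's advantage grows, and it can never exceed $K$ points for a single match.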

Ideally we will want to tune the size of these changes so that, over the lifespan of the sport, teams don't exceed a maximum of 2000 or drop below a minimum of 1000.

Let's now calculate the new ELO ratings for the rest of the round.

In [11]:
first_round = season_1897[season_1897['round'] == '1'].copy()
first_round['home_team_current_rating'] = 1500
first_round['away_team_current_rating'] = 1500

first_round['winner'] = first_round.apply(get_winner, axis=1)


first_round['home_team_new_rating'] = first_round.apply(
    lambda row: calculate_rating_change(row.home_team_current_rating,
                                        row.away_team_current_rating,
                                        get_result_value(row.home_team, row.winner)),
    axis = 1)

first_round['away_team_new_rating'] = first_round.apply(
    lambda row: calculate_rating_change(row.away_team_current_rating,
                                        row.home_team_current_rating,
                                        get_result_value(row.away_team, row.winner)),
    axis = 1)

first_round

Unnamed: 0,year,round,home_team,home_final_score,away_team,away_final_score,home_team_current_rating,away_team_current_rating,winner,home_team_new_rating,away_team_new_rating
0,1897,1,Fitzroy,49.0,Carlton,16.0,1500,1500,Fitzroy,1508.0,1492.0
1,1897,1,Collingwood,41.0,St Kilda,16.0,1500,1500,Collingwood,1508.0,1492.0
2,1897,1,Geelong,24.0,Essendon,47.0,1500,1500,Essendon,1492.0,1508.0
3,1897,1,South Melbourne,27.0,Melbourne,44.0,1500,1500,Melbourne,1492.0,1508.0


Awesome! We can see that all the winners gained 8 points whilst all the losers lost 8. 

Now we have the problem of calculating round two: where do the current ratings come from? We can't hardcode 1500 this time, so what do we do?

Unfortunately, the nature of this problem doesn't lend itself to vectorisation. We can't treat every row independently, because a team's current rating depends on the new rating from its previous match, and we need to update multiple columns at the same time. As a result we need to go old school and use a for loop (archaic, I know :)). We will iterate through each row and calculate both teams' current and new ratings.

Getting a team's current rating could likely be done through dataframe manipulation, by finding the team's previous match and its new rating there, but I found it much easier to create a separate dictionary that holds each team's current rating as the script processes the dataset. On each iteration you look up both teams' current ratings in the dictionary, and at the end of the iteration you overwrite them with the new ratings.

In [12]:
# Initialise Columns
season_1897['home_team_current_rating'] = 1500
season_1897['away_team_current_rating'] = 1500
season_1897['home_team_new_rating'] = 1500
season_1897['away_team_new_rating'] = 1500

# Calculate winners
season_1897['winner'] = season_1897.apply(get_winner, axis=1)

# Create dict to hold current ratings
current_ratings = {team: 1500 for team in season_1897.home_team.unique()}

# Loop through every match of the season
for index, match in season_1897.iterrows():
    # Get Current Ratings
    home_team_current_rating = current_ratings[match.home_team]
    away_team_current_rating = current_ratings[match.away_team]
    
    # Get New Ratings
    home_team_new_rating = calculate_rating_change(home_team_current_rating, 
                                                   away_team_current_rating, 
                                                   get_result_value(match.home_team, match.winner))
    
    away_team_new_rating = calculate_rating_change(away_team_current_rating, 
                                                   home_team_current_rating, 
                                                   get_result_value(match.away_team, match.winner))
    
    # Update Current Rating Dict
    current_ratings[match.home_team] = home_team_new_rating
    current_ratings[match.away_team] = away_team_new_rating
        
    # Update DF
    season_1897.at[index, 'home_team_current_rating'] = home_team_current_rating
    season_1897.at[index, 'away_team_current_rating'] = away_team_current_rating
    season_1897.at[index, 'home_team_new_rating'] = home_team_new_rating
    season_1897.at[index, 'away_team_new_rating'] = away_team_new_rating

season_1897


Unnamed: 0,year,round,home_team,home_final_score,away_team,away_final_score,home_team_current_rating,away_team_current_rating,home_team_new_rating,away_team_new_rating,winner
0,1897,1,Fitzroy,49.0,Carlton,16.0,1500,1500,1508,1492,Fitzroy
1,1897,1,Collingwood,41.0,St Kilda,16.0,1500,1500,1508,1492,Collingwood
2,1897,1,Geelong,24.0,Essendon,47.0,1500,1500,1492,1508,Essendon
3,1897,1,South Melbourne,27.0,Melbourne,44.0,1500,1500,1492,1508,Melbourne
4,1897,2,South Melbourne,40.0,Carlton,36.0,1492,1492,1500,1484,South Melbourne
...,...,...,...,...,...,...,...,...,...,...,...
57,1897,Semi Final,Geelong,29.0,Essendon,35.0,1558,1554,1550,1562,Essendon
58,1897,Semi Final,Essendon,70.0,Collingwood,30.0,1562,1535,1569,1527,Essendon
59,1897,Semi Final,Geelong,46.0,Melbourne,37.0,1550,1530,1557,1522,Geelong
60,1897,Semi Final,Geelong,52.0,Collingwood,48.0,1557,1527,1565,1520,Geelong


Voila! We've easily calculated the ELO rating for every team and every match throughout the 1897 season. Now let's apply this to the entire dataset.

Also note that a small check has been added for matches recorded as a "bye". In that case the loop records the team's unchanged rating and skips to the next match, as the team should not gain or lose any points.

In [13]:
# Initialise Columns
matches['home_team_current_rating'] = 1500
matches['away_team_current_rating'] = 1500
matches['home_team_new_rating'] = 1500
matches['away_team_new_rating'] = 1500

# Calculate winners
matches['winner'] = matches.apply(get_winner, axis=1)

# Create dict to hold current ratings
current_ratings = {team: 1500 for team in matches.home_team.unique()}

# Loop through every match
for index, match in matches.iterrows():
    if match.is_bye == 1:
        matches.at[index, 'home_team_current_rating'] = current_ratings[match.home_team]
        matches.at[index, 'home_team_new_rating'] = current_ratings[match.home_team]
        continue
    # Get Current Ratings
    home_team_current_rating = current_ratings[match.home_team]
    away_team_current_rating = current_ratings[match.away_team]
    
    # Get New Ratings
    home_team_new_rating = calculate_rating_change(home_team_current_rating, 
                                                   away_team_current_rating, 
                                                   get_result_value(match.home_team, match.winner))
    
    away_team_new_rating = calculate_rating_change(away_team_current_rating, 
                                                   home_team_current_rating, 
                                                   get_result_value(match.away_team, match.winner))
    
    # Update Current Rating Dict
    current_ratings[match.home_team] = home_team_new_rating
    current_ratings[match.away_team] = away_team_new_rating
        
    # Update DF
    matches.at[index, 'home_team_current_rating'] = home_team_current_rating
    matches.at[index, 'away_team_current_rating'] = away_team_current_rating
    matches.at[index, 'home_team_new_rating'] = home_team_new_rating
    matches.at[index, 'away_team_new_rating'] = away_team_new_rating

matches.tail(5)


Unnamed: 0,yearly_match_number,year,round,day,date,time,attendance,venue,home_team,home_q1_goals,...,away_q4_goals,away_q4_points,away_q4_score,away_final_score,is_bye,home_team_current_rating,away_team_current_rating,home_team_new_rating,away_team_new_rating,winner
16207,175,2020,Semi Final,Fri,09-Oct-2020,6:50 PM,13778.0,Carrara,Richmond,5.0,...,6.0,13.0,49.0,49.0,0,1700,1490,1703,1487,Richmond
16208,176,2020,Semi Final,Sat,10-Oct-2020,6:40 PM,21396.0,Gabba,Geelong,4.0,...,5.0,2.0,32.0,32.0,0,1657,1607,1664,1600,Geelong
16209,177,2020,Preliminary Final,Fri,16-Oct-2020,7:20 PM,,Adelaide Oval,Port Adelaide,2.0,...,6.0,10.0,46.0,46.0,0,1624,1703,1618,1710,Richmond
16210,178,2020,Preliminary Final,Sat,17-Oct-2020,6:40 PM,29121.0,Gabba,Brisbane Lions,2.0,...,11.0,16.0,82.0,82.0,0,1570,1664,1564,1670,Geelong
16211,179,2020,Grand Final,Sat,24-Oct-2020,6:30 PM,29707.0,Gabba,Richmond,2.0,...,7.0,8.0,50.0,50.0,0,1710,1670,1717,1663,Richmond


Woo! All done! We have been able to calculate the ELO ratings all the way up to the recent 2020 Grand Final.

Before we call it a day, let's make a crude visualisation of all the teams' ELO rating changes over time.

In [14]:

fig = px.line(matches, x=matches.index, y="home_team_current_rating", color = "home_team")
fig.update_layout(title = 'AFL Elo Ratings',
                  xaxis_title = 'Index',
                  yaxis_title = "ELO Rating",
                  legend = {"title": "Team"})
fig

How great! We can already see some of the great football dynasties being represented in their ELO rating. We can also see the fall of Fitzroy :(. 

This also highlights an inconsistency in the dataset, where North Melbourne are called the Kangaroos for a little while, but this can be solved in other pre-processing projects.
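One way this naming issue might be handled is with a small alias table applied before the ratings are calculated. The sketch below is only an assumption about how that pre-processing could look: `TEAM_ALIASES` and `canonical_team_name` are hypothetical names, and the table contains only the one pair mentioned here.

```python
# Assumed alias table: only the Kangaroos / North Melbourne pair is known from this notebook
TEAM_ALIASES = {"Kangaroos": "North Melbourne"}

def canonical_team_name(name):
    """Map a known alias to its canonical club name; leave other names unchanged."""
    return TEAM_ALIASES.get(name, name)

sample_names = ["Kangaroos", "North Melbourne", "Fitzroy"]
cleaned = [canonical_team_name(name) for name in sample_names]
print(cleaned)  # ['North Melbourne', 'North Melbourne', 'Fitzroy']

# With pandas, the same table could be applied to both team columns
# before the winner column and the rating loop are computed, e.g.:
# matches["home_team"] = matches["home_team"].replace(TEAM_ALIASES)
# matches["away_team"] = matches["away_team"].replace(TEAM_ALIASES)
```

Doing the rename before the loop matters, because the ratings dictionary is keyed by team name and would otherwise track the two names as separate teams.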

Additionally, I don't think the formulas need to be tweaked much at all at this stage, as all the teams stay within the 1000 to 2000 range we wanted to impose.

### Final Thoughts
This will work for the time being, but I think the rating should also be influenced by the final scoreline. If a team wins by a larger margin they should be rewarded with more points. This will make things worse for teams who lose heavily, but better for teams who almost beat a much stronger opponent. For example, if the bottom of the ladder loses to first by 50 points, that is expected and they should lose their points as normal; however, if they only lose by 1, that's a superb effort and they should lose far fewer points.
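As a very rough sketch of what a margin-aware update might look like, the code below scales $K$ by a multiplier that grows with the winning margin. This is not something the notebook has implemented or tuned: the logarithmic shape, the `typical_margin` of 30 points, and all function names are assumptions chosen purely for illustration.

```python
import math

def expected_probability(r1, r2):
    """Expected probability that the team rated r1 beats the team rated r2."""
    return 1 / (1 + 10 ** ((r2 - r1) / 400))

def margin_multiplier(margin, typical_margin=30):
    """Hypothetical scaling: 1.0 at the typical margin, below 1 for close games,
    above 1 for blowouts. Draws are treated like a 1-point game so they still count."""
    return math.log1p(max(abs(margin), 1)) / math.log1p(typical_margin)

def new_rating_with_margin(r1, r2, result, margin, k=16):
    """Standard update with K scaled by the (assumed) margin multiplier."""
    e1 = expected_probability(r1, r2)
    return r1 + k * margin_multiplier(margin) * (result - e1)

# Bottom side (1400) loses to the top side (1600), by 50 points and by 1 point
heavy_loss = new_rating_with_margin(1400, 1600, 0, margin=50)
narrow_loss = new_rating_with_margin(1400, 1600, 0, margin=1)
print(f"after losing by 50: {heavy_loss:.2f}")
print(f"after losing by 1:  {narrow_loss:.2f}")
```

Because both teams in a match share the same multiplier, points still move one-for-one from loser to winner. The narrow loss costs the bottom side noticeably less than the heavy one, which is the behaviour described above.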

In [112]:
mask = matches['round'] == "Grand Final"
grand_final_ratings = pd.concat([matches.loc[mask, 'home_team_current_rating'], matches.loc[mask, 'away_team_current_rating']])

fig2 = px.box(grand_final_ratings)
fig2.update_layout(
    title = "Distribution of Grand Final Team ELO Ratings",
    xaxis_title = None,
    yaxis_title = "ELO Rating"
)
fig2.layout.xaxis.update(showticklabels=False)
fig2

In [129]:
# The lowest ELO rating to make a grand final
mask = ((matches.home_team_current_rating == grand_final_ratings.min()) | (matches.away_team_current_rating == grand_final_ratings.min())) & (matches['round'] == "Grand Final")
matches.loc[mask, ['year', 'home_team', 'away_team', 'winner', 'home_team_current_rating', 'away_team_current_rating']]

Unnamed: 0,year,home_team,away_team,winner,home_team_current_rating,away_team_current_rating
1297,1913,Fitzroy,St Kilda,Fitzroy,1595,1434


In [18]:
fig = px.line(matches, x=matches.index, y="home_team_current_rating", 
              color = "home_team", hover_data=["home_team", "home_team_current_rating", "year", "round"])
fig.update_layout(title = 'AFL Elo Ratings',
                  xaxis_title = 'Index',
                  yaxis_title = "ELO Rating",
                  legend = {"title": "Team"})
fig