## Offensive and defensive Elo ratings to predict number of goals

We keep track of two ratings for every team: an offensive rating ($R_O$) and a defensive rating ($R_D$).

We can then predict the number of goals a team will score by taking the difference of their offensive rating and the opponent's defensive rating.

The number of goals scored against them can be calculated by considering it from the opponent's perspective.

$E[\text{team}] = R_O[\text{team}] - R_D[\text{opponent}]$
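
As a minimal sketch of this prediction rule, the snippet below evaluates the formula from both teams' perspectives. The team names and rating values are made up for illustration, and `expected_goals` is a hypothetical helper, not part of the notebook's code.

```python
from collections import defaultdict

# Hypothetical ratings, chosen purely for illustration.
# Teams that have not been seen yet default to a rating of 0.
offensive = defaultdict(float, {"A": 1.8, "B": 1.1})
defensive = defaultdict(float, {"A": 0.4, "B": -0.2})

def expected_goals(team, opponent):
    """E[team] = R_O[team] - R_D[opponent]."""
    return offensive[team] - defensive[opponent]

# Goals A is expected to score against B.
print(expected_goals("A", "B"))
# Goals A is expected to concede, i.e. B's expected goals against A.
print(expected_goals("B", "A"))
```

Note that a negative defensive rating, as for team B here, increases the opponent's expected goals.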

We update a team's offensive rating by adding the learning rate $k$ times the difference between the goals it actually scored, $G[\text{team}]$, and its expected goals.
We update a team's defensive rating by adding $k$ times the difference between the goals the opponent was expected to score and the goals the opponent actually scored, so conceding fewer goals than expected raises the defensive rating.

$R_O[\text{team}] = R_O[\text{team}] + k(G[\text{team}] - E[\text{team}])$

$R_D[\text{team}] = R_D[\text{team}] + k(E[\text{opponent}] - G[\text{opponent}])$
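
A worked example may help here. The numbers are made up and not taken from the data: suppose $k = 0.1$, the home team has $R_O = 1.5$, the away team has $R_D = 0.2$, and the home team scores 3 goals.

$E[\text{home}] = 1.5 - 0.2 = 1.3$

$R_O[\text{home}] = 1.5 + 0.1(3 - 1.3) = 1.67$

$R_D[\text{away}] = 0.2 + 0.1(1.3 - 3) = 0.03$

After this update the prediction for the same fixture would be $1.67 - 0.03 = 1.64$, so it has moved by $2k(G - E) = 0.34$ towards the observed score.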

We start every team with a rating of 0. Because the ratings are updated one match at a time, the order of the training data affects the model, so the training data should be in chronological order to account for teams changing over time.
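
If the data is not already ordered, it could be sorted before training along the following lines. This is only a sketch on toy rows: the `Date` column name and the day-first `%d/%m/%Y` format are assumptions about the CSV, and should be checked against the actual file.

```python
import pandas as pd

# Toy rows for illustration; the `Date` column and its format are assumptions.
matches = pd.DataFrame({
    "Date": ["20/08/2016", "13/08/2016", "27/08/2016"],
    "HomeTeam": ["B", "A", "C"],
    "AwayTeam": ["C", "B", "A"],
    "FTHG": [1, 2, 0],
    "FTAG": [1, 0, 3],
})

# Parse the dates explicitly so that 13/08 is read as 13 August.
matches["Date"] = pd.to_datetime(matches["Date"], format="%d/%m/%Y")

# A stable sort keeps the file order for matches played on the same day.
matches = matches.sort_values("Date", kind="stable").reset_index(drop=True)
print(matches)
```

Splitting with `shuffle=False` afterwards keeps the most recent matches in the test set.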


In [1]:
import numpy as np
import pandas as pd
from collections import defaultdict
from tqdm import tqdm
import sklearn.model_selection

In [2]:
match_data = pd.read_csv("data/football-data.co.uk/updated-epl-training.csv")

In [3]:
def encode_result(result):
    if result == 'H':
        return 1
    elif result == 'A':
        return 0
    return 0.5

In [4]:
class GoalElo:
    def __init__(self, initial_rating=0, learning_rate=0.2, draw_size=0.5):
        self.offensive_ratings = defaultdict(lambda: initial_rating)
        self.defensive_ratings = defaultdict(lambda: initial_rating)
        self.match_count = defaultdict(lambda: 0)
        self.learning_rate = learning_rate
        self.draw_size = draw_size

    def predict(self, team, opponent):
        ''' Predicts the number of goals team will score against opponent. '''
        return self.offensive_ratings[team] - self.defensive_ratings[opponent]

    def predict_result(self, team, opponent):
        goals_scored = self.predict(team, opponent)
        goals_conceded = self.predict(opponent, team)
        goal_difference = goals_scored - goals_conceded
        result = 1 if goal_difference > 0 else 0
        if abs(goal_difference) < self.draw_size:
            result = 0.5
        return result

    def update_match(self, home, away, home_actual_goals, away_actual_goals):
        ''' Updates the offensive and defensive ratings of both teams in a match. '''
        home_expected_goals = self.predict(home, away)
        away_expected_goals = self.predict(away, home)
        self.offensive_ratings[home] += self.learning_rate * (home_actual_goals - home_expected_goals)
        self.offensive_ratings[away] += self.learning_rate * (away_actual_goals - away_expected_goals)
        self.defensive_ratings[home] += self.learning_rate * (away_expected_goals - away_actual_goals)
        self.defensive_ratings[away] += self.learning_rate * (home_expected_goals - home_actual_goals)
        self.match_count[home] += 1
        self.match_count[away] += 1

    def ratings_dataframe(self):
        ''' Creates an easy to read dataframe of the ratings '''
        df = pd.DataFrame(self.offensive_ratings.items(), columns=['Team', 'Offensive Rating'])
        df['Defensive Rating'] = df['Team'].map(self.defensive_ratings)
        df['Matches'] = df['Team'].map(self.match_count)
        df = df.sort_values('Offensive Rating', ascending=False)
        return df

    def train(self, df):
        ''' Takes a data frame of matches with columns HomeTeam, AwayTeam, FTHG, FTAG and updates teams ratings using the data in order. '''
        for i, row in df.iterrows():
            self.update_match(row['HomeTeam'], row['AwayTeam'], row['FTHG'], row['FTAG'])

    def test(self, df):
        ''' Takes a data frame of matches with columns HomeTeam, AwayTeam, FTHG, FTAG and uses the ratings to predict the number of goals scored by each side. Returns the squared error and the absolute error, each summed over both sides and averaged per match. '''
        mse = 0
        mae = 0
        count = 0
        for i, row in df.iterrows():
            error_home = self.predict(row['HomeTeam'], row['AwayTeam']) - row['FTHG']
            error_away = self.predict(row['AwayTeam'], row['HomeTeam']) - row['FTAG']
            mse += error_home ** 2 + error_away ** 2
            mae += abs(error_home) + abs(error_away)
            count += 1
        return mse / count, mae / count

    def test_result(self, df):
        ''' Takes a data frame of matches with columns HomeTeam, AwayTeam, FTR and predicts the outcome using the ratings. It measures the number of correct predictions as a percentage of the size of the data. '''
        correct = 0
        count = 0
        for i, row in df.iterrows():
            goal_difference = round(self.predict(row['HomeTeam'], row['AwayTeam']) - self.predict(row['AwayTeam'], row['HomeTeam']))
            result = 0.5
            if goal_difference > 0:
                result = 1
            if goal_difference < 0:
                result = 0
            if result == encode_result(row['FTR']):
                correct += 1
            count += 1
        return correct / count



In [5]:
training, test = sklearn.model_selection.train_test_split(match_data, test_size=0.05, shuffle=False)

In [7]:
learning_rates = np.linspace(0.0001, 0.1, num=10)
mse_list = []
mae_list = []
accuracy = []

for learning_rate in tqdm(learning_rates):
    goal_elo = GoalElo(learning_rate=learning_rate)
    goal_elo.train(training)
    mse, mae = goal_elo.test(test)
    mse_list.append(mse)
    mae_list.append(mae)
    accuracy.append(goal_elo.test_result(test))
print("Accuracy:") 
pd.DataFrame(zip(learning_rates, mse_list, mae_list, accuracy), columns=['Learning Rate', 'Mean Square Error / Match', 'Mean Absolute Error / Match', 'Accuracy'])

100%|██████████| 10/10 [00:03<00:00,  2.97it/s]
Accuracy:



| | Learning Rate | Mean Square Error / Match | Mean Absolute Error / Match | Accuracy |
|---|---|---|---|---|
| 0 | 0.0001 | 6.919693 | 2.687661 | 0.240506 |
| 1 | 0.0112 | 3.482109 | 2.038388 | 0.464135 |
| 2 | 0.0223 | 3.52408 | 2.042918 | 0.464135 |
| 3 | 0.0334 | 3.547938 | 2.041889 | 0.455696 |
| 4 | 0.0445 | 3.560787 | 2.037325 | 0.464135 |
| 5 | 0.0556 | 3.570631 | 2.03306 | 0.472574 |
| 6 | 0.0667 | 3.58114 | 2.029605 | 0.459916 |
| 7 | 0.0778 | 3.593675 | 2.027921 | 0.447257 |
| 8 | 0.0889 | 3.608569 | 2.027247 | 0.438819 |
| 9 | 0.1 | 3.625739 | 2.027807 | 0.421941 |


In [25]:
goal_elo = GoalElo(learning_rate=0.1)
goal_elo.train(training)
# print(goal_elo.ratings_dataframe())
print(goal_elo.predict('Man City', 'Chelsea'))
print(goal_elo.predict('Chelsea', 'Man City'))

2.149439628503313
1.2314645420992298
