## Offensive and defensive Elo ratings to predict the number of goals

We keep track of two ratings for every team: an offensive rating ($R_O$) and a defensive rating ($R_D$).

We can then predict the number of goals a team will score by taking the difference of their offensive rating and the opponent's defensive rating.

The number of goals scored against a team is the same prediction made from the opponent's perspective: the opponent's offensive rating minus the team's defensive rating.

$E[\text{team}] = R_O[\text{team}] - R_D[\text{opponent}]$
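As a small illustration of this formula, the sketch below uses made-up ratings for two hypothetical teams, `"A"` and `"B"`. The numbers are assumptions chosen for the example, not fitted values.

```python
# Hypothetical ratings, chosen only for illustration (not fitted to any data).
offensive = {"A": 1.5, "B": 1.0}
defensive = {"A": 0.25, "B": -0.5}

def expected_goals(team, opponent):
    """E[team] = R_O[team] - R_D[opponent]."""
    return offensive[team] - defensive[opponent]

# Goals A is expected to score against B: 1.5 - (-0.5) = 2.0
goals_for_a = expected_goals("A", "B")
# Goals A is expected to concede, i.e. B's expectation against A: 1.0 - 0.25 = 0.75
goals_against_a = expected_goals("B", "A")
print(goals_for_a, goals_against_a)
```

Note that a weak defence corresponds to a low (here negative) defensive rating, which raises the opponent's expected goals.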

We update a team's offensive rating by adding the learning rate $k$ times the difference between the goals it actually scored, $G[\text{team}]$, and the goals it was expected to score.
We update a team's defensive rating by adding $k$ times the difference between the goals it was expected to concede and the goals it actually conceded.

$R_O[\text{team}] = R_O[\text{team}] + k(G[\text{team}] - E[\text{team}])$

$R_D[\text{team}] = R_D[\text{team}] + k(E[\text{opponent}] - G[\text{opponent}])$
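To make the two update rules concrete, here is a minimal sketch of a single match update. The team names, the scoreline and the value `k = 0.25` are assumptions for illustration only.

```python
k = 0.25  # learning rate (assumed value for this example)

# Every team starts at 0.
R_O = {"A": 0.0, "B": 0.0}
R_D = {"A": 0.0, "B": 0.0}

def update_match(R_O, R_D, a, b, goals_a, goals_b, k):
    # Expectations are computed before any rating changes.
    e_a = R_O[a] - R_D[b]
    e_b = R_O[b] - R_D[a]
    # Offensive update: actual goals minus expected goals.
    R_O[a] += k * (goals_a - e_a)
    R_O[b] += k * (goals_b - e_b)
    # Defensive update: expected goals conceded minus actual goals conceded.
    R_D[a] += k * (e_b - goals_b)
    R_D[b] += k * (e_a - goals_a)

# Hypothetical result: A beats B 2-0.
update_match(R_O, R_D, "A", "B", goals_a=2, goals_b=0, k=k)
print(R_O, R_D)
new_expectation = R_O["A"] - R_D["B"]  # A's expected goals in a rematch
```

Because the scorer's offensive rating and the conceder's defensive rating both move by $k$ times the error, the prediction for that pairing shifts by $2k$ times the error. Here A's expected goals against B go from 0 to 1, halfway to the 2 goals actually scored.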

We start every team with a rating of 0. The order of the training data affects the model, so the training data should be in chronological order to account for teams changing over time.
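If the match data is not already in chronological order, it could be sorted before training. The sketch below uses toy data and assumes a `Date` column in day/month/year format. Both the column name and the format are assumptions and may not match the actual training file.

```python
import pandas as pd

# Toy matches in shuffled order; the `Date` column and its format are assumed.
matches = pd.DataFrame({
    "Date": ["20/08/2022", "06/08/2022", "13/08/2022"],
    "HomeTeam": ["A", "B", "C"],
    "AwayTeam": ["B", "C", "A"],
    "FTHG": [1, 0, 2],
    "FTAG": [1, 3, 0],
})

# Parse the dates explicitly so that 06/08 is read as 6 August, not 8 June.
matches["Date"] = pd.to_datetime(matches["Date"], format="%d/%m/%Y")

# A stable sort keeps same-day matches in their original relative order.
matches = matches.sort_values("Date", kind="stable").reset_index(drop=True)
print(matches[["Date", "HomeTeam", "AwayTeam"]])
```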


In [1]:
import numpy as np
import pandas as pd
from collections import defaultdict
from tqdm import tqdm
import sklearn.model_selection

In [2]:
match_data = pd.read_csv("data/football-data.co.uk/updated-epl-training.csv")

In [36]:
def encode_result(result):
    """Encode the full-time result: home win = 1, away win = 0, draw = 0.5."""
    if result == 'H':
        return 1
    elif result == 'A':
        return 0
    return 0.5

In [42]:
class ShotElo:
    """Offensive and defensive Elo-style ratings used to predict goals per team."""
    def __init__(self, initial_rating=0, learning_rate=0.2):
        self.offensive_ratings = defaultdict(lambda: initial_rating)
        self.defensive_ratings = defaultdict(lambda: initial_rating)
        self.match_count = defaultdict(lambda: 0)
        self.learning_rate = learning_rate

    def predict(self, team, opponent):
        return self.offensive_ratings[team] - self.defensive_ratings[opponent]

    def update_match(self, home, away, home_actual_goals, away_actual_goals):
        # Compute both expectations first so every update uses the pre-match ratings.
        home_expected_goals = self.predict(home, away)
        away_expected_goals = self.predict(away, home)
        # Offensive rating rises when a team scores more than expected.
        self.offensive_ratings[home] += self.learning_rate * (home_actual_goals - home_expected_goals)
        self.offensive_ratings[away] += self.learning_rate * (away_actual_goals - away_expected_goals)
        # Defensive rating rises when a team concedes fewer goals than expected.
        self.defensive_ratings[home] += self.learning_rate * (away_expected_goals - away_actual_goals)
        self.defensive_ratings[away] += self.learning_rate * (home_expected_goals - home_actual_goals)
        self.match_count[home] += 1
        self.match_count[away] += 1

    def ratings_dataframe(self):
        df = pd.DataFrame(self.offensive_ratings.items(), columns=['Team', 'Offensive Rating'])
        df['Defensive Rating'] = df['Team'].map(self.defensive_ratings)
        df['Matches'] = df['Team'].map(self.match_count)
        df = df.sort_values('Offensive Rating', ascending=False)
        return df

    def train(self, df):
        for i, row in df.iterrows():
            self.update_match(row['HomeTeam'], row['AwayTeam'], row['FTHG'], row['FTAG'])

    def test(self, df):
        mse = 0
        mae = 0
        count = 0
        for i, row in df.iterrows():
            error_home = self.predict(row['HomeTeam'], row['AwayTeam']) - row['FTHG']
            error_away = self.predict(row['AwayTeam'], row['HomeTeam']) - row['FTAG']
            mse += error_home ** 2 + error_away ** 2
            mae += abs(error_home) + abs(error_away)
            count += 1
        return mse / count, mae / count

    def test_result(self, df):
        correct = 0
        count = 0
        for i, row in df.iterrows():
            goal_difference = round(self.predict(row['HomeTeam'], row['AwayTeam']) - self.predict(row['AwayTeam'], row['HomeTeam']))
            result = 0.5
            if goal_difference > 0:
                result = 1
            if goal_difference < 0:
                result = 0
            if result == encode_result(row['FTR']):
                correct += 1
            count += 1
        return correct / count



In [4]:
training, test = sklearn.model_selection.train_test_split(match_data, test_size=0.05, shuffle=False)

In [44]:
learning_rates = np.linspace(0.0001, 0.1, num=50)
mse_list = []
mae_list = []
accuracy = []

for learning_rate in tqdm(learning_rates):
    shot_elo = ShotElo(learning_rate=learning_rate)
    shot_elo.train(training)
    mse, mae = shot_elo.test(test)
    mse_list.append(mse)
    mae_list.append(mae)
    accuracy.append(shot_elo.test_result(test))
print("Accuracy:") 
pd.DataFrame(zip(learning_rates, mse_list, mae_list, accuracy), columns=['Learning Rate', 'Mean Square Error / Match', 'Mean Absolute Error / Match', 'Accuracy'])

100%|██████████| 50/50 [00:17<00:00,  2.83it/s]
Accuracy:



| | Learning Rate | Mean Square Error / Match | Mean Absolute Error / Match | Accuracy |
|---|---|---|---|---|
| 0 | 0.0001 | 7.083477 | 2.731485 | 0.236287 |
| 1 | 0.002139 | 3.875892 | 2.051142 | 0.392405 |
| 2 | 0.004178 | 3.574284 | 2.023903 | 0.392405 |
| 3 | 0.006216 | 3.526738 | 2.034406 | 0.409283 |
| 4 | 0.008255 | 3.519893 | 2.040117 | 0.42616 |
| 5 | 0.010294 | 3.524397 | 2.04385 | 0.447257 |
| 6 | 0.012333 | 3.53274 | 2.04652 | 0.451477 |
| 7 | 0.014371 | 3.542142 | 2.04863 | 0.451477 |
| 8 | 0.01641 | 3.55144 | 2.050544 | 0.451477 |
| 9 | 0.018449 | 3.56015 | 2.051996 | 0.459916 |
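One way to pick a learning rate from a results table like this is to take the row that minimises an error metric. The sketch below rebuilds only the first six rows shown above by hand, so it illustrates the selection step rather than the result over the full sweep of 50 learning rates.

```python
import pandas as pd

# First six rows of the results shown above (the full sweep has 50 rows).
results = pd.DataFrame({
    "Learning Rate": [0.0001, 0.002139, 0.004178, 0.006216, 0.008255, 0.010294],
    "Mean Square Error / Match": [7.083477, 3.875892, 3.574284, 3.526738, 3.519893, 3.524397],
    "Accuracy": [0.236287, 0.392405, 0.392405, 0.409283, 0.42616, 0.447257],
})

# Row with the lowest mean square error among the rows included here.
best = results.loc[results["Mean Square Error / Match"].idxmin()]
print(best["Learning Rate"], best["Mean Square Error / Match"])
```

In the rows shown, the lowest mean square error and the highest accuracy occur at different learning rates, so the choice depends on which metric matters more for the intended use.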


In [25]:
shot_elo = ShotElo(learning_rate=0.05)
shot_elo.train(training)
# print(shot_elo.ratings_dataframe())
print(shot_elo.predict('Man City', 'Chelsea'))
print(shot_elo.predict('Chelsea', 'Man City'))

2.149439628503313
1.2314645420992298
