## Elo rating
We score the result of a match as 0 for a loss, 0.5 for a draw, and 1 for a win.
Useful paper: http://www.glicko.net/research/acjpaper.pdf

### Expected Outcome
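The expected outcome $E_H$ of a match, from the perspective of the home team, is computed from the home rating $R_H$ and the away rating $R_A$ using the equations below.
Strictly, $E_H$ is the home team's expected score (its win probability plus half the draw probability), so it equals the probability of a home win only when draws are ignored.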
\begin{equation}
\begin{aligned}
Q_H &= 10^{\frac{R_H}{400}} \\
Q_A &= 10^{\frac{R_A}{400}} \\
E_H &= \frac{Q_H}{Q_H + Q_A} = P(\text{Home Winning})
\end{aligned}
\end{equation}
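
As a quick numeric check of the formula above, the following minimal sketch evaluates $E_H$ for plain rating values. The function name `expected_score` and the example ratings are illustrative assumptions, not part of the notebook's own code.

```python
def expected_score(r_home, r_away):
    """Expected score of the home side for two Elo ratings (hypothetical helper)."""
    q_home = 10 ** (r_home / 400)
    q_away = 10 ** (r_away / 400)
    return q_home / (q_home + q_away)


# Equal ratings give an even match.
even = expected_score(1000, 1000)

# A 200-point advantage gives an expected score of roughly 0.76.
favourite = expected_score(1200, 1000)

# The expected scores of the two sides sum to 1.
total = expected_score(1200, 1000) + expected_score(1000, 1200)

print(even, round(favourite, 3), round(total, 3))
```

Only the rating difference matters here: adding the same constant to both ratings leaves $E_H$ unchanged.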

### Update rating
After each match we update a team's rating using the expected result $E_H$ and the actual result $S_H$. $K$ is a constant that controls the size of each update; a common choice is 32.
\begin{equation}
R_H' = R_H + K \cdot (S_H - E_H)
\end{equation}
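
To illustrate the update rule, here is a small sketch of a single update with $K = 32$. The helper `elo_update` and the ratings 1200 and 1000 are assumptions chosen for illustration. The away team is updated with the mirrored result $1 - S_H$ and expectation $1 - E_H$, so the points gained by one side are lost by the other.

```python
def elo_update(r_home, r_away, s_home, k=32):
    """Return updated (home, away) ratings after one match (hypothetical helper).

    s_home is 1 for a home win, 0.5 for a draw and 0 for an away win.
    """
    q_home = 10 ** (r_home / 400)
    q_away = 10 ** (r_away / 400)
    e_home = q_home / (q_home + q_away)
    delta = k * (s_home - e_home)
    # The away side sees the mirrored result, so it moves by -delta.
    return r_home + delta, r_away - delta


# The 1200-rated favourite loses at home to a 1000-rated side.
new_home, new_away = elo_update(1200, 1000, s_home=0)
print(round(new_home, 2), round(new_away, 2))
```

An upset moves the ratings more than an expected result: in this example the favourite loses about 24.3 points, whereas a home win would have gained it only about 7.7.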

In [1]:
import numpy as np
import pandas as pd
from collections import defaultdict

In [2]:
training_data = pd.read_csv("data/football-data.co.uk/updated-epl-training.csv")

In [3]:
def encode_result(result):
    outcome = 0.5
    if result == 'H':
        outcome = 1
    elif result == 'A':
        outcome = 0
    return outcome

In [4]:
def expected_outcome(homeTeam, awayTeam):
    # Expected score of the home team, read from the global `ratings` dict
    Q_home = 10 ** (ratings[homeTeam] / 400)
    Q_away = 10 ** (ratings[awayTeam] / 400)
    return Q_home / (Q_home + Q_away)


In [5]:
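def update_rating(homeTeam, awayTeam, result):
    # Uses the globals `ratings`, `match_count` and `k`
    outcome = encode_result(result)
    # Compute the expectation before either rating changes
    expected = expected_outcome(homeTeam, awayTeam)
    ratings[homeTeam] = ratings[homeTeam] + k * (outcome - expected)
    # The away team sees the mirrored result and expectation
    ratings[awayTeam] = ratings[awayTeam] + k * ((1 - outcome) - (1 - expected))
    match_count[homeTeam] += 1
    match_count[awayTeam] += 1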

In [51]:
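training_size = 4500  # number of matches used to fit the ratings
test_size = 1000  # upper bound on test matches; fewer are used if the data runs out
correct = 0
tested = 0
ratings = defaultdict(lambda: 1000)  # every team starts at 1000
match_count = defaultdict(int)
draw_size = 0.01  # half-width of the band around 0.5 that is predicted as a draw
k = 32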
for i, match in training_data.iterrows():
    if i < training_size:
        update_rating(match['HomeTeam'], match['AwayTeam'], match['FTR'])
    elif i - training_size < test_size:
        expected = expected_outcome(match['HomeTeam'], match['AwayTeam'])
        if expected < 0.5 - draw_size:
            exp_result = 0
        elif expected > 0.5 + draw_size:
            exp_result = 1
        else:
            exp_result = 0.5
        
        if exp_result == encode_result(match['FTR']):
            correct += 1
        tested += 1
    else:
        break
print("Trained on", int(sum(match_count.values()) / 2), "matches")
print("Tested on", tested, "matches")
print("Accuracy:", correct / tested)



Trained on 4500 matches
Tested on 187 matches
Accuracy: 0.5080213903743316


In [42]:
df = pd.DataFrame(ratings.items(), columns=['Team', 'Rating'])
df['Matches'] = df['Team'].map(match_count)
df = df.sort_values('Rating', ascending=False)
print(df)

                Team       Rating  Matches
11         Liverpool  1331.301726      400
15          Man City  1320.042236      400
9          Tottenham  1254.312716      400
16           Chelsea  1218.055587      400
18        Man United  1199.623807      400
0            Arsenal  1166.246671      400
4            Everton  1055.363160      400
12          West Ham  1040.701789      362
31         Leicester  1039.699070      172
32       Bournemouth  1024.084698      134
34          Brighton  1018.863476       58
33           Watford  1015.713399      134
30    Crystal Palace  1010.247857      210
36  Sheffield United  1000.000000        0
37             Leeds  1000.000000        0
21            Wolves   997.004037      134
19         Newcastle   987.498476      324
13             Wigan   970.086997      190
3              Stoke   967.060887      380
1          West Brom   965.599288      342
20           Burnley   961.835437      172
28       Southampton   957.105890      248
26         