# Forecasting The Final Premier League Table 2019/20

The Premier League suspended its 2019/20 season on March 13, 2020 with 92 games left unplayed, due to emergency measures required for dealing with the worldwide COVID-19 pandemic.

At the time of writing (April 8, 2020) it is unclear if these games will ever be played. 

Given that our model can forecast the result of every game, we can also forecast what the final league table would be. 

In [1]:
# import the packages we need

import pandas as pd
import numpy as np

In [2]:
# load the Premier League table at the date of the suspension

Table = pd.read_excel("../../Data/Week 3/Premier League table March 13 2020.xlsx")
Table

Unnamed: 0,Position,club,Played,Won,Drawn,Lost,F,A,Points
0,1,Liverpool,29,27,1,1,66,21,82
1,2,Manchester City,28,18,3,7,68,31,57
2,3,Leicester City,29,16,5,8,58,28,53
3,4,Chelsea,29,14,6,9,51,39,48
4,5,Manchester United,29,12,9,8,44,30,45
5,6,Wolverhampton Wanderers,29,10,13,6,41,34,43
6,7,Sheffield United,28,11,10,7,30,25,43
7,8,Tottenham Hotspur,29,11,8,10,47,40,41
8,9,Arsenal,28,9,13,6,40,36,40
9,10,Burnley,29,11,6,12,34,40,39


In [3]:
# load the forecasts we produced in the last session

forecasts19_20 = pd.read_excel("../../Data/Week 3/forecasts19_20.xlsx")
forecasts19_20

Unnamed: 0.1,Unnamed: 0,date,Home team,away team,notplayed,month,day,year,FTHG,FTAG,...,538apr,B365res,lhTMratio,winvalue,predA,predD,predH,Maxprob,logitpred,logittrue
0,0,2019-10-08 00:00:00,AFC Bournemouth,Sheffield United,0,8.0,10.0,2019.0,1.0,1.0,...,0.24,H,1.508400,1,0.167224,0.206556,0.626221,0.626221,H,0
1,1,2019-10-08 00:00:00,Burnley,Southampton,0,8.0,10.0,2019.0,3.0,0.0,...,0.26,H,-0.148950,2,0.335116,0.264595,0.400289,0.400289,H,1
2,2,2019-10-08 00:00:00,Crystal Palace,Everton,0,8.0,10.0,2019.0,0.0,0.0,...,0.26,A,-0.789990,1,0.418441,0.262960,0.318599,0.418441,A,0
3,3,2019-10-08 00:00:00,Tottenham Hotspur,Aston Villa,0,8.0,10.0,2019.0,3.0,1.0,...,0.18,H,1.837186,2,0.143318,0.188803,0.667879,0.667879,H,1
4,4,2019-10-08 00:00:00,Watford,Brighton and Hove Albion,0,8.0,10.0,2019.0,0.0,3.0,...,0.25,H,0.169961,0,0.296876,0.259675,0.443449,0.443449,H,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
375,375,2020-05-17 00:00:00,Leicester City,Manchester United,1,,,,,,...,,,-0.630567,0,0.397067,0.264815,0.338118,0.397067,A,0
376,376,2020-05-17 00:00:00,Manchester City,Norwich City,1,,,,,,...,,,2.637690,0,0.096869,0.144880,0.758251,0.758251,H,0
377,377,2020-05-17 00:00:00,Newcastle United,Liverpool,1,,,,,,...,,,-1.445677,0,0.508728,0.246059,0.245213,0.508728,A,0
378,378,2020-05-17 00:00:00,Southampton,Sheffield United,1,,,,,,...,,,1.213235,0,0.191308,0.221556,0.587136,0.587136,H,0


We want to create a subset of the unplayed games. First we restrict the data to only those variables that we need: (1) our prediction for each game, (2) the identity of the home team and the away team, and (3) the indicator of whether a game was played or not:

In [4]:
Unplayed = forecasts19_20[['Home team','away team','notplayed','logitpred']]
Unplayed

Unnamed: 0,Home team,away team,notplayed,logitpred
0,AFC Bournemouth,Sheffield United,0,H
1,Burnley,Southampton,0,H
2,Crystal Palace,Everton,0,A
3,Tottenham Hotspur,Aston Villa,0,H
4,Watford,Brighton and Hove Albion,0,H
...,...,...,...,...
375,Leicester City,Manchester United,1,A
376,Manchester City,Norwich City,1,H
377,Newcastle United,Liverpool,1,A
378,Southampton,Sheffield United,1,H


Now we create the subset of unplayed games:

In [5]:
Unplayed = Unplayed[Unplayed['notplayed']==1].copy()
Unplayed.describe()

Unnamed: 0,notplayed
count,92.0
mean,1.0
std,0.0
min,1.0
25%,1.0
50%,1.0
75%,1.0
max,1.0


We now assign the points for each result: 3 points for a win and zero for a loss. (As we established in the last session, our model doesn't forecast draws.) We therefore allocate the points to the home and away team conditional on our forecast result:

In [6]:
Unplayed['Hpts'] = np.where(Unplayed['logitpred']=="H", 3, 0)
Unplayed['Apts'] = np.where(Unplayed['logitpred']=="H", 0, 3)
Unplayed

Unnamed: 0,Home team,away team,notplayed,logitpred,Hpts,Apts
288,Aston Villa,Sheffield United,1,H,3,0
289,Manchester City,Arsenal,1,H,3,0
290,AFC Bournemouth,Crystal Palace,1,H,3,0
291,Aston Villa,Chelsea,1,A,0,3
292,Brighton and Hove Albion,Arsenal,1,A,0,3
...,...,...,...,...,...,...
375,Leicester City,Manchester United,1,A,0,3
376,Manchester City,Norwich City,1,H,3,0
377,Newcastle United,Liverpool,1,A,0,3
378,Southampton,Sheffield United,1,H,3,0
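
As a quick sanity check on this allocation rule, the sketch below applies the same `np.where` logic to three made-up fixtures and confirms that every game hands out exactly 3 points. The team names, the predictions and the variable names `toy` and `all_games_award_three` are illustrative assumptions, not values from the forecasts file.

```python
import numpy as np
import pandas as pd

# Hypothetical fixtures, for illustration only (not from forecasts19_20)
toy = pd.DataFrame({
    "Home team": ["Team A", "Team B", "Team C"],
    "away team": ["Team B", "Team C", "Team A"],
    "logitpred": ["H", "A", "H"],
})

# Same rule as in the cell above: the predicted winner takes all 3 points
toy["Hpts"] = np.where(toy["logitpred"] == "H", 3, 0)
toy["Apts"] = np.where(toy["logitpred"] == "H", 0, 3)

# Each row should distribute exactly 3 points between the two teams
points_per_game = toy["Hpts"] + toy["Apts"]
all_games_award_three = bool((points_per_game == 3).all())
print(all_games_award_three)
```

Note that the second `np.where` gives the away team 3 points for any label other than "H", so the rule is only safe as long as `logitpred` contains nothing but "H" and "A".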


Each row contains a result with two teams. We need to create a list of results with only one team in each row. To do this we will create two subsets, one for home teams and the other for away teams, and then stack (concatenate) the two subsets on top of each other. 

First, let's look at the home team results. We rename the columns, so that when we concatenate with the away teams the columns will have consistent names:

In [7]:
Results = Unplayed[['Home team','Hpts']].rename(columns={'Home team': 'club','Hpts':'XPoints'})
Results

Unnamed: 0,club,XPoints
288,Aston Villa,3
289,Manchester City,3
290,AFC Bournemouth,3
291,Aston Villa,0
292,Brighton and Hove Albion,0
...,...,...
375,Leicester City,0
376,Manchester City,3
377,Newcastle United,0
378,Southampton,3


Now we generate a subset of our forecast away team points:

In [8]:
AResults = Unplayed[['away team','Apts']].rename(columns={'away team': 'club','Apts':'XPoints'})
AResults

Unnamed: 0,club,XPoints
288,Sheffield United,0
289,Arsenal,0
290,Crystal Palace,0
291,Chelsea,3
292,Arsenal,3
...,...,...
375,Manchester United,3
376,Norwich City,0
377,Liverpool,3
378,Sheffield United,0


Now we concatenate the two dfs (Results and AResults) into a single df:

In [9]:
Results = pd.concat([Results, AResults])
Results

Unnamed: 0,club,XPoints
288,Aston Villa,3
289,Manchester City,3
290,AFC Bournemouth,3
291,Aston Villa,0
292,Brighton and Hove Albion,0
...,...,...
375,Manchester United,3
376,Norwich City,0
377,Liverpool,3
378,Sheffield United,0


In [10]:
Results.describe()

Unnamed: 0,XPoints
count,184.0
mean,1.5
std,1.504093
min,0.0
25%,0.0
50%,1.5
75%,3.0
max,3.0


Now we use .groupby to sum the forecast points won by each team:

In [11]:
PtsX = Results.groupby('club')['XPoints'].sum().reset_index()
PtsX

Unnamed: 0,club,XPoints
0,AFC Bournemouth,12
1,Arsenal,21
2,Aston Villa,6
3,Brighton and Hove Albion,6
4,Burnley,12
5,Chelsea,21
6,Crystal Palace,3
7,Everton,21
8,Leicester City,12
9,Liverpool,24


We can now merge these points forecasts into the table which showed the points won up until March 13, 2020 when league play was suspended:

In [12]:
Table = pd.merge(Table, PtsX, on= 'club', how = 'left')
Table

Unnamed: 0,Position,club,Played,Won,Drawn,Lost,F,A,Points,XPoints
0,1,Liverpool,29,27,1,1,66,21,82,24
1,2,Manchester City,28,18,3,7,68,31,57,30
2,3,Leicester City,29,16,5,8,58,28,53,12
3,4,Chelsea,29,14,6,9,51,39,48,21
4,5,Manchester United,29,12,9,8,44,30,45,24
5,6,Wolverhampton Wanderers,29,10,13,6,41,34,43,15
6,7,Sheffield United,28,11,10,7,30,25,43,0
7,8,Tottenham Hotspur,29,11,8,10,47,40,41,27
8,9,Arsenal,28,9,13,6,40,36,40,21
9,10,Burnley,29,11,6,12,34,40,39,12
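
A left merge keeps every club in the league table, but a club whose name is spelled differently in the two files would silently receive a missing value for XPoints. The sketch below shows one way to check for this using the `validate` and `indicator` options of `pd.merge`. The toy tables and the names `table`, `pts_x` and `unmatched` are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical league table and forecast points; "Team C" has no match
table = pd.DataFrame({"club": ["Team A", "Team B", "Team C"],
                      "Points": [50, 40, 30]})
pts_x = pd.DataFrame({"club": ["Team A", "Team B"],
                      "XPoints": [6, 3]})

# validate raises an error if a club appears more than once on either side;
# indicator adds a "_merge" column recording where each row came from
merged = pd.merge(table, pts_x, on="club", how="left",
                  validate="one_to_one", indicator=True)

# Clubs present in the table but absent from the forecast points
unmatched = merged.loc[merged["_merge"] == "left_only", "club"].tolist()
print(unmatched)
```

If the list of unmatched clubs is not empty, the club names should be reconciled before the points are added together.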


Our forecast of each team's points at the end of the season is therefore the sum of Points (actually won) and XPoints (our forecast for the remaining games):

In [13]:
Table['finalpoints']=Table['Points']+ Table['XPoints']
Table

Unnamed: 0,Position,club,Played,Won,Drawn,Lost,F,A,Points,XPoints,finalpoints
0,1,Liverpool,29,27,1,1,66,21,82,24,106
1,2,Manchester City,28,18,3,7,68,31,57,30,87
2,3,Leicester City,29,16,5,8,58,28,53,12,65
3,4,Chelsea,29,14,6,9,51,39,48,21,69
4,5,Manchester United,29,12,9,8,44,30,45,24,69
5,6,Wolverhampton Wanderers,29,10,13,6,41,34,43,15,58
6,7,Sheffield United,28,11,10,7,30,25,43,0,43
7,8,Tottenham Hotspur,29,11,8,10,47,40,41,27,68
8,9,Arsenal,28,9,13,6,40,36,40,21,61
9,10,Burnley,29,11,6,12,34,40,39,12,51


Points determine league position, which is crucial not just for determining the champion, but also qualification for European competition (The Champions League and Europa League) and, perhaps more importantly, which teams get relegated to the Football League Championship in the following season (the bottom three teams).

The positions on March 13, 2020 are listed in the df. We now create a variable 'rank', which is the position of each team if XPoints are added:

In [14]:
Table.sort_values("finalpoints", inplace = True, ascending = False)
Table['rank'] = Table['finalpoints'].rank(ascending= False)
Table

Unnamed: 0,Position,club,Played,Won,Drawn,Lost,F,A,Points,XPoints,finalpoints,rank
0,1,Liverpool,29,27,1,1,66,21,82,24,106,1.0
1,2,Manchester City,28,18,3,7,68,31,57,30,87,2.0
3,4,Chelsea,29,14,6,9,51,39,48,21,69,3.5
4,5,Manchester United,29,12,9,8,44,30,45,24,69,3.5
7,8,Tottenham Hotspur,29,11,8,10,47,40,41,27,68,5.0
2,3,Leicester City,29,16,5,8,58,28,53,12,65,6.0
8,9,Arsenal,28,9,13,6,40,36,40,21,61,7.0
5,6,Wolverhampton Wanderers,29,10,13,6,41,34,43,15,58,8.5
11,12,Everton,29,10,7,12,37,46,37,21,58,8.5
9,10,Burnley,29,11,6,12,34,40,39,12,51,10.0
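
The fractional ranks in this output come from the default behaviour of `.rank()`, which uses `method="average"` and gives tied values the mean of the positions they span. A small sketch, using the forecast points of the top six clubs from the table above, compares this with `method="min"`, which gives tied clubs the same whole-number position. The names `final`, `rank_average` and `rank_min` are assumptions for illustration.

```python
import pandas as pd

# finalpoints for the top six clubs, copied from the output above
final = pd.DataFrame({
    "club": ["Liverpool", "Manchester City", "Chelsea",
             "Manchester United", "Tottenham Hotspur", "Leicester City"],
    "finalpoints": [106, 87, 69, 69, 68, 65],
})

# Default: tied clubs share the average of the ranks they occupy
final["rank_average"] = final["finalpoints"].rank(ascending=False)

# Alternative: tied clubs both receive the best rank of the group
final["rank_min"] = (
    final["finalpoints"].rank(ascending=False, method="min").astype(int)
)
print(final)
```

Neither method separates the tied clubs; it only changes how the tie is displayed.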


## Conclusions

This exercise is a nice application of our forecasting model. Given the likelihood that the 92 games may never be played, some exercise of this kind might actually be required.

Note that the model does not resolve ties: if two teams finish on equal points, each is assigned the average of the two ranks they occupy. Thus Chelsea and Manchester United, which are tied for 3rd and 4th positions in this model, are given a value of 3.5 each. In practice, goal difference is used to separate teams on equal points, and this model could be developed to generate a forecast of goal difference as well.
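
As a rough illustration of a goal-difference tie-break, the sketch below sorts the top six clubs by forecast points and then by goal difference. It rests on a strong assumption: because the model does not forecast goals, the goal difference as of March 13, 2020 (F minus A in the table above) is used as a stand-in for the end-of-season figure. The names `top`, `GD` and `position` are introduced here for illustration.

```python
import pandas as pd

# F, A and finalpoints for the top six clubs, copied from the tables above
top = pd.DataFrame({
    "club": ["Liverpool", "Manchester City", "Chelsea",
             "Manchester United", "Tottenham Hotspur", "Leicester City"],
    "F": [66, 68, 51, 44, 47, 58],
    "A": [21, 31, 39, 30, 40, 28],
    "finalpoints": [106, 87, 69, 69, 68, 65],
})

# ASSUMPTION: goal difference at suspension stands in for the final value
top["GD"] = top["F"] - top["A"]

# Sort by points first, then by goal difference, both descending
top = top.sort_values(["finalpoints", "GD"], ascending=False)
top = top.reset_index(drop=True)
top["position"] = range(1, len(top) + 1)
print(top[["club", "finalpoints", "GD", "position"]])
```

Under this assumption Manchester United (goal difference 14) would be placed 3rd and Chelsea (goal difference 12) 4th, but a forecast of goals in the remaining games could change that ordering.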

There is little doubt that had the full season been played Liverpool would have won the title; it was almost a mathematical certainty as of March 13, 2020. Qualification for European competition and relegation from the Premier League were much less clear.

One very notable change arises from our model. When the league was suspended, Bournemouth was in 18th place and would have been relegated had it stayed in this position. Our model forecasts that the team would have won enough points to rise to 16th place and avoid relegation, while Brighton would have sunk to 18th place and been relegated. The reason for this is that, on average, Bournemouth's TM value equaled 70% of that of its remaining opponents, whereas Brighton's TM value was equal to only 50% of that of its remaining opponents. Clearly, fans of the two teams would have opposite opinions about a modeling exercise of this kind!

Assuming the season cannot be completed, the Premier League will have to decide how to treat unplayed games. At the least, this exercise shows that this decision, whatever it turns out to be, will not be without controversy.  