# Outrights simulator

##### The idea is to build a tool, drawing on existing data sources and models, that predicts the outcomes which can be wagered on more accurately than the bookmakers do.

##### I do not claim I have succeeded at that - this is only an attempt at doing so.

##### Note: gambling is net negative, mkay.

## 1. Introduction

The most popular bets offered by bookmakers are wagered on the outcome of single games.

More complex bets (e.g. parlays) are characterised by a significantly higher spread.

Bookmakers are generally pretty good at estimating the odds and so it's difficult to be profitable, even theoretically.

In practice, there are also taxes to consider, as well as the bookmakers' habit of restricting bettors who seem to be profitable.

Nevertheless, contrary to what all of this would suggest, I have reason to believe the following two theses:

1) There exist models which can predict the outcomes of games more accurately than the bookmakers' implied odds.

2) **Bookmakers are bad at predicting the outright outcomes.**

Therefore, the strategy I have come up with is to bet on the outrights according to external models - and hope for the best.

## 2. An example

More seriously, there are 2 kinds of errors that the bookmaker can make.

Let me illustrate it with a simplified example.

There's a football (soccer) team that has 2 games left to play in the season. It will become champion only if it wins both of them.

The bookmaker offers odds of 2.5 for the team to win the first game and 2.0 for it to win the second, i.e. implied probabilities of 40% and 50% respectively. Let's assume these are independent events.

The bookmaker also offers 6.0 odds for the team to be the champion.

The first kind of error that might have happened is that the bookmaker inaccurately estimated the strength of the team or of its opponents.

Perhaps the team is significantly stronger and the actual probabilities are 50% and 60% instead.

The second kind of error, which is more clear-cut, is that the bookmaker's outright odds are better than the odds implied by its own predictions for the single games.

If the odds are 2.5 and 2.0 respectively, the implied odds on the team becoming champion should be 2.5 × 2.0 = 5.0 (a probability of 0.4 × 0.5 = 20%).

Anything less than that means a higher spread for the bookmaker, whereas odds better than that (such as the 6.0 on offer here) allow the punter to hedge profitably.
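The arithmetic of this example can be checked with a short sketch. It assumes, as the example does, independent games and margin-free single-game odds; `implied_outright_odds` is a hypothetical helper written for illustration, not part of this project's code.

```python
from math import prod


def implied_outright_odds(game_odds: list[float]) -> float:
    """Decimal odds for winning every game, assuming independent games and
    single-game odds without a bookmaker margin."""
    # Probability of winning all games = product of the per-game probabilities.
    p_all = prod(1.0 / odds for odds in game_odds)
    return 1.0 / p_all


fair = implied_outright_odds([2.5, 2.0])  # 1 / (0.4 * 0.5) = 5.0
offered = 6.0
# Expected profit per unit staked on the outright, if the single-game odds are right.
ev = (1.0 / fair) * offered - 1.0  # 0.2 * 6.0 - 1 = 0.2
print(round(fair, 2), round(ev, 2))
```

In other words, backing the outright at 6.0 returns about +20% per unit staked in expectation, if the bookmaker's own single-game odds are correct.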

Using a model to predict the outcomes of all games in the season allows one to potentially exploit the second error.

## 3. How it works

Note: The whole process relies on one or more external models and a number of **sketchy heuristics**.

The results are bound to be very rough estimates.

The idea is that, in cases where there is a very significant disparity between the model's prediction and the bookie's offer, following the model's pick is +EV (has a positive expected value).
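A minimal sketch of the +EV check, assuming the model's simulated win share is taken at face value as the true probability. The helper names are hypothetical; the figures are Leicester's after 23 matchdays as reported further down (1989 title wins in 10,000 simulations vs. bookmakers' odds of 9.00). The "Exp. RTB odds" column in the results tables appears to be consistent with `n_sims / wins` (e.g. 10000 / 3448 ≈ 2.90).

```python
import math


def fair_odds(wins: int, n_sims: int) -> float:
    """Decimal odds implied by a simulated win count (inf if never won)."""
    return n_sims / wins if wins else math.inf


def expected_value(model_prob: float, offered_odds: float) -> float:
    """Expected profit per unit stake when backing at `offered_odds`,
    assuming `model_prob` is the true probability of the outcome."""
    return model_prob * offered_odds - 1.0


p_model = 1989 / 10_000
print(round(fair_odds(1989, 10_000), 2))     # 5.03
print(round(expected_value(p_model, 9.00), 3))  # 0.79
```

A positive value only means "+EV if the model is right"; given the heuristics involved, a large margin between the two odds is what the approach relies on.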

The model estimates the Elo rating of each team in a given league and takes into account all games already played in the given season.
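For reference, the standard Elo win expectancy can be sketched as below. This is the textbook formula, not necessarily the exact variant used by clubelo or by the `helpers` module; the `home_adv` parameter is a hypothetical knob whose value is an assumption.

```python
def elo_expected_score(elo_home: float, elo_away: float, home_adv: float = 0.0) -> float:
    """Standard Elo expectancy for the home side: P(win) + 0.5 * P(draw).

    `home_adv` is a hypothetical bonus (in Elo points) added to the home side;
    its value is an assumption, not taken from the helpers used here.
    """
    diff = elo_home + home_adv - elo_away
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


# Leicester vs Chelsea with their Elo ratings after 34 matchdays (1797 vs 1796):
print(round(elo_expected_score(1797, 1796), 3))  # 0.501
```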

It then uses a Monte Carlo method on the Author's supercomputer to simulate, many times over, how the rest of the season might play out.
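The Monte Carlo idea can be sketched as follows. This is not the implementation behind `run_multiple_sims` (whose source is not shown); it assumes a fixed hypothetical draw rate, no home advantage, and random tie-breaking on points, and uses made-up toy numbers.

```python
import random
from collections import Counter

# Hypothetical fixed draw probability; the real helpers may model draws differently.
DRAW_RATE = 0.26


def expected_score(elo_a: float, elo_b: float) -> float:
    """Standard Elo expectancy of team A against team B."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))


def match_probs(elo_home: float, elo_away: float) -> tuple[float, float, float]:
    """Split the Elo expectancy into (home win, draw, away win) probabilities.

    The draw probability is capped so neither win probability goes negative,
    and P(home win) + 0.5 * P(draw) always equals the Elo expectancy.
    """
    e = expected_score(elo_home, elo_away)
    draw = min(DRAW_RATE, 2 * min(e, 1 - e))
    return e - draw / 2, draw, 1 - e - draw / 2


def simulate_titles(points, elo, fixtures, n_sims, seed=0) -> Counter:
    """Play the remaining `fixtures` n_sims times and count titles per club."""
    rng = random.Random(seed)
    titles: Counter = Counter()
    for _ in range(n_sims):
        table = dict(points)  # every run starts from the current standings
        for home, away in fixtures:
            p_home, p_draw, _ = match_probs(elo[home], elo[away])
            r = rng.random()
            if r < p_home:
                table[home] += 3
            elif r < p_home + p_draw:
                table[home] += 1
                table[away] += 1
            else:
                table[away] += 3
        # Ties on points are broken at random (assumption: goal difference
        # and other real tie-breakers are not modelled).
        top = max(table.values())
        leaders = [club for club, pts in table.items() if pts == top]
        titles[rng.choice(leaders)] += 1
    return titles


# Toy three-team example with made-up numbers (not real data).
points = {"A": 10, "B": 9, "C": 4}
elo = {"A": 1700.0, "B": 1800.0, "C": 1600.0}
fixtures = [("A", "B"), ("B", "C"), ("C", "A")]
titles = simulate_titles(points, elo, fixtures, n_sims=10_000)
print({club: wins / 10_000 for club, wins in titles.items()})
```

Each club's share of simulated titles is then its estimated probability of winning the league, which can be compared with the bookmaker's odds.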

In more detail:

1. Get details on all games in the season via API-FOOTBALL.

2. Get ratings of the teams in the league from Opta Power Rankings.

3. Run a linear regression to translate the Opta ratings into Elo ratings as used by clubelo.com.

4. Simulate the season a number of times based on the Elo ratings obtained above.

5. Compare the model's predictions to the odds offered by the bookmaker.
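Step 3 can be sketched as an ordinary least-squares fit of clubelo Elo on Opta rating, over the clubs that appear in both sources. The helper name and the ratings below are hypothetical illustrative values, not real Opta or clubelo data.

```python
from statistics import mean


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical paired ratings for clubs present in both sources (illustrative only).
opta_ratings = [90.0, 88.0, 85.0, 80.0]
clubelo_ratings = [1900.0, 1870.0, 1830.0, 1760.0]
slope, intercept = fit_line(opta_ratings, clubelo_ratings)

# Translate an Opta rating for a club with no clubelo entry into an Elo estimate.
print(round(slope * 83.0 + intercept))
```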

The whole process is presented below. A couple of steps require a small amount of manual labor, notably: scraping data from Opta, aligning the teams' names, which differ across the data sources, and checking the bets and odds offered by the bookie.
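The name-alignment step could be partly automated with a manual alias table plus fuzzy matching from Python's standard `difflib`. This is a hypothetical sketch, not the project's code; the alias entry and club lists are illustrative.

```python
import difflib
from typing import Optional

# Hypothetical manual overrides for names that fuzzy matching gets wrong
# (illustrative, not the project's actual list).
ALIASES = {"Man Utd": "Manchester United"}


def align_name(name: str, canonical_names: list[str], cutoff: float = 0.6) -> Optional[str]:
    """Map `name` from one data source onto the spelling used by another."""
    if name in ALIASES:  # explicit manual override first
        return ALIASES[name]
    if name in canonical_names:  # already spelled the same way
        return name
    # Otherwise fall back to the closest string match above `cutoff`.
    matches = difflib.get_close_matches(name, canonical_names, n=1, cutoff=cutoff)
    return matches[0] if matches else None


clubelo_names = ["Tottenham", "Leicester", "Chelsea"]
print(align_name("Tottenham Hotspur", clubelo_names))
```

Names that come back as `None` still need to be aligned by hand.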

## 4. When it worked in the past - and when it did not

What follows are my somewhat subjective observations over some time.

I believe that models which update the teams' strength after each game - as both Opta and clubelo do - are slightly more sensitive than the bookmakers to cases where either a "weak" team is outperforming or a "strong" team is underperforming.

An example would be Leicester's famous run in the 2015/2016 Premier League season: I am hoping to see the Elo-based model suggest that Leicester was underestimated throughout the season, compared to the bookmakers' odds.

See: https://www.skysports.com/football/news/11712/10261535/premier-league-2015-16-how-the-odds-changed-as-leicester-claimed-the-title for a nice overview of the outright odds for Leicester for that season.

```python
import statistics

import pandas as pd

%load_ext autoreload
%autoreload 2

from helpers import *

ELO_DATE = '2015-11-24'
NUMBER_OF_SIMS = 10000

# download_elo_data(ELO_DATE)
# api_get_leagues()
```

```python
find_league_id('GB-ENG', 'Premier League')
```

```
39
```

```python
df = get_api_teams_and_elo_from_clubelo(ELO_DATE, 'ENG')
df.head(20)
```

| | Club | Elo |
|---|---|---|
| 0 | Manchester City | 1869 |
| 1 | Manchester United | 1835 |
| 2 | Arsenal | 1834 |
| 3 | Chelsea | 1797 |
| 4 | Tottenham | 1767 |
| 5 | Liverpool | 1755 |
| 6 | Everton | 1728 |
| 7 | Southampton | 1705 |
| 8 | Stoke City | 1692 |
| 9 | Leicester | 1689 |


```python
# df2 = get_api_teams_and_elo_from_clubelo('2016-05-20', 'ENG')
# df2.head(20)

# Optional: how much each club's Elo drifted between the two dates.
# elo_drift_df = pd.merge(df, df2, how='inner', on='Club', suffixes=('_before', '_after'))
# elo_drift_df['Elo_Drift'] = elo_drift_df['Elo_after'] - elo_drift_df['Elo_before']
# season_stdev = statistics.stdev(elo_drift_df['Elo_Drift'])
# print(season_stdev)
# elo_drift_df.head(20)
```

```python
standings_df = build_historical_standings_table_after_at_most_n_rounds(
    league_id=39, season=2015, country_code_elo='ENG', country_code_api='ENG',
    elo_date=ELO_DATE, last_round_no=13,
)
standings_df.head(20)
```

| | Club | Elo | Points | Games played |
|---|---|---|---|---|
| 1 | Leicester | 1689 | 28 | 13 |
| 2 | Manchester United | 1835 | 27 | 13 |
| 3 | Arsenal | 1834 | 26 | 13 |
| 4 | Manchester City | 1869 | 26 | 13 |
| 5 | Tottenham | 1767 | 24 | 13 |
| 6 | West Ham | 1646 | 21 | 13 |
| 7 | Southampton | 1705 | 20 | 13 |
| 8 | Everton | 1728 | 20 | 13 |
| 9 | Liverpool | 1755 | 20 | 13 |
| 10 | Crystal Palace | 1659 | 19 | 13 |


```python
sample_season = simulate_season_after_n_rounds(
    league_id=39, season=2015, standings_df=standings_df, reverse=False,
    round_to_overwrite_with_sims_from=14,
)
sample_season.head(20)
```

| | Club | Points | Elo | Games played |
|---|---|---|---|---|
| 3 | Manchester City | 75 | 1869 | 37 |
| 1 | Manchester United | 72 | 1835 | 37 |
| 2 | Arsenal | 72 | 1834 | 37 |
| 7 | Everton | 64 | 1728 | 37 |
| 5 | West Ham | 58 | 1646 | 37 |
| 10 | Stoke City | 57 | 1692 | 37 |
| 13 | Chelsea | 57 | 1797 | 37 |
| 0 | Leicester | 56 | 1689 | 37 |
| 8 | Liverpool | 52 | 1755 | 37 |
| 6 | Southampton | 52 | 1705 | 37 |


After 13 matchdays - 2015-11-24

Bookmakers' odds on Leicester winning the title: 101.00

```python
results = run_multiple_sims(
    league_id=39, season=2015, country_code_elo='ENG', country_code_api=None,
    elo_date=ELO_DATE, number_of_sims=NUMBER_OF_SIMS, number_of_winning_places=1,
    last_round_for_standings=13, round_to_overwrite_with_sims_from=14,
)
results.head(20)
```

```
                 Club   Elo  Points  Games played
1           Leicester  1689      28            13
2   Manchester United  1835      27            13
3             Arsenal  1834      26            13
4     Manchester City  1869      26            13
5           Tottenham  1767      24            13
6            West Ham  1646      21            13
7         Southampton  1705      20            13
8             Everton  1728      20            13
9           Liverpool  1755      20            13
10     Crystal Palace  1659      19            13
11         Stoke City  1692      19            13
12          West Brom  1639      17            13
13            Watford  1602      16            13
14            Chelsea  1797      14            13
15            Swansea  1647      14            13
16            Norwich  1591      12            13
17          Newcastle  1570      10            13
18         Sunderland  1570       9            13
19        Bournemouth  1559       9            13

100%|██████████| 10000/10000 [01:35<00:00, 104.43it/s]

10000 simulations
1 winning places
```
| | Club | RTB Wins | LTB Wins | % RTB winrate | % LTB winrate | Exp. RTB odds | Exp. LTB odds |
|---|---|---|---|---|---|---|---|
| 1 | Manchester City | 3448 | 3204 | 34.5 | 32.0 | 2.9 | 3.12 |
| 2 | Manchester United | 3272 | 3022 | 32.7 | 30.2 | 3.06 | 3.31 |
| 3 | Arsenal | 2614 | 2418 | 26.1 | 24.2 | 3.83 | 4.14 |
| 4 | Tottenham | 403 | 346 | 4.0 | 3.5 | 24.81 | 28.9 |
| 5 | Leicester | 111 | 92 | 1.1 | 0.9 | 90.09 | 108.7 |
| 6 | Liverpool | 81 | 60 | 0.8 | 0.6 | 123.46 | 166.67 |
| 7 | Everton | 32 | 23 | 0.3 | 0.2 | 312.5 | 434.78 |
| 8 | Chelsea | 16 | 13 | 0.2 | 0.1 | 625.0 | 769.23 |
| 9 | Southampton | 10 | 9 | 0.1 | 0.1 | 1000.0 | 1111.11 |
| 10 | Stoke City | 6 | 5 | 0.1 | 0.0 | 1666.67 | 2000.0 |


After 23 matchdays - 2016-01-25

Bookmakers' odds on Leicester winning the title: 9.00

```python
ELO_DATE = '2016-01-25'
results = run_multiple_sims(
    league_id=39, season=2015, country_code_elo='ENG', country_code_api=None,
    elo_date=ELO_DATE, number_of_sims=NUMBER_OF_SIMS, number_of_winning_places=1,
    last_round_for_standings=23, round_to_overwrite_with_sims_from=24,
)
results.head(20)
```

```
                 Club   Elo  Points  Games played
1           Leicester  1742      47            23
2     Manchester City  1865      44            23
3             Arsenal  1834      44            23
4           Tottenham  1795      42            23
5   Manchester United  1778      37            23
6            West Ham  1683      36            23
7           Liverpool  1729      34            23
8         Southampton  1722      33            23
9          Stoke City  1706      33            23
10            Watford  1639      32            23
11     Crystal Palace  1656      31            23
12            Everton  1713      29            23
13            Chelsea  1811      28            23
14          West Brom  1644      28            23
15            Swansea  1644      25            23
16        Bournemouth  1614      25            23
17            Norwich  1598      23            23
18          Newcastle  1591      21            23
19         Sunderland  1589      19            23

100%|██████████| 10000/10000 [01:37<00:00, 102.68it/s]

10000 simulations
1 winning places
```
| | Club | RTB Wins | LTB Wins | % RTB winrate | % LTB winrate | Exp. RTB odds | Exp. LTB odds |
|---|---|---|---|---|---|---|---|
| 1 | Manchester City | 3941 | 3608 | 39.4 | 36.1 | 2.54 | 2.77 |
| 2 | Arsenal | 3167 | 2893 | 31.7 | 28.9 | 3.16 | 3.46 |
| 3 | Leicester | 1989 | 1751 | 19.9 | 17.5 | 5.03 | 5.71 |
| 4 | Tottenham | 815 | 710 | 8.2 | 7.1 | 12.27 | 14.08 |
| 5 | Manchester United | 77 | 56 | 0.8 | 0.6 | 129.87 | 178.57 |
| 6 | Liverpool | 5 | 3 | 0.0 | 0.0 | 2000.0 | 3333.33 |
| 7 | Southampton | 3 | 2 | 0.0 | 0.0 | 3333.33 | 5000.0 |
| 8 | West Ham | 2 | 1 | 0.0 | 0.0 | 5000.0 | 10000.0 |
| 9 | Stoke City | 1 | 0 | 0.0 | 0.0 | 10000.0 | inf |


After 27 matchdays - 2016-03-01

Bookmakers' odds on Leicester winning the title: 2.87

```python
ELO_DATE = '2016-03-01'
results = run_multiple_sims(
    league_id=39, season=2015, country_code_elo='ENG', country_code_api=None,
    elo_date=ELO_DATE, number_of_sims=NUMBER_OF_SIMS, number_of_winning_places=1,
    last_round_for_standings=27, round_to_overwrite_with_sims_from=28,
)
results.head(20)
```

```
                 Club   Elo  Points  Games played
1           Leicester  1776      56            27
2           Tottenham  1844      54            27
3             Arsenal  1839      51            27
4     Manchester City  1861      48            27
5   Manchester United  1799      44            27
6            West Ham  1697      43            27
7           Liverpool  1746      41            27
8         Southampton  1742      40            27
9          Stoke City  1707      39            27
10            Watford  1656      37            27
11            Chelsea  1827      36            27
12            Everton  1734      35            27
13          West Brom  1663      35            27
14     Crystal Palace  1641      32            27
15        Bournemouth  1621      29            27
16            Swansea  1644      27            27
17          Newcastle  1598      25            27
18            Norwich  1588      24            27
19         Sunderland  1608      23            27

100%|██████████| 10000/10000 [01:35<00:00, 104.63it/s]

10000 simulations
1 winning places
```
| | Club | RTB Wins | LTB Wins | % RTB winrate | % LTB winrate | Exp. RTB odds | Exp. LTB odds |
|---|---|---|---|---|---|---|---|
| 1 | Leicester | 4056 | 3675 | 40.6 | 36.8 | 2.47 | 2.72 |
| 2 | Tottenham | 3823 | 3461 | 38.2 | 34.6 | 2.62 | 2.89 |
| 3 | Arsenal | 1649 | 1403 | 16.5 | 14.0 | 6.06 | 7.13 |
| 4 | Manchester City | 452 | 352 | 4.5 | 3.5 | 22.12 | 28.41 |
| 5 | Manchester United | 18 | 16 | 0.2 | 0.2 | 555.56 | 625.0 |
| 6 | Liverpool | 1 | 0 | 0.0 | 0.0 | 10000.0 | inf |
| 7 | West Ham | 1 | 0 | 0.0 | 0.0 | 10000.0 | inf |


After 34 matchdays - 2016-04-19

Bookmakers' odds on Leicester winning the title: 1.44

```python
ELO_DATE = '2016-04-19'
results = run_multiple_sims(
    league_id=39, season=2015, country_code_elo='ENG', country_code_api=None,
    elo_date=ELO_DATE, number_of_sims=NUMBER_OF_SIMS, number_of_winning_places=1,
    last_round_for_standings=34, round_to_overwrite_with_sims_from=35,
)
results.head(20)
```

```
                 Club   Elo  Points  Games played
1           Leicester  1797      73            34
2           Tottenham  1847      68            34
3             Arsenal  1835      63            34
4     Manchester City  1872      61            34
5   Manchester United  1785      59            34
6            West Ham  1724      56            34
7           Liverpool  1808      55            34
8         Southampton  1741      51            34
9          Stoke City  1692      47            34
10            Chelsea  1796      45            34
11            Watford  1645      41            34
12        Bournemouth  1641      41            34
13            Swansea  1659      40            34
14            Everton  1714      40            34
15          West Brom  1653      40            34
16     Crystal Palace  1647      38            34
17         Sunderland  1623      33            34
18            Norwich  1582      31            34
19          Newcastle  1585      29            34

100%|██████████| 10000/10000 [01:31<00:00, 108.85it/s]

10000 simulations
1 winning places
```
| | Club | RTB Wins | LTB Wins | % RTB winrate | % LTB winrate | Exp. RTB odds | Exp. LTB odds |
|---|---|---|---|---|---|---|---|
| 1 | Leicester | 8955 | 8589 | 89.6 | 85.9 | 1.12 | 1.16 |
| 2 | Tottenham | 1045 | 743 | 10.4 | 7.4 | 9.57 | 13.46 |


Whether this was a fluke or not is difficult to tell. While the bookmakers' odds after 13 rounds seemed to be on point (101.00 vs. the model's 90.09), further into the season the model estimated a higher probability of Leicester winning the title than the bookmakers did. Leicester did, ultimately, win the title.
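The four checkpoints can be put side by side by converting the bookmakers' decimal odds into implied probabilities (ignoring their margin) and comparing them with the model's simulated win share. The figures below are copied from the runs above; the helper is a hypothetical illustration.

```python
# Figures copied from the four runs above: (matchdays played, bookmakers'
# decimal odds on Leicester, Leicester's "RTB Wins" out of 10,000 simulations).
CHECKPOINTS = [(13, 101.00, 111), (23, 9.00, 1989), (27, 2.87, 4056), (34, 1.44, 8955)]
N_SIMS = 10_000


def compare(checkpoints, n_sims):
    """Return (matchday, bookmaker-implied probability, model probability) rows."""
    rows = []
    for matchday, odds, wins in checkpoints:
        implied = 1.0 / odds   # ignores the bookmaker's margin
        model = wins / n_sims  # share of simulations Leicester won
        rows.append((matchday, implied, model))
    return rows


for matchday, implied, model in compare(CHECKPOINTS, N_SIMS):
    print(f"after {matchday}: bookmakers {implied:.1%}, model {model:.1%}")
```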

One can imagine a model that updates each team's estimated strength even more aggressively: even in the one used here, after 34 matchdays Leicester (1797) has essentially the same Elo as Chelsea (1796), despite a 73 vs. 45 points gap between the two teams in the table.

The K-factor, which basically determines this "speed" of updates, is set to 20 by clubelo themselves, as the value that fits best.
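To make the role of K concrete, here is the textbook Elo update after a single game. This is a sketch; clubelo's actual formula may include adjustments not shown here. A larger `k` moves both ratings proportionally further after every result.

```python
K = 20  # K-factor quoted above for clubelo


def expected_score(elo_a: float, elo_b: float) -> float:
    """Standard Elo expectancy of team A against team B."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))


def update_elo(elo_a: float, elo_b: float, score_a: float, k: float = K) -> tuple[float, float]:
    """Return both ratings after one game.

    score_a is 1 for a win by A, 0.5 for a draw, 0 for a loss. The update is
    zero-sum: whatever A gains, B loses.
    """
    delta = k * (score_a - expected_score(elo_a, elo_b))
    return elo_a + delta, elo_b - delta


# An underdog (1650) beating a favourite (1800) gains more than the
# favourite would have gained for the same result.
print(update_elo(1650, 1800, 1.0))
print(update_elo(1800, 1650, 1.0))
```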