# Outrights simulator [WIP]

##### The idea is to build a tool that combines existing data sources and models to predict the outcomes that can be wagered on more accurately than the bookmakers do.

##### I do not claim I have succeeded at that - this is only an attempt at doing so.

##### Note: gambling is net negative, mkay.

## 1. Introduction

The most popular bets offered by bookmakers are wagered on the outcome of single games.

More complex bets (e.g. parlays) are characterised by a significantly higher spread.

Bookmakers are generally pretty good at estimating the odds and so it's difficult to be profitable, even theoretically.

In practice, there are also the issues of taxation and of bookmakers restricting bettors who seem to be profitable.

Now, contrary to what would seem to be the case, I have reason to believe the following two theses:

1) There exist models which predict the outcomes of games more accurately than the bookmakers' implied odds do.

2) **Bookmakers are bad at predicting outright outcomes (e.g. who wins the league).**

Therefore, the strategy I have come up with is to bet on the outrights according to external models - and hope for the best.

More seriously, there are 2 kinds of errors that the bookmaker can make.

Let me illustrate it with a simplified example.

There's a football (soccer) team that has 2 games left to play in the season. They'll become champions only if they win both of these games.

The bookmaker offers odds of 2.5 for the team to win the first game and 2.0 to win the second game, implying probabilities of 40% and 50% respectively (ignoring the bookmaker's margin). Let's assume these are independent events.

The bookmaker also offers 6.0 odds for the team to be the champion.

The first kind of error that might have happened is that the bookmaker estimated the strength of the team or its opponents inaccurately.

Perhaps the team is significantly stronger and the actual probabilities are 50% and 60% instead.

The second kind of error, which is easier to spot, is that the bookmaker's outright odds are better than the odds implied by their own prices for the single games.

If the odds are 2.5 and 2.0 respectively, the implied odds on the team to become the champion should be 5.0.

Anything less than that means a higher spread for the bookmaker, whereas odds better than that allow the punter to hedge profitably.
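
The arithmetic of this example can be checked with a short Python sketch. The function names are illustrative only and are not part of the project's `helpers` module. As in the example, the bookmaker's margin is ignored and the two games are assumed to be independent.

```python
from math import prod


def implied_probability(decimal_odds: float) -> float:
    """Probability implied by decimal odds, ignoring the bookmaker's margin."""
    return 1.0 / decimal_odds


def combined_odds(*decimal_odds: float) -> float:
    """Fair decimal odds for several independent events all happening."""
    return 1.0 / prod(implied_probability(o) for o in decimal_odds)


single_game_odds = (2.5, 2.0)  # odds for the two remaining games
outright_offered = 6.0         # odds offered for the title

fair_outright = combined_odds(*single_game_odds)
print(round(fair_outright, 2))           # 5.0
print(outright_offered > fair_outright)  # True: the outright is priced too generously
```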

Using a model to predict the outcomes of all games in the season allows one to potentially exploit the second error.

## 2. How it works

Note: The whole process relies on external model(s) and a number of **sketchy heuristics**.

The results are bound to be very rough estimations.

The idea is that in cases where there is a very significant disparity between the model's prediction and the bookie's offer, following the model's pick is +EV.
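
To make "+EV" concrete, here is a minimal sketch of the expected value of a bet per unit staked. The `min_edge` threshold is a hypothetical stand-in for "a very significant disparity"; the value 0.25 is an arbitrary assumption, not something the project defines.

```python
def expected_value(model_probability: float, decimal_odds: float) -> float:
    """Expected profit per unit staked: win (odds - 1) with probability p, else lose 1."""
    return model_probability * decimal_odds - 1.0


def is_value_bet(model_probability: float, decimal_odds: float, min_edge: float = 0.25) -> bool:
    """Hypothetical filter: flag a bet only when its EV clears a wide safety margin."""
    return expected_value(model_probability, decimal_odds) >= min_edge


# Numbers from the introduction: true chances of 50% and 60%, outright priced at 6.0.
p_champion = 0.5 * 0.6
print(round(expected_value(p_champion, 6.0), 2))  # 0.8
print(is_value_bet(p_champion, 6.0))              # True
print(is_value_bet(0.20, 5.0))                    # False: fair price, no edge
```

The wide margin reflects the fact that the model's probabilities are themselves rough estimates.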

The model estimates the ELO rating of each team in a given league and takes into account all games already played in the season.
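
For reference, this is the standard Elo expected-score formula that turns two ratings into an expected result. It is a generic sketch: the `home_advantage` parameter is an assumption (defaulting to 0), and it is not confirmed that `helpers` applies the formula in exactly this way.

```python
def elo_expected_score(rating_a: float, rating_b: float, home_advantage: float = 0.0) -> float:
    """Expected score of team A (1 = win, 0.5 = draw, 0 = loss) under the Elo model.

    home_advantage is added to team A's rating; 0 means a neutral venue.
    """
    diff = rating_b - (rating_a + home_advantage)
    return 1.0 / (1.0 + 10 ** (diff / 400.0))


# Ratings taken from the 2016-01-24 table further down.
man_city, leicester = 1865, 1742
print(elo_expected_score(man_city, leicester))
print(elo_expected_score(leicester, man_city))
```

Note that the expected score mixes wins and draws (a win counts as 1, a draw as 0.5), so an extra heuristic is needed to split it into win/draw/loss probabilities.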

It then uses the Monte Carlo method (on the Author's supercomputer) to run a number of simulations of how the rest of the season is going to play out.
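
A minimal sketch of such a Monte Carlo simulation is shown below. It is not the code in `helpers`: the flat draw rate, the random tie-break and all function names are assumptions made for illustration, and the league is a toy example rather than real data.

```python
import random
from collections import Counter


def match_probabilities(elo_home, elo_away, draw_rate=0.25):
    """Split the Elo expected score into (home win, draw, away win) chances.

    Assumption: a flat draw rate, capped so that no probability goes negative.
    """
    expected = 1.0 / (1.0 + 10 ** ((elo_away - elo_home) / 400.0))
    # expected = P(home win) + 0.5 * P(draw)
    draw = min(draw_rate, 2 * expected, 2 * (1 - expected))
    home_win = expected - draw / 2
    return home_win, draw, 1.0 - home_win - draw


def simulate_once(points, elo, fixtures, rng):
    """Play out the remaining fixtures once and return the final points."""
    final = dict(points)
    for home, away in fixtures:
        p_home, p_draw, _ = match_probabilities(elo[home], elo[away])
        roll = rng.random()
        if roll < p_home:
            final[home] += 3
        elif roll < p_home + p_draw:
            final[home] += 1
            final[away] += 1
        else:
            final[away] += 3
    return final


def title_chances(points, elo, fixtures, n_sims=10_000, seed=0):
    """Share of simulations in which each team finishes first."""
    rng = random.Random(seed)
    titles = Counter()
    for _ in range(n_sims):
        final = simulate_once(points, elo, fixtures, rng)
        best = max(final.values())
        # Assumption: ties on points are broken at random (no goal difference).
        titles[rng.choice([t for t, p in final.items() if p == best])] += 1
    return {team: titles[team] / n_sims for team in points}


# Toy league, not real data.
points = {"A": 10, "B": 9, "C": 4}
elo = {"A": 1700, "B": 1800, "C": 1600}
fixtures = [("A", "B"), ("B", "C"), ("C", "A")]
chances = title_chances(points, elo, fixtures)
print(chances)
```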

In more detail:

1. Get details on all games in the season via API-FOOTBALL.

2. Get ratings of the teams in the league from Opta Power Rankings.

3. Run a linear regression to translate the ratings from Opta into ELO ratings from clubelo.com.

4. Simulate the season a number of times based on the ELO ratings obtained above.

5. Compare the model's predictions to the odds offered by the bookmaker.
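
Step 3 can be sketched as an ordinary least-squares fit from one rating scale to the other. The rating pairs below are made up purely for illustration, and `fit_line` / `opta_to_elo` are hypothetical names; the real fit would use teams that appear in both Opta and clubelo.com.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Made-up (Opta rating, ClubElo rating) pairs, NOT real data.
opta = [95.0, 90.0, 85.0, 80.0]
clubelo = [1950.0, 1850.0, 1750.0, 1650.0]

slope, intercept = fit_line(opta, clubelo)


def opta_to_elo(opta_rating: float) -> float:
    """Translate an Opta rating into the ELO scale using the fitted line."""
    return slope * opta_rating + intercept


print(round(opta_to_elo(87.5)))  # 1800
```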

The whole process is presented below. There are a couple of steps which require a small amount of manual labor, notably: scraping data from Opta, aligning the teams' names which differ across the data websites and checking the available odds and bets offered by the bookie.
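
The name alignment could be partly automated. The sketch below uses `difflib.get_close_matches` from the standard library together with a hand-maintained override table. The override entry and the `align_name` helper are hypothetical. Fuzzy matches should still be reviewed by hand, since names such as "Manchester City" and "Manchester United" are easily confused.

```python
from difflib import get_close_matches

# Hypothetical manual overrides for names that fuzzy matching gets wrong.
MANUAL_OVERRIDES = {"Man City": "Manchester City"}


def align_name(name, canonical_names, cutoff=0.6):
    """Map a source-specific team name onto a canonical one, or None if unsure."""
    if name in MANUAL_OVERRIDES:
        return MANUAL_OVERRIDES[name]
    matches = get_close_matches(name, canonical_names, n=1, cutoff=cutoff)
    return matches[0] if matches else None


canonical = ["Manchester City", "Manchester United", "Leicester", "Tottenham"]
for raw in ["Leicester City", "Tottenham Hotspur", "Man City", "Leicester"]:
    # Print the proposed mapping so it can be checked manually.
    print(raw, "->", align_name(raw, canonical))
```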

## 3. When it worked in the past - and when it did not 

What follows are my somewhat subjective observations made over time.

I believe that models updating the teams' strength after each game - as both Opta and clubelo do - are more sensitive to the cases where either a "weak" team is outperforming or a "strong" team is underperforming.

The perfect example is the famous Leicester run in the 2015/2016 Premier League season.
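
To illustrate why per-game updates react to such cases, here is a sketch of the textbook Elo update rule. The fixed K-factor of 20 and the absence of goal-difference weighting are assumptions; Opta and clubelo use their own variants. The ratings in the example are hypothetical.

```python
def expected_score(rating: float, opponent_rating: float) -> float:
    """Elo expected score (1 = win, 0.5 = draw, 0 = loss)."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def elo_update(rating: float, opponent_rating: float, score: float, k: float = 20.0) -> float:
    """New rating after one game; the change grows with the size of the surprise."""
    return rating + k * (score - expected_score(rating, opponent_rating))


# Hypothetical ratings: an underdog (1650) and a favourite (1850).
underdog_gain = elo_update(1650, 1850, 1.0) - 1650    # underdog wins
favourite_gain = elo_update(1850, 1650, 1.0) - 1850   # favourite wins
print(round(underdog_gain, 1), round(favourite_gain, 1))  # 15.2 4.8
```

An upset moves the rating roughly three times as much as an expected result here, so a run of surprising wins is picked up quickly.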

See: https://www.skysports.com/football/news/11712/10261535/premier-league-2015-16-how-the-odds-changed-as-leicester-claimed-the-title



[after 13 matchdays] - 2015-11-24

In [1]:
import statistics

In [23]:
import pandas as pd

In [24]:
from helpers import *

In [25]:
# ELO_DATE = '2015-11-24'  # after 13 matchdays
ELO_DATE = '2016-01-24'  # after 23 matchdays

In [26]:
# download_elo_data(ELO_DATE)
download_elo_data('2025-11-03')

In [17]:
# api_get_leagues()

In [18]:
find_league_id('GB-ENG', 'Premier League')

39

In [19]:
df = get_api_teams_and_elo_from_clubelo(ELO_DATE, 'ENG')
df.head(20)

| | Club | Elo |
|---|---|---|
| 0 | Manchester City | 1865 |
| 1 | Arsenal | 1845 |
| 2 | Chelsea | 1801 |
| 3 | Tottenham | 1795 |
| 4 | Manchester United | 1778 |
| 5 | Leicester | 1742 |
| 6 | Liverpool | 1729 |
| 7 | Everton | 1725 |
| 8 | Southampton | 1722 |
| 9 | Stoke City | 1706 |


In [20]:
df2 = get_api_teams_and_elo_from_clubelo('2016-05-20', 'ENG')
df2.head(20)

| | Club | Elo |
|---|---|---|
| 0 | Manchester City | 1851 |
| 1 | Arsenal | 1843 |
| 2 | Leicester | 1814 |
| 3 | Tottenham | 1807 |
| 4 | Liverpool | 1803 |
| 5 | Chelsea | 1796 |
| 6 | Manchester United | 1791 |
| 7 | Southampton | 1781 |
| 8 | West Ham | 1726 |
| 9 | Everton | 1698 |


In [21]:
elo_drift_df = pd.merge(df, df2, how='inner', on='Club', suffixes=('_before', '_after'))
elo_drift_df['Elo_Drift'] = elo_drift_df['Elo_after'] - elo_drift_df['Elo_before']
season_stdev = statistics.stdev(elo_drift_df['Elo_Drift'])
print(season_stdev)
elo_drift_df.head(20)


36.80599922495473


| | Club | Elo_before | Elo_after | Elo_Drift |
|---|---|---|---|---|
| 0 | Manchester City | 1865 | 1851 | -14 |
| 1 | Arsenal | 1845 | 1843 | -2 |
| 2 | Chelsea | 1801 | 1796 | -5 |
| 3 | Tottenham | 1795 | 1807 | 12 |
| 4 | Manchester United | 1778 | 1791 | 13 |
| 5 | Leicester | 1742 | 1814 | 72 |
| 6 | Liverpool | 1729 | 1803 | 74 |
| 7 | Everton | 1725 | 1698 | -27 |
| 8 | Southampton | 1722 | 1781 | 59 |
| 9 | Stoke City | 1706 | 1684 | -22 |


In [11]:
standings_df = build_historical_standings_table_after_n_rounds(league_id=39, season=2015, country_code_elo='ENG', country_code_api='ENG', after_round=13, elo_date=ELO_DATE, modify_elo=False, stdev=season_stdev)
standings_df.head(20)

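| | Club | Elo | Points | Games played |
|---|---|---|---|---|
| 1 | Leicester | 1762 | 28 | 13 |
| 2 | Manchester United | 1901 | 27 | 13 |
| 3 | Arsenal | 1935 | 26 | 13 |
| 4 | Manchester City | 1917 | 26 | 13 |
| 5 | Tottenham | 1708 | 24 | 13 |
| 6 | West Ham | 1646 | 21 | 13 |
| 7 | Southampton | 1727 | 20 | 13 |
| 8 | Everton | 1737 | 20 | 13 |
| 9 | Liverpool | 1695 | 20 | 13 |
| 10 | Crystal Palace | 1635 | 19 | 13 |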


In [12]:
sample_season = simulate_season(league_id=39, season=2015, after_round=13, standings_df=standings_df, reverse=False, modify_elo_in_sim=False)
sample_season.head(20)

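| | Club | Points | Elo | Games played |
|---|---|---|---|---|
| 2 | Arsenal | 77 | 1935 | 38 |
| 3 | Manchester City | 75 | 1917 | 38 |
| 1 | Manchester United | 73 | 1901 | 38 |
| 6 | Southampton | 68 | 1727 | 38 |
| 4 | Tottenham | 67 | 1708 | 38 |
| 10 | Stoke City | 62 | 1726 | 38 |
| 7 | Everton | 58 | 1737 | 38 |
| 8 | Liverpool | 54 | 1695 | 38 |
| 13 | Chelsea | 54 | 1798 | 38 |
| 11 | West Brom | 52 | 1688 | 38 |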


In [14]:
results = run_multiple_sims(league_id=39, season=2015, country_code_elo='ENG', country_code_api=None, after_round=13, elo_date=ELO_DATE, number_of_sims=10000, number_of_winning_places=1, reverse=False, modify_elo_in_sim=False, modify_elo_retro=False)
results.head(20)

                 Club   Elo  Points  Games played
1           Leicester  1689      28            13
2   Manchester United  1835      27            13
3             Arsenal  1834      26            13
4     Manchester City  1869      26            13
5           Tottenham  1767      24            13
6            West Ham  1646      21            13
7         Southampton  1705      20            13
8             Everton  1728      20            13
9           Liverpool  1755      20            13
10     Crystal Palace  1659      19            13
11         Stoke City  1692      19            13
12          West Brom  1639      17            13
13            Watford  1602      16            13
14            Chelsea  1797      14            13
15            Swansea  1647      14            13
16            Norwich  1591      12            13
17          Newcastle  1570      10            13
18         Sunderland  1570       9            13
19        Bournemouth  1559       9            13


100%|██████████| 10000/10000 [01:10<00:00, 142.40it/s]

9264 simulations
1 winning places





| | Club | Wins | % winrate | Expected odds |
|---|---|---|---|---|
| 1 | Manchester City | 3481 | 38.0 | 2.66 |
| 2 | Manchester United | 2779 | 30.0 | 3.33 |
| 3 | Arsenal | 2473 | 27.0 | 3.75 |
| 4 | Tottenham | 322 | 3.0 | 28.77 |
| 5 | Liverpool | 94 | 1.0 | 98.55 |
| 6 | Leicester | 75 | 1.0 | 123.52 |
| 7 | Everton | 18 | 0.0 | 514.67 |
| 8 | Chelsea | 16 | 0.0 | 579.0 |
| 9 | Stoke City | 3 | 0.0 | 3088.0 |
| 10 | Southampton | 2 | 0.0 | 4632.0 |


[after 23 matchdays] - 2016-01-24

In [22]:
results = run_multiple_sims(league_id=39, season=2015, country_code_elo='ENG', country_code_api=None, after_round=23, elo_date=ELO_DATE, number_of_sims=10000, number_of_winning_places=1, reverse=False, modify_elo_in_sim=False, modify_elo_retro=False)
results.head(20)

                 Club   Elo  Points  Games played
1           Leicester  1742      47            23
2     Manchester City  1865      44            23
3             Arsenal  1845      44            23
4           Tottenham  1795      42            23
5   Manchester United  1778      37            23
6            West Ham  1683      36            23
7           Liverpool  1729      34            23
8          Stoke City  1706      33            23
9         Southampton  1722      33            23
10            Watford  1639      32            23
11     Crystal Palace  1656      31            23
12            Everton  1725      29            23
13            Chelsea  1801      28            23
14          West Brom  1644      28            23
15            Swansea  1633      25            23
16        Bournemouth  1614      25            23
17            Norwich  1598      23            23
18          Newcastle  1591      21            23
19         Sunderland  1589      19            23


100%|██████████| 10000/10000 [01:12<00:00, 138.34it/s]

9084 simulations
1 winning places





| | Club | Wins | % winrate | Expected odds |
|---|---|---|---|---|
| 1 | Manchester City | 3532 | 39.0 | 2.57 |
| 2 | Arsenal | 3267 | 36.0 | 2.78 |
| 3 | Leicester | 1586 | 17.0 | 5.73 |
| 4 | Tottenham | 626 | 7.0 | 14.51 |
| 5 | Manchester United | 64 | 1.0 | 141.94 |
| 6 | Liverpool | 4 | 0.0 | 2271.0 |
| 7 | West Ham | 3 | 0.0 | 3028.0 |
| 8 | Crystal Palace | 1 | 0.0 | 9084.0 |
| 9 | Everton | 1 | 0.0 | 9084.0 |
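
The `Expected odds` column appears to be the number of counted simulations divided by the number of title wins, i.e. the fair decimal odds for the simulated win rate. This is inferred from the numbers above rather than from the `helpers` source, so treat the sketch below as an assumption; `expected_odds` is an illustrative name.

```python
def expected_odds(wins: int, simulations: int) -> float:
    """Fair decimal odds implied by a simulated win count."""
    return simulations / wins


# Win counts from the run after 23 rounds (9084 counted simulations).
simulations = 9084
wins = {"Manchester City": 3532, "Arsenal": 3267, "Leicester": 1586, "Tottenham": 626}

for club, won in wins.items():
    winrate = round(100 * won / simulations)
    print(club, winrate, round(expected_odds(won, simulations), 2))
# Manchester City 39 2.57
# Arsenal 36 2.78
# Leicester 17 5.73
# Tottenham 7 14.51
```

A bookmaker price noticeably above the value in this column is what the model would regard as a candidate bet.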
