Reinforcement Learning for Sports betting

How I discovered that I won't be a millionaire that easily

What is it about?

Using the Stable-Baselines library and the football data from Beat The Bookie: Odds Series Football Dataset, this was an attempt to let an agent learn how to bet and win in the long run.

SPOILER ALERT!

I'm still poor

Which part of the data did I use?

Only average odds at closing time were used.

What does an observation look like?

Average odds at closing time and the current bankroll. I also added a wrapper that appends predictions of the winner (given the odds) from a tiny, classical supervised network trained on a different dataset, but I still let the RL agent decide what to bet on and how much.

The state representation therefore became a vector of 7 numbers:

normalized bankroll (how close the agent is to losing)
bookies' odds for the home team, the away team and a draw (normalized or not)
chances of each outcome according to my supervised network (this is not the best model: only a 0.44 F1 score on the same dataset)
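As a rough sketch of how such a 7-number vector could be put together (this is not the code from this repo): build_observation and implied_probs_stub are hypothetical names, "normalized bankroll" is assumed to mean the bankroll divided by the starting bankroll, the odds are left unnormalized, and the stub only stands in for the supervised network.

```python
from typing import Callable, List, Sequence


def build_observation(
    bankroll: float,
    starting_bankroll: float,
    odds: Sequence[float],
    predict_probs: Callable[[Sequence[float]], Sequence[float]],
) -> List[float]:
    """Assemble the state: bankroll, three odds, three predicted chances."""
    if len(odds) != 3:
        raise ValueError("expected odds for home, away and draw")
    # Assumption: normalization is relative to the starting bankroll.
    normalized_bankroll = bankroll / starting_bankroll
    probs = list(predict_probs(odds))
    if len(probs) != 3:
        raise ValueError("expected one probability per outcome")
    return [normalized_bankroll, *odds, *probs]


def implied_probs_stub(odds: Sequence[float]) -> List[float]:
    """Placeholder for the supervised network: normalized inverse odds."""
    inverse = [1.0 / o for o in odds]
    total = sum(inverse)
    return [p / total for p in inverse]


# Odds are given as (home, away, draw), matching the list above.
obs = build_observation(80.0, 100.0, [2.0, 4.0, 4.0], implied_probs_stub)
print(len(obs), obs)
```

Passing the predictor in as a function keeps the sketch independent of any particular model, which mirrors the idea of adding the predictions through a wrapper.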

Environment

The environment (located in betting_env/environment/betting_environment.py) implements the OpenAI gym interface. It accepts only discrete actions, as defined in betting_env/environment/actions.py. An episode ends when either the winning limit is reached (for instance, double the initial bankroll) or all the matches in the dataset have been played (this hasn't happened yet). It even has tests in tests/betting_env_tests!

Wrappers for state normalization and the Keras model are in the same folder.
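To make the episode logic concrete, here is a minimal gym-style sketch. It is not the environment from this repo: TinyBettingEnv, the four action constants, the fixed stake and the match tuple layout are all assumptions made for illustration, and gym is deliberately not imported so the snippet runs on its own. Only the two termination conditions described above are implemented.

```python
from typing import Dict, List, Tuple

# Hypothetical discrete actions; the real ones are in betting_env/environment/actions.py.
SKIP, BET_HOME, BET_AWAY, BET_DRAW = 0, 1, 2, 3

# Assumed match layout: (home_odds, away_odds, draw_odds, result), result in {1, 2, 3}.
Match = Tuple[float, float, float, int]


class TinyBettingEnv:
    """Gym-style reset()/step() loop over a list of matches (illustrative only)."""

    def __init__(self, matches: List[Match], starting_bankroll: float = 100.0,
                 stake: float = 10.0, win_multiple: float = 2.0):
        self.matches = matches
        self.starting_bankroll = starting_bankroll
        self.stake = stake
        self.win_multiple = win_multiple
        self.bankroll = starting_bankroll
        self.index = 0

    def _observation(self) -> List[float]:
        if self.index >= len(self.matches):
            return [self.bankroll, 0.0, 0.0, 0.0]  # no match left to show
        home, away, draw, _ = self.matches[self.index]
        return [self.bankroll, home, away, draw]

    def reset(self) -> List[float]:
        self.bankroll = self.starting_bankroll
        self.index = 0
        return self._observation()

    def step(self, action: int) -> Tuple[List[float], float, bool, Dict]:
        home, away, draw, result = self.matches[self.index]
        reward = 0.0
        if action != SKIP:
            odds = (home, away, draw)[action - 1]
            # Decimal odds: a winning bet pays stake * odds, so profit is stake * (odds - 1).
            reward = self.stake * (odds - 1.0) if action == result else -self.stake
        self.bankroll += reward
        self.index += 1
        reached_limit = self.bankroll >= self.win_multiple * self.starting_bankroll
        out_of_matches = self.index >= len(self.matches)
        return self._observation(), reward, reached_limit or out_of_matches, {}


env = TinyBettingEnv([(2.0, 4.0, 4.0, BET_HOME), (3.0, 2.5, 3.0, BET_AWAY)])
obs = env.reset()
obs, reward, done, info = env.step(BET_HOME)
print(obs, reward, done)
```

The real environment lets the agent choose how much to bet as well; the fixed stake here only keeps the example short.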

Metrics

To compare results I used the Neptune.ai console (it's free and fun) for single-number comparison, in addition to good old error bars, TensorBoard and a bunch of txt files.

Neptune.ai console

Tensorboard

Text output

'all algoritms and hyperparameters to be tested:'
{'A2C': [{'lr_schedule': 'linear', 'policy': 'MlpPolicy'}],
 'DQN': [{'policy': 'MlpPolicy'}],
 'PPO': [{'policy': 'MlpPolicy'}]}
https://ui.neptune.ai/asdd/sandbox/e/SAN-290
Algorithm: A2C
Hyperparamters: {'policy': 'MlpPolicy', 'lr_schedule': 'linear'}

Eval num_timesteps=200, episode_reward=-20.24 +/- 32.30
Episode length: 100.00 +/- 0.00
New best mean reward!
Eval num_timesteps=400, episode_reward=-36.34 +/- 30.92
Episode length: 100.00 +/- 0.00
...
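As far as I can tell, each Eval line reports the mean and the standard deviation of the episode reward over the evaluation episodes, which is also what the error bars show. A small sketch of that calculation, with made-up rewards and a hypothetical helper named summarize_eval, assuming the population standard deviation (numpy's default):

```python
from statistics import mean, pstdev
from typing import List, Tuple


def summarize_eval(episode_rewards: List[float]) -> Tuple[float, float]:
    """Return (mean, population std) of the per-episode rewards."""
    return mean(episode_rewards), pstdev(episode_rewards)


rewards = [-50.0, 10.0, -20.0]  # made-up rewards from three evaluation episodes
avg, std = summarize_eval(rewards)
line = f"episode_reward={avg:.2f} +/- {std:.2f}"
print(line)
```

A spread larger than the mean, as in the log above, means individual episodes swing between clear wins and clear losses even when the average is negative.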

Error bar

Example parameters

--algorithms
PPO
A2C
DQN
--render_test
False
--video_record_test
False
--episode_max_steps
100
--total_timesteps
1000
--verbose
False
--norm_obs
False
--norm_reward
False
--neptune_api_token
<<YOUR NEPTUNE API TOKEN, IT'S FREE!>>

To learn more, see main.py.
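How main.py actually parses these parameters is not shown here. Purely as an illustration, a parser for a subset of the flags above could look like the sketch below, assuming argparse; str_to_bool and build_parser are hypothetical helpers, and the defaults are guesses.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Turn 'True'/'False' strings into booleans (plain bool('False') is True)."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of the experiment flags")
    # One or more algorithm names, e.g. --algorithms PPO A2C DQN
    parser.add_argument("--algorithms", nargs="+",
                        choices=["PPO", "A2C", "DQN"], default=["PPO"])
    parser.add_argument("--render_test", type=str_to_bool, default=False)
    parser.add_argument("--episode_max_steps", type=int, default=100)
    parser.add_argument("--total_timesteps", type=int, default=1000)
    parser.add_argument("--verbose", type=str_to_bool, default=False)
    parser.add_argument("--norm_obs", type=str_to_bool, default=False)
    parser.add_argument("--norm_reward", type=str_to_bool, default=False)
    return parser


args = build_parser().parse_args([
    "--algorithms", "PPO", "A2C", "DQN",
    "--episode_max_steps", "100",
    "--total_timesteps", "1000",
    "--norm_obs", "False",
])
print(args.algorithms, args.total_timesteps, args.norm_obs)
```

The explicit boolean converter matters because the example passes False as text, and any non-empty string is truthy in Python.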

Customisation

Other environments

It's (relatively) easy to use this for other simple environments. Substitute exp.betting_env_creator.get_env_function with your function that follows the same interface.

import gym


def get_my_custom_env():
    def make_env(g=10):
        # Build one environment instance; g is the gravity passed to Pendulum
        env = gym.make('Pendulum-v0', g=g)
        return env

    # Keys have to match the parameters of the make_env function
    train_env_parameters_dict = {"g": 10}  # train on Earth
    eval_env_parameters_dict = {"g": 24}  # let's evaluate it on Jupiter!
    test_env_parameters_dict = {"g": 10}

    return (
        make_env,  # function that returns a gym.Env instance
        "Pendulum-v0",  # string name of this environment, for logging purposes
        train_env_parameters_dict,  # parameters passed to make_env when creating the train set
        eval_env_parameters_dict,  # parameters passed to make_env when creating the eval set
        test_env_parameters_dict,  # parameters passed to make_env when creating the test set
    )

Setting algorithm hyperparameters for grid search

Because I expect to be the only user of this code, I decided to keep the grid search parameters in code instead of parsing a yaml or json file. Just change the values in exp.algorithm_registry.A2C_hyperparameters to create a grid search for the A2C algorithm, exp.algorithm_registry.PPO_hyperparameters for PPO, etc.

def A2C_hyperparameters():
    return {
        'policy': ["MlpPolicy"],
        'lr_schedule': ["linear", "constant"],
        'gamma': [0.9, 0.9999],
    }

Changing the A2C_hyperparameters function to match the one above and setting the --algorithms parameter to A2C will produce the following grid search:

'all algoritms and hyperparameters to be tested:'
{'A2C': [{'gamma': 0.9, 'lr_schedule': 'linear', 'policy': 'MlpPolicy'},
         {'gamma': 0.9999, 'lr_schedule': 'linear', 'policy': 'MlpPolicy'},
         {'gamma': 0.9, 'lr_schedule': 'constant', 'policy': 'MlpPolicy'},
         {'gamma': 0.9999, 'lr_schedule': 'constant', 'policy': 'MlpPolicy'}],
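The expansion from a dictionary of lists to the list of combinations above can be sketched with itertools.product. This is an illustration, not necessarily how the repo does it; expand_grid is a hypothetical helper name.

```python
from itertools import product
from typing import Any, Dict, List


def expand_grid(options: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Return one dict per combination of the listed hyperparameter values."""
    keys = list(options)
    combos = product(*(options[key] for key in keys))
    return [dict(zip(keys, combo)) for combo in combos]


def A2C_hyperparameters() -> Dict[str, List[Any]]:
    return {
        'policy': ["MlpPolicy"],
        'lr_schedule': ["linear", "constant"],
        'gamma': [0.9, 0.9999],
    }


grid = expand_grid(A2C_hyperparameters())
for params in grid:
    print(params)
```

With 1 policy, 2 learning rate schedules and 2 gamma values this gives 1 x 2 x 2 = 4 runs. The last key varies fastest, which matches the order of the output above.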


bartekwojcik/SportBettingRL
