Multi Armed Bandits (MAB)

A Multi-Armed Bandit method for accurately estimating the largest parameter out of a set of candidates.

A study consists of a collection of bandits, $B = \{b_1, ..., b_n\}$, where each bandit $b_i$ has an associated parameter $\theta_i$ (or two parameters) belonging to the exponential family of distributions. We therefore have a collection of parameters $\theta = \{\theta_1, ..., \theta_n\}$. For two-parameter distributions (e.g. Gaussian), we refer to the primary parameter ($\mu$) as this single parameter.

When 'pulling' a bandit, we generate a random variable from that bandit's distribution. For example, generating a random variable from bandit $i$ that follows a Poisson distribution is denoted $X_i \thicksim Pois(\theta_i)$. The generated value, $x_i$, is called the 'payout' or 'reward', and is what we wish to maximize. In the case of gambling machines, this could be a 'win' (Bernoulli) or some form of monetary payout (Gaussian).
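As a minimal sketch of what a single pull looks like (using numpy for sampling; the repository's own internals may differ):

```python
import numpy as np

rng = np.random.default_rng(seed=42)

theta_i = 3.0               # the bandit's rate parameter (unknown in practice)
x_i = rng.poisson(theta_i)  # one 'pull' of bandit i: a reward drawn from Pois(theta_i)
```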

We therefore wish to find the optimal bandit to exploit; in other words, the bandit with the highest parameter: $\tilde{\theta} = \max(\{\theta_1, ..., \theta_n\})$.

There is a second constraint, however: doing this using as few generations (bandit pulls) as possible. Reasons for this include maximizing cumulative reward within a fixed number of generations (e.g. under a time constraint), or each generation having an associated cost (e.g. paying for each bandit pull).

The problem can therefore be summarized as: given a fixed number of generations, $g \in \{1, ..., G\}$, maximize the cumulative reward $\sum_{g=1}^{G} x_g$. A number of different strategies exist for this, all of which balance exploration (accurately estimating each parameter $\{\hat{\theta}_1, ..., \hat{\theta}_n \}$) against exploitation (accumulating reward by generating from the estimated optimal bandit $\tilde{\theta}$).
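As an illustrative sketch (not necessarily a strategy implemented in this repository), the classic epsilon-greedy strategy makes this trade-off explicit: with probability $\epsilon$ it explores a uniformly random bandit, and otherwise it exploits the bandit with the highest current estimate $\hat{\theta}_i$:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

theta = np.array([0.2, 0.5, 0.7])  # true (unknown) Bernoulli parameters
n_bandits = len(theta)
G = 1_000                          # fixed number of generations (pulls)
eps = 0.1                          # probability of exploring

pulls = np.zeros(n_bandits)        # number of pulls per bandit
rewards = np.zeros(n_bandits)      # cumulative reward per bandit

for g in range(G):
    if pulls.min() == 0 or rng.random() < eps:
        i = rng.integers(n_bandits)          # explore: uniformly random bandit
    else:
        i = int(np.argmax(rewards / pulls))  # exploit: highest estimated parameter
    rewards[i] += rng.binomial(1, theta[i])  # 'pull' bandit i and collect the reward
    pulls[i] += 1

theta_hat = rewards / pulls
print(f"estimates: {theta_hat}, cumulative reward: {rewards.sum():.0f}")
```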

Usage

Pre-requisites

To run the commands below, it is suggested that Just is installed.

Running Simulations

Simulations can be run either by passing arguments directly on the command line, or by passing a configuration file in JSON form. The latter is recommended for reproducible simulation studies.

To see all possible simulation commands, run:

just help

or help for a specific command:

just {{COMMAND}} help

e.g.:

just list-distributions help

To view the available distributions, along with information about their associated parameters, run:

just list-distributions

Simulating directly

To run a simulation by passing arguments directly, run:

just {{COMMAND}} {{*ARGS}}

where COMMAND is a command returned by just help, and ARGS are the arguments required by that command, as shown by running just {{COMMAND}} help.
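For illustration only, a direct invocation might look like the following; the command name and arguments here are hypothetical, so run just help to discover the real ones:

just simulate-bernoulli 0.3 0.5 0.7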

Simulating from config (json) file

There is also the option to perform a MAB simulation that reads its arguments and configuration from a config file. To perform this simulation, run:

just simulate-from-json {{COMMAND}} {{CONFIG}}
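As a sketch, a config file might look like the following. The field names here are assumptions for illustration only; consult just {{COMMAND}} help for the schema expected by a given command.

```json
{
  "bandits": [
    {"distribution": "bernoulli", "parameter": 0.3},
    {"distribution": "bernoulli", "parameter": 0.7}
  ],
  "num_generations": 1000
}
```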

Testing

To run all unit and integration tests in the repository, execute:

just local-test

Sample Outputs

Metrics from the simulations are output to stdout.

Plots are also optionally produced, detailing the performance of the simulation.
