On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems

This repository implements the paper by Aurélien Garivier and Eric Moulines, On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems [1]. We also try some variants of the algorithms and compare them with one another.

Our experiments with the different algorithms are compiled in the notebook experiements.ipynb.

Installation

To install, simply clone the project:

git clone https://github.com/MaxenceGiraud/ucb-nonstationary
cd ucb-nonstationary/

Usage

import numpy as np
import nsucb
from bandit_env import *

# Arms sequence: arm 1 jumps from Bernoulli(0.1) to Bernoulli(0.9) for 300 < t < 500
def arm_f(t):
    arms = [Bernoulli(0.5), Bernoulli(0.1), Bernoulli(0.4)]
    if 300 < t < 500:
        arms[1] = Bernoulli(0.9)
    return arms

n = 3  # number of arms
T = 1000  # time horizon (assumed value; T was left undefined in the original snippet)
mab = MAB_NS(n, arm_f)

# Algorithms
ucb = nsucb.UCB(n)
d = nsucb.DiscountedUCB(n)
sw = nsucb.SlidingUCB(n)

# Run simulations
RunExpes([ucb, d, sw], mab, 50, T, non_stationary=True, quantiles=False)
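
For readers who want to see what the discounted and sliding-window policies compute without opening the package, here is a minimal, self-contained sketch of the two index rules as described in [1]. It does not use or reproduce the nsucb API: the class names DiscountedUCBSketch and SlidingWindowUCBSketch, the simulate helper, and the default values of gamma, tau and xi are all hypothetical and chosen only for illustration. The environment mirrors arm_f from the usage example.

```python
import math
import random
from collections import deque


def bernoulli_means(t):
    """Mean of each arm at round t, mirroring arm_f in the usage example."""
    means = [0.5, 0.1, 0.4]
    if 300 < t < 500:
        means[1] = 0.9
    return means


class DiscountedUCBSketch:
    """Illustrative D-UCB; a hypothetical stand-in, not nsucb.DiscountedUCB."""

    def __init__(self, n_arms, gamma=0.99, xi=0.6, bound=1.0):
        # gamma: discount factor, xi: exploration constant, bound: reward upper bound B
        self.gamma, self.xi, self.bound = gamma, xi, bound
        self.counts = [0.0] * n_arms  # discounted pull counts N_t(gamma, i)
        self.sums = [0.0] * n_arms    # discounted reward sums

    def select(self):
        for i, c in enumerate(self.counts):
            if c == 0.0:  # arm never played (or its count underflowed to zero)
                return i
        total = sum(self.counts)  # n_t(gamma), at least 1 once any arm was played

        def index(i):
            mean = self.sums[i] / self.counts[i]
            pad = 2 * self.bound * math.sqrt(
                self.xi * math.log(max(total, 1.0)) / self.counts[i])
            return mean + pad

        return max(range(len(self.counts)), key=index)

    def update(self, arm, reward):
        # Discount all past observations, then add the new one with weight 1.
        self.counts = [self.gamma * c for c in self.counts]
        self.sums = [self.gamma * s for s in self.sums]
        self.counts[arm] += 1.0
        self.sums[arm] += reward


class SlidingWindowUCBSketch:
    """Illustrative SW-UCB; a hypothetical stand-in, not nsucb.SlidingUCB."""

    def __init__(self, n_arms, tau=100, xi=0.6, bound=1.0):
        self.n_arms, self.tau, self.xi, self.bound = n_arms, tau, xi, bound
        self.window = deque(maxlen=tau)  # last tau (arm, reward) pairs
        self.t = 0                       # number of rounds played so far

    def select(self):
        counts = [0] * self.n_arms
        sums = [0.0] * self.n_arms
        for arm, reward in self.window:
            counts[arm] += 1
            sums[arm] += reward
        for i, c in enumerate(counts):
            if c == 0:  # arm absent from the current window
                return i
        log_term = math.log(min(self.t, self.tau))

        def index(i):
            return sums[i] / counts[i] + self.bound * math.sqrt(
                self.xi * log_term / counts[i])

        return max(range(self.n_arms), key=index)

    def update(self, arm, reward):
        self.window.append((arm, reward))  # deque drops the oldest pair itself
        self.t += 1


def simulate(policy, horizon=1000, seed=0):
    """Run one policy on the changing Bernoulli arms; return the arms pulled."""
    rng = random.Random(seed)
    pulls = []
    for t in range(horizon):
        arm = policy.select()
        reward = 1.0 if rng.random() < bernoulli_means(t)[arm] else 0.0
        policy.update(arm, reward)
        pulls.append(arm)
    return pulls
```

Calling simulate(DiscountedUCBSketch(3)) or simulate(SlidingWindowUCBSketch(3)) returns the sequence of arms pulled. Both sketches forget old observations, by geometric discounting in one case and by a hard window in the other, which is what lets them react when arm 1 changes between rounds 300 and 500. The sliding-window version recomputes its counts from the window on every call, which keeps the code short at the cost of O(tau) work per round.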

To compile the report, you need a LaTeX distribution that provides pdflatex. Then run:

cd report/
pdflatex main.tex

TODO

  • Implement non-stationary bandit
  • Discounted UCB
  • Sliding-Window UCB

References

[1] Garivier, Aurélien & Moulines, Eric. (2008). On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems.
