For other things to do, and issues to solve, see the issue tracker on GitHub.
- Add support to save my figures in SVG, EPS and PDF formats.
- Finish writing a perfectly clean CLI client for my Python server.
- Write a small library that can be included in any other C++ program, to: 1. start the socket connection to the server, 2. then play one step at a time.
- Check that the library can be used within a GNU Radio block!
- remove the `delta_t_save` "feature"
- remove the `delta_t_plot` "feature"
- Clean up initial code, keep it clean and commented, OK.
- Lint the code and make it (almost) "perfect" regarding Python style recommendations, OK.
- Port it to Python 3.4 (while still being valid Python 2.7), OK. It is valid Python, both v2 (2.7) and v3 (3.4, 3.5, 3.6, 3.7).
- Add more arms: Gaussian, Exponential, Poisson, Uniform, and more.
- Add my aggregated bandit algorithm, and state of the art aggregation algorithms. Cf. my IEEE WCNC 2018 article.
- In fact, exhaustive grid search cannot be easily used as it cannot run on-line! Sadly, OK.
- add plots that show the percentage of optimal arm plays (e.g., as done in this paper)
- fully profile my code, with `cProfile` for functions and `line_profiler` for line-by-line profiling. No surprise here: `Beta.py` is the slow part, as it takes time to sample and compute the quantiles (even when using the good `numpy.random`/`scipy.stats` functions). See for instance this log file (with `cProfile`) or this one (with `line_profiler`).
- I could have tried to improve the efficiency bottlenecks with smart `numpy`/`scipy` code (I did not find anything useful), or `numba.jit`? (does not seem to be possible), or `cython` code (not so easy, not so interesting)... Maybe using `numexpr`: nope! Maybe using `glumpy` for faster visualizations: nope! Using `pypy` is impossible, as it does not yet support all of `numpy`, and none of `matplotlib`.
- Explore the behavior of my Aggregator algorithm, and understand it better (and improve it? It would only be by parameter tweaking, not interesting, so NOPE).
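The `cProfile` part of the profiling workflow above can be sketched as follows; `sample_beta` is only a toy stand-in (not the project's `Beta.py`) for the Beta-posterior sampling that dominates runtime:

```python
# Minimal sketch of profiling with the stdlib cProfile/pstats modules
# (line_profiler would be used similarly, but line by line).
import cProfile
import io
import pstats
import random

def sample_beta(n):
    # Toy stand-in for the slow Beta-posterior sampling in Beta.py
    return [random.betavariate(2, 3) for _ in range(n)]

profiler = cProfile.Profile()
profiler.enable()
samples = sample_beta(100000)
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)  # the top entries point at the sampling hot spot
```

Sorting by cumulative time is what makes the bottleneck (here, the Beta sampling) show up at the top of the report.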
- Rewards not in `{0, 1}` are handled with a trick, a "random binarization", cf. [Agrawal & Goyal, 2012] (Algorithm 2): when a reward `r_t \in [0, 1]` is observed, the player receives the result of a Bernoulli sample of mean `r_t`: `r_t <- sample from Bernoulli(r_t)`, so it is indeed in `{0, 1}`. This works fine for `Exponential` arms, for instance.
- Test again (and adapt, if needed) each single-player policy against non-Bernoulli arms (Gaussian, Exponential, Poisson).
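The random binarization trick can be sketched in a few lines (the function name `binarize` is illustrative, not the project's API):

```python
import random

def binarize(r_t, rng=random.random):
    """Random binarization trick of [Agrawal & Goyal, 2012], Algorithm 2:
    replace a reward r_t in [0, 1] by a Bernoulli sample of mean r_t,
    so that Bernoulli-only policies can still be applied.
    The estimate is unbiased: E[binarize(r_t)] = r_t."""
    assert 0.0 <= r_t <= 1.0, "reward must lie in [0, 1]"
    return 1 if rng() < r_t else 0
```

Averaging many binarized samples of the same reward recovers `r_t` in expectation, which is why the regret analysis carries over.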
- My framework can also handle rewards which are not bounded in `[0, 1]`, but it cannot handle unbounded rewards (e.g., non-truncated Gaussian or Poisson) yet.
- I implemented three variants of the `KL-UCB` policy: `KL-UCB-Plus` from [Cappé et al., 2013], `KL-UCB-H-Plus` from [Lai, 1987], and `KL-UCB++` from [Ménard & Garivier, 2017].
- I implemented all the algorithms from this document, and from this repository (`Exp3`).
- I don't see any other simple bandit algorithm I could implement; I wrote all the ones described in this reference survey [Bubeck & Cesa-Bianchi, 2012].
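For reference, the `Exp3` algorithm mentioned above can be sketched in a few lines (this is a generic textbook version, as surveyed in [Bubeck & Cesa-Bianchi, 2012], not the project's implementation); `gamma` is the exploration rate, and weights are renormalized to avoid float overflow:

```python
import math
import random

class Exp3:
    """Minimal sketch of Exp3 for adversarial bandits with rewards in [0, 1]."""

    def __init__(self, nb_arms, gamma=0.1):
        self.gamma = gamma
        self.weights = [1.0] * nb_arms

    def probabilities(self):
        # Mixture of the exponential weights and uniform exploration
        total = sum(self.weights)
        k = len(self.weights)
        return [(1 - self.gamma) * (w / total) + self.gamma / k
                for w in self.weights]

    def choice(self):
        probs = self.probabilities()
        return random.choices(range(len(probs)), weights=probs)[0]

    def get_reward(self, arm, reward):
        # Importance-weighted reward estimate, then multiplicative update
        probs = self.probabilities()
        estimate = reward / probs[arm]
        self.weights[arm] *= math.exp(self.gamma * estimate / len(self.weights))
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]  # keep weights bounded
```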
- keep it up-to-date on GitHub
- I could document this project better. Now there is a Sphinx documentation (`make -B apidoc doc`, then open `_build/html/index.html`). Each file has a nice docstring, some useful comments for the interesting parts, and the GitHub project contains insights on how to use the framework, as well as on the organization of the code. I added the `API.md` file to document the arms and policies API.
- use hdf5 (with `h5py`) to store the data on the run (to never lose data, even if the simulation gets killed).
- even more "secure": be able to interrupt the simulation, save its state, and then load it back if needed (for instance if you want to leave the office for the weekend).
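The "save state, load it back" idea can be sketched with the stdlib `pickle` module (all names here are illustrative; the `h5py` variant would follow the same checkpoint-every-few-steps pattern, storing arrays in an HDF5 file and calling `flush()` instead):

```python
# Sketch of interruptible simulation via periodic checkpoints.
import os
import pickle
import random

CHECKPOINT = "simulation_state.pkl"  # hypothetical checkpoint file

def save_state(state, path=CHECKPOINT):
    with open(path, "wb") as f:
        pickle.dump(state, f)

def load_state(path=CHECKPOINT):
    with open(path, "rb") as f:
        return pickle.load(f)

state = {"t": 0, "rewards": []}
for step in range(10):
    state["rewards"].append(random.random())  # stand-in for one simulation step
    state["t"] += 1
    if state["t"] % 5 == 0:
        save_state(state)  # never lose more than 5 steps of work

resumed = load_state()  # e.g., after the process was killed and restarted
os.remove(CHECKPOINT)
```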
- implement a multi-player simulation environment as well! Done, in `EvaluatorMultiPlayers`.
- implement different collision models (4 different models so far), and try each policy on each of them, with different settings (K < M, M = K, M < K, static or dynamic, Bernoulli or non-Bernoulli arms, etc.).
- implement the basic multi-player policies: the fully decentralized `Selfish` policy (where every player runs her own policy, without even knowing that there are other players, but receiving a `0` reward in case of collision), two stupid centralized policies `CentralizedFixed` and `CentralizedCycling`, and two oracle policies `OracleNotFair` and `OracleFair`.
- implement a centralized non-oracle policy, which is just a multiple-play single-player policy, in `CentralizedMultiplePlay`. The single-player policy uses the `choiceMultiple(nb)` method to choose directly `M` arms for the `M` players. It works very well: no collision, and very fast identification of the `M` best arms!
- plot several "multi-player" policies on the same graphs (e.g., the cumulative centralized regret of `M` players following `Selfish[UCB]` against the regret of `M` players following `Selfish[klUCB]`, or `ALOHA` vs `rhoRand` vs `MusicalChair`).
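The `choiceMultiple(nb)` idea behind `CentralizedMultiplePlay` can be sketched with a UCB-style index policy that returns the `nb` arms of highest index, so `M` centralized players get `M` distinct arms and never collide (class and method bodies here are an illustrative sketch, not the project's actual code):

```python
import math
import random

class UCBMultiplePlay:
    """Sketch of a multiple-play UCB index policy."""

    def __init__(self, nb_arms):
        self.pulls = [0] * nb_arms
        self.sums = [0.0] * nb_arms
        self.t = 0

    def index(self, arm):
        if self.pulls[arm] == 0:
            return float("inf")  # force one initial pull of each arm
        mean = self.sums[arm] / self.pulls[arm]
        return mean + math.sqrt(2 * math.log(self.t) / self.pulls[arm])

    def choiceMultiple(self, nb):
        # The nb distinct arms with the highest UCB indexes: no collision
        indexes = [self.index(a) for a in range(len(self.pulls))]
        return sorted(range(len(indexes)), key=lambda a: indexes[a],
                      reverse=True)[:nb]

    def getReward(self, arm, reward):
        self.t += 1
        self.pulls[arm] += 1
        self.sums[arm] += reward
```

Because the `nb` chosen arms are distinct by construction, the centralized players incur zero collisions, which is exactly why this baseline identifies the `M` best arms so quickly.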
- I implemented the "Musical Chair" policy, from [Shamir et al., 2015], in `MusicalChair`. FIXME the "Dynamic Musical Chair" that regularly reinitializes "Musical Chair"...
- I implemented "rho_rand" from [Anandkumar et al., 2009], in `rhoRand`. It consists of the rho_rand collision avoidance protocol, combined with any single-player policy. FIXME the `rhoEst` policy that estimates the number of users from collisions.
- I implemented the "MEGA" multi-player policy from [Avner & Mannor, 2014], in `MEGA`. It consists of the ALOHA collision avoidance protocol and an Epsilon-greedy arm selection algorithm.
- I also generalized it by implementing the ALOHA collision avoidance protocol for any single-player policy, in `ALOHA`. FIXME it has too many parameters; how to choose them?
- TODO "TDFS" from [Liu & Zhao, 2009].
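The "chair" phase of Musical Chair from [Shamir et al., 2015] can be sketched as follows (a simplified illustration, assuming an earlier exploration phase already produced the estimated `M` best arms; names are illustrative, not the project's `MusicalChair` class):

```python
import random

class MusicalChairPlayer:
    """One player in the (simplified) Musical Chair settling phase."""

    def __init__(self, best_arms):
        self.best_arms = list(best_arms)  # estimated M best arms
        self.chair = None                 # arm the player has settled on

    def choice(self):
        if self.chair is not None:
            return self.chair             # already seated: stay forever
        return random.choice(self.best_arms)

    def handle_collision(self, arm, collided):
        # Called after each round, with the played arm and a collision flag
        if self.chair is None and not collided:
            self.chair = arm              # found a free chair: keep it
```

Since a player only settles on a collision-free arm, the settled chairs are always distinct, and all `M` players settle on the `M` best arms after a short random transient.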
- add the possibility to have a varying number of dynamic users in multi-user simulations, in AlgoBandits.git
- implement the experiments from the [Musical Chair] and [rhoRand] articles, and Navik Modi's experiments
- publish on GitHub my Python/Cython implementation of Lempel-Ziv complexity: done, it is here: https://github.com/Naereen/Lempel-Ziv_Complexity