For other things to do, and issues to solve, see the issue tracker on GitHub.
- Add support to save my figures in SVG, EPS and PDF formats.
- Finish writing a perfectly clean CLI client for my Python server.
- Write a small library that can be included in any other C++ program, to: 1. start the socket connection to the server, 2. then play one step at a time.
- Check that the library can be used within a GNU Radio block!
- remove the `delta_t_save` "feature"
- remove the `delta_t_plot` "feature"
- Clean up initial code, keep it clean and commented, OK.
- Lint the code and make it (almost) "perfect" regarding Python style recommendations, OK.
- Port it to Python 3.4 (while still being valid Python 2.7), OK. It is valid Python, both v2 (2.7) and v3 (3.4, 3.5, 3.6, 3.7).
- Add more arms: Gaussian, Exponential, Poisson, Uniform, and more.
- Add my aggregated bandit algorithm, and state of the art aggregation algorithms. Cf. my IEEE WCNC 2018 article.
- In fact, exhaustive grid search cannot be easily used as it cannot run on-line! Sadly, OK.
- add plots that show the percentage of optimal arm plays (e.g., as done in this paper)
- fully profile my code, with `cProfile` for functions and `line_profiler` for line-by-line profiling. No surprise here: `Beta.py` is the slow part, as it takes time to sample and compute the quantiles (even when using the good `numpy.random`/`scipy.stats` functions). See for instance this log file (with `cProfile`) or this one (with `line_profiler`).
- I could have tried to improve the efficiency bottlenecks with smart `numpy`/`scipy` code (I did not find anything useful), or `numba.jit`? (does not seem to be possible), or `cython` code (not so easy, not so interesting)... Maybe using `numexpr`: nope! Maybe using `glumpy` for faster visualizations: nope! Using `pypy` is impossible, as it does not yet support all of `numpy`, and none of `matplotlib`.
- Explore the behavior of my Aggregator algorithm, and understand it better (and improve it? It would only be by parameter tweaking, not interesting, so NOPE).
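The `cProfile` part of the profiling workflow above can be sketched as follows; `sample_beta` is only a toy stand-in (not the project's `Beta.py`) for the Beta-posterior sampling that dominates runtime:

```python
# Minimal sketch of profiling with the stdlib cProfile/pstats modules
# (line_profiler would be used similarly, but line by line).
import cProfile
import io
import pstats
import random

def sample_beta(n):
    # Toy stand-in for the slow Beta-posterior sampling in Beta.py
    return [random.betavariate(2, 3) for _ in range(n)]

profiler = cProfile.Profile()
profiler.enable()
samples = sample_beta(100000)
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)  # the top entries point at the sampling hot spot
```

Sorting by cumulative time is what makes the bottleneck (here, the Beta sampling) show up at the top of the report.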
- Rewards not in `{0, 1}` are handled with a trick, a "random binarization", cf. [Agrawal & Goyal, 2012] (Algorithm 2): when a reward `r_t \in [0, 1]` is observed, the player receives the result of a Bernoulli sample of mean `r_t`: `r_t <- sample from Bernoulli(r_t)`, so it is indeed in `{0, 1}`. This works fine for `Exponential` arms, for instance.
- Test again (and adapt, if needed) each single-player policy against non-Bernoulli arms (Gaussian, Exponential, Poisson).
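The random binarization trick can be sketched in a few lines (the function name `binarize` is illustrative, not the project's API):

```python
import random

def binarize(r_t, rng=random.random):
    """Random binarization trick of [Agrawal & Goyal, 2012], Algorithm 2:
    replace a reward r_t in [0, 1] by a Bernoulli sample of mean r_t,
    so that Bernoulli-only policies can still be applied.
    The estimate is unbiased: E[binarize(r_t)] = r_t."""
    assert 0.0 <= r_t <= 1.0, "reward must lie in [0, 1]"
    return 1 if rng() < r_t else 0
```

Averaging many binarized samples of the same reward recovers `r_t` in expectation, which is why the regret analysis carries over.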
- My framework can also handle rewards which are not bounded in `[0, 1]`, but it cannot handle unbounded rewards (e.g., non-truncated Gaussian or Poisson) yet.
- I implemented three variants of the `KL-UCB` policy: `KL-UCB-Plus` from [Cappé et al., 2013], `KL-UCB-H-Plus` from [Lai, 1987], and `KL-UCB++` from [Ménard & Garivier, 2017].
- I implemented all the algorithms from this document, and from this repository (`Exp3`).
- I don't see any other simple bandit algorithm I could implement; I wrote all the ones described in this reference survey [Bubeck & Cesa-Bianchi, 2012].
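For reference, the `Exp3` algorithm mentioned above can be sketched in a few lines (this is a generic textbook version, as surveyed in [Bubeck & Cesa-Bianchi, 2012], not the project's implementation); `gamma` is the exploration rate, and weights are renormalized to avoid float overflow:

```python
import math
import random

class Exp3:
    """Minimal sketch of Exp3 for adversarial bandits with rewards in [0, 1]."""

    def __init__(self, nb_arms, gamma=0.1):
        self.gamma = gamma
        self.weights = [1.0] * nb_arms

    def probabilities(self):
        # Mixture of the exponential weights and uniform exploration
        total = sum(self.weights)
        k = len(self.weights)
        return [(1 - self.gamma) * (w / total) + self.gamma / k
                for w in self.weights]

    def choice(self):
        probs = self.probabilities()
        return random.choices(range(len(probs)), weights=probs)[0]

    def get_reward(self, arm, reward):
        # Importance-weighted reward estimate, then multiplicative update
        probs = self.probabilities()
        estimate = reward / probs[arm]
        self.weights[arm] *= math.exp(self.gamma * estimate / len(self.weights))
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]  # keep weights bounded
```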
- keep it up-to-date on GitHub
- I could document this project better. Now there is a Sphinx documentation (`make -B apidoc doc`, then open `_build/html/index.html`). Each file has a nice docstring, some useful comments for the interesting parts, and the GitHub project contains insights on how to use the framework, as well as on the organization of the code. I added the `API.md` file to document the arms and policies API.
- use hdf5 (with `h5py`) to store the data on the run (to never lose data, even if the simulation gets killed).
- even more "secure": be able to interrupt the simulation, save its state, and then load it back if needed (for instance if you want to leave the office for the weekend).
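The "save state, load it back" idea can be sketched with the stdlib `pickle` module (all names here are illustrative; the `h5py` variant would follow the same checkpoint-every-few-steps pattern, storing arrays in an HDF5 file and calling `flush()` instead):

```python
# Sketch of interruptible simulation via periodic checkpoints.
import os
import pickle
import random

CHECKPOINT = "simulation_state.pkl"  # hypothetical checkpoint file

def save_state(state, path=CHECKPOINT):
    with open(path, "wb") as f:
        pickle.dump(state, f)

def load_state(path=CHECKPOINT):
    with open(path, "rb") as f:
        return pickle.load(f)

state = {"t": 0, "rewards": []}
for step in range(10):
    state["rewards"].append(random.random())  # stand-in for one simulation step
    state["t"] += 1
    if state["t"] % 5 == 0:
        save_state(state)  # never lose more than 5 steps of work

resumed = load_state()  # e.g., after the process was killed and restarted
os.remove(CHECKPOINT)
```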
- implement a multi-player simulation environment as well! Done, in `EvaluatorMultiPlayers`.
- implement different collision models (4 different models so far), and try each policy on each of them, with different settings (K < M, M = K, M < K, static or dynamic, Bernoulli or non-Bernoulli arms, etc.).
- implement the basic multi-player policies: the fully decentralized `Selfish` policy (where every player runs her own policy, without even knowing that there are other players, but receiving a `0` reward in case of collision), two stupid centralized policies `CentralizedFixed` and `CentralizedCycling`, and two oracle policies `OracleNotFair` and `OracleFair`.
- implement a centralized non-oracle policy, which is just a multiple-play single-player policy, in `CentralizedMultiplePlay`. The single-player policy uses the `choiceMultiple(nb)` method to choose directly `M` arms for the `M` players. It works very well: no collision, and very fast identification of the `M` best arms!
- plot several "multi-player" policies on the same graphs (e.g., the cumulative centralized regret of `M` players following `Selfish[UCB]` against the regret of `M` players following `Selfish[klUCB]`, or `ALOHA` vs `rhoRand` vs `MusicalChair`).
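The `choiceMultiple(nb)` idea behind `CentralizedMultiplePlay` can be sketched with a UCB-style index policy that returns the `nb` arms of highest index, so `M` centralized players get `M` distinct arms and never collide (class and method bodies here are an illustrative sketch, not the project's actual code):

```python
import math
import random

class UCBMultiplePlay:
    """Sketch of a multiple-play UCB index policy."""

    def __init__(self, nb_arms):
        self.pulls = [0] * nb_arms
        self.sums = [0.0] * nb_arms
        self.t = 0

    def index(self, arm):
        if self.pulls[arm] == 0:
            return float("inf")  # force one initial pull of each arm
        mean = self.sums[arm] / self.pulls[arm]
        return mean + math.sqrt(2 * math.log(self.t) / self.pulls[arm])

    def choiceMultiple(self, nb):
        # The nb distinct arms with the highest UCB indexes: no collision
        indexes = [self.index(a) for a in range(len(self.pulls))]
        return sorted(range(len(indexes)), key=lambda a: indexes[a],
                      reverse=True)[:nb]

    def getReward(self, arm, reward):
        self.t += 1
        self.pulls[arm] += 1
        self.sums[arm] += reward
```

Because the `nb` chosen arms are distinct by construction, the centralized players incur zero collisions, which is exactly why this baseline identifies the `M` best arms so quickly.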
- I implemented the "Musical Chair" policy, from [Shamir et al., 2015], in `MusicalChair`. FIXME the "Dynamic Musical Chair" that regularly reinitializes "Musical Chair"...
- I implemented "rho_rand" from [Anandkumar et al., 2009], in `rhoRand`. It consists of the rho_rand collision avoidance protocol, combined with any single-player policy. FIXME the `rhoEst` policy that estimates the number of users from collisions.
- I implemented the "MEGA" multi-player policy from [Avner & Mannor, 2014], in `MEGA`. It consists of the ALOHA collision avoidance protocol and an Epsilon-greedy arm selection algorithm.
- I also generalized it by implementing the ALOHA collision avoidance protocol for any single-player policy, in `ALOHA`. FIXME it has too many parameters; how to choose them?
- TODO "TDFS" from [Liu & Zhao, 2009].
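The "chair" phase of Musical Chair from [Shamir et al., 2015] can be sketched as follows (a simplified illustration, assuming an earlier exploration phase already produced the estimated `M` best arms; names are illustrative, not the project's `MusicalChair` class):

```python
import random

class MusicalChairPlayer:
    """One player in the (simplified) Musical Chair settling phase."""

    def __init__(self, best_arms):
        self.best_arms = list(best_arms)  # estimated M best arms
        self.chair = None                 # arm the player has settled on

    def choice(self):
        if self.chair is not None:
            return self.chair             # already seated: stay forever
        return random.choice(self.best_arms)

    def handle_collision(self, arm, collided):
        # Called after each round, with the played arm and a collision flag
        if self.chair is None and not collided:
            self.chair = arm              # found a free chair: keep it
```

Since a player only settles on a collision-free arm, the settled chairs are always distinct, and all `M` players settle on the `M` best arms after a short random transient.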
- add the possibility to have a varying number of dynamic users in multi-user simulations, in AlgoBandits.git
- implement the experiments from the [Musical Chair] and [rhoRand] articles, and Navik Modi's experiments
- publish on GitHub my Python/Cython implementation of Lempel-Ziv complexity: done, it is here: https://github.com/Naereen/Lempel-Ziv_Complexity