
💥 TODO

For other things to do, and issues to solve, see the issue tracker on GitHub.

  • Add support for saving figures in SVG, EPS, and PDF formats (see the sketch below).
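A minimal sketch of what this could look like with matplotlib (the save_all_formats helper is hypothetical, not an existing function of this project):

```python
import matplotlib.pyplot as plt
import numpy as np

def save_all_formats(fig, basename, formats=("svg", "eps", "pdf")):
    """Save one figure under several (vector) formats at once."""
    for ext in formats:
        # matplotlib infers the output format from the file extension
        fig.savefig("{}.{}".format(basename, ext), bbox_inches="tight")

# Example: save a small cumulative-reward-like curve
fig, ax = plt.subplots()
ax.plot(np.cumsum(np.random.rand(100)))
ax.set_xlabel("Time steps")
ax.set_ylabel("Cumulative reward")
save_all_formats(fig, "example_figure")
```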

C++ library

  • Finish writing a perfectly clean CLI client for my Python server.
  • Write a small library that can be included in any other C++ program, to: 1. start the socket connection to the server, 2. then play one step at a time (see the sketch below).
  • Check that the library can be used within a GNU Radio block!
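The wire protocol between the C++ client and the Python server is not fixed yet; as a purely hypothetical illustration (the JSON-lines format, port, and field names are all assumptions), here is what one step of the client loop could look like, sketched in Python for brevity:

```python
import json
import socket

HOST, PORT = "127.0.0.1", 8000  # assumed address of the Python server

# 1. start the socket connection to the server...
with socket.create_connection((HOST, PORT)) as sock:
    stream = sock.makefile("rw")
    # 2. ... then play one step at a time
    for t in range(10):
        arm = t % 3  # a real client would ask its policy which arm to play
        stream.write(json.dumps({"arm": arm}) + "\n")
        stream.flush()
        reply = json.loads(stream.readline())
        print("t =", t, "reward =", reply["reward"])
```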

Clean up things (recently) - FIXME

  • remove the delta_t_save "feature"
  • remove the delta_t_plot "feature"

Initial things to do! - OK

  • Clean up initial code, keep it clean and commented, OK.
  • Lint the code and make it (almost) "perfect" regarding Python style recommendations, OK.
  • Port it to Python 3.4 (while keeping it valid Python 2.7), OK. It is valid Python, both v2 (2.7) and v3 (3.4, 3.5, 3.6, 3.7).
  • Add more arms: Gaussian, Exponential, Poisson, Uniform, and more (see the sketch after this list).
  • Add my aggregated bandit algorithm, and state of the art aggregation algorithms. Cf. my IEEE WCNC 2018 article.
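To illustrate, here is a minimal sketch of a truncated Gaussian arm (the draw() method mimics the spirit of the arm classes, but the exact interface here is an assumption):

```python
import numpy as np

class GaussianArm(object):
    """Truncated Gaussian arm: rewards from N(mu, sigma^2), clipped to [0, 1]."""

    def __init__(self, mu, sigma=0.05):
        self.mu = mu
        self.sigma = sigma

    def draw(self):
        """Sample one reward, truncated to stay in [0, 1]."""
        return min(max(np.random.normal(self.mu, self.sigma), 0.0), 1.0)

arm = GaussianArm(mu=0.7)
samples = [arm.draw() for _ in range(5)]
```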

Improve the code? - OK

  • In fact, exhaustive grid search cannot easily be used, as it cannot run online! Sadly, OK.
  • add plots that show the percentage of optimal arm plays (e.g., as done in this paper)
  • fully profile my code, with cProfile for functions and line_profiler for line-by-line. No surprise here: Beta.py is the slow part, as it takes time to sample and to compute the quantiles (even when using the right numpy.random/scipy.stats functions). See for instance this log file (with cProfile) or this one (with line_profiler).
  • I could have tried to remove the efficiency bottlenecks, with smarter numpy/scipy code (I did not find anything useful), numba.jit (does not seem to be possible), or Cython code (not so easy, not so interesting)... Maybe numexpr: nope! Maybe glumpy for faster visualizations: nope! Using PyPy is impossible, as it does not yet support all of numpy, and none of matplotlib.
  • Explore the behavior of my Aggregator algorithm, and understand it better (improve it? That would only be parameter tweaking, not interesting, so NOPE).
  • Rewards not in {0, 1} are handled with a trick, a "random binarization", cf. [Agrawal & Goyal, 2012] (Algorithm 2): when a reward r_t in [0, 1] is observed, the player receives a Bernoulli sample of mean r_t (r_t <- Bernoulli(r_t)), so it is indeed in {0, 1}. Works fine for Exponential arms, for instance (see the sketch after this list).
  • Test again (and adapt, if needed) each single-player policy against non-Bernoulli arms (Gaussian, Exponential, Poisson).
  • My framework can also handle rewards which are not bounded in [0, 1], but it cannot yet handle unbounded rewards (e.g., non-truncated Gaussian or Poisson).
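A minimal sketch of that binarization trick (the binarize name is mine, not from the code base):

```python
import numpy as np

def binarize(reward):
    """[Agrawal & Goyal, 2012], Algorithm 2: replace a reward in [0, 1]
    by a Bernoulli sample of the same mean, so the policy only sees {0, 1}."""
    return float(np.random.random() < reward)

# The mean is preserved: E[binarize(r)] = r for every r in [0, 1]
rewards = np.random.exponential(scale=0.3, size=10000).clip(0, 1)
print(rewards.mean(), np.mean([binarize(r) for r in rewards]))  # close values
```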

More single-player MAB algorithms? - OK


Publicly release it and document it - OK

Better storage of the simulation results

  • use HDF5 (with h5py) to store the data on the fly (to never lose data, even if the simulation gets killed); see the sketch below.
  • even more "secure": be able to interrupt the simulation, save its state, and then load it back if needed (for instance if you want to leave the office for the weekend).
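A minimal sketch of this on-the-fly saving with h5py (the file name, layout, and flushing period are assumptions, not the project's actual format):

```python
import h5py
import numpy as np

horizon, nb_arms = 1000, 3

with h5py.File("simulation_results.hdf5", "w") as f:
    rewards = f.create_dataset("rewards", shape=(horizon,), dtype="f8")
    choices = f.create_dataset("choices", shape=(horizon,), dtype="i8")
    for t in range(horizon):
        arm = np.random.randint(nb_arms)  # stand-in for the policy's choice
        r = np.random.random()            # stand-in for the arm's reward
        choices[t], rewards[t] = arm, r   # written as the simulation runs
        if t % 100 == 0:
            f.flush()  # data survives on disk even if the process is killed
```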

Multi-player simulations!

  • implement a multi-player simulation environment as well! Done, in EvaluatorMultiPlayers.
  • implement different collision models (4 different models so far), and try each policy on every model, with different settings (K < M, M = K, M < K, static or dynamic, Bernoulli or non-Bernoulli arms, etc.).
  • implement the basic multi-player policies: the fully decentralized Selfish policy (where every player runs her own policy, without even knowing that there are other players, but receiving 0 reward in case of collision), two stupid centralized policies CentralizedFixed and CentralizedCycling, and two oracle policies OracleNotFair and OracleFair.
  • implement a centralized non-oracle policy, which is just a multiple-play single-player policy, in CentralizedMultiplePlay. The single-player policy uses the choiceMultiple(nb) method to choose M arms directly for the M players (see the sketch after this list). It works very well: no collisions, and very fast identification of the M best arms!
  • plot several "multi-player" policies on the same graphs (e.g., the cumulative centralized regret of M players following Selfish[UCB] against the regret of M players following Selfish[klUCB], or ALOHA vs rhoRand vs MusicalChair).
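A rough sketch of the CentralizedMultiplePlay idea (choiceMultiple(nb) is the method mentioned above; the wrapper class and the getReward name are simplified assumptions):

```python
class CentralizedMultiplePlay(object):
    """Wrap one single-player policy to control the M players at once:
    one call returns M distinct arms per step, so no collision can occur."""

    def __init__(self, policy, nb_players):
        self.policy = policy          # any single-player policy
        self.nb_players = nb_players  # M

    def choices(self):
        # The single-player policy directly chooses M distinct arms
        return self.policy.choiceMultiple(self.nb_players)

    def get_rewards(self, arms, rewards):
        # Feed every (arm, reward) observation back to the one policy
        for arm, reward in zip(arms, rewards):
            self.policy.getReward(arm, reward)
```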

State-of-the-art algorithms

Dynamic settings

  • add the possibility of having a varying number of dynamic users in the multi-user simulations in AlgoBandits.git
  • implement the experiments from the [Musical Chair] and [rhoRand] articles, and Navik Modi's experiments

Other aspects