The purpose of this repository is to bridge the gap between study-level implementations and practical package implementations in reinforcement learning. In particular, this repository aims to modularize each RL component minimally while allowing users to run multiple experiments across various configurations. All code is heavily based on the following references:
Agent | Action type | Reference |
---|---|---|
REINFORCE | Discrete/Continuous | Chapter 13.3, [1] |
REINFORCE with baseline | Discrete/Continuous | Chapter 13.4, [1] |
ActorCritic | Discrete/Continuous | Chapter 13.5, [1] |
DDPG | Continuous | Lillicrap et al., 2016 [2] |
TD3 | Continuous | Fujimoto et al., 2018 [3] |
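To make the table concrete, the sketch below shows the core REINFORCE update (Chapter 13.3 of [1]) on a toy two-armed bandit with a softmax policy. This is an illustrative, self-contained example, not the repository's actual API; the parameter names (`theta`, `alpha`) and the bandit setup are assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)          # policy parameters: one logit per arm
true_rewards = [0.2, 0.8]    # arm 1 pays off more often (toy assumption)
alpha = 0.1                  # step size

def softmax(x):
    z = np.exp(x - x.max())  # subtract max for numerical stability
    return z / z.sum()

for _ in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)
    # Bernoulli reward; with a one-step episode the return G is the reward
    G = float(rng.random() < true_rewards[a])
    # grad of log pi(a|theta) for a softmax policy: one-hot(a) - probs
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    # REINFORCE update: theta <- theta + alpha * G * grad log pi(a|theta)
    theta += alpha * G * grad_log_pi

print(softmax(theta))  # probability mass should concentrate on arm 1
```

REINFORCE with baseline and actor-critic differ only in replacing the raw return `G` with `G - b(s)` or a bootstrapped TD error, which reduces the variance of this same score-function gradient.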
[1] Sutton, R. S., & Barto, A. G. (2018). *Reinforcement Learning: An Introduction*. The MIT Press.
[2] Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., & Wierstra, D. (2016). Continuous control with deep reinforcement learning. In *ICLR*. http://arxiv.org/abs/1509.02971
[3] Fujimoto, S., van Hoof, H., & Meger, D. (2018). Addressing function approximation error in actor-critic methods. In *Proceedings of the 35th International Conference on Machine Learning*, PMLR 80:1587–1596. https://proceedings.mlr.press/v80/fujimoto18a.html