We provide results for the following reinforcement learning algorithms.
All results are available on Google Drive and Baidu Drive (Baidu Drive extraction code: 0fph).

sac-rlkit:
- Code: rlkit with default hyperparameter settings.
- Algorithm: soft actor-critic (SAC). Haarnoja, Tuomas, et al. "Soft actor-critic algorithms and applications." arXiv preprint arXiv:1812.05905 (2018).

ddpg-rlkit:
- Code: rlkit with default hyperparameter settings.
- Algorithm: deep deterministic policy gradient (DDPG). Lillicrap, Timothy P., et al. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015).

td3:
- Code: the authors' code with default hyperparameter settings.
- Algorithm: twin delayed deep deterministic policy gradient algorithm (TD3). Fujimoto, Scott, Herke Van Hoof, and David Meger. "Addressing function approximation error in actor-critic methods." arXiv preprint arXiv:1802.09477 (2018).

ppo:
- Code: OpenAI Baselines with default hyperparameter settings.
- Algorithm: proximal policy optimization (PPO). Schulman, John, et al. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017).

trpo:
- Code: OpenAI Baselines with default hyperparameter settings.
- Algorithm: trust region policy optimization (TRPO). Schulman, John, et al. "Trust region policy optimization." International Conference on Machine Learning. 2015.

TODO:
- More algorithms and implementations, such as the official implementation of SAC.
- More than ten random seeds for each algorithm.
- Results for Atari tasks.
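When results are available for several random seeds per algorithm, a common way to compare them is to average the learning curves step by step and report their spread. The sketch below assumes each seed's evaluation returns have already been loaded into a Python list of floats; the layout of the shared result files is not described here, so both the input format and the helper name `aggregate_over_seeds` are hypothetical.

```python
import statistics
from typing import List, Tuple


def aggregate_over_seeds(curves: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Per-step mean and population standard deviation across seeds.

    `curves` holds one list of evaluation returns per random seed
    (hypothetical input format). Curves are truncated to the shortest
    one so that every reported step includes all seeds.
    """
    if not curves:
        raise ValueError("need at least one seed")
    length = min(len(c) for c in curves)
    means: List[float] = []
    stds: List[float] = []
    for t in range(length):
        values = [c[t] for c in curves]  # returns of all seeds at step t
        means.append(statistics.fmean(values))
        stds.append(statistics.pstdev(values))
    return means, stds


if __name__ == "__main__":
    # Two toy seeds; the second curve is longer and gets truncated.
    seed_returns = [[0.0, 10.0, 20.0], [2.0, 12.0, 22.0, 30.0]]
    means, stds = aggregate_over_seeds(seed_returns)
    print(means)  # [1.0, 11.0, 21.0]
    print(stds)   # [1.0, 1.0, 1.0]
```

This sketch uses the population standard deviation (`statistics.pstdev`); some papers instead report the sample standard deviation or the standard error across seeds, so it is worth checking which convention a given comparison uses.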