- Original paper: https://arxiv.org/abs/1602.01783
- Baselines blog post: https://blog.openai.com/baselines-acktr-a2c/
- `python -m baselines.run --alg=a2c --env=PongNoFrameskip-v4` runs the algorithm for 40M frames = 10M timesteps on Atari Pong. See help (`-h`) for more options; also refer to the repo-wide README.md.
- `run_atari.py`: file used to run the algorithm.
- `policies.py`: contains the different versions of the A2C architecture (MlpPolicy, CNNPolicy, LstmPolicy, ...).
- `a2c.py`:
  - `Model`: class used to initialize the step_model (sampling) and train_model (training).
  - `learn`: main entrypoint for the A2C algorithm. Trains a policy with a given network architecture on a given environment using the A2C algorithm.
- `runner.py`: class used to generate a batch of experiences.
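To make the division of labor concrete: the runner collects a rollout and bootstraps n-step returns with the critic's value estimate, and the train step combines a policy-gradient term, a value-function regression term, and an entropy bonus, as described in the original paper. The sketch below is a minimal NumPy illustration of those quantities, not the baselines implementation; the function names, the dummy inputs, and the scalar `entropy` argument are assumptions for illustration.

```python
import numpy as np

def discounted_returns(rewards, dones, last_value, gamma=0.99):
    """n-step returns for one rollout, bootstrapped with the critic's
    value estimate of the state after the last step (as the runner does)."""
    returns = np.zeros_like(rewards, dtype=np.float64)
    running = last_value
    for t in reversed(range(len(rewards))):
        # a done flag cuts the bootstrap at episode boundaries
        running = rewards[t] + gamma * running * (1.0 - dones[t])
        returns[t] = running
    return returns

def a2c_loss(log_probs, values, returns, entropy, vf_coef=0.5, ent_coef=0.01):
    """The three terms the A2C train step combines (illustrative only)."""
    advantages = returns - values
    pg_loss = -np.mean(log_probs * advantages)   # policy-gradient term
    vf_loss = np.mean((returns - values) ** 2)   # value-function regression
    # entropy bonus encourages exploration, hence the minus sign
    return pg_loss + vf_coef * vf_loss - ent_coef * entropy

# dummy 3-step rollout: reward only at the last step, no episode end
rewards = np.array([0.0, 0.0, 1.0])
dones = np.array([0.0, 0.0, 0.0])
returns = discounted_returns(rewards, dones, last_value=0.5, gamma=0.99)
loss = a2c_loss(log_probs=np.array([-0.1, -0.2, -0.3]),
                values=np.array([0.5, 0.5, 0.5]),
                returns=returns, entropy=1.0)
```

In the real code, `log_probs`, `values`, and `entropy` come from the policy network and the gradient step is taken by the optimizer; this sketch only shows how the batch from the runner turns into the scalar loss.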