- Original paper: https://arxiv.org/abs/1602.01783
- Baselines blog post: https://blog.openai.com/baselines-acktr-a2c/
- `python -m baselines.run --alg=a2c --env=PongNoFrameskip-v4` runs the algorithm for 40M frames = 10M timesteps on Atari Pong. See help (`-h`) for more options; also refer to the repo-wide README.md.
- `run_atari.py`: file used to run the algorithm.
- `policies.py`: contains the different versions of the A2C architecture (MlpPolicy, CNNPolicy, LstmPolicy, ...).
- `a2c.py`:
  - `Model`: class used to initialize the step_model (sampling) and train_model (training).
  - `learn`: main entrypoint for the A2C algorithm. Trains a policy with a given network architecture on a given environment using the A2C algorithm.
- `runner.py`: class used to generate a batch of experiences.
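To make the division of labor concrete: the runner collects a rollout and bootstraps n-step returns with the critic's value estimate, and the train step combines a policy-gradient term, a value-function regression term, and an entropy bonus, as described in the original paper. The sketch below is a minimal NumPy illustration of those quantities, not the baselines implementation; the function names, the dummy inputs, and the scalar `entropy` argument are assumptions for illustration.

```python
import numpy as np

def discounted_returns(rewards, dones, last_value, gamma=0.99):
    """n-step returns for one rollout, bootstrapped with the critic's
    value estimate of the state after the last step (as the runner does)."""
    returns = np.zeros_like(rewards, dtype=np.float64)
    running = last_value
    for t in reversed(range(len(rewards))):
        # a done flag cuts the bootstrap at episode boundaries
        running = rewards[t] + gamma * running * (1.0 - dones[t])
        returns[t] = running
    return returns

def a2c_loss(log_probs, values, returns, entropy, vf_coef=0.5, ent_coef=0.01):
    """The three terms the A2C train step combines (illustrative only)."""
    advantages = returns - values
    pg_loss = -np.mean(log_probs * advantages)   # policy-gradient term
    vf_loss = np.mean((returns - values) ** 2)   # value-function regression
    # entropy bonus encourages exploration, hence the minus sign
    return pg_loss + vf_coef * vf_loss - ent_coef * entropy

# dummy 3-step rollout: reward only at the last step, no episode end
rewards = np.array([0.0, 0.0, 1.0])
dones = np.array([0.0, 0.0, 0.0])
returns = discounted_returns(rewards, dones, last_value=0.5, gamma=0.99)
loss = a2c_loss(log_probs=np.array([-0.1, -0.2, -0.3]),
                values=np.array([0.5, 0.5, 0.5]),
                returns=returns, entropy=1.0)
```

In the real code, `log_probs`, `values`, and `entropy` come from the policy network and the gradient step is taken by the optimizer; this sketch only shows how the batch from the runner turns into the scalar loss.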