Implementation of A3C (Asynchronous Advantage Actor-Critic)
./train.py --model_dir /tmp/a3c --env Breakout-v0 --t_max 5 --eval_every 300 --parallelism 8
Run ./train.py --help for a full list of options. Then monitor training progress in TensorBoard.
train.py contains the main method to start training.
estimators.py contains the TensorFlow graph definitions for the Policy and Value networks.
worker.py contains the code that runs in each worker thread.
policy_monitor.py contains code that evaluates the policy network by running an episode and saving the rewards to TensorBoard.
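
For orientation, here is a minimal sketch of the calculation an A3C worker thread typically performs after collecting a short rollout (at most t_max steps): discounted n-step returns, bootstrapped from the value network, and the resulting advantages. It is not taken from worker.py. The function name n_step_targets, its arguments, and the default discount are all assumptions made for illustration.

```python
def n_step_targets(rewards, values, bootstrap_value, discount=0.99):
    """Compute n-step returns and advantages for one rollout segment.

    Hypothetical helper, not part of this repository.

    rewards[i] and values[i] belong to step i of the segment.
    bootstrap_value is the value estimate for the state reached after the
    last step, or 0.0 if the episode terminated there.
    """
    returns = []
    running = bootstrap_value
    # Walk the segment backwards so each return reuses the one after it.
    for reward in reversed(rewards):
        running = reward + discount * running
        returns.append(running)
    returns.reverse()
    # Advantage = observed n-step return minus the critic's estimate.
    advantages = [ret - value for ret, value in zip(returns, values)]
    return returns, advantages


if __name__ == "__main__":
    rets, advs = n_step_targets(
        rewards=[1.0, 0.0, 1.0],
        values=[0.5, 0.5, 0.5],
        bootstrap_value=0.0,
        discount=0.5,
    )
    print(rets)  # [1.25, 0.5, 1.0]
    print(advs)  # [0.75, 0.0, 0.5]
```

In a setup like this, the returns would serve as regression targets for the Value network and the advantages would weight the Policy network's log-probability gradient. How this repository wires them into its losses is defined in estimators.py and worker.py.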