Actor-critic with experience replay (ACER). Uses batch off-policy updates to improve stability. Trust region updates can be enabled with
--trust-region. Currently uses the full trust region instead of the "efficient" trust region (see issue #1).
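For context, the "efficient" trust region referred to above can be sketched as a closed-form projection of the policy gradient, as described in the ACER paper. This is a minimal NumPy illustration, not this repository's code: the function name efficient_trust_region, its arguments, and the default delta are assumptions made for the example.

```python
import numpy as np


def efficient_trust_region(g, k, delta=1.0):
    """Return the adjusted gradient z = g - max(0, (k.g - delta) / ||k||^2) * k.

    g:     gradient of the policy objective w.r.t. the policy's output statistics
    k:     gradient of the KL divergence (average policy vs. current policy)
           w.r.t. the same statistics
    delta: trust region threshold (illustrative default, an assumption)
    """
    g = np.asarray(g, dtype=float)
    k = np.asarray(k, dtype=float)
    k_norm_sq = np.dot(k, k)
    if k_norm_sq == 0.0:
        # No constraint direction, so there is nothing to project against.
        return g.copy()
    # Remove only as much of g along k as is needed to satisfy k.z <= delta.
    scale = max(0.0, (np.dot(k, g) - delta) / k_norm_sq)
    return g - scale * k
```

When the linearised constraint k.g <= delta already holds, the gradient is returned unchanged; otherwise it is shifted along k until the constraint is met with equality.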
Run with python main.py <options>. To run asynchronous advantage actor-critic (A3C) (but with a Q-value head), use the
To install all dependencies with Anaconda, run
conda env create -f environment.yml, then use
source activate acer to activate the environment.