ACER

MIT License

Actor-critic with experience replay (ACER) [1] is an actor-critic method that uses batched off-policy updates to improve stability. Trust region updates can be enabled with --trust-region. The implementation currently uses the full trust region rather than the "efficient" trust region (see issue #1).
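
As a rough illustration of the trust region update, ACER projects the policy gradient so that the update cannot move the policy more than a constraint delta away from an average policy network, measured along the gradient of the KL divergence. The sketch below assumes flattened 1-D gradient tensors and an illustrative delta; the names are hypothetical and not taken from train.py:

```python
import torch

def trust_region_step(g, k, delta=1.0):
    # g: policy gradient w.r.t. the policy network output (flattened, 1-D)
    # k: gradient of KL(average policy || current policy) w.r.t. the same output
    # If the proposed step would violate the KL constraint delta,
    # scale g back along the direction of k; otherwise leave it unchanged.
    overshoot = (torch.dot(k, g) - delta) / (torch.dot(k, k) + 1e-10)
    return g - torch.clamp(overshoot, min=0) * k
```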

Run with python main.py <options>. To run asynchronous advantage actor-critic (A3C) [2], but with a Q-value head, use the --on-policy option.
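
For reference, a minimal sketch of an actor-critic network with a Q-value head is shown below. The state value is recovered as the expectation of Q under the policy, so the same network serves both the ACER and --on-policy (A3C) modes. The architecture here is illustrative only; see model.py for the actual network:

```python
import torch
from torch import nn
import torch.nn.functional as F

class ActorCritic(nn.Module):
    # Hypothetical actor-critic network with a Q-value head.
    def __init__(self, state_size, action_size, hidden_size=32):
        super().__init__()
        self.fc = nn.Linear(state_size, hidden_size)
        self.policy = nn.Linear(hidden_size, action_size)  # actor head: pi(a|s)
        self.q = nn.Linear(hidden_size, action_size)       # critic head: Q(s, a)

    def forward(self, state):
        h = F.relu(self.fc(state))
        pi = F.softmax(self.policy(h), dim=-1)   # action probabilities
        q = self.q(h)                            # Q-value for each action
        v = (pi * q).sum(dim=-1, keepdim=True)   # V(s) = E_pi[Q(s, a)]
        return pi, q, v
```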

Requirements

To install all dependencies with Anaconda, run conda env create -f environment.yml, then use source activate acer to activate the environment.

Results

[Figure: ACER results plot]

Acknowledgements

References

[1] Wang et al., Sample Efficient Actor-Critic with Experience Replay (2016)
[2] Mnih et al., Asynchronous Methods for Deep Reinforcement Learning (2016)