act65/rl_lib: utils for doing RL

The goal is to build an efficient learner which I can use for my other projects.

We use:

- the 'soft Watkins' TD update (from Human-level Atari 200x faster) to help correct for off-policy actions and to allow multi-step returns;
- an exponential moving average (EMA) target network to help stabilise training (I haven't seen this elsewhere, but haven't looked properly; it still needs to be evaluated, WIP);
- TODO: uncertainty + discount, exploration, multi-agent, reward normalisation, etc.
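
As a rough illustration of the kind of multi-step target involved, here is a minimal pure-Python sketch of a Watkins-style Q(λ) return with a soft trace cut. It is not the rl_lib or rlax implementation, and the cutting rule is an assumption: the trace is kept when the action actually taken is within a tolerance `kappa` of the greedy value, so `kappa = 0` gives plain Watkins Q(λ). The precise criterion in the paper may differ. `soft_watkins_returns` and its arguments are hypothetical names.

```python
from typing import List, Sequence


def soft_watkins_returns(
    rewards: Sequence[float],
    q_values: Sequence[Sequence[float]],
    actions: Sequence[int],
    gamma: float,
    lam: float,
    kappa: float,
) -> List[float]:
    """Q(lambda)-style returns with a tolerance-based (soft) trace cut.

    rewards[t] and actions[t] cover t = 0..T-1; q_values[t] = Q(s_t, .) covers
    t = 0..T, i.e. it has one extra entry for the bootstrap state s_T.
    """
    T = len(rewards)
    assert len(q_values) == T + 1 and len(actions) == T
    returns = [0.0] * T
    # The last step bootstraps fully from the greedy value of s_T.
    returns[T - 1] = rewards[T - 1] + gamma * max(q_values[T])
    for t in range(T - 2, -1, -1):
        q_next = q_values[t + 1]
        greedy_value = max(q_next)
        # Soft cut (assumption): continue the trace if the action taken at t+1
        # is within kappa of greedy; kappa = 0 recovers Watkins Q(lambda).
        near_greedy = greedy_value - q_next[actions[t + 1]] <= kappa
        c = lam if near_greedy else 0.0
        # Mix the one-step greedy bootstrap with the longer return.
        returns[t] = rewards[t] + gamma * ((1.0 - c) * greedy_value + c * returns[t + 1])
    return returns
```

Setting `lam = 0` reduces every target to the one-step Q-learning target. A larger `kappa` lets traces run through near-greedy exploratory actions instead of cutting them, which is what keeps longer returns usable with off-policy data.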

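A minimal sketch of the EMA target update, assuming parameters are stored as a dict of flat float lists (a stand-in for a JAX pytree). `ema_update` and `tau` are hypothetical names, not the rl_lib API. After each gradient step the target moves a fraction `tau` of the way towards the online parameters.

```python
from typing import Dict, List

# Hypothetical parameter container: layer name -> flat list of weights.
Params = Dict[str, List[float]]


def ema_update(target: Params, online: Params, tau: float) -> Params:
    """Return new target params: target <- (1 - tau) * target + tau * online."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must be in [0, 1]")
    return {
        name: [(1.0 - tau) * t + tau * o for t, o in zip(target[name], online[name])]
        for name in target
    }
```

With `tau = 1` this reduces to the usual hard copy of the online network; small values (e.g. 0.005) give a slowly moving target. With JAX pytrees the same update can be written as `jax.tree_util.tree_map(lambda t, o: (1 - tau) * t + tau * o, target, online)`.
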
There are also some replay buffers implemented using Reverb:

- a replay buffer supporting multi-step returns;
- a multi-agent replay buffer;
- a replay buffer supporting offline / prior data (from Efficient Online Reinforcement Learning with Offline Data).
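
The multi-step buffer stores transitions whose targets look ahead n steps. A minimal sketch of that computation over one episode, in plain Python rather than with Reverb; `n_step_transitions` and the tuple layout are hypothetical. Each per-step `discount` is assumed to already equal `gamma * (1 - done)`, so a terminal step zeroes everything after it.

```python
from typing import Any, List, Sequence, Tuple

# One environment step: (obs, action, reward, discount, next_obs),
# where discount = gamma * (1 - done).
Step = Tuple[Any, Any, float, float, Any]


def n_step_transitions(steps: Sequence[Step], n: int) -> List[Tuple[Any, Any, float, float, Any]]:
    """Turn one episode of 1-step transitions into n-step transitions.

    Each output is (obs_t, action_t, n_step_return, bootstrap_discount, obs_{t+k})
    with k <= n (windows shrink near the end of the episode). The learning
    target is n_step_return + bootstrap_discount * V(obs_{t+k}).
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    out = []
    for t in range(len(steps)):
        window = steps[t:t + n]
        ret, coef = 0.0, 1.0
        for _, _, reward, discount, _ in window:
            ret += coef * reward  # add the discounted reward
            coef *= discount      # accumulate discount (0 after a terminal step)
        out.append((steps[t][0], steps[t][1], ret, coef, window[-1][4]))
    return out
```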

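Efficient Online Reinforcement Learning with Offline Data proposes symmetric sampling: each training batch is drawn half from the online replay data and half from the offline dataset. rl_lib's buffer may mix the data differently; this is a minimal sketch with plain Python lists standing in for the two Reverb tables. `sample_symmetric_batch` is a hypothetical name, and uniform sampling with replacement within each source is an assumption.

```python
import random
from typing import List, Sequence, TypeVar

Item = TypeVar("Item")


def sample_symmetric_batch(
    online: Sequence[Item],
    offline: Sequence[Item],
    batch_size: int,
    rng: random.Random,
) -> List[Item]:
    """Sample half of a batch from online data and half from offline data.

    Sampling is uniform with replacement within each source; for odd batch
    sizes the extra item comes from the offline data.
    """
    if not online or not offline:
        raise ValueError("both data sources must be non-empty")
    n_online = batch_size // 2
    n_offline = batch_size - n_online
    batch = rng.choices(online, k=n_online) + rng.choices(offline, k=n_offline)
    rng.shuffle(batch)  # avoid a fixed online/offline ordering within the batch
    return batch
```

Usage: `batch = sample_symmetric_batch(online_data, offline_data, 256, random.Random(0))`. Keeping the ratio fixed means the prior data keeps contributing even once the online buffer is much larger than the offline dataset.
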
The code style is inspired by (and in places copied from) the rlax examples.
