act65/rl_lib

The goal is to build an efficient learner which I can use for my other projects.

We use:

  • the 'soft Watkins' TD update (from "Human-level Atari 200x faster") to help correct for off-policy actions and allow the use of multi-step returns.
  • an exponential moving average target network to help stabilise training (I haven't seen this elsewhere, but I haven't looked properly; it still needs to be evaluated, WIP).
  • (TODOs) uncertainty + discount / exploration / multi-agent / reward normalisation / etc.
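
As a rough illustration of the first two items, here is a minimal NumPy sketch. It is not taken from this repository: the function names `soft_watkins_returns` and `ema_update` and the parameters `lam`, `kappa`, and `step_size` are hypothetical. The trace-cutting rule (keep the trace when the taken action's value is within `kappa * |max Q|` of the greedy value) is one reading of the paper and should be checked against it. The target update is a plain interpolation between online and target parameters, which is often called Polyak averaging or a soft target update.

```python
import numpy as np


def soft_watkins_returns(rewards, discounts, q_next, a_next, lam=0.9, kappa=0.01):
    """Lambda-returns with a tolerance-based ("soft") Watkins trace cut.

    Assumed layout for one trajectory of length T with A actions:
      rewards[i]   reward received after step i, shape [T]
      discounts[i] discount applied after step i, shape [T]
      q_next[i]    Q-values at the state reached after step i, shape [T, A]
      a_next[i]    action actually taken in that next state, shape [T]
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    discounts = np.asarray(discounts, dtype=np.float64)
    q_next = np.asarray(q_next, dtype=np.float64)
    a_next = np.asarray(a_next, dtype=np.int64)

    v = q_next.max(axis=-1)                           # greedy value per step
    q_taken = q_next[np.arange(len(a_next)), a_next]  # value of the taken action
    # Keep the trace only if the taken action is "close enough" to greedy.
    keep = q_taken >= v - kappa * np.abs(v)
    lam_t = lam * keep.astype(np.float64)

    returns = np.zeros_like(rewards)
    g = v[-1]  # bootstrap from the final greedy value
    for i in reversed(range(len(rewards))):
        g = rewards[i] + discounts[i] * ((1.0 - lam_t[i]) * v[i] + lam_t[i] * g)
        returns[i] = g
    return returns


def ema_update(online, target, step_size=0.01):
    """Move each target parameter a small step towards the online parameter."""
    return {
        name: step_size * online[name] + (1.0 - step_size) * target[name]
        for name in target
    }
```

With `kappa=0` this reduces to the usual Watkins rule of cutting the trace after any non-greedy action; a larger `kappa` keeps more multi-step information at the cost of more off-policy bias. In a JAX codebase, `optax.incremental_update` performs the same interpolation as `ema_update` over parameter pytrees.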

There are also some replay buffers implemented using reverb.

The code style is inspired by (and partly copied from) the rlax examples.

About

utils for doing rl
