Below is a list, most likely incomplete, of the works I referred to while working on this library:

Learning rate and optimization

Residual Networks

Deep Q-Learning

  • (Feb 2015) Human-level control through deep reinforcement learning. Volodymyr Mnih and Koray Kavukcuoglu and David Silver and Andrei A. Rusu and Joel Veness and Marc G. Bellemare and Alex Graves and Martin Riedmiller and Andreas K. Fidjeland and Georg Ostrovski and Stig Petersen and Charles Beattie and Amir Sadik and Ioannis Antonoglou and Helen King and Dharshan Kumaran and Daan Wierstra and Shane Legg and Demis Hassabis

  • (Sep 2015) Deep Reinforcement Learning with Double Q-learning. Hado van Hasselt and Arthur Guez and David Silver

  • (Nov 2015) Prioritized Experience Replay. Tom Schaul and John Quan and Ioannis Antonoglou and David Silver

  • (Nov 2015) Dueling Network Architectures for Deep Reinforcement Learning. Ziyu Wang and Nando de Freitas and Marc Lanctot

  • (Jun 2017) Noisy Networks for Exploration. Meire Fortunato and Mohammad Gheshlaghi Azar and Bilal Piot and Jacob Menick and Ian Osband and Alex Graves and Vlad Mnih and Rémi Munos and Demis Hassabis and Olivier Pietquin and Charles Blundell and Shane Legg

  • (Jul 2017) A Distributional Perspective on Reinforcement Learning. Marc G. Bellemare and Will Dabney and Rémi Munos

  • (Oct 2017) Rainbow: Combining Improvements in Deep Reinforcement Learning. Matteo Hessel and Joseph Modayil and Hado van Hasselt and Tom Schaul and Georg Ostrovski and Will Dabney and Dan Horgan and Bilal Piot and Mohammad Azar and David Silver

Policy gradient methods

  • (May 2012) Off-Policy Actor-Critic. Thomas Degris and Martha White and Richard S. Sutton

  • (Jun 2014) Deterministic Policy Gradient Algorithms. David Silver and Guy Lever and Nicolas Heess and Thomas Degris and Daan Wierstra and Martin Riedmiller

  • (Feb 2015) Trust Region Policy Optimization. John Schulman and Sergey Levine and Philipp Moritz and Michael I. Jordan and Pieter Abbeel

  • (Jun 2015) High-Dimensional Continuous Control Using Generalized Advantage Estimation. John Schulman and Philipp Moritz and Sergey Levine and Michael I. Jordan and Pieter Abbeel

  • (Sep 2015) Continuous control with deep reinforcement learning. Timothy P. Lillicrap and Jonathan J. Hunt and Alexander Pritzel and Nicolas Heess and Tom Erez and Yuval Tassa and David Silver and Daan Wierstra

  • (Feb 2016) Asynchronous Methods for Deep Reinforcement Learning. Volodymyr Mnih and Adria Puigdomenech Badia and Mehdi Mirza and Alex Graves and Timothy P. Lillicrap and Tim Harley and David Silver and Koray Kavukcuoglu

  • (Jun 2016) Safe and Efficient Off-Policy Reinforcement Learning. Rémi Munos and Tom Stepleton and Anna Harutyunyan and Marc G. Bellemare

  • (Nov 2016) Sample Efficient Actor-Critic with Experience Replay. Ziyu Wang and Victor Bapst and Nicolas Heess and Volodymyr Mnih and Rémi Munos and Koray Kavukcuoglu and Nando de Freitas

  • (Jul 2017) Proximal Policy Optimization Algorithms. John Schulman and Filip Wolski and Prafulla Dhariwal and Alec Radford and Oleg Klimov

Various blog posts

Open source repositories

Parts of this repository's functionality are derived from open source code in the following repositories: