
Bibliography

Below is an (almost certainly incomplete) list of the works I referred to while working on this library:

Learning rate and optimization

Residual Networks

Deep Q-Learning

  • (Feb 2015) Human-level control through deep reinforcement learning. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis. https://www.nature.com/articles/nature14236

  • (Sep 2015) Deep Reinforcement Learning with Double Q-learning. Hado van Hasselt, Arthur Guez, David Silver. http://arxiv.org/abs/1509.06461

  • (Nov 2015) Prioritized Experience Replay. Tom Schaul, John Quan, Ioannis Antonoglou, David Silver. http://arxiv.org/abs/1511.05952

  • (Nov 2015) Dueling Network Architectures for Deep Reinforcement Learning. Ziyu Wang, Nando de Freitas, Marc Lanctot. http://arxiv.org/abs/1511.06581

  • (Jun 2017) Noisy Networks for Exploration. Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Rémi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg. https://arxiv.org/abs/1706.10295

  • (Jul 2017) A Distributional Perspective on Reinforcement Learning. Marc G. Bellemare, Will Dabney, Rémi Munos. https://arxiv.org/abs/1707.06887

  • (Oct 2017) Rainbow: Combining Improvements in Deep Reinforcement Learning. Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver. https://arxiv.org/abs/1710.02298
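
The papers above all build on the Q-learning bootstrap target. As a minimal illustration of the double Q-learning idea (van Hasselt et al., 2015), here is a plain NumPy sketch; `q_online` and `q_target` are hypothetical callables mapping a state to an array of per-action Q-values, not this library's API:

```python
import numpy as np

def double_dqn_target(q_online, q_target, reward, next_state, done, gamma=0.99):
    # Double Q-learning: the online network SELECTS the greedy action,
    # the target network EVALUATES it, which reduces overestimation bias
    # relative to max_a q_target(next_state)[a].
    best_action = int(np.argmax(q_online(next_state)))
    bootstrap = q_target(next_state)[best_action]
    # No bootstrapping past a terminal transition.
    return reward + gamma * (1.0 - float(done)) * bootstrap
```

The single change from vanilla DQN is which network picks `best_action`; everything else in the target is unchanged.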

Policy gradient methods

  • (May 2012) Off-Policy Actor-Critic. Thomas Degris, Martha White, Richard S. Sutton. http://arxiv.org/abs/1205.4839

  • (Jun 2014) Deterministic Policy Gradient Algorithms. David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, Martin Riedmiller. http://dl.acm.org/citation.cfm?id=3044805.3044850

  • (Feb 2015) Trust Region Policy Optimization. John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel. https://arxiv.org/abs/1502.05477

  • (Jun 2015) High-Dimensional Continuous Control Using Generalized Advantage Estimation. John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan, Pieter Abbeel. http://arxiv.org/abs/1506.02438

  • (Sep 2015) Continuous control with deep reinforcement learning. Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra. http://arxiv.org/abs/1509.02971

  • (Feb 2016) Asynchronous Methods for Deep Reinforcement Learning. Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu. https://arxiv.org/abs/1602.01783

  • (Jun 2016) Safe and Efficient Off-Policy Reinforcement Learning. Remi Munos, Tom Stepleton, Anna Harutyunyan, Marc G. Bellemare. http://arxiv.org/abs/1606.02647

  • (Nov 2016) Sample Efficient Actor-Critic with Experience Replay. Ziyu Wang, Victor Bapst, Nicolas Heess, Volodymyr Mnih, Remi Munos, Koray Kavukcuoglu, Nando de Freitas. http://arxiv.org/abs/1611.01224

  • (Jul 2017) Proximal Policy Optimization Algorithms. John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov. https://arxiv.org/abs/1707.06347
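
Several of the papers above (GAE, PPO) rely on the generalized advantage estimator. A minimal NumPy sketch of the backward recursion from Schulman et al. (2015) — assuming a single trajectory with no intermediate terminal states, and not taken from this library's code:

```python
import numpy as np

def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    # Generalized Advantage Estimation, computed backwards over a trajectory:
    #   delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    #   A_t     = delta_t + gamma * lam * A_{t+1}
    values = np.append(values, last_value)  # V(s_T) bootstrap for the last step
    advantages = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

Setting `lam=1.0` recovers plain Monte-Carlo advantages, while `lam=0.0` gives the one-step TD error; the papers above discuss this bias-variance trade-off in detail.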

Various blogposts

Open source repositories

This repository contains functionality derived from open source code in the following repositories:
