Deep Reinforcement Learning

A constantly evolving list of Reinforcement Learning papers, notes, books etc.

Glossary:

🚀 - state-of-the-art method in its domain at the time of publication.
⭐ - valuable paper.

Domain tags:

- Model-based RL (Model-based).
- Multi-Agent RL (MARL).
- Self-Play.
- Evolutionary & Genetic Algorithms (Evolution).
- Generalization across environments (Generalization).
- Neural Networks & Optimizers (NN).
- Manipulation tasks (Manipulator).
- Locomotion: MuJoCo, Roboschool, etc. (Locomotion).
- Mazes and Labyrinths (Maze).
- Strategy Planning Problems (Planning).
- Transfer Learning (Transfer).
- Inverse Reinforcement Learning (IRL).
- Meta-Learning.
- Sparse Reward Problems and/or Montezuma's Revenge (Sparse).
- Atari game (Atari).
- Table games (Table).
- Doom game (Doom).
- Starcraft game (Starcraft).
- Go game (Go).

Deep Reinforcement Learning

Year 2018

🚀 RUDDER: Return Decomposition for Delayed Rewards

[arXiv] [code] Arjona-Medina et al.; Johannes Kepler University Linz
Sparse, Atari

Relational Deep Reinforcement Learning

[arXiv] Zambaldi et al.; DeepMind
Planning, Starcraft

Deep Curiosity Search: Intra-Life Exploration Improves Performance on Challenging Deep Reinforcement Learning Problems

[arXiv] Stanton and Clune; University of Wyoming
Sparse

AutoAugment: Learning Augmentation Policies from Data

[arXiv] Cubuk et al.; Google Brain
NN

Playing Atari with Six Neurons

[arXiv] Cucci et al.; University of Fribourg, NYU
Atari

⭐ World Models

[arXiv] [blog] Ha and Schmidhuber; IDSIA, Google Brain, NNAISENSE
Model-based, Doom

Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari

[arXiv] Chrabaszcz et al.; University of Freiburg
Evolution, Atari

🚀 IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

[arXiv] Espeholt et al.; DeepMind
Atari, Maze

One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning

[arXiv] Finn et al.; UC Berkeley
IRL, Manipulator

🚀 Regularized Evolution for Image Classifier Architecture Search

[arXiv] Real et al.; Google Brain
Evolution, NN

Year 2017

⭐ Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning

[arXiv] Such et al.; Uber AI Labs
Locomotion, Atari

⭐ Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

[arXiv] Silver et al.; DeepMind
Self-Play, Planning, Table

⭐ Rainbow: Combining Improvements in Deep Reinforcement Learning (DQN improvements combined)

[arXiv] Hessel et al.; DeepMind
Atari

⭐ Meta Learning Shared Hierarchies

[arXiv] [blog] Frans et al.; OpenAI, Berkeley
Locomotion, Meta-Learning

One-Shot Visual Imitation Learning via Meta-Learning

[arXiv] [pdf] Finn et al.; UC Berkeley, OpenAI
IRL, Meta-Learning, Manipulator

Learning with Opponent-Learning Awareness (LOLA)

[arXiv] [blog] Foerster et al.; OpenAI, Oxford, Berkeley, CMU
MARL

🚀 Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR, A2C)

[arXiv] Wu et al.; University of Toronto, New York University
Locomotion, Atari

Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning

[arXiv] [blog] [code] Nagabandi et al.; Berkeley
Locomotion

🚀 Proximal Policy Optimization Algorithms (PPO)

[arXiv] [blog] Schulman et al.; OpenAI
Locomotion, Atari
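
A minimal sketch of PPO's clipped surrogate objective, L^CLIP = mean of min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t), where r_t is the probability ratio between the new and old policies. It assumes per-sample log-probabilities and advantage estimates are already computed; the function name and plain-list inputs are hypothetical, not taken from the paper's code.

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized).

    Hypothetical helper: inputs are plain lists of per-sample values.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # r_t = pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Taking the minimum means the objective never rewards pushing the ratio past 1 + eps for positive advantages, while large ratios with negative advantages are still fully penalized.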

🚀 Learning Transferable Architectures for Scalable Image Recognition

[arXiv] Zoph et al.; Google Brain
NN

Hybrid Reward Architecture for Reinforcement Learning (HRA)

[arXiv] van Seijen et al.; Microsoft Maluuba, McGill University
Meta-Learning, Atari

Parameter Space Noise for Exploration

[arXiv] Plappert et al.; OpenAI, Karlsruhe Institute of Technology
Locomotion, Atari

🚀 Mastering the Game of Go without Human Knowledge (AlphaGo Zero)

[pdf] [blog] Silver et al.; DeepMind
Self-Play, Planning, Go, Table

Neural Optimizer Search with Reinforcement Learning

[pdf] Bello et al.; Google Brain
NN

Asymmetric Actor Critic for Image-Based Robot Learning

[arXiv] [blog] Pinto et al.; OpenAI, CMU
Generalization, Manipulator

Sim-to-Real Transfer of Robotic Control with Dynamics Randomization

[arXiv] [blog] Peng et al.; OpenAI, Berkeley
Generalization, Manipulator

A Deep Reinforcement Learning Chatbot

[arXiv] Serban et al.; MILA

Learning model-based planning from scratch

[arXiv] [blog] Pascanu et al.; Google DeepMind
Model-based, Locomotion

⭐ Imagination-Augmented Agents for Deep Reinforcement Learning (I2As)

[arXiv] [blog] Weber et al.; DeepMind
Planning, Transfer, Atari

Distral: Robust Multitask Reinforcement Learning

[arXiv] Teh et al.; DeepMind
Transfer, Maze

Emergence of Locomotion Behaviours in Rich Environments

[arXiv] [blog] Heess et al.; DeepMind
Locomotion

Programmable Agents

[arXiv] Denil et al.; DeepMind
Locomotion

⭐ Evolution Strategies as a Scalable Alternative to Reinforcement Learning

[arXiv] Salimans et al.; OpenAI
Atari
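
A minimal sketch of the evolution-strategies gradient estimator: parameters are perturbed with Gaussian noise, using mirrored (antithetic) pairs, and the gradient of the smoothed objective is estimated from the returned scores. The toy objective, function names, and hyperparameters are hypothetical; the paper's distributed setup and rank-based fitness shaping are omitted.

```python
import random

def es_gradient(f, theta, sigma=0.1, n=50, rng=None):
    """Monte Carlo estimate of grad E[f(theta + sigma * eps)], eps ~ N(0, I)."""
    rng = rng or random.Random(0)
    grad = [0.0] * len(theta)
    for _ in range(n):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        # Mirrored sampling: evaluate theta + sigma*eps and theta - sigma*eps
        f_plus = f([t + sigma * e for t, e in zip(theta, eps)])
        f_minus = f([t - sigma * e for t, e in zip(theta, eps)])
        for i, e in enumerate(eps):
            grad[i] += (f_plus - f_minus) * e
    return [g / (2 * n * sigma) for g in grad]

def es_ascent(f, theta, lr=0.1, steps=200, sigma=0.1, n=50, seed=0):
    """Plain gradient ascent driven by the ES estimate (no Adam, no shaping)."""
    rng = random.Random(seed)
    for _ in range(steps):
        g = es_gradient(f, theta, sigma, n, rng)
        theta = [t + lr * gi for t, gi in zip(theta, g)]
    return theta
```

Only scalar scores of f are needed, never its analytic gradient, which is what makes the method easy to parallelize across workers.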

Neural Episodic Control

[arXiv] Pritzel et al.; DeepMind
Atari

Year 2016

The Predictron: End-To-End Learning and Planning

[arXiv] Silver et al.; DeepMind
Model-based, Planning, Maze

RL²: Fast Reinforcement Learning via Slow Reinforcement Learning

[arXiv] Duan et al.; Berkeley, OpenAI
Meta-Learning, Maze

Neural Architecture Search with Reinforcement Learning

[arXiv] Zoph and Le; Google Brain; ICLR
NN

Reinforcement Learning with Unsupervised Auxiliary Tasks (UNREAL)

[arXiv] Jaderberg et al.; Google DeepMind
📝 Notes
Locomotion, Atari, Maze

🚀 Learning to Act by Predicting the Future (VizDoom 2016 Full DM Winner)

[arXiv] Dosovitskiy and Koltun; Intel Labs
Maze, Doom

Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games

[arXiv] Peng et al.; Alibaba Group, University College London
MARL, Starcraft

Playing FPS Games with Deep Reinforcement Learning (VizDoom 2016 Limited DM 2nd place)

[arXiv] Lample and Chaplot; Carnegie Mellon University
Maze, Doom

Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks

[arXiv] Usunier et al.; Facebook AI Research
Starcraft

🚀 Asynchronous Methods for Deep Reinforcement Learning (A3C)

[arXiv] Mnih et al.; DeepMind
📝 Notes
Locomotion, Atari, Maze

Year 2015

⭐ Dueling Network Architectures for Deep Reinforcement Learning (Dueling DQN)

[arXiv] Wang et al.; DeepMind
Atari
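
A minimal sketch of the dueling aggregation step in its mean-subtracted form, Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'). Plain Python numbers stand in for the outputs of the network's value and advantage streams; the function name is hypothetical.

```python
def dueling_q(value, advantages):
    """Combine a scalar state value V(s) and per-action advantages A(s, .)."""
    mean_adv = sum(advantages) / len(advantages)
    # Subtracting the mean makes V and A identifiable: mean_a Q(s, a) == V(s)
    return [value + a - mean_adv for a in advantages]
```

For example, dueling_q(1.0, [1.0, 2.0, 3.0]) gives [0.0, 1.0, 2.0], whose mean equals the state value 1.0.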

Prioritized Experience Replay

[arXiv] Schaul et al.; DeepMind
📝 Notes
Atari
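
A minimal sketch of proportional prioritization: priorities p_i = |delta_i| + eps from TD errors, sampling probabilities P(i) = p_i^alpha / sum_k p_k^alpha, and importance-sampling weights w_i = (N * P(i))^(-beta) normalized by their maximum. The sum-tree used for efficient sampling is omitted, and all function names are hypothetical.

```python
import random

def priorities_from_td(td_errors, eps=1e-6):
    """p_i = |delta_i| + eps, so no transition gets zero probability."""
    return [abs(d) + eps for d in td_errors]

def per_probabilities(priorities, alpha=0.6):
    """P(i) = p_i^alpha / sum_k p_k^alpha (alpha = 0 recovers uniform sampling)."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]

def per_is_weights(probs, beta=0.4):
    """w_i = (N * P(i))^-beta, divided by max w to keep updates bounded."""
    n = len(probs)
    w = [(n * p) ** (-beta) for p in probs]
    w_max = max(w)
    return [x / w_max for x in w]

def sample_indices(probs, k, rng):
    """Draw k replay indices with replacement according to P(i)."""
    return rng.choices(range(len(probs)), weights=probs, k=k)
```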

⭐ Deep Reinforcement Learning with Double Q-learning (Double DQN)

[arXiv] van Hasselt et al.; DeepMind
Atari
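
A minimal sketch contrasting the standard DQN target with the Double DQN target, in which the online network selects the next action and the target network evaluates it. Q-values for the next state are passed as plain lists; both helper names are hypothetical.

```python
def dqn_target(reward, done, q_target_next, gamma=0.99):
    """y = r + gamma * max_a Q_target(s', a): selection and evaluation share one net."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)

def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """y = r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    if done:
        return reward
    # Online network picks the action, target network scores it
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]
```

With q_online_next = [1.0, 5.0], q_target_next = [4.0, 2.0], reward 1.0 and gamma 0.5, the DQN target is 3.0 while the Double DQN target is 2.0: decoupling selection from evaluation avoids reusing the same (possibly overestimated) maximum.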

High-Dimensional Continuous Control Using Generalized Advantage Estimation (GAE)

[arXiv] Schulman et al.; Berkeley
Locomotion
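
A minimal sketch of generalized advantage estimation over one rollout, computing delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) and A_t = sum over l of (gamma * lambda)^l * delta_{t+l} with a backward recursion. It assumes `values` holds one extra bootstrap entry for the state after the last step and that `dones[t]` marks an episode ending after step t; the function name is hypothetical.

```python
def gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Return advantage estimates A_t for t = 0..T-1 (len(values) == T + 1)."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; no bootstrapping past a terminal step
        delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
        # A_t = delta_t + gamma * lambda * A_{t+1}, reset at episode boundaries
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
    return advantages
```

Setting lam = 0 reduces A_t to the one-step TD error, while lam = 1 gives the Monte Carlo return minus the value baseline, which is the bias-variance trade-off the paper controls with lambda.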

⭐ Trust Region Policy Optimization (TRPO)

[arXiv] Schulman et al.; UC Berkeley
Atari, Maze, Locomotion

🚀 Human-level control through deep reinforcement learning (DQN)

[Nature] [pdf] Mnih et al.; Google DeepMind
📝 Notes
Atari

Mastering the game of Go with deep neural networks and tree search (AlphaGo)

[Nature] [reddit] Silver et al.; DeepMind, Google
Self-Play, Planning, Go, Table

Year 2013

🚀 Playing Atari with Deep Reinforcement Learning (DQN)

[arXiv] Mnih et al.; DeepMind Technologies
Atari

Evolving Large-Scale Neural Networks for Vision-Based Reinforcement Learning

[pdf] Koutnik et al.; IDSIA, USI-SUPSI
Evolution

2012 and earlier

Horde: A Scalable Real-time Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction

[pdf] Sutton et al. (2011); University of Alberta, McGill University
Manipulator, Locomotion

⭐ Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion

[pdf] Kohl and Stone (2004); The University of Texas at Austin
Manipulator, Locomotion

⭐ Autonomous Helicopter Flight via Reinforcement Learning

[pdf] Ng et al. (2004); Stanford, Berkeley
Manipulator

⭐ Actor-Critic Algorithms

[pdf] Konda and Tsitsiklis (2003)

⭐ Temporal Difference Learning and TD-Gammon

[pdf] Gerald Tesauro (1995)
Table

⭐ Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning (REINFORCE)

[pdf] Ronald J. Williams (1992); Northeastern University
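
A minimal sketch of a single REINFORCE update for a softmax policy over discrete actions, assuming a one-step (bandit-style) episode for brevity: the logits move by lr * (G - b) * grad log pi(a), where the derivative of log pi(a) with respect to logit k is 1[k == a] - pi(k). The function names and the scalar baseline b are illustrative assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_update(logits, action, ret, lr=0.1, baseline=0.0):
    """theta <- theta + lr * (G - b) * grad_theta log pi(action)."""
    probs = softmax(logits)
    return [
        th + lr * (ret - baseline) * ((1.0 if k == action else 0.0) - p)
        for k, (th, p) in enumerate(zip(logits, probs))
    ]
```

A return above the baseline raises the probability of the sampled action; subtracting the baseline leaves the expected update unchanged while reducing its variance.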

Surveys

A Brief Survey of Deep Reinforcement Learning

[arXiv] Arulkumaran et al. (2017)

Books

⭐ Reinforcement Learning: An Introduction (Complete Draft)

[pdf] Richard S. Sutton and Andrew G. Barto (2018)

Miscellaneous

How to Read a Paper

[pdf] S. Keshav (2007); University of Waterloo

arXiv Sanity Preserver: a recommender system for searching papers published on arXiv.

http://www.arxiv-sanity.com/

GitXiv: a recommender system for searching papers and their supplementary materials (if available).

http://www.gitxiv.com/

nomadsarmat/reinforcement-learning-notes

Folders and files

Latest commit

History

Repository files navigation

Deep Reinforcement Learning

Year 2018

Year 2017

Year 2016

Year 2015

Year 2013

2012 and earlier

Surveys

Books

Miscellaneous

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages