A constantly evolving list of reinforcement learning papers, notes, books, etc.
Glossary:
- π - state-of-the-art method in its domain at the time of publication.
- β - valuable paper.
Domain tags:
- Model-based RL (Model-based).
- Multi-Agent RL (MARL).
- Self-Play.
- Evolutionary & Genetic Algorithms (Evolution).
- Generalization across environments (Generalization).
- Neural Networks & Optimizers (NN).
- Manipulation tasks (Manipulator).
- Locomotion tasks: MuJoCo, Roboschool, etc. (Locomotion).
- Mazes and Labyrinths (Maze).
- Strategy Planning Problems (Planning).
- Transfer learning (Transfer).
- Inverse Reinforcement Learning (IRL).
- Meta-Learning (Meta-Learning).
- Sparse reward problems and/or Montezuma's Revenge (Sparse).
- Atari games (Atari).
- Table (board) games (Table).
- Doom (Doom).
- StarCraft (Starcraft).
- Go (Go).
π RUDDER: Return Decomposition for Delayed Rewards
Relational Deep Reinforcement Learning
- [arXiv] Zambaldi et al.; DeepMind
Planning, Starcraft
Deep Curiosity Search: Intra-Life Exploration Improves Performance on Challenging Deep Reinforcement Learning Problems
- [arXiv] Stanton and Clune; University of Wyoming
Sparse
AutoAugment: Learning Augmentation Policies from Data
- [arXiv] Cubuk et al.; Google Brain
NN
Playing Atari with Six Neurons
- [arXiv] Cuccu et al.; University of Fribourg, NYU
Atari
β World Models
Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari
- [arXiv] Chrabaszcz et al.; University of Freiburg
Evolution, Atari
π IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
- [arXiv] Espeholt et al.; DeepMind
Atari, Maze
One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning
- [arXiv] Finn et al.; UC Berkeley
IRL, Manipulator
π Regularized Evolution for Image Classifier Architecture Search
- [arXiv] Real et al.; Google Brain
Evolution, NN
β Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning
- [arXiv] Such et al.; Uber AI Labs
Locomotion, Atari
β Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
- [arXiv] Silver et al.; DeepMind
Self-Play, Planning, Table
β Rainbow: Combining Improvements in Deep Reinforcement Learning (DQN improvements combined)
- [arXiv] Hessel et al.; DeepMind
Atari
β Meta Learning Shared Hierarchies
One-Shot Visual Imitation Learning via Meta-Learning
Learning with Opponent-Learning Awareness (LOLA)
π Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR, A2C)
- [arXiv] Wu et al.; University of Toronto, New York University
Locomotion, Atari
Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
π Proximal Policy Optimization Algorithms (PPO)
π Learning Transferable Architectures for Scalable Image Recognition
- [arXiv] Zoph et al.; Google Brain
NN
Hybrid Reward Architecture for Reinforcement Learning (HRA)
- [arXiv] van Seijen et al.; Microsoft Maluuba, McGill University
Meta-Learning, Atari
Parameter Space Noise for Exploration
- [arXiv] Plappert et al.; OpenAI, Karlsruhe Institute of Technology
Locomotion, Atari
π Mastering the Game of Go without Human Knowledge (AlphaGo Zero)
Neural Optimizer Search with Reinforcement Learning
- [pdf] Bello et al.; Google Brain
NN
Asymmetric Actor Critic for Image-Based Robot Learning
- [arXiv], [official blog post] Pinto et al.; OpenAI, CMU
Generalization, Manipulator
Sim-to-Real Transfer of Robotic Control with Dynamics Randomization
A Deep Reinforcement Learning Chatbot
- [arXiv] Serban et al.; MILA
Learning model-based planning from scratch
β Imagination-Augmented Agents for Deep Reinforcement Learning (I2As)
Distral: Robust Multitask Reinforcement Learning
- [arXiv] Teh et al.; DeepMind
Transfer, Maze
Emergence of Locomotion Behaviours in Rich Environments
Programmable Agents
- [arXiv] Denil et al.; DeepMind
Locomotion
β Evolution Strategies as a Scalable Alternative to Reinforcement Learning
- [arXiv] Salimans et al.; OpenAI
Atari
Neural Episodic Control
- [arXiv] Pritzel et al.; DeepMind
Atari
The Predictron: End-To-End Learning and Planning
- [arXiv] Silver et al.; DeepMind
Model-based, Planning, Maze
RL²: Fast Reinforcement Learning via Slow Reinforcement Learning
- [arXiv] Duan et al.; Berkeley, OpenAI
Meta-Learning, Maze
Neural Architecture Search with Reinforcement Learning
- [arXiv] Zoph and Le; Google Brain; ICLR 2017
NN
Reinforcement Learning with unsupervised auxiliary tasks (UNREAL)
π Learning to act by predicting the future (VizDoom 2016 Full DM Winner)
- [arXiv] Dosovitskiy, Koltun; Intel Labs
Maze, Doom
Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games
- [arXiv] Peng et al.; Alibaba Group, University College London
MARL, Starcraft
Playing FPS Games with Deep Reinforcement Learning (VizDoom 2016 Limited DM 2nd place)
- [arXiv] Lample, Chaplot; Carnegie Mellon University
Maze, Doom
Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks
- [arXiv] Usunier et al.; Facebook AI Research
Starcraft
π Asynchronous Methods for Deep Reinforcement Learning (A3C)
β Dueling Network Architectures for Deep Reinforcement Learning (Dueling DQN)
- [arXiv] Wang et al.; DeepMind
Atari
Prioritized Experience Replay
β Deep Reinforcement Learning with Double Q-learning (Double DQN)
- [arXiv] van Hasselt et al.; DeepMind
Atari
High-dimensional continuous control using generalized advantage estimation
- [arXiv] Schulman et al.; UC Berkeley
Locomotion
β Trust Region Policy Optimization (TRPO)
- [arXiv] Schulman et al.; UC Berkeley
Atari, Maze, Locomotion
π Human-level control through deep reinforcement learning (DQN)
Mastering the game of Go with deep neural networks and tree search (AlphaGo)
π Playing Atari with Deep Reinforcement Learning (DQN)
- [arXiv] Mnih et al.; DeepMind Technologies
Atari
Evolving Large-Scale Neural Networks for Vision-Based Reinforcement Learning
- [pdf] Koutnik et al.; IDSIA, USI-SUPSI
Evolution
Horde: A Scalable Real-time Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction
- [pdf] Sutton et al. (2011); University of Alberta, McGill University
Manipulator, Locomotion
β Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion
- [pdf] Kohl and Stone (2004); The University of Texas at Austin
Manipulator, Locomotion
β Autonomous helicopter flight via reinforcement learning
- [pdf] Ng et al. (2004); Stanford, Berkeley
Manipulator
β Actor-Critic Algorithms
- [pdf] Konda and Tsitsiklis (2003)
β Temporal Difference Learning and TD-Gammon
- [pdf] Gerald Tesauro (1995)
Table
β Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning (REINFORCE)
- [pdf] Ronald J. Williams (1992); Northeastern University
A Brief Survey of Deep Reinforcement Learning
- [arXiv] Arulkumaran et al. (2017)
β Reinforcement Learning: An Introduction (Complete Draft)
- [pdf] Richard S. Sutton and Andrew G. Barto (2018)
How to Read a Paper
- [pdf] S. Keshav (2007); University of Waterloo
arXiv Sanity Preserver: a recommender system for searching papers published on arXiv.
GitXiv: a recommender system for searching papers together with their supplementary materials, such as code (where available).