
nomadsarmat/reinforcement-learning-notes


A constantly evolving list of Reinforcement Learning papers, notes, books, etc.

Glossary:

  • 🚀 - state-of-the-art method in its domain at the time of the paper's publication.
  • ⭐ - valuable paper.
Domain tag descriptions:
  • model - Model-based RL (Model-based).
  • marl - Multi-Agent RL (MARL).
  • sp - Self-Play.
  • evo - Evolutionary & Genetic Algorithms (Evolution).
  • gener - Generalization across environments (Generalization).
  • nn - Neural Networks & Optimizers (NN).
  • robot - Manipulation tasks (Manipulator).
  • loco - Locomotion: MuJoCo, Roboschool, etc. (Locomotion).
  • maze - Mazes and Labyrinths (Maze).
  • plan - Strategy Planning Problems (Planning).
  • transfer - Transfer learning (Transfer).
  • irl - Inverse Reinforcement Learning (IRL).
  • meta - Meta-Learning (Meta-Learning).
  • sparse - Sparse Reward Problems and/or Montezuma's Revenge (Sparse).
  • atari - Atari game (Atari).
  • table - Table games (Table).
  • doom - Doom game (Doom).
  • sc - Starcraft game (Starcraft).
  • go - Go game (Go).

Deep Reinforcement Learning

Year 2018

🚀 RUDDER: Return Decomposition for Delayed Rewards

  • [arXiv] [code] Arjona-Medina et al.; Johannes Kepler University Linz
  • sparse atari Sparse, Atari

Relational Deep Reinforcement Learning

  • [arXiv] Zambaldi et al.; DeepMind
  • plan sc Planning, Starcraft

Deep Curiosity Search: Intra-Life Exploration Improves Performance on Challenging Deep Reinforcement Learning Problems

  • [arXiv] Stanton and Clune; University of Wyoming
  • sparse Sparse

AutoAugment: Learning Augmentation Policies from Data

  • [arXiv] Cubuk et al.; Google Brain
  • nn NN

Playing Atari with Six Neurons

  • [arXiv] Cuccu et al.; University of Fribourg, NYU
  • atari Atari

⭐ World Models

  • [arXiv] [blog] Ha and Schmidhuber; IDSIA, Google Brain, NNAISENSE
  • model doom Model-based, Doom

Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari

  • [arXiv] Chrabaszcz et al.; University of Freiburg
  • evo atari Evolution, Atari

🚀 IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

  • [arXiv] Espeholt et al.; DeepMind
  • atari maze Atari, Maze

One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning

  • [arXiv] Finn et al.; UC Berkeley
  • irl robot IRL, Manipulator

🚀 Regularized Evolution for Image Classifier Architecture Search

  • [arXiv] Real et al.; Google Brain
  • evo nn Evolution, NN

Year 2017

⭐ Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning

  • [arXiv] Such et al.; Uber AI Labs
  • loco atari Locomotion, Atari

⭐ Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

  • [arxiv] Silver et al.; DeepMind
  • sp plan table Self-Play, Planning, Table

⭐ Rainbow: Combining Improvements in Deep Reinforcement Learning (DQN improvements combined)

  • [arXiv] Hessel et al.; Deepmind
  • atari Atari

⭐ Meta Learning Shared Hierarchies

  • [arXiv] [blog] Frans et al.; OpenAI, Berkeley.
  • meta loco Locomotion, Meta-Learning

One-Shot Visual Imitation Learning via Meta-Learning

  • [arXiv] [pdf] Finn et al.; UC Berkeley, OpenAI
  • irl meta robot IRL, Meta-Learning, Manipulator

Learning with Opponent-Learning Awareness (LOLA)

  • [arXiv] [blog] Foerster et al.; OpenAI, Oxford, Berkeley, CMU
  • marl MARL

🚀 Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR, A2C)

  • [arXiv] Wu et al.; University of Toronto, New York University
  • loco atari Locomotion, Atari

Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning

🚀 Proximal Policy Optimization Algorithms (PPO)

  • [arXiv] [blog] Schulman et al.; OpenAI
  • atari loco Locomotion, Atari

🚀 Learning Transferable Architectures for Scalable Image Recognition

  • [arXiv] Zoph et al.; Google Brain
  • nn NN

Hybrid Reward Architecture for Reinforcement Learning (HRA)

  • [arXiv] van Seijen et al.; Microsoft Maluuba, McGill University
  • meta atari Meta-Learning, Atari

Parameter Space Noise for Exploration

  • [arXiv] Plappert et al.; OpenAI, Karlsruhe Institute of Technology
  • loco atari Locomotion, Atari

🚀 Mastering the Game of Go without Human Knowledge (AlphaGo Zero)

  • [pdf] [blog] Silver et al.; DeepMind
  • sp plan go table Self-Play, Planning, Go, Table

Neural Optimizer Search with Reinforcement Learning

  • [pdf] Bello et al.; Google Brain
  • nn NN

Asymmetric Actor Critic for Image-Based Robot Learning

Sim-to-Real Transfer of Robotic Control with Dynamics Randomization

  • [arXiv] [blog] Peng et al.; OpenAI, Berkeley
  • gener robot Generalization, Manipulator

A Deep Reinforcement Learning Chatbot

  • [arXiv] Serban et al.; MILA

Learning model-based planning from scratch

  • [arXiv] [blog] Pascanu et al.; Google DeepMind
  • model loco Model-based, Locomotion

⭐ Imagination-Augmented Agents for Deep Reinforcement Learning (I2As)

  • [arXiv] [blog] Weber et al.; DeepMind
  • plan transfer atari Planning, Transfer, Atari

Distral: Robust Multitask Reinforcement Learning

  • [arXiv] Teh et al.; DeepMind
  • transfer maze Transfer, Maze

Emergence of Locomotion Behaviours in Rich Environments

  • [arXiv] [blog] Heess et al.; DeepMind
  • loco Locomotion

Programmable Agents

  • [arXiv] Denil et al.; DeepMind
  • loco Locomotion

⭐ Evolution Strategies as a Scalable Alternative to Reinforcement Learning

  • [arXiv] Salimans et al.; OpenAI
  • atari Atari

Neural Episodic Control

  • [arXiv] Pritzel et al.; DeepMind
  • atari Atari

Year 2016

The Predictron: End-To-End Learning and Planning

  • [arXiv] Silver et al.; DeepMind
  • model plan maze Model-based, Planning, Maze

RL2: Fast Reinforcement Learning via Slow Reinforcement Learning

  • [arXiv] Duan et al.; Berkeley, OpenAI
  • meta maze Meta-Learning, Maze

Neural Architecture Search with Reinforcement Learning

  • [arXiv] Zoph and Le; Google Brain; ICLR
  • nn NN

Reinforcement Learning with unsupervised auxiliary tasks (UNREAL)

  • [arXiv] Jaderberg et al.; Google DeepMind
  • 📝 Notes
  • loco atari maze Locomotion, Atari, Maze

🚀 Learning to act by predicting the future (VizDoom 2016 Full DM Winner)

  • [arXiv] Dosovitskiy, Koltun; Intel Labs
  • maze doom Maze, Doom

Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games

  • [arXiv] Peng et al.; Alibaba Group, University College London
  • marl sc MARL, Starcraft

Playing FPS Games with Deep Reinforcement Learning (VizDoom 2016 Limited DM 2nd place)

  • [arXiv] Lample, Chaplot; Carnegie Mellon University
  • maze doom Maze, Doom

[RTS:SC] Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks

  • [arXiv] Usunier et al.; Facebook AI Research
  • sc Starcraft

🚀 Asynchronous Methods for Deep Reinforcement Learning (A3C)

  • [arXiv] Mnih et al.; DeepMind
  • 📝 Notes
  • loco atari maze Locomotion, Atari, Maze

Year 2015

⭐ Dueling Network Architectures for Deep Reinforcement Learning (Dueling DQN)

  • [arXiv] Wang et al.; DeepMind
  • atari Atari

Prioritized Experience Replay

  • [arXiv] Schaul et al.; DeepMind
  • 📝 Notes
  • atari Atari

⭐ Deep Reinforcement Learning with Double Q-learning (Double DQN)

  • [arXiv] Hasselt et al.; DeepMind
  • atari Atari

High-dimensional continuous control using generalized advantage estimation

  • [arXiv] Schulman et al.; Berkeley
  • loco Locomotion

⭐ Trust Region Policy Optimization (TRPO)

  • [arXiv] Schulman et al.; UC Berkeley
  • atari maze loco Atari, Maze, Locomotion

🚀 Human-level control through deep reinforcement learning (DQN)

  • [Nature] [pdf] Mnih et al.; Google DeepMind
  • 📝 Notes
  • atari Atari

Mastering the game of Go with deep neural networks and tree search (AlphaGo)

  • [Nature] [reddit] Silver et al.; DeepMind, Google
  • sp plan go table Self-Play, Planning, Go, Table

Year 2013

🚀 Playing Atari with Deep Reinforcement Learning (DQN)

  • [arXiv] Mnih et al.; DeepMind Technologies
  • atari Atari

Evolving Large-Scale Neural Networks for Vision-Based Reinforcement Learning

  • [pdf] Koutnik et al.; IDSIA, USI-SUPSI
  • evo Evolution

2012 and earlier

Horde: A Scalable Real-time Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction

  • [pdf] Sutton et al. (2011); University of Alberta, McGill University
  • robot loco Manipulator, Locomotion

⭐ Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion

  • [pdf] Kohl and Stone (2004); The University of Texas at Austin
  • robot loco Manipulator, Locomotion

⭐ Autonomous helicopter flight via reinforcement learning

  • [pdf] Ng et al. (2004); Stanford, Berkeley
  • robot Manipulator

⭐ Actor-Critic Algorithms

  • [pdf] Konda and Tsitsiklis (2003)

⭐ Temporal Difference Learning and TD-Gammon

  • [pdf] Gerald Tesauro (1995)
  • table Table

⭐ Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning (REINFORCE)

  • [pdf] Ronald J. Williams (1992); Northeastern University

Surveys

A Brief Survey of Deep Reinforcement Learning

  • [arXiv] Arulkumaran et al. (2017)

Books

⭐ Reinforcement Learning: An Introduction (Complete Draft)

  • [pdf] Richard S. Sutton and Andrew G. Barto (2018)

Miscellaneous

How to Read a Paper

  • [pdf] S. Keshav (2007); University of Waterloo

ArXiv Sanity Preserver: A recommender system for searching papers that are published on arXiv.

GitXiv: A recommender system for searching papers and their supplementary materials (if available).
