Reinforcement Learning

Survey

[2018] Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review

[2021] A Survey of Zero-shot Generalisation in Deep Reinforcement Learning

[2023] Diffusion Models for Reinforcement Learning: A Survey

[2023] A Tutorial Introduction to Reinforcement Learning

Algorithm

[2015] Continuous control with deep reinforcement learning

[2015] Trust Region Policy Optimization

[2016] Asynchronous Methods for Deep Reinforcement Learning

[2016] Sample Efficient Actor-Critic with Experience Replay

[2017] Proximal Policy Optimization Algorithms

[2018] Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

[2018] Soft Actor-Critic Algorithms and Applications

[2018] Addressing Function Approximation Error in Actor-Critic Methods

[2018] IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

Transformers for RL

[2019] Stabilizing Transformers for Reinforcement Learning

Temporal Processing

[2017] Time Limits in Reinforcement Learning

[2018] Learning Temporal Point Processes via Reinforcement Learning

[2019] Making Deep Q-learning methods robust to time discretization

[2020] Thinking While Moving: Deep Reinforcement Learning with Concurrent Control

Various Paradigms of RL

[2019] Real-Time Reinforcement Learning

[2021] Maximum Entropy RL (Provably) Solves Some Robust RL Problems

[2022] Contrastive Learning as Goal-Conditioned Reinforcement Learning

[2023] Maximum diffusion reinforcement learning

[2024] Privileged Sensing Scaffolds Reinforcement Learning

Variance Reduction

[2016] Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning

Sample Efficiency

[2016] Q(λ) with Off-Policy Corrections

[2016] Safe and Efficient Off-Policy Reinforcement Learning

[2016] Sample Efficient Actor-Critic with Experience Replay

[2020] AWAC: Accelerating Online Reinforcement Learning with Offline Datasets

Theory

[2018] Deep Reinforcement Learning and the Deadly Triad

[2022] A Theory of Abstraction in Reinforcement Learning

[2017] Deep Reinforcement Learning that Matters

[2018] The Mirage of Action-Dependent Baselines in Reinforcement Learning

[2018] Deep Reinforcement Learning Doesn't Work Yet

[2020] Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO

[2020] Leverage the Average: an Analysis of KL Regularization in RL

[2021] How to Train Your Robot with Deep Reinforcement Learning; Lessons We've Learned

[2021] What Matters for On-Policy Deep Actor-Critic Methods? A Large-Scale Study

[2022] The Primacy Bias in Deep Reinforcement Learning

[2024] Addressing Signal Delay in Deep Reinforcement Learning