A list of learning resources for fundamental reinforcement learning.
- Reinforcement Learning: An Introduction. Rich Sutton and Andrew Barto. [PDF]
- Algorithms for Reinforcement Learning. Csaba Szepesvári. [PDF]
- Reinforcement Learning: Theory and Algorithms. Alekh Agarwal, Nan Jiang, Sham Kakade. [PDF]
- Neuro-Dynamic Programming. Dimitri P. Bertsekas and John Tsitsiklis.
- Markov Decision Processes: Discrete Stochastic Dynamic Programming. Martin Puterman.
- CS 598 Statistical Reinforcement Learning. Nan Jiang. [link]
- Approximate Dynamic Programming. Ben Van Roy. [link]
- Mathematical Techniques for Machine Learning. Prakash Panangaden. [link]
- Unifying Task Specification in Reinforcement Learning. Martha White. [PDF]
- Rethinking the Discount Factor in Reinforcement Learning: A Decision Theoretic Approach. Silviu Pitis. [PDF]
- Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view. Bruno Scherrer. [PDF]
- Optimality of Reinforcement Learning Algorithms with Linear Function Approximation. Ralf Schoknecht. Advances in Neural Information Processing Systems. 2002. [PDF]
- Approximate Policy Iteration Schemes: A Comparison. Bruno Scherrer.
- Error Propagation for Approximate Policy and Value Iteration. Amir-massoud Farahmand, Csaba Szepesvári, and Rémi Munos. Advances in Neural Information Processing Systems.
- Finite-Time Bounds for Fitted Value Iteration. Rémi Munos and Csaba Szepesvári. 2008.
- Performance Bounds in $L_p$-norm for Approximate Value Iteration. Rémi Munos. SIAM J. Control Optim. 2007.
- Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. András Antos, Csaba Szepesvári, and Rémi Munos. 2006.
- Error Bounds for Approximate Value Iteration. Rémi Munos. 2005.
- Error Bounds for Approximate Policy Iteration. Rémi Munos. 2003.
- Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions. Ronald J. Williams and Leemon C. Baird. 1993.
- A Linearly Relaxed Approximate Linear Program for Markov Decision Processes. [PDF]
- Learning to predict by the methods of temporal differences. Rich Sutton. [PDF]
- TD or not TD: Analyzing the Role of Temporal Differencing in Deep Reinforcement Learning. Artemij Amiranashvili, et al. [PDF]
- Convergence of Stochastic Iterative Dynamic Programming Algorithms. [PDF]
- Q-Learning. Christopher Watkins and Peter Dayan. [PDF]
- Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms. [PDF]
- Reinforcement Learning with Function Approximation Converges to a Region. [PDF]
- Chattering in SARSA. [PDF]
- An Analysis of Temporal-Difference Learning with Function Approximation. John Tsitsiklis and Benjamin Van Roy. [PDF]
- An Analysis of Linear Models, Linear Value-Function Approximation, and Feature Selection for Reinforcement Learning. [PDF]
- Least-Squares Methods in Reinforcement Learning for Control. Michail Lagoudakis, Ronald Parr, and Michael Littman. [PDF]
- Linear Least-Squares Algorithms for Temporal Difference Learning. Steven Bradtke, and Andrew Barto. [PDF]
- Towards Characterizing Divergence in Deep Q-Learning. Joshua Achiam, Ethan Knight, and Pieter Abbeel.
- Diagnosing Bottlenecks in Deep Q-learning Algorithms. Justin Fu, et al.
- Deep Reinforcement Learning and the Deadly Triad. Hado van Hasselt, et al. [PDF]
- A Theoretical Analysis of Deep Q-Learning. Zhuoran Yang, et al. [PDF]
- A Tutorial on Fisher Information. Alexander Ly, et al. [PDF]
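Many of the entries above analyze temporal-difference learning and its fixed point. As a concrete anchor for those readings, here is a minimal sketch of tabular TD(0) prediction on the classic five-state random walk from Sutton and Barto's book; the function name, hyperparameters, and environment encoding are illustrative choices, not taken from any entry above:

```python
import random

def td0_random_walk(episodes=10000, alpha=0.02, seed=0):
    """Tabular TD(0) value prediction on a five-state random walk.

    Non-terminal states are 0..4; stepping past state 4 gives reward 1,
    stepping past state 0 gives reward 0. All other rewards are 0 and
    the discount factor is 1, so the true values are 1/6, 2/6, ..., 5/6.
    """
    rng = random.Random(seed)
    V = [0.0] * 5  # value estimates for states 0..4
    for _ in range(episodes):
        s = 2  # every episode starts in the middle state
        while True:
            s2 = s + (1 if rng.random() < 0.5 else -1)
            if s2 == 5:   # right terminal: reward 1, terminal value 0
                V[s] += alpha * (1.0 - V[s])
                break
            if s2 == -1:  # left terminal: reward 0, terminal value 0
                V[s] += alpha * (0.0 - V[s])
                break
            # TD(0) update with reward 0 and gamma = 1
            V[s] += alpha * (V[s2] - V[s])
            s = s2
    return V
```

With a small constant step size the estimates settle near the true values 1/6 through 5/6; replacing the one-step bootstrap target `V[s2]` with the full return recovers Monte Carlo prediction, which is the contrast studied in the TD papers listed above.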