Educational materials for a Reinforcement Learning course. The course covers the basic concepts of reinforcement learning: it starts from the K-armed bandit problem, introduces the Markov Decision Process (MDP), and then implements Dynamic Programming, Monte Carlo, and Temporal Difference algorithms in a practical way. The core material follows the structure of the Sutton and Barto book.
The notebooks can be run online in Google Colab or offline in a Docker container on a local machine. For the Docker container installation, see the guide.
Lecture 01 - Introduction
Lecture 02 - Multi-armed Bandit
Lecture 03 - Markov Decision Processes
Lecture 04 - Dynamic Programming
Lecture 05 - Monte Carlo Methods
Lecture 06 - Temporal Difference
Lecture 07 - Temporal Difference
Lecture 08 - N-step Bootstrapping
Lecture 09 - Planning and Learning
Lecture 10 - Function Approximation
Lecture 11 - Eligibility Traces
Lab 01: K-armed Bandit
Lab 01 solution
Lab 02: Markov Decision Process - Gymnasium Basics
Lab 03: Dynamic Programming - Gambler's problem
Lab 03 solution
Lab 04: Monte Carlo - Blackjack
Lab 04 solution
Lab 05: Temporal Difference - Frozen Lake
Lab 05 solution
Lab 06: N-step TD - Taxi
Lab 06 solution
Lab 07: Planning and Learning - Maze
Lab 07 solution
Lab 08: Function Approximation - Tile Coding
Lab 08 solution