Collection of exercises and minimal DRL implementations used for teaching. All algorithms are implemented in PyTorch. Note that these algorithms are intended for pedagogical purposes and therefore include few implementation tricks: the goal of this code is to give a clear view of how DRL algorithms are implemented, so its performance may be low compared to state-of-the-art implementations such as Stable Baselines 3.
The following examples and exercises cover basic ideas needed to understand RL:
- Multi-armed bandit, which is devoted to implementing the epsilon-greedy algorithm and using it to solve the multi-armed bandit problem.
- Stationary distribution of an MP, which is devoted to computing the stationary distribution of a Markov Process (MP).
- Gym interface example, which is devoted to showing how to use the Gym interface to create your own RL environments.
- Bellman fixed point for PE, which is devoted to implementing the Bellman fixed point equations for policy evaluation (PE) in a simple MDP.
- Bellman fixed point for PE in a Random Walk, which is similar to the previous code, but uses a Random Walk with terminal states as the MDP.
- Bellman random Policy Search, which is devoted to finding the optimal policy using random search.
- Bellman random Policy Search in the Cliff, which is similar to the previous code, but using the Cliff problem as the MDP.
- Optimal policy using Bellman Equations, which is devoted to finding the optimal policy by solving the Bellman Equations in a simple case of discrete actions.
- The recycling robot, which is devoted to finding the optimal policy by solving the Bellman Equations.
- MDP simple example, which is devoted to showing how to compute the basic elements of an MDP.
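For intuition on the first exercise in the list, a minimal epsilon-greedy bandit loop can be sketched as follows. This is a standalone illustration with made-up arm means and hyperparameters, not the repository's code:

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.1, 0.5, 0.8])  # hypothetical arm means (illustrative)
eps, n_steps = 0.1, 5000
Q = np.zeros(3)  # estimated value of each arm
N = np.zeros(3)  # number of pulls per arm

for _ in range(n_steps):
    # explore with probability eps, otherwise pull the greedy arm
    if rng.random() < eps:
        a = int(rng.integers(3))
    else:
        a = int(np.argmax(Q))
    r = rng.normal(true_means[a], 1.0)  # noisy reward
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]  # incremental sample-average update

best_arm = int(np.argmax(Q))
```

After enough pulls, `best_arm` should identify the arm with the highest true mean, since exploration keeps every estimate from starving.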
The following examples and exercises correspond to some classic RL algorithms, including iterative methods, tabular methods, and linear function approximation methods:
- Policy Evaluation, which is devoted to implementing the policy evaluation algorithm for a simple MDP.
- Policy Iteration, which is devoted to implementing the policy iteration algorithm for a simple MDP.
- Value Iteration, which is devoted to implementing the value iteration algorithm for a simple MDP.
- Grid World, which is devoted to reviewing the iterative methods (PE, PI, VI) in a Grid World problem.
- Every-visit Monte-Carlo, which is devoted to implementing the every-visit Monte-Carlo algorithm for a simple MDP.
- Off-policy Monte-Carlo via Importance Sampling, which is devoted to implementing the off-policy Monte-Carlo algorithm using Importance Sampling for a simple MDP.
- Monte Carlo and Temporal Difference in a simple MDP, which is an example devoted to implementing the Monte Carlo and Temporal Difference algorithms for a simple MDP.
- Monte Carlo and Temporal Difference in the Cliff, which is an example devoted to implementing the Monte Carlo and Temporal Difference algorithms for the Cliff.
- Monte Carlo and Temporal Difference in a random walk, which is devoted to implementing the Monte Carlo and Temporal Difference algorithms for a random walk.
- SARSA and Q-learning in a simple MDP, which is an example devoted to implementing the SARSA and Q-learning algorithms for a simple MDP.
- SARSA and Q-learning in the Cliff, which is devoted to implementing the SARSA and Q-learning algorithms for the Cliff problem.
- SARSA and Q-learning in a random walk, which is devoted to implementing the SARSA and Q-learning algorithms for a random walk.
- Feature basis for linear approximations, which is devoted to implementing a feature basis for a linear approximation.
- Model-based prediction using linear approximations, which is devoted to implementing BPE (Bellman Projected Equation), a model-based prediction algorithm using linear approximations.
- Model-free prediction using linear approximations, which is devoted to implementing LSTD, a model-free prediction algorithm using linear approximations.
- Model-free control using linear approximations, which is devoted to implementing LSPI, a model-free control algorithm using linear approximations.
- Linear approximation limits, which is an example devoted to showing the limits of linear approximations.
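To make the tabular methods above concrete, here is a minimal Q-learning sketch on a toy 5-state chain. The environment and hyperparameters are invented for illustration; this is not one of the repository's MDPs:

```python
import numpy as np

# Hypothetical 5-state chain: states 0..4, start at 0, state 4 is terminal.
# Actions: 0 = left, 1 = right (deterministic moves, clipped at the edges).
# Reaching state 4 gives reward 1; every other transition gives 0.
n_states, n_actions = 5, 2
gamma, alpha, eps = 0.9, 0.5, 0.1
rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))

def step(s, a):
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == n_states - 1), s2 == n_states - 1

for _ in range(500):  # episodes
    s, done = 0, False
    while not done:
        if rng.random() < eps:
            a = int(rng.integers(n_actions))
        else:
            # greedy action with random tie-breaking
            a = int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))
        s2, r, done = step(s, a)
        target = r if done else r + gamma * Q[s2].max()
        Q[s, a] += alpha * (target - Q[s, a])  # Q-learning update
        s = s2

greedy_policy = np.argmax(Q[:-1], axis=1)  # best action per non-terminal state
```

On this chain the learned greedy policy should move right in every non-terminal state, and the Q-values should approach the discounted returns (e.g. about `gamma**k` for moving right `k+1` states away from the goal).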
The following examples implement model-free DRL algorithms (all tested on the Cartpole problem):
- DDQN (Double Deep Q-Networks)
- VPG (Vanilla Policy Gradient)
- A2C (Advantage Actor Critic)
- TRPO (Trust Region Policy Optimization; in this case we use the Stable Baselines 3 implementation instead of providing our own, in order to showcase a state-of-the-art library)
- DDPG (Deep Deterministic Policy Gradient)
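As a flavor of the policy-gradient family above, here is a minimal REINFORCE-style update in PyTorch on a trivial one-step task (two actions; action 1 always pays reward 1, action 0 pays 0). The task and hyperparameters are made up for illustration; this is a sketch, not one of the repository's implementations:

```python
import torch

torch.manual_seed(0)
logits = torch.zeros(2, requires_grad=True)  # policy parameters (softmax logits)
opt = torch.optim.Adam([logits], lr=0.1)

for _ in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    a = dist.sample()
    reward = 1.0 if a.item() == 1 else 0.0  # only action 1 is rewarded
    loss = -dist.log_prob(a) * reward  # REINFORCE: ascend E[R * log pi(a)]
    opt.zero_grad()
    loss.backward()
    opt.step()

probs = torch.softmax(logits.detach(), dim=0)  # learned action probabilities
```

Every rewarded sample increases the log-probability of the action taken, so the policy should concentrate most of its probability mass on action 1.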
For model-based DRL, the only implemented example is AlphaZero (tested on tic-tac-toe).
| Link | Observations |
|---|---|
| Example 7.6 | Code for the example in the slides |
| Example 8.1 | Code for the example in the slides |
| Example 8.3 | Code for the example in the slides |
| Example 8.5 | Homework |
| Example 8.6 | Homework |
| Example 8.7 | Homework |
| Example 9.1 | Code for the example in the slides |
| Example 9.2 | Code for the example in the slides |
| Example 9.3 | Code for the example in the slides |
| Example 9.4 | Homework |
| Example 9.5 | Homework |
| Example 9.6 | Homework |
| Example 9.7 | Homework |
| Example 9.8 | Homework |
| Example 9.9 | Code for the example in the slides |
| Example 9.10 | Code for the example in the slides |
| Example 9.11 | Code for the example in the slides |
| Example 9.12 | Code for the example in the slides |
| Link | Observations |
|---|---|
| Exercise 2.1 | Exercise to be completed by the student |
| Exercise 3.2 | Exercise to be completed by the student |
| Exercise 3.3 | Exercise to be completed by the student |
| Exercise 3.4 | Exercise to be completed by the student |
| Exercise 3.5 | Exercise to be completed by the student |
| Exercise 3.6 | Exercise to be completed by the student |
| Exercise 3.7 | Exercise to be completed by the student |
| Exercise 4.1 | Exercise to be completed by the student |
| Exercise 4.2 | Exercise to be completed by the student |
| Exercise 4.3 | Exercise to be completed by the student |
| Exercise 4.4 | Exercise to be completed by the student |
| Exercise 5.1 | Exercise to be completed by the student |
| Exercise 5.2 | Exercise to be completed by the student |
| Exercise 5.3 | Exercise to be completed by the student |
| Exercise 5.4 | Exercise to be completed by the student |
| Exercise 5.5 | Exercise to be completed by the student |
| Exercise 6.1 | Exercise to be completed by the student |
| Exercise 6.2 | Exercise to be completed by the student |
| Exercise 6.3 | Exercise to be completed by the student |
| Exercise 6.4 | Exercise to be completed by the student |
| Example 7.1 | Code for the example in the slides |
| Example 7.2 | Code for the example in the slides |
| Example 7.3 | Code for the example in the slides |
| Example 7.4 | Code for the example in the slides |
| Example 7.5 | Code for the example in the slides |
| Example 7.6 | Code for the example in the slides |
| Example 7.7 | Code for the example in the slides |
The recommended way of executing these codes is to use Google Colab. The simplest way of doing so is to navigate to the code you want to execute and then replace `github.com` in the URL with `githubtocolab.com` (for example, a hypothetical `https://github.com/<user>/<repo>/blob/main/notebook.ipynb` becomes `https://githubtocolab.com/<user>/<repo>/blob/main/notebook.ipynb`).
A second option is to go to Colab, and in the Open options, select GitHub and add this repository.
And finally, you can also download the code and execute it on your own machine, after installing all required dependencies.
If you are interested in DRL and want to keep on learning, it might be worth checking the following resources:
- Spinning up in DRL is an OpenAI webpage devoted to giving an in-depth introduction to DRL, as well as a set of papers to learn more advanced topics. Their documentation is well-written, and their code is also available and worth checking.
- CleanRL is a project that provides single-file implementations of many DRL algorithms, in order to facilitate understanding. Their documentation is nice, and it is a code repository worth checking.
- Stable Baselines 3 is a high-quality implementation of most DRL algorithms, which is highly recommended if you want to use state-of-the-art implementations of the most popular DRL algorithms. It is a solid alternative to OpenAI Baselines.