Deep Reinforcement Learning

Summary

This repository contains implementations of most of the core algorithms in Deep Reinforcement Learning (DRL), as listed below. All implementations are in Python and use PyTorch for models, optimizers, and training. The algorithms can be used with any Gymnasium environment, or with any other environment that follows the Gymnasium API. Each algorithm is implemented standalone and is therefore independent of the others, even where they share many ideas. This was done to keep each algorithm's code easy to read. The library is organized into directories, each covering a specific class of DRL algorithms. These currently include the following:

Core: Most of the main on-policy and off-policy DRL algorithms, listed in detail below.
Exploration: Algorithms that improve the DRL agent's ability to explore its environment, typically targeting environments with a sparse reward signal. In this repository, the category also includes algorithms for safe exploration.
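As a rough illustration of what "following the Gymnasium API" means in the summary above, the sketch below defines a toy environment with Gymnasium-style `reset` and `step` signatures and a generic rollout loop. `CoinFlipEnv` and `run_episode` are hypothetical names made up for this example; they are not part of this repository, and the toy class deliberately does not import Gymnasium.

```python
import random


class CoinFlipEnv:
    """Toy environment exposing Gymnasium-style reset/step signatures.

    Hypothetical example: not part of this repository.
    """

    def __init__(self, horizon=5):
        self.horizon = horizon
        self._t = 0
        self._rng = random.Random()

    def reset(self, seed=None, options=None):
        # Gymnasium's reset returns (observation, info).
        if seed is not None:
            self._rng.seed(seed)
        self._t = 0
        return 0, {}

    def step(self, action):
        # Gymnasium's step returns
        # (observation, reward, terminated, truncated, info).
        coin = self._rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0
        self._t += 1
        terminated = False                   # no natural terminal state here
        truncated = self._t >= self.horizon  # time limit reached
        return coin, reward, terminated, truncated, {}


def run_episode(env, policy, seed=None):
    """Roll out one episode and return (total_reward, num_steps)."""
    obs, info = env.reset(seed=seed)
    total_reward, steps, done = 0.0, 0, False
    while not done:
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        steps += 1
        done = terminated or truncated
    return total_reward, steps
```

Any object with these two methods and return shapes can be driven by the same loop, for example `run_episode(CoinFlipEnv(), lambda obs: 0, seed=0)`.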

Features

MLP, CNN and CNN-LSTM (Recurrent) Policies
TensorBoard integration for logging
Parallel vector environments
Nvidia GPU support
Model saving, checkpointing and ability to start training from an existing model's parameters
Environment saving and loading for both base and arbitrarily wrapped environments
Policy testing using saved environments and models in addition to easy video recording
Support for learning rate scheduling
Parameter sharing for CNN-based architectures (except for TRPO)
Return normalization and action rescaling to [-1, 1] for Box action spaces
Flexible sequence lengths for recurrent policies with adjustable 'burn-in' periods and hidden state management for uninterrupted rollouts
Extremely customizable algorithm and architecture configurations through scripting or the terminal
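To make the action rescaling feature listed above concrete, here is a minimal sketch of the affine map between a Box action space's bounds and [-1, 1]. The function names are hypothetical and plain Python lists stand in for tensors; the repository's own code may differ.

```python
def scale_action(action, low, high):
    """Map an action from [low, high] to [-1, 1], element-wise."""
    return [2.0 * (a - lo) / (hi - lo) - 1.0
            for a, lo, hi in zip(action, low, high)]


def unscale_action(scaled, low, high):
    """Map an action from [-1, 1] back to [low, high], element-wise."""
    return [lo + 0.5 * (s + 1.0) * (hi - lo)
            for s, lo, hi in zip(scaled, low, high)]
```

With this convention the policy can always output values in [-1, 1] (for instance through a tanh), and `unscale_action` converts them to the environment's bounds just before calling `step`.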

Currently Available Algorithms and Future Developments

Core

Deep Q-Network (DQN)
- Basic DQNs
- Double DQNs (DDQN)
- Dueling DQNs
- DQNs with Prioritized Experience Replay (PER)
Advantage Actor-Critic (A2C)
Trust Region Policy Optimization (TRPO)
Proximal Policy Optimization (PPO)
Deep Deterministic Policy Gradient (DDPG)
Twin Delayed Deep Deterministic Policy Gradient (TD3)
Soft Actor-Critic (SAC)
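As an example of how closely related some of the algorithms listed above are, the sketch below contrasts the bootstrap target of a basic DQN with that of a Double DQN for a single transition. The function names and the use of plain lists for Q-values are assumptions made for illustration; they do not reflect this repository's code.

```python
def dqn_target(reward, done, next_q_target, gamma=0.99):
    """Basic DQN: bootstrap from the target network's maximum Q-value."""
    bootstrap = 0.0 if done else max(next_q_target)
    return reward + gamma * bootstrap


def double_dqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Double DQN: the online network selects the action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]
```

Decoupling action selection from evaluation is what reduces the overestimation bias of the plain max operator.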

Exploration

Curiosity-driven Exploration by Self-supervised Prediction (CDESP)
Hindsight Experience Replay (HER)
Adaptive Policy ReguLarization (APRL)
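For the exploration algorithms listed above, the core idea of Hindsight Experience Replay can be sketched in a few lines: transitions from an episode are stored a second time with the goal replaced by one that was actually achieved. The sketch uses the "final" relabeling strategy and a hypothetical dictionary layout (`achieved_goal`, `goal`, `reward`); the layout, the sparse reward convention, and the function names are assumptions, not this repository's data structures.

```python
def sparse_reward(achieved_goal, goal):
    """0 when the goal is reached, -1 otherwise (a common sparse convention)."""
    return 0.0 if achieved_goal == goal else -1.0


def relabel_final(episode, reward_fn=sparse_reward):
    """Return copies of the transitions with the goal replaced by the goal
    achieved at the end of the episode ('final' strategy)."""
    new_goal = episode[-1]["achieved_goal"]
    relabeled = []
    for transition in episode:
        relabeled.append({
            **transition,
            "goal": new_goal,
            # Recompute the reward with respect to the substituted goal.
            "reward": reward_fn(transition["achieved_goal"], new_goal),
        })
    return relabeled
```

Both the original and the relabeled transitions would then go into the replay buffer, so that even failed episodes produce some transitions with a non-negative reward.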

Hierarchical RL

To do:
- Hierarchical Actor-Critic (HAC)
- Double Actor-Critic (DAC)

Imitation Learning

Offline RL

Physics & Model-Based RL

Other

To do:
- Adversarial RL
- Meta RL
- Policy Distillation

Acknowledgments

OpenAI's Spinning Up, which was my main source of information (in addition to the original papers) for learning about the core algorithms and Gym.
Stable-Baselines3 (SB3), mainly for clearing up confusion about parameter sharing in on-policy algorithms and as a guide for default hyperparameter values.
The amazing blog post The 37 Implementation Details of Proximal Policy Optimization by Huang et al., which dives into all the important details of PPO's implementation.
The Generalized Advantage Estimation (GAE) paper and the Recurrent Replay Distributed DQN (R2D2) paper, which cleared up many points of confusion about recurrent policies in general.
The amazing book Dive into Deep Learning by Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola, for Deep Learning using PyTorch.
Note: All corresponding papers are linked with their algorithms above. For SAC, the link corresponds to the second paper, which the implementation is based on and which describes automatic temperature coefficient adjustment using dual gradient descent. The original SAC paper can be found here.
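The automatic temperature adjustment mentioned in the note can be sketched as a single gradient step. The version below optimizes log(alpha) rather than alpha itself, which is a common implementation choice and an assumption here, not a statement about this repository's code. The function name, the learning rate, and the use of plain Python floats are likewise illustrative.

```python
import math


def temperature_step(log_alpha, log_probs, target_entropy, lr=1e-3):
    """One gradient-descent step on log(alpha).

    Loss:     L = -log_alpha * mean(log_prob + target_entropy)
    Gradient: dL/dlog_alpha = -mean(log_prob + target_entropy)
    """
    mean_term = sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    grad = -mean_term
    return log_alpha - lr * grad


def alpha_from_log(log_alpha):
    """Recover the (always positive) temperature from its logarithm."""
    return math.exp(log_alpha)
```

When the policy's entropy (the negative mean log-probability) is above the target, the step lowers the temperature; when it is below the target, the step raises it, which pushes the policy back toward more exploration.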
