Merge pull request #164 from Toni-SM/develop

Merge develop
Toni-SM · Jun 24, 2024 · 636936f
2 parents 631613a + e2d86be
commit 636936f

Showing 164 changed files with 1,530 additions and 1,246 deletions.
diff --git a/.github/ISSUE_TEMPLATE/bug_report.yaml b/.github/ISSUE_TEMPLATE/bug_report.yaml
@@ -30,11 +30,14 @@ body:
     description: The skrl version can be obtained with the command `pip show skrl`.
     options:
       - ---
+      - 1.2.0
+      - 1.1.0
       - 1.0.0
       - 1.0.0-rc2
       - 1.0.0-rc1
       - 0.10.2 or 0.10.1
       - 0.10.0 or earlier
+      - develop branch
   validations:
     required: true
 - type: input

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,6 +2,20 @@
 
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 
+## [1.2.0] - 2024-06-23
+### Added
+- Define the `environment_info` trainer config to log environment info (PyTorch implementation)
+- Add support for automatically computing the write and checkpoint intervals and making it the default option
+- Single forward-pass in shared models
+- Distributed multi-GPU and multi-node learning (PyTorch implementation)
+
+### Changed
+- Update Orbit-related source code and docs to Isaac Lab
+
+### Fixed
+- Move the batch sampling inside the gradient step loop for DDPG and TD3
+- Perform JAX computation on the selected device
+
 ## [1.1.0] - 2024-02-12
 ### Added
 - MultiCategorical mixin to operate MultiDiscrete action spaces

diff --git a/README.md b/README.md
@@ -17,7 +17,7 @@
 <h2 align="center" style="border-bottom: 0 !important;">SKRL - Reinforcement Learning library</h2>
 <br>
 
-**skrl** is an open-source modular library for Reinforcement Learning written in Python (on top of [PyTorch](https://pytorch.org/) and [JAX](https://jax.readthedocs.io)) and designed with a focus on modularity, readability, simplicity, and transparency of algorithm implementation. In addition to supporting the OpenAI [Gym](https://www.gymlibrary.dev) / Farama [Gymnasium](https://gymnasium.farama.org) and [DeepMind](https://github.com/deepmind/dm_env) and other environment interfaces, it allows loading and configuring [NVIDIA Isaac Gym](https://developer.nvidia.com/isaac-gym/), [NVIDIA Isaac Orbit](https://isaac-orbit.github.io/orbit/index.html) and [NVIDIA Omniverse Isaac Gym](https://docs.omniverse.nvidia.com/isaacsim/latest/tutorial_gym_isaac_gym.html) environments, enabling agents' simultaneous training by scopes (subsets of environments among all available environments), which may or may not share resources, in the same run.
+**skrl** is an open-source modular library for Reinforcement Learning written in Python (on top of [PyTorch](https://pytorch.org/) and [JAX](https://jax.readthedocs.io)) and designed with a focus on modularity, readability, simplicity, and transparency of algorithm implementation. In addition to supporting the OpenAI [Gym](https://www.gymlibrary.dev) / Farama [Gymnasium](https://gymnasium.farama.org) and [DeepMind](https://github.com/deepmind/dm_env) and other environment interfaces, it allows loading and configuring [NVIDIA Isaac Gym](https://developer.nvidia.com/isaac-gym/), [NVIDIA Omniverse Isaac Gym](https://docs.omniverse.nvidia.com/isaacsim/latest/tutorial_gym_isaac_gym.html) and [NVIDIA Isaac Lab](https://isaac-sim.github.io/IsaacLab/index.html) environments, enabling agents' simultaneous training by scopes (subsets of environments among all available environments), which may or may not share resources, in the same run.
 
 <br>
 

diff --git a/...urce/_static/imgs/example_isaac_orbit.png → .../source/_static/imgs/example_isaaclab.png b/...urce/_static/imgs/example_isaac_orbit.png → .../source/_static/imgs/example_isaaclab.png
diff --git a/docs/source/api/agents/a2c.rst b/docs/source/api/agents/a2c.rst
@@ -232,6 +232,10 @@ Support for advanced features is described in the next table
       - RNN, LSTM, GRU and any other variant
       - .. centered:: :math:`\blacksquare`
       - .. centered:: :math:`\square`
+    * - Distributed
+      - Single Program Multi Data (SPMD) multi-GPU
+      - .. centered:: :math:`\blacksquare`
+      - .. centered:: :math:`\square`
 
 .. raw:: html
 

diff --git a/docs/source/api/agents/amp.rst b/docs/source/api/agents/amp.rst
@@ -237,6 +237,10 @@ Support for advanced features is described in the next table
       - \-
       - .. centered:: :math:`\square`
       - .. centered:: :math:`\square`
+    * - Distributed
+      - Single Program Multi Data (SPMD) multi-GPU
+      - .. centered:: :math:`\blacksquare`
+      - .. centered:: :math:`\square`
 
 .. raw:: html
 

diff --git a/docs/source/api/agents/cem.rst b/docs/source/api/agents/cem.rst
@@ -175,6 +175,10 @@ Support for advanced features is described in the next table
       - \-
       - .. centered:: :math:`\square`
       - .. centered:: :math:`\square`
+    * - Distributed
+      - \-
+      - .. centered:: :math:`\square`
+      - .. centered:: :math:`\square`
 
 .. raw:: html
 

diff --git a/docs/source/api/agents/ddpg.rst b/docs/source/api/agents/ddpg.rst
@@ -47,10 +47,10 @@ Learning algorithm
 
 |
 | :literal:`_update(...)`
-| :green:`# sample a batch from memory`
-| [:math:`s, a, r, s', d`] :math:`\leftarrow` states, actions, rewards, next_states, dones of size :guilabel:`batch_size`
 | :green:`# gradient steps`
 | **FOR** each gradient step up to :guilabel:`gradient_steps` **DO**
+|     :green:`# sample a batch from memory`
+|     [:math:`s, a, r, s', d`] :math:`\leftarrow` states, actions, rewards, next_states, dones of size :guilabel:`batch_size`
 |     :green:`# compute target values`
 |     :math:`a' \leftarrow \mu_{\theta_{target}}(s')`
 |     :math:`Q_{_{target}} \leftarrow Q_{\phi_{target}}(s', a')`
@@ -236,6 +236,10 @@ Support for advanced features is described in the next table
       - RNN, LSTM, GRU and any other variant
       - .. centered:: :math:`\blacksquare`
       - .. centered:: :math:`\square`
+    * - Distributed
+      - Single Program Multi Data (SPMD) multi-GPU
+      - .. centered:: :math:`\blacksquare`
+      - .. centered:: :math:`\square`
 
 .. raw:: html
 

diff --git a/docs/source/api/agents/ddqn.rst b/docs/source/api/agents/ddqn.rst
@@ -184,6 +184,10 @@ Support for advanced features is described in the next table
       - \-
       - .. centered:: :math:`\square`
       - .. centered:: :math:`\square`
+    * - Distributed
+      - Single Program Multi Data (SPMD) multi-GPU
+      - .. centered:: :math:`\blacksquare`
+      - .. centered:: :math:`\square`
 
 .. raw:: html
 

diff --git a/docs/source/api/agents/dqn.rst b/docs/source/api/agents/dqn.rst
@@ -184,6 +184,10 @@ Support for advanced features is described in the next table
       - \-
       - .. centered:: :math:`\square`
       - .. centered:: :math:`\square`
+    * - Distributed
+      - Single Program Multi Data (SPMD) multi-GPU
+      - .. centered:: :math:`\blacksquare`
+      - .. centered:: :math:`\square`
 
 .. raw:: html
 

diff --git a/docs/source/api/agents/ppo.rst b/docs/source/api/agents/ppo.rst
@@ -248,6 +248,10 @@ Support for advanced features is described in the next table
       - RNN, LSTM, GRU and any other variant
       - .. centered:: :math:`\blacksquare`
       - .. centered:: :math:`\square`
+    * - Distributed
+      - Single Program Multi Data (SPMD) multi-GPU
+      - .. centered:: :math:`\blacksquare`
+      - .. centered:: :math:`\square`
 
 .. raw:: html
 

diff --git a/docs/source/api/agents/rpo.rst b/docs/source/api/agents/rpo.rst
@@ -285,6 +285,10 @@ Support for advanced features is described in the next table
       - RNN, LSTM, GRU and any other variant
       - .. centered:: :math:`\blacksquare`
       - .. centered:: :math:`\square`
+    * - Distributed
+      - Single Program Multi Data (SPMD) multi-GPU
+      - .. centered:: :math:`\blacksquare`
+      - .. centered:: :math:`\square`
 
 .. raw:: html
 

diff --git a/docs/source/api/agents/sac.rst b/docs/source/api/agents/sac.rst
@@ -244,6 +244,10 @@ Support for advanced features is described in the next table
       - RNN, LSTM, GRU and any other variant
       - .. centered:: :math:`\blacksquare`
       - .. centered:: :math:`\square`
+    * - Distributed
+      - Single Program Multi Data (SPMD) multi-GPU
+      - .. centered:: :math:`\blacksquare`
+      - .. centered:: :math:`\square`
 
 .. raw:: html
 

diff --git a/docs/source/api/agents/td3.rst b/docs/source/api/agents/td3.rst
@@ -47,10 +47,10 @@ Learning algorithm
 
 |
 | :literal:`_update(...)`
-| :green:`# sample a batch from memory`
-| [:math:`s, a, r, s', d`] :math:`\leftarrow` states, actions, rewards, next_states, dones of size :guilabel:`batch_size`
 | :green:`# gradient steps`
 | **FOR** each gradient step up to :guilabel:`gradient_steps` **DO**
+|     :green:`# sample a batch from memory`
+|     [:math:`s, a, r, s', d`] :math:`\leftarrow` states, actions, rewards, next_states, dones of size :guilabel:`batch_size`
 |     :green:`# target policy smoothing`
 |     :math:`a' \leftarrow \mu_{\theta_{target}}(s')`
 |     :math:`noise \leftarrow \text{clip}(` :guilabel:`smooth_regularization_noise` :math:`, -c, c) \qquad` with :math:`c` as :guilabel:`smooth_regularization_clip`
@@ -258,6 +258,10 @@ Support for advanced features is described in the next table
       - RNN, LSTM, GRU and any other variant
       - .. centered:: :math:`\blacksquare`
       - .. centered:: :math:`\square`
+    * - Distributed
+      - Single Program Multi Data (SPMD) multi-GPU
+      - .. centered:: :math:`\blacksquare`
+      - .. centered:: :math:`\square`
 
 .. raw:: html
 

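The revised DDPG and TD3 pseudocode above draws a new batch from memory inside the gradient-step loop instead of once before it. The following is a minimal sketch of that control flow only, assuming a hypothetical `ToyMemory` class and `update` function (neither is skrl's `_update(...)` implementation); the target and loss computations are elided.

```python
import random
from typing import List, Tuple

# (state, action, reward, next_state, done); scalar fields keep the sketch small
Transition = Tuple[float, float, float, float, bool]


class ToyMemory:
    """Hypothetical replay memory, used only for illustration."""

    def __init__(self, transitions: List[Transition]) -> None:
        self.transitions = transitions

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        # uniform sampling without replacement within a single batch
        return rng.sample(self.transitions, batch_size)


def update(memory: ToyMemory, gradient_steps: int, batch_size: int,
           seed: int = 0) -> List[List[Transition]]:
    """Hypothetical update loop: one fresh batch per gradient step."""
    rng = random.Random(seed)
    used_batches = []
    for _ in range(gradient_steps):
        # sample a batch from memory *inside* the loop (the fixed behavior)
        batch = memory.sample(batch_size, rng)
        # ... compute target values, critic and policy losses on `batch` here ...
        used_batches.append(batch)
    return used_batches


if __name__ == "__main__":
    memory = ToyMemory([(float(i), 0.0, 1.0, float(i + 1), False) for i in range(100)])
    batches = update(memory, gradient_steps=3, batch_size=8)
    print(len(batches), [len(b) for b in batches])
```

In this sketch, each of the `gradient_steps` updates sees a different batch, rather than every step repeatedly fitting the single batch sampled before the loop.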
diff --git a/docs/source/api/agents/trpo.rst b/docs/source/api/agents/trpo.rst
@@ -282,6 +282,10 @@ Support for advanced features is described in the next table
       - RNN, LSTM, GRU and any other variant
       - .. centered:: :math:`\blacksquare`
       - .. centered:: :math:`\square`
+    * - Distributed
+      - Single Program Multi Data (SPMD) multi-GPU
+      - .. centered:: :math:`\blacksquare`
+      - .. centered:: :math:`\square`
 
 .. raw:: html
 

diff --git a/docs/source/api/config/frameworks.rst b/docs/source/api/config/frameworks.rst
@@ -7,6 +7,65 @@ Configurations for behavior modification of Machine Learning (ML) frameworks.
 
     <br><hr>
 
+PyTorch
+-------
+
+PyTorch-specific configuration
+
+.. raw:: html
+
+    <br>
+
+API
+^^^
+
+.. py:data:: skrl.config.torch.device
+    :type: torch.device
+    :value: "cuda:${LOCAL_RANK}" | "cpu"
+
+    Default device
+
+    The default device, unless specified, is ``cuda:0`` (or ``cuda:LOCAL_RANK`` in a distributed environment) if CUDA is available, ``cpu`` otherwise
+
+.. py:data:: skrl.config.local_rank
+    :type: int
+    :value: 0
+
+    The rank of the worker/process (e.g.: GPU) within a local worker group (e.g.: node)
+
+    This property reads from the ``LOCAL_RANK`` environment variable (``0`` if it doesn't exist).
+    See `torch.distributed <https://pytorch.org/docs/stable/distributed.html>`_ for more details
+
+.. py:data:: skrl.config.rank
+    :type: int
+    :value: 0
+
+    The rank of the worker/process (e.g.: GPU) within a worker group (e.g.: across all nodes)
+
+    This property reads from the ``RANK`` environment variable (``0`` if it doesn't exist).
+    See `torch.distributed <https://pytorch.org/docs/stable/distributed.html>`_ for more details
+
+.. py:data:: skrl.config.world_size
+    :type: int
+    :value: 1
+
+    The total number of workers/processes (e.g.: GPUs) in a worker group (e.g.: across all nodes)
+
+    This property reads from the ``WORLD_SIZE`` environment variable (``1`` if it doesn't exist).
+    See `torch.distributed <https://pytorch.org/docs/stable/distributed.html>`_ for more details
+
+.. py:data:: skrl.config.is_distributed
+    :type: bool
+    :value: False
+
+    Whether running in a distributed environment
+
+    This property is ``True`` when PyTorch's distributed environment variable ``WORLD_SIZE`` is greater than ``1``
+
+.. raw:: html
+
+    <br>
+
 JAX
 ---
 

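The PyTorch configuration properties documented above are derived from the environment variables that launchers such as `torchrun` set for each process. The sketch below restates those documented defaults in code. It assumes a hypothetical `read_distributed_info` helper and `DistributedInfo` dataclass (not part of skrl), and it uses a `cuda_available` flag in place of a real CUDA check so that it runs without PyTorch.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DistributedInfo:
    local_rank: int       # rank within the node (LOCAL_RANK, default 0)
    rank: int             # global rank across all nodes (RANK, default 0)
    world_size: int       # total number of processes (WORLD_SIZE, default 1)
    is_distributed: bool  # True when WORLD_SIZE > 1
    device: str           # "cuda:<LOCAL_RANK>" if CUDA is available, else "cpu"


def read_distributed_info(env: Optional[Mapping[str, str]] = None,
                          cuda_available: bool = False) -> DistributedInfo:
    """Hypothetical helper mirroring the documented defaults (not skrl's code)."""
    env = os.environ if env is None else env
    local_rank = int(env.get("LOCAL_RANK", "0"))
    rank = int(env.get("RANK", "0"))
    world_size = int(env.get("WORLD_SIZE", "1"))
    is_distributed = world_size > 1
    device = f"cuda:{local_rank}" if cuda_available else "cpu"
    return DistributedInfo(local_rank, rank, world_size, is_distributed, device)


if __name__ == "__main__":
    # e.g. the second GPU on the second node of a 2-node x 2-GPU job
    info = read_distributed_info(
        {"LOCAL_RANK": "1", "RANK": "3", "WORLD_SIZE": "4"}, cuda_available=True
    )
    print(info)
```

Without any of these variables set, the sketch falls back to a single, non-distributed process on `cuda:0` (or `cpu`), which matches the defaults listed for `device`, `local_rank`, `rank`, `world_size` and `is_distributed`.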
diff --git a/docs/source/api/envs.rst b/docs/source/api/envs.rst
@@ -7,16 +7,16 @@ Environments
     Wrapping (single-agent) <envs/wrapping>
     Wrapping (multi-agents) <envs/multi_agents_wrapping>
     Isaac Gym environments <envs/isaac_gym>
-    Isaac Orbit environments <envs/isaac_orbit>
     Omniverse Isaac Gym environments <envs/omniverse_isaac_gym>
+    Isaac Lab environments <envs/isaaclab>
 
 The environment plays a fundamental and crucial role in defining the RL setup. It is the place where the agent interacts, and it is responsible for providing the agent with information about its current state, as well as the rewards/penalties associated with each action.
 
 .. raw:: html
 
     <br><hr>
 
-Grouped in this section you will find how to load environments from NVIDIA Isaac Gym, Isaac Orbit and Omniverse Isaac Gym with a simple function.
+This section describes how to load environments from NVIDIA Isaac Gym, Omniverse Isaac Gym and Isaac Lab with a simple function.
 
 In addition, you will be able to :doc:`wrap single-agent <envs/wrapping>` and :doc:`multi-agent <envs/multi_agents_wrapping>` RL environment interfaces.
 
@@ -29,10 +29,10 @@ In addition, you will be able to :doc:`wrap single-agent <envs/wrapping>` and :d
     * - :doc:`Isaac Gym environments <envs/isaac_gym>`
       - .. centered:: :math:`\blacksquare`
       - .. centered:: :math:`\blacksquare`
-    * - :doc:`Isaac Orbit environments <envs/isaac_orbit>`
+    * - :doc:`Omniverse Isaac Gym environments <envs/omniverse_isaac_gym>`
       - .. centered:: :math:`\blacksquare`
       - .. centered:: :math:`\blacksquare`
-    * - :doc:`Omniverse Isaac Gym environments <envs/omniverse_isaac_gym>`
+    * - :doc:`Isaac Lab environments <envs/isaaclab>`
       - .. centered:: :math:`\blacksquare`
       - .. centered:: :math:`\blacksquare`
 
@@ -57,10 +57,10 @@ In addition, you will be able to :doc:`wrap single-agent <envs/wrapping>` and :d
     * - Isaac Gym (previews)
       - .. centered:: :math:`\blacksquare`
       - .. centered:: :math:`\blacksquare`
-    * - Isaac Orbit
+    * - Omniverse Isaac Gym |_5| |_5| |_5| |_5| |_2|
       - .. centered:: :math:`\blacksquare`
       - .. centered:: :math:`\blacksquare`
-    * - Omniverse Isaac Gym |_5| |_5| |_5| |_5| |_2|
+    * - Isaac Lab
       - .. centered:: :math:`\blacksquare`
       - .. centered:: :math:`\blacksquare`
     * - PettingZoo

diff --git a/docs/source/api/envs/isaac_orbit.rst b/docs/source/api/envs/isaac_orbit.rst