Merge pull request #106 from Toni-SM/develop

Develop
Toni-SM · Aug 11, 2023 · 1a32596
2 parents 00a2fd3 + 0000bdf
commit 1a32596

Showing 335 changed files with 3,307 additions and 2,166 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,6 +2,16 @@
 
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 
+## [1.0.0-rc.2] - Unreleased
+### Added
+- Get truncation from `time_outs` info in Isaac Gym, Isaac Orbit and Omniverse Isaac Gym environments
+- Time-limit (truncation) bootstrapping in on-policy actor-critic agents
+- Model instantiators `initial_log_std` parameter to set the log standard deviation's initial value
+
+### Changed
+- Structure environment loaders and wrappers file hierarchy coherently [**breaking change**]
+- Drop support for versions prior to PyTorch 1.9 (1.8.0 and 1.8.1)
+
 ## [1.0.0-rc.1] - 2023-07-25
 ### Added
 - JAX support (with Flax and Optax)
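
The "time-limit (truncation) bootstrapping" entry in the changelog hunk above can be illustrated with a minimal sketch. It assumes the common formulation in which a step cut off by a time limit has the discounted value estimate added to its reward, so the return is not treated as if the episode had truly ended. The function name `bootstrap_truncated_rewards`, its parameters, and the numbers are hypothetical and are not taken from the skrl source.

```python
from typing import List


def bootstrap_truncated_rewards(
    rewards: List[float],
    values: List[float],
    truncated: List[bool],
    discount_factor: float = 0.99,
) -> List[float]:
    """Add the discounted value estimate to the reward of every truncated step.

    Steps that were not truncated keep their original reward.
    """
    return [
        r + discount_factor * v if t else r
        for r, v, t in zip(rewards, values, truncated)
    ]


# Hypothetical data for three parallel environments
rewards = [1.0, 1.0, 1.0]
values = [10.0, 20.0, 30.0]        # critic estimates supplied by the caller
truncated = [False, True, False]   # e.g. derived from a `time_outs` entry in the step info
print(bootstrap_truncated_rewards(rewards, values, truncated, discount_factor=0.5))
# → [1.0, 11.0, 1.0]
```

Only the second environment hit its time limit here, so only its reward is adjusted (1.0 + 0.5 × 20.0).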

diff --git a/README.md b/README.md
@@ -15,7 +15,7 @@
 <h2 align="center" style="border-bottom: 0 !important;">SKRL - Reinforcement Learning library</h2>
 <br>
 
-**skrl** is an open-source modular library for Reinforcement Learning written in Python (on top of [PyTorch](https://pytorch.org/) and [JAX](https://jax.readthedocs.io) and designed with a focus on modularity, readability, simplicity, and transparency of algorithm implementation. In addition to supporting the OpenAI [Gym](https://www.gymlibrary.dev) / Farama [Gymnasium](https://gymnasium.farama.org) and [DeepMind](https://github.com/deepmind/dm_env) and other environment interfaces, it allows loading and configuring [NVIDIA Isaac Gym](https://developer.nvidia.com/isaac-gym/), [NVIDIA Isaac Orbit](https://isaac-orbit.github.io/orbit/index.html) and [NVIDIA Omniverse Isaac Gym](https://docs.omniverse.nvidia.com/app_isaacsim/app_isaacsim/tutorial_gym_isaac_gym.html) environments, enabling agents' simultaneous training by scopes (subsets of environments among all available environments), which may or may not share resources, in the same run.
+**skrl** is an open-source modular library for Reinforcement Learning written in Python (on top of [PyTorch](https://pytorch.org/) and [JAX](https://jax.readthedocs.io)) and designed with a focus on modularity, readability, simplicity, and transparency of algorithm implementation. In addition to supporting the OpenAI [Gym](https://www.gymlibrary.dev) / Farama [Gymnasium](https://gymnasium.farama.org) and [DeepMind](https://github.com/deepmind/dm_env) and other environment interfaces, it allows loading and configuring [NVIDIA Isaac Gym](https://developer.nvidia.com/isaac-gym/), [NVIDIA Isaac Orbit](https://isaac-orbit.github.io/orbit/index.html) and [NVIDIA Omniverse Isaac Gym](https://docs.omniverse.nvidia.com/isaacsim/latest/tutorial_gym_isaac_gym.html) environments, enabling agents' simultaneous training by scopes (subsets of environments among all available environments), which may or may not share resources, in the same run.
 
 <br>
 

diff --git a/docs/source/_static/imgs/example_parallel.jpg b/docs/source/_static/imgs/example_parallel.jpg
diff --git a/docs/source/_static/imgs/model_categorical-dark.svg b/docs/source/_static/imgs/model_categorical-dark.svg
diff --git a/docs/source/_static/imgs/model_categorical-light.svg b/docs/source/_static/imgs/model_categorical-light.svg
diff --git a/docs/source/_static/imgs/model_deterministic-dark.svg b/docs/source/_static/imgs/model_deterministic-dark.svg
diff --git a/docs/source/_static/imgs/model_deterministic-light.svg b/docs/source/_static/imgs/model_deterministic-light.svg
diff --git a/docs/source/_static/imgs/model_gaussian-dark.svg b/docs/source/_static/imgs/model_gaussian-dark.svg
diff --git a/docs/source/_static/imgs/model_gaussian-light.svg b/docs/source/_static/imgs/model_gaussian-light.svg
diff --git a/docs/source/_static/imgs/model_multivariate_gaussian-dark.svg b/docs/source/_static/imgs/model_multivariate_gaussian-dark.svg
diff --git a/docs/source/_static/imgs/model_multivariate_gaussian-light.svg b/docs/source/_static/imgs/model_multivariate_gaussian-light.svg
diff --git a/docs/source/api/agents/a2c.rst b/docs/source/api/agents/a2c.rst
@@ -143,8 +143,8 @@ Configuration and hyperparameters
 
 .. literalinclude:: ../../../../skrl/agents/torch/a2c/a2c.py
     :language: python
-    :lines: 18-54
-    :linenos:
+    :start-after: [start-config-dict-torch]
+    :end-before: [end-config-dict-torch]
 
 .. raw:: html
 

diff --git a/docs/source/api/agents/amp.rst b/docs/source/api/agents/amp.rst
@@ -139,8 +139,8 @@ Configuration and hyperparameters
 
 .. literalinclude:: ../../../../skrl/agents/torch/amp/amp.py
     :language: python
-    :lines: 18-71
-    :linenos:
+    :start-after: [start-config-dict-torch]
+    :end-before: [end-config-dict-torch]
 
 .. raw:: html
 

diff --git a/docs/source/api/agents/cem.rst b/docs/source/api/agents/cem.rst
@@ -98,8 +98,8 @@ Configuration and hyperparameters
 
 .. literalinclude:: ../../../../skrl/agents/torch/cem/cem.py
     :language: python
-    :lines: 15-44
-    :linenos:
+    :start-after: [start-config-dict-torch]
+    :end-before: [end-config-dict-torch]
 
 .. raw:: html
 

diff --git a/docs/source/api/agents/ddpg.rst b/docs/source/api/agents/ddpg.rst
@@ -138,8 +138,8 @@ Configuration and hyperparameters
 
 .. literalinclude:: ../../../../skrl/agents/torch/ddpg/ddpg.py
     :language: python
-    :lines: 16-56
-    :linenos:
+    :start-after: [start-config-dict-torch]
+    :end-before: [end-config-dict-torch]
 
 .. raw:: html
 

diff --git a/docs/source/api/agents/ddqn.rst b/docs/source/api/agents/ddqn.rst
@@ -98,8 +98,8 @@ Configuration and hyperparameters
 
 .. literalinclude:: ../../../../skrl/agents/torch/dqn/ddqn.py
     :language: python
-    :lines: 16-55
-    :linenos:
+    :start-after: [start-config-dict-torch]
+    :end-before: [end-config-dict-torch]
 
 .. raw:: html
 

diff --git a/docs/source/api/agents/dqn.rst b/docs/source/api/agents/dqn.rst
@@ -98,8 +98,8 @@ Configuration and hyperparameters
 
 .. literalinclude:: ../../../../skrl/agents/torch/dqn/dqn.py
     :language: python
-    :lines: 16-55
-    :linenos:
+    :start-after: [start-config-dict-torch]
+    :end-before: [end-config-dict-torch]
 
 .. raw:: html
 

diff --git a/docs/source/api/agents/ppo.rst b/docs/source/api/agents/ppo.rst
@@ -71,7 +71,7 @@ Learning algorithm
 |     :green:`# mini-batches loop`
 |     **FOR** each mini-batch [:math:`s, a, logp, V, R, A`] up to :guilabel:`mini_batches` **DO**
 |         :math:`logp' \leftarrow \pi_\theta(s, a)`
-|         :green:`# compute aproximate KL divergence`
+|         :green:`# compute approximate KL divergence`
 |         :math:`ratio \leftarrow logp' - logp`
 |         :math:`KL_{_{divergence}} \leftarrow \frac{1}{N} \sum_{i=1}^N ((e^{ratio} - 1) - ratio)`
 |         :green:`# early stopping with KL divergence`
@@ -159,8 +159,8 @@ Configuration and hyperparameters
 
 .. literalinclude:: ../../../../skrl/agents/torch/ppo/ppo.py
     :language: python
-    :lines: 18-61
-    :linenos:
+    :start-after: [start-config-dict-torch]
+    :end-before: [end-config-dict-torch]
 
 .. raw:: html
 

diff --git a/docs/source/api/agents/q_learning.rst b/docs/source/api/agents/q_learning.rst
@@ -78,8 +78,8 @@ Configuration and hyperparameters
 
 .. literalinclude:: ../../../../skrl/agents/torch/q_learning/q_learning.py
     :language: python
-    :lines: 14-35
-    :linenos:
+    :start-after: [start-config-dict-torch]
+    :end-before: [end-config-dict-torch]
 
 .. raw:: html
 

diff --git a/docs/source/api/agents/rpo.rst b/docs/source/api/agents/rpo.rst
@@ -110,7 +110,7 @@ Learning algorithm
 |     :green:`# mini-batches loop`
 |     **FOR** each mini-batch [:math:`s, a, logp, V, R, A`] up to :guilabel:`mini_batches` **DO**
 |         :math:`logp' \leftarrow \pi_\theta(s, a)`
-|         :green:`# compute aproximate KL divergence`
+|         :green:`# compute approximate KL divergence`
 |         :math:`ratio \leftarrow logp' - logp`
 |         :math:`KL_{_{divergence}} \leftarrow \frac{1}{N} \sum_{i=1}^N ((e^{ratio} - 1) - ratio)`
 |         :green:`# early stopping with KL divergence`
@@ -198,8 +198,8 @@ Configuration and hyperparameters
 
 .. literalinclude:: ../../../../skrl/agents/torch/rpo/rpo.py
     :language: python
-    :lines: 18-62
-    :linenos:
+    :start-after: [start-config-dict-torch]
+    :end-before: [end-config-dict-torch]
 
 .. raw:: html
 

diff --git a/docs/source/api/agents/sac.rst b/docs/source/api/agents/sac.rst
@@ -139,8 +139,8 @@ Configuration and hyperparameters
 
 .. literalinclude:: ../../../../skrl/agents/torch/sac/sac.py
     :language: python
-    :lines: 18-56
-    :linenos:
+    :start-after: [start-config-dict-torch]
+    :end-before: [end-config-dict-torch]
 
 .. raw:: html
 

diff --git a/docs/source/api/agents/sarsa.rst b/docs/source/api/agents/sarsa.rst
@@ -78,8 +78,8 @@ Configuration and hyperparameters
 
 .. literalinclude:: ../../../../skrl/agents/torch/sarsa/sarsa.py
     :language: python
-    :lines: 14-35
-    :linenos:
+    :start-after: [start-config-dict-torch]
+    :end-before: [end-config-dict-torch]
 
 .. raw:: html
 

diff --git a/docs/source/api/agents/td3.rst b/docs/source/api/agents/td3.rst
@@ -148,8 +148,8 @@ Configuration and hyperparameters
 
 .. literalinclude:: ../../../../skrl/agents/torch/td3/td3.py
     :language: python
-    :lines: 19-63
-    :linenos:
+    :start-after: [start-config-dict-torch]
+    :end-before: [end-config-dict-torch]
 
 .. raw:: html
 

diff --git a/docs/source/api/agents/trpo.rst b/docs/source/api/agents/trpo.rst
@@ -195,8 +195,8 @@ Configuration and hyperparameters
 
 .. literalinclude:: ../../../../skrl/agents/torch/trpo/trpo.py
     :language: python
-    :lines: 18-61
-    :linenos:
+    :start-after: [start-config-dict-torch]
+    :end-before: [end-config-dict-torch]
 
 .. raw:: html
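
The agent pages above replace fixed `:lines:` ranges with `:start-after:` / `:end-before:` markers, so the included snippet no longer drifts when the source file changes length. The sketch below approximates that marker-based selection in plain Python. It is not Sphinx's implementation, and the helper name `extract_between_markers` and the sample `EXAMPLE_DEFAULT_CONFIG` source are hypothetical.

```python
from typing import List


def extract_between_markers(text: str, start_after: str, end_before: str) -> List[str]:
    """Return the lines strictly between the first line containing `start_after`
    and the next line containing `end_before`."""
    lines = text.splitlines()
    # Index of the first line that contains the start marker
    start = next(i for i, line in enumerate(lines) if start_after in line)
    # Index of the first later line that contains the end marker
    end = next(i for i in range(start + 1, len(lines)) if end_before in lines[i])
    return lines[start + 1:end]


# Hypothetical source file: the markers are ordinary comments around the config dict
source = """\
# other module code

# [start-config-dict-torch]
EXAMPLE_DEFAULT_CONFIG = {
    "discount_factor": 0.99,
}
# [end-config-dict-torch]
"""

snippet = extract_between_markers(
    source, "[start-config-dict-torch]", "[end-config-dict-torch]"
)
print("\n".join(snippet))
```

With this approach, adding or removing imports above the dictionary leaves the extracted snippet unchanged, whereas a hard-coded line range would need updating.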
 

diff --git a/docs/source/api/envs.rst b/docs/source/api/envs.rst
@@ -65,7 +65,7 @@ In addition, you will be able to :doc:`wrap single-agent <envs/wrapping>` and :d
       - .. centered:: :math:`\blacksquare`
     * - PettingZoo
       - .. centered:: :math:`\blacksquare`
-      - .. centered:: :math:`\square`
+      - .. centered:: :math:`\blacksquare`
     * - robosuite
       - .. centered:: :math:`\blacksquare`
       - .. centered:: :math:`\square`
diff --git a/docs/source/api/envs/isaac_gym.rst b/docs/source/api/envs/isaac_gym.rst
@@ -98,7 +98,7 @@ Usage
 API
 ^^^
 
-.. autofunction:: skrl.envs.torch.loaders.load_isaacgym_env_preview4
+.. autofunction:: skrl.envs.loaders.torch.load_isaacgym_env_preview4
 
 .. raw:: html
 
@@ -181,7 +181,7 @@ Usage
 API
 ^^^
 
-.. autofunction:: skrl.envs.torch.loaders.load_isaacgym_env_preview3
+.. autofunction:: skrl.envs.loaders.torch.load_isaacgym_env_preview3
 
 .. raw:: html
 
@@ -260,4 +260,4 @@ Usage
 API
 ^^^
 
-.. autofunction:: skrl.envs.torch.loaders.load_isaacgym_env_preview2
+.. autofunction:: skrl.envs.loaders.torch.load_isaacgym_env_preview2
diff --git a/docs/source/api/envs/isaac_orbit.rst b/docs/source/api/envs/isaac_orbit.rst
@@ -87,4 +87,4 @@ Usage
 API
 ^^^
 
-.. autofunction:: skrl.envs.torch.loaders.load_isaac_orbit_env
+.. autofunction:: skrl.envs.loaders.torch.load_isaac_orbit_env
diff --git a/docs/source/api/envs/multi_agents_wrapping.rst b/docs/source/api/envs/multi_agents_wrapping.rst
@@ -82,7 +82,7 @@ Usage
 API (PyTorch)
 -------------
 
-.. autofunction:: skrl.envs.torch.wrappers.wrap_env
+.. autofunction:: skrl.envs.wrappers.torch.wrap_env
 
 .. raw:: html
 
@@ -91,7 +91,7 @@ API (PyTorch)
 API (JAX)
 ---------
 
-.. autofunction:: skrl.envs.jax.wrappers.wrap_env
+.. autofunction:: skrl.envs.wrappers.jax.wrap_env
 
 .. raw:: html
 
@@ -100,7 +100,7 @@ API (JAX)
 Internal API (PyTorch)
 ----------------------
 
-.. autoclass:: skrl.envs.torch.wrappers.MultiAgentEnvWrapper
+.. autoclass:: skrl.envs.wrappers.torch.MultiAgentEnvWrapper
     :undoc-members:
     :show-inheritance:
     :members:
@@ -117,14 +117,14 @@ Internal API (PyTorch)
 
         A list of all possible_agents the environment could generate
 
-.. autoclass:: skrl.envs.torch.wrappers.BiDexHandsWrapper
+.. autoclass:: skrl.envs.wrappers.torch.BiDexHandsWrapper
     :undoc-members:
     :show-inheritance:
     :members:
 
     .. automethod:: __init__
 
-.. autoclass:: skrl.envs.torch.wrappers.PettingZooWrapper
+.. autoclass:: skrl.envs.wrappers.torch.PettingZooWrapper
     :undoc-members:
     :show-inheritance:
     :members:
@@ -138,7 +138,7 @@ Internal API (PyTorch)
 Internal API (JAX)
 ------------------
 
-.. autoclass:: skrl.envs.jax.wrappers.MultiAgentEnvWrapper
+.. autoclass:: skrl.envs.wrappers.jax.MultiAgentEnvWrapper
     :undoc-members:
     :show-inheritance:
     :members:
@@ -155,14 +155,14 @@ Internal API (JAX)
 
         A list of all possible_agents the environment could generate
 
-.. autoclass:: skrl.envs.jax.wrappers.BiDexHandsWrapper
+.. autoclass:: skrl.envs.wrappers.jax.BiDexHandsWrapper
     :undoc-members:
     :show-inheritance:
     :members:
 
     .. automethod:: __init__
 
-.. autoclass:: skrl.envs.jax.wrappers.PettingZooWrapper
+.. autoclass:: skrl.envs.wrappers.jax.PettingZooWrapper
     :undoc-members:
     :show-inheritance:
     :members:

diff --git a/docs/source/api/envs/omniverse_isaac_gym.rst b/docs/source/api/envs/omniverse_isaac_gym.rst
@@ -159,4 +159,4 @@ In this approach, the RL algorithm is executed on a secondary thread while the s
 API
 ^^^
 
-.. autofunction:: skrl.envs.torch.loaders.load_omniverse_isaacgym_env
+.. autofunction:: skrl.envs.loaders.torch.load_omniverse_isaacgym_env
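
The documentation hunks above move loaders and wrappers from `skrl.envs.<framework>.<kind>` to `skrl.envs.<kind>.<framework>`, which the changelog marks as a breaking change. A small, hedged migration helper for references to the old layout might look like the following. The helper name `migrate_module_path` is hypothetical, and covering `jax` loaders is an assumption, since the diff only shows `torch` loaders alongside `torch` and `jax` wrappers.

```python
import re

# Matches the old layout: skrl.envs.<framework>.<kind>
# Assumption: the same swap applies to every framework/kind combination listed here.
_OLD_LAYOUT = re.compile(r"\bskrl\.envs\.(torch|jax)\.(loaders|wrappers)\b")


def migrate_module_path(text: str) -> str:
    """Rewrite `skrl.envs.<framework>.<kind>` to `skrl.envs.<kind>.<framework>`."""
    return _OLD_LAYOUT.sub(r"skrl.envs.\2.\1", text)


print(migrate_module_path("skrl.envs.torch.loaders.load_isaacgym_env_preview4"))
# → skrl.envs.loaders.torch.load_isaacgym_env_preview4
print(migrate_module_path("from skrl.envs.jax.wrappers import wrap_env"))
# → from skrl.envs.wrappers.jax import wrap_env
```

Paths that already use the new layout do not match the pattern and are returned unchanged, so the helper can be applied repeatedly without side effects.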