Doc fix and add Stable-Baselines3 Jax (SBX) page (#1566)

* Fix custom policy example
* Add RL Zoo doc link
* Add changelog to pypi
* Add SBX doc page
* Fix small mistake in docstring

---------

Co-authored-by: Peter Elmers <peter.elmers@yahoo.com>
DLR-RM · Jun 21, 2023 · 4fdb65e
1 parent f667f08

Showing 8 changed files with 77 additions and 5 deletions.
diff --git a/README.md b/README.md
@@ -73,7 +73,7 @@ Goals of this repository:
 
 Github repo: https://github.com/DLR-RM/rl-baselines3-zoo
 
-Documentation: https://stable-baselines3.readthedocs.io/en/master/guide/rl_zoo.html
+Documentation: https://rl-baselines3-zoo.readthedocs.io/en/master/
 
 ## SB3-Contrib: Experimental RL Features
 

diff --git a/docs/guide/custom_policy.rst b/docs/guide/custom_policy.rst
@@ -371,7 +371,8 @@ If your task requires even more granular control over the policy/value architect
           *args,
           **kwargs,
       ):
-
+          # Disable orthogonal initialization
+          kwargs["ortho_init"] = False
           super().__init__(
               observation_space,
               action_space,
@@ -380,8 +381,7 @@ If your task requires even more granular control over the policy/value architect
               *args,
               **kwargs,
           )
-          # Disable orthogonal initialization
-          self.ortho_init = False
+
 
       def _build_mlp_extractor(self) -> None:
           self.mlp_extractor = CustomNetwork(self.features_dim)
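
Why the old example ignored the flag: in SB3, `ActorCriticPolicy.__init__` builds the networks (applying orthogonal initialization when `ortho_init` is true) before it returns, so assigning `self.ortho_init = False` after `super().__init__()` comes too late. The sketch below reproduces that ordering issue in plain Python; `BasePolicy`, `LateOverride`, `EarlyOverride` and the `init_applied` flag are hypothetical stand-ins, not SB3 classes.

```python
class BasePolicy:
    """Hypothetical stand-in: reads the flag and "builds" during __init__."""

    def __init__(self, *args, ortho_init: bool = True, **kwargs):
        self.ortho_init = ortho_init
        self.init_applied = False
        self._build()

    def _build(self) -> None:
        # The initialization decision is taken here, during construction.
        if self.ortho_init:
            self.init_applied = True


class LateOverride(BasePolicy):
    """Old example: the flag is changed after the parent constructor ran."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.ortho_init = False  # too late: _build() has already run


class EarlyOverride(BasePolicy):
    """Fixed example: the flag is forwarded to the parent via kwargs."""

    def __init__(self, *args, **kwargs):
        kwargs["ortho_init"] = False
        super().__init__(*args, **kwargs)


late = LateOverride()
early = EarlyOverride()
print(late.init_applied, early.init_applied)  # True False
```

Passing the value through `kwargs` lets the parent constructor see it at the moment it makes the decision, which is what the corrected documentation example does.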

diff --git a/docs/guide/rl_zoo.rst b/docs/guide/rl_zoo.rst
@@ -17,6 +17,8 @@ Goals of this repository:
 3. Provide tuned hyperparameters for each environment and RL algorithm
 4. Have fun with the trained agents!
 
+Documentation is available online: https://rl-baselines3-zoo.readthedocs.io/
+
 Installation
 ------------
 

diff --git a/docs/guide/sbx.rst b/docs/guide/sbx.rst
@@ -0,0 +1,66 @@
+.. _sbx:
+
+==========================
+Stable Baselines Jax (SBX)
+==========================
+
+`Stable Baselines Jax (SBX) <https://github.com/araffin/sbx>`_ is a proof-of-concept version of Stable-Baselines3 in Jax.
+
+It provides only a minimal subset of SB3's features but can be much faster (up to 20x!): https://twitter.com/araffin2/status/1590714558628253698
+
+Implemented algorithms:
+
+- Soft Actor-Critic (SAC) and SAC-N
+- Truncated Quantile Critics (TQC)
+- Dropout Q-Functions for Doubly Efficient Reinforcement Learning (DroQ)
+- Proximal Policy Optimization (PPO)
+- Deep Q Network (DQN)
+
+
+As SBX follows the SB3 API, it is also compatible with the `RL Zoo <https://github.com/DLR-RM/rl-baselines3-zoo>`_.
+To use it with the RL Zoo, you need to create two files:
+
+``train_sbx.py``:
+
+.. code-block:: python
+
+  import rl_zoo3
+  import rl_zoo3.train
+  from rl_zoo3.train import train
+  from sbx import DQN, PPO, SAC, TQC, DroQ
+
+
+  rl_zoo3.ALGOS["tqc"] = TQC
+  rl_zoo3.ALGOS["droq"] = DroQ
+  rl_zoo3.ALGOS["sac"] = SAC
+  rl_zoo3.ALGOS["ppo"] = PPO
+  rl_zoo3.ALGOS["dqn"] = DQN
+  rl_zoo3.train.ALGOS = rl_zoo3.ALGOS
+  rl_zoo3.exp_manager.ALGOS = rl_zoo3.ALGOS
+
+  if __name__ == "__main__":
+      train()
+
+Then you can call ``python train_sbx.py --algo sac --env Pendulum-v1`` and use the RL Zoo CLI.
+
+
+``enjoy_sbx.py``:
+
+.. code-block:: python
+
+  import rl_zoo3
+  import rl_zoo3.enjoy
+  from rl_zoo3.enjoy import enjoy
+  from sbx import DQN, PPO, SAC, TQC, DroQ
+
+
+  rl_zoo3.ALGOS["tqc"] = TQC
+  rl_zoo3.ALGOS["droq"] = DroQ
+  rl_zoo3.ALGOS["sac"] = SAC
+  rl_zoo3.ALGOS["ppo"] = PPO
+  rl_zoo3.ALGOS["dqn"] = DQN
+  rl_zoo3.enjoy.ALGOS = rl_zoo3.ALGOS
+  rl_zoo3.exp_manager.ALGOS = rl_zoo3.ALGOS
+
+  if __name__ == "__main__":
+      enjoy()
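
Both scripts work because the RL Zoo resolves algorithm classes by name through a shared `ALGOS` dictionary, so overwriting entries swaps SB3 classes for their SBX counterparts before the CLI runs. The sketch below mimics that registry pattern without importing `rl_zoo3` or `sbx`; `SB3SAC`, `SBXSAC`, `make_model` and the simulated modules are hypothetical, and it assumes the registry is a plain `dict`. It also shows a plausible reason for the explicit `rl_zoo3.train.ALGOS = rl_zoo3.ALGOS` lines: an in-place update is only seen by modules holding the same dictionary object.

```python
import types


# Hypothetical stand-ins for an SB3 class and its SBX replacement.
class SB3SAC:
    backend = "torch"


class SBXSAC:
    backend = "jax"


# Shared registry, analogous to rl_zoo3.ALGOS (assumption: a plain dict).
registry = {"sac": SB3SAC}

# Simulated modules: one holds the same dict object, one a separate copy.
shared = types.ModuleType("shared_user")
shared.ALGOS = registry
copied = types.ModuleType("copied_user")
copied.ALGOS = dict(registry)


def make_model(module: types.ModuleType, algo: str):
    """Illustrative CLI lookup: resolve the class by name, then build it."""
    return module.ALGOS[algo]()


# In-place override, as train_sbx.py / enjoy_sbx.py do.
registry["sac"] = SBXSAC
print(make_model(shared, "sac").backend)  # jax: same dict object
print(make_model(copied, "sac").backend)  # torch: stale copy

# Rebinding the module attribute makes the copy holder see the override too.
copied.ALGOS = registry
print(make_model(copied, "sac").backend)  # jax
```

Whether any RL Zoo module actually holds a separate copy is not checked here; the reassignments in the scripts are harmless either way.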
diff --git a/docs/index.rst b/docs/index.rst
@@ -51,6 +51,7 @@ Main Features
    guide/integrations
    guide/rl_zoo
    guide/sb3_contrib
+   guide/sbx
    guide/imitation
    guide/migration
    guide/checking_nan

diff --git a/docs/misc/changelog.rst b/docs/misc/changelog.rst
@@ -80,6 +80,8 @@ Documentation:
 - Added ``EvalCallback`` example (@sidney-tio)
 - Update custom env documentation
 - Added `pink-noise-rl` to projects page
+- Fixed custom policy example: ``ortho_init`` was ignored
+- Added SBX page
 
 
 Release 1.8.0 (2023-04-07)

diff --git a/setup.py b/setup.py
@@ -159,6 +159,7 @@
     project_urls={
         "Code": "https://github.com/DLR-RM/stable-baselines3",
         "Documentation": "https://stable-baselines3.readthedocs.io/",
+        "Changelog": "https://stable-baselines3.readthedocs.io/en/master/misc/changelog.html",
         "SB3-Contrib": "https://github.com/Stable-Baselines-Team/stable-baselines3-contrib",
         "RL-Zoo": "https://github.com/DLR-RM/rl-baselines3-zoo",
     },

diff --git a/stable_baselines3/common/monitor.py b/stable_baselines3/common/monitor.py
@@ -202,7 +202,7 @@ def __init__(
 
     def write_row(self, epinfo: Dict[str, float]) -> None:
         """
-        Close the file handler
+        Write row of monitor data to csv log file.
 
         :param epinfo: the information on episodic return, length, and time
         """