Fix consistency in setup_model for SAC (#766)
* SAC can setup model without environment

If given an `action_space` and an `observation_space`, SAC can now set up the model without an environment, making it consistent with TD3/DDPG.

* Update changelog.rst

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
m-rph and araffin committed Mar 25, 2020
1 parent d2364c9 commit ac92d2e
Showing 2 changed files with 2 additions and 1 deletion.
1 change: 1 addition & 0 deletions docs/misc/changelog.rst
@@ -21,6 +21,7 @@ Bug Fixes:
 - Fixed a bug in ``HindsightExperienceReplayWrapper``, where the openai-gym signature for ``compute_reward`` was not matched correctly (@johannes-dornheim)
 - Fixed SAC/TD3 checking time to update on learn steps instead of total steps (@solliet)
 - Added ``**kwarg`` pass through for ``reset`` method in ``atari_wrappers.FrameStack`` (@solliet)
+- Fix consistency in ``setup_model()`` for SAC, ``target_entropy`` now uses ``self.action_space`` instead of ``self.env.action_space`` (@solliet)

 Deprecations:
 ^^^^^^^^^^^^^
2 changes: 1 addition & 1 deletion stable_baselines/sac/sac.py
@@ -178,7 +178,7 @@ def setup_model(self):
                 # Target entropy is used when learning the entropy coefficient
                 if self.target_entropy == 'auto':
                     # automatically set target entropy if needed
-                    self.target_entropy = -np.prod(self.env.action_space.shape).astype(np.float32)
+                    self.target_entropy = -np.prod(self.action_space.shape).astype(np.float32)
                 else:
                     # Force conversion
                     # this will also throw an error for unexpected string
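A minimal sketch of the ``'auto'`` heuristic this commit touches (this is not the stable-baselines source; ``BoxSpace`` and ``auto_target_entropy`` are hypothetical stand-ins for illustration). SAC's automatic target entropy is the negative number of action dimensions, so it only needs the action space itself, not a live environment:

```python
import numpy as np

class BoxSpace:
    """Stand-in for a gym Box action space; only `shape` is used here."""
    def __init__(self, shape):
        self.shape = shape

def auto_target_entropy(action_space):
    # SAC's 'auto' heuristic: -dim(A), i.e. the negative product of the
    # action shape, as a float32 scalar. After this commit the model reads
    # the action space passed at construction, not self.env.action_space.
    return -np.prod(action_space.shape).astype(np.float32)

print(auto_target_entropy(BoxSpace((3,))))  # prints -3.0
```

This is why the one-line change above is enough: ``self.action_space`` is already set from the constructor arguments, so ``setup_model()`` no longer requires ``self.env`` to be non-None.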