Fix consistency in setup_model for SAC (#766)
* SAC can setup model without environment

If given an `action_space` and an `observation_space`, SAC can now set up the model without an environment, making it consistent with TD3/DDPG.

* Update changelog.rst

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
m-rph and araffin committed Mar 25, 2020
1 parent d2364c9 commit ac92d2e
Showing 2 changed files with 2 additions and 1 deletion.
1 change: 1 addition & 0 deletions docs/misc/changelog.rst
@@ -21,6 +21,7 @@ Bug Fixes:
 - Fixed a bug in ``HindsightExperienceReplayWrapper``, where the openai-gym signature for ``compute_reward`` was not matched correctly (@johannes-dornheim)
 - Fixed SAC/TD3 checking time to update on learn steps instead of total steps (@solliet)
 - Added ``**kwarg`` pass through for ``reset`` method in ``atari_wrappers.FrameStack`` (@solliet)
+- Fix consistency in ``setup_model()`` for SAC, ``target_entropy`` now uses ``self.action_space`` instead of ``self.env.action_space`` (@solliet)

 Deprecations:
 ^^^^^^^^^^^^^
2 changes: 1 addition & 1 deletion stable_baselines/sac/sac.py
@@ -178,7 +178,7 @@ def setup_model(self):
                 # Target entropy is used when learning the entropy coefficient
                 if self.target_entropy == 'auto':
                     # automatically set target entropy if needed
-                    self.target_entropy = -np.prod(self.env.action_space.shape).astype(np.float32)
+                    self.target_entropy = -np.prod(self.action_space.shape).astype(np.float32)
                 else:
                     # Force conversion
                     # this will also throw an error for unexpected string
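A minimal sketch of the ``'auto'`` heuristic this commit touches (this is not the stable-baselines source; ``BoxSpace`` and ``auto_target_entropy`` are hypothetical stand-ins for illustration). SAC's automatic target entropy is the negative number of action dimensions, so it only needs the action space itself, not a live environment:

```python
import numpy as np

class BoxSpace:
    """Stand-in for a gym Box action space; only `shape` is used here."""
    def __init__(self, shape):
        self.shape = shape

def auto_target_entropy(action_space):
    # SAC's 'auto' heuristic: -dim(A), i.e. the negative product of the
    # action shape, as a float32 scalar. After this commit the model reads
    # the action space passed at construction, not self.env.action_space.
    return -np.prod(action_space.shape).astype(np.float32)

print(auto_target_entropy(BoxSpace((3,))))  # prints -3.0
```

This is why the one-line change above is enough: ``self.action_space`` is already set from the constructor arguments, so ``setup_model()`` no longer requires ``self.env`` to be non-None.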