Commit

Fix q-target in SAC (#77)
* Fix q-target in SAC

* [ci skip] Update version
araffin committed Jun 29, 2020
1 parent 96b771f commit 08e7519
Showing 3 changed files with 5 additions and 5 deletions.
3 changes: 2 additions & 1 deletion docs/misc/changelog.rst
@@ -3,7 +3,7 @@
Changelog
==========

-Pre-Release 0.8.0a1 (WIP)
+Pre-Release 0.8.0a2 (WIP)
------------------------------

Breaking Changes:
@@ -21,6 +21,7 @@ New Features:
Bug Fixes:
^^^^^^^^^^
- Fixed a bug in the ``close()`` method of ``SubprocVecEnv``, causing wrappers further down in the wrapper stack to not be closed. (@NeoExtended)
+- Fix target for updating q values in SAC: the entropy term was not conditioned by terminals states

Deprecations:
^^^^^^^^^^^^^
5 changes: 2 additions & 3 deletions stable_baselines3/sac/sac.py
@@ -202,10 +202,9 @@ def train(self, gradient_steps: int, batch_size: int = 64) -> None:
next_actions, next_log_prob = self.actor.action_log_prob(replay_data.next_observations)
# Compute the target Q value
target_q1, target_q2 = self.critic_target(replay_data.next_observations, next_actions)
-target_q = th.min(target_q1, target_q2)
-target_q = replay_data.rewards + (1 - replay_data.dones) * self.gamma * target_q
+target_q = th.min(target_q1, target_q2) - ent_coef * next_log_prob.reshape(-1, 1)
 # td error + entropy term
-q_backup = target_q - ent_coef * next_log_prob.reshape(-1, 1)
+q_backup = replay_data.rewards + (1 - replay_data.dones) * self.gamma * target_q

# Get current Q estimates
# using action from the replay buffer
2 changes: 1 addition & 1 deletion stable_baselines3/version.txt
@@ -1 +1 @@
-0.8.0a1
+0.8.0a2
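
The effect of the `sac.py` change can be sketched with plain Python floats in place of the batched tensors. This is a minimal illustration, not the library code: the function names `q_backup_before` and `q_backup_after` and all numeric values are hypothetical, and `min_target_q` stands in for `th.min(target_q1, target_q2)`.

```python
# Illustrative only: scalar stand-ins for the batched tensors used in sac.py.
# Function names and values are hypothetical, not part of stable_baselines3.

def q_backup_before(reward, done, gamma, min_target_q, ent_coef, next_log_prob):
    """Old order: the entropy term is added after the (1 - done) mask."""
    target_q = reward + (1 - done) * gamma * min_target_q
    return target_q - ent_coef * next_log_prob


def q_backup_after(reward, done, gamma, min_target_q, ent_coef, next_log_prob):
    """New order: the entropy term is folded into the next-state value first."""
    target_q = min_target_q - ent_coef * next_log_prob
    return reward + (1 - done) * gamma * target_q


# A terminal transition (done = 1) and the same transition as non-terminal.
terminal = dict(reward=1.0, done=1.0, gamma=0.5, min_target_q=4.0,
                ent_coef=0.5, next_log_prob=-2.0)
non_terminal = dict(terminal, done=0.0)

old_terminal = q_backup_before(**terminal)
new_terminal = q_backup_after(**terminal)
old_non_terminal = q_backup_before(**non_terminal)
new_non_terminal = q_backup_after(**non_terminal)

print(old_terminal, new_terminal)          # 2.0 1.0
print(old_non_terminal, new_non_terminal)  # 4.0 3.5
```

With `done = 1` the new ordering reduces the target to the reward alone, whereas the old ordering still added the entropy term. For non-terminal transitions the entropy term is now also scaled by `gamma`, since it is part of the next-state value.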
