fix approximate entropy calculation in PPO and A2C (#130)
AndyShih12 committed Jul 29, 2020
1 parent bd2aae0 · commit 8f9aaae
Showing 3 changed files with 4 additions and 3 deletions.
docs/misc/changelog.rst (3 changes: 2 additions & 1 deletion)
@@ -32,6 +32,7 @@ Bug Fixes:
 - Fix target for updating q values in SAC: the entropy term was not conditioned by terminal states
 - Use ``cloudpickle.load`` instead of ``pickle.load`` in ``CloudpickleWrapper``. (@shwang)
 - Fixed a bug with orthogonal initialization when ``bias=False`` in custom policy (@rk37)
+- Fixed approximate entropy calculation in PPO and A2C. (@andyshih12)
 
 Deprecations:
 ^^^^^^^^^^^^^
@@ -357,4 +358,4 @@ And all the contributors:
 @Miffyli @dwiel @miguelrass @qxcv @jaberkow @eavelardev @ruifeng96150 @pedrohbtp @srivatsankrishnan @evilsocket
 @MarvineGothic @jdossgollin @SyllogismRXS @rusu24edward @jbulow @Antymon @seheevic @justinkterry @edbeeching
 @flodorner @KuKuXia @NeoExtended @PartiallyTyped @mmcenta @richardwu @kinalmehta @rolandgvc @tkelestemur @mloo3
-@tirafesi @blurLake @koulakis @joeljosephjin @shwang @rk37
+@tirafesi @blurLake @koulakis @joeljosephjin @shwang @rk37 @andyshih12
stable_baselines3/a2c/a2c.py (2 changes: 1 addition & 1 deletion)
@@ -141,7 +141,7 @@ def train(self) -> None:
         # Entropy loss favor exploration
         if entropy is None:
             # Approximate entropy when no analytical form
-            entropy_loss = -log_prob.mean()
+            entropy_loss = -th.mean(-log_prob)
         else:
             entropy_loss = -th.mean(entropy)

stable_baselines3/ppo/ppo.py (2 changes: 1 addition & 1 deletion)
@@ -198,7 +198,7 @@ def train(self) -> None:
                 # Entropy loss favor exploration
                 if entropy is None:
                     # Approximate entropy when no analytical form
-                    entropy_loss = -log_prob.mean()
+                    entropy_loss = -th.mean(-log_prob)
                 else:
                     entropy_loss = -th.mean(entropy)

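For context (an editor's note, not part of the commit): the entropy of the action distribution is H = E[-log pi(a|s)], which the code estimates with th.mean(-log_prob) when no analytical form exists. The entropy term in the total loss must be the negative of that estimate, so that minimizing the loss maximizes entropy and encourages exploration. The old expression -log_prob.mean() equals mean(-log_prob), i.e. +H rather than -H, so the sign was flipped and the term penalized exploration. A minimal PyTorch sketch of the difference, assuming a Normal action distribution (dist, actions, and the sample count are hypothetical; th is the torch alias used in stable-baselines3):

import torch as th

# Hypothetical stand-in for a policy's action distribution.
dist = th.distributions.Normal(th.zeros(4), th.ones(4))
actions = dist.sample((1024,))                 # 1024 sampled 4-dim actions
log_prob = dist.log_prob(actions).sum(dim=-1)  # joint log-prob per sample

entropy_exact = dist.entropy().sum()           # analytical H
entropy_mc = th.mean(-log_prob)                # Monte-Carlo estimate of H

entropy_loss_fixed = -th.mean(-log_prob)       # approx. -H: this commit's version
entropy_loss_buggy = -log_prob.mean()          # approx. +H: the old, sign-flipped version

print(f"H exact:      {entropy_exact.item():.3f}")
print(f"H estimate:   {entropy_mc.item():.3f}")
print(f"loss (fixed): {entropy_loss_fixed.item():.3f}")
print(f"loss (buggy): {entropy_loss_buggy.item():.3f}")

With the old sign, the ent_coef * entropy_loss term added +H to the loss, so gradient descent drove the policy's entropy down instead of up whenever the distribution lacked an analytical entropy.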