
Commit

Fixed parameter order for compute_reward (#725)
* fixed parameter order for compute_reward

the OpenAI Gym signature `compute_reward(self, achieved_goal, desired_goal, info)` was not matched, which led to errors whenever the reward is not symmetric in `desired_goal` and `achieved_goal` (an asymmetric reward is sketched below the commit metadata).

* added changelog entry

added an entry to the changelog for the (minor) bugfix

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
johannes-dornheim and araffin committed Mar 21, 2020
1 parent e24c380 commit 49b1ba6
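
For context, a minimal illustrative sketch (not from the stable-baselines code base) of a reward that follows the Gym `GoalEnv` convention `compute_reward(achieved_goal, desired_goal, info)` but is *not* symmetric in its two goal arguments. The class name and the overshoot penalty are invented for illustration; the point is that with such a reward, swapping the arguments silently returns the wrong value:

```python
import numpy as np

class AsymmetricGoalReward:
    """Illustrative reward following the gym GoalEnv convention:
    compute_reward(achieved_goal, desired_goal, info).
    The overshoot penalty makes it asymmetric in its arguments."""

    def compute_reward(self, achieved_goal, desired_goal, info):
        # Negative distance, plus an extra penalty only when the
        # achieved goal overshoots the desired goal.
        dist = np.linalg.norm(achieved_goal - desired_goal)
        overshoot = np.maximum(achieved_goal - desired_goal, 0.0).sum()
        return -dist - overshoot

env = AsymmetricGoalReward()
achieved = np.array([1.5])
desired = np.array([1.0])
print(env.compute_reward(achieved, desired, None))  # -1.0 (correct argument order)
print(env.compute_reward(desired, achieved, None))  # -0.5 (swapped: wrong value)
```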
Showing 2 changed files with 3 additions and 2 deletions.
3 changes: 2 additions & 1 deletion docs/misc/changelog.rst
@@ -18,6 +18,7 @@ New Features:
 Bug Fixes:
 ^^^^^^^^^^
 - Fixed DDPG sampling empty replay buffer when combined with HER (@tirafesi)
+- Fixed a bug in ``HindsightExperienceReplayWrapper``, where the openai-gym signature for ``compute_reward`` was not matched correctly (@johannes-dornheim)

 Deprecations:
 ^^^^^^^^^^^^^
@@ -683,4 +684,4 @@ Thanks to @bjmuld @iambenzo @iandanforth @r7vme @brendenpetersen @huvar @abhiskk
 @XMaster96 @kantneel @Pastafarianist @GerardMaggiolino @PatrickWalter214 @yutingsz @sc420 @Aaahh @billtubbs
 @Miffyli @dwiel @miguelrass @qxcv @jaberkow @eavelardev @ruifeng96150 @pedrohbtp @srivatsankrishnan @evilsocket
 @MarvineGothic @jdossgollin @SyllogismRXS @rusu24edward @jbulow @Antymon @seheevic @justinkterry @edbeeching
-@flodorner @KuKuXia @NeoExtended @solliet @mmcenta @richardwu @tirafesi @caburu
+@flodorner @KuKuXia @NeoExtended @solliet @mmcenta @richardwu @tirafesi @caburu @johannes-dornheim
2 changes: 1 addition & 1 deletion stable_baselines/her/replay_buffer.py
@@ -173,7 +173,7 @@ def _store_episode(self):
 next_obs_dict['desired_goal'] = goal

 # Update the reward according to the new desired goal
-reward = self.env.compute_reward(goal, next_obs_dict['achieved_goal'], None)
+reward = self.env.compute_reward(next_obs_dict['achieved_goal'], goal, None)
 # Can we use achieved_goal == desired_goal?
 done = False

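
To make the one-line change above concrete, here is a self-contained, hypothetical sketch of the goal-relabeling step: the stand-in reward function, the goal values, and the success threshold are invented for illustration, and only the argument order distinguishes the old call from the new one.

```python
import numpy as np

def compute_reward(achieved_goal, desired_goal, info):
    # Hypothetical asymmetric reward (stand-in for env.compute_reward):
    # success only if the achieved goal is within 0.05 *above* the target.
    diff = (achieved_goal - desired_goal).item()
    return 0.0 if 0.0 <= diff <= 0.05 else -1.0

# Relabeling a transition with a substituted goal, mirroring the fixed call:
next_obs_dict = {'achieved_goal': np.array([0.42])}
goal = np.array([0.40])  # substituted goal sampled by HER

# Before the fix: compute_reward(goal, next_obs_dict['achieved_goal'], None) -> -1.0 (wrong)
# After the fix:  compute_reward(next_obs_dict['achieved_goal'], goal, None) ->  0.0 (correct)
reward = compute_reward(next_obs_dict['achieved_goal'], goal, None)
print(reward)  # 0.0
```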
