
Commit

Fixed parameter order for compute_reward (#725)
* fixed parameter order for compute_reward

the OpenAI Gym signature `compute_reward(self, achieved_goal, desired_goal, info)` was not matched, which led to errors whenever the reward is not symmetric in `desired_goal` and `achieved_goal` (an asymmetric reward is sketched below the commit metadata).

* added changelog entry

added an entry to the changelog for the (minor) bugfix

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
johannes-dornheim and araffin committed Mar 21, 2020
1 parent e24c380 commit 49b1ba6
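
For context, a minimal illustrative sketch (not from the stable-baselines code base) of a reward that follows the Gym `GoalEnv` convention `compute_reward(achieved_goal, desired_goal, info)` but is *not* symmetric in its two goal arguments. The class name and the overshoot penalty are invented for illustration; the point is that with such a reward, swapping the arguments silently returns the wrong value:

```python
import numpy as np

class AsymmetricGoalReward:
    """Illustrative reward following the gym GoalEnv convention:
    compute_reward(achieved_goal, desired_goal, info).
    The overshoot penalty makes it asymmetric in its arguments."""

    def compute_reward(self, achieved_goal, desired_goal, info):
        # Negative distance, plus an extra penalty only when the
        # achieved goal overshoots the desired goal.
        dist = np.linalg.norm(achieved_goal - desired_goal)
        overshoot = np.maximum(achieved_goal - desired_goal, 0.0).sum()
        return -dist - overshoot

env = AsymmetricGoalReward()
achieved = np.array([1.5])
desired = np.array([1.0])
print(env.compute_reward(achieved, desired, None))  # -1.0 (correct argument order)
print(env.compute_reward(desired, achieved, None))  # -0.5 (swapped: wrong value)
```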
Showing 2 changed files with 3 additions and 2 deletions.
3 changes: 2 additions & 1 deletion docs/misc/changelog.rst
@@ -18,6 +18,7 @@ New Features:
 Bug Fixes:
 ^^^^^^^^^^
 - Fixed DDPG sampling empty replay buffer when combined with HER (@tirafesi)
+- Fixed a bug in ``HindsightExperienceReplayWrapper``, where the openai-gym signature for ``compute_reward`` was not matched correctly (@johannes-dornheim)

 Deprecations:
 ^^^^^^^^^^^^^
@@ -683,4 +684,4 @@ Thanks to @bjmuld @iambenzo @iandanforth @r7vme @brendenpetersen @huvar @abhiskk
 @XMaster96 @kantneel @Pastafarianist @GerardMaggiolino @PatrickWalter214 @yutingsz @sc420 @Aaahh @billtubbs
 @Miffyli @dwiel @miguelrass @qxcv @jaberkow @eavelardev @ruifeng96150 @pedrohbtp @srivatsankrishnan @evilsocket
 @MarvineGothic @jdossgollin @SyllogismRXS @rusu24edward @jbulow @Antymon @seheevic @justinkterry @edbeeching
-@flodorner @KuKuXia @NeoExtended @solliet @mmcenta @richardwu @tirafesi @caburu
+@flodorner @KuKuXia @NeoExtended @solliet @mmcenta @richardwu @tirafesi @caburu @johannes-dornheim
2 changes: 1 addition & 1 deletion stable_baselines/her/replay_buffer.py
@@ -173,7 +173,7 @@ def _store_episode(self):
 next_obs_dict['desired_goal'] = goal

 # Update the reward according to the new desired goal
-reward = self.env.compute_reward(goal, next_obs_dict['achieved_goal'], None)
+reward = self.env.compute_reward(next_obs_dict['achieved_goal'], goal, None)
 # Can we use achieved_goal == desired_goal?
 done = False

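
To make the one-line change above concrete, here is a self-contained, hypothetical sketch of the goal-relabeling step: the stand-in reward function, the goal values, and the success threshold are invented for illustration, and only the argument order distinguishes the old call from the new one.

```python
import numpy as np

def compute_reward(achieved_goal, desired_goal, info):
    # Hypothetical asymmetric reward (stand-in for env.compute_reward):
    # success only if the achieved goal is within 0.05 *above* the target.
    diff = (achieved_goal - desired_goal).item()
    return 0.0 if 0.0 <= diff <= 0.05 else -1.0

# Relabeling a transition with a substituted goal, mirroring the fixed call:
next_obs_dict = {'achieved_goal': np.array([0.42])}
goal = np.array([0.40])  # substituted goal sampled by HER

# Before the fix: compute_reward(goal, next_obs_dict['achieved_goal'], None) -> -1.0 (wrong)
# After the fix:  compute_reward(next_obs_dict['achieved_goal'], goal, None) ->  0.0 (correct)
reward = compute_reward(next_obs_dict['achieved_goal'], goal, None)
print(reward)  # 0.0
```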
