
[Bug Report] [MuJoCo] Reacher And Pusher reward is calculated prior to transition #821

@Kallinteris-Andreas

Description


Describe the bug

In a normal RL environment's step:

  1. execute the action (change the state according to the state-action transition model)
  2. generate a reward using the resulting state and the action
  3. do everything else (observation, info, rendering)

which means that the reward is a function of the state after the transition and the current action.
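The conventional ordering can be sketched as a small helper that takes the transition model and the reward function as arguments. This is a hypothetical illustration: conventional_step and its scalar State/Action types are assumptions, not Gymnasium API.

```python
from typing import Callable, Tuple

State = float   # hypothetical scalar state, for illustration only
Action = float  # hypothetical scalar action, for illustration only


def conventional_step(
    state: State,
    action: Action,
    transition: Callable[[State, Action], State],
    reward_fn: Callable[[State, Action], float],
) -> Tuple[State, float]:
    """One environment step in the conventional order (hypothetical helper)."""
    # 1. execute the action: move to the next state via the transition model
    next_state = transition(state, action)
    # 2. compute the reward from the resulting state and the action
    reward = reward_fn(next_state, action)
    # 3. observation, termination checks, info, etc. would follow here
    return next_state, reward
```

For example, conventional_step(0.0, 1.0, lambda s, a: s + a, lambda s, a: -abs(s - 1.0)) moves the state to 1.0 and scores that new state, so the returned reward is 0.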

But in Pusher's and Reacher's step:

  1. generate a reward using the current state and the action
  2. execute the action (change the state according to the state-action transition model)
  3. do everything else (observation, info, rendering)

which means that the reward is a function of the previous (pre-transition) state and the current action.
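To see what this ordering does, here is a hypothetical one-dimensional reaching task scored both ways (the start at 0.0, the target at 1.0, the additive dynamics, and the negative-distance reward are illustrative assumptions, not the Pusher or Reacher reward). With the reward computed before the transition, each step's reward reflects the state left by the previous action, so the sequence is the conventional one shifted by one step:

```python
from typing import List


def distance_rewards(
    actions: List[float], target: float = 1.0, reward_before_transition: bool = False
) -> List[float]:
    """Negative-distance rewards for a hypothetical 1-D reaching task starting at 0.0."""
    position = 0.0
    rewards = []
    for action in actions:
        if reward_before_transition:
            # Pusher/Reacher ordering: score the state *before* the action is applied.
            rewards.append(-abs(position - target))
            position += action
        else:
            # Conventional ordering: apply the action, then score the resulting state.
            position += action
            rewards.append(-abs(position - target))
    return rewards


actions = [0.5, 0.5, 0.0]
after = distance_rewards(actions)
before = distance_rewards(actions, reward_before_transition=True)

# before[0] only reflects the initial state; every later entry equals the
# conventional reward of the previous step, i.e. the signal lags by one step.
assert before[1:] == after[:-1]
```

Only the state-dependent term lags this way; a control cost computed from the action passed to the reward function still refers to the current action. The effect of the final action in an episode is never rewarded at all.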

Learning impact analysis

TODO at some point (will likely do it after 2023)

proposed solution

As I consider the current v5 MuJoCo environments finished as they are,
and since these environments are easily solvable as is anyway,
we should fix this in a future release (v6?).

Code example

def step(self, action):
    # The reward is computed here, before the simulation advances,
    # so it reflects the pre-transition state.
    reward, reward_info = self._get_rew(action)
    self.do_simulation(action, self.frame_skip)
    observation = self._get_obs()
    info = reward_info
    if self.render_mode == "human":
        self.render()
    return observation, reward, False, False, info

In the source, lines 199 and 200 (the self._get_rew and self.do_simulation calls) should be swapped.
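The reordering described above (simulate first, then compute the reward) might look like the following sketch. To keep it runnable without MuJoCo, the simulator is replaced by a hypothetical stub; StubSimEnv, its qpos/target attributes, the 0.5 per-frame gain, and the _get_rew body are illustrative assumptions, not Gymnasium's implementation.

```python
class StubSimEnv:
    """Hypothetical stand-in for a MuJoCo-backed environment (no physics engine)."""

    frame_skip = 2
    render_mode = None

    def __init__(self):
        self.qpos = 0.0
        self.target = 1.0

    def do_simulation(self, action, n_frames):
        # Stand-in for advancing the simulator n_frames steps under `action`.
        for _ in range(n_frames):
            self.qpos += 0.5 * float(action)

    def _get_obs(self):
        return [self.qpos, self.target]

    def render(self):
        pass


class FixedOrderEnv(StubSimEnv):
    def _get_rew(self, action):
        # Hypothetical reward: the distance term uses whatever state the
        # simulator is currently in; the control cost uses the current action.
        reward_dist = -abs(self.qpos - self.target)
        reward_ctrl = -float(action) ** 2
        reward = reward_dist + reward_ctrl
        return reward, {"reward_dist": reward_dist, "reward_ctrl": reward_ctrl}

    def step(self, action):
        # Advance the simulation first, so the reward sees the post-transition state.
        self.do_simulation(action, self.frame_skip)
        reward, reward_info = self._get_rew(action)
        observation = self._get_obs()
        info = reward_info
        if self.render_mode == "human":
            self.render()
        return observation, reward, False, False, info
```

Calling FixedOrderEnv().step(1.0) advances qpos to 1.0 over the two stub frames before the reward is computed, so info["reward_dist"] is 0 and the total reward is just the control cost, -1.0. With the original ordering the same call would have scored the starting position instead (a reward_dist of -1.0).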

Additional context

This has been an issue since Reacher was introduced in the initial commit.
This issue was first reported in 2018 but was never addressed.

Checklist

  • I have checked that there is no similar issue in the repo

Labels: bug (Something isn't working)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions