Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug Report] [MuJoCo] Reacher And Pusher reward is calculated prior to transition #821

Closed
1 task done
Kallinteris-Andreas opened this issue Dec 7, 2023 · 3 comments · Fixed by #832
Closed
1 task done
Labels
bug Something isn't working

Comments

@Kallinteris-Andreas
Copy link
Collaborator

Kallinteris-Andreas commented Dec 7, 2023

Describe the bug

In a normal RL environment's step:

  1. execute the actions (change the state according to the state-action transition model)
  2. generate a reward using current state and actions
  3. and do other stuff

which is mean that they generate the reward as a function of the current state and current actions

but in Pusher & Reacher's step:

1. generate a reward using current state and actions
2. execute the actions (change the state according to the state-action transition model)
3. do other stuff

which means that they generate the reward as a function of the previous state and current actions

Learning impact analysis

TODO at some point (will likely do it after 2023)

proposed solution

as I believe, the current v5 MuJoCo environment to be done as is,
and the environments are easily solvable as is anyway,
we should fix this in a future release (v6?)

Code example

def step(self, action):
reward, reward_info = self._get_rew(action)
self.do_simulation(action, self.frame_skip)
observation = self._get_obs()
info = reward_info
if self.render_mode == "human":
self.render()
return observation, reward, False, False, info

lines 199 and 200 are in the opposite order

Additional context

This has been an issue since reacher was introduced in the initial commit.
This issue was first reported in 2018, but never addressed.

Checklist

  • I have checked that there is no similar issue in the repo
@Kallinteris-Andreas Kallinteris-Andreas added the bug Something isn't working label Dec 7, 2023
@pseudo-rnd-thoughts
Copy link
Member

Is it possible to change this in v5 before we make the release?
I can run the code if you need

@Kallinteris-Andreas
Copy link
Collaborator Author

Kallinteris-Andreas commented Dec 9, 2023

Quick analysis:
The agents were trained using the environment mentioned on the legend, but evaluated with fixed environment.

The performance is very similar (though slightly more consistent for the fixed reward case)

Pusher
Reacher

code:
https://github.com/Kallinteris-Andreas/gymnasium-mujuco-v5-envs-validation/tree/main/hand_manipulation_rew_fix

@pseudo-rnd-thoughts
Copy link
Member

Amazing, could you make a PR to change the implementation to this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
2 participants