[Bug Report] [`MuJoCo`] Reacher And Pusher reward is calculated prior to transition #821

Kallinteris-Andreas · 2023-12-07T14:52:56Z

Describe the bug

In a normal RL environment's step:

1. execute the actions (change the state according to the state-action transition model)
2. generate a reward using the current (post-transition) state and actions
3. do other stuff

which means that the reward is generated as a function of the current state and the current actions

but in Pusher & Reacher's step:

1. generate a reward using current state and actions
2. execute the actions (change the state according to the state-action transition model)
3. do other stuff

which means that they generate the reward as a function of the previous state and current actions
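To make the difference concrete, here is a minimal sketch on a toy 1-D environment. `ToyPointEnv` and all of its methods are hypothetical names made up for illustration; they are not part of Gymnasium or MuJoCo. The reward is the negative distance to a target at 0, and the two `step_*` methods differ only in whether the reward is computed before or after the transition.

```python
from dataclasses import dataclass


@dataclass
class ToyPointEnv:
    """Hypothetical 1-D environment: the agent tries to reach a target at 0."""

    position: float = 2.0

    def _transition(self, action: float) -> None:
        # State-action transition model: the action is a displacement.
        self.position += action

    def _reward(self, action: float) -> float:
        # Negative distance to the target. The action is accepted to mirror
        # the real signature, but no control cost is applied in this sketch.
        return -abs(self.position)

    def step_reward_after(self, action: float) -> float:
        # Usual ordering: transition first, then reward on the new state.
        self._transition(action)
        return self._reward(action)

    def step_reward_before(self, action: float) -> float:
        # Ordering described in this issue: reward on the previous state.
        reward = self._reward(action)
        self._transition(action)
        return reward


env_after = ToyPointEnv(position=2.0)
env_before = ToyPointEnv(position=2.0)

r_after = env_after.step_reward_after(-1.0)     # reward sees position 1.0
r_before = env_before.step_reward_before(-1.0)  # reward sees position 2.0
print(r_after, r_before)
```

Both environments end up in the same state, but the second ordering rewards the agent for where it was rather than where the action took it, so the reward lags the transition by one step.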

Learning impact analysis

TODO at some point (will likely do it after 2023)

Proposed solution

Since I consider the current v5 MuJoCo environments to be done as is,
and the environments are easily solvable as is anyway,
we should fix this in a future release (v6?)

Code example

Gymnasium/gymnasium/envs/mujoco/reacher_v5.py

Lines 199 to 207 in 14def07

    
    def step(self, action):
        reward, reward_info = self._get_rew(action)
        self.do_simulation(action, self.frame_skip)
        observation = self._get_obs()
        info = reward_info
        if self.render_mode == "human":
            self.render()
        return observation, reward, False, False, info

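The first two statements of `step` (the `_get_rew` call and the `do_simulation` call) are in the opposite order: the reward should be computed after the simulation has advanced.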
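As a rough sketch of the intended ordering, the snippet below reorders the same `step` body on a stand-in class. `StubReacher`, its toy dynamics, and the `reward_dist` value are assumptions for illustration only; this is not the actual Gymnasium class, and the change merged for this issue may differ in detail.

```python
class StubReacher:
    """Stand-in for the real environment that records the order of calls."""

    frame_skip = 2
    render_mode = None

    def __init__(self):
        self.calls = []
        self.state = 0.0

    def do_simulation(self, action, n_frames):
        # Toy dynamics (assumption): each frame adds the action to the state.
        self.calls.append("do_simulation")
        self.state += action * n_frames

    def _get_obs(self):
        self.calls.append("_get_obs")
        return self.state

    def _get_rew(self, action):
        # Toy reward (assumption): negative distance of the state from 0.
        self.calls.append("_get_rew")
        reward = -abs(self.state)
        return reward, {"reward_dist": reward}

    def render(self):
        pass

    def step(self, action):
        # Advance the simulation first so the reward sees the new state.
        self.do_simulation(action, self.frame_skip)
        observation = self._get_obs()
        reward, reward_info = self._get_rew(action)
        info = reward_info
        if self.render_mode == "human":
            self.render()
        return observation, reward, False, False, info


env = StubReacher()
obs, reward, terminated, truncated, info = env.step(0.5)
print(env.calls)
```

With this ordering the reward and the returned observation both describe the same post-transition state.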

Additional context

This has been an issue since reacher was introduced in the initial commit.
This issue was first reported in 2018, but never addressed.

Checklist

I have checked that there is no similar issue in the repo


pseudo-rnd-thoughts · 2023-12-07T15:15:41Z

Is it possible to change this in v5 before we make the release?
I can run the code if you need

Kallinteris-Andreas · 2023-12-09T10:48:23Z

Quick analysis:
The agents were trained using the environment named in the legend, but evaluated with the fixed environment.

The performance is very similar (though slightly more consistent for the fixed reward case).

code:
https://github.com/Kallinteris-Andreas/gymnasium-mujuco-v5-envs-validation/tree/main/hand_manipulation_rew_fix

pseudo-rnd-thoughts · 2023-12-09T12:37:42Z

Amazing, could you make a PR to change the implementation to this?

Kallinteris-Andreas added the "bug" label Dec 7, 2023

This was referenced Dec 9, 2023

Fix Reacher-v5 & Pusher-v5 reward function being calculated using previous state #832

Merged

Add MuJoCo v5 environments #572

Merged

pseudo-rnd-thoughts closed this as completed in #832 Dec 11, 2023
