[Bug Report] [MuJoCo] Reacher and Pusher reward is calculated prior to transition
#821
Closed
1 task done
Labels
bug
Describe the bug
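In a normal RL environment's `step`, the simulation is advanced first and the reward is computed afterwards, which means that the reward is generated as a function of the current (post-transition) state and the current action.
But in `Pusher` and `Reacher`'s `step`, the reward is computed before the simulation is advanced, which means that the reward is generated as a function of the previous state and the current action.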
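The difference between the two orderings can be illustrated with a minimal sketch. This is not the Gymnasium code: `ToyPointEnv`, `step_reward_after`, and `step_reward_before` are hypothetical names, the state is a single 1-D position, and the reward is just the negative distance to a target (the action-dependent control cost that the real environments use is left out for brevity).

```python
from dataclasses import dataclass


@dataclass
class ToyPointEnv:
    """Hypothetical 1-D environment used only to illustrate the ordering issue."""

    position: float = 0.0  # current state
    target: float = 1.0    # goal position

    def _transition(self, action: float) -> None:
        # Stand-in for advancing the simulation: the action is a displacement.
        self.position += action

    def _reward(self) -> float:
        # Negative distance to the target, evaluated on whatever the state is now.
        return -abs(self.target - self.position)

    def step_reward_after(self, action: float) -> float:
        # Usual ordering: transition first, then reward on the new state.
        self._transition(action)
        return self._reward()

    def step_reward_before(self, action: float) -> float:
        # Ordering described in this report: reward on the old state, then transition.
        reward = self._reward()
        self._transition(action)
        return reward


print(ToyPointEnv().step_reward_after(0.5))   # -0.5, reflects the state after moving
print(ToyPointEnv().step_reward_before(0.5))  # -1.0, reflects the state before moving
```

In both cases the environment ends up in the same state; only the state that the returned reward refers to differs, so with the second ordering the reward lags the transition by one step.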
Learning impact analysis
TODO at some point (will likely do it after 2023)
Proposed solution
I believe the current `v5` MuJoCo environments should be considered done as is, and the environments are easily solvable as is anyway, so we should fix this in a future release (v6?).
Code example
Gymnasium/gymnasium/envs/mujoco/reacher_v5.py
Lines 199 to 207 in 14def07
Lines 199 and 200 are in the opposite order from what they should be.
Additional context
This has been an issue since `reacher` was introduced in the initial commit. It was first reported in 2018, but never addressed.
Checklist