Describe the bug
In a typical RL environment's step:
1. execute the actions (change the state according to the state-action transition model)
2. generate a reward using the current state and actions
3. do other stuff

which means the reward is generated as a function of the current (post-transition) state and the current actions.

But in Pusher's and Reacher's step:
1. generate a reward using the current state and actions
2. execute the actions (change the state according to the state-action transition model)
3. do other stuff

which means the reward is generated as a function of the previous (pre-transition) state and the current actions.
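To make the difference concrete, here is a minimal sketch using a hypothetical 1-D toy environment (not Gymnasium code). The state is a position, the action is a displacement, and the reward is the negative distance to a target minus a small control cost. The names `transition`, `reward_fn`, `step_usual`, and `step_reward_first` are invented for illustration only.

```python
TARGET = 1.0  # hypothetical goal position


def transition(state: float, action: float) -> float:
    """Toy transition model: the action displaces the state."""
    return state + action


def reward_fn(state: float, action: float) -> float:
    """Toy reward: negative distance to the target minus a small control cost."""
    return -abs(TARGET - state) - 0.1 * action**2


def step_usual(state: float, action: float) -> tuple[float, float]:
    """Usual ordering: transition first, then reward from the new state."""
    next_state = transition(state, action)
    return next_state, reward_fn(next_state, action)


def step_reward_first(state: float, action: float) -> tuple[float, float]:
    """Pusher/Reacher ordering: reward from the old state, then transition."""
    reward = reward_fn(state, action)
    return transition(state, action), reward


if __name__ == "__main__":
    # Same start state and action, same next state, different rewards.
    print(step_usual(0.0, 1.0))
    print(step_reward_first(0.0, 1.0))
```

Both functions reach the same next state, but only `step_usual` credits the action for having reached the target on that step; `step_reward_first` still pays the distance penalty of the old position.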
Learning impact analysis
TODO at some point (will likely do it after 2023)
Proposed solution
Since I believe the current v5 MuJoCo environments are finished as they are,
and these environments are easily solvable as is anyway,
we should fix this in a future release (v6?).
Code example
    def step(self, action):
        reward, reward_info = self._get_rew(action)
        self.do_simulation(action, self.frame_skip)

        observation = self._get_obs()
        info = reward_info
        if self.render_mode == "human":
            self.render()
        return observation, reward, False, False, info
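Lines 199 and 200 of reacher_v5.py (the self._get_rew(action) call and the self.do_simulation(...) call) are in the opposite order from the usual one.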
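A minimal sketch of what the reordered step could look like, assuming the fix is simply to swap the two calls. `StubReacher` below is a hypothetical stand-in that only mimics the method names used in the snippet above (`_get_rew`, `do_simulation`, `_get_obs`). It is not the Gymnasium class, and its dynamics and reward are invented for illustration.

```python
class StubReacher:
    """Hypothetical stand-in; mimics only the method names used in step()."""

    def __init__(self, frame_skip: int = 2):
        self.frame_skip = frame_skip
        self.render_mode = None
        self.position = 0.0  # invented 1-D state
        self.target = 1.0    # invented goal position

    def do_simulation(self, action: float, n_frames: int) -> None:
        # Invented dynamics: move by `action` once per simulated frame.
        for _ in range(n_frames):
            self.position += action

    def _get_rew(self, action: float):
        # Invented reward: negative distance to the target plus a control cost.
        reward_dist = -abs(self.target - self.position)
        reward_ctrl = -(action**2)
        info = {"reward_dist": reward_dist, "reward_ctrl": reward_ctrl}
        return reward_dist + reward_ctrl, info

    def _get_obs(self) -> float:
        return self.position

    def render(self) -> None:
        pass

    def step(self, action: float):
        # Simulate first, so the reward is computed from the post-transition state.
        self.do_simulation(action, self.frame_skip)
        reward, reward_info = self._get_rew(action)

        observation = self._get_obs()
        info = reward_info
        if self.render_mode == "human":
            self.render()
        return observation, reward, False, False, info
```

With this ordering, an action that moves the stub onto the target within one step is rewarded for it in that same step, rather than one step later.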
Additional context
This has been an issue since reacher was introduced in the initial commit.
This issue was first reported in 2018, but never addressed.
Checklist
Code reference: Gymnasium/gymnasium/envs/mujoco/reacher_v5.py, lines 199 to 207 in 14def07