Description
Continuation of #500
Describe the bug
The healthy reward in some MuJoCo environments is given even when the environment is unhealthy.
Ant
Gymnasium/gymnasium/envs/mujoco/ant_v4.py
Lines 276 to 280 in 07185f5
    def healthy_reward(self):
        return (
            float(self.is_healthy or self._terminate_when_unhealthy)
            * self._healthy_reward
        )
Issue 1
terminate_when_unhealthy affects the reward function (it should only affect whether the env terminates).
Issue 2
If terminate_when_unhealthy is true (which is the default), the `or` expression is always true, so the reward is given regardless of whether the environment is healthy.
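The two issues can be reproduced without MuJoCo. The sketch below is a hypothetical stand-in (the class `HealthyRewardSketch`, its field names, and the weight of 1.0 are assumptions for illustration, not Gymnasium code). It evaluates the expression from the snippet above next to a health-only variant for all four flag combinations.

```python
from dataclasses import dataclass


@dataclass
class HealthyRewardSketch:
    """Hypothetical stand-in for an affected env; not the Gymnasium class."""

    is_healthy: bool
    terminate_when_unhealthy: bool = True  # the default, per this issue
    healthy_reward_weight: float = 1.0  # illustrative value only

    def current_healthy_reward(self) -> float:
        # Same boolean expression as the v4 snippet above.
        return (
            float(self.is_healthy or self.terminate_when_unhealthy)
            * self.healthy_reward_weight
        )

    def health_only_reward(self) -> float:
        # Variant in which only the health state decides the reward.
        return float(self.is_healthy) * self.healthy_reward_weight


# Print both values for every combination of the two flags.
for healthy in (True, False):
    for terminate in (True, False):
        sketch = HealthyRewardSketch(healthy, terminate)
        print(
            f"is_healthy={healthy}, terminate_when_unhealthy={terminate}: "
            f"current={sketch.current_healthy_reward()}, "
            f"health_only={sketch.health_only_reward()}"
        )
```

The row to look at is `is_healthy=False, terminate_when_unhealthy=True`: the current expression still pays the full reward there, while the health-only variant pays nothing.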
Hopper
Gymnasium/gymnasium/envs/mujoco/hopper_v4.py
Lines 216 to 220 in 07185f5
    def healthy_reward(self):
        return (
            float(self.is_healthy or self._terminate_when_unhealthy)
            * self._healthy_reward
        )
Same issues
Humanoid
Gymnasium/gymnasium/envs/mujoco/humanoid_v4.py
Lines 329 to 333 in 07185f5
    def healthy_reward(self):
        return (
            float(self.is_healthy or self._terminate_when_unhealthy)
            * self._healthy_reward
        )
Same issues
Inverted (Double) Pendulum
Walker2D
Gymnasium/gymnasium/envs/mujoco/walker2d_v4.py
Lines 217 to 221 in 07185f5
    def healthy_reward(self):
        return (
            float(self.is_healthy or self._terminate_when_unhealthy)
            * self._healthy_reward
        )
Same issues
Note
v3 and v2 are also affected by this bug, but their code is not shown here.
Unaffected (no terminal state)
HalfCheetah
HumanoidStandup
Reacher
Pusher
Swimmer
Proposed solution change in v5
    def healthy_reward(self):
        return self.is_healthy * self._healthy_reward
Code example
    import gymnasium as gym
    import pytest

    # Note: Hopper-v4/3/2 does not have `info['reward_survive']`, but it is still affected
    # Note: Walker2d-v4/3/2 does not have `info['reward_survive']`, but it is still affected
    # Note: Inverted(Double)Pendulum-v4/2 does not have `info['reward_survive']`, but it is still affected


    @pytest.mark.parametrize("env_id", ['Ant-v4', 'Ant-v3', 'Ant-v2'])
    def test_verify_reward_survive(env_id):
        env = gym.make(env_id)
        env.reset()
        for step in range(10000):
            obs, rew, terminal, truncated, info = env.step(env.action_space.sample())
            if terminal:
                assert info['reward_survive'] == 0
                break
            assert info['reward_survive'] != 0
        else:
            # Reached only if the loop finished without a terminal state.
            assert False, "If you get here, it means that the testing methodology has failed."


    @pytest.mark.parametrize("env_id", ['Humanoid-v4'])
    def test_verify_reward_survive_human(env_id):
        env = gym.make(env_id)
        env.reset()
        for step in range(10000):
            obs, rew, terminal, truncated, info = env.step(env.action_space.sample())
            if terminal:
                assert info['reward_alive'] == 0
                break
            assert info['reward_alive'] != 0
        else:
            # Reached only if the loop finished without a terminal state.
            assert False, "If you get here, it means that the testing methodology has failed."


    @pytest.mark.parametrize("env_id", ['Ant-v5', 'Hopper-v5', 'Humanoid-v5', 'InvertedDoublePendulum-v5', 'InvertedPendulum-v5', 'Walker2d-v5'])
    def test_verify_reward_survive_v5(env_id):
        """Assert that `reward_survive` is 0 on `terminal` states and not 0 on non-`terminal` states"""
        env = gym.make(env_id, reset_noise_scale=0)
        env.reset(seed=0)
        env.action_space.seed(0)
        for step in range(175):
            obs, rew, terminal, truncated, info = env.step(env.action_space.sample())
            if terminal:
                assert info['reward_survive'] == 0
                break
            assert info['reward_survive'] != 0
        assert terminal, "The environment should have terminated; if not, the test is not valid."
System info
gym 28.1
Additional context
No response
Checklist
- I have checked that there is no similar issue in the repo