
[Bug Report] MuJoCo Envs, healthy reward issues #526

@Kallinteris-Andreas

Continuation of #500

Describe the bug

The healthy reward in some MuJoCo environments is given even when the environment is unhealthy.

Ant

def healthy_reward(self):
    return (
        float(self.is_healthy or self._terminate_when_unhealthy)
        * self._healthy_reward
    )

issue 1

`terminate_when_unhealthy` affects the reward function, although it should only affect whether the environment terminates.

issue 2

If `terminate_when_unhealthy` is true (the default), the `or` expression is always true, so the reward is given regardless of whether the environment is healthy.
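To make both issues concrete, here is a minimal sketch that evaluates the quoted expression for every combination of the two flags. `healthy_reward_v4` is a hypothetical plain function written for illustration, not the environment method itself, and the reward value of `1.0` is an arbitrary choice.

```python
def healthy_reward_v4(is_healthy, terminate_when_unhealthy, healthy_reward=1.0):
    """Plain-function mirror of the expression quoted above (illustration only)."""
    return float(is_healthy or terminate_when_unhealthy) * healthy_reward


# Enumerate all four flag combinations.
for terminate_when_unhealthy in (True, False):
    for is_healthy in (True, False):
        reward = healthy_reward_v4(is_healthy, terminate_when_unhealthy)
        print(
            f"terminate_when_unhealthy={terminate_when_unhealthy!s:5} "
            f"is_healthy={is_healthy!s:5} -> healthy_reward={reward}"
        )
```

With `terminate_when_unhealthy=True`, the unhealthy case still yields the full reward; only with the flag set to `False` does the reward drop to zero.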

Hopper

def healthy_reward(self):
    return (
        float(self.is_healthy or self._terminate_when_unhealthy)
        * self._healthy_reward
    )

Same issues as Ant

Humanoid

def healthy_reward(self):
    return (
        float(self.is_healthy or self._terminate_when_unhealthy)
        * self._healthy_reward
    )

Same issues as Ant

Inverted (Double) Pendulum

See #500

Walker2D

def healthy_reward(self):
    return (
        float(self.is_healthy or self._terminate_when_unhealthy)
        * self._healthy_reward
    )

Same issues as Ant

Note

v3 and v2 are also affected by this bug; their code is not shown here.

Unaffected (no terminal state)

HalfCheetah
HumanoidStandup
Reacher
Pusher
Swimmer

Proposed solution (change in v5)

def healthy_reward(self):
    return self.is_healthy * self._healthy_reward
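As a rough sketch of how the proposed change separates the two concerns, the toy class below applies the proposed `healthy_reward` and keeps `terminate_when_unhealthy` in the termination check only. `ToyEnv` and its `terminated` property are hypothetical stand-ins, and the settable `is_healthy` attribute is an assumption made for brevity; the real environments derive health from the simulator state.

```python
class ToyEnv:
    """Hypothetical stand-in for a MuJoCo env; only the reward logic is modelled."""

    def __init__(self, healthy_reward=1.0, terminate_when_unhealthy=True):
        self._healthy_reward = healthy_reward
        self._terminate_when_unhealthy = terminate_when_unhealthy
        # Assumption: a settable flag instead of a check on simulator state.
        self.is_healthy = True

    @property
    def healthy_reward(self):
        # Proposed v5 behaviour: the reward depends on health alone.
        return self.is_healthy * self._healthy_reward

    @property
    def terminated(self):
        # The flag now influences termination only, never the reward.
        return (not self.is_healthy) and self._terminate_when_unhealthy


env = ToyEnv(terminate_when_unhealthy=True)
env.is_healthy = False
print(env.healthy_reward, env.terminated)
```

Under this sketch an unhealthy state receives no healthy reward whether or not the episode terminates, which is the behaviour the tests in the code example check for.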

Code example

# Note: Hopper-v4/3/2 does not have `info['reward_survive']`, but it is still affected
# Note: Walker2d-v4/3/2 does not have `info['reward_survive']`, but it is still affected
# Note: Inverted(Double)Pendulum-v4/2 does not have `info['reward_survive']`, but it is still affected
import gymnasium as gym  # assumed import; the snippet only uses the `gym` alias
import pytest


@pytest.mark.parametrize("env_id", ['Ant-v4', 'Ant-v3', 'Ant-v2'])
def test_verify_reward_survive(env_id):
    env = gym.make(env_id)
    env.reset()

    for step in range(10000):
        obs, rew, terminal, truncated, info = env.step(env.action_space.sample())

        if terminal:
            # Fails on the affected versions: the reward is still paid on the terminal step.
            assert info['reward_survive'] == 0
            break

        assert info['reward_survive'] != 0
    else:
        # Only reached if the loop never hit a terminal state.
        assert False, "If you get here, it means that the testing methodology has failed."

@pytest.mark.parametrize("env_id", ['Humanoid-v4'])
def test_verify_reward_survive_human(env_id):
    env = gym.make(env_id)
    env.reset()

    for step in range(10000):
        obs, rew, terminal, truncated, info = env.step(env.action_space.sample())

        if terminal:
            # Humanoid reports the healthy reward under `reward_alive`.
            assert info['reward_alive'] == 0
            break

        assert info['reward_alive'] != 0
    else:
        # Only reached if the loop never hit a terminal state.
        assert False, "If you get here, it means that the testing methodology has failed."


@pytest.mark.parametrize("env_id", ['Ant-v5', 'Hopper-v5', 'Humanoid-v5', 'InvertedDoublePendulum-v5', 'InvertedPendulum-v5', 'Walker2d-v5'])
def test_verify_reward_survive_v5(env_id):
    """Assert that `reward_survive` is 0 on `terminal` states and not 0 on non-`terminal` states"""
    env = gym.make(env_id, reset_noise_scale=0)
    env.reset(seed=0)
    env.action_space.seed(0)

    for step in range(175):
        obs, rew, terminal, truncated, info = env.step(env.action_space.sample())

        if terminal:
            assert info['reward_survive'] == 0
            break

        assert info['reward_survive'] != 0

    assert terminal, "The environment should have terminated; if not, the test is not valid."

System info

gym 28.1

Additional context

No response

Checklist

  • I have checked that there is no similar issue in the repo
