Description
Describe the bug
The Gymnasium API allows users to seed the environment on each reset to obtain reproducible results. Running the environment with the same seed should always give exactly the same results. While the documentation recommends seeding reset only once, it does not forbid seeding multiple times.
FetchPickAndPlace-v2 does not yield reproducible results under these conditions. The reset observation is identical, but the observations start to deviate at the first environment step, even with identical actions.
Code example
import gymnasium
import numpy as np


def test_reproducibility(env: gymnasium.Env, seed: int = 42):
    env.action_space.seed(seed)  # Reproducible actions
    action = env.action_space.sample()  # Same random action for both runs
    env.reset(seed=seed)
    obs_1, _, _, _, _ = env.step(action)
    env.reset(seed=seed)  # Same seed should produce the same observations
    obs_2, _, _, _, _ = env.step(action)  # Identical action
    if isinstance(obs_1, dict):
        for key in obs_1:
            assert np.all(obs_1[key] == obs_2[key])  # Assertion error: different observations
    else:
        assert np.all(obs_1 == obs_2)
    print(f"Reproducibility test passed for {env.unwrapped.spec.id}")


def main():
    test_reproducibility(gymnasium.make('CartPole-v1'))  # Works
    test_reproducibility(gymnasium.make("FetchPickAndPlace-v2"))  # Fails


if __name__ == '__main__':
    main()
Stack Trace:
Reproducibility test passed for CartPole-v1
Traceback (most recent call last):
  File "/home/amacati/repos/Gymnasium-Robotics/bug_report.py", line 26, in <module>
    main()
  File "/home/amacati/repos/Gymnasium-Robotics/bug_report.py", line 22, in main
    test_reproducibility(gymnasium.make("FetchPickAndPlace-v2")) # Fails
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/amacati/repos/Gymnasium-Robotics/bug_report.py", line 14, in test_reproducibility
    assert np.all(obs_1[key] == obs_2[key]) # Assertion error: different observations
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
System Info
- Clone the latest Gymnasium-Robotics commit (commit 50c8019)
- New mamba environment with Python 3.11
- Install Gymnasium-Robotics with pip install -e .
- Ubuntu 22.04 Jammy
- Python 3.11.7
Additional context
The differences are small enough that they sometimes pass an np.allclose assert. In the example above, the object rotation in observation 1 is
[-5.18150577e-08 7.97154734e-08 -1.37921664e-16]
and
[-5.18150577e-08 7.97154734e-08 -1.37780312e-16]
in observation 2. Note the difference in the z rotation. In fact, all three rotation components differ, but the differences in the other two are too small to be printed without additional precision.
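One way to make differences of this size visible is to print the per-component absolute difference instead of the arrays themselves. The snippet below is a small sketch using the two rotation vectors exactly as printed above. Because those printed values are already rounded, the x and y components compare equal here, which is not the case for the real observations.

```python
import numpy as np

# Rotation values copied from the printed output above (already rounded).
rot_1 = np.array([-5.18150577e-08, 7.97154734e-08, -1.37921664e-16])
rot_2 = np.array([-5.18150577e-08, 7.97154734e-08, -1.37780312e-16])

exact_equal = np.array_equal(rot_1, rot_2)  # elementwise ==, as in the test
close_equal = np.allclose(rot_1, rot_2)     # default tolerances hide the mismatch
abs_diff = np.abs(rot_1 - rot_2)            # per-component absolute difference

print(f"exact: {exact_equal}, allclose: {close_equal}")
print("abs diff:", abs_diff)
```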
The inconsistencies arise from the FetchPickAndPlace environment's use of mocap bodies. The positions and quaternions of the mocap bodies are currently not reset properly.
Furthermore, the MuJoCo integrator uses warmstarts and caches the last controls in mjData. In the current implementation, these are also not reset. Only if all four of these mjData fields are restored to their initial states does env.reset(seed=seed) yield reproducible results.
I will open up a pull request that fixes this.
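As a rough illustration of the idea, the sketch below snapshots four fields and writes them back on reset. It rests on assumptions: the field names mocap_pos, mocap_quat, qacc_warmstart and ctrl are inferred from the description above, the helpers snapshot_fields and restore_fields are hypothetical, and a SimpleNamespace stands in for mjData so the example runs without MuJoCo. The actual fix may look different.

```python
from types import SimpleNamespace

import numpy as np

# Assumed field names, inferred from the description; not taken from the fix.
FIELDS = ("mocap_pos", "mocap_quat", "qacc_warmstart", "ctrl")


def snapshot_fields(data, fields=FIELDS):
    """Copy the selected arrays so later simulation steps cannot alter them."""
    return {name: np.copy(getattr(data, name)) for name in fields}


def restore_fields(data, snapshot):
    """Write the saved values back into the existing arrays in place."""
    for name, value in snapshot.items():
        getattr(data, name)[:] = value


# Stand-in for mjData (hypothetical shapes), so the sketch runs without MuJoCo.
data = SimpleNamespace(
    mocap_pos=np.zeros((1, 3)),
    mocap_quat=np.array([[1.0, 0.0, 0.0, 0.0]]),
    qacc_warmstart=np.zeros(4),
    ctrl=np.zeros(2),
)
initial = snapshot_fields(data)

# Mimic a simulation step that leaves residual state behind.
data.mocap_pos += 0.1
data.ctrl[:] = 0.5
data.qacc_warmstart[:] = 1e-3

# On reset, restore the saved values before stepping again.
restore_fields(data, initial)
restored = all(np.array_equal(getattr(data, n), initial[n]) for n in FIELDS)
print("restored:", restored)
```

Writing in place with `[:]` keeps the existing arrays rather than rebinding the attributes to new ones.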
Checklist
- I have checked that there is no similar issue in the repo (required)