
[Bug Report] MuJoCo.Ant contact forces being off by default is based on a wrong experiment #214

Closed
Kallinteris-Andreas opened this issue Dec 12, 2022 · 9 comments
Labels
bug Something isn't working

Comments

@Kallinteris-Andreas
Collaborator

Describe the bug

The problem

Due to openai/gym#2762 (comment) it was decided that use_contact_forces would default to False. However,
the two problem formulations compared in that experiment used DIFFERENT REWARD FUNCTIONS.

As you can see here, the reward functions are indeed different:
https://github.com/rodrigodelazcano/gym/blob/9c9741498dd0b613fb2d418f17d77ab5f6e60476/gym/envs/mujoco/ant_v4.py#L264
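For reference, a rough sketch of the difference (simplified from the linked source; the term names and the contact_cost_weight default are taken from that file, but this is an illustration, not the actual implementation):

import numpy as np

def ant_v4_reward_sketch(forward_reward, healthy_reward, ctrl_cost,
                         cfrc_ext, use_contact_forces,
                         contact_cost_weight=5e-4,
                         contact_force_range=(-1.0, 1.0)):
    # The base terms are identical in both modes.
    reward = forward_reward + healthy_reward - ctrl_cost
    if use_contact_forces:
        # Only this branch penalizes the external contact forces (mujoco cfrc_ext),
        # so toggling use_contact_forces changes the reward, not just the observation.
        clipped = np.clip(cfrc_ext, *contact_force_range)
        reward -= contact_cost_weight * np.sum(np.square(clipped))
    return reward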

This behavior (the differing reward functions) is also not documented at all (I can make a PR for that).

@rodrigodelazcano

Code at that commit (it is the same as the current code as far as this problem is concerned):
https://github.com/rodrigodelazcano/gym/blob/9c9741498dd0b613fb2d418f17d77ab5f6e60476/gym/envs/mujoco/ant_v4.py

Code example

No response

System info

No response

Additional context

No response

Checklist

  • I have checked that there is no similar issue in the repo
@pseudo-rnd-thoughts
Member

To confirm, the issue is that the reward function is dependent on use_contact_forces, which is not documented?

@Kallinteris-Andreas
Collaborator Author

@pseudo-rnd-thoughts
There are 2 issues

  1. The undocumented reward function differences with use_contact_forces.
  2. The fact that use_contact_forces defaults to False is based on a flawed experiment (this should be investigated for potential MuJoCo v5 or Brax-based environments).

@pseudo-rnd-thoughts
Member

OK, could you make a PR to address 1?
Could you explain why it was based on a flawed experiment? I wasn't involved and don't know anything about the experiments. What did they do? What is the issue with that? What do you propose the solution should be? What does your experiment show?

@Kallinteris-Andreas
Collaborator Author

Kallinteris-Andreas commented Dec 15, 2022

OK, so here is what happened (as far as I understand it; I would like @rodrigodelazcano to comment here):

  • Back when the mujoco-py bindings were used, the external forces (mujoco.cfrc_ext) were not working, which resulted in all of those observations being 0 and contact_cost == 0.
  • Because of that, when developing the new environment with the mujoco bindings, an option to disable the external forces (use_contact_forces) was added, which alters both the observations and the reward function (by disabling contact_cost).
  • Then an ablation study was run to see if disabling mujoco.cfrc_ext (use_contact_forces=False) would provide better results. Here the crucial mistake was comparing 2 implementations with 2 different reward functions, because it was forgotten that use_contact_forces alters the reward function (see the sketch after this list): Change mujoco_py bindings for mujoco Deepmind bindings openai/gym#2762 (comment)
  • The experiment showed that disabling mujoco.cfrc_ext produced significantly higher returns (which makes sense, since the biggest cost term was removed from the reward).
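Roughly, that comparison is equivalent to the following two configurations (a sketch; whether the original experiment created the environments exactly like this is an assumption):

import gymnasium

# These two arms differ in both the observation AND the reward (contact_cost is
# dropped in the second), so comparing their returns is not apples-to-apples.
env_with = gymnasium.make('Ant-v4', use_contact_forces=True)
env_without = gymnasium.make('Ant-v4')  # default: use_contact_forces=False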

The problem with this is that, unlike the other MuJoCo environments, Ant is significantly different between v2/v3 and v4. (I have not looked into whether this has led to wrong conclusions in research papers.)

The immediate step would be to run an ablation study with the same reward function, with and without mujoco.cfrc_ext in the observation.
I do not have computing power to spare to run this experiment anytime soon.
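A rough sketch of what such an ablation setup could look like (a hypothetical wrapper; the assumption that the 27 base qpos/qvel entries come first, followed by the flattened cfrc_ext values, should be checked against the Ant-v4 source):

import gymnasium as gym
import numpy as np

class DropContactForceObs(gym.ObservationWrapper):
    # Hypothetical wrapper: keep contact_cost in the reward (use_contact_forces=True)
    # but strip the cfrc_ext entries from the observation, so the two ablation arms
    # share the same reward function and differ only in their observations.
    def __init__(self, env, n_base_obs=27):
        super().__init__(env)
        self.n_base_obs = n_base_obs
        box = env.observation_space
        self.observation_space = gym.spaces.Box(
            low=box.low[:n_base_obs], high=box.high[:n_base_obs], dtype=box.dtype)

    def observation(self, observation):
        return observation[: self.n_base_obs]

# Arm A: contact forces in both the observation and the reward.
env_a = gym.make('Ant-v4', use_contact_forces=True)
# Arm B: contact forces in the reward only.
env_b = DropContactForceObs(gym.make('Ant-v4', use_contact_forces=True))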

I do not know what the best solution to this problem is for the MuJoCo environments.
A possible solution would be to make Ant-v5 (which would have the same reward function as Ant-v2/v3).

At the very least, we should know whether external forces matter for the Ant environments for Gymnasium.Brax.

@rodrigodelazcano
Member

@Kallinteris-Andreas yes, you are right. We missed the fact that adding/removing contact_cost in the reward function would affect the return differently in the long run. The initial justification, that the contact forces in the observations degrade performance, is wrong; sorry about that. I'll run the suggested experiments this weekend to make sure that's the case.

The past versions of the environment (v2/v3) are being kept for reproducibility of past research. We haven't modified anything and they can still be used with mujoco_py and older versions of mujoco.

As you've mentioned, the v4 environments were upgraded to use the latest mujoco bindings instead of mujoco-py, since the latter is no longer maintained. In the process we also decided to fix the external contact forces issue you've mentioned, which appeared with later mujoco versions (mujoco>=2.0). Thus the reward function of v4 is no different from v2/v3 if the external contact forces are included in the observation and reward via use_contact_forces.
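A minimal sketch of that usage (assuming a working mujoco installation):

import gymnasium

# With use_contact_forces=True, both the observation and the reward include the
# external contact forces, matching the v2/v3 formulation.
env = gymnasium.make('Ant-v4', use_contact_forces=True)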

However, because we observed successful learning without contact forces in openai/gym#2762 (comment), we decided to make the use of external forces (in the observation and reward) optional. Having said this, I don't think there is a need for a v5 version beyond your documentation updates mentioning that using contact forces affects the reward function, which is highly important; thank you for finding this out. What do you think @pseudo-rnd-thoughts?

@pseudo-rnd-thoughts
Member

pseudo-rnd-thoughts commented Dec 16, 2022

Thanks, both of you. My proposed answer would be to add a note to the Ant documentation describing this difference between v2/v3 and v4 (with default parameters) and pointing out that use_contact_forces gives equivalence between the mujoco-py and mujoco bindings.
Additionally, once the experiments are completed, we need to add a comment to the original PR, which we can link in the documentation. This should avoid the need for a v5 environment, and if users read the documentation they can understand the difference between versions.

@Kallinteris-Andreas
Collaborator Author


Surprisingly, including the contact forces in the observation space does not appear to have any significant impact on the average performance, but it does significantly reduce the training variance.

I think the default should be to observe the contact forces, as stable training performance is important for evaluating training algorithms.

Note: the worldbody contact forces were also asserted to be 0 throughout the entire process.

@rodrigodelazcano
Member

rodrigodelazcano commented May 14, 2023

@Kallinteris-Andreas Thanks for taking the time to run the experiment. Setting the default to use contact forces sounds good; I approve. Can you make this change to the v4 environment in Gymnasium?

The removal of the worldbody elements can be done in v5.

@Kallinteris-Andreas
Collaborator Author

I do not think we should change the defaults on v4, since users would get a different environment when they do:

import gymnasium
import gymnasium_robotics
env = gymnasium.make('Ant-v4')

depending on the gymnasium and gymnasium_robotics versions

Also keep in mind that in v4, changing use_contact_forces also changes the reward function.

It would create unnecessary confusion.
