Why does the function `compute_adv_surrogate` in IPO return `(adv_r - penalty * adv_c) / (1 + penalty)` instead of `adv_r`? And when there is more than one constraint, how should I modify the algorithm? Thanks a lot!
That is a good question. If we returned `adv_r - penalty * adv_c` alone, the update direction would become biased when the penalty value grows large. Dividing by `1 + penalty` keeps the surrogate's scale stable and lets training interpolate between two extremes: when penalty = 0, the update is equivalent to classical reinforcement learning algorithms such as Policy Gradient or PPO, and when penalty = $+\infty$, the update simply minimizes the cost.
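A minimal sketch of the surrogate discussed above, in plain Python. The single-constraint version mirrors the formula from the question; the multi-constraint variant (`compute_adv_surrogate_multi`) is only one plausible extension I am suggesting here — one penalty per cost advantage, normalized by the sum of penalties — not the library's actual API:

```python
def compute_adv_surrogate(adv_r, adv_c, penalty):
    """Penalty-weighted surrogate advantage.

    Dividing by (1 + penalty) keeps the result on the same scale as
    adv_r: penalty = 0 recovers adv_r exactly, and as penalty grows
    the result approaches -adv_c (pure cost minimization).
    """
    return (adv_r - penalty * adv_c) / (1.0 + penalty)


def compute_adv_surrogate_multi(adv_r, adv_cs, penalties):
    """Hypothetical multi-constraint extension (an assumption, not the
    repo's implementation): each constraint i contributes its own
    penalty-weighted cost advantage, and the same normalization idea
    is applied with the sum of all penalties."""
    total_penalty = sum(penalties)
    weighted_costs = sum(p * c for p, c in zip(penalties, adv_cs))
    return (adv_r - weighted_costs) / (1.0 + total_penalty)
```

With penalty = 0 both functions return `adv_r` unchanged, and with a very large penalty the single-constraint surrogate tends toward `-adv_c`, matching the two extremes described above.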
We will provide performance curves of IPO on multiple environments as soon as possible to validate this design with experimental results. We will also consider your suggestion and run experiments with the settings you described. Thank you again for your feedback; a Pull Request implementing your ideas is also welcome.