[Question] Return of IPO's adv_surrogate is confusing #234

Closed
3 tasks done
stvsd1314 opened this issue May 5, 2023 · 3 comments
Labels: question (Further information is requested)

Comments

@stvsd1314

Required prerequisites

Questions

Why does the function `compute_adv_surrogate` in IPO return `(adv_r - penalty * adv_c) / (1 + penalty)` instead of `adv_r`? And when there is more than one constraint, how should I modify this algorithm? Thanks a lot!
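For reference, the expression quoted above can be written as a small standalone function. This is a minimal sketch, not the project's own `compute_adv_surrogate`: the name `normalized_surrogate_advantage` is hypothetical, and it assumes `adv_r` and `adv_c` are equal-length lists of per-step reward and cost advantages and that `penalty` is a non-negative scalar (a real implementation would typically work on tensors).

```python
from typing import List, Sequence


def normalized_surrogate_advantage(
    adv_r: Sequence[float],
    adv_c: Sequence[float],
    penalty: float,
) -> List[float]:
    """Return (adv_r - penalty * adv_c) / (1 + penalty) element-wise.

    Hypothetical helper for illustration only; it mirrors the expression
    quoted in the question, not the project's own compute_adv_surrogate.
    """
    # Assumption: the penalty (a Lagrange-style multiplier) is non-negative.
    if penalty < 0:
        raise ValueError("penalty is assumed to be non-negative")
    if len(adv_r) != len(adv_c):
        raise ValueError("adv_r and adv_c must have the same length")
    # Dividing by (1 + penalty) turns the sum into a weighted average of
    # adv_r and -adv_c whose weights, 1/(1+p) and p/(1+p), sum to 1.
    scale = 1.0 + penalty
    return [(r - penalty * c) / scale for r, c in zip(adv_r, adv_c)]


if __name__ == "__main__":
    adv_r = [1.0, -0.5, 2.0]  # made-up reward advantages
    adv_c = [0.2, 0.4, -1.0]  # made-up cost advantages
    for penalty in (0.0, 1.0, 1000.0):
        # penalty = 0 reproduces adv_r; a huge penalty approaches -adv_c.
        print(penalty, normalized_surrogate_advantage(adv_r, adv_c, penalty))
```

With `penalty = 0` the output equals `adv_r`, and as the penalty grows the values approach `-adv_c` while staying on the same scale as the inputs.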

stvsd1314 added the question (Further information is requested) label on May 5, 2023
@stvsd1314
Author

Sorry, I phrased that incorrectly. What I meant was: why do we have to divide by the term `(1 + penalty)`?

Gaiejj self-assigned this on May 6, 2023
@Gaiejj
Member

Gaiejj commented May 6, 2023

That is a good question. This is because when the penalty value is too large, the update direction becomes biased. Dividing by `(1 + penalty)` lets training interpolate between two extremes: when penalty = 0, the update is equivalent to that of classical reinforcement learning algorithms such as Policy Gradient or PPO, and as penalty approaches $+\infty$, the update simply minimizes the cost.
We will provide performance curves of IPO on multiple environments as soon as possible to validate this reasoning with experimental results. We will also consider your suggestions and run experiments with the settings you provided. Thank you again for your feedback; a Pull Request implementing your ideas is also welcome.
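To make the two extremes described in the answer concrete, the normalized surrogate can be rewritten as a weighted average of the reward advantage and the negated cost advantage. The notation here is assumed for illustration, not taken from the code: $A_r$ stands for `adv_r`, $A_c$ for `adv_c`, and $\lambda \ge 0$ for `penalty`.

$$
\hat{A} = \frac{A_r - \lambda A_c}{1 + \lambda}
        = \frac{1}{1 + \lambda}\, A_r + \frac{\lambda}{1 + \lambda}\,\left(-A_c\right),
\qquad
\hat{A}\big|_{\lambda = 0} = A_r,
\qquad
\lim_{\lambda \to +\infty} \hat{A} = -A_c .
$$

Because the two weights are non-negative and sum to 1, $\hat{A}$ stays on the same scale as a single advantage for any $\lambda$; without the division, the magnitude of $A_r - \lambda A_c$ grows roughly linearly with $\lambda$.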

@calico-1226
Member

It seems like this issue has been resolved, and I am going to close it now. If you have any other questions, feel free to continue asking.
