about P3O algorithms #320

Closed
2 of 3 tasks
Eureke01 opened this issue Apr 12, 2024 · 1 comment
Labels
question Further information is requested

Comments

@Eureke01

Required prerequisites

Questions

While studying the P3O algorithm, I could not find any update step for kappa. Is there one in /omnisafe/omnisafe/algorithms/on_policy/penalty_function/p3o.py?

@Eureke01 Eureke01 added the question Further information is requested label Apr 12, 2024
@Gaiejj
Member

Gaiejj commented Apr 12, 2024

The method description of the Penalized Proximal Policy Optimization for Safe Reinforcement Learning (P3O) paper (page 3747, lower-left corner) presents two ways to implement P3O, and the authors report that both work well in practice.

  • Method 1 (from P3O paper)

As shown in Algorithm 2, we increase κ at every time step, and the early stopping condition is fulfilled when the distance between solutions of two adjacent steps is small enough or the current policy is out of the trust region.

  • Method 2 (from P3O paper)

In practice, we utilize the normalization trick that maps the advantage estimation to an approximate standard normal distribution regardless of the tasks themselves. We find this technique enables a fixed κ for general good results across different tasks.

  • Authors Statement (from P3O paper)

Experimental results show that both of above algorithms work effectively and the learning processes are stable in a wide range of κ.
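To make Method 1 more concrete, here is a toy one-dimensional sketch of an increasing-κ schedule with early stopping on the distance between adjacent solutions. It is not the paper's Algorithm 2: the objective `(x - 2)**2`, the constraint `x <= 1`, the growth factor, and the tolerance are all assumptions chosen for illustration, and the trust-region check is left out.

```python
def penalized_minimizer(kappa):
    """Closed-form minimizer of (x - 2)**2 + kappa * max(0, x - 1).

    Toy problem (an assumption, not from the paper): minimize (x - 2)**2
    subject to x <= 1, with the constraint turned into a ReLU penalty.
    """
    # For kappa < 2 the minimizer 2 - kappa / 2 still violates x <= 1;
    # for kappa >= 2 the penalty is exact and the minimizer sits at x = 1.
    return max(1.0, 2.0 - kappa / 2.0)


def increase_kappa_until_stable(kappa=0.5, growth=2.0, tol=1e-6, max_steps=50):
    """Grow kappa each step and stop once adjacent solutions nearly coincide."""
    prev = penalized_minimizer(kappa)
    for _ in range(max_steps):
        kappa *= growth
        cur = penalized_minimizer(kappa)
        if abs(cur - prev) < tol:  # early stopping on solution distance
            return kappa, cur
        prev = cur
    return kappa, prev


final_kappa, solution = increase_kappa_until_stable()
```

In this toy problem the solution stops moving once κ is large enough for the penalty to be exact, which is the behaviour the early stopping condition relies on.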

OmniSafe adopts the second method, so κ stays fixed during training, which is why the code contains no update step for it.
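For reference, a minimal pure-Python sketch of a P3O-style loss with normalized advantages and a fixed κ might look like the following. It is not OmniSafe's code: the function names, the default `kappa`, `clip`, and `gamma` values, and the exact form of normalization are assumptions made for illustration.

```python
import math


def normalize(values, eps=1e-8):
    """Shift and scale values to roughly zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (math.sqrt(var) + eps) for v in values]


def p3o_loss(ratios, adv_r, adv_c, ep_cost, cost_limit,
             kappa=20.0, clip=0.2, gamma=0.99):
    """Penalized clipped-surrogate loss with a fixed kappa (hypothetical helper).

    ratios: probability ratios pi_new / pi_old, one per sample
    adv_r, adv_c: reward and cost advantage estimates
    ep_cost, cost_limit: mean episode cost and its allowed limit
    """
    adv_r = normalize(adv_r)
    adv_c = normalize(adv_c)
    clipped = [min(max(r, 1.0 - clip), 1.0 + clip) for r in ratios]
    n = len(ratios)
    # Reward surrogate: pessimistic (min) clipped objective, to be maximized.
    surr_r = sum(min(r * a, c * a) for r, c, a in zip(ratios, clipped, adv_r)) / n
    # Cost surrogate: pessimistic (max) clipped objective, to be kept low.
    surr_c = sum(max(r * a, c * a) for r, c, a in zip(ratios, clipped, adv_c)) / n
    # ReLU penalty: active only when the estimated constraint is violated.
    penalty = kappa * max(0.0, surr_c + (1.0 - gamma) * (ep_cost - cost_limit))
    return -surr_r + penalty


loss = p3o_loss([1.0, 1.0, 1.0], [1.0, 2.0, 3.0], [0.5, 1.5, 1.0],
                ep_cost=35.0, cost_limit=25.0)
```

Note that `kappa` only enters as a constant multiplier of the penalty term, so nothing in the training loop needs to update it.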
