about P3O algorithms #320

Eureke01 · 2024-04-12T08:15:39Z

Required prerequisites

I have read the documentation https://omnisafe.readthedocs.io.
I have searched the Issue Tracker and Discussions to confirm that this hasn't already been reported. (+1 or comment there if it has.)
Consider asking first in a Discussion.

Questions

While studying the P3O algorithm, I couldn't find any update step for kappa. Is there one in /omnisafe/omnisafe/algorithms/on_policy/penalty_function/p3o.py?

Gaiejj · 2024-04-12T11:45:28Z

In the method description in the lower-left corner of page 3747 of the paper Penalized Proximal Policy Optimization for Safe Reinforcement Learning (P3O), the authors describe two ways to implement P3O and consider both effective in practice.

Method 1 (from P3O paper)

As shown in Algorithm 2, we increase κ at every time step, and the early stopping condition is fulfilled when the distance between solutions of two adjacent steps is small enough or the current policy is out of the trust region.
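To make the idea of raising κ until the solution stops moving concrete, here is a toy sketch on a one-dimensional problem (minimize (x - 2)^2 subject to x <= 1). It is not the paper's Algorithm 2: the problem, the closed-form solver, the function names, and the step size are all assumptions chosen for illustration, and the trust-region check is left out.

```python
def penalized_minimizer(kappa):
    """Closed-form minimizer of (x - 2)**2 + kappa * max(0, x - 1).

    Toy stand-in for "solve the penalized problem at the current kappa".
    For kappa < 2 the minimizer is 2 - kappa / 2 (still infeasible);
    for kappa >= 2 it sits on the constraint boundary x = 1.
    """
    return max(1.0, 2.0 - kappa / 2.0)


def increase_kappa_until_stable(kappa=0.5, step=0.5, tol=1e-6, max_iters=50):
    """Raise kappa each step; stop early once adjacent solutions agree."""
    x_prev = penalized_minimizer(kappa)
    for _ in range(max_iters):
        kappa += step
        x = penalized_minimizer(kappa)
        if abs(x - x_prev) < tol:  # distance between adjacent solutions is small
            return kappa, x
        x_prev = x
    return kappa, x_prev


final_kappa, final_x = increase_kappa_until_stable()
```

In this toy problem the solution stops changing once κ reaches the constraint's Lagrange multiplier (2 here), which is the behaviour an exact penalty function is expected to show.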

Method 2 (from P3O paper)

In practice, we utilize the normalization trick that maps the advantage estimation to an approximate standard normal distribution regardless of the tasks themselves. We find this technique enables a fixed κ for general good results across different tasks.
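A minimal sketch of what such a normalization usually looks like, assuming plain mean/standard-deviation standardization over a batch. The function name and the `eps` guard are illustrative and not taken from the paper or from OmniSafe.

```python
import statistics


def normalize_advantages(advantages, eps=1e-8):
    """Shift and scale a batch of advantages to about zero mean, unit variance."""
    mean = statistics.fmean(advantages)
    std = statistics.pstdev(advantages)  # population standard deviation
    # eps avoids division by zero when all advantages are identical.
    return [(a - mean) / (std + eps) for a in advantages]


normalized = normalize_advantages([1.0, 2.0, 3.0, 4.0])
```

Because the normalized advantages have a similar scale in every task, a single penalty weight κ has a comparable effect across tasks, which is the motivation the quoted passage gives for keeping κ fixed.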

Authors Statement (from P3O paper)

Experimental results show that both of above algorithms work effectively and the learning processes are stable in a wide range of κ.

OmniSafe has adopted the second implementation method, so κ is kept as a fixed hyperparameter and has no update step.
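For readers who want to see where a fixed κ enters, the following is a rough sketch of a P3O-style penalized loss in plain Python. It is not the OmniSafe code: the function name, argument names, and the simplified violation term (the discount-dependent scaling from the paper is omitted) are assumptions made for illustration.

```python
def p3o_penalized_loss(ratios, adv_r, adv_c, ep_cost, cost_limit, kappa, clip=0.2):
    """Clipped reward surrogate plus a kappa-weighted ReLU penalty on cost.

    ratios: per-sample probability ratios pi_new / pi_old
    adv_r, adv_c: reward and cost advantages (assumed already normalized)
    ep_cost, cost_limit: average episode cost and its allowed limit
    kappa: fixed penalty weight; it is never updated here
    """
    n = len(ratios)
    clipped = [min(max(r, 1.0 - clip), 1.0 + clip) for r in ratios]
    # Pessimistic (min) surrogate for reward, as in PPO.
    surr_r = sum(min(r * a, rc * a) for r, rc, a in zip(ratios, clipped, adv_r)) / n
    # Pessimistic (max) surrogate for cost.
    surr_c = sum(max(r * a, rc * a) for r, rc, a in zip(ratios, clipped, adv_c)) / n
    # Penalty is active only when the estimated cost exceeds the limit.
    penalty = kappa * max(0.0, surr_c + (ep_cost - cost_limit))
    return -surr_r + penalty


safe_loss = p3o_penalized_loss([1.0, 1.0], [1.0, -1.0], [0.0, 0.0], 10.0, 25.0, kappa=2.0)
unsafe_loss = p3o_penalized_loss([1.0, 1.0], [1.0, -1.0], [0.0, 0.0], 30.0, 25.0, kappa=2.0)
```

With the cost below the limit the penalty term vanishes and the loss reduces to the usual clipped reward objective; above the limit, the excess is multiplied by the same κ on every update.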

Eureke01 added the question label (Further information is requested) Apr 12, 2024

Eureke01 closed this as completed Apr 27, 2024
