[Question] PPO/TRPO-EarlyTerminated code help #293

Obnayuf · 2023-12-10T09:41:15Z

Required prerequisites

I have read the documentation https://omnisafe.readthedocs.io.
I have searched the Issue Tracker and Discussions that this hasn't already been reported. (+1 or comment there if it has.)
Consider asking first in a Discussion.

Questions

First of all, thank you very much for your code; it has been instrumental in helping me understand safe reinforcement learning.
I have two questions. First, the paper "Safe Exploration by Solving Early Terminated MDP" uses a context model to ensure that the model generalizes across different initial states, but OmniSafe's PPO/TRPO-EarlyTerminated has no corresponding implementation. May I ask what the reason for this is? And would it be possible to introduce network structures such as RNNs into OmniSafe?
Second, how do you view the difficulty of choosing the cost limit in safe reinforcement learning? In practice, I usually run vanilla RL to find an upper bound for the cost limit and then tune by trial and error until I find a suitable value, which is obviously very "unintelligent". What do you think of the paper "Value constrained model-free continuous control", which, as I understand it, can find a proper cost limit automatically?

Gaiejj · 2023-12-10T12:37:21Z

The current implementation of OmniSafe does not support a context model. During implementation, we focused on formulating the ET-MDP itself. As the original paper puts it: "Intuitively, solving ET-MDP is similar to solving normal MDPs as there are no constraints that should be considered. Any prevailing algorithm can be applied as ET-MDP solver, such as TD3, SAC, PPO, TRPO." The context model is merely one solution suited to ET-MDPs. That said, I believe incorporating a context model could be very valuable, and we will add it to OmniSafe's to-do list. Thank you for bringing it up.
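To illustrate the point that an ET-MDP can be handed to any unconstrained solver, here is a minimal sketch of early termination as an environment wrapper. It is not OmniSafe's implementation: `ToyCostEnv`, `EarlyTerminatedEnv`, and `rollout` are hypothetical names, the toy environment and its step signature are invented for the example, and terminating when the accumulated cost strictly exceeds the limit is an assumption.

```python
import random


class ToyCostEnv:
    """Hypothetical toy env: each step gives reward 1.0 and a cost of 0 or 1."""

    def __init__(self, horizon=50, cost_prob=0.3, seed=0):
        self.horizon = horizon
        self.cost_prob = cost_prob
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # dummy observation

    def step(self, action):
        self.t += 1
        cost = 1.0 if self.rng.random() < self.cost_prob else 0.0
        done = self.t >= self.horizon
        return self.t, 1.0, cost, done


class EarlyTerminatedEnv:
    """Ends the episode once the accumulated cost exceeds cost_limit."""

    def __init__(self, env, cost_limit):
        self.env = env
        self.cost_limit = cost_limit
        self.episode_cost = 0.0

    def reset(self):
        self.episode_cost = 0.0
        return self.env.reset()

    def step(self, action):
        obs, reward, cost, done = self.env.step(action)
        self.episode_cost += cost
        # Assumption: terminate on strict violation of the limit. The
        # constraint is enforced by ending the episode, so the solver on
        # top needs no constraint handling of its own.
        if self.episode_cost > self.cost_limit:
            done = True
        return obs, reward, cost, done


def rollout(env):
    """Run one episode with a fixed action; return (total cost, steps)."""
    env.reset()
    total_cost, steps, done = 0.0, 0, False
    while not done:
        _, _, cost, done = env.step(action=0)
        total_cost += cost
        steps += 1
    return total_cost, steps
```

With a wrapper like this, the learner only ever sees shortened episodes, which is why the choice of solver (PPO, TRPO, and so on) is independent of the constraint. A context model would be an addition on the policy side and is not shown here.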

Indeed, finding a suitable cost limit does require a grid search. However, OmniSafe already allows some automation of this process: you can use examples/run_experiment_grid.py to specify multiple experiments with different cost limits, and examples/analyze_experiment_results.py to visualize the results for each cost limit. Your suggestion of a more automated cost-limit search will also be considered.
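As a rough sketch of how such a search could be set up before handing it to an experiment runner, the snippet below derives candidate cost limits from the episode cost of an unconstrained baseline and enumerates one configuration per combination. It does not use OmniSafe's API: `candidate_cost_limits`, `build_grid`, the config keys, the fractions, and the baseline cost of 100.0 are all assumptions chosen for illustration, and the algorithm names are placeholders.

```python
from itertools import product


def candidate_cost_limits(baseline_cost, fractions=(0.1, 0.25, 0.5)):
    """Scale the unconstrained baseline's episode cost into candidate limits."""
    return [round(baseline_cost * f, 2) for f in fractions]


def build_grid(algos, cost_limits, seeds):
    """Enumerate one config dict per (algo, cost_limit, seed) combination."""
    return [
        {"algo": algo, "cost_limit": limit, "seed": seed}
        for algo, limit, seed in product(algos, cost_limits, seeds)
    ]


# Illustrative values only: 100.0 stands in for the average episode cost
# measured from a vanilla RL run on the same task.
limits = candidate_cost_limits(baseline_cost=100.0)
grid = build_grid(
    ["PPOEarlyTerminated", "TRPOEarlyTerminated"], limits, seeds=[0, 1]
)
```

Each entry in `grid` would then correspond to one training run, and the results can be compared across cost limits afterwards.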

Thank you once again for your insightful proposals.

Obnayuf · 2023-12-10T12:45:35Z

Thanks for your reply.

Obnayuf added the question Further information is requested label Dec 10, 2023

Obnayuf closed this as completed Dec 10, 2023
