[Question] PPO/TRPO-EarlyTerminated code help #293

Closed
3 tasks done
Obnayuf opened this issue Dec 10, 2023 · 2 comments
Labels
question Further information is requested

Comments

@Obnayuf

Obnayuf commented Dec 10, 2023

Required prerequisites

Questions

First of all, thank you very much for your codes, they have been instrumental in helping me understand Safe Reinforcement Learning.
I have two questions. First, the paper "Safe Exploration by Solving Early Terminated MDP" uses a context model so that the policy generalizes across different initial states, but OmniSafe's PPO/TRPO-EarlyTerminated does not seem to include a corresponding implementation. May I ask why? Also, would it be possible to introduce network structures such as RNNs into OmniSafe?
Second, how do you view the difficulty of choosing the cost limit in safe reinforcement learning? In practice, I usually run vanilla RL to find an upper bound on the cost limit and then tune by trial and error until I find a suitable value, which is obviously very "unintelligent". What do you think of the paper "Value constrained model-free continuous control", which, as I understand it, can find a proper cost limit automatically?

@Obnayuf Obnayuf added the question Further information is requested label Dec 10, 2023
@Gaiejj
Member

Gaiejj commented Dec 10, 2023

The current implementation of OmniSafe does not support a context model. During implementation, we focused on formulating an ET-MDP. Per the original paper: "Intuitively, solving ET-MDP is similar to solving normal MDPs as there are no constraints that should be considered. Any prevailing algorithm can be applied as ET-MDP solver, such as TD3, SAC, PPO, TRPO." The context model is merely one solution suited to ET-MDPs. That said, I believe incorporating a context model could be very valuable, and we will add it to OmniSafe's to-do list. Thank you for bringing it up.
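To make the ET-MDP idea concrete, here is a minimal sketch of early termination as an environment wrapper. It is not OmniSafe's implementation: `ToyCostEnv`, `EarlyTerminationWrapper`, `rollout`, and the `termination_reward` parameter are hypothetical names for illustration, and the six-value `step` return (observation, reward, cost, terminated, truncated, info) is assumed to follow the Safety-Gymnasium-style convention.

```python
class ToyCostEnv:
    """Hypothetical stand-in environment: reward 1 per step, cost 1 on every 5th step."""

    def __init__(self, horizon=100):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0, {}

    def step(self, action):
        self.t += 1
        obs = float(self.t)
        reward = 1.0
        cost = 1.0 if self.t % 5 == 0 else 0.0
        terminated = False
        truncated = self.t >= self.horizon
        return obs, reward, cost, terminated, truncated, {}


class EarlyTerminationWrapper:
    """Ends the episode as soon as the accumulated episode cost exceeds cost_limit."""

    def __init__(self, env, cost_limit=0.0, termination_reward=0.0):
        self.env = env
        self.cost_limit = cost_limit
        # Assumption: the reward of the violating step is replaced by this value.
        self.termination_reward = termination_reward
        self.episode_cost = 0.0

    def reset(self):
        self.episode_cost = 0.0
        return self.env.reset()

    def step(self, action):
        obs, reward, cost, terminated, truncated, info = self.env.step(action)
        self.episode_cost += cost
        if self.episode_cost > self.cost_limit:
            # The constraint is handled by cutting the episode short, so an
            # unconstrained solver (PPO, TRPO, ...) can train on the wrapped env.
            terminated = True
            reward = self.termination_reward
            info = {**info, "early_terminated": True}
        return obs, reward, cost, terminated, truncated, info


def rollout(env):
    """Run one episode with a dummy action and return (reward, cost, steps)."""
    env.reset()
    total_reward, total_cost, steps = 0.0, 0.0, 0
    done = False
    while not done:
        _, reward, cost, terminated, truncated, _ = env.step(action=0)
        total_reward += reward
        total_cost += cost
        steps += 1
        done = terminated or truncated
    return total_reward, total_cost, steps
```

With `cost_limit=2.0`, the wrapped toy environment stops at the third cost event, while the unwrapped one runs to its full horizon. A context model, as discussed in the paper, would sit on top of this and is not sketched here.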

Indeed, finding a suitable cost limit currently requires a grid search. However, OmniSafe already allows some of this process to be automated. You can use examples/run_experiment_grid.py to specify multiple experiments with different cost limits, and examples/analyze_experiment_results.py to visualize the results for each cost limit. Your suggestion for a more automated cost-limit search will also be considered.
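As a rough illustration of such a sweep, the sketch below derives candidate cost limits from the average episode cost of an unconstrained run (the heuristic described in the question) and expands them into a grid of run configurations. It does not use OmniSafe's API: `candidate_cost_limits`, `build_grid`, the dictionary keys, and the chosen fractions are all hypothetical, and the algorithm names are only illustrative strings.

```python
from itertools import product


def candidate_cost_limits(unconstrained_cost, fractions=(0.1, 0.25, 0.5)):
    """Propose cost limits as fractions of an unconstrained policy's average episode cost."""
    return [round(unconstrained_cost * f, 2) for f in fractions]


def build_grid(algos, cost_limits, seeds):
    """Expand every (algorithm, cost limit, seed) combination into one run config."""
    return [
        {"algo": algo, "cost_limit": limit, "seed": seed}
        for algo, limit, seed in product(algos, cost_limits, seeds)
    ]


# Assumption: a vanilla RL run averaged an episode cost of 100.
limits = candidate_cost_limits(unconstrained_cost=100.0)
grid = build_grid(["PPOEarlyTerminated", "TRPOEarlyTerminated"], limits, seeds=[0, 1])
```

Each entry in `grid` would then be handed to whatever training entry point is in use, and the resulting reward and cost curves compared across limits.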

Thank you once again for your insightful proposals.

@Obnayuf
Author

Obnayuf commented Dec 10, 2023

Thanks for your reply.

@Obnayuf Obnayuf closed this as completed Dec 10, 2023
2 participants