feat: update saute config and benchmarking results #274
Description
Update the hyper-parameters of the PPOSaute and TRPOSaute algorithms for the following environments:
SafetyCarCircle1-v0
SafetyCarCircle2-v0
SafetyCarGoal1-v0
SafetyCarGoal2-v0
SafetyPointCircle1-v0
SafetyPointCircle2-v0
SafetyPointGoal1-v0
SafetyPointGoal2-v0
We summarize our insights as follows:
obs_normalize=True is critical for TRPOSaute but works less well for PPOSaute. We observed that obs_normalize=True is inferior in Goal tasks for PPOSaute.
PPOSaute and TRPOSaute exhibited excessive conservatism in Goal tasks, which is closely related to the algo_cfgs:unsafe_reward parameter. We explored values of 0.0, -0.1, -0.2, and -0.5 for this parameter and found that its value significantly affects the algorithms' performance. Lower (more negative) values of algo_cfgs:unsafe_reward tend to result in more conservative strategies. Taking into account the trade-off between reward and cost, we have temporarily selected -0.2 as a compromise value.
The algo_cfgs:saute_gamma parameter has a strong influence on the stability of the algorithm. In Goal tasks, we found that a value of 0.9999 performs better than 0.999. This suggests that a higher algo_cfgs:saute_gamma value improves the stability of the algorithm for Goal tasks.
Types of changes
What types of changes does your code introduce? Put an x in all the boxes that apply:
Checklist
Go over all the following points, and put an x in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!
make format. (required)
make lint. (required)
make test pass. (required)
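For readers unfamiliar with the two parameters discussed in the description, the following is a minimal sketch of a Saute-style safety-state update. It assumes the commonly described formulation (a normalized safety budget that is spent by costs and rescaled by saute_gamma, with the task reward replaced by unsafe_reward once the budget is exhausted). The names SauteState, saute_step, and shaped_return are hypothetical, and this is not the code changed in this PR.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class SauteState:
    """Hypothetical container: normalized remaining safety budget (starts at 1.0)."""
    z: float = 1.0


def saute_step(
    state: SauteState,
    reward: float,
    cost: float,
    *,
    safety_budget: float,
    saute_gamma: float,
    unsafe_reward: float,
) -> Tuple[SauteState, float]:
    """One assumed Saute-style update.

    The budget is reduced by the normalized cost and rescaled by saute_gamma.
    While the budget stays positive the task reward passes through unchanged;
    otherwise it is replaced by unsafe_reward.
    """
    z_next = (state.z - cost / safety_budget) / saute_gamma
    shaped = reward if z_next > 0 else unsafe_reward
    return SauteState(z_next), shaped


def shaped_return(
    rewards: Iterable[float],
    costs: Iterable[float],
    *,
    safety_budget: float,
    saute_gamma: float,
    unsafe_reward: float,
) -> float:
    """Sum of shaped rewards over one episode (undiscounted, for illustration)."""
    state = SauteState()
    total = 0.0
    for r, c in zip(rewards, costs):
        state, shaped = saute_step(
            state,
            r,
            c,
            safety_budget=safety_budget,
            saute_gamma=saute_gamma,
            unsafe_reward=unsafe_reward,
        )
        total += shaped
    return total


if __name__ == "__main__":
    # Toy episode: every step earns reward 1.0 and incurs cost 1.0.
    rewards = [1.0] * 5
    costs = [1.0] * 5
    for penalty in (0.0, -0.1, -0.2, -0.5):  # values explored in this PR
        total = shaped_return(
            rewards,
            costs,
            safety_budget=2.0,   # illustrative value, not taken from the PR
            saute_gamma=0.9999,
            unsafe_reward=penalty,
        )
        print(f"unsafe_reward={penalty}: shaped return={total}")
```

Under these assumptions, a more negative unsafe_reward penalizes every step taken after the budget runs out more heavily, which is consistent with the more conservative behavior reported in the description.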