feat(simmer): update config, benchmark results and code style #280
Merged
Gaiejj added the enhancement (New feature or request) and algorithm (Some issues about algorithm) labels on Oct 8, 2023
Codecov Report
@@           Coverage Diff           @@
##             main     #280   +/-   ##
=======================================
  Coverage   97.01%   97.01%
=======================================
  Files         138      138
  Lines        6991     6991
=======================================
  Hits         6782     6782
  Misses        209      209
Gaiejj changed the title from "feat(simmer): update simmer config and benchmark results" to "feat(simmer): update config, benchmark results and code style" on Oct 10, 2023
zmsn-2077 approved these changes on Oct 18, 2023
Description
Similar to #274, we summarize our insights as follows:

- `obs_normalize=True` is critical for `TRPOSimmerPID` but does not work as well for `PPOSimmerPID`. We observed that `obs_normalize=True` gives inferior results in `Goal` and `Circle` tasks for `PPOSimmerPID`.
- The `PPOSimmerPID` and `TRPOSimmerPID` algorithms exhibited excessive conservatism in `Goal` tasks, which is closely related to the `algo_cfgs:unsafe_reward` parameter. We explored values of 0.0, -0.1, -0.2, and -0.5 for this parameter and found that its value significantly affects the algorithms' performance. Lower (more negative) values of `algo_cfgs:unsafe_reward` tend to result in more conservative strategies. Taking into account the trade-off between reward and cost, we have temporarily selected -0.2 as a compromise value.
- The `algo_cfgs:saute_gamma` parameter has a strong influence on the stability of the algorithm. In `Goal` tasks, we found that a value of 0.9999 performs better than 0.999, which suggests that a higher `algo_cfgs:saute_gamma` value improves stability for `Goal` tasks.

In detail, we fine-tuned the performance of `PPOSimmer` and `TRPOSimmer` in:

- `SafetyPointGoal1-v0`
- `SafetyCarGoal1-v0`
- `SafetyPointCircle1-v0`
- `SafetyCarCircle1-v0`
- `SafetyPointGoal2-v0`
- `SafetyCarGoal2-v0`
- `SafetyPointCircle2-v0`
- `SafetyCarCircle2-v0`
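
For readers who want to reproduce the parameter exploration described above, the settings can be sketched as nested override dicts. This is only an illustration, not the configuration shipped in this PR: `make_custom_cfgs` is a hypothetical helper, placing `obs_normalize` under `algo_cfgs` is an assumption, and leaving `saute_gamma` unset for non-`Goal` tasks simply reflects that the description gives no value for them.

```python
from itertools import product

# Values and names taken from the PR description.
UNSAFE_REWARDS = [0.0, -0.1, -0.2, -0.5]
ALGOS = ['PPOSimmerPID', 'TRPOSimmerPID']
GOAL_ENVS = [
    'SafetyPointGoal1-v0',
    'SafetyCarGoal1-v0',
    'SafetyPointGoal2-v0',
    'SafetyCarGoal2-v0',
]


def make_custom_cfgs(algo, env_id, unsafe_reward=-0.2):
    """Build an override dict (hypothetical helper, not part of OmniSafe)."""
    algo_cfgs = {
        # -0.2 is the value the description temporarily settles on.
        'unsafe_reward': unsafe_reward,
        # Assumption: obs_normalize lives under algo_cfgs. The description
        # reports it as critical for TRPOSimmerPID and inferior for PPOSimmerPID.
        'obs_normalize': algo.startswith('TRPO'),
    }
    if 'Goal' in env_id:
        # 0.9999 was reported to perform better than 0.999 on Goal tasks.
        algo_cfgs['saute_gamma'] = 0.9999
    return {'algo_cfgs': algo_cfgs}


# One entry per (algorithm, Goal environment, unsafe_reward) combination.
sweep = [
    (algo, env_id, make_custom_cfgs(algo, env_id, reward))
    for algo, env_id, reward in product(ALGOS, GOAL_ENVS, UNSAFE_REWARDS)
]
```

If OmniSafe is installed, each dict could presumably be passed as the `custom_cfgs` argument of `omnisafe.Agent(algo, env_id, custom_cfgs=...)` before calling `learn()`; check the key layout against the YAML configs in the repository first.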
This pull request also resolves some code style issues.
Types of changes
What types of changes does your code introduce? Put an x in all the boxes that apply:
Checklist
Go over all the following points, and put an x in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!
make format. (required)
make lint. (required)
make test pass. (required)