feat: update saute config and benchmarking results #274

Merged: 3 commits, Sep 13, 2023

@Gaiejj (Member) commented on Sep 4, 2023

Description

Update the hyperparameters of the PPOSaute and TRPOSaute algorithms for the following environments:

  • SafetyCarCircle1-v0
  • SafetyCarCircle2-v0
  • SafetyCarGoal1-v0
  • SafetyCarGoal2-v0
  • SafetyPointCircle1-v0
  • SafetyPointCircle2-v0
  • SafetyPointGoal1-v0
  • SafetyPointGoal2-v0

Our insights are summarized as follows:

  • obs_normalize=True is critical for TRPOSaute but works less well for PPOSaute: we observed that it performs worse on Goal tasks for PPOSaute.
  • The previous PPOSaute and TRPOSaute configurations were excessively conservative on Goal tasks, which is closely tied to the algo_cfgs:unsafe_reward parameter. We tried values of 0.0, -0.1, -0.2, and -0.5 and found that this parameter significantly affects performance: lower (more negative) values tend to produce more conservative policies. Weighing the trade-off between reward and cost, we have selected -0.2 for now as a balanced value.
  • The algo_cfgs:saute_gamma parameter strongly influences the stability of the algorithm. On Goal tasks, a value of 0.9999 performs better than 0.999, which suggests that a higher algo_cfgs:saute_gamma improves stability on these tasks.
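
For reference, the insights above can be written down as a small override table. This is only a sketch under assumptions, not the config files changed in this PR: the helper name `saute_overrides` is hypothetical, placing obs_normalize under algo_cfgs is an assumption (the description gives it without a prefix), and applying unsafe_reward=-0.2 to Goal tasks only is an assumption, since the description does not say whether it also applies to Circle tasks.

```python
# Values quoted from the PR description; the structure around them is hypothetical.
UNSAFE_REWARD = -0.2       # picked from the explored set {0.0, -0.1, -0.2, -0.5}
GOAL_SAUTE_GAMMA = 0.9999  # reported to perform better than 0.999 on Goal tasks


def saute_overrides(algo: str, env_id: str) -> dict:
    """Return config overrides reflecting the insights listed in this PR.

    Hypothetical helper: it encodes only what the description states and
    leaves every other setting to the library defaults.
    """
    if algo not in ("PPOSaute", "TRPOSaute"):
        raise ValueError(f"unsupported algorithm: {algo}")

    is_goal = "Goal" in env_id
    algo_cfgs = {}

    if is_goal:
        # Assumption: the Goal-task findings are applied to Goal tasks only.
        algo_cfgs["unsafe_reward"] = UNSAFE_REWARD
        algo_cfgs["saute_gamma"] = GOAL_SAUTE_GAMMA

    if algo == "TRPOSaute":
        # Described as critical for TRPOSaute.
        algo_cfgs["obs_normalize"] = True
    elif is_goal:
        # Described as performing worse on Goal tasks for PPOSaute.
        algo_cfgs["obs_normalize"] = False

    return {"algo_cfgs": algo_cfgs}


if __name__ == "__main__":
    for algo in ("PPOSaute", "TRPOSaute"):
        for env_id in ("SafetyPointGoal1-v0", "SafetyCarCircle1-v0"):
            print(algo, env_id, saute_overrides(algo, env_id))
```

With OmniSafe installed, a dict of this shape would typically be passed as custom_cfgs when constructing an agent; that step is not exercised in the sketch.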

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide. (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly. (required for a bug fix or a new feature)
  • I have updated the documentation accordingly.
  • I have reformatted the code using make format. (required)
  • I have checked the code using make lint. (required)
  • I have ensured make test pass. (required)

@Gaiejj added the enhancement, feature, and algorithm labels on Sep 4, 2023

codecov bot commented Sep 4, 2023

Codecov Report

Merging #274 (a107b74) into main (db34e2c) will increase coverage by 0.03%.
The diff coverage is n/a.

❗ Current head a107b74 differs from pull request most recent head 91e358a. Consider uploading reports for the commit 91e358a to get more accurate results

@@            Coverage Diff             @@
##             main     #274      +/-   ##
==========================================
+ Coverage   96.98%   97.01%   +0.03%     
==========================================
  Files         138      138              
  Lines        6991     6991              
==========================================
+ Hits         6780     6782       +2     
+ Misses        211      209       -2     

see 1 file with indirect coverage changes

@Gaiejj marked this pull request as ready for review on September 13, 2023, 04:11
@Gaiejj merged commit c575fd5 into PKU-Alignment:main on Sep 13, 2023
4 checks passed