feat(simmer): update config, benchmark results and code style #280

Merged
merged 14 commits into from
Oct 18, 2023


@Gaiejj (Member) commented Oct 8, 2023

Description

As in #274, we summarize our findings below:

  • obs_normalize=True is critical for TRPOSimmerPID but does not help PPOSimmerPID: in Goal and Circle tasks, PPOSimmerPID performed worse with obs_normalize=True.
  • With the previous configuration, PPOSimmerPID and TRPOSimmerPID were overly conservative in Goal tasks, which is closely tied to the algo_cfgs:unsafe_reward parameter. We tried values of 0.0, -0.1, -0.2, and -0.5 and found that this parameter significantly affects performance: lower values of algo_cfgs:unsafe_reward tend to produce more conservative policies. Weighing the trade-off between reward and cost, we have tentatively selected -0.2 as a compromise value.
  • The algo_cfgs:saute_gamma parameter strongly influences the stability of the algorithm. In Goal tasks, a value of 0.9999 performed better than 0.999, suggesting that a higher algo_cfgs:saute_gamma improves stability in these tasks.
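The roles of unsafe_reward and saute_gamma can be illustrated with a minimal pure-Python sketch of Saute-style reward shaping, which Simmer builds on. It assumes the Saute RL formulation as we understand it, not OmniSafe's adapter code: a normalized remaining budget z starts at 1.0, each step updates it as z = (z - cost / budget) / saute_gamma, and once z <= 0 the task reward is replaced by unsafe_reward. The function names are hypothetical, and Simmer's PID control of the budget itself is left out.

```python
def saute_shaped_rewards(rewards, costs, budget, saute_gamma, unsafe_reward):
    """Return the shaped rewards for one episode (hypothetical Saute-style sketch).

    Assumption: the remaining safety budget is tracked as a normalized scalar z
    (1.0 = full budget) and is updated *before* the reward is shaped, so the
    step that exhausts the budget already receives unsafe_reward.
    """
    z = 1.0  # normalized remaining safety budget
    shaped = []
    for reward, cost in zip(rewards, costs):
        # Spend budget in proportion to the step cost, then rescale by
        # saute_gamma (dividing by a gamma < 1 slowly inflates z).
        z = (z - cost / budget) / saute_gamma
        # Once the budget is used up, the task reward is replaced.
        shaped.append(reward if z > 0 else unsafe_reward)
    return shaped


def budget_growth(saute_gamma, steps):
    """Factor by which z grows over `steps` cost-free steps: saute_gamma ** -steps."""
    return saute_gamma ** (-steps)


if __name__ == "__main__":
    # Budget of 1 unit; the second step spends the whole budget.
    print(saute_shaped_rewards([1.0, 1.0, 1.0], [0.0, 1.0, 0.0],
                               budget=1.0, saute_gamma=1.0, unsafe_reward=-0.2))
    # -> [1.0, -0.2, -0.2]
    # Compare how quickly the two saute_gamma values inflate an unspent budget.
    for gamma in (0.999, 0.9999):
        print(gamma, round(budget_growth(gamma, 1000), 3))
```

Under this formulation, saute_gamma = 0.999 scales an unspent budget by about 2.72 over 1000 cost-free steps, while 0.9999 scales it by only about 1.105, so the two values lead to quite different budget dynamics over long episodes.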

Specifically, we fine-tuned the performance of PPOSimmer and TRPOSimmer on the following tasks (each followed by the name of its benchmark result):

  • SafetyPointGoal1-v0
    simmer_pointgoal1_1e7
  • SafetyCarGoal1-v0
    simmer_cargoal1_1e7
  • SafetyPointCircle1-v0
    simmer_pointcircle1_1e7
  • SafetyCarCircle1-v0
    simmer_carcircle1_1e7
  • SafetyPointGoal2-v0
    simmer_pointgoal2_1e7
  • SafetyCarGoal2-v0
    simmer_cargoal2_1e7
  • SafetyPointCircle2-v0
    simmer_pointcircle2_1e7
  • SafetyCarCircle2-v0
    simmer_carcircle2_1e7
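The findings above can be collected into a configuration override for these tasks. The sketch below assumes OmniSafe's custom_cfgs dictionary layout with the algo_cfgs keys named in this PR; the helper simmer_custom_cfgs is hypothetical, and the merged config files remain the authoritative source for the final values.

```python
def simmer_custom_cfgs(algo: str, env_id: str) -> dict:
    """Build a custom_cfgs-style override reflecting this PR's findings.

    Hypothetical helper: key names follow the algo_cfgs parameters discussed
    in the PR; check them against the installed OmniSafe version.
    """
    if algo not in ("PPOSimmerPID", "TRPOSimmerPID"):
        raise ValueError(f"unexpected algorithm: {algo}")
    algo_cfgs = {
        # Critical for TRPOSimmerPID; worse for PPOSimmerPID in Goal/Circle tasks.
        "obs_normalize": algo == "TRPOSimmerPID",
        # Compromise value chosen from the sweep over 0.0, -0.1, -0.2, -0.5.
        "unsafe_reward": -0.2,
    }
    if "Goal" in env_id:
        # 0.9999 performed better than 0.999 in Goal tasks.
        algo_cfgs["saute_gamma"] = 0.9999
    return {"algo_cfgs": algo_cfgs}


if __name__ == "__main__":
    for algo in ("PPOSimmerPID", "TRPOSimmerPID"):
        for env_id in ("SafetyPointGoal1-v0", "SafetyCarCircle2-v0"):
            print(algo, env_id, simmer_custom_cfgs(algo, env_id))
```

Following the pattern in OmniSafe's README, the result would be passed as omnisafe.Agent('PPOSimmerPID', 'SafetyPointGoal1-v0', custom_cfgs=...) before calling agent.learn().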

This pull request also fixes some code style issues.

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide. (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly. (required for a bug fix or a new feature)
  • I have updated the documentation accordingly.
  • I have reformatted the code using make format. (required)
  • I have checked the code using make lint. (required)
  • I have ensured make test pass. (required)

@Gaiejj added the enhancement (New feature or request) and algorithm (Some issues about algorithm) labels on Oct 8, 2023

codecov bot commented Oct 8, 2023

Codecov Report

Merging #280 (608173c) into main (c575fd5) will not change coverage.
The diff coverage is 100.00%.

❗ Current head 608173c differs from pull request most recent head 9ad1ff2. Consider uploading reports for the commit 9ad1ff2 to get more accurate results

@@           Coverage Diff           @@
##             main     #280   +/-   ##
=======================================
  Coverage   97.01%   97.01%           
=======================================
  Files         138      138           
  Lines        6991     6991           
=======================================
  Hits         6782     6782           
  Misses        209      209           
Files                                             Coverage Δ
omnisafe/adapter/simmer_adapter.py                100.00% <100.00%> (ø)
omnisafe/algorithms/model_based/base/ensemble.py  94.86% <100.00%> (ø)
omnisafe/algorithms/offline/vae_bc.py             100.00% <100.00%> (ø)
omnisafe/common/normalizer.py                     98.57% <100.00%> (-0.02%) ⬇️
omnisafe/models/actor/actor_builder.py            96.30% <100.00%> (+0.14%) ⬆️
omnisafe/models/actor/gaussian_sac_actor.py       97.83% <ø> (ø)
omnisafe/models/actor_critic/actor_critic.py      100.00% <100.00%> (ø)
omnisafe/models/actor_critic/actor_q_critic.py    97.73% <100.00%> (-0.05%) ⬇️
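The unchanged 97.01% figure follows directly from the totals in the report. Codecov computes coverage as hits / (hits + misses + partials); no partial lines are listed here, so it reduces to hits divided by tracked lines. A minimal sketch of that arithmetic, using only the numbers reported above:

```python
def coverage_percent(hits: int, misses: int, partials: int = 0) -> float:
    """Codecov-style line coverage: hits / (hits + misses + partials), in percent."""
    return 100.0 * hits / (hits + misses + partials)


if __name__ == "__main__":
    lines, hits = 6991, 6782  # identical totals on main and #280
    misses = lines - hits
    print(misses)  # -> 209
    print(f"{coverage_percent(hits, misses):.2f}%")  # -> 97.01%
```

Because the line and hit totals are identical on both branches, the 100% diff coverage leaves the project figure unchanged.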


@Gaiejj marked this pull request as draft on October 10, 2023 04:15
@Gaiejj changed the title from "feat(simmer): update simmer config and benchmark results" to "feat(simmer): update config, benchmark results and code style" on Oct 10, 2023
@Gaiejj marked this pull request as ready for review on October 12, 2023 12:01
@Gaiejj merged commit 9d943b6 into PKU-Alignment:main on Oct 18, 2023
4 checks passed