feat: rebase CRPO from PPO to TRPO #254

Merged (3 commits) on Jul 6, 2023

Member

@Gaiejj Gaiejj commented Jul 6, 2023

Description

The original implementation of CRPO uses TRPO as its base algorithm.
We rebase the on-policy version of CRPO (OnCRPO) from PPO to TRPO and tune the config file accordingly.
We evaluated the new implementation in the following SafetyGymnasium environments (each had an OnCRPO result plot attached; the images are not reproduced here):

  • SafetyCarGoal1-v0
  • SafetyPointGoal1-v0
  • SafetyAntVelocity-v1
  • SafetyHalfCheetahVelocity-v1
  • SafetyHopperVelocity-v1
  • SafetyHumanoidVelocity-v1
  • SafetyWalker2dVelocity-v1
  • SafetySwimmerVelocity-v1
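
For context, the switching rule at the core of CRPO can be sketched as below. This is a minimal illustration and not the code changed in this PR: the function name `crpo_surrogate_advantage` and its parameters (`ep_cost`, `cost_limit`, `tolerance`) are hypothetical and chosen only for this example. The idea is that the base algorithm (PPO before this PR, TRPO after it) runs its usual policy step on whichever advantage the rule selects.

```python
from typing import List, Sequence


def crpo_surrogate_advantage(
    adv_r: Sequence[float],
    adv_c: Sequence[float],
    ep_cost: float,
    cost_limit: float,
    tolerance: float = 0.0,
) -> List[float]:
    """Pick the advantage signal for one policy update (hypothetical helper).

    If the estimated episode cost is within the limit plus a tolerance,
    the policy step maximizes reward. Otherwise the step minimizes cost,
    which is expressed here by negating the cost advantage.
    """
    if ep_cost <= cost_limit + tolerance:
        # Constraint satisfied: improve reward.
        return list(adv_r)
    # Constraint violated: reduce cost instead.
    return [-a for a in adv_c]


# Usage: a batch of two samples with a cost limit of 25.
safe_step = crpo_surrogate_advantage([1.0, 2.0], [0.5, 0.25], ep_cost=20.0, cost_limit=25.0)
unsafe_step = crpo_surrogate_advantage([1.0, 2.0], [0.5, 0.25], ep_cost=30.0, cost_limit=25.0)
```

Because the rule only chooses the advantage signal, it is independent of how the policy step itself is computed, which is what makes swapping the base algorithm possible without touching the rule.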

Motivation and Context

The performance of CRPO did not meet our expectations and differed from that of the original implementation.

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide. (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly. (required for a bug fix or a new feature)
  • I have updated the documentation accordingly.
  • I have reformatted the code using make format. (required)
  • I have checked the code using make lint. (required)
  • I have ensured make test pass. (required)

@codecov-commenter

Codecov Report

Merging #254 (4f45f49) into main (af2951d) will decrease coverage by 0.03%.
The diff coverage is 100.00%.

❗ Current head 4f45f49 differs from pull request most recent head cf6fdb3. Consider uploading reports for the commit cf6fdb3 to get more accurate results

@@            Coverage Diff             @@
##             main     #254      +/-   ##
==========================================
- Coverage   97.01%   96.98%   -0.03%     
==========================================
  Files         138      138              
  Lines        6989     6990       +1     
==========================================
- Hits         6780     6779       -1     
- Misses        209      211       +2     
Impacted Files                                  Coverage Δ
omnisafe/algorithms/on_policy/primal/crpo.py    91.30% <100.00%> (+0.40%) ⬆️

... and 1 file with indirect coverage changes

Collaborator

@muchvo muchvo left a comment

LGTM.

@zmsn-2077 zmsn-2077 merged commit 19f7fc7 into PKU-Alignment:main Jul 6, 2023
4 checks passed
@Gaiejj Gaiejj deleted the dev-crpo branch August 10, 2023 07:13