
refactor: change the file layout of omnisafe #35

Merged: 12 commits into PKU-Alignment:dev on Dec 10, 2022

Conversation

zmsn-2077 (Member)

Description

  1. Rename algos to algorithms.
  2. Move common, utils, configs, and models to the same level as algorithms.
  3. Convert the loaded configuration file from a dict to a namedtuple (see the sketch below).
  4. Make the omnisafe files pass pylint, adding targeted disable comments where needed (see the example after the pylint output):
(omnisafe) zmsn-2077@:~/Documents/omnisafe-dev$ pylint omnisafe
************* Module /home/zmsn-2077/Documents/omnisafe-dev/.pylintrc

--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)
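Item 4 above refers to suppressing specific pylint messages inline. A minimal illustration of the kind of targeted disable comment involved (the message name here is illustrative, not taken from the omnisafe codebase):

```python
# A deliberately unused import, silenced with a targeted disable comment
# so the rest of the file can still be rated 10.00/10.
import os  # pylint: disable=unused-import
```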

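For item 3, a minimal sketch of turning a nested config dict into namedtuples; `dict_to_namedtuple` is a hypothetical helper for illustration, not necessarily how omnisafe implements the conversion:

```python
from collections import namedtuple


def dict_to_namedtuple(name: str, cfg: dict):
    """Recursively turn a (possibly nested) dict into an immutable namedtuple."""
    fields = {
        # Recurse so nested config sections also become namedtuples.
        key: dict_to_namedtuple(key, value) if isinstance(value, dict) else value
        for key, value in cfg.items()
    }
    return namedtuple(name, fields.keys())(**fields)


# Attribute access replaces brittle string keys: cfgs.train.lr vs cfgs['train']['lr'].
cfgs = dict_to_namedtuple('Config', {'algo': 'PPO', 'train': {'lr': 3e-4}})
assert cfgs.train.lr == 3e-4
```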

Motivation and Context

Why is this change required? What problem does it solve?
If it fixes an open issue, please link to the issue here.
You can use the syntax close #15213 if this solves issue #15213.

  • I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide. (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly. (required for a bug fix or a new feature)
  • I have updated the documentation accordingly.
  • I have reformatted the code using make format. (required)
  • I have checked the code using make lint. (required)
  • I have ensured make test passes. (required)

@Gaiejj (Member) left a comment:

LGTM

Gaiejj previously approved these changes Dec 10, 2022
@friedmainfunction (Collaborator) commented:

I have reviewed this PR in detail.
Approve.

tests/test_model.py (outdated review thread, resolved)
omnisafe/models/base.py (outdated review thread, resolved)

    self.lagrangian_multiplier = 2.0

    def compute_loss_pi(self, data: dict):
        # Policy loss

Collaborator suggested change (a docstring satisfies pylint's missing-function-docstring check, where a plain comment does not):

-        # Policy loss
+        """compute loss for policy"""

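The `lagrangian_multiplier` in this snippet suggests a Lagrangian-penalized policy objective, as in PPO-Lagrangian-style safe RL. A minimal sketch of such an (unclipped) surrogate loss, with all tensor names and shapes assumed for illustration rather than taken from omnisafe's API:

```python
import torch


def lagrangian_policy_loss(ratio: torch.Tensor, adv: torch.Tensor,
                           cost_adv: torch.Tensor, multiplier: float) -> torch.Tensor:
    """Policy loss penalized by a Lagrangian multiplier on the cost advantage.

    `ratio` is the per-sample importance ratio pi_new(a|s) / pi_old(a|s).
    """
    # Trade off reward against cost, then rescale by (1 + lambda) so the
    # loss magnitude stays comparable as the multiplier grows.
    penalized_adv = (adv - multiplier * cost_adv) / (1.0 + multiplier)
    # Negative sign: optimizers minimize, and we want to maximize advantage.
    return -(ratio * penalized_adv).mean()
```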
Comment on lines 77 to 79:

    """
    Update actor, critic, running statistics
    """

Collaborator suggested change:

-    """
-    Update actor, critic, running statistics
-    """
+    """Update actor, critic, running statistics"""

Comment on lines +233 to +236:

    """
    Pre-process data, e.g. standardize observations, rescale rewards if
    enabled by arguments.
    """

Collaborator suggested change:

-    """
-    Pre-process data, e.g. standardize observations, rescale rewards if
-    enabled by arguments.
-    """
+    """Pre-process data, e.g. standardize observations, rescale rewards if enabled by arguments."""

docs diff @@ -313,7 +313,7 @@ (TRPO documentation):

Note that for now, we assume exact evaluation of the advantage values :math:`A^R_{\pi}`.

It follows from Equation :ref:`(11) <trpo-eq-11>` that TRPO is guaranteed to generate a monotonically improving sequence of policies :math:`J\left(\pi_0\right) \leq J\left(\pi_1\right) \leq J\left(\pi_2\right) \leq \cdots`.
-To see this, let :math:`M_i(\pi)=L_{\pi_i}(\pi)-C D_{\mathrm{KL}}^{\max }\left(\pi_i, \pi\right)`.
+To see this, let :math:`M_i(\pi)=L_{\pi_i}(\pi)-C D_{\mathrm{}}^{\max }\left(\pi_i, \pi\right)`.

Member commented: missing KL (the `\mathrm{KL}` subscript was dropped from the changed line).
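For reference, the monotonic-improvement step this passage alludes to (following the standard TRPO derivation): since :math:`J(\pi) \geq M_i(\pi)` for every :math:`\pi`, with equality at :math:`\pi = \pi_i`, choosing :math:`\pi_{i+1}` to maximize :math:`M_i` yields :math:`J\left(\pi_{i+1}\right) - J\left(\pi_i\right) \geq M_i\left(\pi_{i+1}\right) - M_i\left(\pi_i\right) \geq 0`.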

@friedmainfunction (Collaborator) left a comment:

Approve.

@zmsn-2077 zmsn-2077 merged commit 8ab6fd5 into PKU-Alignment:dev Dec 10, 2022
@zmsn-2077 zmsn-2077 deleted the dev branch December 10, 2022 14:57
muchvo pushed a commit to muchvo/omnisafe that referenced this pull request Apr 5, 2023