refactor: change the file layout of omnisafe #35
Conversation
LGTM
I have reviewed this PR in detail.
LGTM
self.lagrangian_multiplier = 2.0

def compute_loss_pi(self, data: dict):
    # Policy loss
Suggested change:
- # Policy loss
+ """compute loss for policy"""
""" | ||
Update actor, critic, running statistics | ||
""" |
""" | |
Update actor, critic, running statistics | |
""" | |
"""Update actor, critic, running statistics""" |
""" | ||
Pre-process data, e.g. standardize observations, rescale rewards if | ||
enabled by arguments. | ||
""" |
""" | |
Pre-process data, e.g. standardize observations, rescale rewards if | |
enabled by arguments. | |
""" | |
"""Pre-process data, e.g. standardize observations, rescale rewards if enabled by arguments.""" |
docs/source/BaseRL/TRPO.rst (outdated)
@@ -313,7 +313,7 @@ TRPO describes an approximate policy iteration scheme based on the policy improv
 Note that for now, we assume exact evaluation of the advantage values :math:`A^R_{\pi}`.

 It follows from Equation :ref:`(11) <trpo-eq-11>` that TRPO is guaranteed to generate a monotonically improving sequence of policies :math:`J\left(\pi_0\right) \leq J\left(\pi_1\right) \leq J\left(\pi_2\right) \leq \cdots`.
-To see this, let :math:`M_i(\pi)=L_{\pi_i}(\pi)-C D_{\mathrm{KL}}^{\max }\left(\pi_i, \pi\right)`.
+To see this, let :math:`M_i(\pi)=L_{\pi_i}(\pi)-C D_{\mathrm{}}^{\max }\left(\pi_i, \pi\right)`.
missing KL: the KL subscript was dropped in this change, so :math:`D_{\mathrm{}}^{\max}` should read :math:`D_{\mathrm{KL}}^{\max}`.
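For reference, the monotonic-improvement step this passage relies on is the standard argument from Schulman et al.'s TRPO paper: the KL term vanishes at :math:`\pi_i`, so the surrogate :math:`M_i` touches :math:`J` there, and maximizing it cannot decrease :math:`J`:

```latex
% The max-KL divergence of a policy from itself is zero, so
M_i(\pi_i) = L_{\pi_i}(\pi_i) = J(\pi_i),
\qquad
J(\pi) \ge M_i(\pi) \quad \text{for all } \pi .
% Hence, if \pi_{i+1} maximizes M_i,
J(\pi_{i+1}) \ge M_i(\pi_{i+1}) \ge M_i(\pi_i) = J(\pi_i).
```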
Approve.
Description

Motivation and Context

Why is this change required? What problem does it solve?
If it fixes an open issue, please link to the issue here.
You can use the syntax `close #15213` if this solves the issue #15213.

Types of changes

What types of changes does your code introduce? Put an `x` in all the boxes that apply:

Checklist

Go over all the following points, and put an `x` in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!

- `make format`. (required)
- `make lint`. (required)
- `make test` pass. (required)