feat(off-policy): fix final_observation setting and support evaluation times configuration #260

Gaiejj · 2023-07-31T15:16:20Z

Description

Motivation and Context

Previously, the off-policy algorithms had to evaluate once per episode. This pull request makes the number of evaluations a configuration option. If you do not want to evaluate during training, set it to 0.
We process final_observation to handle the end of a trajectory correctly. However, not all environments include the key final_observation in the info returned by the step function, so we fix this with a simple if-else check.
I have raised an issue to propose this change (required for new features and bug fixes)
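
A minimal sketch of the two behaviours described above, not the code in this pull request. The helper names `resolve_real_next_obs` and `run_evaluation`, the parameter name `eval_episodes`, and the single-environment `info` dict layout are assumptions made for illustration only.

```python
from typing import Any, Callable, Dict, List


def resolve_real_next_obs(next_obs: Any, info: Dict[str, Any]) -> Any:
    """Pick the observation to store as the last one of a trajectory.

    Hypothetical helper: some environments report the true last observation
    under info['final_observation'], while others omit the key entirely.
    """
    if 'final_observation' in info:
        # The environment reported the observation seen before its reset.
        return info['final_observation']
    # Key is absent: fall back to the observation returned by step().
    return next_obs


def run_evaluation(eval_episodes: int, run_episode: Callable[[], float]) -> List[float]:
    """Run `eval_episodes` evaluation episodes and collect their returns.

    `eval_episodes` is an assumed option name; a value of 0 runs nothing,
    which corresponds to disabling evaluation during training.
    """
    return [run_episode() for _ in range(eval_episodes)]
```

With these helpers, an environment that never sets final_observation simply keeps the observation returned by the step function, and setting the evaluation count to 0 yields an empty list of returns.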

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds core functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

I have read the CONTRIBUTION guide. (required)
My change requires a change to the documentation.
I have updated the tests accordingly. (required for a bug fix or a new feature)
I have updated the documentation accordingly.
I have reformatted the code using make format. (required)
I have checked the code using make lint. (required)
I have ensured make test pass. (required)

muchvo

LGTM.

Gaiejj added 2 commits July 31, 2023 23:07

feat: support evaluation episodes configuration

a90dbfd

fix: fix final observation setting in off-policy algorithms

30d90f8

Gaiejj requested review from XuehaiPan, zmsn-2077 and muchvo as code owners July 31, 2023 15:16

Gaiejj added the bug, enhancement, and algorithm labels Jul 31, 2023

Gaiejj changed the title ~~Dev meta~~ feat(off-policy): fix final_observation setting and support evaluation times configuration Jul 31, 2023

zmsn-2077 approved these changes Jul 31, 2023


muchvo approved these changes Aug 1, 2023


Gaiejj enabled auto-merge (squash) August 1, 2023 12:23

Gaiejj disabled auto-merge August 1, 2023 12:25

Gaiejj merged commit 9e76d28 into PKU-Alignment:main Aug 1, 2023
4 checks passed

Gaiejj mentioned this pull request Aug 2, 2023

style: update pre-commit.yaml and fix ruff #261

Merged

11 tasks
