[bug-fix] Separate critic only for PPO #4661

ervteng · 2020-11-17T21:36:18Z

Proposed change(s)

This might be a controversial change, but probably less so with hybrid-actions on the horizon.

In PPO-Torch, we use separate actor and critic networks for continuous actions and a single shared network for discrete actions. In TF, we always used separate networks. This causes some performance differences between TF and Torch (particularly in the FoodCollector environment, where the combined network does not have enough capacity for both the action and value outputs).
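To make the difference concrete, here is a minimal pure-Python sketch of the two layouts. It is not the ML-Agents implementation: the class names `Dense`, `SharedActorCritic`, and `SeparateActorCritic`, the layer sizes, and the single hidden layer are all assumptions chosen for illustration.

```python
import math
import random


class Dense:
    """Tiny fully connected layer (illustrative only, not an ML-Agents class)."""

    def __init__(self, n_in, n_out, rng, activation=math.tanh):
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out
        self.activation = activation

    def __call__(self, x):
        out = []
        for row, bias in zip(self.w, self.b):
            z = sum(wi * xi for wi, xi in zip(row, x)) + bias
            out.append(self.activation(z) if self.activation is not None else z)
        return out

    def num_params(self):
        return sum(len(row) for row in self.w) + len(self.b)


class SharedActorCritic:
    """One body feeds both heads, so policy and value compete for its capacity."""

    def __init__(self, obs_size, hidden, act_size, rng):
        self.body = Dense(obs_size, hidden, rng)
        self.policy_head = Dense(hidden, act_size, rng, activation=None)
        self.value_head = Dense(hidden, 1, rng, activation=None)

    def __call__(self, obs):
        h = self.body(obs)  # the same features go to both heads
        return self.policy_head(h), self.value_head(h)[0]

    def num_params(self):
        return sum(m.num_params() for m in (self.body, self.policy_head, self.value_head))


class SeparateActorCritic:
    """Actor and critic each own a body, so neither constrains the other."""

    def __init__(self, obs_size, hidden, act_size, rng):
        self.actor_body = Dense(obs_size, hidden, rng)
        self.critic_body = Dense(obs_size, hidden, rng)
        self.policy_head = Dense(hidden, act_size, rng, activation=None)
        self.value_head = Dense(hidden, 1, rng, activation=None)

    def __call__(self, obs):
        logits = self.policy_head(self.actor_body(obs))
        value = self.value_head(self.critic_body(obs))[0]
        return logits, value

    def num_params(self):
        parts = (self.actor_body, self.critic_body, self.policy_head, self.value_head)
        return sum(m.num_params() for m in parts)
```

With an observation size of 4, a hidden size of 8, and 2 action outputs, the shared layout has 67 parameters and the separate layout has 107; the extra body is the additional capacity the separate critic gets.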

Following the principle of least surprise, I think we should change Torch to always use a separate network, at least until TF is deprecated.
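The behavior change can be summarized as a small decision rule. This is a hedged sketch of the logic as described in this PR, not code from the repository; both function names and the string action-type labels are hypothetical.

```python
VALID_ACTION_TYPES = ("continuous", "discrete")


def uses_separate_critic_before(action_type: str) -> bool:
    """Torch PPO before this change: separate networks only for continuous actions."""
    if action_type not in VALID_ACTION_TYPES:
        raise ValueError(f"unknown action type: {action_type!r}")
    return action_type == "continuous"


def uses_separate_critic_after(action_type: str) -> bool:
    """Torch PPO after this change: always separate, matching the TF behavior."""
    if action_type not in VALID_ACTION_TYPES:
        raise ValueError(f"unknown action type: {action_type!r}")
    return True
```

Only the discrete case changes: it previously used a shared network and now uses a separate one.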

Types of change(s)

Checklist

Added tests that prove my fix is effective or that my feature works
Updated the changelog (if applicable)
Updated the documentation (if applicable)
Updated the migration guide (if applicable)

Other comments

vincentpierre

Do you plan on adding this to release 10?

Separate critic only for PPO

b80fd17

ervteng requested review from andrewcoh and vincentpierre November 17, 2020 21:36

vincentpierre approved these changes Nov 17, 2020

ervteng merged commit 7349bcf into master Nov 18, 2020

delete-merged-branch bot deleted the develop-separateonly branch November 18, 2020 00:01

ervteng pushed a commit that referenced this pull request Nov 18, 2020

[bug-fix] Separate critic only for PPO (#4661)

d02be14

ervteng pushed a commit that referenced this pull request Nov 18, 2020

[bug-fix] Separate critic only for PPO (#4661)

5b1bb15

ervteng mentioned this pull request Nov 18, 2020

Cherry-pick separate critic only for PPO (#4661) to Release 10 #4666

Merged

10 tasks

ervteng pushed a commit that referenced this pull request Nov 18, 2020

Cherry-pick separate critic only for PPO (#4661) (#4666)

b0ac32e

github-actions bot locked as resolved and limited conversation to collaborators Nov 18, 2021

3 participants