Skip to content

Conversation

@ervteng
Copy link
Contributor

@ervteng ervteng commented Nov 17, 2020

Proposed change(s)

This might be a controversial change, but probably less so with hybrid-actions on the horizon.

In PPO-Torch, we use a separate actor and critic network for continuous actions, and a shared one for discrete. In TF, we always used a separate network. This causes some differences in performance between TF and Torch (particularly with the FoodCollector environment, where the combined network does not have enough capacity for both action and value).

In the principle of least surprise, I think we should change Torch to use a separate network always, at least until TF is deprecated.

Types of change(s)

  • Bug fix
  • New feature
  • Code refactor
  • Breaking change
  • Documentation update
  • Other (please describe)

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

Copy link
Contributor

@vincentpierre vincentpierre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan on adding this to release 10 ?

@ervteng ervteng merged commit 7349bcf into master Nov 18, 2020
@delete-merged-branch delete-merged-branch bot deleted the develop-separateonly branch November 18, 2020 00:01
ervteng pushed a commit that referenced this pull request Nov 18, 2020
ervteng pushed a commit that referenced this pull request Nov 18, 2020
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 18, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants