
Conversation

@ervteng
Contributor

@ervteng ervteng commented Nov 13, 2020

Proposed change(s)

This PR is motivated by the consistently lower final rewards that Torch PPO achieves on ragdoll environments, especially Walker. It addresses two differences between Torch and TF PPO:

  1. Torch layers were initialized using Kaiming Normal, but PyTorch's Kaiming initializer by default assumes Leaky ReLU and applies a gain of sqrt(2). This PR changes the nonlinearity assumed in the initializer to linear, removing that extra factor of sqrt(2). This initialization was the main cause of wildly higher value losses in Torch.
  2. In TensorFlow, when not using tanh squashing on Gaussian outputs (PPO), we truncate the output with clip(action, -3, 3) / 3. This PR adds the same behavior to Torch.
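The two fixes above can be sketched roughly as follows (a minimal illustration, not the PR's actual code; the layer shape and the `clip_action` helper name are made up for the example):

```python
import torch
import torch.nn as nn

# Fix 1: nn.init.kaiming_normal_ defaults to nonlinearity="leaky_relu",
# which applies a gain of sqrt(2). Passing nonlinearity="linear" uses a
# gain of 1, matching the TF initialization.
layer = nn.Linear(64, 64)  # illustrative shape
nn.init.kaiming_normal_(layer.weight, nonlinearity="linear")

# Fix 2: without tanh squashing, truncate the sampled Gaussian action to
# [-3, 3] and rescale into [-1, 1], mirroring TF's clip(action, -3, 3) / 3.
def clip_action(action: torch.Tensor) -> torch.Tensor:
    return torch.clamp(action, -3, 3) / 3
```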

Types of change(s)

  • Bug fix
  • New feature
  • Code refactor
  • Breaking change
  • Documentation update
  • Other (please describe)

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

@CubeMD CubeMD mentioned this pull request Nov 15, 2020
@ervteng ervteng marked this pull request as ready for review November 16, 2020 18:17
Contributor

@vincentpierre vincentpierre left a comment


Don't forget to cherry pick to release_10_branch after merge.

@ervteng ervteng merged commit d4e0dae into master Nov 17, 2020
@delete-merged-branch delete-merged-branch bot deleted the develop-torch-clip branch November 17, 2020 22:02
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 18, 2021

3 participants