[bug-fix] Add clipping to PyTorch policy, fix initialization #4649

ervteng · 2020-11-13T02:44:40Z

Proposed change(s)

This PR is motivated by the consistently lower final rewards that Torch PPO achieves on ragdoll environments, especially Walker. It addresses two differences between Torch and TF PPO:

Torch layers were initialized using Kaiming Normal, but PyTorch's initializer assumes a Leaky ReLU nonlinearity by default and applies a gain of sqrt(2). This PR changes the nonlinearity assumed in the initializer to linear, removing that extra factor of sqrt(2). This initialization was the main cause of the much higher value losses in Torch.
In TensorFlow, when not using tanh squashing on Gaussian outputs (PPO), we clip and rescale the output as clip(action, -3, 3) / 3. This PR adds the same clipping to Torch.
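
The two changes described above can be sketched roughly as follows. This is an illustrative sketch only, not the code merged in this PR: the helper names `init_linear_gain` and `clip_action` and the layer sizes are hypothetical, and the later commits in this PR adjust the gains further.

```python
import math

import torch
from torch import nn


def init_linear_gain(layer: nn.Linear) -> nn.Linear:
    """Hypothetical helper: Kaiming-normal init with nonlinearity='linear' (gain 1)."""
    nn.init.kaiming_normal_(layer.weight, nonlinearity="linear")
    nn.init.zeros_(layer.bias)
    return layer


def clip_action(action: torch.Tensor, clip_value: float = 3.0) -> torch.Tensor:
    """Hypothetical helper: clamp to [-clip_value, clip_value], then rescale to [-1, 1]."""
    return torch.clamp(action, -clip_value, clip_value) / clip_value


# kaiming_normal_ defaults to nonlinearity="leaky_relu" with a=0,
# which corresponds to a gain of sqrt(2); "linear" gives a gain of 1.
default_gain = nn.init.calculate_gain("leaky_relu", 0)
linear_gain = nn.init.calculate_gain("linear")
gain_ratio = default_gain / linear_gain

# With gain 1, the weight std should be close to 1 / sqrt(fan_in).
torch.manual_seed(0)
layer = init_linear_gain(nn.Linear(256, 128))
expected_std = 1.0 / math.sqrt(256)

# Sampled Gaussian actions outside [-3, 3] saturate at -1 or 1 after clipping.
raw = torch.tensor([-4.5, -1.5, 0.0, 3.0, 6.0])
clipped = clip_action(raw)
```

Here `gain_ratio` is the extra factor of sqrt(2) that the default initializer applies, and `clipped` holds the actions after the clip-and-rescale step.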

Types of change(s)

Checklist

Added tests that prove my fix is effective or that my feature works
Updated the changelog (if applicable)
Updated the documentation (if applicable)
Updated the migration guide (if applicable)

Other comments

This reverts commit a708af6.

ml-agents/mlagents/trainers/ppo/optimizer_torch.py

vincentpierre

Don't forget to cherry pick to release_10_branch after merge.

Ervin Teng added 18 commits November 9, 2020 17:48

Add clipping to Torch

358b745

Invert divide by 3 in log prob

a708af6

Test commit

495a1a5

Undo test commit

154c149

Take mean of continuous entropy

8b37958

Proper dimensions

0399ae9

Merge branch 'develop-contentropy' into develop-torch-clip

9bbac64

Remove clip

f3d4efd

Use real clipping (as in TF)

05da1b0

Change num envs

7991614

Use lower scaling value

06731c0

Merge branch 'develop-torchcrawlerdebug' into develop-torch-clip

f3aa873

Use linear gain for KaimingHe

c0ddb62

Improve comment

32794bc

Double policy loss for no reason

7e16190

Add clip to export and make optional in policy

a648248

Merge branch 'master' into develop-torch-clip

50005a6

Revert "Invert divide by 3 in log prob"

20d7285

This reverts commit a708af6.

ervteng commented Nov 13, 2020

ml-agents/mlagents/trainers/ppo/optimizer_torch.py (outdated, resolved)

Ervin Teng added 3 commits November 12, 2020 20:30

Increase initialization

9d5e2e4

Bigger scale

548e68a

Decrease kernel gain

5978b74

CubeMD mentioned this pull request Nov 15, 2020

Match3 hyperparameters adjustments #4641

Closed

5 tasks

ervteng marked this pull request as ready for review November 16, 2020 18:17

ervteng requested a review from vincentpierre November 16, 2020 18:20

vincentpierre approved these changes Nov 16, 2020

Ervin Teng added 4 commits November 16, 2020 17:42

kernel gain float again

c88c22b

Fix discrete

f9d606d

Increase gain on visual encoders

14f8ea1

Fix some tests

019ac81

More gain changes, fix tests

531f844

ervteng merged commit d4e0dae into master Nov 17, 2020

delete-merged-branch bot deleted the develop-torch-clip branch November 17, 2020 22:02

ervteng pushed a commit that referenced this pull request Nov 17, 2020

[bug-fix] Add clipping to PyTorch policy, fix initialization (#4649)

5c5c42d

ervteng mentioned this pull request Nov 17, 2020

Cherry-pick clipping and initialization changes to release 10 #4662

Merged

10 tasks

ervteng pushed a commit that referenced this pull request Nov 18, 2020

[bug-fix] Add clipping to PyTorch policy, fix initialization (#4649) (#…

60ae629

…4662)

github-actions bot locked as resolved and limited conversation to collaborators Nov 18, 2021
