[bug-fix] Fix issue where NaNs are outputted by the policy when training Match3 #4664

ervteng · 2020-11-18T00:11:47Z

Proposed change(s)

In certain environments, particularly those with large action spaces, the policy can occasionally output NaN for discrete actions. This is especially evident in the Match3 environment.

This PR adds a small epsilon to every division or log whose operand may get very close to zero, particularly in MultiCategoricalDistribution and CategoricalDistInstance.
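To illustrate why an epsilon inside the log matters, here is a minimal standard-library sketch, not the trainer code itself. The `entropy` helper and the `EPSILON` value are assumptions made for this example. It mimics tensor-library behaviour, where the log of zero is `-inf` and `0 * -inf` evaluates to NaN.

```python
import math

EPSILON = 1e-7  # assumed value, for illustration only


def entropy(probs, eps=0.0):
    """Return -sum(p * log(p + eps)) for a list of probabilities."""
    total = 0.0
    for p in probs:
        # math.log(0.0) raises ValueError, whereas tensor libraries return -inf.
        # We mimic the tensor behaviour so the NaN becomes visible.
        log_p = math.log(p + eps) if (p + eps) > 0.0 else float("-inf")
        total -= p * log_p  # 0.0 * -inf is NaN
    return total


# A distribution that has collapsed onto a single action.
probs = [1.0, 0.0, 0.0]

unsafe = entropy(probs)          # NaN, because of the 0 * -inf terms
safe = entropy(probs, EPSILON)   # finite, close to zero

print(unsafe, safe)
```

The epsilon keeps the log finite, at the cost of a small bias in the result.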

Types of change(s)

Checklist

Added tests that prove my fix is effective or that my feature works
Updated the changelog (if applicable)
Updated the documentation (if applicable)
Updated the migration guide (if applicable)

Other comments


chriselion · 2020-11-18T19:20:11Z

ml-agents/mlagents/trainers/torch/distributions.py

-        normalized_probs = raw_probs / torch.sum(raw_probs, dim=-1).unsqueeze(-1)
-        normalized_logits = torch.log(normalized_probs + EPSILON)
-        return normalized_logits
+        # Zero out masked logits, then subtract a large value


Can you link to the paper/blog post that this was adapted from?
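The added comment in the diff describes masking by zeroing out the masked logits and then subtracting a large value. A minimal standard-library sketch of that idea follows. The names `mask_logits`, `softmax`, and `LARGE_VALUE`, and the magnitude `1e8`, are assumptions for illustration and are not taken from the diff.

```python
import math

LARGE_VALUE = 1e8  # assumed magnitude, for illustration only


def mask_logits(logits, allow_mask):
    """Zero out blocked logits, then push them far below the allowed ones.

    allow_mask holds 1.0 for allowed actions and 0.0 for blocked actions.
    """
    return [
        logit * allowed - LARGE_VALUE * (1.0 - allowed)
        for logit, allowed in zip(logits, allow_mask)
    ]


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    top = max(logits)
    exps = [math.exp(logit - top) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


# The second action is blocked.
masked = mask_logits([2.0, 0.5, -1.0], [1.0, 0.0, 1.0])
probs = softmax(masked)

print(masked)
print(probs)
```

Because the mask is applied to the logits rather than to the probabilities, the sketch involves no renormalizing division and no log of a value that may be zero, which are the two operations removed in the diff.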

…ing Match3 (#4664) * match3 settings * Add epsilon to log * Add another epsilon * Revert match3 configs * NaN-free masking method * Add comment for paper * Add comment for paper Co-authored-by: Chris Elion <chris.elion@unity3d.com>


Chris Elion and others added 5 commits November 16, 2020 17:16

match3 settings

6d729a0

Add epsilon to log

1901269

Merge commit '6d729a0a2b2ba1fc946720cdb7871c9be3e38d45' into develop-…

d4930a2

…fix-nan

Add another epsilon

6e29be1

Revert match3 configs

7b7dbf9

ervteng requested a review from chriselion November 18, 2020 00:13

chriselion approved these changes Nov 18, 2020


NaN-free masking method

22800d7

vincentpierre approved these changes Nov 18, 2020


chriselion reviewed Nov 18, 2020


Ervin Teng added 2 commits November 18, 2020 11:49

Add comment for paper

eafb0e8

Add comment for paper

9de1525

ervteng merged commit 993f822 into master Nov 18, 2020

delete-merged-branch bot deleted the develop-fix-nan-merge branch November 18, 2020 22:30

ervteng mentioned this pull request Nov 18, 2020

Cherry-pick NaN fix #4664 #4669

Merged

10 tasks

github-actions bot locked as resolved and limited conversation to collaborators Nov 19, 2021
