Skip to content

Conversation

@chriselion
Copy link
Contributor

@chriselion chriselion commented Oct 26, 2020

Proposed change(s)

  • Use int64 for tensorflow policy steps, to avoid int32 overflow after 2**31 steps.
  • Also use int64 tensor for torch policy steps. The int32 overflow problem doesn't exist in torch since it use float tensors by default. But if setting the global step to a large number and then call get_current_step() where the step count is converted to int, the conversion will cause numerical error and the result will not be as expected.
  • Check for NaN actions before they get sent back to Unity. Currently, we only check the observations, but if we send NaN actions, we'll very likely get NaN observations and "blame" the observations.
  • After this change, old Tensorflow model will become incompatible and cannot be loaded by new code. Torch is not affected.

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

Types of change(s)

  • Bug fix
  • New feature
  • Code refactor
  • Breaking change
  • Documentation update
  • Other (please describe)

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

still needs tests and handle torch
action = run_out.get("action")
# Fast NaN check on the action
# See https://stackoverflow.com/questions/6736590/fast-check-for-nan-in-numpy for background.
d = np.sum(action)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a few ways to do this; we were using np.dot for observations at one point:

d = np.dot(np_obs, np_obs)

but that morphed to np.mean during some refactoring:
https://github.com/Unity-Technologies/ml-agents/pull/3022/files#diff-614d11115c4d74319295ea34b44a3b7d8ab40b43865c1c3e351a8f6c9602c673R92

np.sum is one of the recommended approaches in the stackoverflow thread.

d = np.sum(action)
has_nan = np.isnan(d)
if has_nan:
raise RuntimeError("NaN action detected.")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just raising for now; there's not much the user can do about it. We could use np.nan_to_num but that would probably trigger every step and give bad results.

Chris Elion and others added 2 commits October 26, 2020 17:37
* check nan action for torch

* step overflow test

* use int tensor for global step in torch
Copy link
Contributor

@vincentpierre vincentpierre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not forget to cherry pick on release_10 if you want these changes in the next release

@dongruoping dongruoping merged commit 1ee83d8 into master Nov 13, 2020
@delete-merged-branch delete-merged-branch bot deleted the MLA-1503-int-overflow-nan branch November 13, 2020 20:47
dongruoping pushed a commit that referenced this pull request Nov 13, 2020
* use int64 steps

* check for NaN actions

Co-authored-by: Ruo-Ping Dong <ruoping.dong@unity3d.com>
vincentpierre pushed a commit that referenced this pull request Nov 16, 2020
* use int64 steps

* check for NaN actions

Co-authored-by: Ruo-Ping Dong <ruoping.dong@unity3d.com>

Co-authored-by: Chris Elion <chris.elion@unity3d.com>
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 14, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants