Check int overflow/ nan action for torch and add tests #4646

dongruoping · 2020-11-12T23:17:59Z

Proposed change(s)

Add the int overflow / NaN action checks for torch.
Use an int tensor for global_step in torch. The int32 overflow problem doesn't occur in torch because torch uses float tensors by default, but that hides a subtle bug: when the global step is set to a very large number and get_current_step() then converts the step count to int, floating-point rounding makes the result differ from the value that was set, i.e.

policy.set_step(2 ** 31 - 1)
assert policy.get_current_step() == 2 ** 31 - 1  # this assertion fails

This could cause a bug when resuming training from a large step count in torch.
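
The rounding can be reproduced without torch. The following is only a sketch, assuming the step was held in a single-precision (float32) tensor; it emulates float32 storage with the standard library struct module, and roundtrip_float32 is a hypothetical helper, not part of ML-Agents.

```python
import struct


def roundtrip_float32(step: int) -> int:
    """Store an int as an IEEE 754 float32, then read it back as an int.

    Hypothetical helper: it stands in for a step counter kept in a
    float32 tensor and later converted to int.
    """
    packed = struct.pack("f", float(step))  # rounds to the nearest float32
    return int(struct.unpack("f", packed)[0])


large_step = 2 ** 31 - 1

# float32 has a 24-bit significand, so near 2 ** 31 adjacent representable
# values are 128 apart and 2 ** 31 - 1 is rounded up to 2 ** 31.
print(roundtrip_float32(large_step))                # 2147483648
print(roundtrip_float32(large_step) == large_step)  # False

# Integers up to 2 ** 24 survive the round trip exactly.
print(roundtrip_float32(2 ** 24) == 2 ** 24)        # True
```

Keeping the counter in an integer tensor avoids this rounding, which is the approach this PR takes.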

Added tests for the int overflow check and verified that both the TF and torch tests fail on master and pass with the fix in this PR.

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

Types of change(s)

Checklist

Added tests that prove my fix is effective or that my feature works
Updated the changelog (if applicable)
Updated the documentation (if applicable)
Updated the migration guide (if applicable)

Other comments

Ruo-Ping Dong added 3 commits November 12, 2020 14:01

check nan action for torch

1e87bae

step overflow test

145464a

use int tensor for global step in torch

50e551a

dongruoping requested review from chriselion and vincentpierre November 12, 2020 23:17

chriselion approved these changes Nov 12, 2020

dongruoping merged commit bcc0ba0 into MLA-1503-int-overflow-nan Nov 12, 2020

delete-merged-branch bot deleted the MLA-1503-int-overflow-nan-torch branch November 12, 2020 23:35

github-actions bot locked as resolved and limited conversation to collaborators Nov 13, 2021
