Add torch_utils class, auto-detect CUDA availability #4403

Merged
ervteng merged 15 commits into master from develop-torch_utils on Aug 28, 2020
Conversation

@ervteng (Contributor) commented Aug 21, 2020

Proposed change(s)

This PR adds a torch_utils class (similar to tf_utils) and requires importing torch from there. This lets us do a couple of things, the first being detecting whether CUDA is available and setting the default tensor type appropriately. Requiring that torch be imported from here ensures this is set before any torch functions are used at all. The PR also adds an is_available() method to torch_utils and throws a nicer error if torch isn't available when --torch is passed.

In the future we can also set the number of torch threads, etc. from here.
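As a rough sketch of the idea described above (illustrative only, not the PR's actual code; the function names here are assumptions), such a module might look like:

```python
# Illustrative sketch of a torch_utils-style module; not the actual PR code.
# Importing torch through this module guarantees the default tensor type is
# configured before any other torch code runs.
try:
    import torch
except ImportError:
    torch = None


def is_available() -> bool:
    """Report whether torch could be imported at all."""
    return torch is not None


def default_device_is_cuda() -> bool:
    """Report whether CUDA tensors would be used by default."""
    return torch is not None and torch.cuda.is_available()


if torch is not None and torch.cuda.is_available():
    # Runs once at import time, so every tensor created later defaults to GPU.
    torch.set_default_tensor_type(torch.cuda.FloatTensor)
```

Callers would then write `from mlagents.torch_utils import torch` instead of `import torch`, and the `--torch` CLI path could check `is_available()` to raise a friendly error.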

Types of change(s)

  • Bug fix
  • New feature
  • Code refactor
  • Breaking change
  • Documentation update
  • Other (please describe)

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

ervteng changed the title from "[feature] Add torch_utils class, auto-detect CUDA availability" to "Add torch_utils class, auto-detect CUDA availability" on Aug 21, 2020
# Add files or directories to the ignore list. They should be base names, not
# paths.
ignore=CVS
generated-members=torch.*
@ervteng (Contributor, Author):

I'm pretty sure this isn't the best way to do this, but without it, pylint complains everywhere that torch doesn't have the right members. I believe this is because torch could be None, even though that never actually happens.

setup.cfg Outdated

banned-modules = tensorflow = use mlagents.tf_utils instead (it handles tf2 compat).
logging = use mlagents_envs.logging_util instead
torch = use mlagents.torch_utils istead (handles GPU detection).
Suggested change
torch = use mlagents.torch_utils istead (handles GPU detection).
torch = use mlagents.torch_utils instead (handles GPU detection).

@vincentpierre (Contributor) left a comment
Looks good to me, but I would like ml-agents/mlagents/torch_utils/torch.py to contain a comment stating that the file is temporary and will be removed once torch is required. (To avoid adding functionality to this file in the meantime.)

@ervteng (Contributor, Author) commented Aug 21, 2020

Looks good to me, but I would like ml-agents/mlagents/torch_utils/torch.py to contain a comment stating that the file is temporary and will be removed once torch is required. (To avoid adding functionality to this file in the meantime.)

The file isn't temporary (it will still be needed for GPU detection), but yeah, the try/except is. Added a comment.

@dongruoping (Contributor):

Only tensors are created on the GPU; the models are still on the CPU.
Should it also be addressed in this PR?
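This is because setting the default tensor type only affects newly created tensors; a module's parameters still have to be moved explicitly. A minimal illustration of the point (the helper name is hypothetical, not from the PR):

```python
# Hypothetical helper illustrating the reviewer's point; not PR code.
try:
    import torch
except ImportError:  # torch is optional in this codebase
    torch = None


def move_to_available_device(model):
    """Move a model's parameters to the GPU when CUDA is available."""
    if torch is None:
        raise RuntimeError("torch is not installed")
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # nn.Module.to() moves parameters and buffers in place.
    return model.to(device)
```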

@dongruoping (Contributor):

Another point missing here: when running Torch with cuDNN, if we want completely reproducible results, we also need to set these, in addition to seeding torch and numpy:

torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

though these settings can hurt runtime performance.
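These flags would fit naturally alongside the existing seed handling; a hedged sketch of how they might be combined (the helper name is made up, not from the PR):

```python
# Illustrative seeding helper; the function name is an assumption, not PR code.
import random

import numpy as np

try:
    import torch
except ImportError:
    torch = None


def set_global_seeds(seed: int) -> None:
    """Seed Python, NumPy, and (if present) torch RNGs for reproducibility."""
    random.seed(seed)
    np.random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
        # Force reproducible cuDNN kernels; may slow training, per PyTorch docs.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
```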

@ervteng (Contributor, Author) commented Aug 26, 2020

Another point missing here: when running Torch with cuDNN, if we want completely reproducible results, we also need to set these, in addition to seeding torch and numpy:

torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

though these settings can hurt runtime performance.

How much of a performance hit do we expect by setting these flags?

@dongruoping (Contributor) commented Aug 26, 2020

How much of a performance hit do we expect by setting these flags?

Not sure; it could vary case by case, depending on how much the model changes during training and how much optimization cuDNN does.
I haven't really tried to quantify the impact, but it's a general warning in the PyTorch docs.

ervteng requested a review from dongruoping on August 27, 2020 18:28
@dongruoping (Contributor) commented Aug 27, 2020

ml-agents/mlagents/trainers/policy/torch_policy.py: line 61 should be removed. It resets the default tensor type to non-CUDA, which is already properly set in utils.

I tested with removing that line. It works well.

ervteng merged commit 084d1c8 into master on Aug 28, 2020
The delete-merged-branch bot deleted the develop-torch_utils branch on August 28, 2020 00:36
The github-actions bot locked the conversation as resolved and limited it to collaborators on Aug 28, 2021