
Conversation

@Nic-Ma (Contributor) commented Aug 17, 2021

Fixes #2789.

Description

This PR adds the ToDevice transform.
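For context, a minimal sketch of what such a device-moving transform could look like (illustrative only, assuming MONAI's callable-transform style; this is not the merged implementation):

```python
# Hedged sketch of a ToDevice-style transform: a callable that moves a
# tensor to a target device and forwards extra kwargs to Tensor.to().
import torch


class ToDevice:
    """Move a PyTorch tensor to the target device, e.g. "cuda:0"."""

    def __init__(self, device, **kwargs):
        self.device = device
        self.kwargs = kwargs  # forwarded to Tensor.to(), e.g. non_blocking=True

    def __call__(self, img: torch.Tensor) -> torch.Tensor:
        if not isinstance(img, torch.Tensor):
            raise ValueError("img must be a PyTorch Tensor.")
        return img.to(self.device, **self.kwargs)
```

Usage would be a single call per sample, e.g. `ToDevice("cuda:0")(img)` inside a transform chain.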

Status

Ready

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

@Nic-Ma (Contributor Author) commented Aug 17, 2021

/black

@Nic-Ma Nic-Ma requested review from ericspod, rijobro and wyli August 17, 2021 03:51
Signed-off-by: Nic Ma <nma@nvidia.com>
@Nic-Ma (Contributor Author) commented Aug 17, 2021

/black

@rijobro (Contributor) commented Aug 17, 2021

Hi @Nic-Ma, what's the motivation for this one? I thought we previously agreed (link) that it wasn't a good idea, as the batch wouldn't be contiguously copied to the device?

@Nic-Ma (Contributor Author) commented Aug 17, 2021

> Hi @Nic-Ma, what's the motivation for this one? I thought we previously agreed (link) that it wasn't a good idea, as the batch wouldn't be contiguously copied to the device?

Hi @rijobro ,

Thanks for your review.
With this transform, we can cache the data on the GPU directly, avoiding duplicated copying from CPU to GPU in every epoch, and execute the following transforms on GPU tensors to accelerate training.
I tested locally with the spleen segmentation tutorial; it improved training speed by about 30%.
You can check the unit test of ToDeviced for a typical usage example.
And I think this PR is a non-breaking enhancement: if you want to move a whole batch to the GPU at once, you can still use the regular method with DataLoader num_workers > 0.

Thanks.
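The caching idea described above can be sketched in plain PyTorch (the dataset and transform chain here are made up for illustration; MONAI's actual ToDeviced/CacheDataset pipeline differs in detail):

```python
# Sketch of caching on-device: run deterministic transforms and the
# host-to-device copy once, then every epoch iterates over tensors that
# are already on the target device. Falls back to CPU when CUDA is absent.
import torch

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# "cache" phase: executed a single time, like CacheDataset's deterministic part
raw_samples = [torch.randn(1, 64, 64) for _ in range(4)]
cached = [s.float().to(device) for s in raw_samples]

# training phase: every epoch reads the already-on-device tensors,
# so no repeated CPU-to-GPU copies are needed
for epoch in range(2):
    for img in cached:
        assert img.device.type == device.split(":")[0]
        # ... random GPU-side augmentations and the model forward pass go here
```

Note that with GPU-resident cached data, the DataLoader should use num_workers=0, since CUDA tensors cannot be shared across worker subprocesses.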

@Nic-Ma Nic-Ma requested a review from wyli August 17, 2021 12:45
@wyli (Contributor) commented Aug 17, 2021

Thanks for the reminder @rijobro. #1489 provides more options, such as non-blocking transfer. Shall we consider those, or reopen that PR and work on it?

@Nic-Ma (Contributor Author) commented Aug 17, 2021

If you guys think non_blocking is necessary, I can also add it.
Thanks for the great discussion!

@Nic-Ma (Contributor Author) commented Aug 17, 2021

Hi @wyli @rijobro ,

Do you guys have other concerns or comments here?

Thanks in advance.

@wyli (Contributor) commented Aug 17, 2021

> If you guys think non_blocking is necessary, I can also add it.
> Thanks for the great discussion!

I'm not familiar with this option, @dongyang0122 any idea?

@dongyang0122 (Collaborator) commented

> > If you guys think non_blocking is necessary, I can also add it.
> > Thanks for the great discussion!
>
> I'm not familiar with this option, @dongyang0122 any idea?

The transform is helpful and necessary to enable GPU-based data pre-processing and augmentation; it can greatly improve computing efficiency.

@wyli (Contributor) commented Aug 17, 2021

Thanks @dongyang0122. The .to() API of PyTorch has non_blocking=False by default (https://pytorch.org/docs/stable/generated/torch.Tensor.to.html). Do you have experience with this flag?

@Nic-Ma (Contributor Author) commented Aug 17, 2021

Hi @wyli ,

As the doc says: "When non_blocking, tries to convert asynchronously with respect to the host if possible, e.g., converting a CPU Tensor with pinned memory to a CUDA Tensor."
Maybe I should add **kwargs to this ToDevice transform?

Thanks.
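The flag discussed above only pays off when the source tensor lives in pinned (page-locked) host memory; a short sketch (guarded so it also runs on CPU-only machines):

```python
# Sketch of non_blocking host-to-device transfer: the asynchronous copy
# requires pinned host memory, and the stream must be synchronized before
# the result is relied upon.
import torch

x = torch.randn(8, 3, 32, 32)
if torch.cuda.is_available():
    x_pinned = x.pin_memory()                    # page-locked host memory
    y = x_pinned.to("cuda", non_blocking=True)   # copy may overlap with compute
    torch.cuda.synchronize()                     # wait before using y
else:
    y = x.to("cpu", non_blocking=True)           # no-op transfer, still valid

assert y.shape == x.shape
```

Forwarding **kwargs through the transform would let callers opt into this without changing the transform's signature.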

@wyli (Contributor) commented Aug 17, 2021

thanks @Nic-Ma, sounds good

Signed-off-by: Nic Ma <nma@nvidia.com>
@Nic-Ma (Contributor Author) commented Aug 17, 2021

/black

@Nic-Ma Nic-Ma enabled auto-merge (squash) August 17, 2021 22:39
@Nic-Ma Nic-Ma merged commit 28856b8 into Project-MONAI:dev Aug 18, 2021

Merging this pull request closes: Add ToDevice transform to execute transform logic on GPU