
Device handling in Datasets #105

Closed
C-nit opened this issue Dec 8, 2020 · 3 comments
Labels: enhancement (New feature or request), wontfix (This will not be worked on)
Milestone: 0.3

Comments

@C-nit
Member

C-nit commented Dec 8, 2020

@drugilsberg @jannisborn I'm opening this issue to revisit and discuss whether there should be any device handling at all, and what the default device (or even datatype) of items should be. Much of this is motivated by PyTorch Lightning.

First of all, a Dataset is not required to return a torch.Tensor when indexed (also briefly discussed in #98); e.g. SmilesDataset returns str. The default torch DataLoader/collate_fn handles np.arrays, strings, and even Mapping types such as dicts. In fact it handles np.arrays better than we do, avoiding a copy where possible (https://github.com/pytorch/pytorch/blob/b643dbb8a4bf9850171b2de848b6b89206973972/torch/utils/data/_utils/collate.py#L51).

  • Would you agree that we should not cast to tensor if there is no benefit within the dataset itself?
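For reference, a minimal sketch of what the default collate_fn already does with non-tensor items (the `ToyDataset` here is purely illustrative, not pytoda code):

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Returns a dict with a numpy array and a raw string per item; no tensors."""

    def __len__(self):
        return 8

    def __getitem__(self, index):
        return {"features": np.arange(4, dtype=np.float32) + index, "smiles": "CCO"}


loader = DataLoader(ToyDataset(), batch_size=4)
batch = next(iter(loader))
print(type(batch["features"]))  # torch.Tensor: default_collate stacks the numpy arrays
print(batch["smiles"])          # list of str: strings are passed through untouched
```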

I think our handling of device is uncommon. One would usually only send data to the GPU in advance of the training loop when one knows the entire dataset fits in GPU memory, since that avoids any transfer of data between devices during training. That is not our case: most of the time we end up calling tensor.to(device) in __getitem__.

I'm assuming now that the data originates on the CPU anyway, and we merely have to decide when the transfer to other devices should happen. In the generic case where not all data can be sent to the GPU, doing it in __getitem__ has a big overhead compared to sending whole batches, so it should be done in the training loop. (Lightning does this for us, btw.)

  • So I argue we should never set the device parameter in our datasets to a GPU. I think we should remove any device handling from pytoda.datasets; at the very least we should set the default to 'cpu'.
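A rough sketch of the pattern being argued for (the dataset here is a stand-in, not a pytoda class): the dataset and DataLoader stay entirely on CPU, and each batch is moved to the device exactly once inside the loop.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Dataset and DataLoader live entirely on the CPU ...
dataset = TensorDataset(torch.randn(1024, 16), torch.randint(0, 2, (1024,)))
loader = DataLoader(dataset, batch_size=64, shuffle=True)

model = torch.nn.Linear(16, 2).to(device)

for features, labels in loader:
    # ... and the transfer happens here, once per batch,
    # instead of item-by-item inside __getitem__.
    features = features.to(device)
    labels = labels.to(device)
    loss = torch.nn.functional.cross_entropy(model(features), labels)
```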

Leaving things on the CPU also allows sending only parts of the data to specific devices later, as torch.utils.data.DistributedSampler does (and again, Lightning handles this for us under the hood), enabling multi-GPU training. Currently our datasets would send all data to all GPUs.
The DataLoader could also prepare batches on the CPU with multiple workers. So instead of worrying about device, we should make sure the datasets work well on the CPU and can be used with multiple workers (the torch docs explicitly warn against returning CUDA tensors here!).

  • We should check whether we have to do anything to support DataLoader(..., pin_memory=True).
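A minimal sketch of the consumer side of memory pinning (nothing pytoda-specific; it mostly requires that __getitem__ returns CPU tensors or arrays):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1024, 16))

# pin_memory=True makes the DataLoader place collated CPU batches in
# page-locked memory, which enables asynchronous host-to-device copies.
loader = DataLoader(dataset, batch_size=64, pin_memory=torch.cuda.is_available())

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
for (features,) in loader:
    # non_blocking=True only pays off when the source batch is pinned.
    features = features.to(device, non_blocking=True)
```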

Reading:
https://pytorch-lightning.readthedocs.io/en/stable/multi_gpu.html
https://developer.nvidia.com/blog/how-optimize-data-transfers-cuda-cc/
https://pytorch.org/docs/stable/data.html#multi-process-data-loading
https://pytorch.org/docs/stable/data.html#memory-pinning

@C-nit C-nit added this to the 0.3 milestone Dec 8, 2020
@C-nit C-nit added the enhancement New feature or request label Dec 8, 2020
@jannisborn jannisborn added the wontfix This will not be worked on label Jan 3, 2022
@jannisborn
Member

I agree with the sentiment of this issue, but I'm closing it due to low priority (hence the wontfix label).

Indeed, there is significant overhead when each sample is sent to the GPU in __getitem__. The reason for not following your suggestion is practical: removing the device option entirely from pytoda.datasets is drastic, and effectively we would kill a feature that might be useful to some users. Also, some dataset helpers like StochasticItems can be used independently of the datasets but are quite intertwined with them.
The alternative would be changing the default to cpu. This seems like a good idea at first, but for the case where the user does want the GPU option we would need to ensure that the argument is passed down everywhere, to each helper function (e.g., transforms like ToTensor). That is totally doable, but it's time-consuming to debug and issues might surface later during training.

I'm not happy with keeping this as is, but given our bandwidth this would be my go-to.

@YoelShoshan
Contributor

@C-nit I completely agree, and can provide additional strong motivation (heavy under-utilization of the GPU):

#155

@C-nit
Member Author

C-nit commented May 11, 2022

Thanks a lot!
