
Question about ExternalSource and GPU #2052

Closed
kikoaumond opened this issue Jun 23, 2020 · 6 comments
Labels
question Further information is requested

Comments

@kikoaumond

Hello
I am using:
CUDA release 9.1, V9.1.85
DALI 0.21.0

I am trying to use an ExternalSource that does its processing on the GPU.
The documentation says ExternalSource supports both CPU and GPU,
but I get this error when I run my pipeline, which suggests GPU data is not supported:

File "/home/kikoaumond/.local/lib/python3.7/site-packages/nvidia/dali/pipeline.py", line 447, in feed_input
inp = Tensors.TensorListCPU(data, layout)
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
1. nvidia.dali.backend_impl.TensorListCPU(arg0: buffer, arg1: str)

Moreover, the docstring of feed_input (see below) says that "In case of GPU external sources, this (data) must be a numpy.ndarray."

This is confusing. Do you mean a CuPy array? I don't see how you can have a NumPy array on the GPU.

So, can you please clarify:

  1. Can I have an ExternalSource that provides data on the GPU?
  2. If so, which types are supported on the GPU? PyTorch tensors? CuPy arrays? Both?

Thank you

def feed_input(self, data_node, data, layout=""):
    """Bind a NumPy array (or a list thereof) to an output of ExternalSource.

    Parameters
    ----------
    data_node : :class:`DataNode` or str
        The :class:`DataNode` returned by a call to ExternalSource or a name of the
        :class:`nvidia.dali.ops.ExternalSource`

    data : numpy.ndarray or a list thereof
        The data to be used as the output of the ExternalSource referred to by `data_node`.
        In case of GPU external sources, this must be a ``numpy.ndarray``.

    layout : str
        The description of the data layout (or empty string, if not specified).
        It should be a string of the length that matches the dimensionality of the data, batch
        dimension excluded. For a batch of channel-first images, this should be "CHW", for
        channel-last video it's "FHWC" and so on.
    """
@JanuszL JanuszL added the question Further information is requested label Jun 23, 2020
@JanuszL
Contributor

JanuszL commented Jun 23, 2020

Hi,
The documentation states that you can ask the ExternalSource operator to run on either the CPU or the GPU, which means the data it produces will end up on the requested device. Still, in the latest official release, only CPU data is accepted as input, and it is up to the operator to move it to the GPU under the hood.
Just recently, in #1997, the ability to provide GPU data through CuPy (or any other type that supports `__cuda_array_interface__`) was introduced. In #2023, DLPack data support will be added as well.
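For readers unfamiliar with the protocol mentioned above: consumers such as DALI, Numba, or CuPy recognize GPU arrays by looking for a `__cuda_array_interface__` attribute. Below is a pure-Python illustration of the fields that dictionary carries. The `FakeGpuArray` class is invented for this sketch; a real producer (e.g. a CuPy ndarray) would point `data` at device memory, while here a host buffer stands in so the example runs anywhere.

```python
import numpy as np

class FakeGpuArray:
    """Illustration of the __cuda_array_interface__ protocol fields.

    Hypothetical class for demonstration only: a real implementation
    would expose a device pointer; a NumPy host buffer stands in here.
    """
    def __init__(self, arr):
        self._arr = arr

    @property
    def __cuda_array_interface__(self):
        return {
            "shape": self._arr.shape,
            "typestr": self._arr.dtype.str,          # e.g. "<f4" for float32
            "data": (self._arr.ctypes.data, False),  # (pointer, read_only)
            "version": 2,
        }

a = FakeGpuArray(np.zeros((2, 3), dtype=np.float32))
iface = a.__cuda_array_interface__
assert iface["shape"] == (2, 3) and iface["typestr"] == "<f4"
```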

@kikoaumond
Author

kikoaumond commented Jun 23, 2020

Thank you for clarifying. So, if I do some preprocessing on the GPU in my ExternalSource iterator, I need to move the data back to the CPU before it reaches the ExternalSource operator, so that the operator can move it back to the GPU? That adds an expensive round trip between GPU and CPU. Will PyTorch tensors on the GPU also be supported in ExternalSource in the near future?

@JanuszL
Contributor

JanuszL commented Jun 23, 2020

Hi,
As for now, yes, you need to do this redundant round trip, but as mentioned, GPU input will be supported soon. According to https://numba.pydata.org/numba-doc/latest/cuda/cuda_array_interface.html#interoperability, PyTorch tensors support `__cuda_array_interface__`; you can also make a DLPack capsule from a tensor (https://pytorch.org/docs/stable/dlpack.html), and DALI will support that as well.

@JanuszL
Contributor

JanuszL commented Jul 2, 2020

Hi,
#2023 has been merged as well, so now you can pass GPU data to the ExternalSource, and no intermediate buffer is needed on the CPU side in between. You can test these changes using the latest nightly build.

@kikoaumond
Author

Thank you. Will this functionality be in DALI builds for CUDA 10.1? I understand PyTorch does not support CUDA 10.2 yet.

@JanuszL
Contributor

JanuszL commented Jul 6, 2020

> Thank you. Will this functionality be in Dali versions for CUDA 10.1? I understand PyTorch does not support CUDA 10.2 yet.

Hi,
Please use the DALI build for CUDA 10.0; it covers any framework built for CUDA 10.x.

@JanuszL JanuszL closed this as completed Apr 14, 2021