
Enable python ExternalSource operator for the GPU data #1997

Merged
merged 7 commits into NVIDIA:master on Jun 17, 2020

Conversation

JanuszL
Contributor

@JanuszL JanuszL commented Jun 4, 2020

  • adds Python-side support for feeding GPU data to the ExternalSource operator
  • extends the ExternalSource example

Signed-off-by: Janusz Lisiecki jlisiecki@nvidia.com

Why we need this PR?

  • It adds support for GPU input to the ExternalSource operator via the Python API

What happened in this PR?

  • What solution was applied:
    adds Python-side support for GPU data feed to the ExternalSource operator
    extends the ExternalSource example
  • Affected modules and functionalities:
    ExternalSource
    Python API
    backend_impl
  • Key points relevant for the review:
    NA
  • Validation and testing:
    new CI tests are added
  • Documentation (including examples):
    example is extended

JIRA TASK: [DALI-182]
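As a rough illustration of the feature this PR adds, a Python pipeline can feed GPU-resident (e.g. cupy) arrays to ExternalSource. The sketch below follows the DALI 0.x Pipeline API of that era (`ops.ExternalSource(device="gpu")`, `feed_input`); treat the exact shapes and names as illustrative, not as the PR's code. Imports are deferred inside the function so the snippet parses without DALI installed:

```python
def run_gpu_external_source(batch_size=4):
    """Hypothetical sketch: feed cupy (GPU) arrays into a DALI pipeline
    through ExternalSource. Requires nvidia.dali and cupy to actually run."""
    import cupy as cp
    from nvidia.dali.pipeline import Pipeline
    import nvidia.dali.ops as ops

    class ExternalSourcePipeline(Pipeline):
        def __init__(self):
            super().__init__(batch_size, num_threads=2, device_id=0)
            self.source = ops.ExternalSource(device="gpu")

        def define_graph(self):
            self.batch = self.source()
            return self.batch

        def iter_setup(self):
            # GPU data is fed directly; DALI copies it on an internal stream.
            data = [cp.random.rand(224, 224, 3).astype(cp.float32)
                    for _ in range(batch_size)]
            self.feed_input(self.batch, data)

    pipe = ExternalSourcePipeline()
    pipe.build()
    return pipe.run()
```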


- adds Python-side support for GPU data feed to the ExternalSource operator
- extends the ExternalSource example

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@JanuszL
Contributor Author

JanuszL commented Jun 4, 2020

!build

@dali-automaton
Collaborator

CI MESSAGE: [1371416]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [1371416]: BUILD FAILED

@JanuszL JanuszL force-pushed the external_python_gpu branch 2 times, most recently from f7893fb to 505712a Compare June 4, 2020 22:16
@JanuszL
Contributor Author

JanuszL commented Jun 4, 2020

!build

@dali-automaton
Collaborator

CI MESSAGE: [1371753]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [1371753]: BUILD FAILED

@JanuszL
Contributor Author

JanuszL commented Jun 5, 2020

!build

@dali-automaton
Collaborator

CI MESSAGE: [1372990]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [1372990]: BUILD FAILED

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@JanuszL JanuszL force-pushed the external_python_gpu branch 2 times, most recently from 60e4ce0 to 90c80b5 Compare June 5, 2020 12:27
@JanuszL
Contributor Author

JanuszL commented Jun 5, 2020

!build

@dali-automaton
Collaborator

CI MESSAGE: [1373346]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [1373346]: BUILD FAILED

@JanuszL JanuszL force-pushed the external_python_gpu branch 2 times, most recently from 0e0743a to 20113e6 Compare June 5, 2020 14:34
@JanuszL
Contributor Author

JanuszL commented Jun 5, 2020

!build

@dali-automaton
Collaborator

CI MESSAGE: [1373692]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [1373692]: BUILD FAILED

@JanuszL
Contributor Author

JanuszL commented Jun 5, 2020

!builld

@JanuszL
Contributor Author

JanuszL commented Jun 5, 2020

!builld

@dali-automaton
Collaborator

CI MESSAGE: [1401034]: BUILD FAILED

@szalpal
Member

szalpal commented Jun 17, 2020

Same can probably happen for GPU data and CPU ExternalSource.
SetExternalInput is non-blocking; apparently the idea was that the user can reuse the memory by observing the stream he provided, but this is not documented (apparently @szalpal can say more about this). We probably need to highlight that in the docs.

Correct, when the user calls daliSetExternalInputAsync and synchronizes on the stream provided there, the memory he provided should no longer be needed by DALI. But that's rather obvious, isn't it? I mean, there are in fact two functions, sync and async, but that's only for convenience. Stream management is added here because the copy might be expensive, and the user might be able to use a few different streams to perform it and make it faster.

The user now has no idea when he can safely touch his memory again.

Again, I guess that's rather obvious: you can touch the memory after the sync function returns. If you feel it's not as obvious as I think, let's add this to the docs.

@JanuszL
Contributor Author

JanuszL commented Jun 17, 2020

I think there are potential ownership issues if we call SetExternalInput with anything that does an async copy, as we synchronize on the event only in Run (and we make the stream wait for it). So there is a problem if we decide to do this for a GPU->GPU copy and the memory is changed/deallocated before DALI's async copy to the ExternalSource queue is finished.

Same can probably happen for GPU data and CPU ExternalSource.
SetExternalInput is non-blocking; apparently the idea was that the user can reuse the memory by observing the stream he provided, but this is not documented (apparently @szalpal can say more about this). We probably need to highlight that in the docs.

Also, if the user doesn't provide the stream for setting the external input, we use some UserStream that is a DALI-internal thing. The user then has no idea when he can safely touch his memory again.

One solution would be to make SetExternalInput blocking in such a case (until the internal copy is finished), or to provide some other mechanism.

I added a sync parameter that is used when the user doesn't provide the stream.
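To make the semantics concrete, here is a pure-Python toy model (not DALI code; all names are made up) of what the sync flag buys the caller: the "copy" runs asynchronously on a toy stream, and sync=True blocks until it completes, after which the source buffer can be safely reused:

```python
import threading
import time

class ToyStream:
    """Toy model of an async copy on a CUDA stream (illustration only)."""
    def __init__(self):
        self._done = threading.Event()

    def launch_copy(self, src, dst):
        def work():
            time.sleep(0.01)   # pretend the DMA transfer takes a while
            dst[:] = src       # the actual copy
            self._done.set()
        self._done.clear()
        threading.Thread(target=work).start()

    def synchronize(self):
        self._done.wait()

def set_external_input(stream, src, dst, sync):
    """Mimics the new sync flag: when True, block until the internal copy
    finishes so the caller may immediately reuse src. With sync=False the
    caller must synchronize on the stream before touching src again."""
    stream.launch_copy(src, dst)
    if sync:
        stream.synchronize()

stream = ToyStream()
src, dst = [1, 2, 3], [0, 0, 0]
set_external_input(stream, src, dst, sync=True)
src[0] = 99        # safe: the copy has already completed
print(dst)         # [1, 2, 3]
```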

@dali-automaton
Collaborator

CI MESSAGE: [1402155]: BUILD STARTED

@@ -302,7 +311,7 @@ void ExposeTensor(py::module &m) {
layout : str
Layout of the data
device_id: int
Device of where this tensor resides
Device of where this tensor resides. If no is provided the current device is used.
Contributor

Suggested change
Device of where this tensor resides. If no is provided the current device is used.
Device of where this tensor resides. If not provided, the current device is used.

Contributor Author

Done

device_id: int
Device of where this lists of tensors resides
device_id : int
Device of where this tensor resides. If no is provided the current device is used.
Contributor

same here

Contributor Author

Done

@dali-automaton
Collaborator

CI MESSAGE: [1401034]: BUILD PASSED

*/
DLL_PUBLIC void
daliSetExternalInputAsync(daliPipelineHandle *pipe_handle, const char *name,
device_type_t device, const void *data_ptr,
dali_data_type_t data_type, const int64_t *shapes,
int sample_dim, const char *layout_str,
cudaStream_t stream);
cudaStream_t stream, int sync);
Contributor

There is a question whether we still need both async and sync variants of this function, if this can be handled by the parameter.

Contributor Author

The sync variant has its own stream; in this variant you still need to provide one.

Contributor

Ok, makes sense.

@dali-automaton
Collaborator

CI MESSAGE: [1402155]: BUILD FAILED

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Contributor

@klecki klecki left a comment

The sync for SetExternalInput seems to solve the problems. I'm not sure how I feel about the name and behaviour of the blocking ExternalSource parameter, and whether it might cause some confusion.

*/
DLL_PUBLIC void
daliSetExternalInputAsync(daliPipelineHandle *pipe_handle, const char *name,
device_type_t device, const void *data_ptr,
dali_data_type_t data_type, const int64_t *shapes,
int sample_dim, const char *layout_str,
cudaStream_t stream);
cudaStream_t stream, int sync);
Contributor

Ok, makes sense.

@@ -139,13 +139,14 @@ DLL_PUBLIC void daliDeserializeDefault(daliPipelineHandle *pipe_handle,
* Can be set to NULL.
* @param stream CUDA stream to use when copying the data onto GPU. Remember to synchronize on the
* provided stream.
* @param sync If block until data provided is copied to the internal DALI buffer
Contributor

Suggested change
* @param sync If block until data provided is copied to the internal DALI buffer
* @param sync Whether to block until the provided data is copied to the internal DALI buffer

Contributor Author

@JanuszL JanuszL Jun 17, 2020

Done

@@ -148,7 +144,7 @@ class ExternalSource : public Operator<Backend> {
copy_to_storage_event = copy_to_storage_events_.GetEmpty();
}

data.front()->Copy(tl, stream);
data.front()->Copy(t, stream);
if (std::is_same<SrcBackend, GPUBackend>::value) {
cudaEventRecord(*copy_to_storage_event.front(), stream);
Contributor

Cool. Thanks for adding the comment.

{
std::unique_lock<std::mutex> busy_lock(busy_m_);
cv_.wait(busy_lock, [&data = tl_data_]{return !data.IsEmpty();});
cv_.wait(busy_lock, [&data = tl_data_, &blocking = blocking_] {
Contributor

I think it's better to capture blocking_ by copy instead of by reference, as it won't change.

Contributor Author

@JanuszL JanuszL Jun 17, 2020

Done

{
std::unique_lock<std::mutex> busy_lock(busy_m_);
cv_.wait(busy_lock, [&data = tl_data_]{return !data.IsEmpty();});
cv_.wait(busy_lock, [&data = tl_data_, &blocking = blocking_] {
return !(data.IsEmpty() && blocking);
Contributor

We can also alternatively:

if (blocking) {
  cv_.wait(busy_lock, [&data = tl_data_]{return !data.IsEmpty();});
} else {
  // we have the lock, fail if there is no data.
}

Contributor Author

@JanuszL JanuszL Jun 17, 2020

Done
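The wait logic under discussion can be mimicked in pure Python with threading.Condition.wait_for. This toy model (names invented; only the predicate mirrors the C++ above) shows blocking mode waiting for data while non-blocking mode fails fast when nothing was fed:

```python
import threading
from collections import deque

class ToyExternalSource:
    """Toy analog of the blocking/non-blocking ExternalSource wait; not DALI code."""
    def __init__(self, blocking=True):
        self.blocking = blocking
        self.cv = threading.Condition()
        self.data = deque()

    def feed(self, batch):
        with self.cv:
            self.data.append(batch)
            self.cv.notify()

    def run(self):
        with self.cv:
            # Mirrors: cv_.wait(lock, [...]{ return !(data.IsEmpty() && blocking); });
            self.cv.wait_for(lambda: not (len(self.data) == 0 and self.blocking))
            if not self.blocking and not self.data:
                # we have the lock, fail if there is no data
                raise RuntimeError(
                    "No data was provided to the ExternalSource. "
                    "Make sure to feed it properly.")
            return self.data.popleft()

src = ToyExternalSource(blocking=True)
src.feed([1, 2, 3])
print(src.run())   # [1, 2, 3]
```

In non-blocking mode the predicate is immediately true even with an empty queue, so the explicit emptiness check after the wait is what raises the error, matching the suggested restructuring above.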

@@ -1075,7 +1089,7 @@ PYBIND11_MODULE(backend_impl, m) {
.def("SetExternalTLInput",
[](Pipeline *p, const string &name, const TensorList<CPUBackend> &tl,
py::object /*cuda_stream*/) {
p->SetExternalInput(name, tl, 0);
p->SetExternalInput(name, tl, 0, true);
Contributor

I assume it is fine to pass a false here as well (or rather it just doesn't matter)?

Contributor

But it's more aligned with the docs, which claim it to be blocking.

Contributor Author

It is CPU, so it syncs anyway; it is better to pass true to be aligned with what happens under the hood.

Comment on lines 54 to 59
cv_.wait(busy_lock, [&data = tl_data_, &blocking = blocking_] {
return !(data.IsEmpty() && blocking);
});
if (!blocking_ && tl_data_.IsEmpty()) {
DALI_FAIL("No data was provided to the ExternalSource. Make sure to feed it properly.");
}
Contributor

Same suggestion as for the CPU.

Contributor Author

@JanuszL JanuszL Jun 17, 2020

Done

@JanuszL
Contributor Author

JanuszL commented Jun 17, 2020

The sync for SetExternalInput seems to solve the problems. I'm not sure how I feel about the name and behaviour of the blocking ExternalSource parameter, and whether it might cause some confusion.

I'm open to suggestions regarding the naming.

@JanuszL JanuszL force-pushed the external_python_gpu branch 2 times, most recently from 47f976c to e1efecd Compare June 17, 2020 14:09
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@JanuszL
Contributor Author

JanuszL commented Jun 17, 2020

!build

@dali-automaton
Collaborator

CI MESSAGE: [1402575]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [1402575]: BUILD PASSED

@JanuszL JanuszL merged commit dd39fa9 into NVIDIA:master Jun 17, 2020
@JanuszL JanuszL deleted the external_python_gpu branch June 17, 2020 19:27
JanuszL added a commit to JanuszL/DALI that referenced this pull request Jun 24, 2020
- changes introduced by NVIDIA#1997
  were not applied to the conda-based tests
- the ExternalSource Jupyter example is extended with a GPU case and
  requires cupy and imageio to run; this PR fixes this

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
JanuszL added a commit that referenced this pull request Jun 24, 2020
- changes introduced by #1997 were not applied to the conda-based tests
- the ExternalSource Jupyter example is extended with a GPU case and requires cupy and imageio to run; this PR fixes this

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
7 participants