
Enable python operators in async pipelines #4965

Merged
7 commits merged into NVIDIA:main on Aug 10, 2023

Conversation

banasraf
Collaborator

@banasraf banasraf commented Aug 2, 2023

Category:

New feature

Description:

This enables support for Python operators in pipelines with exec_async=True and exec_pipelined=True.

Additional information:

Affected modules and functionalities:

The most important fix, which prevents the deadlock that was happening there, is releasing the GIL while waiting for the pipeline run results:

{
    py::gil_scoped_release interpreter_unlock{};
    p->Outputs(&ws);
}

This eliminated the main problem: when a thread was blocked on the .Outputs() call, the Python interpreter had no way of making it release the GIL. At the same time, the executor threads tried to acquire the GIL to run the Python operator, which caused the deadlock.
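The deadlock mechanics can be illustrated in plain Python, with an ordinary lock standing in for the GIL (a hypothetical sketch, not DALI code; `wait_for_outputs` and the 1-second timeout are illustration only):

```python
import threading

gil = threading.Lock()  # stands in for the interpreter's GIL

def wait_for_outputs(release_gil):
    """Simulate a thread that holds the 'GIL' while waiting for results
    produced by an executor thread that itself needs the 'GIL'."""
    results_ready = threading.Event()

    def executor():
        with gil:                # the executor must take the GIL ...
            results_ready.set()  # ... to run the Python operator

    gil.acquire()                # the caller holds the GIL, as in CPython
    t = threading.Thread(target=executor)
    t.start()
    if release_gil:
        gil.release()            # mirrors py::gil_scoped_release
        ok = results_ready.wait(timeout=1.0)
        gil.acquire()            # the guard reacquires on scope exit
    else:
        ok = results_ready.wait(timeout=1.0)  # simulated deadlock: times out
    gil.release()
    t.join()
    return ok

print(wait_for_outputs(True))   # True  - outputs arrive
print(wait_for_outputs(False))  # False - waiter and executor deadlock
```

With the lock released during the wait, the executor can run and the results arrive; without it, the waiter blocks until the timeout, which is exactly the hang the fix removes.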

A number of other fixes were also required:

  • The DLTensorFunction destructor needed to be written manually to acquire the GIL before releasing the Python objects.
  • Operator input and output objects were added as fields because they need to outlive all the computation scheduled on the GPU, so they are deleted only at the beginning of the next run.
  • Before the pipeline object is destroyed, the executor needs to be shut down with the GIL released. That ensures that when we wait for all the scheduled prefetching to finish (which might include running Python operators), we don't block the GIL.

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

test_dltensor_function, test_python_function_operator, test_gpu_python_function_operator

I modified them to run with async pipelines.

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

I'm leaving updating the notebooks for another PR.

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

Signed-off-by: Rafal Banas <rbanas@nvidia.com>
@banasraf
Collaborator Author

banasraf commented Aug 2, 2023

!build

@dali-automaton
Collaborator

CI MESSAGE: [9205091]: BUILD STARTED

})
// On the pipeline destruction we need to release the GIL to shutdown the executor
// This way, Python operators that might be still running do not deadlock
.def("Shutdown", [](Pipeline *p) {
Contributor


I wonder if Shutdown shouldn't be an internal method?

Contributor


I think we can do it a bit more automatically using this approach.
The init of the Pipeline Python class returns a smart pointer. We can add a custom deleter to it that would release the lock for the time of p->Shutdown(); and then delete the pipeline. I think it is way less error-prone than exposing part of the teardown sequence to Python.

Collaborator Author


It should be, although I want to call it from the __del__ method of the Python pipeline object. I could rename it to _Shutdown, though.

And it is not exposed through the Python pipeline wrapper.

Collaborator Author


OK, that sounds like a good approach.

Contributor

@JanuszL JanuszL left a comment


Great change. I'm very happy to see it!

@dali-automaton
Collaborator

CI MESSAGE: [9205091]: BUILD FAILED

Signed-off-by: Rafal Banas <rbanas@nvidia.com>
@@ -315,13 +305,6 @@ def test_python_operator_invalid_function():
invalid_pipe.run()


@raises(TypeError, "do not support multiple input sets")
Contributor


So it is supported now, right?

Collaborator Author


Nah, I just removed too many tests. Reverted it.

Comment on lines 1592 to 1607
PyPipeline(int batch_size, int num_threads, int device_id, int64_t seed,
bool pipelined_execution, int prefetch_queue_depth,
bool async_execution, size_t bytes_per_sample_hint,
bool set_affinity, int max_num_stream,
int default_cuda_stream_priority):
Pipeline(batch_size, num_threads, device_id, seed, pipelined_execution,
prefetch_queue_depth, async_execution, bytes_per_sample_hint, set_affinity,
max_num_stream, default_cuda_stream_priority) {}

PyPipeline(string serialized_pipe, int batch_size, int num_threads, int device_id,
bool pipelined_execution, int prefetch_queue_depth, bool async_execution,
size_t bytes_per_sample_hint, bool set_affinity, int max_num_stream,
int default_cuda_stream_priority):
Pipeline(serialized_pipe, batch_size, num_threads, device_id, pipelined_execution,
prefetch_queue_depth, async_execution, bytes_per_sample_hint, set_affinity,
max_num_stream, default_cuda_stream_priority) {}
Contributor


Suggested change
PyPipeline(int batch_size, int num_threads, int device_id, int64_t seed,
bool pipelined_execution, int prefetch_queue_depth,
bool async_execution, size_t bytes_per_sample_hint,
bool set_affinity, int max_num_stream,
int default_cuda_stream_priority):
Pipeline(batch_size, num_threads, device_id, seed, pipelined_execution,
prefetch_queue_depth, async_execution, bytes_per_sample_hint, set_affinity,
max_num_stream, default_cuda_stream_priority) {}
PyPipeline(string serialized_pipe, int batch_size, int num_threads, int device_id,
bool pipelined_execution, int prefetch_queue_depth, bool async_execution,
size_t bytes_per_sample_hint, bool set_affinity, int max_num_stream,
int default_cuda_stream_priority):
Pipeline(serialized_pipe, batch_size, num_threads, device_id, pipelined_execution,
prefetch_queue_depth, async_execution, bytes_per_sample_hint, set_affinity,
max_num_stream, default_cuda_stream_priority) {}
using Pipeline::Pipeline;

@@ -1588,6 +1588,30 @@ void FeedPipeline(Pipeline *p, const string &name, py::list list, AccessOrder or
p->SetExternalInput(name, tv, order, sync, use_copy_kernel);
}

struct PyPipeline: public Pipeline {
Contributor

@mzient mzient Aug 4, 2023


What happened to @JanuszL's idea of using a custom deleter for a smart pointer? Is there really no other way of doing it than inheritance? It seems rather intrusive.

Collaborator Author


pybind11 doesn't accept it ;)

When we define the init method for py::class_, it accepts std::unique_ptr but doesn't accept a unique_ptr with a custom deleter.

Collaborator Author


I could define py::class_<Pipeline, std::unique_ptr<Pipeline, Deleter>>. It would accept such a ptr in init, but I would have to change the way all the methods are exposed,
from

.def("device_id", &Pipeline::device_id)

to

.def("device_id", [](Pipeline *p) { return p->device_id(); })

And that would be a lot of work.

Signed-off-by: Rafal Banas <rbanas@nvidia.com>
Signed-off-by: Rafal Banas <rbanas@nvidia.com>
@banasraf
Collaborator Author

banasraf commented Aug 7, 2023

!build

@dali-automaton
Collaborator

CI MESSAGE: [9267140]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [9267140]: BUILD FAILED

Signed-off-by: Rafal Banas <rbanas@nvidia.com>
@banasraf
Collaborator Author

!build

@dali-automaton
Collaborator

CI MESSAGE: [9316674]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [9316674]: BUILD PASSED

@banasraf banasraf merged commit 4589b6a into NVIDIA:main Aug 10, 2023
3 checks passed
JanuszL pushed a commit to JanuszL/DALI that referenced this pull request Oct 13, 2023
Signed-off-by: Rafal Banas <rbanas@nvidia.com>
5 participants