
Add direct operator calls in debug mode #3734

Merged: 24 commits into NVIDIA:main on Apr 5, 2022

Conversation

@ksztenderski (Contributor) commented Mar 15, 2022

Category:

New feature (non-breaking change which adds functionality)

Description:

It adds backend implementation of direct operators and replaces minipipelines in debug mode with direct operators.
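For context, this is roughly how the feature looks from the user's side: with debug=True, each fn call inside the pipeline body runs immediately through an eager operator, so intermediate outputs can be inspected as regular data. A minimal sketch (the dataset path is a placeholder):

from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn

@pipeline_def(batch_size=8, num_threads=3, device_id=0, debug=True)
def debug_pipeline():
    # In debug mode each call below executes eagerly via a backend operator
    # instead of being traced into a per-operator mini-pipeline.
    jpegs, labels = fn.readers.file(file_root="/path/to/images")  # placeholder path
    images = fn.decoders.image(jpegs, device="mixed")
    return images, labels

pipe = debug_pipeline()
pipe.build()
images, labels = pipe.run()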

Additional information:

Affected modules and functionalities:

Debug mode pipeline

Key points relevant for the review:

WIP, no review needed yet.

Checklist

Tests

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

      const std::unordered_map<std::string, std::shared_ptr<TensorList<CPUBackend>>> &kwargs,
      cudaStream_t cuda_stream) {
    ws.set_stream(cuda_stream);
    CUDA_CALL(cudaStreamSynchronize(cuda_stream));
Contributor:
Long term we can make this synchronization optional.

Contributor:

Why would we ever need to synchronize before?

ksztenderski (author):

Removed the synchronization before the run and left only the synchronization after. For now (especially in debug mode) it shouldn't bother us, but in the future it seems unnecessary to synchronize immediately after the run; we should only synchronize when we actually need the data (the synchronization before the run was meant to hint at that concept).

* Basic implementation of calls in debug mode
* Basic exposing of direct operators to ops.experimental
* TODO: Create debug pipeline class in C++ to keep thread pool

Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
Moved workspace clear before setting cuda stream

Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
@lgtm-com (bot) commented Mar 16, 2022:

This pull request fixes 1 alert when merging 596bafe into 02d04aa - view on LGTM.com

fixed alerts:

  • 1 for Unused import

Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
@ksztenderski ksztenderski marked this pull request as ready for review March 16, 2022 18:24
@ksztenderski (author):

!build

@dali-automaton (Collaborator):

CI MESSAGE: [4164026]: BUILD STARTED

@lgtm-com (bot) commented Mar 16, 2022:

This pull request fixes 1 alert when merging cc7a4dc into fd6a8b9 - view on LGTM.com

fixed alerts:

  • 1 for Unused import

@dali-automaton (Collaborator):

CI MESSAGE: [4164026]: BUILD FAILED

 * @brief Direct operator providing eager execution of an operator in Run.
 */
template <typename Backend>
class DLL_PUBLIC DirectOperator {
Contributor:

Suggested change
- class DLL_PUBLIC DirectOperator {
+ class DLL_PUBLIC ImmediateOperator {

or

Suggested change
- class DLL_PUBLIC DirectOperator {
+ class DLL_PUBLIC EagerOperator {

?
"Direct" doesn't really convey this meaning. It's merely an abbreviation for "DirectlyCalledOperator", which is lengthy.

ksztenderski (author):

done, went with EagerOperator

    CUDA_CALL(cudaStreamSynchronize(cuda_stream));
    auto output = RunImpl<GPUBackend, GPUBackend, TensorList<GPUBackend>, TensorList<GPUBackend>>(
        inputs, kwargs);
    CUDA_CALL(cudaStreamSynchronize(cuda_stream));
Contributor:

We could get rid of this one, too - we could (and should) expose the associated stream in TensorXxxGPU in Python and just tell the user that the data is available for that stream. We can (and already do) synchronize D2H copies.

ksztenderski (author):

Creating an API for stream exposure seems like a good idea for a follow-up, as it'll probably generate a lot of code and isn't really necessary for debug mode. But I fully agree with supporting it for eager operators.

Comment on lines 32 to 43
template <typename Backend>
std::shared_ptr<TensorList<Backend>> AsTensorList(std::shared_ptr<TensorList<Backend>> input) {
  return input;
}

template <typename Backend>
std::shared_ptr<TensorList<Backend>> AsTensorList(std::shared_ptr<TensorVector<Backend>> input) {
  // TODO(ksztenderski): Remove copy.
  auto tl = std::make_shared<TensorList<Backend>>();
  tl->Copy(*input);
  return tl;
}
Contributor:

We have a very similar thing in workspace_policy.h - PresentAsTensorList. Maybe it should be unified and moved to a common utility.

Contributor:

We are doing a copy here (I guess for a prototype), which will always work, but PresentAsTensorList requires a contiguous TV to work.

Contributor:

It can be another flavor or option of that function.

ksztenderski (author):

done

    DALI_FAIL("Unsupported backends in DirectOperator.Run().");
  }

  // Runs operator using specified thread pool.
Contributor:

Suggested change
- // Runs operator using specified thread pool.
+ // Runs operator using specified thread pool and shared CUDA stream.

ksztenderski (author):

That's kind of the point: it supports only CPU operators, so the CUDA stream is not set.

Contributor:

Understood.

Contributor:

On the other hand, this is a template, so it can be used for any kind of op, including mixed and GPU ones.

    DALI_FAIL("Unsupported backends in DirectOperator.Run() with thread pool.");
  }

  // Runs operator using specified CUDA stream.
Contributor:

Suggested change
- // Runs operator using specified CUDA stream.
+ // Runs operator using shared thread and specified CUDA stream.

ksztenderski (author):

Here it's the opposite: it supports only GPU operators, and the thread pool is not set.

Contributor:

I don't think the template prevents creating a CPU-only run function with such a signature.

ksztenderski (author):

Well yes, but the thread pool won't be set anyway.

    DALI_FAIL("Unsupported backends in DirectOperator.Run() with CUDA stream");
  }

  // Set shared thread pool used for all direct operators.
Contributor:

Suggested change
- // Set shared thread pool used for all direct operators.
+ // Creates thread pool used for all direct operators.

ksztenderski (author):

I think that "creates" might suggest that by default there is no thread pool, but I agree that "set" is not perfect either. Maybe "update"?

Contributor:

Sounds better.

ksztenderski (author):

done

    shared_thread_pool = std::make_unique<ThreadPool>(num_threads, device_id, set_affinity);
  }

  // Set shared CUDA stream used for all direct operators.
Contributor:

Suggested change
- // Set shared CUDA stream used for all direct operators.
+ // Creates shared CUDA stream used for all direct operators.

I would rather expect a Set function to accept the value it should set.

ksztenderski (author):

done (changed to "update")

  }

  // Set shared thread pool used for all direct operators.
  DLL_PUBLIC inline static void SetThreadPool(int num_threads, int device_id, bool set_affinity) {
Contributor:

Suggested change
- DLL_PUBLIC inline static void SetThreadPool(int num_threads, int device_id, bool set_affinity) {
+ DLL_PUBLIC inline static void CreateThreadPool(int num_threads, int device_id, bool set_affinity) {

ksztenderski (author):

done (UpdateThreadPool)

  }

  // Set shared CUDA stream used for all direct operators.
  DLL_PUBLIC inline static void SetCudaStream(int device_id) {
Contributor:

Suggested change
- DLL_PUBLIC inline static void SetCudaStream(int device_id) {
+ DLL_PUBLIC inline static void CreateCudaStream(int device_id) {

ksztenderski (author):

done (UpdateCudaStream)

  OpSpec op_spec;
  std::unique_ptr<OperatorBase> op;

  static cudaStream_t shared_cuda_stream;
@JanuszL (Contributor) commented Mar 17, 2022:

@mzient - I think it should be

Suggested change
- static cudaStream_t shared_cuda_stream;
+ static CUDAStreamLease shared_cuda_stream_;

I'm not sure if having it static works well with SetCudaStream that can change it for all instances of this class.

ksztenderski (author):

Changed the type to CUDAStreamLease. Is having a static cuda_stream a problem?

Contributor:

I'm just afraid of weird issues when the library is wrapping up and something is still using the given stream.

@lgtm-com (bot) commented Mar 29, 2022:

This pull request fixes 1 alert when merging 2da6e95 into 568826f - view on LGTM.com

fixed alerts:

  • 1 for Unused import

@dali-automaton (Collaborator):

CI MESSAGE: [4261738]: BUILD FAILED

@klecki (Contributor) left a comment:

Looks ok, small nitpick.

I am also wondering if we can add a simple check for the current batch size and raise an error that variable batch size is not supported whenever we encounter a batch smaller than max_batch_size, until we start supporting it (see the sketch below).
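A hedged sketch of what such a guard could look like (the helper name and call site are hypothetical, for illustration only, not the code that eventually landed):

def _check_batch_size(inputs, max_batch_size):
    # Debug mode currently assumes every input batch has exactly
    # max_batch_size samples; reject anything smaller until variable
    # batch size is supported.
    for i, batch in enumerate(inputs):
        if len(batch) != max_batch_size:
            raise RuntimeError(
                f"Input {i} has batch size {len(batch)}, expected {max_batch_size}. "
                "Variable batch size is not supported in debug mode.")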


return op_helper, init_args, inputs_classification, kwargs_classification, len(inputs)

self._operators[key] = _OperatorManager(
    op_class, self._seed_generator.integers(0, 2**32), inputs, kwargs)
Contributor:

This change calls the seed generator every time, instead of only when the op didn't have the seed argument. I guess it doesn't really matter, but I just wanted to check if it's intended.

ksztenderski (author):

Yes, it is intended. I wanted to set the seed in the operator after classification, and I didn't want to pass the seed generator to the OperatorManager. The reason I wanted the seed set after classification is that when we later run the operator and check whether its current arguments are correct, we skip the seed argument (as intended), but the expected classification would still contain it. And, as you pointed out, it doesn't really matter, because the result is still deterministic.
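To make the determinism point concrete, a minimal sketch of the scheme (the pipeline_seed value and surrounding names are illustrative, not DALI internals): one generator, seeded once, hands out a sub-seed per operator in creation order, so the drawn sequence is identical on every run whether or not a given op consumes its seed.

import numpy as np

pipeline_seed = 1234  # hypothetical pipeline-level seed
seed_generator = np.random.default_rng(pipeline_seed)

# One sub-seed per operator, drawn unconditionally in creation order;
# the sequence depends only on that order, so results stay reproducible.
op_seeds = [int(seed_generator.integers(0, 2**32)) for _ in range(3)]
print(op_seeds)  # the same three seeds on every run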

@dali-automaton (Collaborator):

CI MESSAGE: [4261738]: BUILD PASSED

@JanuszL mentioned this pull request Mar 30, 2022
* Add error for variable batch_size

Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
@ksztenderski (author):

!build

@dali-automaton (Collaborator):

CI MESSAGE: [4272690]: BUILD STARTED

@lgtm-com (bot) commented Mar 30, 2022:

This pull request fixes 2 alerts when merging 92bfc65 into 9b24277 - view on LGTM.com

fixed alerts:

  • 2 for Unused import



@pipeline_def(batch_size=8, num_threads=3, device_id=0, debug=True)
def incorrect_input_Sets_pipeline():
@JanuszL (Contributor) commented Mar 30, 2022:

Suggested change
- def incorrect_input_Sets_pipeline():
+ def incorrect_input_sets_pipeline():

ksztenderski (author):

done

    return tuple(output)


@raises(ValueError, glob="All argument lists for Multpile Input Sets used with operator 'Cat' must have the same length")
Contributor:

Can the error message have the name of the operator consistent with the API used - so Cat for ops and cat for fn?

ksztenderski (author):

As most of these error messages are for functionality specific to debug mode, and the only way to use operators in debug mode is the fn API, I guess we can just change it to snake_case.

Contributor:

So I would go for fn names.
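For reference, a CamelCase-to-snake_case conversion along these lines is a common recipe (the actual _to_snake_case helper in DALI may treat acronyms differently):

import re

def to_snake_case(name):
    # Insert an underscore before each capitalized word, then lowercase.
    s = re.sub(r'(.)([A-Z][a-z]+)', r'\1_\2', name)
    return re.sub(r'([a-z0-9])([A-Z])', r'\1_\2', s).lower()

assert to_snake_case('Cat') == 'cat'
assert to_snake_case('ArithmeticGenericOp') == 'arithmetic_generic_op'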

@dali-automaton (Collaborator):

CI MESSAGE: [4272690]: BUILD FAILED

@@ -79,7 +79,7 @@ def fn_wrapper(*inputs, **kwargs):
     from nvidia.dali._debug_mode import _PipelineDebug
     current_pipeline = _PipelineDebug.current()
     if getattr(current_pipeline, '_debug_on', False):
-        return current_pipeline._wrap_op_call(op_wrapper, inputs, kwargs)
+        return current_pipeline._wrap_op_call(op_class, *inputs, **kwargs)
Contributor:

Suggested change
- return current_pipeline._wrap_op_call(op_class, *inputs, **kwargs)
+ return current_pipeline._wrap_op_call(op_class, *inputs, name=_to_snake_case(op_class.__name__), **kwargs)

or does it not make any sense?

ksztenderski (author):

It makes sense (that's the way the names are created here), but I think it's even better to just pass the name to _wrap_op_call and have the name generation in one place.

     else:
-        raise RuntimeError(f"Unexpected operator '{op_wrapper.__name__}'. Debug mode does not support"
+        raise RuntimeError(f"Unexpected operator '{op_class}'. Debug mode does not support"
Contributor:

Why change this? Won't we get Ops.Xyz-style names even if we use the fn API?

ksztenderski (author):

Reverted to fn style names.

Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
@lgtm-com (bot) commented Mar 30, 2022:

This pull request fixes 2 alerts when merging ea94e0e into 65616c5 - view on LGTM.com

fixed alerts:

  • 2 for Unused import

@klecki (Contributor) left a comment:

One nitpick, and I think there is a small issue in the MIS validation. Otherwise looks ok.


aritm_fn_name = _to_snake_case(_ops.ArithmeticGenericOp.__name__)
Contributor:

Nitpick, shouldn't this be a private member or something?

ksztenderski (author):

done (static member)

if input_set_len == 1:
    input_set_len = len(classification.is_batch)
else:
    raise ValueError("All argument lists for Multipile Input Sets used "
Contributor:

Won't we raise the error for the second input? If we save input_set_len from the first iteration, we will hit the else branch in the second one, I think.

ksztenderski (author):

done

classification = _Classification(input, f'Input {i}')

if isinstance(classification.is_batch, list):
    if input_set_len == 1:
Contributor:

Wouldn't a -1 or something make more sense for the initial value?

ksztenderski (author):

done
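Putting the two review points together, the fixed-up check could look roughly like this (a sketch under assumed names; a list-valued is_batch marks a multiple-input-set argument, as in the snippets above):

def check_input_set_lengths(classifications, op_name):
    # -1 is the "no input set seen yet" sentinel; every list-valued
    # is_batch encountered afterwards must match the recorded length.
    input_set_len = -1
    for c in classifications:
        if isinstance(c.is_batch, list):
            if input_set_len == -1:
                input_set_len = len(c.is_batch)
            elif len(c.is_batch) != input_set_len:
                raise ValueError(
                    "All argument lists for Multiple Input Sets used with "
                    f"operator '{op_name}' must have the same length")
    return input_set_len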

Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
@ksztenderski (author):

!build

@dali-automaton (Collaborator):

CI MESSAGE: [4330022]: BUILD STARTED

@lgtm-com (bot) commented Apr 1, 2022:

This pull request fixes 2 alerts when merging b3846c4 into 999379b - view on LGTM.com

fixed alerts:

  • 2 for Unused import

@dali-automaton (Collaborator):

CI MESSAGE: [4330022]: BUILD PASSED

@ksztenderski merged commit 98c2a36 into NVIDIA:main Apr 5, 2022
cyyever pushed a commit to cyyever/DALI that referenced this pull request May 13, 2022
* Add base backend implementation of eager operators
* Add backend implementation of PipelineDebug managing backend operators
* Add OperatorManager util class for debug mode
* Replace minipipelines in debug mode by eager operators

Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jun 7, 2022