
iter-to-iter variable batch size #2481

Merged: 2 commits merged into NVIDIA:master from variable_bs_test on Jan 22, 2021
Conversation

@szalpal (Member) commented on Nov 18, 2020

TODO:

Signed-off-by: szalpal <mszolucha@nvidia.com>

What happened in this PR:

  1. Update the C API to support i2i variable batch size
  2. Update nvJpegDecoder and its flavours to support variable batch size
  3. Update the Constant op for the same reason
  4. Partially split the Executor into header and source files to make development more convenient
  5. Add variable batch size routines to the Executor (batch size queues, PreRun with batch size inference from the graph)
  6. Introduce the BatchSizeProvider interface and add it to ExternalSource
  7. Add a feature to CachingList (the inner ExternalSource memory) so that it can traverse the data list asynchronously w.r.t. the current head and tail of the list
  8. Relax the constant-batch-size constraint so that it only requires a uniform batch size within a single iteration
  9. Adjust the Python API
  10. Add an extensive test for i2i variable batch size (every operator is tested)

Why do we need this PR?

  • To enable i2i (iteration-to-iteration) variable batch size

JIRA TASK: [Use DALI-XXXX or NA]
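To make the intent concrete, here is a minimal sketch (not code from this PR) of the flow it enables: an external_source whose source yields a differently sized batch each iteration, with the pipeline's batch_size acting only as an upper bound. The exact Pipeline arguments, the pass-through pipeline, and the batch sizes are illustrative assumptions.

```python
import numpy as np
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.fn as fn

batch_sizes = [3, 1, 4]  # hypothetical per-iteration batch sizes, each <= the pipeline's max

def input_data():
    # One list of samples per iteration; the list length is that iteration's batch size.
    for bs in batch_sizes:
        yield [np.random.randint(0, 255, size=(480, 640, 3), dtype=np.uint8)
               for _ in range(bs)]

pipe = Pipeline(batch_size=4, num_threads=2, device_id=0)  # 4 acts as the maximum batch size
with pipe:
    data = fn.external_source(source=input_data(), cycle=False, device="cpu")
    pipe.set_outputs(data)
pipe.build()

for expected_bs in batch_sizes:
    out, = pipe.run()
    assert len(out) == expected_bs  # each iteration keeps its own batch size
```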

@szalpal (Member, Author) commented on Dec 18, 2020

!build

@dali-automaton (Collaborator): CI MESSAGE: [1924708]: BUILD STARTED
@dali-automaton (Collaborator): CI MESSAGE: [1924708]: BUILD FAILED

@szalpal marked this pull request as ready for review on December 18, 2020 04:01
@szalpal changed the title from "Test for iter-to-iter variable batch size" to "iter-to-iter variable batch size" on Dec 18, 2020
@szalpal (Member, Author) commented on Dec 18, 2020

!build

@dali-automaton (Collaborator): CI MESSAGE: [1926219]: BUILD STARTED
@dali-automaton (Collaborator): CI MESSAGE: [1926219]: BUILD FAILED

@jantonguirao self-assigned this on Jan 7, 2021
Resolved review threads:
  dali/operators/ssd/random_crop.cc
  dali/c_api/c_api.cc (one thread on outdated code, one current)
  dali/python/nvidia/dali/pipeline.py (outdated)
  dali/test/python/test_dali_variable_batch_size.py (3 threads, outdated)
@szalpal (Member, Author) commented on Jan 7, 2021

!build

@dali-automaton (Collaborator): CI MESSAGE: [1963736]: BUILD STARTED
@dali-automaton (Collaborator): CI MESSAGE: [1963736]: BUILD FAILED

exec_pipelined=False)
with pipe:
data = fn.external_source(source=input_data, cycle=False, device=device)
processed = fn.python_function(data, function=resize, num_outputs=1)
Contributor:

Suggested change
    processed = fn.python_function(data, function=resize, num_outputs=1)
    processed = fn.python_function(data, function=resize)

DALI_ENFORCE(bsps[0]->NextBatchSize() == bsps[i]->NextBatchSize(),
"Batch size must be uniform across an iteration");
}
auto batch_size = bsps[0]->NextBatchSize();
Contributor: Suggestion: move the variable to line 317 and use it in the loop.

@szalpal (Member, Author) commented on Jan 8, 2021

!build

@dali-automaton (Collaborator): CI MESSAGE: [1964913]: BUILD STARTED
@dali-automaton (Collaborator): CI MESSAGE: [1964913]: BUILD FAILED
@dali-automaton (Collaborator): CI MESSAGE: [1965134]: BUILD STARTED
@dali-automaton (Collaborator): CI MESSAGE: [1965142]: BUILD STARTED
@dali-automaton (Collaborator): CI MESSAGE: [1965140]: BUILD STARTED
@dali-automaton (Collaborator): CI MESSAGE: [1965143]: BUILD STARTED
@dali-automaton (Collaborator): CI MESSAGE: [1965151]: BUILD FAILED

@szalpal (Member, Author) commented on Jan 11, 2021

!build

@dali-automaton (Collaborator): CI MESSAGE: [1969460]: BUILD STARTED
@dali-automaton (Collaborator): CI MESSAGE: [1969460]: BUILD PASSED

@klecki (Contributor) left a review:

A few files are still missing; I would like to check the side-casts a bit more on my own.
My major gripe is mixing the C API with the C++ API; there are not enough tests for the former.

Also, the prophet part probably doesn't happen in any scenario, but looking at the isolated CachingList it looks like a bug.


template <typename Backend>
struct is_backend {
static constexpr bool value = std::is_same<Backend, CPUBackend>::value ||
Contributor: Just a small nitpick, but three specializations plus a false default could probably be faster (that is just guessing); I rarely measure perf for such things.


namespace dali {

class BatchSizeProvider {
Contributor: Maybe add some doc on what this class/interface is supposed to represent, instead of usage patterns only?

@szalpal (Author): Done


template <typename WorkspacePolicy, typename QueuePolicy>
template <typename Workspace>
void Executor<WorkspacePolicy, QueuePolicy>::RunHelper(OpNode &op_node, Workspace &ws) {
Contributor: I would put the RunHelper at the top of the functions that use it, but whatever.

Comment on lines 256 to 257
make_string("Expected batch size lower or equal to max batch size. Actual: ",
ws.GetInputBatchSize(i), " <= ", max_batch_size_));
Contributor: I'm wondering whether printing "500 <= 128" is the best idea; maybe:

Suggested change
    make_string("Expected batch size lower or equal to max batch size. Actual: ",
                ws.GetInputBatchSize(i), " <= ", max_batch_size_));
    make_string("Expected batch size lower or equal to max batch size. Expected at most: ",
                max_batch_size_, ", got: ", ws.GetInputBatchSize(i)));

@szalpal (Author): Done


template <typename WorkspacePolicy, typename QueuePolicy>
void Executor<WorkspacePolicy, QueuePolicy>::RunCPU() {
PreRun();
Contributor: I wanted to suggest that PreRun shouldn't be called from inside RunCPU, but that is probably complicated by the fact that it would need to be adjusted everywhere, since those RunStage functions are used directly by users (in C++).

The implementation looks a bit weird to me with those push/pops; I need to analyze it a bit more.

@szalpal (Author): We decided that PreRun belongs here after all. It is weird, but we don't have a better idea ATM.

#include <string>
#include <vector>
#include <memory>
#include <chrono>
Contributor: chrrrono (pointing at the <chrono> include)

@szalpal (Author): Removed

"tensor with ", batch_size, " elements. Got:\n", shape));
}
return valid_shape;
return true;
Contributor: This worries me. Why don't you do any verification? Where is this used?

@szalpal (Author): Fixed. Added the removed lines back and added one more check to operator.cc::EnforceUniformBatchSize.

"Attempted to step over the last element in the list. This operation is forbidden. Add "
"more elements to CachingList before calling AdvanceProphet.");
apprentice_ = prophet_;
advance(prophet_, 1);
Contributor: What's wrong with prophet_++?
Also, what if our prophet_ (or apprentice_) is popped out of the list and put in the empty list? Wouldn't we start advancing in the set of empty nodes?

@szalpal (Author): Done

@@ -415,9 +494,10 @@ class ExternalSource : public Operator<Backend> {
*/
struct ExternalSourceState {
bool copied_shared_data = false;
size_t batch_size = 0;
Contributor: I see only modifications putting the current bs in, but no access to it?

@szalpal (Author): Removed

0-255, with shape like [640, 480, 3]) and you want to test default arguments
only, please add a record to the `ops_image_default_args` list
2. If the operator is typically processing image-like data (i.e. 3-dim, uint8,
0-255, with shape like [640, 480, 3]) and you want to specify any number of
Contributor: How about more demanding scenarios (where maybe the shape is a bit more fuzzy)?
Also, I didn't look for it yet, but do you test things like FCHW sequences? There might be some weirdness in sequence-processing operators.

@szalpal (Author): Added sequence tests
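For reference, a minimal sketch of what per-iteration batches of FCHW sequences could look like for such a test; the helper name, layout, and sizes below are assumptions for illustration, not the test code added in this PR.

```python
import numpy as np

def fchw_sequence_batches(batch_sizes, frames=8, channels=3, height=64, width=64):
    """Yield one batch per iteration; each sample is an FCHW uint8 sequence."""
    for bs in batch_sizes:
        yield [np.random.randint(0, 255, size=(frames, channels, height, width), dtype=np.uint8)
               for _ in range(bs)]

# e.g. three iterations with batch sizes 2, 5 and 1
batches = list(fchw_sequence_batches([2, 5, 1]))
```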

@klecki (Contributor) left a review:

Finished the rest. There appear to be some issues in ImageDecoder, and I would like to see the tests for sequences and with non-uniform input batches.

Comment on lines 77 to 82
if (spec_.GetSchema().HasArgument("hw_decoder_load")) {
hw_decoder_load_ = spec.GetArgument<float>("hw_decoder_load");
try_init_hw_decoder = true;
} else {
hw_decoder_load_ = 0;
}
Contributor: This part makes the argument optional without a default. HasArgument is true only for arguments that were explicitly specified by the user.

}

if (hw_decoder_bs_ > 0 &&
if (try_init_hw_decoder &&
Contributor: By removing the part of the code that sets hw_decoder_bs_ you are making nvjpegDecodeBatchedPreAllocate use batch=0. It probably needs to be adjusted to use max_batch_size * hw_decoder_load_ in the nvjpegDecodeBatchedPreAllocate call.

@szalpal (Author): Done

@@ -909,6 +894,36 @@ class nvJPEGDecoder : public Operator<MixedBackend>, CachedDecoderImpl {
RegisterDiagnostic("using_hw_decoder", &using_hw_decoder_);
}

int CalcHwDecoderBatchSize(float hw_decoder_load, int curr_batch_size) {
if (hw_decoder_load == .0f) return 0;
Contributor:

Suggested change
    if (hw_decoder_load == .0f) return 0;
    if (hw_decoder_load == 0.f) return 0;

@szalpal (Author): Done

if isinstance(sample_shape, tuple):
size = sample_shape
elif inspect.isgeneratorfunction(sample_shape):
size = sample_shape()
Contributor: I thought it would be called for every sample instead of once. That would allow passing a random shape generator and getting some variability in shapes.

Maybe it's worth having some more randomness in the input (non-uniform cases)?

@szalpal (Author): Done
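To illustrate the per-sample variant discussed above, here is a small sketch where sample_shape may be a callable that is invoked once per sample, so every sample can get its own shape; the names and value ranges are assumptions, not the actual test helper.

```python
import numpy as np

fixed_shape = (480, 640, 3)  # tuple: every sample gets the same shape

def random_shape():
    # callable: a fresh shape per call
    return (np.random.randint(100, 600), np.random.randint(100, 800), 3)

def make_batch(sample_shape, batch_size, lo=0, hi=255):
    batch = []
    for _ in range(batch_size):
        # Calling the callable here, once per sample, is what gives shape variability.
        size = sample_shape() if callable(sample_shape) else sample_shape
        batch.append(np.random.randint(lo, hi, size=size, dtype=np.uint8))
    return batch

uniform_batch = make_batch(fixed_shape, 4)   # four samples, all (480, 640, 3)
fuzzy_batch = make_batch(random_shape, 4)    # four samples, four different shapes
```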

shape of every sample.
:param lo:
:param hi:
:param dtype: 'int' for uint8 or 'float' for float32
Contributor: Those are just numpy types; the doc is wrong.

@szalpal (Author): Done

"`sample_shape` shall be either a tuple or a callable. Provide `(val,)` tuple for 1D shape")

if np.issubdtype(dtype, np.integer):
return [np.random.randint(lo, hi, size=(bs,) + size, dtype=dtype) for bs in
Contributor: If we want non-uniform shapes, we probably need to return a nested list of tensors instead of generating a numpy array of shape (bs, ...).

@szalpal (Author): Done differently, but there is non-uniformity now.
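A sketch of the contrast the comment draws: a single array of shape (bs, ...) forces every sample to share one shape, while a nested list with one array per sample allows ragged batches. The helper names and value ranges are illustrative assumptions.

```python
import numpy as np

def uniform_batch(bs, size, lo=0, hi=255):
    # One array of shape (bs, ...): every sample has the same shape.
    return np.random.randint(lo, hi, size=(bs,) + size, dtype=np.uint8)

def ragged_batch(shapes, lo=0.0, hi=1.0):
    # One array per sample: shapes may differ within the batch.
    return [np.random.uniform(lo, hi, size=s).astype(np.float32) for s in shapes]

same_shapes = uniform_batch(4, (480, 640, 3))
mixed_shapes = ragged_batch([(200, 300, 3), (480, 640, 3), (120, 100, 3)])
```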

Comment on lines 424 to 490
data = data * 2
data = data + 3
data = data - 4
data = data / 5
data = data // 6
Contributor: Can you check unary -, ternary math.clamp, and throw in a binary op with two DALI tensor arguments, so something like data + data2?

@szalpal (Author): Done
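A minimal sketch of the extra expressions requested above (unary minus, ternary clamp, and a binary op with two DALI inputs), under the assumption that nvidia.dali.math.clamp and arithmetic on DataNodes behave as in current DALI; this is not the test code added in the PR.

```python
import numpy as np
import nvidia.dali.fn as fn
import nvidia.dali.math as dmath
from nvidia.dali.pipeline import Pipeline

def constant_batches(value):
    # Endless source: three 2x2 float32 samples per iteration.
    while True:
        yield [np.full((2, 2), value, dtype=np.float32) for _ in range(3)]

pipe = Pipeline(batch_size=3, num_threads=1, device_id=0)
with pipe:
    data = fn.external_source(source=constant_batches(10), device="cpu")
    data2 = fn.external_source(source=constant_batches(4), device="cpu")
    neg = -data                         # unary minus
    clamped = dmath.clamp(data, 1, 5)   # ternary math.clamp
    summed = data + data2               # binary op with two DALI tensor arguments
    pipe.set_outputs(neg, clamped, summed)
pipe.build()

neg_out, clamped_out, sum_out = pipe.run()
```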

@szalpal (Member, Author) commented on Jan 21, 2021

!build

@dali-automaton (Collaborator): CI MESSAGE: [2000529]: BUILD STARTED
@dali-automaton (Collaborator): CI MESSAGE: [2000529]: BUILD FAILED

@klecki (Contributor) left a review:

Some nitpicks for subshape and the enforce checks.

@@ -183,14 +205,20 @@ void daliPrefetchSeparate(daliPipelineHandle *pipe_handle,
}


void daliSetExternalInputBatchSize(daliPipelineHandle *pipe_handle, const char *name,
int batch_size) {
auto bs_map = reinterpret_cast<batch_size_map_t *>(pipe_handle->batch_sizes_map);
Contributor:

Suggested change
    auto bs_map = reinterpret_cast<batch_size_map_t *>(pipe_handle->batch_sizes_map);
    auto *bs_map = reinterpret_cast<batch_size_map_t *>(pipe_handle->batch_sizes_map);

I like the auto *. :P

assert(begin >= 0);
auto div_ceil = [](int x, int y) { return 1 + ((x - 1) / y); };
int nsamples = div_ceil(end - begin, step);
std::vector<TensorShape<ndims>> dst_shapes(nsamples);
Contributor: Why not create a TensorListShape directly? As far as I remember it's movable, you already have sample_dim() (ndims can be negative), and you can just call .set_tensor_shape(j, in[i]);

static_assert(!std::is_same<Backend, MixedBackend>::value,
"MixedBatch doesn't have an accessible output");
for (int i = 0; i < ws.NumOutput(); i++) {
auto ref_batch_size =
Contributor: You can move this out of the loop.

@@ -293,6 +296,7 @@ class Operator<CPUBackend> : public OperatorBase {
SetupSharedSampleParams(ws);
RunImpl(ws);
ws.GetThreadPool().WaitForWork();
// EnforceUniformOutputBatchSize<CPUBackend>(ws);
Contributor: Why is this commented out?

return SetupImpl(output_desc, ws);
}

void Run(DeviceWorkspace &ws) override {
CheckInputLayouts(ws, spec_);
SetupSharedSampleParams(ws);
RunImpl(ws);
// EnforceUniformOutputBatchSize<GPUBackend>(ws);
Contributor: Same here?


template <typename Backend>
void OperatorBase::EnforceUniformOutputBatchSize(const workspace_t<Backend> &ws) const {
static_assert(!std::is_same<Backend, MixedBackend>::value,
Contributor: It should be possible for Mixed AFAIK.

@@ -428,7 +429,8 @@ class DLL_PUBLIC Pipeline {
DLL_PUBLIC int num_outputs() const;

/**
* @brief Returns a string describing the type device type backing the output specified by given id.
* @brief Returns a string describing the type device type backing the output specified by given
Contributor: I still don't understand "the type device type".

Signed-off-by: Michał Szołucha <mszolucha@nvidia.com>
@szalpal (Member, Author) commented on Jan 21, 2021

!build

@dali-automaton (Collaborator): CI MESSAGE: [2002764]: BUILD STARTED

Signed-off-by: Michał Szołucha <mszolucha@nvidia.com>
@szalpal (Member, Author) commented on Jan 21, 2021

!build

@dali-automaton (Collaborator): CI MESSAGE: [2002862]: BUILD STARTED
@dali-automaton (Collaborator): CI MESSAGE: [2002862]: BUILD PASSED

@szalpal merged commit 3dd70d6 into NVIDIA:master on Jan 22, 2021
@szalpal deleted the variable_bs_test branch on February 9, 2024