Extend external source operator capacity #1127
Conversation
!build
CI MESSAGE: [834670]: BUILD STARTED
CI MESSAGE: [834670]: BUILD FAILED
!build
CI MESSAGE: [834754]: BUILD STARTED
CI MESSAGE: [834754]: BUILD FAILED
!build
CI MESSAGE: [835080]: BUILD STARTED
CI MESSAGE: [835080]: BUILD FAILED
!build
CI MESSAGE: [835829]: BUILD STARTED
CI MESSAGE: [835829]: BUILD FAILED
CI MESSAGE: [836294]: BUILD STARTED
CI MESSAGE: [836294]: BUILD FAILED
!build
CI MESSAGE: [837881]: BUILD FAILED
CI MESSAGE: [837896]: BUILD STARTED
CI MESSAGE: [837896]: BUILD FAILED
!build
CI MESSAGE: [837981]: BUILD STARTED
CI MESSAGE: [837981]: BUILD PASSED
}
cv_.notify_all();
output.Copy(*data, (ws->has_stream() ? ws->stream() : 0));
busy_lock.lock();
same as I wrote before, but not really strong opinion
Done
template <typename T>
class CachingList {
 public:
  CachingList() {}
Suggested change:
-  CachingList() {}
+  CachingList() = default;
Done
}

std::unique_ptr<T> GetEmpty() {
  if (!empty_data_.size()) {
Suggested change:
-  if (!empty_data_.size()) {
+  if (empty_data_.empty()) {
Done
@@ -0,0 +1,254 @@
// Copyright (c) 2017-2018, NVIDIA CORPORATION. All rights reserved.
2019
Done
CI MESSAGE: [848584]: BUILD STARTED
CI MESSAGE: [848584]: BUILD FAILED
CI MESSAGE: [848665]: BUILD STARTED
CI MESSAGE: [848665]: BUILD FAILED
!build
CI MESSAGE: [849861]: BUILD STARTED
CI MESSAGE: [849861]: BUILD FAILED
  if (cuda_event) {
    cuda_events_.Recycle(std::move(cuda_event));
  }
}
I find the RecycleHelper more complicated than it should be. Why not:

private:
  void RecycleHelper(std::unique_ptr<std::vector<Tensor<CPUBackend>>> data) {
    t_data_.Recycle(std::move(data));
  }
  void RecycleHelper(std::unique_ptr<TensorList<CPUBackend>> data) {
    tl_data_.Recycle(std::move(data));
  }

and then in RecycleBuffer you just call RecycleHelper(std::move(data));?
Done
};

template <typename DataType>
std::enable_if_t<std::is_same<DataType, std::unique_ptr<TensorList<CPUBackend>>>::value>
see my comment below about this
Done
CI MESSAGE: [850380]: BUILD STARTED
CI MESSAGE: [850380]: BUILD FAILED
CI MESSAGE: [851002]: BUILD STARTED
CI MESSAGE: [851002]: BUILD FAILED
CI MESSAGE: [851002]: BUILD PASSED
DALI_ENFORCE(!tl_data_.IsEmpty(), "ExternalSource is empty. Need to feed data first.");
tl_data = tl_data_.PopFront();
DALI_ENFORCE(OperatorBase::batch_size_ == static_cast<int>(tl_data.front()->ntensor()),
"Data list provided to ExternalSource needs to have batch_size length.");
missing indent
DALI_ENFORCE(!t_data_.IsEmpty(), "ExternalSource is empty. Need to feed data first.");
t_data = t_data_.PopFront();
DALI_ENFORCE(OperatorBase::batch_size_ == static_cast<int>(t_data.front()->size()),
"Data list provided to ExternalSource needs to have batch_size length.");
missing indent
Done
cv_.notify_all();

auto &output = ws->Output<GPUBackend>(0);
output.Copy(*(data.front()), (ws->has_stream() ? ws->stream() : 0));
You have (ws->has_stream() ? ws->stream() : 0) twice. Consider using a variable instead.
Done
// HostWorkspace doesn't have any stream
cudaStream_t stream = 0;
if (is_tl_data) {
  output.Copy(*(tl_data.front()), data_idx, stream);
Doesn't this copy the TensorList batch_size_ times? I guess it shouldn't be in a loop.
Not really. output is a tensor; Copy(*(tl_data.front()), data_idx, stream) extracts the data_idx-th tensor from tl_data and copies it to the output. I think we don't have a fast way to copy a TensorList to a vector of Tensors other than going one by one.
Missed the data_idx argument. Everything's ok.
return full_data_.empty();
}

ListT PopFront() {
Can you add a comment describing how the CachingList works and that it always operates on single elements wrapped in a list? I misread the doc for splice and was wondering why you don't handle the case of more than one element.
BTW is this really that much better than references + moves?
Description added. The idea of a one-element list was suggested by @mzient, as it saves the allocation of a new list node. I don't think it gains us much, but on the other hand it doesn't cost much (in terms of code) to do it this way either.
Ok
The small allocations can be painful, esp. if there is a lot of them.
!build
CI MESSAGE: [859754]: BUILD STARTED
CI MESSAGE: [859754]: BUILD PASSED
!build
CI MESSAGE: [863164]: BUILD STARTED
CI MESSAGE: [863164]: BUILD FAILED
- Makes the external source able to hold more than one sample at a time, so the user can feed more data ahead before it is consumed by RunCPU/RunGPU. Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
!build
CI MESSAGE: [863215]: BUILD STARTED
CI MESSAGE: [863215]: BUILD PASSED
Why do we need this PR?
It makes the external source able to hold more than one sample at a time, so the user can feed more data ahead before it is consumed by RunCPU/RunGPU.
What happened in this PR?
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
JIRA TASK: DALI-954