Add VideoReaderDecoder GPU #3668

Merged
merged 36 commits into NVIDIA:main on Feb 18, 2022

Conversation

awolant
Contributor

@awolant awolant commented Feb 10, 2022

Category:

New feature: Adds VideoReaderDecoderGpu op. This operator reads and decodes video files using NVDECODE API. It supports both CFR and VFR videos.

It provides basic functionality for now. Additional features (more formats, codecs, output types, and input variants) will be added in subsequent tasks.

Description:

Additional information:

Affected modules and functionalities:

Added new operator and loader. Adjusted FramesDecoderGpu as some minor changes were needed (ability to set the stream after construction).

Key points relevant for the review:

Does this operator properly interface with FramesDecoderGpu?

Does this operator properly implement the DALI Reader abstraction, given that it does not exactly fit it?

Checklist

Tests

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: DALI-2593

Signed-off-by: Albert Wolant <awolant@nvidia.com>
@JanuszL JanuszL self-assigned this Feb 10, 2022

// TODO(awolant): Extract decoding outside of ReadSample (ReaderDecoder abstraction)
for (int i = 0; i < sequence_len_; ++i) {
// TODO(awolant): This seek can be optimized - for consecutive frames not needed etc.
Contributor

Maybe we can optimize the seek itself and keep it here even with the optimization?

Contributor Author

I moved the comment to Seek. This will be done as DALI-2320.

}
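To illustrate the seek optimization discussed above (deferred to DALI-2320): a seek can be skipped entirely when the requested frame is the one the decoder would produce next anyway. This is a minimal self-contained sketch with a hypothetical `MockDecoder`, not DALI's actual FramesDecoder API:

```cpp
#include <cassert>

// Hypothetical sketch (not DALI's FramesDecoder): a decoder that tracks the
// index of the frame it would decode next, and only performs the expensive
// container seek when the requested frame is not that one.
class MockDecoder {
 public:
  explicit MockDecoder(int num_frames) : num_frames_(num_frames) {}

  // Returns true if a real seek was performed, false if it could be skipped.
  bool SeekFrame(int frame_id) {
    if (frame_id == next_frame_) {
      return false;  // consecutive read: already positioned, no seek needed
    }
    next_frame_ = frame_id;  // stand-in for the actual container-level seek
    return true;
  }

  // Decoding a frame advances the implicit position by one.
  void ReadNextFrame() { next_frame_ = (next_frame_ + 1) % num_frames_; }

 private:
  int num_frames_;
  int next_frame_ = 0;
};
```

With this shape, reading a contiguous sequence of frames triggers at most one real seek, at the first frame.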

void VideoLoaderDecoderGpu::PrepareMetadataImpl() {
video_files_.reserve(filenames_.size());
Contributor

@JanuszL JanuszL Feb 10, 2022

OK, so we have as many FramesDecoderGpu instances as input files (including the decoder instances inside).
I'm not sure how many of them we can have in parallel.

Contributor Author

Solving this properly is part of DALI-2321, to be done when we have the benchmark (DALI-2594). Before that, it is hard to tell anything about the performance impact of any possible solution.

Contributor

I think it is not about the performance, but rather about resource constraints. Creating 1000 decoders and parsers will consume a lot of resources.
Also, we have already hit the maximum number of files opened in parallel in the old VideoReader (libavutil).
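As an aside to the reviewer's point about open-file limits: each per-file decoder keeps its video open, so the per-process descriptor limit becomes relevant. A small POSIX check (illustrative only, not part of this PR; the helper name is made up) shows how to query it:

```cpp
#include <sys/resource.h>
#include <cassert>
#include <cstdio>

// Query and print the limits on open file descriptors for this process.
// With one FramesDecoderGpu (and its open file) per input video, a large
// enough file list can run into the soft limit below.
bool PrintOpenFileLimit() {
  rlimit lim{};
  if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
    return false;  // query failed
  std::printf("open files: soft=%llu hard=%llu\n",
              (unsigned long long)lim.rlim_cur,
              (unsigned long long)lim.rlim_max);
  return true;
}
```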

output_shape.resize(batch_size);

for (int sample_id = 0; sample_id < batch_size; ++sample_id) {
auto &sample = current_batch[sample_id];
Contributor

Suggested change
auto &sample = current_batch[sample_id];
auto &sample = GetSample(sample_id);

Contributor Author

Done

void VideoReaderDecoderGpu::RunImpl(DeviceWorkspace &ws) {
auto &video_output = ws.Output<GPUBackend>(0);
auto &current_batch = prefetched_batch_queue_[curr_batch_consumer_];
int batch_size = current_batch.size();
Contributor

Suggested change
int batch_size = current_batch.size();
int batch_size = GetCurrBatchSize();

Contributor Author

Done

video_output.Resize(output_shape, current_batch[0]->data_.type());

for (int sample_id = 0; sample_id < batch_size; ++sample_id) {
auto &sample = current_batch[sample_id];
Contributor

Suggested change
auto &sample = current_batch[sample_id];
auto &sample = GetSample(sample_id);

Contributor Author

Done

output_shape.set_tensor_shape(sample_id, sample->data_.shape());
}

video_output.Resize(output_shape, current_batch[0]->data_.type());
Contributor

Suggested change
video_output.Resize(output_shape, current_batch[0]->data_.type());
video_output.Resize(output_shape, GetSample(0)->data_.type());

Contributor Author

Done

output_shape.set_tensor_shape(sample_id, sample->data_.shape());
}

video_output.Resize(output_shape, current_batch[0]->data_.type());
Contributor

Maybe we can add SetupImpl and deal with shapes there?

Contributor Author

I moved decoding into RunImpl. This means that we do not have the shape of the output ready during SetupImpl.
We could pass the size from the FramesDecoder through VideoSampleDesc, but I don't want to do this, as it would limit us in the future, when we optimize index building and might not know the size without decoding.
When this is more or less feature complete, we can revisit this refactoring, OK?
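For context, the trade-off being discussed can be summarized in a rough, non-compilable sketch, assuming DALI's usual operator interface (CanInferOutputs / SetupImpl / RunImpl); the bodies here are illustrative, not the PR's actual code:

```cpp
// Sketch only: when output shapes are not known until decoding happens,
// the operator reports that outputs cannot be inferred up front and
// resizes the output inside RunImpl instead of SetupImpl.
bool VideoReaderDecoderGpu::CanInferOutputs() const {
  return false;  // shape is known only after the frames are decoded
}

void VideoReaderDecoderGpu::RunImpl(DeviceWorkspace &ws) {
  // Decode each sample, collect per-sample shapes into output_shape,
  // then Resize the output once and copy the decoded frames in.
}
```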

Signed-off-by: Albert Wolant <awolant@nvidia.com>
Comment on lines 235 to 251
this->SaveFrame(
frame_cpu.data(),
i,
sample_id,
sequence_id,
"/home/wazka/Downloads/frames/reader/",
this->Width(video_idx),
this->Height(video_idx));

this->SaveFrame(
this->GetVfrFrame(video_idx, gt_frame_id + i * stride),
i,
sample_id,
sequence_id,
"/home/wazka/Downloads/frames/gt/",
this->Width(video_idx),
this->Height(video_idx));
Contributor

Leftover?

Contributor Author

Yes, I had that stashed for debugging. Removed.

@JanuszL JanuszL self-requested a review February 16, 2022 17:59
Signed-off-by: Albert Wolant <awolant@nvidia.com>
@@ -66,8 +66,12 @@ class DLL_PUBLIC FramesDecoderGpu : public FramesDecoder {

int NextFramePts() { return index_[NextFrameIdx()].pts; }

void SetCudaStream(cudaStream_t stream) { stream_ = stream; }
Contributor

Do you use it anywhere now? Or do we keep going on the default stream?

Contributor Author

No, I removed this. The stream is set during construction now.

Signed-off-by: Albert Wolant <awolant@nvidia.com>
@awolant
Contributor Author

awolant commented Feb 17, 2022

!build

@dali-automaton
Collaborator

CI MESSAGE: [3985100]: BUILD STARTED

// TODO(awolant): Check per decoder stream
cudaStream_t stream;
DeviceGuard dg(device_id_);
CUDA_CALL(cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking));
Contributor

Consider using CUDAStream, or even better, just lease one from the pool:

dali::CUDAStreamPool::instance().Get(device_id_);

Contributor Author

We can't use CUDAStream, as it is derived from UniqueHandle and I want to share this stream between decoders for now.

}

auto &labels_output = ws.Output<GPUBackend>(1);
vector<int> labels_cpu(batch_size);
Contributor

SmallVector<int, 256> will save you an allocation for most batch sizes.

Contributor Author

Done. I used a smaller value, as batch sizes in video use cases tend to be smaller.
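For readers unfamiliar with the suggestion: a SmallVector avoids a heap allocation by storing up to N elements in an inline buffer, falling back to the heap only when it overflows. A minimal self-contained illustration of the idea (this is NOT DALI's SmallVector, just a sketch of the technique):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy small-buffer-optimized vector: up to N elements live inline, so no
// heap allocation happens for small sizes; larger sizes spill to a
// std::vector. Illustrative only.
template <typename T, size_t N>
class TinySmallVector {
 public:
  void push_back(const T &v) {
    if (size_ < N) {
      inline_buf_[size_] = v;
    } else {
      // First overflow: copy the inline elements to the heap buffer.
      if (heap_.empty()) heap_.assign(inline_buf_, inline_buf_ + N);
      heap_.push_back(v);
    }
    ++size_;
  }
  T &operator[](size_t i) { return size_ <= N ? inline_buf_[i] : heap_[i]; }
  size_t size() const { return size_; }
  bool on_heap() const { return size_ > N; }

 private:
  T inline_buf_[N]{};
  std::vector<T> heap_;  // fallback storage once the inline buffer overflows
  size_t size_ = 0;
};
```

With `SmallVector<int, 256>` (or a smaller N, as chosen here), a typical video batch of labels never touches the allocator.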

Signed-off-by: Albert Wolant <awolant@nvidia.com>
@dali-automaton
Collaborator

CI MESSAGE: [3985100]: BUILD PASSED

Signed-off-by: Albert Wolant <awolant@nvidia.com>
@awolant
Contributor Author

awolant commented Feb 17, 2022

!build

@dali-automaton
Collaborator

CI MESSAGE: [3985950]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [3985950]: BUILD FAILED

@awolant
Contributor Author

awolant commented Feb 17, 2022

!build

@dali-automaton
Collaborator

CI MESSAGE: [3987291]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [3987291]: BUILD FAILED

@awolant
Contributor Author

awolant commented Feb 17, 2022

!build

@dali-automaton
Collaborator

CI MESSAGE: [3988097]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [3988097]: BUILD PASSED

@awolant awolant merged commit a407c54 into NVIDIA:main Feb 18, 2022
cyyever pushed a commit to cyyever/DALI that referenced this pull request Feb 21, 2022
* Add VideoReaderDecoderGpu op

Signed-off-by: Albert Wolant <awolant@nvidia.com>
@JanuszL JanuszL mentioned this pull request Mar 30, 2022
cyyever pushed a commit to cyyever/DALI that referenced this pull request May 13, 2022
* Add VideoReaderDecoderGpu op

Signed-off-by: Albert Wolant <awolant@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jun 7, 2022
* Add VideoReaderDecoderGpu op

Signed-off-by: Albert Wolant <awolant@nvidia.com>