Support checkpointing in experimental video reader #5180

szkarpinski · 2023-11-21T08:41:31Z

Category:

New feature (non-breaking change which adds functionality)

Description:

This PR adds checkpointing support to fn.experimental.readers.video.

Additional information:

Adding checkpointing support is a simple task, similar for every loader and reader.

The following changes were required to enable checkpointing in the video loader:

- Make it inherit from Loader<..., supports_checkpointing=true>.
- Implement the Skip() method. Skip should behave like ReadSample in terms of side effects, but should skip a sample instead of reading it. This method is used to implement fast-forwarding in the Loader base class.
- Change Reset to use virtual_shard_id_ (the shard currently being processed) instead of shard_id_ (the initial shard requested by the user). See [2] for more details.
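To illustrate the Skip() contract described above, here is a minimal sketch. ToyLoader, FastForward, and SkipMatchesRead are hypothetical names made up for this example and are not DALI's Loader API; the sketch only assumes that the loader's state is a cursor over a non-empty list of samples.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy loader; names are illustrative, not DALI's Loader API.
class ToyLoader {
 public:
  explicit ToyLoader(std::vector<std::string> samples)
      : samples_(std::move(samples)) {
    assert(!samples_.empty());  // the sketch assumes at least one sample
  }

  // Reads the current sample and advances the cursor.
  std::string ReadSample() {
    std::string out = samples_[cursor_];  // the costly part in a real loader
    Advance();
    return out;
  }

  // Same side effects as ReadSample (cursor movement, wrap-around),
  // but the sample data itself is never touched.
  void Skip() { Advance(); }

  // Fast-forwarding replays n steps cheaply, e.g. when restoring a checkpoint.
  void FastForward(std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) Skip();
  }

 private:
  void Advance() { cursor_ = (cursor_ + 1) % samples_.size(); }

  std::vector<std::string> samples_;
  std::size_t cursor_ = 0;
};

// Skipping n samples must leave the loader in the same state as reading n.
inline bool SkipMatchesRead(std::size_t n) {
  std::vector<std::string> data = {"a", "b", "c", "d"};
  ToyLoader by_read(data), by_skip(data);
  for (std::size_t i = 0; i < n; ++i) by_read.ReadSample();
  by_skip.FastForward(n);
  return by_read.ReadSample() == by_skip.ReadSample();
}
```

Routing both ReadSample and Skip through one private Advance() helper is one way to keep their side effects from drifting apart; whether the real loader is structured this way is not stated in this PR.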

The following changes were required to enable checkpointing in the video reader:

- Make it inherit from DataReader<..., supports_checkpointing=true>. See [1] for more details.
- Store the very first snapshot in the constructor with this->SetInitialSnapshot(). Subsequent snapshots are saved by the DataReader base class.

[1] Changing `DataReader` template parameters

Inheriting from DataReader<..., supports_checkpointing=true> might look strange in the diff, because the DataReader is defined as:

template <typename Backend, typename LoadTarget,
          typename ParseTarget = LoadTarget, bool supports_checkpointing = false>

and is used mostly as:

DataReader<Backend, Target>

so to enable checkpointing one needs to add two parameters:

DataReader<Backend, Target, Target, true>
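The effect of the defaulted parameters can be checked at compile time. The sketch below is an illustration only: ToyReader, ToyBackend, and ToyTarget are hypothetical stand-ins that copy just the parameter list quoted above, not the real DataReader.

```cpp
#include <type_traits>

// Hypothetical stand-ins; ToyReader mirrors only the quoted parameter list.
struct ToyBackend {};
struct ToyTarget {};

template <typename Backend, typename LoadTarget,
          typename ParseTarget = LoadTarget, bool supports_checkpointing = false>
struct ToyReader {
  using parse_target = ParseTarget;
  static constexpr bool kCheckpointing = supports_checkpointing;
};

// Two-argument form: both defaults apply.
using PlainReader = ToyReader<ToyBackend, ToyTarget>;

// The flag is the fourth parameter, so the third must be spelled out
// before it can be set.
using CheckpointedReader = ToyReader<ToyBackend, ToyTarget, ToyTarget, true>;

static_assert(!PlainReader::kCheckpointing, "defaults to no checkpointing");
static_assert(CheckpointedReader::kCheckpointing, "flag enabled explicitly");
static_assert(std::is_same<PlainReader::parse_target,
                           CheckpointedReader::parse_target>::value,
              "ParseTarget is identical in both spellings");
```

C++ has no way to skip a defaulted template argument in the middle of the list, which is why the repeated Target shows up in the diff.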

[2] `virtual_shard_id_`

virtual_shard_id_ and shard_id_ might differ when stick_to_shard=False.

This change doesn't impact existing code, because Reset is normally called after each full pass over the data, at which point the two are equal. However, a checkpoint might be saved while the reader is processing a different shard than shard_id_, so to restore from such a checkpoint we need to be able to reset to the current shard (virtual_shard_id_).

The same change was made in FileReader in #4954.
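The difference between the two IDs can be sketched with a toy model. Everything here is hypothetical (ToyShardState, FinishShard, ResetTarget, MakeState, RestoredShard), and the shard-rotation rule is an assumption made purely for illustration; the actual Loader logic may differ.

```cpp
#include <cassert>

// Hypothetical toy model of shard bookkeeping; not DALI's Loader.
struct ToyShardState {
  int shard_id;          // shard initially requested by the user
  int virtual_shard_id;  // shard currently being processed
  int num_shards;
  bool stick_to_shard;

  // Assumption for this sketch: without stick_to_shard, the reader moves
  // on to the next shard once it finishes the current one.
  void FinishShard() {
    if (!stick_to_shard) virtual_shard_id = (virtual_shard_id + 1) % num_shards;
  }

  // Shard that Reset should rewind to: the current one, not the initial one.
  int ResetTarget() const { return virtual_shard_id; }
};

inline ToyShardState MakeState(int shard_id, int num_shards, bool stick) {
  return ToyShardState{shard_id, shard_id, num_shards, stick};
}

// A checkpoint taken mid-run restores onto the shard that was current
// when it was saved.
inline int RestoredShard(const ToyShardState &saved) {
  assert(saved.virtual_shard_id >= 0 &&
         saved.virtual_shard_id < saved.num_shards);
  return saved.ResetTarget();
}
```

In this model, a reader that starts on shard 1 of 4 and finishes one shard has virtual_shard_id 2 while shard_id is still 1; resetting to shard_id would restore the wrong shard. With stick_to_shard enabled the two never diverge.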

Affected modules and functionalities:

Video reader
Video loader

Key points relevant for the review:

Tests:

Checklist

Documentation

DALI team only

Requirements

Implements new requirements
Affects existing requirements
N/A

REQ IDs: N/A

JIRA TASK: DALI-3702

Signed-off-by: Szymon Karpiński <skarpinski@nvidia.com>

mzient · 2023-11-21T11:53:20Z

dali/operators/reader/video_reader_decoder_cpu_op.h

@@ -19,7 +19,7 @@
 #include "dali/operators/reader/loader/video/video_loader_decoder_cpu.h"

 namespace dali {
-class VideoReaderDecoderCpu : public DataReader<CPUBackend, VideoSample<CPUBackend>> {
+class VideoReaderDecoderCpu : public DataReader<CPUBackend, VideoSample<CPUBackend>, VideoSample<CPUBackend>, true> {


This applies to all checkpointing PRs:
I recommend adding a template alias:

template <typename Backend, typename LoadTarget, typename ParseTarget = LoadTarget> using CheckpointingDataReader = DataReader<Backend, LoadTarget, ParseTarget, true>;

to avoid retyping the ParseTarget manually all over the place.

Suggested change

-class VideoReaderDecoderCpu : public DataReader<CPUBackend, VideoSample<CPUBackend>, VideoSample<CPUBackend>, true> {
+class VideoReaderDecoderCpu : public CheckpointingDataReader<CPUBackend, VideoSample<CPUBackend>> {

Alternatively, check whether we even need a separate ParseTarget, or swap the order of the arguments (checkpointing, ParseTarget).

I'll be removing the supports_checkpointing parameter soon, once we have full support, so it's rather temporary.

szkarpinski · 2023-11-23T08:48:58Z

!build

dali-automaton · 2023-11-23T08:51:07Z

CI MESSAGE: [10993505]: BUILD STARTED

dali-automaton · 2023-11-23T08:56:56Z

CI MESSAGE: [10993505]: BUILD FAILED

szkarpinski · 2023-11-23T09:04:37Z

!build

dali-automaton · 2023-11-23T09:10:10Z

CI MESSAGE: [10993812]: BUILD STARTED

dali-automaton · 2023-11-23T10:49:46Z

CI MESSAGE: [10993812]: BUILD PASSED

Add checkpointing to experimental video reader

c7cbc84

Signed-off-by: Szymon Karpiński <skarpinski@nvidia.com>

awolant self-assigned this Nov 21, 2023

dali-automaton assigned mzient Nov 21, 2023

mzient reviewed Nov 21, 2023

mzient approved these changes Nov 21, 2023

awolant approved these changes Nov 22, 2023

Fix linter

ee67a01

Signed-off-by: Szymon Karpiński <skarpinski@nvidia.com>

szkarpinski merged commit a326b46 into NVIDIA:main Nov 23, 2023
5 checks passed
