Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support checkpointing in experimental video reader #5180

Merged
merged 2 commits into from
Nov 23, 2023

Conversation

szkarpinski
Copy link
Collaborator

@szkarpinski szkarpinski commented Nov 21, 2023

Category:

New feature (non-breaking change which adds functionality)

Description:

This PR adds checkpointing support to fn.experimental.readers.video.

Additional information:

Adding checkpointing support is a simple task, similar for every loader and reader.

The following changes were required to enable checkpointing in video loader:

  • Make it inherit from Loader<..., supports_checkpointing=true>
  • Implement Skip() method. Skip should behave like ReadSample in terms of side effects, but should skip a sample instead of reading it. This method is used to implement fast-forwarding in Loader baseclass.
  • Change Reset to use virtual_shard_id_ (which is the shard currently processed)
    instead of shard_id_ (which is the initial shard requested by the user). See [2] for more details.

The following changes were required to enable checkpointing in video reader:

  • Make it inherit from DataReader<..., supports_checkpointing=true>. See [1] for more details.
  • Store the very first snapshot in the constructor with this->SetInitialSnapshot().
    Subsequent snapshots are saved by DataReader baseclass.

[1] Changing DataReader template parameters

Inheriting from DataReader<..., supports_checkpointing=true> might look strange in the diff, because the DataReader is defined as:

template <typename Backend, typename LoadTarget,
          typename ParseTarget = LoadTarget, bool supports_checkpointing = false>

and is used mostly as:

DataReader<Backend, Target>

so to enable checkpointing one needs to add two parameters:

DataReader<Backend, Target, Target, true>

[2] virtual_shard_id_

virtual_shard_id_ and shard_id_ might differ when stick_to_shard=False.

This change doesn't impact existing code, because Reset is normally called after each full pass over the data, so then those two are equal. It might happen that a checkpoint is saved when the reader was is processing a different shard that shard_id_, so to restore from such checkpoint we need to be able to reset to the current shard (virtual_shard_id_).

The same change was made in FileReader in #4954.

Affected modules and functionalities:

  • Video reader
  • Video loader

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: DALI-3702

Signed-off-by: Szymon Karpiński <skarpinski@nvidia.com>
@awolant awolant self-assigned this Nov 21, 2023
@@ -19,7 +19,7 @@
#include "dali/operators/reader/loader/video/video_loader_decoder_cpu.h"

namespace dali {
class VideoReaderDecoderCpu : public DataReader<CPUBackend, VideoSample<CPUBackend>> {
class VideoReaderDecoderCpu : public DataReader<CPUBackend, VideoSample<CPUBackend>, VideoSample<CPUBackend>, true> {
Copy link
Contributor

@mzient mzient Nov 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This applies to all checkpointing PRs:
I recommend adding a template alias:

template <typename Backend, typename LoadTarget, typename ParseTarget = LoadTarget>
using CheckpointingDataReader = DataReader<Backend, LoadTarget, ParseTarget, true>;

to avoid retyping the ParseTarget manually all over the place.

Suggested change
class VideoReaderDecoderCpu : public DataReader<CPUBackend, VideoSample<CPUBackend>, VideoSample<CPUBackend>, true> {
class VideoReaderDecoderCpu : public CheckpointingDataReader<CPUBackend, VideoSample<CPUBackend>> {

Alternatively - check if we even need a separate ParseTarget, or swap the order of arguments (chekpointing, parsetarget).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll be removing the supports_checkpointing parameter soon, once we have full support, so it's rather temporary

@szkarpinski
Copy link
Collaborator Author

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [10993505]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [10993505]: BUILD FAILED

Signed-off-by: Szymon Karpiński <skarpinski@nvidia.com>
@szkarpinski
Copy link
Collaborator Author

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [10993812]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [10993812]: BUILD PASSED

@szkarpinski szkarpinski merged commit a326b46 into NVIDIA:main Nov 23, 2023
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants