Support checkpointing in Caffe reader #5181
Merged
Category:
New feature (non-breaking change which adds functionality)
Description:
This PR adds checkpointing support to `fn.readers.caffe` and `fn.readers.caffe2`.
Additional information:
Adding checkpointing support is a simple task and follows the same pattern for every loader and reader.
The following changes were required to enable checkpointing in the LMDB loader:
- Inherit from `Loader<..., supports_checkpointing=true>`.
- Implement the `Skip()` method. `Skip` should behave like `ReadSample` in terms of side effects, but should skip a sample instead of reading it. This method is used to implement fast-forwarding in the `Loader` base class.
- Change `Reset` to use `virtual_shard_id_` (the shard currently being processed) instead of `shard_id_` (the initial shard requested by the user). See [2] for more details.
The following changes were required to enable checkpointing in the caffe and caffe2 readers:
- Inherit from `DataReader<..., supports_checkpointing=true>`. See [1] for more details.
- Call `this->SetInitialSnapshot()`. Subsequent snapshots are saved by the `DataReader` base class.
[1] Changing `DataReader` template parameters
Inheriting from `DataReader<..., supports_checkpointing=true>` might look strange in the diff. `DataReader` is mostly used with only its first two template parameters spelled out, so to enable checkpointing one needs to add two parameters:
`DataReader<Backend, Target, Target, true>`
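A minimal, self-contained sketch of why two parameters have to be added rather than one. It assumes the checkpointing flag is the fourth template parameter and that the third parameter defaults to the second. The struct below is a hypothetical stand-in, not DALI's actual declaration, and the names `LoadTarget`, `ParseTarget`, `Tensor`, `PlainReader`, `CheckpointingReader` and `kSupportsCheckpointing` are invented for illustration.

```cpp
#include <type_traits>

// Hypothetical stand-ins; none of these are DALI's real declarations.
struct CPUBackend {};
struct Tensor {};

// Assumed shape of the template: the last two parameters have defaults,
// so most readers only spell out the first two.
template <typename Backend, typename LoadTarget,
          typename ParseTarget = LoadTarget,
          bool supports_checkpointing = false>
struct DataReader {
  using parse_target = ParseTarget;
  static constexpr bool kSupportsCheckpointing = supports_checkpointing;
};

// Typical reader: relies on both defaults.
struct PlainReader : DataReader<CPUBackend, Tensor> {};

// Checkpointing reader: the flag comes last, so the third parameter must
// be written out explicitly even though it only repeats its default.
struct CheckpointingReader : DataReader<CPUBackend, Tensor, Tensor, true> {};

// Only the flag differs between the two readers.
static_assert(!PlainReader::kSupportsCheckpointing,
              "defaults leave checkpointing disabled");
static_assert(CheckpointingReader::kSupportsCheckpointing,
              "explicit fourth argument enables checkpointing");
static_assert(std::is_same<PlainReader::parse_target,
                           CheckpointingReader::parse_target>::value,
              "repeating the target does not change the parse type");
```

Under these assumptions the repeated `Target` is not a behavioural change; it appears in the diff only because C++ does not allow skipping a defaulted template argument to reach a later one.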
[2] `virtual_shard_id_`
`virtual_shard_id_` and `shard_id_` might differ when `stick_to_shard=False`.
This change doesn't impact existing code, because `Reset` is normally called after each full pass over the data, at which point the two are equal. However, a checkpoint might be saved while the reader is processing a shard other than `shard_id_`, so to restore from such a checkpoint we need to be able to reset to the current shard (`virtual_shard_id_`).
The same change was made in `FileReader` in #4954.
Affected modules and functionalities:
Key points relevant for the review:
Tests:
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: DALI-3693